Amateur typesetting enthusiast.
Oftentimes, especially for words of Latin origin, German will adopt the English term, perhaps slightly fitting it to the language. This type of term (in my experience) has tended to become the favored variant, such as Compiler for the English compiler. However, there is typically a more German-like variant of the English (or, ultimately, Latin) term, as evidenced by Kompilierer, or a straight translation of the term into something more easily understandable, whereby compiler becomes Übersetzer.
The internet age, international communication needs, and the prevalence of the latest documentation being available first (or only) in English are likely to blame for this trend. Books especially use either a German-like Latin derivation or (preferably) a native term.
This is a cursory illustration of the situation on the more technical side of things. No one would think to use a term like user interface over the well-established Benutzeroberfläche, or memory over Arbeitsspeicher.
Ultimately, both English and German, as West Germanic languages, operate similarly enough that the friction due to terminology is minimal.
I would fully agree that other internet protocols are much better suited to information not meant to be broadcast publicly.
Civility is great, and should be highly encouraged. That’s largely why I like Lemmy. Each instance can guide its community in line with its values, whatever those may be, block offenders, and generally forge the space it wishes.
However, I think Besse’s comments on setting the correct expectations in the public sphere are worth considering.
For a different internet example: all the messages I send in any chatroom on an IRC server will inevitably be logged by someone, especially in popular rooms. Any assumption to the contrary would be naïve, and demanding that people not keep a log of any of my publicly broadcast messages would be laughed at by the operators. It’s a public space, and sending anything to that space necessarily means I forgo my ability to control who sees, aggregates, archives, or shares that information. My choice to put the information into that space is the opt-in mechanism, just as books or interviews do the same offline in print.
It’s not so much the protocol as it is how making things public fundamentally works.
I think Besse makes a great point here:
I think blurring the lines between public and private spaces is the opposite of informing consent. Cultivating unrealistic expectations of “privacy” and control in what are ultimately public spaces is actually bad.
I tried to single out the world wide web, as opposed to the internet at large, because the two are not synonymous. It’s rather absurd to publicly serve webpages to any querying IP address and maintain that the receiving computer is not to save said pages to disk.
All this to say: I find it difficult to argue that web publications should or could be exempt from aggregation and archival (or scraping, to put it another way). I understand that the ease with which bots do this can be disconcerting, however.
If we stay with the cafe bulletin board, getting a detailed overview of all the postings on the board is akin to scraping the whole thing. If we extend our analogy instead to a somewhat more significant example, library catalogs do the same with books, magazines, and movies.
This is the cost of publishing, be that in print or online. It must be expected that some person has a copy of every- and anything one has ever written or posted publicly, and perhaps even catalogued it. A way around this might be to move away from the web to another part of the internet, like Matrix, as alma suggested.
I assume the non-consensual collection of various (meta-)data is what you refer to when talking about intrusion and money making. Lemmy, like many projects, seeks to offer an alternative to corporate, data-gobbling social media sites, but doesn’t eliminate the ability to search through its webpages.
And here’s the point at which we go off the rails (towards the end of the thread; the earlier section is quite well expressed):
Most people in tech do not want to hear this, because it invalidates the vast majority of their business models, AI/ML training data, business intel operations, and so forth. Anything that’s based on gathering data that is ‘public’ suddenly becomes suspect, if the above is applied.
And yes, that includes internet darlings like the Internet Archive, which also operates on a non-consensual, opt-out model.
It’s the Western Acquisition, claiming ownership without permission.
It’s so ingrained in white, Western internet culture that there are now whole generations who consider anything that can be read by the crawler they wrote in a weekend to be fair game, regardless of what the user’s original intent was.
Republishing, reformatting, archiving, aggregating, all without the user being fully aware, because if they were, they would object.
It’s dishonest as fuck, and no different from colonial attitudes towards natural resources.
“It’s there, so we can take it.”
We then have some reasonable responses from others in the thread:
Rich Felker @email@example.com
Re: Internet Archive, I think many of us don’t believe/accept that businesses, organizations, genuine public figure politicians, etc. have a right to control how their publications of public relevance are archived & shared. The problem is that IA isn’t able to mechanically distinguish between those cases and teenagers’ personal diary-like blogs (chosen as example at opposite end of spectrum).
Arne Babenhauserheide @ArneBab@rollenspiel.social
This is the difference between the internet archive and an ML model: the archive does not claim ownership.
Finally, a thought of mine own:
Sindarina seems to fundamentally miss the central idea of the world wide web, that is, publicly sharing information. This does not mean the work may be used for any purpose whatsoever, as the content of many websites is either copyrighted or CC-BY-SA. But publishing anything on the www or in print opens it by necessity to aggregation and archival. I routinely save webpages to disk.
To run with the cafe analogy that has been brought up, one cannot post a note to the cafe’s bulletin board and at the same time expect that no one else may take a photo of it, then perhaps share it with some acquaintances.
This is a far cry from the data harvesting done by Google, Microsoft, Apple & co., or the dubiously collected data used to train “automated plagiarism engine[s],” as Arthur Besse put it not too long ago.
None of these pique mine interest enough to try them, but I was surprised that the oil shell didn’t make an appearance. Besides fish and nushell, it was the only alternative shell I’d heard of.
There is likely a way to stream local (public) radio stations using a browser, provided one likes the music of at least one station that does so. I find this provides excellent recommendations and tons of helpful information about the picks for the playlist, which itself is typically logged by time.
This provides no built-in download option, though if great recommendations are the focus, nothing beats public radio.
It’s already difficult enough for me to use keyboards that don’t have Caps Lock act as another Control, not to mention all the changed special character locations on a German QWERTZ keyboard (cf. US standard layout), that I don’t wish to make my life any more painful by moving the letters around too.
One solution to the revenue issue for musicians is freely distributing the digital music and selling merch, physical copies, and concert tickets for income, much as Run the Jewels does.
This doesn’t work, however, if one’s work is largely copied by larger figures early on, such that building a following and steady income is difficult to impossible because people first and foremost encounter soullessly copied derivatives of one’s music and the original artist is now “just another copy.”
Hence the discussion on how much of a work must be original.
To add a helpful link, this question about sampling is similar to how Fair Dealing works (the comparable U.S. doctrine is “Fair Use”). How much is sampled, and how it’s changed and integrated into the new work, are vital components when looking at whether someone is merely copying or innovating.
If one has an e-reader, standardebooks.org is an excellent place for English-language texts.