- cross-posted to:
- programmer_humor@programming.dev
It’s not new for companies to make money on the internet using work and information that others made, without compensating the community.
Google Search or Stack Overflow did it long before LLMs; it’s just that the price is different: you pay with your attention span instead of money.
I’m pretty sure that we’ll have freely accessible LLMs with ads baked in so that you can’t even see them at first glance (if they aren’t doing it already).
What’s new with LLMs is that the information source is hidden; no credit is given to the people who contribute to the community anymore.
Google Search or Stack Overflow did it long before LLMs
These examples had very obvious attribution and gave you an easy path back to the source. AI, on the other hand, serves as the source of truth in almost all scenarios. I consider myself a savvy user, and even then I find myself never actually clicking on the cited sites, just looking at their titles; most people don’t even bother with that.
If you did it for free, what boss is replacing you?
Why can’t you just keep doing it?
They don’t want to replace the you that is producing free content for them to ingest, they want to replace the you that earns enough money to live.
There is a license that says that all derivatives must also be open source.
But also, AI companies don’t care about the law: they stole all their data, engage in insider trading and circular trading, and generate all manner of illegal content. They don’t give a fuck. And the US government is doing anything to hold them accountable; in fact, the president is getting in on it.
And the US government is doing anything to hold them accountable
You probably meant “nothing” instead of “anything”?
Yes, all this is very much the product of the current US admin.
they stole all their data
Obviously fuck the capitalists and AI scammers. But reading and learning from a library, then writing and selling your own book based on that is NOT stealing. It’s the wrong argument.
The answer obviously is to keep the actual source material and libraries and book archives open and just run open source AI models at home. You can run smaller versions on a solar-powered PC, no problem.
The issue with trying to make “AI is just stolen” happen is that it will make open source AI models illegal. AI companies would love that because they can afford to license and pay or work around or obscure or whatever. The “intellectual property” argument is always a disgusting capitalist one. Knowledge is either free or nothing is.
Except these people are turning around and burning down the libraries once they’ve read all the books.
Well yeah, they suck. I’m a fan of Anna’s Archive.
But this doesn’t change the fact that AI models will continue to improve, and the tactical question is whether we give them ammunition to monopolize it using “intellectual property” rights. I want open source/weights models to use locally without paying some license to Meta or Reddit or some publisher cartel.
Library implies consent that was not there and access to the public that was not there.
Every book ever published, every article anybody ever wrote, every comment anyone ever posted on a public internet is “consent” to read and learn from it.
I’m pro piracy as you can tell. The idea that something can be out there publicly on the internet but it’s “not consent” to read is the intellectual property one. Look at how they try to gatekeep publicly owned scientific papers. Big AI is clearly hypocritical doing this, but corporations are just soulless, amoral programs executed by sentient humans.
But the RESULT of all that (e.g. DeepSeek) should belong to all people. And THAT is why these IP arguments by fuckAI are dangerous, because they are only a threat to open models. The answer is open source (or open weight) AI models, with advances in computing making it possible to run them locally.
Every GPS ping your phone sells to ad companies? Every nanny cam that is connected to the cloud?
Public information vs private information
Are you a troll? You just said there is no such thing.
I said no such thing. Private is private. I’m saying there is no such thing as public information (information that you put on the public internet, meant for the public to see) somehow not being “free”.
You can argue about copyright a little, I’m against it, but this has nothing to do with private messages or private information or data protection.
You’re just arguing a strawman and trying to make the discussion about something else and an ad hominem calling me a troll.
But reading and learning from a library, then writing and selling your own book based on that is NOT stealing.
It could be argued that these AIs are not actually learning but collecting and rearranging. That’s still stealing in my book, especially if it happens on a massive industrial scale and by a megacorp instead of a person.
That is incorrect, though; it rests on the fallacy that a model is just a big database into which all that (much larger) data is copied and compressed. It’s called machine learning, and denying the reality of how it works is just not useful.
Imagine you study to be an engineer in whatever field, but now laws have been passed saying that you only licensed the knowledge from the university and publishers. When you work, you have to demonstrate who you learned it from and then pay royalty fees. Obviously that would be insane for humans, but I do foresee that they will try to do this for machine learning, because of the argument you made.
So any open source / weight model you find and could run locally (like e.g. deepseek) will now be illegal because you can’t prove where “dey tuerg dur dartae” from.
Thus all potential future gains from AI will be monopolized, while the costs are socialized.
Large amounts of data were pirated, which is apparently legal if you’re training an AI. Source: Meta lawsuit
And the US government is doing anything
isn’t doing anything
Yeah that’s what I meant
Tech guy discovers enclosure.
Using copyleft licenses for closed models is clearly against the spirit of the licenses if the users don’t have access to the source code that includes the original copyleft works. Even open weight models aren’t really the source code, and are more akin to a compiled binary. The source code is all the training data and code used to train the model such that anyone can build on it and train new models.
I’m not a lawyer and am not sure how well existing copyleft licenses like GPL or CC-SA would stand up in court to enforce this, but if they don’t, then stronger licenses that explicitly cover works being used as training data need to become more common.
I’ve seen the argument that the models are just learning from the data in the same way a human would. That’s nonsense. It’s not like they’re creating a sentient being with its own agency that can tell them to fuck off if it wants. These companies are running a software pipeline against copyrighted IP to convert it into a derivative work that is now supposedly wholly owned by said company, but the reality is that it’s collectively owned by everyone who contributed to the copyleft training data.
Coincidentally, OOP just explained academic publishing to a T.
A parasite can take different shapes but only one form.
Sort of. Unless you go to a private university, taxes go to the public schools to fund facilities and wages for the educators. While you may pay tuition, the overall cost of that education and of the services needed to do research doesn’t come wholly out of your pocket.
Now I agree you should be compensated more, as someone who tried to get published academically and has filed patents I can see why there is a split of compensation.
You are missing the point, it’s not about education, but publishing. Read about Elsevier, and how Aaron Swartz died
Wdym? Scientists usually don’t get paid to publish. The person you replied to, probably meant academic publishing as in:
- Scientist does research and compiles manuscript, usually via public money, even in shithole countries like the US
- Scientist submits manuscript to for-profit journal
- Journal outsources proofreading to other scientists, who do it for free
- Manuscript is accepted or revised on the scientist’s time and money
- Scientist pays for publishing
- For-profit journal either charges extra for “open” publication or charges the scientist and other scientists for access, usually through agreements with the respective library
- Profit! (On the journals part)
Where is the split of compensation? For patents there is, but for academic publishing usually not.
You also forgot how scientists are required to publish regularly to keep their jobs and to find new ones. So this process is far from optional.
That aside, that was an excellent write up. You should publish that to a journal or something. 😏
Thanks for the peer review!
clenches fist
Apparently, in the NSFNET days, you had to sign an agreement that you would not use the internet for business/commercial purposes.
https://en.wikipedia.org/wiki/National_Science_Foundation_Network#Acceptable_use_policy
Turns out distilling all the world’s information into a friendly interface is worth something
Then how come none of the AI companies are actually profitable?
So only profitable things have value?
To capitalists? Yeah
Search engines did that decades ago. AI presents its results as fact without the ability to provide a source; it’s an aggressively unfriendly interface.
It’s not the same. The ability to converse about a topic is pretty dope, compared to trying to find relevant information about a very unique case, for instance.
Oh, they provide sources. At least, some of them do. It’s just that the sources don’t always back them up. It’s going to gin up sources whether it’s correct or not, because that’s what it’s programmed to do.
I haven’t used the latest models (I am really fucking over it), but yeah, when I would ask for sources on why the model provided the answer it did, every time it would just hallucinate a web address.
It’s presented as friendly but yeah it’s actually incredibly hostile and predatory
The interface is friendly. Agreed.
Oh yea 25 years ago search engines were amazing
And it only makes sense that they’d get worse over time because technology has gotten worse.
Errr, wait…
Not to mention those same companies obsessed with AI are the ones who run the search engines. They made finding all those tutorials and other good resources harder. They ruin search results with ads and easily gamed algorithms that they stopped trying to improve. All that made people more willing to let the AI find the answer.
I have indeed noticed Google (and Google-based search engines like Startpage) has gotten worse in the past months. Even DuckDuckGo is better now (which, as a long-time DDG user, is wild).
Honestly, DDG has also gotten worse (as it’s Bing in a condom); it’s just that Google has shit itself even harder.
Kagi is a paid service, and it feels weird to pay for a search engine, but things have felt so much better since I tried it out months ago.
I pay for MetaGer and I really do recommend it. You pay a pittance per search while being free of the crap that infests the net. Kagi comes with its own set of issues.
I’ll look into Metager as this is the first I’ve heard of it.
What set of issues do you see with Kagi? It’s the best I’ve encountered as of late, but if there’s more I should consider, I’d like to learn.
I mainly find their CEO problematic, and their focus on AI does not bode well for the future.
Thank you for sharing. I do now recall a couple of the items you addressed, and I’ll have to keep others in mind.
We can’t have anything nice… Google, Digg, Reddit, GitHub, Kagi, Proton Mail: at one point in time these were all good. The rot, or the trajectory of rot, seems inevitable.
I’m not anti-AI, but it doesn’t need to be in everything all the time. It shouldn’t obfuscate data sources. It shouldn’t be allowed to consume and gather everything while breaking laws that would apply to any individual and ought to be enforced against any corporate entity.
Pointing an AI at a larger company’s documentation, or feeding a local one a largish manual, and using it to figure out how or why 2, 3, or 4 parts work together has been useful for me in the past. Again, that’s in cases where I can then get to the data to learn for myself.
Letting the pattern recognition machine (AI) assess a logfile or 3 that are intertwined to help find issues has been helpful.
Injecting it into every internet search where I never asked is wasteful and stupid.
I don’t know if you use their Assistant, but you can limit it to specific sources. The first option after the entire web is the fediverse. They also have the small web, which just shows you things made by actual humans, not something trying to sell you something.
I started noticing DDG search with the “” operator is wonky. Also, Ecosia seems to have a lot of sponsored results?
It all makes sense when you realize that AI isn’t the product, control is.
When everyone depends on cloud services, especially storage, because they can’t afford hard drives or RAM anymore, do you think the average normie is going to “stand up for principles of privacy and freedom of computing”, or are they gonna say “it is what it is” and buy a tablet with 8GB of RAM and an office suite in the cloud?
Do you think these companies are above scanning everyone’s stuff to find out who is against them? Who is developing some great new idea? Who dissents the government?
Do you think these companies are above editing all saved copies of a news article and replacing it with something AI generated that looks real enough to memory hole something? (Copies of things in the cloud are already de-duplicated)
They don’t want us to be able to point out their flaws anymore. They want us to be submissive to them.
that sounds a bit too dystopian, maybe in some 20 years, but really, at this point anything can happen.
Where have you been the last 20 years?
They’ve already broadcast their intentions to push cloud compute for home use. These data centers train AI, but chips are improving rapidly. Amazon and others have already stated they plan to use these for cloud compute services as they become obsolete for bleeding-edge AI. Microsoft has a low-local-resource, client-to-cloud version of Windows they are releasing. They want all compute to be subscription-based, and it will definitely lack any real privacy protections as long as they can keep corporate capture of Congress.
To be fair, Google has been fighting a war against SEO and spam basically since it was started.
I don’t think they intentionally degraded their search engine. I think they just diverted resources away from fighting spam and SEO and instead dedicated those resources to AI stuff. Intentionally degrading their search results would require work. They’d have to convince their high-paid employees that for some reason they should make the results worse. But, just letting the stuff rot naturally as SEOs kept up their attacks, that’s free.
I’m at a point where I gladly pay for my search engine just to get good results.
One could argue we were always paying, with our data. But now we get less in return.
And the people charging admission are also hemorrhaging money.
Weird fucking timeline.
I think their hope is if they pull it off they won’t need money anymore, because they’ll have destroyed the last lever of power the working class still have: the ability to withhold their labour. LLMs are trying to turn labour into just another tool (that you rent).
Are those people so detached that they think industry is people writing emails to each other?
I’m just waiting for some RTO’d workers to be told they’re being mass laid off, and instead they just beat the manager to death. The irony of it only being possible because they were forced back into the office will be delicious.
Have them kill the CEOs and Boards of Directors, and I will bring the popcorn.
it’s called a walled garden for a reason.
I never had a license for having a big dick til they made it a requirement requestement that make it do what it as it do be it for cuz that’s what it be.
The internet will never be open again because of this.
Not under capital. No.
The same people who abuse community resources (bulldoze public green spaces, kill rivers, pollute oceans, despoil lands, toxify the air, extinct animals etc.), do the exact same to information and the sources of it. They take for themselves and give nothing back. They cannot coexist with community or communal anything (which is why they hate community and socialism). They cannot give and will compulsively take. Their lives are why greed is considered a sin, insatiable and cruel, and it was wise to believe so.
Never share anything of value with the world. Do not allow them to know value exists or they will come for it. Giving freely and openly has empowered those who advantage themselves, by stealing every idea that is shared. They are the enemy, they are the threat, their psychology is a hazard to the community.
Share only with your communities and close those communities to these thieves. The only solution to private power is community power. We need to build back our communities and abandon everything possible that they control.
It’s crazy that the tragedy of the commons is only a problem because the commons are also free for people to close up and start charging for.