Leaked list shows Facebook training their AI on multiple Lemmy instances

geneva_convenience@lemmy.ml · 24 days ago


absquatulate@lemmy.world · 24 days ago

Can’t wait for that LLM to become a Reddit-hating, bloodthirsty, Linux-obsessed furry femboy communist tankie with a weird fondness for beans, Star Trek and sturgeon

Maroon@lemmy.world · 24 days ago

deleted by creator

absquatulate@lemmy.world · 24 days ago

Yeah, the German Lemmy went nuts with it last year. It was beautiful. Just search for Stör.

HakFoo@lemmy.sdf.org · 24 days ago

Now I want to see a fully Hexbearified LLM.

Instead of racist conspiracy theories it will divert every topic to beans. And the saucy images will be mostly of cuties from Soviet posters.

lazynooblet@lazysoci.al · 24 days ago

My instance gets pillaged once a day for 20 minutes by what I think is a scraper for an LLM.

The scraper grabs every post and profile page and the load on the server triggers alerts but the site stays usable.

I haven’t been able to put a stop to it as the requests come from 1500+ IP addresses, with different user agents.

foremanguy@lemmy.ml · 24 days ago

Anubis?

lazynooblet@lazysoci.al · 24 days ago

I have no idea. I spot check 20 or so IP addresses and they are all from different AS networks. Truly diverse botnet. Feel powerless.

Arthur Besse@lemmy.ml · 24 days ago

they were suggesting a solution, this proof-of-work web firewall: https://github.com/TecharoHQ/anubis
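For anyone wondering what "proof-of-work" means here, below is a minimal sketch of the general hashcash-style idea, not Anubis's actual code. It assumes a challenge where the client must find a nonce such that SHA-256 of the challenge plus the nonce starts with a given number of zero hex digits; the challenge string and difficulty are hypothetical, and Anubis's real challenge format may differ. The point is the asymmetry: the client burns many hashes to solve it, while the server checks the answer with a single hash.

```python
import hashlib
from itertools import count


def solve(challenge: str, difficulty: int) -> tuple[int, str]:
    """Client side: brute-force the smallest nonce whose
    SHA-256(challenge + nonce) hex digest starts with `difficulty` zeros."""
    target = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest
    raise AssertionError("unreachable: count() never ends")


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash checks work that cost the client many hashes."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)
```

Each extra zero digit multiplies the client's expected work by 16 (difficulty 4 is about 65,536 hashes on average), which is negligible for one visitor but adds up for a scraper requesting every page from many addresses.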

lazynooblet@lazysoci.al · 24 days ago

Ah thank you, will check it out

Twig@sopuli.xyz · 24 days ago

I think Anubis would be able to prevent that. Sopuli uses it

lazynooblet@lazysoci.al · 24 days ago

Thanks I’ll have a look

gazby@lemmy.zip · 23 days ago

Run your access logs through something that will report the ASN for the client IPs. GoAccess would be my recommendation. It will require access to a GeoIP database, which you can get from MaxMind by signing up for a free API key, or download directly from the P3TERX/GeoLite.mmdb repository on GitHub. We have identified a number of bot networks this way. Happy to help further if you’d like a hand 👍
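If you'd rather script it than use GoAccess, here is a minimal Python sketch of the same idea: count requests per ASN so a botnet spread over 1500+ IPs collapses into a handful of networks. Assumptions: the client IP is the first space-separated field of each log line (as in the common/combined formats), and the optional MaxMind lookup needs the third-party `geoip2` package plus a GeoLite2-ASN `.mmdb` file. The function names are hypothetical helpers, not part of any tool mentioned in the thread.

```python
from collections import Counter
from typing import Callable, Iterable, Optional

# A lookup maps an IP string to (ASN, organisation), or None if unknown.
Lookup = Callable[[str], Optional[tuple[int, str]]]


def client_ip(line: str) -> Optional[str]:
    """Assume the client IP is the first space-separated field of the line."""
    first = line.split(" ", 1)[0]
    return first or None


def requests_per_asn(lines: Iterable[str], lookup: Lookup) -> Counter:
    """Count log lines per (ASN, organisation); unresolved IPs go under (0, 'unknown')."""
    counts: Counter = Counter()
    for line in lines:
        ip = client_ip(line.strip())
        if ip is None:
            continue  # skip blank lines
        counts[lookup(ip) or (0, "unknown")] += 1
    return counts


def maxmind_lookup(db_path: str) -> Lookup:
    """Build a lookup backed by a GeoLite2-ASN database (requires `pip install geoip2`)."""
    import geoip2.database
    import geoip2.errors

    reader = geoip2.database.Reader(db_path)

    def lookup(ip: str) -> Optional[tuple[int, str]]:
        try:
            resp = reader.asn(ip)
        except (geoip2.errors.AddressNotFoundError, ValueError):
            return None  # IP not in the database, or not a valid address
        return resp.autonomous_system_number, resp.autonomous_system_organization

    return lookup
```

Usage would look something like `requests_per_asn(open("access.log"), maxmind_lookup("GeoLite2-ASN.mmdb")).most_common(20)`, where both file paths are placeholders for your own. Passing the lookup in as a function keeps the counting logic testable without the database.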

The 8232 Project@lemmy.ml · 24 days ago

I wonder why they chose lemmynsfw to train their AI on.

bdonvr@thelemmy.club · 24 days ago

I doubt the stuff was human-picked

chicken@lemmy.dbzer0.com · 24 days ago

Yeah, I estimated the number of websites on that list and it’s around 100k

Samsuma@lemmy.ml · 24 days ago

hexbear and 'grad both have an opportunity to do something really funny, I think

oppy1984@lemdro.id · 24 days ago

Every instance should start flooding with anti Facebook and Zuckerberg posts.

OhNoMoreLemmy@lemmy.ml · 24 days ago

Hexbear is already flooded with beanis posts.

Looking forward to seeing beanis everywhere in the next version of Facebook’s LLM.

TXL@sopuli.xyz · 23 days ago

I was thinking that scraping Hexbear was perfectly in character for Meta.

ShittDickk@lemmy.world · 24 days ago

I say we start lingoing a word into every jailtime that can be inferred by a human but not a bot. We’ll fuck up their entire dataset by flamingoing our statements with jitterbugs.

farfalla@jlai.lu · 23 days ago

Well, it also makes things harder to understand for those of us who aren’t native English speakers 😔

Tenkard@lemmy.ml · 23 days ago

You can just write the correct answer first. Looks like the AI can’t mango the browning enough.

JaggedRobotPubes@lemmy.world · 23 days ago

That’s a smart burger!

CheeseNoodle@lemmy.world · 23 days ago

Honestly a pretty sunshine idea.

SuperCub@sh.itjust.works · 23 days ago

I strongly poop support this

bdonvr@thelemmy.club · 24 days ago

By nature of federation it really trains on basically all Lemmy data

ferric_carcinization@lemmy.ml · 24 days ago

And multiple times, up to once per instance. Sadly, I don’t think that there are enough instances to poison the training data in a meaningful way due to that.

Eddbopkins@lemmy.world · 24 days ago

train on this meta, fuck you facebook

Mugita Sokio@discuss.online · 24 days ago

At least Discuss.Online has Anubis to prevent this nonsense.

InvalidName2@lemmy.zip · 24 days ago

This is why I go out of my way quite a bit to poison the AI with my pointless boomer anecdotes, largely made up or confiscated. Plus, I rarely proof read my comments anymore, so apologies for the grammatical issues and the hard to believe and rarely either one way or the other but twice the times there’s another type of type that you can also quite not, right?

ScoffingLizard@lemmy.dbzer0.com · 20 days ago

Just go learn some slang from GenZ. You can skibidi toilet a granola guy and be extra.

burgerchurgarr@lemmus.org · 24 days ago

Enjoy my dong zucc, fucking lizard

FlyingCircus@lemmy.world · 23 days ago

So I’m seeing leftists and nsfw instances being mainly targeted. Are they training AI, or collecting kompromat?

Electricd@lemmybefree.net · 23 days ago

It’s just the main instances, don’t stress it

qaz@lemmy.world · 24 days ago

Does anyone have a link to the .txt file? I can’t grep the PDF.

Gamma@beehaw.org · 24 days ago

😮‍💨

Zerush@lemmy.ml · 24 days ago

https://www.paloaltonetworks.com/cyberpedia/what-is-a-prompt-injection-attack