Understanding how LLMs actually work, where each token (roughly a word, sometimes just a fragment of one) is generated by picking the highest-probability candidate for what comes next, this output makes me think the training data heavily included social media or pop culture, specifically around “teen angst”.
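For anyone curious, here’s a minimal sketch of what that next-token step looks like in code. It assumes the Hugging Face transformers library and the small GPT-2 checkpoint (both my choices for illustration, nothing specific to the model being discussed), and just prints the most likely next tokens for a prompt:

```python
# Minimal sketch of next-token prediction, assuming the Hugging Face
# transformers library and the small "gpt2" checkpoint (illustrative only).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The config file failed to load and now everything is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits  # shape: (1, seq_len, vocab_size)

# Probability distribution over the vocabulary for the *next* token only.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top_probs, top_ids = torch.topk(next_token_probs, k=5)

for p, tok_id in zip(top_probs, top_ids):
    print(f"{tokenizer.decode([tok_id.item()]):>12}  {p.item():.3f}")
```

Whatever mood shows up in the top candidates there is basically an echo of whatever text dominated training for that kind of context.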
I wonder if in-context learning, i.e. steering it through the prompt rather than retraining, would be helpful for masking the “edgelord” training data.
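Something like this is what I mean. It’s a rough sketch assuming the OpenAI Python SDK with a placeholder model name; the system prompt and few-shot examples are made up for illustration, not a tested recipe:

```python
# Rough sketch of "masking" the edgelord register in-context instead of
# retraining. Assumes the OpenAI Python SDK; model name is a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are a calm, matter-of-fact troubleshooting assistant. "
    "Describe failures neutrally. Do not catastrophize, dramatize, "
    "or editorialize about the user's situation."
)

# A couple of few-shot turns reinforce the desired tone in-context.
FEW_SHOT = [
    {"role": "user", "content": "The config file is corrupted and nothing starts."},
    {"role": "assistant", "content": "The service fails at config parse time. "
        "Restore the last known-good config or regenerate it, then restart."},
]

def ask(question: str) -> str:
    messages = [{"role": "system", "content": SYSTEM_PROMPT}, *FEW_SHOT,
                {"role": "user", "content": question}]
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat-capable model would do
        messages=messages,
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(ask("My deployment failed three times in a row. Is everything ruined?"))
```

It doesn’t remove the angsty data, it just biases the sampling away from that register for the current conversation.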
Yeah, I think the training data that’s most applicable here is probably troubleshooting sites (e.g. StackOverflow), GitHub comment threads, and maybe even discussion forums. That’s really the only place you get this deep into configuration failures, and there is often a lot of catastrophizing there. Probably more than enough to begin pulling in old LiveJournal emo poetry.