Statistically we are cooked
📰 Reddit r/artificial
For an LLM to identify harmful content, that harmful content must be encoded in the model's weights. If you train a model on data that omits this information, it may naively regurgitate harmful content provided by a human user without knowing that it is harmful. If harmful content is encoded in LLMs, and jailbreaking LLMs is always technically possible (because LLMs are not deterministic), then in theory every