
OpenAI has been accused by many parties of training its AI on copyrighted content sans permission. Now a new paper by an AI watchdog organization makes the serious accusation that the company increasingly relied on non-public books it didn't license to train more sophisticated AI models.

AI models are essentially complex prediction engines. Trained on a lot of data (books, movies, TV shows, and so on), they learn patterns and novel ways to extrapolate from a simple prompt. When a model "writes" an essay on a Greek tragedy or "draws" Ghibli-style images, it's simply pulling from its vast knowledge to approximate. It isn't arriving at anything new.

While a number of AI labs, including OpenAI, have begun embracing AI-generated data to train AI as they exhaust real-world sources (mainly the public web), few have eschewed real-world data entirely. That's likely because training on purely synthetic data comes with risks, like worsening a model's performance. The new paper, out of the AI...