Posted by Alumni from TechCrunch
April 2, 2025
OpenAI has been accused by many parties of training its AI on copyrighted content sans permission. Now a new paper by an AI watchdog organization makes the serious accusation that the company increasingly relied on non-public books it didn't license to train more sophisticated AI models. AI models are essentially complex prediction engines. Trained on a lot of data (books, movies, TV shows, and so on), they learn patterns and novel ways to extrapolate from a simple prompt. When a model 'writes' an essay on a Greek tragedy or 'draws' Ghibli-style images, it's simply pulling from its vast knowledge to approximate. It isn't arriving at anything new. While a number of AI labs, including OpenAI, have begun embracing AI-generated data to train AI as they exhaust real-world sources (mainly the public web), few have eschewed real-world data entirely. That's likely because training on purely synthetic data comes with risks, like worsening a model's performance. The new paper, out of the AI...
