We dive into safety evals as part of our series about benchmarking. Research covers Sakana AI's groundbreaking paper about self-evolving models. Our opinion section makes the case for spatial intelligence and world models. Engineering discusses another cool AI framework.

This week featured two standout papers that reveal complementary frontiers of AI development: one that pushes the limits of open-ended, self-improving systems, and another that rigorously quantifies how much information large language models can retain. The first, "Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents," presents one of the most credible instantiations yet of recursive self-modifying agents. The second, "How Much Do Language Models Memorize?", introduces a principled and practically measurable framework for assessing the memorization capacity of modern LLMs. Both contributions illuminate core dynamics of how AI systems evolve, learn, and remember, and together they paint a vivid...