Announcing Slingshots // TWO

February 26th, 2026

By Team Laude

These 14 projects from Stanford, Berkeley, MIT, CMU, UIUC, and Michigan are tackling production deployment, energy constraints, and continual learning. Several are building on infrastructure from Slingshots // ONE. Together, they show what happens when the right researchers get the right resources at the right time: research that ships, gets adopted, and moves the field forward.

// Slingshots // TWO

Today we're announcing Slingshots // TWO: 14 research projects taking on the deployment problems, efficiency constraints, and learning challenges that define where AI research is heading right now. We've been supporting these projects over the last several months and are proud to count this group of researchers and scientists as part of the Laude community.

This batch reflects an increased focus on making AI systems work well in the real world. Several teams are building directly on projects from Slingshots // ONE, extending benchmarks and tools that have already seen significant traction and community adoption.

The Slingshots program supports researchers and scientists with no-strings-attached resources to help move promising work forward. The only requirement is that work ships into the world as open source. Support is bespoke to each project and can include funding, compute, engineering resources, product guidance, and mentorship; resources are deployed immediately to unblock teams and move work forward quickly.

The work in this batch spans continual learning benchmarks, agent evaluation, energy measurement, data infrastructure, and more. Much of it is already making waves. Projects include Recursive Language Models, which let models handle massive inputs by offloading processing to a symbolic environment; Harbor, which built on Terminal-Bench (a Batch 1 project) to create a framework for measuring and improving agent performance; QED-Nano, a 4B model trained with SFT and RL that matches Gemini 3 Pro on Olympiad math proofs at 3x lower cost; and SREGym, a benchmark for testing AI agents on high-stakes Site Reliability Engineering (SRE) infrastructure work.

This batch continues to build out the shared evaluation tooling and infrastructure that began in Batch 1 and is only accelerating across the broader ecosystem. We are proud to see the momentum building in this group.

Now, let's meet Slingshots // TWO.

Harbor
Alex Shaw, Ryan Marten, Ludwig Schmidt, Andy Konwinski

An agent evaluation framework for defining and working with environment-based tasks. Laude Institute
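
To make "environment-based task" concrete, here is a hypothetical sketch (the class and field names below are our illustration, not Harbor's actual API): a task bundles an environment to run in, an instruction for the agent, and a verifier that scores the outcome.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EnvironmentTask:
    """Illustrative shape of an environment-based task (not Harbor's real API)."""
    task_id: str
    environment: str                  # e.g. a container image the agent works inside
    instruction: str                  # what the agent is asked to accomplish
    verifier: Callable[[str], float]  # maps the agent's final output to a score in [0, 1]

def count_verifier(output: str) -> float:
    # Toy check: did the agent report the expected result?
    return 1.0 if "3 matches" in output else 0.0

task = EnvironmentTask(
    task_id="count-errors",
    environment="ubuntu:24.04",
    instruction="Count lines containing 'ERROR' in /var/log/app.log and print '<n> matches'.",
    verifier=count_verifier,
)
```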

Recursive Language Models
Alex Zhang, Omar Khattab

A language model system that offloads input processing to a symbolic environment (e.g., a REPL) with access to language model calls. MIT
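
The idea, at a glance: instead of stuffing a huge input into the context window, the model operates a REPL where the input lives as a variable and further language model calls are just another function. A minimal sketch of the pattern (the `llm` stub and the chunk-then-reduce strategy are our illustrative assumptions, not the project's code):

```python
# Sketch of REPL-style input offloading (illustrative; not the project's code).

def llm(prompt: str) -> str:
    raise NotImplementedError("wire up any chat-completion provider here")

def summarize_huge_input(document: str, chunk_chars: int = 8_000) -> str:
    # The full document never enters a single context window. The root model,
    # working inside its REPL, can map an LM call over slices of the variable
    # and reduce the partial results with one final call.
    chunks = [document[i:i + chunk_chars] for i in range(0, len(document), chunk_chars)]
    partials = [llm(f"Summarize this excerpt:\n\n{chunk}") for chunk in chunks]
    return llm("Combine these partial summaries into one:\n\n" + "\n\n".join(partials))
```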

Etash Guha, Ryan Marten, Ben Feuer, Negin Raoof, Richard Zhuang, Tyler Griggs, Charlie Ruan, Alex Shaw, Mike Merrill, Ludwig Schmidt, Alex Dimakis

An end-to-end open setup for training and evaluating terminal agents, with curated data, real environments, and RL loops that make agent research reproducible for everyone. UC Berkeley
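
Schematically, the loop being opened up looks something like this (every name here is our placeholder for the curated data, real environments, and RL machinery the team is releasing):

```python
def rl_loop(agent, curated_tasks) -> None:
    """Schematic RL loop over real terminal environments (names are placeholders)."""
    for task in curated_tasks:
        env = task.make_environment()        # a real, sandboxed terminal
        transcript = agent.rollout(env, task.instruction)
        reward = task.verifier(env)          # does the end state satisfy the task?
        agent.update(transcript, reward)     # e.g. a policy-gradient step
```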

Omar Shaikh, Valentin Teutschbein, Kanishk Gandhi, Michael Bernstein, Diyi Yang

A personal model that learns to predict what you'll do next from everything you see and do on your computer (e.g. multimodal interaction traces), enabling assistants to understand context and help proactively without constant re-briefing. Stanford University
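
One way to picture the setup: interaction traces become sequences of events, and the model's objective is next-event prediction, much as a language model predicts the next token. A toy sketch (the event schema is our assumption, not the project's):

```python
# Toy multimodal interaction trace (schema is illustrative, not the project's).
trace = [
    {"t": 0.0, "event": "focus_window", "app": "browser", "title": "flight search"},
    {"t": 4.2, "event": "copy", "text": "SFO -> BOS, Mar 3"},
    {"t": 6.8, "event": "focus_window", "app": "calendar", "title": "March"},
]

# Training pairs: given the prefix of events, predict the one that follows.
# A model that learns "after copying flight dates, you open the calendar"
# can offer to create the event -- proactive help with no re-briefing.
prefix, target = trace[:-1], trace[-1]
```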

QED-Nano
Amrith Setlur, Aviral Kumar

A compact 4B model post-trained with SFT and RL to write Olympiad-level mathematical proofs, matching Gemini 3 Pro on Olympiad math at 3x lower cost and showing that specialization and test-time compute can compete with parameter scaling. Carnegie Mellon University

SREGym
Jackson Clark, Yiming Su

A benchmarking system for designing, developing, and evaluating AI agents for Site Reliability Engineering (SRE) on high-stakes infrastructure work. University of Illinois Urbana-Champaign

Jon Saad-Falcon, Avanika Narayan, John Hennessy, Christopher Ré, Azalia Mirhoseini

A unified metric for measuring intelligence efficiency, capturing both the LM capabilities delivered and the energy required to power the AI stack, enabling a better understanding of how we scale local and cloud LLMs. Stanford University
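
The precise metric is the team's to define; the flavor is capability delivered per unit of energy consumed. A back-of-envelope illustration (the functional form and all numbers below are assumptions for exposition):

```python
# Back-of-envelope "intelligence efficiency" (form and numbers are illustrative).
def efficiency(benchmark_score: float, energy_joules: float) -> float:
    """Capability delivered per joule consumed by the serving stack."""
    return benchmark_score / energy_joules

# Hypothetical comparison: a local 8B model vs. a cloud frontier model on one eval.
local = efficiency(benchmark_score=62.0, energy_joules=850.0)
cloud = efficiency(benchmark_score=78.0, energy_joules=5_400.0)
print(f"local: {local:.4f} points/J   cloud: {cloud:.4f} points/J")
```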

Jae-Won Chung, Mosharaf Chowdhury

A systems initiative that precisely quantifies and optimizes energy use in ML workloads and builds open-source tools, making energy a first-class metric in ML systems design. University of Michigan
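
For a taste of what "energy as a first-class metric" means in practice, here is the kind of raw measurement such tooling automates: reading the GPU's cumulative energy counter around a workload via NVML. (This sketch assumes a recent NVIDIA GPU and the `pynvml` package; the project's tools go well beyond a single counter.)

```python
import pynvml

def placeholder_workload() -> None:
    pass  # stand-in for a training step or an inference batch

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Cumulative energy since driver load, in millijoules (Volta-class GPUs and newer).
start_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
placeholder_workload()
end_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)

print(f"workload energy: {(end_mj - start_mj) / 1000:.1f} J")
pynvml.nvmlShutdown()
```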

Milind Srivastava, Alan Zaoxing Liu, Zeying Zhu, Vyas Sekar

A drop-in system that delivers 100x speedups on analytics queries by replacing expensive "scan everything" data analysis with lightweight summaries. Carnegie Mellon University
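
The intuition in miniature: pay for one pass to build a small summary, then answer queries from the summary instead of rescanning the raw data. A toy frequency table below; the real system's summaries are far more sophisticated.

```python
from collections import Counter

rows = [("GET", 200), ("GET", 500), ("POST", 200)] * 1_000_000  # raw event log

# "Scan everything": every query touches every row.
slow_answer = sum(1 for _method, status in rows if status == 500)

# Lightweight summary: one pass to build, then O(1) per query, reused forever.
summary = Counter(status for _method, status in rows)
fast_answer = summary[500]

assert slow_answer == fast_answer  # same answer, without the repeated full scans
```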

Continual Learning Benchmark
Parth Asawa, Matei Zaharia

A benchmark that evaluates how well models and systems improve over time, explicitly testing continual learning rather than point-in-time static capability. UC Berkeley
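
Abstractly, this replaces a single score with a score trajectory: the same system is re-evaluated after each round of experience, and the curve, not the endpoint, is what gets measured. A minimal sketch under our own assumptions about the interface:

```python
# Minimal shape of a continual-learning evaluation (interface is our assumption).
def evaluate_over_time(system, task_stream, eval_set) -> list[float]:
    scores = [system.score(eval_set)]      # baseline before any experience
    for batch in task_stream:              # experience arrives in rounds
        system.learn(batch)                # may update prompts, memory, or weights
        scores.append(system.score(eval_set))
    return scores                          # a static model traces a flat line;
                                           # a system that truly improves trends up
```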

Dilara Soylu, Christopher Potts, Omar Khattab

Open source infrastructure enabling provider-agnostic prompt and weight optimization for compound AI systems. Stanford University
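
DSPy already makes the provider-agnostic part tangible: a program is declared once and the backing model is a one-line configuration, so the same optimizer can tune it against any provider. A small example using DSPy's public API (the optimizer choice, metric, and tiny trainset are illustrative):

```python
import dspy

# The program is declared once; the provider is a configuration detail.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # or "anthropic/...", a local model, ...
qa = dspy.ChainOfThought("question -> answer")

def exact_match(example, prediction, trace=None):
    return example.answer.strip().lower() == prediction.answer.strip().lower()

trainset = [dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question")]

# The same optimization runs unchanged whichever provider is configured above.
optimizer = dspy.BootstrapFewShot(metric=exact_match)
optimized_qa = optimizer.compile(qa, trainset=trainset)
```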

Noah Ziems, Omar Khattab, Meng Jiang

An open source reinforcement learning framework for agentic workflows in DSPy, allowing for fine-grained GPU optimization at scale. MIT

Open Inference
Markian Rybchuk

Infrastructure that lowers the barrier for developers to try models while creating a living dataset of real-world interactions, helping the community measure progress and discover failure modes. UC Berkeley

Melissa Pan, Matei Zaharia

A comprehensive study of production agents across 26 domains. It reveals that reliability remains the top challenge, currently mitigated through systems-level design, and points to underexplored research directions for future agent systems. UC Berkeley

Researchers: Tell us what you're working on and what you need to get it into the world. We accept projects year-round, deploy resources immediately, and announce new batches every few months. Apply here.