Friday, November 21, 2025

How we ensure AI delivers for students and educators

Illustration representing Evaluators

In a recent op-ed in The 74, Jessamy Almquist from Learning Commons and Rose Wang at OpenAI explain why rigorous evaluation must become essential infrastructure for AI in education. The piece highlights how our Evaluators tool — built with Student Achievement Partners and Achievement Network — helps assess whether AI-generated content meets pedagogical standards, using text complexity as a concrete example.

Artificial Intelligence products are transforming how teachers work by offering support with everything from lesson planning to personalized instruction. Their potential to streamline tasks and expand access to tailored learning has rightly sparked excitement across the education ecosystem. But as more classrooms begin to use AI-powered tools, we have to ask: How do we know if these tools are any good?

The truth is, educators often don’t know. In the AI world, models are rarely evaluated on meaningful educational tasks. Common benchmarks tend to focus on mathematical accuracy and coding problems, not on open-ended tasks that require knowledge of pedagogical best practices. And unlike traditional curricula, content generated by AI-powered tools rarely undergoes expert review, with exceptions such as Google LearnLM.

As a result, it’s not always clear whether AI-generated lesson plans follow best practices, whether AI-generated passages build the right skills and knowledge, and ultimately whether AI-generated feedback effectively supports student growth. This leaves educators as the last line of defense in determining which content is good, and puts students at even greater risk of receiving less rigorous content. The lack of shared, transparent evaluation tools means we’re flying blind: not because AI technologies are inherently flawed, but because the infrastructure to assess them hasn’t caught up.

Evaluation is the missing layer that can help education technology expand with trust and impact.

Read the full piece in The 74