Tomorrow Ready · Evaluating AI Output

Why One Output Fits Better: Same Prompt Two Outputs, Learning Languages Years 9 to 10

Subject adaptation · Years 9 to 10 · Learning Languages · Field-Based STEM · Tony Jones

AI does not produce one answer to a language task. It produces a range of plausible answers. The student who cannot say why one answer fits better than another has not yet exercised the disciplinary judgement the course is designed to build.

Generate: two outputs, one prompt
Annotate: name two differences
Justify: why one fits better
Submit: justification assessed

The Strategy

Same Prompt Two Outputs makes comparison the task. Two outputs from the same prompt, one annotated comparison, one justified preference: the justification is what is marked, not the chosen output. Evaluating language requires applying a criterion, and applying a criterion requires disciplinary knowledge.

  1. Students submit the same target-language prompt to an AI tool twice, or to two different tools, and collect both outputs in full.
  2. Students annotate the comparison, identifying at least two differences: register, vocabulary choice, cultural appropriateness, grammatical structure, or audience fit.
  3. Students write a justified preference, in the target language or in English as appropriate, naming why one output serves the task better than the other.
  4. The justification is submitted alongside or instead of the chosen output. The justification is the assessed artefact.

In Practice

Year 9

Differences are identified in English. Justification is written in English. The teacher provides a short list of criteria (register, audience fit, accuracy), and students select from it before annotating. The focus is on the habit of comparison, not target-language metalanguage.

Year 10

Differences are identified partly in the target language, where the student has the metalanguage. Students select the criterion independently. Justification uses target-language vocabulary for linguistic concepts where the student has it.

Implementation

Decision Checkpoint

A justification that says "Output A sounds more natural" without a reason is not sufficient. Return it with one prompt: "More natural for which audience, and what specific feature shows that?"

Teacher Judgement Note

AI output can be grammatically accurate but culturally misaligned. Prompt students explicitly to consider cultural appropriateness as a criterion alongside accuracy, particularly for contexts involving formality, relationships, or social roles.

Related Frameworks

Evaluation Gate · Context Triage · Verification Slip

Tony Jones · Founder, Field-Based STEM · Tomorrow Ready Resources · Free to use and share