Tomorrow Ready  ·  Evaluating AI Output

Before the Method, the Judgement: The Evaluation Gate

Subject adaptation  ·  Years 11–13  ·  Mathematics and Statistics  ·  Field-Based STEM  ·  Tony Jones
Tools can now select statistical methods and produce correctly formatted solutions at speed. The risk is not wrong answers. It is correct ones reached without any evaluation of why a particular method fits this data and this question. When method selection is automated, statistical reasoning has no work to do — and statistical reasoning is what the assessment is designed to measure.
Name candidates · Two or three approaches
Name the criterion · What makes one fit
Justify the choice · Before any tool or calculation
Gate travels with submission · Timestamped process evidence
The strategy — five steps
1
Present the task. No tools yet.

Before students open any tool or begin any calculation, ask them to write down two or three approaches they could use to address the statistical task or problem. This is not a test of prior knowledge — it is a prompt to think before selecting.

2
Name one criterion.

Ask students to name one criterion that would make an approach appropriate for this specific context — the type of data, the number of variables, what the question is actually asking, or the size of the sample. Students choose the criterion most relevant to their problem, not the easiest one to write.

3
Compare and justify.

Students compare their candidate approaches against that criterion and select one. They write two to three sentences justifying the choice: why this method fits the data and the question, and what a different approach would miss or get wrong. The justification must name the specific context — not a general rule.

4
Collect the Gate record before tool use begins.

The Evaluation Gate record is a timestamped record of reasoning that predates the solution. Collect it before any tool use or calculation begins. A correct solution, however it was reached, cannot provide this evidence on its own; the Gate exists to capture what the solution cannot show.
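The ordering requirement in step 4 comes down to comparing two times: when the record was collected and when tool use first began. The sketch below is a minimal illustration under stated assumptions, not part of the strategy as published. The `GateRecord` structure, its fields, and the `predates_tool_use` helper are hypothetical names, and the timestamps are illustrative values only.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class GateRecord:
    student: str
    chosen_method: str
    collected_at: datetime  # hypothetical field: when the teacher collected the written record


def predates_tool_use(record: GateRecord, first_tool_use: datetime) -> bool:
    """True only if the Gate record was collected strictly before any tool use began."""
    return record.collected_at < first_tool_use


# Illustrative values only
record = GateRecord("Student A", "randomisation test", datetime(2025, 3, 4, 9, 5))
print(predates_tool_use(record, datetime(2025, 3, 4, 9, 20)))  # True
print(predates_tool_use(record, datetime(2025, 3, 4, 9, 0)))   # False
```

The comparison is strict. A record collected at the same moment tool use begins does not count as predating the solution.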

5
The Gate travels with the final submission.

Students complete the task using their chosen method. The Gate record is submitted alongside the solution — not as additional work, but as the evidence that the method was evaluated rather than selected automatically.

Year-level examples
Years 9–10 — Introduction

Before any data analysis task, ask students to name two ways they could display the data and write one sentence explaining why one is better suited to the specific question being asked. This is the Gate at its simplest: two candidates, one criterion, one sentence. It builds the habit of evaluating before selecting, without the full formal record required at senior level.
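Teachers who want to model the Years 9–10 Gate explicitly can treat the comparison as a filter. Each candidate display is tagged with the data situation it suits, and the student's criterion decides between them. This is a minimal sketch: the `Display` type and `choose_display` function are hypothetical, and the tags are a simplified assumption. The real choice is still argued in the student's written sentence.

```python
from dataclasses import dataclass


@dataclass
class Display:
    name: str
    suits: set[str]  # simplified tags for the data situations this display handles well


def choose_display(candidates: list[Display], criterion: str) -> tuple[list[str], list[str]]:
    """Split candidate displays into those that meet the criterion and those that miss it."""
    fits = [d.name for d in candidates if criterion in d.suits]
    misses = [d.name for d in candidates if criterion not in d.suits]
    return fits, misses


# Two candidates, one criterion (illustrative tags only)
candidates = [
    Display("scatter plot", {"relationship between two numerical variables"}),
    Display("bar graph", {"counts for one categorical variable"}),
]
fits, misses = choose_display(candidates, "relationship between two numerical variables")
print(fits)    # ['scatter plot']
print(misses)  # ['bar graph']
```

Keeping the rejected candidates alongside the chosen one matches the aim of the Gate. The student should be able to say what the other display would miss, as well as why the chosen one fits.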

Years 11–12 — Statistical methods

Before any internally assessed statistical investigation, students complete a Gate record: two or three candidate methods, one criterion written in the student's own words (not copied from a definition), and two to three sentences of justification naming the specific data type, variable count, and question purpose. The record is submitted with the investigation as process evidence of statistical reasoning.
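A teacher checking Years 11–12 Gate records for completeness could work from a checklist like the one sketched here. The field names and the `missing_parts` helper are hypothetical. The sketch only checks that each required part is present. Whether the criterion is genuinely in the student's own words, and whether the justification is specific to the context, remain matters of teacher judgement.

```python
from dataclasses import dataclass


@dataclass
class SeniorGateRecord:
    candidates: list[str]     # two or three candidate methods
    criterion: str            # should be in the student's own words (not checkable here)
    data_type: str            # e.g. "numerical"
    variable_count: int
    question_purpose: str     # e.g. "describe a relationship"
    justification: list[str]  # one entry per sentence


def missing_parts(record: SeniorGateRecord) -> list[str]:
    """List the required parts that are absent; an empty list means the record is complete."""
    problems = []
    if not 2 <= len(record.candidates) <= 3:
        problems.append("needs two or three candidate methods")
    if not record.criterion.strip():
        problems.append("needs a criterion")
    if not record.data_type.strip():
        problems.append("must name the data type")
    if record.variable_count < 1:
        problems.append("must name the variable count")
    if not record.question_purpose.strip():
        problems.append("must name the question purpose")
    if not 2 <= len(record.justification) <= 3:
        problems.append("needs two to three sentences of justification")
    return problems


# Illustrative record for a bivariate investigation
record = SeniorGateRecord(
    candidates=["linear trend line", "non-linear trend line"],
    criterion="the pattern in the scatter plot looks roughly straight",
    data_type="numerical",
    variable_count=2,
    question_purpose="describe the relationship between two variables",
    justification=[
        "Both variables are numerical and the points follow a roughly straight pattern.",
        "A non-linear model would add curvature the data does not show.",
    ],
)
print(missing_parts(record))  # []
```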

Year 13 — NCEA Level 3

For Level 3 statistical inference or time series standards, the Evaluation Gate record addresses NZQA's authenticity expectations directly. A student who can write that they chose a particular inference method because the data is numerical, the groups are independent, and the sample size exceeds the threshold for the relevant test has demonstrated statistical literacy that no solution alone can evidence. The Gate record is the visible, assessable proof.
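The three reasons in the Year 13 example work together: the choice is justified only when all of them hold. This is a minimal sketch, assuming the conditions are recorded as plain values. `check_inference_choice` and its parameters are hypothetical, and `min_n` stands in for whatever sample-size threshold the relevant test requires, which this guidance does not specify. The value 30 in the usage lines is illustrative only and is not an NZQA figure.

```python
def check_inference_choice(data_type: str, groups_independent: bool, n: int, min_n: int) -> list[str]:
    """Return the conditions that fail; an empty list means all three reasons hold.

    min_n is a placeholder for the relevant test's sample-size threshold,
    supplied by the teacher; it is not an NZQA figure.
    """
    failures = []
    if data_type != "numerical":
        failures.append("data is not numerical")
    if not groups_independent:
        failures.append("groups are not independent")
    if n <= min_n:  # the example requires the sample size to exceed the threshold
        failures.append(f"sample size {n} does not exceed {min_n}")
    return failures


print(check_inference_choice("numerical", True, 60, 30))  # []
print(check_inference_choice("categorical", True, 20, 30))
# ['data is not numerical', 'sample size 20 does not exceed 30']
```

Returning the failed conditions, rather than a bare yes or no, mirrors step 3. A student should be able to state what would go wrong if a condition did not hold.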

Why it holds up
Decision checkpoint

The Gate record is produced before any tool can generate a solution, so it cannot be reverse-engineered from a correct answer. A student who can name why a method is appropriate for this specific data and question, and what an alternative approach would miss, has engaged in the evaluative thinking that underpins statistical literacy. That thinking is visible, assessable, and secure.

Teacher judgement note — NCEA alignment

For students working towards internally assessed achievement standards, the Evaluation Gate record functions as process evidence of statistical reasoning, which aligns directly with NZQA's authenticity expectations for work at this level. The record does not add assessment burden — it makes visible the thinking that the standard assumes is present but that a correct solution alone cannot confirm.