AI Interpretation Scoring Prompt Master Guide
AI Interpretation is designed to support assessments that involve scoring rich, qualitative inputs such as written responses, uploaded documents, and open-ended reasoning. Instead of manually reading and scoring each submission, you can use AI Interpretation to automate the process with clear, well-structured prompts. It will generate a ‘derived score’, which can then be used in your feedback reports and charts.
This guide walks you through practical techniques for prompting the AI to generate reliable scores and troubleshoot the Interpretation. Let’s jump in.
Your AI Interpretation Scoring Prompt Checklist
The reliability (and repeatability) of AI Interpretation scoring depends heavily on how precisely you define the criteria and communicate what each score represents. So, how do you guide the AI to score consistently, meaningfully, and in a way that supports your assessment goals?
Add clear instructions
Tell AI Interpretation directly what you want to score.
Define the scale for the score
Provide a scale for scoring so AI Interpretation knows what good looks like and what each end of the scale represents.
Use merge strings
Bring the assessment data into your prompt using merge strings wherever possible, rather than listing the details manually. This keeps the score specific to each respondent and makes your prompt repeatable.
Read the reasoning
Troubleshoot your scoring by checking the reasoning behind the score.
Set the max score
Ensure that you define the maximum score for your interpretation so that the AI Interpretation scores have a defined ceiling.
(Cohorts Only) Split Types and Filters
For cohort interpretations that require multiple responses for scoring, a cohort split must be added. The split can also be filtered by classifier.
Now let’s get into the details of how to approach these checklist items.
Add Clear Instructions
It’s easy to assume the AI will infer what you want, but it won’t be accurate without clear direction. If you want a score that’s repeatable across responses, you need to be specific about what is being scored.
Here’s what a good instruction includes:
What to score
Tell the AI exactly what aspect of the response should be evaluated:
“Score the respondent’s explanation of their go-to-market strategy here: [Answer].”
“Evaluate the clarity of the revenue model description attached here: [Answer].”
“Assess how well the uploaded pitch deck communicates product-market fit. Here it is: [Answer].”
Example Prompt
“Score the response to [Question] for clarity and completeness on a scale of 0 to 5, where 0 means no clarity and 5 means exceptionally clear and thorough.”
Define the Scale for the Score
Once you’ve told the AI what to score, the next step is to define how it should score. That means giving it a scale or a range of values with a clear definition of what each end (and ideally the middle) of that scale represents.
If you don’t define the scale, the AI will still guess, and its guess may not match your intent.
Choose a consistent scale
Common options include:
0 to 5 – often used in rubrics or evaluation frameworks.
0 to 10 – useful for more granularity across levels.
0 to 100 – ideal for expressing levels of risk, confidence, or fit and translating this into a percentage.
Pick a scale that aligns with how you intend to use the score and use it consistently across the assessment unless there’s a strong reason to vary it.
Define the anchors
It’s not enough to say “0 to 5”. You should define what those numbers mean. This helps the AI interpret borderline cases more reliably and supports clearer, more justifiable feedback.
Example Prompts
“0 = no evidence of strategic thinking, 5 = strong, well-reasoned strategy backed by examples.”
“0 = high risk, 100 = no risk.”
“1 = vague or incomplete response, 10 = thorough and insightful.”
Use Merge Strings
Merge strings are the key to making your prompts dynamic, respondent-specific, and scalable across assessments. Rather than copying and pasting answers or context into your prompt manually, you can use merge strings to automatically pull in the responses from each submission.
This makes your prompt repeatable and ensures that the AI is scoring the right content for the right person every time.
Example
Say you want to score a respondent’s answer to Question 5, which asks about their business model. The answer may be one or more selected options from a single- or multiple-choice list, a text field, a long text field, or an uploaded document.
Instead of writing:
“Score the business model description.”
Use:
“Score the following business model description from the respondent on a scale of 0–5: [Merge String for Question 5’s Answer Text].”
Now the AI will see the actual answer provided by that individual, without you needing to retype or customize anything.
Read the Reasoning
AI doesn’t always get the score right on the first try, but it always shows its reasoning. That makes it easier to:
Spot misunderstandings of the question
Identify whether the wrong scale was applied
Adjust your prompt to guide better future results
If a score doesn’t look quite right, that’s your cue to check the reasoning.
What to look for in the explanation
Is the reasoning aligned with your scoring rubric?
Did the AI misinterpret the respondent’s intent or terminology?
Is the justification missing any required factors?
You can also use the resolved prompt to see exactly what the AI Interpretation used as context.
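For example (the merge string name and respondent answer below are illustrative), comparing the prompt template with its resolved version might look like this:
Prompt template: “Score the following business model description on a scale of 0–5: [Merge String for Question 5’s Answer Text].”
Resolved prompt: “Score the following business model description on a scale of 0–5: We charge a monthly subscription per seat and upsell annual plans to larger teams.”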
If it looks off, check:
The prompt wording — Was the instruction too vague?
The merge string — Did the correct response get pulled in?
The scale — Was it defined clearly in the prompt?
Set the Max Score
Even if you’re using a defined scale, it’s still a good idea to explicitly set the maximum score. This is especially important when using non-standard ranges (like 100-point scales) or scoring across multiple criteria. This is also a requirement for pulling your scores into feedback reports.
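For example, if your prompt asks for a score from 0 to 5, set the maximum score to 5; if you are scoring four criteria worth 5 points each and asking for a combined total, set the maximum to 20. The figures here are illustrative; the point is that the ceiling you set should match the scale defined in your prompt.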
Don’t forget to associate scores with a segmentation
If you want to add your score to the report (which is generally the reason for creating it), you will need to associate the score with a segmentation. Segmentations are easy to create, and instructions on how to do so can be found here.
Use split types and filters to target responses in a cohort
If your AI Interpretation prompt is evaluating cohort answers (that is, comparing multiple submissions as part of a group), you need to add cohort splits to your answer merge strings.
A cohort split instructs the AI to treat the group of responses as a unit. This allows the AI Interpretation to:
Use all answers within a group
Score group-level performance
Surface trends across individuals
You can use classifiers or filters to narrow which responses are considered. It’s also possible to bring in wider cohort split variables using a table.
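As a rough illustration (assuming a cohort split has been added to the answer merge string, and with placeholder wording), a cohort-level prompt might read:
“Review the following responses from the group: [Cohort Split Merge String for the Question’s Answers]. Score how consistent the group’s priorities are on a scale of 0 to 10, where 0 means no shared priorities and 10 means a clearly shared set of priorities.”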
Bonus: Bring in external context
Even with clear instructions and a defined scale, AI scoring improves dramatically when you give it examples of what “great” looks like. AI Interpretation performs best when it can anchor its judgment to reference materials, templates, or model answers. If you want top scores to reflect a certain standard, make that standard available in the prompt. This is also where merging in your rating text and benchmarks can be helpful.
Example Prompts
“Score the written responses using the DISC assessment model as a reference.”
“Score based on alignment to the following government standards: [URL].”
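If you want to merge in your own rating text or benchmarks, a prompt along these lines (the merge string name is a placeholder) may help:
“Score the response on a scale of 0 to 5 against the benchmark merged in here: [Merge String for Benchmark Rating Text], where 5 means the response fully meets the benchmark and 0 means it does not address it at all.”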
As always, test.
AI Interpretation is powerful, but it’s not a seasoned evaluator with domain expertise. It’s a tool that follows instructions. You’ll typically need a few test runs to strike the right balance between precision, fairness, and depth.
Tip: Start with sample responses
If you don’t have live data yet, create a few realistic example responses that represent:
A strong/high-scoring answer
A mid-range or average response
A weak or minimal submission
Then run your scoring prompt against each one. See if the scores make sense and if the reasoning matches your expectations.
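For example (the expectations below are illustrative), on a 0-to-5 clarity scale you might expect the strong answer to land at 4 or 5, the mid-range answer at 2 or 3, and the weak submission at 0 or 1. If the scores or the reasoning diverge from those expectations, revisit your instructions, scale anchors, and merge strings before running the interpretation on live data.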