This guide helps you choose a testing method based on the kind of AI experience you’re evaluating, such as customer support chatbots, agentic assistants, and machine-learning-driven personalization. While AI changes what we pay attention to, these methods remain flexible tools you can adapt to assess trust, understanding, control, and outcomes.
Define your AI
Before you choose a method, clarify what kind of artificial intelligence experience you’re testing. Teams often use “AI” as an umbrella term, but study design changes depending on what you’re evaluating.
Common categories:
- Conversational AI / customer support chatbot: A dialog-based experience meant to answer questions or resolve issues.
- Agentic AI: An experience designed to reduce user action by doing work on the user’s behalf (intent-centric rather than interface-centric).
- Machine learning personalization: Recommendations and content personalization based on patterns of behavior.
Tip: Consider adding a short glossary to your study plan (or internal documentation) so stakeholders and observers share the same definitions.
Testing framework
UserTesting has identified four components that are critical to testing AI experiences. This is our recommended testing framework:
- Understanding: Users feel they can “read” the AI’s thinking, intent, and boundaries, just as they would with a knowledgeable coworker or partner.
  - Especially important for both chatbots and agents.
- Trust: The user has an accurate “trust meter” for the AI. They rely on the AI when it deserves it, and step in when it doesn’t.
  - This pillar is often one of the strongest predictors of long-term adoption and risk.
- Control: The user perceives the AI as a collaborative partner that respects human authority, never “taking over” or making the user feel helpless.
  - Loss of perceived control is often a major driver of rejection of AI tools, especially agents.
- Outcome: The partnership delivers better outcomes than the human could achieve alone, and those outcomes feel relevant and aligned with the user’s intent.
  - Poor outcomes negatively impact trust and adoption.
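If your team captures per-session ratings against these four pillars, they can be rolled into a simple summary. The sketch below is a hypothetical illustration, assuming a 1–5 rating per pillar; the weighting (a plain average) and the flagging threshold are illustrative choices, not part of the framework itself.

```python
# Hypothetical sketch: roll four pillar ratings (1-5) into one session score
# and flag any pillar that falls below a threshold. Pillar names come from
# the framework above; the threshold and equal weighting are assumptions.
PILLARS = ("understanding", "trust", "control", "outcome")

def pillar_summary(ratings: dict[str, int], threshold: int = 3) -> dict:
    """Average the four pillar ratings and list any below the threshold."""
    scores = [ratings[p] for p in PILLARS]
    return {
        "overall": sum(scores) / len(scores),
        "flagged": [p for p in PILLARS if ratings[p] < threshold],
    }

print(pillar_summary({"understanding": 4, "trust": 2, "control": 4, "outcome": 3}))
# flags "trust" as the pillar needing attention
```

A flagged pillar tells you where to focus the next round of research, for example running discovery interviews when trust scores lag.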
Quick decision guide
Use this as a starting point:
Note: These methods don’t fundamentally change in an AI context. Keep in mind that they are flexible, adaptable tools, not fixed solutions tied to specific use cases.
| If... | Then... |
| --- | --- |
| You need to understand trust, comfort, and expectations. | Start with a Discovery interview |
| You need to see how people complete an AI-supported task. | Consider a Usability test |
| You need to evaluate how trust changes after repeated exposure. | Look into a Longitudinal study |
| You want to see how users handle uncertainty with the AI and if it impacts trust. | Think about a Live Conversation or a Think-Out-Loud test |
| You need to track trust/credibility improvements over releases. | Consider a Benchmark test |
| You’re early and still shaping the AI experience. | Explore a Concept test or Content test |
| You’re choosing between variations (tone, disclosure, UI pattern, guardrails). | Think about a Preference test |
| You need broad directional input fast. | Pilot a Survey |
| Your experience spans multiple channels (web + app + support). | Look into an Omnichannel study |
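For teams that keep research-ops tooling in code, the decision guide above can be encoded as a simple lookup. This is a hypothetical sketch: the goal keys and the `suggest_method` helper are illustrative names, not part of any UserTesting API.

```python
# Hypothetical sketch: encode the quick decision guide as a lookup so a
# research-ops script can suggest a starting method for a stated goal.
# Goal keys and method names are illustrative assumptions.
DECISION_GUIDE = {
    "trust_and_expectations": "Discovery interview",
    "ai_supported_task_completion": "Usability test",
    "trust_after_repeated_exposure": "Longitudinal study",
    "handling_uncertainty": "Live Conversation or Think-Out-Loud test",
    "tracking_across_releases": "Benchmark test",
    "early_concept_shaping": "Concept test or Content test",
    "choosing_between_variations": "Preference test",
    "broad_directional_input": "Survey",
    "multi_channel_journey": "Omnichannel study",
}

def suggest_method(goal: str) -> str:
    """Return a starting method for a research goal, with a sensible default."""
    return DECISION_GUIDE.get(goal, "Discovery interview")

print(suggest_method("ai_supported_task_completion"))  # Usability test
```

Treat the output as a starting point, not a prescription; most AI studies end up combining two or more of these methods.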
Applying methodologies
Discovery interview
- Discovery interviews uncover a participant’s background, goals, expectations, and attitudes. They are especially useful when AI trust and stigma may shape behavior.
- Use discovery interviews to learn:
- Prior experiences with AI (positive/negative) and baseline comfort
- When participants want an AI option vs a human option
- What makes an AI response feel credible and safe
- What “agency” looks like to them (opt out, escalation paths, control)
- When to use:
- Early in planning (before you finalize tasks and flows)
- When stakeholders disagree on what “success” should mean for the AI experience
- When you need to design guardrails and disclosures that align with user expectations
Usability test
- Usability tests show which parts of a design help or hinder goal completion.
- For AI experiences, it’s often helpful to look beyond task completion and consider:
- Spectrum of success: Did the AI get the user 50%, 75%, or 100% of the way?
- Error recovery: Can the user correct the AI and still reach their goal?
- Feedback and transparency: Does the user understand what happened and why?
- Trust signals: What increases or decreases willingness to continue?
- When to use:
- Conversational support experiences (answering questions, resolving issues)
- Agentic experiences where the user delegates steps to the system
- AI-assisted workflows in a product (summaries, recommendations, drafting, routing)
- How to make it AI-appropriate:
- Include tasks that intentionally produce:
- A “good” prompt outcome
- A “messy” or ambiguous prompt outcome
- A correction moment (“That’s not what I meant. I meant ___”)
- Track not just “did they complete it,” but “did they feel comfortable continuing.”
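The "spectrum of success" and comfort metrics above can be captured in a lightweight log per task. The sketch below is a hypothetical example of that bookkeeping; the `TaskResult` fields and the aggregate names are illustrative assumptions, not a standard schema.

```python
# Hypothetical sketch: log each AI usability task on a spectrum of success
# (0.0-1.0) plus error recovery and comfort, then aggregate across sessions.
# Field and metric names are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class TaskResult:
    participant: str
    task: str
    progress: float              # 0.5 = the AI got the user 50% of the way
    recovered_from_error: bool   # could the user correct the AI and continue?
    comfortable_continuing: bool # did they feel comfortable continuing?

def summarize(results: list[TaskResult]) -> dict:
    """Aggregate progress and comfort rather than binary completion."""
    n = len(results)
    return {
        "mean_progress": sum(r.progress for r in results) / n,
        "recovery_rate": sum(r.recovered_from_error for r in results) / n,
        "comfort_rate": sum(r.comfortable_continuing for r in results) / n,
    }

results = [
    TaskResult("P1", "rebook flight", 1.0, True, True),
    TaskResult("P2", "rebook flight", 0.5, False, False),
]
print(summarize(results))
```

Reporting a mean progress of 0.75 with a 50% comfort rate tells a very different story than "one of two tasks completed," which is exactly the nuance AI usability tests need.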
Live Conversation
- Live Conversations allow you to follow up, redirect, or introduce new conditions in real time to explore how users and the AI respond together. This method is especially useful for evaluating trust calibration and perceived control in dynamic interactions.
- Use Live Conversations to learn:
- How participants interpret and respond to AI outputs in the moment
- Where they hesitate, second-guess, or seek reassurance
- What follow-up questions they naturally want to ask the AI
- How they recover when the AI makes an error or provides an unclear response
- When to use:
- When testing complex or dynamic AI interactions that may require clarification or follow-up
- When you want to explore emotional reactions like trust, frustration, or surprise more deeply
- When you need to adapt the session in real time based on participant behavior
Think-Out-Loud Test
- Think-Out-Loud tests capture participants’ thoughts as they interact with an AI experience independently. Because there’s no moderator guidance, this method reveals how users make sense of the AI on their own. It is particularly helpful for assessing understanding and early trust signals.
- Use think-out-loud tests to learn:
- How participants interpret AI responses and decide whether to trust or act on them
- Where expectations don’t match the AI’s behavior or outputs
- How clearly the AI communicates its capabilities, limitations, and next steps
- What signals (tone, wording, structure) influence confidence in the response
- When to use:
- When you want unbiased, natural reactions to AI behavior without moderator influence
- When testing at scale to identify patterns across multiple participants
- When evaluating clarity, usability, and trust signals in AI-driven interactions
Longitudinal study
- Longitudinal studies involve interacting with the same participants over time.
- AI experiences (especially agentic tools) often require repeated use for participants to:
- Learn how the AI behaves
- Calibrate trust appropriately
- Develop a stable preference for AI vs non-AI options
- When to use:
- You’re testing relationship dynamics (trust, reliance, abandonment)
- You want to see how prompts, expectations, and confidence evolve
- You expect behavior to change after the “novelty phase”
Tip: If a longitudinal study isn’t the right fit, you can still run a multi-touchpoint sequence (for example, 3–5 short sessions across a week).
Benchmark test
- Benchmark tests measure a baseline experience so you can compare against future iterations.
- For AI experiences, benchmark on metrics tied to adoption:
- Trust and confidence ratings
- Perceived credibility and safety
- Perceived control/agency (ability to opt out, escalate, correct)
- Satisfaction with error handling and recovery
- When to use:
- You’re shipping frequent iterations and need an ongoing scorecard
- You want to compare different AI UX patterns over time
- You need evidence that changes improve confidence, not just speed
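A benchmark only pays off if you can compare releases consistently. The sketch below is a hypothetical scorecard, assuming each adoption metric is collected as 1–7 ratings; the metric names and the `scorecard`/`delta` helpers are illustrative, not a UserTesting feature.

```python
# Hypothetical sketch: a minimal release-over-release benchmark scorecard
# for trust-related ratings (assumed 1-7 scale). Metric names are
# illustrative assumptions.
from statistics import mean

def scorecard(ratings: dict[str, list[int]]) -> dict[str, float]:
    """Average each benchmark metric for one release."""
    return {metric: round(mean(values), 2) for metric, values in ratings.items()}

def delta(current: dict[str, float], baseline: dict[str, float]) -> dict[str, float]:
    """Change versus the baseline release; positive means improvement."""
    return {m: round(current[m] - baseline[m], 2) for m in current}

v1 = scorecard({"trust": [4, 5, 4], "perceived_control": [3, 4, 3]})
v2 = scorecard({"trust": [5, 6, 5], "perceived_control": [4, 4, 5]})
print(delta(v2, v1))
```

Keeping the metric set stable across releases is what makes the deltas meaningful; change the questions and you reset the baseline.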
Concept test
- Concept tests gather feedback on an idea before you build too much detail.
- Useful for early decisions such as:
- Where AI should appear (and where it shouldn’t)
- What the AI should promise (and not promise)
- Whether users want AI suggestions or fully agentic automation
- When to use:
- Early roadmap and design exploration
- Testing “should we build this?” before investing in implementation
Content test
- Content tests focus on words and messaging rather than interaction.
- For AI experiences, content is often the product:
- Disclosure language (is it clear it’s AI?)
- Microcopy for limitations and uncertainty
- Confirmation steps and “proof” moments (itinerary-style summaries, review screens)
- “How to get help” escalation copy
- When to use:
- You’re refining AI responses, guidance, and warnings
- You need to reduce confusion, over-trust, or fear
Preference test
- Preference tests gather feedback on one design compared with others.
- Use for AI UX decisions such as:
- Disclosure style (explicit vs subtle)
- Confirmation patterns (confirm vs auto-execute)
- Response formats (bullets vs narrative; citations vs no citations)
- Guardrail presentation (inline vs modal vs help link)
Tip: Preference tests may not yield a single winner, but they can clarify strengths/risks of each approach.
Survey
- Surveys gather feedback through self-completed forms and should be piloted with a small unmoderated test first.
- Surveys are useful for:
- Baseline AI attitudes in a target audience
- Self-reported trust and comfort
- Preference for AI vs human options in specific scenarios
- When to use:
- You need broader directional input to complement qualitative sessions
- You want to size trust concerns before deeper studies
Omnichannel study
- Omnichannel studies collect insights across different modes of interaction as participants work toward a single goal.
- AI experiences often span channels:
- App + web + support chat
- AI assistant + knowledge base + escalation to human
- Notifications + follow-up actions + account changes
- When to use:
- The AI is only one part of a full journey, and success depends on handoffs
Related content
- Want to learn more? Check out these Knowledge Base articles...
- Interested in growing your skills? Check out our University courses...