
What is a Test Case?

In the context of AI evaluation, a test case represents a structured testing unit that combines specific inputs with defined evaluation criteria to systematically verify AI system performance, functionality, and reliability. Test cases serve as the fundamental building blocks for comprehensive AI system assessment, providing measurable validation of system behavior under controlled conditions. Every test case in Trusys consists of three essential elements:
Input Component
  • The prompt or query submitted to the AI system
  • Dynamic content populated from datasets when variables are present
Evaluation Conditions
  • Specific metrics and criteria for assessing system responses
  • Success/failure thresholds and measurement parameters
Expected Outcomes
  • Predicted system responses or response characteristics
  • Success criteria definitions and validation methods
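
The three elements above can be pictured as a small data structure. The following is a minimal Python sketch, not Trusys's actual schema: the class names `EvalTestCase` and `Assertion` and all field names are hypothetical, chosen only to mirror the input, evaluation conditions, and expected outcome described here.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Assertion:
    """One evaluation condition (hypothetical shape, not the Trusys schema)."""
    type: str                                    # e.g. "contains"
    params: Dict[str, Any] = field(default_factory=dict)


@dataclass
class EvalTestCase:
    """Input + evaluation conditions + expected outcome (illustrative only)."""
    prompt: str                                  # input component
    assertions: List[Assertion] = field(default_factory=list)
    expected: Optional[str] = None               # expected outcome, if stated


case = EvalTestCase(
    prompt="What is the capital of France?",
    assertions=[Assertion(type="contains", params={"value": "Paris"})],
    expected="Paris",
)
```

Keeping the assertions as a list lets one prompt carry several independent checks, which matches the examples later in this page.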

Test Case Generation in Trusys

In Trusys, test cases are generated dynamically from prompt templates and dataset values. When a prompt contains variables, each unique combination of variable values creates a distinct test case, so a single prompt can be tested across many scenarios. Test case generation follows these steps:
  1. Prompt Analysis: Identifies variables within the prompt
  2. Dataset Integration: Retrieves available values for each variable
  3. Combination Generation: Creates all possible variable value combinations
  4. Test Case Assembly: Constructs individual test cases with specific variable substitutions
  5. Evaluation Attachment: Associates appropriate metrics and assertions with each test case
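
Steps 1 through 4 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Trusys's implementation: the `{{name}}` placeholder syntax and the function names `find_variables` and `generate_test_cases` are hypothetical. Step 5 (attaching metrics and assertions) is left out because it depends on the evaluation configuration.

```python
import itertools
import re

# Assumed placeholder syntax: {{variable_name}}. The real syntax may differ.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def find_variables(template):
    """Step 1: collect variable names in order of first appearance."""
    names = []
    for name in PLACEHOLDER.findall(template):
        if name not in names:
            names.append(name)
    return names


def generate_test_cases(template, datasets):
    """Steps 2-4: look up values, build every combination, substitute."""
    names = find_variables(template)
    value_lists = [datasets[name] for name in names]       # step 2
    cases = []
    for combo in itertools.product(*value_lists):          # step 3
        mapping = dict(zip(names, combo))
        prompt = PLACEHOLDER.sub(
            lambda m: str(mapping[m.group(1)]), template   # step 4
        )
        cases.append({"vars": mapping, "prompt": prompt})
    return cases


cases = generate_test_cases(
    "What is the capital of {{country}}?",
    {"country": ["France", "Australia", "Japan"]},
)
for case in cases:
    print(case["prompt"])
```

A template with no variables produces exactly one test case, because the product over zero value lists contains a single empty combination.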

Single Variable Test Case

Prompt

What is the capital of country?

Datasets

country: ["France", "Australia", "Japan"]
Generated Test Case 1
prompt: "What is the capital of France?"
assert:
  - type: contains
    value: "Paris"
Generated Test Case 2
prompt: "What is the capital of Australia?"
assert:
  - type: contains
    value: "Canberra"
  - type: disambiguation_check
    should_not_contain: ["Sydney", "Melbourne"]
Generated Test Case 3
prompt: "What is the capital of Japan?"
assert:
  - type: contains
    value: "Tokyo"
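
To make the assertions in these test cases concrete, here is a rough Python sketch of how a `contains` check and a `should_not_contain` list might be evaluated against a model response. The matching rules are an assumption: the page does not say whether Trusys matches case-sensitively or how `disambiguation_check` is scored, so this sketch uses plain case-insensitive substring matching, and the function names are hypothetical.

```python
from typing import Iterable


def check_contains(response: str, value: str) -> bool:
    """Pass when `value` occurs in the response (case-insensitive here)."""
    return value.lower() in response.lower()


def check_should_not_contain(response: str, banned: Iterable[str]) -> bool:
    """Pass when none of the banned terms occur in the response."""
    lowered = response.lower()
    return not any(term.lower() in lowered for term in banned)


def evaluate_australia_case(response: str) -> bool:
    """Combine both checks from the Australia test case."""
    return check_contains(response, "Canberra") and check_should_not_contain(
        response, ["Sydney", "Melbourne"]
    )


print(evaluate_australia_case("The capital of Australia is Canberra."))
print(evaluate_australia_case("Canberra, not Sydney, is the capital."))
```

Under this simple substring rule the second response fails even though it is factually correct, because it mentions "Sydney". Whether that strictness is desirable depends on what the check is meant to catch.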

Multi-Variable Test Case

Prompt

Customer: "I'm experiencing issue_type with product_category"

Provide appropriate assistance.

Datasets

issue_type: ["performance problems", "billing questions"]
product_category: ["mobile app", "web platform"]
Generated Test Case 1
prompt: |-
  
  Customer: "I'm experiencing performance problems with mobile app"
  
  Provide appropriate assistance.
assert:
  - type: response_quality
    criteria: "professional_tone"
  - type: response_time
    maximum_seconds: 30
Generated Test Case 2
prompt: |-
  
  Customer: "I'm experiencing billing questions with web platform"
  
  Provide appropriate assistance.
assert:
  - type: compliance_check
    requirements: ["financial_regulations", "data_privacy"]
  - type: escalation_suggestion
    should_recommend: true
  - type: contains
    value: "billing"
  - type: response_time
    maximum_seconds: 30
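
If the all-combinations rule from the generation steps is applied literally, the two datasets in this example yield four variable pairings, and the two generated test cases shown here are two of them. The Python sketch below only enumerates those pairings. It makes no claim about how Trusys orders or filters them, and the variable names in the code are illustrative.

```python
import itertools
import math

datasets = {
    "issue_type": ["performance problems", "billing questions"],
    "product_category": ["mobile app", "web platform"],
}

# The number of combinations is the product of the dataset sizes.
expected_count = math.prod(len(values) for values in datasets.values())

# Pair each variable name with one value from its dataset.
combinations = [
    dict(zip(datasets, values))
    for values in itertools.product(*datasets.values())
]

print(expected_count)
for combo in combinations:
    print(combo)
```

Because the count is multiplicative, adding a third variable with five values would raise the total from four to twenty, which is worth keeping in mind when sizing datasets.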