Trusys enables you to evaluate your AI applications with structured prompt libraries, datasets, and functional evaluations. This helps you assess accuracy, reliability, and safety before deployment. A Functional Evaluation runs prompts or datasets against your connected AI applications or models, scoring responses based on defined metrics.
Key Steps
- Connect your AI application or LLM model
- Create a prompt library with datasets and variables
- Run functional evaluation using the prompt library
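For orientation, the sketch below shows how a prompt library with variables and a dataset of test cases relate conceptually. It is a hypothetical illustration in plain Python, not the Trusys API or its file format; all names and values are invented for the example.

```python
# Hypothetical illustration only -- this is not the Trusys API or file format.
# It sketches the relationship between a prompt library, its variables,
# and a dataset of test cases that supplies values for those variables.

prompt_library = {
    "name": "customer-support-prompts",          # assumed example name
    "prompts": [
        {
            "id": "refund-policy",
            "template": "A customer asks: {question}\nAnswer using the {region} refund policy.",
            "variables": ["question", "region"],  # placeholders filled per test case
        },
    ],
}

dataset = [
    # Each row supplies one set of variable values, i.e. one test case.
    {"question": "Can I return a used item?", "region": "EU"},
    {"question": "How long do refunds take?", "region": "US"},
]

# Rendering the prompt template once per test case:
template = prompt_library["prompts"][0]["template"]
for case in dataset:
    print(template.format(**case))
```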
Run a New Evaluation
To initiate a new evaluation, navigate to the "Test Run" section and follow these steps:
Select an Application
Choose the AI application or LLM model you wish to evaluate from your list of connected applications. This is the target for your test run.
Select a Prompt Library
Select the Prompt Library that contains the prompts you want to use for this evaluation. The prompts within this library will serve as the inputs for your chosen application.
Run Evaluation
Review the selected application(s) and prompt libraries. Once confirmed, click Run Evaluation to initiate the test run. Trusys will then execute the prompts, collect responses, and evaluate against the defined metrics.
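Conceptually, a test run renders each prompt, sends it to the target application, and scores the response with the configured metrics. The following is a simplified, hypothetical sketch of that loop; `call_application`, `exact_match_metric`, and the test cases are stand-ins, not Trusys internals.

```python
# Hypothetical sketch of what a functional evaluation does conceptually.
# None of these names come from Trusys; call_application and the metric
# are stand-ins for your connected application and your configured metrics.

def call_application(prompt: str) -> str:
    """Stand-in for the connected AI application or LLM endpoint."""
    return "stub response for: " + prompt

def exact_match_metric(response: str, expected: str) -> bool:
    """A trivial example metric: pass if the response contains the expected text."""
    return expected.lower() in response.lower()

test_cases = [
    {"prompt": "What is the refund window?", "expected": "30 days"},
    {"prompt": "Do you ship internationally?", "expected": "yes"},
]

results = []
for case in test_cases:
    response = call_application(case["prompt"])
    passed = exact_match_metric(response, case["expected"])
    results.append({"prompt": case["prompt"], "response": response, "passed": passed})

pass_rate = sum(r["passed"] for r in results) / len(results)
print(f"Pass rate: {pass_rate:.0%}")
```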
Evaluation Run List
The list view shows the following for each run (sketched as a record after the list):
- Status (Pending, Running, Completed, Failed)
- Application evaluated
- Prompt Library used
- Start Time of the run
- Total Prompts executed
- Passed/Failed Metrics Count
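These fields could be pictured as a simple record per run, as in the hypothetical sketch below; the field names, types, and example values are assumptions for illustration, not a Trusys schema.

```python
# Hypothetical record for one evaluation run, mirroring the list-view fields.
# Field names, types, and example values are assumptions, not a Trusys schema.
from dataclasses import dataclass
from datetime import datetime
from enum import Enum

class RunStatus(Enum):
    PENDING = "Pending"
    RUNNING = "Running"
    COMPLETED = "Completed"
    FAILED = "Failed"

@dataclass
class EvaluationRun:
    status: RunStatus
    application: str        # application evaluated
    prompt_library: str     # prompt library used
    start_time: datetime
    total_prompts: int
    passed_metrics: int
    failed_metrics: int

run = EvaluationRun(
    status=RunStatus.COMPLETED,
    application="support-bot-v2",               # assumed example name
    prompt_library="customer-support-prompts",  # assumed example name
    start_time=datetime(2024, 5, 1, 14, 30),
    total_prompts=50,
    passed_metrics=46,
    failed_metrics=4,
)
print(run.status.value, run.passed_metrics, "metrics passed")
```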
Evaluation Run Details
Click a run to see detailed results:
- Overall Summary: Pass/fail rate and metric performance
- Prompt-by-Prompt Analysis
  - Input Prompt: The exact prompt that was sent to the AI application.
  - AI Response: The response received from your AI application.
  - Metric Results: The individual scores for each metric applied to that specific prompt-response pair, along with whether it passed or failed its expected value.
  - Variable Values: If variables were used, the specific values that were substituted for that test case.
- Metric Report: Average scores, distributions, and pass/fail counts (a simple aggregation sketch follows this list)
- Comparison View (optional): Compare multiple runs across applications
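To give a rough sense of how per-prompt metric results roll up into the Metric Report, here is a hypothetical aggregation sketch; the result records and metric names are invented for the example and do not reflect Trusys output formats.

```python
# Hypothetical aggregation of per-prompt metric results into a metric report.
# The result records and metric names below are illustrative assumptions.
from collections import defaultdict
from statistics import mean

prompt_results = [
    {"prompt_id": "p1", "metric": "relevance", "score": 0.92, "passed": True},
    {"prompt_id": "p1", "metric": "toxicity",  "score": 0.01, "passed": True},
    {"prompt_id": "p2", "metric": "relevance", "score": 0.55, "passed": False},
    {"prompt_id": "p2", "metric": "toxicity",  "score": 0.02, "passed": True},
]

# Group per-prompt results by metric, then report average score and pass counts.
by_metric = defaultdict(list)
for r in prompt_results:
    by_metric[r["metric"]].append(r)

for metric, rows in by_metric.items():
    avg = mean(row["score"] for row in rows)
    passed = sum(row["passed"] for row in rows)
    print(f"{metric}: avg={avg:.2f}, passed={passed}/{len(rows)}")
```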
Use both prompt libraries (for targeted tests) and datasets (for comprehensive benchmarking) to get the most reliable evaluation results.