Datasets allow dynamic population of prompt variables for scalable testing.

Create a Dataset

Trusys offers three flexible methods for creating datasets, catering to various data availability and generation needs:

  • Generate Dataset using AI
  • Upload existing data
  • Connect External Dataset
The Generate Dataset using AI method uses Trusys’s AI capabilities to generate new datasets based on your specifications. This is particularly useful when you need diverse, realistic data for testing but lack existing datasets. Configure the following:
  1. Instructions to guide Trusys in generating the dataset: Provide clear and concise instructions on the type of data you want to generate. Be specific about the context, themes, and any particular characteristics of the data.
Example: “Generate a dataset of customer inquiries about returns”
  2. Variables to include in the generated dataset (columns): Define the column headers for your dataset. These are the variables that Trusys will generate data for.
Example variables: customer_id, product_name, return_reason, sentiment. Optionally, you can add one or more examples for each variable to improve the quality of the generated data.
  3. Connect Prompt Library to generate data that matches your prompts and their tone: You can link a specific Prompt Library to ensure the generated data aligns with the variables and conversational tone expected by your prompts. This helps create highly relevant test cases.
  4. Number of Rows: Set the number of data rows you want Trusys to generate for this dataset. Synthetic dataset generation is limited to 200 rows.
  5. Dataset Language: Select the language in which you want to create your variables and the generated data.
  6. Click Generate Dataset.
Dataset Preview
Trusys first displays a preview of the first 5 rows of the data that will be generated. Review this preview to ensure it meets your expectations. You can then either proceed to create the entire dataset or discard the preview and modify your instructions for a different generation.
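As a rough illustration of the generation settings above, the following Python sketch models the form fields and checks them before submission. It is not a Trusys API: `GenerationRequest`, its field names, and `preview` are hypothetical names introduced here. Only the 200-row limit and the 5-row preview come from this page; the other checks (non-empty instructions, unique variable names) are assumptions.

```python
from dataclasses import dataclass, field

MAX_SYNTHETIC_ROWS = 200  # limit stated on this page
PREVIEW_ROWS = 5          # preview size stated on this page


@dataclass
class GenerationRequest:
    """Hypothetical container for the generation form fields (not a Trusys API)."""
    instructions: str
    variables: list[str]
    num_rows: int
    language: str = "English"
    # Optional example values per variable, keyed by variable name.
    examples: dict[str, list[str]] = field(default_factory=dict)

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the request looks valid."""
        problems = []
        if not self.instructions.strip():
            problems.append("instructions must not be empty")  # assumption
        if not self.variables:
            problems.append("at least one variable is required")  # assumption
        if len(set(self.variables)) != len(self.variables):
            problems.append("variable names must be unique")  # assumption
        if not 1 <= self.num_rows <= MAX_SYNTHETIC_ROWS:
            problems.append(f"num_rows must be between 1 and {MAX_SYNTHETIC_ROWS}")
        unknown = set(self.examples) - set(self.variables)
        if unknown:
            problems.append(f"examples given for unknown variables: {sorted(unknown)}")
        return problems


def preview(rows: list[dict]) -> list[dict]:
    """Return the first few rows, mirroring the 5-row preview step."""
    return rows[:PREVIEW_ROWS]
```

A request with the example variables from this page and 50 rows would pass these checks, while one asking for 201 rows would be flagged for exceeding the limit.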

Mapping Prompt Variables with Dataset

Once your dataset is available in Trusys, you can link it to your prompts within the Prompt Library. This enables dynamic population of prompt variables, allowing for efficient and scalable evaluation of your AI models with diverse inputs. To link a prompt with a dataset, follow these steps:
  • Open the prompt: Navigate to the Prompt Library and open the specific prompt you wish to link.
  • Locate ‘Link a dataset’ option: In the prompt interface, find and click on the ‘Link a dataset’ option.
  • Select the dataset: A list of available datasets will be displayed. This list will be filtered to show only those datasets that contain variables matching the ones defined in your current prompt. Select the desired dataset from the dropdown list.
  • Automatic Variable Mapping: Trusys will automatically detect the variables in your prompt content (e.g., {{customer_name}}, {{issue_description}}, {{sentiment}}) and map them to the corresponding columns in the selected dataset. This ensures that the correct data is used to populate your prompts during test runs.
Alternatively, you can also create a new dataset directly while you are in the process of creating a prompt. This streamlines the workflow, allowing you to generate or upload data on the fly to support your prompt-based evaluations.
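The dataset filtering and automatic variable mapping described above can be pictured with a small Python sketch. This is an illustration only, not Trusys's implementation: the function names are hypothetical, and it assumes that a dataset "matches" when its columns include every `{{variable}}` in the prompt and that mapping is done by exact name.

```python
import re

# Matches placeholders such as {{customer_name}} (optional inner whitespace).
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def prompt_variables(prompt: str) -> list[str]:
    """Return unique placeholder names in order of first appearance."""
    seen: dict[str, None] = {}
    for name in PLACEHOLDER.findall(prompt):
        seen.setdefault(name, None)
    return list(seen)


def matching_datasets(prompt: str, datasets: dict[str, list[str]]) -> list[str]:
    """Names of datasets whose columns cover every prompt variable (assumed rule)."""
    needed = set(prompt_variables(prompt))
    return [name for name, columns in datasets.items() if needed <= set(columns)]


def render(prompt: str, row: dict[str, str]) -> str:
    """Fill each placeholder with the value from the same-named column."""
    return PLACEHOLDER.sub(lambda m: str(row[m.group(1)]), prompt)
```

For example, a prompt containing `{{customer_name}}`, `{{issue_description}}`, and `{{sentiment}}` would match a dataset with those three columns plus any extras, and `render` would produce one filled-in prompt per dataset row.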

List Datasets

The Listing Datasets view provides a comprehensive overview of all datasets available within your project. For each dataset, you can quickly see:
  • The number of variables it contains.
  • The total number of rows in the dataset.
  • The number of test runs that have utilized this particular dataset.
This centralized list helps you manage and track the usage of your data resources across various evaluations.

Dataset Details

Clicking on any dataset from the Listing Datasets view will open the Dataset Details page, providing an in-depth look at its content and configuration.
  • View the full dataset table: You can view the entire dataset in a tabular format for easy inspection. You can also delete unnecessary rows directly from this view, helping you refine your dataset for specific evaluations.
  • View connection details (for external dataset): If the dataset was connected from an external source (e.g., Hugging Face), this section will display the connection parameters and status.
  • View instruction details (for datasets generated using AI): For datasets generated synthetically, this section shows the original instructions and parameters used to create the dataset, so the data generation process is transparent and reproducible.

Data Encryption in Datasets

Trusys provides column-level encryption using Format-Preserving Encryption (FPE) to safeguard sensitive information across datasets. With FPE, encrypted values retain their original structure (e.g., numbers stay numeric, lengths remain consistent), ensuring full compatibility with prompt variables and downstream evaluations.
When Uploading a CSV Dataset
  • After column detection, you can choose which columns should be encrypted.
  • This prevents raw sensitive data such as PII, financial details, or health information from being exposed to the AI application.
  • Because FPE preserves format, encrypted values integrate seamlessly with your test runs without breaking schema or structure.
When Generating a Dataset using AI
  • During synthetic dataset generation, you can select variables that should be encrypted.
  • Encrypted values remain structurally valid, ensuring they can be used in prompts while keeping the true content secure.
Encrypted Outputs
  • Sensitive fields in dataset-driven outputs are also encrypted.
  • Thanks to FPE, these encrypted results can still be logged, validated, and reviewed in reports without leaking sensitive details.
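To make the format-preservation property in this section concrete, here is a minimal Python sketch that checks whether an encrypted value keeps the shape of the original. It does not perform encryption and is not part of Trusys: `preserves_format` is a hypothetical helper, and the rule that separators such as dashes stay unchanged is an assumption beyond what this page states (numbers stay numeric, lengths stay the same).

```python
def char_class(ch: str) -> str:
    """Classify a character as a digit, a letter, or itself (for separators)."""
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        return "letter"
    return ch  # assumption: punctuation and spaces are left unchanged


def preserves_format(original: str, encrypted: str) -> bool:
    """True if both values have equal length and matching character classes."""
    if len(original) != len(encrypted):
        return False
    return all(char_class(o) == char_class(e) for o, e in zip(original, encrypted))
```

With made-up values, `preserves_format("4111-1111-1111-1111", "8302-5574-0919-6627")` holds because every digit maps to a digit and the dashes stay in place, while a value that changes length or swaps a digit for a letter fails the check.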