Integrate Trusys into your continuous integration and deployment pipelines to automatically test your AI applications with every code change. This ensures that your application maintains quality standards before deployment.

Prerequisites

Before setting up CI/CD integration:
  1. Trusys API Key – Obtain your API key from the Trusys dashboard
  2. Application and Library – Have your application and prompt library configured in Trusys
  3. CLI Commands – Refer to the Using CLI documentation for a detailed command reference (a quick local check is sketched after this list)
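
Before wiring Trusys into CI, you can confirm your setup locally with the same commands the workflows below run. The application and library names here are placeholders; substitute your own values from the Trusys dashboard.

npm install -g trusys
trusys auth login --api-key "$TRUSYS_API_KEY"   # export TRUSYS_API_KEY in your shell first
trusys eval --run --application "GPT4.1" --library "Test2"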

Basic GitHub Actions Integration

This example runs Trusys evaluations automatically when you push tags matching the pattern test-cli*.
name: test-trusys-cli

on:
  push:
    tags:
      - 'test-cli*'

jobs:
  test-trusys-cli:
    runs-on: ubuntu-latest
    env:
      TRUSYS_APPLICATION: GPT4.1  # Specify your application here
      TRUSYS_LIBRARY: Test2       # Specify your prompt library name here
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: 20

      - name: Install trusys CLI
        run: npm install -g trusys

      - name: Run trusys  # Add the Trusys API key in GitHub Secrets as TRUSYS_API_KEY
        run: |
          npx trusys auth login --api-key ${{ secrets.TRUSYS_API_KEY }}
          npx trusys eval --run --application "${{ env.TRUSYS_APPLICATION }}" --library "${{ env.TRUSYS_LIBRARY }}"

Setup Steps
  1. Add the API Key to Secrets: In your repository, go to Settings → Secrets and variables → Actions and create a secret named TRUSYS_API_KEY containing your Trusys API key (or use the GitHub CLI, as shown below)
  2. Configure Environment Variables:
    • Update TRUSYS_APPLICATION with your application name
    • Update TRUSYS_LIBRARY with your prompt library name
  3. Trigger the Workflow: Push a matching tag, for example: git tag test-cli-v1 && git push origin test-cli-v1
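
If you prefer the command line, the GitHub CLI can create the secret for you (this assumes gh is installed and authenticated for your repository):

gh secret set TRUSYS_API_KEY                             # paste the key at the prompt
gh secret set TRUSYS_API_KEY --body "$TRUSYS_API_KEY"    # non-interactive alternative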

Advanced Integration with Metric Validation

This example includes custom validation logic to check specific assertion types and fail the pipeline based on your criteria.
name: metric-test-trusys-cli

on:
  push:
    tags:
      - 'metric-test-cli*'

jobs:
  test-trusys-cli:
    runs-on: ubuntu-latest
    env:
      TRUSYS_APPLICATION: GPT4.1                              # Specify your application here
      TRUSYS_LIBRARY: Test2                                   # Specify your prompt library name here
      TRUSYS_RELEVANT_ASSERTION_TYPES: icontains,llm-rubric   # Comma-separated list of relevant assertion types to validate
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: 20

      - name: Install trusys CLI
        run: npm install -g trusys

      - name: Run trusys          # Add the Trusys API key in GitHub Secrets as TRUSYS_API_KEY
        continue-on-error: true   # Let the workflow proceed past a failing eval so results.json can still be validated
        run: |
          npx trusys auth login --api-key ${{ secrets.TRUSYS_API_KEY }}
          npx trusys eval --run --application "${{ env.TRUSYS_APPLICATION }}" --library "${{ env.TRUSYS_LIBRARY }}" --output "results.json"


      - name: Validate trusys results
        run: |
          set -euo pipefail
          node - <<'NODE'
          const fs = require('fs');
          const path = 'results.json';

          if (!fs.existsSync(path)) {
            console.error(`Results file not found: ${path}`);
            process.exit(1);
          }

          let parsed;
          try {
            parsed = JSON.parse(fs.readFileSync(path, 'utf8'));
          } catch (err) {
            console.error(`Failed to parse JSON from ${path}:`, err);
            process.exit(1);
          }

          // Flatten the results table into (componentResult, prompt) pairs
          const body = parsed.results?.table?.body ?? [];
          const annotatedResults = body.flatMap(row => {
            const prompt = row?.prompt ?? '';
            return (row.outputs ?? []).flatMap(output => {
              const componentResults = output.gradingResult?.componentResults ?? [];
              return componentResults.map(result => ({ result, prompt }));
            });
          });

          const relevantTypes = new Set(
            (process.env.TRUSYS_RELEVANT_ASSERTION_TYPES || '')
              .split(',')
              .map(type => type.trim().toLowerCase())
              .filter(type => type.length > 0)
          );
          const relevant = annotatedResults.filter(({ result }) => {
            const assertionType = result?.assertion?.type;
            if (typeof assertionType !== 'string') {
              return false;
            }
            const normalizedType = assertionType.trim().toLowerCase();
            return relevantTypes.has(normalizedType);
          });

          const failing = relevant.filter(({ result }) => result?.pass === false);

          if (failing.length > 0) {
            console.error(`Found ${failing.length} failing assertion(s) in ${path}`);
            failing.forEach(({ result, prompt }, index) => {
              const type = result?.assertion?.type ?? 'unknown';
              const reason = result?.reason ?? 'No reason provided';
              const promptSnippet = JSON.stringify(prompt).slice(0, 120); // truncate long prompts in logs
              console.error(`  [${index + 1}] prompt=${promptSnippet} :: assertion.type=${type} :: ${reason}`);
            });
            process.exit(1);
          }

          console.log(`All relevant assertions (${relevant.length}) passed in ${path}`);
          NODE

Configuration Notes

  • TRUSYS_RELEVANT_ASSERTION_TYPES – Comma-separated list of assertion types to validate (e.g., icontains,llm-rubric,equals)
  • Customize the validation logic in the Node.js script to match your requirements; a sample of the results.json shape the script expects follows
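
For reference, here is a minimal sketch of the results.json shape the validation script reads. It is inferred from the fields the script accesses, not an official schema; your actual output may contain additional fields.

{
  "results": {
    "table": {
      "body": [
        {
          "prompt": "Summarize the ticket in one sentence.",
          "outputs": [
            {
              "gradingResult": {
                "componentResults": [
                  { "pass": true, "assertion": { "type": "icontains" }, "reason": "Output contains the expected substring" },
                  { "pass": false, "assertion": { "type": "llm-rubric" }, "reason": "Response missed the required tone" }
                ]
              }
            }
          ]
        }
      ]
    }
  }
}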

Key Features

  1. Output to File – The --output "results.json" flag saves evaluation results for custom validation
  2. Continue on Error – Setting continue-on-error: true lets the workflow proceed past a failing eval so the validation step can still inspect the results
  3. Custom Validation – The validation step filters and checks only the assertion types listed in TRUSYS_RELEVANT_ASSERTION_TYPES
  4. Detailed Failure Reporting – Failed assertions are logged with their prompts, assertion types, and failure reasons
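
To debug a failing run locally, you can generate the same results file and query it. The jq filter below mirrors the fields the validation script reads (it assumes jq is installed; the application and library names are placeholders):

npx trusys eval --run --application "GPT4.1" --library "Test2" --output results.json

# List every failing assertion with its type and reason
jq '[.results.table.body[] | .outputs[]? | .gradingResult.componentResults[]? | select(.pass == false) | {type: .assertion.type, reason}]' results.json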