Argus

Argus is an AI chatbot built into Trusys that helps you analyze and understand your application traces through natural conversation. Instead of manually reviewing traces and metrics, you can ask Argus questions in plain language and get instant insights about your AI application’s performance, failures, and anomalies.

What is Argus?

Argus is your intelligent assistant for trace analysis. It can:

Explain Anomalies - Identify and explain unusual patterns or behaviors in your traces
Analyze Metric Failures - Help you understand why specific evaluations or metrics failed
Deep Trace Analysis - Provide detailed insights into individual traces or groups of traces
Compare Traces - Analyze differences between successful and failed traces
Application-Wide Analysis - Examine patterns across all traces from one or multiple applications

Key Features

Natural Language Queries

Ask questions in plain English, such as:

“Why did this trace fail the hallucination check?”
“What’s causing the spike in latency for my chatbot application?”
“Explain the anomalies in traces from the last hour”
“Compare successful and failed traces for the recommendation system”
“What are the common patterns in traces with high token usage?”

Comprehensive Analysis

Argus can analyze:

Single Traces - Deep dive into a specific trace’s execution, spans, and evaluation results
Multiple Traces - Identify patterns across groups of traces
All Application Traces - Analyze trends and issues across your entire application
Multi-Application Analysis - Compare behavior across different applications

Contextual Understanding

Argus understands:

Span hierarchies and execution flows
Evaluation results and failure reasons
System metadata and health indicators
Token usage, costs, and performance metrics
Session-level conversational patterns

Getting Started with Argus

Chat with Argus

Navigate to the Traces tab in Trusys, or open a trace’s details
Click Ask Argus to start a new analysis session
Argus will greet you and ask what you’d like to analyze
Type your question or describe what you want to investigate
Argus will analyze your traces and provide insights

Example Conversations

Analyzing Failed Evaluations:

You: Why are my traces failing the relevance check?

Argus: I've analyzed your recent traces and found that 15 out of 23 
traces failed the relevance evaluation. The main issues are:

1. Context mismatch: 60% of failures show the retrieved context 
   doesn't match the user query topic
2. Incomplete responses: 30% are missing key information requested
3. Off-topic responses: 10% deviated from the original question

The pattern started 3 hours ago after deployment v2.3.1. Would you 
like me to show specific examples?

Investigating Anomalies:

You: Explain the anomalies in my application traces from yesterday

Argus: I've identified 3 anomalies in your traces from yesterday:

1. Latency Spike (2:15 PM - 2:45 PM): Average response time jumped 
   from 800ms to 3.2s. Affected 127 traces. Root cause: Vector 
   database query timeout increased.

2. High Token Usage (5:30 PM - 6:00 PM): Token consumption 3x higher 
   than baseline. 45 traces affected. Pattern shows unusually long 
   context windows being passed to the LLM.

3. Evaluation Failures (8:00 PM - 9:00 PM): Sudden increase in PII 
   detection failures. 23 traces flagged. User queries contained 
   more personal information than usual.

Would you like me to dive deeper into any of these?

Deep Trace Analysis:

You: Analyze trace ID abc-123-xyz in detail

Argus: Here's a detailed analysis of trace abc-123-xyz:

Overview:
- Total Duration: 2.3 seconds
- Status: Failed (Hallucination Check)
- Token Usage: 1,247 tokens ($0.0062)
- Spans: 7 total (5 LLM calls, 2 vector DB queries)

Execution Flow:
1. User query received: "What are the benefits of exercise?"
2. Vector DB retrieval: Found 3 relevant documents
3. LLM call 1: Context processing (340ms)
4. LLM call 2: Response generation (1.8s)
5. Hallucination check: FAILED

Failure Reason:
The response included a claim that "exercise cures diabetes" which 
is not supported by the retrieved context. The context only mentions 
exercise helps "manage" diabetes, not cure it.

Recommendation:
Add stricter grounding instructions in your prompt to ensure claims 
are directly supported by provided context.

Managing Your Chats

Access all your previous conversations with Argus:

Navigate to the Argus section in the header
Your chat history appears
Click on any chat from your history
The full conversation loads
Continue asking questions in the same context
Argus remembers the previous discussion and traces analyzed

Searching Chat History

Use the Search bar at the top of the chat history
Enter keywords, trace IDs, or topics
Argus filters chats containing your search terms
Click on relevant results to open that conversation

Search Examples:

“hallucination failures”
“trace abc-123”
“latency spike”
“production deployment”
“token usage analysis”

What Argus Can Analyze

Trace-Level Analysis

Span Hierarchies - Understand execution flow and dependencies
Performance Metrics - Identify bottlenecks and slow operations
Evaluation Results - Explain why traces passed or failed checks
Token Usage - Analyze token consumption and costs
Error Patterns - Identify common failure modes

Application-Level Insights

Trend Analysis - Spot patterns over time
Anomaly Detection - Highlight unusual behaviors
Comparison Analysis - Compare different time periods or versions
Resource Utilization - Track token usage, costs, and API calls
Quality Metrics - Evaluate overall application performance

Multi-Application Analysis

Cross-Application Patterns - Find common issues across applications
Performance Comparison - Compare metrics between applications
Resource Distribution - Understand resource usage across your portfolio
Best Practices - Identify what works well in one app to apply elsewhere

Use Cases

Debugging Failed Evaluations

When traces fail quality checks:

Ask Argus to explain the failures
Identify common patterns in failed traces
Get recommendations for fixes
Compare failed vs. successful traces

Performance Optimization

To improve application performance:

Identify latency bottlenecks
Analyze token usage patterns
Find opportunities to reduce costs
Optimize prompt engineering based on insights

Production Monitoring

For ongoing monitoring:

Investigate sudden changes in metrics
Explain anomalies in real time
Analyze deployment impacts
Track quality trends over time

Root Cause Analysis

When issues occur:

Trace problems to their source
Understand cascading failures
Identify external dependencies causing issues
Get actionable recommendations

Quality Assurance

For testing and validation:

Analyze evaluation coverage
Identify edge cases and gaps
Compare test vs. production behavior
Validate improvements after changes

Tips for Effective Use

Be Specific

Instead of: “Why are things failing?”
Try: “Why are traces from my chatbot application failing the relevance check in the last 24 hours?”

Provide Context

Help Argus understand what you’re investigating:

Mention specific applications or trace IDs
Include time ranges when relevant
Specify which metrics or evaluations you care about
Note any recent changes (deployments, config updates)
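The tips above can be turned into a small question template. The following is a minimal sketch, not part of Trusys or Argus: `QuestionContext` and `build_question` are hypothetical names introduced here for illustration, and the code only assembles a plain-text question that you would then paste into the Argus chat.

```python
from dataclasses import dataclass, field


@dataclass
class QuestionContext:
    """Hypothetical container for the details worth mentioning to Argus."""
    application: str                      # which application the traces belong to
    evaluation: str                       # which metric or evaluation you care about
    time_range: str = ""                  # e.g. "last 24 hours"
    trace_ids: list[str] = field(default_factory=list)
    recent_changes: list[str] = field(default_factory=list)


def build_question(ctx: QuestionContext) -> str:
    """Compose one specific, plain-language question from the context."""
    question = (
        f"Why are traces from my {ctx.application} application "
        f"failing the {ctx.evaluation} check"
    )
    if ctx.time_range:
        question += f" in the {ctx.time_range}"
    question += "?"

    # Optional context is appended as short follow-on sentences.
    extras = []
    if ctx.trace_ids:
        extras.append("Example trace IDs: " + ", ".join(ctx.trace_ids) + ".")
    if ctx.recent_changes:
        extras.append("Recent changes: " + "; ".join(ctx.recent_changes) + ".")
    return " ".join([question] + extras)


if __name__ == "__main__":
    ctx = QuestionContext(
        application="chatbot",
        evaluation="relevance",
        time_range="last 24 hours",
        recent_changes=["deployment v2.3.1"],
    )
    print(build_question(ctx))
```

A template like this is only a convenience for teams that ask the same kind of question repeatedly; typing the question directly works just as well.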

Ask Follow-Up Questions

Argus maintains conversation context, so you can:

Drill deeper into specific findings
Ask for examples or evidence
Request different analysis angles
Explore recommendations in detail

Use Trace, Span, or Session IDs

For specific analysis:

Copy the trace, span, or session ID from the Trace or Session details section
Paste it into your question to Argus
Get a detailed breakdown of that specific execution

Iterate and Refine

Start with broad questions
Narrow down based on initial insights
Ask Argus to compare different scenarios
Request actionable next steps

Privacy and Security

Chat Privacy - Your chats are private to your account
Data Security - All conversations are encrypted
Trace Access - Argus only accesses traces you have permission to view
No Data Sharing - Your analysis sessions are not shared with other users
