What is an Attack Probe?
An attack probe represents a specialized testing construct within Trusys that combines vulnerability detection plugins with strategic attack methodologies to create targeted security assessments. These probes serve as the primary mechanism for evaluating AI system robustness against potential security threats and policy violations. The creation of attack probes follows a systematic methodology:- Plugin Activation: Security plugins generate baseline vulnerability test cases
- Strategy Application: Selected strategies modify and enhance base content
- Content Integration: Dataset values populate template variables for dynamic scenarios
- Metadata Attachment: Vulnerability classification and testing context information
- Format Optimization: Content preparation for target system compatibility
- Application context: Simulates realistic user interactions based on application details.
Probe Categories and Classifications
Single-Turn Attack Probes- Direct vulnerability testing through individual prompt submissions
- Encoding and obfuscation technique applications
- Format-based bypass attempts using alternative input methods
- Authority and context exploitation through academic or research framing
- Progressive escalation scenarios building harmful context over time
- Adaptive conversation flows responding to target system behavior
- Persistent user simulation modeling determined boundary testing
- Context accumulation leveraging conversation history for bypass attempts
- Visual content probes using text-to-image conversion techniques
- Audio-based probes leveraging speech synthesis for content delivery
- Video content probes combining visual and temporal elements
- Cross-modal consistency testing across different input formats