AWS Fault Injection Simulator
AWS Fault Injection Simulator (FIS) is a fully managed service for running fault injection experiments on AWS. It allows you to improve an application's performance, observability, and resiliency by identifying and fixing weaknesses through controlled chaos engineering experiments.
APIs
AWS Fault Injection Simulator API
The AWS Fault Injection Simulator API provides programmatic access to create and manage experiment templates, experiments, and actions for conducting chaos engineering experiments.
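To give a feel for the API's shape, the sketch below assembles a CreateExperimentTemplate request body as a plain dictionary mirroring the service's JSON API. The role ARN, alarm ARN, tag values, and selection mode are placeholder assumptions for illustration; field names follow the FIS API but should be verified against the current API reference.

```python
# Sketch of a CreateExperimentTemplate request body, assuming the FIS
# JSON API shapes. The role ARN, alarm ARN, and tags are placeholders.

def build_experiment_template(role_arn: str, alarm_arn: str) -> dict:
    """Assemble the JSON body for a CreateExperimentTemplate call."""
    return {
        "description": "Stop tagged EC2 instances, then restart them",
        "roleArn": role_arn,  # IAM role FIS assumes to act on targets
        "stopConditions": [
            # Halt the experiment automatically if this alarm fires.
            {"source": "aws:cloudwatch:alarm", "value": alarm_arn}
        ],
        "targets": {
            "staging-instances": {
                "resourceType": "aws:ec2:instance",
                "resourceTags": {"env": "staging"},  # tag-based scoping
                "selectionMode": "PERCENT(50)",      # half of the matches
            }
        },
        "actions": {
            "stop-instances": {
                "actionId": "aws:ec2:stop-instances",
                "parameters": {"startInstancesAfterDuration": "PT10M"},
                "targets": {"Instances": "staging-instances"},
            }
        },
    }

template = build_experiment_template(
    "arn:aws:iam::123456789012:role/fis-experiment-role",
    "arn:aws:cloudwatch:us-east-1:123456789012:alarm:high-error-rate",
)
```

The same body could be passed to an SDK client or serialized as JSON for the CLI; either way the template, not the individual API call, is where targeting and safety are defined.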
Capabilities
AWS FIS Chaos Engineering
Workflow capability for executing chaos engineering experiments using AWS FIS. Enables resilience engineers and SREs to design, execute, and monitor fault injection experiments.
Features
Fully managed service requiring no agent installation with pre-built fault injection actions for EC2, RDS, ECS, EKS, and more.
Ready-to-use resilience scenarios for AZ failures, power interruptions, network disruptions, and cross-region connectivity issues.
CloudWatch alarm-based stop conditions and safety levers prevent unintended impact during live testing.
Tag-based resource targeting scopes experiments to specific environments, applications, or resource subsets.
Run experiments across multiple AWS accounts using target account configurations.
API and CLI access enables automated resilience testing in deployment pipelines.
Console and API provide real-time status of executing actions, affected resources, and triggered stop conditions.
Fine-grained IAM controls restrict which users can create, run, or view experiments and affected resources.
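The stop-condition and tag-scoping features above lend themselves to a pre-flight guardrail before a template ever reaches the service. Below is a hypothetical helper (not part of FIS itself) that refuses a template lacking a CloudWatch-alarm stop condition or targeting resources outside an allowed environment tag; the `env` tag key is an assumption.

```python
# Hypothetical pre-submission check: a template is "safe" only if it
# has a CloudWatch-alarm stop condition and every target is scoped to
# the allowed environment tag. Field names mirror the FIS JSON API.

def template_is_safe(template: dict, allowed_env: str = "staging") -> bool:
    """Return True if the template passes both safety checks."""
    has_alarm = any(
        c.get("source") == "aws:cloudwatch:alarm"
        for c in template.get("stopConditions", [])
    )
    env_scoped = all(
        t.get("resourceTags", {}).get("env") == allowed_env
        for t in template.get("targets", {}).values()
    )
    return has_alarm and env_scoped
```

A check like this is cheap to run in a pipeline and complements, rather than replaces, the IAM restrictions on who may create or start experiments.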
Use Cases
Validate application behavior under resource failures before they occur in production.
Run structured fault injection experiments following chaos engineering principles.
Verify that monitoring and alerting systems detect and respond to failures correctly.
Conduct planned game day exercises simulating failure scenarios for team readiness.
Integrate resilience testing into CI/CD pipelines for continuous validation.
Test cross-region failover mechanisms and recovery time objectives.
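For the CI/CD use case, a pipeline step typically starts an experiment and then polls until it reaches a terminal state. The sketch below abstracts the polling loop; `get_state` stands in for a GetExperiment API call, and the state names used here ("initiating", "running", "completed", "stopped", "failed") are assumptions modeled on the FIS console rather than an exhaustive list.

```python
import time

# Sketch of a pipeline step that waits for a FIS experiment to finish.
# `get_state` is a caller-supplied callable (e.g. wrapping GetExperiment)
# that returns the current experiment state as a string.

TERMINAL_STATES = {"completed", "stopped", "failed"}

def wait_for_experiment(get_state, poll_seconds=30, max_polls=100):
    """Poll until the experiment reaches a terminal state; return it."""
    for _ in range(max_polls):
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError("experiment did not reach a terminal state in time")
```

A pipeline would fail the build unless the returned state is "completed", treating "stopped" (a stop condition fired) and "failed" as resilience regressions to investigate.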
Integrations
Stop conditions use CloudWatch alarms to automatically halt experiments.
An IAM experiment role defines which AWS resources experiments can affect.
Stop instances, terminate instances, and inject CPU/memory stress on EC2.
Stop ECS tasks and inject faults into containerized workloads.
Terminate Kubernetes nodes and pods running on EKS.
Trigger RDS failovers, reboot instances, and pause cluster I/O.
Inject latency and errors into Lambda function invocations.
Pause DynamoDB replication between replicas.
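The integrations above correspond to pre-built fault actions named with FIS's aws:&lt;service&gt;:&lt;action&gt; convention. The map below is illustrative only: the exact action IDs and parameters are assumptions to verify against the FIS actions reference, and EC2 CPU/memory stress in particular is delivered through SSM run documents rather than a dedicated EC2 action.

```python
# Illustrative map from the integrations above to FIS action configs.
# Action IDs and parameters are assumptions; check the FIS actions
# reference for the authoritative list before use.

FAULT_ACTIONS = {
    # CPU stress on EC2 runs via an SSM document, not a native EC2 action.
    "ec2-cpu-stress": {
        "actionId": "aws:ssm:send-command",
        "parameters": {"duration": "PT5M"},
    },
    "ecs-stop-task": {"actionId": "aws:ecs:stop-task"},
    "eks-terminate-nodes": {
        "actionId": "aws:eks:terminate-nodegroup-instances",
        "parameters": {"instanceTerminationPercentage": "30"},
    },
    "rds-failover": {"actionId": "aws:rds:failover-db-cluster"},
    "rds-reboot": {"actionId": "aws:rds:reboot-db-instances"},
}
```

Each entry would slot into the "actions" map of an experiment template alongside a "targets" reference, as in the template example earlier in this page.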