Why Test Failure Handling?

Services fail. Networks drop. APIs slow down. If you only test the happy path, you'll discover your error handling is broken when it matters most — in production, at 2 AM, during a traffic spike.

Failures Are Inevitable

Every dependency will fail eventually. The question isn't whether, but how your application handles it when it does.

Test Before Production

Chaos engineering in local development lets you verify that retry logic, circuit breakers, and fallbacks work correctly before deployment.

AWS Service Chaos

Inject failures directly into your local AWS service emulations. Test how your application handles DynamoDB throttling, S3 access errors, SQS timeouts, and more — with realistic, service-aware error responses.

Error Injection

A configurable percentage of requests return realistic AWS errors with correct status codes and error formats. Supports weighted error selection from multiple error types.

Latency Injection

Add random delay between a minimum and maximum duration (in milliseconds). Simulates slow network conditions and overloaded services.

Timeout Simulation

A configurable percentage of requests hang indefinitely, never returning a response. Tests your application's timeout settings and cancellation logic.

Connection Reset

A configurable percentage of requests have the TCP connection dropped mid-request. Simulates network failures and infrastructure problems.

Supported AWS Services

DynamoDB, SQS, S3, SNS, EventBridge, Step Functions, Cognito, SSM, Secrets Manager, IAM

Service-Aware Error Responses

Injected errors use the correct format for each service — JSON for DynamoDB and Cognito, XML for S3 and IAM — so your application's error handling is tested against realistic responses.
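
Because the error format differs by service, client code that inspects raw responses has to handle both shapes. The following is a minimal sketch of doing so, not part of the tool: `extract_error_code` is a hypothetical helper, and it assumes JSON errors carry the code in a `__type` field (optionally prefixed with a namespace and `#`) and XML errors carry it in a `<Code>` element.

```python
import json
import xml.etree.ElementTree as ET
from typing import Optional


def extract_error_code(body: str) -> Optional[str]:
    """Return the error code from a JSON or XML error body, or None if absent."""
    text = body.strip()
    if text.startswith("{"):
        try:
            payload = json.loads(text)
        except json.JSONDecodeError:
            return None
        # Assumption: JSON errors use "__type", e.g. "<namespace>#<ErrorCode>".
        error_type = payload.get("__type")
        if isinstance(error_type, str):
            return error_type.split("#")[-1]
        return None
    if text.startswith("<"):
        try:
            root = ET.fromstring(text)
        except ET.ParseError:
            return None
        # Find <Code> at any depth, ignoring an XML namespace if one is set.
        for element in root.iter():
            if element.tag.split("}")[-1] == "Code":
                return element.text
    return None
```

In practice an AWS SDK parses these bodies for you; a helper like this is mainly useful in tests that assert on the raw response your application received.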

Built-in Error Types

Generic AWS Errors

InternalServerError (500), ServiceUnavailableException (503), ThrottlingException (429), AccessDeniedException (403), ResourceNotFoundException (404), ValidationException (400)

Service-Specific Errors

ConditionalCheckFailedException, ProvisionedThroughputExceededException, NoSuchKey, NoSuchBucket, UserNotFoundException, NotAuthorizedException

Errors are weighted — configure multiple error types with different probabilities to simulate realistic failure distributions.
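
To make the idea of weighted errors concrete, here is a small simulation of one way weighted selection can work. It is a sketch under stated assumptions, not the tool's implementation: the weights in `ERROR_WEIGHTS` and the functions `pick_error` and `simulate` are hypothetical, and each request is assumed to fail independently with probability `error_rate`.

```python
import random
from collections import Counter

# Hypothetical weights, chosen only for illustration.
ERROR_WEIGHTS = {
    "ThrottlingException": 6,
    "InternalServerError": 3,
    "ServiceUnavailableException": 1,
}


def pick_error(weights, rng):
    """Pick one error name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def simulate(weights, error_rate, requests, seed=0):
    """Count outcomes for a number of requests under the given error rate."""
    rng = random.Random(seed)  # seeded so a run is reproducible
    counts = Counter()
    for _ in range(requests):
        if rng.random() < error_rate:
            counts[pick_error(weights, rng)] += 1
        else:
            counts["ok"] += 1
    return counts


if __name__ == "__main__":
    for outcome, count in simulate(ERROR_WEIGHTS, 0.1, 10_000).most_common():
        print(f"{outcome}: {count}")
```

With these weights, roughly 60% of the injected errors should be throttling errors, which is a plausible shape for a busy DynamoDB table.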

CLI Commands

    # Enable chaos on a service
    uvx --from local-web-services lws chaos enable dynamodb

    # Set chaos parameters
    uvx --from local-web-services lws chaos set dynamodb \
      --error-rate 0.1 \
      --latency-min 50 \
      --latency-max 200

    # Set timeout and connection reset rates
    uvx --from local-web-services lws chaos set s3 \
      --timeout-rate 0.05 \
      --connection-reset-rate 0.02

    # Check status for all services
    uvx --from local-web-services lws chaos status

    # Disable chaos on a service
    uvx --from local-web-services lws chaos disable dynamodb

Runtime Toggling

Enable, disable, and adjust chaos on running services without restarting ldk dev.

1. Start Normal

Run your application with AWS services in normal mode. Verify the happy path works.

2. Enable Chaos

Turn on failure injection for specific AWS services while the application is running.

    uvx --from local-web-services lws chaos enable dynamodb
    uvx --from local-web-services lws chaos set dynamodb --error-rate 0.3

3. Observe and Fix

Watch how your application reacts. Check retry behavior, error messages, and fallback logic. Then disable chaos and fix any issues you found.

    uvx --from local-web-services lws chaos disable dynamodb

External Service Chaos

Inject failures into your external fake servers. Test how your application handles third-party API outages, slow responses, and network failures.

Error Rate

A configurable percentage of requests return HTTP 500 or 503 errors instead of the normal response. Supports weighted status codes for specific error distributions.

Latency Injection

Add random delay to responses between a minimum and maximum duration. Simulates slow network conditions and overloaded external APIs.

Timeout Simulation

A configurable percentage of requests hang indefinitely. Tests whether your timeout settings and cancellation logic work correctly.

Connection Reset

A configurable percentage of requests have the TCP connection dropped mid-request. Simulates network failures and load balancer issues.

Configuration

YAML Configuration

    chaos:
      error_rate: 0.1              # 10% of requests return errors
      error_status_codes:
        - status: 500
          weight: 7                # 70% of errors are 500s
        - status: 503
          weight: 3                # 30% of errors are 503s
      latency:
        min_ms: 200                # minimum added delay
        max_ms: 2000               # maximum added delay
      timeout_rate: 0.05           # 5% of requests hang
      connection_reset_rate: 0.02  # 2% of connections drop
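
The weights in the configuration above are relative, so it can help to work out what share of all requests each status code should account for. The sketch below does that arithmetic. The `chaos` dictionary mirrors only the error-related keys of the YAML, `expected_status_rates` is a hypothetical helper, and the calculation assumes the error rate applies to every request independently of the timeout and connection reset settings.

```python
# Mirrors the error-related part of the YAML configuration.
chaos = {
    "error_rate": 0.1,
    "error_status_codes": [
        {"status": 500, "weight": 7},
        {"status": 503, "weight": 3},
    ],
}


def expected_status_rates(config):
    """Map each status code to the expected fraction of all requests."""
    entries = config["error_status_codes"]
    total_weight = sum(entry["weight"] for entry in entries)
    return {
        entry["status"]: config["error_rate"] * entry["weight"] / total_weight
        for entry in entries
    }


if __name__ == "__main__":
    for status, rate in expected_status_rates(chaos).items():
        print(f"HTTP {status}: {rate:.1%} of all requests")
```

Under that assumption, about 7% of all requests should return a 500 and about 3% a 503, which gives you a target to compare against when you count responses in a load test.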

CLI Commands

    # Enable error injection on a fake server
    uvx --from local-web-services lws fake chaos \
      --name stripe-api \
      --error-rate 0.1

    # Add latency injection
    uvx --from local-web-services lws fake chaos \
      --name stripe-api \
      --latency-min 200 --latency-max 2000

    # Enable timeout simulation
    uvx --from local-web-services lws fake chaos \
      --name stripe-api \
      --timeout-rate 0.05

    # Enable connection resets
    uvx --from local-web-services lws fake chaos \
      --name stripe-api \
      --connection-reset-rate 0.02

    # Disable all chaos
    uvx --from local-web-services lws fake chaos \
      --name stripe-api \
      --disable

Use Cases

Common resilience patterns you can validate with chaos testing — for both AWS services and external dependencies.

Retry Logic

Verify that transient errors trigger retries with proper backoff. Confirm that non-retryable errors fail fast instead of wasting time.
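
As an illustration of the behavior worth verifying, here is a minimal retry wrapper with exponential backoff and full jitter. Everything in it is an assumption made for the example: `ServiceError`, `RETRYABLE_CODES`, and `call_with_retries` are hypothetical names, and which error codes count as transient is a decision for your application. AWS SDKs also ship their own retry handling, which you may prefer to configure instead.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ServiceError(Exception):
    """Hypothetical application-level error carrying an AWS-style error code."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


# Assumption: an illustrative set; decide per service which codes are transient.
RETRYABLE_CODES = {
    "ThrottlingException",
    "InternalServerError",
    "ServiceUnavailableException",
    "ProvisionedThroughputExceededException",
}


def call_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay_s: float = 0.1,
    max_delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying transient errors and failing fast on the rest."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return operation()
        except ServiceError as exc:
            # Non-retryable codes and the final attempt re-raise immediately.
            if exc.code not in RETRYABLE_CODES or attempt >= max_attempts:
                raise
        # Full jitter: wait a random time up to an exponentially growing ceiling.
        ceiling = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
        sleep(random.uniform(0, ceiling))
        attempt += 1
```

Passing `sleep` as a parameter lets a test record the delays instead of waiting, so you can assert both the number of attempts and the backoff ceiling.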

Circuit Breakers

Test that circuit breakers open after enough failures and close again when the service recovers. Verify fallback responses during open state.
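
The open, half-open, and closed transitions described here can be sketched in a few lines. This is a simplified, single-threaded illustration rather than a production breaker: `CircuitBreaker`, `CircuitOpenError`, and their parameters are hypothetical, and real libraries usually add locking, metrics, and limits on half-open trial calls.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout_s:
            return "half-open"  # timeout elapsed: allow a trial call
        return "open"

    def call(self, operation, fallback=None):
        if self.state == "open":
            if fallback is not None:
                return fallback()
            raise CircuitOpenError("circuit is open")
        try:
            result = operation()
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

The injectable `clock` makes the recovery path testable without waiting: raise the error rate to trip the breaker, disable chaos, advance the clock past the reset timeout, and check that the next successful call closes the circuit.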

Graceful Degradation

Confirm your application still functions when a dependency is down. Non-critical features should degrade, not crash the entire system.
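
One common way to get this behavior is to separate critical calls, whose failures should propagate, from optional ones that fall back to a default. The sketch below shows the pattern; `with_fallback`, `render_product_page`, and the product-page scenario are hypothetical and exist only to illustrate the split.

```python
import logging

logger = logging.getLogger("degradation")


def with_fallback(operation, default, feature):
    """Run an optional operation; on failure, log it and return the default."""
    try:
        return operation()
    except Exception:
        logger.warning("feature %r degraded, using default", feature, exc_info=True)
        return default


def render_product_page(fetch_product, fetch_recommendations):
    # Critical dependency: without the product there is no page, so let it raise.
    product = fetch_product()
    # Non-critical dependency: an empty list is an acceptable degraded result.
    recommendations = with_fallback(fetch_recommendations, [], "recommendations")
    return {"product": product, "recommendations": recommendations}
```

With chaos enabled on the recommendations dependency only, the page should still render with an empty recommendations list and a warning in the logs.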

Timeout Handling

Ensure your HTTP clients have proper timeout settings and handle them gracefully. Hanging requests should not block the event loop or exhaust connection pools.
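
For asynchronous code, the property to check is that a hanging call is cancelled after a deadline rather than awaited forever. Here is a minimal sketch using `asyncio.wait_for` from the standard library. `hanging_request` is a hypothetical stand-in for a request hitting a simulated timeout, and `call_with_timeout` is an illustrative wrapper, not an API of any HTTP client.

```python
import asyncio


async def hanging_request():
    """Hypothetical stand-in for a request that never receives a response."""
    await asyncio.Event().wait()  # this event is never set, so it waits forever


async def call_with_timeout(make_request, timeout_s, fallback=None):
    """Await a request, cancelling it and returning fallback after timeout_s."""
    try:
        return await asyncio.wait_for(make_request(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the underlying task at this point.
        return fallback


if __name__ == "__main__":
    result = asyncio.run(call_with_timeout(hanging_request, 0.5, fallback="timed out"))
    print(result)
```

Most HTTP clients expose their own connect and read timeout settings, and those are usually the first thing to set; a wrapper like this adds an overall deadline on top.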

Dead Letter Queues

Verify that failed messages are routed to DLQs after exhausting retries. Confirm that alerting and monitoring fire on DLQ activity.
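
The routing rule being tested can be modeled in memory to make the expectation explicit. The following is a simplified model, not SQS itself: `process_with_dlq` and `max_receive_count` are hypothetical names, and the model assumes a message is dead-lettered once it has been received `max_receive_count` times without succeeding.

```python
from collections import deque


def process_with_dlq(messages, handler, max_receive_count=3):
    """Process messages, moving those that keep failing to a dead-letter list."""
    queue = deque((message, 0) for message in messages)
    processed, dead_letters = [], []
    while queue:
        message, receives = queue.popleft()
        receives += 1
        try:
            handler(message)
        except Exception:
            if receives >= max_receive_count:
                dead_letters.append(message)  # retries exhausted
            else:
                queue.append((message, receives))  # redeliver later
        else:
            processed.append(message)
    return processed, dead_letters
```

A test against the real queue follows the same shape: inject failures so that one message can never be processed, then assert that it ends up in the DLQ and that healthy messages are unaffected.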

Error Reporting

Confirm that errors from services are properly logged, reported to monitoring tools, and surfaced to users with helpful messages.