Overview
TraceOps is a microservice observability simulator that demonstrates how distributed systems behave under normal and failure conditions. It combines a visual simulator with real observability tools (Prometheus, Grafana, Jaeger, and Loki), so a request can be followed from the UI through metrics, traces, and logs.
System Architecture
The system is composed of three core microservices:
- Order Service – Entry point for client requests
- User Service – Validates user identity
- Product Service – Checks product availability
These services communicate synchronously: each call blocks until the downstream service responds, which mirrors real-world service dependencies.
Request Flow
Request → Process → Validate User → Check Product
- Client sends an order request
- Order Service processes the request
- Order Service calls User Service
- If the user is valid, Order Service calls Product Service
- If the product is available, the order is completed
A failure at any stage propagates back through the call chain and is visualized in the simulator.
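The flow above can be sketched in plain Python as a chain of synchronous calls. This is a minimal illustration, not the actual TraceOps code: the function names, the `ServiceError` class, and the in-memory `VALID_USERS` and `STOCK` data are all assumptions made for the example.

```python
class ServiceError(Exception):
    """Raised when a simulated service cannot fulfil a request (hypothetical)."""

    def __init__(self, service: str, reason: str):
        super().__init__(f"{service}: {reason}")
        self.service = service
        self.reason = reason


# Hypothetical in-memory data standing in for the real services' state.
VALID_USERS = {"alice", "bob"}
STOCK = {"keyboard": 3, "mouse": 0}


def validate_user(user_id: str) -> None:
    """User Service stand-in: reject unknown users."""
    if user_id not in VALID_USERS:
        raise ServiceError("user-service", f"unknown user {user_id!r}")


def check_product(product_id: str) -> None:
    """Product Service stand-in: reject products that are out of stock."""
    if STOCK.get(product_id, 0) <= 0:
        raise ServiceError("product-service", f"{product_id!r} is not available")


def place_order(user_id: str, product_id: str) -> dict:
    """Order Service stand-in: call User Service, then Product Service.

    The calls are synchronous, so a failure in either downstream call
    stops the flow and is reported back to the caller.
    """
    try:
        validate_user(user_id)
        check_product(product_id)  # only reached if the user is valid
    except ServiceError as err:
        return {"status": "failed", "failed_at": err.service, "reason": err.reason}
    return {"status": "completed", "user": user_id, "product": product_id}
```

Calling `place_order("alice", "keyboard")` completes the order, while an unknown user or an out-of-stock product returns a result whose `failed_at` field names the service where the failure originated.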
Observability Stack
- Metrics: Prometheus
- Dashboards: Grafana
- Tracing: Jaeger
- Logs: Loki
All services are instrumented using OpenTelemetry to provide unified observability.
Metrics
The system tracks key performance indicators such as:
- Requests per second (RPS)
- Latency (95th percentile, P95)
- Error rates
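To make these indicators concrete, the sketch below computes them from a list of recorded requests. It is an illustration only: in the real stack the values come from Prometheus, and the `RequestSample` type, the `summarize` function, and the nearest-rank percentile method are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class RequestSample:
    """One observed request (hypothetical record shape)."""
    latency_ms: float
    ok: bool


def summarize(samples: list, window_seconds: float) -> dict:
    """Compute RPS, P95 latency, and error rate for one time window."""
    if not samples or window_seconds <= 0:
        raise ValueError("need at least one sample and a positive window")

    latencies = sorted(s.latency_ms for s in samples)
    n = len(latencies)
    # Nearest-rank P95: smallest value with at least 95% of samples at or below it.
    # Integer arithmetic computes ceil(0.95 * n) without float rounding.
    rank = (95 * n + 99) // 100
    errors = sum(1 for s in samples if not s.ok)

    return {
        "rps": n / window_seconds,
        "p95_ms": latencies[rank - 1],
        "error_rate": errors / n,
    }
```

For example, 20 requests observed over a 10-second window, with latencies of 10, 20, ..., 200 ms and two failures, give an RPS of 2.0, a P95 of 190 ms, and an error rate of 0.1.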
Failure Scenarios
You can simulate different conditions:
- User Service failure
- Product Service failure
- Random failures
- Artificial delays
Each scenario is reflected in:
- UI simulation
- Metrics dashboard
- Logs and traces
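One way to model these scenarios is a small configuration object that a simulated service call consults before responding. The sketch below is an assumption about how such injection could look, not the simulator's actual implementation; `Scenario`, `call_service`, and the 20 ms base latency are hypothetical. Latency is returned as a number rather than slept, so the example runs instantly.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scenario:
    """Failure conditions to inject (hypothetical configuration shape)."""
    fail_service: Optional[str] = None  # service that always fails, if any
    failure_rate: float = 0.0           # probability of a random failure per call
    delay_ms: float = 0.0               # artificial delay added to every call


def call_service(name: str, scenario: Scenario, rng: random.Random,
                 base_latency_ms: float = 20.0) -> dict:
    """Simulate one call to a service under the given scenario."""
    latency = base_latency_ms + scenario.delay_ms
    if scenario.fail_service == name:
        cause = "forced"      # targeted service failure
    elif rng.random() < scenario.failure_rate:
        cause = "random"      # random failure
    else:
        cause = None          # call succeeded
    return {"service": name, "ok": cause is None,
            "latency_ms": latency, "cause": cause}
```

For instance, `call_service("user-service", Scenario(fail_service="user-service"), random.Random(0))` always fails with cause `"forced"`, while `Scenario(delay_ms=500)` leaves calls successful but raises their reported latency to 520 ms.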
Debugging Workflow
- Observe anomalies in Grafana dashboards
- Inspect traces in Jaeger to locate slow services
- Check logs for detailed error messages
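The trace inspection step amounts to asking which service spent the most time doing its own work, as opposed to waiting on downstream calls. The sketch below illustrates that idea on a simplified span list. The span fields (`span_id`, `parent_id`, `service`, `duration_ms`) are a hypothetical shape, not Jaeger's export format, and the calculation assumes child calls run sequentially, as they do in a synchronous call chain.

```python
from collections import defaultdict


def self_times(spans: list) -> dict:
    """Return per-service self time: span duration minus time spent in child spans."""
    child_total = defaultdict(float)
    for span in spans:
        if span["parent_id"] is not None:
            child_total[span["parent_id"]] += span["duration_ms"]

    per_service = defaultdict(float)
    for span in spans:
        per_service[span["service"]] += span["duration_ms"] - child_total[span["span_id"]]
    return dict(per_service)


def slowest_service(spans: list) -> str:
    """Name the service with the largest self time in the trace."""
    times = self_times(spans)
    return max(times, key=times.get)


# Hypothetical trace: Order Service calls User Service, then Product Service.
example_trace = [
    {"span_id": "a", "parent_id": None, "service": "order-service", "duration_ms": 500.0},
    {"span_id": "b", "parent_id": "a", "service": "user-service", "duration_ms": 30.0},
    {"span_id": "c", "parent_id": "a", "service": "product-service", "duration_ms": 420.0},
]
```

In `example_trace` the root span is the longest, but most of its 500 ms is spent waiting on Product Service, so `slowest_service(example_trace)` points at `product-service` rather than `order-service`.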
Design Goals
- Visualize microservice interactions
- Demonstrate real observability practices
- Simulate production-like failures
- Provide an educational debugging experience