TraceOps – System Documentation

Overview

TraceOps is a microservice observability simulator designed to demonstrate how distributed systems behave under normal and failure conditions. It combines a visual simulator with real observability tools to provide a complete debugging experience.

System Architecture

The system is composed of three core microservices:

  • Order Service – Entry point for client requests
  • User Service – Validates user identity
  • Product Service – Checks product availability

These services communicate synchronously and simulate real-world service dependencies.

Request Flow

Diagram: Client, Order Service, User Service, Product Service (1. Request, 2. Process, 3. Validate User, 4. Check Product)

  1. Client sends an order request
  2. Order Service processes the request
  3. Order Service calls User Service
  4. If the user is valid, Order Service calls Product Service
  5. If the product is available, the order is completed

A failure at any stage propagates back to the caller and is visualized in the simulator.
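
The request flow can be sketched as plain function calls. This is a minimal illustration only, not the TraceOps implementation: the in-memory data, the `ServiceError` class, and the function names are all hypothetical, and the real services would communicate over the network rather than in-process.

```python
class ServiceError(Exception):
    """Raised when a downstream service rejects or fails a request."""

    def __init__(self, service: str, reason: str):
        super().__init__(f"{service}: {reason}")
        self.service = service
        self.reason = reason


# Hypothetical in-memory data standing in for the real services' state.
KNOWN_USERS = {"alice", "bob"}
STOCK = {"keyboard": 3, "monitor": 0}


def validate_user(user_id: str) -> None:
    """User Service: validate user identity."""
    if user_id not in KNOWN_USERS:
        raise ServiceError("user-service", "unknown user")


def check_product(product_id: str) -> None:
    """Product Service: check product availability."""
    if STOCK.get(product_id, 0) <= 0:
        raise ServiceError("product-service", "product unavailable")


def place_order(user_id: str, product_id: str) -> dict:
    """Order Service: entry point that calls the other services in sequence."""
    try:
        validate_user(user_id)     # step 3
        check_product(product_id)  # step 4, only reached if the user is valid
    except ServiceError as err:
        # The failure propagates back to the caller with its origin attached.
        return {"status": "failed", "failed_at": err.service, "reason": err.reason}
    return {"status": "completed"}  # step 5
```

Because the calls are synchronous and sequential, a failure in User Service means Product Service is never called, which is the behavior the numbered steps describe.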

Observability Stack

  • Metrics: Prometheus
  • Dashboards: Grafana
  • Tracing: Jaeger
  • Logs: Loki

All services are instrumented using OpenTelemetry to provide unified observability.

Metrics

The system tracks key performance indicators such as:

  • Requests per second (RPS)
  • Latency (P95)
  • Error rates
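
To make these indicators concrete, the sketch below computes them from a list of raw request records. It only illustrates the definitions; in the actual stack the values would come from Prometheus rather than from code like this. The record layout `(timestamp_seconds, latency_ms, succeeded)` and the `summarize` function are assumptions made for the example.

```python
def summarize(records, window_seconds):
    """Compute RPS, P95 latency, and error rate for one time window.

    records: list of (timestamp_seconds, latency_ms, succeeded) tuples,
             a hypothetical layout chosen for this example.
    """
    total = len(records)
    if total == 0 or window_seconds <= 0:
        return {"rps": 0.0, "p95_ms": None, "error_rate": 0.0}

    latencies = sorted(latency for _, latency, _ in records)
    # Nearest-rank percentile: the smallest value with at least 95% of
    # samples at or below it. Integer ceiling division avoids float rounding.
    rank = (95 * total + 99) // 100
    p95 = latencies[rank - 1]

    errors = sum(1 for _, _, ok in records if not ok)
    return {
        "rps": total / window_seconds,
        "p95_ms": p95,
        "error_rate": errors / total,
    }
```

P95 is used instead of the mean because a few very slow requests can hide behind a healthy-looking average.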

Failure Scenarios

You can simulate the following failure conditions:

  • User Service failure
  • Product Service failure
  • Random failures
  • Artificial delays

Each scenario is reflected in:

  • UI simulation
  • Metrics dashboard
  • Logs and traces
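
One way such scenarios could be injected is a small wrapper around each service call. The following is a hedged sketch under stated assumptions: the `Scenario` fields, the `InjectedFailure` exception, and `call_service` are hypothetical names, not part of TraceOps.

```python
import random
import time
from dataclasses import dataclass


class InjectedFailure(Exception):
    """Raised when the active scenario forces a service call to fail."""


@dataclass(frozen=True)
class Scenario:
    # Hypothetical knobs mirroring the four conditions listed above.
    fail_services: frozenset = frozenset()  # e.g. {"user-service"}
    random_failure_rate: float = 0.0        # probability in [0, 1]
    delay_seconds: float = 0.0              # artificial delay per call


def call_service(name, scenario, rng, sleep=time.sleep):
    """Simulate one service call under the given scenario."""
    if scenario.delay_seconds > 0:
        sleep(scenario.delay_seconds)  # artificial delay
    if name in scenario.fail_services:
        raise InjectedFailure(f"{name} is configured to fail")
    if rng.random() < scenario.random_failure_rate:
        raise InjectedFailure(f"{name} failed at random")
    return f"{name} ok"
```

Passing the random generator and the sleep function as arguments keeps the sketch deterministic in tests: a seeded `random.Random` gives repeatable failures, and a stub in place of `time.sleep` avoids real waiting.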

Debugging Workflow

  1. Observe anomalies in Grafana dashboards
  2. Inspect traces in Jaeger to locate slow services
  3. Check logs for detailed error messages
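
Step 2 amounts to finding where a trace spends its time. The sketch below shows that idea on hand-written span data. It does not use the Jaeger API; the span dictionaries (`span_id`, `parent_id`, `service`, `duration_ms`) are a hypothetical, simplified shape chosen for illustration, and the subtraction assumes child calls run sequentially.

```python
from collections import defaultdict


def slowest_service(spans):
    """Return (service, self_time_ms) for the service with the most self time.

    Self time is a span's duration minus the time spent in its direct
    children, so a parent is not blamed for a slow downstream call.
    """
    child_time = defaultdict(float)
    for span in spans:
        if span["parent_id"] is not None:
            child_time[span["parent_id"]] += span["duration_ms"]

    self_time = defaultdict(float)
    for span in spans:
        self_time[span["service"]] += span["duration_ms"] - child_time[span["span_id"]]

    return max(self_time.items(), key=lambda item: item[1])


# Example trace: Order Service calls User Service, then Product Service.
example_trace = [
    {"span_id": "a", "parent_id": None, "service": "order-service", "duration_ms": 520},
    {"span_id": "b", "parent_id": "a", "service": "user-service", "duration_ms": 20},
    {"span_id": "c", "parent_id": "a", "service": "product-service", "duration_ms": 450},
]
```

In this example the root span is the longest, but most of its time is spent waiting on Product Service, which is where the logs in step 3 would be checked next.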

Design Goals

  • Visualize microservice interactions
  • Demonstrate real observability practices
  • Simulate production-like failures
  • Provide an educational debugging experience