Platform Observability & SRE Enablement
Role: Architect / SRE Lead
Company: HHAeXchange
Duration: 2022-08-01 – Present
Project Overview
Designed and implemented platform-wide observability and SRE practices across
FMS Engine systems and supporting infrastructure.
Instrumented Linux services, systemd units, Monit-managed processes,
background workers (Resque), and ancillary dependencies to emit structured
metrics and health signals into Datadog.
Built real-time dashboards and alerting aligned to service behavior rather
than raw resource usage, enabling faster detection of degraded states,
worker backlogs, and dependency failures across multi-tenant environments.
Established a shared operational view of system health, reducing mean time
to detection (MTTD) and mean time to recovery (MTTR) for production incidents.