We are looking for a Tech Ops - Production Support & Reliability Lead
Front-line production support for Braviant's AWS multi-account stack. Monitor systems, triage alerts, execute runbooks, escalate cleanly to developers. Defensive ownership role - not a developer role despite "Lead" in title.
Stack:
- AWS - VPC, ECS, Lambda (SAM/CloudFormation), IAM, NAT, security groups
- PostgreSQL on Amazon RDS (~15 instances)
- Datadog + CloudWatch (APM, logs, alerting)
- Java microservices / API-heavy app stacks
- Jira (ITSM) + Slack (ops channels)
- Nice-to-have: AWS data services (Glue, S3, Athena, EventBridge), Metaplane
Requirements:
Must-have:
- 3+ years production support / SRE / NOC / ops engineering
- Hands-on AWS - EC2/ECS, VPC networking, IAM
- Operational PostgreSQL / RDS - slow query reading, basic tuning, vacuum awareness
- Incident triage across infra + app layers
- Structured incident response (ITIL, NIST, or equivalent)
- SLA management in a ticketed environment (Jira or similar)
- Strong written English for escalation + post-incident write-ups
Nice-to-have:
- Datadog / CloudWatch fluency
- AWS data services (Glue, S3, Athena, EventBridge)
- Basic IaC (CloudFormation, SAM, Terraform)
- Financial services or other regulated-environment background
- AWS SysOps Administrator or Solutions Architect cert
- Scripting / automation