CommIT
Description
We are looking for a Tech Ops – Production Support & Reliability Lead.
Front-line production support for Braviant’s AWS multi-account stack. Monitor systems, triage alerts, execute runbooks, and escalate cleanly to developers. Defensive ownership role – not a developer role, despite “Lead” in the title.
Stack:
- AWS – VPC, ECS, Lambda (SAM/CloudFormation), IAM, NAT, security groups
- PostgreSQL on Amazon RDS (~15 instances)
- Datadog + CloudWatch (APM, logs, alerting)
- Java microservices / API-heavy app stacks
- Jira (ITSM) + Slack (ops channels)
- Nice-to-have: AWS data services (Glue, S3, Athena, EventBridge), Metaplane
Requirements
Must-have:
- 3+ years production support / SRE / NOC / ops engineering
- Hands-on AWS – EC2/ECS, VPC networking, IAM
- Operational PostgreSQL / RDS – slow query reading, basic tuning, vacuum awareness
- Incident triage across infra + app layers
- Structured incident response (ITIL, NIST, or equivalent)
- SLA management in a ticketed environment (Jira or similar)
- Strong written English for escalation + post-incident write-ups
Nice-to-have:
- Datadog / CloudWatch fluency
- AWS data services (Glue, S3, Athena, EventBridge)
- Basic IaC (CloudFormation, SAM, Terraform)
- Financial services or other regulated-environment background
- AWS SysOps Administrator or Solutions Architect cert
- Scripting / automation
Originally posted on Himalayas
To apply for this job, please visit himalayas.app.