Filevine
Role Summary:
Responsibilities
- Provide strong leadership, mentoring, and sound judgment as the Reliability Engineering lead on your team.
- Design and maintain autonomous systems for building, deploying, testing, and operating all Filevine products.
- Act as the authoritative voice of reliability across the full software development lifecycle (SDLC).
- Monitor, aggregate, dashboard, and alert on software/infrastructure events to ensure visibility and fast response.
- Continuously enhance CI/CD pipelines, automation scripts, playbooks, and tools to streamline processes and reduce resolution time.
- Proactively identify and resolve gaps in system availability, performance, and security while defending overall security posture.
- Document processes, architecture, procedures, and best practices; research, adopt, or build reliable tools to boost engineer productivity.
- Work effectively both within your team and independently, and mentor junior engineers.
- Participate in a 24/7 on-call rotation for production support and emergency response.
- Communicate clearly with technical and management stakeholders.
Qualifications
- 8+ years of hands-on technical experience in software engineering, infrastructure, or operations roles, including a minimum of 4 years dedicated to Site Reliability Engineering (SRE).
- Demonstrated curiosity, self-motivation, and a continuous-learning mindset, with the initiative to improve systems and processes daily without needing direction.
- Strong proficiency in Python, Bash, PowerShell, and other common SRE tooling and scripting technologies.
- Expert-level experience designing, building, and maintaining autonomous systems that handle software build, deployment, testing, monitoring, and operations with minimal human intervention.
- Hands-on proficiency with AWS (e.g., EC2, Kubernetes/EKS, CloudWatch, Lambda, S3, IAM).
- Proficiency in all core skills expected of an SRE II, including monitoring/alerting, incident response, capacity planning, performance optimization, CI/CD pipeline enhancement, and reliability engineering best practices.
- Bachelor’s degree in Computer Science, Information Systems, or a related field; equivalent certifications (e.g., Google Cloud Professional certifications, AWS certifications); or substantial comparable direct work experience.
- Proven track record of independently driving reliability improvements, reducing toil through automation, and contributing to high-availability, scalable production systems in a fast-paced environment.
Originally posted on Himalayas
To apply for this job please visit himalayas.app.