Sr. Site Reliability Engineer - 11444

Coupa

Ann Arbor, Michigan, United States

Full-Time

Senior (7+ yrs)

Engineering & Development

Posted on April 20, 2026

The Impact of a Sr. Site Reliability Engineer at Coupa:

Coupa’s Site Reliability Engineers are part of the Cloud Operations team. They own the end-to-end availability and performance of mission-critical services and build automation to prevent problems from recurring. SREs administer Linux machines, web servers, and application servers, and provide infrastructure support for customer environments.

What You'll Do:

Own end-to-end availability and performance of critical services, including building automation to prevent recurring issues

Administer Linux and Windows systems across web, application, and database servers

Develop and automate solutions using various programming languages

Provide application and infrastructure support, including participating in on-call rotations for emergencies

Enhance monitoring, alerting, and observability to ensure reliability and performance

Collaborate with cross-functional teams on releases, infrastructure, and troubleshooting, and maintain documentation such as root cause analyses (RCAs)

What You Will Bring to Coupa:

Bachelor’s degree in Computer Science, Information Systems, or related field, with 5+ years of experience in system administration and large-scale web operations

Strong programming skills (PowerShell, Python, Bash, or OOP languages) and experience with automation and configuration management tools (Chef, Puppet, Ansible, etc.)

Hands-on experience managing cloud infrastructure (AWS, GCP) and container platforms (EKS, GKE), plus Infrastructure as Code tools like Terraform

Proficiency in CI/CD pipelines, source control (Git with complex branching), and deployment/automation tools (Jenkins, Octopus, Rundeck)

Solid understanding of networking and operations concepts (DNS, load balancing), monitoring tools (Datadog, Splunk, New Relic), and database administration (MS SQL Server)

Strong Agile/Scrum experience (JIRA), ITIL practices (incident/change management, RCA), and excellent communication, problem-solving, and ownership skills
