Senior Site Reliability Engineer - Ireland

Arista Networks

Ireland

Full-Time

Senior (7+ yrs)

Engineering & Development

Posted on May 19, 2026

Who You'll Work For

We are seeking an experienced, analytically minded Site Reliability Engineer to join our organisation on a permanent, remote basis from Ireland. In this role, you will be instrumental in building, deploying, and operating critical production systems with a steadfast commitment to scalability, reliability, observability, and security. You will work collaboratively with cross-functional teams to ensure our infrastructure remains resilient, efficient, and future-ready. This is an excellent opportunity for a detail-oriented professional who thrives in a dynamic environment and is passionate about solving complex infrastructure challenges.

What You'll Do

Design, build, and deploy production systems with a focus on scalability, reliability, observability, and performance, ensuring systems meet stringent security standards
Develop and maintain comprehensive automation solutions to eliminate toil and improve operational efficiency across production environments
Proactively monitor production systems, establish intelligent alerting strategies, and implement automated incident response mechanisms to minimise downtime
Create and maintain detailed incident response runbooks; conduct thorough postmortem analyses following incidents to identify root causes and prevent recurrence
Collaborate with software engineering teams to identify and resolve infrastructural bottlenecks, designing innovative solutions that enhance product deployment workflows
Manage and optimise monitoring infrastructure using industry-standard tools, ensuring comprehensive visibility across all systems
Plan, communicate, and execute maintenance windows on production systems with minimal disruption to service availability
Triage platform and infrastructural issues with decisiveness and analytical rigour; engage with third-party vendors and support teams as required
Deploy new systems and updates in a staged, risk-managed manner, ensuring safe and incremental rollouts
Survey and adopt best practices in infrastructure and platform management to maintain secure, scalable, and fault-tolerant systems
Study the design and implementation details of open-source systems to enhance troubleshooting capabilities and accelerate issue resolution
Work transparently with stakeholders to communicate system status, planned maintenance, and infrastructure improvements

#automation #Ansible #Terraform #observability #Prometheus #Grafana #cloudplatforms #AWS #GCP #Azure #containerorchestration #Kubernetes #Docker #CI/CD #Jenkins #GitLab