    Senior Site Reliability Engineer - Ireland

    Ireland
    Full-Time
    Senior (7+ yrs)
    Engineering & Development
    Posted on May 19, 2026

    Who You'll Work For

    We are seeking an experienced and analytically minded Site Reliability Engineer to join our organisation on a permanent, remote basis from Ireland. In this role, you will be instrumental in building, deploying, and operating critical production systems with a steadfast commitment to scalability, reliability, observability, and security. You will work collaboratively with cross-functional teams to ensure our infrastructure remains resilient, efficient, and future-ready. This is an excellent opportunity for a detail-oriented professional who thrives in a dynamic environment and is passionate about solving complex infrastructure challenges.

    What You'll Do

    • Design, build, and deploy production systems with a focus on scalability, reliability, observability, and performance, ensuring systems meet stringent security standards
    • Develop and maintain comprehensive automation solutions to eliminate toil and streamline operational efficiency across production environments
    • Proactively monitor production systems, establish intelligent alerting strategies, and implement automated incident response mechanisms to minimise downtime
    • Create and maintain detailed incident response runbooks; conduct thorough postmortem analyses following incidents to identify root causes and prevent recurrence
    • Collaborate with software engineering teams to identify and resolve infrastructural bottlenecks, designing innovative solutions that enhance product deployment workflows
    • Manage and optimise monitoring infrastructure using industry-standard tools, ensuring comprehensive visibility across all systems
    • Plan, communicate, and execute maintenance windows on production systems with minimal disruption to service availability
    • Triage platform and infrastructural issues with decisiveness and analytical rigour; engage with third-party vendors and support teams as required
    • Deploy new systems and updates in a staged, risk-managed manner, ensuring safe and incremental rollouts
    • Survey and adopt best practices in infrastructure and platform management to maintain secure, scalable, and fault-tolerant systems
    • Study the design and implementation details of open-source systems to enhance troubleshooting capabilities and accelerate issue resolution
    • Work transparently with stakeholders to communicate system status, planned maintenance, and infrastructure improvements

    #automation #Ansible #Terraform #observability #Prometheus #Grafana #cloud platforms #AWS #GCP #Azure #container #orchestration #Kubernetes #Docker #CI/CD #Jenkins #GitLab

    Company:  Arista Networks

    Manufacturer of networking hardware and software for cloud data centers and enterprise environments.
    1001-5000 employees
    Hardware
    HQ: United States