Senior Site Reliability Engineer
Overview
A growing technology organization is seeking a Senior Site Reliability Engineer with deep expertise in observability, monitoring strategy, and operational excellence to support large-scale digital platforms and customer-facing systems.
This role is ideal for someone who thrives at the intersection of reliability engineering, systems visibility, incident response, and proactive collaboration with software engineering teams.
The ideal candidate brings strong communication skills, a calm and methodical approach to operational challenges, and the ability to influence reliability practices early in the software development lifecycle.
Responsibilities
- Design and improve observability and monitoring strategies across distributed systems and digital platforms
- Partner closely with software engineering teams to improve reliability, operational readiness, and deployment visibility
- Build proactive monitoring and alerting capabilities that improve customer experience and platform stability
- Drive operational best practices across incident response, troubleshooting, and root cause analysis
- Help engineering teams improve system resilience, scalability, and supportability
- Contribute to platform modernization and monitoring tool adoption initiatives
- Participate in architecture discussions and operational planning during early phases of development
- Use AI-assisted development tools to accelerate documentation, analysis, auditing, and operational workflows
- Translate complex technical issues into clear communication for technical and non-technical stakeholders
Required Qualifications
- Strong background in Site Reliability Engineering, Production Engineering, Platform Engineering, or DevOps
- Expertise in observability, monitoring, and operational health of distributed systems
- Experience with tools such as Dynatrace, Grafana, Datadog, AppDynamics, Prometheus, New Relic, or similar platforms
- Strong troubleshooting and systems-thinking abilities
- Experience partnering directly with software engineering teams in fast-paced environments
- Ability to break down and communicate complex architectures and operational issues clearly
- Experience supporting customer-facing applications or high-availability environments
- Comfort using AI tools such as Claude, Copilot, or similar technologies to improve workflows and productivity
Preferred Qualifications
- Experience with digital commerce, travel, fintech, SaaS, or other high-scale customer platforms
- Exposure to user behavior monitoring or digital experience analytics tools
- Development or software engineering background
- Experience with cloud-native systems and modern infrastructure platforms
- Familiarity with CI/CD pipelines and deployment observability
- Experience influencing operational practices across engineering organizations
What We’re Looking For
We’re looking for someone highly collaborative, proactive, and operationally minded: an engineer who can partner effectively with development teams and help improve reliability before production issues occur.
This is not a pure infrastructure administration role. Success in this position comes from strong observability expertise, communication skills, systems thinking, and the ability to influence engineering teams toward better operational outcomes.

