Site Reliability Engineer
Site Reliability Engineer Position Description We are seeking an experienced and motivated Site Reliability Engineer (SRE) to join a high-performing team supporting multiple data product and platform groups. This role is focused on improving the reliability, scalability, observability, deployment, and operational support of critical data-driven platforms and services operating within complex production environments.The successful candidate will work closely with engineering, platform, and operational support teams to strengthen monitoring and alerting capabilities, improve logging and traceability, troubleshoot incidents, support deployments, and automate operational processes wherever possible. The environment includes Kubernetes, Helm, the ELK stack, and a broad range of modern Site Reliability Engineering and cloud platform practices.This is a hands-on technical role suited to someone who thrives in fast-paced operational environments, enjoys solving complex production issues, and is passionate about automation, platform reliability, and continuous improvement. The role requires strong collaboration with both client stakeholders and engineering teams to ensure operational excellence, platform resilience, and service availability across critical systems.Your future duties and responsibilities - Support, maintain, and improve highly available production platforms and services across cloud and containerised environments.- Manage and support Kubernetes clusters and Helm-based
Other jobs of interest...
Perform a fresh search...
-
Create your ideal job search criteria by
completing our quick and simple form and
receive daily job alerts tailored to you!