Orgvue is an organisational design and planning platform that empowers your business to transform its workforce by understanding the work people do and the skills they have. Our platform connects strategy to structure, providing clarity of vision, so you can build a more adaptable, better performing organisation that thrives in a constantly changing world of work.
The world’s largest and best-known enterprises and consulting firms use Orgvue to visualise and model current and future states of the organisation and make faster, more informed decisions. The company is headquartered in London, with offices in Philadelphia, The Hague, Toronto, and Sydney.
Role : Principal Site Reliability Engineer
You will be a senior technical leader focused on scaling and hardening our AWS- and Kubernetes-based infrastructure. You will collaborate across product, platform, and operations teams to ensure our systems are reliable, observable, and resilient — even at scale.
This role combines hands-on technical skills with strategic vision, helping us build a world-class reliability culture and a robust engineering foundation for growth. We seek someone with technical expertise, excellent communication skills, and a collaborative spirit.
Responsibilities :
- Define and enforce SLOs, SLIs, and error budgets across critical services
- Develop and implement cloud infrastructure and tooling strategies
- Enhance SRE practices across the organisation
- Implement robust observability across metrics, logs, and traces using our observability tools
- Guide the team in building automated, self-healing systems
- Own and evolve incident response processes, including on-call practices and post-mortem culture
- Mentor engineers on reliability, operational readiness, and scalable infrastructure best practices
- Drive Infrastructure as Code (IaC) initiatives using Terraform, Kubernetes, CloudFormation, and GitOps practices
- Collaborate with security, DevOps, and software teams to ensure compliance and operational excellence
- Evaluate and adopt tools and practices to improve platform performance and reliability
Desired Skills & Experience :
- Experience leading SRE transformations
- Hands-on expertise with Kubernetes (EKS preferred) in production
- Strong experience with AWS core services (EC2, EKS, RDS, S3, ALB/NLB, IAM, CloudWatch, etc.)
- Proficiency in Infrastructure as Code using Terraform and knowledge of GitOps workflows
- Strong background in observability: metrics, visualization, logging, tracing
- Understanding of automation, CI/CD pipelines, deployment automation, and release strategies
- Experience with incident management, disaster recovery, root cause analysis, and post-incident reviews
Additional Benefits :
- Hybrid working: 1+ days a week in London office
- Wellbeing initiatives: coaching, fitness sessions, webinars, Wellbeing day
- Subsidised gym membership
- Private medical insurance, dental, vision, and life assurance
- 25 days holiday (increasing to 30)
- Summer Fridays (half-days in July and August)
- Employer pension contribution of 5% (if you contribute at least 3%)
- Season ticket loan
- Cycle to Work Scheme
- Annual discretionary bonus
Here at Orgvue, we promote individualism and a diverse workforce to build our future success.