Senior DevOps Engineer
Department : Engineering
Employment Type : Full Time
Location : London, UK
Description
The impact you will have :
You will have a transformative impact across Elliptic by evangelising DevOps, security, and reliability principles and fostering a culture of efficiency and autonomy. You will join a growing team of experienced and passionate engineers who are not afraid to fail and enjoy tackling difficult problems head-on. Openness is one of our core values at Elliptic, and nowhere is this more evident than in our engineering teams. We strongly encourage engineers to challenge convention and find unique and innovative solutions to our customers' problems.
Key Responsibilities
What you will do :
- Provide senior DevOps expertise and leadership across Engineering at all layers of the stack
- Evangelise DevOps, security and reliability engineering across the Engineering team at large
- Provision resilient infrastructure across multiple regions and AZs
- Build compliant, reliable and featureful developer platforms centered on container orchestration
- Enable Continuous Delivery and Deployment capabilities using CI/CD pipelines and GitOps tooling
- Enable shifting left on security and testing, and facilitate progressive delivery in production
- Measure and improve developer productivity and experience across the SDLC
- Optimise infrastructure costs through rightsizing and ephemeral, on-demand environments
- Proactively monitor for security and reliability using Observability tooling including SIEM, APM, tracing, infrastructure metrics, logs and dashboards
- Durably engineer away toil
Skills, Knowledge & Expertise
You will be a great fit here if you :
- Are passionate about DevOps, security and reliability engineering
- Are passionate about helping engineering teams become high-performing teams
- Are passionate about helping build reliable and performant cloud-native applications
- Act like an owner in every decision you take
- Take challenges as opportunities
- Enable our customers to achieve their mission
- Are open with success, failure and learnings
- Continuously test, learn and improve
Our ideal candidate has production experience with most of the following :
- Kubernetes container orchestration platform, with an understanding of the cluster lifecycle, API, Operator pattern, Helm charts, addons and components from the CNCF landscape
- Writing Infrastructure-as-Code using Terraform against AWS, GCP, considering internal and external modularisation, multi-environment branching strategies, state manipulation and working to multiple versions
- Provisioning cloud infrastructure to AWS, GCP through Kubernetes controllers such as Crossplane, ACK, KRO
- Building cloud-native container and serverless applications in distributed architectures on Kubernetes and Service Meshes such as Istio, Linkerd, App Mesh
- Writing Helm charts to package and deliver application stacks into production
- Delivering applications and infrastructure continuously using CI/CD tools such as GitLab CI, CircleCI, GitHub Actions, and GitOps using ArgoCD, FluxCD
- Troubleshooting and debugging applications across microservices and serverless applications using Observability tooling such as Splunk, Datadog
- Managing ephemeral secrets and credentials using HashiCorp Vault
- Managing least-privileged access to cloud resources using TPAM solutions such as HashiCorp Boundary
Bonus Points for experience with :
- Production experience architecting provisioning and testing strategies for multi-cluster Kubernetes deployments
- Development and testing strategies for Terraform for Production
- Production experience with advanced multi-cluster Service Mesh capabilities on Kubernetes
- Production experience developing Kubernetes Operators
- Production experience with OSS API Gateways and Ingress controllers
- Production Software Engineering using a high-level language like Go, Java, JavaScript or Python
- Distributed Software Architecture exposure in high-volume production scenarios
- Working with Data Mesh and big data technologies such as EMR, Spark, Databricks
- Designing, tracking and testing to SLOs, and Chaos Engineering to Error Budgets
- Implementing Business Continuity (BCP) and Disaster Recovery (DRP) plans, including tracking RTO and RPO
- Cryptocurrency domain knowledge and infrastructure experience running Currency Daemons and integrating provider APIs
Job Benefits
How we work :
- Hybrid working and the option to work from almost anywhere for up to 90 days per year
- 500 Remote working budget to set up your home office space
Learning & Development :
- 1,000 Learning & Development budget to use on anything (agreed with your manager) that contributes to your growth and development
Vacation / Leave :
- Holidays : 25 days of annual leave + bank holidays
- An extra day for your birthday
- Enhanced parental leave : we provide eligible employees, regardless of gender or whether they become a parent by birth or adoption, 16 weeks of fully paid leave
Benefits :
- Private Health Insurance : we use Vitality!
- Full access to Spill Mental Health Support
- Life Assurance : we hope you will never need this, but our cover is for 4 times your salary to your beneficiaries
- 100 cryptocurrency for you!
- Cycle to Work Scheme