Site Reliability Engineer

Unitary, London, United Kingdom
6 days ago
Job type
  • Full-time
Job description

SRE (Unitary AI)

Description

The company

We are a rapidly growing startup developing solutions that blend human expertise and AI agents to handle manual customer and marketplace operations tasks. Our unique approach combines the strengths of human expertise (high accuracy and nuanced decision-making) with the advantages of AI automation (speed and cost efficiency). This cutting-edge technology helps businesses solve real-world challenges in trust & safety and beyond without complex technical integration. We believe in an online world free from harm, where we can trust AI to make safe and fair decisions.

We have raised about $25M in VC funding from top-tier funds including Creandum and Plural, and operate at significant scale, analysing millions of images and videos daily. But we are just at the beginning of our journey, and we are very excited about our plans for growth over the coming year and beyond!

The role

We are now looking for a Site Reliability Engineer to ensure our systems run smoothly and reliably at scale. Your expertise in monitoring, observability, and system automation will help maintain the high availability and performance our customers depend on. You will work at the intersection of development and operations, using your technical skills to build robust infrastructure and streamline deployment processes.

Your mission will be to proactively identify and resolve system issues before they impact our customers. You will collaborate closely with development teams to implement monitoring solutions, create comprehensive alerting systems, and develop the tools needed to maintain system reliability. Initially, you will focus on enhancing our existing monitoring and alerting infrastructure, then gradually build self-healing systems and self-service capabilities that empower teams to diagnose and resolve issues independently.

As part of this role, you will:

  • Design and implement comprehensive alerting systems that detect issues early and provide actionable insights to streamline the resolution of these issues.
  • Collaborate with our development teams to ensure our observability stack provides clear visibility into system health and performance.
  • Optimise on-call processes, including creating and maintaining detailed runbooks that enable efficient incident response and knowledge sharing across teams.
  • Build self-healing systems using AI tools that automatically resolve common issues before they require human intervention.
  • Develop automation tools and diagnostic capabilities that help teams quickly identify and resolve issues when manual investigation is required.
  • Ensure secure and reliable code deployment processes through robust CI/CD pipelines and infrastructure automation.
  • Join our 24/7 support rotation, which provides first-level platform support to ensure a great customer experience.

Requirements

We are looking for someone who is excited about building innovative solutions and wants to have a large impact in a smaller company; you will be a key part of defining Unitary’s future during this early stage of our new product strategy. We need versatile people who are happy to get stuck into whatever needs doing, and are ready to learn and grow with the company.

For this particular role, we need a collaborative engineer who excels at working across teams and can translate complex technical concepts into actionable solutions. You should be comfortable balancing your time between fixing urgent issues and investing in proactive system improvements. Communication is crucial, as you'll be working closely with multiple engineers and may need to coordinate during high-stress incident situations.

We would love to hear from you if you:

  • Have worked with visualisation tools such as Grafana for creating and maintaining dashboards that provide meaningful insights into system performance
  • Are proficient with metrics platforms such as Prometheus, InfluxDB, or OpenTelemetry for collecting and analysing system data
  • Have experience with incident management tools such as Incident.io for coordinating response efforts and recording follow-up learnings and actions
  • Can demonstrate strong problem-solving skills and the ability to work autonomously
  • Are confident in writing production code in languages such as Go or Python
  • Thrive in a collaborative environment where group output and team achievements weigh heavier than individual input

It would be even better, but not essential, if you:

  • Have experience working in a fully remote, international team
  • Have previous startup experience
  • Have built Slack bots or similar automation tools to streamline team workflows
  • Have experience with CI/CD platforms for building reliable deployment pipelines (e.g. GitLab CI, ArgoCD)
  • Have worked with Kubernetes and infrastructure as code tools such as Terraform for scalable system deployment
  • Are familiar with MLOps practices and tools, and monitoring machine learning systems in production

This role will report to the VP of Engineering and can be based in any time zone within 3 hours of the UK.

Benefits

About us

The team

Unitary is a remote-first team of c. 20 people spread across Europe and North America who are fiercely passionate about making the internet a safer place, and deeply motivated to become a force for good. We have an ambition to create a company filled with happy, kind and collaborative people who achieve extraordinary things together. Our culture is built around the power of trust, transparency and self-leadership.

Working at Unitary

We are committed to creating a positive and inclusive culture built on genuine interest in each other's well-being. We offer progressive and market-leading benefits, including:

  • Flexible hours and location
  • Competitive salary and equity package
  • Occupational pension
  • Generous paid parental leave
  • Generous paid sick leave
  • Annual budget for your professional development and growth
  • Annual budget for your individual health and wellness
  • Three team offsites to London or other exciting destinations in Europe