Neo4j

Software Engineer - Site Reliability Engineering

Company: Neo4j
Location: London Area, United Kingdom
Posted At: 6/18/2025
Description
About Neo4j

Neo4j is the leader in Graph Database & Analytics, helping organizations uncover hidden patterns and relationships across billions of data connections deeply, easily, and quickly. Customers use Neo4j to gain a deeper understanding of their business and reveal new ways of solving their most pressing problems. Over 84% of Fortune 100 companies use Neo4j, along with a vibrant community of 250,000+ developers, data scientists, and architects across the globe.

At Neo4j, we’re proud to build the technology that powers breakthrough solutions for our customers. These solutions helped NASA get to Mars two years earlier, broke the Panama Papers for the ICIJ, and are helping Transport for London cut congestion by 10% and save $750M a year. Some of our other notable customers include Intuit, Lockheed Martin, Novartis, UBS, and Walmart.

Neo4j experienced rapid growth this year as organizations looking to deploy generative AI (GenAI) recognized graph databases as essential for improving its accuracy, transparency, and explainability. Growth was further fueled by enterprise demand for Neo4j’s cloud offering and partnerships with leading cloud hyperscalers and ecosystem leaders. Learn more at neo4j.com and follow us on LinkedIn.

Our Vision

At Neo4j, we have always strived to help the world make sense of data.

As business, society and knowledge become increasingly connected, our technology promotes innovation by helping organizations to find and understand data relationships. We created, drive and lead the graph database category, and we’re disrupting how organizations leverage their data to innovate and stay competitive.

The Team

The Site Reliability Engineering team’s mission is to improve the reliability of Neo4j’s DBaaS product: Neo4j Aura. Operating at global scale across all three major cloud providers, Aura runs hundreds of Kubernetes clusters and hosts thousands of Neo4j instances in production at any given time.

We’re reshaping what SRE means at Neo4j Aura—and we want you to be part of that journey.

Rather than firefighting or chasing alerts, we’re helping teams design for reliability from day one. That means building the tools, practices, and culture that embed SRE principles at the heart of how Aura operates. You’ll be joining a team focused on long-term resilience, engineering excellence, and meaningful collaboration with product teams.

The Role

  • Automate for insight and scale: Build systems that make troubleshooting fast, safe, and scalable across thousands of Neo4j instances. From internal tools that surface clear insights to canaries that support safe rollouts, you’ll focus on automation that elevates reliability engineering.
  • Treat operations as a software problem: Replace tribal knowledge and ad-hoc scripts with tools and systems that codify best practices—making operations predictable, scalable, and repeatable.
  • Design for resilience, learn from failure: Own and evolve the tooling and processes behind incident response. From clear alerts to blameless reviews, you’ll help ensure teams respond with confidence and learn with clarity.
  • Champion reliability as a product feature: Help teams define and act on SLIs and SLOs, turning reliability into a shared, data-driven priority across engineering.
  • Create signals, not noise: Shape an observability stack that tells us what matters, when it matters—so we can detect issues early and resolve them quickly.

We're interested in hearing from engineers with deep experience in some of the following areas:

  • Writing backend tools and automation in Go—our primary language—with an emphasis on sound architecture, testing, and maintainability. Strong software skills in other languages, like Python, are also welcome.
  • Applying SRE practices in real-world environments: defining SLIs and SLOs, reducing toil through automation, and driving reliability through engineering.
  • Collaborating with other teams to promote SRE thinking—educating on principles like observability, ownership, and service level objectives.
  • Troubleshooting large-scale, cloud-based systems with confidence and curiosity.
  • Monitoring distributed systems and understanding their performance characteristics.
  • Designing systems with reliability, safety, and debuggability as first-class concerns.
  • Working with observability tools like OTel Collector, Prometheus, Grafana, and Google Cloud’s operations suite.
  • Deploying and managing applications on Kubernetes; cluster-level administration is a plus.
  • Managing infrastructure with Kustomize and Terraform—keeping it clear, modular, and easy to evolve.
  • Building and maintaining CI/CD workflows—ours run on GitHub Actions.
  • Participating in on-call rotations and incident response with a focus on improvement, not blame.
  • Writing and contributing to postmortems that lead to meaningful, lasting changes.
