Data Reliability Engineer II (DRE)
Role Overview
As a Data Reliability Engineer II, you will play a crucial, hands-on role in our global DRE team, ensuring our critical database systems are reliable, fast, and scalable. Moving away from traditional, reactive database administration, our mission is to proactively maintain and enhance the reliability, integrity, and performance of our data footprint.
You will help manage a diverse, modern GCP data ecosystem spanning relational, NoSQL, and analytical platforms (with a focus on PostgreSQL via Cloud SQL and AlloyDB, alongside Spanner, Cassandra, Firestore, Bigtable, and BigQuery), while supporting smooth operations, performance tuning, and automated deployments.
As our team embraces a cloud-first mindset, you will be encouraged to move beyond narrow database specialization and become a broad generalist, developing skills across modern data technologies.
Accountabilities
- System Health & Proactive Incident Management: Monitor database system health and participate in a follow-the-sun incident response rotation.
- Respond to and resolve database-related incidents, diagnose issues, and conduct root-cause analysis to prevent recurring problems.
- Observability, Performance & Optimization: Monitor and tune top resource-consuming patterns and critical queries proactively so databases never become a bottleneck.
- Track performance metrics to ensure database speed and uptime meet the practical needs of our applications.
- Work closely with development teams to better understand application workloads and optimize database code within an application context.
- DevOps, Provisioning & Automation: Provision databases and schemas according to application requirements.
- Actively apply Infrastructure as Code (IaC) principles and CI/CD pipelines to manage database structures across environments.