Join Us at SRT Marine Systems as a System Monitoring & Observability Engineer (Prometheus / Grafana)
Job Title: System Monitoring & Observability Engineer (Prometheus / Grafana)
Location: Hybrid, with 1 day per week in the Cardiff office
Job Type: Contract, Hybrid, Full-Time
Duration: 6 months
Status and Rate: Outside of IR35, day rate.
SRT Marine Systems plc (SRT) is a market leader in international marine surveillance technology and systems. We are a respected, established and ambitious multinational company headquartered in the UK with a global customer base.
The company has a global impact in the marine domain by leading the next generation of maritime domain awareness technologies, products and systems that significantly enhance security, safety, environmental protection and sustainability. Our customers are worldwide and range from the largest national coast guards to individual vessel owners.
SRT is an exciting company where high-quality results are rewarded. We are ambitious and constantly seek to innovate to deliver better products and services to our customers. We strive to make SRT a rewarding and challenging place to work, where talented, hard-working individuals have the opportunity to make a real impact across the marine world.
About The Role
We are seeking a skilled engineer to implement observability visualisations for end users. We already have observability dashboards in place for our engineers, implemented using Prometheus for metrics collection and Grafana for visualisation. This initiative is expected to build on that stack to provide a more user-friendly observability solution for end users of our system.
Our clients are located in countries around the world with varying WAN capabilities, and in each country our system is deployed on-premises and physically distributed across several sites. You will be supported by experienced engineers, including UX designers. Our lead observability engineer and a UX expert will provide guidance as needed.
What You’ll Be Doing
- Monitoring & Metrics Collection
  - Design, configure, and maintain Prometheus-based monitoring solutions.
  - Develop and manage metric exporters for application and system-level data.
  - Optimize Prometheus scraping configurations and retention policies.
- Alerting & Incident Response
  - Define and maintain alert rules based on SLIs/SLOs and performance baselines.
  - Ensure alerts are actionable, with minimal false positives.
  - Participate (not necessarily lead) in on-call rotations and incident postmortems.
- Observability Dashboards
  - Design and maintain Grafana dashboards for real-time operational insights.
  - Collaborate with engineering and product teams to create tailored visualisations.
  - Provide self-service dashboard capabilities for end users.
- System Performance & Reliability
  - Monitor infrastructure (servers, containers, databases, services) for uptime, latency, and throughput.
  - Identify bottlenecks and recommend improvements.
- Platform Maintenance & Automation
  - Keep the platform maintainable, easily configurable, and fully automatable.
  - Enable simple redeployments and configuration changes with minimal effort.
What You’ll Bring
- Proven experience with Prometheus (including PromQL) and Grafana in production environments.
- Strong knowledge of Linux-based systems.
- Experience writing and optimizing PromQL queries for alerts and dashboards.
- Familiarity with Prometheus exporters (e.g. node_exporter, blackbox_exporter, custom exporters).
- Understanding of Alertmanager configuration and routing.
- Proficiency with Grafana dashboard creation and templating.
- Strong troubleshooting skills for infrastructure and application issues.
- Strong ability and motivation to quickly learn and master new technologies and frameworks.
- Familiarity with containers (Docker).
- Scripting skills (Bash, Python, or Go) for automation.
Our Values at SRT Marine
Ambition – Aspiring to lead in maritime domain management.
Innovation – Driving improvement through creativity and forward-thinking.
Quality – Committing to high standards in performance and reliability.
Responsibility – Being individually accountable and team-driven.
Team – Collaborating openly with colleagues, partners, and customers.
Why Join Us?
- Work on mission-critical maritime surveillance systems used worldwide.
- Be part of an ambitious, innovative, and supportive team.
- Make a direct impact on global maritime safety and sustainability.
- Enjoy flexible hybrid working.
- Competitive salary and benefits, including:
  - Matched pension contributions up to 5%
  - 25 days annual leave (rising to 28 with service)
  - Private health care
  - Flexible working opportunities
  - Development and training programmes
SRT Marine Systems plc is an equal opportunity employer. We are committed to creating an inclusive environment for all employees and welcome applications from all backgrounds.