About us
Source is a research-led advisory firm that helps the world’s largest professional services firms make their most important decisions.
With a wealth of independent insight, knowledge, and experience in the industry, Source delivers clear-cut direction that gives firms and their leaders the confidence to act.
As our AI & Data Engineer, you will be instrumental in enabling us to build robust, deep, and valuable data through advanced analytics and AI-driven capabilities.
The role
Our core asset is our data, and we are looking for a specialist who can not only maintain our high-standard data infrastructure while our Lead Data Engineer is on paternity leave but also accelerate our evolution into an AI-first organization.
This role is a unique hybrid of stability and innovation. You will ensure our existing pipelines remain robust while leading the charge on AI improvements to our internal operations, systems, and client-facing products.
You will be key to helping us extract new insights, provide deeper analysis, and enable AI-driven self-service capabilities for our internal and external users.
Key facets of this role
- AI Integration & Innovation: Design and deploy AI-driven features to automate internal operations and enhance our qualitative/quantitative research assets.
- Vector Infrastructure: Build and maintain vector databases and RAG (Retrieval-Augmented Generation) pipelines to unlock the value of our unstructured data.
- Pipeline Evolution: Transform existing ETL/ELT processes into AI-ready pipelines, ensuring data quality for machine learning training and inference.
- System Maintenance: Provide interim stewardship of our core data platform, ensuring uptime and performance while the Lead Data Engineer is away.
- Technical Mentorship: Act as the internal subject matter expert, upskilling the broader team on MLOps and AI data best practices.
- Operational AI: Implement agentic workflows or automated insights to turn raw data into "AI-driven self-service" capabilities for our global clients.
The type of person we need in this role
This role can only be done effectively by someone who brings the following:
- Experience: 4+ years in Data Engineering, with at least 2 years focused on AI/ML implementation (LLMs, NLP, or predictive modeling).
- AI Toolkit: Proven experience with Vector Databases (e.g., OpenSearch, Azure Cosmos DB, Milvus) and frameworks like LangChain or LlamaIndex.
- Core Engineering: Deep proficiency in Python and PostgreSQL.
- Big Data & Ops: Hands-on experience with Apache Spark (PySpark) and workflow orchestration (e.g., Airflow, Prefect, or Dagster).
- Cloud & Warehouse: Extensive experience with a major cloud provider (AWS/Azure/GCP) and modern warehouses like Snowflake, Redshift, or BigQuery.
- DevOps Mindset: Proficient with Git, CI/CD and the operationalisation of ML models (MLOps).
- Adaptability: The ability to step into a leadership gap, manage existing priorities, and pivot quickly toward innovation.
The qualities we’re looking for
- Problem-Solver: A proactive and analytical mindset, with the ability to diagnose and solve complex data and AI/ML infrastructure challenges.
- Collaborative & Enabling: Excellent communication and interpersonal skills, with a strong desire to teach, mentor, and share expertise effectively with Data Analysts, the Senior Data Engineer, and other stakeholders.
- Detail-Oriented: Meticulous attention to data quality, integrity, and pipeline robustness.
- Adaptable: Eagerness to learn new technologies and adapt to evolving ML/AI landscapes.
- Impact-Driven: A desire to contribute directly to the success of data-driven products and business outcomes, particularly in enabling new insights and self-service capabilities.