The firm is developing a cutting-edge high-performance computing (HPC) platform to support our portfolio managers, developers, quantitative analysts, and data scientists, enabling seamless scaling of compute capabilities both on-premises and in the cloud. We seek a senior, hands-on engineer who is customer-focused and an advocate for customer-driven solutions. The ideal candidate will have an understanding of physical and cloud-based infrastructure, strong experience in programming and infrastructure automation, and proficiency in service and infrastructure lifecycle management. They will engage with teams to understand their requirements, drive automation and development for our HPC platforms, and collaborate with other teams on integration. The candidate should also have expertise in Linux systems administration, container orchestration, networking, security, and infrastructure-as-code. Experience integrating HPC with storage and data platforms, testing and optimizing those integrations, and tuning Linux and hardware for demanding workloads is a bonus.
Principal Responsibilities
- Develop and automate platform components and services based on customer needs
- Collaborate within a customer-focused team to design, develop, test, and deploy infrastructure and services in alignment with business needs
- Serve as a Subject Matter Expert (SME) for our platform offering and customer needs
- Customize and develop home-grown or off-the-shelf solutions to meet customer needs
- Optimize the performance, reliability, and delivery of our platform and services
Qualifications/Desired Skills
- Significant experience with programming languages used for automation and tooling
- Working knowledge of containers and container orchestration
- Experience contributing to and collaborating on a shared code base
- Experience with configuration management and automation tools, such as Chef, Ansible, Salt, Packer
- Experience with building monitoring and alerting on logs and metrics
- Excellent written and verbal communication skills
- Excellent troubleshooting and analytical skills
- Self-starter able to execute independently, on a deadline, and under pressure
Nice To Have Skills
- Experience with HPC schedulers (such as Slurm, LSF, Ray, Moab, or PBS)
- Deep understanding of Linux operating systems, with substantial practical experience in performance tuning and resource fencing for HPC workloads