Software Engineer, HPC Scheduling

Career Techniques Inc
Published: April 1, 2026
Location: Dallas, TX - Hybrid - 3 days/week in-office

Description

The HPC Scheduling team develops and manages a large high-performance computing (HPC) platform that enables the business to conduct complex research at scale. We are seeking a highly motivated engineer to join the team and help us keep pushing the envelope in running batch workloads on Kubernetes.

The ideal candidate will have an active interest in Kubernetes and batch computing, broad experience in software engineering and development, and experience managing large-scale infrastructure and complex tooling environments.

The main focus will be on Armada, an exciting open-source CNCF project built and maintained by the team, which is used for multi-cluster Kubernetes batch job scheduling at scale.

You’ll join an experienced team working at the cutting edge of large-scale ML workloads.

Responsibilities

  • Designing and developing high-quality software solutions using procedural programming languages, with a focus on Golang
  • Building and maintaining highly scalable, highly available and globally distributed systems to support large-scale research workloads
  • Managing and optimizing data interactions across relational and non-relational databases, particularly PostgreSQL
  • Developing and operating containerized applications within Kubernetes, ensuring effective orchestration and workload scheduling
  • Supporting, tuning and troubleshooting Linux-based systems as part of our core compute platform
  • Applying core networking knowledge to help debug, optimize and enhance platform connectivity and performance
  • Independently diagnosing and resolving complex technical issues across infrastructure and software layers
  • Applying solid software architecture principles, computer science fundamentals and data structure knowledge to guide design decisions and code quality
  • Driving continuous improvement by contributing to CI/CD pipelines and engineering best practices
  • Staying up to date with emerging technologies and approaches, and applying new knowledge across disciplines

Requirements

  • Experience with developing Kubernetes components, such as controllers and operators
  • Experience with event-driven programming and message queues, such as Apache Kafka and Apache Pulsar
  • Experience with high-performance computing, Kubernetes, or DAG (Directed Acyclic Graph) workflows
  • Experience running systems at scale using a cloud provider, ideally AWS
  • Experience with operational and runtime tools and practices, including monitoring and logging with systems such as Prometheus and Grafana
  • Experience operating or using job scheduling systems, such as SLURM
