Kubernetes Engineer

Career Techniques Inc
Published
April 1, 2026
Location
Dallas, TX - Hybrid (3 days/week in-office)

Description

Hybrid - 3 days/week in-office

U.S. citizens and green card holders preferred

Relocation Assistance Available!

In this role, you will design, implement, and optimize GPU-accelerated container platforms at scale, enabling high-performance workloads (AI/ML, HPC, LLM training) across hybrid or on-prem environments.

You will bring deep expertise in both the NVIDIA and Kubernetes ecosystems, including GPU scheduling, device plugins and custom operators.

Key responsibilities of the role include:

  • Architecting and operating Kubernetes clusters optimized for GPU workloads, leveraging the NVIDIA GPU Operator, NVIDIA Network Operator and DCGM
  • Developing, deploying and maintaining custom Kubernetes operators and controllers to automate infrastructure services
  • Integrating NVIDIA device plugins, Multi-Instance GPU (MIG) and GPU sharing features into the scheduling layer
  • Optimizing GPU utilization and job placement through scheduler extensions and batch schedulers, such as kube-scheduler plugins, Volcano and Slurm
  • Collaborating with HPC, ML and DevOps teams to ensure multi-tenant, high-throughput cluster performance
  • Driving observability and telemetry integrations using Prometheus, Grafana, DCGM Exporter and OpenTelemetry
  • Implementing secure multi-user and multi-namespace GPU isolation, enforced through RBAC and policy engines such as OPA Gatekeeper
  • Maintaining CI/CD pipelines for Kubernetes infrastructure using GitOps workflows with ArgoCD and FluxCD
  • Contributing to infrastructure as code using Terraform, Helm and Kustomize
  • Participating in performance tuning, incident response and production readiness reviews