Published

February 5, 2025

Location

Dallas, TX - 5 days/week In-Office

Description

As a Kubernetes Site Reliability Engineer (SRE) focused on private cloud and on-premises environments, you will design, implement, and maintain highly reliable and scalable Kubernetes clusters. You will work closely with developers, infrastructure engineers, and security teams to ensure that these clusters meet performance, security, and compliance standards.

Responsibilities:

Kubernetes Architecture and Deployment: Design, implement, and manage Kubernetes clusters in private cloud/on-prem environments, optimizing for performance and scalability.
Infrastructure Optimization: Continuously evaluate and enhance on-prem Kubernetes infrastructure to ensure high performance and efficient resource utilization.
Automation and Deployment: Use Helm charts and other automation tools to streamline the deployment process and ensure consistent builds.
Monitoring and Troubleshooting: Actively monitor the health of Kubernetes clusters, quickly troubleshoot issues, and minimize downtime.
Security and Compliance: Implement robust security measures, including correctly configured TLS/SSL certificates and mutual TLS, and maintain compliance with industry standards.
Documentation and Reporting: Maintain comprehensive technical documentation for private cloud/on-prem Kubernetes environments, including architecture diagrams and operating procedures.

Requirements:

Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
3-4 years of hands-on experience with Linux, containerization, and orchestration tools (e.g., Docker, Podman, Kubernetes).
Proven experience managing Kubernetes clusters in private cloud/on-prem environments.
Proficiency with TLS/SSL certificates and mutual TLS.
Strong understanding of Kubernetes resources like Deployments, StatefulSets, ConfigMaps, and RBAC.
Familiarity with observability and GitOps practices.
Strong problem-solving and analytical skills, with a focus on resolving complex infrastructure issues.
Excellent communication and collaboration skills, with experience working in cross-functional teams.
Ability to adapt quickly in a fast-paced, evolving technological landscape.
Experience with Service Mesh tools (e.g., Linkerd, Istio).
Familiarity with DevOps tools (e.g., Jenkins, Ansible, Terraform).
Knowledge of storage and networking solutions for on-prem environments.
Self-motivated, with the ability to work independently.
Strong team player with a collaborative approach.
Detail-oriented, with a focus on operational excellence and process documentation.
Adaptable and able to manage shifting priorities in a fast-paced environment.
