Career Techniques Inc
Description
As a Kubernetes Site Reliability Engineer (SRE) focused on private cloud/on-prem environments, you will design, implement, and maintain highly reliable and scalable Kubernetes clusters. You will work closely with developers, infrastructure engineers, and security teams to ensure that these clusters meet performance, security, and compliance standards.
Responsibilities:
- Kubernetes Architecture and Deployment: Design, implement, and manage Kubernetes clusters in private cloud/on-prem environments, optimizing for performance and scalability.
- Infrastructure Optimization: Continuously evaluate and enhance on-prem Kubernetes infrastructure to ensure high performance and efficient resource utilization.
- Automation and Deployment: Use Helm charts and other automation tools to streamline the deployment process and ensure consistent builds.
- Monitoring and Troubleshooting: Actively monitor the health of Kubernetes clusters, quickly troubleshoot issues, and minimize downtime.
- Security and Compliance: Implement robust security measures, including correctly configured TLS/SSL certificates and mutual TLS, and maintain compliance with industry standards.
- Documentation and Reporting: Maintain comprehensive technical documentation for private cloud/on-prem Kubernetes environments, including architecture diagrams and operating procedures.
Requirements:
- Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
- 3-4 years of hands-on experience with Linux, containerization, and orchestration tools (e.g., Docker, Podman, Kubernetes).
- Proven experience managing Kubernetes clusters in private cloud/on-prem environments.
- Proficiency with TLS/SSL certificates and mutual TLS.
- Strong understanding of Kubernetes resources such as Deployments, StatefulSets, ConfigMaps, and RBAC.
- Familiarity with observability and GitOps practices.
- Strong problem-solving and analytical skills, with a focus on resolving complex infrastructure issues.
- Excellent communication and collaboration skills, with experience working in cross-functional teams.
- Ability to adapt quickly in a fast-paced, evolving technological landscape.
- Experience with Service Mesh tools (e.g., Linkerd, Istio).
- Familiarity with DevOps tools (e.g., Jenkins, Ansible, Terraform).
- Knowledge of storage and networking solutions for on-prem environments.
- Self-motivated, with the ability to work independently.
- Strong team player with a collaborative approach.
- Detail-oriented, with a focus on operational excellence and process documentation.
- Adaptable and able to manage shifting priorities in a fast-paced environment.