Career Techniques Inc
Description
Reporting to senior security leadership, you will own the full incident lifecycle — from detection and triage through resolution, root cause analysis, and post-incident review — and drive continuous improvement across the firm's ITSM ecosystem. You will lead and grow a cross-functional incident response function, working closely with Engineering, Data Center Operations, Security Engineering, and external vendors to minimize business impact and build long-term operational resilience at scale.
Responsibilities
- Define and own the enterprise incident response strategy, ensuring alignment with the firm's security and operational objectives
- Lead and develop the incident response team, establishing clear roles, career paths, and a high-performance culture within the centralized security org
- Serve as the executive escalation point for all high-severity and security incidents, providing decisive leadership under pressure
- Establish and continuously improve incident and problem management processes, driving consistent execution and operational discipline across teams
- Own the full post-incident review process, ensuring root cause analysis is rigorous, documented, and translates into permanent remediation
- Build and maintain a Known Error Database and runbook library to accelerate response and reduce mean time to resolution
- Analyze incident trends, KPIs, and performance data (MTTR, SLA compliance, recurrence rates) to identify systemic risk and drive proactive improvement
- Partner with Engineering, Change Management, and Security Engineering to implement fixes and harden infrastructure against repeat incidents
- Develop executive-level reporting on incident posture, response performance, and improvement initiatives for senior leadership and stakeholders
- Champion the adoption of ITSM tooling (Jira Service Management) and incident response best practices across the organization
Requirements
- Bachelor’s degree or equivalent experience
- 10+ years of experience in IT Service Management or Security Operations, with significant ownership of Incident and/or Problem Management
- 5+ years in a people leadership role, with a track record of building and scaling high-performing teams
- Proven experience leading major incident response in high-availability or mission-critical environments — HPC, cloud, or large-scale data center experience strongly preferred
- Strong command of incident lifecycle management, escalation frameworks, and service restoration strategy
- Deep experience driving root cause analysis and long-term remediation programs
- Hands-on experience with Jira Service Management or comparable enterprise ITSM tooling
- Exceptional communication and stakeholder management skills, with the ability to operate effectively at the executive level and across technical teams
- Strategic thinker with strong analytical skills and a data-driven approach to operational improvement
- ITIL certification or equivalent framework expertise preferred
