Command Center Incident Manager
We are looking for a dynamic, energetic, and bright individual to join our Technical Operations team. This is a tremendous and unique opportunity to join a nimble global company that has achieved significant scale over the past fifteen years, yet still possesses enormous growth potential. You will play an instrumental role in achieving this growth.
What You Will Be Doing
As part of the global Command Center, you will play a key role in the performance and stability of our product infrastructure and platforms across all data center and business locations. This team is committed to delivering the highest system uptime and operational transparency. You will perform critical duties within our incident, event, and problem management processes, and you must be able to demonstrate that you stay composed, focused, and effective under pressure.
Roles & Responsibilities:
- Monitor server infrastructure, bandwidth utilization, and website uptime through various monitoring applications, and action alerts appropriately in line with the agreed OLA.
- Manage and drive restoration efforts for all IT incidents by guiding technical teams to execute timely resolutions. Use the necessary escalation channels whenever appropriate to resolve incidents within the agreed service level agreement.
- Coordinate and manage communication bridges with intelligence and authority. Maintain a bridge commander presence throughout the event.
- Provide timely, succinct, and clear written and verbal communication to all stakeholders during internal crisis events, including delivery of a written Service Interruption Report within 24 hours of service being declared restored.
- Manage the ticket life cycle of all major incidents, ensuring adherence to the pre-defined incident and problem management process flow. Assess data to identify gaps, trends, and inaccuracies, and turn that data into actionable outcomes and opportunities.
- Track, report, and manage all follow-up actions through to timely closure, including the procedure, process, training, technology, and people actions associated with improving services.
- Mentor team members on industry best practices of IT Service Management and help drive standard processes, training, and responsiveness for internal crisis events.
- Complete daily, weekly, and month-end reporting and analysis of key performance indicators for the technical leadership team, to help management evaluate the success of programs and projects.
- Review all scheduled changes and ensure that the activities designed to implement each change meet the standards, including scheduled start/end time, affected components, impact to users/customers, and rollback time and procedure.
- As the owner of the problem management process, you are responsible for acting as the liaison with the teams responsible for problem resolution, ensuring that problems are resolved within the agreed SLA, and coordinating major problem reviews and closure.
- Team player showing genuine commitment, readily available to support fellow team members and mentor them when needed. Should liaise seamlessly with different functional groups and business units.
- Strong English communication skills are particularly important for this role, including the ability to convey messages and information to people at all levels. Must be able to write concise documents for internal customers that anticipate and answer executive-level questions after an outage or internal crisis event.
- Experience in IT Service Management, especially Incident, Change, and Problem Management processes and procedures. Exposure to monitoring network, server, and other infrastructure services is preferred. Hands-on experience with Zabbix, Nagios, and Jira is highly desirable.
- Excellent working knowledge of best practices such as ITIL or other equivalent programs (COBIT, PRINCE) in IT Service Management.
- Possess a methodical mind. Incident Managers need to use a systematic methodology to evaluate, design, and implement process or technology changes that achieve measurable business benefits.
- A problem solver. An Incident Manager must be adept at finding solutions to problems and trialing different approaches to reach a resolution.