Operational Resiliency Specialist
Building the Internet of Money
Our Krakenites are a world-class team with crypto conviction, united by our desire to discover and unlock the potential of crypto and blockchain technology.
What makes us different? Kraken is a mission-focused company rooted in crypto values. As a Krakenite, you’ll join us on our mission to accelerate the adoption of cryptocurrency so the world can achieve financial freedom and inclusion. For over a decade, Kraken’s focus on our mission and crypto ethos has attracted many of the most talented crypto experts in the world.
Before you apply, please read the Kraken Culture Explained to learn more about our internal culture, values, and mission.
As a fully remote company, we have Krakenites in 60+ countries who speak over 50 languages. Krakenites are industry pioneers who have a long track record of building premium products for professionals and institutions as well as newcomers to the space. Kraken is committed to industry-leading security through our products like Kraken Pro, Kraken NFT, and Cryptowatch, with a focus on world-class customer support and crypto education for all.
Become a Krakenite and build the internet of money!
About the Role
The global Operational Resiliency (OpR) Team supports the security, availability, and durability of one of the leading cryptocurrency exchanges in the world.
Kraken is seeking an experienced candidate to join a team of specialists owning operational resiliency initiatives. As an Operational Resiliency Specialist, you will collaborate across multiple business units, driving and overseeing all aspects of Change Management, Release Management, and Incident Management. This role will also assist in developing and enhancing processes and procedures, including monitoring, alerting, and procedural documentation maintenance.
This is a shift-based role in the Pacific Time Zone (PT). A non-negotiable requirement for this role is the ability to regularly support the Pacific Time Zone between 11 am and 7 pm PT (6 pm to 2 am UTC). Shifts will be arranged well in advance, in accordance with other team members' schedules, and will require ~18 hours of weekend shift work per month.
- Support the effort to keep one of the fastest-growing companies in the world up and available in a 24/7 environment
- Drive daily release planning activities, ensuring that changes are well documented and that daily stand-ups and change windows are effectively coordinated
- Work with stakeholders to routinely review incident response playbooks, maintain escalation flow schedules, and participate in tabletop exercises
- Develop and implement communications plans for incidents and maintenance with the Communications team
- Act as a watchdog, monitoring system dashboards for health, uptime, and availability and working closely with the Client Engagement Team to identify issues early on
- Identify areas lacking visibility for monitoring improvement efforts
- Inform automation efforts to further enhance monitoring and alerting capabilities
- Guide unplanned incidents from alert through response, resolution, and post-mortem with affected teams
- Work closely with Technical Project Management, Product, and other stakeholders to hand off items needing remediation and identify long-term improvement strategies
- 3+ years as a project manager, scrum master, incident responder, release coordinator, or similar IT service management coordination function
- Excellent oral and written communication skills
- Strong understanding of the software development lifecycle including the importance of testing and rollback planning practices
- Highly responsive and extremely organized with the ability to direct the flow of a highly available technical environment that operates 24/7/365
- Experience translating business requirements into technical specifications
- Expertise with Agile, Scrum, and Kanban methodologies
- Highly proficient in designing and configuring Jira workflows
- Agile and Project Management Certifications strongly preferred: PMP, PMI-ACP, ITIL, etc.
- Prior experience setting up incident response monitoring and alerting schedules is a plus
- Self-starter