Kraken – Site Reliability Engineer, Data Platform

Building the Future of Crypto

Our Krakenites are a world-class team with crypto conviction, united by our desire to discover and unlock the potential of crypto and blockchain technology.

What makes us different?

Kraken is a mission-focused company rooted in crypto values. As a Krakenite, you’ll join us on our mission to accelerate the global adoption of crypto, so that everyone can achieve financial freedom and inclusion. For over a decade, Kraken’s focus on our mission and crypto ethos has attracted many of the most talented crypto experts in the world.

Before you apply, please read the Kraken Culture page to learn more about our internal culture, values, and mission. We also expect candidates to familiarize themselves with the Kraken app. Learn how to create a Kraken account here.

As a fully remote company, we have Krakenites in 70+ countries who speak over 50 languages. Krakenites are industry pioneers who develop premium crypto products for experienced traders, institutions, and newcomers to the space. Kraken is committed to industry-leading security, crypto education, and world-class client support through our products like Kraken Pro, Desktop, Wallet, and Kraken Futures.

Become a Krakenite and build the future of crypto!

Proof of Work

The Team

Join our Data Infrastructure team and play a pivotal role in upholding the reliability, scalability, and efficiency of our robust Data platform. As a Senior Site Reliability Engineer (SRE) specialized in Data Infrastructure, you will collaborate closely with diverse cross-functional teams to conceive, execute, and oversee the foundational data infrastructure that empowers our array of applications and services.

As a key member of our Data Infrastructure team, you will:

Design the data governance mechanisms that ensure our lakehouse is easy to interact with, secure and in compliance with all applicable regulations.
Implement the infrastructure we use to ingest our data, store it, catalog it with the right metadata and capture its lineage.
Provide a state-of-the-art suite of BI tools for multiple teams within the company.
Guarantee the availability, high performance, scalability and cost efficiency of our data platform.

Your proficiency in cloud technologies, infrastructure as code, automation, monitoring, logging, user and machine AuthNZ, and certificate management will be instrumental in upholding the exceptional operational standards we set for our services.

The Opportunity

Implement data infrastructure solutions (self service) that support the needs of 10+ business units and over 100 engineering and data analysts
Utilize Infrastructure as Code (IaC) principles to design, provision, and manage both on-premises and cloud (AWS) infrastructure components using tools such as Terraform
Develop and maintain automation scripts using bash/shell scripting and to automate operational tasks and deployments
Enhance and manage CI/CD pipelines to facilitate consistent software deployments across the data infrastructure
Implement robust data monitoring and alerting solutions to proactively detect anomalies and performance issues
Manage and implement role-based access control (RBAC) and permissions for a multitude of user groups and machine workflows across different environments
Manage and maintain real-time streaming data architecture using technologies like Kafka and Debezium Change Data Capture (CDC)
Ensure the timely and accurate processing of streaming data, enabling data analysts and engineers to gain insights from up-to-date information
Utilize Kubernetes to manage containerized applications within the data infrastructure, ensuring efficient deployment, scaling, and orchestration
Implement effective incident response procedures and participate in on-call rotations
Collaborate with data analysts, engineers, and cross-functional teams to understand requirements and implement appropriate solutions
Document architecture, processes, and best practices to enable knowledge sharing and support continuous improvement
Support AI/ML teams with their infra requests

Skills You Should HODL

Proven experience (5+ years) working as a Site Reliability Engineer, Infrastructure Engineer, Data Infrastructure Engineer, or similar roles, with a focus on data infrastructure and security
Experience with maintaining real-time data processing technologies, such as Kafka and Flink clusters and Debezium instances
Working experience in managing hybrid multi-tenant cloud systems particularly on AWS
Infrastructure as Code tools such as Terraform, Terragrunt and Atlantis
Experience with containerization and orchestration tools, particularly Kubernetes, Nomad, and Docker
Solid understanding of bash/shell scripting and proficiency in at least one programming language (preferably Python or JVM languages)
Experience maintaining data-related technologies: Apache Airflow, Apache Spark, DBs, BI tooling
Experience solving data access management issues at large scale data-lake
Familiarity with CI/CD deployment pipelines and related tools
Strong problem-solving skills and the ability to troubleshoot complex systems
Experience with data-related technologies (databases, data lakes, airflow, spark) is a plus