DevOps Engineer, ML Ops

Remote

Building the Future of Crypto

Our Krakenites are a world-class team with crypto conviction, united by our desire to discover and unlock the potential of crypto and blockchain technology.

What makes us different?

Kraken is a mission-focused company rooted in crypto values. As a Krakenite, you’ll join us on our mission to accelerate the global adoption of crypto, so that everyone can achieve financial freedom and inclusion. For over a decade, Kraken’s focus on our mission and crypto ethos has attracted many of the most talented crypto experts in the world.

Before you apply, please read the Kraken Culture page to learn more about our internal culture, values, and mission. We also expect candidates to familiarize themselves with the Kraken app. Learn how to create a Kraken account here.

As a fully remote company, we have Krakenites in 70+ countries who speak over 50 languages. Krakenites are industry pioneers who develop premium crypto products for experienced traders, institutions, and newcomers to the space. Kraken is committed to industry-leading security, crypto education, and world-class client support through our products like Kraken Pro, Kraken NFT, and Kraken Futures.

Become a Krakenite and build the future of crypto!

Proof of Work

The Team

Kraken is looking for an experienced Machine Learning Ops engineer to join our AI/ML Team in the centralized Data organization. In this role, you will build the infrastructure behind cutting-edge AI/ML technology that solves the most complex and exciting problems in the quickly growing and evolving crypto industry. We are looking for an extremely strong communicator and team player who can break down large, complex problems into smaller, more manageable ones. You will take the initiative to work with engineers across the team and org, exploring different ways to resolve issues.

The Opportunity

  • Build ML and AI Ops infrastructure to enable the development and deployment of production models running at scale, including deployments across multiple cloud infrastructures
  • Lead resource planning and optimization, especially with GPU instances
  • Develop tools to load test various production AI/ML systems
  • Work closely with SREs across the entire organization
  • Support 24/7/365 uptime of services by participating in a partial on-call rotation with other members of the team
  • Stay up to date on machine learning and artificial intelligence trends and technologies while contributing to the growth of AI/ML in the crypto industry

Skills You Should HODL

  • Experience in deploying, maintaining, and monitoring production systems
  • 3-5 years of experience in DevOps, SRE, AI/ML engineering, or a similar discipline
  • Familiarity with the software development lifecycle, DevOps tooling (build, continuous integration, deployment), and best practices
  • Programming skills in Python, Scala, Rust or other languages
  • Good written and verbal communication and interpersonal skills
  • Deep experience with Kubernetes and Docker
  • Experience with AWS, specifically S3, Athena, EMR, SageMaker, and Lambda
  • Experience with Terraform, MLflow, Flink, Kafka, MariaDB, and Nomad is a plus
  • Knowledge of GenAI tools, such as LangChain, LlamaIndex, and open-source vector databases, is a plus
  • Bachelor’s degree in Computer Science, Machine Learning or related field