Dev team

Senior Site Reliability Engineer

At Neptune, we have an ambitious goal: to become an MLOps standard for data scientists worldwide. Our platform is a metadata repository built for research and production teams that run many ML experiments.

Our team is growing – we are looking for a Senior Site Reliability Engineer who will ensure that our production services are always up and running. You’ll be responsible for the reliability and usability of the developer platform (CI/CD). Your tasks will include driving initiatives to reduce the API error rate and latency.

You’ll have a lot of independence and space to test your creative ideas. We are looking for a self-driven, hands-on, and proactive person who is not afraid to take responsibility for the outcome.

 

In this role, you will:

  • Own and operate platform and storage services like Kubernetes, Kafka, Elasticsearch, MySQL;
  • Monitor infrastructure utilization and plan capacity;
  • Own the service level health indicators: Service Level Metrics & Service Level Objectives;
  • Own the developer platform: CI/CD;
  • Own the installation/upgrade process for on-prem deployments;
  • Build tools and design processes that help improve observability and system resiliency;
  • Establish design patterns for monitoring, benchmarking, and deploying new features for the backend services;
  • Automate and operationalize engineering tasks – data migrations, capacity changes, etc.

 

Our expectations:

  • 5+ years of experience as a Site Reliability Engineer, Backend Engineer, or in a similar role;
  • Coding experience in Java and/or Scala;
  • Knowledge of Linux: administration, networking, containerization;
  • Experience with and deep understanding of Kubernetes;
  • Experience with MySQL and Elasticsearch;
  • Good command of spoken and written English.

Nice to have:

  • Coding experience in Python and/or Bash;
  • CI/CD tools expertise;
  • Experience with Helm;
  • Experience with Kafka;
  • Experience with Terraform.

 

We can offer:

  • Fully remote work and flexible working hours;
  • The thrill of building a world-class product for some of the most innovative people on earth;
  • Startup atmosphere and a lot of autonomy;
  • Competitive base salary (inflation-indexed) and opportunity to participate in the Employee Stock Option Plan;
  • 20 paid service-free days per year;
  • Co-financing of a Multisport card and private medical care, plus free lunch whenever you are at the office.

 

Any questions?

Don’t hesitate to contact our Talent Acquisition Partner, Bartosz, and check out our About us page to get to know the story and faces behind Neptune.
