At Neptune, we have an ambitious goal: to become the MLOps standard for data scientists worldwide. Our platform is a metadata repository built for research and production teams that run multiple ML experiments.
Our team is growing – we are on the lookout for a Senior Site Reliability Engineer who will ensure that our production services are always up and running. You’ll be responsible for the reliability and usability of the developer platform (CI/CD). Your tasks will include driving initiatives to reduce the API error rate and latency.
You’ll have a lot of independence and space to test your creative ideas. We are looking for a self-driven, hands-on, and proactive person who is not afraid to take responsibility for the outcome.
In this role, you will:
- Own and operate platform and storage services like Kubernetes, Kafka, Elasticsearch, MySQL;
- Monitor infrastructure utilization and plan capacity;
- Own the service level health indicators: Service Level Metrics & Service Level Objectives;
- Own the developer platform: CI/CD;
- Own the installation/upgrade process for on-prem deployments;
- Build tools and design processes that help improve observability and system resiliency;
- Establish design patterns for monitoring, benchmarking, and deploying new features for the backend services;
- Automate and operationalize engineering tasks – data migrations, capacity changes, etc.
Requirements:
- 5+ years of experience as a Site Reliability Engineer, Backend Engineer, or in a similar role;
- Coding experience in Java and/or Scala;
- Knowledge of Linux: administration, networking, containerization;
- Experience with and deep understanding of Kubernetes;
- Experience with MySQL and Elasticsearch;
- Good communication skills in spoken and written English.
Nice to have:
- Coding experience in Python and/or Bash;
- CI/CD tools expertise;
- Experience with Helm;
- Experience with Kafka;
- Experience with Terraform.
We can offer:
- The thrill of building a world-class product for some of the most innovative people on Earth;
- Startup atmosphere, friendly working environment, and a lot of autonomy;
- Opportunity to learn, experiment with ideas, and grow;
- Competitive base salary and opportunity to participate in the Employee Stock Option Plan;
- Flexible working hours and the option to work fully remotely;
- Multisport card, medical care, and free lunch at the office.