Senior Software Engineer (Site Reliability)

Babylist
FULL_TIME · Remote · US United States & Canada, City of Little Canada, US · USD 15568–18682 / month · Posted: 2026-05-17 · Until: 2026-07-16
Job Description
Babylist is looking for a Senior Software Engineer, Site Reliability to join our Platform team. In this position, you will play a vital role in ensuring our systems and services’ stability, scalability, and reliability. You will work closely with all Babylist Engineering teams to support shared infrastructure and developer tools. Your expertise in site reliability engineering, AWS cloud infrastructure, and modern DevOps practices will be instrumental in optimizing our systems and driving continuous improvement.

Our Tech Stack: Ruby on Rails, React, AWS, Sidekiq, MySQL, Redis, Native iOS and Android

Manage and build our AWS infrastructure using Infrastructure as Code (IaC) tools like Terraform. You will ensure that our EKS clusters and databases are running up-to-date versions, optimizing performance and reliability.
Improve the speed and reliability of our Continuous Integration (CI) systems to support the entire Engineering Team, enabling faster and more efficient development and deployment processes.
Provide support to developers in troubleshooting issues across local development, staging, and production environments.
Establish, communicate, and support best practices for monitoring and alerting. This will involve setting up effective monitoring systems and defining actionable alerts for proactive incident management.

Benefits
HAPPIER WEEKENDS – We work to live, not live to work. That means we generally don’t work nights or weekends, and we offer 10 paid holidays throughout the year, in addition to your paid time off.
PAID TIME OFF THAT WORKS FOR YOU – Life is unpredictable. Use your time off for what you need, whether that’s a vacation, family event or a day to unplug.
PARENTAL LEAVE – We offer a flexible, 12-week paid parental leave policy.
FLEXIBLE BENEFITS – Choose from benefits that support your working style, whether that’s remote, in the office or something in between.
HEALTH & WELLNESS – Wellness benefits include 100% paid coverage for medical, vision and dental for full-time employees.
BABYLIST STORE DISCOUNT – All full-time employees receive a generous Babylist Store discount.

You have excellent verbal and written communication skills, and the ability to collaborate effectively with cross-functional teams.
Troubleshooting and debugging are second nature to you, allowing you to quickly identify and resolve issues across various environments.
Proven experience in on-call management best practices, including effective incident response, escalation procedures, and post-incident reviews to drive continuous improvement and ensure system reliability.
Experience designing and supporting CI systems such as CircleCI, Jenkins, or GitHub Actions.
Experience supporting high-traffic consumer-facing websites, understanding the unique challenges and considerations in maintaining such systems.
8+ years of experience as a Site Reliability Engineer or similar role, demonstrating a strong background in maintaining highly available and scalable systems.
You are familiar with monitoring and alerting best practices, utilizing tools like Datadog, Cronitor, Sentry, and PagerDuty to ensure proactive identification and resolution of issues.
You possess strong experience working with AWS cloud-based infrastructure and services, ensuring their reliability, performance, and security.
You have a solid understanding of cloud-native systems design, including CDNs, load balancers, cloud networking, DNS, caching, and distributed systems.
Proficiency with Docker and Kubernetes is essential, as you will contribute to the design, deployment, and management of containerized applications in our environment.
Proficiency with Terraform is a must, as you will be a member of the team responsible for managing and building our AWS infrastructure using Infrastructure as Code (IaC) practices.
You’re comfortable and enthusiastic about working in an AI-forward environment where AI tools are part of daily operations. You embrace using technology to enhance your work while keeping people at the center.

Babylist uses AI to record and transcribe all interviews for evaluation purposes in accordance with CCPA and GDPR. By participating in an interview, you consent to this recording and transcription.
During the interview process, we’re evaluating your individual problem-solving skills, creativity, and approach to challenges. While AI tools like ChatGPT, Claude, and Cursor are part of your daily toolkit once you join Babylist, all interviews, assessments, and take-home assignments must be completed indepen