Site Reliability Engineer

VITURE

FULL_TIME · Remote · US
Cupertino, CA, Santa Clara, US
USD 135000–180000 / month
Posted: 2026-05-11
Until: 2026-07-10

Job Description

Who We Are

VITURE is the #1 XR glasses brand in the US, and we got here in just four years, making us the fastest-growing startup in smart glasses. Now we're building what comes next: the first great AI interface you wear, not hold. We blend spatial computing, real-time AI, and beautiful industrial design into intelligent eyewear you'd actually want to wear every day.

Our mission is to build the future of wearable intelligence: technology that feels personal, intuitive, and indispensable. We're a cross-disciplinary team shipping at the intersection of hardware, software, and AI.

Ideal Candidate

We're looking for an experienced Site Reliability Engineer (SRE) to build and scale the cloud infrastructure that powers our intelligent eyewear. You'll design, automate, and maintain highly available, high-performance environments that bridge our hardware, AI models, and user-facing applications. If you're passionate about system reliability, eliminating toil through automation, and pushing what's possible in AI and spatial computing backends, we'd love to hear from you.

Role Description

This is an exempt, full-time salaried position based in Cupertino, CA. As a Site Reliability Engineer, you will design, build, and operate the scalable infrastructure across VITURE's XR+AI product ecosystem. You will collaborate with backend engineers, AI researchers, product managers, and hardware teams to ensure our services deliver a lightning-fast, highly reliable experience to hundreds of thousands of users worldwide, from concept through production.

Responsibilities

Design, deploy, and manage scalable infrastructure across mainstream cloud platforms to support our high-traffic AI services (e.g., LLM inference pipelines, real-time voice, and spatial computing backends).
Establish and execute incident management protocols, participate in on-call rotations, and lead blameless post-mortems to continuously reduce Mean Time to Recovery (MTTR) and improve system reliability.
Champion Infrastructure as Code (IaC) principles. Leverage automation tools (e.g., Terraform, Ansible) to automate provisioning, configuration, and deployments, actively eliminating manual operational toil.
Build and maintain comprehensive observability platforms (monitoring, logging, tracing) to track real-time resource utilization, define critical metrics, and ensure strict Service Level Objectives (SLOs) are met.
Collaborate closely with software and AI teams to embed reliability early in the development lifecycle, streamline CI/CD pipelines, and optimize performance across servers and containerized environments.
Enforce robust security postures by implementing network policies, configuring firewalls, and establishing rigorous backup and Disaster Recovery (DR) strategies.
Mentor junior engineers and contribute to fostering a culture of engineering excellence and reliability as needed.

Qualifications

Bachelor's degree or above in Computer Science, Electrical Engineering, or a related field; or equivalent practical experience with a demonstrated track record of exceptional work.
2+ years of professional experience in an SRE, DevOps, or Cloud Infrastructure role, preferably supporting high-concurrency consumer applications or AI services.
Strong hands-on experience managing and scaling distributed systems on mainstream public or private cloud ecosystems.
Solid expertise in containerization (e.g., Docker) and orchestration platforms (e.g., Kubernetes) for deploying complex microservices architectures.
Proficiency in scripting or programming languages (such as Python, Go, or Shell) to develop custom automation and tooling.
Experience with modern observability, telemetry, and centralized logging stacks (e.g., Prometheus, Grafana, ELK Stack).
Excellent communication skills with the ability to articulate technical decisions and collaborate effectively with cross-functional teams.

Nice to Have

Previous experience managing infrastructure for AI/ML workloads (e.g., GPU cluster management, multimodal AI inference deployments).
Personal project experience is a strong plus; we love seeing what you build for fun. Share your side projects, open-source infrastructure contributions, or passion builds with us.
Product-minded with a refined standard for reliability and craft: you care deeply about the user experience when systems degrade gracefully, not just whether the servers are running.

What We Offer

Define and build the next-generation AI glasses platform: a front-row seat to shaping how people interact with wearable intelligence
An AI Native working environment: AI is deeply integrated into our workflows, tools, a