ML Platform Engineer

Foxglove Technologies, Inc

INTERN
Remote · US
San Francisco, CA, US
Posted: 2026-05-11
Until: 2026-06-10

Job Description

Build the data infrastructure for robots operating in the real world.

Robotics is moving from research labs into production across factories, warehouses, vehicles, and field deployments. When robots fail, behave unexpectedly, or need to be improved, engineers rely on data to understand what actually happened. At Foxglove, we build the observability, visualization, and data infrastructure that makes that possible. Our tools are used by robotics and autonomous systems teams to ingest, store, query, replay, and analyze massive volumes of multimodal sensor data from live systems and from production fleets.

About the Role

We're looking for an ML Platform Engineer with deep infrastructure instincts to help design, deploy, and scale the systems that power Foxglove's data platform. This is a platform-first role: you'll own the infrastructure layer that makes ML possible in production, not just the models that run on top of it. You'll be responsible for the reliability, scalability, and performance of the ML platform itself, from inference serving and pipeline orchestration to training infrastructure and evaluation frameworks. The problems are real and urgent: petabyte-scale multimodal robotics data, high-throughput retrieval and embedding pipelines, and the internal ML flywheel that lets our team ship fast. This is a hands-on infrastructure role, not research.

Key Responsibilities

Design, deploy, and operate production inference infrastructure, including model serving, autoscaling, load balancing, and cost optimization across cloud environments

Own the platform architecture for embedding and retrieval pipelines that power semantic search over multimodal robotics data (image, video, point cloud, and time series)

Build and maintain the training and evaluation infrastructure that enables rapid iteration on model performance, including job orchestration, exper