Job Description
The Role:
We are seeking an experienced, technically oriented, impact-driven expert in ML Training Infrastructure with a strong ability to execute hands-on technical work. In this role, you will be responsible for designing and building scalable, reliable, and high-performance AI/ML platform infrastructure to support advanced AI research and model development initiatives. As a Senior ML Engineer, you will collaborate closely with machine learning engineers, research scientists, and other partners to develop state-of-the-art AI solutions that enable the future of intelligent driving technologies across General Motors vehicles.

What You'll Do:
Design and develop a scalable, reliable, high-performance ML framework to support model training at scale.
Analyze and optimize model training performance to scale distributed training workflows, maximize resource utilization across heterogeneous hardware environments, and reduce cost.
Raise the bar on system observability, debuggability, operational excellence, and user experience.
Collaborate with cross-functional teams to integrate new features and technologies into the platform.

Your Skills & Abilities (Required Qualifications):
Bachelor's degree or higher in Computer Science or an equivalent major, OR equivalent relevant experience
3+ years of professional software engineering experience
2+ years of specialized experience in AI/ML infrastructure, e.g., enabling distributed training for scaling large ML models
Strong programming skills in Python, with proficiency in frameworks such as PyTorch (preferred), TensorFlow, or similar
Experience with distributed computing, GPU computing, and cloud environments (AWS, GCP, Azure)