Job Description
Job Requisition ID: JR2017131
Job Category: Engineering
Time Type: Full time

NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing, and Visualization. Our invention serves as the visual cortex of modern computers and is at the heart of our products and services. Our work reveals new frontiers to explore, inspires remarkable creativity and discovery, and fuels what were once science fiction inventions, from artificial intelligence to autonomous cars. NVIDIA is seeking exceptional individuals like you to help us drive the next wave of artificial intelligence.

NVIDIA DGX, HGX, and MGX servers deliver the world's leading solutions for enterprise AI infrastructure at scale. Enterprises need a computing infrastructure that can be easily managed in a data center. We are the Datacenter Software Tools team at NVIDIA. We deliver infrastructure and tools for data center deployment, firmware and software package deployment, and server manageability.

We are looking for a hard-working and experienced senior manager with experience in datacenter software and firmware release management and infrastructure. In this role, you will drive the release of software and firmware for the world's best resilient GPU-based datacenter servers. This is a highly transparent role at NVIDIA, responsible for guaranteeing high-quality infrastructure and tooling for software and firmware release features for NVIDIA's scale-up and scale-out solutions, spanning frontend, backend, infrastructure, and CI/CD-based automation. This role requires you to work closely with multi-functional teams, including system architects, firmware developers, compliance and security teams, and product management, to deliver exceptional software and firmware release solutions. Join us at the forefront of technological advancement.
What You’ll Be Doing

In this technical role, you will provide leadership on how releases are delivered to end customers of rack-scale computing based on tightly coupled compute and switch trays, and build end-to-end infrastructure and workflows to ensure the highest-quality releases for data center firmware and software.

Define release scope for rack-scale products, working cross-functionally with product management, technical architects, and program management.
Deliver these releases through the validation matrix for customer end use cases, ensuring that delivered firmware and software is of the highest quality. Solutions must scale, be resilient, and support secure upgrades or rollbacks across diverse customer scenarios.
Influence architecture, design, and implementation decisions for compute and switch tray software and firmware, ensuring quality across nightly, dev, and production drops for all customer use cases, with the right release-validation strategy at each phase of the development life cycle.
Partner with all matrixed organizations (developers, SWQA, product engineering) to shift release quality left from dev to QA in a very fast-moving environment, using end-to-end CI/CD so that no bug is found at a customer site. Enforce this with well-placed quality metrics for every product milestone, and track KPIs published at a regular cadence.
Monitor and report progress of releases to all stakeholders.
Own ingestion and packaging of software and firmware binaries, readying them for deployment across multiple platforms at scale in different CSP environments.
Document procedures and engage in collaborative discussions to refine software and firmware release workflows, including identifying and resolving issues in release milestone packaging and deployment procedures and removing bottlenecks.
Shape the team's roadmap and drive innovation, including self-service interfaces, automation, AI-assisted validation and triage, and sophisticated release-compliance reporting.
Continuously review established release processes, infrastructure, and practices to identify improvement opportunities.
Ensure the teams are performing in the most efficient and transparent way, with a strong focus on automation and measurable targets.

What We Need To See

12+ years overall in the software industry, with specialization in system software and/or firmware development.
5+ years of proven hands-on technical leadership of multi-team organizations across data center firmware such as BMC, FPGA, CPLDs, and network switches, building infrastructure for continuous improvement of release quality.
BS/MS/PhD in CS, CE, EE, or a related technical field, or equivalent experience.
Prior experience in systems software or firmware development, with a proven history of guiding complex software features or products throughout the entire product life cycle.
Ideall