Linux Admin with GPU

Jobs via Dice
Full-time · Remote · US · Posted: 2026-05-11 · Until: 2026-07-10
Job Description
Dice is the leading career destination for tech experts at every stage of their careers. Our client, Rivago infotech inc, is seeking the following. Apply via Dice today!

Role: Data Center GPU Commissioning Engineer
Location: Remote USA
Duration: Long term Project

The Data Center GPU Commissioning Engineer is responsible for commissioning, validating, and stabilizing GPU‑based infrastructure in data center environments. This role ensures GPU servers, interconnects, drivers, firmware, and platform software are correctly installed, configured, tested, and production‑ready to support AI, ML, and HPC workloads. The engineer works closely with Deployment, Network, Platform, and Operations teams to deliver reliable, high‑performance GPU clusters and ensure smooth handover to run operations.

Key Responsibilities
- Perform end‑to‑end commissioning of GPU servers and clusters in data centers.
- Validate hardware installation, power, cooling, and cabling readiness for GPU systems.
- Install and configure GPU drivers, firmware, BIOS settings, and system software.
- Verify GPU health, performance, and stability using standard validation and burn‑in tests.
- Validate high‑speed interconnects and networking used for GPU workloads.
- Execute cluster‑level testing for AI / HPC readiness and baseline performance.
- Identify, troubleshoot, and resolve hardware, driver, or configuration issues during commissioning.
- Work with OEMs and vendors for issue resolution and firmware recommendations.
- Ensure systems comply with security, hardening, and operational standards.
- Document commissioning procedures, results, and as‑built configurations.
- Support handover to operations teams and assist during early‑life stabilization.
Required Skills & Experience

Technical Skills
- Hands‑on experience with GPU‑based servers in data center environments
- Strong understanding of:
  - Linux system administration
  - GPU drivers, firmware, and system tuning
  - Server BIOS, firmware upgrades, and hardware diagnostics
- Familiarity with data center networking concepts and high‑performance interconnects
- Exposure to AI / ML / HPC environments is strongly preferred

Operational Skills
- Strong troubleshooting and root cause analysis skills
- Experience working in structured deployment and commissioning processes
- Ability to follow and improve runbooks and SOPs

Certifications (Preferred)
- OEM server certifications (HPE / Dell / Lenovo or equivalent)
- Linux administration certifications
- GPU / AI platform certifications (nice to have)