Forward Deployed Engineer (Inference & Post-Training)

Full-time · Remote, US · San Francisco, CA, US · USD 270,000–300,000 / year · Posted: 2026-05-11 · Until: 2026-07-10

Job Description

Job Info

Title: Forward Deployed Engineer (Inference & Post-Training)
Company: Together AI
Location: San Francisco, CA
Work Mode: On-site, Remote

Summary

A Forward Deployed Engineer (FDE) acts as a hands-on technical partner to strategic production AI teams, leveraging high-quality models for large-scale inference. This role serves as a deep-domain specialist in inference optimization, fine-tuning pipelines, and production deployment, collaborating with Solutions Architects to ensure customer success and platform adoption.

Machine Learning & Artificial Intelligence

Key Responsibilities

Inference Engine Optimization: Select, configure, and optimize inference engines based on hardware, model architecture, and workload.
Configuration & Performance Tuning: Develop configurations for POCs and benchmarks; tune KV cache, speculative decoding, tensor parallelism, and quantization strategies.
Post-Training & Fine-Tuning: Drive RL training runs and guide customers through LoRA, SFT, DPO, RLHF, and GRPO pipelines.
Strategic Customer Alignment: Serve as the primary technical contact for strategic accounts; monitor endpoint configurations and ensure milestone achievement.
Opinionated Onboarding: Establish alignment during onboarding to ensure optimal configurations from day one.
Product Feedback Loop: Influence software and model roadmaps by surfacing field insights and driving early feature adoption.

Qualifications

Experience: 5+ years in technical roles with a focus on inference systems, open-source LLM deployment, or post-training workflows.
Inference Engines: Expert-level hands-on experience with vLLM, TensorRT-LLM, or SGLang.
Optimization Expertise: Deep knowledge of KV cache tuning, speculative decoding, tensor/pipeline parallelism, and quantization.
Post-Training: Experience with LoRA, SFT, DPO, RLHF, and GRPO fine-tuning pipelines.
Model Awareness: Broad knowledge of state-of-the-art open-source models for selection based on use cases and hardware.
Coding Skills: Strong Python proficiency for production environments.

Compensation & Benefits

Base Salary Range: $270,000 - $300,000 OTE (US Full-time)
Additional Compensation: Startup Equity + Benefits
Benefits: Health insurance, flexible remote work policy.