
Staff AI Engineer

People In AI
Full-time · On-site · San Francisco Bay Area, US · USD 225,000–250,000 / year · Posted: 2026-05-11 · Until: 2026-07-10
Job Description
Job Title: Staff AI Engineer (Agentic AI)
Location: San Francisco, CA (On-site)
Compensation: $225,000–$250,000 base + meaningful equity

A well-funded AI infrastructure startup is hiring a Staff AI Engineer to help build the core intelligence layer powering autonomous workflows across complex engineering software environments. The company is developing production-grade agentic AI systems that execute real multi-step workflows inside enterprise desktop applications used by Fortune 100 organizations. The work sits at the intersection of AI agents, workflow orchestration, evaluation systems, and large-scale enterprise automation.

This is a deeply technical, highly hands-on role focused on solving difficult real-world agent reliability problems, not lightweight chatbot integrations or internal AI demos.

Why This Role Matters

The agent platform is the core product and the company's primary technical differentiator. This role will directly influence how AI systems:

- Reason through complex workflows
- Interact with tools and software systems
- Recover from failures
- Optimize cost and latency
- Scale reliably in production enterprise environments

You'll operate in a high-autonomy engineering culture with direct ownership over architecture, evaluation infrastructure, and production agent performance. This is an opportunity to help define how real-world agentic systems are built and measured at scale.

What You'll Do

- Build and improve production AI agents capable of executing complex, multi-step workflows across desktop software environments and enterprise systems.
- Design and own critical components of the agent architecture, including tool orchestration, workflow state management, context handling, model routing, memory systems, and failure recovery strategies.
- Develop evaluation and benchmarking infrastructure focused on workflow completion rates, reliability, regression detection, operational efficiency, and overall production performance.
- Translate real customer workflows into measurable evaluation frameworks and production benchmarks that continuously improve agent quality over time.
- Optimize token usage, inference efficiency, and runtime cost so the systems remain commercially viable at scale.
- Collaborate closely with engineering, product, and customer-facing teams to expand workflow coverage and improve the reliability of agent behavior in real-world environments.
- Operate as a highly hands-on technical leader, contributing directly to implementation, architecture decisions, and long-term engineering direction.

Ideal Candidate

- Has strong experience building production-grade agentic AI systems rather than simple LLM wrappers or chatbot interfaces.
- Understands the complexities of multi-step AI orchestration, including tool use, workflow reliability, state management, and context handling under ambiguous conditions.
- Brings a strong Python engineering background and systems-level thinking across architecture and production environments.
- Has experience designing evaluation frameworks or benchmarking systems for AI or machine learning products.
- Is comfortable operating in fast-moving startup environments with broad ownership, technical ambiguity, and high expectations for execution.
- Enjoys staying deeply hands-on technically while also influencing engineering direction and technical standards across the organization.
Bonus Points For

- Experience working with evaluation frameworks such as SWE-bench, GAIA, or similar benchmarking environments.
- Exposure to workflow automation, desktop automation, or enterprise software orchestration systems.
- Experience deploying AI systems into real production enterprise environments with meaningful scale and reliability constraints.
- Previous startup experience within highly technical AI product organizations.
- Familiarity with observability, tracing, and reliability tooling used to monitor production AI systems.

About Us

People In AI partners with high-growth AI and machine learning companies to help build exceptional technical teams. We work closely with startups and scaling organizations building meaningful AI products and infrastructure, helping candidates connect with high-impact opportunities through a transparent and streamlined process.