Job Description
Attention: Kapitus is aware that individuals posing as recruiters may be communicating with job seekers about supposed positions with Kapitus. Kapitus has received reports that the content and method of communication can vary, but messages may contain requests for payment (e.g., fees for equipment or training) and/or for sensitive financial information. Kapitus will never ask a candidate for payment or financial information during the initial application or interview process. All open positions are posted in location-specific employment portals available at www.kapitus.com/careers. All legitimate Kapitus job postings on employment sites will direct candidates to complete an application through these portals before completion of the hiring process. Candidates with additional questions or concerns regarding any recruiting communications, or Kapitus' recruiting process in general, should email recruiting@kapitus.com.

We are seeking a senior MLOps Architect to design and scale a modern ML and Generative AI platform across AWS. This role will own the architecture for traditional ML and LLM/Generative AI pipelines, ensuring production reliability, governance, cost optimization (FinOps), and enterprise-grade security. The ideal candidate has deep expertise in AWS, SageMaker, Databricks, Atlan (data catalog/governance), and modern MLOps tooling, and understands how to operationalize LLMs, RAG systems, and foundation models within a governed, scalable MLOps stack. This is a strategic, hands-on architecture role responsible for integrating GenAI capabilities into an enterprise ML platform.

What You'll Do

MLOps & GenAI Platform Architecture
- Design and implement scalable ML and LLM infrastructure on AWS (SageMaker, EKS, S3, IAM, Lambda, Step Functions, CloudWatch).
- Architect end-to-end ML and Generative AI lifecycle workflows:
  - Data ingestion & preprocessing
  - Feature engineering / embedding generation
  - Model training & fine-tuning (traditional ML + foundation models)
  - Model evaluation & validation
  - Deployment (real-time, batch, streaming)
  - Monitoring & retraining
- Integrate LLM pipelines (prompt workflows, RAG architectures, fine-tuning flows) into the enterprise MLOps stack.
- Define standards for CI/CD/CT pipelines across ML and GenAI workloads.

Generative AI & LLM Operationalization
- Architect Retrieval-Augmented Generation (RAG) pipelines, including:
  - Embedding generation workflows
  - Vector database integration
  - Document ingestion and chunking strategies
  - Retrieval evaluation and monitoring
- Design and deploy LLM-based services using:
  - Managed services (e.g., SageMaker endpoints, Bedrock-style APIs)
  - Containerized custom inference services
- Establish prompt versioning, evaluation frameworks, and experiment tracking for LLM systems.
- Implement guardrails for hallucination control, safety monitoring, bias detection, and usage logging.
- Define architecture for LLM fine-tuning workflows (including data curation, evaluation, and cost controls).
- Implement scalable orchestration of LLM pipelines using workflow engines and event-driven patterns.

Deployment, Monitoring & Reliability
- Architect scalable inference patterns for:
  - Traditional ML models
  - LLM APIs
  - RAG systems
- Implement model monitoring frameworks for:
  - Performance degradation
  - Drift detection
  - LLM output quality
  - Latency and token usage metrics
- Define SLAs/SLOs for ML and GenAI systems.
- Design safe deployment strategies (blue/green, canary, shadow testing).
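To illustrate the document ingestion, chunking, and retrieval steps a RAG pipeline involves, here is a minimal self-contained sketch. The chunk sizes, the term-frequency "embedding," and the cosine scoring are illustrative stand-ins; a production pipeline would call a real embedding model and a vector database rather than these toy helpers.

```python
import math
from collections import Counter

def chunk_text(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split a document into overlapping word-window chunks."""
    words = text.split()
    step = max_words - overlap
    return [" ".join(words[start:start + max_words])
            for start in range(0, max(len(words) - overlap, 1), step)]

def embed(text: str) -> Counter:
    """Toy term-frequency 'embedding'; a real pipeline calls an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Rank indexed chunks by similarity to the query and return the top k."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# Ingest two tiny "documents", build an in-memory index, retrieve context.
docs = ["SageMaker endpoints serve real-time model inference on AWS.",
        "Vector databases store embeddings for similarity search."]
index = [(chunk, embed(chunk)) for doc in docs for chunk in chunk_text(doc)]
context = retrieve("how is inference served?", index, k=1)
```

The retrieved `context` would then be injected into the prompt sent to the LLM; retrieval evaluation and monitoring, as the posting notes, wrap around exactly this ranking step.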
- Establish logging, observability, and traceability standards for GenAI systems.

FinOps & Cost Optimization
- Implement cost tracking for:
  - Training workloads
  - GPU utilization
  - Inference endpoints
  - Token consumption (LLM APIs)
  - Vector database storage
- Optimize LLM workloads for cost-performance tradeoffs (model size, batching, caching strategies).
- Design autoscaling and compute optimization strategies for GPU- and CPU-based inference.
- Partner with finance and engineering teams to forecast ML/GenAI infrastructure spend.

Platform Enablement & Standards
- Define enterprise standards for:
  - Experiment tracking
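As a concrete example of the token-consumption cost tracking mentioned under FinOps, here is a minimal sketch that aggregates LLM API spend per model from token usage. The model names and per-1K-token prices are hypothetical placeholders, not actual provider rates.

```python
from dataclasses import dataclass, field

# Hypothetical (input, output) prices per 1K tokens; real rates vary by provider/model.
PRICES = {"small-llm": (0.0005, 0.0015), "large-llm": (0.01, 0.03)}

@dataclass
class TokenCostTracker:
    """Accumulate LLM API spend per model from token usage records."""
    totals: dict = field(default_factory=dict)

    def record(self, model: str, input_tokens: int, output_tokens: int) -> float:
        in_price, out_price = PRICES[model]
        cost = input_tokens / 1000 * in_price + output_tokens / 1000 * out_price
        self.totals[model] = self.totals.get(model, 0.0) + cost
        return cost

tracker = TokenCostTracker()
tracker.record("small-llm", input_tokens=2000, output_tokens=500)
tracker.record("large-llm", input_tokens=1000, output_tokens=1000)
```

In practice these records would come from inference logs or API usage reports, feeding the spend forecasts the role shares with finance.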