Job Description
Job Duties
· Design, build, and operate secure, scalable GCP and OpenShift platforms (GKE/OCP) to support deployment of GenAI models, LLMs, and RAG workloads.
· Provision and manage cloud infrastructure using Terraform, including landing zones, networking, org policies, and hybrid connectivity across GCP and Azure.
· Enable MLOps/LLMOps pipelines for model deployment, monitoring, and lifecycle management, integrating Arize AI and GenAI platforms.
· Implement platform engineering best practices, including Kubernetes-based abstractions, internal developer portals, and self-service environments.
· Ensure platform security, governance, and secrets management using HashiCorp Vault, IAM, and policy-as-code.
· Establish observability, SLOs, and SRE practices to ensure reliability and performance of GenAI and platform services.
· Collaborate with data scientists, ML engineers, and application teams to onboard new LLMs, APIs, and inference services efficiently.

Tech Skills needed
GCP · Azure · Terraform · Cloud Networking · Landing Zones · Org Policy / Governance · HashiCorp Vault · Hybrid Connectivity · Kubernetes · GKE · OpenShift (OCP) · Platform Engineering · Observability · SRE / SLOs · Python · Internal Developer Portals · GenAI Platforms · LLMs · RAG · MLOps/LLMOps

Pay: From $45.00 per hour
Work Location: In person