Job Description
Code and Theory is seeking a senior cloud engineer who wants to own the full technical lifecycle of enterprise client deployments: from assessing a client's existing infrastructure and designing the integration architecture, to provisioning the environment and keeping it running in production. This is a hands-on role that also requires real client engagement: you will sit in technical working sessions, work directly with client DevOps teams, and be the person who translates security and compliance requirements into concrete infrastructure decisions.

You will be working on a platform that connects our AI product to enterprise marketing environments across different cloud providers. Each client brings its own infrastructure, data stack, and security perimeter. Your job is to figure out how to connect to it, build the integration reliably, and hand it off in a state that can be operated and maintained over time.

WHAT YOU'LL DO

- Assess each client's cloud infrastructure, data stack, and security perimeter before any build starts, and translate findings into a concrete integration plan
- Design and provision client environments using Terraform: networking, IAM, container orchestration, managed storage, and secrets management across GCP, AWS, or Azure, depending on the client
- Deploy and operate LLM inference pods in client cloud environments, managing API integration, rate limits, latency, and failure handling without needing a data scientist in the room
- Build and maintain integration layers that connect client data sources to the AI layer: you own the plumbing that makes inference useful; the data science team owns what runs on top of it
- Deploy and maintain containerized workloads via Helm: orchestration, ETL workers, and AI inference pods running inside the client's cloud perimeter
- Own data pipeline deployments end to end: scheduling, pagination, retry logic, and rate-limit management against client API gateways
- Manage distributed ETL jobs at scale: JSON flattening, schema enforcement, and structured output delivery
- Enforce data residency requirements: raw data stays inside the client's environment; only structured output leaves to our shared infrastructure
- Serve as the primary technical contact for each client's DevOps and infrastructure teams throughout the engagement
- Lead technical working sessions with client teams: validate configurations, confirm IAM and credential models, and review cluster specs before deployment
- Triage and resolve pipeline and infrastructure failures across multiple client environments simultaneously
- Implement container security standards: non-root execution, read-only filesystems, startup integrity hashes, tamper protection
- Mentor junior and mid-level engineers and contribute implementation patterns to the shared playbook after every client engagement

WHAT YOU'LL NEED

- 6+ years of cloud engineering experience, with production experience on at least two of GCP, AWS, or Azure, and a willingness to operate across all three depending on the client environment
- Deep Terraform experience: you have provisioned multi-environment, multi-tenant production infrastructure from scratch, not just applied existing configurations
- Comfort with Kubernetes and Helm in production: deploying, debugging, scaling, and securing containerized workloads, not just running them
- Production data pipeline experience: you know what breaks, how to recover, and how to build retry and backoff logic that actually holds up
- Hands-on experience with distributed processing at scale, not just theoretical knowledge
- Enterprise API integration experience, end to end: OAuth 2.0, API key management, rate limiting, API gateways
- Solid security fundamentals: container hardening, credential management, and data residency enforcement in environments where it actually matters
- Direct experience working with client or customer engineering teams: comfortable leading technical conversations with enterprise DevOps and infrastructure teams, not just supporting them
- You write things down: runbooks, integration notes, playbook contributions. The next engineer should be able to operate what you built
- Hands-on experience integrating LLM APIs (Anthropic, OpenAI, or equivalent) into production pipelines: not as an end user, but wiring inference into systems that run at scale across real client environments

NICE TO HAVE

Experience with:

- LLM inference infrastructure
- Marketing technology stacks (DAM, CDP, CRM)
- Multi-tenant client environments
- A cloud certification (AWS Solutions Architect, GCP Professional Cloud Architect, or Azure equivalent)
- Agency, consultancy, or product company background