Platform AI-Ops Engineer

NALEJ.ai Corporation

Full-time · Remote · US · Arlington, VA, Tarrant, US · Posted: 2026-05-11 · Until: 2026-07-10


Job Description

NALEJ is pioneering the transformation of hybrid infrastructure technologies, redefining how secure data management and operational systems are implemented across diverse sectors. Our vision is to cultivate an ecosystem where NALEJ OS, an Enterprise Open Source platform, facilitates seamless and secure interactions from cloud to edge. By offering dual-use technologies across DevSecOps, DataSecOps, AI+ML Ops, Cyber Ops, and Network Ops, NALEJ provides a secure and efficient environment for both defense and business customers. We deliver services through SaaS, PaaS, IaaS, Hybrid, and Air-Gap models, focusing on secure data exchange without direct exposure of sensitive information. Our commitment to innovation includes providing secure edge networks and ensuring compliance with IL4-IL6+SAP MOSA, NIST, and Zero Trust standards.

NALEJ operates a hybrid, bidirectional cloud-to-edge and edge-to-cloud Kubernetes platform. It is the substrate on which app and software developers build and run agentic workbenches and agent-driven flows, and the connective tissue that securely links those agents to the datasources they need to reason over, including MCP servers, logs, metrics, and live traffic, without breaking tenant isolation or leaving the zero-trust perimeter.

We're an AI-first team: every engineer uses an AI integrated development environment daily, with context files, custom agents, and shared memory tuned to the NALEJ OS platform. We expect you to work in this novel environment, and we'll invest in making you better at it.

Key Responsibilities

Architecture Design: Develop and own end-to-end architectural blueprints for agentic platform features supporting next-generation applications.
Team Leadership: Guide and mentor cross-functional engineering teams through best practices, technical reviews, and continuous learning.
Solution Delivery: Drive the successful execution of platform initiatives, ensuring alignment with business and product objectives.
Technology Expertise: Act as a subject matter expert across cloud platforms, open-source tools, edge technologies, and IaC.
Collaboration and Communication: Work closely with product managers, engineers, DevOps, and stakeholders to define technical requirements and ensure architectural integrity.
Project Management: Oversee technical timelines, dependencies, and deliverables for platform-centric projects.
Documentation: Maintain high-quality architectural documentation, design patterns, and technical standards.

Required Skills and Experience

Hands-on with agentic systems: Claude Agent SDK, MCP, LangGraph, or comparable LLM-driven orchestration.
Running AI/ML workloads on Kubernetes: GPU scheduling, model serving, developer workbench platforms.
GitOps at scale: ArgoCD ApplicationSets, Helm, Kustomize.

Example Projects You Might Lead

We are converging the platform into an agentic toolbench: a graph of cooperating agents that pair with human developers, carry context between tenants and clusters, and reach back to edge sites with the same zero-trust guarantees they enjoy in cloud.

Extend the MCP / agent / skill substrate so agents can securely query datasources, tail logs, inspect east-west traffic, and act on cluster state within tenant- and clearance-aware boundaries.
Own the cloud ↔ edge routing story: bidirectional mesh, identity federation, policy, and graph-based context sharing between agents operating on different sides of the boundary.
Onboard AI workloads (inference endpoints, developer workbenches, orchestration services) with the same rigor as the rest of the mesh: STRICT mTLS, admission policy, per-workload IAM, cost attribution, SLOs.
Harden the CI template library and Pulumi IaC that deliver all of this (ApplicationSets, sync-wave ordering, IAM policy, ASG drift detection) as the agent surface grows.

Preferred Qualifications

Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
Certifications in cloud architecture (AWS Certified Solutions Architect, Google Cloud Architect, etc.).
Experience working in US regulated environments (e.g., defense, healthcare, secure control systems).
Experience with agent-to-agent (A2A) protocols, cache-coherence framing for multi-agentic systems, and/or graph-based agent memory practices.

Additional Notes

US citizens only.
Must have an existing Secret clearance or the ability to obtain one.