
Senior Data Engineer - AI

Everest Technologies
Full-time · Remote · United States · Posted: 2026-05-11 · Open until: 2026-07-10
Job Description
We are looking for a Data Engineer to build the "memory" and "knowledge" backbone of our Agentic AI ecosystem. You will be responsible for designing the data pipelines that feed our MCP servers, ensuring that AI agents managed via MCP have real-time access to accurate, secure, and contextually relevant enterprise data.

Key Responsibilities

- Context Engineering: Design and optimize data schemas specifically for LLM consumption, ensuring that data retrieved via MCP is structured to minimize token usage and maximize reasoning accuracy (see the first sketch below).
- Hybrid Pipeline Development: Build robust data pipelines using Python (for AI/ML workflows) and C#/.NET (for enterprise integration) to move data from legacy systems into AI-ready formats.
- Vector Database Management: Implement and maintain vector databases (e.g., Pinecone, Weaviate, or Milvus) to support Retrieval-Augmented Generation (RAG) alongside live MCP data access (see the retrieval sketch below).
- Data Governance for AI: Work with the MCP layer to enforce data masking, PII redaction, and fine-grained access control before data reaches an LLM (see the redaction sketch below).
- Metadata Orchestration: Manage the OpenAPI and MCP metadata that allows AI agents to "understand" the data they are querying (see the metadata sketch below).

Technical Qualifications

- Languages: Expert-level Python (Pandas, PySpark, SQLAlchemy) and strong familiarity with C# for interacting with .NET-based data layers.
- AI Data Stack: Hands-on experience with vector databases and embedding models.
- API Management: Understanding of how data is exposed through MCP servers and secured via MCP-specific authorization flows.
- Modern Data Stack: Experience with SQL/NoSQL databases, dbt, and cloud data warehouses (Snowflake, BigQuery, or Databricks).
- Protocol Knowledge: Familiarity with the Model Context Protocol (MCP) and how it standardizes data retrieval for AI agents.

Preferred Skills

- Experience building knowledge graphs to provide relational context to AI agents.
- Familiarity with semantic caching to reduce LLM costs and improve response times (see the caching sketch below).
- Knowledge of observability tooling for monitoring data drift in agentic conversations.
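To make the responsibilities above concrete, a few illustrative sketches follow. All names, fields, and thresholds in them are invented for the examples, and stub functions stand in for real models and stores.

Context engineering, first: a minimal sketch of trimming a legacy record down to only the fields an agent needs and rendering them as compact "key: value" lines, which keeps token usage low without losing the facts the model reasons over.

    # Hypothetical record as it might arrive from a legacy system.
    raw_record = {
        "customer_id": "C-1042",
        "customer_full_legal_name": "Acme Industrial Holdings, LLC",
        "internal_audit_blob": "large payload the agent never needs",
        "open_invoice_total_usd": 18250.75,
        "last_contact_date": "2026-04-30",
    }

    # Fields assumed relevant to the agent's task.
    LLM_FIELDS = ["customer_id", "customer_full_legal_name",
                  "open_invoice_total_usd", "last_contact_date"]

    def to_llm_context(record: dict) -> str:
        """Render a record as compact key: value lines, dropping unused fields."""
        return "\n".join(f"{k}: {record[k]}" for k in LLM_FIELDS if k in record)

    print(to_llm_context(raw_record))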
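The RAG side of vector database management follows the same pattern everywhere: embed a query, find the nearest stored chunks, and pass them to the model as context. The sketch below keeps the index in memory and uses a deterministic stub in place of a real embedding model; a production system would call an actual model and a managed store such as Pinecone, Weaviate, or Milvus.

    import hashlib
    import numpy as np

    def embed(text: str, dim: int = 64) -> np.ndarray:
        """Stub embedding: a deterministic unit vector per text.
        A real pipeline would call an embedding model here."""
        seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
        v = np.random.default_rng(seed).standard_normal(dim)
        return v / np.linalg.norm(v)

    # Toy document store: (chunk text, embedding) pairs.
    chunks = [
        "Q1 revenue grew 12% year over year.",
        "The VPN rollout completes in June.",
        "Invoice disputes are handled by the billing team.",
    ]
    index = [(c, embed(c)) for c in chunks]

    def retrieve(query: str, k: int = 2) -> list[str]:
        """Return the k chunks whose embeddings are closest to the query's
        (cosine similarity; vectors are already unit length)."""
        q = embed(query)
        ranked = sorted(index, key=lambda pair: float(q @ pair[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

    retrieve("Who resolves billing disputes?")  # top-k context for the LLM

The stub makes the ranking structurally correct but semantically meaningless; only a real embedding model places similar texts near each other in vector space.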
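Governance before LLM handoff often starts as a redaction pass in the pipeline. A regex sketch for two assumed PII shapes (email addresses and US-style SSNs) follows; real deployments typically layer NER-based detection and policy-driven access control on top.

    import re

    # Assumed patterns; production rulesets would be far broader.
    PII_PATTERNS = {
        "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
        "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    }

    def redact(text: str) -> str:
        """Replace each PII match with a typed placeholder before LLM handoff."""
        for label, pattern in PII_PATTERNS.items():
            text = pattern.sub(f"[{label} REDACTED]", text)
        return text

    redact("Reach Jane at jane.doe@example.com, SSN 123-45-6789.")
    # -> 'Reach Jane at [EMAIL REDACTED], SSN [SSN REDACTED].'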
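For metadata orchestration, the Model Context Protocol describes each tool a server exposes with a name, a description the model reads, and a JSON Schema for its inputs. The descriptor below is a hedged sketch with invented field values, not a definitive server implementation.

    # Hypothetical MCP-style tool descriptor: name, LLM-readable description,
    # and a JSON Schema telling the agent what arguments the tool accepts.
    invoice_lookup_tool = {
        "name": "lookup_open_invoices",
        "description": "Return open invoices for a customer, newest first.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string",
                                "description": "Internal customer key"},
                "limit": {"type": "integer", "default": 10},
            },
            "required": ["customer_id"],
        },
    }

Well-written descriptions here do double duty: they are the "understanding" the agents get of the data behind each tool.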
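Finally, semantic caching: reuse a previous LLM answer when a new prompt lands close enough in embedding space, rather than requiring an exact string match. A minimal sketch, reusing the stub-embedding idea from the retrieval example and an invented 0.9 similarity threshold:

    import hashlib
    import numpy as np

    def embed(text: str, dim: int = 64) -> np.ndarray:
        """Stub embedding, as in the retrieval sketch; deterministic per text."""
        seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
        v = np.random.default_rng(seed).standard_normal(dim)
        return v / np.linalg.norm(v)

    cache: list[tuple[np.ndarray, str]] = []  # (prompt embedding, cached answer)
    THRESHOLD = 0.9  # hypothetical similarity cutoff

    def answer(prompt: str) -> str:
        """Return a cached answer for a semantically similar prompt,
        otherwise call the (stubbed) LLM and cache the result."""
        q = embed(prompt)
        for vec, cached in cache:
            if float(q @ vec) >= THRESHOLD:
                return cached  # cache hit: the LLM call is skipped
        result = f"(model output for: {prompt})"  # stand-in for a real LLM call
        cache.append((q, result))
        return result

With a real embedding model, near-duplicate prompts ("what's our refund policy" vs. "what is our refund policy?") hit the cache, saving both latency and per-token cost.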