Job Description
Title: AI Data Engineer
Location: Rockville, MD (Hybrid)
Duration: 12+ months (Contract)

Project Description
The AI Data Engineer designs and implements data pipelines and retrieval systems for a generative AI platform. This role is responsible for ingesting, transforming, and indexing domain content to enable accurate, grounded responses from AI-powered applications. The AI Data Engineer collaborates with agent developers and platform engineers to continuously improve knowledge retrieval quality and coverage.

Key Responsibilities

Data Engineering & ETL
• Design and develop ETL pipelines for ingesting structured and unstructured data sources into searchable knowledge stores
• Build robust, repeatable ingestion workflows that handle document parsing, transformation, and loading at scale
• Implement data quality checks and validation to ensure accuracy and completeness of ingested content
• Utilize AWS services (e.g., S3, Lambda, Step Functions, OpenSearch, Bedrock) to build and operate data pipelines and retrieval infrastructure

RAG Pipeline Development & Search Tuning
• Architect and optimize retrieval-augmented generation (RAG) pipelines, including document chunking strategies, vector embedding generation, and retrieval mechanisms
• Tune search relevance and retrieval quality using vector databases and search engines, iterating on ranking and filtering approaches
• Evaluate retrieval accuracy using evaluation frameworks and custom benchmarks, establishing measurable quality baselines
• Experiment with embedding models, chunking parameters, and hybrid search strategies to continuously improve answer quality

Quality & Testing
• Design and implement test strategies for data pipelines, including validation of ingestion accuracy, data completeness, and transformation correctness
• Develop automated regression tests to detect retrieval quality degradation across pipeline changes
• Build and maintain evaluation benchmarks that measure retrieval precision, recall, and relevance
• Champion test-driven development (TDD) practices for pipeline and integration code

Generative AI & Emerging Technologies
• Stay informed of advances in RAG architectures, embedding models, and retrieval optimization techniques
• Identify opportunities to improve knowledge retrieval through emerging approaches (e.g., contextual retrieval, reranking, hybrid search)
• Collaborate with agent developers to ensure knowledge tools return well-structured, contextually relevant results

Security & Compliance
• Assist with adherence to technology policies and comply with all security controls
• Implement secure coding practices, particularly in handling personally identifiable information (PII) and sensitive regulatory data
• Participate in threat modeling and security discussions for API and infrastructure components
• Understand and apply ***'s security standards and best practices for regulated financial environments