Artizen, Inc.
Job Description
We are seeking an AI/LLM Test Engineer / Quality Specialist to help improve the quality, reliability, and usefulness of responses generated by Large Language Models (LLMs). This role focuses on evaluating and improving AI output quality, not on model training or research.

Responsibilities include designing evaluation frameworks, identifying failure patterns, creating benchmark and test prompts, analyzing response quality, and proposing strategies to improve accuracy, reasoning, consistency, safety, and user experience.

The ideal candidate has strong analytical thinking, experience with prompt engineering or AI testing, and creative ideas for measuring and improving real-world LLM performance.

Key skills include:
- Experience with AI/LLM evaluation and testing
- Prompt engineering and test case design
- QA methodologies and quality metrics
- Strong analytical and communication skills
- Familiarity with AI evaluation tools, automation, or data analysis
- Ability to identify edge cases, hallucinations, and response weaknesses