AI Quality Evaluator (Polish) | $15/hr Remote

Crossing Hurdles

Full-time · Remote · Poland, NY, United States · Posted: 2026-05-11 · Until: 2026-07-10


Job Description

Responsibilities

Evaluate AI model responses for personalization quality, including grounding, integration, and helpfulness.
Design and execute multi-turn prompts based on personal context to test AI capabilities.
Analyze responses for hallucinations, incorrect personalization, and poor inferences.
Perform side-by-side comparison of model outputs to determine quality and effectiveness.
Write clear and structured rationales for response evaluations and rankings.
Extract and verify debug information to ensure proper use of data sources.
Maintain strict data hygiene and ensure accurate documentation of evaluations.
Collaborate with cross-functional teams to improve AI model performance.

Requirements

Strong proficiency in Polish with excellent reading and writing skills.
Experience in data annotation, AI evaluation, content moderation, or a related role.
Strong analytical thinking and ability to assess nuanced AI responses.
Ability to design creative, multi-turn prompts based on personal context.
Understanding of personalization concepts, including identifying incorrect or forced personalization.
High attention to detail in evaluating subtle differences in model outputs.
Excellent written communication and structured reasoning skills.
Ability to work independently in a remote environment.
Willingness to use a personal Google account for evaluation purposes.
Full-time availability with at least 4 hours overlap with PST.
Bachelor’s degree or equivalent experience in a relevant analytical field.