Gramian Consulting Group

We are looking for an AI Evaluation Engineer specializing in data analysis to design benchmark tasks that simulate real-world analytical workflows.

Design and develop multi-agent benchmark tasks focused on complex data analysis workflows
Create or curate realistic datasets (CSV, JSON, logs, reports, financial or operational data)
Implement evaluation pipelines using Python and SQL
Create reproducible environments using Docker
Analyze task performance and refine tasks for clarity, difficulty, and scoring accuracy

Originally posted on Himalayas

To apply for this job, please visit himalayas.app.