
Freelance Agent Evaluation Engineer

  • Part Time
  • Anywhere

Mindrift

We’re building a dataset to evaluate AI coding agents by creating challenging tasks and evaluation criteria within realistic simulated environments. As a Freelance Agent Evaluation Engineer, you’ll design tasks, write tests, and iterate with AI agents to evaluate their performance.

Requirements

  • Degree in Computer Science, Software Engineering, or related fields
  • 5+ years in software development, primarily Python (FastAPI, pytest, async/await, subprocess, file operations)
  • Background in full-stack development, with experience building React-based interfaces (JavaScript/TypeScript) and robust back-end systems
  • Experience writing functional and integration tests, not just running them
  • Experience with Docker containers and familiarity with infrastructure tools (Postgres, Kafka, Redis)
  • CI/CD understanding (GitHub Actions as a user: triggers, labels, reading results)
  • English proficiency – B2

Benefits

  • Part-time, project-based work
  • Up to $21 per hour equivalent
  • Flexibility to work at your own pace

Originally posted on Himalayas

To apply for this job, please visit himalayas.app.

