Full Time
Remote
Posted 4 weeks ago

Featherless AI

About the Role

We’re looking for an AI Researcher focused on multilingual data to help us build and scale next-generation language models across diverse languages and domains. You’ll own research and execution around data sourcing, curation, evaluation, and training strategies for multilingual and low-resource languages, with a strong emphasis on publishing high-quality research and translating it into production systems.

This role is ideal for someone who enjoys working close to the frontier: balancing papers, prototypes, and real-world impact in a fast-moving startup environment.

What You’ll Do

Design and execute research on multilingual datasets, including data collection, filtering, deduplication, and quality measurement
Develop strategies for low-resource and long-tail languages (sampling, augmentation, curriculum design)
Research and improve cross-lingual transfer, alignment, and robustness in large language models
Build and maintain evaluation benchmarks for multilingual performance
Collaborate with engineers and researchers on training pipelines and model architecture decisions
Publish research at top venues (e.g., ACL, EMNLP, NeurIPS, ICML, ICLR) and contribute to open source when appropriate
Translate research insights into practical improvements in production models

What We’re Looking For

Strong background in NLP / ML research, with a focus on multilingual or cross-lingual modeling
Publication record at respected conferences or journals (ACL, EMNLP, NeurIPS, ICML, ICLR, etc.)
Experience working with large-scale text datasets across multiple languages
Solid understanding of:
- Tokenization and vocabulary design for multilingual models
- Data quality metrics, filtering, and dataset bias
- Transfer learning and multilingual representation learning
Comfortable prototyping in Python with modern ML frameworks (PyTorch, JAX, etc.)
Ability to operate independently and ship research in a startup-paced environment

Nice to Have

Experience with low-resource languages or non-Latin scripts
Open-source contributions in NLP or data tooling
Experience training or evaluating large language models
Familiarity with multilingual benchmarks (e.g., XTREME, FLORES, TyDi QA)

Why Join Us

Real ownership over research direction and impact
A team that values papers and production
Access to meaningful scale: large datasets, modern infrastructure, and fast iteration
Competitive compensation and meaningful equity at an early stage

Originally posted on Himalayas

To apply for this job, please visit himalayas.app.

AI Researcher – Multilingual Data
