Featherless AI
About the Role
We’re looking for an AI Researcher focused on multilingual data to help us build and scale next-generation language models across diverse languages and domains. You’ll own research and execution around data sourcing, curation, evaluation, and training strategies for multilingual and low-resource languages, with a strong emphasis on publishing high-quality research and translating it into production systems.
This role is ideal for someone who enjoys working close to the frontier: balancing papers, prototypes, and real-world impact in a fast-moving startup environment.
What You’ll Do
- Design and execute research on multilingual datasets, including data collection, filtering, deduplication, and quality measurement
- Develop strategies for low-resource and long-tail languages (sampling, augmentation, curriculum design)
- Research and improve cross-lingual transfer, alignment, and robustness in large language models
- Build and maintain evaluation benchmarks for multilingual performance
- Collaborate with engineers and researchers on training pipelines and model architecture decisions
- Publish research at top venues (e.g., ACL, EMNLP, NeurIPS, ICML, ICLR) and contribute to open-source when appropriate
- Translate research insights into practical improvements in production models
What We’re Looking For
- Strong background in NLP / ML research, with a focus on multilingual or cross-lingual modeling
- Publication record at respected conferences or journals (ACL, EMNLP, NeurIPS, ICML, ICLR, etc.)
- Experience working with large-scale text datasets across multiple languages
- Solid understanding of:
  - Tokenization and vocabulary design for multilingual models
  - Data quality metrics, filtering, and dataset bias
  - Transfer learning and multilingual representation learning
- Comfortable prototyping in Python with modern ML frameworks (PyTorch, JAX, etc.)
- Ability to operate independently and ship research in a fast-paced startup environment
Nice to Have
- Experience with low-resource languages or non-Latin scripts
- Open-source contributions in NLP or data tooling
- Experience training or evaluating large language models
- Familiarity with multilingual benchmarks (e.g., XTREME, FLORES, TyDi QA)
Why Join Us
- Real ownership over research direction and impact
- A team that values papers and production
- Access to meaningful scale: large datasets, modern infrastructure, and fast iteration
- Competitive compensation and meaningful equity at an early stage
Originally posted on Himalayas
To apply for this job, please visit himalayas.app.