Software Engineer - Pretraining Data
Company: Magic
Location: Seattle
Posted on: May 24, 2025
Job Description:
Magic's mission is to build safe AGI that accelerates humanity's
progress on the world's most important problems. We believe the
most promising path to safe AGI lies in automating research and
code generation to improve models and solve alignment more reliably
than humans can alone. Our approach combines frontier-scale
pre-training, domain-specific RL, ultra-long context, and
inference-time compute to achieve this goal.
About the role:
As a Software Engineer working on our pretraining data, you will write
efficient and robust pipelines for giant, multimodal datasets. You
will develop and optimize web scraping techniques to harvest and
maintain data at internet scale.
What you might work on:
- Design and implement multimodal (video, audio, text, etc.) web
crawlers for scraping and indexing petabytes of data
- Create large-scale data processing pipelines using tools like
Ray, Apache Spark, Apache Flink, Google BigQuery, etc.
- Implement and scale deduplication techniques across modalities
and apply heuristic and model-based techniques for parsing and
filtering crawled data
- Identify new data sources for inclusion in pre/post-training
datasets
What we're looking for:
- Strong proficiency in distributed computing and parallel
processing techniques
- Obsession with details, reliability, and good testing to ensure
data quality and integrity
- Experience with designing and maintaining high-performance,
scalable data architectures
- Ability to design, develop and operate an LLM data pipeline
from web scraping to data loading
Magic strives to be the place where high-potential individuals can
do their best work. We value quick learning and grit just as much as
skill and experience.
Our culture:
- Integrity. Words and actions should be aligned
- Hands-on. At Magic, everyone is building
- Teamwork. We move as one team, not N individuals
- Focus. Safely deploy AGI. Everything else is noise
- Quality. Magic should feel like magic
Compensation, benefits and perks (US):
- Annual salary range: $100K - $550K
- Equity is a significant part of total compensation, in addition
to salary
- 401(k) plan with 6% salary matching
- Generous health, dental and vision insurance for you and your
dependents
- Unlimited paid time off
- Visa sponsorship and relocation stipend to bring you to SF, if
possible
- A small, fast-paced, highly focused team