Job Description
Our client is a fast-growing AI research and technology company building reasoning-first, agentic AI systems, with a footprint across the US and Asia. The team is behind several widely adopted open-source research agents that have posted top-tier results on industry benchmarks, and its scientific leadership comes from top US universities and frontier AI labs. Backed by a serial entrepreneur with a track record of building category-defining tech companies, the company is now scaling its compute infrastructure to support next-generation training and inference workloads.
The Role
Build and evolve the core infrastructure layer for large-scale AI training and inference on clusters of 10,000+ GPUs: the Kubernetes scheduling, storage, networking, and reliability engineering that make massive shared compute efficient, reliable, and easy to operate for research and engineering teams.
What You'll Do
- Build and evolve Kub...