About Us
We are a stealth‑mode startup building next‑generation infrastructure for the AI industry. Our team has decades of experience in software, systems, and deep tech. We are working on a new kind of AI runtime that pushes the boundaries of performance and flexibility, making advanced models portable, efficient, and customizable for real‑world deployment.
If you want to be part of a small, fast‑moving team shaping the future of applied AI systems, this is your opportunity.
Role
We are looking for a C++ Engineer with a strong background in systems and GPU programming to help extend and optimize an open‑source AI inference runtime. You will work on the low‑level internals of large language model serving, focusing on:
- Dynamic adapter integration (e.g., LoRA/QLoRA)
- Incremental model update mechanisms
- Multi‑session inference caching and scheduling
- GPU performance improvements (Tensor Cores, CUDA/ROCm)
Th...