Found Description
About The Role
In this role, you will contribute to the design, development, and optimization of high-performance GPU-accelerated algorithms for advanced numerical methods and computer vision applications. You will work closely with domain experts and engineering teams to translate complex mathematical models into efficient, scalable, and vector-friendly solutions, moving ideas from Python prototypes to production-grade C++ implementations. You will play an important role in improving application performance on modern NVIDIA GPU architectures, influencing technical decisions around CUDA kernel optimization, mixed-precision computing, task-based runtimes, and multi-GPU execution.
Responsibilities
- Design, implement, and tune high-performance CUDA kernels
- Profile, benchmark, and iterate using Nsys, Nsight
- Collaborate with domain scientists to translate mathematical models into vector-friendly algorithms
- Prototype new ap...