Job Description
Confidential is looking for an expert in optimizing large language model (LLM) performance. In this role, you will optimize LLM inference for cost, latency, and throughput, and profile and tune GPU performance at a deep level. You will work closely with the model and platform teams to improve the performance of the model architecture.
Requirements include extensive experience in deep-learning inference optimization, hands-on GPU programming, and fluency in modern LLM serving stacks. The role is central to running LLMs in high-performance production environments.
Ready to Apply?
Submit your application for the Senior LLM Inference & GPU Optimization Engineer role at Confidential.