About the job
As a member of our AI model team, you will drive innovation in model serving and inference architectures for advanced AI systems. Your work will focus on optimizing model deployment and inference strategies to deliver highly responsive, efficient, and scalable performance across real‑world applications. You will work on a wide spectrum of systems, ranging from resource‑efficient models designed for limited hardware environments to complex, multi‑modal architectures that integrate data such as text, images, and audio.
Responsibilities
- Design and deploy state‑of‑the‑art model serving architectures that deliver high throughput and low latency while optimizing memory usage. Ensure these pipelines run efficiently across diverse environments, including resource‑constrained devices and edge platforms. Establish clear performance targets, such as lower latency, faster token response, and a smaller memory footprint.
- Build, run, ...
Ready to Apply?
Submit your application for AI Research Engineer (Kernel & Inference Optimization) at Tether Operations Limited