Member of Technical Staff, TPU or AMD GPU Performance Engineering

Inferact

Singapore, Singapore, Singapore · Full-time · June 30, 2026

Job Description

Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. Founded by the creators and core maintainers of vLLM, we sit at the intersection of models and hardware, a position that took years to build.

About the Role

We're looking for an AMD GPU performance engineer to make vLLM a first-class inference engine across the AMD accelerator ecosystem. You'll build and optimize AMD GPU backends, kernels, runtime paths, and benchmarking infrastructure using ROCm, HIP, Triton, CK, AITER, and related tooling so vLLM can deliver frontier inference performance on AMD GPUs.

You’ll work at the boundary of inference systems, kernels, compilers, and hardware architecture, improving performance‑critical paths such as attention, GEMM, sampling, KV cache, and communication‑heavy operations. Your work will help make AMD GPU support in vLLM usable, fast, benchmarked, and maintainable.

Skills and...
