From 30 to 60 Tokens/Second: How I Got vLLM Running on 2x RTX 3090
📰 Medium · LLM
Learn how to install and run vLLM on 2x RTX 3090 to double LLM inference throughput from 30 to 60 tokens/second
Action Steps
- Install Ubuntu on a machine with 2x RTX 3090
- Install and configure the NVIDIA drivers and CUDA toolkit
- Build and install vLLM following the article's instructions
- Launch vLLM across both GPUs and confirm it serves requests
- Benchmark throughput to validate the 60 tokens/second target
- Tune vLLM settings (e.g. batching, GPU memory utilization) for further gains
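The steps above might look roughly like the following on Ubuntu. This is a sketch, not the article's exact commands: the model name and flag values are illustrative assumptions, and the article may build vLLM from source rather than installing from PyPI.

```shell
# Install vLLM into a fresh virtual environment (assumes NVIDIA
# drivers and CUDA are already installed and working).
python3 -m venv vllm-env
source vllm-env/bin/activate
pip install vllm

# Serve a model split across both RTX 3090s via tensor parallelism.
# The model name and flag values are illustrative assumptions.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --tensor-parallel-size 2 \
    --gpu-memory-utilization 0.90
```

The server exposes an OpenAI-compatible API (port 8000 by default), so throughput can be validated with the benchmark scripts in the vLLM repository or any client that reports tokens/second.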
Who Needs to Know This
This guide benefits AI engineers and researchers working with large language models, as it provides a step-by-step approach to optimizing vLLM performance on high-end hardware
Key Insight
💡 With the right hardware and configuration, vLLM can achieve significant performance gains, making it suitable for demanding LLM applications
Share This
💡 Boost vLLM performance to 60 tokens/second on 2x RTX 3090!
DeepCamp AI