
Building MLC-LLM from Source with uv on Ubuntu 24.04

I wanted a source build of MLC-LLM that was transparent, reproducible, and easy to rebuild while iterating on the code. This is the setup that worked for me on Ubuntu 24.04, using uv to manage the Python environment and package installs.

This post focuses on the build itself. My machine already had a working NVIDIA setup, so I am not covering driver or CUDA toolkit installation here.

Machine I used

  • CPU: AMD Ryzen 9 7950X 16-Core Processor
  • GPU: NVIDIA RTX 4090
  • Memory: 64GB
  • Storage: 4TB SSD
  • OS: Ubuntu 24.04

Install system dependencies

First, I updated the system and installed the packages I needed for building TVM and MLC-LLM:

sudo apt update
sudo apt install -y \
  build-essential git git-lfs curl ca-certificates pkg-config wget \
  ccache libtinfo-dev zlib1g-dev libedit-dev libxml2-dev libzstd-dev

git lfs install

I also installed LLVM 20 from apt.llvm.org:

wget https://apt.llvm.org/llvm.sh
chmod +x llvm.sh
sudo ./llvm.sh 20

sudo apt install -y llvm-20-dev libpolly-20-dev

The llvm.sh helper script adds the apt.llvm.org repository for you; depending on your base Ubuntu image, that step may be required before llvm-20-dev and libpolly-20-dev become installable. LLVM 20 was the toolchain I wanted to target, so I point the TVM build at llvm-config-20 later on.

Create a workspace and Python environment

I created a working directory, then used uv to create a virtual environment with Python 3.13:

mkdir -p ~/work && cd ~/work
uv venv --python 3.13
source .venv/bin/activate

Then I installed the Python-side build tools:

uv pip install cmake ninja setuptools

Clone the repository

Next, I cloned MLC-LLM with submodules:

git clone --recursive https://github.com/mlc-ai/mlc-llm
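If the clone was made without --recursive, or a submodule fetch was interrupted, the submodules can be brought in after the fact:

```shell
# Fetch any missing submodules; safe to re-run, it is a no-op when
# everything is already checked out.
cd mlc-llm
git submodule update --init --recursive
```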

Build and install TVM from source

Before installing MLC-LLM itself, I built TVM from the vendored source tree under 3rdparty/tvm.

I did this in two steps:

  1. install tvm-ffi
  2. install TVM itself

Install tvm-ffi

cd mlc-llm/3rdparty/tvm

uv pip install --editable 3rdparty/tvm-ffi --verbose --config-setting editable=compat \
  --config-setting cmake.args="-G Ninja" \
  --config-setting cmake.args="-DCMAKE_BUILD_TYPE=RelWithDebInfo" \
  --config-setting cmake.args="-DTVM_FFI_ATTACH_DEBUG_SYMBOLS=ON" \
  --config-setting cmake.args="-DTVM_FFI_BUILD_TESTS=OFF" \
  --config-setting cmake.args="-DTVM_FFI_BUILD_PYTHON_MODULE=ON" \
  --config-setting cmake.args="-DCMAKE_C_COMPILER_LAUNCHER=ccache" \
  --config-setting cmake.args="-DCMAKE_CXX_COMPILER_LAUNCHER=ccache"

A few details here were intentional:

  • I used RelWithDebInfo instead of a pure release build so I could keep debug symbols around.
  • I enabled ccache so rebuilds would be faster.
  • I installed in editable mode because this is a source-oriented workflow.

Install TVM

uv pip install --editable . --verbose --config-setting editable=compat \
  --config-setting cmake.args="-G Ninja" \
  --config-setting cmake.args="-DCMAKE_BUILD_TYPE=RelWithDebInfo" \
  --config-setting cmake.args="-DUSE_LLVM=llvm-config-20 --link-static" \
  --config-setting cmake.args="-DHIDE_PRIVATE_SYMBOLS=ON" \
  --config-setting cmake.args="-DUSE_CUDA=ON" \
  --config-setting cmake.args="-DCMAKE_CUDA_ARCHITECTURES=87;89" \
  --config-setting cmake.args="-DUSE_CUBLAS=ON" \
  --config-setting cmake.args="-DUSE_CUTLASS=ON" \
  --config-setting cmake.args="-DUSE_THRUST=ON" \
  --config-setting cmake.args="-DUSE_NVTX=ON"

This is where the build becomes explicitly CUDA-oriented. On this machine, I wanted TVM to build with:

  • LLVM support
  • CUDA enabled
  • cuBLAS enabled
  • CUTLASS enabled
  • Thrust enabled
  • NVTX enabled

Verify that TVM can see CUDA

Before going any further, I checked that TVM actually installed correctly and could see CUDA:

python -c "import tvm; print('USE_CUDA:', tvm.support.libinfo().get('USE_CUDA')); print('tvm.cuda().exist:', tvm.cuda().exist)"

On my system, I got:

USE_CUDA: ON
tvm.cuda().exist: True
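The same libinfo() dictionary can confirm that the optional CUDA libraries made it into the build; a slightly broader variant of the check above:

```shell
# Print each CUDA-related build flag TVM was compiled with; all of these
# should come back ON given the cmake.args used above.
python - <<'EOF'
import tvm
info = tvm.support.libinfo()
for key in ("USE_CUDA", "USE_CUBLAS", "USE_CUTLASS", "USE_THRUST", "USE_NVTX"):
    print(key, info.get(key))
EOF
```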

Install the MLC-LLM Python package

Once TVM was in place, I moved back to the MLC-LLM root and installed the main package from source:

cd ../..

uv pip install --editable . --verbose --config-setting editable=compat \
  --config-setting cmake.args="-G Ninja" \
  --config-setting cmake.args="-DCMAKE_BUILD_TYPE=RelWithDebInfo" \
  --config-setting cmake.args="-DTVM_SOURCE_DIR='3rdparty/tvm'" \
  --config-setting cmake.args="-DUSE_CUDA=ON" \
  --config-setting cmake.args="-DUSE_CUTLASS=ON" \
  --config-setting cmake.args="-DUSE_CUBLAS=ON" \
  --config-setting cmake.args="-DUSE_VULKAN=OFF" \
  --config-setting cmake.args="-DUSE_METAL=OFF" \
  --config-setting cmake.args="-DUSE_OPENCL=OFF" \
  --config-setting cmake.args="-DUSE_OPENCL_ENABLE_HOST_PTR=OFF" \
  --config-setting cmake.args="-DUSE_THRUST=ON" \
  --config-setting cmake.args="-DCMAKE_CUDA_ARCHITECTURES=87;89" \
  --config-setting cmake.args="-DFLASHINFER_CUDA_ARCHITECTURES=87;89"

uv pip install --editable python --verbose --config-setting editable=compat

A few choices are worth calling out here:

  • I pointed the build at the vendored TVM tree with -DTVM_SOURCE_DIR='3rdparty/tvm'.
  • I explicitly disabled backends I did not need on this machine, such as Vulkan, Metal, and OpenCL.
  • I kept the CUDA architecture configuration aligned across the build.

Verify that MLC-LLM installed correctly

After the build finished, I verified both the CLI and the Python package:

mlc_llm chat -h
python -c "import mlc_llm; print(mlc_llm)"

If those two checks pass, the source install is in a good state.

Notes on the build flags

A few of the flags I used are worth understanding:

  • -DCMAKE_BUILD_TYPE=RelWithDebInfo
    A good compromise between optimized binaries and usable debug symbols.

  • -DUSE_LLVM=llvm-config-20 --link-static
    Tells TVM exactly which LLVM toolchain to use; the --link-static suffix makes TVM link the LLVM libraries statically instead of depending on a shared libLLVM at runtime.

  • -DCMAKE_CUDA_ARCHITECTURES=87;89 and -DFLASHINFER_CUDA_ARCHITECTURES=87;89
    These control which GPU architectures are targeted during compilation; 89 corresponds to the RTX 4090's compute capability 8.9. This was the configuration I used successfully on my machine.

Final thoughts

On Ubuntu 24.04, building MLC-LLM from source with uv was straightforward once I treated it as a two-stage process:

  1. build TVM correctly with LLVM and CUDA
  2. build MLC-LLM against that TVM tree

If your goal is a reproducible, source-based MLC-LLM setup on a CUDA machine, this workflow is a solid starting point.