Building MLC-LLM from Source on NVIDIA Jetson AGX Orin 32GB
In my previous post, I wrote about building MLC-LLM from source with uv on Ubuntu 24.04 on a desktop machine with a Ryzen 9 7950X and an RTX 4090. This time, I wanted to document a similar workflow on a very different system: the NVIDIA Jetson AGX Orin 32GB.
The overall flow turned out to be quite similar:
- install system dependencies
- install LLVM
- create a Python environment with uv
- build TVM from source
- build MLC-LLM from source on top of that TVM build
What changes is the target environment. On the desktop build, I used CUDA architecture settings for a discrete NVIDIA GPU. On Jetson AGX Orin, I used a Jetson-specific CUDA architecture setting and kept the build focused on the CUDA path I actually needed.
This post is a memo-style walkthrough of the exact process I used.
Target machine
This build was done on:
- NVIDIA Jetson AGX Orin 32GB
I am only covering the MLC-LLM and TVM build steps here. I assume the Jetson system already has a working NVIDIA software stack appropriate for CUDA development.
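Before starting, it is worth sanity-checking that stack. The file path and tool names below assume a standard JetPack install; adjust them if your setup differs:

```shell
# Sanity-check the Jetson software stack (assumes a standard JetPack install):
cat /etc/nv_tegra_release 2>/dev/null || echo "no /etc/nv_tegra_release (not an L4T system?)"
nvcc --version 2>/dev/null || echo "nvcc not on PATH; try /usr/local/cuda/bin/nvcc"
```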
Install system dependencies
First, I updated the package index and installed the build dependencies:
sudo apt update
sudo apt install -y \
build-essential git git-lfs curl ca-certificates pkg-config wget \
ccache libtinfo-dev zlib1g-dev libedit-dev libxml2-dev libzstd-dev
git lfs install
As in my Ubuntu desktop build, I installed LLVM 20 via the upstream script. The script adds the apt.llvm.org repository, which is why the LLVM 20 dev packages are installed after it rather than in the first apt call:
wget https://apt.llvm.org/llvm.sh
chmod +x llvm.sh
sudo ./llvm.sh 20
sudo apt install -y llvm-20-dev libpolly-20-dev
I used LLVM 20 here because I wanted the TVM build to target a known LLVM toolchain explicitly through llvm-config-20.
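A quick check confirms that the toolchain TVM will be pointed at is actually installed:

```shell
# The binary that TVM's USE_LLVM setting refers to should now exist:
command -v llvm-config-20 && llvm-config-20 --version || echo "llvm-config-20 not found"
```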
Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
Create a workspace and virtual environment
I already had a ~/work directory, so I created the virtual environment there:
cd ~/work
uv venv -p 3.13 .venv
source .venv/bin/activate
Then I installed the Python-side build tools:
uv pip install cmake ninja setuptools
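With the environment activated, the build tools should resolve from the venv rather than from any system-wide copies:

```shell
# cmake and ninja should now resolve from the virtual environment:
which cmake ninja || echo "activate the venv first: source .venv/bin/activate"
cmake --version 2>/dev/null | head -n1
ninja --version 2>/dev/null || true
```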
Clone MLC-LLM
After that, I cloned the repository with submodules:
git clone --recursive https://github.com/mlc-ai/mlc-llm
Build and install TVM from source
Before building MLC-LLM itself, I built TVM from the repository’s 3rdparty/tvm directory.
cd mlc-llm/3rdparty/tvm
Install tvm-ffi
The first step was installing tvm-ffi in editable mode:
uv pip install --editable 3rdparty/tvm-ffi --verbose --config-setting editable=compat \
--config-setting cmake.args="-G Ninja" \
--config-setting cmake.args="-DCMAKE_BUILD_TYPE=RelWithDebInfo" \
--config-setting cmake.args="-DTVM_FFI_ATTACH_DEBUG_SYMBOLS=ON" \
--config-setting cmake.args="-DTVM_FFI_BUILD_TESTS=OFF" \
--config-setting cmake.args="-DTVM_FFI_BUILD_PYTHON_MODULE=ON" \
--config-setting cmake.args="-DCMAKE_C_COMPILER_LAUNCHER=ccache" \
--config-setting cmake.args="-DCMAKE_CXX_COMPILER_LAUNCHER=ccache"
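Before moving on, a quick import check confirms the editable install is usable. The module name `tvm_ffi` is my assumption based on the package name; run this inside the venv:

```shell
# Verify the editable tvm-ffi install is importable:
python -c "import tvm_ffi; print(tvm_ffi.__file__)" 2>/dev/null \
  || echo "tvm_ffi not importable; re-check the editable install"
```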
Install TVM
Next, I installed TVM itself:
uv pip install --editable . --verbose --config-setting editable=compat \
--config-setting cmake.args="-G Ninja" \
--config-setting cmake.args="-DCMAKE_BUILD_TYPE=RelWithDebInfo" \
--config-setting cmake.args="-DUSE_LLVM=llvm-config-20 --link-static" \
--config-setting cmake.args="-DHIDE_PRIVATE_SYMBOLS=ON" \
--config-setting cmake.args="-DUSE_CUDA=ON" \
--config-setting cmake.args="-DCMAKE_CUDA_ARCHITECTURES=87" \
--config-setting cmake.args="-DUSE_CUBLAS=ON" \
--config-setting cmake.args="-DUSE_CUTLASS=ON" \
--config-setting cmake.args="-DUSE_THRUST=ON" \
--config-setting cmake.args="-DUSE_NVTX=ON"
The key Jetson-specific difference from my desktop build was the CUDA architecture setting:
-DCMAKE_CUDA_ARCHITECTURES=87
In the Ubuntu desktop post, I used a broader architecture configuration suitable for that machine. On Jetson AGX Orin, I kept it targeted to the architecture I needed.
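To double-check that 87 is the right value for your board, a minimal CUDA program can report the device's compute capability. This is my own sketch, assuming nvcc from the JetPack CUDA toolkit is on PATH:

```shell
# Write and run a tiny CUDA program that prints the device's compute capability:
cat > /tmp/compute_cap.cu <<'EOF'
#include <cstdio>
#include <cuda_runtime.h>
int main() {
    cudaDeviceProp prop;  // filled in by the CUDA runtime
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return 1;
    printf("sm_%d%d\n", prop.major, prop.minor);  // AGX Orin should report sm_87
    return 0;
}
EOF
nvcc /tmp/compute_cap.cu -o /tmp/compute_cap && /tmp/compute_cap \
  || echo "nvcc not found or no CUDA device visible"
```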
Verify that TVM installed correctly
Before moving on, I verified that TVM had CUDA enabled and could actually detect the CUDA device:
python -c "import tvm; print('USE_CUDA:', tvm.support.libinfo().get('USE_CUDA')); print('tvm.cuda().exist:', tvm.cuda().exist)"
On my system, this returned:
USE_CUDA: ON
tvm.cuda().exist: True
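The TVM runtime can also report the device's compute capability directly, which should line up with the CMAKE_CUDA_ARCHITECTURES value above (compute_version is, to my knowledge, a property of TVM's Device object):

```shell
# Cross-check the detected compute capability against the build setting:
python -c "import tvm; print('compute_version:', tvm.cuda(0).compute_version)" 2>/dev/null \
  || echo "TVM not importable here; activate the venv where it was installed"
```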
Install the MLC-LLM Python package
With TVM in place, I went back to the MLC-LLM root and installed the main package:
cd ../..
uv pip install --editable . --verbose --config-setting editable=compat \
--config-setting cmake.args="-G Ninja" \
--config-setting cmake.args="-DCMAKE_BUILD_TYPE=RelWithDebInfo" \
--config-setting cmake.args="-DTVM_SOURCE_DIR='3rdparty/tvm'" \
--config-setting cmake.args="-DUSE_CUDA=ON" \
--config-setting cmake.args="-DUSE_CUTLASS=ON" \
--config-setting cmake.args="-DUSE_CUBLAS=ON" \
--config-setting cmake.args="-DUSE_VULKAN=OFF" \
--config-setting cmake.args="-DUSE_METAL=OFF" \
--config-setting cmake.args="-DUSE_OPENCL=OFF" \
--config-setting cmake.args="-DUSE_OPENCL_ENABLE_HOST_PTR=OFF" \
--config-setting cmake.args="-DUSE_THRUST=ON" \
--config-setting cmake.args="-DCMAKE_CUDA_ARCHITECTURES=87" \
--config-setting cmake.args="-DFLASHINFER_CUDA_ARCHITECTURES=87"
uv pip install --editable python --verbose --config-setting editable=compat
Verify that MLC-LLM installed successfully
Finally, I checked that the CLI and Python package were available:
mlc_llm chat -h
python -c "import mlc_llm; print(mlc_llm)"
If both commands work, the source build is in a good state.
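As one more smoke test, importing the high-level engine class from the MLC-LLM Python API exercises the freshly built native libraries a bit more than a bare module import. The commented chat invocation uses an example model id from MLC's prebuilt q4f16_1 conversions; the first run downloads the weights and compiles the model for the local GPU:

```shell
# Load the engine class to exercise the native libraries:
python -c "from mlc_llm import MLCEngine; print(MLCEngine)" 2>/dev/null \
  || echo "mlc_llm import failed; re-check the editable installs above"
# From here you can chat with a prebuilt model, e.g. (example model id,
# downloads weights on first run):
#   mlc_llm chat HF://mlc-ai/Llama-3.2-1B-Instruct-q4f16_1-MLC
```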
Related: [Compile and Run Open Source LLMs with MLC-LLM on a Desktop PC](/blog/2026-03-11-compile-and-run-open-source-llms-with-mlc-llm-on-rtx-4090.md)