Building MLC-LLM from Source on NVIDIA Jetson AGX Orin 32GB

In my previous post, I wrote about building MLC-LLM from source with uv on Ubuntu 24.04, on a desktop machine with a Ryzen 9 7950X and an RTX 4090. This time, I wanted to document a similar workflow on a very different system: the NVIDIA Jetson AGX Orin 32GB.

The overall flow turned out to be quite similar:

  1. install system dependencies
  2. install LLVM
  3. create a Python environment with uv
  4. build TVM from source
  5. build MLC-LLM from source on top of that TVM build

What changes is the target environment. On the desktop build, I used CUDA architecture settings for a discrete NVIDIA GPU. On Jetson AGX Orin, I used a Jetson-specific CUDA architecture setting and kept the build focused on the CUDA path I actually needed.

This post is a memo-based walkthrough of the exact process I used.

Target machine

This build was done on:

  • NVIDIA Jetson AGX Orin 32GB

I am only covering the MLC-LLM and TVM build steps here. I assume the Jetson system already has a working NVIDIA software stack appropriate for CUDA development.

Install system dependencies

First, I updated the package index and installed the build dependencies:

sudo apt update
sudo apt install -y \
  build-essential git git-lfs curl ca-certificates pkg-config wget \
  ccache libtinfo-dev zlib1g-dev libedit-dev libxml2-dev libzstd-dev \
  llvm-20-dev libpolly-20-dev

git lfs install

As in my Ubuntu desktop build, I also installed LLVM 20 with the official install script. The script configures the apt.llvm.org repository, so if the llvm-20-dev and libpolly-20-dev packages above failed to resolve, run this first and then re-run the apt install:

wget https://apt.llvm.org/llvm.sh
chmod +x llvm.sh
sudo ./llvm.sh 20

I used LLVM 20 here because I wanted the TVM build to target a known LLVM toolchain explicitly through llvm-config-20.
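Before moving on, it is worth a quick check that the toolchain actually resolves; this is a minimal POSIX-sh sketch, nothing TVM-specific:

```shell
# Sanity check: confirm the LLVM 20 toolchain that TVM's USE_LLVM setting
# will reference later is actually on PATH before starting the build.
if command -v llvm-config-20 >/dev/null 2>&1; then
  llvm_status="found: $(llvm-config-20 --version)"
else
  llvm_status="missing"
fi
echo "llvm-config-20 is ${llvm_status}"
```

If this reports missing, re-check the llvm.sh step before attempting the TVM build.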

Install uv

curl -LsSf https://astral.sh/uv/install.sh | sh
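The installer puts uv in ~/.local/bin by default (adjust the path if you chose a different install location); I made sure it was on PATH for the current shell:

```shell
# uv's installer defaults to ~/.local/bin; ensure it is on PATH so the
# `uv` command resolves in this shell session.
export PATH="$HOME/.local/bin:$PATH"
command -v uv >/dev/null 2>&1 || echo "uv not on PATH yet; check the install location"
```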

Create a workspace and virtual environment

I already had a ~/work directory, so I created the virtual environment there:

cd ~/work
uv venv -p 3.13 .venv
source .venv/bin/activate

Then I installed the Python-side build tools:

uv pip install cmake ninja setuptools

Clone MLC-LLM

After that, I cloned the repository with submodules:

git clone --recursive https://github.com/mlc-ai/mlc-llm

Build and install TVM from source

Before building MLC-LLM itself, I built TVM from the repository’s 3rdparty/tvm directory.

cd mlc-llm/3rdparty/tvm

Install tvm-ffi

The first step was installing tvm-ffi in editable mode:

uv pip install --editable 3rdparty/tvm-ffi --verbose --config-setting editable=compat \
  --config-setting cmake.args="-G Ninja" \
  --config-setting cmake.args="-DCMAKE_BUILD_TYPE=RelWithDebInfo" \
  --config-setting cmake.args="-DTVM_FFI_ATTACH_DEBUG_SYMBOLS=ON" \
  --config-setting cmake.args="-DTVM_FFI_BUILD_TESTS=OFF" \
  --config-setting cmake.args="-DTVM_FFI_BUILD_PYTHON_MODULE=ON" \
  --config-setting cmake.args="-DCMAKE_C_COMPILER_LAUNCHER=ccache" \
  --config-setting cmake.args="-DCMAKE_CXX_COMPILER_LAUNCHER=ccache"
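One detail worth calling out in these long command lines: each repeated `--config-setting cmake.args=…` flag contributes one more argument to the underlying CMake invocation; with a scikit-build-core-style backend they accumulate rather than overwrite each other. As a rough sketch of how these commands are shaped (the helper name `uv_editable_install_cmd` is mine, not part of uv):

```python
# Sketch: assemble a `uv pip install --editable` command line like the ones
# in this post from a list of CMake arguments. Helper name is hypothetical.
def uv_editable_install_cmd(path, cmake_args):
    cmd = [
        "uv", "pip", "install", "--editable", path, "--verbose",
        "--config-setting", "editable=compat",
    ]
    for arg in cmake_args:
        # Each CMake option becomes its own --config-setting flag.
        cmd += ["--config-setting", f"cmake.args={arg}"]
    return cmd

cmd = uv_editable_install_cmd(
    "3rdparty/tvm-ffi",
    ["-G Ninja", "-DCMAKE_BUILD_TYPE=RelWithDebInfo"],
)
print(" ".join(cmd))
```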

Install TVM

Next, I installed TVM itself:

uv pip install --editable . --verbose --config-setting editable=compat \
  --config-setting cmake.args="-G Ninja" \
  --config-setting cmake.args="-DCMAKE_BUILD_TYPE=RelWithDebInfo" \
  --config-setting cmake.args="-DUSE_LLVM=llvm-config-20 --link-static" \
  --config-setting cmake.args="-DHIDE_PRIVATE_SYMBOLS=ON" \
  --config-setting cmake.args="-DUSE_CUDA=ON" \
  --config-setting cmake.args="-DCMAKE_CUDA_ARCHITECTURES=87" \
  --config-setting cmake.args="-DUSE_CUBLAS=ON" \
  --config-setting cmake.args="-DUSE_CUTLASS=ON" \
  --config-setting cmake.args="-DUSE_THRUST=ON" \
  --config-setting cmake.args="-DUSE_NVTX=ON"

The key Jetson-specific difference from my desktop build was the CUDA architecture setting:

-DCMAKE_CUDA_ARCHITECTURES=87

The AGX Orin's integrated GPU is Ampere-based with compute capability 8.7 (sm_87), so a single architecture target is all this board needs. In the Ubuntu desktop post, I used a broader architecture configuration suitable for that machine; here I kept the build targeted to just this one architecture, which also keeps compile times down.
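For reference, CMake's CUDA architecture value is just the device's compute capability with the dot removed. A tiny sketch with the two machines from these posts (the mapping below is illustrative, not exhaustive):

```python
# Illustrative mapping from GPU to CMAKE_CUDA_ARCHITECTURES value
# (compute capability without the dot). Not exhaustive.
CUDA_ARCHS = {
    "jetson-agx-orin": "87",  # integrated Ampere GPU, sm_87
    "rtx-4090": "89",         # Ada Lovelace, sm_89 (the desktop build)
}

def cmake_cuda_arch(device):
    return CUDA_ARCHS[device]

print(cmake_cuda_arch("jetson-agx-orin"))  # → 87
```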

Verify that TVM installed correctly

Before moving on, I verified that TVM had CUDA enabled and could actually detect the CUDA device:

python -c "import tvm; print('USE_CUDA:', tvm.support.libinfo().get('USE_CUDA')); print('tvm.cuda().exist:', tvm.cuda().exist)"

On my system, this returned:

USE_CUDA: ON
tvm.cuda().exist: True
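As an optional extra check (this assumes the venv's python can import tvm), the device's reported compute capability should line up with the 87 passed to CMake:

```shell
# Optional: ask TVM for the CUDA device's compute capability; on AGX Orin
# this should line up with the sm_87 the build targeted. Falls back
# gracefully if tvm is not importable in the current environment.
cc="$(python -c 'import tvm; print(tvm.cuda(0).compute_version)' 2>/dev/null || echo unavailable)"
echo "compute_version: ${cc}"
```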

Install the MLC-LLM Python package

With TVM in place, I went back to the MLC-LLM root and installed the main package:

cd ../..

uv pip install --editable . --verbose --config-setting editable=compat \
  --config-setting cmake.args="-G Ninja" \
  --config-setting cmake.args="-DCMAKE_BUILD_TYPE=RelWithDebInfo" \
  --config-setting cmake.args="-DTVM_SOURCE_DIR='3rdparty/tvm'" \
  --config-setting cmake.args="-DUSE_CUDA=ON" \
  --config-setting cmake.args="-DUSE_CUTLASS=ON" \
  --config-setting cmake.args="-DUSE_CUBLAS=ON" \
  --config-setting cmake.args="-DUSE_VULKAN=OFF" \
  --config-setting cmake.args="-DUSE_METAL=OFF" \
  --config-setting cmake.args="-DUSE_OPENCL=OFF" \
  --config-setting cmake.args="-DUSE_OPENCL_ENABLE_HOST_PTR=OFF" \
  --config-setting cmake.args="-DUSE_THRUST=ON" \
  --config-setting cmake.args="-DCMAKE_CUDA_ARCHITECTURES=87" \
  --config-setting cmake.args="-DFLASHINFER_CUDA_ARCHITECTURES=87"

Then I installed the Python package itself from the python subdirectory:

uv pip install --editable python --verbose --config-setting editable=compat

Verify that MLC-LLM installed successfully

Finally, I checked that the CLI and Python package were available:

mlc_llm chat -h
python -c "import mlc_llm; print(mlc_llm)"

If both commands work, the source build is in a good state.