Prerequisites
Before you begin, ensure you have the following:
- Python version:
  - Python 3.11 recommended (compatible with 3.11–3.13)
  - Windows ARM64: the ARM64 build of Python 3.11 is required
- Conda/Micromamba:
  - Miniconda (macOS, Linux, Windows x64)
  - Micromamba (Windows ARM64)
Step 1: Install Conda or Micromamba
It’s a best practice to create a dedicated environment for each project.
For macOS, Linux, and Windows (x64)
Check if you have conda installed:
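For example (assuming conda would be on your PATH if installed):

```shell
# Prints the conda version if installed; a "command not found" error means it is not
conda --version
```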
If conda is not installed, download and install Miniconda or Anaconda from:
https://www.anaconda.com/download/success
For Windows ARM64
Check if you have micromamba installed:
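For example (assuming micromamba would be on your PATH if installed):

```shell
# Prints the micromamba version if installed; a "command not found" error means it is not
micromamba --version
```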
If micromamba is not installed, download and install it from:
https://mamba.readthedocs.io/en/latest/installation/micromamba-installation.html
Step 2: Create a New Environment
For macOS, Linux, and Windows (x64)
Create a new conda environment for your Nexa SDK project:
conda create -n nexa-env python=3.11
Activate the environment:
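The activation command, matching the environment name created above:

```shell
# Activate the newly created environment
conda activate nexa-env
```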
For Windows ARM64
Windows ARM64 requires a special setup with the win-arm64 platform and ARM64 Python:
- Create the environment with win-arm64 platform specification:
CONDA_SUBDIR=win-arm64 micromamba create -n nexa-env python=3.11 -c conda-forge
Or alternatively:
micromamba create -n nexa-env python=3.11 --platform win-arm64 -c conda-forge
- Activate the environment:
micromamba activate nexa-env
- Configure the environment to use win-arm64 packages:
micromamba config append subdirs win-arm64
For Windows ARM64, you must use the ARM64 version of Python 3.11 from conda-forge with the win-arm64 platform specification for Nexa SDK to work properly.
Step 3: Install Nexa SDK
Install the latest NexaAI Python SDK from PyPI. The installation command varies by platform:
macOS
For macOS, you need to install the MLX version:
pip install 'nexaai[mlx]'
Windows (x64)
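The install command for Windows x64 is not shown here. Based on the base package name used in the macOS command, it is likely the plain package without extras (treat this as an assumption and verify against the official Nexa SDK docs):

```shell
# Likely command for Windows x64 (assumed; verify against the official docs)
pip install nexaai
```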
Windows (ARM64)
Make sure you’ve completed Step 2 for Windows ARM64 to set up the win-arm64 environment before installing.
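With the win-arm64 environment from Step 2 activated, the install likely uses the same base package as the other platforms (an assumption; confirm in the Nexa SDK docs):

```shell
# Run inside the activated win-arm64 environment (assumed command; verify against the docs)
pip install nexaai
```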
Linux
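The Linux install command is also missing here; it is likely the plain package without the MLX extra (an assumption; confirm in the Nexa SDK docs):

```shell
# Likely command for Linux (assumed; verify against the official docs)
pip install nexaai
```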
Step 4: Authentication Setup
Before running any examples, you need to set up your NexaAI authentication token.
Set Token in Environment
Replace "YOUR_NEXA_TOKEN_HERE" with your actual NexaAI token from https://sdk.nexa.ai/:
- Linux/macOS:
export NEXA_TOKEN="YOUR_NEXA_TOKEN_HERE"
- Windows (PowerShell):
$env:NEXA_TOKEN="YOUR_NEXA_TOKEN_HERE"
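To confirm the token is visible to Python before running the examples (the variable name matches the export above; the check itself is just a sketch):

```python
import os

# Read the token from the environment, as set in the previous step
token = os.environ.get("NEXA_TOKEN")
if token:
    print("NEXA_TOKEN is set")
else:
    print("NEXA_TOKEN is missing - set it before running the examples")
```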
Step 5: Running Your First Model
Language Model (LLM)
from nexaai.llm import LLM, GenerationConfig
from nexaai.common import ModelConfig, ChatMessage
# Initialize model
model_path = "~/.cache/nexa.ai/nexa_sdk/models/Qwen/Qwen3-0.6B-GGUF/Qwen3-0.6B-Q8_0.gguf"
m_cfg = ModelConfig()
llm = LLM.from_(model_path, plugin_id="cpu_gpu", device_id="cpu", m_cfg=m_cfg)
# Create conversation
conversation = [ChatMessage(role="system", content="You are a helpful assistant.")]
conversation.append(ChatMessage(role="user", content="Hello, how are you?"))
# Apply chat template and generate
prompt = llm.apply_chat_template(conversation)
for token in llm.generate_stream(prompt, g_cfg=GenerationConfig(max_tokens=100)):
    print(token, end="", flush=True)
Multimodal Model (VLM)
from nexaai.vlm import VLM, GenerationConfig
from nexaai.common import ModelConfig, MultiModalMessage, MultiModalMessageContent
# Initialize model
model_path = "~/.cache/nexa.ai/nexa_sdk/models/NexaAI/gemma-3n-E4B-it-4bit-MLX/model-00001-of-00002.safetensors"
m_cfg = ModelConfig()
vlm = VLM.from_(name_or_path=model_path, m_cfg=m_cfg, plugin_id="cpu_gpu", device_id="")
# Create multimodal conversation
conversation = [MultiModalMessage(role="system",
    content=[MultiModalMessageContent(type="text", text="You are a helpful assistant.")])]
# Add user message with image
contents = [
    MultiModalMessageContent(type="text", text="Describe this image"),
    MultiModalMessageContent(type="image", text="path/to/image.jpg"),
]
conversation.append(MultiModalMessage(role="user", content=contents))
# Apply chat template and generate
prompt = vlm.apply_chat_template(conversation)
for token in vlm.generate_stream(prompt, g_cfg=GenerationConfig(max_tokens=100, image_paths=["path/to/image.jpg"])):
    print(token, end="", flush=True)
Embedder
from nexaai.embedder import Embedder, EmbeddingConfig
# Initialize embedder
model_path = "~/.cache/nexa.ai/nexa_sdk/models/NexaAI/jina-v2-fp16-mlx/model.safetensors"
embedder = Embedder.from_(name_or_path=model_path, plugin_id="cpu_gpu")
# Generate embeddings
texts = ["Hello world", "How are you?"]
config = EmbeddingConfig(batch_size=2)
embeddings = embedder.generate(texts=texts, config=config)
for text, embedding in zip(texts, embeddings):
    print(f"Text: {text}")
    print(f"Embedding dimension: {len(embedding)}")
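Embeddings are typically compared by cosine similarity. A minimal, dependency-free sketch, using stand-in vectors rather than real model output:

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector norms
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Stand-in embeddings; in practice use the vectors returned by embedder.generate()
e1 = [0.1, 0.3, 0.5]
e2 = [0.2, 0.1, 0.4]
print(f"similarity: {cosine_similarity(e1, e2):.4f}")
```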
Reranker
from nexaai.rerank import Reranker, RerankConfig
# Initialize reranker
model_path = "~/.cache/nexa.ai/nexa_sdk/models/NexaAI/jina-v2-rerank-mlx/jina-reranker-v2-base-multilingual-f16.safetensors"
reranker = Reranker.from_(name_or_path=model_path, plugin_id="cpu_gpu")
# Rerank documents
query = "What is machine learning?"
documents = ["Machine learning is a subset of AI", "Python is a programming language"]
config = RerankConfig(batch_size=2)
scores = reranker.rerank(query=query, documents=documents, config=config)
for doc, score in zip(documents, scores):
    print(f"[{score:.4f}] {doc}")
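Rerank scores can then be used to order the documents best-first. A small sketch with stand-in scores in place of real reranker output:

```python
# Stand-in data; in practice use the scores returned by reranker.rerank()
documents = ["Machine learning is a subset of AI", "Python is a programming language"]
scores = [0.92, 0.08]

# Pair each document with its score and sort highest score first
ranked = sorted(zip(scores, documents), reverse=True)
for score, doc in ranked:
    print(f"[{score:.4f}] {doc}")
```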
Next Steps