
Prerequisites

Before you begin, ensure you have the following:
  • Python Version:
    • Python 3.11 recommended (compatible with 3.11–3.13)
    • For Windows ARM64: Python 3.11 ARM64 version is required
  • Conda/Micromamba:
    • Miniconda (for macOS, Linux, Windows x64)
    • Micromamba (for Windows ARM64)

Step 1: Install Conda or Micromamba

It’s a best practice to create a dedicated environment for each project.

For macOS, Linux, and Windows (x64)

Check if you have conda installed:
conda --version
If conda is not installed, download and install Miniconda or Anaconda from:
https://www.anaconda.com/download/success

For Windows ARM64

Check if you have micromamba installed:
micromamba --version
If micromamba is not installed, download and install it from:
https://mamba.readthedocs.io/en/latest/installation/micromamba-installation.html

Step 2: Create a New Environment

For macOS, Linux, and Windows (x64)

Create a new conda environment for your Nexa SDK project:
conda create -n nexa-env python=3.11
Activate the environment (the command is the same on Linux, macOS, and Windows):
conda activate nexa-env

For Windows ARM64

Windows ARM64 requires a special setup using the win-arm64 platform and an ARM64 build of Python:
  1. Create the environment with the win-arm64 platform specification:
CONDA_SUBDIR=win-arm64 micromamba create -n nexa-env python=3.11 -c conda-forge
Alternatively:
micromamba create -n nexa-env python=3.11 --platform win-arm64 -c conda-forge
  2. Activate the environment:
micromamba activate nexa-env
  3. Configure the environment to use win-arm64 packages:
micromamba config append subdirs win-arm64
On Windows ARM64, Nexa SDK requires the ARM64 build of Python 3.11 from conda-forge, installed with the win-arm64 platform specification, to work properly.
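
Because the whole point of this setup is an ARM64 interpreter, it is worth verifying the architecture from inside the activated environment. The check below uses only the standard library; on Windows, ARM64 builds of Python typically report 'ARM64' here:

import platform

# Typically prints 'ARM64' for an ARM64 Python build on Windows,
# and 'AMD64' if you accidentally got an x64 interpreter.
print(platform.machine())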

Step 3: Install Nexa SDK

Install the latest NexaAI Python SDK from PyPI. The installation command varies by platform:

macOS

For macOS, you need to install the MLX version:
pip install 'nexaai[mlx]'

Windows (x64)

pip install nexaai

Windows (ARM64)

Make sure you’ve completed Step 2 for Windows ARM64 to set up the win-arm64 environment before installing.
pip install nexaai

Linux

pip install nexaai
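
On any platform, once the install finishes you can confirm the SDK is importable and see which version landed in the environment. This is a minimal sanity check; it assumes the PyPI distribution name is nexaai, matching the pip commands above:

from importlib.metadata import version

import nexaai  # fails with ImportError if the SDK is not installed in the active environment

print(version("nexaai"))  # prints the installed SDK version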

Step 4: Authentication Setup

Before running any examples, you need to set up your NexaAI authentication token.

Set Token in Environment

Replace "YOUR_NEXA_TOKEN_HERE" with your actual NexaAI token from https://sdk.nexa.ai/:
  • Linux/macOS:
    export NEXA_TOKEN="YOUR_NEXA_TOKEN_HERE"
    
  • Windows (PowerShell):
    $env:NEXA_TOKEN="YOUR_NEXA_TOKEN_HERE"
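
The examples below assume NEXA_TOKEN is present in the environment. A small check like the following (illustrative only) fails fast with a clear message if the variable is missing:

import os

token = os.getenv("NEXA_TOKEN")
if not token:
    raise RuntimeError("NEXA_TOKEN is not set; export it as shown above before running the examples.")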
    

Step 5: Running Your First Model

Language Model (LLM)

Python
from nexaai.llm import LLM, GenerationConfig
from nexaai.common import ModelConfig, ChatMessage

# Initialize model
model_path = "~/.cache/nexa.ai/nexa_sdk/models/Qwen/Qwen3-0.6B-GGUF/Qwen3-0.6B-Q8_0.gguf"
m_cfg = ModelConfig()
llm = LLM.from_(model_path, plugin_id="cpu_gpu", device_id="cpu", m_cfg=m_cfg)

# Create conversation
conversation = [ChatMessage(role="system", content="You are a helpful assistant.")]
conversation.append(ChatMessage(role="user", content="Hello, how are you?"))

# Apply chat template and generate
prompt = llm.apply_chat_template(conversation)
for token in llm.generate_stream(prompt, g_cfg=GenerationConfig(max_tokens=100)):
    print(token, end="", flush=True)
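
The same calls can be wrapped in a simple multi-turn chat loop. The sketch below reuses only the APIs already shown (ChatMessage, apply_chat_template, generate_stream) and assumes the streamed tokens concatenate into the assistant's reply:

# Minimal interactive chat loop built on the calls above (illustrative sketch).
while True:
    user_input = input("\nYou: ")
    if user_input.lower() in ("exit", "quit"):
        break
    conversation.append(ChatMessage(role="user", content=user_input))
    prompt = llm.apply_chat_template(conversation)
    reply = ""
    for token in llm.generate_stream(prompt, g_cfg=GenerationConfig(max_tokens=100)):
        print(token, end="", flush=True)
        reply += token
    # Keep the assistant's reply in the history so the next turn has context.
    conversation.append(ChatMessage(role="assistant", content=reply))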

Multimodal Model (VLM)

Python
from nexaai.vlm import VLM, GenerationConfig
from nexaai.common import ModelConfig, MultiModalMessage, MultiModalMessageContent

# Initialize model
model_path = "~/.cache/nexa.ai/nexa_sdk/models/NexaAI/gemma-3n-E4B-it-4bit-MLX/model-00001-of-00002.safetensors"
m_cfg = ModelConfig()
vlm = VLM.from_(name_or_path=model_path, m_cfg=m_cfg, plugin_id="cpu_gpu", device_id="")

# Create multimodal conversation
conversation = [MultiModalMessage(
    role="system",
    content=[MultiModalMessageContent(type="text", text="You are a helpful assistant.")],
)]

# Add user message with image
contents = [
    MultiModalMessageContent(type="text", text="Describe this image"),
    MultiModalMessageContent(type="image", text="path/to/image.jpg")
]
conversation.append(MultiModalMessage(role="user", content=contents))

# Apply chat template and generate
prompt = vlm.apply_chat_template(conversation)
for token in vlm.generate_stream(prompt, g_cfg=GenerationConfig(max_tokens=100, image_paths=["path/to/image.jpg"])):
    print(token, end="", flush=True)
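
If you want the full response as a single string rather than printing tokens as they stream, you can collect them. The helper below is illustrative only and builds solely on generate_stream as used above:

def generate_text(vlm, prompt, image_paths, max_tokens=100):
    """Collect streamed tokens into one string (hypothetical convenience helper)."""
    cfg = GenerationConfig(max_tokens=max_tokens, image_paths=image_paths)
    return "".join(vlm.generate_stream(prompt, g_cfg=cfg))

response = generate_text(vlm, prompt, ["path/to/image.jpg"])
print(response)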

Embedder

Python
from nexaai.embedder import Embedder, EmbeddingConfig

# Initialize embedder
model_path = "~/.cache/nexa.ai/nexa_sdk/models/NexaAI/jina-v2-fp16-mlx/model.safetensors"
embedder = Embedder.from_(name_or_path=model_path, plugin_id="cpu_gpu")

# Generate embeddings
texts = ["Hello world", "How are you?"]
config = EmbeddingConfig(batch_size=2)
embeddings = embedder.generate(texts=texts, config=config)

for text, embedding in zip(texts, embeddings):
    print(f"Text: {text}")
    print(f"Embedding dimension: {len(embedding)}")

Reranker

Python
from nexaai.rerank import Reranker, RerankConfig

# Initialize reranker
model_path = "~/.cache/nexa.ai/nexa_sdk/models/NexaAI/jina-v2-rerank-mlx/jina-reranker-v2-base-multilingual-f16.safetensors"
reranker = Reranker.from_(name_or_path=model_path, plugin_id="cpu_gpu")

# Rerank documents
query = "What is machine learning?"
documents = ["Machine learning is a subset of AI", "Python is a programming language"]
config = RerankConfig(batch_size=2)
scores = reranker.rerank(query=query, documents=documents, config=config)

for doc, score in zip(documents, scores):
    print(f"[{score:.4f}] {doc}")

Next Steps

