LLM Usage
Large Language Models for text generation and chat applications.

Streaming Conversation
We support CPU/GPU inference for GGUF format models. You can pick any GGUF model from the community and run it with the cpu_gpu plugin.
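Below is a minimal sketch of a streaming chat loop in Kotlin. The LlmSession interface and its generateStream method are hypothetical stand-ins for the SDK's actual types; the point is the pattern of consuming tokens as they are produced rather than waiting for the full reply.

```kotlin
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.runBlocking

// Hypothetical stand-in for the SDK's LLM handle; the real class,
// loading call, and method names come from the Nexa SDK you install.
interface LlmSession {
    fun generateStream(prompt: String): Flow<String> // emits tokens as produced
}

fun streamReply(session: LlmSession, prompt: String) = runBlocking {
    session.generateStream(prompt).collect { token ->
        print(token) // render each token as it arrives
    }
    println()
}
```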
CPU/GPU Configuration
Control whether your model runs on CPU or GPU using a combination of device_id and nGpuLayers:
GPU Execution Requirements (both must hold):
- device_id must be set to "GPUOpenCL"
- nGpuLayers must be greater than 0 (typically set to 999 to offload all layers)

CPU Execution (Default) applies when either condition holds:
- device_id is null (default), OR
- nGpuLayers is 0
Example: Running on GPU
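A minimal configuration sketch in Kotlin. The ModelConfig class below is a hypothetical stand-in for the SDK's model-configuration type; only the field values, which follow the rules above, are taken from this page.

```kotlin
// Hypothetical configuration shape; the actual class and field names
// come from the Nexa SDK version you have installed.
data class ModelConfig(
    val deviceId: String? = null, // maps to device_id
    val nGpuLayers: Int = 0
)

// GPU execution: OpenCL device plus all layers offloaded.
val gpuConfig = ModelConfig(
    deviceId = "GPUOpenCL", // required for GPU execution
    nGpuLayers = 999        // > 0; 999 offloads all layers
)
```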
Example: Running on CPU (Default)
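The corresponding CPU sketch, reusing the hypothetical ModelConfig from above. Leaving both fields at their defaults keeps inference on the CPU.

```kotlin
// CPU execution (default): device_id null, or nGpuLayers == 0.
val cpuConfig = ModelConfig() // deviceId = null, nGpuLayers = 0

// Equivalent explicit form:
val cpuConfigExplicit = ModelConfig(deviceId = null, nGpuLayers = 0)
```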
Multimodal Usage
Vision-Language Models for image understanding and multimodal applications.

Streaming Conversation
We support CPU/GPU inference for GGUF format models.
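A minimal streaming sketch for a vision-language turn, following the same pattern as the LLM example. The VlmSession interface and its generateStream signature are hypothetical; consult the SDK for the real image-input API.

```kotlin
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.runBlocking

// Hypothetical stand-in for the SDK's VLM handle.
interface VlmSession {
    // Prompt plus paths to the images the model should look at.
    fun generateStream(prompt: String, imagePaths: List<String>): Flow<String>
}

fun describeImage(session: VlmSession) = runBlocking {
    session.generateStream("What is in this picture?", listOf("/path/to/image.jpg"))
        .collect { token -> print(token) } // stream tokens as they arrive
}
```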
ASR Usage
Automatic Speech Recognition for audio transcription.

Basic Usage
We support CPU inference for whisper.cpp models.
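A minimal transcription sketch. The AsrSession interface and transcribe method are hypothetical stand-ins for the SDK's whisper.cpp bindings, and the audio path is a placeholder.

```kotlin
// Hypothetical stand-in for the SDK's ASR handle (whisper.cpp backend).
interface AsrSession {
    fun transcribe(audioPath: String): String // returns the recognized text
}

fun transcribeClip(session: AsrSession) {
    val text = session.transcribe("/path/to/audio.wav")
    println(text)
}
```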
TTS Usage
Text-to-Speech synthesis for converting text into natural-sounding speech.

Basic Usage
We support CPU inference for TTS models in GGUF format.
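A minimal synthesis sketch. The TtsSession interface and synthesize method are hypothetical; the real SDK call and its output format may differ.

```kotlin
// Hypothetical stand-in for the SDK's TTS handle.
interface TtsSession {
    // Renders the text to a WAV file and returns the output path.
    fun synthesize(text: String, outputPath: String): String
}

fun speak(session: TtsSession) {
    val wav = session.synthesize("Hello from Nexa!", "/path/to/output.wav")
    println("Saved speech to $wav")
}
```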
Need Help?
Join our community to get support, share your projects, and connect with other developers.

Discord Community
Get real-time support and chat with the Nexa AI community
Slack Community
Collaborate with developers and access community resources