# Local embeddings (ONNX)
Semango’s local embedding provider runs ONNX models via ONNX Runtime. It does not use Python or SentenceTransformers.
## Choose a model
You can use any Hugging Face repo that contains a `.onnx` file. The default config points to the onnx-models org:

```yaml
embedding:
  provider: local
  model: onnx-models/bge-small-en-v1.5-onnx
```
## Download models with the CLI

Semango includes a model manager:
```bash
# Search models on Hugging Face (onnx-models org)
semango models search bge

# Download by alias or full ID
semango models download bge-small

# List installed models
semango models list
```

Models are cached under:
```
~/.cache/semango/models
```
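Since this is a plain directory, you can also inspect what is installed on disk with standard shell tools (no Semango-specific flags are assumed here):

```bash
# Inspect downloaded model files in the default cache location
ls -lh ~/.cache/semango/models
```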
## GPU Acceleration

Semango supports GPU acceleration for local embeddings via CUDA. It is enabled by default and automatically falls back to CPU if no compatible GPU or CUDA runtime is found.
To configure GPU usage explicitly:

```yaml
embedding:
  provider: local
  model: onnx-models/bge-small-en-v1.5-onnx
  # true (default): try GPU, fall back to CPU
  # false: force CPU only
  gpu: true
```

When GPU acceleration is enabled successfully, you will see a log entry:

```
INFO GPU acceleration (CUDA) enabled for ONNX session
```
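If embeddings unexpectedly run on CPU, a quick first check is whether the driver sees a CUDA-capable device at all. `nvidia-smi`, which ships with the NVIDIA driver, reports this independently of Semango:

```bash
# Confirm a CUDA-capable GPU and driver are visible before debugging the config
nvidia-smi
```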
## Point to a local path
You can also use a local model directory or an explicit `.onnx` file:

```yaml
embedding:
  provider: local
  model: /path/to/model-dir
  # or: /path/to/model.onnx
```
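As a rough illustration of what a model directory holds: Hugging Face ONNX exports such as the onnx-models repos typically ship the graph alongside tokenizer files, so a local directory might look like this (the file names are an assumption based on common exports, not a Semango requirement):

```bash
# Illustrative contents of a local model directory
# (names follow common Hugging Face ONNX exports and are assumptions)
ls /path/to/model-dir
# model.onnx  tokenizer.json
```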
## Output name override

Some ONNX models expose non-standard output names. You can override the name Semango reads:

```yaml
embedding:
  provider: local
  model: onnx-models/bge-small-en-v1.5-onnx
  onnx_output_name: sentence_embedding
```
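If you are unsure what a given model's outputs are called, you can list them with ONNX Runtime's Python API. The one-liner below is not a Semango command and assumes the `onnxruntime` package is installed:

```bash
# Print a model's output names to find the right value for onnx_output_name
python -c "import onnxruntime as ort; print([o.name for o in ort.InferenceSession('/path/to/model.onnx').get_outputs()])"
```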