Ingestion

Semango indexes files using built-in loaders. This page reflects what is actually implemented in the current codebase.

Supported file types

The code loader detects language from file extension and stores code as text. Tree-sitter parsing is not implemented yet.

Code files larger than 5 MB are skipped and not indexed.

Supported extensions:

.go .js .ts .py .jsx .tsx .java .c .cpp .h .hpp .rs .rb .php .cs .swift .kt .scala
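
The detection and size check described above can be sketched roughly as follows. This is an illustrative sketch, not Semango's actual code: the names `EXTENSION_LANGUAGES`, `detect_language`, and `should_index` are hypothetical, the language labels are made up for the example, matching extensions case-insensitively is an assumption, and "5MB" is assumed to mean 5 × 1024 × 1024 bytes.

```python
import os

# Hypothetical extension-to-language table built from the list above.
# The label strings are illustrative, not Semango's identifiers.
EXTENSION_LANGUAGES = {
    ".go": "go", ".js": "javascript", ".ts": "typescript", ".py": "python",
    ".jsx": "javascript", ".tsx": "typescript", ".java": "java", ".c": "c",
    ".cpp": "cpp", ".h": "c", ".hpp": "cpp", ".rs": "rust", ".rb": "ruby",
    ".php": "php", ".cs": "csharp", ".swift": "swift", ".kt": "kotlin",
    ".scala": "scala",
}

# Assumption: the 5MB limit is interpreted as 5 MiB.
MAX_CODE_FILE_BYTES = 5 * 1024 * 1024


def detect_language(path):
    """Return a language label based on the file extension, or None if unsupported."""
    ext = os.path.splitext(path)[1].lower()  # case-insensitive match is an assumption
    return EXTENSION_LANGUAGES.get(ext)


def should_index(path, size_bytes):
    """A code file is indexed only if its extension is known and it is not over the limit."""
    return detect_language(path) is not None and size_bytes <= MAX_CODE_FILE_BYTES
```

Because no parsing is involved, the file's content never influences the detected language; only the extension does.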

Chunking is applied to text and PDF content and is configured in YAML:

files:
  chunk_size: 1000
  chunk_overlap: 200
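
To illustrate how `chunk_size` and `chunk_overlap` interact, here is a minimal sliding-window sketch. It assumes both values are measured in characters, which this page does not state, and the function name `chunk_text` is hypothetical. Semango's real splitter may work on different units or respect sentence boundaries.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into windows of chunk_size that share chunk_overlap with the previous window.

    Assumption: sizes are counted in characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

With the values shown above, each window advances by 800 units, so a 2,500-character input yields three chunks starting at offsets 0, 800, and 1,600.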

For tabular files, Semango converts each row to a text snippet and embeds it (subject to tabular limits).
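
The row-to-text conversion might look something like the sketch below. The snippet format (`column: value` pairs joined by commas), the function name `rows_to_snippets`, and the `max_rows` parameter standing in for the tabular limits are all assumptions made for illustration; this page does not specify the actual format or limit names.

```python
import csv
import io


def rows_to_snippets(csv_text, max_rows=1000):
    """Turn each CSV row into a one-line text snippet, stopping at max_rows.

    max_rows is a hypothetical stand-in for the tabular limits.
    """
    reader = csv.DictReader(io.StringIO(csv_text))  # first line is treated as the header
    snippets = []
    for index, row in enumerate(reader):
        if index >= max_rows:
            break
        # Pair each column name with its value so the snippet is self-describing.
        snippets.append(", ".join(f"{column}: {value}" for column, value in row.items()))
    return snippets


if __name__ == "__main__":
    for snippet in rows_to_snippets("name,qty\napple,3\npear,5\n"):
        print(snippet)
```

Including the column name in every snippet keeps each row meaningful on its own once it is embedded separately from the rest of the table.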