Changelog
All notable changes to Search Toolkit are documented here.
0.0.9
Breaking changes
Document model
Search Toolkit now uses a unified document model built around Document and DocumentChunk, with a deterministic identity derived from a source_id and a locator. See the Document model page for full details.
- Extractors now produce DocumentChunk objects directly; the separate page representation has been removed. Document.id and DocumentChunk.id are now computed deterministically from source_id (plus locator for chunks), making indexing idempotent. The explicit id field on File and document_id on DocumentChunk have been removed.
- Added source_id, locator, parent_ref, and chunk_type as first-class fields, along with typed, extensible metadata models. The same identity contract is mirrored on SearchResultChunk.
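Why deterministic IDs make indexing idempotent can be shown with a minimal sketch. It assumes a SHA-256 hash over source_id (and a NUL-separated locator for chunks); the actual hashing scheme is not documented in this changelog, and document_id, chunk_id, upsert_chunk, and the "page=3" locator format are all hypothetical.

```python
import hashlib

# Hypothetical sketch: the real Search Toolkit hashing scheme is not documented here.


def document_id(source_id: str) -> str:
    """Derive a stable document ID from source_id alone."""
    return hashlib.sha256(source_id.encode("utf-8")).hexdigest()


def chunk_id(source_id: str, locator: str) -> str:
    """Derive a stable chunk ID from source_id plus locator."""
    # A NUL separator keeps ("ab", "c") and ("a", "bc") from hashing identically.
    payload = source_id.encode("utf-8") + b"\x00" + locator.encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# A dict stands in for a search index keyed by ID: writing the same chunk
# twice overwrites one entry instead of creating a duplicate.
index: dict[str, str] = {}


def upsert_chunk(source_id: str, locator: str, text: str) -> None:
    index[chunk_id(source_id, locator)] = text


upsert_chunk("s3://bucket/report.pdf", "page=3", "quarterly results")
upsert_chunk("s3://bucket/report.pdf", "page=3", "quarterly results")
print(len(index))  # 1
```

Because the ID depends only on where the content came from, re-running extraction over the same source overwrites existing entries rather than adding new ones.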
Vespa indexing model
With the new DOCUMENT_PER_CHUNK indexing mode, Vespa indexes each chunk as a separate document; this is now the recommended model. The previous single-document model is deprecated.
- Added the DOCUMENT_PER_CHUNK indexing mode, including default fields, ranking profiles, and the full write, delete, and search paths.
- Added an IndexingMode to the schema definition with deprecation hooks for migrating existing schemas.
- The index API is split into a base VespaSearchIndex and a dedicated SingleDocumentSearchIndex; the single-document model is deprecated in favor of DOCUMENT_PER_CHUNK.
- The schema id_field is deprecated and is no longer allowed for DOCUMENT_PER_CHUNK indexes.
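The schema rules above can be sketched as construction-time validation. This is a minimal illustration, not the toolkit's API: SchemaDefinition and its fields are hypothetical, and SINGLE_DOCUMENT is an assumed name for the deprecated mode.

```python
import warnings
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class IndexingMode(Enum):
    # SINGLE_DOCUMENT is a hypothetical name for the deprecated model.
    SINGLE_DOCUMENT = "single_document"
    DOCUMENT_PER_CHUNK = "document_per_chunk"


@dataclass
class SchemaDefinition:
    """Hypothetical schema holder illustrating the 0.0.9 validation rules."""

    name: str
    indexing_mode: IndexingMode
    id_field: Optional[str] = None  # deprecated

    def __post_init__(self) -> None:
        # Deprecation hook: the single-document model still works but warns.
        if self.indexing_mode is IndexingMode.SINGLE_DOCUMENT:
            warnings.warn(
                "the single-document model is deprecated; use DOCUMENT_PER_CHUNK",
                DeprecationWarning,
            )
        if self.id_field is not None:
            # IDs are derived from source_id/locator, so an explicit id_field
            # is rejected outright for per-chunk indexes.
            if self.indexing_mode is IndexingMode.DOCUMENT_PER_CHUNK:
                raise ValueError("id_field is not allowed for DOCUMENT_PER_CHUNK indexes")
            warnings.warn("id_field is deprecated", DeprecationWarning)


schema = SchemaDefinition("docs", IndexingMode.DOCUMENT_PER_CHUNK)  # accepted
```

Warning rather than failing on the legacy mode gives existing schemas a migration window, while the new mode can reject id_field outright because it never needed one.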
Other
- Renamed the indices module to search. Update imports accordingly.
Improvements
- Added blob-storage FileLoader implementations for S3, Azure, and GCS, plus a storage-s3 extra.
- Added a language-detection-fasttext plugin.
- Added OCR model literals and constants.
- Vespa: extracted a dedicated VespaClient with improved error handling.
- Vespa: added a backend-agnostic services definition, topology v2, and a translator, with automatic v2 topology generation for single-node Docker deployments.
- Vespa: added Vespa-to-Vespa copy and index-to-streaming migration workflows.
- Vespa: emit a metrics consumer in services.xml.
- Vespa: warn when rank2 features are configured without rank1, and when ranking weights default to 0.
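The two new ranking warnings can be sketched as a small validator. It assumes rank1 and rank2 are two stages of a ranking pipeline, each listing feature names, and that a feature without an explicit weight falls back to 0; resolve_rank_weights and the feature names bm25 and closeness are hypothetical.

```python
import warnings
from typing import Dict, Mapping, Sequence


def resolve_rank_weights(
    rank1: Sequence[str],
    rank2: Sequence[str],
    weights: Mapping[str, float],
) -> Dict[str, float]:
    """Hypothetical validator mirroring the two new warnings."""
    if rank2 and not rank1:
        warnings.warn("rank2 features are configured without any rank1 features")
    resolved: Dict[str, float] = {}
    for feature in [*rank1, *rank2]:
        if feature not in weights:
            # A missing weight silently zeroes the feature, so surface it.
            warnings.warn(f"no weight set for {feature!r}; defaulting to 0")
        resolved[feature] = weights.get(feature, 0.0)
    return resolved


weights = resolve_rank_weights(["bm25"], ["closeness"], {"bm25": 1.0})
# warns about 'closeness'; weights == {"bm25": 1.0, "closeness": 0.0}
```

Both cases produce a valid but probably unintended configuration, which is why a warning fits better than an error.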
Security
- Updated langchain-core to ~=1.4.
Bugfixes
- Vespa: rank by cosine similarity instead of Euclidean distance.
- Vespa: fix retrieval of the document count.
- Vespa: pass distribute_across_groups through to load_topology_file.
- Vespa CLI: lazy-import index registration SDK models.
- Treat truncated LLM responses as retryable and enrich LLMException/SummaryGenerationError for structured logging.
- OCR extractor: use MIME type metadata for file type detection.
- Added text/x-file and text/x-script.python MIME types to the registry.
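The changelog does not say how the Vespa ranking expression changed, so this plain-Python sketch only illustrates why the metric matters: with unnormalized embeddings, cosine similarity and Euclidean distance can rank the same candidates in opposite order. The vectors and document names are made up for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


query = [1.0, 0.0]
docs = {
    "same_direction_far": [10.0, 0.0],  # same direction, large magnitude
    "off_direction_near": [0.8, 0.6],   # nearby point, different direction
}

# Higher cosine similarity is better; lower Euclidean distance is better.
by_cosine = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
by_euclidean = sorted(docs, key=lambda d: math.dist(query, docs[d]))

print(by_cosine)     # ['same_direction_far', 'off_direction_near']
print(by_euclidean)  # ['off_direction_near', 'same_direction_far']
```

For unit-length vectors, ||a - b||^2 = 2 - 2 cos(a, b), so both metrics produce the same order; the difference only appears when embedding magnitudes vary.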
0.0.8
Initial release of Search Toolkit as a tech preview.