Changelog

All notable changes to Search Toolkit are documented here.

0.0.9

Breaking changes

Document model

Search Toolkit now uses a unified document model built around Document and DocumentChunk, with a deterministic identity derived from a source_id and a locator. See the Document model page for full details.

  • Extractors now produce DocumentChunk objects directly; the separate page representation has been removed.
  • Document.id and DocumentChunk.id are now computed deterministically from source_id (plus locator for chunks), making indexing idempotent. The explicit id field on File and document_id on DocumentChunk have been removed.
  • Added source_id, locator, parent_ref, and chunk_type as first-class fields, along with typed, extensible metadata models. The same identity contract is mirrored on SearchResultChunk.
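
As a rough illustration of why deterministic identity makes indexing idempotent, the sketch below derives IDs from source_id and locator using name-based UUIDs. This is a minimal sketch under stated assumptions, not the toolkit's actual derivation: the namespace, the separator, and the helper names document_id and chunk_id are all hypothetical.

```python
import uuid

# Hypothetical namespace; the toolkit's real derivation may use a different scheme.
_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/search-toolkit")


def document_id(source_id: str) -> str:
    """Derive a stable document ID from the source alone (illustrative helper)."""
    return str(uuid.uuid5(_NAMESPACE, source_id))


def chunk_id(source_id: str, locator: str) -> str:
    """Derive a stable chunk ID from the source plus its locator (illustrative helper)."""
    # A NUL separator keeps ("a", "b/c") distinct from ("a/b", "c").
    return str(uuid.uuid5(_NAMESPACE, f"{source_id}\x00{locator}"))


# Re-running extraction on the same input yields the same IDs,
# so a second indexing pass overwrites records instead of duplicating them.
first = chunk_id("s3://bucket/report.pdf", "page=3")
second = chunk_id("s3://bucket/report.pdf", "page=3")
other = chunk_id("s3://bucket/report.pdf", "page=4")
```

Here first and second are equal while other differs, which is the property an idempotent write path relies on.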

Vespa indexing model

Vespa now indexes each chunk as its own Vespa document via the new DOCUMENT_PER_CHUNK indexing mode, which is now the recommended model. The previous single-document model is deprecated.

  • Added the DOCUMENT_PER_CHUNK indexing mode, including default fields, ranking profiles, and the full write, delete, and search paths.
  • Added an IndexingMode to the schema definition with deprecation hooks for migrating existing schemas.
  • The index API is now split into a base VespaSearchIndex and a dedicated SingleDocumentSearchIndex for the deprecated single-document model.
  • The schema id_field is deprecated and is no longer allowed for DOCUMENT_PER_CHUNK indexes.
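
The difference between the two indexing models can be pictured with plain data structures. The sketch below is only an illustration: Chunk is a hypothetical stand-in for DocumentChunk, and the record shapes are assumptions rather than the fields the toolkit actually feeds to Vespa.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:  # hypothetical stand-in, not the toolkit's DocumentChunk
    source_id: str
    locator: str
    text: str


def single_document_records(chunks):
    """Deprecated shape: one record per source, with chunk texts nested inside it."""
    grouped = defaultdict(list)
    for chunk in chunks:
        grouped[chunk.source_id].append(chunk.text)
    return [{"source_id": sid, "chunks": texts} for sid, texts in grouped.items()]


def document_per_chunk_records(chunks):
    """New shape: one record per chunk, each carrying its own identity fields."""
    return [
        {"source_id": c.source_id, "locator": c.locator, "text": c.text}
        for c in chunks
    ]


chunks = [
    Chunk("report.pdf", "page=1", "Introduction"),
    Chunk("report.pdf", "page=2", "Methods"),
    Chunk("notes.txt", "line=1", "Todo"),
]

per_source = single_document_records(chunks)    # two records, one per source
per_chunk = document_per_chunk_records(chunks)  # three records, one per chunk
```

In the per-chunk shape every record can be written, deleted, and ranked independently, which is consistent with the removal of a schema-level id_field for this mode.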

Other

  • Renamed the indices module to search. Update imports accordingly.

Improvements

  • Added blob-storage FileLoader implementations for S3, Azure, and GCS, plus a storage-s3 extra.
  • Added a language-detection-fasttext plugin.
  • Added OCR model literals and constants.
  • Vespa: extracted a dedicated VespaClient with improved error handling.
  • Vespa: added a backend-agnostic services definition, topology v2, and a translator, with automatic v2 topology generation for single-node Docker deployments.
  • Vespa: added Vespa-to-Vespa copy and index-to-streaming migration workflows.
  • Vespa: services.xml now includes a metrics consumer.
  • Vespa: a warning is now logged when rank2 features are configured without rank1, and when ranking weights default to 0.

Security

  • Updated langchain-core to ~=1.4.

Bugfixes

  • Vespa: rank by cosine similarity instead of euclidean distance.
  • Vespa: fix retrieval of the document count.
  • Vespa: thread distribute_across_groups into load_topology_file.
  • Vespa CLI: lazy-import index registration SDK models.
  • Treat truncated LLM responses as retryable and enrich LLMException / SummaryGenerationError for structured logging.
  • OCR extractor: use mimetype metadata for filetype detection.
  • Added text/x-file and text/x-script.python MIME types to the registry.
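
To show why the choice between cosine similarity and euclidean distance in the ranking fix above can change result order, here is a small self-contained comparison. It is a generic illustration with made-up vectors, not the toolkit's ranking code or a reproduction of the original bug.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ignores vector magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def euclidean_distance(a, b):
    """Straight-line distance between a and b; sensitive to magnitude."""
    return math.dist(a, b)


query = [1.0, 0.0]
candidates = {
    "same_direction_long": [10.0, 0.0],       # identical direction, larger norm
    "different_direction_short": [0.0, 1.0],  # orthogonal, but close in space
}

# Highest cosine similarity wins; lowest euclidean distance wins.
by_cosine = max(candidates, key=lambda k: cosine_similarity(query, candidates[k]))
by_euclidean = min(candidates, key=lambda k: euclidean_distance(query, candidates[k]))
```

With these vectors, by_cosine selects same_direction_long while by_euclidean selects different_direction_short, so the two metrics disagree whenever embeddings are not normalized to unit length.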

0.0.8

Initial release of Search Toolkit as a tech preview.