Reference for the helpers used inside migrations. All helpers are imported from mistralai.search.toolkit.plugins.vespa.migration and called from within a migration's migrate() method.

from mistralai.search.toolkit.plugins.vespa.migration import (
    VespaMigration,
    set_app_name,
    create_schema,
    add_field,
    add_query_profiles,
)
Application configuration

set_app_name

def set_app_name(app_name: str) -> None

Set the application name for the migration run. Required in the first migration. Must contain only lowercase letters (a-z).
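The naming rule can be checked before a migration runs. The following is a minimal sketch of such a pre-check. is_valid_app_name is a hypothetical helper, not part of the toolkit, and the toolkit's own validation may report errors differently.

```python
import re

# Hypothetical pre-check mirroring the documented rule: lowercase a-z only.
_APP_NAME = re.compile(r"[a-z]+")


def is_valid_app_name(app_name: str) -> bool:
    """Return True if app_name consists solely of lowercase ASCII letters."""
    return _APP_NAME.fullmatch(app_name) is not None


print(is_valid_app_name("myapp"))   # True
print(is_valid_app_name("my_app"))  # False: underscore is not allowed
print(is_valid_app_name("MyApp"))   # False: uppercase is not allowed
```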

set_content_id

def set_content_id(content_id: str) -> None

Override the content cluster id used in services.xml (defaults to the application name). Use this when the deployed cluster was originally created with a different id; otherwise Vespa treats the changed id as removing the old cluster and creating a new one, which destroys all stored documents.

allow_schema_removal

def allow_schema_removal(until: str) -> None

Mark a migration as intentionally removing one or more schemas. Vespa rejects schema removals by default to protect data; this helper writes a validation-overrides.xml that permits the removal up to the date passed as until (an ISO "YYYY-MM-DD" date). Pick a date a week or two past your deploy window.
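One way to derive the date is to add a fixed margin to the planned deploy day. This is a sketch only: removal_deadline and its 14-day default are assumptions for illustration and are not provided by the toolkit.

```python
from datetime import date, timedelta


def removal_deadline(deploy_day: date, grace_days: int = 14) -> str:
    """Return an ISO "YYYY-MM-DD" date that lies grace_days after deploy_day."""
    return (deploy_day + timedelta(days=grace_days)).isoformat()


# Example: a deploy planned for 20 January 2025 with a two-week margin.
until = removal_deadline(date(2025, 1, 20))
print(until)  # 2025-02-03
```

The resulting string has the format expected by the until argument.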

set_distribute_across_groups

def set_distribute_across_groups(value: bool) -> None

Warning

Deprecated. A compatibility hook for Kubernetes autodiscovery. Define groups explicitly in the topology file for new deployments.

Creating schemas

create_schema

def create_schema(
    name: str,
    mode: SearchMode,
    embedding_dimensions: int,
    indexing_mode: IndexingMode,
    fields: list[FieldDefinition] | None = None,
    custom_functions: list[FunctionWithInputs] | None = None,
    default_query_profile_name: str | None = None,
    top_chunks: int | None = None,
    id_field: str | None = None,
    content_cluster: str = "content",
) -> None

Create a schema with an explicit indexing mode. This is the recommended way to define a schema.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| name | str | (required) | Document type name (lowercase letters and underscores only) |
| mode | SearchMode | (required) | INDEX or STREAMING |
| embedding_dimensions | int | (required) | Embedding size applied to all embedding fields |
| indexing_mode | IndexingMode | (required) | Document layout: DOCUMENT_PER_CHUNK (recommended) or SINGLE_DOCUMENT (legacy, deprecated) |
| fields | list[FieldDefinition] \| None | None | Fields added on top of the indexing-mode defaults. With DOCUMENT_PER_CHUNK the standard chunk fields are injected automatically; SINGLE_DOCUMENT requires at least one field. |
| custom_functions | list[FunctionWithInputs] \| None | None | Custom ranking functions (see Manage Ranking) |
| default_query_profile_name | str \| None | None | Name for the auto-generated query profile. Defaults to the schema name. |
| top_chunks | int \| None | None | SINGLE_DOCUMENT only. Number of top chunks to select. Rejected with DOCUMENT_PER_CHUNK. |
| id_field | str \| None | None | SINGLE_DOCUMENT only. Deprecated. Rejected with DOCUMENT_PER_CHUNK (chunk identity is always id). |
| content_cluster | str | "content" | Vespa content cluster that holds this schema's documents |
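The parameter rules above depend on the indexing mode. The sketch below restates them as a standalone check so the combinations are easy to see. It is not the toolkit's implementation: the IndexingMode enum is a local stand-in, check_schema_args is a hypothetical name, and the real helper may raise an exception instead of returning messages.

```python
from __future__ import annotations

from enum import Enum


class IndexingMode(Enum):
    # Local stand-in for the toolkit's IndexingMode, so the sketch runs alone.
    DOCUMENT_PER_CHUNK = "document_per_chunk"
    SINGLE_DOCUMENT = "single_document"


def check_schema_args(
    indexing_mode: IndexingMode,
    fields: list | None = None,
    top_chunks: int | None = None,
    id_field: str | None = None,
) -> list[str]:
    """Collect violations of the documented parameter rules (empty if none)."""
    problems: list[str] = []
    if indexing_mode is IndexingMode.DOCUMENT_PER_CHUNK:
        # Both options are documented as SINGLE_DOCUMENT only.
        if top_chunks is not None:
            problems.append("top_chunks is only valid with SINGLE_DOCUMENT")
        if id_field is not None:
            problems.append("id_field is only valid with SINGLE_DOCUMENT")
    elif not fields:
        # SINGLE_DOCUMENT has no injected defaults, so fields must be given.
        problems.append("SINGLE_DOCUMENT requires at least one field")
    return problems


print(check_schema_args(IndexingMode.DOCUMENT_PER_CHUNK))  # []
print(check_schema_args(IndexingMode.DOCUMENT_PER_CHUNK, top_chunks=3))
print(check_schema_args(IndexingMode.SINGLE_DOCUMENT))
```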

create_default_schema

Warning

Deprecated; scheduled for removal before 1.0.0. This legacy helper for SINGLE_DOCUMENT schemas takes schema_version and additional_fields instead of indexing_mode and fields. Use create_schema(..., indexing_mode=...) instead.

Enums

Both are imported from mistralai.search.toolkit.plugins.vespa.app.schemas.app.

SearchMode

| Value | Description |
| --- | --- |
| SearchMode.INDEX | Traditional indexed search: BM25, ANN/HNSW, two-phase ranking |
| SearchMode.STREAMING | Streaming search: exact nearest neighbor, attribute filtering, single-phase ranking |

IndexingMode

| Value | Description |
| --- | --- |
| IndexingMode.DOCUMENT_PER_CHUNK | Recommended. One Vespa document per chunk, individually addressable by its deterministic id. |
| IndexingMode.SINGLE_DOCUMENT | Legacy: one Vespa document per source, chunks packed as arrays. Deprecated; scheduled for removal before 1.0.0. |

Field types

Fields are declared with FieldDefinition, imported from mistralai.search.toolkit.plugins.vespa.app.schemas.app, and passed to create_schema(fields=[...]) or add_field(...). Set multi_dimensional=True (where supported) to make a field an array.

| Field type | Purpose | Used in query matching | Generates ranking functions |
| --- | --- | --- | --- |
| EmbeddingField | Vector embeddings for semantic search | Yes (ANN/HNSW) | Distance, cosine similarity |
| TextField | Text for keyword/BM25 search | Yes | BM25, field match |
| StringField | Stored metadata, not matched | No | None |
| TimestampField | Time-based ranking | No | Freshness, recency |
| CountField | Numeric value used for ranking | No | Normalization, boost |
| IntField | Stored integer, not ranked | No | None |
| BoolField | Stored boolean, not ranked | No | None |
| LanguageField | Per-document language tag (RFC 3066) | No (filter) | None |

EmbeddingField

FieldDefinition.EmbeddingField(name: str, multi_dimensional: bool = False)

Vector embeddings indexed with HNSW. The tensor dimension is derived from the schema's embedding_dimensions. Set multi_dimensional=True for an array of embeddings (e.g. per-chunk).

TextField

FieldDefinition.TextField(
    name: str,
    multi_dimensional: bool = False,
    summary: SummaryDefinition = SummaryDefinition(),
)

Tokenized text matched against the user query (BM25). Included in the default fieldset. summary controls how the field is rendered in results.

StringField

FieldDefinition.StringField(
    name: str,
    multi_dimensional: bool = False,
    fast_search: bool = False,
    match: MatchDefinition | None = None,
    summary: SummaryDefinition | None = SummaryDefinition(),
    attribute: bool = True,
)

Stored metadata that is not matched against the query (use for filters, ids, tags). fast_search builds a dedicated attribute index (more memory/CPU, faster lookups); match sets an explicit matching scheme; attribute=False keeps the value out of in-memory attributes.

TimestampField

FieldDefinition.TimestampField(name: str, type: Literal["long", "int"] = "long")

Numeric timestamp used for freshness and recency ranking functions.
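The reference does not state the unit of the timestamp. Assuming Unix epoch seconds, and assuming "int" maps to a signed 32-bit integer as in Vespa schemas, the sketch below shows how a value might be prepared and why "long" is the safer choice: 32-bit epoch seconds overflow in January 2038. Both function names are hypothetical.

```python
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def to_epoch_seconds(moment: datetime) -> int:
    """Convert a timezone-aware datetime to whole seconds since the Unix epoch."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return int(moment.timestamp())


def fits_int32(value: int) -> bool:
    """Return True if value fits in a signed 32-bit integer."""
    return -(2**31) <= value <= INT32_MAX


published = to_epoch_seconds(datetime(2024, 1, 1, tzinfo=timezone.utc))
print(published)              # 1704067200
print(fits_int32(published))  # True
```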

CountField

FieldDefinition.CountField(name: str)

Integer count that generates normalization and boost ranking functions. For an integer you do not want to rank on, use IntField.

IntField

FieldDefinition.IntField(
    name: str,
    multi_dimensional: bool = False,
    summary: SummaryDefinition = SummaryDefinition(),
)

Stored integer, not used for ranking or matching.

BoolField

FieldDefinition.BoolField(name: str)

Stored boolean, not used for ranking or matching.

LanguageField

FieldDefinition.LanguageField(name: str = "language")

Sets the document language (an RFC 3066 tag) and creates an index so documents can be filtered by language.
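Language values can be checked for RFC 3066 syntax before feeding. The sketch below is a syntactic check only: it does not consult the language registry, and looks_like_rfc3066 is a hypothetical helper rather than part of the toolkit.

```python
import re

# RFC 3066 grammar: a primary subtag of 1-8 letters, followed by zero or more
# "-"-separated subtags of 1-8 letters or digits.
_RFC3066 = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def looks_like_rfc3066(tag: str) -> bool:
    """Return True if tag is syntactically valid under RFC 3066."""
    return _RFC3066.fullmatch(tag) is not None


print(looks_like_rfc3066("en"))     # True
print(looks_like_rfc3066("en-US"))  # True
print(looks_like_rfc3066("en_US"))  # False: subtags are separated by "-"
```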

Evolving schemas

Use these helpers to extend an existing schema in a later migration. Each raises ValueError if the named schema does not exist.

add_field

def add_field(schema_name: str, field: FieldDefinition) -> None

Add a single field to an existing schema.

add_custom_functions

def add_custom_functions(schema_name: str, functions: list[FunctionWithInputs]) -> None

Add custom ranking functions to an existing schema.

add_query_profiles

def add_query_profiles(query_profiles: list[QueryProfile]) -> None

Add or update query profiles on the application. See Manage Ranking.

add_schema_rank_profiles

def add_schema_rank_profiles(schema_name: str, rank_profiles: list[Path]) -> None

Attach custom rank-profile files to a schema.

add_schema_model_files

def add_schema_model_files(schema_name: str, model_files: list[Path]) -> None

Attach ML model files to a schema.

add_schema_custom_document_summary

def add_schema_custom_document_summary(schema_name: str, custom_document_summary: DocumentSummary) -> None

Attach a custom document summary to a schema.

VespaMigration

The base class for every migration. Subclass it and implement migrate(), calling the helpers above:

from mistralai.search.toolkit.plugins.vespa.app.schemas.app import IndexingMode, SearchMode
from mistralai.search.toolkit.plugins.vespa.migration import VespaMigration, create_schema, set_app_name


class InitialSchema(VespaMigration):
    def migrate(self) -> None:
        set_app_name("myapp")
        create_schema(
            name="articles",
            mode=SearchMode.INDEX,
            embedding_dimensions=1024,
            indexing_mode=IndexingMode.DOCUMENT_PER_CHUNK,
        )
See also