Migration helpers reference

Reference for the helpers used inside migrations. All helpers are imported from mistralai.search.toolkit.plugins.vespa.migration and called from within a migration's migrate() method.

from mistralai.search.toolkit.plugins.vespa.migration import (
    VespaMigration,
    set_app_name,
    create_schema,
    add_field,
    add_query_profiles,
)

Application configuration

`set_app_name`

def set_app_name(app_name: str) -> None

Set the application name for the migration run. Required in the first migration. Must contain only lowercase letters (a-z).
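
The naming rule can be checked before a migration runs. The following is a minimal sketch, assuming the rule means "one or more ASCII letters a-z and nothing else"; is_valid_app_name is a hypothetical helper, not part of the toolkit, and the toolkit's own validation may differ in detail.

```python
import re

# Assumption: "only lowercase letters (a-z)" means one or more ASCII
# letters a-z, with no digits, underscores, or hyphens.
_APP_NAME_RE = re.compile(r"[a-z]+")


def is_valid_app_name(app_name: str) -> bool:
    """Return True if app_name consists solely of the letters a-z."""
    return _APP_NAME_RE.fullmatch(app_name) is not None


print(is_valid_app_name("myapp"))   # True
print(is_valid_app_name("my_app"))  # False
print(is_valid_app_name("MyApp"))   # False
```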

`set_content_id`

def set_content_id(content_id: str) -> None

Override the content cluster id used in services.xml (defaults to the application name). Use this when the deployed cluster was originally created with a different id. Otherwise, Vespa treats a rename as a remove and recreate operation, destroying all stored documents.
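
The decision described above can be written down as a small check. This is only a sketch: content_id_to_pin is a hypothetical helper, and how you learn the id of the deployed cluster is assumed to happen elsewhere.

```python
from typing import Optional


def content_id_to_pin(app_name: str, deployed_content_id: str) -> Optional[str]:
    """Return the id to pass to set_content_id, or None if the default already matches.

    The default content cluster id is the application name, so an override is
    only needed when the deployed cluster was created under a different id.
    """
    if deployed_content_id == app_name:
        return None
    return deployed_content_id


print(content_id_to_pin("myapp", "myapp"))       # None
print(content_id_to_pin("myapp", "legacydocs"))  # legacydocs
```

A non-None result is the value you would pass to set_content_id in the migration, so the cluster keeps its existing id.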

`allow_schema_removal`

def allow_schema_removal(until: str) -> None

Mark a migration as intentionally removing one or more schemas. Vespa rejects schema removals by default to protect data; this helper writes a validation-overrides.xml that permits the removal up to the date given in the until argument (an ISO "YYYY-MM-DD" date). Pick a date a week or two past your deploy window.
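
One way to derive the date is to add a fixed margin to the end of the deploy window. A minimal sketch using only the standard library; removal_override_until and its 14-day default are illustrative assumptions, not part of the toolkit.

```python
from datetime import date, timedelta


def removal_override_until(deploy_window_end: date, margin_days: int = 14) -> str:
    """Return an ISO "YYYY-MM-DD" date margin_days after the deploy window ends."""
    return (deploy_window_end + timedelta(days=margin_days)).isoformat()


print(removal_override_until(date(2025, 3, 10)))  # 2025-03-24
```

The returned string has the format expected by allow_schema_removal(until=...).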

`set_distribute_across_groups`

def set_distribute_across_groups(value: bool) -> None

Warning

Deprecated. A compatibility hook for Kubernetes autodiscovery. Define groups explicitly in the topology file for new deployments.

Creating schemas

`create_schema`

def create_schema(
    name: str,
    mode: SearchMode,
    embedding_dimensions: int,
    indexing_mode: IndexingMode,
    fields: list[FieldDefinition] | None = None,
    custom_functions: list[FunctionWithInputs] | None = None,
    default_query_profile_name: str | None = None,
    top_chunks: int | None = None,
    id_field: str | None = None,
    content_cluster: str = "content",
) -> None

Create a schema with an explicit indexing mode. This is the recommended way to define a schema.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `name` | `str` | (required) | Document type name (lowercase letters and underscores only) |
| `mode` | `SearchMode` | (required) | `INDEX` or `STREAMING` |
| `embedding_dimensions` | `int` | (required) | Embedding size applied to all embedding fields |
| `indexing_mode` | `IndexingMode` | (required) | Document layout: `DOCUMENT_PER_CHUNK` (recommended) or `SINGLE_DOCUMENT` (legacy, deprecated) |
| `fields` | `list[FieldDefinition] \| None` | `None` | Fields added on top of the indexing-mode defaults. With `DOCUMENT_PER_CHUNK` the standard chunk fields are injected automatically; `SINGLE_DOCUMENT` requires at least one field |
| `custom_functions` | `list[FunctionWithInputs] \| None` | `None` | Custom ranking functions (see Manage Ranking) |
| `default_query_profile_name` | `str \| None` | `None` | Name for the auto-generated query profile. Defaults to the schema name |
| `top_chunks` | `int \| None` | `None` | `SINGLE_DOCUMENT` only. Number of top chunks to select. Rejected with `DOCUMENT_PER_CHUNK`. |
| `id_field` | `str \| None` | `None` | `SINGLE_DOCUMENT` only. Deprecated. Rejected with `DOCUMENT_PER_CHUNK` (chunk identity is always `id`). |
| `content_cluster` | `str` | `"content"` | Vespa content cluster that holds this schema's documents |
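
The constraints listed in the table can be checked before calling create_schema. The following is a minimal sketch that covers only the rules stated above. It assumes indexing modes are represented by their names as plain strings (the real helper takes IndexingMode values), and check_create_schema_args is a hypothetical function, not part of the toolkit.

```python
import re
from typing import Optional

# Assumption: "lowercase letters and underscores only" means one or more
# characters drawn from a-z and "_".
_SCHEMA_NAME_RE = re.compile(r"[a-z_]+")


def check_create_schema_args(
    name: str,
    indexing_mode: str,
    fields: Optional[list] = None,
    top_chunks: Optional[int] = None,
    id_field: Optional[str] = None,
) -> list:
    """Return a list of problems; an empty list means the arguments look consistent."""
    problems = []
    if _SCHEMA_NAME_RE.fullmatch(name) is None:
        problems.append("name must contain only lowercase letters and underscores")
    if indexing_mode == "DOCUMENT_PER_CHUNK":
        # Both parameters are documented as SINGLE_DOCUMENT only.
        if top_chunks is not None:
            problems.append("top_chunks is rejected with DOCUMENT_PER_CHUNK")
        if id_field is not None:
            problems.append("id_field is rejected with DOCUMENT_PER_CHUNK")
    elif indexing_mode == "SINGLE_DOCUMENT":
        if not fields:
            problems.append("SINGLE_DOCUMENT requires at least one field")
    else:
        problems.append(f"unknown indexing mode: {indexing_mode}")
    return problems


print(check_create_schema_args("articles", "DOCUMENT_PER_CHUNK"))  # []
print(check_create_schema_args("docs", "SINGLE_DOCUMENT"))
# ['SINGLE_DOCUMENT requires at least one field']
```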

`create_default_schema`

Warning

Deprecated; will be removed before 1.0.0. This legacy helper for SINGLE_DOCUMENT schemas takes schema_version and additional_fields instead of indexing_mode and fields. Use create_schema(..., indexing_mode=...) instead.

Enums

Both enums are imported from mistralai.search.toolkit.plugins.vespa.app.schemas.app.

`SearchMode`

| Value | Description |
| --- | --- |
| `SearchMode.INDEX` | Traditional indexed search: BM25, ANN/HNSW, two-phase ranking |
| `SearchMode.STREAMING` | Streaming search: exact nearest neighbor, attribute filtering, single-phase ranking |

`IndexingMode`

| Value | Description |
| --- | --- |
| `IndexingMode.DOCUMENT_PER_CHUNK` | Recommended. One Vespa document per chunk, individually addressable by its deterministic id |
| `IndexingMode.SINGLE_DOCUMENT` | Legacy: one Vespa document per source, chunks packed as arrays. Deprecated. Removed before 1.0.0. |

Field types

Fields are declared with FieldDefinition, imported from mistralai.search.toolkit.plugins.vespa.app.schemas.app, and passed to create_schema(fields=[...]) or add_field(...). Set multi_dimensional=True (where supported) to make a field an array.

| Field type | Purpose | Used in query matching | Generates ranking functions |
| --- | --- | --- | --- |
| `EmbeddingField` | Vector embeddings for semantic search | Yes (ANN/HNSW) | Distance, cosine similarity |
| `TextField` | Text for keyword/BM25 search | Yes | BM25, field match |
| `StringField` | Stored metadata, not matched | No | None |
| `TimestampField` | Time-based ranking | No | Freshness, recency |
| `CountField` | Numeric value used for ranking | No | Normalization, boost |
| `IntField` | Stored integer, not ranked | No | None |
| `BoolField` | Stored boolean, not ranked | No | None |
| `LanguageField` | Per-document language tag (RFC 3066) | No (filter) | None |

`EmbeddingField`

FieldDefinition.EmbeddingField(name: str, multi_dimensional: bool = False)

Vector embeddings indexed with HNSW. The tensor dimension is derived from the schema's embedding_dimensions. Set multi_dimensional=True for an array of embeddings (e.g. per-chunk).

`TextField`

FieldDefinition.TextField(
    name: str,
    multi_dimensional: bool = False,
    summary: SummaryDefinition = SummaryDefinition(),
)

Tokenized text matched against the user query (BM25). Included in the default fieldset. summary controls how the field is rendered in results.

`StringField`

FieldDefinition.StringField(
    name: str,
    multi_dimensional: bool = False,
    fast_search: bool = False,
    match: MatchDefinition | None = None,
    summary: SummaryDefinition | None = SummaryDefinition(),
    attribute: bool = True,
)

Stored metadata that is not matched against the query (use for filters, ids, tags). fast_search builds a dedicated attribute index (more memory/CPU, faster lookups); match sets an explicit matching scheme; attribute=False keeps the value out of in-memory attributes.

`TimestampField`

FieldDefinition.TimestampField(name: str, type: Literal["long", "int"] = "long")

Numeric timestamp used for freshness and recency ranking functions.
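
The choice between "long" and "int" matters for the range of values the field can hold: in Vespa, int is a 32-bit signed integer and long is 64-bit. The sketch below assumes timestamps are fed as Unix epoch values (this reference does not state the unit) and shows that epoch seconds currently fit in an int while epoch milliseconds do not; fits_in_int is a hypothetical helper.

```python
from datetime import datetime, timezone

INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def fits_in_int(value: int) -> bool:
    """Return True if value fits in a 32-bit signed integer."""
    return INT32_MIN <= value <= INT32_MAX


# Assumption: timestamps are Unix epoch values; the unit is your choice.
moment = datetime(2025, 1, 1, tzinfo=timezone.utc)
epoch_seconds = int(moment.timestamp())
epoch_millis = epoch_seconds * 1000

print(epoch_seconds)               # 1735689600
print(fits_in_int(epoch_seconds))  # True
print(fits_in_int(epoch_millis))   # False
```

Epoch seconds exceed the int range in 2038, which is one reason to keep the default "long".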

`CountField`

FieldDefinition.CountField(name: str)

Integer count that generates normalization and boost ranking functions. For an integer you do not want to rank on, use IntField.

`IntField`

FieldDefinition.IntField(
    name: str,
    multi_dimensional: bool = False,
    summary: SummaryDefinition = SummaryDefinition(),
)

Stored integer not used for ranking or matching.

`BoolField`

FieldDefinition.BoolField(name: str)

Stored boolean, not used for ranking or matching.

`LanguageField`

FieldDefinition.LanguageField(name: str = "language")

Sets the document language (an RFC 3066 tag) and creates an index so documents can be filtered by language.
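
Language values can be sanity-checked before feeding documents. The sketch below tests only the syntactic shape defined by RFC 3066 (a primary subtag of 1 to 8 letters, followed by optional subtags of 1 to 8 letters or digits separated by hyphens). It does not check that a tag is registered or that Vespa supports it, and looks_like_rfc3066 is a hypothetical helper.

```python
import re

# RFC 3066 syntax: 1*8ALPHA *( "-" 1*8(ALPHA / DIGIT) )
_RFC3066_RE = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def looks_like_rfc3066(tag: str) -> bool:
    """Return True if tag has the syntactic shape of an RFC 3066 language tag."""
    return _RFC3066_RE.fullmatch(tag) is not None


print(looks_like_rfc3066("en"))     # True
print(looks_like_rfc3066("en-US"))  # True
print(looks_like_rfc3066("en_US"))  # False
```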

Evolving schemas

These helpers extend an existing application in a later migration. Each helper that takes a schema_name raises ValueError if the named schema does not exist.

`add_field`

def add_field(schema_name: str, field: FieldDefinition) -> None

Add a single field to an existing schema.

`add_custom_functions`

def add_custom_functions(schema_name: str, functions: list[FunctionWithInputs]) -> None

Add custom ranking functions to an existing schema.

`add_query_profiles`

def add_query_profiles(query_profiles: list[QueryProfile]) -> None

Add or update query profiles on the application. See Manage Ranking.

`add_schema_rank_profiles`

def add_schema_rank_profiles(schema_name: str, rank_profiles: list[Path]) -> None

Attach custom rank-profile files to a schema.

`add_schema_model_files`

def add_schema_model_files(schema_name: str, model_files: list[Path]) -> None

Attach ML model files to a schema.

`add_schema_custom_document_summary`

def add_schema_custom_document_summary(schema_name: str, custom_document_summary: DocumentSummary) -> None

Attach a custom document summary to a schema.

VespaMigration

`VespaMigration`

The base class for every migration. Subclass it and implement migrate(), calling the helpers above:

from mistralai.search.toolkit.plugins.vespa.app.schemas.app import IndexingMode, SearchMode
from mistralai.search.toolkit.plugins.vespa.migration import VespaMigration, create_schema, set_app_name


class InitialSchema(VespaMigration):
    def migrate(self) -> None:
        set_app_name("myapp")
        create_schema(
            name="articles",
            mode=SearchMode.INDEX,
            embedding_dimensions=1024,
            indexing_mode=IndexingMode.DOCUMENT_PER_CHUNK,
        )
