databricks.ai_search package

The canonical Python module for Databricks AI Search.

The classes AISearchClient / AISearchIndex / AISearchException are the canonical names; VectorSearchClient / VectorSearchIndex / VectorSearchException are preserved as backward-compat aliases (each is the same class, so isinstance and is checks work either way).

class databricks.ai_search.client.AISearchClient(workspace_url=None, personal_access_token=None, service_principal_client_id=None, service_principal_client_secret=None, azure_tenant_id=None, azure_login_id='2ff814a6-3304-4ab8-85cb-cd0e6f879c1d', disable_notice=False, credential_strategy=None)

Bases: object

A client for interacting with the AI Search service.

This client provides methods for managing endpoints and indexes in the AI Search service.

Initialize the AISearchClient.

Parameters:
  • workspace_url (str) – The URL of the Databricks workspace (e.g., “https://your-workspace.databricks.com”). Optional if running inside a Databricks notebook (auto-detected).

  • personal_access_token (str) – Personal access token for authentication. Used with workspace_url for PAT auth. Optional if running inside a Databricks notebook (auto-detected).

  • service_principal_client_id (str) – OAuth client ID of the service principal. Required for service principal auth.

  • service_principal_client_secret (str) – OAuth client secret of the service principal. Required for service principal auth.

  • azure_tenant_id (str) – Azure tenant ID for Azure-based authentication. Required only for Azure service principal auth.

  • azure_login_id (str) – Azure login ID (Databricks Azure Application ID) for authentication. Defaults to “2ff814a6-3304-4ab8-85cb-cd0e6f879c1d” (AZURE_PUBLIC). See all login IDs: https://github.com/databricks/databricks-sdk-py/blob/main/databricks/sdk/environments.py

  • disable_notice (bool) – Whether to disable authentication notice messages. Default is False.

  • credential_strategy (CredentialStrategy) – Credential strategy for specialized authentication scenarios. Use CredentialStrategy.MODEL_SERVING_USER_CREDENTIALS to authenticate with the calling user’s credentials in Model Serving environments.

Note

Authentication Methods

The client supports multiple authentication methods. Choose one based on your environment:

1. Auto-detection (Recommended for Databricks Notebooks)

When no credentials are provided, the client automatically detects credentials from the Databricks notebook environment. This is the simplest method when running inside Databricks.

Example:

from databricks.ai_search.client import AISearchClient

# Credentials are automatically detected from notebook context
client = AISearchClient()

2. Personal Access Token (PAT)

Use a personal access token for user-based authentication. Requires both workspace_url and personal_access_token.

Example:

from databricks.ai_search.client import AISearchClient

client = AISearchClient(
    workspace_url="https://your-workspace.databricks.com",
    personal_access_token="dapi..."
)

3. Service Principal (OAuth M2M) - AWS & GCP

Use OAuth Machine-to-Machine authentication with a service principal for automated workflows. Requires workspace_url, service_principal_client_id, and service_principal_client_secret.

Example:

from databricks.ai_search.client import AISearchClient

client = AISearchClient(
    workspace_url="https://your-workspace.databricks.com",
    service_principal_client_id="your-client-id",
    service_principal_client_secret="your-client-secret"
)

4. Azure Service Principal

For Azure Databricks, use Azure AD service principal authentication and pass azure_tenant_id in addition to the service principal credentials.

Example:

from databricks.ai_search.client import AISearchClient

client = AISearchClient(
    workspace_url="https://adb-1234567890123456.7.azuredatabricks.net",
    service_principal_client_id="your-azure-app-id",
    service_principal_client_secret="your-azure-app-secret",
    azure_tenant_id="your-azure-tenant-id"
)

5. Model Serving with User Credentials

When calling from within Databricks Model Serving, use the invoking user’s credentials for fine-grained access control.

Example:

from databricks.ai_search.client import AISearchClient
from databricks.ai_search.utils import CredentialStrategy

client = AISearchClient(
    credential_strategy=CredentialStrategy.MODEL_SERVING_USER_CREDENTIALS
)

Additional Notes:

  • Service principal authentication requires workspace_url to be explicitly provided

  • Personal access token authentication also requires workspace_url

  • When using auto-detection in notebooks, credentials are inferred from the execution context

  • Azure authentication requires both azure_tenant_id and standard service principal credentials
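The requirements above can be summarized as a small selection routine. The following is a minimal sketch, not part of the library: the helper name build_client_kwargs and the setting keys it reads (WORKSPACE_URL, PAT, SP_CLIENT_ID, SP_CLIENT_SECRET, AZURE_TENANT_ID) are hypothetical names chosen for illustration. Only the returned keyword names come from the AISearchClient signature. Preferring service principal credentials over a PAT when both are present is an arbitrary choice made here, not documented behavior.

```python
from typing import Dict, Mapping


def build_client_kwargs(settings: Mapping[str, str]) -> Dict[str, str]:
    """Pick AISearchClient keyword arguments from a mapping of settings.

    Hypothetical helper: the keys read from `settings` are illustrative,
    not names the library is documented to read.
    """
    url = settings.get("WORKSPACE_URL")
    token = settings.get("PAT")
    client_id = settings.get("SP_CLIENT_ID")
    client_secret = settings.get("SP_CLIENT_SECRET")
    tenant_id = settings.get("AZURE_TENANT_ID")

    if client_id and client_secret:
        # Service principal auth requires an explicit workspace_url.
        if not url:
            raise ValueError("service principal auth requires workspace_url")
        kwargs = {
            "workspace_url": url,
            "service_principal_client_id": client_id,
            "service_principal_client_secret": client_secret,
        }
        if tenant_id:
            # Azure adds azure_tenant_id on top of the standard credentials.
            kwargs["azure_tenant_id"] = tenant_id
        return kwargs

    if token:
        # PAT auth also requires workspace_url.
        if not url:
            raise ValueError("personal access token auth requires workspace_url")
        return {"workspace_url": url, "personal_access_token": token}

    # Nothing supplied: pass no arguments so the client can auto-detect
    # credentials from the notebook context.
    return {}
```

The result is meant to be unpacked into the constructor, as in AISearchClient(**build_client_kwargs(settings)).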

create_delta_sync_index(endpoint_name, index_name, primary_key, source_table_name, pipeline_type, embedding_dimension=None, embedding_vector_column=None, embedding_source_column=None, embedding_model_endpoint_name=None, sync_computed_embeddings=False, columns_to_sync=None, model_endpoint_name_for_query=None, budget_policy_id=None, usage_policy_id=None, index_subtype=None)

Create a delta sync index.

Parameters:
  • columns_to_sync (str) – The columns to sync to the vector index. The primary key and vector column are always synced. If this field is not set, all columns are synced.

  • endpoint_name (str) – The name of the endpoint.

  • index_name (str) – The name of the index.

  • primary_key (str) – The primary key of the index.

  • source_table_name (str) – The name of the source table.

  • pipeline_type (str) – The type of the pipeline. Must be CONTINUOUS or TRIGGERED.

  • embedding_dimension (int) – The dimension of the embedding vector.

  • embedding_vector_column (str) – The name of the embedding vector column.

  • embedding_source_column (str) – The name of the embedding source column.

  • embedding_model_endpoint_name (str) – The name of the embedding model endpoint.

  • sync_computed_embeddings (bool) – Whether to automatically sync the vector index contents and computed embeddings to a new UC table. The table name will be ${index_name}_writeback_table.

  • model_endpoint_name_for_query (str) – When set, queries will use this embedding model instead of the embedding_model_endpoint_name. If unset, queries continue to use embedding_model_endpoint_name.

  • budget_policy_id (str) – The budget policy ID to associate with this index for cost tracking and management (deprecated, use usage_policy_id).

  • usage_policy_id (str) – The usage policy ID to associate with this index for cost tracking and management.

  • index_subtype (str) – The subtype of the index. “HYBRID” (default) supports semantic search with embeddings and keyword search. “FULL_TEXT” supports keyword search only, without embeddings.

Note

BETA FEATURE: The index_subtype parameter is in beta and subject to change. “FULL_TEXT” enables keyword-only search without embeddings. Currently only supported for STORAGE_OPTIMIZED endpoints.

Warning

Only STORAGE_OPTIMIZED endpoints support index_subtype="FULL_TEXT". STANDARD endpoints will return an error if FULL_TEXT is specified.
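The constraints on pipeline_type and index_subtype can be checked locally before the request is sent. This is a rough sketch under the rules stated above, not the library's own validation: check_delta_sync_args is a hypothetical helper, and it assumes the caller already knows the endpoint type ("STANDARD" or "STORAGE_OPTIMIZED").

```python
from typing import Optional

VALID_PIPELINE_TYPES = {"CONTINUOUS", "TRIGGERED"}
VALID_INDEX_SUBTYPES = {"HYBRID", "FULL_TEXT"}


def check_delta_sync_args(
    endpoint_type: str,
    pipeline_type: str,
    index_subtype: Optional[str] = None,
) -> None:
    """Raise ValueError for argument combinations the documentation rules out."""
    if pipeline_type not in VALID_PIPELINE_TYPES:
        raise ValueError(f"pipeline_type must be one of {sorted(VALID_PIPELINE_TYPES)}")
    if index_subtype is None:
        # Unset means the default subtype, which has no endpoint restriction.
        return
    if index_subtype not in VALID_INDEX_SUBTYPES:
        raise ValueError(f"index_subtype must be one of {sorted(VALID_INDEX_SUBTYPES)}")
    if index_subtype == "FULL_TEXT" and endpoint_type != "STORAGE_OPTIMIZED":
        # FULL_TEXT is documented as STORAGE_OPTIMIZED-only.
        raise ValueError("FULL_TEXT requires a STORAGE_OPTIMIZED endpoint")
```

A local check like this only fails faster; the service remains the source of truth, especially while index_subtype is in beta.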

create_delta_sync_index_and_wait(endpoint_name, index_name, primary_key, source_table_name, pipeline_type, embedding_dimension=None, embedding_vector_column=None, embedding_source_column=None, embedding_model_endpoint_name=None, sync_computed_embeddings=False, columns_to_sync=None, model_endpoint_name_for_query=None, budget_policy_id=None, usage_policy_id=None, verbose=False, timeout=datetime.timedelta(days=1), index_subtype=None)

Create a delta sync index and wait for it to be ready.

Parameters:
  • columns_to_sync (str) – The columns to sync to the vector index. The primary key and vector column are always synced. If this field is not set, all columns are synced.

  • endpoint_name (str) – The name of the endpoint.

  • index_name (str) – The name of the index.

  • primary_key (str) – The primary key of the index.

  • source_table_name (str) – The name of the source table.

  • pipeline_type (str) – The type of the pipeline. Must be CONTINUOUS or TRIGGERED.

  • embedding_dimension (int) – The dimension of the embedding vector.

  • embedding_vector_column (str) – The name of the embedding vector column.

  • embedding_source_column (str) – The name of the embedding source column.

  • embedding_model_endpoint_name (str) – The name of the embedding model endpoint.

  • verbose (bool) – Whether to print status messages.

  • timeout (datetime.timedelta) – The maximum time to wait before raising an exception.

  • sync_computed_embeddings (bool) – Whether to automatically sync the vector index contents and computed embeddings to a new UC table. The table name will be ${index_name}_writeback_table.

  • model_endpoint_name_for_query (str) – The name of the embedding model endpoint to be used for querying, not ingestion.

  • budget_policy_id (str) – The budget policy ID to associate with this index for cost tracking and management (deprecated, use usage_policy_id).

  • usage_policy_id (str) – The usage policy ID to associate with this index for cost tracking and management.

  • index_subtype (str) – The subtype of the index. “HYBRID” (default) supports semantic search with embeddings and keyword search. “FULL_TEXT” supports keyword search only, without embeddings.

Note

BETA FEATURE: The index_subtype parameter is in beta and subject to change. “FULL_TEXT” enables keyword-only search without embeddings. Currently only supported for STORAGE_OPTIMIZED endpoints.

Warning

Only STORAGE_OPTIMIZED endpoints support index_subtype="FULL_TEXT". STANDARD endpoints will return an error if FULL_TEXT is specified.

create_direct_access_index(endpoint_name, index_name, primary_key, embedding_dimension, embedding_vector_column, schema, embedding_model_endpoint_name=None, budget_policy_id=None, usage_policy_id=None)

Create a direct access index.

Parameters:
  • endpoint_name (str) – The name of the endpoint.

  • index_name (str) – The name of the index.

  • primary_key (str) – The primary key of the index.

  • embedding_dimension (int) – The dimension of the embedding vector.

  • embedding_vector_column (str) – The name of the embedding vector column.

  • schema (dict) – The schema of the index.

  • embedding_model_endpoint_name (str) – The name of the optional embedding model endpoint to use when querying.

  • budget_policy_id (str) – The budget policy id to be applied to the index (deprecated, use usage_policy_id).

  • usage_policy_id (str) – The usage policy id to be applied to the index.

create_endpoint(name: str, endpoint_type: str = 'STANDARD', budget_policy_id: str | None = None, usage_policy_id: str | None = None, target_qps: int | None = None) → Dict[str, Any]

Create an endpoint.

Parameters:
  • name (str) – The name of the endpoint.

  • endpoint_type (str) – The type of the endpoint. Must be STANDARD or STORAGE_OPTIMIZED.

  • budget_policy_id (str) – The id of the budget policy to assign to the endpoint (deprecated, use usage_policy_id).

  • usage_policy_id (str) – The id of the usage policy to assign to the endpoint.

  • target_qps (int) – Target queries per second for the endpoint. Must be a positive integer. Capacity is automatically adjusted at index creation/sync time to best match this throughput target (best-effort, not guaranteed). Optional.

Note

BETA FEATURE: The target_qps parameter is in beta and subject to change. It enables automatic capacity scaling based on desired throughput. Currently only supported for STANDARD endpoints.

Warning

Only STANDARD endpoints support target_qps. Storage Optimized endpoints will return an error if target_qps is specified.

Example:

from databricks.ai_search.client import AISearchClient

client = AISearchClient()

# Create endpoint with target_qps
endpoint = client.create_endpoint(
    name="high-throughput-endpoint",
    endpoint_type="STANDARD",
    target_qps=200
)

# Check the scaling info
scaling_info = endpoint.get("endpoint", {}).get("scaling_info", {})
print(f"Requested target QPS: {scaling_info.get('requested_target_qps')}")

create_endpoint_and_wait(name: str, endpoint_type: str = 'STANDARD', budget_policy_id: str | None = None, usage_policy_id: str | None = None, target_qps: int | None = None, verbose: bool = False, timeout: timedelta = datetime.timedelta(seconds=3600)) → None

Create an endpoint and wait for it to be online.

Parameters:
  • name (str) – The name of the endpoint.

  • endpoint_type (str) – The type of the endpoint. Must be STANDARD or STORAGE_OPTIMIZED.

  • budget_policy_id (str) – The id of the budget policy to assign to the endpoint (deprecated, use usage_policy_id).

  • usage_policy_id (str) – The id of the usage policy to assign to the endpoint.

  • target_qps (int) – Target queries per second for the endpoint. Must be a positive integer. Capacity is automatically adjusted at index creation/sync time to best match this throughput target (best-effort, not guaranteed). Optional.

  • verbose (bool) – Whether to print status messages.

  • timeout (datetime.timedelta) – The maximum time to wait before raising an exception.

Note

BETA FEATURE: The target_qps parameter is in beta and subject to change. It enables automatic capacity scaling based on desired throughput. Currently only supported for STANDARD endpoints.

Warning

Only STANDARD endpoints support target_qps. Storage Optimized endpoints will return an error if target_qps is specified.

delete_endpoint(name)

Delete an endpoint.

Parameters:

name (str) – The name of the endpoint.

delete_index(endpoint_name=None, index_name=None)

Delete an index.

Parameters:
  • endpoint_name (Optional[str]) – The optional name of the endpoint.

  • index_name (str) – The name of the index.

endpoint_exists(name)

Check if an endpoint exists.

This method provides a cleaner alternative to catching NotFound exceptions when checking for resource existence.

Parameters:

name (str) – The name of the endpoint.

Returns:

True if the endpoint exists, False otherwise.

Return type:

bool

Example:

from databricks.ai_search.client import AISearchClient

client = AISearchClient()
if client.endpoint_exists("my-endpoint"):
    endpoint = client.get_endpoint("my-endpoint")
else:
    endpoint = client.create_endpoint("my-endpoint")

get_async_index(endpoint_name=None, index_name=None)

Get an async index for use in async/await code.

Returns an AsyncAISearchIndex that mirrors the data-plane methods of AISearchIndex (upsert, delete, similarity_search, scan, describe, sync) over httpx.AsyncClient. Control-plane index lifecycle (create / delete / wait_until_ready) remains on the sync client.

Parameters:
  • endpoint_name (Optional[str]) – The optional name of the endpoint.

  • index_name (str) – The name of the index.

Returns:

An async index bound to the event loop where its first method is awaited. Sharing across loops is unsupported.

Return type:

databricks.ai_search.async_index.AsyncAISearchIndex

Note

This method issues a synchronous HTTP request to resolve the index URL. In async contexts (e.g. FastAPI handlers), wrap the call in asyncio.to_thread:

idx = await asyncio.to_thread(
    client.get_async_index, endpoint_name=..., index_name=...
)

Example:

import asyncio
from databricks.ai_search.client import AISearchClient

client = AISearchClient()

async def main():
    # `async with` and `await` are only valid inside a coroutine.
    async with client.get_async_index(
        endpoint_name="my-endpoint", index_name="catalog.schema.my_index"
    ) as idx:
        return await idx.similarity_search(
            query_text="hello", columns=["id"], num_results=10
        )

results = asyncio.run(main())

get_endpoint(name)

Get an endpoint.

Parameters:

name (str) – The name of the endpoint.

get_index(endpoint_name=None, index_name=None)

Get an index.

Parameters:
  • endpoint_name (Optional[str]) – The optional name of the endpoint.

  • index_name (str) – The name of the index.

index_exists(endpoint_name=None, index_name=None)

Check if an index exists.

This method provides a cleaner alternative to catching NotFound exceptions when checking for resource existence.

Parameters:
  • endpoint_name (Optional[str]) – The optional name of the endpoint.

  • index_name (str) – The name of the index.

Returns:

True if the index exists, False otherwise.

Return type:

bool

Example:

from databricks.ai_search.client import AISearchClient

client = AISearchClient()
if client.index_exists(index_name="my-index"):
    index = client.get_index(index_name="my-index")
else:
    index = client.create_delta_sync_index(...)

list_endpoints()

List all endpoints.

list_indexes(name)

List all indexes for an endpoint.

Parameters:

name (str) – The name of the endpoint.

update_endpoint(name: str, target_qps: int | None = None) → Dict[str, Any]

Update an endpoint’s configuration.

Parameters:
  • name (str) – The name of the endpoint. Required.

  • target_qps (int) – Target queries per second for the endpoint. Must be a positive integer. Optional.

Note

BETA FEATURE: The target_qps parameter is in beta and subject to change. It enables automatic capacity scaling based on desired throughput. Currently only supported for STANDARD endpoints.

Warning

Only STANDARD endpoints support target_qps. Storage Optimized endpoints will return an error if target_qps is specified.

Example:

from databricks.ai_search.client import AISearchClient

client = AISearchClient()

# Update existing endpoint with target_qps
response = client.update_endpoint(name="my-endpoint", target_qps=100)

# Check scaling state
scaling_info = response.get("endpoint", {}).get("scaling_info", {})
state = scaling_info.get("state")

if state == "SCALING_CHANGE_APPLIED":
    print("Scaling configuration applied")
elif state == "SCALING_CHANGE_IN_PROGRESS":
    print("Scaling change in progress, capacity will be adjusted at next index sync")

update_endpoint_budget_policy(name, budget_policy_id=None, usage_policy_id=None)

Update an endpoint’s budget/usage policy.

Parameters:
  • name (str) – The name of the endpoint.

  • budget_policy_id (str) – The id of the budget policy to assign to the endpoint (deprecated, use usage_policy_id).

  • usage_policy_id (str) – The id of the usage policy to assign to the endpoint.

update_endpoint_usage_policy(name, usage_policy_id)

Update an endpoint’s usage policy (alias for update_endpoint_budget_policy).

Parameters:
  • name (str) – The name of the endpoint.

  • usage_policy_id (str) – The id of the usage policy to assign to the endpoint.

update_index_budget_policy(index_name, budget_policy_id=None, usage_policy_id=None)

Update the budget/usage policy of an index.

Parameters:
  • index_name (str) – The name of the index.

  • budget_policy_id (str) – The budget policy id to be applied to the index (deprecated, use usage_policy_id).

  • usage_policy_id (str) – The usage policy id to be applied to the index.

update_index_usage_policy(index_name, usage_policy_id)

Update the usage policy of an index (alias for update_index_budget_policy).

Parameters:
  • index_name (str) – The name of the index.

  • usage_policy_id (str) – The usage policy id to be applied to the index.

validate(disable_notice=False)

Validate the client configuration.

Parameters:

disable_notice (bool) – Whether to disable the authentication notice message.

wait_for_endpoint(name, verbose=False, timeout=datetime.timedelta(seconds=3600))

Wait for an endpoint to be online.

Parameters:
  • name (str) – The name of the endpoint.

  • verbose (bool) – Whether to print status messages.

  • timeout (datetime.timedelta) – The maximum time to wait before raising an exception.
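wait_for_endpoint and the *_and_wait methods all take a datetime.timedelta timeout and a verbose flag. A generic polling loop of that shape might look like the sketch below. It is not the library's implementation: wait_until, is_ready, and poll_interval are hypothetical names, and raising TimeoutError is an assumption, since the reference only states that an exception is raised.

```python
import time
from datetime import timedelta
from typing import Callable


def wait_until(
    is_ready: Callable[[], bool],
    timeout: timedelta = timedelta(seconds=3600),
    poll_interval: float = 5.0,
    verbose: bool = False,
) -> None:
    """Poll `is_ready` until it returns True or `timeout` elapses."""
    # A monotonic clock is unaffected by wall-clock adjustments.
    deadline = time.monotonic() + timeout.total_seconds()
    while True:
        if is_ready():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"not ready after {timeout}")
        if verbose:
            print("still waiting...")
        time.sleep(poll_interval)
```

Expressing the limit as a timedelta keeps call sites readable, for example timeout=timedelta(minutes=30) rather than a bare number of seconds.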

class databricks.ai_search.index.AISearchIndex(workspace_url: str, index_url: str, name: str, endpoint_name: str, mlserving_endpoint_name: str | None = None, personal_access_token: str | None = None, service_principal_client_id: str | None = None, service_principal_client_secret: str | None = None, azure_tenant_id: str | None = None, azure_login_id: str | None = None, use_user_passed_credentials: bool = False, credential_strategy: CredentialStrategy | None = None, get_reranker_url_callable: callable | None = None, mlserving_endpoint_name_for_query: str | None = None, total_retries: int = 3, backoff_factor: float = 1, backoff_jitter: float = 0.2)

Bases: object

AISearchIndex is a helper class that represents an AI Search index.

Do not instantiate this class directly; obtain an instance through AISearchClient, for example with get_index().

Initialize an AISearchIndex instance.

Parameters:
  • workspace_url (str) – The URL of the Databricks workspace.

  • index_url (str) – The direct URL to the vector search index endpoint.

  • name (str) – The name of the vector search index.

  • endpoint_name (str) – The name of the vector search endpoint.

  • mlserving_endpoint_name (str) – The name of the model serving endpoint used for embedding generation during ingestion.

  • personal_access_token (str) – Personal access token for authentication.

  • service_principal_client_id (str) – Service principal client ID for authentication.

  • service_principal_client_secret (str) – Service principal client secret for authentication.

  • azure_tenant_id (str) – Azure tenant ID for Azure-based authentication.

  • azure_login_id (str) – Azure login ID (Databricks Azure Application ID) for authentication.

  • use_user_passed_credentials (bool) – Whether credentials were explicitly provided by the user (True) or inferred automatically (False).

  • credential_strategy (CredentialStrategy) – The credential strategy to use for authentication.

  • get_reranker_url_callable (callable) – A callable function to retrieve the reranker-compatible index URL when needed.

  • mlserving_endpoint_name_for_query (str) – The name of the model serving endpoint to use for queries (if different from ingestion endpoint).

  • total_retries (int) – Total number of retries for requests. Defaults to 3.

  • backoff_factor (float) – Backoff factor for retry delays. Defaults to 1.

  • backoff_jitter (float) – Random jitter proportion (0-1) to add to backoff delays. Defaults to 0.2.

delete(primary_keys)

Delete data from the index.

Parameters:

primary_keys – List of primary keys to delete from the index.

describe()

Describe the index. This returns metadata about the index.

scan(num_results=10, last_primary_key=None)

Given all the data in the index sorted by primary key, return the next num_results rows after the primary key specified by last_primary_key. If last_primary_key is None, return the first num_results rows.

Note that if the index is being updated while you scan, the results may not be consistent.

Parameters:
  • num_results – Number of results to return.

  • last_primary_key – The last primary key from the previous page. It is used as the exclusive starting primary key.
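Because last_primary_key is an exclusive starting point, a full traversal is a loop that feeds the last key of each page into the next call. The sketch below shows that loop under stated assumptions: scan_all is a hypothetical helper, and since the shape of the scan response is not described in this reference, the caller supplies rows_of (extract the rows from a response) and key_of (extract the primary key from a row).

```python
from typing import Any, Callable, Iterator, List, Optional


def scan_all(
    index: Any,
    page_size: int,
    rows_of: Callable[[Any], List[Any]],
    key_of: Callable[[Any], Any],
) -> Iterator[Any]:
    """Yield every row by calling index.scan() page by page."""
    last_key: Optional[Any] = None
    while True:
        page = index.scan(num_results=page_size, last_primary_key=last_key)
        rows = rows_of(page)
        if not rows:
            # An empty page means there is nothing after last_key.
            return
        yield from rows
        # The last key of this page is the exclusive start of the next one.
        last_key = key_of(rows[-1])
```

As with scan itself, a traversal that runs while the index is being updated may not see a consistent snapshot.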

scan_index(num_results=10, last_primary_key=None)

Deprecated since version 0.36: This will be removed in 0.37. Use the scan method instead.

similarity_search(...)

Perform a similarity search on the index. This returns the top K results that are most similar to the query.

Parameters:
  • columns – List of column names to return in the results.

  • query_text – Query text to search for.

  • query_vector – Query vector to search for.

  • filters – Filters to apply to the query.

  • num_results – Number of results to return.

  • debug_level – Debug level to use for the query.

  • score_threshold – Score threshold to use for the query. If reranker is used, the score threshold is applied before reranking.

  • query_type – Query type of this query. Choices are “ANN”, “HYBRID”, and “FULL_TEXT”.

  • query_columns – Text columns to search for query_text. When empty, all text columns are searched.

  • sort_columns – Sort results by column values instead of the default relevance ordering. Each clause has the form “<column> ASC” or “<column> DESC”, for example [“rating DESC”, “price ASC”].

  • facets – Facets to compute over the matched results. Each entry is one of: “<column>” (top 10 distinct values by count), “<column> TOP <n>” (top n distinct values, n > 0), or “<column> BUCKETS [[from,to],…]” (inclusive numeric ranges). TOP and BUCKETS are case-insensitive; a column may appear at most once.

  • columns_to_rerank – (Deprecated) List of column names to use for reranking the results. Use the reranker parameter instead.

  • disable_notice – Whether to disable the notice message.

  • reranker (Optional[databricks.ai_search.reranker.Reranker]) – Optional reranker to apply on the top results. Pass an instance of databricks.ai_search.reranker.DatabricksReranker with columns_to_rerank=[...]. The reranker reorders the initial results using the specified text columns.

  • total_retries – Total number of retries for the request. Set to 0 to disable retries.

  • backoff_factor – Backoff factor to apply between retry attempts. The delay between retries is calculated as {backoff_factor} * (2 ** (retry_count - 1)) seconds. For example, with backoff_factor=1 and retry_count starting at 1, the delays are 1s, 2s, 4s, 8s, etc.

  • backoff_jitter – Random jitter to add to backoff delays to avoid thundering herd problem. Value between 0 and 1 representing the proportion of jitter to apply.
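To see what a given retry configuration implies, the backoff formula above can be evaluated directly. This is an illustrative sketch rather than the client's retry code: backoff_delays is a hypothetical helper, retry_count is assumed to start at 1, and the way jitter is applied (adding up to backoff_jitter times the base delay) is an assumption, since the reference only describes jitter as a proportion between 0 and 1.

```python
import random
from typing import List, Optional


def backoff_delays(
    total_retries: int,
    backoff_factor: float = 1.0,
    backoff_jitter: float = 0.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return the delay in seconds before each retry, for retry_count 1..total_retries."""
    rng = rng or random.Random()
    delays = []
    for retry_count in range(1, total_retries + 1):
        # Documented formula: backoff_factor * (2 ** (retry_count - 1)).
        base = backoff_factor * (2 ** (retry_count - 1))
        # Assumed jitter model: add between 0 and backoff_jitter * base seconds.
        jitter = base * backoff_jitter * rng.random()
        delays.append(base + jitter)
    return delays


# Without jitter, three retries with backoff_factor=1 wait 1s, 2s, then 4s.
print(backoff_delays(3))  # [1.0, 2.0, 4.0]
```

With total_retries=0 the list is empty, which matches the documented way to disable retries.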

Example:

Use the Databricks reranker to improve the ordering of hybrid search results:

from databricks.ai_search.reranker import DatabricksReranker

results = index.similarity_search(
    query_text="How to create an AI Search index",
    columns=["id", "text", "parent_doc_summary", "date"],
    # The final number of results to return. The reranker will automatically overfetch 50 documents and rerank them.
    num_results=10,
    query_type="HYBRID",
    # Needed for debug info to get any warnings and time to rerank the results.
    debug_level=1,
    # The text reranked will be concatenated and if it is longer than 2000 characters, it will be truncated.
    # Include shorter, important columns first.
    reranker=DatabricksReranker(columns_to_rerank=["parent_doc_summary", "text", "other_column"]),
)
# Check if reranking was successful and how much additional time it took to rerank the results.
if "warnings" in results['debug_info']:
    print(results['debug_info']['warnings'])
else:
    print(f"Reranking was successful and took {results['debug_info']['reranker_time']}ms")

sync()

Sync the index with its source Delta table. This works only for managed Delta Sync indexes created with pipeline_type="TRIGGERED".

upsert(inputs)

Upsert data into the index.

Parameters:

inputs – List of dictionaries to upsert into the index.

wait_until_ready(verbose=False, timeout=datetime.timedelta(days=1), wait_for_updates=False)

Wait for the index to be online.

Parameters:
  • verbose (bool) – Whether to print status messages.

  • timeout (datetime.timedelta) – The maximum time to wait before raising an exception.

  • wait_for_updates (bool) – If True, also wait for any pending updates to the index to complete.

class databricks.ai_search.async_index.AsyncAISearchIndex(workspace_url: str, index_url: str, name: str, endpoint_name: str, mlserving_endpoint_name: str | None = None, personal_access_token: str | None = None, service_principal_client_id: str | None = None, service_principal_client_secret: str | None = None, azure_tenant_id: str | None = None, azure_login_id: str | None = None, use_user_passed_credentials: bool = False, credential_strategy: CredentialStrategy | None = None, get_reranker_url_callable: callable | None = None, mlserving_endpoint_name_for_query: str | None = None, total_retries: int = 3, backoff_factor: float = 1, backoff_jitter: float = 0.2)

Bases: object

AsyncAISearchIndex is a helper class that represents an AI Search index for async/await code.

Do not instantiate this class directly; use AISearchClient.get_async_index() to obtain an instance.

Index lifecycle operations (create, delete, wait_until_ready) are not available on this class; use the synchronous AISearchClient and AISearchIndex for them.

Example:

async with client.get_async_index(
    endpoint_name="my-endpoint", index_name="my-index"
) as idx:
    results = await idx.similarity_search(
        query_vector=[0.1, 0.2, ...], columns=["id"], num_results=10
    )
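One reason to use the async index is to issue several queries from the same event loop without blocking it. The sketch below illustrates that pattern with asyncio.gather. FakeAsyncIndex is a stand-in written for this example so that it runs without a workspace, and search_many is a hypothetical helper. Whether a single AsyncAISearchIndex instance is intended to serve concurrent calls is not stated in this reference, so treat that as an assumption to verify.

```python
import asyncio
from typing import Any, Dict, List


class FakeAsyncIndex:
    """Stand-in for an async index so the sketch runs without a workspace."""

    async def similarity_search(
        self, query_text: str, columns: List[str], num_results: int
    ) -> Dict[str, Any]:
        # Yield to the event loop, as a real network call would.
        await asyncio.sleep(0)
        return {"query": query_text, "columns": columns, "num_results": num_results}


async def search_many(idx: Any, queries: List[str]) -> List[Dict[str, Any]]:
    """Run one similarity_search per query concurrently on one event loop."""
    tasks = [
        idx.similarity_search(query_text=q, columns=["id"], num_results=10)
        for q in queries
    ]
    # gather returns results in the same order as the awaitables passed in.
    return await asyncio.gather(*tasks)


results = asyncio.run(search_many(FakeAsyncIndex(), ["alpha", "beta"]))
```

All calls share one event loop here, which is consistent with the note that an async index is bound to the loop where its first method is awaited.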

Initialize an AsyncAISearchIndex instance.

Parameters:
  • workspace_url (str) – The URL of the Databricks workspace.

  • index_url (str) – The direct URL to the vector search index endpoint.

  • name (str) – The name of the vector search index.

  • endpoint_name (str) – The name of the vector search endpoint.

  • mlserving_endpoint_name (str) – The name of the model serving endpoint used for embedding generation during ingestion.

  • personal_access_token (str) – Personal access token for authentication.

  • service_principal_client_id (str) – Service principal client ID for authentication.

  • service_principal_client_secret (str) – Service principal client secret for authentication.

  • azure_tenant_id (str) – Azure tenant ID for Azure-based authentication.

  • azure_login_id (str) – Azure login ID (Databricks Azure Application ID) for authentication.

  • use_user_passed_credentials (bool) – Whether credentials were explicitly provided by the user (True) or inferred automatically (False).

  • credential_strategy (CredentialStrategy) – The credential strategy to use for authentication.

  • get_reranker_url_callable (callable) – A callable function to retrieve the reranker-compatible index URL when needed.

  • mlserving_endpoint_name_for_query (str) – The name of the model serving endpoint to use for queries (if different from ingestion endpoint).

  • total_retries (int) – Total number of retries for requests. Defaults to 3.

  • backoff_factor (float) – Backoff factor for retry delays. Defaults to 1.

  • backoff_jitter (float) – Random jitter proportion (0-1) to add to backoff delays. Defaults to 0.2.

async aclose()

Close this index instance.

async delete(primary_keys)

Delete data from the index.

Parameters:

primary_keys – List of primary keys to delete from the index.

async describe()

Describe the index. This returns metadata about the index.

async scan(num_results=10, last_primary_key=None)

Given all the data in the index sorted by primary key, return the next num_results rows after the primary key specified by last_primary_key. If last_primary_key is None, return the first num_results rows.

Note that if the index is being updated while you scan, the results may not be consistent.

Parameters:
  • num_results – Number of results to return.

  • last_primary_key – Last primary key from previous pagination, used as the exclusive starting primary key.

async similarity_search(...)

Perform a similarity search on the index. This returns the top K results that are most similar to the query.

Parameters:
  • columns – List of column names to return in the results.

  • query_text – Query text to search for.

  • query_vector – Query vector to search for.

  • filters – Filters to apply to the query.

  • num_results – Number of results to return.

  • debug_level – Debug level to use for the query.

  • score_threshold – Score threshold to use for the query. If reranker is used, the score threshold is applied before reranking.

  • query_type – Query type of this query. Choices are “ANN”, “HYBRID”, and “FULL_TEXT”.

  • query_columns – Text columns to search for query_text. When empty, all text columns are searched.

  • sort_columns – Sort results by column values instead of the default relevance ordering. Each clause has the form “<column> ASC” or “<column> DESC”, for example [“rating DESC”, “price ASC”].

  • facets – Facets to compute over the matched results. Each entry is one of: “<column>” (top 10 distinct values by count), “<column> TOP <n>” (top n distinct values, n > 0), or “<column> BUCKETS [[from,to],…]” (inclusive numeric ranges). TOP and BUCKETS are case-insensitive; a column may appear at most once.

  • columns_to_rerank – (Deprecated) List of column names to use for reranking the results. Use the reranker parameter instead.

  • disable_notice – Whether to disable the notice message.

  • reranker (Optional[databricks.ai_search.reranker.Reranker]) – Optional reranker to apply on the top results. Pass an instance of databricks.ai_search.reranker.DatabricksReranker with columns_to_rerank=[...]. The reranker reorders the initial results using the specified text columns.

  • total_retries – Total number of retries for the request. Set to 0 to disable retries.

  • backoff_factor – Backoff factor to apply between retry attempts. The delay before each retry is calculated as {backoff_factor} * (2 ** (retry_count - 1)) seconds. For example, with backoff_factor=1, the delays are 1s, 2s, 4s, 8s, etc.

  • backoff_jitter – Random jitter to add to backoff delays to avoid the thundering herd problem. Value between 0 and 1 representing the proportion of jitter to apply.
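
The delay formula given for backoff_factor can be evaluated directly. The helper below is a hypothetical illustration, not part of the SDK; it assumes retry_count starts at 1 for the first retry and ignores backoff_jitter, since how the jitter is combined with the delay is not specified here.

```python
def backoff_delays(backoff_factor: float, total_retries: int) -> list[float]:
    """Delay in seconds before each retry: backoff_factor * 2 ** (retry_count - 1)."""
    return [
        backoff_factor * (2 ** (retry_count - 1))
        for retry_count in range(1, total_retries + 1)
    ]


# Four retries with a factor of 0.5, and the no-retry case (total_retries=0).
delays = backoff_delays(backoff_factor=0.5, total_retries=4)
no_retries = backoff_delays(backoff_factor=0.5, total_retries=0)
# Worst-case extra latency if every retry is used, excluding request time.
total_wait = sum(delays)
```

Summing the list gives a quick upper bound on the time a caller may spend waiting between attempts, which is useful when choosing total_retries for latency-sensitive queries.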

Example:

Use the Databricks reranker to improve the ordering of hybrid search results:

from databricks.ai_search.reranker import DatabricksReranker

results = index.similarity_search(
    query_text="How to create an AI Search index",
    columns=["id", "text", "parent_doc_summary", "date"],
    # The final number of results to return. The reranker will automatically overfetch 50 documents and rerank them.
    num_results=10,
    query_type="HYBRID",
    # Needed for debug info to get any warnings and time to rerank the results.
    debug_level=1,
    # The text of the reranked columns is concatenated; anything beyond 2000 characters is truncated.
    # Include shorter, important columns first.
    reranker=DatabricksReranker(columns_to_rerank=["parent_doc_summary", "text", "other_column"]),
)
# Check if reranking was successful and how much additional time it took to rerank the results.
if "warnings" in results['debug_info']:
    print(results['debug_info']['warnings'])
else:
    print(f"Reranking was successful and took {results['debug_info']['reranker_time']}ms")
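
The clause formats described for sort_columns and facets in the parameter list are plain strings, so they can be assembled with small helpers. The functions below are hypothetical conveniences, not SDK functions, and they assume column names contain no whitespace.

```python
def sort_clause(column: str, descending: bool = False) -> str:
    """Build a "<column> ASC" or "<column> DESC" sort clause."""
    return f"{column} {'DESC' if descending else 'ASC'}"


def facet_top(column: str, n: int) -> str:
    """Build a "<column> TOP <n>" facet; n must be greater than 0."""
    if n <= 0:
        raise ValueError("n must be greater than 0")
    return f"{column} TOP {n}"


def facet_buckets(column: str, ranges: list[tuple[float, float]]) -> str:
    """Build a "<column> BUCKETS [[from,to],...]" facet from numeric ranges."""
    body = ",".join(f"[{low},{high}]" for low, high in ranges)
    return f"{column} BUCKETS [{body}]"


def validate_facets(facets: list[str]) -> list[str]:
    """Reject facet lists in which a column appears more than once."""
    seen: set[str] = set()
    for facet in facets:
        column = facet.split()[0]  # assumes no whitespace in column names
        if column in seen:
            raise ValueError(f"column {column!r} appears more than once")
        seen.add(column)
    return facets


sort_columns = [sort_clause("rating", descending=True), sort_clause("price")]
facets = validate_facets([
    "category",                                   # top 10 distinct values
    facet_top("brand", 5),                        # top 5 distinct values
    facet_buckets("price", [(0, 50), (50, 100)]),  # inclusive numeric ranges
])
```

The resulting lists are in the form the sort_columns and facets parameters describe.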

async sync()

Sync the index with its source Delta table. This works only for managed Delta Sync indexes with pipeline type=“TRIGGERED”.

async upsert(inputs)

Upsert data into the index.

Parameters:

inputs – List of dictionaries to upsert into the index.

class databricks.ai_search.reranker.DatabricksReranker(columns_to_rerank: list[str])

Bases: Reranker

Initialize a DatabricksReranker config object.

Args:

columns_to_rerank: A list of column names to use for reranking the results.
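
The similarity_search example notes that the reranked text is concatenated and truncated at 2000 characters, which is why column order matters. The sketch below only illustrates that effect; rerank_text is a hypothetical helper, and the single-space separator is an assumption, since the actual concatenation format is not documented here.

```python
MAX_RERANK_CHARS = 2000  # limit quoted in the similarity_search example


def rerank_text(row: dict[str, str], columns_to_rerank: list[str],
                separator: str = " ") -> str:
    """Join the chosen columns in order, then truncate to the character limit."""
    joined = separator.join(row[column] for column in columns_to_rerank)
    return joined[:MAX_RERANK_CHARS]


row = {"parent_doc_summary": "Short summary.", "text": "x" * 3000}

# Short, important column first: it survives truncation.
summary_first = rerank_text(row, ["parent_doc_summary", "text"])
# Long column first: the summary falls past the limit and is cut off.
text_first = rerank_text(row, ["text", "parent_doc_summary"])
```

Under these assumptions, listing the long text column first means the summary never reaches the reranker.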

class databricks.ai_search.reranker.ExperimentalDatabricksFinetunedReranker(columns_to_rerank: list[str], endpoint_name: str | None = None)

Bases: Reranker

EXPERIMENTAL. Not covered by SDK compatibility guarantees.

Rerank query results using a finetuned reranker model hosted on a Model Serving endpoint in the caller’s workspace (typically created by the reranker finetuning job).

Initialize an ExperimentalDatabricksFinetunedReranker.

Args:

columns_to_rerank: List of column names to concatenate and send to the reranker model for each document.

endpoint_name: Name of the Model Serving endpoint hosting the finetuned reranker. If None, the index resolves it at query time via _default_finetuned_endpoint_name() using the index’s UC table UUID (see that function for the exact formula). This requires one extra describe() round-trip the first time the index handles a finetuned-reranker query; the UUID is cached on the index for subsequent queries.

exception databricks.ai_search.exceptions.AISearchException(message, status_code=None, response_content=None)

Bases: Exception

Base exception for all AI Search SDK errors.

Attributes:

status_code: HTTP status code, if applicable.

response_content: Raw response content from the API.

exception databricks.ai_search.exceptions.BadRequest(message, status_code=None, response_content=None)

Bases: AISearchException

exception databricks.ai_search.exceptions.InvalidInputException(message, status_code=None, response_content=None)

Bases: AISearchException

exception databricks.ai_search.exceptions.NotFound(message, status_code=None, response_content=None)

Bases: AISearchException

exception databricks.ai_search.exceptions.PermissionDenied(message, status_code=None, response_content=None)

Bases: AISearchException

exception databricks.ai_search.exceptions.ResourceConflict(message, status_code=None, response_content=None)

Bases: AISearchException

databricks.ai_search.exceptions.ResourceDoesNotExist

alias of NotFound

exception databricks.ai_search.exceptions.TooManyRequests(message, status_code=None, response_content=None)

Bases: AISearchException

databricks.ai_search.exceptions.VectorSearchException

alias of AISearchException
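
Because every exception derives from AISearchException, handlers should list specific subclasses before the base class. The classes below are local stand-ins that mirror the documented constructor signature so the sketch runs without the SDK; real code would import them from databricks.ai_search.exceptions. Storing status_code as an attribute follows the Attributes list above, while describe_failure and fail_with are hypothetical helpers.

```python
class AISearchException(Exception):
    """Stand-in for the documented base exception."""

    def __init__(self, message, status_code=None, response_content=None):
        super().__init__(message)
        self.status_code = status_code
        self.response_content = response_content


class NotFound(AISearchException):
    """Stand-in for the documented NotFound subclass."""


class TooManyRequests(AISearchException):
    """Stand-in for the documented TooManyRequests subclass."""


# The aliases are the same class objects, so either name works in `except`.
ResourceDoesNotExist = NotFound
VectorSearchException = AISearchException


def describe_failure(action):
    """Run `action` and map SDK-style errors to a short description."""
    try:
        action()
    except NotFound as exc:  # specific subclasses first
        return f"not found ({exc.status_code})"
    except TooManyRequests:
        return "rate limited, retry later"
    except AISearchException as exc:  # base class last
        return f"other SDK error ({exc.status_code})"
    return "ok"


def fail_with(exc):
    """Return a callable that raises `exc` (simulates a failing SDK call)."""
    def action():
        raise exc
    return action


missing = describe_failure(fail_with(ResourceDoesNotExist("no such index", status_code=404)))
throttled = describe_failure(fail_with(TooManyRequests("slow down", status_code=429)))
generic = describe_failure(fail_with(VectorSearchException("boom", status_code=500)))
succeeded = describe_failure(lambda: None)
```

Raising through an alias is caught by a handler written against the canonical name, which is what makes the backward-compatible names safe to mix.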

class databricks.ai_search.utils.CredentialStrategy(*values)

Bases: Enum

MODEL_SERVING_USER_CREDENTIALS = 1