Lakehouse Monitoring for GenAI

For more details, see "What is Lakehouse Monitoring for generative AI?".

databricks.agents.monitoring.create_external_monitor(*, catalog_name: str, schema_name: str, assessments_config: AssessmentsSuiteConfig | dict, experiment_id: str | None = None, experiment_name: str | None = None) → ExternalMonitor

Create a monitor for a GenAI application served outside Databricks.

Parameters:

catalog_name (str) – The name of the Unity Catalog (UC) catalog in which to create the trace archive table.
schema_name (str) – The name of the Unity Catalog (UC) schema in which to create the trace archive table.
assessments_config (AssessmentsSuiteConfig | dict) – The configuration for the suite of assessments to be run on traces from the GenAI application.
experiment_id (str | None, optional) – The ID of the MLflow experiment that the monitor should be associated with. Defaults to the currently active experiment.
experiment_name (str | None, optional) – The name of the MLflow experiment that the monitor should be associated with. Defaults to the currently active experiment.

Returns:

The created monitor.

Return type:

ExternalMonitor
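A minimal usage sketch, built only from the signatures documented on this page. The helper name create_example_external_monitor, the sample value 0.5, and the choice of judges are illustrative assumptions. The optional monitoring argument is not part of the library; it exists only so that the sketch can be exercised with a stand-in module outside a Databricks workspace.

```python
def create_example_external_monitor(catalog_name, schema_name, experiment_name, monitoring=None):
    """Create an external monitor that samples half of the traces (hypothetical helper)."""
    if monitoring is None:
        # Imported lazily so the helper can be defined without the package installed.
        from databricks.agents import monitoring

    config = monitoring.AssessmentsSuiteConfig(
        sample=0.5,  # documented range: greater than 0.0, at most 1.0
        assessments=[
            monitoring.BuiltinJudge(name="safety"),
            monitoring.BuiltinJudge(name="relevance_to_query"),
        ],
    )
    # All arguments are keyword-only in the documented signature.
    return monitoring.create_external_monitor(
        catalog_name=catalog_name,
        schema_name=schema_name,
        assessments_config=config,
        experiment_name=experiment_name,
    )
```

Called inside a workspace, this would return the ExternalMonitor described above; the catalog, schema, and experiment names are whatever your environment uses.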

databricks.agents.monitoring.delete_external_monitor(*, experiment_id: str | None = None, experiment_name: str | None = None) → None

Deletes the monitor for a GenAI application served outside Databricks.

Parameters:

experiment_id (str | None, optional) – ID of the MLflow experiment that the monitor is associated with. Defaults to None.
experiment_name (str | None, optional) – Name of the MLflow experiment that the monitor is associated with. Defaults to None.

databricks.agents.monitoring.get_external_monitor(*, experiment_id: str | None = None, experiment_name: str | None = None) → ExternalMonitor

Gets the monitor for a GenAI application served outside Databricks.

Parameters:

experiment_id (str | None, optional) – ID of the MLflow experiment that the monitor is associated with. Defaults to None.
experiment_name (str | None, optional) – Name of the MLflow experiment that the monitor is associated with. Defaults to None.

Raises:

ValueError – When neither experiment_id nor experiment_name is provided.
ValueError – When no monitor is found for the given experiment_id or experiment_name.

Returns:

The retrieved external monitor.

Return type:

entities.ExternalMonitor
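Because get_external_monitor is documented to raise ValueError when no monitor exists, callers that only want to check for a monitor may prefer a wrapper that returns None instead. The sketch below is one way to do that; find_external_monitor is a hypothetical helper, and its monitoring argument is an assumption added only so that a stand-in module can be passed in for testing.

```python
def find_external_monitor(experiment_name, monitoring=None):
    """Return the external monitor for an experiment, or None if there is none."""
    if monitoring is None:
        # Imported lazily so the helper can be defined without the package installed.
        from databricks.agents import monitoring

    try:
        return monitoring.get_external_monitor(experiment_name=experiment_name)
    except ValueError:
        # Documented as raised when no monitor is found for the experiment.
        # An experiment name is always passed here, so the "neither argument
        # provided" case should not be the cause.
        return None
```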

databricks.agents.monitoring.update_external_monitor(*, experiment_id: str | None = None, experiment_name: str | None = None, assessments_config: AssessmentsSuiteConfig | dict) → ExternalMonitor

Updates the monitor for a GenAI application served outside Databricks.

Parameters:

assessments_config (assessments.AssessmentsSuiteConfig) – The updated configuration for the suite of assessments to be run on traces from the AI system. Partial updates of arrays are not supported, so the assessments specified here replace your monitor’s existing assessments. Non-nested fields such as sample are left unchanged when unspecified.
experiment_id (str | None, optional) – ID of the MLflow experiment that the monitor is associated with. Defaults to None.
experiment_name (str | None, optional) – Name of the MLflow experiment that the monitor is associated with. Defaults to None.

Raises:

ValueError – When assessments_config is not provided.

Returns:

The updated external monitor.

Return type:

entities.ExternalMonitor
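The update semantics described for assessments_config can be pictured with a small stand-alone model. This is not the library's implementation: SuiteSketch and apply_update are hypothetical names, "unspecified" is assumed to mean None, and an omitted assessments list is assumed to leave the existing list untouched, which the text above does not state explicitly.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class SuiteSketch:
    """Stand-in for a monitor's assessment settings (not the real class)."""
    sample: Optional[float] = None
    paused: Optional[bool] = None
    assessments: Optional[Tuple[str, ...]] = None


def apply_update(current: SuiteSketch, update: SuiteSketch) -> SuiteSketch:
    """Model the documented rule: scalars are kept when unspecified, lists are replaced."""
    changes = {}
    if update.sample is not None:
        changes["sample"] = update.sample
    if update.paused is not None:
        changes["paused"] = update.paused
    if update.assessments is not None:
        # No element-wise merge: the new list replaces the old one entirely.
        changes["assessments"] = update.assessments
    return replace(current, **changes)


current = SuiteSketch(sample=0.5, paused=False, assessments=("safety", "groundedness"))
merged = apply_update(current, SuiteSketch(assessments=("safety",)))
```

In this model, merged keeps sample at 0.5 but its assessments contain only "safety"; to keep "groundedness" it would have to be listed again in the update.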

Create a monitor for a Databricks serving endpoint.

Parameters:

endpoint_name – The name of the serving endpoint.
assessments_config – The configuration for the suite of assessments to be run on traces.
experiment_id – The experiment ID to log the monitoring results. Defaults to the currently active MLflow experiment.
monitoring_config – Deprecated. The monitoring configuration.

Returns:

The monitor for the serving endpoint.

databricks.agents.monitoring.delete_monitor(*, endpoint_name: str | None = None) → None

Deletes a monitor for a Databricks serving endpoint.

Parameters:

endpoint_name (str, optional) – The name of the agent’s serving endpoint.

databricks.agents.monitoring.get_monitor(*, endpoint_name: str) → Monitor

Retrieves a monitor for a Databricks serving endpoint.

Parameters:

endpoint_name (str) – The name of the agent’s serving endpoint.

Returns:

Monitor | ExternalMonitor metadata. For external monitors, this will include the status of the ingestion endpoint.

databricks.agents.monitoring.update_monitor(*, endpoint_name: str, assessments_config: dict | AssessmentsSuiteConfig | None = None, monitoring_config: dict | MonitoringConfig | None = None) → Monitor

Partially update a monitor for a serving endpoint.

Parameters:

endpoint_name (str) – The name of the agent’s serving endpoint. Only supported for agents served on Databricks.
assessments_config – The updated configuration for the suite of assessments to be run on traces. Partial updates of arrays are not supported, so the assessments specified here replace your monitor’s existing assessments. Non-nested fields such as sample are left unchanged when unspecified.
monitoring_config – Deprecated. The configuration change, using upsert semantics.

Returns:

The updated monitor for the serving endpoint.

Return type:

Monitor
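As an illustration of a partial update, the sketch below pauses a monitor by sending only the paused field. pause_monitor is a hypothetical helper, and its monitoring argument is an assumption added so that a stand-in module can be used for testing. Whether the existing assessments list is preserved when it is omitted is not stated above, so verify the returned Monitor before relying on this.

```python
def pause_monitor(endpoint_name, monitoring=None):
    """Pause the monitor attached to a serving endpoint (hypothetical helper)."""
    if monitoring is None:
        # Imported lazily so the helper can be defined without the package installed.
        from databricks.agents import monitoring

    return monitoring.update_monitor(
        endpoint_name=endpoint_name,
        # Only `paused` is set; `sample` is left unspecified and, per the
        # description above, should therefore not be updated.
        assessments_config=monitoring.AssessmentsSuiteConfig(paused=True),
    )
```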

class databricks.agents.monitoring.AssessmentsSuiteConfig(sample: float | None = None, paused: bool | None = None, assessments: list[AssessmentConfig] | None = None)

Bases: object

Configuration for a suite of assessments to be run on traces from a GenAI application.

Raises:

ValueError – When sample is not between 0.0 (exclusive) and 1.0 (inclusive).
ValueError – When more than one guidelines judge is found.
ValueError – When duplicate builtin judges are found.

sample: float | None = None

paused: bool | None = None

assessments: list[AssessmentConfig] | None = None

classmethod from_dict(data: dict)

get_guidelines_judge() → GuidelinesJudge | None

Get the first GuidelinesJudge from the assessments list, or None if not found.
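The three ValueError conditions listed for AssessmentsSuiteConfig can be checked ahead of time with a stand-alone pre-flight helper. The sketch below is not the library's validation code: check_suite is a hypothetical function that works on judge names only, using the builtin names and the 'guideline_adherence' name documented on this page.

```python
from collections import Counter

BUILTIN_NAMES = {"safety", "groundedness", "relevance_to_query", "chunk_relevance"}
GUIDELINES_NAME = "guideline_adherence"


def check_suite(sample, judge_names):
    """Return a list of problems; an empty list means the three documented checks pass."""
    problems = []
    if sample is not None and not (0.0 < sample <= 1.0):
        problems.append("sample must be greater than 0.0 and at most 1.0")

    counts = Counter(judge_names)
    if counts[GUIDELINES_NAME] > 1:
        problems.append("more than one guidelines judge")

    duplicates = sorted(n for n, c in counts.items() if n in BUILTIN_NAMES and c > 1)
    if duplicates:
        problems.append("duplicate builtin judges: " + ", ".join(duplicates))
    return problems


ok = check_suite(0.5, ["safety", "groundedness", GUIDELINES_NAME])
bad = check_suite(0.0, ["safety", "safety"])
```

Here ok is empty, while bad reports both the out-of-range sample and the duplicated "safety" judge.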

class databricks.agents.monitoring.CustomMetric(metric_fn: CustomMetric, sample_rate: float | None = None)

Bases: AssessmentConfig

Configuration for a custom metric to be run on traces from a GenAI application.

Raises:

ValueError – When the provided function is not annotated with @metric.

metric_fn: CustomMetric

sample_rate: float | None = None

class databricks.agents.monitoring.GuidelinesJudge(guidelines: dict[str, list[str]], sample_rate: float | None = None)

Bases: AssessmentConfig

Configuration for a guideline adherence judge to be run on traces from a GenAI application.

Raises:

ValueError – When there are duplicate keys in guidelines dict.
ValueError – When there are duplicate values for a key in guidelines dict.

name: Literal['guideline_adherence'] = 'guideline_adherence'

guidelines: dict[str, list[str]]

sample_rate: float | None = None
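Since duplicate values under a guidelines key raise ValueError, guideline lists assembled from several sources may need de-duplicating before they are passed as GuidelinesJudge(guidelines=...). A minimal sketch follows; dedupe_guidelines is a hypothetical helper, and the "tone" key and guideline texts are made-up examples.

```python
def dedupe_guidelines(guidelines):
    """Drop repeated guideline strings under each key, keeping first-seen order."""
    # dict.fromkeys preserves insertion order, so the first occurrence wins.
    return {name: list(dict.fromkeys(rules)) for name, rules in guidelines.items()}


raw = {"tone": ["Be polite.", "Be polite.", "Avoid slang."]}
clean = dedupe_guidelines(raw)
```

The resulting clean dict has the dict[str, list[str]] shape that the guidelines field expects. Duplicate keys cannot occur in a Python dict literal at runtime (later entries silently overwrite earlier ones), so that case is worth checking at the point where the guidelines are collected.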

class databricks.agents.monitoring.BuiltinJudge(name: Literal['safety', 'groundedness', 'relevance_to_query', 'chunk_relevance'], sample_rate: float | None = None)

Bases: AssessmentConfig

Configuration for a builtin judge to be run on traces from a GenAI application.

Raises:

ValueError – When the judge name is invalid.

name: Literal['safety', 'groundedness', 'relevance_to_query', 'chunk_relevance']

sample_rate: float | None = None

class databricks.agents.monitoring.Monitor(experiment_id: str, endpoint_name: str, assessments_config: AssessmentsSuiteConfig, evaluated_traces_table: str, trace_archive_table: str | None, current_backfill_job_id: str | None = None)

Bases: object

The monitor for a serving endpoint.

experiment_id: str

endpoint_name: str

assessments_config: AssessmentsSuiteConfig

evaluated_traces_table: str

trace_archive_table: str | None

current_backfill_job_id: str | None = None

property monitoring_page_url: str

classmethod from_dict(kvs: dict | list | str | int | float | bool | None, *, infer_missing=False) → A

classmethod from_json(s: str | bytes | bytearray, *, parse_float=None, parse_int=None, parse_constant=None, infer_missing=False, **kw) → A

classmethod schema(*, infer_missing: bool = False, only=None, exclude=(), many: bool = False, context=None, load_only=(), dump_only=(), partial: bool = False, unknown=None) → SchemaF[A]

to_dict(encode_json=False) → Dict[str, dict | list | str | int | float | bool | None]

to_json(*, skipkeys: bool = False, ensure_ascii: bool = True, check_circular: bool = True, allow_nan: bool = True, indent: str | int | None = None, separators: Tuple[str, str] | None = None, default: Callable | None = None, sort_keys: bool = False, **kw) → str

class databricks.agents.monitoring.MonitoringConfig(sample: float | None = None, metrics: list[str] | None = None, paused: bool | None = None, global_guidelines: dict[str, list[str]] | None = None)

Bases: object

Deprecated. Configuration for monitoring a GenAI application. All fields are optional for upsert semantics.

sample: float | None = None

metrics: list[str] | None = None

paused: bool | None = None

global_guidelines: dict[str, list[str]] | None = None

classmethod from_assessments_suite_config(assessments_suite_config: AssessmentsSuiteConfig) → MonitoringConfig

to_assessments_suite_config() → AssessmentsSuiteConfig

classmethod from_dict(kvs: dict | list | str | int | float | bool | None, *, infer_missing=False) → A

classmethod from_json(s: str | bytes | bytearray, *, parse_float=None, parse_int=None, parse_constant=None, infer_missing=False, **kw) → A

classmethod schema(*, infer_missing: bool = False, only=None, exclude=(), many: bool = False, context=None, load_only=(), dump_only=(), partial: bool = False, unknown=None) → SchemaF[A]

to_dict(encode_json=False) → Dict[str, dict | list | str | int | float | bool | None]

to_json(*, skipkeys: bool = False, ensure_ascii: bool = True, check_circular: bool = True, allow_nan: bool = True, indent: str | int | None = None, separators: Tuple[str, str] | None = None, default: Callable | None = None, sort_keys: bool = False, **kw) → str

class databricks.agents.monitoring.SchedulePauseStatus(value)

Bases: StrEnum

An enumeration.

UNPAUSED = 'UNPAUSED'

PAUSED = 'PAUSED'