Training Set

Backward compatibility module for TrainingSet.

This module provides backward compatibility by re-exporting TrainingSet from its new location in the entities package. New code should import it directly from databricks.ml_features.entities.training_set.
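The re-export pattern described above can be sketched in plain Python. This is a minimal simulation, not the library's code: it assumes hypothetical module names (`mypkg.entities.training_set` for the new location, `mypkg.training_set` for the legacy shim) registered directly in `sys.modules`, and it imports no Databricks package. It shows that both import paths resolve to the same class object, which is what keeps older imports working.

```python
import importlib
import sys
import types


def _make_module(name: str, is_package: bool = False) -> types.ModuleType:
    """Create an empty module and register it in sys.modules."""
    module = types.ModuleType(name)
    if is_package:
        module.__path__ = []  # an empty __path__ marks the module as a package
    sys.modules[name] = module
    return module


# Hypothetical package layout standing in for the real one.
_make_module("mypkg", is_package=True)
_make_module("mypkg.entities", is_package=True)
new_location = _make_module("mypkg.entities.training_set")


class TrainingSet:
    """Placeholder for the real class that lives at the new location."""


TrainingSet.__module__ = new_location.__name__
new_location.TrainingSet = TrainingSet

# The legacy module only re-exports the class, as a shim file containing
# `from mypkg.entities.training_set import TrainingSet` would.
legacy = _make_module("mypkg.training_set")
legacy.TrainingSet = new_location.TrainingSet

old_cls = importlib.import_module("mypkg.training_set").TrainingSet
new_cls = importlib.import_module("mypkg.entities.training_set").TrainingSet
print(old_cls is new_cls)  # True: both paths yield the same object
```

Because the shim re-exports the object rather than copying it, `isinstance` checks and class identity behave the same regardless of which path a caller imported from.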

class databricks.ml_features.training_set.TrainingSet(feature_spec: FeatureSpec, df: DataFrame, labels: List[str], feature_table_metadata_map: Dict[str, FeatureTable], feature_table_data_map: Dict[str, DataFrame], uc_function_infos: Dict[str, FunctionInfo], use_spark_native_join: Optional[bool] = False)

Bases: object

Note

Aliases: databricks.feature_engineering.training_set.TrainingSet, databricks.feature_store.training_set.TrainingSet

A TrainingSet describes the data used to train a model: the columns defined by a feature spec together with the label columns.

Note

The TrainingSet constructor should not be called directly. Instead, call create_training_set().

get_output_columns() → List[str]

Get the list of output columns that should be included in the final DataFrame.

This method determines which columns to include based on the feature_spec configuration. If feature_spec has column_infos, it returns only the columns marked for inclusion, plus the label columns. Otherwise, it returns an empty list.

Returns

List of column names to include in the output
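The selection rule for get_output_columns() can be illustrated with a small sketch. The `ColumnInfo` and `FeatureSpecSketch` dataclasses below are hypothetical, simplified stand-ins (the real FeatureSpec and its column_infos entries may expose different attributes). The sketch also assumes that a missing or empty column_infos counts as "absent", and that labels are appended after the included columns. The column names are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ColumnInfo:
    """Hypothetical stand-in for one entry of feature_spec.column_infos."""
    output_name: str
    include: bool = True


@dataclass
class FeatureSpecSketch:
    """Hypothetical, simplified stand-in for FeatureSpec."""
    column_infos: Optional[List[ColumnInfo]] = None


def get_output_columns(feature_spec: FeatureSpecSketch, labels: List[str]) -> List[str]:
    """Return included columns plus labels, or [] when column_infos is absent."""
    if not feature_spec.column_infos:
        # Documented fallback: no column_infos means an empty list.
        return []
    # Keep only the columns flagged for inclusion, preserving their order.
    included = [ci.output_name for ci in feature_spec.column_infos if ci.include]
    return included + labels


spec = FeatureSpecSketch(column_infos=[
    ColumnInfo("customer_id"),
    ColumnInfo("total_purchases_30d"),
    ColumnInfo("internal_timestamp", include=False),
])
print(get_output_columns(spec, labels=["churned"]))
# ['customer_id', 'total_purchases_30d', 'churned']
print(get_output_columns(FeatureSpecSketch(), labels=["churned"]))
# []
```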

load_df() → DataFrame

Load and return a DataFrame for training.

The returned DataFrame has columns specified in the feature_spec and labels parameters provided in create_training_set().

Returns

A DataFrame for training
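Conceptually, the DataFrame returned by load_df() combines the input rows with looked-up feature values and keeps only the requested columns and labels. The sketch below illustrates that idea with plain Python lists of dicts instead of Spark. The function name `load_df_sketch`, the single `lookup_key`, the unique-key feature table, and the left-join semantics are simplifying assumptions, not the library's actual implementation, which works on Spark DataFrames and can draw on several feature tables and functions (see the constructor's feature_table_data_map and uc_function_infos parameters). The data values are illustrative only.

```python
from typing import Dict, List

Row = Dict[str, object]


def load_df_sketch(
    df: List[Row],
    feature_table: List[Row],
    lookup_key: str,
    output_columns: List[str],
) -> List[Row]:
    """Left-join feature rows onto df by lookup_key, then project columns."""
    # Index the feature table by its key (assumes keys are unique).
    features_by_key = {row[lookup_key]: row for row in feature_table}
    result = []
    for row in df:
        merged = dict(row)
        # Left join: rows without a matching feature row keep only their own columns.
        feature_row = features_by_key.get(row[lookup_key], {})
        for name, value in feature_row.items():
            if name != lookup_key:
                merged[name] = value
        # Project onto the requested columns; missing feature values become None.
        result.append({col: merged.get(col) for col in output_columns})
    return result


df = [{"customer_id": 1, "churned": 0}, {"customer_id": 2, "churned": 1}]
features = [{"customer_id": 1, "total_purchases_30d": 5}]
training_rows = load_df_sketch(
    df, features, "customer_id", ["customer_id", "total_purchases_30d", "churned"]
)
print(training_rows)
# [{'customer_id': 1, 'total_purchases_30d': 5, 'churned': 0},
#  {'customer_id': 2, 'total_purchases_30d': None, 'churned': 1}]
```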