pyspark.pandas.CategoricalIndex.remove_unused_categories

CategoricalIndex.remove_unused_categories(inplace: bool = False) → Optional[pyspark.pandas.indexes.category.CategoricalIndex]

Remove categories which are not used.

Parameters
inplace : bool, default False

If True, drop unused categories in place. If False, return a copy of this categorical with unused categories dropped.

Returns
cat : CategoricalIndex or None

Categorical with unused categories dropped, or None if inplace=True.

See also

rename_categories

Rename categories.

reorder_categories

Reorder categories.

add_categories

Add new categories.

remove_categories

Remove the specified categories.

set_categories

Set the categories to the specified ones.

Examples

>>> idx = ps.CategoricalIndex(list("abbccc"), categories=['a', 'b', 'c', 'd'])
>>> idx
CategoricalIndex(['a', 'b', 'b', 'c', 'c', 'c'],
                 categories=['a', 'b', 'c', 'd'], ordered=False, dtype='category')
>>> idx.remove_unused_categories()
CategoricalIndex(['a', 'b', 'b', 'c', 'c', 'c'],
                 categories=['a', 'b', 'c'], ordered=False, dtype='category')
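To make "unused" concrete, the following is a minimal pure-Python sketch of the idea, assuming a category counts as unused when no value in the index refers to it. The helper `remove_unused` is hypothetical and written only for illustration; it is not how pyspark.pandas implements the method, and it does not require a Spark session.

```python
def remove_unused(values, categories):
    """Hypothetical helper: keep only categories that occur in values.

    The surviving categories keep their original relative order.
    """
    used = set(values)  # every category actually referenced by a value
    return [c for c in categories if c in used]


values = list("abbccc")
categories = ["a", "b", "c", "d"]

# 'd' is declared as a category but never appears among the values,
# so it is the only one dropped.
kept = remove_unused(values, categories)
print(kept)  # ['a', 'b', 'c']
```

The values themselves are unchanged; only the declared category set shrinks, which matches the output shown in the example above.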