pyspark.pandas.Series.cat.remove_unused_categories

cat.remove_unused_categories(inplace: bool = False) → Optional[ps.Series]

Remove categories which are not used.

Parameters
inplace : bool, default False

If True, drop the unused categories in place and return None. If False, return a copy of this categorical with the unused categories dropped.

Returns
cat : Series or None

Categorical with unused categories dropped, or None if inplace=True.

See also

rename_categories

Rename categories.

reorder_categories

Reorder categories.

add_categories

Add new categories.

remove_categories

Remove the specified categories.

set_categories

Set the categories to the specified ones.

Examples

>>> import pandas as pd
>>> import pyspark.pandas as ps
>>> s = ps.Series(pd.Categorical(list("abbccc"), categories=['a', 'b', 'c', 'd']))
>>> s
0    a
1    b
2    b
3    c
4    c
5    c
dtype: category
Categories (4, object): ['a', 'b', 'c', 'd']
>>> s.cat.remove_unused_categories()  
0    a
1    b
2    b
3    c
4    c
5    c
dtype: category
Categories (3, object): ['a', 'b', 'c']
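
To make "unused" concrete, the sketch below works out which declared categories never occur in the data and checks that the default call (inplace=False) leaves the original untouched. It uses plain pandas rather than pandas-on-Spark so that it runs without a Spark session; this assumes the accessor behaves the same way in both libraries, as the example above suggests. The names declared, used, unused, and trimmed are illustrative only.

```python
import pandas as pd

# Four categories are declared, but only 'a', 'b', and 'c' occur in the data.
s = pd.Series(pd.Categorical(list("abbccc"), categories=["a", "b", "c", "d"]))

declared = list(s.cat.categories)   # every declared category
used = set(s.dropna().tolist())     # values that actually appear
unused = [c for c in declared if c not in used]

# With the default inplace=False, a new Series is returned.
trimmed = s.cat.remove_unused_categories()

print(unused)                         # ['d']
print(list(trimmed.cat.categories))   # ['a', 'b', 'c']
print(list(s.cat.categories))         # ['a', 'b', 'c', 'd'] (original unchanged)
```

The values themselves are not modified; only the list of categories attached to the dtype shrinks.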