pyspark.sql.DataFrame.freqItems
DataFrame.freqItems(cols: Union[List[str], Tuple[str]], support: Optional[float] = None) → pyspark.sql.dataframe.DataFrame
Finds frequent items for columns, possibly with false positives, using the frequent element count algorithm proposed by Karp, Schenker, and Papadimitriou (https://doi.org/10.1145/762471.762473).
DataFrame.freqItems() and DataFrameStatFunctions.freqItems() are aliases.
Parameters
- cols : list or tuple
Names of the columns to calculate frequent items for, as a list or tuple of strings.
- support : float, optional
The minimum frequency at which an item is considered ‘frequent’. Default is 1% (0.01). The support must be greater than 1e-4.
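To illustrate how a support threshold can lead to false positives, here is a minimal pure-Python sketch of a counter-based frequent-element scan in the spirit of the Karp, Schenker, and Papadimitriou algorithm. It is not Spark's implementation: the function name `frequent_candidates` is hypothetical, and the sketch does not enforce the 1e-4 lower bound on the support.

```python
from typing import Dict, Hashable, Iterable, List


def frequent_candidates(items: Iterable[Hashable], support: float) -> List[Hashable]:
    """Return a superset of the items whose relative frequency exceeds `support`.

    Hypothetical helper for illustration only; not part of PySpark.
    """
    if not 0.0 < support < 1.0:
        raise ValueError("support must be strictly between 0 and 1")

    # Fewer than 1/support + 1 distinct items can each exceed `support`,
    # so this many counters are enough to retain all of them.
    max_counters = int(1 / support)
    counters: Dict[Hashable, int] = {}

    for item in items:
        if item in counters:
            counters[item] += 1
        elif len(counters) < max_counters:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop those at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]

    # Survivors include every truly frequent item, plus possible false positives.
    return list(counters)


result = frequent_candidates(["a", "b", "a", "c", "a", "d", "a"], support=0.5)
print(result)
```

In this run only "a" appears in more than half of the items, yet "d" also survives the single pass because it arrived when a counter was free. A second pass that counts the surviving candidates exactly would be needed to remove such false positives.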
Notes
This function is meant for exploratory data analysis, as we make no guarantee about the backward compatibility of the schema of the resulting DataFrame.