pyspark.sql.DataFrame.filter

DataFrame.filter(condition: ColumnOrName) → DataFrame

Filters rows using the given condition.

where() is an alias for filter().

Parameters
condition : Column or str

A Column of types.BooleanType, or a string containing a SQL expression.

Examples

>>> df.filter(df.age > 3).collect()
[Row(age=5, name='Bob')]
>>> df.where(df.age == 2).collect()
[Row(age=2, name='Alice')]
>>> df.filter("age > 3").collect()
[Row(age=5, name='Bob')]
>>> df.where("age = 2").collect()
[Row(age=2, name='Alice')]