pyspark.pandas.DataFrame.ffill
DataFrame.ffill(axis: Union[int, str, None] = None, inplace: bool = False, limit: Optional[int] = None) → FrameLike
Synonym for DataFrame.fillna() or Series.fillna() with method=`ffill`.
Note
The current implementation of ‘ffill’ uses Spark’s Window without specifying a partition specification. This moves all data into a single partition on a single machine and can cause serious performance degradation. Avoid this method on very large datasets.
- Parameters
- axis : {0 or `index`}
1 and `columns` are not supported.
- inplace : boolean, default False
Fill in place (do not create a new object).
- limit : int, default None
If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.
- Returns
- DataFrame or Series
DataFrame or Series with NA entries filled.
Examples
>>> psdf = ps.DataFrame({
...     'A': [None, 3, None, None],
...     'B': [2, 4, None, 3],
...     'C': [None, None, None, 1],
...     'D': [0, 1, 5, 4]
...     },
...     columns=['A', 'B', 'C', 'D'])
>>> psdf
     A    B    C  D
0  NaN  2.0  NaN  0
1  3.0  4.0  NaN  1
2  NaN  NaN  NaN  5
3  NaN  3.0  1.0  4
Propagate non-null values forward.
>>> psdf.ffill()
     A    B    C  D
0  NaN  2.0  NaN  0
1  3.0  4.0  NaN  1
2  3.0  4.0  NaN  5
3  3.0  3.0  1.0  4
For Series
>>> psser = ps.Series([2, 4, None, 3])
>>> psser
0    2.0
1    4.0
2    NaN
3    3.0
dtype: float64
>>> psser.ffill()
0    2.0
1    4.0
2    4.0
3    3.0
dtype: float64
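Effect of limit

The examples above do not exercise the limit parameter. The following is a minimal pure-Python sketch of the documented semantics (a gap longer than limit is only partially filled). It is not the pyspark.pandas implementation: the helper name forward_fill is hypothetical, it works on a plain list, and it represents missing values in the output as None rather than NaN.

```python
import math
from typing import List, Optional


def forward_fill(values: List[Optional[float]],
                 limit: Optional[int] = None) -> List[Optional[float]]:
    """Hypothetical helper: forward-fill missing entries of a plain list.

    Mirrors the documented `limit` behaviour: at most `limit` consecutive
    missing values after a valid value are filled.
    """
    if limit is not None and limit <= 0:
        raise ValueError("limit must be greater than 0 if not None")

    def is_missing(v: Optional[float]) -> bool:
        return v is None or (isinstance(v, float) and math.isnan(v))

    result: List[Optional[float]] = []
    last_valid: Optional[float] = None  # most recent non-missing value
    filled_in_gap = 0                   # fills made since last_valid

    for v in values:
        if not is_missing(v):
            last_valid = v
            filled_in_gap = 0
            result.append(v)
        elif last_valid is not None and (limit is None or filled_in_gap < limit):
            filled_in_gap += 1
            result.append(last_valid)
        else:
            # Leading gap (nothing to propagate) or limit already reached.
            result.append(None)
    return result


data = [None, 3.0, None, None, None, 7.0]
filled_all = forward_fill(data)           # no limit: whole gap is filled
filled_one = forward_fill(data, limit=1)  # only the first NaN in the gap is filled
print(filled_all)
print(filled_one)
```

With limit=1, only the first missing value after 3.0 is filled and the rest of the gap stays missing. In both cases the leading missing value is left alone because there is no earlier value to propagate, which matches column A in the DataFrame example.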