pyspark.pandas.DataFrame.last_valid_index

DataFrame.last_valid_index() → Union[int, float, bool, str, bytes, decimal.Decimal, datetime.date, datetime.datetime, None, Tuple[Union[int, float, bool, str, bytes, decimal.Decimal, datetime.date, datetime.datetime, None], ...]]

Return index for last non-NA/null value, or None if no non-NA value is found.

Returns
scalar, tuple, or None

Notes

This API only works with PySpark >= 3.0.
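Since pyspark.pandas mirrors the pandas API, the semantics can be sketched with plain pandas, without a Spark session. The rough equivalence to `dropna(how='all')` below is illustrative, not how pyspark.pandas computes the result:

```python
import pandas as pd

# Same data as the example below; row 'R' is entirely NA.
df = pd.DataFrame({'a': [1, 2, 3, None],
                   'b': [1.0, 2.0, 3.0, None],
                   'c': [100, 200, 400, None]},
                  index=['Q', 'W', 'E', 'R'])

# last_valid_index() is the label of the last row holding at least
# one non-NA value; a rough equivalent via dropna(how='all'):
print(df.last_valid_index())           # 'E'
print(df.dropna(how='all').index[-1])  # 'E'
```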

Examples

Support for DataFrame

>>> psdf = ps.DataFrame({'a': [1, 2, 3, None],
...                      'b': [1.0, 2.0, 3.0, None],
...                      'c': [100, 200, 400, None]},
...                      index=['Q', 'W', 'E', 'R'])
>>> psdf
     a    b      c
Q  1.0  1.0  100.0
W  2.0  2.0  200.0
E  3.0  3.0  400.0
R  NaN  NaN    NaN
>>> psdf.last_valid_index()  
'E'

Support for MultiIndex columns

>>> psdf.columns = pd.MultiIndex.from_tuples([('a', 'x'), ('b', 'y'), ('c', 'z')])
>>> psdf
     a    b      c
     x    y      z
Q  1.0  1.0  100.0
W  2.0  2.0  200.0
E  3.0  3.0  400.0
R  NaN  NaN    NaN
>>> psdf.last_valid_index()  
'E'

Support for Series

>>> s = ps.Series([1, 2, 3, None, None], index=[100, 200, 300, 400, 500])
>>> s
100    1.0
200    2.0
300    3.0
400    NaN
500    NaN
dtype: float64
>>> s.last_valid_index()  
300

Support for MultiIndex

>>> midx = pd.MultiIndex(levels=[['lama', 'cow', 'falcon'],
...                              ['speed', 'weight', 'length']],
...                      codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2],
...                             [0, 1, 2, 0, 1, 2, 0, 1, 2]])
>>> s = ps.Series([250, 1.5, 320, 1, 0.3, None, None, None, None], index=midx)
>>> s
lama    speed     250.0
        weight      1.5
        length    320.0
cow     speed       1.0
        weight      0.3
        length      NaN
falcon  speed       NaN
        weight      NaN
        length      NaN
dtype: float64
>>> s.last_valid_index()  
('cow', 'weight')
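When every value is NA, there is no valid index and the method returns None instead of raising. A minimal sketch with plain pandas (pyspark.pandas follows the same pandas semantics):

```python
import pandas as pd

# All values are NA, so there is no last valid index to report.
s = pd.Series([None, None, None], index=[10, 20, 30])
print(s.last_valid_index())  # None
```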