pyspark.pandas.DataFrame.update

DataFrame.update(other: pyspark.pandas.frame.DataFrame, join: str = 'left', overwrite: bool = True) → None

Modify in place using non-NA values from another DataFrame. Aligns on indices. There is no return value.
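As a rough mental model, the semantics can be sketched in plain Python. This is not the pyspark implementation, which operates on Spark DataFrames. The sketch assumes a frame is a dict mapping column label to a dict of index label to value, uses `None` as the NA marker, and the function name `update_frame` is hypothetical:

```python
from typing import Any, Dict, Hashable

# Hypothetical representation (sketch assumption): column label -> {index label -> value}
Frame = Dict[Hashable, Dict[Hashable, Any]]


def update_frame(target: Frame, other: Frame, overwrite: bool = True) -> None:
    """Mutate `target` in place using non-NA (non-None) values from `other`.

    Only columns and index labels already present in `target` are touched
    (left-join behaviour), so `target` never gains rows or columns.
    """
    for col, incoming in other.items():
        if col not in target:
            continue  # columns absent from target are ignored
        column = target[col]
        for idx, new_value in incoming.items():
            if idx not in column or new_value is None:
                continue  # unknown index labels and NA values are skipped
            if overwrite or column[idx] is None:
                column[idx] = new_value


df = {"A": {0: 1, 1: 2, 2: 3}, "B": {0: 400, 1: 500, 2: 600}}
new_df = {"B": {0: 4, 1: None, 2: 6, 3: 7}, "C": {0: 7, 1: 8, 2: 9}}
result = update_frame(df, new_df)
print(result)  # None
print(df)      # {'A': {0: 1, 1: 2, 2: 3}, 'B': {0: 4, 1: 500, 2: 6}}
```

Row 3 of `new_df` and column `C` are dropped because they have no counterpart in `df`, mirroring the left-join rule described under `join`.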

Parameters

other : DataFrame, or Series

join : ‘left’, default ‘left’

Only left join is implemented, keeping the index and columns of the original object.

overwrite : bool, default True

How to handle non-NA values for overlapping keys:

True: overwrite original DataFrame’s values with values from other.
False: only update values that are NA in the original DataFrame.
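Per cell, the two `overwrite` modes reduce to a choice between the original and the incoming value. A minimal sketch, assuming `None` stands for NA and using a hypothetical helper name `resolve_cell`:

```python
from typing import Any


def resolve_cell(original: Any, incoming: Any, overwrite: bool = True) -> Any:
    """Return the value a cell holds after an update (None means NA)."""
    if incoming is None:
        return original  # an NA value in `other` never replaces anything
    if overwrite:
        return incoming  # overwrite=True: any non-NA incoming value wins
    # overwrite=False: only fill cells that are NA in the original
    return incoming if original is None else original


print(resolve_cell(400, 4, overwrite=True))    # 4
print(resolve_cell(400, 4, overwrite=False))   # 400
print(resolve_cell(None, 4, overwrite=False))  # 4
print(resolve_cell(400, None))                 # 400
```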

Returns

None : method directly changes calling object

See also

DataFrame.merge: For column(s)-on-column(s) operations.
DataFrame.join: Join columns of another DataFrame.
DataFrame.hint: Specifies some hint on the current DataFrame.
broadcast: Marks a DataFrame as small enough for use in broadcast joins.

Examples

>>> df = ps.DataFrame({'A': [1, 2, 3], 'B': [400, 500, 600]}, columns=['A', 'B'])
>>> new_df = ps.DataFrame({'B': [4, 5, 6], 'C': [7, 8, 9]}, columns=['B', 'C'])
>>> df.update(new_df)
>>> df.sort_index()
   A  B
0  1  4
1  2  5
2  3  6

The DataFrame’s length does not increase as a result of the update; only values at matching index/column labels are updated.

>>> df = ps.DataFrame({'A': ['a', 'b', 'c'], 'B': ['x', 'y', 'z']}, columns=['A', 'B'])
>>> new_df = ps.DataFrame({'B': ['d', 'e', 'f', 'g', 'h', 'i']}, columns=['B'])
>>> df.update(new_df)
>>> df.sort_index()
   A  B
0  a  d
1  b  e
2  c  f

For a Series, its name attribute must be set; the name is used as the column label to align with the original DataFrame.

>>> df = ps.DataFrame({'A': ['a', 'b', 'c'], 'B': ['x', 'y', 'z']}, columns=['A', 'B'])
>>> new_column = ps.Series(['d', 'e'], name='B', index=[0, 2])
>>> df.update(new_column)
>>> df.sort_index()
   A  B
0  a  d
1  b  y
2  c  e

If other contains None, the corresponding values are not updated in the original DataFrame.

>>> df = ps.DataFrame({'A': [1, 2, 3], 'B': [400, 500, 600]}, columns=['A', 'B'])
>>> new_df = ps.DataFrame({'B': [4, None, 6]}, columns=['B'])
>>> df.update(new_df)
>>> df.sort_index()
   A      B
0  1    4.0
1  2  500.0
2  3    6.0
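Column B comes back as floating point (500.0 rather than 500): the None makes new_df’s B column a float column, and the updated column takes that type. A plain-Python model of the skip rule therefore has to recognise both `None` and NaN as missing. The helpers `is_na` and `update_column` below are a hypothetical sketch, not part of the pyspark API:

```python
import math
from typing import Any, Dict, Hashable


def is_na(value: Any) -> bool:
    """Treat both None and float NaN as missing (sketch assumption)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def update_column(target: Dict[Hashable, Any], incoming: Dict[Hashable, Any]) -> None:
    """Overwrite target values in place, skipping NA values and unknown labels."""
    for idx, value in incoming.items():
        if idx in target and not is_na(value):
            target[idx] = value


b = {0: 400, 1: 500, 2: 600}
update_column(b, {0: 4.0, 1: float("nan"), 2: 6.0})
print(b)  # {0: 4.0, 1: 500, 2: 6.0}
```

Unlike the real method, this sketch does not unify the column type, so index 1 keeps the integer 500.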
