pyspark.pandas.broadcast

pyspark.pandas.broadcast(obj: pyspark.pandas.frame.DataFrame) → pyspark.pandas.frame.DataFrame

Marks a DataFrame as small enough for use in broadcast joins.

Parameters
obj : DataFrame

Returns
ret : DataFrame with broadcast hint.

See also

DataFrame.merge

Merge DataFrame objects with a database-style join.

DataFrame.join

Join columns of another DataFrame.

DataFrame.update

Modify in place using non-NA values from another DataFrame.

DataFrame.hint

Specifies a hint on the current DataFrame.

Examples

>>> df1 = ps.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
...                     'value': [1, 2, 3, 5]},
...                    columns=['lkey', 'value']).set_index('lkey')
>>> df2 = ps.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
...                     'value': [5, 6, 7, 8]},
...                    columns=['rkey', 'value']).set_index('rkey')
>>> merged = df1.merge(ps.broadcast(df2), left_index=True, right_index=True)
>>> merged.spark.explain()  
== Physical Plan ==
...
...BroadcastHashJoin...
...
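The BroadcastHashJoin node in the plan explains why the hinted DataFrame should be small. The pure-Python sketch below mimics the idea behind that join strategy on the same keys and values as df1 and df2. It builds the small side into an in-memory hash table (in Spark this table is copied to every executor), then streams the large side against it so the large side never has to be shuffled. This is a conceptual illustration only, not Spark's implementation, and broadcast_hash_join is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

Row = Tuple[Hashable, int]


def broadcast_hash_join(
    large: Iterable[Row], small: Iterable[Row]
) -> List[Tuple[Hashable, int, int]]:
    """Hypothetical helper: inner-join two (key, value) sequences in the
    style of a broadcast hash join (conceptual sketch, not Spark code)."""
    # Build phase: hash the small side, keeping every value for duplicate keys.
    # This table must fit in memory, which is why the broadcast side should be small.
    table: Dict[Hashable, List[int]] = defaultdict(list)
    for key, value in small:
        table[key].append(value)

    # Probe phase: stream the large side once and emit one row per match.
    result = []
    for key, left_value in large:
        for right_value in table.get(key, []):
            result.append((key, left_value, right_value))
    return result


# Same data as df1 (large side) and df2 (broadcast side) in the example.
left = [("foo", 1), ("bar", 2), ("baz", 3), ("foo", 5)]
right = [("foo", 5), ("bar", 6), ("baz", 7), ("foo", 8)]
rows = broadcast_hash_join(left, right)
# Each left "foo" row matches both right "foo" rows, so 6 rows come back.
```

Only the side passed through ps.broadcast has to be held in memory, so marking the wrong (large) side can exhaust executor memory.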