pyspark.sql.DataFrame.join¶
-
DataFrame.
join
(other: pyspark.sql.dataframe.DataFrame, on: Union[str, List[str], pyspark.sql.column.Column, List[pyspark.sql.column.Column], None] = None, how: Optional[str] = None) → pyspark.sql.dataframe.DataFrame¶ Joins with another
DataFrame
, using the given join expression.- Parameters
- other
DataFrame
Right side of the join
- onstr, list or
Column
, optional a string for the join column name, a list of column names, a join expression (Column), or a list of Columns. If on is a string or a list of strings indicating the name of the join column(s), the column(s) must exist on both sides, and this performs an equi-join.
- howstr, optional
default
inner
. Must be one of:inner
,cross
,outer
,full
,fullouter
,full_outer
,left
,leftouter
,left_outer
,right
,rightouter
,right_outer
,semi
,leftsemi
,left_semi
,anti
,leftanti
andleft_anti
.
- other
Examples
The following performs a full outer join between
df1
anddf2
.>>> from pyspark.sql.functions import desc >>> df.join(df2, df.name == df2.name, 'outer').select(df.name, df2.height) .sort(desc("name")).collect() [Row(name='Bob', height=85), Row(name='Alice', height=None), Row(name=None, height=80)]
>>> df.join(df2, 'name', 'outer').select('name', 'height').sort(desc("name")).collect() [Row(name='Tom', height=80), Row(name='Bob', height=85), Row(name='Alice', height=None)]
>>> cond = [df.name == df3.name, df.age == df3.age] >>> df.join(df3, cond, 'outer').select(df.name, df3.age).collect() [Row(name='Alice', age=2), Row(name='Bob', age=5)]
>>> df.join(df2, 'name').select(df.name, df2.height).collect() [Row(name='Bob', height=85)]
>>> df.join(df4, ['name', 'age']).select(df.name, df.age).collect() [Row(name='Bob', age=5)]