Spark Session

The entry point to programming Spark with the Dataset and DataFrame API. To create a Spark session, use the SparkSession.builder attribute. See also SparkSession.


SparkSession.builder.appName(name)

Sets a name for the application, which will be shown in the Spark web UI.

SparkSession.builder.config([key, value, conf])

Sets a config option.


SparkSession.builder.enableHiveSupport()

Enables Hive support, including connectivity to a persistent Hive metastore, support for Hive SerDes, and Hive user-defined functions.


SparkSession.builder.getOrCreate()

Gets an existing SparkSession or, if there is no existing one, creates a new one based on the options set in this builder.


SparkSession.builder.master(master)

Sets the Spark master URL to connect to, such as “local” to run locally, “local[4]” to run locally with 4 cores, or “spark://master:7077” to run on a Spark standalone cluster.


SparkSession.catalog

Interface through which the user may create, drop, alter, or query underlying databases, tables, functions, etc.


SparkSession.conf

Runtime configuration interface for Spark.

SparkSession.createDataFrame(data[, schema, …])

Creates a DataFrame from an RDD, a list, a pandas.DataFrame or a numpy.ndarray.


SparkSession.getActiveSession()

Returns the active SparkSession for the current thread, as returned by the builder.


SparkSession.newSession()

Returns a new SparkSession that has a separate SQLConf, registered temporary views, and UDFs, but shares the SparkContext and table cache.

SparkSession.range(start[, end, step, …])

Creates a DataFrame with a single pyspark.sql.types.LongType column named id, containing elements in a range from start to end (exclusive) with step value step.

SparkSession.read

Returns a DataFrameReader that can be used to read data in as a DataFrame.


SparkSession.readStream

Returns a DataStreamReader that can be used to read data streams as a streaming DataFrame.


SparkSession.sparkContext

Returns the underlying SparkContext.

SparkSession.sql(sqlQuery, args, **kwargs)

Returns a DataFrame representing the result of the given query.


SparkSession.stop()

Stops the underlying SparkContext.


SparkSession.streams

Returns a StreamingQueryManager that allows managing all the StreamingQuery instances active on this context.


SparkSession.table(tableName)

Returns the specified table as a DataFrame.


SparkSession.udf

Returns a UDFRegistration for UDF registration.


SparkSession.version

The version of Spark on which this application is running.