Spark Session

The entry point to programming Spark with the Dataset and DataFrame API. To create a Spark session, use the SparkSession.builder attribute. See also SparkSession.
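
A minimal sketch of the usual builder chain, assuming a local run; the application name, master URL, and config value below are illustrative placeholders:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[4]")                           # run locally with 4 cores
        .appName("example-app")                       # name shown in the Spark web UI
        .config("spark.sql.shuffle.partitions", "8")  # illustrative config option
        .getOrCreate()
    )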

SparkSession.builder.appName(name)

Sets a name for the application, which will be shown in the Spark web UI.

SparkSession.builder.config([key, value, conf])

Sets a config option.
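
A short sketch of the two documented ways to pass options, either a key/value pair or an existing SparkConf; the option values are illustrative:

    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    # Option by option ...
    builder = SparkSession.builder.config("spark.executor.memory", "2g")

    # ... or from an existing SparkConf object.
    conf = SparkConf().set("spark.sql.shuffle.partitions", "8")
    spark = builder.config(conf=conf).getOrCreate()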

SparkSession.builder.enableHiveSupport()

Enables Hive support, including connectivity to a persistent Hive metastore, support for Hive SerDes, and Hive user-defined functions.
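
A sketch only; it assumes Spark was built with Hive support and can reach a metastore, and the warehouse directory is a placeholder:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .enableHiveSupport()                                         # needs Hive classes on the classpath
        .config("spark.sql.warehouse.dir", "/tmp/spark-warehouse")   # placeholder path
        .getOrCreate()
    )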

SparkSession.builder.getOrCreate()

Gets an existing SparkSession or, if there is no existing one, creates a new one based on the options set in this builder.
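
A minimal sketch showing that a second call returns the session created by the first:

    from pyspark.sql import SparkSession

    s1 = SparkSession.builder.getOrCreate()
    s2 = SparkSession.builder.getOrCreate()
    assert s1 is s2  # the existing session is reused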

SparkSession.builder.master(master)

Sets the Spark master URL to connect to, such as “local” to run locally, “local[4]” to run locally with 4 cores, or “spark://master:7077” to run on a Spark standalone cluster.

SparkSession.catalog

Interface through which the user may create, drop, alter or query underlying databases, tables, functions, etc.
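
A small sketch that registers a temporary view and inspects it through the catalog; the view name is arbitrary:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    spark.range(3).createOrReplaceTempView("nums")

    print(spark.catalog.currentDatabase())
    print([t.name for t in spark.catalog.listTables()])  # includes the temp view "nums"
    spark.catalog.dropTempView("nums")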

SparkSession.conf

Runtime configuration interface for Spark.
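
A minimal sketch of reading and writing a runtime option; the setting and value are illustrative:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    spark.conf.set("spark.sql.shuffle.partitions", "16")
    print(spark.conf.get("spark.sql.shuffle.partitions"))  # '16'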

SparkSession.createDataFrame(data[, schema, …])

Creates a DataFrame from an RDD, a list, a pandas.DataFrame or a numpy.ndarray.
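
A short sketch of two common inputs, a list of tuples with column names and a list of Row objects; the data is made up:

    from pyspark.sql import Row, SparkSession

    spark = SparkSession.builder.getOrCreate()

    # From a list of tuples plus a column list ...
    df1 = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])

    # ... or from Row objects, with the schema inferred from the fields.
    df2 = spark.createDataFrame([Row(id=1, name="alice"), Row(id=2, name="bob")])
    df2.show()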

SparkSession.getActiveSession()

Returns the active SparkSession for the current thread, as returned by the builder.
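
A minimal sketch; after getOrCreate, the active session on the current thread is the one just built:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    assert SparkSession.getActiveSession() is spark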

SparkSession.newSession()

Returns a new SparkSession that has separate SQLConf, registered temporary views and UDFs, but a shared SparkContext and table cache.
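
A sketch illustrating what is shared and what is not; the view name is arbitrary:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    other = spark.newSession()

    assert other is not spark
    assert other.sparkContext is spark.sparkContext  # the SparkContext is shared

    # Temporary views are per-session, so the new session does not see this one.
    spark.range(3).createOrReplaceTempView("nums")
    assert "nums" not in [t.name for t in other.catalog.listTables()]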

SparkSession.range(start[, end, step, …])

Creates a DataFrame with a single pyspark.sql.types.LongType column named id, containing elements in a range from start to end (exclusive) with step value step.
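
A minimal sketch:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    spark.range(5).show()         # id: 0, 1, 2, 3, 4
    spark.range(2, 10, 2).show()  # id: 2, 4, 6, 8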

SparkSession.read

Returns a DataFrameReader that can be used to read data in as a DataFrame.
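
A sketch of reading a CSV file; the path and options are illustrative placeholders:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = (
        spark.read
        .option("header", True)
        .option("inferSchema", True)
        .csv("/path/to/people.csv")  # placeholder path
    )
    df.printSchema()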

SparkSession.readStream

Returns a DataStreamReader that can be used to read data streams as a streaming DataFrame.
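
A sketch using the built-in rate source, which generates test rows and needs no external system:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    stream_df = spark.readStream.format("rate").option("rowsPerSecond", 1).load()
    print(stream_df.isStreaming)  # True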

SparkSession.sparkContext

Returns the underlying SparkContext.
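
A minimal sketch; the same context also backs RDD operations:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    sc = spark.sparkContext
    print(sc.appName, sc.master)
    print(sc.parallelize([1, 2, 3]).sum())  # 6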

SparkSession.sql(sqlQuery[, args, …])

Returns a DataFrame representing the result of the given query.
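
A sketch against a temporary view; the view name is arbitrary, and the named parameter form with args assumes Spark 3.4 or later:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    spark.range(10).createOrReplaceTempView("nums")

    spark.sql("SELECT id FROM nums WHERE id < 3").show()

    # Named parameter markers via `args` (Spark 3.4+).
    spark.sql("SELECT id FROM nums WHERE id < :limit", args={"limit": 3}).show()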

SparkSession.stop()

Stops the underlying SparkContext.

SparkSession.streams

Returns a StreamingQueryManager that allows managing all the StreamingQuery instances active on this context.
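
A sketch that starts a throwaway query on the rate source so there is something to list; the query name is arbitrary:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    query = (
        spark.readStream.format("rate").load()
        .writeStream.queryName("rate_to_console").format("console").start()
    )

    print([q.name for q in spark.streams.active])  # ['rate_to_console']
    query.stop()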

SparkSession.table(tableName)

Returns the specified table as a DataFrame.
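
A minimal sketch; table resolves temporary views as well as catalog tables, and the view name here is arbitrary:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    spark.range(3).createOrReplaceTempView("nums")
    spark.table("nums").show()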

SparkSession.udf

Returns a UDFRegistration for UDF registration.
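
A sketch registering a Python function for use from SQL; the function and names are chosen for illustration:

    from pyspark.sql import SparkSession
    from pyspark.sql.types import LongType

    spark = SparkSession.builder.getOrCreate()
    spark.udf.register("plus_one", lambda x: x + 1, LongType())

    spark.range(3).createOrReplaceTempView("nums")
    spark.sql("SELECT id, plus_one(id) AS next FROM nums").show()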

SparkSession.version

The version of Spark on which this application is running.