pyspark.sql.Window.rowsBetween

static Window.rowsBetween(start: int, end: int) → pyspark.sql.window.WindowSpec

Creates a WindowSpec with the frame boundaries defined, from start (inclusive) to end (inclusive).

Both start and end are relative positions from the current row. For example, "0" means "current row", while "-1" means the row before the current row, and "5" means the fifth row after the current row.
We recommend users use Window.unboundedPreceding, Window.unboundedFollowing, and Window.currentRow to specify special boundary values, rather than using integral values directly.

A row-based boundary is based on the position of the row within the partition. An offset indicates the number of rows above or below the current row at which the frame for the current row starts or ends. For instance, given a row-based sliding frame with a lower bound offset of -1 and an upper bound offset of +2, the frame for the row with index 5 would range from index 4 to index 7.
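A minimal sketch of the -1/+2 sliding frame described above (the DataFrame built from spark.range and the column name frame_sum are introduced here only for illustration and are not part of the original example): each row's frame covers the previous row through the two following rows.

>>> from pyspark.sql import SparkSession, Window
>>> from pyspark.sql import functions as F
>>> spark = SparkSession.builder.getOrCreate()
>>> df = spark.range(5)  # a single bigint column "id" with values 0..4
>>> w = Window.orderBy("id").rowsBetween(-1, 2)  # previous row through two rows ahead
>>> df.withColumn("frame_sum", F.sum("id").over(w)).orderBy("id").show()
+---+---------+
| id|frame_sum|
+---+---------+
|  0|        3|
|  1|        6|
|  2|       10|
|  3|        9|
|  4|        7|
+---+---------+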
Parameters

- start : int
boundary start, inclusive. The frame is unbounded if this is Window.unboundedPreceding, or any value less than or equal to -9223372036854775808.

- end : int
boundary end, inclusive. The frame is unbounded if this is Window.unboundedFollowing, or any value greater than or equal to 9223372036854775807.
Examples
>>> from pyspark import SparkContext
>>> from pyspark.sql import Window
>>> from pyspark.sql import functions as func
>>> from pyspark.sql import SQLContext
>>> sc = SparkContext.getOrCreate()
>>> sqlContext = SQLContext(sc)
>>> tup = [(1, "a"), (1, "a"), (2, "a"), (1, "b"), (2, "b"), (3, "b")]
>>> df = sqlContext.createDataFrame(tup, ["id", "category"])
>>> window = Window.partitionBy("category").orderBy("id").rowsBetween(Window.currentRow, 1)
>>> df.withColumn("sum", func.sum("id").over(window)).sort("id", "category", "sum").show()
+---+--------+---+
| id|category|sum|
+---+--------+---+
|  1|       a|  2|
|  1|       a|  3|
|  1|       b|  3|
|  2|       a|  2|
|  2|       b|  5|
|  3|       b|  3|
+---+--------+---+
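A further sketch, not part of the original docstring, reusing df and func from the example above: the recommended special boundary values express a running total per category.

>>> running = Window.partitionBy("category").orderBy("id").rowsBetween(
...     Window.unboundedPreceding, Window.currentRow)
>>> df.withColumn("sum", func.sum("id").over(running)).sort("id", "category", "sum").show()
+---+--------+---+
| id|category|sum|
+---+--------+---+
|  1|       a|  1|
|  1|       a|  2|
|  1|       b|  1|
|  2|       a|  4|
|  2|       b|  3|
|  3|       b|  6|
+---+--------+---+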