CoordinateMatrix

class pyspark.mllib.linalg.distributed.CoordinateMatrix(entries: pyspark.rdd.RDD[Union[Tuple[int, int, float], pyspark.mllib.linalg.distributed.MatrixEntry]], numRows: int = 0, numCols: int = 0)

Represents a matrix in coordinate format.

Parameters

entries (pyspark.RDD): An RDD of MatrixEntry inputs or (int, int, float) tuples.
numRows (int, optional): Number of rows in the matrix. A non-positive value means unknown, in which case the number of rows is determined by the max row index plus one.
numCols (int, optional): Number of columns in the matrix. A non-positive value means unknown, in which case the number of columns is determined by the max column index plus one.
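
The dimension rule described above can be illustrated without Spark. The following is a minimal plain-Python sketch, not the library's implementation: `infer_dims` is a hypothetical helper, entries are assumed to be `(row, col, value)` tuples, and the entry list is assumed to be non-empty.

```python
from typing import Iterable, Tuple

Entry = Tuple[int, int, float]  # (row index, column index, value)


def infer_dims(entries: Iterable[Entry],
               num_rows: int = 0,
               num_cols: int = 0) -> Tuple[int, int]:
    """Return (rows, cols); any non-positive dimension is inferred.

    Hypothetical helper for illustration only. Assumes at least one entry.
    """
    entries = list(entries)
    if num_rows <= 0:
        # Unknown row count: largest row index plus one.
        num_rows = max(i for i, _, _ in entries) + 1
    if num_cols <= 0:
        # Unknown column count: largest column index plus one.
        num_cols = max(j for _, j, _ in entries) + 1
    return num_rows, num_cols


entries = [(0, 0, 1.2), (1, 0, 2.0), (2, 1, 3.7)]
inferred = infer_dims(entries)        # both dimensions left unknown
explicit = infer_dims(entries, 7, 6)  # both dimensions supplied
```

With these entries, `inferred` is `(3, 2)` and `explicit` is `(7, 6)`, which matches the `numRows()` and `numCols()` examples in this page.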

Methods

`numCols`()	Get or compute the number of cols.
`numRows`()	Get or compute the number of rows.
`toBlockMatrix`([rowsPerBlock, colsPerBlock])	Convert this matrix to a BlockMatrix.
`toIndexedRowMatrix`()	Convert this matrix to an IndexedRowMatrix.
`toRowMatrix`()	Convert this matrix to a RowMatrix.
`transpose`()	Transpose this CoordinateMatrix.

Attributes

entries

Entries of the CoordinateMatrix stored as an RDD of MatrixEntries.

Methods Documentation

numCols() → int

Get or compute the number of cols.

Examples

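>>> # Assumes an active SparkContext bound to the name sc.
>>> from pyspark.mllib.linalg.distributed import CoordinateMatrix, MatrixEntry
>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),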
...                           MatrixEntry(1, 0, 2),
...                           MatrixEntry(2, 1, 3.7)])

>>> mat = CoordinateMatrix(entries)
>>> print(mat.numCols())
2

>>> mat = CoordinateMatrix(entries, 7, 6)
>>> print(mat.numCols())
6

numRows() → int

Get or compute the number of rows.

Examples

>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(1, 0, 2),
...                           MatrixEntry(2, 1, 3.7)])

>>> mat = CoordinateMatrix(entries)
>>> print(mat.numRows())
3

>>> mat = CoordinateMatrix(entries, 7, 6)
>>> print(mat.numRows())
7

toBlockMatrix(rowsPerBlock: int = 1024, colsPerBlock: int = 1024) → pyspark.mllib.linalg.distributed.BlockMatrix

Convert this matrix to a BlockMatrix.

Parameters

rowsPerBlock (int, optional): Number of rows that make up each block. The blocks forming the final rows are not required to have the given number of rows.
colsPerBlock (int, optional): Number of columns that make up each block. The blocks forming the final columns are not required to have the given number of columns.

Examples

>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(6, 4, 2.1)])
>>> mat = CoordinateMatrix(entries).toBlockMatrix()

>>> # This CoordinateMatrix will have 7 effective rows, due to
>>> # the highest row index being 6, and the ensuing
>>> # BlockMatrix will have 7 rows as well.
>>> print(mat.numRows())
7

>>> # This CoordinateMatrix will have 5 columns, due to the
>>> # highest column index being 4, and the ensuing
>>> # BlockMatrix will have 5 columns as well.
>>> print(mat.numCols())
5
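
To make the role of rowsPerBlock and colsPerBlock concrete, here is a plain-Python sketch of one way coordinate entries could be assigned to blocks, and why the final blocks may be smaller than requested. It is an illustration under stated assumptions, not Spark's code: `group_into_blocks` and `block_shape` are hypothetical helpers, and integer division is assumed as the block-assignment rule.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Entry = Tuple[int, int, float]  # (row index, column index, value)
BlockId = Tuple[int, int]       # (block row, block column)


def group_into_blocks(entries: Iterable[Entry],
                      rows_per_block: int = 1024,
                      cols_per_block: int = 1024
                      ) -> Dict[BlockId, Dict[Tuple[int, int], float]]:
    """Map each entry to a block and to its position inside that block."""
    blocks: Dict[BlockId, Dict[Tuple[int, int], float]] = defaultdict(dict)
    for i, j, value in entries:
        block_id = (i // rows_per_block, j // cols_per_block)
        local_pos = (i % rows_per_block, j % cols_per_block)
        blocks[block_id][local_pos] = value
    return dict(blocks)


def block_shape(block_id: BlockId, num_rows: int, num_cols: int,
                rows_per_block: int, cols_per_block: int) -> Tuple[int, int]:
    """Shape of one block; edge blocks are clipped to the matrix bounds."""
    bi, bj = block_id
    rows = min(rows_per_block, num_rows - bi * rows_per_block)
    cols = min(cols_per_block, num_cols - bj * cols_per_block)
    return rows, cols


entries = [(0, 0, 1.2), (6, 4, 2.1)]
# Small 4 x 4 blocks chosen so that the 7 x 5 matrix spans several blocks.
blocks = group_into_blocks(entries, rows_per_block=4, cols_per_block=4)
last_shape = block_shape((1, 1), num_rows=7, num_cols=5,
                         rows_per_block=4, cols_per_block=4)
```

Here the entry at (6, 4) lands in block (1, 1) at local position (2, 0), and that final block covers only 3 rows and 1 column of the 7 x 5 matrix.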

toIndexedRowMatrix() → pyspark.mllib.linalg.distributed.IndexedRowMatrix

Convert this matrix to an IndexedRowMatrix.

Examples

>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(6, 4, 2.1)])
>>> mat = CoordinateMatrix(entries).toIndexedRowMatrix()

>>> # This CoordinateMatrix will have 7 effective rows, due to
>>> # the highest row index being 6, and the ensuing
>>> # IndexedRowMatrix will have 7 rows as well.
>>> print(mat.numRows())
7

>>> # This CoordinateMatrix will have 5 columns, due to the
>>> # highest column index being 4, and the ensuing
>>> # IndexedRowMatrix will have 5 columns as well.
>>> print(mat.numCols())
5

toRowMatrix() → pyspark.mllib.linalg.distributed.RowMatrix

Convert this matrix to a RowMatrix.

Examples

>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(6, 4, 2.1)])
>>> mat = CoordinateMatrix(entries).toRowMatrix()

>>> # This CoordinateMatrix will have 7 effective rows, due to
>>> # the highest row index being 6, but the ensuing RowMatrix
>>> # will only have 2 rows since there are only entries on 2
>>> # unique rows.
>>> print(mat.numRows())
2

>>> # This CoordinateMatrix will have 5 columns, due to the
>>> # highest column index being 4, and the ensuing RowMatrix
>>> # will have 5 columns as well.
>>> print(mat.numCols())
5
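
The difference in row counts between the indexed and the plain row conversions can be reproduced in plain Python. The sketch below is illustrative only and makes no claim about Spark internals: `group_by_row` is a hypothetical helper, and each row is modelled as a dict from column index to value.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Entry = Tuple[int, int, float]  # (row index, column index, value)


def group_by_row(entries: Iterable[Entry]) -> Dict[int, Dict[int, float]]:
    """Collect entries into sparse rows keyed by their row index."""
    rows: Dict[int, Dict[int, float]] = defaultdict(dict)
    for i, j, value in entries:
        rows[i][j] = value
    return dict(rows)


entries = [(0, 0, 1.2), (6, 4, 2.1)]

# Keeping the row indices: the row count follows the largest index.
indexed_rows = group_by_row(entries)
indexed_row_count = max(indexed_rows) + 1

# Dropping the row indices: only rows that hold an entry remain.
plain_rows: List[Dict[int, float]] = list(indexed_rows.values())
plain_row_count = len(plain_rows)
```

With these two entries, `indexed_row_count` is 7 while `plain_row_count` is 2, mirroring the outputs shown for toIndexedRowMatrix() and toRowMatrix().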

transpose() → pyspark.mllib.linalg.distributed.CoordinateMatrix

Transpose this CoordinateMatrix.

Examples

>>> entries = sc.parallelize([MatrixEntry(0, 0, 1.2),
...                           MatrixEntry(1, 0, 2),
...                           MatrixEntry(2, 1, 3.7)])
>>> mat = CoordinateMatrix(entries)
>>> mat_transposed = mat.transpose()

>>> print(mat_transposed.numRows())
2

>>> print(mat_transposed.numCols())
3
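
In coordinate format, transposition amounts to swapping the row and column index of every entry. A minimal plain-Python sketch, assuming `(row, col, value)` tuples and a hypothetical helper name `transpose_entries`:

```python
from typing import Iterable, List, Tuple

Entry = Tuple[int, int, float]  # (row index, column index, value)


def transpose_entries(entries: Iterable[Entry]) -> List[Entry]:
    """Swap the row and column index of each entry; values are unchanged."""
    return [(j, i, value) for i, j, value in entries]


entries = [(0, 0, 1.2), (1, 0, 2.0), (2, 1, 3.7)]
transposed = transpose_entries(entries)

# Dimensions inferred as the largest index plus one, as for unknown sizes.
num_rows = max(i for i, _, _ in transposed) + 1
num_cols = max(j for _, j, _ in transposed) + 1
```

The original 3 x 2 matrix becomes 2 x 3, consistent with the example output above.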

Attributes Documentation

entries

Entries of the CoordinateMatrix stored as an RDD of MatrixEntries.

Examples

>>> mat = CoordinateMatrix(sc.parallelize([MatrixEntry(0, 0, 1.2),
...                                        MatrixEntry(6, 4, 2.1)]))
>>> entries = mat.entries
>>> entries.first()
MatrixEntry(0, 0, 1.2)
