ALS

class pyspark.mllib.recommendation.ALS

Alternating Least Squares matrix factorization.

Methods

`train`(ratings, rank[, iterations, lambda_, …])	Train a matrix factorization model given an RDD of ratings by users for a subset of products.
`trainImplicit`(ratings, rank[, iterations, …])	Train a matrix factorization model given an RDD of ‘implicit preferences’ of users for a subset of products.

Methods Documentation

classmethod train(ratings: Union[pyspark.rdd.RDD[pyspark.mllib.recommendation.Rating], pyspark.rdd.RDD[Tuple[int, int, float]]], rank: int, iterations: int = 5, lambda_: float = 0.01, blocks: int = -1, nonnegative: bool = False, seed: Optional[int] = None) → pyspark.mllib.recommendation.MatrixFactorizationModel

Train a matrix factorization model given an RDD of ratings by users for a subset of products. The ratings matrix is approximated as the product of two lower-rank matrices of a given rank (number of features). To solve for these features, ALS is run iteratively with a configurable level of parallelism.
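The alternating scheme described above can be illustrated with a small single-machine sketch in NumPy. This is only a simplified illustration under stated assumptions, not Spark's distributed implementation: the function name `als_sketch`, the dense matrix `R`, and the boolean `mask` of observed entries are all hypothetical, and the regularization is a plain ridge term.

```python
import numpy as np


def als_sketch(R, mask, rank, iterations=5, lambda_=0.01, seed=0):
    """Toy ALS: approximate the observed entries of R (users x products) by U @ P.T.

    Hypothetical helper for illustration only; it is not part of PySpark.
    """
    rng = np.random.default_rng(seed)
    n_users, n_products = R.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))     # user factors
    P = rng.normal(scale=0.1, size=(n_products, rank))  # product factors
    reg = lambda_ * np.eye(rank)

    for _ in range(iterations):
        # Hold product factors fixed; solve a ridge regression per user.
        for u in range(n_users):
            obs = mask[u]
            A = P[obs].T @ P[obs] + reg
            b = P[obs].T @ R[u, obs]
            U[u] = np.linalg.solve(A, b)
        # Hold user factors fixed; solve a ridge regression per product.
        for p in range(n_products):
            obs = mask[:, p]
            A = U[obs].T @ U[obs] + reg
            b = U[obs].T @ R[obs, p]
            P[p] = np.linalg.solve(A, b)
    return U, P


if __name__ == "__main__":
    # Made-up 3 x 3 ratings matrix with one unobserved entry.
    R = np.array([[5.0, 1.0, 4.0],
                  [4.0, 1.0, 0.0],
                  [1.0, 5.0, 2.0]])
    mask = np.array([[True, True, True],
                     [True, True, False],
                     [True, True, True]])
    U, P = als_sketch(R, mask, rank=2, iterations=10)
    print(U @ P.T)  # low-rank reconstruction, including the missing entry
```

Each half-step is an ordinary regularized least-squares solve, which is why the per-user and per-product updates can be computed independently and, in a distributed setting, in parallel.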

Parameters

ratings (pyspark.RDD): RDD of Rating objects or (userID, productID, rating) tuples.
rank (int): Number of features to use (also referred to as the number of latent factors).
iterations (int, optional): Number of iterations of ALS. (default: 5)
lambda_ (float, optional): Regularization parameter. (default: 0.01)
blocks (int, optional): Number of blocks used to parallelize the computation. A value of -1 will use an auto-configured number of blocks. (default: -1)
nonnegative (bool, optional): A value of True will solve least-squares with nonnegativity constraints. (default: False)
seed (int, optional): Random seed for initial matrix factorization model. A value of None will use system time as the seed. (default: None)

classmethod trainImplicit(ratings: Union[pyspark.rdd.RDD[pyspark.mllib.recommendation.Rating], pyspark.rdd.RDD[Tuple[int, int, float]]], rank: int, iterations: int = 5, lambda_: float = 0.01, blocks: int = -1, alpha: float = 0.01, nonnegative: bool = False, seed: Optional[int] = None) → pyspark.mllib.recommendation.MatrixFactorizationModel

Train a matrix factorization model given an RDD of ‘implicit preferences’ of users for a subset of products. The ratings matrix is approximated as the product of two lower-rank matrices of a given rank (number of features). To solve for these features, ALS is run iteratively with a configurable level of parallelism.

Parameters

ratings (pyspark.RDD): RDD of Rating objects or (userID, productID, rating) tuples.
rank (int): Number of features to use (also referred to as the number of latent factors).
iterations (int, optional): Number of iterations of ALS. (default: 5)
lambda_ (float, optional): Regularization parameter. (default: 0.01)
blocks (int, optional): Number of blocks used to parallelize the computation. A value of -1 will use an auto-configured number of blocks. (default: -1)
alpha (float, optional): A constant used in computing confidence. (default: 0.01)
nonnegative (bool, optional): A value of True will solve least-squares with nonnegativity constraints. (default: False)
seed (int, optional): Random seed for initial matrix factorization model. A value of None will use system time as the seed. (default: None)
