
class pyspark.mllib.recommendation.MatrixFactorizationModel(java_model: py4j.java_gateway.JavaObject)

A matrix factorization model trained by regularized alternating least-squares.


>>> r1 = (1, 1, 1.0)
>>> r2 = (1, 2, 2.0)
>>> r3 = (2, 1, 2.0)
>>> ratings = sc.parallelize([r1, r2, r3])
>>> model = ALS.trainImplicit(ratings, 1, seed=10)
>>> model.predict(2, 2)
>>> testset = sc.parallelize([(1, 2), (1, 1)])
>>> model = ALS.train(ratings, 2, seed=0)
>>> model.predictAll(testset).collect()
[Rating(user=1, product=1, rating=1.0...), Rating(user=1, product=2, rating=1.9...)]
>>> model = ALS.train(ratings, 4, seed=10)
>>> model.userFeatures().collect()
[(1, array('d', [...])), (2, array('d', [...]))]
>>> model.recommendUsers(1, 2)
[Rating(user=2, product=1, rating=1.9...), Rating(user=1, product=1, rating=1.0...)]
>>> model.recommendProducts(1, 2)
[Rating(user=1, product=2, rating=1.9...), Rating(user=1, product=1, rating=1.0...)]
>>> model.rank
4
>>> first_user = model.userFeatures().take(1)[0]
>>> latents = first_user[1]
>>> len(latents)
4
>>> model.productFeatures().collect()
[(1, array('d', [...])), (2, array('d', [...]))]
>>> first_product = model.productFeatures().take(1)[0]
>>> latents = first_product[1]
>>> len(latents)
4
>>> products_for_users = model.recommendProductsForUsers(1).collect()
>>> len(products_for_users)
2
>>> products_for_users[0]
(1, (Rating(user=1, product=2, rating=...),))
>>> users_for_products = model.recommendUsersForProducts(1).collect()
>>> len(users_for_products)
2
>>> users_for_products[0]
(1, (Rating(user=2, product=1, rating=...),))
>>> model = ALS.train(ratings, 1, nonnegative=True, seed=123456789)
>>> model.predict(2, 2)
>>> df = sqlContext.createDataFrame([Rating(1, 1, 1.0), Rating(1, 2, 2.0), Rating(2, 1, 2.0)])
>>> model = ALS.train(df, 1, nonnegative=True, seed=123456789)
>>> model.predict(2, 2)
>>> model = ALS.trainImplicit(ratings, 1, nonnegative=True, seed=123456789)
>>> model.predict(2, 2)
>>> import os, tempfile
>>> path = tempfile.mkdtemp()
>>> model.save(sc, path)
>>> sameModel = MatrixFactorizationModel.load(sc, path)
>>> sameModel.predict(2, 2)
>>> sameModel.predictAll(testset).collect()
>>> from shutil import rmtree
>>> try:
...     rmtree(path)
... except OSError:
...     pass


Methods Documentation

call(name: str, *a: Any) → Any

Calls the named method of the underlying java_model with the given arguments.

classmethod load(sc: pyspark.context.SparkContext, path: str) → pyspark.mllib.recommendation.MatrixFactorizationModel

Load a model from the given path.

predict(user: int, product: int) → float

Predicts rating for the given user and product.
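Conceptually, a prediction from a matrix factorization model is the dot product of the user's and the product's latent feature vectors. The following is a minimal pure-Python sketch of that idea, not the MLlib implementation; `predict_rating` and the sample factor values are hypothetical, standing in for entries of the kind returned by userFeatures() and productFeatures().

```python
from array import array


def predict_rating(user_factors, product_factors):
    """Return the dot product of two equal-length latent factor vectors."""
    if len(user_factors) != len(product_factors):
        raise ValueError("factor vectors must have the same rank")
    return sum(u * p for u, p in zip(user_factors, product_factors))


# Hypothetical rank-2 factors (illustrative values, not real model output).
user_vec = array('d', [1.0, 0.5])
product_vec = array('d', [2.0, 4.0])

score = predict_rating(user_vec, product_vec)
print(score)
```

The length check mirrors the fact that user and product vectors in one model share the same rank.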

predictAll(user_product: pyspark.rdd.RDD[Tuple[int, int]]) → pyspark.rdd.RDD[pyspark.mllib.recommendation.Rating]

Returns an RDD of predicted ratings for the input user and product pairs.

productFeatures() → pyspark.rdd.RDD[Tuple[int, array.array]]

Returns a paired RDD, where the first element is the product and the second is an array of features corresponding to that product.

recommendProducts(user: int, num: int) → List[pyspark.mllib.recommendation.Rating]

Recommends the top “num” products for a given user and returns a list of Rating objects sorted by predicted rating in descending order.
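To illustrate the ordering described above, here is a small pure-Python sketch that scores every product for one user and keeps the highest-scoring ones. It is an assumption-laden illustration rather than the library's code: `recommend_products`, the local `Rating` namedtuple, and the factor values are all hypothetical stand-ins.

```python
import heapq
from collections import namedtuple

# Local stand-in for pyspark.mllib.recommendation.Rating, for illustration only.
Rating = namedtuple("Rating", ["user", "product", "rating"])


def recommend_products(user, user_factors, product_features, num):
    """Score each product by dot product and return the top `num`, best first."""
    scored = (
        Rating(user, product, sum(u * p for u, p in zip(user_factors, factors)))
        for product, factors in product_features
    )
    return heapq.nlargest(num, scored, key=lambda r: r.rating)


# Hypothetical (product id, factors) pairs, shaped like productFeatures() entries.
product_features = [(1, [1.0, 0.0]), (2, [0.0, 2.0]), (3, [0.25, 0.25])]

top = recommend_products(1, [1.0, 1.0], product_features, 2)
for r in top:
    print(r)
```

With only three products in this toy data, asking for ten recommendations would return three entries.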

recommendProductsForUsers(num: int) → pyspark.rdd.RDD[Tuple[int, Tuple[pyspark.mllib.recommendation.Rating, …]]]

Recommends the top “num” products for all users. The number of recommendations returned per user may be fewer than “num”.

recommendUsers(product: int, num: int) → List[pyspark.mllib.recommendation.Rating]

Recommends the top “num” users for a given product and returns a list of Rating objects sorted by predicted rating in descending order.

recommendUsersForProducts(num: int) → pyspark.rdd.RDD[Tuple[int, Tuple[pyspark.mllib.recommendation.Rating, …]]]

Recommends the top “num” users for all products. The number of recommendations returned per product may be fewer than “num”.

save(sc: pyspark.context.SparkContext, path: str) → None

Save this model to the given path.

userFeatures() → pyspark.rdd.RDD[Tuple[int, array.array]]

Returns a paired RDD, where the first element is the user and the second is an array of features corresponding to that user.

Attributes Documentation


rank

Rank for the features in this model.