MatrixFactorizationModel

class pyspark.mllib.recommendation.MatrixFactorizationModel(java_model: py4j.java_gateway.JavaObject)

A matrix factorization model trained by regularized alternating least squares (ALS).

Examples

>>> from pyspark.mllib.recommendation import ALS, MatrixFactorizationModel, Rating
>>> r1 = (1, 1, 1.0)
>>> r2 = (1, 2, 2.0)
>>> r3 = (2, 1, 2.0)
>>> ratings = sc.parallelize([r1, r2, r3])
>>> model = ALS.trainImplicit(ratings, 1, seed=10)
>>> model.predict(2, 2)
0.4...
>>> testset = sc.parallelize([(1, 2), (1, 1)])
>>> model = ALS.train(ratings, 2, seed=0)
>>> model.predictAll(testset).collect()
[Rating(user=1, product=1, rating=1.0...), Rating(user=1, product=2, rating=1.9...)]
>>> model = ALS.train(ratings, 4, seed=10)
>>> model.userFeatures().collect()
[(1, array('d', [...])), (2, array('d', [...]))]
>>> model.recommendUsers(1, 2)
[Rating(user=2, product=1, rating=1.9...), Rating(user=1, product=1, rating=1.0...)]
>>> model.recommendProducts(1, 2)
[Rating(user=1, product=2, rating=1.9...), Rating(user=1, product=1, rating=1.0...)]
>>> model.rank
4
>>> first_user = model.userFeatures().take(1)[0]
>>> latents = first_user[1]
>>> len(latents)
4
>>> model.productFeatures().collect()
[(1, array('d', [...])), (2, array('d', [...]))]
>>> first_product = model.productFeatures().take(1)[0]
>>> latents = first_product[1]
>>> len(latents)
4
>>> products_for_users = model.recommendProductsForUsers(1).collect()
>>> len(products_for_users)
2
>>> products_for_users[0]
(1, (Rating(user=1, product=2, rating=...),))
>>> users_for_products = model.recommendUsersForProducts(1).collect()
>>> len(users_for_products)
2
>>> users_for_products[0]
(1, (Rating(user=2, product=1, rating=...),))
>>> model = ALS.train(ratings, 1, nonnegative=True, seed=123456789)
>>> model.predict(2, 2)
3.73...
>>> df = sqlContext.createDataFrame([Rating(1, 1, 1.0), Rating(1, 2, 2.0), Rating(2, 1, 2.0)])
>>> model = ALS.train(df, 1, nonnegative=True, seed=123456789)
>>> model.predict(2, 2)
3.73...
>>> model = ALS.trainImplicit(ratings, 1, nonnegative=True, seed=123456789)
>>> model.predict(2, 2)
0.4...
>>> import os, tempfile
>>> path = tempfile.mkdtemp()
>>> model.save(sc, path)
>>> sameModel = MatrixFactorizationModel.load(sc, path)
>>> sameModel.predict(2, 2)
0.4...
>>> sameModel.predictAll(testset).collect()
[Rating(...
>>> from shutil import rmtree
>>> try:
...     rmtree(path)
... except OSError:
...     pass

Methods

call(name, *a)

Calls the named method of java_model.

load(sc, path)

Loads a model from the given path.

predict(user, product)

Predicts rating for the given user and product.

predictAll(user_product)

Returns an RDD of predicted ratings for the input user and product pairs.

productFeatures()

Returns a paired RDD, where the first element is the product and the second is an array of features corresponding to that product.

recommendProducts(user, num)

Recommends the top “num” products for a given user and returns a list of Rating objects sorted by predicted rating in descending order.

recommendProductsForUsers(num)

Recommends the top “num” products for each user.

recommendUsers(product, num)

Recommends the top “num” users for a given product and returns a list of Rating objects sorted by predicted rating in descending order.

recommendUsersForProducts(num)

Recommends the top “num” users for each product.

save(sc, path)

Save this model to the given path.

userFeatures()

Returns a paired RDD, where the first element is the user and the second is an array of features corresponding to that user.

Attributes

rank

Rank of the model, i.e. the number of latent features per user and per product.

Methods Documentation

call(name: str, *a: Any) → Any

Calls the named method of java_model.

classmethod load(sc: pyspark.context.SparkContext, path: str) → pyspark.mllib.recommendation.MatrixFactorizationModel

Loads a model from the given path.

predict(user: int, product: int) → float

Predicts rating for the given user and product.
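As a rough illustration of what a prediction means in a matrix factorization model, the sketch below scores a pair as the dot product of the user's and the product's latent feature vectors. It runs without Spark; the `user_factors` and `product_factors` tables and their values are hypothetical stand-ins for what `userFeatures()` and `productFeatures()` would return, and the local `predict` function is not the library method.

```python
from array import array

# Hypothetical factor tables, shaped like the (id, array('d', ...)) pairs
# yielded by userFeatures() and productFeatures(); the values are made up.
user_factors = {1: array("d", [1.0, 0.5]), 2: array("d", [2.0, 1.0])}
product_factors = {1: array("d", [1.0, 0.0]), 2: array("d", [1.5, 1.0])}


def predict(user, product):
    """Score a (user, product) pair as the dot product of its factor vectors."""
    u = user_factors[user]
    p = product_factors[product]
    return sum(a * b for a, b in zip(u, p))


print(predict(2, 2))  # 2.0 * 1.5 + 1.0 * 1.0
```

With a trained model, the same call shape is `model.predict(2, 2)`, as in the Examples section.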

predictAll(user_product: pyspark.rdd.RDD[Tuple[int, int]]) → pyspark.rdd.RDD[pyspark.mllib.recommendation.Rating]

Returns an RDD of predicted ratings for the input user and product pairs.
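The batch form can be pictured as scoring each input pair in turn. The sketch below is a plain-Python approximation, not the distributed implementation: the `Rating` namedtuple stands in for `pyspark.mllib.recommendation.Rating`, the factor tables are hypothetical, and it assumes that pairs whose user or product has no factors are simply omitted from the result.

```python
from array import array
from collections import namedtuple

# Stand-in for pyspark.mllib.recommendation.Rating so the sketch runs without Spark.
Rating = namedtuple("Rating", ["user", "product", "rating"])

# Hypothetical factor tables; the values are made up for illustration.
user_factors = {1: array("d", [1.0, 0.5]), 2: array("d", [2.0, 1.0])}
product_factors = {1: array("d", [1.0, 0.0]), 2: array("d", [1.5, 1.0])}


def predict_all(pairs):
    """Score every (user, product) pair that has factors on both sides."""
    predictions = []
    for user, product in pairs:
        # Assumption: pairs with an unknown user or product are skipped.
        if user not in user_factors or product not in product_factors:
            continue
        u, p = user_factors[user], product_factors[product]
        score = sum(a * b for a, b in zip(u, p))
        predictions.append(Rating(user, product, score))
    return predictions


# User 3 has no factors, so the last pair produces no prediction.
predictions = predict_all([(1, 2), (1, 1), (3, 1)])
print(predictions)
```

Unlike this list-based sketch, the real method returns an RDD, so the order of the collected results should not be relied on.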

productFeatures() → pyspark.rdd.RDD[Tuple[int, array.array]]

Returns a paired RDD, where the first element is the product and the second is an array of features corresponding to that product.

recommendProducts(user: int, num: int) → List[pyspark.mllib.recommendation.Rating]

Recommends the top “num” products for a given user and returns a list of Rating objects sorted by predicted rating in descending order.
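Conceptually, a top-“num” recommendation scores every product for the user and keeps the highest-scoring ones. The sketch below shows that idea in plain Python under stated assumptions: the factor tables are hypothetical, `Rating` is a local stand-in for the library class, and `recommend_products` is an illustrative helper rather than the library's implementation.

```python
import heapq
from array import array
from collections import namedtuple

# Stand-in for pyspark.mllib.recommendation.Rating so the sketch runs without Spark.
Rating = namedtuple("Rating", ["user", "product", "rating"])

# Hypothetical factor tables; the values are made up for illustration.
user_factors = {1: array("d", [1.0, 0.5])}
product_factors = {
    1: array("d", [1.0, 0.0]),
    2: array("d", [1.5, 1.0]),
    3: array("d", [0.0, 1.0]),
}


def recommend_products(user, num):
    """Return up to `num` Ratings for `user`, highest predicted rating first."""
    u = user_factors[user]
    scored = (
        Rating(user, pid, sum(a * b for a, b in zip(u, p)))
        for pid, p in product_factors.items()
    )
    # heapq.nlargest returns its result sorted from largest to smallest.
    return heapq.nlargest(num, scored, key=lambda r: r.rating)


top_two = recommend_products(1, 2)
print(top_two)
```

The same pattern, with the roles of users and products swapped, gives a sketch of recommending users for a product.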

recommendProductsForUsers(num: int) → pyspark.rdd.RDD[Tuple[int, Tuple[pyspark.mllib.recommendation.Rating, …]]]

Recommends the top “num” products for each user. The number of recommendations returned per user may be fewer than “num”.
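The result shape, one `(user, tuple_of_ratings)` pair per user, and the reason a user can receive fewer than “num” recommendations can be seen in a small plain-Python sketch. The factor tables, the `Rating` stand-in, and the `recommend_products_for_users` helper are all hypothetical; here only two products exist, so asking for five yields two per user.

```python
import heapq
from array import array
from collections import namedtuple

# Stand-in for pyspark.mllib.recommendation.Rating so the sketch runs without Spark.
Rating = namedtuple("Rating", ["user", "product", "rating"])

# Hypothetical factor tables; the values are made up for illustration.
user_factors = {1: array("d", [1.0, 0.5]), 2: array("d", [2.0, 1.0])}
product_factors = {1: array("d", [1.0, 0.0]), 2: array("d", [1.5, 1.0])}


def recommend_products_for_users(num):
    """Return [(user, (Rating, ...)), ...] with at most `num` Ratings per user."""
    result = []
    for uid, u in user_factors.items():
        scored = [
            Rating(uid, pid, sum(a * b for a, b in zip(u, p)))
            for pid, p in product_factors.items()
        ]
        top = heapq.nlargest(num, scored, key=lambda r: r.rating)
        result.append((uid, tuple(top)))
    return result


# Only two products exist, so each user gets two recommendations, not five.
recs = dict(recommend_products_for_users(5))
print({uid: len(ratings) for uid, ratings in recs.items()})
```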

recommendUsers(product: int, num: int) → List[pyspark.mllib.recommendation.Rating]

Recommends the top “num” users for a given product and returns a list of Rating objects sorted by predicted rating in descending order.

recommendUsersForProducts(num: int) → pyspark.rdd.RDD[Tuple[int, Tuple[pyspark.mllib.recommendation.Rating, …]]]

Recommends the top “num” users for each product. The number of recommendations returned per product may be fewer than “num”.

save(sc: pyspark.context.SparkContext, path: str) → None

Save this model to the given path.

userFeatures() → pyspark.rdd.RDD[Tuple[int, array.array]]

Returns a paired RDD, where the first element is the user and the second is an array of features corresponding to that user.
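Once collected, the pairs are ordinary Python tuples of an integer id and an `array('d', ...)` of latent features. The sketch below shows one way to work with that shape; the `features` list is hypothetical data standing in for `model.userFeatures().collect()`, and the variable names are illustrative.

```python
from array import array

# Hypothetical collected output, shaped like model.userFeatures().collect()
# for a rank-4 model; the values are made up.
features = [
    (1, array("d", [0.1, 0.2, 0.3, 0.4])),
    (2, array("d", [0.5, 0.6, 0.7, 0.8])),
]

# Index the feature vectors by user id for direct lookup.
by_user = {uid: list(vec) for uid, vec in features}

# Every vector has one entry per latent feature, so its length equals the rank.
rank = len(features[0][1])

print(rank)
print(by_user[2])
```

Collecting is only reasonable for small models; for large ones the RDD would normally be processed in place.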

Attributes Documentation

rank

Rank of the model, i.e. the number of latent features per user and per product.