PowerIterationClusteringModel

class pyspark.mllib.clustering.PowerIterationClusteringModel(java_model: py4j.java_gateway.JavaObject)

Model produced by PowerIterationClustering.

Examples

>>> import math
>>> from pyspark.mllib.clustering import PowerIterationClustering, PowerIterationClusteringModel
>>> def genCircle(r, n):
...     points = []
...     for i in range(0, n):
...         theta = 2.0 * math.pi * i / n
...         points.append((r * math.cos(theta), r * math.sin(theta)))
...     return points
>>> def sim(x, y):
...     dist2 = (x[0] - y[0]) * (x[0] - y[0]) + (x[1] - y[1]) * (x[1] - y[1])
...     return math.exp(-dist2 / 2.0)
>>> r1 = 1.0
>>> n1 = 10
>>> r2 = 4.0
>>> n2 = 40
>>> n = n1 + n2
>>> points = genCircle(r1, n1) + genCircle(r2, n2)
>>> similarities = [(i, j, sim(points[i], points[j])) for i in range(1, n) for j in range(0, i)]
>>> rdd = sc.parallelize(similarities, 2)
>>> model = PowerIterationClustering.train(rdd, 2, 40)
>>> model.k
2
>>> result = sorted(model.assignments().collect(), key=lambda x: x.id)
>>> result[0].cluster == result[1].cluster == result[2].cluster == result[3].cluster
True
>>> result[4].cluster == result[5].cluster == result[6].cluster == result[7].cluster
True
>>> import tempfile
>>> path = tempfile.mkdtemp()
>>> model.save(sc, path)
>>> sameModel = PowerIterationClusteringModel.load(sc, path)
>>> sameModel.k
2
>>> result = sorted(sameModel.assignments().collect(), key=lambda x: x.id)
>>> result[0].cluster == result[1].cluster == result[2].cluster == result[3].cluster
True
>>> result[4].cluster == result[5].cluster == result[6].cluster == result[7].cluster
True
>>> from shutil import rmtree
>>> try:
...     rmtree(path)
... except OSError:
...     pass
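
The training input in the example above is a list of (i, j, similarity) tuples. As a minimal sketch that needs no Spark session, the snippet below rebuilds that list in plain Python and checks its shape: each unordered pair of points appears exactly once with i > j, and every Gaussian similarity is positive. The names gen_circle, pair_count, each_pair_once, and all_positive are illustrative and not part of the PySpark API.

```python
import math


def gen_circle(r, n):
    """Return n points evenly spaced on a circle of radius r."""
    return [
        (r * math.cos(2.0 * math.pi * i / n), r * math.sin(2.0 * math.pi * i / n))
        for i in range(n)
    ]


def sim(x, y):
    """Gaussian similarity between two 2-D points, as in the example above."""
    dist2 = (x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2
    return math.exp(-dist2 / 2.0)


n1, n2 = 10, 40
n = n1 + n2
points = gen_circle(1.0, n1) + gen_circle(4.0, n2)

# Lower-triangular enumeration: every pair (i, j) with i > j appears once.
similarities = [
    (i, j, sim(points[i], points[j])) for i in range(1, n) for j in range(0, i)
]

pair_count = len(similarities)  # n * (n - 1) / 2 pairs
each_pair_once = all(i > j for i, j, _ in similarities)
all_positive = all(0.0 < s <= 1.0 for _, _, s in similarities)
```

Listing each pair once is sufficient here because the similarity function is symmetric, so sim(x, y) equals sim(y, x).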

Methods

assignments()

Returns the cluster assignments of this model.

call(name, *a)

Call a method of the underlying java_model.

load(sc, path)

Load a model from the given path.

save(sc, path)

Save this model to the given path.

Attributes

k

Returns the number of clusters.

Methods Documentation

assignments() → pyspark.rdd.RDD[pyspark.mllib.clustering.PowerIterationClustering.Assignment]

Returns the cluster assignments of this model.
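
Each element of the returned RDD exposes an id and a cluster field, as the example at the top of this page uses them. A common follow-up is to group the collected assignments by cluster. The sketch below shows one way to do that in plain Python. It uses a stand-in namedtuple with the same two fields so that it runs without Spark, and the helper group_by_cluster and the sample data are hypothetical. With a trained model, the input would be model.assignments().collect().

```python
from collections import defaultdict, namedtuple

# Stand-in with the same fields (id, cluster) as the assignment records,
# used here only so the sketch runs without a Spark session.
Assignment = namedtuple("Assignment", ["id", "cluster"])


def group_by_cluster(assignments):
    """Map each cluster label to the sorted list of node ids assigned to it."""
    groups = defaultdict(list)
    for a in assignments:
        groups[a.cluster].append(a.id)
    return {cluster: sorted(ids) for cluster, ids in groups.items()}


# Hypothetical collected output; real data would come from
# model.assignments().collect().
collected = [
    Assignment(0, 1),
    Assignment(3, 0),
    Assignment(1, 1),
    Assignment(2, 0),
]
groups = group_by_cluster(collected)
```

Collecting to the driver is only reasonable for small graphs. For large results, the same grouping could be done on the RDD itself before collecting.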

call(name: str, *a: Any) → Any

Call a method of the underlying java_model.

classmethod load(sc: pyspark.context.SparkContext, path: str) → pyspark.mllib.clustering.PowerIterationClusteringModel

Load a model from the given path.

save(sc: pyspark.context.SparkContext, path: str) → None

Save this model to the given path.

Attributes Documentation

k

Returns the number of clusters.