BisectingKMeansModel

class pyspark.mllib.clustering.BisectingKMeansModel(java_model: JavaObject)

A clustering model derived from the bisecting k-means method.

Examples

>>> from numpy import array
>>> from pyspark.mllib.clustering import BisectingKMeans
>>> data = array([0.0,0.0, 1.0,1.0, 9.0,8.0, 8.0,9.0]).reshape(4, 2)
>>> bskm = BisectingKMeans()
>>> model = bskm.train(sc.parallelize(data, 2), k=4)
>>> p = array([0.0, 0.0])
>>> model.predict(p)
0
>>> model.k
4
>>> model.computeCost(p)
0.0

Methods

`call`(name, *a)	Call method of java_model
`computeCost`(x)	Return the Bisecting K-means cost (sum of squared distances of points to their nearest center) for this model on the given data.
`predict`(x)	Find the cluster that each of the points belongs to in this model.

Attributes

`clusterCenters`	Get the cluster centers, represented as a list of NumPy arrays.
`k`	Get the number of clusters.

Methods Documentation

call(name: str, *a: Any) → Any: Call method of java_model

computeCost(x: Union[VectorLike, pyspark.rdd.RDD[VectorLike]]) → float

Return the Bisecting K-means cost (sum of squared distances of points to their nearest center) for this model on the given data. If given an RDD of points, returns the sum of the costs over all points.

Parameters

x (pyspark.mllib.linalg.Vector or pyspark.RDD): A data point (or RDD of points) for which to compute the cost. pyspark.mllib.linalg.Vector can be replaced with equivalent objects (list, tuple, numpy.ndarray).
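The cost described above can be illustrated without Spark. The following is a minimal NumPy sketch of the stated definition (sum of squared distances to the nearest center), not the library's implementation; the helper name `nearest_center_cost` and the sample centers are hypothetical and chosen only for illustration.

```python
import numpy as np

def nearest_center_cost(points, centers):
    """Sum of squared Euclidean distances from each point to its nearest center.

    Hypothetical helper: accepts one point or a sequence of points.
    """
    pts = np.atleast_2d(np.asarray(points, dtype=float))
    ctr = np.asarray(centers, dtype=float)
    # Pairwise squared distances, shape (n_points, n_centers).
    sq_dist = ((pts[:, None, :] - ctr[None, :, :]) ** 2).sum(axis=2)
    # Keep the distance to the closest center per point, then sum over points.
    return float(sq_dist.min(axis=1).sum())

# Sample centers in the same form clusterCenters uses (a list of NumPy arrays).
centers = [np.array([0.0, 0.0]), np.array([9.0, 8.0])]

single = nearest_center_cost([1.0, 1.0], centers)               # one point
batch = nearest_center_cost([[1.0, 1.0], [8.0, 9.0]], centers)  # several points
```

With a single point the result is that point's squared distance to its closest center; with several points the per-point values are added together, which mirrors the documented behavior for an RDD input.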

predict(x: Union[VectorLike, pyspark.rdd.RDD[VectorLike]]) → Union[int, pyspark.rdd.RDD[int]]

Find the cluster that each of the points belongs to in this model.

Parameters

x (pyspark.mllib.linalg.Vector or pyspark.RDD): A data point (or RDD of points) for which to determine the cluster index. pyspark.mllib.linalg.Vector can be replaced with equivalent objects (list, tuple, numpy.ndarray).

Returns

int or pyspark.RDD of int: Predicted cluster index or an RDD of predicted cluster indices if the input is an RDD.
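The input-dependent return shape can be sketched in plain NumPy. This assumes a flat nearest-center assignment, which may differ from the procedure the model uses internally; the helper name `nearest_center_index` and the sample centers are hypothetical.

```python
import numpy as np

def nearest_center_index(x, centers):
    """Return the index of the closest center.

    Hypothetical helper: a single point yields an int, a sequence of
    points yields a list of ints (standing in for an RDD of ints).
    """
    arr = np.asarray(x, dtype=float)
    ctr = np.asarray(centers, dtype=float)
    pts = np.atleast_2d(arr)
    # Pairwise squared distances, shape (n_points, n_centers).
    sq_dist = ((pts[:, None, :] - ctr[None, :, :]) ** 2).sum(axis=2)
    idx = sq_dist.argmin(axis=1)
    return int(idx[0]) if arr.ndim == 1 else [int(i) for i in idx]

# Sample centers for illustration only.
centers = [np.array([0.0, 0.0]), np.array([9.0, 8.0])]

one = nearest_center_index([0.0, 0.0], centers)
many = nearest_center_index([[1.0, 1.0], [8.0, 9.0]], centers)
```

Here `one` is a single cluster index and `many` holds one index per input point, matching the "int or collection of int" contract described above.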

Attributes Documentation

clusterCenters: Get the cluster centers, represented as a list of NumPy arrays.

k: Get the number of clusters.
