class pyspark.mllib.clustering.GaussianMixture

Learning algorithm for Gaussian Mixtures using the expectation-maximization algorithm.
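To make the expectation-maximization (EM) loop concrete, here is a minimal single-machine sketch for one-dimensional data in plain Python. It is not Spark's implementation, which is distributed and fits multivariate Gaussians with full covariance matrices. The helper names `normal_pdf` and `em_gmm_1d`, the farthest-point initialization, and the variance floor are hypothetical choices made only for this illustration. Its `k`, `tol`, `max_iter`, and `seed` arguments loosely mirror the `k`, `convergenceTol`, `maxIterations`, and `seed` parameters of `train`.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of the 1-D Gaussian N(mean, var) evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(xs, k, tol=1e-3, max_iter=100, seed=None):
    """Hypothetical sketch: fit a k-component 1-D Gaussian mixture with EM.

    Returns (weights, means, variances, iterations_run).
    """
    n = len(xs)
    rng = random.Random(seed)  # seed=None: seeded from OS randomness or the clock
    # Initialization (an assumption of this sketch, not Spark's scheme):
    # one random point, then repeatedly the point farthest from chosen means.
    means = [rng.choice(xs)]
    while len(means) < k:
        means.append(max(xs, key=lambda x: min(abs(x - m) for m in means)))
    mu = sum(xs) / n
    variances = [sum((x - mu) ** 2 for x in xs) / n] * k  # start from overall variance
    weights = [1.0 / k] * k

    prev_ll = None
    for it in range(1, max_iter + 1):
        # E-step: responsibility of each component for each point,
        # plus the log-likelihood under the current parameters.
        resp, ll = [], 0.0
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: re-estimate each component from its responsibility-weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, 1e-6)  # floor keeps densities finite
        # Stop once the log-likelihood changes by less than tol.
        if prev_ll is not None and abs(ll - prev_ll) < tol:
            return weights, means, variances, it
        prev_ll = ll
    return weights, means, variances, max_iter


# Usage: two well-separated groups of points, near 0 and near 10.
data = [-0.3, -0.1, 0.0, 0.1, 0.3, 9.7, 9.9, 10.0, 10.1, 10.3]
weights, means, variances, iters = em_gmm_1d(data, k=2, tol=1e-3, max_iter=100, seed=42)
print("weights:", weights)
print("means:", means)
print("iterations:", iters)
```

In exact arithmetic EM never decreases the log-likelihood from one iteration to the next (the variance floor above is a small exception). That is why a change smaller than the tolerance is a reasonable stopping signal.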


Methods

train(rdd, k[, convergenceTol, …])

Train a Gaussian Mixture clustering model.

Methods Documentation

classmethod train(rdd: pyspark.rdd.RDD[VectorLike], k: int, convergenceTol: float = 0.001, maxIterations: int = 100, seed: Optional[int] = None, initialModel: Optional[pyspark.mllib.clustering.GaussianMixtureModel] = None) → pyspark.mllib.clustering.GaussianMixtureModel

Train a Gaussian Mixture clustering model.

Parameters

rdd : pyspark.RDD

Training points as an RDD of pyspark.mllib.linalg.Vector or convertible sequence types.

k : int

Number of independent Gaussians in the mixture model.

convergenceTol : float, optional

Maximum change in log-likelihood at which convergence is considered to have occurred. (default: 1e-3)

maxIterations : int, optional

Maximum number of iterations allowed. (default: 100)

seed : int, optional

Random seed for initializing the Gaussian distributions. Set to None to generate a seed based on the system time. (default: None)

initialModel : GaussianMixtureModel, optional

Initial GMM starting point, bypassing the random initialization. Its number of Gaussians (initialModel.k) must equal k; otherwise a ValueError is raised. (default: None)