NaiveBayes

class pyspark.mllib.classification.NaiveBayes

Train a Multinomial Naive Bayes model.

Methods

train(data[, lambda_])

Train a Naive Bayes model given an RDD of (label, features) vectors.

Methods Documentation

classmethod train(data: pyspark.rdd.RDD[pyspark.mllib.regression.LabeledPoint], lambda_: float = 1.0) → pyspark.mllib.classification.NaiveBayesModel

Train a Naive Bayes model given an RDD of (label, features) vectors.

This is Multinomial Naive Bayes, which can handle all kinds of discrete data. For example, converting documents into TF-IDF vectors lets it be used for document classification. Making every vector a 0-1 vector lets it also be used as Bernoulli NB. The input feature values must be nonnegative.
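The following is a minimal usage sketch, not part of the original reference. It assumes an existing SparkContext passed in as `sc`; the helper names `to_binary` and `train_bernoulli_style` and the sample rows are hypothetical and serve only to illustrate the 0-1 vector conversion described above.

```python
def to_binary(features):
    """Map every positive feature value to 1.0 and everything else to 0.0.

    Hypothetical helper: it produces the 0-1 vectors that let the
    multinomial model be used in a Bernoulli-like way.
    """
    return [1.0 if value > 0 else 0.0 for value in features]


def train_bernoulli_style(sc, rows, lambda_=1.0):
    """Train on binarized features.

    `sc` is assumed to be a live SparkContext and `rows` an iterable of
    (label, feature_list) pairs. PySpark is imported here so that
    `to_binary` stays usable without a Spark installation.
    """
    from pyspark.mllib.classification import NaiveBayes
    from pyspark.mllib.regression import LabeledPoint

    points = [LabeledPoint(label, to_binary(features)) for label, features in rows]
    return NaiveBayes.train(sc.parallelize(points), lambda_)


# Illustrative rows: (label, nonnegative feature values).
sample_rows = [
    (0.0, [2.0, 0.0, 0.0]),
    (0.0, [1.0, 0.0, 0.5]),
    (1.0, [0.0, 3.0, 0.0]),
]
binary_rows = [(label, to_binary(features)) for label, features in sample_rows]
```

With a running SparkContext, `model = train_bernoulli_style(sc, sample_rows)` would return a NaiveBayesModel whose `predict` method accepts a feature vector of the same length.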

Parameters
data : pyspark.RDD

The training data, an RDD of pyspark.mllib.regression.LabeledPoint.

lambda_ : float, optional

The additive smoothing parameter. (default: 1.0)
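To make the effect of lambda_ concrete, here is a small pure-Python sketch of standard additive smoothing for one class. It is an illustration under the usual textbook formula, not a copy of the MLlib implementation; the function name `smoothed_log_likelihoods` and the sample counts are hypothetical.

```python
import math


def smoothed_log_likelihoods(feature_totals, lambda_=1.0):
    """Return log P(feature | class) under additive smoothing.

    `feature_totals` holds the summed feature values for a single class.
    Each estimate is (total_j + lambda_) / (sum(totals) + lambda_ * n_features).
    """
    n_features = len(feature_totals)
    denominator = sum(feature_totals) + lambda_ * n_features
    return [math.log((total + lambda_) / denominator) for total in feature_totals]


# The second feature was never observed for this class.
class_totals = [3.0, 0.0, 1.0]
log_theta = smoothed_log_likelihoods(class_totals, lambda_=1.0)
probabilities = [math.exp(value) for value in log_theta]
# With lambda_ = 1.0 the estimates are 4/7, 1/7 and 2/7.
```

With lambda_ set to 0, the unseen feature would get probability zero and its logarithm would be undefined, which is why a positive smoothing value is the usual choice.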