LogisticRegressionWithSGD

class pyspark.mllib.classification.LogisticRegressionWithSGD

Train a classification model for Binary Logistic Regression using Stochastic Gradient Descent.

Deprecated: use ml.classification.LogisticRegression or LogisticRegressionWithLBFGS instead.

Methods

train(data[, iterations, step, …])

Train a logistic regression model on the given data.

Methods Documentation

classmethod train(data: pyspark.rdd.RDD[pyspark.mllib.regression.LabeledPoint], iterations: int = 100, step: float = 1.0, miniBatchFraction: float = 1.0, initialWeights: Optional[VectorLike] = None, regParam: float = 0.01, regType: str = 'l2', intercept: bool = False, validateData: bool = True, convergenceTol: float = 0.001) → pyspark.mllib.classification.LogisticRegressionModel

Train a logistic regression model on the given data.
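As a rough illustration of how this method might be called, the sketch below trains on two toy points. The function name train_example is hypothetical, and sc is assumed to be an already running pyspark.SparkContext; the toy data and argument values are chosen only for demonstration.

```python
def train_example(sc):
    """Train on two toy points and return the fitted LogisticRegressionModel.

    `sc` is assumed to be an existing pyspark.SparkContext.
    """
    # Imported inside the function so it can be defined without Spark installed.
    from pyspark.mllib.classification import LogisticRegressionWithSGD
    from pyspark.mllib.regression import LabeledPoint

    # An RDD of LabeledPoint: (label, feature vector).
    data = sc.parallelize([
        LabeledPoint(0.0, [0.0, 1.0]),
        LabeledPoint(1.0, [1.0, 0.0]),
    ])

    # Keyword arguments mirror the signature documented on this page.
    model = LogisticRegressionWithSGD.train(
        data,
        iterations=10,
        step=1.0,
        regParam=0.01,
        regType="l2",
        intercept=False,
    )
    return model
```

The returned model exposes predict, so a call such as train_example(sc).predict([1.0, 0.0]) yields a class label for a single feature vector.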

Parameters

data : pyspark.RDD

The training data, an RDD of pyspark.mllib.regression.LabeledPoint.

iterations : int, optional

The number of iterations. (default: 100)

step : float, optional

The step size used in SGD. (default: 1.0)

miniBatchFraction : float, optional

Fraction of data to be used for each SGD iteration. (default: 1.0)

initialWeights : pyspark.mllib.linalg.Vector or convertible, optional

The initial weights. (default: None)

regParam : float, optional

The regularization parameter. (default: 0.01)

regType : str, optional

The type of regularizer used for training the model. Supported values:

“l1” for using L1 regularization
“l2” for using L2 regularization (default)
None for no regularization

intercept : bool, optional

Whether to use the augmented representation for the training data (i.e., whether bias features are activated). (default: False)

validateData : bool, optional

Whether the algorithm should validate the data before training. (default: True)

convergenceTol : float, optional

The convergence tolerance used to decide when to stop iterating. (default: 0.001)
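To give a feel for how iterations, step, miniBatchFraction, regParam, and convergenceTol interact, here is a simplified pure-Python sketch of mini-batch SGD for binary logistic regression with an L2 penalty. It is not Spark's implementation: the helper names (sigmoid, sgd_logistic, predict), the decaying step size, the sampling scheme, and the stopping rule are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sgd_logistic(points, iterations=100, step=1.0, mini_batch_fraction=1.0,
                 reg_param=0.01, convergence_tol=0.001, seed=0):
    """Toy mini-batch SGD for binary logistic regression with L2 regularization.

    `points` is a list of (label, features) pairs with label in {0.0, 1.0}.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    dim = len(points[0][1])
    weights = [0.0] * dim
    for t in range(1, iterations + 1):
        # Sample roughly mini_batch_fraction of the data for this iteration.
        batch = [p for p in points if rng.random() < mini_batch_fraction]
        if not batch:
            continue
        # Average gradient of the log loss over the sampled batch.
        grad = [0.0] * dim
        for label, x in batch:
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - label
            for j, xj in enumerate(x):
                grad[j] += err * xj / len(batch)
        # Decaying step size (an assumption of this sketch) plus the L2 term.
        lr = step / math.sqrt(t)
        new_weights = [w - lr * (g + reg_param * w)
                       for w, g in zip(weights, grad)]
        # Stop early once the weights barely move relative to their norm.
        diff = math.sqrt(sum((a - b) ** 2
                             for a, b in zip(new_weights, weights)))
        norm = math.sqrt(sum(w * w for w in new_weights))
        weights = new_weights
        if diff < convergence_tol * max(norm, 1.0):
            break
    return weights


def predict(weights, x, threshold=0.5):
    """Return 1 if the predicted probability exceeds threshold, else 0."""
    score = sum(w * xi for w, xi in zip(weights, x))
    return 1 if sigmoid(score) > threshold else 0
```

With mini_batch_fraction=1.0 every point is used in each iteration, so the update reduces to full-batch gradient descent; smaller fractions trade gradient accuracy for cheaper iterations.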
