RidgeRegressionWithSGD

class pyspark.mllib.regression.RidgeRegressionWithSGD

Train a regression model with L2-regularization using Stochastic Gradient Descent.

Deprecated: use pyspark.ml.regression.LinearRegression with elasticNetParam = 0.0 instead. Note that the default regParam is 0.01 for RidgeRegressionWithSGD but 0.0 for LinearRegression.

Methods

train(data[, iterations, step, regParam, …])

Train a regression model with L2-regularization using Stochastic Gradient Descent.

Methods Documentation

classmethod train(data: pyspark.rdd.RDD[pyspark.mllib.regression.LabeledPoint], iterations: int = 100, step: float = 1.0, regParam: float = 0.01, miniBatchFraction: float = 1.0, initialWeights: Optional[VectorLike] = None, intercept: bool = False, validateData: bool = True, convergenceTol: float = 0.001) → pyspark.mllib.regression.RidgeRegressionModel

Train a regression model with L2-regularization using Stochastic Gradient Descent. This solves the L2-regularized least squares regression formulation:

f(weights) = 1/(2n) ||A weights - y||^2 + regParam/2 ||weights||^2

Here the data matrix has n rows, and the input RDD holds the set of rows of A, each with its corresponding right hand side label y. See also the documentation for the precise formulation.
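To make the objective concrete, the following is a minimal pure-Python sketch that evaluates f(weights) for a small in-memory dataset. The function name ridge_objective and the sample data are hypothetical and are not part of the PySpark API; the sketch only mirrors the formula above.

```python
def ridge_objective(rows, labels, weights, reg_param=0.01):
    """Evaluate 1/(2n) ||A weights - y||^2 + regParam/2 ||weights||^2.

    Hypothetical helper for illustration only; not part of PySpark.
    """
    n = len(rows)
    # Squared residuals ||A weights - y||^2, accumulated row by row.
    squared_error = 0.0
    for features, label in zip(rows, labels):
        prediction = sum(w * x for w, x in zip(weights, features))
        squared_error += (prediction - label) ** 2
    # L2 penalty ||weights||^2.
    penalty = sum(w * w for w in weights)
    return squared_error / (2 * n) + reg_param / 2 * penalty


# Rows of A and their labels y (made-up data, exactly fit by weights [1, 2]).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]

# With a perfect fit only the penalty term remains.
value_exact = ridge_objective(A, y, [1.0, 2.0], reg_param=0.01)
# With zero weights only the squared-error term remains.
value_zero = ridge_objective(A, y, [0.0, 0.0], reg_param=0.01)
```

With the perfectly fitting weights the value reduces to the penalty term alone, which shows how a nonzero regParam pulls the minimizer away from the unregularized least squares solution.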

Parameters

data (pyspark.RDD): The training data, an RDD of LabeledPoint.
iterations (int, optional): The number of iterations. (default: 100)
step (float, optional): The step parameter used in SGD. (default: 1.0)
regParam (float, optional): The regularizer parameter. (default: 0.01)
miniBatchFraction (float, optional): Fraction of data to be used for each SGD iteration. (default: 1.0)
initialWeights (pyspark.mllib.linalg.Vector or convertible, optional): The initial weights. (default: None)
intercept (bool, optional): Whether to use the augmented representation for training data (i.e., whether bias features are activated). (default: False)
validateData (bool, optional): Whether the algorithm should validate data before training. (default: True)
convergenceTol (float, optional): A condition which decides iteration termination. (default: 0.001)
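As a rough illustration of how iterations, step, regParam, miniBatchFraction, initialWeights, and convergenceTol might interact, here is a simplified single-machine sketch of mini-batch SGD on the ridge objective. It is not MLlib's implementation: the function name sgd_ridge_sketch, the seed argument, the step / sqrt(i) decay, and the relative-change stopping rule are all assumptions made for this example.

```python
import math
import random


def sgd_ridge_sketch(rows, labels, iterations=100, step=1.0, reg_param=0.01,
                     mini_batch_fraction=1.0, initial_weights=None,
                     convergence_tol=0.001, seed=0):
    """Hypothetical mini-batch SGD for ridge regression (illustration only)."""
    rng = random.Random(seed)
    dim = len(rows[0])
    weights = list(initial_weights) if initial_weights is not None else [0.0] * dim
    for i in range(1, iterations + 1):
        # Sample a mini-batch; a fraction of 1.0 uses every row.
        batch = [(x, y) for x, y in zip(rows, labels)
                 if mini_batch_fraction >= 1.0 or rng.random() < mini_batch_fraction]
        if not batch:
            continue
        # Average gradient of the squared-error term over the batch.
        grad = [0.0] * dim
        for x, y in batch:
            diff = sum(w * xj for w, xj in zip(weights, x)) - y
            for j in range(dim):
                grad[j] += diff * x[j] / len(batch)
        # Assumption: the step size decays as step / sqrt(i).
        step_i = step / math.sqrt(i)
        # Gradient step on the full objective, including the L2 term.
        new_weights = [w - step_i * (g + reg_param * w)
                       for w, g in zip(weights, grad)]
        # Assumption: stop when the relative change in weights is small.
        change = math.sqrt(sum((a - b) ** 2 for a, b in zip(new_weights, weights)))
        scale = max(math.sqrt(sum(a * a for a in new_weights)), 1.0)
        weights = new_weights
        if change < convergence_tol * scale:
            break
    return weights


# Made-up data following y = 2x; the fitted weight should land close to 2.
weights = sgd_ridge_sketch([[0.5], [1.0], [1.5]], [1.0, 2.0, 3.0])
```

In this sketch a larger regParam shrinks the fitted weight toward zero, while a smaller convergenceTol lets the loop run for more of the allotted iterations before stopping.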
