LinearDataGenerator

class pyspark.mllib.util.LinearDataGenerator

Utils for generating linear data.

Methods

generateLinearInput(intercept, weights, …)

Generate a list of LabeledPoints from a linear model with Gaussian noise.

generateLinearRDD(sc, nexamples, nfeatures, eps)

Generate an RDD of LabeledPoints.

Methods Documentation

static generateLinearInput(intercept: float, weights: VectorLike, xMean: VectorLike, xVariance: VectorLike, nPoints: int, seed: int, eps: float) → List[LabeledPoint]
Parameters
intercept : float

Bias factor, the term c in X’w + c

weights : pyspark.mllib.linalg.Vector or convertible

Weight vector, the term w in X’w + c

xMean : pyspark.mllib.linalg.Vector or convertible

Point around which the data X is centered.

xVariance : pyspark.mllib.linalg.Vector or convertible

Variance of the given data

nPoints : int

Number of points to be generated

seed : int

Random seed

eps : float

Scales the noise. The larger eps is, the more Gaussian noise is added.

Returns
list

A list of nPoints pyspark.mllib.regression.LabeledPoint objects
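The role of each parameter can be illustrated with a small standard-library sketch of the model label = X’w + c + eps * noise. This is not the library's implementation: the function name `sketch_linear_input` is hypothetical, and drawing the features from a Gaussian is an assumption made here for illustration (the sampling scheme Spark uses for X is not stated above and may differ).

```python
import random
from typing import List, Sequence, Tuple


def sketch_linear_input(
    intercept: float,
    weights: Sequence[float],
    x_mean: Sequence[float],
    x_variance: Sequence[float],
    n_points: int,
    seed: int,
    eps: float,
) -> List[Tuple[float, List[float]]]:
    """Return (label, features) pairs with label = x.w + intercept + eps * noise."""
    rng = random.Random(seed)  # a fixed seed makes the output reproducible
    points = []
    for _ in range(n_points):
        # Assumption: Gaussian features centered on x_mean with the given
        # per-feature variance; the real generator may sample differently.
        x = [rng.gauss(m, v ** 0.5) for m, v in zip(x_mean, x_variance)]
        noise = rng.gauss(0.0, 1.0)  # standard Gaussian noise, scaled by eps
        label = sum(xi * wi for xi, wi in zip(x, weights)) + intercept + eps * noise
        points.append((label, x))
    return points


# With eps = 0.0 every label lies exactly on the line x.w + intercept.
data = sketch_linear_input(1.0, [2.0, -3.0], [0.0, 5.0], [1.0, 4.0], 5, 42, 0.0)
for label, features in data:
    print(label, features)
```

Raising eps spreads the labels further from the noiseless value, while calling the function again with the same seed returns the same points.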

static generateLinearRDD(sc: pyspark.context.SparkContext, nexamples: int, nfeatures: int, eps: float, nParts: int = 2, intercept: float = 0.0) → pyspark.rdd.RDD[LabeledPoint]

Generate an RDD of LabeledPoints.