LinearDataGenerator
class pyspark.mllib.util.LinearDataGenerator
Utils for generating linear data.
Methods
generateLinearInput(intercept, weights, …) Generate a list of LabeledPoints.
generateLinearRDD(sc, nexamples, nfeatures, eps) Generate an RDD of LabeledPoints.
Methods Documentation
static generateLinearInput(intercept: float, weights: VectorLike, xMean: VectorLike, xVariance: VectorLike, nPoints: int, seed: int, eps: float) → List[LabeledPoint]
Parameters
- intercept : float
Bias factor, the term c in X’w + c.
- weights : pyspark.mllib.linalg.Vector or convertible
Feature vector, the term w in X’w + c.
- xMean : pyspark.mllib.linalg.Vector or convertible
Point around which the data X is centered.
- xVariance : pyspark.mllib.linalg.Vector or convertible
Variance of the given data.
- nPoints : int
Number of points to be generated.
- seed : int
Random seed.
- eps : float
Scales the noise: the larger eps is, the more Gaussian noise is added.
Returns
- list
A list of pyspark.mllib.regression.LabeledPoint objects of length nPoints.
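To make the roles of the parameters concrete, the following pure-Python sketch mimics the model described above: each label is x’w + c plus noise scaled by eps. It is not Spark's implementation. The helper name `sketch_linear_input`, the (label, features) tuple layout, and the choice of Gaussian-distributed features are assumptions made only for illustration.

```python
import random
from typing import List, Sequence, Tuple


def sketch_linear_input(
    intercept: float,
    weights: Sequence[float],
    x_mean: Sequence[float],
    x_variance: Sequence[float],
    n_points: int,
    seed: int,
    eps: float,
) -> List[Tuple[float, List[float]]]:
    """Return (label, features) pairs with label = x.w + intercept + eps * noise."""
    rng = random.Random(seed)  # seeded, so the same seed reproduces the same data
    points = []
    for _ in range(n_points):
        # Assumption: features are Gaussian, centered on x_mean with the given variance.
        x = [rng.gauss(m, v ** 0.5) for m, v in zip(x_mean, x_variance)]
        # Linear response plus standard Gaussian noise scaled by eps.
        noise = rng.gauss(0.0, 1.0)
        label = sum(xi * wi for xi, wi in zip(x, weights)) + intercept + eps * noise
        points.append((label, x))
    return points


# Hypothetical values: two features, intercept 1.0, light noise.
data = sketch_linear_input(
    1.0, [2.0, -1.0], [0.0, 5.0], [1.0, 4.0], n_points=5, seed=42, eps=0.1
)
```

With eps set to 0.0 the labels lie exactly on the plane x’w + c, which is a quick way to check a regression routine against known weights.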
static generateLinearRDD(sc: pyspark.context.SparkContext, nexamples: int, nfeatures: int, eps: float, nParts: int = 2, intercept: float = 0.0) → pyspark.rdd.RDD[LabeledPoint]
Generate an RDD of LabeledPoints.