LinearDataGenerator
class pyspark.mllib.util.LinearDataGenerator
Utils for generating linear data.
Methods
generateLinearInput(intercept, weights, …)
generateLinearRDD(sc, nexamples, nfeatures, eps): Generate an RDD of LabeledPoints.
Methods Documentation
static generateLinearInput(intercept: float, weights: VectorLike, xMean: VectorLike, xVariance: VectorLike, nPoints: int, seed: int, eps: float) → List[LabeledPoint]
- Parameters
- intercept : float
  Bias factor, the term c in X’w + c.
- weights : pyspark.mllib.linalg.Vector or convertible
  Feature vector, the term w in X’w + c.
- xMean : pyspark.mllib.linalg.Vector or convertible
  Point around which the data X is centered.
- xVariance : pyspark.mllib.linalg.Vector or convertible
  Variance of the given data.
- nPoints : int
  Number of points to be generated.
- seed : int
  Random seed.
- eps : float
  Scale of the noise. A larger eps adds more Gaussian noise.
- Returns
- list of pyspark.mllib.regression.LabeledPoint objects, of length nPoints
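The role of each parameter can be illustrated with a minimal pure-Python sketch of the model X’w + c plus scaled noise. This is not Spark's implementation: the function name `sketch_linear_input` is hypothetical, the result is a list of plain (label, features) tuples rather than LabeledPoint objects, and drawing the features from a normal distribution with the given mean and variance is an assumption of the sketch that may differ from how Spark samples them.

```python
import random
from typing import List, Sequence, Tuple


def sketch_linear_input(
    intercept: float,
    weights: Sequence[float],
    x_mean: Sequence[float],
    x_variance: Sequence[float],
    n_points: int,
    seed: int,
    eps: float,
) -> List[Tuple[float, List[float]]]:
    """Return n_points (label, features) pairs with label = x.w + c + eps * noise."""
    rng = random.Random(seed)  # seeded so repeated calls give the same data
    points = []
    for _ in range(n_points):
        # Assumption: each feature is normal with the requested mean and variance.
        x = [rng.gauss(m, v ** 0.5) for m, v in zip(x_mean, x_variance)]
        noise = eps * rng.gauss(0.0, 1.0)  # eps scales the Gaussian noise
        label = sum(xi * wi for xi, wi in zip(x, weights)) + intercept + noise
        points.append((label, x))
    return points


# Five points in two dimensions around the origin, with a small amount of noise.
data = sketch_linear_input(
    intercept=1.0, weights=[2.0, -3.0], x_mean=[0.0, 0.0],
    x_variance=[1.0, 1.0], n_points=5, seed=42, eps=0.1,
)
```

With eps set to 0.0 every label in this sketch lies exactly on the plane defined by the weights and intercept; raising eps spreads the labels around it.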
static generateLinearRDD(sc: pyspark.context.SparkContext, nexamples: int, nfeatures: int, eps: float, nParts: int = 2, intercept: float = 0.0) → pyspark.rdd.RDD[LabeledPoint]
Generate an RDD of LabeledPoints.