object Statistics
API for statistical functions in MLlib.
 Annotations
 @Since( "1.1.0" )
Value Members

final def !=(arg0: Any): Boolean
 Definition Classes
 AnyRef → Any

final def ##(): Int
 Definition Classes
 AnyRef → Any

final def ==(arg0: Any): Boolean
 Definition Classes
 AnyRef → Any

final def asInstanceOf[T0]: T0
 Definition Classes
 Any

def chiSqTest(data: JavaRDD[LabeledPoint]): Array[ChiSqTestResult]
Java-friendly version of chiSqTest()
 Annotations
 @Since( "1.5.0" )

def chiSqTest(data: RDD[LabeledPoint]): Array[ChiSqTestResult]
Conduct Pearson's independence test for every feature against the label across the input RDD. For each feature, the (feature, label) pairs are converted into a contingency matrix for which the chi-squared statistic is computed. All label and feature values must be categorical.
 data
an RDD[LabeledPoint] containing the labeled dataset with categorical features. Real-valued features will be treated as categorical for each distinct value.
 returns
an array containing the ChiSqTestResult for every feature against the label. The order of the elements in the returned array reflects the order of input features.
 Annotations
 @Since( "1.1.0" )
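To make the per-feature step concrete, the following plain-Scala sketch (no Spark involved) turns one feature's (feature, label) pairs into a contingency matrix with one row per distinct feature value and one column per distinct label. `ContingencySketch` and `contingency` are hypothetical names used only for illustration, and the sorted row/column ordering is an assumption of this sketch, not a description of Spark's internal layout.

```scala
// Hypothetical illustration, not a Spark API: builds the contingency matrix
// for a single feature from its (feature value, label) pairs.
object ContingencySketch {
  /** Returns (distinct feature values, distinct labels, counts), where
    * counts(row)(col) is how often that feature value occurred with that label. */
  def contingency(pairs: Seq[(Double, Double)]): (Seq[Double], Seq[Double], Array[Array[Double]]) = {
    // Every distinct value becomes its own category, even for real-valued data
    val featureValues = pairs.map(_._1).distinct.sortWith(_ < _)
    val labelValues = pairs.map(_._2).distinct.sortWith(_ < _)
    val rowIndex = featureValues.zipWithIndex.toMap
    val colIndex = labelValues.zipWithIndex.toMap
    val counts = Array.fill(featureValues.size, labelValues.size)(0.0)
    // Each (feature, label) pair increments exactly one cell of the table
    for ((f, l) <- pairs) counts(rowIndex(f))(colIndex(l)) += 1.0
    (featureValues, labelValues, counts)
  }
}
```

For the pairs (0.0, 1.0), (0.0, 0.0), (2.5, 1.0), (2.5, 1.0), the real value 2.5 simply becomes its own row, giving the matrix rows (1, 1) and (0, 2).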

def chiSqTest(observed: Matrix): ChiSqTestResult
Conduct Pearson's independence test on the input contingency matrix, which must not contain negative entries or any row or column that sums to 0.
 observed
The contingency matrix (containing either counts or relative frequencies).
 returns
ChiSqTestResult object containing the test statistic, degrees of freedom, p-value, the method used, and the null hypothesis.
 Annotations
 @Since( "1.1.0" )
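A minimal sketch of the statistic this overload describes, assuming the textbook definition: under independence the expected count of cell (i, j) is rowSum(i) * colSum(j) / total, the statistic sums (observed - expected)^2 / expected over all cells, and the degrees of freedom are (rows - 1) * (cols - 1). `IndependenceSketch` is a hypothetical plain-Scala helper; it returns only the statistic and degrees of freedom, not the p-value a ChiSqTestResult also carries.

```scala
// Hypothetical sketch of Pearson's chi-squared independence statistic;
// not Spark's implementation.
object IndependenceSketch {
  /** Returns (statistic, degrees of freedom) for a contingency matrix. */
  def independence(observed: Array[Array[Double]]): (Double, Int) = {
    val rowSums = observed.map(_.sum)
    val colSums = observed.transpose.map(_.sum)
    val total = rowSums.sum
    // Mirror the documented input requirements
    require(observed.forall(_.forall(_ >= 0.0)), "negative entry")
    require(rowSums.forall(_ > 0.0) && colSums.forall(_ > 0.0), "a row or column sums to 0")
    var stat = 0.0
    for (i <- observed.indices; j <- colSums.indices) {
      // Expected count under the independence hypothesis
      val expected = rowSums(i) * colSums(j) / total
      val diff = observed(i)(j) - expected
      stat += diff * diff / expected
    }
    (stat, (rowSums.length - 1) * (colSums.length - 1))
  }
}
```

For the 2 x 2 matrix with rows (10, 20) and (20, 10), every expected count is 15, so the statistic is 4 * 25 / 15 = 6.67 with 1 degree of freedom.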

def chiSqTest(observed: Vector): ChiSqTestResult
Conduct Pearson's chi-squared goodness of fit test of the observed data against the uniform distribution, with each category having an expected frequency of 1 / observed.size.
 observed
Vector containing the observed categorical counts/relative frequencies.
 returns
ChiSqTestResult object containing the test statistic, degrees of freedom, p-value, the method used, and the null hypothesis.
 Annotations
 @Since( "1.1.0" )
 Note
observed cannot contain negative values.
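For the uniform case, each of the k categories is expected to receive total / k of the observations, which is the 1 / observed.size frequency mentioned above. The hypothetical `UniformGofSketch` below computes Pearson's statistic and k - 1 degrees of freedom in plain Scala, assuming the standard textbook formula; it does not compute a p-value.

```scala
// Hypothetical sketch of the chi-squared goodness-of-fit statistic against
// the uniform distribution; not Spark's implementation.
object UniformGofSketch {
  /** Returns (statistic, degrees of freedom). */
  def uniformGof(observed: Array[Double]): (Double, Int) = {
    require(observed.forall(_ >= 0.0), "observed cannot contain negative values")
    val total = observed.sum
    // Each category is expected to hold 1 / observed.length of the total
    val expected = total / observed.length
    val stat = observed.map(o => (o - expected) * (o - expected) / expected).sum
    (stat, observed.length - 1)
  }
}
```

With observed counts (10, 20, 30), each expected count is 20 and the statistic is (100 + 0 + 100) / 20 = 10 with 2 degrees of freedom.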

def chiSqTest(observed: Vector, expected: Vector): ChiSqTestResult
Conduct Pearson's chi-squared goodness of fit test of the observed data against the expected distribution.
 observed
Vector containing the observed categorical counts/relative frequencies.
 expected
Vector containing the expected categorical counts/relative frequencies. expected is rescaled if the expected sum differs from the observed sum.
 returns
ChiSqTestResult object containing the test statistic, degrees of freedom, p-value, the method used, and the null hypothesis.
 Annotations
 @Since( "1.1.0" )
 Note
The two input Vectors need to have the same size. observed cannot contain negative values. expected cannot contain non-positive values.
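The rescaling rule means `expected` can be supplied as relative frequencies. This hypothetical plain-Scala sketch rescales `expected` to the observed total before summing (observed - expected)^2 / expected; the exact-equality check on the two sums is a simplification made here, and Spark may compare them with a tolerance instead.

```scala
// Hypothetical sketch of the two-vector goodness-of-fit statistic,
// including rescaling of `expected`; not Spark's implementation.
object GofSketch {
  /** Returns (statistic, degrees of freedom). */
  def goodnessOfFit(observed: Array[Double], expected: Array[Double]): (Double, Int) = {
    require(observed.length == expected.length, "vectors must have the same size")
    require(observed.forall(_ >= 0.0), "observed cannot contain negative values")
    require(expected.forall(_ > 0.0), "expected cannot contain non-positive values")
    val obsSum = observed.sum
    val expSum = expected.sum
    // Rescale expected so that it sums to the same total as observed
    val scaled = if (expSum == obsSum) expected else expected.map(_ * obsSum / expSum)
    val stat = observed.indices.map { i =>
      val diff = observed(i) - scaled(i)
      diff * diff / scaled(i)
    }.sum
    (stat, observed.length - 1)
  }
}
```

With observed = (10, 20, 30) and expected = (1, 2, 3), the rescaled expected counts are (10, 20, 30), so the statistic is 0.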

def clone(): AnyRef
 Attributes
 protected[lang]
 Definition Classes
 AnyRef
 Annotations
 @throws( ... ) @native()

def colStats(X: RDD[Vector]): MultivariateStatisticalSummary
Computes column-wise summary statistics for the input RDD[Vector].
 X
an RDD[Vector] for which column-wise summary statistics are to be computed.
 returns
MultivariateStatisticalSummary object containing column-wise summary statistics.
 Annotations
 @Since( "1.1.0" )
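As an illustration of what "column-wise" means here, the sketch below computes the mean and variance of each column of a small in-memory dataset in plain Scala. `ColStatsSketch.colMeanVar` is a hypothetical helper; the n - 1 (sample variance) denominator is an assumption of this sketch, and it covers only two of the statistics a MultivariateStatisticalSummary exposes.

```scala
// Hypothetical plain-Scala illustration of column-wise statistics;
// not Spark's distributed summarizer.
object ColStatsSketch {
  /** Column-wise mean and sample variance (n - 1 denominator) of the rows. */
  def colMeanVar(rows: Seq[Array[Double]]): (Array[Double], Array[Double]) = {
    require(rows.size >= 2, "need at least two rows for a sample variance")
    val n = rows.size
    val dim = rows.head.length
    // Statistics are computed independently for each column index j
    val mean = Array.tabulate(dim)(j => rows.map(_(j)).sum / n)
    val variance = Array.tabulate(dim) { j =>
      rows.map(r => (r(j) - mean(j)) * (r(j) - mean(j))).sum / (n - 1)
    }
    (mean, variance)
  }
}
```

For the rows (1, 10), (2, 20), (3, 30), the column means are (2, 20) and the sample variances are (1, 100).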

def corr(x: JavaRDD[Double], y: JavaRDD[Double], method: String): Double
Java-friendly version of corr()
 Annotations
 @Since( "1.4.1" )

def corr(x: RDD[Double], y: RDD[Double], method: String): Double
Compute the correlation for the input RDDs using the specified method. Methods currently supported: pearson (default), spearman.
 x
RDD[Double] of the same cardinality as y.
 y
RDD[Double] of the same cardinality as x.
 method
String specifying the method to use for computing correlation. Supported: pearson (default), spearman.
 returns
A Double containing the correlation between the two input RDD[Double]s using the specified method.
 Annotations
 @Since( "1.1.0" )
 Note
The two input RDDs need to have the same number of partitions and the same number of elements in each partition.
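As a plain-Scala illustration of the two supported methods, the sketch below computes Pearson's coefficient directly and computes Spearman's as the Pearson correlation of the ranks, giving tied values the average of their ranks (a common convention that this sketch assumes). `CorrSketch` is hypothetical and works on in-memory arrays rather than RDDs; it returns NaN when either input has zero variance, in line with the behavior documented for the Pearson overload.

```scala
// Hypothetical plain-Scala sketch of pearson/spearman correlation;
// not Spark's distributed implementation.
object CorrSketch {
  /** Pearson correlation; NaN when either input has zero variance. */
  def pearson(x: Array[Double], y: Array[Double]): Double = {
    require(x.length == y.length, "inputs must have the same number of elements")
    val n = x.length
    val mx = x.sum / n
    val my = y.sum / n
    var sxy = 0.0; var sxx = 0.0; var syy = 0.0
    for (i <- 0 until n) {
      val dx = x(i) - mx
      val dy = y(i) - my
      sxy += dx * dy; sxx += dx * dx; syy += dy * dy
    }
    if (sxx == 0.0 || syy == 0.0) Double.NaN else sxy / math.sqrt(sxx * syy)
  }

  /** 1-based ranks; tied values share the average of their ranks. */
  def ranks(xs: Array[Double]): Array[Double] = {
    val order = xs.indices.sortWith((a, b) => xs(a) < xs(b)).toArray
    val result = new Array[Double](xs.length)
    var start = 0
    while (start < order.length) {
      var end = start
      // Extend the run while the next sorted value ties with this one
      while (end + 1 < order.length && xs(order(end + 1)) == xs(order(start))) end += 1
      val avgRank = (start + end) / 2.0 + 1.0
      for (k <- start to end) result(order(k)) = avgRank
      start = end + 1
    }
    result
  }

  /** Dispatch on the method name; Spearman is Pearson applied to ranks. */
  def corr(x: Array[Double], y: Array[Double], method: String = "pearson"): Double =
    method match {
      case "pearson"  => pearson(x, y)
      case "spearman" => pearson(ranks(x), ranks(y))
      case other      => throw new IllegalArgumentException(s"Unsupported method: $other")
    }
}
```

For x = (1, 2, 3, 4) and y = (1, 4, 9, 100), Spearman's coefficient is exactly 1 because y is strictly increasing in x, while Pearson's coefficient is below 1 because the relationship is not linear.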

def corr(x: JavaRDD[Double], y: JavaRDD[Double]): Double
Java-friendly version of corr()
 Annotations
 @Since( "1.4.1" )

def corr(x: RDD[Double], y: RDD[Double]): Double
Compute the Pearson correlation for the input RDDs. Returns NaN if either vector has 0 variance.
 x
RDD[Double] of the same cardinality as y.
 y
RDD[Double] of the same cardinality as x.
 returns
A Double containing the Pearson correlation between the two input RDD[Double]s.
 Annotations
 @Since( "1.1.0" )
 Note
The two input RDDs need to have the same number of partitions and the same number of elements in each partition.

def corr(X: RDD[Vector], method: String): Matrix
Compute the correlation matrix for the input RDD of Vectors using the specified method. Methods currently supported: pearson (default), spearman.
 X
an RDD[Vector] for which the correlation matrix is to be computed.
 method
String specifying the method to use for computing correlation. Supported: pearson (default), spearman.
 returns
Correlation matrix comparing columns in X.
 Annotations
 @Since( "1.1.0" )
 Note
For Spearman, a rank correlation, we need to create an RDD[Double] for each column and sort it in order to retrieve the ranks, and then join the columns back into an RDD[Vector], which is fairly costly. Cache the input RDD before calling corr with method = "spearman" to avoid recomputing the common lineage.

def corr(X: RDD[Vector]): Matrix
Compute the Pearson correlation matrix for the input RDD of Vectors. Columns with zero variance produce NaN entries in the correlation matrix.
 X
an RDD[Vector] for which the correlation matrix is to be computed.
 returns
Pearson correlation matrix comparing columns in X.
 Annotations
 @Since( "1.1.0" )
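The sketch below, a hypothetical plain-Scala helper rather than Spark's implementation, builds a Pearson correlation matrix over the columns of in-memory rows. Any entry involving a column whose variance is zero is set to NaN, since the coefficient divides by that column's standard deviation; setting the diagonal of such a column to NaN as well is a choice made in this sketch.

```scala
// Hypothetical sketch of a column-by-column Pearson correlation matrix;
// not Spark's implementation.
object CorrMatrixSketch {
  /** Pearson correlation matrix of the columns of `rows`. */
  def corrMatrix(rows: Seq[Array[Double]]): Array[Array[Double]] = {
    val n = rows.size
    val dim = rows.head.length
    val mean = Array.tabulate(dim)(j => rows.map(_(j)).sum / n)
    // Unnormalized covariance: sum over rows of (x_i - mean_i) * (x_j - mean_j)
    val cov = Array.tabulate(dim, dim) { (i, j) =>
      rows.map(r => (r(i) - mean(i)) * (r(j) - mean(j))).sum
    }
    Array.tabulate(dim, dim) { (i, j) =>
      // A zero-variance column makes the denominator 0, so the entry is undefined
      if (cov(i)(i) == 0.0 || cov(j)(j) == 0.0) Double.NaN
      else if (i == j) 1.0
      else cov(i)(j) / math.sqrt(cov(i)(i) * cov(j)(j))
    }
  }
}
```

For the rows (1, 2, 7), (2, 4, 7), (3, 6, 7), the first two columns are perfectly correlated (entry 1.0), while every entry involving the constant third column is NaN.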

final def eq(arg0: AnyRef): Boolean
 Definition Classes
 AnyRef

def equals(arg0: Any): Boolean
 Definition Classes
 AnyRef → Any

def finalize(): Unit
 Attributes
 protected[lang]
 Definition Classes
 AnyRef
 Annotations
 @throws( classOf[java.lang.Throwable] )

final def getClass(): Class[_]
 Definition Classes
 AnyRef → Any
 Annotations
 @native()

def hashCode(): Int
 Definition Classes
 AnyRef → Any
 Annotations
 @native()

final def isInstanceOf[T0]: Boolean
 Definition Classes
 Any

def kolmogorovSmirnovTest(data: JavaDoubleRDD, distName: String, params: Double*): KolmogorovSmirnovTestResult
Java-friendly version of kolmogorovSmirnovTest()
 Annotations
 @Since( "1.5.0" ) @varargs()

def kolmogorovSmirnovTest(data: RDD[Double], distName: String, params: Double*): KolmogorovSmirnovTestResult
Convenience function to conduct a one-sample, two-sided Kolmogorov-Smirnov test for probability distribution equality. Currently supports the normal distribution (distName = "norm"), taking as parameters the mean and standard deviation.
 data
an RDD[Double] containing the sample of data to test
 distName
a String name for a theoretical distribution
 params
Double* specifying the parameters to be used for the theoretical distribution
 returns
org.apache.spark.mllib.stat.test.KolmogorovSmirnovTestResult object containing test statistic, p-value, and null hypothesis.
 Annotations
 @Since( "1.5.0" ) @varargs()
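Conceptually, the "norm" option with a mean and standard deviation amounts to testing against the CDF Phi((x - mean) / sd), which could equally be passed to the cdf overload of kolmogorovSmirnovTest. The sketch below builds such a CDF in plain Scala. `NormCdfSketch`, `erf`, and `normCdf` are hypothetical helpers; because the Scala standard library has no erf, it uses the Abramowitz and Stegun 7.1.26 polynomial approximation (absolute error below about 1.5e-7), and it is not meant to reproduce how Spark evaluates the normal CDF internally.

```scala
// Hypothetical sketch: a normal CDF parameterized by mean and standard
// deviation, built on an approximate erf. Not a Spark API.
object NormCdfSketch {
  // Abramowitz and Stegun 7.1.26 approximation of erf(x)
  def erf(x: Double): Double = {
    val t = 1.0 / (1.0 + 0.3275911 * math.abs(x))
    val poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
      t * (-1.453152027 + t * 1.061405429))))
    val y = 1.0 - poly * math.exp(-x * x)
    // erf is an odd function
    if (x >= 0) y else -y
  }

  /** CDF of a normal distribution: 0.5 * (1 + erf((x - mean) / (sd * sqrt(2)))). */
  def normCdf(mean: Double, stdDev: Double): Double => Double = {
    require(stdDev > 0.0, "standard deviation must be positive")
    x => 0.5 * (1.0 + erf((x - mean) / (stdDev * math.sqrt(2.0))))
  }
}
```

For a normal distribution with mean 10 and standard deviation 2, the sketch gives about 0.5 at x = 10 and about 0.975 at x = 10 + 1.96 * 2.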

def kolmogorovSmirnovTest(data: RDD[Double], cdf: (Double) ⇒ Double): KolmogorovSmirnovTestResult
Conduct the two-sided Kolmogorov-Smirnov (KS) test for data sampled from a continuous distribution. By comparing the largest difference between the empirical cumulative distribution of the sample data and the theoretical distribution, we can provide a test for the null hypothesis that the sample data comes from that theoretical distribution. For more information on KS Test:
 data
an RDD[Double] containing the sample of data to test
 cdf
a Double => Double function to calculate the theoretical CDF at a given value
 returns
org.apache.spark.mllib.stat.test.KolmogorovSmirnovTestResult object containing test statistic, p-value, and null hypothesis.
 Annotations
 @Since( "1.5.0" )
 See also
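The KS statistic described for this overload is D = sup over x of |F_n(x) - F(x)|, where F_n is the empirical CDF of the sample and F is the theoretical CDF. For a continuous F and sorted data, the supremum is reached at a sample point, just before or just after a jump of F_n, which gives the short plain-Scala sketch below. `KsSketch` is hypothetical and computes only D, not the p-value a KolmogorovSmirnovTestResult contains.

```scala
// Hypothetical sketch of the one-sample KS statistic; not Spark's implementation.
object KsSketch {
  /** D = sup |F_n(x) - F(x)| for a sample and a continuous CDF. */
  def ksStatistic(sample: Seq[Double], cdf: Double => Double): Double = {
    require(sample.nonEmpty, "sample must not be empty")
    val sorted = sample.sortWith(_ < _).toArray
    val n = sorted.length.toDouble
    sorted.indices.map { i =>
      val f = cdf(sorted(i))
      // F_n jumps from i/n to (i+1)/n at sorted(i); check both sides of the jump
      math.max((i + 1) / n - f, f - i / n)
    }.foldLeft(0.0)((acc, d) => math.max(acc, d))
  }
}
```

Against the uniform(0, 1) CDF, the sample (0.1, 0.5, 0.9) gives D = 7/30, reached both just after 0.1 (1/3 - 0.1) and just before 0.9 (0.9 - 2/3).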

final def ne(arg0: AnyRef): Boolean
 Definition Classes
 AnyRef

final def notify(): Unit
 Definition Classes
 AnyRef
 Annotations
 @native()

final def notifyAll(): Unit
 Definition Classes
 AnyRef
 Annotations
 @native()

final def synchronized[T0](arg0: ⇒ T0): T0
 Definition Classes
 AnyRef

def toString(): String
 Definition Classes
 AnyRef → Any

final def wait(): Unit
 Definition Classes
 AnyRef
 Annotations
 @throws( ... )

final def wait(arg0: Long, arg1: Int): Unit
 Definition Classes
 AnyRef
 Annotations
 @throws( ... )

final def wait(arg0: Long): Unit
 Definition Classes
 AnyRef
 Annotations
 @throws( ... ) @native()