BinaryLogisticRegressionSummary

class pyspark.ml.classification.BinaryLogisticRegressionSummary(java_obj: Optional[JavaObject] = None)

Binary logistic regression results for a given model.

Methods

fMeasureByLabel([beta])

Returns f-measure for each label (category).

weightedFMeasure([beta])

Returns the weighted average f-measure.

Attributes

accuracy

Returns accuracy.

areaUnderROC

Computes the area under the receiver operating characteristic (ROC) curve.

fMeasureByThreshold

Returns a DataFrame with two fields (threshold, F-Measure) giving the F-measure curve, computed with beta = 1.0.

falsePositiveRateByLabel

Returns false positive rate for each label (category).

featuresCol

Field in “predictions” which gives the features of each instance as a vector.

labelCol

Field in “predictions” which gives the true label of each instance.

labels

Returns the sequence of labels in ascending order.

pr

Returns the precision-recall curve, which is a DataFrame containing two fields (recall, precision), with (0.0, 1.0) prepended to it.

precisionByLabel

Returns precision for each label (category).

precisionByThreshold

Returns a DataFrame with two fields (threshold, precision) giving the precision-by-threshold curve.

predictionCol

Field in “predictions” which gives the predicted class of each instance.

predictions

DataFrame output by the model’s transform method.

probabilityCol

Field in “predictions” which gives the probability of each class as a vector.

recallByLabel

Returns recall for each label (category).

recallByThreshold

Returns a DataFrame with two fields (threshold, recall) giving the recall-by-threshold curve.

roc

Returns the receiver operating characteristic (ROC) curve, which is a DataFrame with two fields (FPR, TPR), with (0.0, 0.0) prepended and (1.0, 1.0) appended to it.

scoreCol

Field in “predictions” which gives the probability or raw prediction of each class as a vector.

truePositiveRateByLabel

Returns true positive rate for each label (category).

weightCol

Field in “predictions” which gives the weight of each instance.

weightedFalsePositiveRate

Returns weighted false positive rate.

weightedPrecision

Returns the weighted average precision.

weightedRecall

Returns the weighted average recall.

weightedTruePositiveRate

Returns weighted true positive rate.

Methods Documentation

fMeasureByLabel(beta: float = 1.0) → List[float]

Returns f-measure for each label (category).
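
To make the per-label f-measure concrete, here is a minimal plain-Python sketch. It is not Spark's implementation: the helper name `f_measure_by_label` and the toy data are hypothetical, and it assumes the standard F-beta definition, (1 + beta²) · P · R / (beta² · P + R), returning 0.0 when precision and recall are both zero.

```python
from typing import List, Sequence


def f_measure_by_label(
    y_true: Sequence[float], y_pred: Sequence[float], beta: float = 1.0
) -> List[float]:
    """Return the F-beta score of each label, with labels in ascending order."""
    labels = sorted(set(y_true))
    scores = []
    for label in labels:
        # True positives: instances of this label that were predicted as this label.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        actual = sum(1 for t in y_true if t == label)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        if precision + recall == 0.0:
            scores.append(0.0)
        else:
            b2 = beta * beta
            scores.append((1 + b2) * precision * recall / (b2 * precision + recall))
    return scores


# Hypothetical toy data: true labels and predicted labels.
y_true = [0.0, 0.0, 1.0, 1.0, 1.0]
y_pred = [0.0, 1.0, 1.0, 1.0, 0.0]
scores = f_measure_by_label(y_true, y_pred)
```

The returned list is ordered by ascending label, matching the ordering described for the labels attribute.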

weightedFMeasure(beta: float = 1.0) → float

Returns the weighted average f-measure.
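
A weighted average over labels can be sketched as follows, assuming each label's value is weighted by that label's share of the instances. The helper `weighted_average` and the numbers are hypothetical and only illustrate the arithmetic, not Spark's code.

```python
from typing import Sequence


def weighted_average(values: Sequence[float], counts: Sequence[int]) -> float:
    """Average per-label values, weighting each by its label's instance count."""
    total = sum(counts)
    return sum(v * c / total for v, c in zip(values, counts))


# Hypothetical per-label F1 scores for labels 0.0 and 1.0, seen 2 and 3 times.
weighted_f1 = weighted_average([0.5, 0.8], [2, 3])
```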

Attributes Documentation

accuracy

Returns accuracy (the number of correctly classified instances divided by the total number of instances).

areaUnderROC

Computes the area under the receiver operating characteristic (ROC) curve.
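
As a rough illustration of what "area under the curve" means for a set of ROC points, the sketch below integrates a piecewise-linear curve with the trapezoidal rule. The function name and the sample points are hypothetical, and the choice of the trapezoidal rule is an assumption made for this example.

```python
from typing import Sequence, Tuple


def area_under_curve(points: Sequence[Tuple[float, float]]) -> float:
    """Integrate a piecewise-linear curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Area of the trapezoid between two neighbouring points.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical (FPR, TPR) points, including the (0.0, 0.0) and (1.0, 1.0) endpoints.
roc = [(0.0, 0.0), (0.0, 0.5), (0.5, 1.0), (1.0, 1.0)]
auc = area_under_curve(roc)
```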

fMeasureByThreshold

Returns a DataFrame with two fields (threshold, F-Measure) giving the F-measure curve, computed with beta = 1.0.

falsePositiveRateByLabel

Returns false positive rate for each label (category).
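
For a given label, the false positive rate is the fraction of instances that do not carry that label but were predicted as it. A minimal plain-Python sketch, with a hypothetical helper name and toy data, not taken from Spark:

```python
from typing import List, Sequence


def false_positive_rate_by_label(
    y_true: Sequence[float], y_pred: Sequence[float]
) -> List[float]:
    """Return the false positive rate of each label, labels in ascending order."""
    labels = sorted(set(y_true))
    rates = []
    for label in labels:
        # False positives: predicted as this label although the true label differs.
        fp = sum(1 for t, p in zip(y_true, y_pred) if p == label and t != label)
        negatives = sum(1 for t in y_true if t != label)
        rates.append(fp / negatives if negatives else 0.0)
    return rates


y_true = [0.0, 0.0, 1.0, 1.0, 1.0]
y_pred = [0.0, 1.0, 1.0, 1.0, 0.0]
rates = false_positive_rate_by_label(y_true, y_pred)
```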

featuresCol

Field in “predictions” which gives the features of each instance as a vector.

labelCol

Field in “predictions” which gives the true label of each instance.

labels

Returns the sequence of labels in ascending order. This order matches the order used in metrics which are specified as arrays over labels, e.g., truePositiveRateByLabel.

Notes

In most cases, the labels will be the values {0.0, 1.0, …, numClasses-1}. However, if the training set is missing a label, then all of the arrays over labels (e.g., from truePositiveRateByLabel) will be of length numClasses-1 instead of the expected numClasses.

pr

Returns the precision-recall curve, which is a DataFrame containing two fields (recall, precision), with (0.0, 1.0) prepended to it.
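
The shape of such a curve can be sketched in plain Python. This is an illustration under stated assumptions, not Spark's implementation: `pr_points` is a hypothetical helper, each distinct score is used as a threshold, an instance counts as predicted positive when its score is at least the threshold, and at least one positive instance is assumed to exist.

```python
from typing import List, Sequence, Tuple


def pr_points(
    scores: Sequence[float], labels: Sequence[float]
) -> List[Tuple[float, float]]:
    """Return (recall, precision) points, starting with the prepended (0.0, 1.0)."""
    positives = sum(1 for y in labels if y == 1.0)
    points = [(0.0, 1.0)]
    # Use each distinct score as a threshold, from highest to lowest.
    for threshold in sorted(set(scores), reverse=True):
        predicted = [y for s, y in zip(scores, labels) if s >= threshold]
        tp = sum(1 for y in predicted if y == 1.0)
        points.append((tp / positives, tp / len(predicted)))
    return points


# Hypothetical positive-class probabilities and true labels.
scores = [0.9, 0.8, 0.4, 0.2]
labels = [1.0, 0.0, 1.0, 0.0]
curve = pr_points(scores, labels)
```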

precisionByLabel

Returns precision for each label (category).

precisionByThreshold

Returns a DataFrame with two fields (threshold, precision) giving the precision-by-threshold curve. Every probability obtained in transforming the dataset is used as a threshold when calculating the precision.
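
The idea of using each observed probability as a threshold can be sketched as below. The helper `precision_by_threshold` and the data are hypothetical; the sketch assumes an instance is predicted positive when its probability is at least the threshold, and it is not Spark's implementation.

```python
from typing import List, Sequence, Tuple


def precision_by_threshold(
    scores: Sequence[float], labels: Sequence[float]
) -> List[Tuple[float, float]]:
    """Return (threshold, precision) rows, one per distinct score."""
    rows = []
    for threshold in sorted(set(scores), reverse=True):
        # Labels of the instances predicted positive at this threshold.
        predicted = [y for s, y in zip(scores, labels) if s >= threshold]
        tp = sum(1 for y in predicted if y == 1.0)
        rows.append((threshold, tp / len(predicted)))
    return rows


# Hypothetical positive-class probabilities; 0.8 occurs twice but yields one row.
scores = [0.9, 0.8, 0.8, 0.2]
labels = [1.0, 1.0, 0.0, 0.0]
rows = precision_by_threshold(scores, labels)
```

Replacing the precision with recall (true positives divided by the number of positive instances) gives the analogous recall-by-threshold rows.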

predictionCol

Field in “predictions” which gives the predicted class of each instance.

predictions

DataFrame output by the model’s transform method.

probabilityCol

Field in “predictions” which gives the probability of each class as a vector.

recallByLabel

Returns recall for each label (category).

recallByThreshold

Returns a DataFrame with two fields (threshold, recall) giving the recall-by-threshold curve. Every probability obtained in transforming the dataset is used as a threshold when calculating the recall.

roc

Returns the receiver operating characteristic (ROC) curve, which is a DataFrame with two fields (FPR, TPR), with (0.0, 0.0) prepended and (1.0, 1.0) appended to it.

Notes

See the Wikipedia article on the receiver operating characteristic.
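
A minimal plain-Python sketch of how (FPR, TPR) points with the two fixed endpoints could be assembled. The helper `roc_points` and the data are hypothetical; the sketch assumes each distinct score is used as a threshold and that both classes are present. It is not Spark's implementation.

```python
from typing import List, Sequence, Tuple


def roc_points(
    scores: Sequence[float], labels: Sequence[float]
) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points with (0.0, 0.0) prepended and (1.0, 1.0) appended."""
    positives = sum(1 for y in labels if y == 1.0)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Lowering the threshold step by step moves the curve up and to the right.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1.0)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y != 1.0)
        points.append((fp / negatives, tp / positives))
    points.append((1.0, 1.0))
    return points


# Hypothetical positive-class probabilities and true labels.
scores = [0.9, 0.8, 0.4, 0.2]
labels = [1.0, 0.0, 1.0, 0.0]
curve = roc_points(scores, labels)
```

Because the endpoint is appended unconditionally in this sketch, the final (1.0, 1.0) point appears twice whenever the lowest threshold already classifies every instance as positive.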

scoreCol

Field in “predictions” which gives the probability or raw prediction of each class as a vector.

truePositiveRateByLabel

Returns true positive rate for each label (category).

weightCol

Field in “predictions” which gives the weight of each instance.

weightedFalsePositiveRate

Returns weighted false positive rate.

weightedPrecision

Returns the weighted average precision.

weightedRecall

Returns the weighted average recall. (Equal to the micro-averaged precision, recall, and f-measure.)
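
Weighting each label's recall by that label's share of the instances can be sketched in plain Python as follows. The helper names and toy data are hypothetical and unweighted instances are assumed. The sketch also shows that, on this data, the result coincides with plain accuracy.

```python
from collections import Counter
from typing import Sequence


def weighted_recall(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Sum of per-label recall, each weighted by the label's share of instances."""
    counts = Counter(y_true)
    total = len(y_true)
    result = 0.0
    for label, count in counts.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        result += (tp / count) * (count / total)
    return result


def accuracy(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Fraction of instances whose prediction matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


y_true = [0.0, 0.0, 1.0, 1.0, 1.0]
y_pred = [0.0, 1.0, 1.0, 1.0, 0.0]
w_recall = weighted_recall(y_true, y_pred)
acc = accuracy(y_true, y_pred)
```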

weightedTruePositiveRate

Returns the weighted true positive rate. (Equal to the micro-averaged precision, recall, and f-measure.)