IDFModel

class pyspark.mllib.feature.IDFModel(java_model: py4j.java_gateway.JavaObject)

Represents an IDF model that can transform term frequency vectors.

Methods

call(name, *a)

Calls the method called name on the wrapped java_model, passing the arguments a.

docFreq()

Returns the document frequency of each term.

idf()

Returns the current IDF vector.

numDocs()

Returns the number of documents evaluated to compute the IDF vector.

transform(x)

Transforms term frequency (TF) vectors to TF-IDF vectors.

Methods Documentation

call(name: str, *a: Any) → Any

Calls the method called name on the wrapped java_model, passing the arguments a.

docFreq() → List[int]

Returns the document frequency of each term, that is, the number of documents in which each term occurs.
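As a rough illustration of what this list holds, the pure-Python sketch below counts, for each term index, how many documents have a positive term frequency. It assumes dense TF vectors of equal length; the helper name doc_freq is hypothetical and is not part of the PySpark API.

```python
from typing import List, Sequence


def doc_freq(tf_vectors: Sequence[Sequence[float]]) -> List[int]:
    """Count, for each term index, how many documents contain that term.

    A term counts as present in a document when its TF entry is > 0.
    Assumes dense vectors that all have the same length.
    """
    if not tf_vectors:
        return []
    counts = [0] * len(tf_vectors[0])
    for tf in tf_vectors:
        for j, value in enumerate(tf):
            if value > 0:
                counts[j] += 1
    return counts


# Three toy documents over a 4-term vocabulary (dense TF vectors).
docs = [
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 1.0, 3.0],
    [1.0, 0.0, 1.0, 0.0],
]
print(doc_freq(docs))  # [2, 0, 3, 1]
```

Term 2 appears in all three documents, while term 1 appears in none, so its count stays 0.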

idf() → pyspark.mllib.linalg.Vector

Returns the current IDF vector.
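The Spark MLlib feature-extraction guide describes a smoothed IDF, IDF(t, D) = log((|D| + 1) / (DF(t, D) + 1)), using the natural logarithm, with terms below minDocFreq assigned a weight of 0. The sketch below computes such a vector from per-term document frequencies and a document count. It is an illustration under that assumption, not the library's implementation; the helper name idf_vector is hypothetical.

```python
import math
from typing import List, Sequence


def idf_vector(doc_freq: Sequence[int], num_docs: int,
               min_doc_freq: int = 0) -> List[float]:
    """Smoothed IDF per term: log((m + 1) / (df + 1)).

    Terms whose document frequency is below min_doc_freq get 0.0.
    """
    return [
        math.log((num_docs + 1.0) / (df + 1.0)) if df >= min_doc_freq else 0.0
        for df in doc_freq
    ]


# Document frequencies for a 4-term vocabulary over m = 3 documents.
weights = idf_vector([2, 0, 3, 1], num_docs=3, min_doc_freq=1)
print([round(w, 4) for w in weights])  # [0.2877, 0.0, 0.0, 0.6931]
```

Two different things produce a zero weight here: term 1 is filtered out by min_doc_freq, while term 2 occurs in every document, so log(4 / 4) = 0.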

numDocs() → int

Returns the number of documents evaluated to compute the IDF vector.

transform(x: Union[VectorLike, pyspark.rdd.RDD[VectorLike]]) → Union[pyspark.mllib.linalg.Vector, pyspark.rdd.RDD[pyspark.mllib.linalg.Vector]]

Transforms term frequency (TF) vectors to TF-IDF vectors.

If minDocFreq was set for the IDF calculation, terms that occur in fewer than minDocFreq documents have an entry of 0 in the resulting TF-IDF vectors.

Parameters
x : pyspark.mllib.linalg.Vector or pyspark.RDD

an RDD of term frequency vectors or a term frequency vector

Returns
pyspark.mllib.linalg.Vector or pyspark.RDD

an RDD of TF-IDF vectors or a TF-IDF vector

Notes

In Python, transform cannot currently be used within an RDD transformation or action. Call transform directly on the RDD instead.
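For a single dense vector, the TF-to-TF-IDF transformation can be pictured as an element-wise product of the TF vector with the IDF vector, which is why a term filtered out by minDocFreq (IDF weight 0) ends up as 0. The sketch below shows this on plain Python lists; the helper name tf_to_tfidf and the weights are hypothetical illustration values, and the real method also accepts sparse vectors and RDDs.

```python
from typing import List, Sequence


def tf_to_tfidf(tf: Sequence[float], idf: Sequence[float]) -> List[float]:
    """Scale each TF entry by the IDF weight of the same term index."""
    if len(tf) != len(idf):
        raise ValueError("TF vector and IDF vector must have the same length")
    return [t * w for t, w in zip(tf, idf)]


# Illustrative IDF weights; index 1 plays the role of a term that was
# filtered out by minDocFreq, so its weight is 0.0.
idf = [0.5, 0.0, 1.0]
tf = [2.0, 4.0, 3.0]
print(tf_to_tfidf(tf, idf))  # [1.0, 0.0, 3.0]
```

Even though term 1 has the highest raw count in this document, its TF-IDF entry is 0 because its IDF weight is 0.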