IDFModel
class pyspark.mllib.feature.IDFModel(java_model: py4j.java_gateway.JavaObject)
Represents an IDF model that can transform term frequency vectors.
Methods
- call(name, *a): Call method of java_model.
- docFreq(): Returns the document frequency.
- idf(): Returns the current IDF vector.
- numDocs(): Returns the number of documents evaluated to compute the IDF.
- transform(x): Transforms term frequency (TF) vectors to TF-IDF vectors.
Methods Documentation
- call(name: str, *a: Any) → Any
Call method of java_model.
- docFreq() → List[int]
Returns the document frequency.
- idf() → pyspark.mllib.linalg.Vector
Returns the current IDF vector.
- numDocs() → int
Returns the number of documents evaluated to compute the IDF.
- transform(x: Union[VectorLike, pyspark.rdd.RDD[VectorLike]]) → Union[pyspark.mllib.linalg.Vector, pyspark.rdd.RDD[pyspark.mllib.linalg.Vector]]
Transforms term frequency (TF) vectors to TF-IDF vectors.
If minDocFreq was set for the IDF calculation, the terms which occur in fewer than minDocFreq documents will have an entry of 0.
- Parameters
  - x: pyspark.mllib.linalg.Vector or pyspark.RDD
    an RDD of term frequency vectors or a term frequency vector
- Returns
  - pyspark.mllib.linalg.Vector or pyspark.RDD
    an RDD of TF-IDF vectors or a TF-IDF vector
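The arithmetic behind transform can be sketched in plain Python, without Spark. This is only an illustration: it assumes the smoothed formula idf = log((m + 1) / (d(t) + 1)) given in the Spark MLlib feature extraction guide, where m is the number of documents and d(t) is the document frequency of term t. The helper names idf_vector and tf_idf are hypothetical and are not part of the PySpark API.

```python
import math
from typing import List


def idf_vector(doc_freq: List[int], num_docs: int, min_doc_freq: int = 0) -> List[float]:
    """Hypothetical helper: IDF weights from per-term document frequencies.

    Assumes idf = log((m + 1) / (d(t) + 1)); terms that occur in fewer
    than min_doc_freq documents get a weight of 0.
    """
    return [
        math.log((num_docs + 1.0) / (df + 1.0)) if df >= min_doc_freq else 0.0
        for df in doc_freq
    ]


def tf_idf(tf: List[float], idf: List[float]) -> List[float]:
    """Hypothetical helper: scale each term frequency by its IDF weight."""
    return [t * w for t, w in zip(tf, idf)]


# Values playing the roles of docFreq() and numDocs() for a 3-document corpus.
doc_freq = [0, 3, 1, 2]
num_docs = 3

# With min_doc_freq=2, terms 0 and 2 (seen in fewer than 2 documents) are zeroed.
weights = idf_vector(doc_freq, num_docs, min_doc_freq=2)
scores = tf_idf([0.0, 1.0, 2.0, 3.0], weights)
```

Here only the last term keeps a non-zero score: term 1 appears in every document, so its weight is log(4/4) = 0, and terms 0 and 2 fall below the min_doc_freq threshold.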
Notes
In Python, transform cannot currently be used within an RDD transformation or action. Call transform directly on the RDD instead.