pyspark.pandas.read_orc
pyspark.pandas.read_orc(path: str, columns: Optional[List[str]] = None, index_col: Union[str, List[str], None] = None, **options: Any) → pyspark.pandas.frame.DataFrame

Load an ORC object from the file path, returning a DataFrame.
- Parameters
- path : str
Path of the ORC file (or directory of ORC files) to read.
- columns : list, default None
If not None, only these columns will be read from the file.
- index_col : str or list of str, optional, default None
Name of the column, or list of column names, in the Spark table to use as the index of the returned DataFrame.
- options : dict
All other options passed directly into Spark’s data source.
- Returns
- DataFrame
Examples
>>> ps.range(1).to_orc('%s/read_spark_io/data.orc' % path)
>>> ps.read_orc('%s/read_spark_io/data.orc' % path, columns=['id'])
   id
0   0
You can preserve the index in the roundtrip as below.
>>> ps.range(1).to_orc('%s/read_spark_io/data.orc' % path, index_col="index")
>>> ps.read_orc('%s/read_spark_io/data.orc' % path, columns=['id'], index_col="index")
       id
index
0       0