pyspark.pandas.extensions.register_index_accessor
pyspark.pandas.extensions.register_index_accessor(name: str) → Callable[[Type[T]], Type[T]]
Register a custom accessor with an Index.
Parameters
name : str
    Name used when calling the accessor after it is registered.
Returns
callable
    A class decorator.
See also
register_dataframe_accessor
Register a custom accessor on DataFrame objects
register_series_accessor
Register a custom accessor on Series objects
Notes
When accessed, your accessor will be initialized with the pandas-on-Spark object the user is interacting with. The code signature must be:

def __init__(self, pandas_on_spark_obj):
    # constructor logic
    ...
In the pandas API, if data passed to your accessor has an incorrect dtype, it is recommended to raise an AttributeError for consistency. In pandas-on-Spark, ValueError is used more frequently to signal that a value's datatype is unexpected for a given method or function. Ultimately, you can structure this however you like, but pandas-on-Spark would likely do something like this:
>>> ps.Series(['a', 'b']).dt
Traceback (most recent call last):
    ...
ValueError: Cannot call DatetimeMethods on type StringType()
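As a minimal sketch of that convention (the accessor name "numeric" and the dtype check below are illustrative assumptions, not part of the pandas-on-Spark API), an accessor can validate the wrapped object's dtype in its constructor:

import pyspark.pandas as ps
from pyspark.pandas.extensions import register_index_accessor

@register_index_accessor("numeric")  # hypothetical accessor name
class NumericAccessor:
    def __init__(self, pandas_on_spark_obj):
        # Reject non-numeric indexes up front, following the
        # ValueError convention described above.
        if pandas_on_spark_obj.dtype.kind not in "iuf":
            raise ValueError(
                "Cannot call NumericAccessor on dtype %s"
                % pandas_on_spark_obj.dtype
            )
        self._obj = pandas_on_spark_obj

With this in place, accessing ps.Index(['a', 'b']).numeric would raise the ValueError at construction time, while a numeric index would initialize normally.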
Examples
In your library code:
from pyspark.pandas.extensions import register_index_accessor

@register_index_accessor("foo")
class CustomAccessor:
    def __init__(self, pandas_on_spark_obj):
        self._obj = pandas_on_spark_obj
        self.item = "baz"

    @property
    def bar(self):
        # return item value
        return self.item
Then, in an ipython session:
>>> ## Import if the accessor is defined in another file.
>>> # from my_ext_lib import CustomAccessor
>>> psdf = ps.DataFrame({"longitude": np.linspace(0, 10),
...                      "latitude": np.linspace(0, 20)})
>>> psdf.index.foo.bar
'baz'