Series

Constructor

Series([data, index, dtype, name, copy, …])

pandas-on-Spark Series that corresponds logically to a pandas Series.

Attributes

Series.index

The index (axis labels) of the Series.

Series.dtype

Return the dtype object of the underlying data.

Series.dtypes

Return the dtype object of the underlying data.

Series.ndim

Return an int representing the number of array dimensions.

Series.name

Return name of the Series.

Series.shape

Return a tuple of the shape of the underlying data.

Series.axes

Return a list of the row axis labels.

Series.size

Return an int representing the number of elements in this object.

Series.empty

Return True if the current object is empty.

Series.T

Return the transpose, which is by definition self.

Series.hasnans

Return True if it has any missing values.

Series.values

Return a NumPy representation of the DataFrame or the Series.
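Since a pandas-on-Spark Series corresponds logically to a pandas Series, the attributes above can be illustrated with plain pandas, which runs without a Spark session. This is a sketch assuming pandas and NumPy are installed; with pandas API on Spark the object would instead be built with `ps.Series(...)` after `import pyspark.pandas as ps`, which is not run here.

```python
import numpy as np
import pandas as pd

# Plain pandas stands in for pyspark.pandas (assumption: same semantics for these attributes).
s = pd.Series([1.0, np.nan, 3.0], index=["a", "b", "c"], name="x")

print(s.ndim)          # 1: a Series is always one-dimensional
print(s.shape)         # (3,)
print(s.size)          # 3: missing values are still elements
print(s.empty)         # False
print(s.hasnans)       # True: label "b" holds NaN
print(s.T.equals(s))   # True: the transpose of a Series is the Series itself
print(list(s.index))   # ['a', 'b', 'c']
```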

Conversion

Series.astype(dtype)

Cast a pandas-on-Spark object to the specified dtype.

Series.copy([deep])

Make a copy of this object’s indices and data.

Series.bool()

Return the bool of a single element in the current object.
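`astype` and `copy` both return new objects rather than modifying the original. A minimal sketch in plain pandas, assumed to mirror the pandas-on-Spark behavior for these calls but not run against Spark:

```python
import pandas as pd

s = pd.Series([1, 2, 3], dtype="int64")

# astype returns a new Series; the original keeps its dtype.
as_float = s.astype("float64")
print(as_float.dtype)  # float64
print(s.dtype)         # int64

# A deep copy is independent: changing it leaves the original untouched.
deep = s.copy(deep=True)
deep.iloc[0] = 100
print(s.iloc[0], deep.iloc[0])  # 1 100
```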

Indexing, iteration

Series.at

Access a single value for a row/column label pair.

Series.iat

Access a single value for a row/column pair by integer position.

Series.loc

Access a group of rows and columns by label(s) or a boolean Series.

Series.iloc

Purely integer-location based indexing for selection by position.

Series.keys()

Return an alias for index.

Series.pop(item)

Return item and drop it from the Series.

Series.items()

This is an alias of iteritems.

Series.iteritems()

Lazily iterate over (index, value) tuples.

Series.item()

Return the first element of the underlying data as a Python scalar.

Series.xs(key[, level])

Return cross-section from the Series.

Series.get(key[, default])

Get item from object for the given key (for example, a DataFrame column); return default if the key is not found.
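The label-based accessors (`at`, `loc`) and position-based accessors (`iat`, `iloc`) differ in what they index by. The sketch below uses plain pandas, assuming pandas-on-Spark behaves the same for these calls; note that `pop` mutates the Series it is called on.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(s.at["b"])                 # 20: scalar access by label
print(s.iat[2])                  # 30: scalar access by integer position
print(s.loc[s > 15].tolist())    # [20, 30]: selection with a boolean Series
print(s.iloc[0:2].tolist())      # [10, 20]: the end position is exclusive
print(s.get("z", -1))            # -1: default returned for a missing key
print(list(s.keys()))            # ['a', 'b', 'c']: keys() is the index
print(pd.Series([7]).item())     # 7: item() on a one-element Series

popped = s.pop("a")              # removes label "a" in place and returns its value
print(popped, s.tolist())        # 10 [20, 30]
```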

Binary operator functions

Series.add(other)

Return addition of series and other, element-wise (binary operator +).

Series.div(other)

Return floating division of series and other, element-wise (binary operator /).

Series.mul(other)

Return multiplication of series and other, element-wise (binary operator *).

Series.radd(other)

Return reverse addition of series and other, element-wise (binary operator +).

Series.rdiv(other)

Return reverse floating division of series and other, element-wise (binary operator /).

Series.rmul(other)

Return reverse multiplication of series and other, element-wise (binary operator *).

Series.rsub(other)

Return reverse subtraction of series and other, element-wise (binary operator -).

Series.rtruediv(other)

Return reverse floating division of series and other, element-wise (binary operator /).

Series.sub(other)

Return subtraction of series and other, element-wise (binary operator -).

Series.truediv(other)

Return floating division of series and other, element-wise (binary operator /).

Series.pow(other)

Return exponential power of series and other, element-wise (binary operator **).

Series.rpow(other)

Return reverse exponential power of series and other, element-wise (binary operator **).

Series.mod(other)

Return modulo of series and other, element-wise (binary operator %).

Series.rmod(other)

Return reverse modulo of series and other, element-wise (binary operator %).

Series.floordiv(other)

Return integer division of series and other, element-wise (binary operator //).

Series.rfloordiv(other)

Return reverse integer division of series and other, element-wise (binary operator //).

Series.divmod(other)

Return integer division and modulo of series and other, element-wise (built-in divmod).

Series.rdivmod(other)

Return reverse integer division and modulo of series and other, element-wise (built-in divmod with the operands swapped).
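Reverse (`r`-prefixed) methods swap the operands, so `s.rsub(10)` computes `10 - s`. A plain-pandas sketch of these arithmetic methods, assuming the pandas-on-Spark versions return the same values:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# Reverse methods swap the operands.
print(s.sub(10).tolist())       # [-9, -8, -7, -6]: s - 10
print(s.rsub(10).tolist())      # [9, 8, 7, 6]: 10 - s

# truediv always yields floats; floordiv and mod follow Python semantics.
print(s.truediv(2).tolist())    # [0.5, 1.0, 1.5, 2.0]
print(s.floordiv(2).tolist())   # [0, 1, 1, 2]
print(s.mod(3).tolist())        # [1, 2, 0, 1]

# divmod returns a (quotient, remainder) pair of Series.
q, r = s.divmod(3)
print(q.tolist(), r.tolist())   # [0, 0, 1, 1] [1, 2, 0, 1]
```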

Series.combine_first(other)

Combine Series values, choosing the calling Series’s values first.

Series.lt(other)

Compare if the current value is less than the other.

Series.gt(other)

Compare if the current value is greater than the other.

Series.le(other)

Compare if the current value is less than or equal to the other.

Series.ge(other)

Compare if the current value is greater than or equal to the other.

Series.ne(other)

Compare if the current value is not equal to the other.

Series.eq(other)

Compare if the current value is equal to the other.

Series.product([axis, skipna, numeric_only, …])

Return the product of the values.

Series.dot(other)

Compute the dot product between the Series and the columns of other.
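Comparison methods return boolean Series aligned on the index, and `dot` sums the element-wise products. A plain-pandas sketch of the semantics only: in pandas API on Spark, combining two Series that do not come from the same DataFrame (as `a` and `b` below would be) additionally requires enabling the `compute.ops_on_diff_frames` option.

```python
import pandas as pd

a = pd.Series([1, 5, 3])
b = pd.Series([2, 5, 1])

# Element-wise comparisons, aligned on the shared index.
print(a.lt(b).tolist())   # [True, False, False]
print(a.ge(b).tolist())   # [False, True, True]
print(a.ne(b).tolist())   # [True, False, True]

# dot: 1*2 + 5*5 + 3*1
print(a.dot(b))           # 30

# combine_first keeps a value from x and falls back to y where x is missing.
x = pd.Series([1.0, None, 3.0])
y = pd.Series([10.0, 20.0, 30.0])
print(x.combine_first(y).tolist())  # [1.0, 20.0, 3.0]

print(pd.Series([1, 2, 3, 4]).product())  # 24
```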

Function application, GroupBy & Window

Series.apply(func[, args])

Invoke function on values of Series.

Series.agg(func)

Aggregate using one or more operations over the specified axis.

Series.aggregate(func)

Aggregate using one or more operations over the specified axis.

Series.transform(func[, axis])

Call func on self, producing a Series with transformed values and the same length as the input.

Series.map(arg[, na_action])

Map values of Series according to input correspondence.

Series.groupby(by[, axis, as_index, dropna])

Group DataFrame or Series using a Series of columns.

Series.rolling(window[, min_periods])

Provide rolling transformations.

Series.expanding([min_periods])

Provide expanding transformations.

Series.pipe(func, *args, **kwargs)

Apply func(self, *args, **kwargs).
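A plain-pandas sketch of the function-application and window methods above, assuming pandas-on-Spark returns the same values for these inputs. The helper `add_k` is hypothetical and exists only to demonstrate `pipe`.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], name="v")

# apply: invoke a function on each value.
squared = s.apply(lambda x: x ** 2)
print(squared.tolist())    # [1, 4, 9, 16]

# map: look values up in a dict; values without a key become NaN.
mapped = s.map({1: "one", 2: "two"})
print(mapped.tolist())     # ['one', 'two', nan, nan]

# agg with a list of names returns one result per aggregation.
print(s.agg(["min", "max"]).tolist())  # [1, 4]

# rolling: trailing windows of 2 values; the first window is incomplete, hence NaN.
rolled = s.rolling(2).sum()
print(rolled.tolist())     # [nan, 3.0, 5.0, 7.0]

# expanding: each window covers every value seen so far.
expanded = s.expanding().sum()
print(expanded.tolist())   # [1.0, 3.0, 6.0, 10.0]

def add_k(series, k):
    """Hypothetical helper: add a constant to every value."""
    return series + k

# pipe: s.pipe(add_k, 10) is equivalent to add_k(s, 10).
piped = s.pipe(add_k, 10)
print(piped.tolist())      # [11, 12, 13, 14]
```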

Computations / Descriptive Stats

Series.abs()

Return a Series/DataFrame with absolute numeric value of each element.

Series.all([axis, skipna])

Return whether all elements are True.

Series.any([axis])

Return whether any element is True.

Series.autocorr([periods])

Compute the lag-N autocorrelation.

Series.between(left, right[, inclusive])

Return boolean Series equivalent to left <= series <= right.

Series.clip([lower, upper, inplace])

Trim values at input threshold(s).

Series.corr(other[, method])

Compute correlation with other Series, excluding missing values.

Series.count([axis, numeric_only])

Count non-NA values in the Series.

Series.cov(other[, min_periods])

Compute covariance with Series, excluding missing values.

Series.cummax([skipna])

Return cumulative maximum over a DataFrame or Series axis.

Series.cummin([skipna])

Return cumulative minimum over a DataFrame or Series axis.

Series.cumsum([skipna])

Return cumulative sum over a DataFrame or Series axis.

Series.cumprod([skipna])

Return cumulative product over a DataFrame or Series axis.
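Most descriptive statistics skip missing values by default. A sketch with plain pandas on a Series containing one NaN, assuming the pandas-on-Spark methods follow the same `skipna` semantics:

```python
import numpy as np
import pandas as pd

s = pd.Series([3.0, np.nan, 1.0, 4.0])

print(s.count())                           # 3: NaN is not counted
print(s.between(1, 3).tolist())            # [True, False, True, False]: both bounds inclusive
print(s.clip(lower=2, upper=3).tolist())   # [3.0, nan, 2.0, 3.0]
print(s.cumsum().tolist())                 # [3.0, nan, 4.0, 8.0]: NaN skipped but kept in place
print(s.cummax().tolist())                 # [3.0, nan, 3.0, 4.0]
```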

Series.describe([percentiles])

Generate descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.

Series.filter([items, like, regex, axis])

Subset the rows of the Series according to labels in the specified index.

Series.kurt([axis, skipna, numeric_only])

Return unbiased kurtosis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0).

Series.mad()

Return the mean absolute deviation of values.

Series.max([axis, skipna, numeric_only])

Return the maximum of the values.

Series.mean([axis, skipna, numeric_only])

Return the mean of the values.

Series.min([axis, skipna, numeric_only])

Return the minimum of the values.

Series.mode([dropna])

Return the mode(s) of the dataset.

Series.nlargest([n])

Return the largest n elements.

Series.nsmallest([n])

Return the smallest n elements.

Series.pct_change([periods])

Percentage change between the current and a prior element.

Series.prod([axis, skipna, numeric_only, …])

Return the product of the values.

Series.nunique([dropna, approx, rsd])

Return number of unique elements in the object.

Series.is_unique

Return whether the values in the object are unique.

Series.quantile([q, accuracy])

Return value at the given quantile.

Series.rank([method, ascending, numeric_only])

Compute numerical data ranks (1 through n) along axis.

Series.sem([axis, skipna, ddof, numeric_only])

Return unbiased standard error of the mean over requested axis.

Series.skew([axis, skipna, numeric_only])

Return unbiased skew normalized by N-1.

Series.std([axis, skipna, ddof, numeric_only])

Return sample standard deviation.

Series.sum([axis, skipna, numeric_only, …])

Return the sum of the values.

Series.median([axis, skipna, numeric_only, …])

Return the median of the values for the requested axis.

Series.var([axis, ddof, numeric_only])

Return unbiased variance.
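`var`, `std`, and `sem` take `ddof`, the delta degrees of freedom: the divisor is `N - ddof`, and the default `ddof=1` gives the sample estimators described above. The sketch cross-checks plain pandas against Python's `statistics` module; that pandas-on-Spark produces the same numbers is an assumption.

```python
import math
import statistics

import pandas as pd

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5, squared deviations sum to 32
s = pd.Series(values)

# Default ddof=1: sample variance, divisor N - 1 = 7.
print(s.var(), statistics.variance(values))          # both 32 / 7, about 4.5714

# ddof=0: population variance, divisor N = 8.
print(s.var(ddof=0), statistics.pvariance(values))   # 4.0 4.0

# Standard deviation is the square root of the variance with the same ddof.
print(s.std(), statistics.stdev(values))

# Standard error of the mean: std (ddof=1) divided by sqrt(N).
print(s.sem(), s.std() / math.sqrt(len(s)))
```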

Series.kurtosis([axis, skipna, numeric_only])

Return unbiased kurtosis using Fisher’s definition of kurtosis (kurtosis of normal == 0.0).

Series.unique()

Return unique values of Series object.

Series.value_counts([normalize, sort, …])

Return a Series containing counts of unique values.

Series.round([decimals])

Round each value in a Series to the given number of decimals.

Series.diff([periods])

First discrete difference of element.

Series.is_monotonic

Return whether the values in the object are monotonically increasing.

Series.is_monotonic_increasing

Return whether the values in the object are monotonically increasing.

Series.is_monotonic_decreasing

Return whether the values in the object are monotonically decreasing.
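A plain-pandas sketch of several of the methods above on a small integer Series with one repeated value, assuming pandas-on-Spark returns the same results. Monotonicity here is non-strict, and `rank` assigns tied values their average rank by default.

```python
import pandas as pd

s = pd.Series([1, 3, 3, 6, 10])

print(s.nunique())                        # 4
print(s.value_counts().loc[3])            # 2: the value 3 appears twice
print(s.diff().tolist())                  # [nan, 2.0, 0.0, 3.0, 4.0]
print(s.pct_change().round(2).tolist())   # [nan, 2.0, 0.0, 1.0, 0.67]
print(s.is_monotonic_increasing)          # True: equal neighbours are allowed
print(s.nlargest(2).tolist())             # [10, 6]
print(s.rank().tolist())                  # [1.0, 2.5, 2.5, 4.0, 5.0]
```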

Reindexing / Selection / Label manipulation

Series.align(other[, join, axis, copy])

Align two objects on their axes with the specified join method.

Series.drop([labels, index, columns, level, …])

Return Series with specified index labels removed.

Series.droplevel(level)

Return Series with requested index level(s) removed.

Series.drop_duplicates([keep, inplace])

Return Series with duplicate values removed.

Series.duplicated([keep])

Indicate duplicate Series values.

Series.equals(other)

Compare if the current value is equal to the other.

Series.add_prefix(prefix)

Prefix labels with string prefix.

Series.add_suffix(suffix)

Suffix labels with string suffix.

Series.first(offset)

Select first periods of time series data based on a date offset.

Series.head([n])

Return the first n rows.

Series.idxmax([skipna])

Return the row label of the maximum value.

Series.idxmin([skipna])

Return the row label of the minimum value.

Series.isin(values)

Check whether values are contained in Series or Index.

Series.last(offset)

Select final periods of time series data based on a date offset.

Series.rename([index])

Alter Series index labels or name.

Series.rename_axis([mapper, index, inplace])

Set the name of the axis for the index or columns.

Series.reindex([index, fill_value])

Conform Series to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index.

Series.reindex_like(other)

Return a Series with matching indices as other object.

Series.reset_index([level, drop, name, inplace])

Generate a new DataFrame or Series with the index reset.

Series.sample([n, frac, replace, …])

Return a random sample of items from an axis of object.

Series.swaplevel([i, j, copy])

Swap levels i and j in a MultiIndex.

Series.swapaxes(i, j[, copy])

Interchange axes and swap values axes appropriately.

Series.take(indices)

Return the elements in the given positional indices along an axis.

Series.tail([n])

Return the last n rows.

Series.where(cond[, other])

Replace values where the condition is False.

Series.mask(cond[, other])

Replace values where the condition is True.

Series.truncate([before, after, axis, copy])

Truncate a Series or DataFrame before and after some index value.
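`where` and `mask` are complements: `where` keeps values where the condition is True, while `mask` replaces them. A plain-pandas sketch of these and a few other selection methods, assuming the same semantics in pandas-on-Spark:

```python
import pandas as pd

s = pd.Series([5, 1, 5, 8], index=["a", "b", "c", "d"])

print(s.where(s > 2, 0).tolist())    # [5, 0, 5, 8]: replaced where the condition is False
print(s.mask(s > 2, 0).tolist())     # [0, 1, 0, 0]: replaced where the condition is True
print(s.isin([1, 8]).tolist())       # [False, True, False, True]
print(s.drop_duplicates().tolist())  # [5, 1, 8]: the first occurrence is kept
print(s.idxmax())                    # d

# reindex puts fill_value (NaN by default) at labels absent from the old index.
print(s.reindex(["a", "z"], fill_value=-1).tolist())  # [5, -1]
print(s.head(2).index.tolist())      # ['a', 'b']
```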

Missing data handling

Series.backfill([axis, inplace, limit])

Synonym for DataFrame.fillna() or Series.fillna() with method=`bfill`.

Series.bfill([axis, inplace, limit])

Synonym for DataFrame.fillna() or Series.fillna() with method=`bfill`.

Series.isna()

Detect missing values.

Series.isnull()

Detect missing values.

Series.notna()

Detect existing (non-missing) values.

Series.notnull()

Detect existing (non-missing) values.

Series.pad([axis, inplace, limit])

Synonym for DataFrame.fillna() or Series.fillna() with method=`ffill`.

Series.dropna([axis, inplace])

Return a new Series with missing values removed.

Series.fillna([value, method, axis, …])

Fill NA/NaN values.

Series.interpolate([method, limit, …])

Fill NaN values using an interpolation method.
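A plain-pandas sketch of the missing-data methods. It calls `ffill()`, the forward fill that `pad` performs, and `bfill` rather than `backfill`, because recent pandas releases deprecate the `pad` and `backfill` spellings; the pandas-on-Spark methods listed above are assumed to behave the same.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])

print(s.isna().tolist())           # [False, True, True, False]
print(s.notna().sum())             # 2
print(s.fillna(0).tolist())        # [1.0, 0.0, 0.0, 4.0]
print(s.ffill().tolist())          # [1.0, 1.0, 1.0, 4.0]: propagate the last valid value forward
print(s.bfill(limit=1).tolist())   # [1.0, nan, 4.0, 4.0]: fill at most one NaN backward
print(s.dropna().tolist())         # [1.0, 4.0]
print(s.interpolate().tolist())    # [1.0, 2.0, 3.0, 4.0]: linear by default
```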

Reshaping, sorting, transposing

Series.argsort()

Return the integer indices that would sort the Series values.

Series.argmin()

Return int position of the smallest value in the Series.

Series.argmax([axis, skipna])

Return int position of the largest value in the Series.

Series.sort_index([axis, level, ascending, …])

Sort object by labels (along an axis).

Series.sort_values([ascending, inplace, …])

Sort by the values.

Series.unstack([level])

Unstack, a.k.a. pivot, Series with MultiIndex to produce DataFrame.

Series.explode()

Transform each element of a list-like to a row.

Series.repeat(repeats)

Repeat elements of a Series.

Series.squeeze([axis])

Squeeze one-dimensional axis objects into scalars.

Series.factorize([sort, na_sentinel])

Encode the object as an enumerated type or categorical variable.
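Sorting methods return new Series, while `argsort`, `argmin`, and `argmax` return integer positions rather than index labels. A plain-pandas sketch, assuming matching pandas-on-Spark behavior:

```python
import pandas as pd

s = pd.Series([3, 1, 2], index=["x", "y", "z"])

print(s.sort_values().index.tolist())          # ['y', 'z', 'x']
print(s.sort_index(ascending=False).tolist())  # [2, 1, 3]
print(s.argsort().tolist())                    # [1, 2, 0]: positions, not labels
print(s.argmax(), s.argmin())                  # 0 1

# explode turns each list element into its own row, repeating the index label.
lists = pd.Series([[1, 2], [3]], index=["p", "q"])
exploded = lists.explode()
print(exploded.index.tolist(), exploded.tolist())  # ['p', 'p', 'q'] [1, 2, 3]

# factorize returns integer codes plus the unique values in order of appearance.
codes, uniques = pd.Series(["b", "a", "b"]).factorize()
print(codes.tolist(), list(uniques))           # [0, 1, 0] ['b', 'a']
```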

Combining / joining / merging

Series.append(to_append[, ignore_index, …])

Concatenate two or more Series.

Series.compare(other[, keep_shape, keep_equal])

Compare to another Series and show the differences.

Series.replace([to_replace, value, regex])

Replace values given in to_replace with value.

Series.update(other)

Modify Series in place using non-NA values from passed Series.
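A plain-pandas sketch of `replace`, `compare`, and `update`. `append` is left out because pandas 2.0 removed `Series.append` in favor of `pd.concat`. As with other two-Series operations, pandas API on Spark may require the `compute.ops_on_diff_frames` option when the Series come from different DataFrames.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 2])

# replace substitutes every occurrence of to_replace with value.
print(s.replace(2, 20).tolist())   # [1, 20, 3, 20]

# compare keeps only the positions where the two Series differ.
other = pd.Series([1, 5, 3, 2])
diff = s.compare(other)
print(diff.index.tolist())         # [1]
print(diff.columns.tolist())       # ['self', 'other']

# update modifies t in place with the non-NA values of its argument.
t = pd.Series([1.0, 2.0, 3.0])
t.update(pd.Series([10.0, np.nan, 30.0]))
print(t.tolist())                  # [10.0, 2.0, 30.0]
```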

Accessors

Pandas API on Spark provides dtype-specific methods under various accessors. These are separate namespaces within Series that only apply to specific data types.

Data Type      Accessor
Datetime       dt
String         str
Categorical    cat

Date Time Handling

Series.dt can be used to access the values of the series as datetimelike and return several properties. These can be accessed like Series.dt.<property>.

Datetime Properties

Series.dt.date

Returns a Series of python datetime.date objects (namely, the date part of Timestamps without timezone information).

Series.dt.year

The year of the datetime.

Series.dt.month

The month of the timestamp as January = 1, December = 12.

Series.dt.day

The day of the datetime.

Series.dt.hour

The hour of the datetime.

Series.dt.minute

The minute of the datetime.

Series.dt.second

The second of the datetime.

Series.dt.microsecond

The microsecond of the datetime.

Series.dt.week

The week ordinal of the year.

Series.dt.weekofyear

The week ordinal of the year.

Series.dt.dayofweek

The day of the week with Monday=0, Sunday=6.

Series.dt.weekday

The day of the week with Monday=0, Sunday=6.

Series.dt.dayofyear

The ordinal day of the year.

Series.dt.quarter

The quarter of the date.

Series.dt.is_month_start

Indicate whether the date is the first day of the month.

Series.dt.is_month_end

Indicate whether the date is the last day of the month.

Series.dt.is_quarter_start

Indicate whether the date is the first day of a quarter.

Series.dt.is_quarter_end

Indicate whether the date is the last day of a quarter.

Series.dt.is_year_start

Indicate whether the date is the first day of a year.

Series.dt.is_year_end

Indicate whether the date is the last day of a year.

Series.dt.is_leap_year

Indicate whether the date belongs to a leap year.

Series.dt.daysinmonth

The number of days in the month.

Series.dt.days_in_month

The number of days in the month.

Datetime Methods

Series.dt.normalize()

Convert times to midnight.

Series.dt.strftime(date_format)

Convert to a string Series using specified date_format.

Series.dt.round(freq, *args, **kwargs)

Perform round operation on the data to the specified freq.

Series.dt.floor(freq, *args, **kwargs)

Perform floor operation on the data to the specified freq.

Series.dt.ceil(freq, *args, **kwargs)

Perform ceil operation on the data to the specified freq.

Series.dt.month_name([locale])

Return the month names of the series with specified locale.

Series.dt.day_name([locale])

Return the day names of the series with specified locale.
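The `dt` properties and methods operate element-wise on datetime values. A plain-pandas sketch on two timestamps from 2024, a leap year; the pandas-on-Spark `dt` accessor is assumed, not verified, to return the same values.

```python
import pandas as pd

ts = pd.Series(pd.to_datetime(["2024-01-31 13:45:10", "2024-02-29 08:05:00"]))

print(ts.dt.year.tolist())                   # [2024, 2024]
print(ts.dt.month.tolist())                  # [1, 2]
print(ts.dt.dayofweek.tolist())              # [2, 3]: Monday=0, so Wednesday and Thursday
print(ts.dt.is_month_end.tolist())           # [True, True]
print(ts.dt.is_leap_year.tolist())           # [True, True]
print(ts.dt.strftime("%Y/%m/%d").tolist())   # ['2024/01/31', '2024/02/29']
print(ts.dt.hour.tolist())                   # [13, 8]
print(ts.dt.floor("D").dt.hour.tolist())     # [0, 0]: floored to midnight
```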

String Handling

Series.str can be used to access the values of the series as strings and apply several methods to it. These can be accessed like Series.str.<function/property>.

Series.str.capitalize()

Convert strings in the Series to be capitalized.

Series.str.cat([others, sep, na_rep, join])

Not supported.

Series.str.center(width[, fillchar])

Fill both the left and right sides of strings in the Series/Index with an additional character.

Series.str.contains(pat[, case, flags, na, …])

Test if pattern or regex is contained within a string of a Series.

Series.str.count(pat[, flags])

Count occurrences of pattern in each string of the Series.

Series.str.decode(encoding[, errors])

Not supported.

Series.str.encode(encoding[, errors])

Not supported.

Series.str.endswith(pattern[, na])

Test if the end of each string element matches a pattern.

Series.str.extract(pat[, flags, expand])

Not supported.

Series.str.extractall(pat[, flags])

Not supported.

Series.str.find(sub[, start, end])

Return the lowest indexes in each string in the Series where the substring is fully contained between [start:end].

Series.str.findall(pat[, flags])

Find all occurrences of pattern or regular expression in the Series.

Series.str.get(i)

Extract element from each string or string list/tuple in the Series at the specified position.

Series.str.get_dummies([sep])

Not supported.

Series.str.index(sub[, start, end])

Return the lowest indexes in each string where the substring is fully contained between [start:end].

Series.str.isalnum()

Check whether all characters in each string are alphanumeric.

Series.str.isalpha()

Check whether all characters in each string are alphabetic.

Series.str.isdigit()

Check whether all characters in each string are digits.

Series.str.isspace()

Check whether all characters in each string are whitespace.

Series.str.islower()

Check whether all characters in each string are lowercase.

Series.str.isupper()

Check whether all characters in each string are uppercase.

Series.str.istitle()

Check whether all characters in each string are titlecase.

Series.str.isnumeric()

Check whether all characters in each string are numeric.

Series.str.isdecimal()

Check whether all characters in each string are decimals.

Series.str.join(sep)

Join lists contained as elements in the Series with passed delimiter.

Series.str.len()

Compute the length of each element in the Series.

Series.str.ljust(width[, fillchar])

Fill the right side of strings in the Series with an additional character.

Series.str.lower()

Convert strings in the Series/Index to all lowercase.

Series.str.lstrip([to_strip])

Remove leading characters.

Series.str.match(pat[, case, flags, na])

Determine if each string matches a regular expression.

Series.str.normalize(form)

Return the Unicode normal form for the strings in the Series.

Series.str.pad(width[, side, fillchar])

Pad strings in the Series up to width.

Series.str.partition([sep, expand])

Not supported.

Series.str.repeat(repeats)

Duplicate each string in the Series.

Series.str.replace(pat, repl[, n, case, …])

Replace occurrences of pattern/regex in the Series with some other string.

Series.str.rfind(sub[, start, end])

Return the highest indexes in each string in the Series where the substring is fully contained between [start:end].

Series.str.rindex(sub[, start, end])

Return the highest indexes in each string where the substring is fully contained between [start:end].

Series.str.rjust(width[, fillchar])

Fill the left side of strings in the Series with an additional character.

Series.str.rpartition([sep, expand])

Not supported.

Series.str.rsplit([pat, n, expand])

Split strings around given separator/delimiter.

Series.str.rstrip([to_strip])

Remove trailing characters.

Series.str.slice([start, stop, step])

Slice substrings from each element in the Series.

Series.str.slice_replace([start, stop, repl])

Replace a positional slice of each string in the Series with another value.

Series.str.split([pat, n, expand])

Split strings around given separator/delimiter.

Series.str.startswith(pattern[, na])

Test if the start of each string element matches a pattern.

Series.str.strip([to_strip])

Remove leading and trailing characters.

Series.str.swapcase()

Convert strings in the Series/Index to be swapcased.

Series.str.title()

Convert strings in the Series to be titlecase.

Series.str.translate(table)

Map all characters in the string through the given mapping table.

Series.str.upper()

Convert strings in the Series/Index to all uppercase.

Series.str.wrap(width, **kwargs)

Wrap long strings in the Series to be formatted in paragraphs with length less than a given width.

Series.str.zfill(width)

Pad strings in the Series by prepending ‘0’ characters.
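String methods are reached through `.str` and apply to each element. A plain-pandas sketch, assuming the pandas-on-Spark `str` accessor returns the same values for these inputs:

```python
import pandas as pd

s = pd.Series(["apple pie", "Banana", "cherry tart"])

print(s.str.capitalize().tolist())            # ['Apple pie', 'Banana', 'Cherry tart']
print(s.str.contains("an").tolist())          # [False, True, False]
print(s.str.len().tolist())                   # [9, 6, 11]
print(s.str.split(" ").str.get(0).tolist())   # ['apple', 'Banana', 'cherry']
print(s.str.find("a").tolist())               # [0, 1, 8]: -1 would mean not found
print(s.str.slice(0, 3).tolist())             # ['app', 'Ban', 'che']
print(pd.Series(["7", "42"]).str.zfill(4).tolist())  # ['0007', '0042']
```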

Categorical accessor

Categorical-dtype specific methods and attributes are available under the Series.cat accessor.

Series.cat.categories

The categories of this categorical.

Series.cat.ordered

Whether the categories have an ordered relationship.

Series.cat.codes

Return Series of codes as well as the index.

Series.cat.rename_categories(new_categories)

Rename categories.

Series.cat.reorder_categories(new_categories)

Reorder categories as specified in new_categories.

Series.cat.add_categories(new_categories[, …])

Add new categories.

Series.cat.remove_categories(removals[, inplace])

Remove the specified categories.

Series.cat.remove_unused_categories([inplace])

Remove categories which are not used.

Series.cat.set_categories(new_categories[, …])

Set the categories to the specified new_categories.

Series.cat.as_ordered([inplace])

Set the Categorical to be ordered.

Series.cat.as_unordered([inplace])

Set the Categorical to be unordered.
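A plain-pandas sketch of the categorical accessor. Categories inferred from strings are sorted lexically, so the sketch reorders them and marks them ordered before calling `max`, which then follows the category order. Equivalent behavior in pandas-on-Spark is assumed, not checked.

```python
import pandas as pd

s = pd.Series(["low", "high", "low", "mid"], dtype="category")

print(list(s.cat.categories))   # ['high', 'low', 'mid']: inferred categories are sorted
print(s.cat.codes.tolist())     # [1, 0, 1, 2]: positions within the categories
print(s.cat.ordered)            # False

# Put the categories in a meaningful order, then mark them as ordered.
ordered = s.cat.reorder_categories(["low", "mid", "high"]).cat.as_ordered()
print(ordered.max())            # high

renamed = s.cat.rename_categories({"low": "L", "mid": "M", "high": "H"})
print(renamed.tolist())         # ['L', 'H', 'L', 'M']
```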

Plotting

Series.plot is both a callable method and a namespace attribute for specific plotting methods of the form Series.plot.<kind>.

Series.plot

alias of pyspark.pandas.plot.core.PandasOnSparkPlotAccessor

Series.plot.area([x, y])

Draw a stacked area plot.

Series.plot.bar([x, y])

Vertical bar plot.

Series.plot.barh([x, y])

Make a horizontal bar plot.

Series.plot.box(**kwds)

Make a box plot of the Series values.

Series.plot.density([bw_method, ind])

Generate Kernel Density Estimate plot using Gaussian kernels.

Series.plot.hist([bins])

Draw a histogram of the Series values.

Series.plot.line([x, y])

Plot DataFrame/Series as lines.

Series.plot.pie(**kwds)

Generate a pie plot.

Series.plot.kde([bw_method, ind])

Generate Kernel Density Estimate plot using Gaussian kernels.

Series.hist([bins])

Draw a histogram of the Series values.

Serialization / IO / Conversion

Series.to_pandas()

Return a pandas Series.

Series.to_numpy()

A NumPy ndarray representing the values in this DataFrame or Series.

Series.to_list()

Return a list of the values.

Series.to_string([buf, na_rep, …])

Render a string representation of the Series.

Series.to_dict([into])

Convert Series to {label -> value} dict or dict-like object.

Series.to_clipboard([excel, sep])

Copy object to the system clipboard.

Series.to_latex([buf, columns, col_space, …])

Render an object to a LaTeX tabular environment table.

Series.to_markdown([buf, mode])

Print Series or DataFrame in Markdown-friendly format.

Series.to_json([path, compression, …])

Convert the object to a JSON string.

Series.to_csv([path, sep, na_rep, columns, …])

Write object to a comma-separated values (csv) file.

Series.to_excel(excel_writer[, sheet_name, …])

Write object to an Excel sheet.

Series.to_frame([name])

Convert Series to DataFrame.
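A plain-pandas sketch of the conversion methods. In pandas API on Spark, `to_pandas`, `to_numpy`, `to_list`, and `to_dict` collect all data to the driver, so they suit small results only, and `to_csv` with a `path` writes through Spark into a directory of part files rather than a single local file; the buffer-based call below is plain pandas.

```python
import io

import pandas as pd

s = pd.Series([1, 2], index=["a", "b"], name="n")

print(s.to_list())              # [1, 2]
as_dict = s.to_dict()           # maps each label to its value: {'a': 1, 'b': 2}
print(s.to_numpy().tolist())    # [1, 2]

frame = s.to_frame()
print(frame.columns.tolist())   # ['n']: the Series name becomes the column label

# Write CSV into an in-memory buffer; the header row has an empty index label.
buf = io.StringIO()
s.to_csv(buf)
print(buf.getvalue().splitlines())  # [',n', 'a,1', 'b,2']
```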

Pandas-on-Spark specific

Series.pandas_on_spark provides pandas-on-Spark specific features that exist only in pandas API on Spark. These can be accessed by Series.pandas_on_spark.<function/property>.

Series.pandas_on_spark.transform_batch(func, …)

Transform the data with the function that takes pandas Series and outputs pandas Series.
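`transform_batch` hands each batch of data to `func` as a pandas Series and expects a pandas Series of the same length back. The sketch below defines such a function (the name `plus_one` is hypothetical) and exercises it locally on plain pandas chunks to mimic that contract; the Spark call itself appears only as a comment, assumes `import pyspark.pandas as ps`, and is not executed here.

```python
import pandas as pd

def plus_one(pser: pd.Series) -> pd.Series:
    """Hypothetical batch function: must return as many rows as it receives."""
    return pser + 1

# Simulate two batches of one column locally.
batches = [pd.Series([1, 2]), pd.Series([3])]
results = [plus_one(b) for b in batches]

# Each output batch has the same length as its input batch.
print(all(len(r) == len(b) for r, b in zip(results, batches)))  # True
print(pd.concat(results, ignore_index=True).tolist())          # [2, 3, 4]

# With pandas API on Spark (not run here):
# psser = ps.Series([1, 2, 3])
# psser.pandas_on_spark.transform_batch(plus_one)
```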