Python Pandas Series

A Series is a one-dimensional, ordered collection of values, usually of a single data type, that can be accessed by index.

The type is fully specified as pandas.core.series.Series.

Example

import pandas as pd

pd.Series(["foo", "bar", "baz"])  # 0   foo
                                  # 1   bar
                                  # 2   baz
                                  # dtype: object

Data Model

A Series can be instantiated with most ordered iterables, such as a list, tuple, range, or NumPy array.
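As a minimal sketch of the above, the following builds the same values from several kinds of iterable. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Each of these iterables yields the same three member values.
from_list = pd.Series([1, 2, 3])
from_tuple = pd.Series((1, 2, 3))
from_range = pd.Series(range(1, 4))
from_array = pd.Series(np.array([1, 2, 3]))

# Compare the member values as plain Python lists.
same = (
    from_list.tolist()
    == from_tuple.tolist()
    == from_range.tolist()
    == from_array.tolist()
)
print(same)
```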

Index

By default, a series is indexed by a sequential integer (beginning at 0).
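A short sketch of the default index, reusing the values from the example at the top of the page:

```python
import pandas as pd

s = pd.Series(["foo", "bar", "baz"])

# With no explicit index, pandas supplies a RangeIndex counting from 0.
print(s.index)
labels = list(s.index)   # [0, 1, 2]
first = s[0]             # "foo", looked up by the integer label 0
```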

Mappings such as a dict are interpreted as pairs of indices (the keys) and values.

d = {"First": "foo", "Second": "bar", "Third": "baz"}
s = pd.Series(d)  # First    foo
                  # Second   bar
                  # Third    baz
                  # dtype: object

A second iterable of the same length can be specified as explicit indices. The following three calls are equivalent.

d = ["foo", "bar", "baz"]
i = ["First", "Second", "Third"]
s = pd.Series(d, i)
s = pd.Series(d, index=i)
s = pd.Series(data=d, index=i)
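To check that the positional and keyword forms build the same object, a sketch along these lines can be used. The names a, b, and c are illustrative.

```python
import pandas as pd

d = ["foo", "bar", "baz"]
i = ["First", "Second", "Third"]

a = pd.Series(d, i)
b = pd.Series(d, index=i)
c = pd.Series(data=d, index=i)

# Series.equals compares both the values and the index.
print(a.equals(b) and b.equals(c))

# Values can now be looked up by label.
second = c["Second"]   # "bar"
```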

DType

A series whose values do not share a common data type is initialized with a dtype of object. Other common dtypes include:

int64
float64
datetime64
bool
category
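The sketch below shows how these dtypes typically arise. It assumes a recent pandas release; the exact datetime64 resolution (for example datetime64[ns]) can vary by version, so the code only checks for a datetime dtype in general.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])              # int64
floats = pd.Series([1.0, 2.5])           # float64
bools = pd.Series([True, False])         # bool
mixed = pd.Series([1, "two", 3.0])       # object, since the types differ

# Dates and categories usually need an explicit conversion or dtype.
dates = pd.to_datetime(pd.Series(["2024-01-01", "2024-06-30"]))
cats = pd.Series(["a", "b", "a"], dtype="category")

for series in (ints, floats, bools, mixed, dates, cats):
    print(series.dtype)
```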

Dunder Methods

Series objects support the dunder methods implied by a sequence, so built-in functions such as len() and sorted() work on them.
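For instance, a sketch using built-in functions on a small Series (the values are arbitrary):

```python
import pandas as pd

s = pd.Series([3, 1, 2])

n = len(s)            # 3
ordered = sorted(s)   # [1, 2, 3], returned as a plain list rather than a Series
members = list(s)     # iteration yields the member values, not the indices
```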

They also support mathematical operators as member-wise operations.

s + 10 adds 10 to each member value of s
s - 10 subtracts 10 from each member value
s * 10 multiplies each member value by 10
s / 10 divides each member value by 10
s // 10 performs floor division of each member value by 10

Note that these operations return a new Series, rather than mutating the data in-place.
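A small sketch of the member-wise operators, also confirming that the original Series is left untouched:

```python
import pandas as pd

s = pd.Series([10, 25, 40])

plus = s + 10       # 20, 35, 50
true_div = s / 10   # 1.0, 2.5, 4.0
floor_div = s // 10 # 1, 2, 4

# Each operator produced a new Series; s still holds its original values.
print(s.tolist())
```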

Attributes

Attribute	Meaning
`axes`	`list` containing the `index` attribute's value
`iloc`	indexable accessor of member values by integer position
`index`	`Index` of the member indices (a `RangeIndex` by default)
`is_unique`	`bool` representing if all member values are unique
`hasnans`	`bool` representing if any member values are NaN
`loc`	indexable accessor of member values by index label
`shape`	`tuple` of 1 `int` representing the number of member values
`size`	`int` count of member values
`values`	`numpy.ndarray` containing the member values
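The following sketch exercises several of these attributes on a labelled Series. The labels and values are made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, 20.0, 20.0], index=["a", "b", "c"])

by_label = s.loc["b"]      # 20.0, looked up by index label
by_position = s.iloc[0]    # 10.0, looked up by integer position

print(s.shape)       # (3,)
print(s.size)        # 3
print(s.is_unique)   # False, because 20.0 appears twice
print(s.hasnans)     # False
print(type(s.values))
```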

Methods

The aggregation methods (such as mean, std, and sum) return a scalar, typically a numpy.float64 for numeric data, unless otherwise specified.

Method	Meaning	Example
`add`	return a new `Series` with `N` added to each member value	`s.add(10)`
`apply`	return a new `Series` with `f` applied to each member value	`s.apply(len)`
`copy`	return a copy of the `Series`
`count`	return an `int` count of non-missing member values
`describe`	return a new `Series` containing descriptive statistics
`div`	return a new `Series` with each member value divided by `N`	`s.div(10)`
`get`	return the value from an index or a default	`s.get("foo", default=None)`
`head`	return a `Series` of the first `N` member values	`s.head(5)`
`info`	print information including types and null values
`map`	return a new `Series` with mapped values or NaN	`s.map({True: 1})`
`max`	return greatest value
`mean`	return mean value
`median`	return median value
`min`	return least value
`mode`	return a `Series` of the modal value(s)
`mul`	return a new `Series` with each member value multiplied by `N`	`s.mul(10)`
`product`	return product from multiplying all member values
`sort_index`	return a new `Series` sorted by indices	`s.sort_index(ascending=True)`
`sort_values`	return a new `Series` sorted by values	`s.sort_values(ascending=True)`
`std`	return standard deviation of values
`sub`	return a new `Series` with `N` subtracted from each member value	`s.sub(10)`
`sum`	return sum from adding all member values
`tail`	return a `Series` of the last `N` member values	`s.tail(5)`
`value_counts`	return a new `Series` containing counts of unique values
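As a rough illustration of a few of the methods above, the sketch below contrasts aggregations, which return a scalar or a small Series, with apply and map, which return a new Series of the same length. The sample data is arbitrary.

```python
import pandas as pd

nums = pd.Series([1, 2, 2, 4])
average = nums.mean()    # 2.25
present = nums.count()   # 4
modes = nums.mode()      # a Series holding the single modal value, 2

words = pd.Series(["foo", "quux", "baz"])
lengths = words.apply(len)                 # 3, 4, 3
mapped = words.map({"foo": 1, "baz": 3})   # "quux" has no key, so it becomes NaN
```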

Note that the get method can take a list of indices to look up. A new Series will be returned if all indices exist, and the singleton default will be returned otherwise.
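The behaviour of get can be sketched as follows, assuming a recent pandas release in which a list containing a missing label counts as a failed lookup. The "n/a" default is an arbitrary placeholder.

```python
import pandas as pd

s = pd.Series(["foo", "bar", "baz"], index=["First", "Second", "Third"])

hit = s.get("First")                      # "foo"
miss = s.get("Fourth", default="n/a")     # "n/a"

# With a list, every index must exist for a Series to come back.
both = s.get(["First", "Third"])                       # Series of "foo" and "baz"
partial = s.get(["First", "Fourth"], default="n/a")    # "n/a"
```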

The describe method returns a Series indexed by statistic name, so an individual statistic can be looked up by its label.

s = pd.Series([1,2,3,4,5,6,7,8,9,10])
s.describe()  # count    10.00000
              # mean      5.50000
              # std       3.02765
              # min       1.00000
              # 25%       3.25000
              # 50%       5.50000
              # 75%       7.75000
              # max      10.00000
              # dtype: float64

s.describe().loc["75%"]  # 7.75000

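The Series created by the value_counts method is indexed by the unique values, with their counts (or their proportions, if normalize=True is passed) as the member values.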

s = pd.Series("Call me Ishmael. Some years ago—never mind how long precisely—having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world.".lower().replace('.', '').replace(',', '').replace('—', ' ').split())

s.value_counts(ascending=True).head()  # call          1
                                       # nothing       1
                                       # particular    1
                                       # to            1
                                       # interest      1
                                       # Name: count, dtype: int64

s.value_counts(normalize=True).head()  # and       0.046512
                                       # the       0.046512
                                       # i         0.046512
                                       # little    0.046512
                                       # me        0.046512
                                       # Name: proportion, dtype: float64
