Python Pandas Series
A Series is a one-dimensional, ordered collection of values, ideally of a single data type, in which each value can be looked up by an index label.
The type is fully specified as pandas.core.series.Series.
Example
import pandas as pd

pd.Series(["foo", "bar", "baz"])
# 0    foo
# 1    bar
# 2    baz
# dtype: object
Data Model
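A Series can be instantiated from most ordered iterables, such as a list, tuple, range, or NumPy array. Unordered collections such as a set are rejected with a TypeError.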
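As a small illustration of instantiation, the sketch below builds a Series from a few common iterables. The variable names are illustrative only, and the set case reflects the behavior of current pandas releases, which refuse unordered collections.

```python
import pandas as pd

# Lists, tuples, and ranges are all accepted as data.
from_list = pd.Series(["foo", "bar", "baz"])
from_tuple = pd.Series(("foo", "bar", "baz"))
from_range = pd.Series(range(3))

# A set has no defined order, so pandas raises a TypeError.
try:
    pd.Series({"foo", "bar"})
    set_accepted = True
except TypeError:
    set_accepted = False
```

Converting a set with sorted() or list() first is one way to make the intended order explicit.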
Index
By default, a series is indexed by sequential integers beginning at 0.
Mappings such as a dict are interpreted as pairs of indices and values: the keys become the index and the values become the data.
d = {"First": "foo", "Second": "bar", "Third": "baz"}
s = pd.Series(d)
# First     foo
# Second    bar
# Third     baz
# dtype: object
A second iterable can be specified as explicit indices.
d = ["foo", "bar", "baz"]
i = ["First", "Second", "Third"]

# These three calls are equivalent.
s = pd.Series(d, i)
s = pd.Series(d, index=i)
s = pd.Series(data=d, index=i)
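Once explicit indices are set, values can be looked up by label as well as by position. The following is a minimal sketch of that; the variable names are illustrative.

```python
import pandas as pd

s = pd.Series(["foo", "bar", "baz"], index=["First", "Second", "Third"])

by_label = s["Second"]     # lookup by index label
by_position = s.iloc[1]    # lookup by integer position
labels = list(s.index)     # the labels, in order

# Building the same Series from a dict gives an equal result.
from_dict = pd.Series({"First": "foo", "Second": "bar", "Third": "baz"})
same = s.equals(from_dict)
```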
DType
A series whose values do not share a more specific data type is initialized with a dtype of object. Other common dtypes include:
int64
float64
datetime64
bool
category
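The sketch below shows how a few of these dtypes can arise, either by inference from the data or by explicit conversion. It is an illustration under default settings, not an exhaustive account of pandas type inference, and the variable names are hypothetical.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])              # inferred as an integer dtype
floats = pd.Series([1.0, 2.5])           # inferred as float64
bools = pd.Series([True, False])         # inferred as bool
mixed = pd.Series([1, "two", 3.0])       # no common type, so object

# Some dtypes are usually requested through a conversion.
dates = pd.to_datetime(pd.Series(["2024-01-01", "2024-01-02"]))
cats = pd.Series(["a", "b", "a"]).astype("category")

for name, series in [("ints", ints), ("floats", floats), ("bools", bools),
                     ("mixed", mixed), ("dates", dates), ("cats", cats)]:
    print(name, series.dtype)
```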
Dunder Methods
Series objects implement the dunder methods that sequence operations rely on, so built-in functions such as len() and sorted() work on them.
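A brief sketch of this sequence-like behavior follows. One caveat included here is that the in operator tests index labels rather than values, which differs from a list.

```python
import pandas as pd

s = pd.Series([3, 1, 2])

length = len(s)               # number of member values
ordered = sorted(s)           # a plain list, not a Series
total = sum(v for v in s)     # iteration yields the values

# Membership is checked against the index (0, 1, 2), not the values.
label_present = 0 in s
value_as_label = 3 in s
value_present = 3 in s.values
```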
They also support mathematical operators as member-wise operations.
s + 10 adds 10 to each member value of s
s - 10 subtracts 10 from each member value
s * 10 multiplies each member value by 10
s / 10 divides each member value by 10
s // 10 performs floor (integer) division of each member value by 10
Note that these operations return a new Series, rather than mutating the data in-place.
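The member-wise behavior can be sketched as follows, using an arbitrary three-value Series. Each expression produces a new Series and leaves s unchanged.

```python
import pandas as pd

s = pd.Series([10, 25, 40])

added = s + 10       # 20, 35, 50
floored = s // 10    # 1, 2, 4
divided = s / 10     # 1.0, 2.5, 4.0

# The original Series still holds its initial values.
unchanged = list(s)
```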
Attributes
Attribute | Meaning |
axes | list containing the index attribute's value |
iloc | indexer for selection by integer position |
index | Index (a RangeIndex by default) containing the member indices |
is_unique | bool representing if all member values are unique |
hasnans | bool representing if any member values are NaN |
loc | indexer for selection by index label |
shape | tuple of the dimensions, e.g. (3,) for three member values |
size | int count of member values |
values | numpy.ndarray containing the member values |
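As an illustration, the sketch below reads several of these attributes from a small Series that contains a repeated value and a NaN. The attrs dict is only a convenience for this example.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 2.0, float("nan")])

attrs = {
    "shape": s.shape,          # (4,)
    "size": s.size,            # 4, NaN is counted as a member
    "is_unique": s.is_unique,  # False, because 2.0 appears twice
    "hasnans": s.hasnans,      # True
}

first_by_position = s.iloc[0]  # position-based access
first_by_label = s.loc[0]      # label-based access (default labels are 0..3)
raw = s.values                 # the underlying numpy.ndarray
```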
Methods
For numeric data, the statistical methods (e.g. mean, median, std) return a scalar, typically a numpy.float64, unless otherwise specified.
Method | Meaning | Example |
add | return a new Series with N added to each member value | s.add(10) |
apply | return a new Series with f applied to each member value | s.apply(len) |
copy | return a copy of the Series | |
count | return an int count of non-missing member values | |
describe | return a new Series containing descriptive statistics | |
div | return a new Series with each member value divided by N | s.div(10) |
get | return the value from an index or a default | s.get("foo", default=None) |
head | return a Series of the first N member values | s.head(5) |
info | print information including types and null values | |
map | return a new Series with mapped values, or NaN where no mapping exists | s.map({True: 1}) |
max | return greatest value | |
mean | return mean value | |
median | return median value | |
min | return least value | |
mode | return a Series of the modal value(s) | |
mul | return a new Series with each member value multiplied by N | s.mul(10) |
product | return product from multiplying all member values | |
sort_index | return a new Series sorted by indices | s.sort_index(ascending=True) |
sort_values | return a new Series sorted by values | s.sort_values(ascending=True) |
std | return standard deviation of values | |
sub | return a new Series with N subtracted from each member value | s.sub(10) |
sum | return sum from adding all member values | |
tail | return a Series of the last N member values | s.tail(5) |
value_counts | return a new Series containing counts of unique values | |
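To make a few of these methods concrete, here is a minimal sketch on a small Series of strings. The data and variable names are made up for illustration.

```python
import pandas as pd

s = pd.Series(["pear", "fig", "banana", "fig"])

lengths = s.apply(len)                      # 4, 3, 6, 3
mapped = s.map({"fig": "F", "pear": "P"})   # "banana" has no mapping, so NaN
top = s.sort_values().head(2)               # first two values in sorted order
counts = s.value_counts()                   # indexed by the unique values
modes = s.mode()                            # a Series, even for a single mode

non_missing = s.count()                     # number of non-missing values
mean_length = lengths.mean()                # mean of the computed lengths
```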
Note that the get method can take a list of indices to look up. A new Series will be returned if all indices exist, and the singleton default will be returned otherwise.
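The behavior of get described above can be sketched as follows, assuming a Series with three string labels. The label "Fourth" is deliberately absent.

```python
import pandas as pd

s = pd.Series(["foo", "bar", "baz"], index=["First", "Second", "Third"])

single = s.get("Second")                             # existing label
missing = s.get("Fourth", default="n/a")             # absent label, default returned
subset = s.get(["First", "Third"])                   # all labels exist, new Series
partial = s.get(["First", "Fourth"], default="n/a")  # one label absent, default returned
```

Because a partially matching list yields the single default rather than a shorter Series, callers that need per-label fallbacks may prefer to look up labels one at a time.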
The describe method returns a Series whose index labels name each statistic, so an individual statistic can be looked up by its label.
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
s.describe()
# count    10.00000
# mean      5.50000
# std       3.02765
# min       1.00000
# 25%       3.25000
# 50%       5.50000
# 75%       7.75000
# max      10.00000
# dtype: float64

s.describe().loc["75%"]
# 7.75000
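The Series created by the value_counts method is indexed by the unique values, and its member values are their counts (or their proportions when normalize=True is passed).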
text = "Call me Ishmael. Some years ago—never mind how long precisely—having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world."

# Lowercase the text, strip punctuation, and split into words.
words = text.lower().replace('.', '').replace(',', '').replace('—', ' ').split()
s = pd.Series(words)

s.value_counts(ascending=True).head()
# call          1
# nothing       1
# particular    1
# to            1
# interest      1
# Name: count, dtype: int64

s.value_counts(normalize=True).head()
# and       0.046512
# the       0.046512
# i         0.046512
# little    0.046512
# me        0.046512
# Name: proportion, dtype: float64