Revision 15 as of 2024-01-16 03:13:10
Size: 8722
Comment: Added method
Revision 17 as of 2025-12-23 05:06:57
Size: 0
Comment:
= Python Pandas Series =

A '''`Series`''' is an ordered, one-dimensional collection of values that is labeled by an index. Its values usually share a single data type.

The [[Python/Pandas/Types|type]] is fully specified as `pandas.core.series.Series`.

<<TableOfContents>>

----



== Example ==

{{{
import pandas as pd

pd.Series(["foo", "bar", "baz"]) # 0 foo
                                  # 1 bar
                                  # 2 baz
                                  # dtype: object
}}}

----



== Data Model ==

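A `Series` can be instantiated with most ordered [[Python/Collections/Abc#Iterable|iterables]] (lists, tuples, ranges, generators, NumPy arrays, dicts) or with a single scalar value. Unordered collections such as `set` are rejected with a `TypeError`.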
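
A minimal sketch of the constructor accepting different inputs, assuming a recent pandas release; the variable names are illustrative. Ordered iterables (including a generator, which is consumed into a list) become the member values, a scalar is repeated once per label of an explicit index, and a `set` is refused because it has no defined order.

{{{
import pandas as pd

# Ordered iterables all become the member values 0, 1, 2
from_range = pd.Series(range(3))
from_tuple = pd.Series((0, 1, 2))
from_generator = pd.Series(x for x in range(3))

# A scalar is repeated once per label of the explicit index
from_scalar = pd.Series(5, index=["a", "b", "c"])

# A set has no defined order, so pandas raises a TypeError
try:
    pd.Series({1, 2, 3})
    set_rejected = False
except TypeError:
    set_rejected = True
}}}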



=== Index ===

By default, a series is indexed by sequential integers beginning at 0, stored as a `RangeIndex`.
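
A short sketch for inspecting the default index (names illustrative). Lookups with `[]` use these integer labels.

{{{
import pandas as pd

s = pd.Series(["foo", "bar", "baz"])

print(s.index)  # RangeIndex(start=0, stop=3, step=1)
first = s[0]    # 'foo', because 0 is the label of the first member
}}}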

A mapping such as a `dict` is interpreted as pairs of indices (the keys) and values.

{{{
d = {"First": "foo", "Second": "bar", "Third": "baz"}
s = pd.Series(d) # First foo
                  # Second bar
                  # Third baz
                  # dtype: object
}}}

Explicit indices can be supplied as a second iterable, either positionally or with the `index` keyword. It must have the same length as the data.

{{{
d = ["foo", "bar", "baz"]
i = ["First", "Second", "Third"]

# These three calls are equivalent
s = pd.Series(d, i)
s = pd.Series(d, index=i)
s = pd.Series(data=d, index=i)
}}}
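
A minimal sketch of using the explicit labels, reusing the illustrative `d` and `i` from above. Both `[]` and `.loc` look members up by label, and pandas raises a `ValueError` when the index length does not match the data.

{{{
import pandas as pd

d = ["foo", "bar", "baz"]
i = ["First", "Second", "Third"]
s = pd.Series(d, index=i)

second = s["Second"]           # 'bar', looked up by its label
also_second = s.loc["Second"]  # the explicit label-based form

# The index must have the same length as the data
try:
    pd.Series(d, index=["First", "Second"])
    length_checked = False
except ValueError:
    length_checked = True
}}}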



=== DType ===

A series whose values do not share a common type (e.g. a mix of strings and numbers) is created with a [[Python/NumPy/Types#ObjectDType|dtype]] of `object`. Other common dtypes include:

 * `int64`
 * `float64`
 * `datetime64`
 * `bool`
 * `category` (only used when requested, e.g. `dtype="category"`, or when the input is already categorical)
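
A small sketch of dtype inference, assuming a recent pandas release on a 64-bit platform; the series names are illustrative. The exact `datetime64` resolution can vary between pandas versions, so it is left without an expected value.

{{{
import pandas as pd

ints = pd.Series([1, 2, 3])           # int64
floats = pd.Series([1.5, 2.0])        # float64
flags = pd.Series([True, False])      # bool
dates = pd.Series(pd.to_datetime(["2024-01-01", "2024-01-02"]))  # a datetime64 dtype
mixed = pd.Series([1, "two", 3.0])    # object, since the types are mixed
cats = pd.Series(["a", "b", "a"], dtype="category")  # category, requested explicitly
}}}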



=== Dunder Methods ===

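`Series` objects implement most of the [[Python/DunderMethod|dunder methods]] implied by a [[Python/Collections/Abc#Sequence|sequence]], such as `__len__` and `__iter__`, so builtins like `len()` and `sorted()` work as expected. Note that `__contains__` (the `in` operator) checks the index, not the member values.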
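
A small sketch of this container behavior, using an illustrative Series `s`. Iteration yields member values, while `in` tests against the index labels; checking `s.values` is one way to test membership among the values.

{{{
import pandas as pd

s = pd.Series([3, 1, 2], index=["a", "b", "c"])

length = len(s)                  # 3
ordered = sorted(s)              # [1, 2, 3], since iteration yields values
label_hit = "a" in s             # True: `in` checks the index labels
value_miss = 3 in s              # False: 3 is a value, not a label
value_hit = 3 in s.values        # True: checks the underlying values
}}}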

They also support mathematical [[Python/Builtins/Operators|operators]] as member-wise operations.

 * `s + 10` adds 10 to each member value of `s`
 * `s - 10` subtracts 10
 * `s * 10` multiplies by 10
 * `s / 10` divides by 10
 * `s // 10` performs floor division by 10

Note that these operations return a new `Series`, rather than mutating the data in-place.
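
A minimal sketch of the member-wise operators on an illustrative Series. True division always produces floats, and the method form `add` gives the same result as `+`.

{{{
import pandas as pd

s = pd.Series([5, 10, 15])

plus = s + 10       # 15, 20, 25
minus = s - 10      # -5, 0, 5
times = s * 10      # 50, 100, 150
ratio = s / 10      # 0.5, 1.0, 1.5 (true division yields float64)
floored = s // 10   # 0, 1, 1
via_method = s.add(10)  # same result as s + 10

# s itself still holds 5, 10, 15
}}}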

----



== Attributes ==

||'''Attribute'''||'''Meaning''' ||
||`axes` ||[[Python/Builtins/Types#List|list]] containing the `index` attribute's value ||
||`iloc` ||[[Python/Pandas/Types#A_ILocIndexer|position-based accessor of member values]] ||
||`index` ||[[Python/Pandas/Types|Index]] containing the member indices (a `RangeIndex` by default) ||
||`is_unique` ||[[Python/Builtins/Types#Bool|bool]] representing if all member values are unique ||
||`hasnans` ||`bool` representing if any member values are missing ([[Python/Builtins/Types#Float|NaN]]) ||
||`loc` ||[[Python/Pandas/Types#A_LocIndexer|label-based accessor of member values]] ||
||`shape` ||[[Python/Builtins/Types#Tuple|tuple]] of 1 [[Python/Builtins/Types#Int|int]] representing number of member values||
||`size` ||`int` count of member values ||
||`values` ||[[Python/NumPy/Types#NDArray|numpy.ndarray]] containing the member values (extension dtypes such as `category` return a pandas array instead) ||
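
A short sketch of these attributes on an illustrative Series containing one missing value. `loc` looks members up by label and `iloc` by integer position.

{{{
import pandas as pd

# None is stored as NaN in a float64 Series
s = pd.Series([10.0, None, 30.0], index=["x", "y", "z"])

n = s.size                  # 3
dims = s.shape              # (3,)
has_missing = s.hasnans     # True
labels = s.index.tolist()   # ['x', 'y', 'z']
by_label = s.loc["z"]       # 30.0
by_position = s.iloc[0]     # 10.0
}}}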

----



== Methods ==

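Reduction methods such as `max`, `mean`, and `sum` return a single scalar, usually a NumPy scalar whose type depends on the dtype (e.g. [[Python/NumPy/Types|numpy.float64]] for `mean`). Methods described as returning a `Series` do so instead.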

||'''Method''' ||'''Meaning''' ||'''Example''' ||
||`add` ||return a new `Series` with `N` added to each member value ||`s.add(10)` ||
||`apply` ||return a new `Series` with `f` applied to each member value ||`s.apply(len)` ||
||`copy` ||return a copy of the `Series` || ||
||`count` ||return the number of non-missing member values || ||
||`describe` ||return a new `Series` containing descriptive statistics || ||
||`div` ||return a new `Series` with each member value divided by `N` ||`s.div(10)` ||
||`get` ||return the value from an index or a default ||`s.get("foo", default=None)` ||
||`head` ||return a `Series` of the first `N` member values (5 by default) ||`s.head(5)` ||
||`info` ||print information including types and null values || ||
||`map` ||return a new `Series` with mapped values or [[Python/Builtins/Types#Float|NaN]]||`s.map({True: 1})` ||
||`max` ||return greatest value || ||
||`mean` ||return mean value || ||
||`median` ||return median value || ||
||`min` ||return least value || ||
||`mode` ||return a new `Series` of the modal value(s), since there may be ties || ||
||`mul` ||return a new `Series` with each member value multiplied by `N` ||`s.mul(10)` ||
||`product` ||return product from multiplying all member values || ||
||`sort_index` ||return a new `Series` sorted by indices ||`s.sort_index(ascending=True)` ||
||`sort_values` ||return a new `Series` sorted by values ||`s.sort_values(ascending=True)` ||
||`std` ||return sample standard deviation of values (`ddof=1` by default) || ||
||`sub` ||return a new `Series` with `N` subtracted from each member value ||`s.sub(10)` ||
||`sum` ||return sum from adding all member values || ||
||`tail` ||return a `Series` of the last `N` member values (5 by default) ||`s.tail(5)` ||
||`value_counts`||return a new `Series` containing counts of unique values || ||
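
A minimal sketch exercising several of these methods on an illustrative Series. `kind="stable"` is passed to `sort_values` so the order of the two tied values is deterministic; `mode` returns a `Series` even when there is a single mode.

{{{
import pandas as pd

s = pd.Series([3, 1, 2, 3], index=["a", "b", "c", "d"])

total = s.sum()                             # 9
average = s.mean()                          # 2.25
modes = s.mode()                            # a Series holding 3
by_value = s.sort_values(kind="stable")     # b, c, a, d
by_label = s.sort_index(ascending=False)    # d, c, b, a
doubled = s.apply(lambda v: v * 2)          # 6, 2, 4, 6
mapped = s.map({1: "one", 2: "two"})        # NaN where no mapping exists

# None of these calls modify s itself
}}}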

Note that the `get` method can also take a list of indices. A new `Series` is returned if all of them exist; otherwise the default is returned.
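
A short sketch of `get` with single labels and with lists of labels, using an illustrative Series. The default for `default` is `None`.

{{{
import pandas as pd

s = pd.Series(["foo", "bar", "baz"], index=["First", "Second", "Third"])

one = s.get("Second")                       # 'bar'
fallback = s.get("Fourth", default="n/a")   # 'n/a'
subset = s.get(["First", "Third"])          # a new Series: First foo, Third baz
missing = s.get(["First", "Fourth"])        # None, since "Fourth" does not exist
}}}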

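For numeric data, the '''`describe`''' method returns a `Series` of summary statistics indexed by name (`count`, `mean`, `std`, `min`, the quartiles `25%`, `50%`, `75%`, and `max`), so individual statistics can be looked up by label.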

{{{
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
s.describe() # count    10.000000
             # mean      5.500000
             # std       3.027650
             # min       1.000000
             # 25%       3.250000
             # 50%       5.500000
             # 75%       7.750000
             # max      10.000000
             # dtype: float64

s.describe().loc["75%"] # 7.75
}}}

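The `value_counts` method is particularly useful. It returns a new `Series` indexed by the unique member values, with their counts (or proportions, given `normalize=True`) as values, sorted in descending order by default.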

{{{
text = "Call me Ishmael. Some years ago—never mind how long precisely—having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world."
words = text.lower().replace('.', '').replace(',', '').replace('—', ' ').split()
s = pd.Series(words)

s.value_counts(ascending=True).head() # call 1
                                       # nothing 1
                                       # particular 1
                                       # to 1
                                       # interest 1
                                       # Name: count, dtype: int64

s.value_counts(normalize=True).head() # and 0.046512
                                       # the 0.046512
                                       # i 0.046512
                                       # little 0.046512
                                       # me 0.046512
                                       # Name: proportion, dtype: float64
}}}



----
CategoryRicottone