Python Pandas

pandas is a Python library for manipulating and analyzing tabular data.


Example

To compute statistics:

import pandas as pd

s = pd.Series([10, 20, 30])
mu = s.mean()    # 20.0
sigma = s.std()  # 10.0 (sample standard deviation, ddof=1)

To import a CSV file:

df = pd.read_csv("example.csv", index_col="UniqueID", usecols=["UniqueID", "Height", "Weight"])
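The effect of index_col and usecols is easier to see with a small input. The following is a minimal sketch, assuming a hypothetical CSV with the same column names plus one extra column (Name); the contents are invented for illustration and read from a string rather than from example.csv.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for example.csv.
csv_text = """UniqueID,Name,Height,Weight
a1,Ann,170,65
b2,Bob,180,80
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    index_col="UniqueID",                      # use this column as the row index
    usecols=["UniqueID", "Height", "Weight"],  # skip every other column
)

print(df.columns.tolist())     # ['Height', 'Weight']
print(df.index.tolist())       # ['a1', 'b2']
print(df.loc["b2", "Height"])  # 180
```

Note that the index column has to be listed in usecols as well, otherwise it is not read at all.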


Series

The core data type is pandas.core.series.Series, a one-dimensional labeled array. By default the index is a RangeIndex starting at 0. Ideally a Series stores elements of a single data type (a.k.a. dtype); if an efficient type cannot be inferred, the dtype falls back to object.

The builtin Python operators perform element-wise math.

import pandas as pd

pd.Series(["foo", "bar", "baz"])  # 0   foo
                                  # 1   bar
                                  # 2   baz
                                  # dtype: object
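As a small sketch of the element-wise operators and the object fallback described above (the variable names are only illustrative):

```python
import pandas as pd

a = pd.Series([1, 2, 3])
b = pd.Series([10, 20, 30])

total = a + b   # 11, 22, 33
scaled = a * 2  # 2, 4, 6
mask = a > 1    # False, True, True

# Mixed element types cannot share an efficient dtype.
mixed = pd.Series([1, "two", 3.0])

print(a.dtype)      # an integer dtype
print(mixed.dtype)  # object
```

Arithmetic between two Series aligns on the index labels, not on position; here both use the default index, so the two coincide.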

Series objects have these attributes:

Attribute Name   Description
axes
iloc
index            RangeIndex of indices
is_unique        are all elements unique?
hasnans          are any elements NaN?
loc
shape            (rows)
size             count of elements
values           internal numpy.ndarray storing the elements
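A short sketch of a few of these attributes on a Series that contains a duplicate and a NaN (the data is made up for illustration):

```python
import pandas as pd

s = pd.Series([3.0, 1.0, float("nan"), 1.0])

shape = s.shape       # (4,)
size = s.size         # 4
unique = s.is_unique  # False, because 1.0 appears twice
nans = s.hasnans      # True
index = s.index       # RangeIndex(start=0, stop=4, step=1)
values = s.values     # a numpy.ndarray

print(shape, size, unique, nans)
print(index)
print(type(values).__name__)  # ndarray
```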

These methods are descriptive, rather than being general programming utilities.

Method Names   Description                                             Example
describe       Creates a Series with descriptive statistics
head           First N elements                                        s.head(5)
info           Prints a summary (dtype, non-null count, memory usage)
tail           Last N elements                                         s.tail(5)
value_counts   Creates a Series with counts of unique values
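For instance, a minimal sketch of head, tail, and value_counts on invented data:

```python
import pandas as pd

s = pd.Series(["a", "b", "a", "c", "a", "b"])

first = s.head(2)          # the first two elements: a, b
last = s.tail(2)           # the last two elements: a, b
counts = s.value_counts()  # a: 3, b: 2, c: 1

print(counts)
```

value_counts sorts by count in descending order by default, and the unique values become the index of the result.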

These methods create and return a new Series.

Method Names   Description                                   Example
add            Element-wise addition
apply          Element-wise function mapping                 s.apply(len)
copy
div            Element-wise division
map            Element-wise value mapping; NaN if no match   s.map({True: 1})
mul            Element-wise multiplication
sort_index     Sorted by indices                             s.sort_index(ascending=True)
sort_values    Sorted by values                              s.sort_values(ascending=True)
sub            Element-wise subtraction
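The following sketch exercises a few of these methods on made-up data. In each case the original Series is left unchanged and a new one is returned.

```python
import pandas as pd

words = pd.Series(["pear", "fig", "banana"])

lengths = words.apply(len)                # 4, 3, 6
codes = words.map({"pear": 1, "fig": 2})  # 1.0, 2.0, NaN ("banana" has no match)

ordered = lengths.sort_values(ascending=False)  # 6, 4, 3; index labels travel with the values
bumped = lengths.add(1)                         # same result as lengths + 1

print(ordered.index.tolist())  # [2, 0, 1]
print(lengths.tolist())        # [4, 3, 6], unchanged
```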

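These methods return a value computed from the Series. Most return a scalar; get can also return a Series when given a list of indices, and mode always returns a Series, since several values can tie for most frequent.

Method Names   Description
count          Count non-missing elements
get
max
mean
median
min
mode
product
std
sum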
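As an illustrative sketch, the reductions below run on a Series with one missing value. By default they skip NaN, and mode is the one that returns a Series.

```python
import pandas as pd

s = pd.Series([2.0, 4.0, 4.0, float("nan")])

n = s.count()          # 3, NaN is not counted
total = s.sum()        # 10.0
average = s.mean()     # 10.0 / 3
middle = s.median()    # 4.0
largest = s.max()      # 4.0
product = s.product()  # 32.0
modes = s.mode()       # a Series holding 4.0

print(type(modes).__name__)  # Series
```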

Describe

s = pd.Series([1,2,3,4,5,6,7,8,9,10])
s.describe()  # count    10.00000
              # mean      5.50000
              # std       3.02765
              # min       1.00000
              # 25%       3.25000
              # 50%       5.50000
              # 75%       7.75000
              # max      10.00000
              # dtype: float64

Get

The get method returns one or more elements based on index matching.

There is an optional default keyword argument, which is returned when no match is found. It defaults to None.

Note that the get method can take a list of indices. A new Series is returned only if all of the indices match; otherwise the single default value is returned.
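A minimal sketch of these cases, assuming a recent pandas version and using string labels to avoid any ambiguity between labels and positions:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

one = s.get("b")                         # 20
missing = s.get("z")                     # None, the default when nothing is given
fallback = s.get("z", default=-1)        # -1

subset = s.get(["a", "c"])               # a new Series with 10 and 30
partial = s.get(["a", "z"], default=-1)  # -1, because "z" does not match

print(one, missing, fallback, partial)
```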


Data Frames

The pandas.core.frame.DataFrame type builds upon Series: it is a two-dimensional table in which each column is a Series.

For example, the Series methods which return a scalar value are instead defined for a DataFrame to return a Series: a scalar value for each column.

df = pd.DataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
df.mean()  # a   1.5
           # b   2.5
           # dtype: float64

DataFrame objects have these attributes:

Attribute Names   Description
axes
columns           Index of column names
dtypes            Series of dtypes
iloc
index
loc
shape             (rows, columns)
size              count of elements
values            internal numpy.ndarray storing the elements
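A short sketch of these attributes on a small, invented DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [2.5, 3.5]}, index=["tiger", "zebra"])

shape = df.shape              # (2, 2)
size = df.size                # 4
columns = df.columns          # Index(['a', 'b'], ...)
dtypes = df.dtypes            # one dtype per column: integer for a, float for b

by_label = df.loc["zebra", "a"]  # 2, selected by row and column labels
by_position = df.iloc[0, 1]      # 2.5, selected by integer positions

print(shape, size, columns.tolist())
print(dtypes)
```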


Others

The module also exposes several implementation details of Series and DataFrame objects: pandas.core.indexes.base.Index (generally returned by the columns attribute), pandas.core.indexes.range.RangeIndex (generally returned by the index attribute), pandas.core.indexing._LocIndexer (returned by the loc attribute), and pandas.core.indexing._iLocIndexer (returned by the iloc attribute).
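These types can be inspected directly. The sketch below assumes a DataFrame built with string column names and no explicit index; the indexer class names are private and may change between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [2, 3]})

columns_type = type(df.columns)  # the base Index class
index_type = type(df.index)      # RangeIndex, since no index was given
loc_type = type(df.loc)          # private indexer class
iloc_type = type(df.iloc)        # private indexer class

print(columns_type.__name__)
print(index_type.__name__)
print(loc_type.__name__)
print(iloc_type.__name__)
```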



Python/Pandas (last edited 2025-12-23 05:19:05 by DominicRicottone)