= Python Pandas =

'''pandas''' is a library for data manipulation.

<<TableOfContents>>

----



== Example ==

To compute statistics:

{{{
import pandas as pd

s = pd.Series([10, 20, 30])
mu = s.mean()
sigma = s.std()
}}}

To import a CSV file:

{{{
df = pd.read_csv("example.csv", index_col="UniqueID", usecols=["UniqueID", "Height", "Weight"])
}}}
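As a self-contained sketch of the same call, the file can be replaced by an in-memory buffer. The CSV contents below (including the extra `Age` column) are made up for illustration only.

```python
import io

import pandas as pd

# Hypothetical CSV contents; "Age" is present in the data but not requested.
csv_text = "UniqueID,Height,Weight,Age\n1,170,65,30\n2,180,80,41\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    index_col="UniqueID",                     # use this column as the row index
    usecols=["UniqueID", "Height", "Weight"]  # read only these columns
)

print(df.columns.tolist())  # ['Height', 'Weight']
print(df.index.tolist())    # [1, 2]
```

Note that the column named in `index_col` must also appear in `usecols`, otherwise it is never read.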

----



== Series ==

The core data type is `pandas.core.series.Series`, a one-dimensional labeled array. By default the labels are integers counting from 0. Ideally a `Series` stores elements of a single data type (a.k.a. [[Python/NumPy/Types#ObjectDType|dtype]]); if an efficient type cannot be inferred, the dtype falls back to `object`.

The [[Python/Builtins/Operators|builtin Python operators]] perform element-wise math.

{{{
import pandas as pd

pd.Series(["foo", "bar", "baz"])  # 0   foo
                                  # 1   bar
                                  # 2   baz
                                  # dtype: object
}}}
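The element-wise behavior of the builtin operators can be sketched as follows. The variable names are arbitrary.

```python
import pandas as pd

a = pd.Series([1, 2, 3])
b = pd.Series([10, 20, 30])

total = a + b   # element-wise addition, equivalent to a.add(b)
scaled = a * 2  # a scalar is applied to every element
mask = b > 15   # comparisons yield a boolean Series

print(total.tolist())   # [11, 22, 33]
print(scaled.tolist())  # [2, 4, 6]
print(mask.tolist())    # [False, True, True]
```

When both operands are `Series`, elements are paired by index label rather than by position.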

`Series` objects have these attributes:

||'''Attribute Name'''||'''Description'''                                    ||
||`axes`              ||list containing the `index`                          ||
||`iloc`              ||indexer for selection by integer position            ||
||`index`             ||index labels; a `RangeIndex` by default              ||
||`is_unique`         ||are all elements unique?                             ||
||`hasnans`           ||are any elements [[Python/Builtins/Types#Float|NaN]]?||
||`loc`               ||indexer for selection by label                       ||
||`shape`             ||`(rows,)`                                            ||
||`size`              ||count of elements                                    ||
||`values`            ||the elements as a [[Python/NumPy/Types#NDArray|numpy.ndarray]] (or a similar array for extension dtypes)||
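A short sketch of reading some of these attributes from a small `Series` with the default index:

```python
import pandas as pd

s = pd.Series([1.5, 2.5, 2.5])

print(s.index)      # RangeIndex(start=0, stop=3, step=1)
print(s.shape)      # (3,)
print(s.size)       # 3
print(s.is_unique)  # False, because 2.5 appears twice
print(s.hasnans)    # False
print(s.iloc[0])    # 1.5, selected by integer position
print(s.loc[2])     # 2.5, selected by index label
```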

These methods are descriptive, rather than being general programming utilities.

||'''Method Names'''||'''Description'''                              ||'''Example'''||
||`describe`        ||Creates a `Series` with descriptive statistics ||`s.describe()`||
||`head`            ||First N elements                               ||`s.head(5)`  ||
||`info`            ||Prints a concise summary (dtype, non-null count, memory usage)||`s.info()`||
||`tail`            ||Last N elements                                ||`s.tail(5)`  ||
||`value_counts`    ||Creates a `Series` with counts of unique values||`s.value_counts()`||
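For instance, a minimal sketch of `value_counts`, `head`, and `tail` on made-up data:

```python
import pandas as pd

s = pd.Series(["a", "b", "a", "c", "a"])

counts = s.value_counts()  # indexed by the unique values, most frequent first

print(counts["a"])         # 3
print(counts.index[0])     # a
print(s.head(2).tolist())  # ['a', 'b']
print(s.tail(2).tolist())  # ['c', 'a']
```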

These methods create and return a new `Series`.

||'''Method Names'''||'''Description'''            ||'''Example'''||
||`add`             ||Element-wise addition        || ||
||`apply`           ||Element-wise function mapping||`s.apply(len)`||
||`copy`            ||Copy of the `Series` (data is copied by default)||`s.copy()`||
||`div`             ||Element-wise division        || ||
||`map`             ||Element-wise value mapping; [[Python/Builtins/Types#Float|NaN]] if no match||`s.map({True: 1})`||
||`mul`             ||Element-wise multiplication  || ||
||`sort_index`      ||Sorted by indices            ||`s.sort_index(ascending=True)`||
||`sort_values`     ||Sorted by values             ||`s.sort_values(ascending=True)`||
||`sub`             ||Element-wise subtraction     || ||
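A brief sketch of `apply`, `map`, and `sort_values`, using made-up values. Each call returns a new `Series` and leaves the original untouched.

```python
import pandas as pd

s = pd.Series(["pear", "fig", "banana"])

lengths = s.apply(len)                   # call len on every element
codes = s.map({"pear": 1, "fig": 2})     # "banana" has no match, so it becomes NaN
ordered = s.sort_values(ascending=True)  # index labels travel with their values

print(lengths.tolist())        # [4, 3, 6]
print(codes.isna().tolist())   # [False, False, True]
print(ordered.tolist())        # ['banana', 'fig', 'pear']
print(ordered.index.tolist())  # [2, 1, 0]
```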

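These methods reduce the `Series` to a single value, except `get` (which looks up elements) and `mode` (which returns a `Series`, because several values can tie):

||'''Method Names'''||'''Description'''         ||
||`count`           ||Count non-missing elements||
||`get`             ||Element(s) at the given index label(s)||
||`max`             ||Largest value             ||
||`mean`            ||Arithmetic mean           ||
||`median`          ||Median                    ||
||`min`             ||Smallest value            ||
||`mode`            ||Most frequent value(s), returned as a `Series`||
||`product`         ||Product of the elements   ||
||`std`             ||Sample standard deviation (`ddof=1` by default)||
||`sum`             ||Sum of the elements       ||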
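As a sketch of how these reductions treat missing values: by default they skip NaN. The data below is made up.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 2.0, None])  # None is stored as NaN

print(s.count())          # 3, the missing element is not counted
print(s.sum())            # 5.0
print(s.max())            # 2.0
print(s.mode().tolist())  # [2.0], mode returns a Series rather than a scalar
```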




=== Describe ===

{{{
s = pd.Series([1,2,3,4,5,6,7,8,9,10])
s.describe()  # count    10.00000
              # mean      5.50000
              # std       3.02765
              # min       1.00000
              # 25%       3.25000
              # 50%       5.50000
              # 75%       7.75000
              # max      10.00000
              # dtype: float64
}}}



=== Get ===

The `get` method returns one or more elements by index label.

The optional `default` keyword argument is returned when a label is not found; it is `None` if omitted.

Note that the `get` method can take a list of labels. A new `Series` will be returned only if ''all'' labels are found, and the singleton default will be returned otherwise.
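A minimal sketch of this behavior, assuming a reasonably recent pandas version (1.0 or later); string labels are used to avoid any ambiguity between labels and positions.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(s.get("a"))                     # 10
print(s.get("z"))                     # None, the default when no default is given
print(s.get("z", default=-1))         # -1
print(s.get(["a", "b"]).tolist())     # [10, 20], every label was found
print(s.get(["a", "z"], default=-1))  # -1, one label is missing
```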

----



== Data Frames ==

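Building upon `Series` is `pandas.core.frame.DataFrame`, a two-dimensional table whose columns can each be accessed as a `Series` and share one row index.

Most `Series` methods have `DataFrame` counterparts. For example, the `Series` methods which return a scalar value instead return a `Series` when called on a `DataFrame`: one scalar value for each column.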

{{{
df = pd.DataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
df.mean()  # a   1.5
           # b   2.5
           # dtype: float64
}}}

`DataFrame` objects have these attributes:

||'''Attribute Names'''||'''Description'''      ||
||`axes`               ||list containing the `index` and the `columns`||
||`columns`            ||`Index` of column names||
||`dtypes`             ||`Series` of dtypes, one per column||
||`iloc`               ||indexer for selection by integer position||
||`index`              ||`Index` of row labels  ||
||`loc`                ||indexer for selection by label||
||`shape`              ||`(rows, columns)`      ||
||`size`               ||count of elements      ||
||`values`             ||the elements as a [[Python/NumPy/Types#NDArray|numpy.ndarray]]||
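A short sketch of these attributes on a small frame with made-up values:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [2.5, 3.5]}, index=["tiger", "zebra"])

print(df.shape)              # (2, 2)
print(df.size)               # 4
print(df.columns.tolist())   # ['a', 'b']
print(df.index.tolist())     # ['tiger', 'zebra']
print(df.dtypes["b"])        # float64
print(df.loc["zebra", "a"])  # 2, selected by row and column label
print(df.iloc[0, 1])         # 2.5, selected by row and column position
```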

----



== Others ==

The module also exposes several implementation details of `Series` and `DataFrame` objects: `pandas.core.indexes.base.Index` (generally returned by a `columns` attribute), `pandas.core.indexes.range.RangeIndex` (generally returned by an `index` attribute when no explicit index was given), `pandas.core.indexing._LocIndexer` (returned by a `loc` attribute), and `pandas.core.indexing._iLocIndexer` (returned by an `iloc` attribute).
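These types can be observed directly, as in the sketch below. The underscore-prefixed indexer classes are private, so their names may change between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [2, 3]})  # no explicit index given

print(isinstance(df.columns, pd.Index))     # True
print(isinstance(df.index, pd.RangeIndex))  # True
print(type(df.loc).__name__)                # _LocIndexer
print(type(df.iloc).__name__)               # _iLocIndexer
```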



----
CategoryRicottone