= Python Pandas =

'''pandas''' is a library for data manipulation.

----

== Example ==

To compute statistics:

{{{
import pandas as pd

s = pd.Series([10, 20, 30])
mu = s.mean()    # arithmetic mean
sigma = s.std()  # sample standard deviation
}}}

To import a CSV file:

{{{
# Load three columns and use UniqueID as the row index
df = pd.read_csv("example.csv", index_col="UniqueID", usecols=["UniqueID", "Height", "Weight"])
}}}

----

== Series ==

The core data type is `pandas.core.series.Series`. A `Series` is a one-dimensional labeled array; by default the labels are integers counting from 0. The [[Python/Builtins/Operators|builtin Python operators]] perform element-wise math.

Ideally a `Series` stores elements of the same data type (a.k.a. [[Python/NumPy/Types#ObjectDType|dtype]]). If an efficient type cannot be inferred, the dtype falls back to `object`.

{{{
import pandas as pd

pd.Series(["foo", "bar", "baz"])
# 0    foo
# 1    bar
# 2    baz
# dtype: object
}}}

`Series` objects have these attributes:

||'''Attribute Name'''||'''Description'''||
||`axes`||list containing the `index`||
||`iloc`||indexer for selection by integer position||
||`index`||`Index` of labels (a `RangeIndex` by default)||
||`is_unique`||are all elements unique?||
||`hasnans`||are any elements [[Python/Builtins/Types#Float|NaN]]?||
||`loc`||indexer for selection by label||
||`shape`||`(rows,)`||
||`size`||count of elements||
||`values`||internal [[Python/NumPy/Types#NDArray|numpy.ndarray]] storing the elements||

These methods are descriptive, rather than being general programming utilities.

||'''Method Names'''||'''Description'''||'''Example'''||
||`describe`||Creates a `Series` with descriptive statistics|| ||
||`head`||First N elements||`s.head(5)`||
||`info`||Prints a concise summary (dtype, non-null count, memory usage)|| ||
||`tail`||Last N elements||`s.tail(5)`||
||`value_counts`||Creates a `Series` with counts of unique values|| ||

These methods create and return a new `Series`.
||'''Method Names'''||'''Description'''||'''Example'''||
||`add`||Element-wise addition|| ||
||`apply`||Element-wise function mapping||`s.apply(len)`||
||`copy`|| || ||
||`div`||Element-wise division|| ||
||`map`||Element-wise value mapping; [[Python/Builtins/Types#Float|NaN]] if no match||`s.map({True: 1})`||
||`mul`||Element-wise multiplication|| ||
||`sort_index`||Sorted by indices||`s.sort_index(ascending=True)`||
||`sort_values`||Sorted by values||`s.sort_values(ascending=True)`||
||`sub`||Element-wise subtraction|| ||

These methods return a value computed from the `Series`, usually a scalar:

||'''Method Names'''||'''Description'''||
||`count`||Count non-missing elements||
||`get`||Element(s) matching the given index; see below||
||`max`|| ||
||`mean`|| ||
||`median`|| ||
||`min`|| ||
||`mode`||Most frequent value(s); returns a `Series`, since ties are possible||
||`product`|| ||
||`std`||Sample standard deviation||
||`sum`|| ||

=== Describe ===

{{{
import pandas as pd

s = pd.Series([1,2,3,4,5,6,7,8,9,10])
s.describe()
# count    10.00000
# mean      5.50000
# std       3.02765
# min       1.00000
# 25%       3.25000
# 50%       5.50000
# 75%       7.75000
# max      10.00000
# dtype: float64
}}}

=== Get ===

The `get` method returns the element(s) whose index matches the given key. The optional `default` argument (`None` if omitted) is returned when the key is not found.

Note that the `get` method can also take a list of indices. A new `Series` is returned only if ''all'' of them are found; otherwise the single `default` value is returned.

----

== Data Frames ==

`pandas.core.frame.DataFrame` builds upon `Series`. For example, the `Series` methods which return a scalar value are instead defined for a `DataFrame` to return a `Series` holding one scalar value per column.
{{{
df = pd.DataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
df.mean()  # column-wise means
# a    1.5
# b    2.5
# dtype: float64
}}}

`DataFrame` objects have these attributes:

||'''Attribute Names'''||'''Description'''||
||`axes`||list containing the `index` and the `columns`||
||`columns`||`Index` of column names||
||`dtypes`||`Series` of dtypes||
||`iloc`||indexer for selection by integer position||
||`index`||`Index` of row labels||
||`loc`||indexer for selection by label||
||`shape`||`(rows, columns)`||
||`size`||count of elements||
||`values`||internal [[Python/NumPy/Types#NDArray|numpy.ndarray]] storing the elements||

----

== Others ==

The module also exposes several implementation details of `Series` and `DataFrame` objects: `pandas.core.indexes.base.Index` (generally returned by a `columns` attribute), `pandas.core.indexes.range.RangeIndex` (generally returned by an `index` attribute when no explicit index was given), `pandas.core.indexing._LocIndexer` (returned by a `loc` attribute), and `pandas.core.indexing._iLocIndexer` (returned by an `iloc` attribute).

----
CategoryRicottone