{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Pandas\n", "\n", "- Methods to deal with tabular data\n", "- These methods are to replicate what `dplyr` in R is capable of\n", "- The `statsmodels` can download R datasets from https://vincentarelbundock.github.io/Rdatasets/datasets.html" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Libraries" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import statsmodels.api as sm\n", "import matplotlib\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Importing/Exporting Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Importing:\n", "\n", "- `pd.read_csv(filename)`: From a CSV file\n", "- `pd.read_table(filename)`: From a delimited text file (like TSV)\n", "- `pd.read_excel(filename)`: From an Excel file\n", "- `pd.read_sql(query, connection_object)`: Read from a SQL table/database\n", "- `pd.read_json(json_string)`: Read from a JSON formatted string, URL or file.\n", "- `pd.read_html(url)`: Parses an html URL, string or file and extracts tables to a list of dataframes\n", "- `pd.read_clipboard()`: Takes the contents of your clipboard and passes it to read_table()\n", "- `pd.DataFrame(dict)`: From a dict, keys for columns names, values for data as lists\n", "- `pd.DataFrame(list of tuples)`: From a list, which includes the records of each row\n", "\n", "Exporting:\n", "\n", "- `df.to_csv(filename)` \n", "- `df.to_excel(filename)` \n", "- `df.to_sql(table_name, connection_object)` \n", "- `df.to_json(filename)`" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "DEMO_DATA_DIR = '../../../RepositoryData/data/titanic/'\n", "iris = sm.datasets.get_rdataset('iris').data\n", "titanic = pd.read_csv(DEMO_DATA_DIR+'train.csv')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
01234
15678
29101112
\n", "
" ], "text/plain": [ " A B C D\n", "0 1 2 3 4\n", "1 5 6 7 8\n", "2 9 10 11 12" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x= [(1,2,3,4),\n", " (5,6,7,8),\n", " (9,10,11,12)]\n", "pd.DataFrame(x,columns=[\"A\",\"B\",\"C\",\"D\"])" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABC
0159
12610
23711
34812
\n", "
" ], "text/plain": [ " A B C\n", "0 1 5 9\n", "1 2 6 10\n", "2 3 7 11\n", "3 4 8 12" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = {\"A\":[1,2,3,4],\n", " \"B\":[5,6,7,8],\n", " \"C\":[9,10,11,12]}\n", "pd.DataFrame(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{note}\n", "When you have data of the **columns**, use **dict**; when you have the data of the **rows**, use **list** as the source data structures of a data frame.\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Inspecting Data Frame" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- `df.head(n)`: First n rows of the DataFrame\n", "- `df.tail(n)`: Last n rows of the DataFrame\n", "- `df.shape`: Number of rows and columns\n", "- `df.info()`: Index, Datatype and Memory information\n", "- `df.describe()`: Summary statistics for numerical columns\n", "- `s.value_counts(dropna=False)`: View unique values and counts\n", "- `df.apply(pd.Series.value_counts)`: Unique values and counts for all columns\n", "- `df.columns`\n", "- `df.index`\n", "- `df.dtypes`\n", "- `df.set_index('column_name')`: Set a column as the index " ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 150 entries, 0 to 149\n", "Data columns (total 5 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 Sepal.Length 150 non-null float64\n", " 1 Sepal.Width 150 non-null float64\n", " 2 Petal.Length 150 non-null float64\n", " 3 Petal.Width 150 non-null float64\n", " 4 Species 150 non-null object \n", "dtypes: float64(4), object(1)\n", "memory usage: 6.0+ KB\n" ] } ], "source": [ "iris.info()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Sepal.LengthSepal.WidthPetal.LengthPetal.Width
count150.000000150.000000150.000000150.000000
mean5.8433333.0573333.7580001.199333
std0.8280660.4358661.7652980.762238
min4.3000002.0000001.0000000.100000
25%5.1000002.8000001.6000000.300000
50%5.8000003.0000004.3500001.300000
75%6.4000003.3000005.1000001.800000
max7.9000004.4000006.9000002.500000
\n", "
" ], "text/plain": [ " Sepal.Length Sepal.Width Petal.Length Petal.Width\n", "count 150.000000 150.000000 150.000000 150.000000\n", "mean 5.843333 3.057333 3.758000 1.199333\n", "std 0.828066 0.435866 1.765298 0.762238\n", "min 4.300000 2.000000 1.000000 0.100000\n", "25% 5.100000 2.800000 1.600000 0.300000\n", "50% 5.800000 3.000000 4.350000 1.300000\n", "75% 6.400000 3.300000 5.100000 1.800000\n", "max 7.900000 4.400000 6.900000 2.500000" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "iris.describe()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "code_folding": [], "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(150, 5)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Sepal.LengthSepal.WidthPetal.LengthPetal.WidthSpecies
05.13.51.40.2setosa
14.93.01.40.2setosa
24.73.21.30.2setosa
\n", "
" ], "text/plain": [ " Sepal.Length Sepal.Width Petal.Length Petal.Width Species\n", "0 5.1 3.5 1.4 0.2 setosa\n", "1 4.9 3.0 1.4 0.2 setosa\n", "2 4.7 3.2 1.3 0.2 setosa" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print(iris.shape)\n", "iris.head(3)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "code_folding": [], "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
88888903Johnston, Miss. Catherine Helen \"Carrie\"femaleNaN12W./C. 660723.45NaNS
88989011Behr, Mr. Karl Howellmale26.00011136930.00C148C
89089103Dooley, Mr. Patrickmale32.0003703767.75NaNQ
\n", "
" ], "text/plain": [ " PassengerId Survived Pclass Name \\\n", "888 889 0 3 Johnston, Miss. Catherine Helen \"Carrie\" \n", "889 890 1 1 Behr, Mr. Karl Howell \n", "890 891 0 3 Dooley, Mr. Patrick \n", "\n", " Sex Age SibSp Parch Ticket Fare Cabin Embarked \n", "888 female NaN 1 2 W./C. 6607 23.45 NaN S \n", "889 male 26.0 0 0 111369 30.00 C148 C \n", "890 male 32.0 0 0 370376 7.75 NaN Q " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic.tail(3)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "virginica 50\n", "versicolor 50\n", "setosa 50\n", "Name: Species, dtype: int64" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "iris['Species'].value_counts()" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0NaN549.0NaNNaNNaNNaN608.0678.0NaN15.0NaNNaN
0.42NaNNaNNaNNaNNaN1.0NaNNaNNaNNaNNaNNaN
0.67NaNNaNNaNNaNNaN1.0NaNNaNNaNNaNNaNNaN
0.75NaNNaNNaNNaNNaN2.0NaNNaNNaNNaNNaNNaN
0.83NaNNaNNaNNaNNaN2.0NaNNaNNaNNaNNaNNaN
.......................................
del Carlo, Mr. SebastianoNaNNaNNaN1.0NaNNaNNaNNaNNaNNaNNaNNaN
femaleNaNNaNNaNNaN314.0NaNNaNNaNNaNNaNNaNNaN
maleNaNNaNNaNNaN577.0NaNNaNNaNNaNNaNNaNNaN
van Billiard, Mr. Austin BlylerNaNNaNNaN1.0NaNNaNNaNNaNNaNNaNNaNNaN
van Melkebeke, Mr. PhilemonNaNNaNNaN1.0NaNNaNNaNNaNNaNNaNNaNNaN
\n", "

2848 rows × 12 columns

\n", "
" ], "text/plain": [ " PassengerId Survived Pclass Name Sex \\\n", "0 NaN 549.0 NaN NaN NaN \n", "0.42 NaN NaN NaN NaN NaN \n", "0.67 NaN NaN NaN NaN NaN \n", "0.75 NaN NaN NaN NaN NaN \n", "0.83 NaN NaN NaN NaN NaN \n", "... ... ... ... ... ... \n", "del Carlo, Mr. Sebastiano NaN NaN NaN 1.0 NaN \n", "female NaN NaN NaN NaN 314.0 \n", "male NaN NaN NaN NaN 577.0 \n", "van Billiard, Mr. Austin Blyler NaN NaN NaN 1.0 NaN \n", "van Melkebeke, Mr. Philemon NaN NaN NaN 1.0 NaN \n", "\n", " Age SibSp Parch Ticket Fare Cabin \\\n", "0 NaN 608.0 678.0 NaN 15.0 NaN \n", "0.42 1.0 NaN NaN NaN NaN NaN \n", "0.67 1.0 NaN NaN NaN NaN NaN \n", "0.75 2.0 NaN NaN NaN NaN NaN \n", "0.83 2.0 NaN NaN NaN NaN NaN \n", "... ... ... ... ... ... ... \n", "del Carlo, Mr. Sebastiano NaN NaN NaN NaN NaN NaN \n", "female NaN NaN NaN NaN NaN NaN \n", "male NaN NaN NaN NaN NaN NaN \n", "van Billiard, Mr. Austin Blyler NaN NaN NaN NaN NaN NaN \n", "van Melkebeke, Mr. Philemon NaN NaN NaN NaN NaN NaN \n", "\n", " Embarked \n", "0 NaN \n", "0.42 NaN \n", "0.67 NaN \n", "0.75 NaN \n", "0.83 NaN \n", "... ... \n", "del Carlo, Mr. Sebastiano NaN \n", "female NaN \n", "male NaN \n", "van Billiard, Mr. Austin Blyler NaN \n", "van Melkebeke, Mr. Philemon NaN \n", "\n", "[2848 rows x 12 columns]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic.apply(pd.Series.value_counts)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Index(['Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width',\n", " 'Species'],\n", " dtype='object')\n", "Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',\n", " 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],\n", " dtype='object')\n", "RangeIndex(start=0, stop=150, step=1)\n" ] } ], "source": [ "print(iris.columns)\n", "print(titanic.columns)\n", "print(iris.index)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Sepal.Length float64\n", "Sepal.Width float64\n", "Petal.Length float64\n", "Petal.Width float64\n", "Species object\n", "dtype: object\n", "PassengerId int64\n", "Survived int64\n", "Pclass int64\n", "Name object\n", "Sex object\n", "Age float64\n", "SibSp int64\n", "Parch int64\n", "Ticket object\n", "Fare float64\n", "Cabin object\n", "Embarked object\n", "dtype: object\n" ] } ], "source": [ "print(iris.dtypes)\n", "print(titanic.dtypes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic Functions" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "code_folding": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 150 entries, 0 to 149\n", "Data columns (total 5 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 Sepal.Length 150 non-null float64\n", " 1 Sepal.Width 150 non-null float64\n", " 2 Petal.Length 150 non-null float64\n", " 3 Petal.Width 150 non-null float64\n", " 4 Species 150 non-null object \n", "dtypes: float64(4), object(1)\n", "memory usage: 6.0+ KB\n" ] }, { "data": { "text/plain": [ "Sepal.Length float64\n", "Sepal.Width float64\n", "Petal.Length float64\n", "Petal.Width float64\n", "Species object\n", "dtype: object" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## DataFrame attributes\n", "iris.shape\n", "iris.columns\n", "iris.index\n", "iris.info()\n", "iris.describe()\n", "iris.dtypes # check column data types\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Subsetting Data Frame" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- `df[col]`: Returns column with label col as Series\n", "- `df[[col1, col2]]`: Returns columns as a new DataFrame\n", "- `s.iloc[0]`: Selection by position\n", "- `s.loc['index_one']`: Selection by index\n", "- `df.iloc[0,:]`: First row\n", "- `df.iloc[0,0]`: First element of first column" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 setosa\n", "1 setosa\n", "2 setosa\n", "3 setosa\n", "4 setosa\n", "5 setosa\n", "Name: Species, dtype: object" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "iris.loc[:5, 'Species'] # first six rows of 'Species' column" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 setosa\n", "1 setosa\n", "2 setosa\n", "3 setosa\n", "4 setosa\n", "Name: Species, dtype: object" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "iris.iloc[:5, 4] # same as above" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exploration\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How to perform the key functions provided in R `dplyr`?\n", "\n", "- `dplyr` Key Verbs\n", " - `filter()`\n", " - `select()`\n", " - `mutate()`\n", " - `arrange()`\n", " - `summarize()`\n", " - `group_by()`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### NA Values\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Functions to take care of `NA` values:\n", " \n", "- `df.isnull()`\n", "- `df.notnull()`\n", "- `df.dropna()`: Drop rows with null values\n", "- `df.dropna(axis=1)`: Drop columns with null values\n", "- `df.dropna(axis=1, thresh=n)`: Drop all columns have less than n non-values\n", "- `df.fillna(x)`: Replaces all null values with `x`\n", "- `s.fillna(s.mean())`: Replace the null values of a Series with its mean score" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Quick check of the null values in each column" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "PassengerId 0\n", "Survived 0\n", "Pclass 0\n", "Name 0\n", "Sex 0\n", "Age 177\n", "SibSp 0\n", "Parch 0\n", "Ticket 0\n", "Fare 0\n", "Cabin 687\n", "Embarked 2\n", "dtype: int64" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic.isnull().sum()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500S
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250S
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000S
4503Allen, Mr. William Henrymale35.0003734508.0500S
....................................
88688702Montvila, Rev. Juozasmale27.00021153613.0000S
88788811Graham, Miss. Margaret Edithfemale19.00011205330.0000S
88888903Johnston, Miss. Catherine Helen \"Carrie\"femaleNaN12W./C. 660723.4500S
88989011Behr, Mr. Karl Howellmale26.00011136930.0000C
89089103Dooley, Mr. Patrickmale32.0003703767.7500Q
\n", "

891 rows × 11 columns

\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "0 1 0 3 \n", "1 2 1 1 \n", "2 3 1 3 \n", "3 4 1 1 \n", "4 5 0 3 \n", ".. ... ... ... \n", "886 887 0 2 \n", "887 888 1 1 \n", "888 889 0 3 \n", "889 890 1 1 \n", "890 891 0 3 \n", "\n", " Name Sex Age SibSp \\\n", "0 Braund, Mr. Owen Harris male 22.0 1 \n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n", "2 Heikkinen, Miss. Laina female 26.0 0 \n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n", "4 Allen, Mr. William Henry male 35.0 0 \n", ".. ... ... ... ... \n", "886 Montvila, Rev. Juozas male 27.0 0 \n", "887 Graham, Miss. Margaret Edith female 19.0 0 \n", "888 Johnston, Miss. Catherine Helen \"Carrie\" female NaN 1 \n", "889 Behr, Mr. Karl Howell male 26.0 0 \n", "890 Dooley, Mr. Patrick male 32.0 0 \n", "\n", " Parch Ticket Fare Embarked \n", "0 0 A/5 21171 7.2500 S \n", "1 0 PC 17599 71.2833 C \n", "2 0 STON/O2. 3101282 7.9250 S \n", "3 0 113803 53.1000 S \n", "4 0 373450 8.0500 S \n", ".. ... ... ... ... \n", "886 0 211536 13.0000 S \n", "887 0 112053 30.0000 S \n", "888 2 W./C. 6607 23.4500 S \n", "889 0 111369 30.0000 C \n", "890 0 370376 7.7500 Q \n", "\n", "[891 rows x 11 columns]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic.dropna(axis=1, thresh=600)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "PassengerId 891\n", "Survived 891\n", "Pclass 891\n", "Name 891\n", "Sex 891\n", "Age 714\n", "SibSp 891\n", "Parch 891\n", "Ticket 891\n", "Fare 891\n", "Cabin 204\n", "Embarked 889\n", "dtype: int64" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic.notnull().sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Converting Data Types" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- `s.astype(float)`: Convert a Series into a `float` type\n" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Sepal.Length float64\n", "Sepal.Width float64\n", "Petal.Length float64\n", "Petal.Width float64\n", "Species object\n", "dtype: object" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "iris.dtypes" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Sepal.Length float64\n", "Sepal.Width float64\n", "Petal.Length float64\n", "Petal.Width float64\n", "Species category\n", "dtype: object" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "iris['Species']=iris['Species'].astype('category')\n", "iris.dtypes\n", "#iris.value_counts(iris['Species']).plot.bar()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pandas-supported Data Types\n", "\n", "![pandas-dtypes](../images/pandas-dtypes.png)\n", "\n", "([source](https://pbpython.com/pandas_dtypes.html))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Transformation\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- `s.replace(X, Y)`" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedAgeSibSpParchFare
SexPclass
female1525.3750000.87500014.1250000.5000000.875000104.083337
2369.2500001.0000008.3333330.5833331.08333326.241667
3374.9428570.5428578.4285711.5714291.05714318.727977
male1526.5000001.0000008.2300000.5000002.000000116.072900
2527.8181820.8181824.7572730.7272731.00000025.659473
3437.9534880.2325589.9632562.0697671.00000022.752523
\n", "
" ], "text/plain": [ " PassengerId Survived Age SibSp Parch \\\n", "Sex Pclass \n", "female 1 525.375000 0.875000 14.125000 0.500000 0.875000 \n", " 2 369.250000 1.000000 8.333333 0.583333 1.083333 \n", " 3 374.942857 0.542857 8.428571 1.571429 1.057143 \n", "male 1 526.500000 1.000000 8.230000 0.500000 2.000000 \n", " 2 527.818182 0.818182 4.757273 0.727273 1.000000 \n", " 3 437.953488 0.232558 9.963256 2.069767 1.000000 \n", "\n", " Fare \n", "Sex Pclass \n", "female 1 104.083337 \n", " 2 26.241667 \n", " 3 18.727977 \n", "male 1 116.072900 \n", " 2 25.659473 \n", " 3 22.752523 " ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEDCAYAAADOc0QpAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAOOElEQVR4nO3df6zddX3H8edLKrrAwg+5a7q2rGZ2M5hNxBuGc8mc7AeosSQDhnGzkibNElwwbtFu+2OamAX/kUE2yZphVonDMjZDg0QlRbLNDeUyK4hMuWOwtgK9IOAYkw1874/7aTxebnvOvffce+mH5yO5Od/v5/s553wOaZ799ss556aqkCT15WWrvQBJ0vgZd0nqkHGXpA4Zd0nqkHGXpA4Zd0nq0JrVXgDAaaedVps2bVrtZUjSMeWuu+56rKom5jv2ooj7pk2bmJqaWu1lSNIxJclDRzrmZRlJ6pBxl6QOGXdJ6pBxl6QOGXdJ6pBxl6QOGXdJ6pBxl6QOvSg+xHSs2LTjc6u9hK48eMXbV3sJUrc8c5ekDhl3SeqQcZekDhl3SeqQcZekDhl3SeqQcZekDhl3SeqQcZekDhl3SeqQcZekDhl3SeqQcZekDhl3SerQSHFP8mCSe5LsSzLVxk5NcmuS+9vtKW08Sa5OMp3k7iRnLecLkCS90ELO3H+lqs6sqsm2vwPYW1Wbgb1tH+B8YHP72Q5cM67FSpJGs5TLMluAXW17F3DBwPinatYdwMlJ1i3heSRJCzRq3Av4YpK7kmxvY2ur6uG2/Qiwtm2vB/YP3PdAG5MkrZBRf83eL1XVwSQ/Adya5N8GD1ZVJamFPHH7S2I7wOmnn76Qu0qShhjpzL2qDrbbQ8BngbOBRw9fbmm3h9r0g8DGgbtvaGNzH3NnVU1W1eTExMTiX4Ek6QWGxj3JCUl+/PA28OvAN4A9wNY2bStwU9veA7ynvWvmHOCpgcs3kqQVMMplmbXAZ5Mcnv83VfX5JHcCNyTZBjwEXNzm3wK8DZgGngEuHfuqJUlHNTTuVfUA8Pp5xh8Hzp1nvIDLxrI6SdKi+AlVSeqQcZekDhl3SeqQcZekDhl3SeqQcZekDhl3SeqQcZekDhl3SeqQcZekDhl3SeqQcZekDhl3SeqQcZekDhl3SeqQcZekDhl3SeqQcZekDhl3SeqQcZekDhl3SeqQcZekDhl3SeqQcZekDhl3SeqQcZekDhl3SeqQcZekDhl3SerQyHFPclySryW5ue2/OslXkkwn2Z3k+Db+irY/3Y5vWqa1S5KOYCFn7pcD9w3sfwy4sqpeAzwBbGvj24An2viVbZ4kaQWNFPckG4C3A3/V9gO8FbixTdkFXNC2t7R92vFz23xJ0goZ9cz9z4APAj9o+68Cnqyq59r+AWB9214P7Adox59q8yVJK2Ro3JO8AzhUVXeN84mTbE8ylWRqZmZmnA8tSS95o5y5vxl4Z5IHgc8weznmKuDkJGvanA3AwbZ9ENgI0I6fBDw+90GramdVTVbV5MTExJJehCTpRw2Ne1X9YVVtqKpNwCXAbVX1buBLwIVt2lbgpra9p+3Tjt9WVTXWVUuSjmop73P/EPCBJNPMXlO/to1fC7yqjX8A2LG0JUqSFmrN8Ck/VFW3A7e37QeAs+eZ833gojGsTZK0SH5CVZI6ZNwlqUPGXZI6ZNwlqUPGXZI6ZNwlqUPGXZI6ZNwlqUPGXZI6ZNwlqUPGXZI6ZNwlqUPGXZI6ZNwlqUPGXZI6ZNwlqUML+mUdkl6cNu343GovoSsPXvH21V7CknnmLkkdMu6S1CHjLkkdMu6S1CHjLkkdMu6S1CHjLkkdMu6S1CHjLkkdMu6S1CHjLkkdGhr3JK9M8tUkX09yb5KPtPFXJ/lKkukku5Mc38Zf0fan2/FNy/waJElzjHLm/izw1qp6PXAmcF6Sc4CPAVdW1WuAJ4Btbf424Ik2fmWbJ0laQUPjXrOebrsvbz8FvBW4sY3vAi5o21vaPu34uUkyrgVLkoYb6Zp7kuOS7AMOAbcC/w48WVXPtSkHgPVtez2wH6Adfwp41RjXLEkaYqS4V9XzVXUmsAE4G3jtUp84yfYkU0mmZmZmlvpwkqQBC3q3TFU9CXwJeBNwcpLDv+xjA3CwbR8ENgK04ycBj8/zWDurarKqJicmJha3eknSvEZ5t8xEkpPb9o8Bvwbcx2zkL2zTtgI3te09bZ92/LaqqjGuWZI0xCi/Zm8dsCvJccz+ZXBDVd2c5JvAZ5J8FPgacG2bfy1wXZJp4LvAJcuwbknSUQyNe1XdDbxhnvEHmL3+Pnf8+8BFY1mdJGlR/ISqJHXIuEtSh4y7JHXIuEtSh4y7JHXIuEtSh4y7JHXIuEtSh4y7JHXIuEtSh4y7JHXIuEtSh4y7JHXIuEtSh4y7JHXIuEtSh4y7JHXIuEtSh4y7JHXIuEtSh4y7JHXIuEtSh4y7JHXIuEtSh4y7JHXIuEtSh4y7JHXIuEtSh4y7JHVoaNyTbEzypSTfTHJvksvb+KlJbk1yf7s9pY0nydVJppPcneSs5X4RkqQfNcqZ+3PA71fVGcA5wGVJzgB2AHurajOwt+0DnA9sbj/bgWvGvmpJ0lENjXtVPVxV/9q2/wu4D1gPbAF2tWm7gAva9hbgUzXrDuDkJOvGvXBJ0pEt6Jp7kk3AG4CvAGur6uF26BFgbdteD+wfuNuBNiZJWiEjxz3JicDfAe+vqu8NHquqAmohT5xke5KpJFMzMzMLuaskaYiR4p7k5cyG/dNV9fdt+NHDl1va7aE2fhDYOHD3DW3sR1TVzqqarKrJiYmJxa5fkjSPUd4tE+Ba4L6q+vjAoT3A1ra9FbhpYPw97V0z5wBPDVy+kSStgDUjzHkz8DvAPUn2tbE/Aq4AbkiyDXgIuLgduwV4GzANPANcOs4FS5KGGxr3qvonIEc4fO488wu4bInrkiQtgZ9QlaQOGXdJ6pBxl6QOGXdJ6pBxl6QOGXdJ6pBxl6QOGXdJ6pBxl6QOGXdJ6pBxl6QOGXdJ6pBxl6QOGXdJ6pBxl6QOGXdJ6pBxl6QOGXdJ6pBxl6QOGXdJ6pBxl6QOGXdJ6pBxl6QOGXdJ6pBxl6QOGXdJ6pBxl6QOGXdJ6pBxl6QODY17kk8mOZTkGwNjpya5Ncn97faUNp4kVyeZTnJ3krOWc/GSpPmNcub+18B5c8Z2AHurajOwt+0DnA9sbj/bgWvGs0xJ0kIMjXtV/QPw3TnDW4BdbXsXcMHA+Kdq1h3AyUnWjWmtkqQRLfaa+9qqerhtPwKsbdvrgf0D8w60MUnSClry/1CtqgJqofdLsj3JVJKpmZmZpS5DkjRgsXF/9PDllnZ7qI0fBDYOzNvQxl6gqnZW1WRVTU5MTCxyGZKk+Sw27nuArW17K3DTwPh72rtmzgGeGrh8I0laIWuGTUhyPfAW4LQkB4A/Aa4AbkiyDXgIuLhNvwV4GzANPANcugxrliQNMTTuVfWuIxw6d565BVy21EVJkpbGT6hKUoeMuyR1yLhLUoeMuyR1yLhLUoeMuyR1yLhLUoeMuyR1yLhLUoeMuyR1yLhLUoeMuyR1yLhLUoeMuyR1yLhLUoeMuyR1yLhLUoeMuyR1yLhLUoeMuyR1yLhLUoeMuyR1yLhLUoeMuyR1yLhLUoeMuyR1yLhLUoeMuyR1yLhLUoeWJe5JzkvyrSTTSXYsx3NIko5s7HFPchzwF8D5wBnAu5KcMe7nkSQd2XKcuZ8NTFfVA1X1v8BngC3L8DySpCNYswyPuR7YP7B/APiFuZOSbAe2t92nk3xrGdbyUnUa8NhqL2KYfGy1V6BV4J/N8fqpIx1YjriPpKp2AjtX6/l7lmSqqiZXex3SXP7ZXDnLcVnmILBxYH9DG5MkrZDliPudwOYkr05yPHAJsGcZnkeSdARjvyxTVc8leR/wBeA44JNVde+4n0dH5eUuvVj5Z3OFpKpWew2SpDHzE6qS1CHjLkkdMu6S1KFVe5+7xiPJa5n9BPD6NnQQ2FNV963eqiStNs/cj2FJPsTs1zsE+Gr7CXC9X9imF7Mkl672Gnrnu2WOYUm+Dbyuqv5vzvjxwL1VtXl1ViYdXZL/rKrTV3sdPfOyzLHtB8BPAg/NGV/XjkmrJsndRzoErF3JtbwUGfdj2/uBvUnu54df1nY68Brgfau1KKlZC/wG8MSc8QD/vPLLeWkx7sewqvp8kp9h9muWB/+H6p1V9fzqrUwC4GbgxKraN/dAkttXfDUvMV5zl6QO+W4ZSeqQcZekDhl3dSXJHye5N8ndSfYlecFvAVvEY75zXJ8bSPL0OB5HGsZr7upGkjcBHwfeUlXPJjkNOL6qvjPCfddU1XMrsManq+rE5X4eyTN39WQd8FhVPQtQVY9V1XeSPNhCT5LJw+/USPLhJNcl+TJwXZI7krzu8IMlub3Nf2+SP09yUpKHkrysHT8hyf4kL0/y00k+n+SuJP/YvhaC9ktr/iXJPUk+usL/PfQSZtzVky8CG5N8O8knkvzyCPc5A/jVqnoXsBu4GCDJOmBdVU0dnlhVTwH7gMOP+w7gC+0TwjuB36uqNwJ/AHyizbkKuKaqfg54eKkvUBqVcVc3qupp4I3AdmAG2J3kvUPutqeq/qdt3wBc2LYvBm6cZ/5u4Lfa9iXtOU4EfhH42yT7gL9k9l8RAG8Grm/b1y3k9UhL4YeY1JX24a3bgduT3ANsBZ7jhycyr5xzl/8euO/BJI8n+XlmA/678zzFHuBPk5zK7F8ktwEnAE9W1ZlHWtbiXo20eJ65qxtJfjbJ4Jelncns9+48yGyIAX5zyMPsBj4InFRVL/hulPavgzuZvdxyc1U9X1XfA/4jyUVtHUny+naXLzN7hg/w7gW/KGmRjLt6ciKwK8k325dWnQF8GPgIcFWSKWDY1zLcyGyMbzjKnN3Ab7fbw94NbEvydeBeZr9jH+By4LL2r4j1SCvEt0JKUoc8c5ekDhl3SeqQcZekDhl3SeqQcZekDhl3SeqQcZekDhl3SerQ/wOWfj4h7zfwwwAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "titanic.head()\n", "titanic.value_counts(titanic['Survived']).plot.bar()\n", "titanic.columns\n", "titanic.groupby(['Sex','Pclass']).mean()\n", "titanic[titanic['Age']<18].groupby(['Sex','Pclass']).mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `filter()`" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Sepal.LengthSepal.WidthPetal.LengthPetal.WidthSpecies
05.13.51.40.2setosa
55.43.91.70.4setosa
105.43.71.50.2setosa
145.84.01.20.2setosa
155.74.41.50.4setosa
..................
1456.73.05.22.3virginica
1466.32.55.01.9virginica
1476.53.05.22.0virginica
1486.23.45.42.3virginica
1495.93.05.11.8virginica
\n", "

118 rows × 5 columns

\n", "
" ], "text/plain": [ " Sepal.Length Sepal.Width Petal.Length Petal.Width Species\n", "0 5.1 3.5 1.4 0.2 setosa\n", "5 5.4 3.9 1.7 0.4 setosa\n", "10 5.4 3.7 1.5 0.2 setosa\n", "14 5.8 4.0 1.2 0.2 setosa\n", "15 5.7 4.4 1.5 0.4 setosa\n", ".. ... ... ... ... ...\n", "145 6.7 3.0 5.2 2.3 virginica\n", "146 6.3 2.5 5.0 1.9 virginica\n", "147 6.5 3.0 5.2 2.0 virginica\n", "148 6.2 3.4 5.4 2.3 virginica\n", "149 5.9 3.0 5.1 1.8 virginica\n", "\n", "[118 rows x 5 columns]" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## filter\n", "iris[iris['Sepal.Length']>5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{note}\n", "When there are more than one filtering condition, put the conditions in parentheses.\n", "```" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Sepal.LengthSepal.WidthPetal.LengthPetal.WidthSpecies
\n", "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: [Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species]\n", "Index: []" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "iris[(iris['Sepal.Length']>4) & (iris['Sepal.Width']>5)]" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Sepal.LengthSepal.WidthPetal.LengthPetal.WidthSpecies
05.13.51.40.2setosa
55.43.91.70.4setosa
105.43.71.50.2setosa
145.84.01.20.2setosa
155.74.41.50.4setosa
..................
1456.73.05.22.3virginica
1466.32.55.01.9virginica
1476.53.05.22.0virginica
1486.23.45.42.3virginica
1495.93.05.11.8virginica
\n", "

118 rows × 5 columns

\n", "
" ], "text/plain": [ " Sepal.Length Sepal.Width Petal.Length Petal.Width Species\n", "0 5.1 3.5 1.4 0.2 setosa\n", "5 5.4 3.9 1.7 0.4 setosa\n", "10 5.4 3.7 1.5 0.2 setosa\n", "14 5.8 4.0 1.2 0.2 setosa\n", "15 5.7 4.4 1.5 0.4 setosa\n", ".. ... ... ... ... ...\n", "145 6.7 3.0 5.2 2.3 virginica\n", "146 6.3 2.5 5.0 1.9 virginica\n", "147 6.5 3.0 5.2 2.0 virginica\n", "148 6.2 3.4 5.4 2.3 virginica\n", "149 5.9 3.0 5.1 1.8 virginica\n", "\n", "[118 rows x 5 columns]" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "iris.query('`Sepal.Length`>5')" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Sepal.LengthSepal.WidthPetal.LengthPetal.WidthSpecies
155.74.41.50.4setosa
325.24.11.50.1setosa
335.54.21.40.2setosa
\n", "
" ], "text/plain": [ " Sepal.Length Sepal.Width Petal.Length Petal.Width Species\n", "15 5.7 4.4 1.5 0.4 setosa\n", "32 5.2 4.1 1.5 0.1 setosa\n", "33 5.5 4.2 1.4 0.2 setosa" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "iris[(iris['Sepal.Length']>5) & (iris['Sepal.Width']>4)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `arrange()`" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Sepal.LengthSepal.WidthPetal.LengthPetal.WidthSpecies
1064.92.54.51.7virginica
1215.62.84.92.0virginica
1135.72.55.02.0virginica
1015.82.75.11.9virginica
1145.82.85.12.4virginica
..................
335.54.21.40.2setosa
365.53.51.30.2setosa
155.74.41.50.4setosa
185.73.81.70.3setosa
145.84.01.20.2setosa
\n", "

150 rows × 5 columns

\n", "
" ], "text/plain": [ " Sepal.Length Sepal.Width Petal.Length Petal.Width Species\n", "106 4.9 2.5 4.5 1.7 virginica\n", "121 5.6 2.8 4.9 2.0 virginica\n", "113 5.7 2.5 5.0 2.0 virginica\n", "101 5.8 2.7 5.1 1.9 virginica\n", "114 5.8 2.8 5.1 2.4 virginica\n", ".. ... ... ... ... ...\n", "33 5.5 4.2 1.4 0.2 setosa\n", "36 5.5 3.5 1.3 0.2 setosa\n", "15 5.7 4.4 1.5 0.4 setosa\n", "18 5.7 3.8 1.7 0.3 setosa\n", "14 5.8 4.0 1.2 0.2 setosa\n", "\n", "[150 rows x 5 columns]" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "iris.sort_values(['Species','Sepal.Length'], ascending=[False,True])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `select()`" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Sepal.LengthSpecies
05.1setosa
14.9setosa
24.7setosa
34.6setosa
45.0setosa
.........
1456.7virginica
1466.3virginica
1476.5virginica
1486.2virginica
1495.9virginica
\n", "

150 rows × 2 columns

\n", "
" ], "text/plain": [ " Sepal.Length Species\n", "0 5.1 setosa\n", "1 4.9 setosa\n", "2 4.7 setosa\n", "3 4.6 setosa\n", "4 5.0 setosa\n", ".. ... ...\n", "145 6.7 virginica\n", "146 6.3 virginica\n", "147 6.5 virginica\n", "148 6.2 virginica\n", "149 5.9 virginica\n", "\n", "[150 rows x 2 columns]" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## select\n", "iris[['Sepal.Length', 'Species']]" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Sepal.WidthPetal.LengthPetal.WidthSpecies
03.51.40.2setosa
13.01.40.2setosa
23.21.30.2setosa
33.11.50.2setosa
43.61.40.2setosa
\n", "
" ], "text/plain": [ " Sepal.Width Petal.Length Petal.Width Species\n", "0 3.5 1.4 0.2 setosa\n", "1 3.0 1.4 0.2 setosa\n", "2 3.2 1.3 0.2 setosa\n", "3 3.1 1.5 0.2 setosa\n", "4 3.6 1.4 0.2 setosa" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## deselect columns\n", "iris.drop(['Sepal.Length'], axis=1).head()" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SpeciesSepal.Length
0setosa5.1
1setosa4.9
2setosa4.7
3setosa4.6
4setosa5.0
.........
145virginica6.7
146virginica6.3
147virginica6.5
148virginica6.2
149virginica5.9
\n", "

150 rows × 2 columns

\n", "
" ], "text/plain": [ " Species Sepal.Length\n", "0 setosa 5.1\n", "1 setosa 4.9\n", "2 setosa 4.7\n", "3 setosa 4.6\n", "4 setosa 5.0\n", ".. ... ...\n", "145 virginica 6.7\n", "146 virginica 6.3\n", "147 virginica 6.5\n", "148 virginica 6.2\n", "149 virginica 5.9\n", "\n", "[150 rows x 2 columns]" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "iris.filter(['Species','Sepal.Length'])" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SpeciesSepal.Length
0setosa5.1
1setosa4.9
2setosa4.7
3setosa4.6
4setosa5.0
.........
145virginica6.7
146virginica6.3
147virginica6.5
148virginica6.2
149virginica5.9
\n", "

150 rows × 2 columns

\n", "
" ], "text/plain": [ " Species Sepal.Length\n", "0 setosa 5.1\n", "1 setosa 4.9\n", "2 setosa 4.7\n", "3 setosa 4.6\n", "4 setosa 5.0\n", ".. ... ...\n", "145 virginica 6.7\n", "146 virginica 6.3\n", "147 virginica 6.5\n", "148 virginica 6.2\n", "149 virginica 5.9\n", "\n", "[150 rows x 2 columns]" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "iris[['Species','Sepal.Length']]" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pandas.core.series.Series" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## extract one particular column\n", "sepal_length = iris['Sepal.Length']\n", "type(sepal_length)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `mutate()`" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Sepal.LengthSepal.WidthPetal.LengthPetal.WidthSpeciesSpecies_newSpecies_initial
05.13.51.40.2setosa6SE
14.93.01.40.2setosa6SE
24.73.21.30.2setosa6SE
34.63.11.50.2setosa6SE
45.03.61.40.2setosa6SE
........................
1456.73.05.22.3virginica9VI
1466.32.55.01.9virginica9VI
1476.53.05.22.0virginica9VI
1486.23.45.42.3virginica9VI
1495.93.05.11.8virginica9VI
\n", "

150 rows × 7 columns

\n", "
" ], "text/plain": [ " Sepal.Length Sepal.Width Petal.Length Petal.Width Species \\\n", "0 5.1 3.5 1.4 0.2 setosa \n", "1 4.9 3.0 1.4 0.2 setosa \n", "2 4.7 3.2 1.3 0.2 setosa \n", "3 4.6 3.1 1.5 0.2 setosa \n", "4 5.0 3.6 1.4 0.2 setosa \n", ".. ... ... ... ... ... \n", "145 6.7 3.0 5.2 2.3 virginica \n", "146 6.3 2.5 5.0 1.9 virginica \n", "147 6.5 3.0 5.2 2.0 virginica \n", "148 6.2 3.4 5.4 2.3 virginica \n", "149 5.9 3.0 5.1 1.8 virginica \n", "\n", " Species_new Species_initial \n", "0 6 SE \n", "1 6 SE \n", "2 6 SE \n", "3 6 SE \n", "4 6 SE \n", ".. ... ... \n", "145 9 VI \n", "146 9 VI \n", "147 9 VI \n", "148 9 VI \n", "149 9 VI \n", "\n", "[150 rows x 7 columns]" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## mutate\n", "iris['Species_new'] = iris['Species'].apply(lambda x: len(x))\n", "iris['Species_initial'] = iris['Species'].apply(lambda x: x[:2].upper())\n", "iris" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Sepal.LengthSepal.WidthPetal.LengthPetal.WidthSpeciesSpecies_newSpecies_initialSpecias_initial2
05.13.51.40.2setosa6SESETOSA
14.93.01.40.2setosa6SESETOSA
24.73.21.30.2setosa6SESETOSA
34.63.11.50.2setosa6SESETOSA
45.03.61.40.2setosa6SESETOSA
...........................
1456.73.05.22.3virginica9VIVIRGINICA
1466.32.55.01.9virginica9VIVIRGINICA
1476.53.05.22.0virginica9VIVIRGINICA
1486.23.45.42.3virginica9VIVIRGINICA
1495.93.05.11.8virginica9VIVIRGINICA
\n", "

150 rows × 8 columns

\n", "
" ], "text/plain": [ " Sepal.Length Sepal.Width Petal.Length Petal.Width Species \\\n", "0 5.1 3.5 1.4 0.2 setosa \n", "1 4.9 3.0 1.4 0.2 setosa \n", "2 4.7 3.2 1.3 0.2 setosa \n", "3 4.6 3.1 1.5 0.2 setosa \n", "4 5.0 3.6 1.4 0.2 setosa \n", ".. ... ... ... ... ... \n", "145 6.7 3.0 5.2 2.3 virginica \n", "146 6.3 2.5 5.0 1.9 virginica \n", "147 6.5 3.0 5.2 2.0 virginica \n", "148 6.2 3.4 5.4 2.3 virginica \n", "149 5.9 3.0 5.1 1.8 virginica \n", "\n", " Species_new Species_initial Specias_initial2 \n", "0 6 SE SETOSA \n", "1 6 SE SETOSA \n", "2 6 SE SETOSA \n", "3 6 SE SETOSA \n", "4 6 SE SETOSA \n", ".. ... ... ... \n", "145 9 VI VIRGINICA \n", "146 9 VI VIRGINICA \n", "147 9 VI VIRGINICA \n", "148 9 VI VIRGINICA \n", "149 9 VI VIRGINICA \n", "\n", "[150 rows x 8 columns]" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## mutate alternative 2\n", "iris.assign(Specias_initial2 = iris['Species'].apply(lambda x: x.upper()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `apply()`, `mutate_if()`\n", "\n", "- `df.apply(np.mean)`: Apply a function to all columns\n", "- `df.apply(np.max,axis=1)`: Apply a function to each row" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{note}\n", "When `apply()` functions to the data frame, the `axis=1` refers to row mutation and `axis=0` refers to column mutation. This is very counter-intuitive for R users.\n", "```" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Sepal.LengthSepal.WidthPetal.LengthPetal.WidthSpeciesSpecies_newSpecies_initial
05.13.51.40.2setosa6SE
14.93.01.40.2setosa6SE
24.73.21.30.2setosa6SE
34.63.11.50.2setosa6SE
45.03.61.40.2setosa6SE
55.43.91.70.4setosa6SE
64.63.41.40.3setosa6SE
75.03.41.50.2setosa6SE
84.42.91.40.2setosa6SE
94.93.11.50.1setosa6SE
\n", "
" ], "text/plain": [ " Sepal.Length Sepal.Width Petal.Length Petal.Width Species Species_new \\\n", "0 5.1 3.5 1.4 0.2 setosa 6 \n", "1 4.9 3.0 1.4 0.2 setosa 6 \n", "2 4.7 3.2 1.3 0.2 setosa 6 \n", "3 4.6 3.1 1.5 0.2 setosa 6 \n", "4 5.0 3.6 1.4 0.2 setosa 6 \n", "5 5.4 3.9 1.7 0.4 setosa 6 \n", "6 4.6 3.4 1.4 0.3 setosa 6 \n", "7 5.0 3.4 1.5 0.2 setosa 6 \n", "8 4.4 2.9 1.4 0.2 setosa 6 \n", "9 4.9 3.1 1.5 0.1 setosa 6 \n", "\n", " Species_initial \n", "0 SE \n", "1 SE \n", "2 SE \n", "3 SE \n", "4 SE \n", "5 SE \n", "6 SE \n", "7 SE \n", "8 SE \n", "9 SE " ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "iris.head(10)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 3.7\n", "1 3.2\n", "2 3.4\n", "3 3.3\n", "4 3.8\n", "5 4.3\n", "6 3.7\n", "7 3.6\n", "8 3.1\n", "9 3.2\n", "dtype: float64" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "iris[['Sepal.Width','Petal.Width']].apply(np.sum, axis=1).head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `group_by()` and `summarize()`" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Sepal.LengthSepal.WidthPetal.LengthPetal.Width
Species
setosa5.0063.4281.4620.246
versicolor5.9362.7704.2601.326
virginica6.5882.9745.5522.026
\n", "
" ], "text/plain": [ " Sepal.Length Sepal.Width Petal.Length Petal.Width\n", "Species \n", "setosa 5.006 3.428 1.462 0.246\n", "versicolor 5.936 2.770 4.260 1.326\n", "virginica 6.588 2.974 5.552 2.026" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "iris.groupby(by='Species').mean()" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Sepal.Length
meancountstd
Species
setosa5.006500.352490
versicolor5.936500.516171
virginica6.588500.635880
\n", "
" ], "text/plain": [ " Sepal.Length \n", " mean count std\n", "Species \n", "setosa 5.006 50 0.352490\n", "versicolor 5.936 50 0.516171\n", "virginica 6.588 50 0.635880" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "iris.filter(['Species','Sepal.Length']).groupby('Species').agg({'Sepal.Length':['mean','count','std']})\n" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS
\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "0 1 0 3 \n", "1 2 1 1 \n", "2 3 1 3 \n", "3 4 1 1 \n", "4 5 0 3 \n", "\n", " Name Sex Age SibSp \\\n", "0 Braund, Mr. Owen Harris male 22.0 1 \n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n", "2 Heikkinen, Miss. Laina female 26.0 0 \n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n", "4 Allen, Mr. William Henry male 35.0 0 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "0 0 A/5 21171 7.2500 NaN S \n", "1 0 PC 17599 71.2833 C85 C \n", "2 0 STON/O2. 3101282 7.9250 NaN S \n", "3 0 113803 53.1000 C123 S \n", "4 0 373450 8.0500 NaN S " ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic.head()" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedAgeSibSpParchFare
PclassSex
1female44106912942.0052439975.8250
male55599454169.4238348201.5875
2female33676702125.5037461669.7292
male48380173043.3337242132.1125
3female57561722218.501291152321.1086
male158064476706.42173784393.5865
\n", "
" ], "text/plain": [ " PassengerId Survived Age SibSp Parch Fare\n", "Pclass Sex \n", "1 female 44106 91 2942.00 52 43 9975.8250\n", " male 55599 45 4169.42 38 34 8201.5875\n", "2 female 33676 70 2125.50 37 46 1669.7292\n", " male 48380 17 3043.33 37 24 2132.1125\n", "3 female 57561 72 2218.50 129 115 2321.1086\n", " male 158064 47 6706.42 173 78 4393.5865" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic.groupby(['Pclass','Sex']).agg(np.sum)" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Survived
PclassSex
1female91
male45
2female70
male17
3female72
male47
\n", "
" ], "text/plain": [ " Survived\n", "Pclass Sex \n", "1 female 91\n", " male 45\n", "2 female 70\n", " male 17\n", "3 female 72\n", " male 47" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic.pivot_table(index=['Pclass','Sex'], values=['Survived'], aggfunc=np.sum)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `rename()`\n" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width', 'Species',\n", " 'Species_new', 'Species_initial'],\n", " dtype='object')" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "iris\n", "iris.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Selective renaming column names" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
SLenSepal.WidthPetal.LengthPetal.WidthSpeciesSpecies_newSpecies_initial
05.13.51.40.2setosa6SE
14.93.01.40.2setosa6SE
24.73.21.30.2setosa6SE
34.63.11.50.2setosa6SE
45.03.61.40.2setosa6SE
........................
1456.73.05.22.3virginica9VI
1466.32.55.01.9virginica9VI
1476.53.05.22.0virginica9VI
1486.23.45.42.3virginica9VI
1495.93.05.11.8virginica9VI
\n", "

150 rows × 7 columns

\n", "
" ], "text/plain": [ " SLen Sepal.Width Petal.Length Petal.Width Species Species_new \\\n", "0 5.1 3.5 1.4 0.2 setosa 6 \n", "1 4.9 3.0 1.4 0.2 setosa 6 \n", "2 4.7 3.2 1.3 0.2 setosa 6 \n", "3 4.6 3.1 1.5 0.2 setosa 6 \n", "4 5.0 3.6 1.4 0.2 setosa 6 \n", ".. ... ... ... ... ... ... \n", "145 6.7 3.0 5.2 2.3 virginica 9 \n", "146 6.3 2.5 5.0 1.9 virginica 9 \n", "147 6.5 3.0 5.2 2.0 virginica 9 \n", "148 6.2 3.4 5.4 2.3 virginica 9 \n", "149 5.9 3.0 5.1 1.8 virginica 9 \n", "\n", " Species_initial \n", "0 SE \n", "1 SE \n", "2 SE \n", "3 SE \n", "4 SE \n", ".. ... \n", "145 VI \n", "146 VI \n", "147 VI \n", "148 VI \n", "149 VI \n", "\n", "[150 rows x 7 columns]" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "iris = iris.rename(columns={'Sepal.Length':'SLen'})\n", "iris" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Massive renaming column names" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
XXSLenXXSepal.WidthXXPetal.LengthXXPetal.WidthXXSpeciesXXSpecies_newXXSpecies_initial
05.13.51.40.2setosa6SE
14.93.01.40.2setosa6SE
24.73.21.30.2setosa6SE
34.63.11.50.2setosa6SE
45.03.61.40.2setosa6SE
........................
1456.73.05.22.3virginica9VI
1466.32.55.01.9virginica9VI
1476.53.05.22.0virginica9VI
1486.23.45.42.3virginica9VI
1495.93.05.11.8virginica9VI
\n", "

150 rows × 7 columns

\n", "
" ], "text/plain": [ " XXSLen XXSepal.Width XXPetal.Length XXPetal.Width XXSpecies \\\n", "0 5.1 3.5 1.4 0.2 setosa \n", "1 4.9 3.0 1.4 0.2 setosa \n", "2 4.7 3.2 1.3 0.2 setosa \n", "3 4.6 3.1 1.5 0.2 setosa \n", "4 5.0 3.6 1.4 0.2 setosa \n", ".. ... ... ... ... ... \n", "145 6.7 3.0 5.2 2.3 virginica \n", "146 6.3 2.5 5.0 1.9 virginica \n", "147 6.5 3.0 5.2 2.0 virginica \n", "148 6.2 3.4 5.4 2.3 virginica \n", "149 5.9 3.0 5.1 1.8 virginica \n", "\n", " XXSpecies_new XXSpecies_initial \n", "0 6 SE \n", "1 6 SE \n", "2 6 SE \n", "3 6 SE \n", "4 6 SE \n", ".. ... ... \n", "145 9 VI \n", "146 9 VI \n", "147 9 VI \n", "148 9 VI \n", "149 9 VI \n", "\n", "[150 rows x 7 columns]" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "iris.rename(columns=lambda x: 'XX'+x)" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS
5603Moran, Mr. JamesmaleNaN003308778.4583NaNQ
6701McCarthy, Mr. Timothy Jmale54.0001746351.8625E46S
7803Palsson, Master. Gosta Leonardmale2.03134990921.0750NaNS
8913Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)female27.00234774211.1333NaNS
91012Nasser, Mrs. Nicholas (Adele Achem)female14.01023773630.0708NaNC
\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "0 1 0 3 \n", "1 2 1 1 \n", "2 3 1 3 \n", "3 4 1 1 \n", "4 5 0 3 \n", "5 6 0 3 \n", "6 7 0 1 \n", "7 8 0 3 \n", "8 9 1 3 \n", "9 10 1 2 \n", "\n", " Name Sex Age SibSp \\\n", "0 Braund, Mr. Owen Harris male 22.0 1 \n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n", "2 Heikkinen, Miss. Laina female 26.0 0 \n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n", "4 Allen, Mr. William Henry male 35.0 0 \n", "5 Moran, Mr. James male NaN 0 \n", "6 McCarthy, Mr. Timothy J male 54.0 0 \n", "7 Palsson, Master. Gosta Leonard male 2.0 3 \n", "8 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) female 27.0 0 \n", "9 Nasser, Mrs. Nicholas (Adele Achem) female 14.0 1 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "0 0 A/5 21171 7.2500 NaN S \n", "1 0 PC 17599 71.2833 C85 C \n", "2 0 STON/O2. 3101282 7.9250 NaN S \n", "3 0 113803 53.1000 C123 S \n", "4 0 373450 8.0500 NaN S \n", "5 0 330877 8.4583 NaN Q \n", "6 0 17463 51.8625 E46 S \n", "7 1 349909 21.0750 NaN S \n", "8 2 347742 11.1333 NaN S \n", "9 0 237736 30.0708 NaN C " ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic.head(10)" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassSexAgeSibSpParchTicketFareCabinEmbarked
Name
BRAUND,_MR._OWEN_HARRIS103male22.010A/5 211717.2500NaNS
CUMINGS,_MRS._JOHN_BRADLEY_(FLORENCE_BRIGGS_THAYER)211female38.010PC 1759971.2833C85C
HEIKKINEN,_MISS._LAINA313female26.000STON/O2. 31012827.9250NaNS
FUTRELLE,_MRS._JACQUES_HEATH_(LILY_MAY_PEEL)411female35.01011380353.1000C123S
ALLEN,_MR._WILLIAM_HENRY503male35.0003734508.0500NaNS
....................................
MONTVILA,_REV._JUOZAS88702male27.00021153613.0000NaNS
GRAHAM,_MISS._MARGARET_EDITH88811female19.00011205330.0000B42S
JOHNSTON,_MISS._CATHERINE_HELEN_\"CARRIE\"88903femaleNaN12W./C. 660723.4500NaNS
BEHR,_MR._KARL_HOWELL89011male26.00011136930.0000C148C
DOOLEY,_MR._PATRICK89103male32.0003703767.7500NaNQ
\n", "

891 rows × 11 columns

\n", "
" ], "text/plain": [ " PassengerId Survived \\\n", "Name \n", "BRAUND,_MR._OWEN_HARRIS 1 0 \n", "CUMINGS,_MRS._JOHN_BRADLEY_(FLORENCE_BRIGGS_THA... 2 1 \n", "HEIKKINEN,_MISS._LAINA 3 1 \n", "FUTRELLE,_MRS._JACQUES_HEATH_(LILY_MAY_PEEL) 4 1 \n", "ALLEN,_MR._WILLIAM_HENRY 5 0 \n", "... ... ... \n", "MONTVILA,_REV._JUOZAS 887 0 \n", "GRAHAM,_MISS._MARGARET_EDITH 888 1 \n", "JOHNSTON,_MISS._CATHERINE_HELEN_\"CARRIE\" 889 0 \n", "BEHR,_MR._KARL_HOWELL 890 1 \n", "DOOLEY,_MR._PATRICK 891 0 \n", "\n", " Pclass Sex Age \\\n", "Name \n", "BRAUND,_MR._OWEN_HARRIS 3 male 22.0 \n", "CUMINGS,_MRS._JOHN_BRADLEY_(FLORENCE_BRIGGS_THA... 1 female 38.0 \n", "HEIKKINEN,_MISS._LAINA 3 female 26.0 \n", "FUTRELLE,_MRS._JACQUES_HEATH_(LILY_MAY_PEEL) 1 female 35.0 \n", "ALLEN,_MR._WILLIAM_HENRY 3 male 35.0 \n", "... ... ... ... \n", "MONTVILA,_REV._JUOZAS 2 male 27.0 \n", "GRAHAM,_MISS._MARGARET_EDITH 1 female 19.0 \n", "JOHNSTON,_MISS._CATHERINE_HELEN_\"CARRIE\" 3 female NaN \n", "BEHR,_MR._KARL_HOWELL 1 male 26.0 \n", "DOOLEY,_MR._PATRICK 3 male 32.0 \n", "\n", " SibSp Parch \\\n", "Name \n", "BRAUND,_MR._OWEN_HARRIS 1 0 \n", "CUMINGS,_MRS._JOHN_BRADLEY_(FLORENCE_BRIGGS_THA... 1 0 \n", "HEIKKINEN,_MISS._LAINA 0 0 \n", "FUTRELLE,_MRS._JACQUES_HEATH_(LILY_MAY_PEEL) 1 0 \n", "ALLEN,_MR._WILLIAM_HENRY 0 0 \n", "... ... ... \n", "MONTVILA,_REV._JUOZAS 0 0 \n", "GRAHAM,_MISS._MARGARET_EDITH 0 0 \n", "JOHNSTON,_MISS._CATHERINE_HELEN_\"CARRIE\" 1 2 \n", "BEHR,_MR._KARL_HOWELL 0 0 \n", "DOOLEY,_MR._PATRICK 0 0 \n", "\n", " Ticket Fare \\\n", "Name \n", "BRAUND,_MR._OWEN_HARRIS A/5 21171 7.2500 \n", "CUMINGS,_MRS._JOHN_BRADLEY_(FLORENCE_BRIGGS_THA... PC 17599 71.2833 \n", "HEIKKINEN,_MISS._LAINA STON/O2. 3101282 7.9250 \n", "FUTRELLE,_MRS._JACQUES_HEATH_(LILY_MAY_PEEL) 113803 53.1000 \n", "ALLEN,_MR._WILLIAM_HENRY 373450 8.0500 \n", "... ... ... \n", "MONTVILA,_REV._JUOZAS 211536 13.0000 \n", "GRAHAM,_MISS._MARGARET_EDITH 112053 30.0000 \n", "JOHNSTON,_MISS._CATHERINE_HELEN_\"CARRIE\" W./C. 6607 23.4500 \n", "BEHR,_MR._KARL_HOWELL 111369 30.0000 \n", "DOOLEY,_MR._PATRICK 370376 7.7500 \n", "\n", " Cabin Embarked \n", "Name \n", "BRAUND,_MR._OWEN_HARRIS NaN S \n", "CUMINGS,_MRS._JOHN_BRADLEY_(FLORENCE_BRIGGS_THA... C85 C \n", "HEIKKINEN,_MISS._LAINA NaN S \n", "FUTRELLE,_MRS._JACQUES_HEATH_(LILY_MAY_PEEL) C123 S \n", "ALLEN,_MR._WILLIAM_HENRY NaN S \n", "... ... ... \n", "MONTVILA,_REV._JUOZAS NaN S \n", "GRAHAM,_MISS._MARGARET_EDITH B42 S \n", "JOHNSTON,_MISS._CATHERINE_HELEN_\"CARRIE\" NaN S \n", "BEHR,_MR._KARL_HOWELL C148 C \n", "DOOLEY,_MR._PATRICK NaN Q \n", "\n", "[891 rows x 11 columns]" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic.set_index('Name').rename(index=lambda x:x.replace(' ',\"_\").upper())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Join/Combine Data Frames" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- `df1.append(df2)`: Add the rows in df1 to the end of df2 (columns should be identical) (`rbind()` in R)\n", "- `pd.concat([df1, df2],axis=1)`: Add the columns in df1 to the end of df2 (rows should be identical) (`cbind()` in R)\n", "- `df1.join(df2,on=col1,how='inner')`: SQL-style join the columns in df1 with the columns on df2 where the rows for col have identical values. 'how' can be one of 'left', 'right', 'outer', 'inner'\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Statistics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- `df.describe()`: Summary statistics for numerical columns\n", "- `df.mean()`: Returns the mean of all columns\n", "- `df.corr()`: Returns the correlation between columns in a DataFrame\n", "- `df.count()`: Returns the number of non-null values in each DataFrame column\n", "- `df.max()`: Returns the highest value in each column\n", "- `df.min()`: Returns the lowest value in each column\n", "- `df.median()`: Returns the median of each column\n", "- `df.std()`: Returns the standard deviation of each column" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "PassengerId 891\n", "Survived 891\n", "Pclass 891\n", "Name 891\n", "Sex 891\n", "Age 714\n", "SibSp 891\n", "Parch 891\n", "Ticket 891\n", "Fare 891\n", "Cabin 204\n", "Embarked 889\n", "dtype: int64" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic.count()" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "PassengerId 446.0000\n", "Survived 0.0000\n", "Pclass 3.0000\n", "Age 28.0000\n", "SibSp 0.0000\n", "Parch 0.0000\n", "Fare 14.4542\n", "dtype: float64" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic.median()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generic Functions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- `pandas.pivot_table()`\n", "- `pandas.crosstab()`\n", "- `pandas.cut()`\n", "- `pandas.qcut()`\n", "- `pandas.merge()`\n", "- `pandas.get_dummies()`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References\n", "\n", " - [Python for Data Analysis](https://www.amazon.com/gp/product/1491957662/ref=as_li_tl_nodl?ie=UTF8&camp=1789&creative=9325&creativeASIN=1491957662&linkCode=as2&tag=ledoux-20&linkId=eff92247940c967299befaed855c580a)\n", " - [Python for Data Analysis GitHub](https://github.com/wesm/pydata-book)\n", " - [How to get sample datasets in Python](https://stackoverflow.com/questions/28417293/sample-datasets-in-pandas)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Requirements" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "tags": [ "remove-input" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "statsmodels==0.12.0\n", "pandas==1.1.2\n", "numpy==1.18.1\n", "matplotlib==3.3.2\n" ] } ], "source": [ "# %load get_modules.py\n", "import pkg_resources\n", "import types\n", "def get_imports():\n", " for name, val in globals().items():\n", " if isinstance(val, types.ModuleType):\n", " # Split ensures you get root package, \n", " # not just imported function\n", " name = val.__name__.split(\".\")[0]\n", "\n", " elif isinstance(val, type):\n", " name = val.__module__.split(\".\")[0]\n", "\n", " # Some packages are weird and have different\n", " # imported names vs. system/pip names. Unfortunately,\n", " # there is no systematic way to get pip names from\n", " # a package's imported name. You'll have to add\n", " # exceptions to this list manually!\n", " poorly_named_packages = {\n", " \"PIL\": \"Pillow\",\n", " \"sklearn\": \"scikit-learn\"\n", " }\n", " if name in poorly_named_packages.keys():\n", " name = poorly_named_packages[name]\n", "\n", " yield name\n", " \n", " \n", "imports = list(set(get_imports()))\n", "\n", "# The only way I found to get the version of the root package\n", "# from only the name of the package is to cross-check the names \n", "# of installed packages vs. imported packages\n", "requirements = []\n", "for m in pkg_resources.working_set:\n", " if m.project_name in imports and m.project_name!=\"pip\":\n", " requirements.append((m.project_name, m.version))\n", "\n", "for r in requirements:\n", " print(\"{}=={}\".format(*r))" ] } ], "metadata": { "kernelspec": { "display_name": "python-notes", "language": "python", "name": "python-notes" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": true }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "oldHeight": 324.848, "position": { "height": "346.839px", "left": "1085.86px", "right": "20px", "top": "134px", "width": "344px" }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "varInspector_section_display": "block", "window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }