Revision as of 17:54, 5 October 2019

Check the 10 minutes to Pandas too.

import pandas as pd: Import the library, we assume this was done on this page

DataFrame

Object for tabular data (that is e.g. obtained by read_html).

Each column is a Pandas#Series

df = pd.DataFrame([{col1: 1,col2: 2},{col1: 1,col2: 2}]): Create a simple dataframe having a range index, 2 rows and 2 columns
df = pd.DataFrame([{col1: 1,col2: 2},{col1: 1,col2: 2}]): Create a dataframe having a list index. The size of index must match the number of rows.

Dataframe actions

df.size: Return the number of cells in a dataframe
df.shape: Return the number of rows and columns in a dataframe as tuple.
df.describe(): Return various attributes for numeric columns in the dataframe. This may indicate if there are outlyers.
df.head(x)
df.tail(x): Return first/last x data rows of df (5 is the default value for x).

df.columns: Return the column headers (class = pandas.core.indexes.base.Index)
df.<columname>: Address a column by its name.
df.columns=[list,of,column,names]: Redefine the column headers (modifies the dataframe itself, nothing is returned)

df.index: Return the table index (first column) (class = pandas.core.indexes.base.Index)
df.set_index(['col1','col2']): Rerun the dataframe with a new index
df.reset_index(): Return the dataframe with a default index (range)

df.loc[<indexname>]
df.loc[<indexname>].<columnname>
df.loc[0][0]
df.loc[lambda d: d[colum1] == <value> ]: Return the content of the index (row) as pandas Series or just the named column. [0][0]-form for tables without header or index.; The last form selects all rows where column1 equals <value>

df.filter(regex=<regex>,axis='index')
df.filter(regex=<regex>,axis='index').<columnname>
df.filter(regex=<regex>,axis='index').index: Return all rows for which in index matches <regexp> or get only the column of the matched indexes. (axis=0 ) or the indexname(s).; The .index returns the matching indexes as a list.
df.filter(regex=<regex>,axis='columns'): Return all columns which name matches <regexp>. (axis=1)

df.sort_values(<columnname>)
df.sort_values([<column1>,<colunm2>],ascending=(True,False)): Return the dataframe sorted on the values in the columns. The second form sorts on column1 first and then on column2, column1 ascending, column2 descending
df.groupby([column1,column2]): Return the dataframe grouped by the column-values

table.agg(newname=('columname', np.max)): This sample uses named aggregations, that is only supported from version v0.25
df.reset_index()
df.transform()

df.merge(table2,on='column'): Like SQL-join

df.assign: Add a column

df.drop(columns=[listofcolumnstodrop]: Remove columns from a dataframe.

df.apply(<function>): Apply a function to the dataframe. By default every element of the table.

Series

Pandas Series online documentation.
A pandas series is a 1 dimensional array with named keys.
Pandas Series have all kind of methods similar to Numpy like main, std, min, max,.... In fact Pandas is using numpy to do this.

s = pd.Series([])
s = pd.Series([valuelist],[indexlist]): Initialize a series. If indexlist is omitted the keys are integers starting at 0.
s[<key>] = <value>: Assign <value> to the series element with key <key>; The order in the series is the order in which they are created, NOT the numeric order.; Elements can be addressed as s[<key>], s.<key> or s[<numkey>]. Where <numkey> is defined by the order the element was created.; Once you have used named keys in a series you cannot create new elements with a numeric key.
s.index: All indexes in the series. Can be sliced to find a particular index.
s.describe(): Series statistics

All in 1 example:

import numpy as np
import pandas as pd
s = pd.Series([])
for i in range(50):
    s[i] = int(np.random.random() * 100)

for i in s.index:
    print(i,s[i])

Funny, you can do s[0] but not

for i in s:
    print(s[i])

To get all values from the series you do:

for v in s:
    print(v)

To get the indexes too:

for i in s.index:
    print(i,s[i])

Reading Data

read_html(url): Read html tables into a list of dataframes (no header, no index)

Example code. The first line in the table is a header, the first column the index (e.g. dates), decimal specifies the decimal point character.

tables = pd.read_html(url,header=0,index_col=0,decimal=<char>)

read_sql(query,cnx,index_columns=[col1,col2]): Read data from the database opened on cnx (see Python:Databases)

df = pd.read_sql(query,cnx,index_col=['Primarykey'])

@@ Line 3: / Line 3: @@
 ;import pandas as pd
 :Import the library, we assume this was done on this page
-=Series=
-Pandas Series [https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html online documentation].<br>
-A pandas series is a 1 dimensional array with named keys.<br>
-Pandas Series have all kind of methods similar to [[Numpy]] like main, std, min, max,.... In fact Pandas is using numpy to do this.
-;s = pd.Series([])
-;s = pd.Series([valuelist],[indexlist])
-:Initialize a series. If indexlist is omitted the keys are integers starting at 0.
-;s[<key>] = <value>
-:Assign <value> to the series element with key <key>
-:The order in the series is the order in which they are created, NOT the numeric order.
-:Elements can be addressed as <code>s[<key>]</code>, <code>s.<key></code> or <code>s[<numkey>]</code>. Where <numkey> is defined by the order the element was created.
-:Once you have used named keys in a series you cannot create new elements with a numeric key.
-;s.index
-:All indexes in the series. Can be sliced to find a particular index.
-;s.describe()
-:Series statistics
-All in 1 example:
-<syntaxhighlight lang=python>
-import numpy as np
-import pandas as pd
-s = pd.Series([])
-for i in range(50):
-    s[i] = int(np.random.random() * 100)
-for i in s.index:
-    print(i,s[i])
-</syntaxhighlight>
-Funny, you can do <code>s[0]</code> but not
-<syntaxhighlight lang=python>
-for i in s:
-    print(s[i])
-</syntaxhighlight>
-To get all values from the series you do:
-<syntaxhighlight lang=python>
-for v in s:
-    print(v)
-</syntaxhighlight>
-To get the indexes too:
-<syntaxhighlight lang=python>
-for i in s.index:
-    print(i,s[i])
-</syntaxhighlight>
 =DataFrame=
@@ Line 59: / Line 14: @@
 :Create a dataframe having a list index. The size of index must match the number of rows.
-==Showing the dataframe
+==Dataframe actions==
 ;df.size
 :Return the number of cells in a dataframe
@@ Line 69: / Line 24: @@
 ;df.tail(x)
 :Return first/last x data rows of df (5 is the default value for x).
 ;df.columns
 :Return the column headers (class = pandas.core.indexes.base.Index)
@@ Line 75: / Line 31: @@
 ;df.columns=[list,of,column,names]
 :Redefine the column headers (modifies the dataframe itself, nothing is returned)
 ;df.index
 :Return the table index (first column) (class = pandas.core.indexes.base.Index)
@@ Line 88: / Line 45: @@
 :Return the content of the index (row) as pandas [[#Series|Series]] or just the named column. [0][0]-form for tables without header or index.
 :The last form selects all rows where column1 equals <value>
 ;df.filter(regex=<regex>,axis='index')
 ;df.filter(regex=<regex>,axis='index').<columnname>
@@ Line 95: / Line 53: @@
 ;df.filter(regex=<regex>,axis='columns')
 :Return all columns which name matches <regexp>. (axis=1)
 ;df.sort_values(<columnname>)
 ;df.sort_values([<column1>,<colunm2>],ascending=(True,False))
@@ Line 100: / Line 59: @@
 ;df.groupby([column1,column2])
 :Return the dataframe grouped by the column-values
 ;table.agg(newname=('columname', np.max))
 :This sample uses named aggregations, that is only supported from version v0.25
@@ Line 116: / Line 76: @@
 ;df.apply(<function>)
 :Apply a function to the dataframe. By default every element of the table.
+=Series=
+Pandas Series [https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html online documentation].<br>
+A pandas series is a 1 dimensional array with named keys.<br>
+Pandas Series have all kind of methods similar to [[Numpy]] like main, std, min, max,.... In fact Pandas is using numpy to do this.
+;s = pd.Series([])
+;s = pd.Series([valuelist],[indexlist])
+:Initialize a series. If indexlist is omitted the keys are integers starting at 0.
+;s[<key>] = <value>
+:Assign <value> to the series element with key <key>
+:The order in the series is the order in which they are created, NOT the numeric order.
+:Elements can be addressed as <code>s[<key>]</code>, <code>s.<key></code> or <code>s[<numkey>]</code>. Where <numkey> is defined by the order the element was created.
+:Once you have used named keys in a series you cannot create new elements with a numeric key.
+;s.index
+:All indexes in the series. Can be sliced to find a particular index.
+;s.describe()
+:Series statistics
+All in 1 example:
+<syntaxhighlight lang=python>
+import numpy as np
+import pandas as pd
+s = pd.Series([])
+for i in range(50):
+    s[i] = int(np.random.random() * 100)
+for i in s.index:
+    print(i,s[i])
+</syntaxhighlight>
+Funny, you can do <code>s[0]</code> but not
+<syntaxhighlight lang=python>
+for i in s:
+    print(s[i])
+</syntaxhighlight>
+To get all values from the series you do:
+<syntaxhighlight lang=python>
+for v in s:
+    print(v)
+</syntaxhighlight>
+To get the indexes too:
+<syntaxhighlight lang=python>
+for i in s.index:
+    print(i,s[i])
+</syntaxhighlight>
 =Reading Data=

Difference between revisions of "Pandas"

Revision as of 17:54, 5 October 2019

Contents

DataFrame

Dataframe actions

Series

Reading Data

Navigation menu

Search