Difference between revisions of "Pandas"

Revision as of 16:38, 22 September 2019

See also the official "10 minutes to pandas" tutorial.

import pandas as pd
Import the library. The examples on this page assume this has been done.

Series

Pandas Series online documentation.
A pandas Series is a 1-dimensional array with named keys.
Pandas Series have all kinds of methods similar to NumPy, such as mean, std, min and max. In fact, pandas uses NumPy to do this.
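
A minimal sketch of these methods, with made-up values. One detail worth knowing: s.std() computes the sample standard deviation (ddof=1) by default, whereas numpy's np.std defaults to the population version (ddof=0), so the two only agree when ddof is passed explicitly.

```python
import numpy as np
import pandas as pd

# Values are made up for illustration
s = pd.Series([1.0, 2.0, 3.0, 4.0])

# The usual reductions are available directly on the series
print(s.mean(), s.min(), s.max())

# pandas default: sample standard deviation (ddof=1)
pandas_std = s.std()
# numpy default: population standard deviation (ddof=0)
numpy_std = np.std(s.to_numpy())
print(pandas_std, numpy_std)

# Passing ddof explicitly makes both agree
print(s.std(ddof=0))
```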

s = pd.Series([])
s = pd.Series([valuelist],[indexlist])
Initialize a series. If indexlist is omitted the keys are integers starting at 0.
s[<key>] = <value>
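Assign <value> to the series element with key <key>.
The order of the elements is the order in which they were created, NOT the numeric order of the keys.
Elements can be addressed as s[<key>], s.<key> or s.iloc[<numkey>], where <numkey> is the position defined by the order in which the elements were created. Older pandas versions also accept s[<numkey>] for positional access on a series with named keys, but recent versions deprecate this form.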
Once you have used named keys in a series you cannot create new elements with a numeric key.
s.index
All index labels (keys) of the series. Can be sliced to find a particular key.
s.describe()
Summary statistics of the series (for numeric data: count, mean, std, min, quartiles and max).
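
The entries above can be tried out with a short sketch. The keys and values are made up for illustration, and positional access uses s.iloc so that it also works on recent pandas versions.

```python
import pandas as pd

# Made-up values with named keys
s = pd.Series([30, 10, 20], index=["pears", "apples", "plums"])

# Assigning to a new key appends an element; the series is not re-sorted
s["kiwis"] = 5
print(list(s.index))

by_key = s["apples"]   # address by key
by_attr = s.apples     # attribute form, only for keys that are valid Python names
by_pos = s.iloc[0]     # address by position (order of creation)
print(by_key, by_attr, by_pos)

# The index can be sliced like a list
print(s.index[:2])

# Summary statistics
stats = s.describe()
print(stats)
```

The attribute form s.<key> is convenient but fragile: it cannot be used for keys that contain spaces or that clash with an existing Series method name.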

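All in one example:

import numpy as np
import pandas as pd

# Start with an empty series; an explicit dtype avoids the ambiguous default
s = pd.Series([], dtype="int64")
for i in range(50):
    # Assigning to a new key appends an element
    s[i] = int(np.random.random() * 100)

# Print every key with its value
for i in s.index:
    print(i, s[i])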

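Note that s[0] works, but the following loop does not do what you might expect, because iterating over a series yields its values, not its keys: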

for i in s:
    print(s[i])

To get all values from the series you do:

for v in s:
    print(v)

To get the indexes too:

for i in s.index:
    print(i,s[i])
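
As an alternative to looping over s.index, Series.items() yields (key, value) pairs directly. A minimal sketch with made-up data:

```python
import pandas as pd

# Made-up values with named keys
s = pd.Series([30, 10, 20], index=["pears", "apples", "plums"])

pairs = []
for key, value in s.items():
    # key is the index label, value the element stored under it
    print(key, value)
    pairs.append((key, value))
```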

DataFrame

Object for tabular data (e.g. as obtained by read_html).

table.head()
Return first 5 data rows of table.
table.columns
The column headers (class = pandas.core.indexes.base.Index)
table.columns=[list,of,column,names]
Redefine the column headers
table.index
The table index (first column) (class = pandas.core.indexes.base.Index)
table.<columnname>
Address a column by its name. Each column is a pandas Series
table.loc[<indexname>]
table.loc[<indexname>].<columnname>
table.loc[0][0]
The row with the given index name as a pandas Series, or just the value of the named column in that row. The last form is for tables without header or index, where rows and columns are numbered from 0.
table.filter(regex=<regex>,axis='index')
table.filter(regex=<regex>,axis='index').<columnname>
table.filter(regex=<regex>,axis='index').index
Find all rows whose index matches <regex> (axis=0), get only the named column of the matched rows, or get only the matching index name(s).
table.filter(regex=<regex>,axis='columns')
Find all columns whose name matches <regex> (axis=1).
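
The DataFrame entries above can be sketched as follows. The table contents, the column names low and high, and the date strings used as index are all made up for illustration.

```python
import pandas as pd

# Made-up data: date strings as index, two value columns
table = pd.DataFrame(
    {"low": [1.0, 2.0, 3.0], "high": [1.5, 2.5, 3.5]},
    index=["2019-08-30", "2019-09-02", "2019-09-03"],
)

print(table.head())      # first rows (at most 5)
print(table.columns)     # column headers
print(table.index)       # index labels

row = table.loc["2019-09-02"]         # one row as a Series
cell = table.loc["2019-09-02"].high   # one value from that row

# Rows whose index label matches the regular expression
september = table.filter(regex="^2019-09", axis="index")
september_high = september.high       # only one column of the matched rows
september_index = september.index     # only the matched index labels

# Columns whose name matches the regular expression
h_columns = table.filter(regex="^h", axis="columns")

# Table without header or index: rows and columns are numbered from 0
plain = pd.DataFrame([[1, 2], [3, 4]])
first = plain.loc[0][0]
```

The regular expression is searched for anywhere in the label, so an anchor such as ^ is needed to match only labels that start with the pattern.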

Other

read_html(url)
Read the HTML tables found at url into a list of dataframes (no header, no index).

Example code. The first line of each table is used as the header and the first column as the index (e.g. dates); decimal specifies the decimal point character.

tables = pd.read_html(url,header=0,index_col=0,decimal=<char>)
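
To see what header, index_col and decimal do without needing a URL or an HTML parser, the same three parameters can be tried on pd.read_csv, which also accepts them. This is a stand-in for read_html, not the call used above; the text, the column names and the ';' separator are made up for illustration.

```python
import io

import pandas as pd

# Made-up data: a header line, dates in the first column, decimal commas
text = "date;price\n2019-09-20;1,5\n2019-09-21;2,25\n"

table = pd.read_csv(
    io.StringIO(text),
    sep=";",        # field separator of the made-up text
    header=0,       # first line holds the column names
    index_col=0,    # first column becomes the index
    decimal=",",    # character used as decimal point
)
print(table)
```

Unlike read_html, which returns a list of dataframes, read_csv returns a single dataframe.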