Pandas

From wiki
Revision as of 12:18, 7 September 2019 by Hdridder (talk | contribs) (→‎DataFrame)
Jump to navigation Jump to search

Check the 10 minutes to Pandas too.

import pandas as pd
Import the library, we assume this was done on this page

Series

Pandas Series online documentation.
A pandas series is a 1 dimensional array with named keys.
Pandas Series have all kind of methods similar to Numpy like main, std, min, max,.... In fact Pandas is using numpy to do this.

s = pd.Series([])
s = pd.Series([valuelist],[indexlist])
Initialize a series. If indexlist is omitted the keys are integers starting at 0.
s[<key>] = <value>
Assign <value> to the series element with key <key>
The order in the series is the order in which they are created, NOT the numeric order.
Elements can be addressed as s[<key>], s.<key> or s[<numkey>]. Where <numkey> is defined by the order the element was created.
Once you have used named keys in a series you cannot create new elements with a numeric key.
s.index
All indexes in the series. Can be sliced to find a particular index.
s.describe()
Series statistics

All in 1 example:

import numpy as np
import pandas as pd
s = pd.Series([])
for i in range(50):
    s[i] = int(np.random.random() * 100)

for i in s.index:
    print(i,s[i])

Funny, you can do s[0] but not

for i in s:
    print(s[i])

To get all values from the series you do:

for v in s:
    print(v)

To get the indexes too:

for i in s.index:
    print(i,s[i])

DataFrame

Object for tabular data (that is e.g. obtained by read_html).

table.head()
Return first 5 data rows of table.
table.columns=[list,of,column,names]
Redefine the column headers
table.<column>
Address a column by its name. Each column is a pandas Series

Other

read_html
Read html tables into a list of dataframes

Example code. The first line in the table is a header, the first column the index (e.g. dates), decimal specifies the decimal point character.

tables = pd.read_html(url,header=0,index_col=0,decimal=<char>)