Difference between revisions of "Pandas"
Revision as of 16:04, 7 September 2019
See also the "10 minutes to pandas" tutorial.
- import pandas as pd
- Import the library. All examples on this page assume this has been done.
Series
Pandas Series online documentation.
A pandas Series is a one-dimensional array with named keys (the index).
Pandas Series have all kinds of methods similar to NumPy, such as mean, std, min, max, ... In fact, pandas uses NumPy to do this.
- s = pd.Series([])
- s = pd.Series([valuelist],[indexlist])
- Initialize a series. If indexlist is omitted the keys are integers starting at 0.
- s[<key>] = <value>
- Assign <value> to the series element with key <key>
- The order of the elements in the series is the order in which they were created, NOT the numeric order of the keys.
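- Elements can be addressed as s[<key>], s.<key> or s[<numkey>], where <numkey> is the position defined by the order in which the elements were created. Recent pandas versions deprecate positional s[<numkey>] on a series with non-integer keys; s.iloc[<numkey>] is the explicit positional form.
- Once you have used named keys in a series you cannot create new elements with a numeric key.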
- s.index
- All index labels (keys) of the series. Can be sliced to find a particular label.
- s.describe()
- Series statistics
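The calls listed above can be tried out together in a minimal sketch. The values and the keys "a" to "d" are made up for illustration, and s.iloc is used for the positional access because it is the explicit form in current pandas.

```python
import pandas as pd

# Values and index labels are invented for this sketch.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

s["d"] = 5             # assigning to a new key appends an element

by_key = s["b"]        # access by key
by_attr = s.b          # attribute access, only for keys that are valid identifiers
by_pos = s.iloc[3]     # positional access: "d" was created last

labels = list(s.index)  # keys in creation order, not sorted
stats = s.describe()    # count, mean, std, min, quartiles, max

print(labels, by_key, by_attr, by_pos)
print(stats)
```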
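All in one example:
import numpy as np
import pandas as pd
s = pd.Series([], dtype="int64")  # explicit dtype: an empty series has no values to infer it from
for i in range(50):
    s[i] = int(np.random.random() * 100)  # random integer in 0..99
for i in s.index:
    print(i, s[i])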
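Funny: you can do s[0], but not
for i in s:
    print(s[i])
because iterating over a series yields its values, not its keys, so s[i] looks up a value as if it were a key.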
To get all values from the series you do:
for v in s:
    print(v)
To get the indexes too:
for i in s.index:
    print(i, s[i])
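As an alternative to looping over s.index, Series.items() yields the key and the value together. A small sketch with made-up values and deliberately unsorted keys:

```python
import pandas as pd

# Invented values; the keys are unsorted on purpose.
s = pd.Series([7, 3, 9], index=[2, 0, 1])

values = [v for v in s]                  # iterating a Series yields its values
pairs = [(k, v) for k, v in s.items()]   # items() yields (key, value) pairs

print(values)
print(pairs)
```

Both lists keep the order in which the elements were created.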
DataFrame
Object for tabular data (e.g. as obtained by read_html).
- table.head()
- Return first 5 data rows of table.
- table.columns
- The column headers (class = pandas.core.indexes.base.Index)
- table.columns=[list,of,column,names]
- Redefine the column headers
- table.index
- The table index, i.e. the row labels (the first column when the table was read with index_col=0) (class = pandas.core.indexes.base.Index)
- table.<columnname>
- Address a column by its name. Each column is a pandas Series
- table.loc[<indexname>]
- table.loc[<indexname>].<columnname>
- The content of the row with that index as a pandas Series, or just the named column of that row.
- table.filter(regex=<regex>,axis='index')
- table.filter(regex=<regex>,axis='index').<columnname>
- Find all rows whose index matches <regex>, or get only the named column of the matched rows. (axis=0)
- table.filter(regex=<regex>,axis='columns')
- Find all columns whose name matches <regex>. (axis=1)
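The DataFrame calls above can be sketched on a small hand-made table. The dates, column names and numbers are hypothetical and only serve to show what each call returns.

```python
import pandas as pd

# Hypothetical table: row labels and column names are invented for this sketch.
table = pd.DataFrame(
    {"price": [1.5, 2.25, 3.0], "volume": [10, 20, 30]},
    index=["2019-09-01", "2019-09-02", "2019-10-01"],
)

table.columns = ["close_price", "close_volume"]   # redefine the column headers

column = table.close_price                   # one column, a Series
row = table.loc["2019-09-02"]                # one row, also a Series
cell = table.loc["2019-09-02"].close_price   # a single value

# Rows whose index label matches the regex
september = table.filter(regex="^2019-09", axis="index")
# Columns whose name matches the regex
prices = table.filter(regex="price$", axis="columns")

print(table.head())
print(september)
print(prices)
```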
Other
- read_html
- Read HTML tables into a list of DataFrames
Example code: the first line of the table is the header, the first column is the index (e.g. dates), and decimal specifies the decimal point character.
tables = pd.read_html(url,header=0,index_col=0,decimal=<char>)
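To try the call without a real URL, the sketch below feeds read_html an invented HTML table through io.StringIO. The table content is made up. thousands="." is passed on the assumption that the default thousands separator "," would otherwise interfere with decimal=",". read_html needs an HTML parser to be installed, so the sketch falls back to None when none is found.

```python
import io
import pandas as pd

# Invented HTML table; a comma is used as the decimal point character.
html = """
<table>
  <tr><th>date</th><th>price</th></tr>
  <tr><td>2019-09-01</td><td>1,5</td></tr>
  <tr><td>2019-09-02</td><td>2,25</td></tr>
</table>
"""

try:
    # thousands="." is an assumption of this sketch, so it cannot clash with decimal=","
    tables = pd.read_html(io.StringIO(html), header=0, index_col=0,
                          decimal=",", thousands=".")
except ImportError:
    # read_html needs an HTML parser (lxml, or bs4 with html5lib)
    tables = None

if tables is not None:
    table = tables[0]   # read_html returns a list, even for a single table
    print(table)
```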