Numpy

Module that makes handling large data sets easier.

It is common practice to `import numpy as np`. Therefore np is used below on this page.

Numpy documentation for mathematical functions.

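At least some functions work on lists too: for example np.sum, np.mean and np.median accept a plain Python list and convert it to an array first.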
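A small sketch with made-up values, showing Numpy functions applied directly to a Python list. The array methods (such as `.mean()`) are not available on the list itself, so the list has to be converted with np.array first.

```python
import numpy as np

values = [4, 8, 15, 16, 23, 42]  # a plain Python list, not an array

# These functions convert the list to an array internally
total = np.sum(values)      # 108
average = np.mean(values)   # 18.0
middle = np.median(values)  # 15.5 (mean of the two middle values 15 and 16)

# Methods exist only on arrays: values.mean() would raise an AttributeError
as_array = np.array(values)
array_average = as_array.mean()  # 18.0
```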

Array

Class of iterable, mutable objects. Very much like a list, but all elements have one and the same type (the array's dtype). Mixed input is converted to a common type: booleans mixed with numbers become numbers (True = 1, False = 0), and integers mixed with floats become floats. Arrays have their own set of methods. Some things are similar to lists, others differ.
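A short sketch of the single-type rule: mixed input is converted when the array is created, and values assigned later are converted to the existing dtype. The exact integer dtype (int64 or int32) can depend on the platform, so the example only checks the values.

```python
import numpy as np

# All elements of an array share one type, stored in .dtype
ints = np.array([1, 2, 3])
mixed = np.array([True, 2, 3])      # the boolean is converted: True -> 1
floats = np.array([1, 2.5, False])  # everything becomes float

print(mixed.tolist())    # [1, 2, 3]
print(floats.tolist())   # [1.0, 2.5, 0.0]
print(floats.dtype)      # float64

# Arrays are mutable, but assigned values are converted to the array's dtype
ints[0] = 7.9            # truncated to 7, because ints holds integers
print(ints.tolist())     # [7, 2, 3]
```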

Slicing works like in lists.
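A minimal sketch of list-style slicing on a 1-dimensional array. It also shows one difference from lists that is easy to trip over: an array slice is a view on the same data, so changing the slice changes the original array unless .copy() is used.

```python
import numpy as np

numbers = np.array([10, 20, 30, 40, 50, 60])

# The same slice syntax as for lists: [start:stop:step]
print(numbers[:3].tolist())    # [10, 20, 30]
print(numbers[::2].tolist())   # [10, 30, 50]
print(numbers[-2:].tolist())   # [50, 60]
print(numbers[::-1].tolist())  # [60, 50, 40, 30, 20, 10]

# Difference from lists: an array slice is a view on the same data
window = numbers[1:3]
window[0] = 99                 # also changes numbers[1]
print(numbers.tolist())        # [10, 99, 30, 40, 50, 60]

# Use .copy() when an independent array is needed
snapshot = numbers[1:3].copy()
snapshot[0] = 0                # numbers is not affected
print(numbers.tolist())        # [10, 99, 30, 40, 50, 60]
```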

Arrays can have any number of dimensions. A 2-dimensional array is basically a list of 1-dimensional arrays (rows) that all have the same length, much like a matrix. Numpy also has a matrix class, but its use is discouraged; use 2-dimensional arrays instead.

Numpy automatically applies arithmetic and comparison operations to every element of an array, so no explicit loop is needed.
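A small comparison, with made-up values, between a plain list and an array. Note that the same operator can mean something different: multiplying a list by 2 repeats it, while multiplying an array by 2 doubles each element.

```python
import numpy as np

values = [1, 2, 3, 4]

# A plain list needs a loop (here a list comprehension)
doubled_list = [v * 2 for v in values]   # [2, 4, 6, 8]

# Careful: list * 2 repeats the list instead
repeated = values * 2                    # [1, 2, 3, 4, 1, 2, 3, 4]

# An array applies the operation to every element
doubled_array = np.array(values) * 2     # array([2, 4, 6, 8])
squared = np.array(values) ** 2          # array([ 1,  4,  9, 16])
```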

array1 = np.array([1,2,3])
Create a 1 dimensional array
array1 = np.array([[1,2,3],[2,5,9]])
Create a 2 dimensional array
array1 / array2
Returns an array with each element of array1 divided by the corresponding element of array2. array1 and array2 must have the same shape (or shapes that are compatible under broadcasting).
array1 > x
Returns a boolean array of the same shape as array1, with True for elements > x and False for elements <= x
array1[array1 > x]
Return all elements of array1 that are > x. The condition can also select from a different array, provided both arrays have the same shape: array2[array1 > x]
mdarray[0][2]
mdarray[0,2] (preferred)
Return the 3rd element of the first row (row index 0)
mdarray.shape
np.shape(mdarray)
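Return the array's shape as a tuple (rows, columns)
If the rows have different numbers of columns, the array can only be created with dtype=object, and its shape holds only the number of rows (rows,). Since Numpy 1.24, creating such a ragged array without dtype=object raises a ValueError.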
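A sketch of the entries above with small made-up arrays: element-wise division, a comparison producing a boolean array, boolean indexing, 2-dimensional indexing and the shape. The last lines show a ragged array, which is only accepted with dtype=object.

```python
import numpy as np

array1 = np.array([2, 4, 6, 8])
array2 = np.array([1, 2, 3, 4])

# Element-wise division: both arrays have the same shape
ratios = array1 / array2            # [2.0, 2.0, 2.0, 2.0]

# A comparison gives a boolean array of the same shape
mask = array1 > 4                   # [False, False, True, True]

# The boolean array selects elements (boolean indexing)
big = array1[mask]                  # [6, 8]
matching = array2[array1 > 4]       # [3, 4]: elements of array2 at the same positions

# A 2-dimensional array, indexing and its shape
mdarray = np.array([[1, 2, 3], [2, 5, 9]])
element = mdarray[0, 2]             # 3: first row, 3rd element
shape = mdarray.shape               # (2, 3): 2 rows, 3 columns

# Rows of different length need dtype=object; only the row count is in the shape
ragged = np.array([[1, 2, 3], [4, 5]], dtype=object)
ragged_shape = ragged.shape         # (2,)
```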

Slicing works for multi-dimensional arrays too:

mdarray[:,2]
Return the 3rd element of all rows
mdarray[2,4:6]
Return from the 3rd row the elements at index 4 and 5 (the 5th and 6th elements); the stop index 6 is excluded
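A sketch of slicing a 2-dimensional array. It uses np.arange and reshape, which are not described on this page, only to build a 3 x 6 array holding the numbers 0 to 17 row by row.

```python
import numpy as np

# A 3 x 6 array holding the numbers 0..17, row by row:
# [[ 0  1  2  3  4  5]
#  [ 6  7  8  9 10 11]
#  [12 13 14 15 16 17]]
mdarray = np.arange(18).reshape(3, 6)

third_column = mdarray[:, 2]    # [2, 8, 14]: 3rd element of every row
part_of_row = mdarray[2, 4:6]   # [16, 17]: 3rd row, index 4 and 5
top_left = mdarray[:2, :2]      # [[0, 1], [6, 7]]: slices in both dimensions
```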

Operations are still applied to all elements of all rows:

mdarray1 * 2
Operate on all elements, return the array with all elements multiplied by 2.
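mdarray1 * array1 (array1 being a single row)
Multiply each element in every row of mdarray1 by the corresponding element of array1. array1 must have as many elements as mdarray1 has columns (this is called broadcasting).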
np.sum(array1)
array1.sum()
Return the sum of all values in array1 (np.prod(array1) and array1.prod() return the product)
np.mean(array1)
array1.mean()
Return the average of all values in array1
np.median(array1)
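Return the median: the middle value of the sorted array1, or the mean of the two middle values if the number of elements is even. There is no array1.median() method.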
np.std(array1)
Return the standard deviation of array1 (the population standard deviation by default; pass ddof=1 for the sample standard deviation)
np.corrcoef(mdarray[:,0],mdarray[:,1])
Return the 2x2 correlation matrix of 2 columns; element [0,1] is the (Pearson) correlation coefficient between them
np.linspace(start,stop,num)
Create an array of num elements evenly spaced from start to stop (stop included). num defaults to 50. A sort of floating-point range().
np.where(array1 == a)
Return a tuple of arrays, one per dimension, holding the index numbers (positions, not index labels) of the elements in array1 that match the condition
np.where(array1 == a)[0][0]
Return the first index number (position) in a 1-dimensional array1 that matches the condition. Raises an IndexError if nothing matches.
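A sketch exercising the entries above on small made-up arrays: operations on all rows, broadcasting one row over a 2-dimensional array, the aggregate functions, np.corrcoef, np.linspace and np.where. The second column in the correlation example is exactly twice the first, so the coefficient comes out as 1.0 up to floating-point rounding.

```python
import numpy as np

# Operations on all elements of a 2-dimensional array
mdarray1 = np.array([[1, 2, 3],
                     [4, 5, 6]])
array1 = np.array([10, 20, 30])      # one row, as long as each row of mdarray1

doubled = mdarray1 * 2               # [[2, 4, 6], [8, 10, 12]]
scaled = mdarray1 * array1           # [[10, 40, 90], [40, 100, 180]]

# Aggregate functions (made-up sample data)
data = np.array([2, 4, 4, 4, 5, 5, 7, 9])
total = data.sum()                   # 40, same as np.sum(data)
product = np.prod([1, 2, 3, 4])      # 24
average = np.mean(data)              # 5.0
middle = np.median(data)             # 4.5: mean of the two middle values 4 and 5
spread = np.std(data)                # 2.0: population standard deviation (ddof=0)

# Correlation of two columns: np.corrcoef returns a 2 x 2 matrix
columns = np.array([[1, 2], [2, 4], [3, 6]])
r = np.corrcoef(columns[:, 0], columns[:, 1])[0, 1]  # close to 1.0

# Evenly spaced floats, stop included
steps = np.linspace(0, 1, 5)         # [0.0, 0.25, 0.5, 0.75, 1.0]

# Positions of matching elements
values = np.array([3, 7, 3, 9])
positions = np.where(values == 3)    # (array([0, 2]),)
first = np.where(values == 3)[0][0]  # 0
```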

Randomness

np.random.random()
Return a random float from 0 (included) to 1 (excluded)
np.random.random(x)
Return an array of x random floats from 0 (included) to 1 (excluded)
np.random.random() < <probability>
Return True with a probability of <probability>. <probability> must be between 0 and 1.
A probability of 0.5 simulates a coin flip (50-50).
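A sketch of the functions above, simulating 10,000 coin flips. np.random.seed is used only to make the run reproducible. Newer code often uses a generator object from np.random.default_rng(), whose random() method works the same way; it is shown at the end as an alternative.

```python
import numpy as np

np.random.seed(42)                 # fixed seed so the run is reproducible

single = np.random.random()        # one float, 0 <= single < 1
several = np.random.random(5)      # array of 5 such floats

# Each comparison is True with probability 0.5: a fair coin flip
flips = np.random.random(10_000) < 0.5
heads_fraction = flips.mean()      # True counts as 1, so this is the share of heads
print(heads_fraction)              # close to 0.5

# Alternative: a generator object
rng = np.random.default_rng(42)
rng_values = rng.random(3)         # array of 3 floats from 0 (included) to 1 (excluded)
```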

More

np.nan
Not a Number: a special float value that can mark an unknown or missing value in an array or series (see Pandas)
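A sketch of working with np.nan in an array of hypothetical measurements. It uses np.isnan and np.nanmean, which are not otherwise described on this page: NaN never compares equal, even to itself, so np.isnan is the way to find it, and most calculations return nan unless a nan-aware variant is used.

```python
import numpy as np

# Hypothetical measurements with one unknown value
readings = np.array([1.0, 2.0, np.nan, 5.0])

same = np.nan == np.nan          # False: NaN is not equal to anything, not even itself
missing = np.isnan(readings)     # [False, False, True, False]
plain_mean = np.mean(readings)   # nan: NaN propagates through calculations
known_mean = np.nanmean(readings)  # ignores NaN: (1 + 2 + 5) / 3

# np.nan is a float, so an array containing it is a float array
mixed = np.array([1, 2, np.nan])  # dtype float64
```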