Introduction to NumPy

NumPy is the fundamental package for scientific computing with Python. It contains among other things:

  • a powerful N-dimensional array object

  • sophisticated (broadcasting) functions

  • tools for integrating C/C++ and Fortran code

  • useful linear algebra, Fourier transform, and random number capabilities

Why NumPy?

The standard Python data types are not well suited to mathematical operations.

Example: suppose we have a list. If we multiply this list by an integer, the list is repeated rather than scaled:

>>> a = [2, 3, 5, 7]

>>> a * 2

[2, 3, 5, 7, 2, 3, 5, 7]

And floats are not allowed at all:

>>> a = [2, 5, 7, 9, 6]

>>> 2.1 * a

Traceback (most recent call last):
  File "<pyshell#3>", line 1, in <module>
    2.1 * a
TypeError: can't multiply sequence by non-int of type 'float'

In order to solve this using Python lists, we would have to do something like:

>>> values = [2, 3, 8]

>>> result = []

>>> for x in values:
...     result.append(2.1 * x)
...

>>> result

[4.2, 6.300000000000001, 16.8]
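
The same loop is often written more compactly as a list comprehension. A minimal sketch, reusing the values list from above (the name scaled is only illustrative):

```python
# Scale every element of a plain Python list by a float.
values = [2, 3, 8]

# A list comprehension builds the new list in a single expression.
scaled = [2.1 * x for x in values]

print(scaled)
```

This is shorter, but it is still an explicit Python-level loop over the elements.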

To get a list-like type that behaves like a mathematical array or matrix, we use NumPy:

>>> import numpy as np

>>> a = np.array([2, 3, 7])

>>> 2.1 * a

array([ 4.2, 6.3, 14.7])

A few remarks:

  • We abbreviated numpy to np; this is the convention.

  • np.array takes a Python list as its argument.

  • The list [2, 3, 7] contains ints, yet the result contains floats. NumPy changed the data type automatically for us.
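
To see the conversion directly, we can inspect the dtype attribute before and after the multiplication. This is a small sketch; the exact integer type (for example int64 or int32) depends on the platform, so the check below only tests the kind of type, using np.issubdtype:

```python
import numpy as np

a = np.array([2, 3, 7])
result = 2.1 * a

# The exact integer width is platform dependent, so we only check the kind.
print(a.dtype)       # an integer dtype
print(result.dtype)  # float64

is_int = np.issubdtype(a.dtype, np.integer)
is_float = np.issubdtype(result.dtype, np.floating)
print(is_int, is_float)
```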

Shape

One of the most important properties of an array is its shape. We have already seen one-dimensional (1D) arrays, but arrays can have any number of dimensions.

To get the shape of an array, we use its shape attribute:

>>> import numpy as np

>>> a = np.array([2, 3, 8])

>>> a.shape

(3,)

Something slightly more interesting:

>>> b = np.array([
...     [2, 3, 5],
...     [4, 5, 6],
... ])

>>> b.shape

(2, 3)
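
Alongside shape, two related attributes are often useful: ndim gives the number of dimensions and size the total number of elements. The sketch below also uses reshape, which is not covered in this introduction, to show that the same six values can be arranged in a different shape:

```python
import numpy as np

b = np.array([
    [2, 3, 5],
    [4, 5, 6],
])

print(b.shape)  # (2, 3): 2 rows, 3 columns
print(b.ndim)   # 2: number of dimensions
print(b.size)   # 6: total number of elements

# reshape returns an array with the same values arranged in a new shape.
c = b.reshape(3, 2)
print(c.shape)  # (3, 2)
```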

Slicing

Just like with lists, we might want to select certain values from an array. For 1D arrays this works just as it does for normal Python lists:

>>> a = np.array([2, 3, 8])

>>> a[2]

8

>>> a[1:]

array([3, 8])

However, when dealing with higher-dimensional arrays, something else happens:

>>> import numpy as np

>>> b = np.array([
...     [2, 3, 8],
...     [4, 5, 6],
... ])

>>> b[1]

array([4, 5, 6])

>>> b[1][2]

6

We see that b[1] returns the row at index 1 (the second row) along the first dimension, which is still an array. After that, we can select individual items from it. This can be abbreviated to:

>>> b[1,2]

6

But what if we want the column at index 1 instead of the row? Then we use : to select all items along the first dimension, followed by 1 to select along the second:

>>> b[:, 1]

array([3, 5])
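
Index positions and slices can be combined freely along each dimension. A short sketch using the same array b (the variable names are only illustrative):

```python
import numpy as np

b = np.array([
    [2, 3, 8],
    [4, 5, 6],
])

row = b[0, :]        # first row, all columns
last_col = b[:, -1]  # last column, all rows
block = b[:, 1:]     # all rows, columns from index 1 onwards

print(row)       # [2 3 8]
print(last_col)  # [8 6]
print(block)
```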

Masking

Suppose we have an array, and we want to throw away all values above a certain cutoff:

>>> a = np.array([230, 10, 284, 39, 76])

>>> cutoff = 200

>>> a > cutoff

array([ True, False, True, False, False])

Simply using the greater-than operator tells us for which elements the test was positive. Now we set all the values above 200 to zero:

>>> a = np.array([230, 10, 284, 39, 76])

>>> cutoff = 200

>>> a[a > cutoff] = 0

>>> a

array([ 0, 10, 0, 39, 76])

The crucial line is a[a > cutoff] = 0. This selects every position in the array where the test was positive and assigns 0 to it. Without this trick we would have had to loop over the array:

>>> a = np.array([230, 10, 284, 39, 76])

>>> cutoff = 200

>>> new_a = []

>>> for x in a:
...     if x > cutoff:
...         new_a.append(0)
...     else:
...         new_a.append(x)
...

>>> a = np.array(new_a)

>>> a

array([ 0, 10, 0, 39, 76])

>>> new_a

[0, 10, 0, 39, 76]
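
A boolean mask can also be used to select values rather than overwrite them, and np.where builds a new array without modifying the original. The following is a sketch of both, with illustrative variable names:

```python
import numpy as np

a = np.array([230, 10, 284, 39, 76])
cutoff = 200

mask = a > cutoff

# Indexing with the mask keeps only the values where the mask is True.
above = a[mask]    # values above the cutoff
below = a[~mask]   # ~ inverts the mask

# np.where(condition, x, y) picks x where the condition holds, else y.
clipped = np.where(mask, 0, a)

print(above)    # [230 284]
print(below)    # [10 39 76]
print(clipped)  # [ 0 10  0 39 76]
print(a)        # the original array is unchanged
```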

Broadcasting

Broadcasting takes place when you perform operations between arrays of different shapes. For instance:

>>> a = np.array([
...     [0, 1],
...     [2, 3],
...     [4, 5],
... ])

>>> b = np.array([10, 100])

>>> a * b

array([[  0, 100],
       [ 20, 300],
       [ 40, 500]])

The shapes of a and b don’t match. In order to proceed, NumPy stretches b into a second dimension, as if it were stacked three times upon itself. The operation then takes place element-wise.
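
To make the stretching explicit, we can build the stacked version of b by hand and check that it gives the same result. This is only an illustration using np.tile; broadcasting itself produces the result without us having to construct the stacked array:

```python
import numpy as np

a = np.array([
    [0, 1],
    [2, 3],
    [4, 5],
])
b = np.array([10, 100])

# Stack b three times to match the shape of a explicitly.
b_stacked = np.tile(b, (3, 1))
print(b_stacked.shape)  # (3, 2)

# Broadcasting gives the same result without building b_stacked.
same = np.array_equal(a * b, a * b_stacked)
print(same)  # True
```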

One of the rules of broadcasting is that only dimensions of size 1 can be stretched (if an array has fewer dimensions than the other, its missing dimensions are treated, for broadcasting purposes, as having size 1).

The other rule is that dimensions are compared from the last to the first. Any dimensions that do not match must be stretched to become equally sized. However, according to the previous rule, only dimensions of size 1 can stretch. This means that some shapes cannot be broadcast, and NumPy will give you an error:

>>> c = np.array([
...     [0, 1, 2],
...     [3, 4, 5],
... ])

>>> b = np.array([10, 100])

>>> c * b

Traceback (most recent call last):
  File "<pyshell#13>", line 1, in <module>
    c * b
ValueError: operands could not be broadcast together with shapes (2,3) (2,)

What happens here is that NumPy, again, adds a dimension to b, giving it the shape (1, 2). The sizes of the last dimensions of b and c (2 and 3, respectively) are then compared and found to differ. Since neither of these dimensions has size 1, neither can be stretched, so NumPy gives up and produces an error.
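
The two rules can be written down as a small helper function. The function can_broadcast below is hypothetical (it is not part of NumPy) and only sketches the compatibility check described above:

```python
def can_broadcast(shape_a, shape_b):
    """Return True if two shapes are compatible for broadcasting."""
    # Compare dimensions from the last to the first. zip stops at the
    # shorter shape, so missing leading dimensions count as size 1.
    for m, n in zip(reversed(shape_a), reversed(shape_b)):
        # Sizes must match, or one of them must be 1 (stretchable).
        if m != n and m != 1 and n != 1:
            return False
    return True


print(can_broadcast((3, 2), (2,)))    # True
print(can_broadcast((2, 3), (2,)))    # False
print(can_broadcast((2, 3), (2, 1)))  # True
```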

The solution to multiplying c and b above is to tell NumPy explicitly that it must add the extra dimension as the second dimension of b. This is done by using None to index that second dimension. The shape of b then becomes (2, 1), which is compatible with c for broadcasting:

>>> c = np.array([
...     [0, 1, 2],
...     [3, 4, 5],
... ])

>>> b = np.array([10, 100])

>>> c * b[:, None]

array([[  0,  10,  20],
       [300, 400, 500]])
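
Indexing with None is the same as using np.newaxis, which is simply an alias for None, and reshape gives the same result. The sketch below compares the shapes involved:

```python
import numpy as np

c = np.array([
    [0, 1, 2],
    [3, 4, 5],
])
b = np.array([10, 100])

print(b.shape)                 # (2,)
print(b[None, :].shape)        # (1, 2): what NumPy does implicitly
print(b[:, None].shape)        # (2, 1): compatible with c
print(b[:, np.newaxis].shape)  # (2, 1): np.newaxis is an alias for None

result = c * b[:, None]
alternative = c * b.reshape(2, 1)
print(np.array_equal(result, alternative))  # True
```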

A good visual description of these rules, together with some advanced broadcasting applications, can be found in this tutorial of NumPy broadcasting rules.

dtype

A commonly used term when working with NumPy is dtype, short for data type. This is typically int or float, followed by some number, e.g. int8. This means the value is an integer with a size of 8 bits.

Each bit is either 0 or 1. With 8 of them, we have 2^8 = 256 possible values. Since zero is one of those values, the largest possible value is 255. The data type we have now described is called uint8, where the u stands for unsigned: only non-negative values are allowed. If we want to allow negative numbers we use int8. The range then shifts to -128 to +127.
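
These limits do not have to be memorised: np.iinfo reports the smallest and largest value an integer dtype can hold. A short sketch:

```python
import numpy as np

# Print the value range of a few integer types.
for int_type in (np.uint8, np.int8, np.uint16):
    info = np.iinfo(int_type)
    print(int_type.__name__, info.min, info.max)
```

For uint8 this reports 0 and 255, and for int8 it reports -128 and 127, matching the ranges derived above.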

What happens when you set numbers bigger than the maximum value of your dtype?

>>> import numpy as np

>>> a = np.array([200], dtype='uint8')

>>> a + a

array([144], dtype=uint8)

That doesn’t seem right, does it? If you add two uint8 values, the result of 200 + 200 cannot be 400, because that doesn’t fit in a uint8. Standard Python integers grow as needed behind the scenes, so there you get the 400 you would expect. NumPy doesn’t do this, and returns 144. Why 144 is left as an exercise. To fix this, make sure that your numbers are not stored as uint8, but as something larger, such as uint16:

>>> import numpy as np

>>> a = np.array([200], dtype='uint16')

>>> a + a

array([400], dtype=uint16)

Changing dtype

To change the dtype of an existing array, you can use the astype method:

>>> import numpy as np

>>> a = np.array([200], dtype='uint8')

>>> a.astype('uint64')

array([200], dtype=uint64)
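
Note that astype returns a new array and leaves the original untouched, so the result has to be assigned to a name if we want to keep it. A minimal sketch (the names wide and total are only illustrative):

```python
import numpy as np

a = np.array([200], dtype='uint8')

# astype returns a new array; a itself keeps its dtype.
wide = a.astype('uint16')
print(a.dtype)     # uint8
print(wide.dtype)  # uint16

# With the wider type, the sum fits and is no longer wrapped around.
total = wide + wide
print(total)  # [400]
```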
