This article is the first in a series of tutorials on the Python library NumPy. In it, I will describe how NumPy works and introduce a few simple functions you should know.

These days, Python is the go-to programming language for a growing share of the scientific community. As a result, even high school students and younger learners have started using Python for their projects and reports. NumPy is a very popular Python library with a myriad of research and industrial applications, and several other libraries build on NumPy to cater to more specific needs.

One of the reasons the library is so popular is its excellent performance. Most of the algorithms NumPy uses are written in C, a lower-level language that runs significantly faster than Python, and are then wrapped in Python. In fact, you could extend NumPy yourself with C extension modules.


The basic data structure used by NumPy is a multidimensional array that can be manipulated far faster than Python’s native lists. If you have a basic understanding of Python’s syntax, NumPy is fairly easy to grasp. To start off, we’ll import the package under the alias np, which is standard practice.

import numpy as np

Initializing NumPy Arrays

Before we do anything, it is important to understand the NumPy array data structure thoroughly. An array is a grid of values of the same type. Arrays can be initialized in several ways:

  • np.array([[2,4,6], [8,10,12]]) creates a 2×3 array. This is an example of using nested Python lists to initialize a NumPy array
  • np.zeros((3,2)) creates a 3×2 array where all entries are 0
  • np.ones((2,4)) creates a 2×4 array where all entries are 1
  • np.full((3,3), 9) creates a 3×3 array with all entries equal to a specified value, in this case, 9
  • np.eye(4) creates a 4×4 identity matrix with all diagonal entries being 1 and all other entries 0
  • np.random.random((3,3)) creates a 3×3 array filled with random values from 0 (inclusive) up to 1 (exclusive)

There are many more ways to create an array, but these are a few of the most common.
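To see the constructors listed above side by side, here is a small sketch that builds each array and prints its shape and data type. The variable names are my own, chosen only for illustration.

```python
import numpy as np

# Nested Python lists: the outer list holds the rows.
from_lists = np.array([[2, 4, 6], [8, 10, 12]])
zeros = np.zeros((3, 2))             # shape is passed as a tuple
ones = np.ones((2, 4))
nines = np.full((3, 3), 9)           # every entry equals the fill value
identity = np.eye(4)                 # 4x4 identity matrix
randoms = np.random.random((3, 3))   # values in the interval [0, 1)

arrays = {
    "from_lists": from_lists,
    "zeros": zeros,
    "ones": ones,
    "nines": nines,
    "identity": identity,
    "randoms": randoms,
}
for name, arr in arrays.items():
    print(name, arr.shape, arr.dtype)
```

Note that np.zeros, np.ones, and np.eye produce floating-point arrays by default, while np.array and np.full infer the type from the values they are given.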

Datatypes

For arrays, all elements have to be of the same data type. When not specified, NumPy will try to make the best guess based on the data provided. However, the data type can also be explicitly specified by using the dtype argument. The following example forces the 64-bit int data type.

>>> x = np.array([1, 2], dtype=np.int64)
>>> x.dtype
dtype('int64')
>>> x.shape
(2,)

A few of the most common data types that are available in NumPy are:

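  • np.float64 is used for double-precision floating-point numbers. The older np.float alias was removed in NumPy 1.24.
  • np.complex128 is used for complex floating-point numbers. The older np.cfloat alias was removed in NumPy 2.0.
  • np.int_ is the default integer type. To fix the width explicitly, use a sized type such as np.int8, np.int16, np.int32, or np.int64.
  • np.uint is the unsigned counterpart of np.int_
  • np.intc is used for integers that match the C int type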

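As a rough illustration of how NumPy guesses a data type and how you can override it, consider the sketch below. The arrays are made up for this example; astype is the standard way to convert an existing array to another type.

```python
import numpy as np

ints = np.array([1, 2, 3])            # inferred integer type (width depends on platform)
floats = np.array([1, 2.5, 3])        # a single float promotes every element to float64
small = np.array([1, 2, 3], dtype=np.int8)   # explicitly request 8-bit integers
as_float = ints.astype(np.float64)    # astype returns a converted copy

print(ints.dtype, floats.dtype, small.dtype, as_float.dtype)
print(small.itemsize, as_float.itemsize)     # bytes per element
```

Choosing a smaller type such as np.int8 saves memory, but values outside its range (here -128 to 127) cannot be stored in it.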
Understanding NumPy Array Indexing

Arrays can be multi-dimensional. We can read the size of an array along each dimension from its shape attribute:

>>> a = np.array([[2, 4, 6], [8,10,12]])
>>> a.shape
(2, 3)
>>> a[1,:].shape
(3,)

As you can see, every array has a shape attribute, including the arrays we get back from slicing.

Elements of an array can be accessed and manipulated by using their index values. Using the same example, we can access individual entries or ranges of entries using array[row, column]. The slice syntax is i:j:k, where i is the starting index, j is the stopping index, and k is the step, just as with Python lists.

>>> a[1,0]
8
>>> a[0:1, 0:1]
array([[2]])
>>> a[0][1], a[1][0]
(4, 8)
>>> a[::-1][0]
array([ 8, 10, 12])
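The i:j:k slice syntax described above can be combined freely across both dimensions. Here is a short sketch using the same 2×3 array; the variable names are mine.

```python
import numpy as np

a = np.array([[2, 4, 6], [8, 10, 12]])

first_row = a[0, :]         # all columns of row 0     -> [2, 4, 6]
last_column = a[:, -1]      # last entry of every row  -> [6, 12]
every_other = a[0, ::2]     # row 0 with a step of 2   -> [2, 6]
sub_block = a[0:2, 1:3]     # rows 0-1, columns 1-2    -> [[4, 6], [10, 12]]

# An integer index drops a dimension; a slice of length one keeps it.
print(a[1, :].shape)        # (3,)
print(a[1:2, :].shape)      # (1, 3)
```

The last two lines show a detail that often trips people up: a[1, :] is one-dimensional, while a[1:2, :] is still a two-dimensional array with a single row.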

Individual entries in the array are mutable. They can be changed with the assignment operator.

>>> a[0][1] = 44
>>> a[0][1]
44

Besides ordinary integer indexing, we can also use Boolean indexing. It is more complex, but also a lot more powerful.

Boolean indexing happens when the index is itself an array of Boolean values, most often the result of a comparison operator. We can use this to, for example, filter out all NaN entries:

>>> a[~np.isnan(a)]
array([ 2, 44,  6,  8, 10, 12])

Here the ~ operator simply inverts the Boolean values. Another common use case is selecting all values below a threshold, and then replacing them:

>>> a[a < 8]
array([2, 6])
>>> a[a < 8] = 8
>>> a[a < 8]
array([], dtype=int32)
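The integer array used above contains no NaN values, so the NaN filter has nothing to remove. Here is a small sketch with a made-up floating-point array (the values are purely illustrative) that shows the mask itself and what it selects.

```python
import numpy as np

# Hypothetical measurements with two missing values.
data = np.array([1.5, np.nan, -2.0, 4.0, np.nan])

mask = ~np.isnan(data)       # True where the entry is a real number
clean = data[mask]           # keeps only the entries where mask is True

clipped = clean.copy()
clipped[clipped < 0] = 0.0   # Boolean indexing also works on the left-hand side

print(mask)
print(clean)
print(clipped)
```

Boolean indexing always returns a one-dimensional copy of the selected elements, which is why clean has three entries rather than the original five.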

Array math

NumPy provides functionality for elementary mathematical operations on arrays. These can be performed using either the standard Python operators or the equivalent NumPy functions; both run in compiled code and have a significant speed advantage over plain Python loops when working with large data. Here, we discuss some basic mathematical operations that are possible on NumPy arrays.

Elementwise operations

  • Addition – np.add(x, y)
  • Subtraction – np.subtract(x, y)
  • Multiplication – np.multiply(x, y)
  • Dot product – x.dot(y) or np.dot(x, y) (a matrix product rather than an elementwise operation)
  • Division – np.divide(x, y)
  • Square – np.square(x)
  • Square Root – np.sqrt(x)
  • Cube Root – np.cbrt(x)
  • Maximum – np.maximum(x, y)
  • Minimum – np.minimum(x, y)
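To make the difference between elementwise multiplication and the dot product concrete, here is a short sketch with two small matrices that I made up for this example.

```python
import numpy as np

x = np.array([[1.0, 4.0], [9.0, 16.0]])
y = np.array([[2.0, 2.0], [3.0, 4.0]])

total = np.add(x, y)             # same as x + y   -> [[3, 6], [12, 20]]
product = np.multiply(x, y)      # same as x * y   -> [[2, 8], [27, 64]]
matrix_product = np.dot(x, y)    # same as x @ y   -> [[14, 18], [66, 82]]
roots = np.sqrt(x)               # -> [[1, 2], [3, 4]]
larger = np.maximum(x, y)        # -> [[2, 4], [9, 16]]

print(product)
print(matrix_product)
```

The function form and the operator form give identical results, so which one you use is mostly a matter of readability.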

Complete matrix operations

  • Sum –
    • np.sum(x) # calculates the sum of all elements
    • np.sum(x, axis=0) # calculates the sum of each column
    • np.sum(x, axis=1) # calculates the sum of each row
  • Product – np.prod(x)
  • Cross product – np.cross(x, y)
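The axis argument is easiest to understand by trying it. This sketch uses a 2×3 matrix and two unit vectors of my own choosing.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])

print(np.sum(x))            # 21, the sum of all elements
print(np.sum(x, axis=0))    # [5 7 9], one sum per column
print(np.sum(x, axis=1))    # [ 6 15], one sum per row
print(np.prod(x))           # 720, the product of all elements

# The cross product of the x and y unit vectors is the z unit vector.
u = np.array([1, 0, 0])
w = np.array([0, 1, 0])
print(np.cross(u, w))       # [0 0 1]
```

A handy way to remember the axis argument: it names the dimension that disappears from the result, so axis=0 collapses the rows and leaves one value per column.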

These are some of the most basic functions you will need to know about, but there are also a lot more functions available for more advanced topics. For example, NumPy also supports trigonometric functions, hyperbolic functions and so much more, but we won’t go into detail here.

Broadcasting

Broadcasting is a powerful technique that can greatly enhance your usage of NumPy. At its heart, it simply means using a smaller array to perform an operation on a larger array. Used with some planning, broadcasting can significantly reduce computation time, and thanks to its compact syntax, explicit loops over arrays can often be omitted altogether. Let’s demonstrate this with an example where we add a vector to each row of a matrix.

The usual way is to loop over the entire array and add the vector to each row, or to each column if that’s the desired result. We show that in the following example:

>>> x = np.array([[1,2,3], [4,5,6], [7,8,9]])
>>> v = np.array([1, 0, 1])
>>> y = np.empty_like(x) # uninitialized matrix with the same shape as x
>>> for i in range(len(x)):
...      y[i, :] = x[i, :] + v
...
>>> y
array([[ 2,  2,  4],
       [ 5,  5,  7],
       [ 8,  8, 10]])

However, this can take a really long time if you are working with large data sets, say on the order of 10 million rows. In that case, broadcasting can significantly reduce the processing time, with the added bonus of a simple operator-like syntax.

>>> x = np.array([[1,2,3], [4,5,6], [7,8,9]])
>>> v = np.array([1, 0, 1])
>>> y = x + v # Add v to each row of x using broadcasting
>>> y
array([[ 2,  2,  4],
       [ 5,  5,  7],
       [ 8,  8, 10]])

This code removes the need to initialize an empty array and to iterate over the original array in Python, both of which add overhead.
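Broadcasting works because NumPy compares the shapes of the two arrays and stretches any dimension of size 1 to match. The sketch below reuses x and v from above and adds a reshaped vector w (my own addition) to show the column-wise case, plus a shape that cannot be broadcast.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
v = np.array([1, 0, 1])    # shape (3,): treated as one row, added to every row
w = v.reshape(3, 1)        # shape (3, 1): one column, added to every column

rows = x + v               # [[2, 2, 4], [5, 5, 7], [8, 8, 10]]
cols = x + w               # [[2, 3, 4], [4, 5, 6], [8, 9, 10]]
print(rows)
print(cols)

# Shapes (3, 3) and (2,) do not line up, so NumPy refuses to guess.
try:
    x + np.array([1, 2])
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print("cannot broadcast:", err)
```

If the result of a broadcast surprises you, printing the shape of both operands is usually the quickest way to find out why.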

As you can see, both code samples produce exactly the same result. To compare their speed, we’ll wrap each snippet in a function.

>>> def f1():
...     x = np.array([[1,2,3], [4,5,6], [7,8,9]])
...     y = np.empty_like(x)
...     v = np.array([1, 0, 1])
...     for i in range(len(x)):
...         y[i, :] = x[i, :] + v
...     return y
...
>>> def f2():
...     x = np.array([[1,2,3], [4,5,6], [7,8,9]])
...     v = np.array([1, 0, 1])
...     y = x + v
...     return y

Next, we’ll use the timeit.timeit function to time the two functions. We’ve written about how to use the timeit module before.

>>> import timeit
>>> timeit.timeit("f1()", setup="from __main__ import f1", number=10000)
0.07048290000000179
>>> timeit.timeit("f2()", setup="from __main__ import f2", number=10000)
0.051573899999993955

Because the data sets are so small, both versions finish almost instantly, but even then we can still see the improvement. In this run, the broadcasting version took about 27% less time than the loop; put differently, the loop was roughly 37% slower.
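The gap tends to widen as the array grows, because the loop pays Python overhead on every row. Here is a sketch you can run to check this on your own machine. The function names, the array size of 20,000 rows, and the repeat count are arbitrary choices of mine, and I deliberately give no timings since they vary from machine to machine.

```python
import timeit

import numpy as np

x = np.ones((20_000, 3))          # arbitrary size, larger than the 3x3 example
v = np.array([1.0, 0.0, 1.0])

def add_with_loop(x, v):
    """Add v to each row of x with an explicit Python loop."""
    y = np.empty_like(x)
    for i in range(len(x)):
        y[i, :] = x[i, :] + v
    return y

def add_with_broadcasting(x, v):
    """Add v to each row of x in a single broadcast operation."""
    return x + v

loop_time = timeit.timeit(lambda: add_with_loop(x, v), number=3)
broadcast_time = timeit.timeit(lambda: add_with_broadcasting(x, v), number=3)

print(f"loop:         {loop_time:.4f} s")
print(f"broadcasting: {broadcast_time:.4f} s")
```

Unlike f1 and f2 above, these functions receive the arrays as arguments, so the measurement covers only the addition and not the array creation.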

Conclusion

You should now know enough of the basics of NumPy to start using it in your own Python applications. NumPy is powerful by itself, but for more specific applications you will want to use the libraries and packages built on top of it. A prime example is SciPy, which is now an industry standard.

If you have any questions about getting started with NumPy or getting started with scientific computing in Python, feel free to leave a comment and I’ll do my best to reply as soon as possible.