The post NumPy Tutorial: the basics appeared first on Rooy Development.

These days, Python is the go-to programming language for an increasing fraction of the scientific community. As a result, even high school students, and sometimes younger pupils, have started using Python for their projects and reports. NumPy is a very popular Python library used in a wide range of research and industrial applications, and several other libraries build upon it to serve more specialized needs.

One reason the library is so popular is its excellent performance. Most of NumPy’s algorithms are written in C, a lower-level language that runs significantly faster than Python, and then wrapped in Python. You can even extend NumPy yourself with C extension modules.


The basic data structure used by NumPy is a multidimensional array that can be manipulated with incredible speed compared to Python’s native lists. If you have a basic understanding of Python’s syntax, NumPy is fairly easy to grasp. To start off, we’ll import the package as `np`, which is standard practice.

`import numpy as np`

Before we do anything, it is important to understand the NumPy array data structure thoroughly. An array is a grid of values of the same type. Arrays can be initialized in several ways:

- `np.array([[2, 4, 6], [8, 10, 12]])` creates a 2×3 array. This is an example of using Python lists to initialize a NumPy array.
- `np.zeros((3, 2))` creates a 3×2 array where all entries are 0.
- `np.ones((2, 4))` creates a 2×4 array where all entries are 1.
- `np.full((3, 3), 9)` creates a 3×3 array with all entries equal to a specified value, in this case 9.
- `np.eye(4)` creates a 4×4 identity matrix, with all diagonal entries equal to 1 and all other entries 0.
- `np.random.random((3, 3))` creates a 3×3 array filled with random values in the half-open interval [0, 1).

There are many more ways to create an array, but these are some of the most common ones.
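As a quick sanity check, the sketch below creates one array with each of the constructors listed above and prints its shape and data type. The variable names are only illustrative.

```python
import numpy as np

a = np.array([[2, 4, 6], [8, 10, 12]])  # built from nested Python lists
z = np.zeros((3, 2))                    # 3x2, every entry 0.0
o = np.ones((2, 4))                     # 2x4, every entry 1.0
f = np.full((3, 3), 9)                  # 3x3, every entry 9
i = np.eye(4)                           # 4x4 identity matrix
r = np.random.random((3, 3))            # 3x3, uniform samples in [0, 1)

for name, arr in [('a', a), ('z', z), ('o', o), ('f', f), ('i', i), ('r', r)]:
    print(name, arr.shape, arr.dtype)
```

Note that `np.zeros`, `np.ones` and `np.eye` produce floating-point arrays by default, while `np.full` with an integer fill value produces an integer array.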

For arrays, all elements have to be of the same data type. When the type is not specified, NumPy makes its best guess based on the data provided. However, the data type can also be specified explicitly with the `dtype` argument. The following example forces the 64-bit integer data type:

```python
>>> x = np.array([1, 2], dtype=np.int64)
>>> x.dtype
dtype('int64')
>>> x.shape
(2,)
```

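A few of the most common data types available in NumPy are:

- `np.float64` is used for double-precision floating-point numbers. The older `np.float` alias was removed in NumPy 1.24, so use `np.float64` (or Python’s built-in `float`) instead.
- `np.complex128` is used for complex floating-point numbers. Older code may use the `np.cfloat` alias, which NumPy 2.0 removed.
- `np.int_` is NumPy’s default integer type. Sized variants such as `np.int8`, `np.int16`, `np.int32` and `np.int64` are also available.
- `np.uint` is the unsigned counterpart of `np.int_`.
- `np.intc` corresponds to a C `int` (usually 32 bits).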
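The sketch below shows both behaviours described above: letting NumPy infer the type, and forcing one with `dtype`. It also uses `astype`, which returns a converted copy of an array; the variable names are just for illustration.

```python
import numpy as np

# Let NumPy infer the type from the data
ints = np.array([1, 2, 3])             # platform-dependent default integer type
mixed = np.array([1, 2.5, 3])          # one float promotes the whole array to float64
complex_arr = np.array([1 + 2j, 3])    # complex128

# Or request a type explicitly with one of the names listed above
small = np.array([1, 2, 3], dtype=np.int8)
unsigned = np.array([1, 2, 3], dtype=np.uint)

# astype returns a converted copy and leaves the original untouched
as_float = ints.astype(np.float64)

for arr in (ints, mixed, complex_arr, small, unsigned, as_float):
    print(arr, arr.dtype)
```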

An array can have any number of dimensions. We can get its size along each dimension with `array.shape`:

```python
>>> a = np.array([[2, 4, 6], [8, 10, 12]])
>>> a.shape
(2, 3)
>>> a[1, :].shape
(3,)
```

As you can see, every array, including a slice taken from another array, has a `shape` attribute that can always be accessed.
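Besides `shape`, arrays expose a couple of closely related attributes. A minimal sketch using the same 2×3 array:

```python
import numpy as np

a = np.array([[2, 4, 6], [8, 10, 12]])

print(a.shape)        # (2, 3): 2 rows, 3 columns
print(a.ndim)         # 2: the number of dimensions, i.e. len(a.shape)
print(a.size)         # 6: the total number of elements, 2 * 3
print(a[:, 1].shape)  # (2,): a single column is one-dimensional too
```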

Elements of an array can be accessed and manipulated through their indices. Using the same example, we can access individual entries or ranges of entries with `array[row, column]`. The slice syntax is `i:j:k`, where `i` is the starting index, `j` is the stopping index (which is not included) and `k` is the step, just as with Python lists.

```python
>>> a[1, 0]
8
>>> a[0:1, 0:1]
array([[2]])
>>> a[0][1], a[1][0]
(4, 8)
>>> a[::-1][0]
array([ 8, 10, 12])
```
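The `i:j:k` syntax works independently along each axis. The sketch below applies it to a fresh copy of the same 2×3 array; the variable names are only illustrative.

```python
import numpy as np

a = np.array([[2, 4, 6], [8, 10, 12]])

first_row = a[0, :]       # [2, 4, 6]
second_column = a[:, 1]   # [4, 10]
every_other = a[0, ::2]   # start at index 0 with step 2: [2, 6]
block = a[0:2, 1:3]       # rows 0-1, columns 1-2 (stop indices excluded): [[4, 6], [10, 12]]
reversed_rows = a[::-1]   # a negative step reverses the row order: [[8, 10, 12], [2, 4, 6]]
last_column = a[:, -1]    # negative indices count from the end: [6, 12]
```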

Individual entries in the array are mutable. They can be changed with the assignment operator.

```python
>>> a[0][1] = 44
>>> a[0][1]
44
```

Besides integer indexing, we can also use Boolean indexing. It is more complex, but also a lot more powerful.

This kind of advanced indexing happens when the index is an array of Boolean values, which is most often the result of a comparison. We can use it, for example, to filter out all NaN entries:

```python
>>> a[~np.isnan(a)]
array([ 2, 44,  6,  8, 10, 12])
```

Here the `~` operator inverts the Boolean mask, turning every `True` into `False` and vice versa. Another common use case is selecting, or overwriting, all values below a threshold:

```python
>>> a[a < 8]
array([2, 6])
>>> a[a < 8] = 8
>>> a[a < 8]
array([], dtype=int32)
```
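Since `a` holds integers, it never actually contains NaN. The sketch below uses a hypothetical float array to show the NaN filter doing real work, along with combining two conditions with `&` (each condition needs its own parentheses) and replacing values through a mask.

```python
import numpy as np

# Hypothetical data with missing (NaN) and negative values
data = np.array([1.5, np.nan, -2.0, 4.0, np.nan, -0.5])

clean = data[~np.isnan(data)]                   # drop NaN entries: [1.5, -2.0, 4.0, -0.5]
negative = clean[clean < 0]                     # keep only negative values: [-2.0, -0.5]
in_range = clean[(clean > -1) & (clean < 2)]    # both conditions must hold: [1.5, -0.5]

clipped = clean.copy()
clipped[clipped < 0] = 0                        # overwrite negatives in place: [1.5, 0.0, 4.0, 0.0]
```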

NumPy provides functions for elementary mathematical operations on arrays. These operations can be performed either with Python’s standard operators (such as `+`, `-`, `*` and `/`) or with the equivalent NumPy functions. Both work on whole arrays at once, which gives a significant speed advantage over looping through the elements in Python when working with large data. Here, we discuss some basic mathematical operations that are possible on NumPy arrays.

- Addition –
`numpy.add(x, y)`

- Subtraction –
`numpy.subtract(x, y)`

- Multiplication (element-wise) –
`numpy.multiply(x, y)`

- Dot product –
`x.dot(y)` or `numpy.dot(x, y)`

- Division –
`numpy.divide(x, y)`

- Square –
`numpy.square(x)`

- Square Root –
`numpy.sqrt(x)`

- Cube Root –
`numpy.cbrt(x)`

- Maximum (element-wise) –
`numpy.maximum(x, y)`

- Minimum (element-wise) –
`numpy.minimum(x, y)`

- Sum –
`numpy.sum(x) # calculates the sum of all elements`

`numpy.sum(x, axis=0) # calculates the sum of each column`

`numpy.sum(x, axis=1) # calculates the sum of each row`

- Product –
`numpy.prod(x)`

- Cross product –
`numpy.cross(x, y)`

These are some of the most basic functions you will need to know about, but there are also a lot more functions available for more advanced topics. For example, NumPy also supports trigonometric functions, hyperbolic functions and so much more, but we won’t go into detail here.
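To make the list above concrete, here is a small sketch applying several of these functions to two made-up 2×2 arrays. The comments give the resulting values, not the exact printed form.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

total = np.add(x, y)                     # same as x + y: [[6, 8], [10, 12]]
product = np.multiply(x, y)              # element-wise, same as x * y: [[5, 12], [21, 32]]
matrix = x.dot(y)                        # matrix product: [[19, 22], [43, 50]]
largest = np.maximum(x, y)               # element-wise maximum, here equal to y
roots = np.sqrt(np.array([4.0, 9.0]))    # [2.0, 3.0]
all_sum = np.sum(x)                      # 10
col_sums = np.sum(x, axis=0)             # sum of each column: [4, 6]
row_sums = np.sum(x, axis=1)             # sum of each row: [3, 7]
all_prod = np.prod(x)                    # 1 * 2 * 3 * 4 = 24
cross = np.cross([1, 0, 0], [0, 1, 0])   # [0, 0, 1]
```

Note the difference between `numpy.multiply` (element by element) and `dot` (matrix product); mixing them up is a common source of bugs.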

Broadcasting is a powerful technique that can greatly enhance your usage of NumPy. At its heart, it simply means using a smaller array to perform an operation on a larger array, with NumPy implicitly stretching the smaller one to match. Used with planning and some clever thinking, broadcasting can significantly reduce computation time, and combined with Python’s intuitive syntax, it often lets you avoid explicit loops over arrays altogether. Let’s demonstrate this with an example where we add a vector to each row of a matrix.

The usual way is to loop over the entire array and add the vector to each row (or to each column, if that’s the desired result), as shown in the following example:

```python
>>> x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> v = np.array([1, 0, 1])
>>> y = np.empty_like(x)  # uninitialized matrix with the same shape as x
>>> for i in range(len(x)):
...     y[i, :] = x[i, :] + v
...
>>> y
array([[ 2,  2,  4],
       [ 5,  5,  7],
       [ 8,  8, 10]])
```

However, this loop can take a really long time when you work with large data sets, say, around 10 million entries. In that case, broadcasting can significantly reduce the processing time, with the added bonus of a simple, operator-like syntax.

```python
>>> x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
>>> v = np.array([1, 0, 1])
>>> y = x + v  # add v to each row of x using broadcasting
>>> y
array([[ 2,  2,  4],
       [ 5,  5,  7],
       [ 8,  8, 10]])
```

This code removes the need to initialize an empty array and to iterate over the original array in Python, both of which add processing overhead.
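The same mechanism also covers the column-wise case mentioned above. As a sketch, the snippet below adds `v` to every column by giving it an explicit second axis with `np.newaxis`, broadcasts a scalar, and shows what happens when the shapes are incompatible. The variable names are illustrative.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
v = np.array([1, 0, 1])

row_wise = x + v                 # v has shape (3,), so it is added to every row
col_wise = x + v[:, np.newaxis]  # v reshaped to (3, 1) is added to every column
scaled = x * 10                  # a scalar is broadcast to every element

# Shapes that cannot be lined up raise an error instead of guessing
mismatch_error = None
try:
    x + np.array([1, 2])
except ValueError as error:
    mismatch_error = str(error)

print(col_wise)
print(mismatch_error)
```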

Both the loop version and the broadcasting version produce exactly the same result, but the broadcasting version should also be faster. To compare their speed, we’ll put each snippet in its own function.

```python
>>> def f1():
...     x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
...     y = np.empty_like(x)
...     v = np.array([1, 0, 1])
...     for i in range(len(x)):
...         y[i, :] = x[i, :] + v
...     return y
...
>>> def f2():
...     x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
...     v = np.array([1, 0, 1])
...     y = x + v
...     return y
...
```

Next, we’ll use the `timeit.timeit` function to time the two functions. We’ve written about how to use the `timeit` module in an earlier post.

```python
>>> import timeit
>>> timeit.timeit("f1()", setup="from __main__ import f1", number=10000)
0.07048290000000179
>>> timeit.timeit("f2()", setup="from __main__ import f2", number=10000)
0.051573899999993955
```

Because the arrays are so small, both versions already finish extremely quickly, but even so, the improvement is clearly visible: the broadcasting version took about 27% less time (roughly 0.052 s versus 0.070 s for 10,000 runs), or, put the other way around, the loop version was about 37% slower.
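With only three rows, much of the measured time is fixed overhead. The sketch below repeats the comparison on a larger input; the size of 100,000 rows is an arbitrary assumption, and the timings depend on your machine, so no numbers are claimed here. It passes the functions to `timeit.timeit` directly, which accepts a callable as well as a string.

```python
import timeit

import numpy as np

# Hypothetical larger input: 100,000 rows of 3 columns
x = np.arange(300_000).reshape(100_000, 3)
v = np.array([1, 0, 1])

def with_loop():
    """Add v to each row with an explicit Python loop."""
    y = np.empty_like(x)
    for i in range(len(x)):
        y[i, :] = x[i, :] + v
    return y

def with_broadcasting():
    """Add v to each row with broadcasting."""
    return x + v

# number=1 runs each function once
loop_time = timeit.timeit(with_loop, number=1)
broadcast_time = timeit.timeit(with_broadcasting, number=1)
print(f'loop: {loop_time:.4f} s, broadcasting: {broadcast_time:.4f} s')
```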

You should now know enough of the basics to start using NumPy in your own Python applications. NumPy is powerful by itself, but for more specialized applications you will want to use the libraries and packages built on top of it. A prime example is SciPy, which has become an industry standard.

If you have any questions about getting started with NumPy or getting started with scientific computing in Python, feel free to leave a comment and I’ll do my best to reply as soon as possible.


The post Python CSV: what you need to know CSV files in Python appeared first on Rooy Development.


Python is known for its extensive collection of libraries, and of course it also has one dedicated to handling CSV files: the `csv` module. This module is part of the standard library, so you do not need to install anything. We’ll start off by importing it:

`import csv`

The `csv` module offers two ways of writing data to CSV files. Both have their own use cases, and you will probably need both at some point.

I’ll start by discussing the `csv.writer` function. It is by far the more popular option, with roughly 1,000,000 usages on GitHub versus roughly 50,000 for the second option.

Here you can see a quick example that should speak for itself. The script opens the `demo.csv` file and writes two rows to it, the first of which is the header.

```python
import csv

# The with block closes the file automatically when it ends
with open('demo.csv', 'w', newline='') as file:
    writer = csv.writer(
        file,
        delimiter=',',
        quotechar='|',
        quoting=csv.QUOTE_MINIMAL
    )
    writer.writerow(['stock', 'price', 'cost', 'profit'])
    writer.writerow(['21', '121.34', '45.34', '76'])
```

```
~ demo.csv
stock,price,cost,profit
21,121.34,45.34,76
```

This is about as simple as it gets. However, there are a few things you should remember:

- The `newline` argument: this should always be `''` when opening a file that the `csv` module will work with. The module handles line endings itself; without `newline=''`, an extra `\r` can be added to every line on Windows (which typically shows up as blank lines between rows), and newlines inside quoted fields may not be read correctly.
- The `quoting` argument: this specifies which fields should be quoted. There are a few options to choose from:
  - `csv.QUOTE_ALL`: all fields will be quoted.
  - `csv.QUOTE_MINIMAL`: only fields containing special characters, such as the `delimiter` or `quotechar`, will be quoted. This is the default.
  - `csv.QUOTE_NONNUMERIC`: the writer quotes all non-numeric fields, and a reader converts all unquoted fields to `float` values.
  - `csv.QUOTE_NONE`: no fields will be quoted; instead, the writer escapes delimiters. If you use this value, you also need to provide the `escapechar` argument.

The `csv.writer` function has only one required argument, the file object, but it accepts a couple of other optional arguments:

- `delimiter`: the character the writer uses to separate fields. It defaults to `','`, but you can set it to any other single character.
- `quotechar`: the character used for quoting. It defaults to `'"'`.
- `escapechar`: the character used to escape the delimiter when quoting is disabled with `csv.QUOTE_NONE`. It defaults to `None`, meaning no escaping.
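To see how the `quoting` settings differ, the sketch below writes the same row with each of them into an in-memory `io.StringIO` buffer instead of a file. The `render` helper and the sample row are made up for illustration.

```python
import csv
import io

# A sample row: a plain string, a string containing the delimiter, and a number
row = ['stock', 'a,b', 42]

def render(**fmtparams):
    """Hypothetical helper: write `row` with the given writer options and return the text."""
    buffer = io.StringIO(newline='')
    csv.writer(buffer, **fmtparams).writerow(row)
    return buffer.getvalue()

print(repr(render(quoting=csv.QUOTE_MINIMAL)))                # 'stock,"a,b",42\r\n'
print(repr(render(quoting=csv.QUOTE_ALL)))                    # '"stock","a,b","42"\r\n'
print(repr(render(quoting=csv.QUOTE_NONNUMERIC)))             # '"stock","a,b",42\r\n'
print(repr(render(quoting=csv.QUOTE_NONE, escapechar='\\')))  # 'stock,a\\,b,42\r\n'
```

Leaving out `escapechar` in the last call makes the writer raise `csv.Error`, because the comma in `'a,b'` would have to be escaped.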

The other option is the `DictWriter` class. This class gives you a bit more structure and is probably less error-prone, as you’ll see in the example below.

Another thing you’ll probably notice is the extra argument called `fieldnames`. This is a required argument for the `DictWriter` class and should contain a list of all column names, which are used to map each value to the correct column. You should use the `DictWriter.writeheader()` method to write the column names.

```python
import csv

with open('demo.csv', 'w', newline='') as file:
    writer = csv.DictWriter(file, fieldnames=['stock', 'price', 'cost', 'profit'])
    writer.writeheader()
    writer.writerow(
        {
            'stock': '21',
            'price': '121.34',
            'cost': '45.34',
            'profit': '76.00'
        }
    )
```

```
~ demo.csv
stock,price,cost,profit
21,121.34,45.34,76.00
```

As you can see, this is a somewhat bulkier approach, but it has real advantages over the `csv.writer` function. The main one is that you can’t really mess up the order in which you enter the values, since each value is automatically mapped to the right column. This is especially handy when your file has a dozen or even dozens of columns.

I have made this mistake a lot of times and believe me, it’s a pain to find out why there is this one value that just isn’t correct, whatever you do.
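The sketch below illustrates that advantage using an in-memory buffer: keys can be given in any order, missing keys are filled with the `restval` default (an empty string), and an unknown key, here a deliberately misspelled, hypothetical `'prcie'`, raises a `ValueError` by default.

```python
import csv
import io

buffer = io.StringIO(newline='')
writer = csv.DictWriter(buffer, fieldnames=['stock', 'price', 'cost', 'profit'])
writer.writeheader()

# Keys in a different order than the columns still end up in the right place
writer.writerow({'profit': '76', 'cost': '45.34', 'stock': '21', 'price': '121.34'})
# Missing keys are written as restval, which defaults to an empty string
writer.writerow({'stock': '13', 'price': '100'})

# An unknown key (a typo here) is rejected instead of being silently misplaced
error_message = None
try:
    writer.writerow({'stock': '32', 'prcie': '140'})
except ValueError as error:
    error_message = str(error)

print(buffer.getvalue())
print('Rejected row:', error_message)
```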

Just as with writing, we have two options to choose from when reading a CSV file in Python: the `csv.reader` function and the `csv.DictReader` class. In general, both have a lot in common with their writing counterparts.

For the upcoming examples, we will use the following file:

```
~ example.csv
stock,price,cost,profit
21,121.34,45.34,76
13,100,50,50
32,140,90,50
```

The `csv.reader` function is relatively simple and follows the same concept as `csv.writer`; it also accepts the same arguments. This small example should speak for itself.

```python
import csv

with open('example.csv', 'r', newline='') as file:
    reader = csv.reader(file, delimiter=',')
    for counter, row in enumerate(reader):
        if counter == 0:
            # The first row contains the column names
            print('Fieldnames: {}'.format(', '.join(row)))
        else:
            print('Stock: {}\tPrice: {}\tCost: {}\tProfit: {}'.format(row[0], row[1], row[2], row[3]))
```

And here is the output it will generate:

```
Fieldnames: stock, price, cost, profit
Stock: 21	Price: 121.34	Cost: 45.34	Profit: 76
Stock: 13	Price: 100	Cost: 50	Profit: 50
Stock: 32	Price: 140	Cost: 90	Profit: 50
```

Iterating over the reader object yields each row as a list of strings containing the column values. The first row returned contains the column names.

If the CSV file uses a non-default quote character, pass it with the optional `quotechar` argument when calling the `csv.reader` function.
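As an illustration, the sketch below reads the same made-up data twice from an in-memory string whose quoted field uses `|`, like the writer example earlier: once with the default `quotechar` and once with `quotechar='|'`.

```python
import csv
import io

# Made-up data: the note on the last line contains a comma and is quoted with |
data = 'stock,note\r\n21,|cheap, but risky|\r\n'

# With the default quotechar ('"'), the | characters are ordinary text
default_rows = list(csv.reader(io.StringIO(data, newline='')))
# With quotechar='|', the quoted note is read as a single field
pipe_rows = list(csv.reader(io.StringIO(data, newline=''), quotechar='|'))

print(default_rows[1])  # ['21', '|cheap', ' but risky|']
print(pipe_rows[1])     # ['21', 'cheap, but risky']
```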

The `csv.DictReader` class is, again, very similar to its writing counterpart. Below is a simple example that shows what you can do with it; the output should speak for itself.

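```python
import csv

with open('example.csv', 'r', newline='') as file:
    reader = csv.DictReader(file, delimiter=',')
    for row in reader:
        # Each row is a dict that maps column names to string values
        print(row)
```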

And again the output this script will generate:

```
{'stock': '21', 'price': '121.34', 'cost': '45.34', 'profit': '76'}
{'stock': '13', 'price': '100', 'cost': '50', 'profit': '50'}
{'stock': '32', 'price': '140', 'cost': '90', 'profit': '50'}
```

The class takes the field names from the first row of the CSV file. If the file does not have a header row, you can specify them with the optional `fieldnames` argument.
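For example, here is a sketch that reads a made-up, headerless version of the data from an in-memory string by supplying `fieldnames`. Because every value comes back as a string, it converts `profit` to `float` before adding the values up. The printed dictionary assumes Python 3.8 or later, where `DictReader` returns plain dicts.

```python
import csv
import io

# Same columns as example.csv, but without a header line
data = '21,121.34,45.34,76\r\n13,100,50,50\r\n32,140,90,50\r\n'

reader = csv.DictReader(
    io.StringIO(data, newline=''),
    fieldnames=['stock', 'price', 'cost', 'profit'],
)
rows = list(reader)

# DictReader returns strings, so convert them before doing arithmetic
total_profit = sum(float(row['profit']) for row in rows)
print(rows[0])        # {'stock': '21', 'price': '121.34', 'cost': '45.34', 'profit': '76'}
print(total_profit)   # 176.0
```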

You should now know the basics of reading and writing CSV files in Python. In almost all use cases, the standard `csv` module will be enough. However, if you work with a lot of data, you might want to use pandas instead. We’ve already discussed writing and reading CSV files with pandas in an earlier post.

If you have any questions left about writing to csv files with Python or reading from csv files with Python, feel free to leave a comment and I’ll do my best to reply as soon as possible.
