π Introduction to NumPy
Contents
π Introduction to NumPy#
NumPy is
a package used to accelerate mathematical operations.
is specifically tailored towards matrix multiplication.
the first package we will be using in this course.
used on the backend in almost every python package that does math.
so efficient because it is written in C, but it is easy to use like Python.
Importing NumPy#
To use NumPy you need to import it. The convention for importing NumPy is:
import numpy as np
Note
Not all packages are installed with Python. You can download a distribution that contains common packages (e.g., Anaconda), or you can use the JupyterHub where we have installed the needed packages for you. If you need to install a package you can find most packages on the Python Package Index (PyPI). Usually packages can be installed by running `pip installWhat is the difference between a Python list and a NumPy array?#
NumPy gives you an enormous range of fast and efficient ways of creating arrays and manipulating numerical data inside them.
Python list can contain different data types within a single list, all of the elements in a NumPy array should be homogeneous (of the same data type).
Mathematical operations that are meant to be performed on arrays would be extremely inefficient if the arrays werenβt homogeneous.
What is an Array?#
An array is a central data structure of the NumPy library. An array is a grid of values and it contains information about the raw data, how to locate an element, and how to interpret an element. It has a grid of elements that can be indexed in various ways. The elements are all of the same type, referred to as the array dtype.
Initializing Arrays#
# importing numpy
import numpy as np
a = np.array([1, 2, 3, 4, 5, 6])
or
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
Accessing an element#
We can access the elements in the array using square brackets.
In NumPy starts at 0. That means that if you want to access the first element in your array, youβll be accessing element
0
.
print(a[0])
[1 2 3 4]
Array Types#
You might occasionally hear an array referred to as a βndarray,β which is shorthand for βN-dimensional array.β An N-dimensional array is simply an array with any number of dimensions.
1D, or one-dimensional array - Vector - magnitude and direction
2D, or two-dimensional array - Image, stress, strain
3D, or three-dimensional array - piezoelectricity
4D or beyond tensor - elasticity, magnetism
Attributes of an Array#
An array is usually a fixed-size container of items of the same type and size.
The number of dimensions and items in an array is defined by its shape.
The shape of an array is a tuple of non-negative integers that specify the sizes of each dimension.
a = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
You can view the shape of a NumPy array using the built in method .shape
a.shape
(2, 3)
Your array has 2 axes. The first axis has a length of 2 and the second axis has a length of 3.
Modifying Arrays#
The contents of an array can be accessed and modified by indexing or slicing the array they are mutable.
Unlike the typical container objects, different arrays can share the same data, so changes made on one array might be visible in another.
a[1, 2] = 100
print(a)
[[ 0. 0. 0.]
[ 1. 1. 100.]]
Creating Basic Arrays#
To create an NumPy array, you can use the function np.array()
a = np.array([1, 2, 3])
Preallocating a NumPy Array#
Sometimes it is useful to preallocate an array in memory
This can be done using the
np.zeros
ornp.ones
methods
print(np.zeros(2))
[0. 0.]
print(np.ones(2))
[1. 1.]
An Empty array#
When you use zeroes or ones you need to rewrite all the states in memory which takes time.
An empty array is filled with random values from the current memory state in the ram. This is way more efficient if initializing large arrays
%%time
del(a)
a = np.empty((100,100,1000))
CPU times: user 9 Β΅s, sys: 15 Β΅s, total: 24 Β΅s
Wall time: 25.7 Β΅s
=======
CPU times: user 173 Β΅s, sys: 16 Β΅s, total: 189 Β΅s
Wall time: 30 Β΅s
>>>>>>> Stashed changes
An ordered array#
You can build an ordered array starting from 0 using np.arrange()
method
np.arange(4)
array([0, 1, 2, 3])
There are many options within the np.arrange()
method. np.arrange([start(inclusive), stop(exclusive), step size])
np.arange(2, 10, 2)
array([2, 4, 6, 8])
If you want to discover the syntax for a method you can search the docstring on the package website, or you can use ??
within Jupyter
?? np.arange
You can create a linear spaced vector using the np.linspace()
method, where np.linspace(start[inclusive], stop[inclusive], number_of_steps)
np.linspace(0, 11, num=5)
array([ 0. , 2.75, 5.5 , 8.25, 11. ])
Adding, Removing, and Sorting Elements#
Sorting an element is simple with np.sort()
. You can specify the axis, kind, and order when you call the function.
If you start with this array:
arr = np.array([2, 1, 5, 3, 7, 4, 6, 8])
You can quickly sort the numbers in ascending order with:
np.sort(arr)
array([1, 2, 3, 4, 5, 6, 7, 8])
Other Sort Options:
argsort, which is an indirect sort along a specified axis,
lexsort, which is an indirect stable sort on multiple keys,
searchsorted, which will find elements in a sorted array, and
partition, which is a partial sort.
Concatenating Arrays#
If you start with these arrays:
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
You can concatenate them with np.concatenate()
np.concatenate((a, b))
array([1, 2, 3, 4, 5, 6, 7, 8])
Or
x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6]])
np.concatenate((x, y), axis=0)
array([[1, 2],
[3, 4],
[5, 6]])
Determining the shape and size of an array?#
ndarray.ndim
will tell you the number of axes, or dimensions, of the array.
ndarray.size
will tell you the total number of elements of the array. This is the product of the elements of the arrayβs shape.
ndarray.shape
will display a tuple of integers that indicate the number of elements stored along each dimension of the array. If, for example, you have a 2-D array with 2 rows and 3 columns, the shape of your array is (2, 3).
array_example = np.array(
[
[[0, 1, 2, 3], [4, 5, 6, 7]],
[[0, 1, 2, 3], [4, 5, 6, 7]],
[[0, 1, 2, 3], [4, 5, 6, 7]],
]
)
To find the number of dimensions of the array, run:
array_example.ndim
3
To find the total number of elements in the array, run:
array_example.size
24
And to find the shape of your array, run:
array_example.shape
(3, 2, 4)
Reshaping an Array#
Using arr.reshape()
will give a new shape to an array without changing the data. Just remember that when you use the reshape method, the array you want to produce needs to have the same number of elements as the original array. If you start with an array with 12 elements, youβll need to make sure that your new array also has a total of 12 elements.
a = np.arange(6)
print(a)
[0 1 2 3 4 5]
b = a.reshape(3, 2)
print(b)
[[0 1]
[2 3]
[4 5]]
Adding and Removing an Axis#
You can use np.newaxis
and np.expand_dims
to increase the dimensions of your existing array.
Using np.newaxis
will increase the dimensions of your array by one dimension when used once. This means that a 1D array will become a 2D array, a 2D array will become a 3D array, and so on.
For example, if you start with this array:
a = np.array([1, 2, 3, 4, 5, 6])
a.shape
(6,)
You can use np.newaxis
to add a new axis:
a2 = a[np.newaxis, :]
a2.shape
(1, 6)
You can remove an axis using np.squeeze
a2.squeeze()
array([1, 2, 3, 4, 5, 6])
Indexing and Slicing#
You can index and slice NumPy arrays in the same ways you can slice Python lists.
data = np.array([1, 2, 3])
data[1]
2
data[0:2]
array([1, 2])
data[1:]
array([2, 3])
data[-2:]
array([2, 3])
You can visualize it this way:
If you want to select values from your array that fulfill certain conditions, itβs straightforward with NumPy.
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
You can easily print all of the values in the array that are less than 5.
print(a[a < 5])
[1 2 3 4]
Or numbers that are divisible by 2
divisible_by_2 = a[a % 2 == 0]
print(divisible_by_2)
[ 2 4 6 8 10 12]
Or you can select elements that satisfy two conditions using the &
(and) and |
(or) operators:
c = a[(a > 2) & (a < 11)]
print(c)
[ 3 4 5 6 7 8 9 10]
You can also make use of the logical operators & and | in order to return boolean values that specify whether or not the values in an array fulfill a certain condition. This can be useful with arrays that contain names or other categorical values.
five_up = (a > 5) | (a == 5)
print(five_up)
[[False False False False]
[ True True True True]
[ True True True True]]
You can also find the index that satisfy a condition using np.argwhere
ind = np.argwhere((a > 2) & (a < 11))
print(ind)
[[0 2]
[0 3]
[1 0]
[1 1]
[1 2]
[1 3]
[2 0]
[2 1]]
Or you can find a boolean and use it for indexing
ind = np.where((a > 2) & (a < 11))
print(a[ind])
[ 3 4 5 6 7 8 9 10]
How to create arrays from existing data?#
Letβs say you have this array:
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
You can create a new array from a section of your array any time by specifying where you want to slice your array
arr1 = a[3:8]
arr1
array([4, 5, 6, 7, 8])
Here, you grabbed a section of your array from index position 3 through index position 8.
You can also stack two existing arrays, both vertically and horizontally. Letβs say you have two arrays, a1
and a2
:
a1 = np.array([[1, 1], [2, 2]])
a2 = np.array([[3, 3], [4, 4]])
You can stack them vertically with vstack
:
np.vstack((a1, a2))
array([[1, 1],
[2, 2],
[3, 3],
[4, 4]])
Or stack them horizontally with hstack
:
np.hstack((a1, a2))
array([[1, 1, 3, 3],
[2, 2, 4, 4]])
Copies and Views#
When operating and manipulating arrays, their data is sometimes copied into a new array and sometimes not. This is often a source of confusion for beginners. There are three cases:
No Copy at All#
Simple assignments make no copy of objects or their data.
a = np.array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])
b = a
b is a
True
Python passes mutable objects as references, so function calls make no copy.
def f(x):
print(id(x))
f(a)
f(b)
140477309597808
140477309597808
=======
4504271824
4504271824
>>>>>>> Stashed changes
See, they really are the same, this is useful if you want to save memory
View or Shallow Copy#
Different array objects can share the same data. The view
method creates a new array object that looks at the same data.
c = a.view()
c is a
False
c.base is a # c is a view of the data owned by a
True
c = c.reshape((2, 6)) # a's shape doesn't change
a.shape
(3, 4)
c[0, 4] = 1234 # a's data changes
a
array([[ 0, 1, 2, 3],
[1234, 5, 6, 7],
[ 8, 9, 10, 11]])
Note that operations on the object can be applied without effecting the original object. If a value is changed it does effect the original object.
Deep Copy#
The copy method makes a complete copy of the array and its data
This will consume the same amount of memory as the original object
d = a.copy() # a new array object with new data is created
d is a
False
d.base is a # d doesn't share anything with a
False
d[0, 0] = 9999
a
array([[ 0, 1, 2, 3],
[1234, 5, 6, 7],
[ 8, 9, 10, 11]])
Basic array operations#
Once youβve created your arrays, you can start to work with them. Letβs say, for example, that youβve created two arrays, one called βdataβ and one called βonesβ
You can add the arrays together with the plus sign.
data = np.array([1, 2])
ones = np.ones(2, dtype=int)
data + ones
array([2, 3])
You can do all operations on arrays
data - ones
array([0, 1])
data * data
array([1, 4])
data / data
array([1., 1.])
Basic operations are simple with NumPy. If you want to find the sum of the elements in an array, youβd use sum()
. This works for 1D arrays, 2D arrays, and arrays in higher dimensions.
a = np.array([1, 2, 3, 4])
a.sum()
10
To add the rows or the columns in a 2D array, you would specify the axis.
If you start with this array:
b = np.array([[1, 1], [2, 2]])
b.sum(axis=0)
array([3, 3])
or
b = np.array([[1, 1], [2, 2]])
b.sum(axis=1)
array([2, 4])
Learn More about basic operations here
Besides sum
there are a plethora of built in calculations. You can discover more here
Broadcasting#
There are times when you might want to carry out an operation between an array and a single number (also called an operation between a vector and a scalar) or between arrays of two different sizes. For example, your array (weβll call it βdataβ) might contain information about distance in miles but you want to convert the information to kilometers. You can perform this operation with:
data = np.array([1.0, 2.0])
data * 1.6
array([1.6, 3.2])
NumPy understands that the multiplication should happen with each cell.
Broadcasting is a mechanism that allows NumPy to perform operations on arrays of different shapes.
The dimensions of your array must be compatible, for example, when the dimensions of both arrays are equal or when one of them is 1.
If the dimensions are not compatible, you will get a ValueError.
Finding Unique Items and Counts#
You can find the unique elements in an array easily with np.unique
For example, if you start with this array:
a = np.array([11, 11, 12, 13, 14, 15, 16, 17, 12, 13, 11, 14, 18, 19, 20])
you can use np.unique
to print the unique values in your array:
unique_values = np.unique(a)
print(unique_values)
[11 12 13 14 15 16 17 18 19 20]
To get the indices of unique values in a NumPy array (an array of first index positions of unique values in the array), just pass the return_index argument in np.unique()
as well as your array.
unique_values, indices_list = np.unique(a, return_index=True)
print(indices_list)
[ 0 2 3 4 5 6 7 12 13 14]
There are many more options which can be read in the doc string
Transposing and Reshaping#
Itβs common to need to transpose your matrices. NumPy arrays have the property T that allows you to transpose a matrix.
data = np.array([[1, 2], [3, 4], [5, 6]])
data.T
array([[1, 3, 5],
[2, 4, 6]])
You may also need to switch the dimensions of a matrix. This can happen when, for example, you have a model that expects a certain input shape that is different from your dataset. This is where the reshape
method can be useful. You simply need to pass in the new dimensions that you want for the matrix.
data.reshape(2, 3)
array([[1, 2, 3],
[4, 5, 6]])
data.reshape(3, 2)
array([[1, 2],
[3, 4],
[5, 6]])
To learn more about transposing and reshaping arrays, see transpose and reshape.