Ryerson University Creation and Numpy Arrays Questions

From your reading of Chapter 7 of our textbook, Array-Oriented Programming with NumPy, discuss the following with the class (with original examples):

• Array creation and operations with NumPy.
• NumPy calculation methods and functions.
• Array slicing and reshaping with NumPy.

7. Array-Oriented Programming with NumPy
Objectives
In this chapter, you’ll:
• Learn what arrays are and how they differ from lists.
• Use the numpy module’s high-performance ndarrays.
• Compare list and ndarray performance with the IPython %timeit magic.
• Use ndarrays to store and retrieve data efficiently.
• Create and initialize ndarrays.
• Refer to individual ndarray elements.
• Iterate through ndarrays.
• Create and manipulate multidimensional ndarrays.
• Perform common ndarray manipulations.
• Create and manipulate pandas one-dimensional Series and two-dimensional DataFrames.
• Customize Series and DataFrame indices.
• Calculate basic descriptive statistics for data in a Series and a DataFrame.
• Customize floating-point number precision in pandas output formatting.
7.1 Introduction
NumPy (Numerical Python) Library
• First appeared in 2006 and is the preferred Python array implementation.
• High-performance, richly functional n-dimensional array type called ndarray.
• Written in C and up to 100 times faster than lists.
• Critical in big-data processing, AI applications and much more.
• According to libraries.io, over 450 Python libraries depend on NumPy.
• Many popular data science libraries such as Pandas, SciPy (Scientific Python) and Keras (for deep learning) are built on or depend on NumPy.
Array-Oriented Programming
• Functional-style programming with internal iteration makes array-oriented manipulations concise and straightforward, and reduces the possibility of error.
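As a rough illustration of internal iteration, the sketch below compares an explicit loop over a list with the equivalent one-line array expression. The price data and variable names are made up for this example and do not come from the chapter.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration.
prices = [19.99, 5.49, 102.00, 7.25]

# External iteration: an explicit Python loop over a list.
discounted_list = []
for price in prices:
    discounted_list.append(price * 0.9)

# Internal iteration: NumPy applies the multiplication to every element.
discounted_array = np.array(prices) * 0.9

print(discounted_list)
print(discounted_array)
```

The array version has no loop counter or append call to get wrong, which is the sense in which it reduces the possibility of error.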
©1992–2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 5 of
the book Intro to Python for Computer Science and Data Science: Learning to Program with
AI, Big Data and the Cloud.
DISCLAIMER: The authors and publisher of this book have used their best efforts in preparing the
book. These efforts include the development, research, and testing of the theories and programs to
determine their effectiveness. The authors and publisher make no warranty of any kind, expressed
or implied, with regard to these programs or to the documentation contained in these books. The
authors and publisher shall not be liable in any event for incidental or consequential damages in
connection with, or arising out of, the furnishing, performance, or use of these programs.
7.2 Creating arrays from Existing Data
• Creating an array with the array function
• Argument is an array or other iterable
• Returns a new array containing the argument’s elements
In [1]:
import numpy as np
In [2]:
numbers = np.array([2, 3, 5, 7, 11])
In [3]:
type(numbers)
Out[3]:
numpy.ndarray
In [4]:
numbers
Out[4]:
array([ 2,  3,  5,  7, 11])
Multidimensional Arguments
In [5]:
np.array([[1, 2, 3], [4, 5, 6]])
Out[5]:
array([[1, 2, 3],
[4, 5, 6]])
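The array function accepts more than lists. The following sketch, using illustrative variable names that are not from the chapter, builds arrays from a tuple, a range and nested lists:

```python
import numpy as np

from_tuple = np.array((1.5, 2, 3))              # mixed ints and floats become float64
from_range = np.array(range(0, 10, 2))          # a range is also an acceptable iterable
from_rows = np.array([[1, 2], [3, 4], [5, 6]])  # nested lists give a 2D array

print(from_tuple, from_tuple.dtype)
print(from_range)
print(from_rows.shape)
```

Each nested list becomes one row, so every inner list should have the same length.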
7.3 array Attributes
• An array’s attributes enable you to discover information about its structure and contents
In [1]:
import numpy as np
In [2]:
integers = np.array([[1, 2, 3], [4, 5, 6]])
In [3]:
integers
Out[3]:
array([[1, 2, 3],
[4, 5, 6]])
In [4]:
floats = np.array([0.0, 0.1, 0.2, 0.3, 0.4])
In [5]:
floats
Out[5]:
array([0. , 0.1, 0.2, 0.3, 0.4])
• NumPy does not display trailing 0s
Determining an array’s Element Type
In [6]:
integers.dtype
Out[6]:
dtype('int64')
In [7]:
floats.dtype
Out[7]:
dtype('float64')
• For performance reasons, NumPy is written in the C programming language and uses C’s data types
• Other NumPy types
Determining an array’s Dimensions
• ndim contains an array’s number of dimensions
• shape contains a tuple specifying an array’s dimensions
In [8]:
integers.ndim
Out[8]:
2
In [9]:
floats.ndim
Out[9]:
1
In [10]:
integers.shape
Out[10]:
(2, 3)
In [11]:
floats.shape
Out[11]:
(5,)
Determining an array’s Number of Elements and Element Size
• View an array’s total number of elements with size
• View the number of bytes required to store each element with itemsize
In [12]:
integers.size
Out[12]:
6
In [13]:
integers.itemsize
Out[13]:
8
In [14]:
floats.size
Out[14]:
5
In [15]:
floats.itemsize
Out[15]:
8
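These attributes are related: size is the product of the values in shape, and the total storage for the elements is size times itemsize. The sketch below checks this on a small made-up array. The dtype is set explicitly so the byte counts do not depend on the platform, and nbytes is an additional ndarray attribute not covered above.

```python
import numpy as np

# Hypothetical temperature readings stored as 4-byte floats.
temps = np.array([[20.5, 21.0, 19.8], [18.2, 17.9, 22.4]], dtype=np.float32)

print(temps.ndim)      # 2 dimensions
print(temps.shape)     # (2, 3): 2 rows, 3 columns
print(temps.size)      # 6 elements = 2 * 3
print(temps.itemsize)  # 4 bytes per float32 element
print(temps.nbytes)    # 24 bytes = size * itemsize
```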
Iterating through a Multidimensional array’s Elements
In [16]:
for row in integers:
    for column in row:
        print(column, end='  ')
    print()
1  2  3
4  5  6
• Iterate through a multidimensional array as if it were one-dimensional by using flat
In [17]:
for i in integers.flat:
    print(i, end='  ')
1  2  3  4  5  6
7.4 Filling arrays with Specific Values
• Functions zeros, ones and full create arrays containing 0s, 1s or a specified value, respectively
In [1]:
import numpy as np
In [2]:
np.zeros(5)
Out[2]:
array([0., 0., 0., 0., 0.])
• For a tuple of integers, these functions return a multidimensional array with the specified dimensions
In [3]:
np.ones((2, 4), dtype=int)
Out[3]:
array([[1, 1, 1, 1],
[1, 1, 1, 1]])
In [4]:
np.full((3, 5), 13)
Out[4]:
array([[13, 13, 13, 13, 13],
[13, 13, 13, 13, 13],
[13, 13, 13, 13, 13]])
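A further sketch of the three functions, with illustrative names and values that are not from the chapter. It shows that zeros and ones produce floating-point elements unless a dtype is given, and that full takes its element type from the fill value.

```python
import numpy as np

board = np.zeros((3, 3), dtype=int)  # 3-by-3 grid of integer 0s
weights = np.ones(4)                 # float64 elements by default
thresholds = np.full((2, 3), 0.5)    # every element is 0.5

print(board)
print(weights.dtype)
print(thresholds)
```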
7.5 Creating arrays from Ranges
• NumPy provides optimized functions for creating arrays from ranges
Creating Integer Ranges with arange
In [1]:
import numpy as np
In [2]:
np.arange(5)
Out[2]:
array([0, 1, 2, 3, 4])
In [3]:
np.arange(5, 10)
Out[3]:
array([5, 6, 7, 8, 9])
In [4]:
np.arange(10, 1, -2)
Out[4]:
array([10,  8,  6,  4,  2])
Creating Floating-Point Ranges with linspace
• Produce evenly spaced floating-point ranges with NumPy’s linspace function
• Ending value is included in the array
In [5]:
np.linspace(0.0, 1.0, num=5)
Out[5]:
array([0.  , 0.25, 0.5 , 0.75, 1.  ])
Reshaping an array
• array method reshape arranges an array’s elements into a different number of dimensions
• New shape must have the same number of elements as the original
In [6]:
np.arange(1, 21).reshape(4, 5)
Out[6]:
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20]])
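The element-count rule can be checked directly. In the sketch below (illustrative names, not from the chapter), a 12-element array reshapes to 3-by-4, and passing -1 for one dimension asks NumPy to infer it. A shape with 15 slots raises a ValueError.

```python
import numpy as np

values = np.arange(12)            # 12 elements: 0 through 11

grid = values.reshape(3, 4)       # 3 * 4 == 12, so this works
inferred = values.reshape(2, -1)  # -1 lets NumPy infer the second dimension (6)

try:
    values.reshape(5, 3)          # 5 * 3 == 15 slots for 12 elements
except ValueError as error:
    message = str(error)
    print('reshape failed:', message)

print(grid.shape, inferred.shape)
```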
Displaying Large arrays
• When displaying an array with more than 1000 items, NumPy drops the middle rows, columns or both from the output
In [7]:
np.arange(1, 100001).reshape(4, 25000)
Out[7]:
array([[     1,      2,      3, ...,  24998,  24999,  25000],
       [ 25001,  25002,  25003, ...,  49998,  49999,  50000],
       [ 50001,  50002,  50003, ...,  74998,  74999,  75000],
       [ 75001,  75002,  75003, ...,  99998,  99999, 100000]])
In [8]:
np.arange(1, 100001).reshape(100, 1000)
Out[8]:
array([[     1,      2,      3, ...,    998,    999,   1000],
       [  1001,   1002,   1003, ...,   1998,   1999,   2000],
       [  2001,   2002,   2003, ...,   2998,   2999,   3000],
       ...,
       [ 97001,  97002,  97003, ...,  97998,  97999,  98000],
       [ 98001,  98002,  98003, ...,  98998,  98999,  99000],
       [ 99001,  99002,  99003, ...,  99998,  99999, 100000]])
7.6 List vs. array Performance: Introducing %timeit
• Most array operations execute significantly faster than corresponding list operations
• IPython %timeit magic command times the average duration of operations
Timing the Creation of a List Containing Results of 6,000,000 Die Rolls
In [1]:
import random
In [2]:
%timeit rolls_list = \
[random.randrange(1, 7) for i in range(0, 6_000_000)]
6.88 s ± 276 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
• By default, %timeit executes a statement in a loop, and it runs the loop seven times
• If you do not indicate the number of loops, %timeit chooses an appropriate value
• After executing the statement, %timeit displays the statement’s average execution time, as well as the standard deviation of all the executions
Timing the Creation of an array Containing Results of 6,000,000 Die Rolls
In [3]:
import numpy as np
In [4]:
%timeit rolls_array = np.random.randint(1, 7, 6_000_000)
75.2 ms ± 2.33 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
60,000,000 and 600,000,000 Die Rolls
In [5]:
%timeit rolls_array = np.random.randint(1, 7, 60_000_000)
916 ms ± 26.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [6]:
%timeit rolls_array = np.random.randint(1, 7, 600_000_000)
10.3 s ± 180 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Customizing the %timeit Iterations
In [7]:
%timeit -n3 -r2 rolls_array = np.random.randint(1, 7, 6_000_000)
74.5 ms ± 7.58 ms per loop (mean ± std. dev. of 2 runs, 3 loops each)
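Outside IPython, where the %timeit magic is unavailable, the standard library’s timeit module can make a similar comparison. This is only a sketch: the function names and the reduced roll count are choices made here to keep the run short, and actual timings will vary by machine.

```python
import random
import timeit

import numpy as np

ROLLS = 100_000  # far fewer than 6,000,000, so the sketch runs quickly


def roll_list():
    """Build a list of die rolls with a list comprehension."""
    return [random.randrange(1, 7) for _ in range(ROLLS)]


def roll_array():
    """Build an array of die rolls with a single NumPy call."""
    return np.random.randint(1, 7, ROLLS)


# number=3 calls per run, repeat=2 runs (compare %timeit -n3 -r2)
list_times = timeit.repeat(roll_list, number=3, repeat=2)
array_times = timeit.repeat(roll_array, number=3, repeat=2)

print(f'list:  {min(list_times) / 3:.4f} s per call')
print(f'array: {min(array_times) / 3:.4f} s per call')
```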
Other IPython Magics
IPython provides dozens of magics for a variety of tasks—for a complete list, see the IPython magics
documentation. Here are a few helpful ones:
• %load to read code into IPython from a local file or URL.
• %save to save snippets to a file.
• %run to execute a .py file from IPython.
• %precision to change the default floating-point precision for IPython outputs.
• %cd to change directories without having to exit IPython first.
• %edit to launch an external editor, which is handy if you need to modify more complex snippets.
• %history to view a list of all snippets and commands you’ve executed in the current IPython session.
7.7 array Operators
• array operators perform operations on entire arrays.
• Can perform arithmetic between arrays and scalar numeric values, and between arrays of the same shape.
In [1]:
import numpy as np
In [2]:
numbers = np.arange(1, 6)
In [3]:
numbers
Out[3]:
array([1, 2, 3, 4, 5])
In [4]:
numbers * 2
Out[4]:
array([ 2,  4,  6,  8, 10])
In [5]:
numbers ** 3
Out[5]:
array([  1,   8,  27,  64, 125])
In [6]:
numbers  # numbers is unchanged by the arithmetic operators
Out[6]:
array([1, 2, 3, 4, 5])
In [7]:
numbers += 10
In [8]:
numbers
Out[8]:
array([11, 12, 13, 14, 15])
Broadcasting
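• Normally, arithmetic operations require as operands two arrays of the same size and shape.
• When one operand is a single value (a scalar), NumPy performs the calculation as if the scalar were an array of the same shape as the other operand: numbers * 2 is equivalent to numbers * [2, 2, 2, 2, 2] for a 5-element array.
• Applying the operation to every element in this way is called broadcasting.
• Broadcasting also can be applied between arrays of different sizes and shapes, enabling some concise and powerful manipulations.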
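As a small sketch of broadcasting between arrays of different shapes (the arrays here are invented for illustration), adding a 2-by-1 column to a 3-element row produces a 2-by-3 result, as if each operand had been repeated to fill the larger shape:

```python
import numpy as np

row = np.array([10, 20, 30])   # shape (3,)
column = np.array([[1], [2]])  # shape (2, 1)

# NumPy stretches both operands to shape (2, 3) before adding.
table = column + row

print(table.shape)
print(table)
```

Here every row of table is row plus one of the column values: [11, 21, 31] and [12, 22, 32].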
Arithmetic Operations Between arrays
• Can perform arithmetic operations and augmented assignments between arrays of the same shape
In [9]:
numbers2 = np.linspace(1.1, 5.5, 5)
In [10]:
numbers2
Out[10]:
array([1.1, 2.2, 3.3, 4.4, 5.5])
In [11]:
numbers * numbers2
Out[11]:
array([12.1, 26.4, 42.9, 61.6, 82.5])
Comparing arrays
• Can compare arrays with individual values and with other arrays
• Comparisons performed element-wise
• Produce arrays of Boolean values in which each element’s True or False value indicates the comparison result
In [12]:
numbers
Out[12]:
array([11, 12, 13, 14, 15])
In [13]:
numbers >= 13
Out[13]:
array([False, False,  True,  True,  True])
In [14]:
numbers2
Out[14]:
array([1.1, 2.2, 3.3, 4.4, 5.5])
In [15]:
numbers2 < numbers
Out[15]:
array([ True,  True,  True,  True,  True])
In [16]:
numbers == numbers2
Out[16]:
array([False, False, False, False, False])
In [17]:
numbers == numbers
Out[17]:
array([ True,  True,  True,  True,  True])
7.8 NumPy Calculation Methods
• These methods ignore the array’s shape and use all the elements in the calculations.
• Consider an array representing four students’ grades on three exams:
In [1]:
import numpy as np
In [2]:
grades = np.array([[87, 96, 70], [100, 87, 90], [94, 77, 90], [100, 81, 82]])
In [3]:
grades
Out[3]:
array([[ 87,  96,  70],
       [100,  87,  90],
       [ 94,  77,  90],
       [100,  81,  82]])
• Can use methods to calculate sum, min, max, mean, std (standard deviation) and var (variance)
• Each is a functional-style programming reduction
In [4]:
grades.sum()
Out[4]:
1054
In [5]:
grades.min()
Out[5]:
70
In [6]:
grades.max()
Out[6]:
100
In [7]:
grades.mean()
Out[7]:
87.83333333333333
In [8]:
grades.std()
Out[8]:
8.792357792739987
In [9]:
grades.var()
Out[9]:
77.30555555555556
Calculations by Row or Column
• You can perform calculations by column or row (or other dimensions in arrays with more than two dimensions)
• Each 2D+ array has one axis per dimension
• In a 2D array, axis=0 indicates calculations should be column-by-column
In [10]:
grades.mean(axis=0)
Out[10]:
array([95.25, 85.25, 83.  ])
• In a 2D array, axis=1 indicates calculations should be row-by-row
In [11]:
grades.mean(axis=1)
Out[11]:
array([84.33333333, 92.33333333, 87.        , 87.66666667])
• Other NumPy array Calculation Methods
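The axis keyword works the same way with the other calculation methods. The sketch below uses a made-up array of daily sales for two stores (the data and names are hypothetical) to total by column and by row:

```python
import numpy as np

# Hypothetical data: 2 stores (rows) by 3 days (columns).
sales = np.array([[12, 7, 9],
                  [4, 15, 6]])

per_day = sales.sum(axis=0)    # column-by-column: one total per day
per_store = sales.sum(axis=1)  # row-by-row: one total per store
best_day = sales.max(axis=1)   # each store's highest single-day value

print(per_day)
print(per_store)
print(best_day)
```

One way to remember the convention: the axis you name is the one that disappears from the result's shape, so axis=0 collapses the rows and leaves one value per column.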
7.9 Universal Functions
• Standalone universal functions (ufuncs) perform element-wise operations using one or two array or array-like arguments (like lists)
• Each returns a new array containing the results
• Some ufuncs are called when you use array operators like + and *
• Create an array and calculate the square root of its values, using the sqrt universal function
In [1]:
import numpy as np
In [2]:
numbers = np.array([1, 4, 9, 16, 25, 36])
In [3]:
np.sqrt(numbers)
Out[3]:
array([1., 2., 3., 4., 5., 6.])
• Add two arrays with the same shape, using the add universal function
• Equivalent to: numbers + numbers2
In [4]:
numbers2 = np.arange(1, 7) * 10
In [5]:
numbers2
Out[5]:
array([10, 20, 30, 40, 50, 60])
In [6]:
np.add(numbers, numbers2)
Out[6]:
array([11, 24, 39, 56, 75, 96])
Broadcasting with Universal Functions
• Universal functions can use broadcasting, just like NumPy array operators
In [7]:
np.multiply(numbers2, 5)
Out[7]:
array([ 50, 100, 150, 200, 250, 300])
In [8]:
numbers3 = numbers2.reshape(2, 3)
In [9]:
numbers3
Out[9]:
array([[10, 20, 30],
       [40, 50, 60]])
In [10]:
numbers4 = np.array([2, 4, 6])
In [11]:
np.multiply(numbers3, numbers4)
Out[11]:
array([[ 20,  80, 180],
       [ 80, 200, 360]])
• Broadcasting rules documentation
Other Universal Functions
NumPy universal functions
Math: add, subtract, multiply, divide, remainder, exp, log, sqrt, power, and more.
Trigonometry: sin, cos, tan, hypot, arcsin, arccos, arctan, and more.
Bit manipulation: bitwise_and, bitwise_or, bitwise_xor, invert, left_shift and right_shift.
Comparison: greater, greater_equal, less, less_equal, equal, not_equal, logical_and, logical_or, logical_xor, logical_not, minimum, maximum, and more.
Floating point: floor, ceil, isinf, isnan, fabs, trunc, and more.
7.10 Indexing and Slicing
• One-dimensional arrays can be indexed and sliced like lists.
Indexing with Two-Dimensional arrays
• To select an element in a two-dimensional array, specify a tuple containing the element’s row and column indices in square brackets
In [1]:
import numpy as np
In [2]:
grades = np.array([[87, 96, 70], [100, 87, 90], [94, 77, 90], [100, 81, 82]])
In [3]:
grades
Out[3]:
array([[ 87,  96,  70],
       [100,  87,  90],
       [ 94,  77,  90],
       [100,  81,  82]])
In [4]:
grades[0, 1]  # row 0, column 1
Out[4]:
96
Selecting a Subset of a Two-Dimensional array’s Rows
• To select a single row, specify only one index in square brackets
In [5]:
grades[1]
Out[5]:
array([100,  87,  90])
• Select multiple sequential rows with slice notation
In [6]:
grades[0:2]
Out[6]:
array([[ 87,  96,  70],
       [100,  87,  90]])
• Select multiple non-sequential rows with a list of row indices
In [7]:
grades[[1, 3]]
Out[7]:
array([[100,  87,  90],
       [100,  81,  82]])
Selecting a Subset of a Two-Dimensional array’s Columns
• The column index also can be a specific index, a slice or a list
In [8]:
grades[:, 0]
Out[8]:
array([ 87, 100,  94, 100])
In [9]:
grades[:, 1:3]
Out[9]:
array([[96, 70],
       [87, 90],
       [77, 90],
       [81, 82]])
In [10]:
grades[:, [0, 2]]
Out[10]:
array([[ 87,  70],
       [100,  90],
       [ 94,  90],
       [100,  82]])
7.11 Views: Shallow Copies
• Views “see” the data in other objects, rather than having their own copies of the data
• Views are shallow copies
• array method view returns a new array object with a view of the original array object’s data
In [1]:
import numpy as np
In [2]:
numbers = np.arange(1, 6)
In [3]:
numbers
Out[3]:
array([1, 2, 3, 4, 5])
In [4]:
numbers2 = numbers.view()
In [5]:
numbers2
Out[5]:
array([1, 2, 3, 4, 5])
• Use built-in id function to see that numbers and numbers2 are different objects
In [6]:
id(numbers)
Out[6]:
4431803056
In [7]:
id(numbers2)
Out[7]:
4430398928
• Modifying an element in the original array also modifies the view, and vice versa
In [8]:
numbers[1] *= 10
In [9]:
numbers2
Out[9]:
array([ 1, 20,  3,  4,  5])
In [10]:
numbers
Out[10]:
array([ 1, 20,  3,  4,  5])
In [11]:
numbers2[1] /= 10
In [12]:
numbers
Out[12]:
array([1, 2, 3, 4, 5])
In [13]:
numbers2
Out[13]:
array([1, 2, 3, 4, 5])
Slice Views
• Slices also create views
In [14]:
numbers2 = numbers[0:3]
In [15]:
numbers2
Out[15]:
array([1, 2, 3])
In [16]:
id(numbers)
Out[16]:
4431803056
In [17]:
id(numbers2)
Out[17]:
4451350368
• Confirm that numbers2 is a view of only the first three numbers elements
In [18]:
numbers2[3]
------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
----> 1 numbers2[3]

IndexError: index 3 is out of bounds for axis 0 with size 3
• Modify an element both arrays share to show both are updated
In [19]:
numbers[1] *= 20
In [20]:
numbers
Out[20]:
array([ 1, 40,  3,  4,  5])
In [21]:
numbers2
Out[21]:
array([ 1, 40,  3])
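Comparing id values shows only that two array objects are distinct, not whether they share data. NumPy’s shares_memory function answers that question directly. The sketch below (with illustrative names) also shows a difference worth knowing: selecting rows with a slice gives a view, whereas selecting rows with a list of indices gives an independent copy.

```python
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90], [94, 77, 90]])

first_two = grades[0:2]  # slice: a view of the first two rows
picked = grades[[0, 2]]  # list of row indices: an independent copy

print(np.shares_memory(grades, first_two))  # the view shares grades' data
print(np.shares_memory(grades, picked))     # the copy does not

first_two[0, 0] = 0      # write through the view
print(grades[0, 0])      # the original sees the change
print(picked[0, 0])      # the copy still holds the old value
```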
7.12 Deep Copies
• When sharing mutable values, sometimes it’s necessary to create a deep copy of the original data
• Especially important in multi-core programming, where separate parts of your program could attempt to modify your data at the same time, possibly corrupting it
• array method copy returns a new array object with an independent copy of the original array’s data
In [1]:
import numpy as np
In [2]:
numbers = np.arange(1, 6)
In [3]:
numbers
Out[3]:
array([1, 2, 3, 4, 5])
In [4]:
numbers2 = numbers.copy()
In [5]:
numbers2
Out[5]:
array([1, 2, 3, 4, 5])
In [6]:
numbers[1] *= 10
In [7]:
numbers
Out[7]:
array([ 1, 20,  3,  4,  5])
In [8]:
numbers2
Out[8]:
array([1, 2, 3, 4, 5])
Module copy—Shallow vs. Deep Copies for Other Types of Python Objects
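For other Python objects, the standard library’s copy module provides copy (shallow) and deepcopy (deep). A minimal sketch with a nested list, using names chosen here for illustration:

```python
import copy

original = [[1, 2], [3, 4]]

shallow = copy.copy(original)   # new outer list, same inner lists
deep = copy.deepcopy(original)  # new outer list and new inner lists

original[0][0] = 99             # modify an inner list in place

print(shallow[0][0])  # the shallow copy shares the inner list
print(deep[0][0])     # the deep copy is unaffected
```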
7.13 Reshaping and Transposing
reshape vs. resize
• Method reshape returns a view (shallow copy) of the original array with new dimensions
• Does not modify the original array
In [1]:
import numpy as np
In [2]:
grades = np.array([[87, 96, 70], [100, 87, 90]])
In [3]:
grades
Out[3]:
array([[ 87,  96,  70],
       [100,  87,  90]])
In [4]:
grades.reshape(1, 6)
Out[4]:
array([[ 87,  96,  70, 100,  87,  90]])
In [5]:
grades
Out[5]:
array([[ 87,  96,  70],
       [100,  87,  90]])
• Method resize modifies the original array’s shape
In [6]:
grades.resize(1, 6)
In [7]:
grades
Out[7]:
array([[ 87,  96,  70, 100,  87,  90]])
flatten vs. ravel
• Can flatten a multidimensional array into a single dimension with methods flatten and ravel
• flatten deep copies the original array’s data
In [8]:
grades = np.array([[87, 96, 70], [100, 87, 90]])
In [9]:
grades
Out[9]:
array([[ 87,  96,  70],
       [100,  87,  90]])
In [10]:
flattened = grades.flatten()
In [11]:
flattened
Out[11]:
array([ 87,  96,  70, 100,  87,  90])
In [12]:
grades
Out[12]:
array([[ 87,  96,  70],
       [100,  87,  90]])
In [13]:
flattened[0] = 100
In [14]:
flattened
Out[14]:
array([100,  96,  70, 100,  87,  90])
In [15]:
grades
Out[15]:
array([[ 87,  96,  70],
       [100,  87,  90]])
• Method ravel produces a view of the original array, which shares the grades array’s data
In [16]:
raveled = grades.ravel()
In [17]:
raveled
Out[17]:
array([ 87,  96,  70, 100,  87,  90])
In [18]:
grades
Out[18]:
array([[ 87,  96,  70],
       [100,  87,  90]])
In [19]:
raveled[0] = 100
In [20]:
raveled
Out[20]:
array([100,  96,  70, 100,  87,  90])
In [21]:
grades
Out[21]:
array([[100,  96,  70],
       [100,  87,  90]])
Transposing Rows and Columns
• Can quickly transpose an array’s rows and columns
  ▪ “flips” the array, so the rows become the columns and the columns become the rows
• T attribute returns a transposed view (shallow copy) of the array
In [22]:
grades.T
Out[22]:
array([[100, 100],
       [ 96,  87],
       [ 70,  90]])
In [23]:
grades
Out[23]:
array([[100,  96,  70],
       [100,  87,  90]])
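Because T returns a view, writing through the transposed array changes the original as well. A short sketch with a made-up array (the names are illustrative):

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
transposed = matrix.T                # shape (3, 2), shares matrix's data

# Row 0, column 1 of the transpose is row 1, column 0 of the original.
transposed[0, 1] = 99

print(transposed.shape)
print(matrix)
```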
Horizontal and Vertical Stacking
• Can combine arrays by adding more columns or more rows, known as horizontal stacking and vertical stacking
In [24]:
grades2 = np.array([[94, 77, 90], [100, 81, 82]])
• Combine grades and grades2 with NumPy’s hstack (horizontal stack) function by passing a tuple containing the arrays to combine
• The extra parentheses are required because hstack expects one argument
• Adds more columns
In [25]:
np.hstack((grades, grades2))
Out[25]:
array([[100,  96,  70,  94,  77,  90],
       [100,  87,  90, 100,  81,  82]])
• Combine grades and grades2 with NumPy’s vstack (vertical stack) function
• Adds more rows
In [26]:
np.vstack((grades, grades2))
Out[26]:
array([[100,  96,  70],
       [100,  87,  90],
       [ 94,  77,  90],
       [100,  81,  82]])
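The arrays being stacked need not have identical shapes, only compatible ones: hstack needs matching row counts and vstack needs matching column counts. A sketch with small invented arrays:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])  # shape (2, 2)
b = np.array([[5], [6]])        # shape (2, 1): same number of rows as a
c = np.array([[7, 8]])          # shape (1, 2): same number of columns as a

wide = np.hstack((a, b))        # adds a column -> shape (2, 3)
tall = np.vstack((a, c))        # adds a row    -> shape (3, 2)

print(wide)
print(tall)
```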
7.14.1 pandas Series
• An enhanced one-dimensional array
• Supports custom indexing, including even non-integer indices like strings
• Offers additional capabilities that make Series more convenient for many data-science oriented tasks
  ▪ Series may have missing data
  ▪ Many Series operations ignore missing data by default
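To make the missing-data point concrete, here is a small sketch with one missing reading represented as NaN. The data and names are hypothetical. The skipna keyword turns off the default behavior.

```python
import pandas as pd

# Hypothetical readings; the middle one is missing.
readings = pd.Series([87.0, float('nan'), 94.0])

print(readings.count())             # counts only the non-missing values
print(readings.mean())              # mean of 87.0 and 94.0
print(readings.mean(skipna=False))  # NaN, because the missing value is included
```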
Creating a Series with Default Indices
• By default, a Series has integer indices numbered sequentially from 0
In [1]:
import pandas as pd
In [2]:
grades = pd.Series([87, 100, 94])
Creating a Series with All Elements Having the Same Value
• Second argument is a one-dimensional iterable object (such as a list, an array or a range) containing the Series’ indices
• Number of indices determines the number of elements
In [149]:
pd.Series(98.6, range(3))
Out[149]:
0    98.6
1    98.6
2    98.6
dtype: float64
Accessing a Series’ Elements
In [150]:
grades[0]
Out[150]:
87
Producing Descriptive Statistics for a Series
• Series provides many methods for common tasks including producing various descriptive statistics
• Each of these is a functional-style reduction
In [151]:
grades.count()
Out[151]:
3
In [152]:
grades.mean()
Out[152]:
93.66666666666667
In [153]:
grades.min()
Out[153]:
87
In [154]:
grades.max()
Out[154]:
100
In [155]:
grades.std()
Out[155]:
6.506407098647712
• Series method describe produces all these stats and more
• The 25%, 50% and 75% are quartiles:
  ▪ 50% represents the median of the sorted values.
  ▪ 25% represents the median of the first half of the sorted values.
  ▪ 75% represents the median of the second half of the sorted values.
• For the quartiles, if there are two middle elements, then their average is that quartile’s median
In [156]:
grades.describe()
Out[156]:
count      3.000000
mean      93.666667
std        6.506407
min       87.000000
25%       90.500000
50%       94.000000
75%       97.000000
max      100.000000
dtype: float64
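The individual quartiles can also be requested on their own with the Series methods quantile and median. The sketch below reuses the three grade values. Note that pandas computes quantiles by linear interpolation between sorted values by default, which for this small data set gives the same figures as the describe output above.

```python
import pandas as pd

scores = pd.Series([87, 100, 94])  # sorted: 87, 94, 100

q1 = scores.quantile(0.25)  # halfway between 87 and 94
q2 = scores.median()        # the middle value
q3 = scores.quantile(0.75)  # halfway between 94 and 100

print(q1, q2, q3)
```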
Creating a Series with Custom Indices
• Can specify custom indices with the index keyword argument
In [157]:
grades = pd.Series([87, 100, 94], index=['Wally', 'Eva', 'Sam'])
In [158]:
grades
Out[158]:
Wally     87
Eva      100
Sam       94
dtype: int64
Dictionary Initializers
• If you initialize a Series with a dictionary, its keys are the indices, and its values become the Series’ element values
In [159]:
grades = pd.Series({'Wally': 87, 'Eva': 100, 'Sam': 94})
In [160]:
grades
Out[160]:
Wally     87
Eva      100
Sam       94
dtype: int64
Accessing Elements of a Series Via Custom Indices
• Can access individual elements via square brackets containing a custom index value
In [161]:
grades['Eva']
Out[161]:
100
• If custom indices are strings that could represent valid Python identifiers, pandas automatically adds them to the Series as attributes
In [162]:
grades.Wally
Out[162]:
87
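One caveat worth keeping in mind with attribute-style access: an index label that matches an existing Series method name resolves to the method, not to the element. The sketch below uses a made-up Series (scores) with a label named count to show this; square brackets always reach the element.

```python
import pandas as pd

# 'count' is a valid identifier, but it is also a Series method name
scores = pd.Series({'Wally': 87, 'count': 3})

print(scores.Wally)            # 87
print(callable(scores.count))  # True: this is the method, not the element
print(scores['count'])         # 3
```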
• dtype attribute returns the underlying array’s element type
In [163]:
grades.dtype
Out[163]:
dtype('int64')
• values attribute returns the underlying array
In [164]:
grades.values
Out[164]:
array([ 87, 100,  94])
Creating a Series of Strings
• In a Series of strings, you can use the str attribute to call string methods on the elements
In [165]:
hardware = pd.Series(['Hammer', 'Saw', 'Wrench'])
In [166]:
hardware
Out[166]:
0    Hammer
1       Saw
2    Wrench
dtype: object
• Call string method contains on each element
• Returns a Series containing bool values indicating the contains method’s result for each element
• The str attribute provides many string-processing methods that are similar to those in Python’s string type
▪ https://pandas.pydata.org/pandas-docs/stable/api.html#string-handling
In [167]:
hardware.str.contains('a')
Out[167]:
0     True
1     True
2    False
dtype: bool
• Use string method upper to produce a new Series containing the uppercase versions of each element in hardware
In [168]:
hardware.str.upper()
Out[168]:
0    HAMMER
1       SAW
2    WRENCH
dtype: object
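A few more str methods, sketched here for illustration, follow the same element-by-element pattern. The variable names are invented for this example; the last line also shows that a bool Series produced this way can be used to filter the original Series.

```python
import pandas as pd

hardware = pd.Series(['Hammer', 'Saw', 'Wrench'])

lengths = hardware.str.len()                   # characters per element
starts_with_s = hardware.str.startswith('S')   # bool Series
long_tools = hardware[hardware.str.len() > 3]  # keep names longer than 3

print(lengths.tolist())        # [6, 3, 6]
print(starts_with_s.tolist())  # [False, True, False]
print(long_tools.tolist())     # ['Hammer', 'Wrench']
```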
7.14.2 DataFrames
• Enhanced two-dimensional array
• Can have custom row and column indices
• Offers additional operations and capabilities that make it more convenient for many data-science-oriented tasks
• Supports missing data
• Each column in a DataFrame is a Series
Creating a DataFrame from a Dictionary
• Create a DataFrame from a dictionary that represents student grades on three exams
In [1]:
import pandas as pd
In [2]:
grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
               'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
               'Bob': [83, 65, 85]}
In [3]:
grades = pd.DataFrame(grades_dict)
• Pandas displays DataFrames in tabular format with indices left aligned in the index column and the remaining columns’ values right aligned
In [4]:
grades
Out[4]:
   Wally  Eva  Sam  Katie  Bob
0     87  100   94    100   83
1     96   87   77     81   65
2     70   90   90     82   85
Customizing a DataFrame’s Indices with the index Attribute
• Can use the index attribute to change the DataFrame’s indices from sequential integers to labels
• Must provide a one-dimensional collection that has the same number of elements as there are rows in the DataFrame
In [5]:
grades.index = ['Test1', 'Test2', 'Test3']
In [6]:
grades
Out[6]:
       Wally  Eva  Sam  Katie  Bob
Test1     87  100   94    100   83
Test2     96   87   77     81   65
Test3     70   90   90     82   85
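Row labels can also be supplied when the DataFrame is created, through the constructor's index keyword argument. The sketch below compares that approach with assigning the index attribute afterward; the names tests, assigned and constructed are illustrative only.

```python
import pandas as pd

grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
               'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
               'Bob': [83, 65, 85]}
tests = ['Test1', 'Test2', 'Test3']

# Assign the labels after construction via the index attribute
assigned = pd.DataFrame(grades_dict)
assigned.index = tests

# Or pass the labels to the constructor's index keyword argument
constructed = pd.DataFrame(grades_dict, index=tests)

print(assigned.equals(constructed))  # True
print(constructed.shape)             # (3, 5)
```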
Accessing a DataFrame’s Columns
• Can quickly and conveniently look at your data in many different ways, including selecting portions of the data
• Get Eva’s grades by name
• Displays her column as a Series
In [7]:
grades['Eva']
Out[7]:
Test1    100
Test2     87
Test3     90
Name: Eva, dtype: int64
• If a DataFrame’s column-name strings are valid Python identifiers, you can use them as attributes
In [8]:
grades.Sam
Out[8]:
Test1    94
Test2    77
Test3    90
Name: Sam, dtype: int64
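The sketch below, using a two-column version of the grades data for brevity, contrasts the result types of column selection: a single column name yields a Series, while a list of column names yields a DataFrame. The variable names are made up for this example.

```python
import pandas as pd

grades = pd.DataFrame({'Wally': [87, 96, 70], 'Eva': [100, 87, 90]},
                      index=['Test1', 'Test2', 'Test3'])

one_column = grades['Eva']              # single name: a Series
same_column = grades.Eva                # attribute form of the same selection
two_columns = grades[['Wally', 'Eva']]  # list of names: a DataFrame

print(type(one_column).__name__)       # Series
print(type(two_columns).__name__)      # DataFrame
print(one_column.equals(same_column))  # True
```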
Selecting Rows via the loc and iloc Attributes
• DataFrames support indexing capabilities with [], but pandas documentation recommends using the attributes loc, iloc, at and iat
▪ Optimized to access DataFrames and also provide additional capabilities
• Access a row by its label via the DataFrame’s loc attribute
In [9]:
grades.loc['Test1']
Out[9]:
Wally     87
Eva      100
Sam       94
Katie    100
Bob       83
Name: Test1, dtype: int64
• Access rows by integer zero-based indices using the iloc attribute (the i in iloc means that it’s used with integer indices)
In [10]:
grades.iloc[1]
Out[10]:
Wally    96
Eva      87
Sam      77
Katie    81
Bob      65
Name: Test2, dtype: int64
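As a quick check that loc and iloc are two routes to the same data, the following sketch selects the second row by label and by position on a two-column version of the grades data. The names by_label and by_position are illustrative.

```python
import pandas as pd

grades = pd.DataFrame({'Wally': [87, 96, 70], 'Eva': [100, 87, 90]},
                      index=['Test1', 'Test2', 'Test3'])

by_label = grades.loc['Test2']  # row selected by its label
by_position = grades.iloc[1]    # row selected by its zero-based position

print(by_label.equals(by_position))  # True
print(by_label.name)                 # Test2
print(by_label['Wally'])             # 96
```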
Selecting Rows via Slices and Lists with the loc and iloc Attributes
• Index can be a slice
• When using slices containing labels with loc, the range specified includes the high index ('Test3'):
In [11]:
grades.loc['Test1':'Test3']
Out[11]:
       Wally  Eva  Sam  Katie  Bob
Test1     87  100   94    100   83
Test2     96   87   77     81   65
Test3     70   90   90     82   85
• When using slices containing integer indices with iloc, the range you specify excludes the high index (2):
In [12]:
grades.iloc[0:2]
Out[12]:
       Wally  Eva  Sam  Katie  Bob
Test1     87  100   94    100   83
Test2     96   87   77     81   65
• Select specific rows with a list
In [13]:
grades.loc[['Test1', 'Test3']]
Out[13]:
       Wally  Eva  Sam  Katie  Bob
Test1     87  100   94    100   83
Test3     70   90   90     82   85
In [14]:
grades.iloc[[0, 2]]
Out[14]:
       Wally  Eva  Sam  Katie  Bob
Test1     87  100   94    100   83
Test3     70   90   90     82   85
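The difference between the two slice conventions is easy to overlook, so here is a minimal sketch on a two-column version of the grades data. A label slice with loc includes its end label, while a position slice with iloc excludes its end position. The variable names are invented for this example.

```python
import pandas as pd

grades = pd.DataFrame({'Wally': [87, 96, 70], 'Eva': [100, 87, 90]},
                      index=['Test1', 'Test2', 'Test3'])

label_slice = grades.loc['Test1':'Test2']  # includes 'Test2'
position_slice = grades.iloc[0:1]          # excludes position 1

print(len(label_slice))     # 2
print(len(position_slice))  # 1

# To match the label slice by position, the stop value must be 2
print(grades.iloc[0:2].equals(label_slice))  # True
```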
Selecting Subsets of the Rows and Columns
• View only Eva’s and Katie’s grades on Test1 and Test2
In [15]:
grades.loc['Test1':'Test2', ['Eva', 'Katie']]
Out[15]:
       Eva  Katie
Test1  100    100
Test2   87     81
• Use iloc with a list and a slice to select the first and third tests and the first three columns for those tests
In [16]:
grades.iloc[[0, 2], 0:3]
Out[16]:
       Wally  Eva  Sam
Test1     87  100   94
Test3     70   90   90
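The same subset can be reached by labels or by positions. The sketch below selects Eva's and Katie's grades on the first two tests both ways and confirms that the results match; the names by_labels and by_positions are illustrative.

```python
import pandas as pd

grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
               'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
               'Bob': [83, 65, 85]}
grades = pd.DataFrame(grades_dict, index=['Test1', 'Test2', 'Test3'])

# Rows by label slice (end included), columns by a list of names
by_labels = grades.loc['Test1':'Test2', ['Eva', 'Katie']]

# Rows by position slice (end excluded), columns by a list of positions
by_positions = grades.iloc[0:2, [1, 3]]

print(by_labels.equals(by_positions))  # True
print(by_labels.shape)                 # (2, 2)
```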
Boolean Indexing
• One of pandas’ more powerful selection capabilities is Boolean indexing
• Select all the A grades, that is, those that are greater than or equal to 90:
▪ Pandas checks every grade to determine whether its value is greater than or equal to 90 and, if so, includes it in the new DataFrame
▪ Grades for which the condition is False are represented as NaN (not a number) in the new DataFrame
▪ NaN is pandas’ notation for missing values
In [17]:
grades[grades >= 90]
Out[17]:
       Wally    Eva   Sam  Katie  Bob
Test1    NaN  100.0  94.0  100.0  NaN
Test2   96.0    NaN   NaN    NaN  NaN
Test3    NaN   90.0  90.0    NaN  NaN
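It can help to look at the intermediate Boolean DataFrame that the comparison produces. The sketch below, on a two-column version of the grades data, stores it in a variable named mask (a name chosen for this example) and then counts the masked and surviving cells.

```python
import pandas as pd

grades = pd.DataFrame({'Wally': [87, 96, 70], 'Eva': [100, 87, 90]},
                      index=['Test1', 'Test2', 'Test3'])

mask = grades >= 90      # DataFrame of bool values, same shape as grades
a_grades = grades[mask]  # positions where mask is False become NaN

print(mask.loc['Test1'].tolist())   # [False, True]
print(a_grades.isna().sum().sum())  # 3
print(a_grades.count().tolist())    # [1, 2]
```

Because NaN is a floating-point value, the surviving grades in a column that contains NaN are shown as floats (for example, 96.0).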
• Select all the B grades in the range 80–89
In [18]:
grades[(grades >= 80) & (grades < 90)]
Out[18]:
       Wally   Eva  Sam  Katie   Bob
Test1   87.0   NaN  NaN    NaN  83.0
Test2    NaN  87.0  NaN   81.0   NaN
Test3    NaN   NaN  NaN   82.0  85.0
• Pandas Boolean indices combine multiple conditions with the Python operator & (bitwise AND), not the and Boolean operator
• For or conditions, use | (bitwise OR)
• NumPy also supports Boolean indexing for arrays, but always returns a one-dimensional array containing only the values that satisfy the condition
Accessing a Specific DataFrame Cell by Row and Column
• DataFrame attributes at and iat get a single value from a DataFrame
In [19]:
grades.at['Test2', 'Eva']
Out[19]:
87
In [20]:
grades.iat[2, 0]
Out[20]:
70
• Can assign new values to specific elements
In [21]:
grades.at['Test2', 'Eva'] = 100
In [22]:
grades.at['Test2', 'Eva']
Out[22]:
100
In [23]:
grades.iat[1, 2] = 87
In [24]:
grades.iat[1, 2]
Out[24]:
87
Descriptive Statistics
• DataFrame’s describe method calculates basic descriptive statistics for the data and returns them as a DataFrame
• Statistics are calculated by column
In [25]:
grades.describe()
Out[25]:
           Wally         Eva        Sam       Katie        Bob
count   3.000000    3.000000   3.000000    3.000000   3.000000
mean   84.333333   96.666667  90.333333   87.666667  77.666667
std    13.203535    5.773503   3.511885   10.692677  11.015141
min    70.000000   90.000000  87.000000   81.000000  65.000000
25%    78.500000   95.000000  88.500000   81.500000  74.000000
50%    87.000000  100.000000  90.000000   82.000000  83.000000
75%    91.500000  100.000000  92.000000   91.000000  84.000000
max    96.000000  100.000000  94.000000  100.000000  85.000000
• Quick way to summarize your data
• Nicely demonstrates the power of array-oriented programming with a clean, concise functional-style call
• Can control the precision and other default settings with pandas’ set_option function
In [26]:
pd.set_option('display.precision', 2)
In [27]:
grades.describe()
Out[27]:
       Wally     Eva    Sam   Katie    Bob
count   3.00    3.00   3.00    3.00   3.00
mean   84.33   96.67  90.33   87.67  77.67
std    13.20    5.77   3.51   10.69  11.02
min    70.00   90.00  87.00   81.00  65.00
25%    78.50   95.00  88.50   81.50  74.00
50%    87.00  100.00  90.00   82.00  83.00
75%    91.50  100.00  92.00   91.00  84.00
max    96.00  100.00  94.00  100.00  85.00
• For student grades, the most important of these statistics is probably the mean
• Can calculate that for each student simply by calling mean on the DataFrame
In [28]:
grades.mean()
Out[28]:
Wally    84.33
Eva      96.67
Sam      90.33
Katie    87.67
Bob      77.67
dtype: float64
Transposing the DataFrame with the T Attribute
• Can quickly transpose rows and columns (so the rows become the columns, and the columns become the rows) by using the T attribute to get a view
In [29]:
grades.T
Out[29]:
       Test1  Test2  Test3
Wally     87     96     70
Eva      100    100     90
Sam       94     87     90
Katie    100     81     82
Bob       83     65     85
• Assume that rather than getting the summary statistics by student, you want to get them by test
• Call describe on grades.T
In [30]:
grades.T.describe()
Out[30]:
        Test1   Test2  Test3
count    5.00    5.00   5.00
mean    92.80   85.80  83.40
std      7.66   13.81   8.23
min     83.00   65.00  70.00
25%     87.00   81.00  82.00
50%     94.00   87.00  85.00
75%    100.00   96.00  90.00
max    100.00  100.00  90.00
• Get average of all the students’ grades on each test
In [31]:
grades.T.mean()
Out[31]:
Test1    92.8
Test2    85.8
Test3    83.4
dtype: float64
Sorting by Row Indices
• Can sort a DataFrame by its rows or columns, based on their indices or values
• Sort the rows by their indices in descending order using sort_index and its keyword argument ascending=False
In [32]:
grades.sort_index(ascending=False)
Out[32]:
       Wally  Eva  Sam  Katie  Bob
Test3     70   90   90     82   85
Test2     96  100   87     81   65
Test1     87  100   94    100   83
Sorting by Column Indices
• Sort columns into ascending order (left-to-right) by their column names
• axis=1 keyword argument indicates that we wish to sort the column indices, rather than the row indices
▪ axis=0 (the default) sorts the row indices
In [33]:
grades.sort_index(axis=1)
Out[33]:
       Bob  Eva  Katie  Sam  Wally
Test1   83  100    100   94     87
Test2   65  100     81   87     96
Test3   85   90     82   90     70
Sorting by Column Values
• To view Test1’s grades in descending order so we can see the students’ names in highest-to-lowest grade order, call method sort_values
• by and axis arguments work together to determine which values will be sorted
▪ In this case, we sort based on the column values (axis=1) for Test1
In [34]:
grades.sort_values(by='Test1', axis=1, ascending=False)
Out[34]:
       Eva  Katie  Sam  Wally  Bob
Test1  100    100   94     87   83
Test2  100     81   87     96   65
Test3   90     82   90     70   85
• Might be easier to read the grades and names if they were in a column
• Sort the transposed DataFrame instead
In [35]:
grades.T.sort_values(by='Test1', ascending=False)
Out[35]:
       Test1  Test2  Test3
Eva      100    100     90
Katie    100     81     82
Sam       94     87     90
Wally     87     96     70
Bob       83     65     85
• Since we’re sorting only Test1’s grades, we might not want to see the other tests at all
• Combine selection with sorting
In [36]:
grades.loc['Test1'].sort_values(ascending=False)
Out[36]:
Katie    100
Eva      100
Sam       94
Wally     87
Bob       83
Name: Test1, dtype: int64
Copy vs. In-Place Sorting
• sort_index and sort_values return a copy of the original DataFrame
• Could require substantial memory in a big data application
• Can sort in place by passing the keyword argument inplace=True
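The copy versus in-place distinction for sorting can be sketched as follows, using a two-column version of the grades data. The name descending is made up for this example.

```python
import pandas as pd

grades = pd.DataFrame({'Wally': [87, 96, 70], 'Eva': [100, 87, 90]},
                      index=['Test1', 'Test2', 'Test3'])

# Default behavior: a sorted copy is returned, grades is left unchanged
descending = grades.sort_index(ascending=False)
print(list(descending.index))  # ['Test3', 'Test2', 'Test1']
print(list(grades.index))      # ['Test1', 'Test2', 'Test3']

# With inplace=True, grades itself is reordered
grades.sort_index(ascending=False, inplace=True)
print(list(grades.index))      # ['Test3', 'Test2', 'Test1']
```

Sorting in place avoids holding a second sorted copy, at the cost of losing the original row order.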
