Ryerson University Creation and Numpy Arrays Questions

Based on your reading of Chapter 7 of our textbook, "Array-Oriented Programming with NumPy," discuss the following with the class, using original examples:

  • Array creation and operations with NumPy.
  • NumPy calculation methods and functions.
  • Array slicing and reshaping with NumPy.
    7. Array-Oriented Programming with NumPy
    Objectives
    In this chapter, you’ll:
    • Learn what arrays are and how they differ from lists.
    • Use the numpy module’s high-performance ndarrays.
    • Compare list and ndarray performance with the IPython %timeit magic.
    • Use ndarrays to store and retrieve data efficiently.
    • Create and initialize ndarrays.
    • Refer to individual ndarray elements.
    • Iterate through ndarrays.
    • Create and manipulate multidimensional ndarrays.
    • Perform common ndarray manipulations.
    • Create and manipulate pandas one-dimensional Series and two-dimensional DataFrames.
    • Customize Series and DataFrame indices.
    • Calculate basic descriptive statistics for data in a Series and a DataFrame.
    • Customize floating-point number precision in pandas output formatting.
    7.1 Introduction
    NumPy (Numerical Python) Library
    • First appeared in 2006 and is the preferred Python array implementation.
    • Provides a high-performance, richly functional n-dimensional array type called ndarray.
    • Written in C and up to 100 times faster than lists.
    • Critical in big-data processing, AI applications and much more.
    • According to libraries.io, over 450 Python libraries depend on NumPy.
    • Many popular data science libraries such as Pandas, SciPy (Scientific Python) and Keras
      (for deep learning) are built on or depend on NumPy.
    Array-Oriented Programming
    • Functional-style programming with internal iteration makes array-oriented manipulations
      concise and straightforward, and reduces the possibility of error.
    ©1992–2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 5 of
    the book Intro to Python for Computer Science and Data Science: Learning to Program with
    AI, Big Data and the Cloud.
    DISCLAIMER: The authors and publisher of this book have used their best efforts in preparing the
    book. These efforts include the development, research, and testing of the theories and programs to
    determine their effectiveness. The authors and publisher make no warranty of any kind, expressed
    or implied, with regard to these programs or to the documentation contained in these books. The
    authors and publisher shall not be liable in any event for incidental or consequential damages in
    connection with, or arising out of, the furnishing, performance, or use of these programs.
    7.2 Creating arrays from Existing Data



    Creating an array with the array function
    Argument is an array or other iterable
    Returns a new array containing the argument’s elements
    In [1]:
    import numpy as np
    In [2]:
    numbers = np.array([2, 3, 5, 7, 11])
    In [3]:
    type(numbers)
    Out[3]:
    numpy.ndarray
    In [4]:
    numbers
    Out[4]:
    array([ 2,  3,  5,  7, 11])
    Multidimensional Arguments
    In [5]:
    np.array([[1, 2, 3], [4, 5, 6]])
    Out[5]:
    array([[1, 2, 3],
           [4, 5, 6]])
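As a small sketch of the point that the argument can be an array or any other iterable, the following builds arrays from a tuple, a range and an existing array. The variable names and sample values are illustrative only and do not come from the textbook.

```python
import numpy as np

# Hypothetical sample data; any iterable of numbers works.
from_tuple = np.array((2, 3, 5, 7, 11))
from_range = np.array(range(1, 6))

# Passing an existing array returns a new, independent array by default.
from_array = np.array(from_tuple)
from_array[0] = 99

# Mixing ints and floats upcasts every element to float64.
mixed = np.array([1, 2.5, 3])

print(from_range)      # [1 2 3 4 5]
print(from_tuple[0])   # 2 (unchanged by the edit to from_array)
print(mixed.dtype)     # float64
```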
    7.3 array Attributes

    An array’s attributes enable you to discover information about its structure and contents
    In [1]:
    import numpy as np
    In [2]:
    integers = np.array([[1, 2, 3], [4, 5, 6]])
    In [3]:
    integers
    Out[3]:
    array([[1, 2, 3],
           [4, 5, 6]])
    In [4]:
    floats = np.array([0.0, 0.1, 0.2, 0.3, 0.4])
    In [5]:
    floats
    Out[5]:
    array([0. , 0.1, 0.2, 0.3, 0.4])

    NumPy does not display trailing 0s
    Determining an array’s Element Type
    In [6]:
    integers.dtype
    Out[6]:
    dtype('int64')
    In [7]:
    floats.dtype
    Out[7]:
    dtype('float64')


    For performance reasons, NumPy is written in the C programming language and uses C’s
    data types
    Other NumPy types
    Determining an array’s Dimensions


    ndim contains an array’s number of dimensions
    shape contains a tuple specifying an array’s dimensions
    In [8]:
    integers.ndim
    Out[8]:
    2
    In [9]:
    floats.ndim
    Out[9]:
    1
    In [10]:
    integers.shape
    Out[10]:
    (2, 3)
    In [11]:
    floats.shape
    Out[11]:
    (5,)
    Determining an array’s Number of Elements and Element Size


    view an array’s total number of elements with size
    view number of bytes required to store each element with itemsize
    In [12]:
    integers.size
    Out[12]:
    6
    In [13]:
    integers.itemsize
    Out[13]:
    8
    In [14]:
    floats.size
    Out[14]:
    5
    In [15]:
    floats.itemsize
    Out[15]:
    8
    Iterating through a Multidimensional array’s Elements
    In [16]:
    for row in integers:
        for column in row:
            print(column, end='  ')
        print()
    1  2  3
    4  5  6

    Iterate through a multidimensional array as if it were one-dimensional by using flat
    In [17]:
    for i in integers.flat:
        print(i, end='  ')
    1  2  3  4  5  6
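The attributes in this section can be checked together on one array. This is a minimal sketch using a made-up 3-by-4 array of floats (floats are used so the element type does not depend on the platform's default integer size).

```python
import numpy as np

# Hypothetical 3-by-4 array of floats used only for illustration.
temps = np.array([[20.5, 21.0, 19.8, 22.1],
                  [18.2, 17.9, 20.0, 21.4],
                  [23.3, 24.1, 22.8, 25.0]])

print(temps.dtype)     # float64
print(temps.ndim)      # 2
print(temps.shape)     # (3, 4)
print(temps.size)      # 12
print(temps.itemsize)  # 8

# Total bytes of element storage equals size * itemsize.
print(temps.nbytes)    # 96

# flat visits the elements in row-major order.
flat_values = list(temps.flat)
```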
    7.4 Filling arrays with Specific Values

    Functions zeros, ones and full create arrays containing 0s, 1s or a specified value,
    respectively
    In [1]:
    import numpy as np
    In [2]:
    np.zeros(5)
    Out[2]:
    array([0., 0., 0., 0., 0.])

    For a tuple of integers, these functions return a multidimensional array with the specified
    dimensions
    In [3]:
    np.ones((2, 4), dtype=int)
    Out[3]:
    array([[1, 1, 1, 1],
           [1, 1, 1, 1]])
    In [4]:
    np.full((3, 5), 13)
    Out[4]:
    array([[13, 13, 13, 13, 13],
           [13, 13, 13, 13, 13],
           [13, 13, 13, 13, 13]])
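A short sketch of how the element type is chosen for these functions: zeros and ones default to float64 unless dtype is given, while full infers the type from the fill value. The names below are illustrative.

```python
import numpy as np

zeros_2d = np.zeros((2, 3))          # float64 by default
ones_int = np.ones(4, dtype=int)     # dtype keyword overrides the default
filled = np.full((2, 2), 7.5)        # dtype inferred from the fill value

print(zeros_2d.dtype)   # float64
print(ones_int)         # [1 1 1 1]
print(filled.dtype)     # float64
```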
    7.5 Creating arrays from Ranges

    NumPy provides optimized functions for creating arrays from ranges
    Creating Integer Ranges with arange
    In [1]:
    import numpy as np
    In [2]:
    np.arange(5)
    Out[2]:
    array([0, 1, 2, 3, 4])
    In [3]:
    np.arange(5, 10)
    Out[3]:
    array([5, 6, 7, 8, 9])
    In [4]:
    np.arange(10, 1, -2)
    Out[4]:
    array([10,  8,  6,  4,  2])
    Creating Floating-Point Ranges with linspace


    Produce evenly spaced floating-point ranges with NumPy’s linspace function
    Ending value is included in the array
    In [5]:
    np.linspace(0.0, 1.0, num=5)
    Out[5]:
    array([0.  , 0.25, 0.5 , 0.75, 1.  ])
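To contrast the two range functions, here is a minimal sketch: arange excludes its stop value, while linspace includes it unless endpoint=False is passed. The values are arbitrary.

```python
import numpy as np

evens = np.arange(0, 10, 2)                  # stop value 10 is excluded
points = np.linspace(0.0, 1.0, num=5)        # stop value 1.0 is included
half_open = np.linspace(0.0, 1.0, num=4, endpoint=False)

print(evens)       # [0 2 4 6 8]
print(points)      # [0.   0.25 0.5  0.75 1.  ]
print(half_open)   # [0.   0.25 0.5  0.75]
```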
    Reshaping an array


    array method reshape transforms an array into a different number of dimensions
    New shape must have the same number of elements as the original
    In [6]:
    np.arange(1, 21).reshape(4, 5)
    Out[6]:
    array([[ 1,  2,  3,  4,  5],
           [ 6,  7,  8,  9, 10],
           [11, 12, 13, 14, 15],
           [16, 17, 18, 19, 20]])
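The rule that the new shape must hold the same number of elements can be seen in a brief sketch. It also uses -1, which asks NumPy to infer one dimension from the element count; the 12-element array is just an example.

```python
import numpy as np

values = np.arange(1, 13)            # 12 elements

grid = values.reshape(3, 4)
# -1 asks NumPy to infer that dimension from the element count.
inferred = values.reshape(2, -1)

try:
    values.reshape(5, 3)             # 15 != 12, so this fails
    mismatch_error = None
except ValueError as error:
    mismatch_error = error

print(grid.shape)                    # (3, 4)
print(inferred.shape)                # (2, 6)
print(mismatch_error is not None)    # True
```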
    Displaying Large arrays

    When displaying an array with more than 1000 elements, NumPy drops the middle rows,
    columns or both from the output
    In [7]:
    np.arange(1, 100001).reshape(4, 25000)
    Out[7]:
    array([[     1,      2,      3, ...,  24998,  24999,  25000],
           [ 25001,  25002,  25003, ...,  49998,  49999,  50000],
           [ 50001,  50002,  50003, ...,  74998,  74999,  75000],
           [ 75001,  75002,  75003, ...,  99998,  99999, 100000]])
    In [8]:
    np.arange(1, 100001).reshape(100, 1000)
    Out[8]:
    array([[     1,      2,      3, ...,    998,    999,   1000],
           [  1001,   1002,   1003, ...,   1998,   1999,   2000],
           [  2001,   2002,   2003, ...,   2998,   2999,   3000],
           ...,
           [ 97001,  97002,  97003, ...,  97998,  97999,  98000],
           [ 98001,  98002,  98003, ...,  98998,  98999,  99000],
           [ 99001,  99002,  99003, ...,  99998,  99999, 100000]])
    7.6 List vs. array Performance: Introducing %timeit


    Most array operations execute significantly faster than corresponding list operations
    IPython %timeit magic command times the average duration of operations
    Timing the Creation of a List Containing Results of 6,000,000 Die Rolls
    In [1]:
    import random
    In [2]:
    %timeit rolls_list = \
        [random.randrange(1, 7) for i in range(0, 6_000_000)]
    6.88 s ± 276 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)



    By default, %timeit executes a statement in a loop, and it runs the loop seven times
    If you do not indicate the number of loops, %timeit chooses an appropriate value
    After executing the statement, %timeit displays the statement’s average execution time, as
    well as the standard deviation of all the executions
    Timing the Creation of an array Containing Results of 6,000,000 Die Rolls
    In [3]:
    import numpy as np
    In [4]:
    %timeit rolls_array = np.random.randint(1, 7, 6_000_000)
    75.2 ms ± 2.33 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    60,000,000 and 600,000,000 Die Rolls
    In [5]:
    %timeit rolls_array = np.random.randint(1, 7, 60_000_000)
    916 ms ± 26.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    In [6]:
    %timeit rolls_array = np.random.randint(1, 7, 600_000_000)
    10.3 s ± 180 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    Customizing the %timeit Iterations
    In [7]:
    %timeit -n3 -r2 rolls_array = np.random.randint(1, 7, 6_000_000)
    74.5 ms ± 7.58 ms per loop (mean ± std. dev. of 2 runs, 3 loops each)
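%timeit is only available inside IPython and Jupyter. As a rough sketch of the same comparison in a plain Python script, the standard library's timeit module can be used instead. The function names and the smaller roll count here are assumptions chosen so the example runs quickly; actual timings vary by machine.

```python
import random
import timeit

import numpy as np

ROLLS = 100_000  # hypothetical; far fewer than 6,000,000 so this runs quickly


def roll_with_list():
    """Build a list of die rolls with a list comprehension."""
    return [random.randrange(1, 7) for _ in range(ROLLS)]


def roll_with_array():
    """Build an array of die rolls with NumPy."""
    return np.random.randint(1, 7, ROLLS)


# repeat=3 and number=2 play the roles of %timeit's -r and -n options.
list_times = timeit.repeat(roll_with_list, repeat=3, number=2)
array_times = timeit.repeat(roll_with_array, repeat=3, number=2)

# Each entry is the total time for `number` calls, so divide for per-call time.
print(f'list:  {min(list_times) / 2:.4f} s per call')
print(f'array: {min(array_times) / 2:.4f} s per call')
```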
    Other IPython Magics
    IPython provides dozens of magics for a variety of tasks. For a complete list, see the IPython magics
    documentation. Here are a few helpful ones:
    • %load to read code into IPython from a local file or URL.
    • %save to save snippets to a file.
    • %run to execute a .py file from IPython.
    • %precision to change the default floating-point precision for IPython outputs.
    • %cd to change directories without having to exit IPython first.
    • %edit to launch an external editor, which is handy if you need to modify more complex snippets.
    • %history to view a list of all snippets and commands you’ve executed in the current
      IPython session.
    7.7 array Operators


    array operators perform operations on entire arrays.
    Can perform arithmetic between arrays and scalar numeric values,
    and between arrays of the same shape.
    In [1]:
    import numpy as np
    In [2]:
    numbers = np.arange(1, 6)
    In [3]:
    numbers
    Out[3]:
    array([1, 2, 3, 4, 5])
    In [4]:
    numbers * 2
    Out[4]:
    array([ 2,  4,  6,  8, 10])
    In [5]:
    numbers ** 3
    Out[5]:
    array([  1,   8,  27,  64, 125])
    In [6]:
    numbers  # numbers is unchanged by the arithmetic operators
    Out[6]:
    array([1, 2, 3, 4, 5])
    In [7]:
    numbers += 10
    In [8]:
    numbers
    Out[8]:
    array([11, 12, 13, 14, 15])
    Broadcasting




    Normally, arithmetic operations require as operands two arrays of the same size and shape.
    When one operand is a scalar, NumPy performs the calculation as if the scalar were an array
    of the same shape as the other operand, so numbers * 2 is equivalent to
    numbers * [2, 2, 2, 2, 2] for a 5-element array.
    Extending the smaller operand in this way is called broadcasting.
    Broadcasting also can be applied between arrays of different sizes and shapes, enabling some
    concise and powerful manipulations.
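Broadcasting between arrays of different shapes can be sketched as follows. The arrays are made-up examples: a one-dimensional row is stretched down the rows of a 2-by-3 array, and a 2-by-1 column is stretched across its columns.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])          # shape (2, 3)
row = np.array([10, 20, 30])            # shape (3,)
column = np.array([[100], [200]])       # shape (2, 1)

# row is stretched across both rows of matrix.
print(matrix + row)
# [[11 22 33]
#  [14 25 36]]

# column is stretched across all three columns of matrix.
print(matrix + column)
# [[101 102 103]
#  [204 205 206]]
```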
    Arithmetic Operations Between arrays

    Can perform arithmetic operations and augmented assignments between arrays of
    the same shape
    In [9]:
    numbers2 = np.linspace(1.1, 5.5, 5)
    In [10]:
    numbers2
    Out[10]:
    array([1.1, 2.2, 3.3, 4.4, 5.5])
    In [11]:
    numbers * numbers2
    Out[11]:
    array([12.1, 26.4, 42.9, 61.6, 82.5])
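One detail worth checking when mixing element types: an operator such as * creates a new array with a suitable type, but an augmented assignment must store its result in the left operand. The sketch below, with illustrative names, shows that an integer array cannot absorb float results in place, whereas a float copy can.

```python
import numpy as np

ints = np.arange(1, 6)                  # integer array
floats = np.linspace(1.1, 5.5, 5)       # float64 array

product = ints * floats                 # new float64 array; ints is unchanged

as_float = ints.astype(float)
as_float *= floats                      # in place: both operands are float64

try:
    ints *= floats                      # would need to store floats in an int array
    in_place_failed = False
except TypeError:
    in_place_failed = True

print(product.dtype)        # float64
print(in_place_failed)      # True
```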
    Comparing arrays



    Can compare arrays with individual values and with other arrays
    Comparisons performed element-wise
    Produce arrays of Boolean values in which each element’s True or False value indicates
    the comparison result
    In [12]:
    numbers
    Out[12]:
    array([11, 12, 13, 14, 15])
    In [13]:
    numbers >= 13
    Out[13]:
    array([False, False,  True,  True,  True])
    In [14]:
    numbers2
    Out[14]:
    array([1.1, 2.2, 3.3, 4.4, 5.5])
    In [15]:
    numbers2 < numbers
    Out[15]:
    array([ True,  True,  True,  True,  True])
    In [16]:
    numbers == numbers2
    Out[16]:
    array([False, False, False, False, False])
    In [17]:
    numbers == numbers
    Out[17]:
    array([ True,  True,  True,  True,  True])
    7.8 NumPy Calculation Methods
    • By default, these methods ignore the array’s shape and use all the elements in the calculations.
    Consider an array representing four students’ grades on three exams:
    In [1]:
    import numpy as np
    In [2]:
    grades = np.array([[87, 96, 70], [100, 87, 90],
                       [94, 77, 90], [100, 81, 82]])
    In [3]:
    grades
    Out[3]:
    array([[ 87,  96,  70],
           [100,  87,  90],
           [ 94,  77,  90],
           [100,  81,  82]])
    • Can use methods to calculate sum, min, max, mean, std (standard deviation) and var (variance)
    • Each is a functional-style programming reduction
    In [4]:
    grades.sum()
    Out[4]:
    1054
    In [5]:
    grades.min()
    Out[5]:
    70
    In [6]:
    grades.max()
    Out[6]:
    100
    In [7]:
    grades.mean()
    Out[7]:
    87.83333333333333
    In [8]:
    grades.std()
    Out[8]:
    8.792357792739987
    In [9]:
    grades.var()
    Out[9]:
    77.30555555555556
    Calculations by Row or Column
    • You can perform calculations by column or row (or other dimensions in arrays with more than two dimensions)
    • Each 2D+ array has one axis per dimension
    • In a 2D array, axis=0 indicates calculations should be column-by-column
    In [10]:
    grades.mean(axis=0)
    Out[10]:
    array([95.25, 85.25, 83.  ])
    • In a 2D array, axis=1 indicates calculations should be row-by-row
    In [11]:
    grades.mean(axis=1)
    Out[11]:
    array([84.33333333, 92.33333333, 87.        , 87.66666667])
    Other NumPy array Calculation Methods
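The axis keyword works the same way for the other calculation methods. As a brief sketch on the same grades data, the following finds the highest grade per exam, the lowest grade per student, and (using argmax, which is not shown above) the column index of each student's best exam.

```python
import numpy as np

# Same data as above: rows are students, columns are exams.
grades = np.array([[87, 96, 70], [100, 87, 90],
                   [94, 77, 90], [100, 81, 82]])

exam_max = grades.max(axis=0)        # one value per column (exam)
student_min = grades.min(axis=1)     # one value per row (student)
best_exam = grades.argmax(axis=1)    # column index of each student's best exam

print(exam_max)      # [100  96  90]
print(student_min)   # [70 87 77 81]
print(best_exam)     # [1 0 0 0]
```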
    7.9 Universal Functions
    • Standalone universal functions (ufuncs) perform element-wise operations using one or two array or array-like arguments (like lists)
    • Each returns a new array containing the results
    • Some ufuncs are called when you use array operators like + and *
    • Create an array and calculate the square root of its values, using the sqrt universal function
    In [1]:
    import numpy as np
    In [2]:
    numbers = np.array([1, 4, 9, 16, 25, 36])
    In [3]:
    np.sqrt(numbers)
    Out[3]:
    array([1., 2., 3., 4., 5., 6.])
    • Add two arrays with the same shape, using the add universal function
    • Equivalent to:
      ▪ numbers + numbers2
    In [4]:
    numbers2 = np.arange(1, 7) * 10
    In [5]:
    numbers2
    Out[5]:
    array([10, 20, 30, 40, 50, 60])
    In [6]:
    np.add(numbers, numbers2)
    Out[6]:
    array([11, 24, 39, 56, 75, 96])
    Broadcasting with Universal Functions
    • Universal functions can use broadcasting, just like NumPy array operators
    In [7]:
    np.multiply(numbers2, 5)
    Out[7]:
    array([ 50, 100, 150, 200, 250, 300])
    In [8]:
    numbers3 = numbers2.reshape(2, 3)
    In [9]:
    numbers3
    Out[9]:
    array([[10, 20, 30],
           [40, 50, 60]])
    In [10]:
    numbers4 = np.array([2, 4, 6])
    In [11]:
    np.multiply(numbers3, numbers4)
    Out[11]:
    array([[ 20,  80, 180],
           [ 80, 200, 360]])
    • Broadcasting rules documentation
    Other Universal Functions
    NumPy universal functions
    • Math: add, subtract, multiply, divide, remainder, exp, log, sqrt, power, and more.
    • Trigonometry: sin, cos, tan, hypot, arcsin, arccos, arctan, and more.
    • Bit manipulation: bitwise_and, bitwise_or, bitwise_xor, invert, left_shift and right_shift.
    • Comparison: greater, greater_equal, less, less_equal, equal, not_equal, logical_and, logical_or, logical_xor, logical_not, minimum, maximum, and more.
    • Floating point: floor, ceil, isinf, isnan, fabs, trunc, and more.
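A few of the listed universal functions can be tried in a short sketch. The triangle side lengths are made-up values; the example shows a ufunc accepting a plain list, an operator matching its ufunc, and a scalar being broadcast.

```python
import numpy as np

sides_a = [3, 5, 8]                    # plain lists are accepted as array-like arguments
sides_b = np.array([4, 12, 15])

hypotenuses = np.hypot(sides_a, sides_b)
print(hypotenuses)                     # [ 5. 13. 17.]

# The + operator on arrays gives the same result as np.add.
same = np.array_equal(np.add(sides_b, 1), sides_b + 1)
print(same)                            # True

larger = np.maximum(sides_b, 10)       # the scalar 10 is broadcast
print(larger)                          # [10 12 15]
```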
    7.10 Indexing and Slicing
    • One-dimensional arrays can be indexed and sliced like lists.
    Indexing with Two-Dimensional arrays
    • To select an element in a two-dimensional array, specify a tuple containing the element’s row and column indices in square brackets
    In [1]:
    import numpy as np
    In [2]:
    grades = np.array([[87, 96, 70], [100, 87, 90],
                       [94, 77, 90], [100, 81, 82]])
    In [3]:
    grades
    Out[3]:
    array([[ 87,  96,  70],
           [100,  87,  90],
           [ 94,  77,  90],
           [100,  81,  82]])
    In [4]:
    grades[0, 1]  # row 0, column 1
    Out[4]:
    96
    Selecting a Subset of a Two-Dimensional array’s Rows
    • To select a single row, specify only one index in square brackets
    In [5]:
    grades[1]
    Out[5]:
    array([100,  87,  90])
    • Select multiple sequential rows with slice notation
    In [6]:
    grades[0:2]
    Out[6]:
    array([[ 87,  96,  70],
           [100,  87,  90]])
    • Select multiple non-sequential rows with a list of row indices
    In [7]:
    grades[[1, 3]]
    Out[7]:
    array([[100,  87,  90],
           [100,  81,  82]])
    Selecting a Subset of a Two-Dimensional array’s Columns
    • The column index also can be a specific index, a slice or a list
    In [8]:
    grades[:, 0]
    Out[8]:
    array([ 87, 100,  94, 100])
    In [9]:
    grades[:, 1:3]
    Out[9]:
    array([[96, 70],
           [87, 90],
           [77, 90],
           [81, 82]])
    In [10]:
    grades[:, [0, 2]]
    Out[10]:
    array([[ 87,  70],
           [100,  90],
           [ 94,  90],
           [100,  82]])
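Row and column selections can be combined in one pair of brackets. This sketch, on the same grades data, uses a negative index, a slice on both axes and a slice with a step; the variable names are illustrative.

```python
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90],
                   [94, 77, 90], [100, 81, 82]])

last_row = grades[-1]            # negative indices count from the end
block = grades[1:3, 0:2]         # rows 1-2, columns 0-1
every_other = grades[::2, :]     # rows 0 and 2, all columns

print(last_row)
# [100  81  82]
print(block)
# [[100  87]
#  [ 94  77]]
print(every_other)
# [[87 96 70]
#  [94 77 90]]
```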
    7.11 Views: Shallow Copies
    • Views “see” the data in other objects, rather than having their own copies of the data
    • Views are shallow copies
    • array method view returns a new array object with a view of the original array object’s data
    In [1]:
    import numpy as np
    In [2]:
    numbers = np.arange(1, 6)
    In [3]:
    numbers
    Out[3]:
    array([1, 2, 3, 4, 5])
    In [4]:
    numbers2 = numbers.view()
    In [5]:
    numbers2
    Out[5]:
    array([1, 2, 3, 4, 5])
    • Use built-in id function to see that numbers and numbers2 are different objects
    In [6]:
    id(numbers)
    Out[6]:
    4431803056
    In [7]:
    id(numbers2)
    Out[7]:
    4430398928
    • Modifying an element in the original array also modifies the view, and vice versa
    In [8]:
    numbers[1] *= 10
    In [9]:
    numbers2
    Out[9]:
    array([ 1, 20,  3,  4,  5])
    In [10]:
    numbers
    Out[10]:
    array([ 1, 20,  3,  4,  5])
    In [11]:
    numbers2[1] /= 10
    In [12]:
    numbers
    Out[12]:
    array([1, 2, 3, 4, 5])
    In [13]:
    numbers2
    Out[13]:
    array([1, 2, 3, 4, 5])
    Slice Views
    • Slices also create views
    In [14]:
    numbers2 = numbers[0:3]
    In [15]:
    numbers2
    Out[15]:
    array([1, 2, 3])
    In [16]:
    id(numbers)
    Out[16]:
    4431803056
    In [17]:
    id(numbers2)
    Out[17]:
    4451350368
    • Confirm that numbers2 is a view of only the first three numbers elements
    In [18]:
    numbers2[3]
    ------------------------------------------------------------------------
    IndexError                                Traceback (most recent call last)
    ----> 1 numbers2[3]
    IndexError: index 3 is out of bounds for axis 0 with size 3

    Modify an element both arrays share to show both are updated
    In [19]:
    numbers[1] *= 20
    In [20]:
    numbers
    Out[20]:
    array([ 1, 40,  3,  4,  5])
    In [21]:
    numbers2
    Out[21]:
    array([ 1, 40,  3])
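Comparing id values shows only that two objects differ. A more direct check that they share data, sketched here as one option, is the view's base attribute or np.shares_memory. The name window is illustrative.

```python
import numpy as np

numbers = np.arange(1, 6)
view = numbers.view()
window = numbers[1:4]                       # a slice is also a view

print(view.base is numbers)                 # True
print(np.shares_memory(numbers, window))    # True

window[0] = 99                              # writes through to numbers[1]
print(numbers)                              # [ 1 99  3  4  5]
```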
    7.12 Deep Copies



    When sharing mutable values, sometimes it’s necessary to create a deep copy of the
    original data
    Especially important in multi-core programming, where separate parts of your program could
    attempt to modify your data at the same time, possibly corrupting it
    array method copy returns a new array object with an independent copy of the original
    array’s data
    In [1]:
    import numpy as np
    In [2]:
    numbers = np.arange(1, 6)
    In [3]:
    numbers
    Out[3]:
    array([1, 2, 3, 4, 5])
    In [4]:
    numbers2 = numbers.copy()
    In [5]:
    numbers2
    Out[5]:
    array([1, 2, 3, 4, 5])
    In [6]:
    numbers[1] *= 10
    In [7]:
    numbers
    Out[7]:
    array([ 1, 20,  3,  4,  5])
    In [8]:
    numbers2
    Out[8]:
    array([1, 2, 3, 4, 5])
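The independence of a copy can also be confirmed programmatically. This is a minimal sketch with illustrative names: the copy owns its data (its base is None) and shares no memory with the original.

```python
import numpy as np

original = np.arange(1, 6)
independent = original.copy()

independent[0] = 100                              # does not affect original

print(original)                                   # [1 2 3 4 5]
print(independent)                                # [100   2   3   4   5]
print(np.shares_memory(original, independent))    # False
print(independent.base is None)                   # True
```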
    Module copy—Shallow vs. Deep Copies for Other Types of Python Objects
    7.13 Reshaping and Transposing
    reshape vs. resize


    Method reshape returns a view (shallow copy) of the original array with new dimensions
    Does not modify the original array
    In [1]:
    import numpy as np
    In [2]:
    grades = np.array([[87, 96, 70], [100, 87, 90]])
    In [3]:
    grades
    Out[3]:
    array([[ 87,  96,  70],
           [100,  87,  90]])
    In [4]:
    grades.reshape(1, 6)
    Out[4]:
    array([[ 87,  96,  70, 100,  87,  90]])
    In [5]:
    grades
    Out[5]:
    array([[ 87,  96,  70],
           [100,  87,  90]])
    Method resize modifies the original array’s shape
    In [6]:
    grades.resize(1, 6)
    In [7]:
    grades
    Out[7]:
    array([[ 87,  96,  70, 100,  87,  90]])
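The difference between the two methods can be sketched on separate arrays. Here reshape leaves the original's shape alone and returns a view that shares its data, while resize changes the array in place and returns None. The name scores is illustrative, and a fresh array is used for resize so that no other array refers to its data.

```python
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90]])
reshaped = grades.reshape(3, 2)

print(grades.shape)                          # (2, 3)
print(reshaped.shape)                        # (3, 2)
print(np.shares_memory(grades, reshaped))    # True

scores = np.array([[87, 96, 70], [100, 87, 90]])
result = scores.resize(3, 2)                 # modifies scores in place

print(scores.shape)                          # (3, 2)
print(result)                                # None
```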
    flatten vs. ravel


    Can flatten a multi-dimensional array into a single dimension with
    methods flatten and ravel
    flatten deep copies the original array’s data
    In [8]:
    grades = np.array([[87, 96, 70], [100, 87, 90]])
    In [9]:
    grades
    Out[9]:
    array([[ 87,  96,  70],
           [100,  87,  90]])
    In [10]:
    flattened = grades.flatten()
    In [11]:
    flattened
    Out[11]:
    array([ 87,  96,  70, 100,  87,  90])
    In [12]:
    grades
    Out[12]:
    array([[ 87,  96,  70],
           [100,  87,  90]])
    In [13]:
    flattened[0] = 100
    In [14]:
    flattened
    Out[14]:
    array([100,  96,  70, 100,  87,  90])
    In [15]:
    grades
    Out[15]:
    array([[ 87,  96,  70],
           [100,  87,  90]])
    Method ravel produces a view of the original array, which shares the grades array’s
    data
    In [16]:
    raveled = grades.ravel()
    In [17]:
    raveled
    Out[17]:
    array([ 87,  96,  70, 100,  87,  90])
    In [18]:
    grades
    Out[18]:
    array([[ 87,  96,  70],
           [100,  87,  90]])
    In [19]:
    raveled[0] = 100
    In [20]:
    raveled
    Out[20]:
    array([100,  96,  70, 100,  87,  90])
    In [21]:
    grades
    Out[21]:
    array([[100,  96,  70],
           [100,  87,  90]])
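The copy-versus-view behavior above can be verified with np.shares_memory. One caveat, added here as a note beyond the slides: ravel returns a view only when the data layout allows it. For a transposed array, the elements are not stored in the order a flat result needs, so ravel has to copy.

```python
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90]])

flattened = grades.flatten()      # always an independent copy
raveled = grades.ravel()          # a view here, because grades is contiguous
raveled_t = grades.T.ravel()      # a copy: the transposed layout cannot be viewed flat

print(np.shares_memory(grades, flattened))   # False
print(np.shares_memory(grades, raveled))     # True
print(np.shares_memory(grades, raveled_t))   # False
```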
    Transposing Rows and Columns


    Can quickly transpose an array’s rows and columns
    ▪ “flips” the array, so the rows become the columns and the columns become the
    rows
    T attribute returns a transposed view (shallow copy) of the array
    In [22]:
    grades.T
    Out[22]:
    array([[100, 100],
           [ 96,  87],
           [ 70,  90]])
    In [23]:
    grades
    Out[23]:
    array([[100,  96,  70],
           [100,  87,  90]])
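Because T returns a view, writing through the transposed array changes the original. A small sketch with a fresh copy of the grades data:

```python
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90]])
transposed = grades.T

print(transposed.shape)                        # (3, 2)
print(np.shares_memory(grades, transposed))    # True

# Row 0, column 1 of the transpose is row 1, column 0 of grades.
transposed[0, 1] = 0
print(grades[1, 0])                            # 0
```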
    Horizontal and Vertical Stacking

    Can combine arrays by adding more columns or more rows—known as horizontal
    stacking and vertical stacking
    In [24]:
    grades2 = np.array([[94, 77, 90], [100, 81, 82]])


    Combine grades and grades2 with NumPy’s hstack (horizontal stack) function by
    passing a tuple containing the arrays to combine
    The extra parentheses are required because hstack expects one argument

    Adds more columns
    In [25]:
    np.hstack((grades, grades2))
    Out[25]:
    array([[100,  96,  70,  94,  77,  90],
           [100,  87,  90, 100,  81,  82]])
    Combine grades and grades2 with NumPy’s vstack (vertical stack) function
    Adds more rows
    In [26]:
    np.vstack((grades, grades2))
    Out[26]:
    array([[100,  96,  70],
           [100,  87,  90],
           [ 94,  77,  90],
           [100,  81,  82]])
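For two-dimensional arrays, hstack and vstack give the same results as np.concatenate along axis 1 and axis 0, respectively. The sketch below uses two small made-up arrays to show the resulting shapes.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

wide = np.hstack((a, b))     # more columns
tall = np.vstack((a, b))     # more rows

print(wide.shape)            # (2, 4)
print(tall.shape)            # (4, 2)

print(np.array_equal(wide, np.concatenate((a, b), axis=1)))   # True
print(np.array_equal(tall, np.concatenate((a, b), axis=0)))   # True
```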
    7.14.1 pandas Series



    An enhanced one-dimensional array
    Supports custom indexing, including even non-integer indices like strings
    Offers additional capabilities that make it more convenient for many data-science-oriented tasks
    ▪ Series may have missing data
    ▪ Many Series operations ignore missing data by default
    Creating a Series with Default Indices

    By default, a Series has integer indices numbered sequentially from 0
    In [1]:
    import pandas as pd
    In [2]:
    grades = pd.Series([87, 100, 94])
    Creating a Series with All Elements Having the Same Value


    To give every element the same value, pass that value as the first argument
    The second argument is a one-dimensional iterable object (such as a list, an array or a range)
    containing the Series’ indices
    The number of indices determines the number of elements
    In [149]:
    pd.Series(98.6, range(3))
    Out[149]:
    0    98.6
    1    98.6
    2    98.6
    dtype: float64
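The same pattern works with any index collection, not just a range. A small sketch, where the day labels are hypothetical and chosen only for illustration:

```python
import pandas as pd

# Hypothetical labels; one element is created per index entry
temps = pd.Series(98.6, index=['Mon', 'Tue', 'Wed'])

print(len(temps))
print(temps['Tue'])
```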
    Accessing a Series’ Elements
    In [150]:
    grades[0]
    Out[150]:
    87
    Producing Descriptive Statistics for a Series


    Series provides many methods for common tasks including producing various descriptive
    statistics
    Each of these is a functional-style reduction
    In [151]:
    grades.count()
    Out[151]:
    3
    In [152]:
    grades.mean()
    Out[152]:
    93.66666666666667
    In [153]:
    grades.min()
    Out[153]:
    87
    In [154]:
    grades.max()
    Out[154]:
    100
    In [155]:
    grades.std()
    Out[155]:
    6.506407098647712



    Series method describe produces all these stats and more
    The 25%, 50% and 75% are quartiles:
    ▪ 50% represents the median of the sorted values.
    ▪ 25% represents the median of the first half of the sorted values.
    ▪ 75% represents the median of the second half of the sorted values.
    For the quartiles, if there are two middle elements, then their average is that quartile’s
    median
    In [156]:
    grades.describe()
    Out[156]:
    count      3.000000
    mean      93.666667
    std        6.506407
    min       87.000000
    25%       90.500000
    50%       94.000000
    75%       97.000000
    max      100.000000
    dtype: float64
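The quartile rows can also be requested on their own with the Series quantile method. This is a sketch, not part of the original walkthrough; it relies on quantile's default linear interpolation, which for these three grades gives the same values as the describe output above.

```python
import pandas as pd

grades = pd.Series([87, 100, 94])

summary = grades.describe()
quartiles = grades.quantile([0.25, 0.5, 0.75])  # linear interpolation by default

print(quartiles)
print(summary['25%'], summary['50%'], summary['75%'])
```

Individual entries of the describe result are available by label, as with any Series that has custom indices.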
    Creating a Series with Custom Indices
    Can specify custom indices with the index keyword argument
    In [157]:
    grades = pd.Series([87, 100, 94], index=['Wally', 'Eva', 'Sam'])
    In [158]:
    grades
    Out[158]:
    Wally     87
    Eva      100
    Sam       94
    dtype: int64
    Dictionary Initializers

    If you initialize a Series with a dictionary, its keys are the indices, and its values become
    the Series’ element values
    In [159]:
    grades = pd.Series({'Wally': 87, 'Eva': 100, 'Sam': 94})
    In [160]:
    grades
    Out[160]:
    Wally     87
    Eva      100
    Sam       94
    dtype: int64
    Accessing Elements of a Series Via Custom Indices

    Can access individual elements via square brackets containing a custom index value
    In [161]:
    grades['Eva']
    Out[161]:
    100

    If custom indices are strings that could represent valid Python identifiers, pandas
    automatically adds them to the Series as attributes
    In [162]:
    grades.Wally
    Out[162]:
    87

    dtype attribute returns the underlying array’s element type
    In [163]:
    grades.dtype
    Out[163]:
    dtype('int64')

    values attribute returns the underlying array
    In [164]:
    grades.values
    Out[164]:
    array([ 87, 100,  94])
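The access styles above can be compared side by side. A brief sketch: iloc and to_numpy are standard pandas features not used in this section, shown here because position-based access through iloc stays unambiguous once the indices are strings.

```python
import pandas as pd

grades = pd.Series({'Wally': 87, 'Eva': 100, 'Sam': 94})

by_label = grades['Eva']       # custom index in square brackets
by_attr = grades.Wally         # works because 'Wally' is a valid identifier
by_position = grades.iloc[0]   # position-based, independent of the labels
raw = grades.to_numpy()        # the underlying values as a NumPy array

print(by_label, by_attr, by_position)
print(raw)
```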
    Creating a Series of Strings

    In a Series of strings, you can use str attribute to call string methods on the elements
    In [165]:
    hardware = pd.Series(['Hammer', 'Saw', 'Wrench'])
    In [166]:
    hardware
    Out[166]:
    0    Hammer
    1       Saw
    2    Wrench
    dtype: object



    Call string method contains on each element
    Returns a Series containing bool values indicating the contains method’s result for each
    element
    The str attribute provides many string-processing methods that are similar to those in
    Python’s string type
    ▪ https://pandas.pydata.org/pandas-docs/stable/api.html#string-handling
    In [167]:
    hardware.str.contains('a')
    Out[167]:
    0     True
    1     True
    2    False
    dtype: bool

    Use string method upper to produce a new Series containing the uppercase versions of
    each element in hardware
    In [168]:
    hardware.str.upper()
    Out[168]:
    0    HAMMER
    1       SAW
    2    WRENCH
    dtype: object
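The bool Series returned by contains is often used to filter the original Series. A minimal sketch of that combination; str.len is another method of the str attribute, included here only as a second example.

```python
import pandas as pd

hardware = pd.Series(['Hammer', 'Saw', 'Wrench'])

has_a = hardware.str.contains('a')   # one bool per element
with_a = hardware[has_a]             # keep only elements where has_a is True
lengths = hardware.str.len()         # length of each string

print(with_a)
print(lengths)
```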
    7.14.2 DataFrames



    Enhanced two-dimensional array
    Can have custom row and column indices
    Offers additional operations and capabilities that make it more convenient for many data-science-oriented tasks
    Support missing data
    Each column in a DataFrame is a Series


    Creating a DataFrame from a Dictionary

    Create a DataFrame from a dictionary that represents student grades on three exams
    In [1]:
    import pandas as pd
    In [2]:
    grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
                   'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
                   'Bob': [83, 65, 85]}
    In [3]:
    grades = pd.DataFrame(grades_dict)

    Pandas displays DataFrames in tabular format with indices left aligned in the index column
    and the remaining columns’ values right aligned
    In [4]:
    grades
    Out[4]:
       Wally  Eva  Sam  Katie  Bob
    0     87  100   94    100   83
    1     96   87   77     81   65
    2     70   90   90     82   85
    Customizing a DataFrame’s Indices with the index Attribute


    Can use the index attribute to change the DataFrame’s indices from sequential integers to
    labels
    Must provide a one-dimensional collection that has the same number of elements as there
    are rows in the DataFrame
    In [5]:
    grades.index = ['Test1', 'Test2', 'Test3']
    In [6]:
    grades
    Out[6]:
           Wally  Eva  Sam  Katie  Bob
    Test1     87  100   94    100   83
    Test2     96   87   77     81   65
    Test3     70   90   90     82   85
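The row labels can also be supplied when the DataFrame is created, through the constructor's index keyword argument. The sketch below shows that alternative and what happens when the number of labels does not match the number of rows; the flag name mismatch_rejected is made up for illustration.

```python
import pandas as pd

grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
               'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
               'Bob': [83, 65, 85]}

# Set the row labels at construction time instead of assigning index later
grades = pd.DataFrame(grades_dict, index=['Test1', 'Test2', 'Test3'])
print(grades.shape)
print(list(grades.columns))

# Two labels for three rows: pandas rejects the assignment
mismatch_rejected = False
try:
    grades.index = ['Test1', 'Test2']
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)
```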
    Accessing a DataFrame’s Columns



    Can quickly and conveniently look at your data in many different ways, including selecting
    portions of the data
    Get Eva’s grades by name
    Displays her column as a Series
    In [7]:
    grades['Eva']
    Out[7]:
    Test1    100
    Test2     87
    Test3     90
    Name: Eva, dtype: int64

    If a DataFrame’s column-name strings are valid Python identifiers, you can use them as
    attributes
    In [8]:
    grades.Sam
    Out[8]:
    Test1    94
    Test2    77
    Test3    90
    Name: Sam, dtype: int64
    Selecting Rows via the loc and iloc Attributes


    DataFrames support indexing capabilities with [], but pandas documentation recommends
    using the attributes loc, iloc, at and iat
    ▪ Optimized to access DataFrames and also provide additional capabilities
    Access a row by its label via the DataFrame’s loc attribute
    In [9]:
    grades.loc['Test1']
    Out[9]:
    Wally     87
    Eva      100
    Sam       94
    Katie    100
    Bob       83
    Name: Test1, dtype: int64

    Access rows by integer zero-based indices using the iloc attribute (the i in iloc means
    that it’s used with integer indices)
    In [10]:
    grades.iloc[1]
    Out[10]:
    Wally    96
    Eva      87
    Sam      77
    Katie    81
    Bob      65
    Name: Test2, dtype: int64
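Each row selected this way comes back as a Series whose indices are the column names. A short sketch confirming that the label-based and position-based lookups above return the same row:

```python
import pandas as pd

grades = pd.DataFrame(
    {'Wally': [87, 96, 70], 'Eva': [100, 87, 90], 'Sam': [94, 77, 90],
     'Katie': [100, 81, 82], 'Bob': [83, 65, 85]},
    index=['Test1', 'Test2', 'Test3'])

by_label = grades.loc['Test2']   # row whose label is 'Test2'
by_position = grades.iloc[1]     # row at zero-based position 1

print(by_label.equals(by_position))
print(by_label.name)             # the row label becomes the Series name
print(by_label['Bob'])
```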
    Selecting Rows via Slices and Lists with the loc and iloc Attributes


    Index can be a slice
    When using slices containing labels with loc, the range specified includes the high index
    (‘Test3’):
    In [11]:
    grades.loc['Test1':'Test3']
    Out[11]:
           Wally  Eva  Sam  Katie  Bob
    Test1     87  100   94    100   83
    Test2     96   87   77     81   65
    Test3     70   90   90     82   85

    When using slices containing integer indices with iloc, the range you
    specify excludes the high index (2):
    In [12]:
    grades.iloc[0:2]
    Out[12]:
           Wally  Eva  Sam  Katie  Bob
    Test1     87  100   94    100   83
    Test2     96   87   77     81   65

    Select specific rows with a list
    In [13]:
    grades.loc[['Test1', 'Test3']]
    Out[13]:
           Wally  Eva  Sam  Katie  Bob
    Test1     87  100   94    100   83
    Test3     70   90   90     82   85
    In [14]:
    grades.iloc[[0, 2]]
    Out[14]:
           Wally  Eva  Sam  Katie  Bob
    Test1     87  100   94    100   83
    Test3     70   90   90     82   85
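The inclusive versus exclusive upper bound is easy to mix up, so it can help to compare row counts. A minimal sketch using the same grades data as above:

```python
import pandas as pd

grades = pd.DataFrame(
    {'Wally': [87, 96, 70], 'Eva': [100, 87, 90], 'Sam': [94, 77, 90],
     'Katie': [100, 81, 82], 'Bob': [83, 65, 85]},
    index=['Test1', 'Test2', 'Test3'])

inclusive = grades.loc['Test1':'Test3']   # label slice keeps 'Test3'
exclusive = grades.iloc[0:2]              # integer slice stops before position 2
print(len(inclusive), len(exclusive))

# A list of labels and the matching list of positions select the same rows
picked_by_label = grades.loc[['Test1', 'Test3']]
picked_by_position = grades.iloc[[0, 2]]
print(picked_by_label.equals(picked_by_position))
```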
    Selecting Subsets of the Rows and Columns

    View only Eva’s and Katie’s grades on Test1 and Test2
    In [15]:
    grades.loc['Test1':'Test2', ['Eva', 'Katie']]
    Out[15]:
           Eva  Katie
    Test1  100    100
    Test2   87     81

    Use iloc with a list and a slice to select the first and third tests and the first three columns
    for those tests
    In [16]:
    grades.iloc[[0, 2], 0:3]
    Out[16]:
           Wally  Eva  Sam
    Test1     87  100   94
    Test3     70   90   90
    Boolean Indexing


    One of pandas’ more powerful selection capabilities is Boolean indexing
    Select all the A grades—that is, those that are greater than or equal to 90:
    ▪ Pandas checks every grade to determine whether its value is greater than or equal to
    90 and, if so, includes it in the new DataFrame.
    ▪ Grades for which the condition is False are represented as NaN (not a number) in
    the new DataFrame
    ▪ NaN is pandas’ notation for missing values
    In [17]:
    grades[grades >= 90]
    Out[17]:
           Wally    Eva   Sam  Katie  Bob
    Test1    NaN  100.0  94.0  100.0  NaN
    Test2   96.0    NaN   NaN    NaN  NaN
    Test3    NaN   90.0  90.0    NaN  NaN
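It can help to look at the intermediate Boolean DataFrame that the comparison produces before it is used as an index. The sketch below assumes the original grades values; the names mask and a_grades are chosen for illustration.

```python
import pandas as pd

grades = pd.DataFrame(
    {'Wally': [87, 96, 70], 'Eva': [100, 87, 90], 'Sam': [94, 77, 90],
     'Katie': [100, 81, 82], 'Bob': [83, 65, 85]},
    index=['Test1', 'Test2', 'Test3'])

mask = grades >= 90        # DataFrame of bool values, same shape as grades
a_grades = grades[mask]    # values where mask is False become NaN

print(mask)
print(int(mask.sum().sum()))   # number of grades that satisfy the condition
print(a_grades.dtypes)         # columns become float64 because NaN is a float
```

The switch from int64 to float64 explains why the surviving grades display as 100.0 and 94.0 rather than 100 and 94.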

    Select all the B grades in the range 80–89
    In [18]:
    grades[(grades >= 80) & (grades < 90)]
    Out[18]:
           Wally   Eva  Sam  Katie   Bob
    Test1   87.0   NaN  NaN    NaN  83.0
    Test2    NaN  87.0  NaN   81.0   NaN
    Test3    NaN   NaN  NaN   82.0  85.0

    Pandas Boolean indices combine multiple conditions with the Python operator & (bitwise
    AND), not the and Boolean operator
    For or conditions, use | (bitwise OR)
    NumPy also supports Boolean indexing for arrays, but always returns a one-dimensional
    array containing only the values that satisfy the condition
    Accessing a Specific DataFrame Cell by Row and Column

    The DataFrame attributes at and iat get a single value from a DataFrame
    In [19]:
    grades.at['Test2', 'Eva']
    Out[19]:
    87
    In [20]:
    grades.iat[2, 0]
    Out[20]:
    70

    Can assign new values to specific elements
    In [21]:
    grades.at['Test2', 'Eva'] = 100
    In [22]:
    grades.at['Test2', 'Eva']
    Out[22]:
    100
    In [23]:
    grades.iat[1, 2] = 87
    In [24]:
    grades.iat[1, 2]
    Out[24]:
    87
    Descriptive Statistics

    The DataFrame describe method calculates basic descriptive statistics for the data and
    returns them as a DataFrame
    Statistics are calculated by column
    In [25]:
    grades.describe()
    Out[25]:
               Wally         Eva        Sam       Katie        Bob
    count   3.000000    3.000000   3.000000    3.000000   3.000000
    mean   84.333333   96.666667  90.333333   87.666667  77.666667
    std    13.203535    5.773503   3.511885   10.692677  11.015141
    min    70.000000   90.000000  87.000000   81.000000  65.000000
    25%    78.500000   95.000000  88.500000   81.500000  74.000000
    50%    87.000000  100.000000  90.000000   82.000000  83.000000
    75%    91.500000  100.000000  92.000000   91.000000  84.000000
    max    96.000000  100.000000  94.000000  100.000000  85.000000

    Quick way to summarize your data
    Nicely demonstrates the power of array-oriented programming with a clean, concise
    functional-style call
    Can control the precision and other default settings with pandas’ set_option function
    ▪ Recent pandas versions require the option’s full name, display.precision
    In [26]:
    pd.set_option('display.precision', 2)
    In [27]:
    grades.describe()
    Out[27]:
           Wally     Eva    Sam   Katie    Bob
    count   3.00    3.00   3.00    3.00   3.00
    mean   84.33   96.67  90.33   87.67  77.67
    std    13.20    5.77   3.51   10.69  11.02
    min    70.00   90.00  87.00   81.00  65.00
    25%    78.50   95.00  88.50   81.50  74.00
    50%    87.00  100.00  90.00   82.00  83.00
    75%    91.50  100.00  92.00   91.00  84.00
    max    96.00  100.00  94.00  100.00  85.00

    For student grades, the most important of these statistics is probably the mean
    Can calculate that for each student simply by calling mean on the DataFrame
    In [28]:
    grades.mean()
    Out[28]:
    Wally    84.33
    Eva      96.67
    Sam      90.33
    Katie    87.67
    Bob      77.67
    dtype: float64
    Transposing the DataFrame with the T Attribute

    Can quickly transpose rows and columns (so the rows become the columns, and the columns
    become the rows) by using the T attribute to get a view
    In [29]:
    grades.T
    Out[29]:
           Test1  Test2  Test3
    Wally     87     96     70
    Eva      100    100     90
    Sam       94     87     90
    Katie    100     81     82
    Bob       83     65     85

    Assume that rather than getting the summary statistics by student, you want to get them
    by test
    Call describe on grades.T
    In [30]:
    grades.T.describe()
    Out[30]:
            Test1   Test2  Test3
    count    5.00    5.00   5.00
    mean    92.80   85.80  83.40
    std      7.66   13.81   8.23
    min     83.00   65.00  70.00
    25%     87.00   81.00  82.00
    50%     94.00   87.00  85.00
    75%    100.00   96.00  90.00
    max    100.00  100.00  90.00

    Get the average of all the students’ grades on each test
    In [31]:
    grades.T.mean()
    Out[31]:
    Test1    92.8
    Test2    85.8
    Test3    83.4
    dtype: float64
    Sorting by Rows by Their Indices

    Can sort a DataFrame by its rows or columns, based on their indices or values
    Sort the rows by their indices in descending order using sort_index and its keyword
    argument ascending=False
    In [32]:
    grades.sort_index(ascending=False)
    Out[32]:
           Wally  Eva  Sam  Katie  Bob
    Test3     70   90   90     82   85
    Test2     96  100   87     81   65
    Test1     87  100   94    100   83
    Sorting by Column Indices

    Sort columns into ascending order (left-to-right) by their column names
    axis=1 keyword argument indicates that we wish to sort the column indices, rather than
    the row indices
    ▪ axis=0 (the default) sorts the row indices
    In [33]:
    grades.sort_index(axis=1)
    Out[33]:
           Bob  Eva  Katie  Sam  Wally
    Test1   83  100    100   94     87
    Test2   65  100     81   87     96
    Test3   85   90     82   90     70
    Sorting by Column Values

    To view Test1’s grades in descending order so we can see the students’ names in
    highest-to-lowest grade order, call method sort_values
    The by and axis arguments work together to determine which values will be sorted
    ▪ In this case, we sort based on the column values (axis=1) for Test1
    In [34]:
    grades.sort_values(by='Test1', axis=1, ascending=False)
    Out[34]:
           Eva  Katie  Sam  Wally  Bob
    Test1  100    100   94     87   83
    Test2  100     81   87     96   65
    Test3   90     82   90     70   85

    Might be easier to read the grades and names if they were in a column
    Sort the transposed DataFrame instead
    In [35]:
    grades.T.sort_values(by='Test1', ascending=False)
    Out[35]:
           Test1  Test2  Test3
    Eva      100    100     90
    Katie    100     81     82
    Sam       94     87     90
    Wally     87     96     70
    Bob       83     65     85

    Since we’re sorting only Test1’s grades, we might not want to see the other tests at all
    Combine selection with sorting
    In [36]:
    grades.loc['Test1'].sort_values(ascending=False)
    Out[36]:
    Katie    100
    Eva      100
    Sam       94
    Wally     87
    Bob       83
    Name: Test1, dtype: int64
    Copy vs. In-Place Sorting

    sort_index and sort_values return a copy of the original DataFrame
    Could require substantial memory in a big data application
    Can sort in place by passing the keyword argument inplace=True
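The difference between the copying and in-place forms of sorting can be seen by checking the original DataFrame after each call. This is a sketch using the original grades values; the variable names are chosen for illustration.

```python
import pandas as pd

grades = pd.DataFrame(
    {'Wally': [87, 96, 70], 'Eva': [100, 87, 90], 'Sam': [94, 77, 90],
     'Katie': [100, 81, 82], 'Bob': [83, 65, 85]},
    index=['Test1', 'Test2', 'Test3'])

# Default: a new, sorted DataFrame is returned and grades is left as it was
sorted_copy = grades.sort_index(ascending=False)
order_after_copy = list(grades.index)
print(order_after_copy)

# inplace=True: grades itself is reordered and the call returns None
result = grades.sort_index(ascending=False, inplace=True)
print(result)
print(list(grades.index))
```

Because the in-place call returns None, assigning its result back to grades would discard the DataFrame.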
