🔢 NumPy Handbook

Master numerical computing with arrays, broadcasting, and efficient data operations

1. NumPy Introduction

NumPy is the core library for scientific computing in Python and a foundation for most data science and machine learning work. Its central object is the NumPy array (ndarray), a high-performance multidimensional array designed for mathematical operations such as linear algebra and probability calculations.

Why NumPy? It is much faster than Python lists for numerical work, requires less code, and provides powerful mathematical functions.

Many popular libraries build on NumPy or interoperate directly with its arrays:

  • Scikit-learn - Machine learning
  • Matplotlib - Data visualization
  • Pandas - Data analysis
  • TensorFlow/PyTorch - Deep learning

2. Installation and Array Basics

Install with pip or Anaconda:

Bash
# Install with pip
$ pip install numpy

# Or with Anaconda
$ conda install numpy

Import and Check Version

Python
import numpy as np

# Check version
print(np.__version__)  # 1.21.0 (or newer)

Creating Arrays

Python
a = np.array([1, 2, 3, 4, 5])
print(a)          # [1 2 3 4 5]

print(a.shape)    # (5,)
print(a.dtype)    # int64 (or int32 depending on system)
print(a.ndim)     # 1 (number of dimensions)
print(a.size)     # 5 (total number of elements)
print(a.itemsize) # 8 (size of each element in bytes)

Accessing and Modifying Elements

Python
a = np.array([1, 2, 3])

# Access elements
print(a[0])  # 1

# Change elements
a[0] = 5
print(a)     # [5 2 3]

# Element-wise operations
b = np.array([2, 0, 2])
print(a * b)  # [10 0 6]
print(a.sum())  # 10

3. Array vs List

Understanding the key differences between NumPy arrays and Python lists.

Size Mutability

Python
l = [1, 2, 3]
a = np.array([1, 2, 3])

# Lists can grow
l.append(4)
print(l)  # [1, 2, 3, 4]

# Arrays have fixed size
# a.append(4)  # Error: AttributeError
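
If an array does need to grow, NumPy offers np.append, which returns a new array rather than resizing the existing one. A minimal sketch (the name grown is only illustrative):

```python
import numpy as np

a = np.array([1, 2, 3])

# np.append does not modify `a`; it allocates and returns a new array
grown = np.append(a, 4)

print(a)      # [1 2 3]
print(grown)  # [1 2 3 4]
```

Because every call copies the data, appending inside a loop is slow. Collecting values in a list and converting once with np.array is usually the better pattern.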

Arithmetic Operations

Python
l = [1, 2, 3]
a = np.array([1, 2, 3])

# Addition
l2 = l + [5]
print(l2)  # [1, 2, 3, 5] (concatenation)

a2 = a + np.array([4])
print(a2)  # [5 6 7] (broadcasting)

# Multiplication
l2 = 2 * l
print(l2)  # [1, 2, 3, 1, 2, 3] (repetition)

a3 = 2 * a
print(a3)  # [2 4 6] (element-wise multiplication)

Applying Functions

Python
l = [1, 2, 3]
a = np.array([1, 2, 3])

# List: need list comprehension
l2 = [i**2 for i in l]
print(l2)  # [1, 4, 9]

# NumPy: vectorized operations
a2 = a**2
print(a2)  # [1 4 9]

# Universal functions
print(np.sqrt(a))  # [1.         1.41421356 1.73205081]
print(np.log(a))   # [0.         0.69314718 1.09861229]
print(np.sin(a))   # [0.84147098 0.90929743 0.14112001]

4. Dot Product

The dot product is the sum of the products of corresponding entries of two sequences of numbers.

Python
a = np.array([1, 2])
b = np.array([3, 4])

# Manual way (cumbersome)
dot = 0
for i in range(len(a)):
    dot += a[i] * b[i]
print(dot)  # 11

# NumPy function
dot = np.dot(a, b)
print(dot)  # 11

# Instance method
dot = a.dot(b)
print(dot)  # 11

# @ operator (Python 3.5+)
dot = a @ b
print(dot)  # 11

5. Speed Test: Array vs List

NumPy is significantly faster for numerical operations because its core loops run in compiled C over contiguous, typed memory, instead of being interpreted one Python object at a time.

Python
import numpy as np
from timeit import default_timer as timer

a = np.random.randn(1000)
b = np.random.randn(1000)
A = list(a)
B = list(b)
T = 1000

def dot1():
    dot = 0
    for i in range(len(A)):
        dot += A[i] * B[i]
    return dot

def dot2():
    return np.dot(a, b)

# Time list operation
start = timer()
for t in range(T):
    dot1()
end = timer()
t1 = end - start

# Time NumPy operation
start = timer()
for t in range(T):
    dot2()
end = timer()
t2 = end - start

print(f"Time with lists: {t1:.5f}s")
print(f"Time with array: {t2:.5f}s")
print(f"Ratio: {t1 / t2:.1f}x faster")
# The array version is often tens of times faster; the exact ratio
# depends on hardware, array size, and the NumPy build.

6. Multidimensional Arrays

NumPy arrays can have multiple dimensions, like matrices and tensors.

Python
a = np.array([[1, 2], [3, 4]])
print(a)
# [[1 2]
#  [3 4]]

print(a.shape)  # (2, 2)

# Accessing elements (row first, then column)
print(a[0])      # [1 2] (first row)
print(a[0][0])   # 1
print(a[0, 0])   # 1 (preferred syntax)

# Slicing
print(a[:, 0])   # [1 3] (all rows, column 0)
print(a[0, :])   # [1 2] (row 0, all columns)

# Transpose
print(a.T)
# [[1 3]
#  [2 4]]

Matrix Operations

Python
a = np.array([[1, 2], [3, 4]])
b = np.array([[3, 4], [5, 6]])

# Element-wise multiplication
print(a * b)
# [[ 3  8]
#  [15 24]]

# Matrix multiplication
print(a.dot(b))  # or a @ b
# [[13 16]
#  [29 36]]

# Linear algebra functions
print(np.linalg.det(a))  # approximately -2.0 (determinant, subject to floating-point rounding)
print(np.linalg.inv(a))  # Inverse matrix
print(np.diag(a))        # [1 4] (diagonal elements)
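
Since the inverse and determinant are computed in floating point, it can help to check them with a tolerance rather than exact equality. A small sketch, reusing the matrix a from above (the names a_inv and product are illustrative):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
a_inv = np.linalg.inv(a)

# A matrix times its inverse should give the identity, up to rounding error
product = a @ a_inv
print(np.allclose(product, np.eye(2)))  # True

# Compare the determinant with a tolerance instead of ==
print(np.isclose(np.linalg.det(a), -2.0))  # True
```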

7. Indexing, Slicing, and Boolean Indexing

Basic Slicing

Slicing syntax: [start:stop:step]

Python
a = np.array([[1, 2, 3, 4], 
              [5, 6, 7, 8], 
              [9, 10, 11, 12]])

# Get subarray
slice_a = a[0:2, 1:3]
print(slice_a)
# [[2 3]
#  [6 7]]

# Negative indexing
print(a[-1, -1])  # 12

# Every other element
print(a[::2, ::2])
# [[ 1  3]
#  [ 9 11]]

Boolean Indexing

Python
a = np.array([[1, 2], [3, 4], [5, 6]])

# Create boolean mask
bool_idx = a > 2
print(bool_idx)
# [[False False]
#  [ True  True]
#  [ True  True]]

# Use boolean index to filter
print(a[bool_idx])  # [3 4 5 6] (flattened)

# One line filtering
print(a[a > 2])  # [3 4 5 6]

# np.where() - element-wise ternary operator
# np.where(condition, value_if_true, value_if_false)
b = np.where(a > 2, a, -1)
print(b)
# [[-1 -1]
#  [ 3  4]
#  [ 5  6]]
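
Boolean masks can also be combined and used for assignment. The sketch below assumes the same array a as above; mask and clipped are illustrative names:

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# Combine conditions with & (and) and | (or); the parentheses are required
mask = (a > 2) & (a % 2 == 0)
print(a[mask])  # [4 6]

# Assigning through a mask changes only the selected elements
clipped = a.copy()
clipped[clipped > 4] = 4
print(clipped)
# [[1 2]
#  [3 4]
#  [4 4]]
```

Python's own and/or keywords do not work element-wise on arrays, which is why the bitwise operators are used here.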

Fancy Indexing

Python
a = np.array([10, 19, 30, 41, 50, 61])

# Get elements at specific indices
b = a[[1, 3, 5]]
print(b)  # [19 41 61]

# Find indices of even numbers
even = np.argwhere(a % 2 == 0).flatten()
print(even)     # [0 2 4]
print(a[even])  # [10 30 50]

8. Reshaping

Change the shape of an array while preserving the total number of elements.

Python
a = np.arange(1, 7)  # [1 2 3 4 5 6]

# Reshape to 2 rows, 3 columns
b = a.reshape((2, 3))
print(b)
# [[1 2 3]
#  [4 5 6]]

# Reshape to 3 rows, 2 columns
c = a.reshape((3, 2))
print(c)
# [[1 2]
#  [3 4]
#  [5 6]]

# Automatic dimension calculation
d = a.reshape((2, -1))  # NumPy calculates the other dimension
print(d.shape)  # (2, 3)

Adding Dimensions

Python
a = np.arange(1, 7)
print(a.shape)  # (6,)

# Add new axis for row vector
d = a[np.newaxis, :]
print(d)        # [[1 2 3 4 5 6]]
print(d.shape)  # (1, 6)

# Add new axis for column vector
e = a[:, np.newaxis]
print(e)
# [[1]
#  [2]
#  [3]
#  [4]
#  [5]
#  [6]]
print(e.shape)  # (6, 1)
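
Two details of reshape are easy to miss: the result usually shares memory with the original array, and the new shape must hold exactly the same number of elements. A short sketch of both, under the assumption that the input is a freshly created contiguous array:

```python
import numpy as np

a = np.arange(1, 7)

# -1 asks NumPy to infer that dimension from the total size
b = a.reshape((3, -1))
print(b.shape)  # (3, 2)

# For a contiguous array like this one, reshape returns a view,
# so writing through b also changes a
b[0, 0] = 100
print(a[0])  # 100

# A shape whose element count does not match raises ValueError
try:
    a.reshape((4, 2))
except ValueError as err:
    print("reshape failed:", err)
```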

9. Concatenation

Joining arrays together along different axes.

Python
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

# Concatenate along axis=None (flattens first)
c = np.concatenate((a, b), axis=None)
print(c)  # [1 2 3 4 5 6]

# Concatenate along axis=0 (add new row)
d = np.concatenate((a, b), axis=0)
print(d)
# [[1 2]
#  [3 4]
#  [5 6]]

# Concatenate along axis=1 (add new column)
e = np.concatenate((a, b.T), axis=1)
print(e)
# [[1 2 5]
#  [3 4 6]]

Stack Functions

Python
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

# Horizontal stack (1-D arrays are joined end to end)
c = np.hstack((a, b))
print(c)  # [1 2 3 4 5 6 7 8]

# Vertical stack (each 1-D array becomes a row)
c = np.vstack((a, b))
print(c)
# [[1 2 3 4]
#  [5 6 7 8]]
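
When the arrays should be joined along a new axis, np.stack and np.column_stack are often a closer fit than hstack or vstack. A brief sketch with the same inputs (rows and cols are illustrative names):

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

# np.stack joins arrays along a NEW axis; axis=0 matches vstack for 1-D input
rows = np.stack((a, b), axis=0)
print(rows.shape)  # (2, 4)

# axis=1 places each input array in its own column
cols = np.stack((a, b), axis=1)
print(cols.shape)  # (4, 2)
print(cols[0])     # [1 5]

# np.column_stack gives the same result for 1-D inputs
print(np.array_equal(cols, np.column_stack((a, b))))  # True
```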

10. Broadcasting

Broadcasting allows NumPy to perform operations on arrays of different shapes.

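Broadcasting Rules: Shapes are compared from the trailing (rightmost) dimension backward, and two dimensions are compatible when they are equal or one of them is 1. The smaller array is then "broadcast" (virtually repeated) across the larger array so that they have compatible shapes.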

Python
x = np.array([[1, 2, 3], 
              [4, 5, 6], 
              [7, 8, 9], 
              [10, 11, 12]])
y = np.array([1, 0, 1])

# y is added to each row of x
z = x + y
print(z)
# [[ 2  2  4]
#  [ 5  5  7]
#  [ 8  8 10]
#  [11 11 13]]

# Scalar broadcasting
a = np.array([[1, 2], [3, 4]])
print(a + 10)
# [[11 12]
#  [13 14]]
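
The shape rule can be explored directly. The following sketch uses made-up arrays col and row to show one compatible pair and one incompatible pair; np.broadcast_shapes is available in NumPy 1.20 and later:

```python
import numpy as np

col = np.array([[0], [10], [20], [30]])  # shape (4, 1)
row = np.array([1, 2, 3])                # shape (3,)

# (4, 1) and (3,) align from the right: 1 stretches to 3, and the missing
# leading dimension of `row` is treated as 1 and stretches to 4
grid = col + row
print(grid.shape)  # (4, 3)
print(grid[1])     # [11 12 13]

# np.broadcast_shapes reports the result shape without doing the arithmetic
print(np.broadcast_shapes((4, 1), (3,)))  # (4, 3)

# Trailing dimensions 3 and 4 are neither equal nor 1, so this fails
try:
    np.ones((4, 3)) + np.ones(4)
except ValueError as err:
    print("incompatible shapes:", err)
```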

11. Functions and Axis

Operations can be performed along specific axes:

  • axis=None - Operates on entire array
  • axis=0 - Operates "down" the rows (result per column)
  • axis=1 - Operates "across" the columns (result per row)

Python
a = np.array([[7, 8, 9, 10, 11, 12, 13], 
              [17, 18, 19, 20, 21, 22, 23]])

# Sum
print(a.sum(axis=None))  # 210 (all elements)
print(a.sum(axis=0))     # [24 26 28 30 32 34 36] (sum each column)
print(a.sum(axis=1))     # [ 70 140] (sum each row)

# Mean
print(a.mean(axis=None))  # 15.0
print(a.mean(axis=0))     # [12. 13. 14. 15. 16. 17. 18.]
print(a.mean(axis=1))     # [10. 20.]

# Other useful functions
print(a.std())   # Standard deviation
print(a.var())   # Variance
print(a.min())   # Minimum
print(a.max())   # Maximum
print(a.argmin())  # Index of minimum
print(a.argmax())  # Index of maximum
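
Two related details are worth a quick sketch: keepdims=True keeps the reduced axis so the result broadcasts back against the input, and argmin/argmax without an axis return an index into the flattened array. The names row_means, centered, and flat_idx are illustrative:

```python
import numpy as np

a = np.array([[7, 8, 9, 10, 11, 12, 13],
              [17, 18, 19, 20, 21, 22, 23]])

# keepdims=True keeps the reduced axis with length 1, so the result
# broadcasts back against the original array
row_means = a.mean(axis=1, keepdims=True)
print(row_means.shape)  # (2, 1)

centered = a - row_means
print(centered.sum(axis=1))  # [0. 0.]

# Without an axis, argmax returns an index into the flattened array
flat_idx = a.argmax()
print(flat_idx)  # 13

# np.unravel_index converts it back to a (row, column) position
print(np.unravel_index(flat_idx, a.shape))
```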

12. Datatypes

NumPy can infer datatypes, or you can set them explicitly, for example to save memory.

Python
# Let NumPy choose
x = np.array([1, 2])
print(x.dtype)  # int64 (or int32 depending on system)

x = np.array([1.0, 2.0])
print(x.dtype)  # float64

# Force specific datatype
x = np.array([1, 2], dtype=np.int64)   # 8 bytes
print(x.dtype)  # int64

x = np.array([1, 2], dtype=np.float32)  # 4 bytes
print(x.dtype)  # float32

x = np.array([1, 2], dtype=np.int16)    # 2 bytes
print(x.dtype)  # int16

# Common dtypes: int8, int16, int32, int64
#                float32, float64, complex64, complex128
#                bool, object
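
The memory effect of a dtype choice can be measured with nbytes, and existing arrays can be converted with astype. A small sketch (the array contents are arbitrary):

```python
import numpy as np

x = np.arange(1000, dtype=np.int64)
print(x.nbytes)  # 8000 (1000 elements * 8 bytes)

# astype returns a converted copy; int16 needs a quarter of the memory
small = x.astype(np.int16)
print(small.nbytes)  # 2000

# Converting floats to integers truncates toward zero
f = np.array([1.9, -1.9])
print(f.astype(np.int32))  # [ 1 -1]

# A smaller dtype also has a smaller range
print(np.iinfo(np.int16).max)  # 32767
```

Values outside the target range do not fit, so it is worth checking np.iinfo (or np.finfo for floats) before downcasting.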

13. Copying

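Critical concept: Assignment (b = a) does not copy an array; both names refer to the same object. Slices return views that share memory with the original. Use .copy() for independent arrays.

Python
# Plain assignment just binds a second name to the same array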
a = np.array([1, 2, 3])
b = a
b[0] = 42
print(a)  # [42 2 3] - a is modified!

# This is an actual copy
a = np.array([1, 2, 3])
b = a.copy()
b[0] = 42
print(a)  # [1 2 3] - a is unchanged

# Slices are views too!
a = np.array([1, 2, 3, 4])
b = a[0:2]
b[0] = 99
print(a)  # [99 2 3 4] - original modified!

# Make a copy of the slice (starting again from a fresh array)
a = np.array([1, 2, 3, 4])
b = a[0:2].copy()
b[0] = 99
print(a)  # [1 2 3 4] - original unchanged
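
When it is unclear whether two arrays share data, np.shares_memory and the .base attribute can answer the question. A short sketch (view, picked, and alias are illustrative names):

```python
import numpy as np

a = np.array([1, 2, 3, 4])

view = a[0:2]        # basic slice: a view onto a's data
picked = a[[0, 1]]   # fancy indexing: always a copy
alias = a            # plain assignment: the very same object

print(alias is a)                   # True
print(np.shares_memory(a, view))    # True
print(np.shares_memory(a, picked))  # False

# A view records the array that owns its data in .base
print(view.base is a)  # True
```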

14. Generating Arrays

NumPy provides many functions to create arrays of specific values or patterns.

Python
# Zeros
a = np.zeros((2, 3))
print(a)
# [[0. 0. 0.]
#  [0. 0. 0.]]

# Ones
b = np.ones((2, 3))
print(b)
# [[1. 1. 1.]
#  [1. 1. 1.]]

# Specific value
c = np.full((3, 3), 5.0)
print(c)
# [[5. 5. 5.]
#  [5. 5. 5.]
#  [5. 5. 5.]]

# Identity matrix
d = np.eye(3)
print(d)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]

# Range of values
e = np.arange(10)
print(e)  # [0 1 2 3 4 5 6 7 8 9]

e = np.arange(2, 10, 2)
print(e)  # [2 4 6 8]

# Linearly spaced values
f = np.linspace(0, 10, 5)  # Start, Stop, Num Points
print(f)  # [ 0.   2.5  5.   7.5 10. ]

# Logarithmically spaced
g = np.logspace(0, 3, 4)  # 10^0 to 10^3
print(g)  # [   1.   10.  100. 1000.]
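
Two further variations may be useful here: the *_like functions, which borrow the shape and dtype of an existing array, and the endpoint parameter of np.linspace. A minimal sketch (template, z, and h are illustrative names):

```python
import numpy as np

template = np.array([[1, 2, 3], [4, 5, 6]])

# *_like functions copy the shape and dtype of an existing array
z = np.zeros_like(template)
print(z.shape, z.dtype == template.dtype)  # (2, 3) True

# endpoint=False excludes the stop value, like arange does
h = np.linspace(0, 1, 5, endpoint=False)
print(h)  # [0.  0.2 0.4 0.6 0.8]
```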

15. Random Numbers

Generate random arrays for testing, simulations, and machine learning.

Python
# Uniform distribution on [0, 1) (1 itself is never returned)
a = np.random.random((3, 2))
print(a)

# Normal/Gaussian distribution (mean=0, std=1)
b = np.random.randn(3, 2)
print(b)

# Random integers
c = np.random.randint(3, 10, size=(3, 3))  # low, high (exclusive), size
print(c)

# Random choice from array
d = np.random.choice([1, 2, 3, 4], size=8)
print(d)

# Set random seed for reproducibility
np.random.seed(42)
print(np.random.random(3))  # Always same result
np.random.seed(42)
print(np.random.random(3))  # Same as above

# Shuffle array in-place
arr = np.array([1, 2, 3, 4, 5])
np.random.shuffle(arr)
print(arr)
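
The functions above use NumPy's legacy global random state. For new code, the NumPy documentation recommends the Generator interface created with np.random.default_rng. A sketch of the equivalent calls (variable names are illustrative):

```python
import numpy as np

# A Generator carries its own seed and state instead of using global state
rng = np.random.default_rng(42)

u = rng.random((3, 2))                # uniform on [0, 1)
n = rng.standard_normal((3, 2))       # mean 0, standard deviation 1
k = rng.integers(3, 10, size=(3, 3))  # integers from 3 to 9 (high is exclusive)

# Two generators created with the same seed produce the same stream
first = np.random.default_rng(42).random(3)
second = np.random.default_rng(42).random(3)
print(np.array_equal(first, second))  # True
```

Passing a generator object around explicitly makes it clearer which code depends on which seed.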

16. Linear Algebra

Eigenvalues and Eigenvectors

Python
a = np.array([[1, 2], [3, 4]])

eigenvalues, eigenvectors = np.linalg.eig(a)
print(eigenvalues)
# [-0.37228132  5.37228132]

print(eigenvectors)  # Column vectors
# [[-0.82456484 -0.41597356]
#  [ 0.56576746 -0.90937671]]

# Verify: A @ eigenvector = eigenvalue * eigenvector
d = eigenvectors[:, 0] * eigenvalues[0]
e = a @ eigenvectors[:, 0]
print(np.allclose(d, e))  # True

Solving Linear Systems

Solve systems like: x₁ + x₂ = 2200 and 1.5x₁ + 4x₂ = 5050

Python
A = np.array([[1, 1], [1.5, 4]])
b = np.array([2200, 5050])

# Bad way: slow and less accurate
x = np.linalg.inv(A).dot(b)
print(x)  # [1500.  700.]

# Good way: use np.linalg.solve()
x = np.linalg.solve(A, b)
print(x)  # [1500.  700.]

# Other useful linear algebra functions
print(np.linalg.norm(A))      # Matrix norm
print(np.linalg.matrix_rank(A))  # Matrix rank
U, s, Vh = np.linalg.svd(A)   # Singular value decomposition (Vh is V transposed)
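
Both results can be checked numerically. The sketch below substitutes the solution back into the system and rebuilds A from its singular value decomposition, using the same A and b as above:

```python
import numpy as np

A = np.array([[1, 1], [1.5, 4]])
b = np.array([2200, 5050])

x = np.linalg.solve(A, b)

# Substituting the solution back should reproduce b
print(np.allclose(A @ x, b))  # True

# svd returns the transposed right singular vectors (often named Vh);
# U @ diag(s) @ Vh rebuilds A
U, s, Vh = np.linalg.svd(A)
print(np.allclose(U @ np.diag(s) @ Vh, A))  # True
```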

17. Loading Data from Files

NumPy provides simple functions for loading text/CSV data.

Python
# 1. np.loadtxt() - Simple, consistent data
# Assumes all data is the same type
data = np.loadtxt('data.csv', delimiter=',', dtype=np.float32)
print(data.shape, data.dtype)

# Skip header row
data = np.loadtxt('data.csv', delimiter=',', skiprows=1)

# 2. np.genfromtxt() - More robust, handles missing values
data = np.genfromtxt('data.csv', delimiter=',', dtype=np.float32)
print(data.shape)

# Handle missing values
data = np.genfromtxt('data.csv', delimiter=',', filling_values=0)

# 3. Save arrays
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([7, 8, 9])  # second array, used by np.savez below

# Save as text
np.savetxt('output.csv', a, delimiter=',')

# Save as binary (faster, smaller)
np.save('output.npy', a)

# Load binary
loaded = np.load('output.npy')

# Save multiple arrays
np.savez('arrays.npz', a=a, b=b)
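
Reading an .npz archive back works slightly differently from a single .npy file. The following sketch round-trips two arrays through a temporary directory (the file name and array values are illustrative):

```python
import os
import tempfile

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([7, 8, 9])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "arrays.npz")
    np.savez(path, a=a, b=b)

    # np.load on an .npz file returns a dict-like object keyed by the
    # keyword names passed to np.savez
    with np.load(path) as archive:
        names = sorted(archive.files)
        a_loaded = archive["a"]
        b_loaded = archive["b"]

print(names)                        # ['a', 'b']
print(np.array_equal(a, a_loaded))  # True
print(np.array_equal(b, b_loaded))  # True
```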

Note: For more advanced data loading with headers, mixed types, and data analysis, use Pandas instead!

🎉 Congratulations!

You've mastered NumPy fundamentals! Ready to explore Pandas for data analysis?
