🔢 NumPy Handbook
Master numerical computing with arrays, broadcasting, and efficient data operations
1. NumPy Introduction
NumPy is the core library for scientific computing in Python and underpins most data science and machine learning work. Its central object is the NumPy array (ndarray), a high-performance multidimensional array designed for mathematical operations, linear algebra, and probability calculations.
Why NumPy? It is much faster than Python lists for numerical work, requires less code, and provides powerful mathematical functions.
Many popular libraries use NumPy under the hood:
- Scikit-learn - Machine learning
- Matplotlib - Data visualization
- Pandas - Data analysis
- TensorFlow/PyTorch - Deep learning
2. Installation and Array Basics
Install with pip or Anaconda:
# Install with pip
$ pip install numpy
# Or with Anaconda
$ conda install numpy
Import and Check Version
import numpy as np
# Check version
print(np.__version__) # 1.21.0 (or newer)
Creating Arrays
a = np.array([1, 2, 3, 4, 5])
print(a) # [1 2 3 4 5]
print(a.shape) # (5,)
print(a.dtype) # int64 (or int32 depending on system)
print(a.ndim) # 1 (number of dimensions)
print(a.size) # 5 (total number of elements)
print(a.itemsize) # 8 (size of each element in bytes)
Accessing and Modifying Elements
a = np.array([1, 2, 3])
# Access elements
print(a[0]) # 1
# Change elements
a[0] = 5
print(a) # [5 2 3]
# Element-wise operations
b = np.array([2, 0, 2])
print(a * b) # [10 0 6]
print(a.sum()) # 10
3. Array vs List
Understanding the key differences between NumPy arrays and Python lists.
Size Mutability
l = [1, 2, 3]
a = np.array([1, 2, 3])
# Lists can grow
l.append(4)
print(l) # [1, 2, 3, 4]
# Arrays have fixed size
# a.append(4) # Error: AttributeError
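Because an array's size is fixed, "growing" one really means building a new array. The following is a minimal sketch using np.append, which returns a new array and leaves the original untouched; the variable name grown is only illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])

# np.append does not modify `a`; it allocates and returns a new array
grown = np.append(a, 4)

print(a)      # [1 2 3]
print(grown)  # [1 2 3 4]
```

Since every call copies the data, appending inside a loop is usually slower than collecting values in a list and converting once at the end.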
Arithmetic Operations
l = [1, 2, 3]
a = np.array([1, 2, 3])
# Addition
l2 = l + [5]
print(l2) # [1, 2, 3, 5] (concatenation)
a2 = a + np.array([4])
print(a2) # [5 6 7] (broadcasting)
# Multiplication
l2 = 2 * l
print(l2) # [1, 2, 3, 1, 2, 3] (repetition)
a3 = 2 * a
print(a3) # [2 4 6] (element-wise multiplication)
Applying Functions
l = [1, 2, 3]
a = np.array([1, 2, 3])
# List: need list comprehension
l2 = [i**2 for i in l]
print(l2) # [1, 4, 9]
# NumPy: vectorized operations
a2 = a**2
print(a2) # [1 4 9]
# Universal functions
print(np.sqrt(a)) # [1. 1.41421356 1.73205081]
print(np.log(a)) # [0. 0.69314718 1.09861229]
print(np.sin(a)) # [0.84147098 0.90929743 0.14112001]
4. Dot Product
The dot product is the sum of the products of corresponding entries of two sequences of numbers.
a = np.array([1, 2])
b = np.array([3, 4])
# Manual way (cumbersome)
dot = 0
for i in range(len(a)):
    dot += a[i] * b[i]
print(dot) # 11
# NumPy function
dot = np.dot(a, b)
print(dot) # 11
# Instance method
dot = a.dot(b)
print(dot) # 11
# @ operator (Python 3.5+)
dot = a @ b
print(dot) # 11
5. Speed Test: Array vs List
NumPy is significantly faster for numerical operations because its core is written in C.
import numpy as np
from timeit import default_timer as timer
a = np.random.randn(1000)
b = np.random.randn(1000)
A = list(a)
B = list(b)
T = 1000
def dot1():
    # Pure-Python loop over the list copies
    dot = 0
    for i in range(len(A)):
        dot += A[i] * B[i]
    return dot
def dot2():
    # Same computation delegated to NumPy
    return np.dot(a, b)
# Time list operation
start = timer()
for t in range(T):
    dot1()
end = timer()
t1 = end - start
# Time NumPy operation
start = timer()
for t in range(T):
    dot2()
end = timer()
t2 = end - start
print(f"Time with lists: {t1:.5f}s")
print(f"Time with array: {t2:.5f}s")
print(f"Ratio: NumPy is {t1 / t2:.1f}x faster")
# Often around 50-100x faster; the exact ratio depends on hardware, array size, and NumPy build
6. Multidimensional Arrays
NumPy arrays can have multiple dimensions, like matrices and tensors.
a = np.array([[1, 2], [3, 4]])
print(a)
# [[1 2]
# [3 4]]
print(a.shape) # (2, 2)
# Accessing elements (row first, then column)
print(a[0]) # [1 2] (first row)
print(a[0][0]) # 1
print(a[0, 0]) # 1 (preferred syntax)
# Slicing
print(a[:, 0]) # [1 3] (all rows, column 0)
print(a[0, :]) # [1 2] (row 0, all columns)
# Transpose
print(a.T)
# [[1 3]
# [2 4]]
Matrix Operations
a = np.array([[1, 2], [3, 4]])
b = np.array([[3, 4], [5, 6]])
# Element-wise multiplication
print(a * b)
# [[ 3 8]
# [15 24]]
# Matrix multiplication
print(a.dot(b)) # or a @ b
# [[13 16]
# [29 36]]
# Linear algebra functions
print(np.linalg.det(a)) # approximately -2.0 (determinant, subject to floating-point rounding)
print(np.linalg.inv(a)) # Inverse matrix
print(np.diag(a)) # [1 4] (diagonal elements)
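The inverse and determinant above are computed in floating point, so results are best checked with a tolerance rather than exact equality. A small sketch, assuming the same 2x2 matrix as above (the names a_inv and product are illustrative):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
a_inv = np.linalg.inv(a)

# A matrix times its inverse should give the identity, up to rounding error
product = a @ a_inv
print(np.allclose(product, np.eye(2)))  # True

# Compare the determinant with a tolerance instead of ==
print(np.isclose(np.linalg.det(a), -2.0))  # True
```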
7. Indexing, Slicing, and Boolean Indexing
Basic Slicing
Slicing syntax: [start:stop:step]
a = np.array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12]])
# Get subarray
slice_a = a[0:2, 1:3]
print(slice_a)
# [[2 3]
# [6 7]]
# Negative indexing
print(a[-1, -1]) # 12
# Every other element
print(a[::2, ::2])
# [[ 1 3]
# [ 9 11]]
Boolean Indexing
a = np.array([[1, 2], [3, 4], [5, 6]])
# Create boolean mask
bool_idx = a > 2
print(bool_idx)
# [[False False]
# [ True True]
# [ True True]]
# Use boolean index to filter
print(a[bool_idx]) # [3 4 5 6] (flattened)
# One line filtering
print(a[a > 2]) # [3 4 5 6]
# np.where() - ternary operator
# np.where(condition, value_if_true, value_if_false)
b = np.where(a > 2, a, -1)
print(b)
# [[-1 -1]
# [ 3 4]
# [ 5 6]]
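Masks can also be combined and used for assignment. The sketch below assumes the same array as above and shows the element-wise operators & (and) and | (or); each comparison needs its own parentheses because these operators bind more tightly than > and <.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

# Combine conditions element-wise; Python's `and` / `or` do not work on arrays
in_range = (a > 2) & (a < 6)
print(a[in_range])  # [3 4 5]

outside = (a <= 1) | (a >= 6)
print(a[outside])  # [1 6]

# A boolean mask on the left-hand side assigns only to the selected elements
clipped = a.copy()
clipped[clipped > 4] = 4
print(clipped)
# [[1 2]
#  [3 4]
#  [4 4]]
```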
Fancy Indexing
a = np.array([10, 19, 30, 41, 50, 61])
# Get elements at specific indices
b = a[[1, 3, 5]]
print(b) # [19 41 61]
# Find indices of even numbers
even = np.argwhere(a % 2 == 0).flatten()
print(even) # [0 2 4]
print(a[even]) # [10 30 50]
8. Reshaping
Change the shape of an array while preserving the total number of elements.
a = np.arange(1, 7) # [1 2 3 4 5 6]
# Reshape to 2 rows, 3 columns
b = a.reshape((2, 3))
print(b)
# [[1 2 3]
# [4 5 6]]
# Reshape to 3 rows, 2 columns
c = a.reshape((3, 2))
print(c)
# [[1 2]
# [3 4]
# [5 6]]
# Automatic dimension calculation
d = a.reshape((2, -1)) # NumPy calculates the other dimension
print(d.shape) # (2, 3)
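The "same total number of elements" requirement is enforced at runtime. As a rough illustration (variable names are illustrative), reshape(-1) flattens an array back to one dimension, while a shape with the wrong element count raises ValueError:

```python
import numpy as np

a = np.arange(1, 7)      # 6 elements
b = a.reshape((2, 3))

# A single -1 flattens back to one dimension
flat = b.reshape(-1)
print(flat)  # [1 2 3 4 5 6]

# (4, 2) needs 8 elements but only 6 are available, so this fails
try:
    a.reshape((4, 2))
    failed = False
except ValueError as err:
    failed = True
    print("reshape failed:", err)
```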
Adding Dimensions
a = np.arange(1, 7)
print(a.shape) # (6,)
# Add new axis for row vector
d = a[np.newaxis, :]
print(d) # [[1 2 3 4 5 6]]
print(d.shape) # (1, 6)
# Add new axis for column vector
e = a[:, np.newaxis]
print(e)
# [[1]
# [2]
# [3]
# [4]
# [5]
# [6]]
print(e.shape) # (6, 1)
9. Concatenation
Joining arrays together along different axes.
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])
# Concatenate along axis=None (flattens first)
c = np.concatenate((a, b), axis=None)
print(c) # [1 2 3 4 5 6]
# Concatenate along axis=0 (add new row)
d = np.concatenate((a, b), axis=0)
print(d)
# [[1 2]
# [3 4]
# [5 6]]
# Concatenate along axis=1 (add new column)
e = np.concatenate((a, b.T), axis=1)
print(e)
# [[1 2 5]
# [3 4 6]]
Stack Functions
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
# Horizontal stack (column-wise)
c = np.hstack((a, b))
print(c) # [1 2 3 4 5 6 7 8]
# Vertical stack (row-wise)
c = np.vstack((a, b))
print(c)
# [[1 2 3 4]
# [5 6 7 8]]
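A related function, np.stack, joins arrays along a new axis chosen by the caller. The sketch below reuses the two 1-D arrays from above; the names rows and pairs are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])

# axis=0 puts each input in its own row (same result as np.vstack for 1-D inputs)
rows = np.stack((a, b), axis=0)
print(rows.shape)  # (2, 4)

# axis=1 pairs up elements, giving one row per position
pairs = np.stack((a, b), axis=1)
print(pairs.shape)  # (4, 2)
print(pairs[0])     # [1 5]
```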
10. Broadcasting
Broadcasting allows NumPy to perform operations on arrays of different shapes.
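Broadcasting Rules: Shapes are compared from the last dimension backwards. Two dimensions are compatible when they are equal or when one of them is 1; the size-1 (or missing) dimension is then "broadcast" (stretched) across the larger array so the shapes match.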
x = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
[10, 11, 12]])
y = np.array([1, 0, 1])
# y is added to each row of x
z = x + y
print(z)
# [[ 2 2 4]
# [ 5 5 7]
# [ 8 8 10]
# [11 11 13]]
# Scalar broadcasting
a = np.array([[1, 2], [3, 4]])
print(a + 10)
# [[11 12]
# [13 14]]
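To make the shape comparison concrete, here is a small sketch with a column vector and a row vector, plus a case that fails. The arrays are made up for illustration, and np.broadcast_shapes is assumed to be available (it was added in NumPy 1.20).

```python
import numpy as np

col = np.array([[0], [10], [20]])  # shape (3, 1)
row = np.array([1, 2, 3, 4])       # shape (4,)

# Compared from the right: (3, 1) with (4,) broadcasts to (3, 4)
table = col + row
print(table.shape)  # (3, 4)
print(table)
# [[ 1  2  3  4]
#  [11 12 13 14]
#  [21 22 23 24]]

# Trailing dimensions 3 and 4 differ and neither is 1, so this raises ValueError
try:
    np.ones((4, 3)) + np.ones(4)
    failed = False
except ValueError as err:
    failed = True
    print("cannot broadcast:", err)

# Ask for the result shape without computing anything
result_shape = np.broadcast_shapes((4, 3), (3,))
print(result_shape)  # (4, 3)
```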
11. Functions and Axis
Operations can be performed along specific axes:
- axis=None - Operates on the entire array
- axis=0 - Operates "down" the rows (result per column)
- axis=1 - Operates "across" the columns (result per row)
a = np.array([[7, 8, 9, 10, 11, 12, 13],
[17, 18, 19, 20, 21, 22, 23]])
# Sum
print(a.sum(axis=None)) # 210 (all elements)
print(a.sum(axis=0)) # [24 26 28 30 32 34 36] (sum each column)
print(a.sum(axis=1)) # [ 70 140] (sum each row)
# Mean
print(a.mean(axis=None)) # 15.0
print(a.mean(axis=0)) # [12. 13. 14. 15. 16. 17. 18.]
print(a.mean(axis=1)) # [10. 20.]
# Other useful functions
print(a.std()) # Standard deviation
print(a.var()) # Variance
print(a.min()) # Minimum
print(a.max()) # Maximum
print(a.argmin()) # Index of minimum
print(a.argmax()) # Index of maximum
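Two details of axis-based functions are easy to miss: the reduced axis disappears from the result shape unless keepdims=True is passed, and argmin/argmax without an axis return an index into the flattened array. A short sketch using the same array as above (names such as row_means are illustrative):

```python
import numpy as np

a = np.array([[7, 8, 9, 10, 11, 12, 13],
              [17, 18, 19, 20, 21, 22, 23]])

# The reduced axis is dropped from the result shape
print(a.sum(axis=0).shape)  # (7,)
print(a.sum(axis=1).shape)  # (2,)

# keepdims=True keeps it with length 1, which broadcasts cleanly against `a`
row_means = a.mean(axis=1, keepdims=True)
print(row_means.shape)  # (2, 1)
centered = a - row_means
print(centered[0])  # [-3. -2. -1.  0.  1.  2.  3.]

# argmax without an axis indexes the flattened array
flat_idx = a.argmax()
print(flat_idx)  # 13

# Convert the flat index back to (row, column)
r, c = np.unravel_index(flat_idx, a.shape)
print(int(r), int(c))  # 1 6
```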
12. Datatypes
NumPy can infer datatypes or you can set them explicitly for memory efficiency.
# Let NumPy choose
x = np.array([1, 2])
print(x.dtype) # int64 (or int32 depending on system)
x = np.array([1.0, 2.0])
print(x.dtype) # float64
# Force specific datatype
x = np.array([1, 2], dtype=np.int64) # 8 bytes
print(x.dtype) # int64
x = np.array([1, 2], dtype=np.float32) # 4 bytes
print(x.dtype) # float32
x = np.array([1, 2], dtype=np.int16) # 2 bytes
print(x.dtype) # int16
# Common dtypes: int8, int16, int32, int64
# float32, float64, complex64, complex128
# bool, object
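The memory effect of a dtype choice can be checked directly with itemsize and nbytes. The sketch below uses made-up arrays and also shows astype, which returns a converted copy; note that converting floats to integers truncates toward zero.

```python
import numpy as np

x64 = np.zeros(1000, dtype=np.float64)
x32 = x64.astype(np.float32)  # astype returns a converted copy

print(x64.itemsize, x64.nbytes)  # 8 8000
print(x32.itemsize, x32.nbytes)  # 4 4000

# Float-to-integer conversion drops the fractional part
f = np.array([1.7, -1.7])
as_int = f.astype(np.int32)
print(as_int)  # [ 1 -1]

# np.iinfo reports the range an integer dtype can hold
info = np.iinfo(np.int8)
print(info.min, info.max)  # -128 127
```

Smaller dtypes save memory but reduce range and precision, so the trade-off is worth checking against the data before downcasting.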
13. Copying
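Critical concept: Plain assignment (b = a) does not copy anything; both names refer to the same array. Slices are views that share the original's data. Use .copy() for an independent array.
# This is just a second reference to the same array (no copy is made)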
a = np.array([1, 2, 3])
b = a
b[0] = 42
print(a) # [42 2 3] - a is modified!
# This is an actual copy
a = np.array([1, 2, 3])
b = a.copy()
b[0] = 42
print(a) # [1 2 3] - a is unchanged
# Slices are views too!
a = np.array([1, 2, 3, 4])
b = a[0:2]
b[0] = 99
print(a) # [99 2 3 4] - original modified!
# Make a copy of the slice (start again from a fresh array)
a = np.array([1, 2, 3, 4])
b = a[0:2].copy()
b[0] = 99
print(a) # [1 2 3 4] - original unchanged
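When it is unclear whether two arrays share data, np.shares_memory gives a direct answer. A minimal sketch, with illustrative variable names, contrasting a second reference, a slice view, an explicit copy, and fancy indexing (which returns a copy):

```python
import numpy as np

a = np.array([1, 2, 3, 4])

alias = a                    # same object under another name
view = a[0:2]                # new array object, shared data
independent = a[0:2].copy()  # new array object, own data

print(alias is a)                        # True
print(view is a)                         # False
print(np.shares_memory(a, view))         # True
print(np.shares_memory(a, independent))  # False

# Indexing with a list of integers returns a copy, unlike basic slicing
picked = a[[0, 1]]
print(np.shares_memory(a, picked))       # False
```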
14. Generating Arrays
NumPy provides many functions to create arrays of specific values or patterns.
# Zeros
a = np.zeros((2, 3))
print(a)
# [[0. 0. 0.]
# [0. 0. 0.]]
# Ones
b = np.ones((2, 3))
print(b)
# [[1. 1. 1.]
# [1. 1. 1.]]
# Specific value
c = np.full((3, 3), 5.0)
print(c)
# [[5. 5. 5.]
# [5. 5. 5.]
# [5. 5. 5.]]
# Identity matrix
d = np.eye(3)
print(d)
# [[1. 0. 0.]
# [0. 1. 0.]
# [0. 0. 1.]]
# Range of values
e = np.arange(10)
print(e) # [0 1 2 3 4 5 6 7 8 9]
e = np.arange(2, 10, 2)
print(e) # [2 4 6 8]
# Linearly spaced values
f = np.linspace(0, 10, 5) # Start, Stop, Num Points
print(f) # [ 0. 2.5 5. 7.5 10. ]
# Logarithmically spaced
g = np.logspace(0, 3, 4) # 10^0 to 10^3
print(g) # [ 1. 10. 100. 1000.]
15. Random Numbers
Generate random arrays for testing, simulations, and machine learning.
# Uniform distribution (0 to 1)
a = np.random.random((3, 2))
print(a)
# Normal/Gaussian distribution (mean=0, std=1)
b = np.random.randn(3, 2)
print(b)
# Random integers
c = np.random.randint(3, 10, size=(3, 3)) # low, high (exclusive), size
print(c)
# Random choice from array
d = np.random.choice([1, 2, 3, 4], size=8)
print(d)
# Set random seed for reproducibility
np.random.seed(42)
print(np.random.random(3)) # Always same result
np.random.seed(42)
print(np.random.random(3)) # Same as above
# Shuffle array in-place
arr = np.array([1, 2, 3, 4, 5])
np.random.shuffle(arr)
print(arr)
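The functions above belong to NumPy's legacy global random interface. Newer code often uses a Generator object created with np.random.default_rng instead, which keeps the seed local to one object. The following sketch mirrors the calls above under that assumption; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded generator

u = rng.random((3, 2))                # uniform on [0, 1)
n = rng.standard_normal((3, 2))       # mean 0, std 1
i = rng.integers(3, 10, size=(3, 3))  # low inclusive, high exclusive
c = rng.choice([1, 2, 3, 4], size=8)  # sample with replacement

# Two generators with the same seed produce the same stream
r1 = np.random.default_rng(42).random(3)
r2 = np.random.default_rng(42).random(3)
print(np.array_equal(r1, r2))  # True

# permutation returns a shuffled copy; rng.shuffle(arr) would shuffle in place
arr = np.array([1, 2, 3, 4, 5])
shuffled = rng.permutation(arr)
print(arr)  # [1 2 3 4 5]
```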
16. Linear Algebra
Eigenvalues and Eigenvectors
a = np.array([[1, 2], [3, 4]])
eigenvalues, eigenvectors = np.linalg.eig(a)
print(eigenvalues)
# [-0.37228132 5.37228132]
print(eigenvectors) # Column vectors
# [[-0.82456484 -0.41597356]
# [ 0.56576746 -0.90937671]]
# Verify: A @ eigenvector = eigenvalue * eigenvector
d = eigenvectors[:, 0] * eigenvalues[0]
e = a @ eigenvectors[:, 0]
print(np.allclose(d, e)) # True
Solving Linear Systems
Solve systems like: x₁ + x₂ = 2200 and 1.5x₁ + 4x₂ = 5050
A = np.array([[1, 1], [1.5, 4]])
b = np.array([2200, 5050])
# Bad way: slow and less accurate
x = np.linalg.inv(A).dot(b)
print(x) # [1500. 700.]
# Good way: use np.linalg.solve()
x = np.linalg.solve(A, b)
print(x) # [1500. 700.]
# Other useful linear algebra functions
print(np.linalg.norm(A)) # Matrix norm
print(np.linalg.matrix_rank(A)) # Matrix rank
U, s, Vh = np.linalg.svd(A) # Singular value decomposition (third result is V already transposed)
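A solution is easy to sanity-check by substituting it back into the system. The sketch below assumes the same A and b as above and also rebuilds A from its singular value decomposition; comparisons use np.allclose because the results are floating point.

```python
import numpy as np

A = np.array([[1, 1], [1.5, 4]])
b = np.array([2200, 5050])

# Substitute the solution back: A @ x should reproduce b
x = np.linalg.solve(A, b)
print(np.allclose(A @ x, b))  # True

# svd returns U, the singular values as a 1-D array, and V transposed
U, s, Vh = np.linalg.svd(A)
reconstructed = U @ np.diag(s) @ Vh
print(np.allclose(reconstructed, A))  # True
```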
17. Loading Data from Files
NumPy provides simple functions for loading text/CSV data.
# 1. np.loadtxt() - Simple, consistent data
# Assumes all data is the same type
data = np.loadtxt('data.csv', delimiter=',', dtype=np.float32)
print(data.shape, data.dtype)
# Skip header row
data = np.loadtxt('data.csv', delimiter=',', skiprows=1)
# 2. np.genfromtxt() - More robust, handles missing values
data = np.genfromtxt('data.csv', delimiter=',', dtype=np.float32)
print(data.shape)
# Handle missing values
data = np.genfromtxt('data.csv', delimiter=',', filling_values=0)
# 3. Save arrays
a = np.array([[1, 2, 3], [4, 5, 6]])
# Save as text
np.savetxt('output.csv', a, delimiter=',')
# Save as binary (faster, smaller)
np.save('output.npy', a)
# Load binary
loaded = np.load('output.npy')
# Save multiple arrays (keyword names become the keys in the .npz file)
b = np.array([7, 8, 9])
np.savez('arrays.npz', a=a, b=b)
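The loading functions also accept file-like objects, which makes it possible to try them without a data.csv on disk. The sketch below uses in-memory buffers from Python's io module; the CSV text and variable names are invented for illustration.

```python
import io
import numpy as np

# loadtxt: skip the header row and parse the rest as floats
clean_csv = "x,y\n1,2\n3,4\n"
clean = np.loadtxt(io.StringIO(clean_csv), delimiter=",", skiprows=1)
print(clean.shape)  # (2, 2)

# genfromtxt: the empty field in the second data row is filled with 0
gappy_csv = "x,y,z\n1,2,3\n4,,6\n"
filled = np.genfromtxt(io.StringIO(gappy_csv), delimiter=",",
                       skip_header=1, filling_values=0)
print(filled)
# [[1. 2. 3.]
#  [4. 0. 6.]]

# savez / load round trip through an in-memory binary buffer
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([7, 8, 9])
buffer = io.BytesIO()
np.savez(buffer, a=a, b=b)
buffer.seek(0)
archive = np.load(buffer)
print(sorted(archive.files))  # ['a', 'b']
restored_a = archive["a"]
restored_b = archive["b"]
```

Note that loadtxt uses skiprows while genfromtxt uses skip_header for the same purpose.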
Note: For more advanced data loading with headers, mixed types, and data analysis, use Pandas instead!
🎉 Congratulations!
You've mastered NumPy fundamentals! Ready to explore Pandas for data analysis?