NumPy: A Hidden Gem in Python’s Sea of Data Structures

Python comes with some built-in methods for performing mathematical operations. However, these are not typically how mathematics is done in Python. Most programmers prefer using NumPy, as it offers greater simplicity, specificity, and performance. In fact, NumPy is so effective that you can focus on the math itself rather than the coding behind it. In this article, I’ll first introduce Python’s built-in data structures, and then show you the magic and power of NumPy.

NumPy stands out as one of the most essential libraries in Python for mathematical and numerical operations. While Python has built-in methods for handling math, NumPy provides a more efficient and elegant way to perform these tasks.

One of its key advantages is simplicity—it allows you to express complex mathematical operations in clean and readable code, closely resembling standard mathematical notation. This reduces the cognitive load of coding and lets you focus more on solving mathematical problems rather than thinking about implementation details.
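
As a small illustration of that closeness to mathematical notation, here is a minimal sketch. It assumes NumPy is installed and imported under the conventional alias np; the vectors u and v are made-up values chosen only for this example.

```python
import numpy as np

# Two small vectors (illustrative values, not taken from the article)
u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

# Element-wise sum, written the same way as u + v on paper
vector_sum = u + v            # array([5., 7., 9.])

# Scalar multiple 2u
scaled = 2 * u                # array([2., 4., 6.])

# Dot product: 1*4 + 2*5 + 3*6
dot_product = np.dot(u, v)    # 32.0
```

Each line reads much like the formula it computes, with no explicit loops.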

Another major strength is specificity. NumPy is purpose-built for numerical computing, which means it comes with a wide range of functions tailored for linear algebra, statistical analysis, matrix manipulation, and more. This makes it far more suited for tasks involving vectors, matrices, and large datasets than generic Python structures like lists or tuples.

Finally, performance is a critical factor. For numerical work, NumPy arrays are typically more memory-efficient and much faster than Python lists, because they store elements of a single type in contiguous memory and run their core loops in compiled C code rather than in the Python interpreter. This makes NumPy ideal for data science, machine learning, and scientific computing, where speed and efficiency matter.

Numpy: A hidden gem in Python's sea of libraries
Together, these advantages make NumPy the go-to library for doing real mathematics in Python.

Python data structures

We’ll start by exploring Python’s built-in data structures that are frequently used in mathematical computations. Recognizing their limitations will help you fully appreciate the strengths of NumPy—something that’s easy to take for granted. Take your time with this section, but if you’re already confident with Python fundamentals, feel free to move ahead.

Python Lists

Python lists are the most common data structures that everyone encounters when learning Python. They are ordered collections of items and are mutable, meaning you can add, remove, or modify elements after creation. Lists in Python provide a flexible and intuitive way to represent vectors and matrices.

The following example shows how to create a list, modify an element, add items to it, and retrieve values by their position.

# Initialize a Python list
python_list = [1, 2, 3, 4, 5]

# Modify the element at index 1
python_list[1] = 10

# Append 6 to the end, then insert 12 at index 2
python_list.append(6)
python_list.insert(2, 12)

# Access elements by index
first_element = python_list[0]
last_element = python_list[-1]

This example highlights several key features of Python lists:

  1. Ordered Structure: Python lists preserve the order of elements, which is crucial when representing data structures like vectors and matrices.

  2. Mutability: Lists allow in-place modification of elements, enabling easy updates to existing data.

  3. Dynamic Sizing: Lists can automatically expand or contract as elements are added or removed, offering flexibility in dynamic applications.

  4. User-Friendly Design: With their intuitive syntax and versatility, lists are well-suited for constructing and manipulating numerical structures such as vectors and matrices.
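
To make the vector and matrix use case concrete, here is a minimal sketch using only plain lists. The names u, v, and matrix are illustrative. Note that the + operator concatenates lists, so element-wise arithmetic has to be written out by hand.

```python
# Two vectors stored as plain Python lists
u = [1, 2, 3]
v = [4, 5, 6]

# The + operator concatenates lists; it does not add element by element
concatenated = u + v                              # [1, 2, 3, 4, 5, 6]

# Element-wise addition has to be spelled out explicitly
elementwise_sum = [a + b for a, b in zip(u, v)]   # [5, 7, 9]

# A 2x3 matrix represented as a list of rows
matrix = [[1, 2, 3],
          [4, 5, 6]]

# Entry at row index 1, column index 2
entry = matrix[1][2]                              # 6

# Extracting a column requires visiting every row
second_column = [row[1] for row in matrix]        # [2, 5]
```

This manual bookkeeping is the kind of limitation worth keeping in mind when comparing lists with a dedicated numerical library.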

Python Sets

Sets are collections of unique elements, and they are not inherently ordered. While you could technically store vectors in a set (as tuples, since set members must be hashable and lists are not), it would not be practical for most numerical computing tasks. Sets are more commonly used where uniqueness and set operations (such as union and intersection) are required.

# Create a set of numbers
my_set = {1, 2, 3, 4, 5}

# Add a new element
my_set.add(6)

# Try adding a duplicate (it will be ignored)
my_set.add(3)

# Remove an element
my_set.remove(2)

# Check if a value exists in the set
is_three_present = 3 in my_set

Like lists, Python sets are mutable, easy to use, and dynamically sized. The big difference is that sets are unordered. Because of this, you can't access elements by index, insert at a specific position, or append to the end as you do with lists, and you can't rely on where an element ends up after adding it. What you can do is:

  • Add elements using .add()

  • Remove elements using .remove() (raises KeyError if the element is missing) or .discard() (does nothing if it is missing)

  • Check for existence using the in keyword

  • Compute unions, intersections and other set operations.
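
The operations listed above can be sketched as follows. The set names evens and small are made up for illustration.

```python
evens = {2, 4, 6, 8}
small = {1, 2, 3, 4}

# Standard set algebra using operators
union = evens | small            # {1, 2, 3, 4, 6, 8}
intersection = evens & small     # {2, 4}
difference = evens - small       # {6, 8}

# discard() silently ignores an element that is not in the set
evens.discard(100)

# remove() raises KeyError for an element that is not in the set
try:
    evens.remove(100)
    remove_raised = False
except KeyError:
    remove_raised = True
```

The operators have method equivalents (union(), intersection(), difference()) that also accept other iterables.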

Python Dictionaries

Dictionaries are collections of key-value pairs, where each key is associated with a value. While dictionaries could be used to store arrays, vectors, or matrices by using keys as identifiers, this approach is not common or efficient for numerical computing. Dictionaries are more commonly used for tasks where efficient lookup by key is required.
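
As a rough sketch of what storing vectors under keys might look like, consider the example below. The keys and values are hypothetical and chosen only to show lookup by name; the arithmetic still has to be done by hand.

```python
# Hypothetical named vectors, each stored as a list under a string key
vectors = {
    "position": [1.0, 2.0, 3.0],
    "velocity": [0.5, 0.0, -0.5],
}

# Look up a vector by its key
velocity = vectors["velocity"]

# get() returns a default instead of raising KeyError for a missing key
acceleration = vectors.get("acceleration", [0.0, 0.0, 0.0])

# Add a new key-value pair
vectors["acceleration"] = acceleration

# The dictionary only organizes the data; the math is still a manual loop
time_step = 2.0
new_position = [
    p + time_step * v
    for p, v in zip(vectors["position"], vectors["velocity"])
]                                # [2.0, 2.0, 2.0]
```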

Python Tuples

Tuples are immutable collections of elements, and they can store heterogeneous data types. While tuples could technically be used to store arrays, vectors, or matrices, they are not well-suited for this purpose due to their immutability and lack of specialized operations for numerical computations. Tuples are more commonly used for tasks where ordered, immutable collections are needed.
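
A short sketch of these properties follows. The names point, sparse_matrix, and record are illustrative only. Using tuples as dictionary keys works because tuples of hashable values are themselves hashable.

```python
# A 2D point stored as a tuple
point = (3.0, 4.0)

# Tuples support indexing and unpacking, just like lists
x, y = point

# Item assignment fails because tuples are immutable
try:
    point[0] = 10.0
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Immutability makes tuples usable as dictionary keys,
# for example (row, column) positions of nonzero matrix entries
sparse_matrix = {(0, 0): 1.0, (2, 1): 5.0}
entry = sparse_matrix.get((2, 1), 0.0)      # 5.0
missing = sparse_matrix.get((1, 1), 0.0)    # 0.0

# A single tuple can mix data types
record = ("sensor_a", 42, 3.14)
```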

Python Arrays

In Python, both arrays and lists store collections of items, but they suit different purposes. Arrays are a more specialized data structure provided by the array module in the standard library. Like lists, arrays can grow and shrink, but every element must have the same basic type (such as integers or floats), which is fixed by a type code when the array is created. This makes arrays more memory-efficient than lists and well suited to storing large sequences of homogeneous data.
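
Here is a minimal sketch of the array module in use. The variable names are illustrative, and the type code "d" selects double-precision floats.

```python
from array import array

# Every element of this array is stored as a double-precision float
samples = array("d", [1.0, 2.5, 4.0])

# Indexing and item assignment work as they do with lists
first = samples[0]
samples[1] = 3.5

# A value of the wrong type is rejected
try:
    samples[2] = "four"
    type_enforced = False
except TypeError:
    type_enforced = True

# Raw storage used by the elements: element count times bytes per element
payload_bytes = len(samples) * samples.itemsize
```

Because the elements are stored as raw machine values rather than as separate Python objects, the storage cost per element is just itemsize bytes.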