Chapter 10: Multiprocessing: True Parallelism for CPU-Bound Tasks
In the previous chapter, we explored how threading provides excellent concurrency for I/O-bound workloads. However, we also established its fundamental limitation for CPU-bound tasks: the Global Interpreter Lock (GIL). Because of the GIL, no matter how many threads you create, only one of them can execute Python bytecode at any given moment. For tasks that require heavy computation, threading offers no performance benefit.
To unlock true parallelism in Python, we must sidestep the GIL. The standard library's solution is the multiprocessing module. Instead of creating threads, which all live within a single process and share a single GIL, this module creates entirely new processes. Each new process gets its own Python interpreter and its own memory space, and therefore, its own GIL. This allows your program to execute different chunks of Python code on different CPU cores simultaneously.
This chapter covers:
- Using the modern concurrent.futures.ProcessPoolExecutor.
- Understanding the overhead of inter-process communication (IPC).
- The role of serialization (pickling) when passing data.
- A guideline for when to choose multiprocessing over threading.
Modern Multiprocessing: ProcessPoolExecutor
Just as ThreadPoolExecutor is the modern way to manage threads, ProcessPoolExecutor is the preferred high-level interface for managing a pool of worker processes. The API is remarkably consistent, making it easy to switch between the two.
Let's consider a CPU-bound task: finding the largest prime factor for a series of large numbers. This is a purely computational task.
```python
import concurrent.futures
import time

NUMBERS = [
    100913_100913,
    100913_100919,
    100913_100921,
    100913_100923,
    100913_100929,
    100913_100931,
]

def find_largest_prime_factor(n):
    """A simple, slow function to find the largest prime factor of n."""
    factor = 2
    last_factor = 1
    while n > 1:
        if n % factor == 0:
            last_factor = factor
            n //= factor
            while n % factor == 0:
                n //= factor
        factor += 1
    return last_factor

def main():
    start_time = time.time()
    # The 'with' statement ensures the pool is properly shut down.
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
        results = executor.map(find_largest_prime_factor, NUMBERS)
        for number, factor in zip(NUMBERS, results):
            print(f"Largest prime factor of {number} is {factor}")
    print(f"Completed in {time.time() - start_time:.2f} seconds")

if __name__ == "__main__":
    main()
```
If you run this code, you'll see a significant speedup compared to running it sequentially. The operating system distributes the worker processes across the available CPU cores, and since each has its own GIL, the calculations happen in parallel.
Note on if __name__ == "__main__": This guard is essential on platforms where new processes are started with the spawn method (the default on Windows and macOS). A spawned child process re-imports the main script, so the guard ensures that the process-spawning code runs only in the main script, not in every child process, preventing an infinite loop of process creation.
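executor.map returns results in input order, but sometimes you want to handle each result as soon as its worker finishes. The common pattern for that is submit() plus as_completed(). A minimal sketch, using a small illustrative CPU-bound helper (count_divisors is not from the example above, just a stand-in):

```python
import concurrent.futures

def count_divisors(n):
    """CPU-bound helper: count the divisors of n by trial division."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)

def main():
    numbers = [100_000, 100_003, 100_019, 100_043]
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # submit() returns a Future immediately; as_completed() yields
        # each future as soon as its worker process finishes, regardless
        # of submission order.
        futures = {executor.submit(count_divisors, n): n for n in numbers}
        for future in concurrent.futures.as_completed(futures):
            n = futures[future]
            print(f"{n} has {future.result()} divisors")

if __name__ == "__main__":
    main()
```

Note that the same __main__ guard is needed here for exactly the reasons described above.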
The Cost: Data Serialization and IPC
The power of multiprocessing comes at a price. Unlike threads, which share memory within a single process, processes have their own isolated memory spaces. This is a safety feature—it prevents processes from corrupting each other's state—but it means that any data passed between the main process and a worker process must be carefully packaged and transmitted. This is called Inter-Process Communication (IPC).
In Python, this communication is handled by the pickle module.
1. Serialization (pickling): When you submit a task to the ProcessPoolExecutor, the target function and its arguments are serialized into a byte stream.
2. Transmission: This byte stream is sent from the main process to a worker process (e.g., over a pipe).
3. Deserialization (unpickling): The worker process receives the byte stream and reconstructs the function and arguments in its own memory.
The process is reversed for the return value.
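You can observe this round trip directly with the pickle module. A minimal sketch (the payload here is arbitrary example data):

```python
import pickle

payload = {"numbers": [100913_100913, 100913_100919], "label": "batch-1"}

# Serialization: the object becomes a flat byte stream. This is the
# form in which arguments actually travel over the pipe to a worker.
data = pickle.dumps(payload)
print(type(data), len(data))

# Deserialization: the receiving side rebuilds an equivalent object
# in its own memory -- equal in value, but a distinct copy.
restored = pickle.loads(data)
assert restored == payload and restored is not payload
```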
This serialization overhead is not trivial. If you send very large objects (e.g., a massive list or a complex custom object) to a worker process, the time spent pickling and unpickling can sometimes outweigh the benefits of parallelism.
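One practical lever for reducing this overhead is the chunksize argument to executor.map, which batches many small work items into a single pickled message instead of one IPC round trip per item. A sketch with an illustrative trivial task:

```python
import concurrent.futures

def square(n):
    """Trivial task: the work per item is dwarfed by per-item IPC cost."""
    return n * n

def main():
    numbers = list(range(10_000))
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # chunksize=500 sends 500 items per message, amortizing the
        # pickling and pipe traffic across the batch.
        results = list(executor.map(square, numbers, chunksize=500))
    print(results[:5])

if __name__ == "__main__":
    main()
```

For many tiny tasks, a larger chunksize can make the difference between a slowdown and a real speedup; the best value depends on your task size, so measure.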
Furthermore, not everything is picklable. Objects like open file handles, database connections, and some lambda functions cannot be serialized, and trying to pass them to another process will result in an error.
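You can see both failure modes without spinning up a pool at all, by pickling the objects directly. A sketch:

```python
import pickle
import tempfile

# A lambda has no importable qualified name, so pickle cannot
# serialize a reference to it.
try:
    pickle.dumps(lambda x: x * 2)
except (pickle.PicklingError, AttributeError) as exc:
    print(f"lambda failed to pickle: {type(exc).__name__}")

# An open file handle wraps OS-level state that has no meaning in
# another process's memory space.
with tempfile.TemporaryFile() as fh:
    try:
        pickle.dumps(fh)
    except (TypeError, pickle.PicklingError) as exc:
        print(f"file handle failed to pickle: {type(exc).__name__}")
```

The usual workaround is to pass plain, picklable data (paths, connection parameters, IDs) and let each worker construct its own handles and connections.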
threading vs. multiprocessing: A Guideline
Choosing between the two models is a critical design decision based on the nature of your task.
| Aspect       | threading                                 | multiprocessing                                 |
| ------------ | ----------------------------------------- | ----------------------------------------------- |
| Best For     | I/O-bound tasks (networking, file I/O)    | CPU-bound tasks (calculations, data processing) |
| GIL          | All threads share one GIL; no parallelism | Each process has its own GIL; true parallelism  |
| Memory       | Threads share memory (easy data sharing)  | Processes have separate memory (data is copied) |
| Overhead     | Low startup overhead                      | High startup overhead                           |
| Data Sharing | Prone to race conditions; requires locks  | Requires serialization; prone to high IPC cost  |
| Primary Goal | Responsiveness                            | Performance                                     |
Rule of Thumb: Profile your code. If your program is spending most of its time waiting, use threading. If it's spending most of its time executing Python code, use multiprocessing.
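One quick way to see where the time goes is the standard-library profiler. A minimal sketch, using an illustrative CPU-bound stand-in function (busy is not from this chapter's example):

```python
import cProfile
import io
import pstats

def busy(n):
    """CPU-bound stand-in: sum of squares up to n."""
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
busy(200_000)
profiler.disable()

# If the top entries are your own computation, the workload is
# CPU-bound (reach for multiprocessing); if they are socket/file
# waits, it is I/O-bound (reach for threading).
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```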
Summary
The multiprocessing module is Python's definitive answer to the Global Interpreter Lock for CPU-bound problems. By leveraging separate processes, it enables true parallelism and allows you to take full advantage of multi-core hardware. This power, however, requires careful management. As a senior developer, you must be mindful of the overhead of data serialization and design your tasks to be as independent as possible, minimizing the amount of data that needs to be shuffled between processes.