Chapter 10: Multiprocessing: True Parallelism for CPU-Bound Tasks
In the previous chapter, we explored how threading provides excellent concurrency for I/O-bound workloads. However, we also established its fundamental limitation for CPU-bound tasks: the Global Interpreter Lock (GIL). Because of the GIL, no matter how many threads you create, only one of them can execute Python bytecode at any given moment. For tasks that require heavy computation, threading offers no performance benefit.
To unlock true parallelism in Python, we must sidestep the GIL. The standard library's solution is the multiprocessing module. Instead of creating threads, which all live within a single process and share a single GIL, this module creates entirely new processes. Each new process gets its own Python interpreter and its own memory space, and therefore, its own GIL. This allows your program to execute different chunks of Python code on different CPU cores simultaneously.
This chapter covers:
- Using the modern concurrent.futures.ProcessPoolExecutor.
- Understanding the overhead of inter-process communication (IPC).
- The role of serialization (pickling) when passing data.
- A guideline for when to choose multiprocessing over threading.
Modern Multiprocessing: ProcessPoolExecutor
Just as ThreadPoolExecutor is the modern way to manage threads, ProcessPoolExecutor is the preferred high-level interface for managing a pool of worker processes. The API is remarkably consistent, making it easy to switch between the two.
Let's consider a CPU-bound task: finding the largest prime factor for a series of large numbers. This is a purely computational task.
```python
import concurrent.futures
import time

NUMBERS = [
    100913_100913,
    100913_100919,
    100913_100921,
    100913_100923,
    100913_100929,
    100913_100931,
]

def find_largest_prime_factor(n):
    """A simple, slow function to find the largest prime factor of n."""
    factor = 2
    last_factor = 1
    while n > 1:
        if n % factor == 0:
            last_factor = factor
            n //= factor
            while n % factor == 0:
                n //= factor
        factor += 1
    return last_factor

def main():
    start_time = time.time()
    # The 'with' statement ensures the pool is properly shut down.
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
        results = executor.map(find_largest_prime_factor, NUMBERS)
        for number, factor in zip(NUMBERS, results):
            print(f"Largest prime factor of {number} is {factor}")
    print(f"Completed in {time.time() - start_time:.2f} seconds")

if __name__ == "__main__":
    main()
```
If you run this code, you'll see a significant speedup compared to running it sequentially. The operating system distributes the worker processes across the available CPU cores, and since each has its own GIL, the calculations happen in parallel.
Note on if __name__ == "__main__": This guard is essential on platforms where new processes are started with the spawn method (the default on Windows and macOS). A spawned child process re-imports the main script, so the guard ensures that the process-spawning code runs only in the main script, not in every child process, preventing an infinite loop of process creation.
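executor.map returns results in input order, but sometimes you want to handle each result as soon as its worker finishes. The common pattern for that is submit() plus as_completed(). A minimal sketch, using a small illustrative CPU-bound helper (count_divisors is not from the example above, just a stand-in):

```python
import concurrent.futures

def count_divisors(n):
    """CPU-bound helper: count the divisors of n by trial division."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)

def main():
    numbers = [100_000, 100_003, 100_019, 100_043]
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # submit() returns a Future immediately; as_completed() yields
        # each future as soon as its worker process finishes, regardless
        # of submission order.
        futures = {executor.submit(count_divisors, n): n for n in numbers}
        for future in concurrent.futures.as_completed(futures):
            n = futures[future]
            print(f"{n} has {future.result()} divisors")

if __name__ == "__main__":
    main()
```

Note that the same __main__ guard is needed here for exactly the reasons described above.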
The Cost: Data Serialization and IPC
The power of multiprocessing comes at a price. Unlike threads, which share memory within a single process, processes have their own isolated memory spaces. This is a safety feature—it prevents processes from corrupting each other's state—but it means that any data passed between the main process and a worker process must be carefully packaged and transmitted. This is called Inter-Process Communication (IPC).
In Python, this communication is handled by the pickle module.
1. Serialization (pickling): When you submit a task to the ProcessPoolExecutor, the target function and its arguments are serialized into a byte stream.
2. Transmission: This byte stream is sent from the main process to a worker process (e.g., over a pipe).
3. Deserialization (unpickling): The worker process receives the byte stream and reconstructs the function and arguments in its own memory.
The process is reversed for the return value.
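You can observe this round trip directly with the pickle module. A minimal sketch (the payload here is arbitrary example data):

```python
import pickle

payload = {"numbers": [100913_100913, 100913_100919], "label": "batch-1"}

# Serialization: the object becomes a flat byte stream. This is the
# form in which arguments actually travel over the pipe to a worker.
data = pickle.dumps(payload)
print(type(data), len(data))

# Deserialization: the receiving side rebuilds an equivalent object
# in its own memory -- equal in value, but a distinct copy.
restored = pickle.loads(data)
assert restored == payload and restored is not payload
```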
This serialization overhead is not trivial. If you send very large objects (e.g., a massive list or a complex custom object) to a worker process, the time spent pickling and unpickling can sometimes outweigh the benefits of parallelism.
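One practical lever for reducing this overhead is the chunksize argument to executor.map, which batches many small work items into a single pickled message instead of one IPC round trip per item. A sketch with an illustrative trivial task:

```python
import concurrent.futures

def square(n):
    """Trivial task: the work per item is dwarfed by per-item IPC cost."""
    return n * n

def main():
    numbers = list(range(10_000))
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # chunksize=500 sends 500 items per message, amortizing the
        # pickling and pipe traffic across the batch.
        results = list(executor.map(square, numbers, chunksize=500))
    print(results[:5])

if __name__ == "__main__":
    main()
```

For many tiny tasks, a larger chunksize can make the difference between a slowdown and a real speedup; the best value depends on your task size, so measure.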
Furthermore, not everything is picklable. Objects like open file handles, database connections, and some lambda functions cannot be serialized, and trying to pass them to another process will result in an error.
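You can see both failure modes without spinning up a pool at all, by pickling the objects directly. A sketch:

```python
import pickle
import tempfile

# A lambda has no importable qualified name, so pickle cannot
# serialize a reference to it.
try:
    pickle.dumps(lambda x: x * 2)
except (pickle.PicklingError, AttributeError) as exc:
    print(f"lambda failed to pickle: {type(exc).__name__}")

# An open file handle wraps OS-level state that has no meaning in
# another process's memory space.
with tempfile.TemporaryFile() as fh:
    try:
        pickle.dumps(fh)
    except (TypeError, pickle.PicklingError) as exc:
        print(f"file handle failed to pickle: {type(exc).__name__}")
```

The usual workaround is to pass plain, picklable data (paths, connection parameters, IDs) and let each worker construct its own handles and connections.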
threading vs. multiprocessing: A Guideline
Choosing between the two models is a critical design decision based on the nature of your task.
| Aspect       | threading                                 | multiprocessing                                 |
| ------------ | ----------------------------------------- | ----------------------------------------------- |
| Best For     | I/O-bound tasks (networking, file I/O)    | CPU-bound tasks (calculations, data processing) |
| GIL          | All threads share one GIL; no parallelism | Each process has its own GIL; true parallelism  |
| Memory       | Threads share memory (easy data sharing)  | Processes have separate memory (data is copied) |
| Overhead     | Low startup overhead                      | High startup overhead                           |
| Data Sharing | Prone to race conditions; requires locks  | Requires serialization; prone to high IPC cost  |
| Primary Goal | Responsiveness                            | Performance                                     |
Rule of Thumb: Profile your code. If your program is spending most of its time waiting, use threading. If it's spending most of its time executing Python code, use multiprocessing.
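One quick way to see where the time goes is the standard-library profiler. A minimal sketch, using an illustrative CPU-bound stand-in function (busy is not from this chapter's example):

```python
import cProfile
import io
import pstats

def busy(n):
    """CPU-bound stand-in: sum of squares up to n."""
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
busy(200_000)
profiler.disable()

# If the top entries are your own computation, the workload is
# CPU-bound (reach for multiprocessing); if they are socket/file
# waits, it is I/O-bound (reach for threading).
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```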
Summary
The multiprocessing module is Python's definitive answer to the Global Interpreter Lock for CPU-bound problems. By leveraging separate processes, it enables true parallelism and allows you to take full advantage of multi-core hardware. This power, however, requires careful management. As a senior developer, you must be mindful of the overhead of data serialization and design your tasks to be as independent as possible, minimizing the amount of data that needs to be shuffled between processes.