Chapter 3: The Iteration Protocol and the Power of Generators
In Chapter 1, we saw how implementing the sequence protocol on our FrenchDeck class granted it iteration capabilities as a "free" side effect of implementing __getitem__. While this works, it's an implicit mechanism. The true, explicit mechanism behind all iteration in Python—from for loops to comprehensions to generator expressions—is the iteration protocol.
Understanding this protocol is the key to writing highly memory-efficient and idiomatic Python. It allows you to work with data streams of potentially infinite size without ever needing to hold them in memory all at once.
This chapter covers:
- How the iteration protocol (__iter__ and __next__) actually works.
- Generators as a vastly superior way to implement iterators.
- Generator expressions for lazy, memory-efficient operations.
- The advanced yield from syntax for composing generators.
How for Loops Really Work
When you write a for loop, like for item in my_object:, the Python interpreter performs two distinct steps:
1. It calls iter(my_object) to get an iterator. The built-in iter() function looks for an __iter__ method on your object. This method's job is to return an object that has a __next__ method.
2. It then repeatedly calls next() on the iterator returned in step 1. The value returned by next() is assigned to the loop variable (item). This continues until the iterator raises a StopIteration exception, which signals the end of iteration and terminates the loop gracefully.
An object that has an __iter__ method is called an iterable. The object returned by __iter__, which has the __next__ method, is the iterator.
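The two steps above can be sketched by hand. This while loop is what a for loop does under the hood:

```python
my_object = ['a', 'b', 'c']
result = []

iterator = iter(my_object)     # step 1: iter() calls my_object.__iter__()
while True:
    try:
        item = next(iterator)  # step 2: next() calls iterator.__next__()
    except StopIteration:      # signals the end of iteration
        break
    result.append(item)        # the body of the for loop

print(result)  # ['a', 'b', 'c']
```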
Generators: Simplified Iterator Creation
Implementing the iteration protocol manually by creating a class with __iter__ and __next__ methods is tedious and stateful. You have to manage the iteration state (e.g., the current index) yourself.
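To see how tedious this is, here is a hand-written iterator over the first n squares; the class names are illustrative, not from the original text:

```python
class Squares:
    """Iterable: __iter__ hands out a fresh iterator each time."""
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return SquaresIterator(self.n)


class SquaresIterator:
    """Iterator: must track its own state and implement both methods."""
    def __init__(self, n):
        self.n = n
        self.i = 0              # iteration state we manage by hand

    def __iter__(self):         # iterators are themselves iterable
        return self

    def __next__(self):
        if self.i >= self.n:
            raise StopIteration # signal exhaustion
        value = self.i * self.i
        self.i += 1
        return value


print(list(Squares(4)))  # [0, 1, 4, 9]
```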
Python provides a much more elegant solution: generators. A function becomes a generator function the moment it contains the yield keyword.
When a generator function is called, it doesn't run the function body. Instead, it immediately returns a generator object, which is a ready-to-use iterator. The yield keyword does two things:
1. It produces a value to be consumed by the for loop (or next() call).
2. It pauses the function's execution at that point, saving its entire local state.
When the next value is requested, the function resumes execution right where it left off.
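A minimal generator makes this pause-and-resume behavior visible (the function name is illustrative):

```python
def countdown(n):
    """Calling this returns a generator object; no body code runs yet."""
    while n > 0:
        yield n   # produce a value and pause right here
        n -= 1    # execution resumes here on the next next() call

gen = countdown(3)
print(next(gen))  # 3
print(next(gen))  # 2
print(list(gen))  # [1]  -- drains the rest, until StopIteration
```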
Let's refactor our FrenchDeck to have an explicit __iter__ method implemented as a generator. It is far more readable and direct.
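A sketch of such a refactor, assuming the Card namedtuple and _cards list from the FrenchDeck of Chapter 1:

```python
import collections

Card = collections.namedtuple('Card', ['rank', 'suit'])

class FrenchDeck:
    ranks = [str(n) for n in range(2, 11)] + list('JQKA')
    suits = 'spades diamonds clubs hearts'.split()

    def __init__(self):
        self._cards = [Card(rank, suit) for suit in self.suits
                                        for rank in self.ranks]

    def __iter__(self):
        # A generator method: the yield makes this a generator function,
        # so __iter__ returns a ready-made iterator over _cards.
        for card in self._cards:
            yield card

deck = FrenchDeck()
print(len(list(deck)))  # 52
```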
This __iter__ is explicit and clear. It says, "To iterate over me, iterate over my internal _cards list." While the end result is the same as the __getitem__ fallback, this is the canonical way to make an object iterable.
Generator Expressions: Lazy Comprehensions
You are already familiar with list comprehensions, which build a new list in memory all at once:
squares_list = [x*x for x in range(1_000_000)] # Creates a list with a million integers
A generator expression has almost identical syntax but uses parentheses instead of square brackets. Crucially, it does not build a list. It creates a generator object that will produce the values on demand.
squares_gen = (x*x for x in range(1_000_000)) # Creates a generator object
The generator expression is orders of magnitude more memory-efficient. It doesn't allocate a million-item list; it creates a small generator object that knows how to produce the squares when asked. This is the essence of lazy evaluation.
You can use generator expressions anywhere you would use an iterable, such as in a for loop or as an argument to a function that consumes an iterable.
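For instance, feeding a generator expression straight to sum() never materializes the million squares. Note that, unlike a list, a generator is single-use:

```python
# The squares are produced one at a time and consumed immediately.
total = sum(x*x for x in range(1_000_000))
print(total)

# A generator is one-shot: once exhausted, it yields nothing more.
squares_gen = (x*x for x in range(5))
print(list(squares_gen))  # [0, 1, 4, 9, 16]
print(list(squares_gen))  # []  -- already exhausted
```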
Rule of Thumb: If you are creating a sequence just to iterate over it once, use a generator expression. Prefer it over a list comprehension unless you actually need the materialized list, for example for indexing, slicing, len(), in-place sorting, or multiple passes over the data.
yield from: Delegating to Sub-Generators
The yield from expression, introduced in Python 3.3, is a powerful syntax for composing generators. It allows a generator to delegate part of its operations to another generator.
Imagine you have a function that needs to yield the values from several different iterables in sequence. The old way would involve nested for loops:
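A sketch of that older pattern; the name chain is illustrative, echoing itertools.chain:

```python
def chain(*iterables):
    """Yield every item from each iterable, in order -- the pre-3.3 way."""
    for it in iterables:
        for item in it:   # the inner loop pulls each sub-iterable by hand
            yield item

print(list(chain('AB', range(3))))  # ['A', 'B', 0, 1, 2]
```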
With yield from, this becomes much cleaner and more efficient:
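The same chainer, with yield from replacing the inner loop (again using the illustrative name chain):

```python
def chain(*iterables):
    for it in iterables:
        yield from it   # delegate: pull every item from `it` until exhausted

print(list(chain('AB', range(3))))  # ['A', 'B', 0, 1, 2]
```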
The yield from it expression handles the inner loop automatically. It pulls all items from the sub-generator it until it is exhausted, and then the outer generator continues. This is not just syntactic sugar; it establishes a transparent channel between the caller and the sub-generator, which is fundamental to advanced concepts like asyncio coroutines.
Summary
The iteration protocol is a core pillar of Pythonic code. While __getitem__ can provide implicit iteration, an explicit __iter__ method is preferred. Generators, with their yield keyword, are the simplest and most powerful tool for creating custom iterators, allowing you to write highly readable and memory-efficient code. For one-off iterable sequences, generator expressions provide a concise, lazy alternative to list comprehensions. Finally, yield from offers a clean way to compose and delegate between generators, paving the way for more advanced programming patterns.