Chapter 3: The Iteration Protocol and the Power of Generators

In Chapter 1, we saw how implementing the sequence protocol on our FrenchDeck class granted it iteration capabilities as a "free" side effect of implementing __getitem__. While this works, it's an implicit mechanism. The true, explicit mechanism behind all iteration in Python—from for loops to comprehensions to generator expressions—is the iteration protocol.

Understanding this protocol is the key to writing highly memory-efficient and idiomatic Python. It allows you to work with data streams of potentially infinite size without ever needing to hold them in memory all at once.

This chapter covers:

  1. How the iteration protocol (__iter__ and __next__) actually works.

  2. Generators as a vastly superior way to implement iterators.

  3. Generator expressions for lazy, memory-efficient operations.

  4. The advanced yield from syntax for composing generators.

How for Loops Really Work

When you write a for loop, like for item in my_object:, the Python interpreter performs two distinct steps:

  1. It calls iter(my_object) to get an iterator. The built-in iter() function looks for an __iter__ method on your object; that method's job is to return an object that has a __next__ method. (If __iter__ is missing, iter() falls back to __getitem__, which is how FrenchDeck became iterable in Chapter 1.)

  2. It then repeatedly calls next() on the iterator object returned in step 1. The value returned by next() is assigned to the loop variable (item). This continues until the iterator raises a StopIteration exception, which signals the end of the iteration and terminates the loop gracefully.

An object that has an __iter__ method is called an iterable. The object returned by __iter__, which has the __next__ method, is the iterator. By convention, an iterator also implements __iter__ by returning itself, so it can be used anywhere an iterable is expected.
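The two steps above can be performed by hand with the built-ins iter() and next(), which is essentially what the interpreter does behind every for loop:

```python
colors = ["red", "green", "blue"]

# Step 1: ask the iterable for an iterator.
it = iter(colors)

# Step 2: pull values until StopIteration signals exhaustion.
while True:
    try:
        item = next(it)
    except StopIteration:
        break
    print(item)  # prints "red", "green", "blue" in turn
```

This while loop is exactly equivalent to `for item in colors: print(item)`.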

Generators: Simplified Iterator Creation

Implementing the iteration protocol manually by creating a class with __iter__ and __next__ methods is tedious and stateful. You have to manage the iteration state (e.g., the current index) yourself.
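To see how much bookkeeping this involves, here is a minimal hand-written iterator; the class name Countdown is illustrative:

```python
class Countdown:
    """Iterator that counts down from n to 1, tracking its own state."""

    def __init__(self, n):
        self.current = n  # iteration state we must manage ourselves

    def __iter__(self):
        return self  # an iterator is its own iterable

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signal that we are exhausted
        value = self.current
        self.current -= 1
        return value

print(list(Countdown(3)))  # → [3, 2, 1]
```

Every detail (the counter, the exhaustion check, raising StopIteration) falls on us.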

Python provides a much more elegant solution: generators. A function becomes a generator function the moment it contains the yield keyword.

When a generator function is called, it doesn't run the function body. Instead, it immediately returns a generator object, which is a ready-to-use iterator. The yield keyword does two things:

  1. It produces a value to be consumed by the for loop (or next() call).

  2. It pauses the function's execution at that point, saving its entire local state.

When the next value is requested, the function resumes execution right where it left off.
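A tiny generator makes this pause-and-resume behavior visible; the function name countdown is illustrative:

```python
def countdown(n):
    """Generator that counts down from n to 1."""
    while n > 0:
        yield n   # produce a value, then pause here
        n -= 1    # execution resumes here on the next next() call

gen = countdown(3)   # returns a generator object; no body code has run yet
print(next(gen))     # → 3
print(next(gen))     # → 2
print(list(gen))     # → [1]  (drains whatever remains)
```

Compare this with the hand-written iterator class: all the state management is gone, replaced by ordinary local variables and control flow.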

Let's refactor our FrenchDeck to have an explicit __iter__ method implemented as a generator. The result is far more readable and direct than a hand-written iterator class.
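A sketch of that refactoring, assuming the Chapter 1 deck stores Card namedtuples in a _cards list:

```python
import collections

Card = collections.namedtuple("Card", ["rank", "suit"])

class FrenchDeck:
    ranks = [str(n) for n in range(2, 11)] + list("JQKA")
    suits = "spades diamonds clubs hearts".split()

    def __init__(self):
        self._cards = [Card(rank, suit)
                       for suit in self.suits
                       for rank in self.ranks]

    def __iter__(self):
        # A generator method: yields each card in order, lazily.
        for card in self._cards:
            yield card

deck = FrenchDeck()
print(next(iter(deck)))  # → Card(rank='2', suit='spades')
```

Because __iter__ contains yield, calling it returns a generator object, which satisfies the iterator side of the protocol automatically.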

This __iter__ is explicit and clear. It says, "To iterate over me, iterate over my internal _cards list." While the end result is the same as the __getitem__ fallback, this is the canonical way to make an object iterable.

Generator Expressions: Lazy Comprehensions

You are already familiar with list comprehensions, which build a new list in memory all at once:

squares_list = [x*x for x in range(1_000_000)] # Creates a list with a million integers

A generator expression has almost identical syntax but uses parentheses instead of square brackets. Crucially, it does not build a list. It creates a generator object that will produce the values on demand.

squares_gen = (x*x for x in range(1_000_000)) # Creates a generator object

The generator expression is orders of magnitude more memory-efficient. It doesn't allocate a million-item list; it creates a small generator object that knows how to produce the squares when asked. This is the essence of lazy evaluation.

You can use generator expressions anywhere you would use an iterable, such as in a for loop or as an argument to a function that consumes an iterable.
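For example, feeding a generator expression straight into sum() computes the result without ever building an intermediate list:

```python
# No million-item list is allocated; sum() pulls one square at a time.
total = sum(x*x for x in range(1_000_000))
print(total)
```

Note that when a generator expression is the sole argument to a function call, the extra parentheses can be omitted, as above.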

Rule of Thumb: If you are creating a sequence just to iterate over it once, use a generator expression. Prefer it over a list comprehension unless you actually need a list, for example to index or slice it, call len() on it, or iterate over it more than once.

yield from: Delegating to Sub-Generators

The yield from expression, introduced in Python 3.3, is a powerful syntax for composing generators. It allows a generator to delegate part of its operations to another generator.

Imagine you have a function that needs to yield the values from several different iterables in sequence. The old way would involve nested for loops:
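A sketch of such a function written the old way; the name chain and the sample inputs are illustrative:

```python
def chain(*iterables):
    """Yield every item from each iterable, one iterable at a time."""
    for it in iterables:
        for item in it:   # inner loop re-yields each item by hand
            yield item

print(list(chain("AB", range(3))))  # → ['A', 'B', 0, 1, 2]
```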

With yield from, this becomes much cleaner and more efficient:
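The same chain function, rewritten with yield from:

```python
def chain(*iterables):
    """Same behavior, but each inner loop is delegated via yield from."""
    for it in iterables:
        yield from it  # pulls items from the sub-iterable until exhausted

print(list(chain("AB", range(3))))  # → ['A', 'B', 0, 1, 2]
```

(The standard library ships this exact tool as itertools.chain.)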

The yield from it expression handles the inner loop automatically. It pulls all items from the sub-generator it until it is exhausted, and then the outer generator continues. This is not just syntactic sugar; it establishes a transparent channel between the caller and the sub-generator, which is fundamental to advanced concepts like asyncio coroutines.

Summary

The iteration protocol is a core pillar of Pythonic code. While __getitem__ can provide implicit iteration, an explicit __iter__ method is preferred. Generators, with their yield keyword, are the simplest and most powerful tool for creating custom iterators, allowing you to write highly readable and memory-efficient code. For one-off iterable sequences, generator expressions provide a concise, lazy alternative to list comprehensions. Finally, yield from offers a clean way to compose and delegate between generators, paving the way for more advanced programming patterns.
