
Iterators, Generators, and Classic Coroutines🔗

A Sequence of Words🔗

  • We implement the Sentence class using the sequence protocol; that's why it's iterable.
import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:

  def __init__(self, text):
    self.text = text
    self.words = RE_WORD.findall(text)  # eagerly builds a list of all words

  def __getitem__(self, index): # makes Sentence a sequence: fetch a word by index
    return self.words[index]

  def __len__(self): # completes the sequence protocol; not needed for iteration
    return len(self.words)

  def __repr__(self):
    return "Sequence(%s)" % reprlib.repr(self.text)

Why Sequences Are Iterable: The iter Function🔗

Whenever Python needs to iterate over an object x, it calls iter(x). The iter built-in:

  • Calls __iter__ on the object x.
  • If __iter__ is not implemented but __getitem__ is, iter() creates an iterator that tries to fetch items by index, starting from 0.
  • If all of the above fails, Python raises TypeError, saying "C object is not iterable", where C is the class of the target object.

All Python sequences are iterable by definition, since they all implement __getitem__. This is an extreme form of duck typing: an object is considered iterable not only when it implements the special method __iter__, but also when it implements __getitem__.

NOTE: classes that implement only __getitem__ work fine when passed to iter(x) and return an iterator, but they fail an isinstance check against collections.abc.Iterable, as the sketch below shows.
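
A minimal check of that behavior, assuming the Sentence class above:

from collections.abc import Iterable

s = Sentence('Hello world')
iter(s)                  # works: builds an iterator that fetches items via __getitem__
isinstance(s, Iterable)  # False: __subclasshook__ only looks for __iter__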

In the goose-typing approach, the definition of an iterable is simpler but not as flexible: an object is considered iterable if it implements the __iter__ method. No subclassing or registration is required, because abc.Iterable implements __subclasshook__, so your class will pass issubclass and isinstance checks against abc.Iterable.

Using iter with a Callable🔗

iter_obj = iter(func, sentinel)

NOTE: here func must be a callable that takes no arguments; when it returns a value equal to the sentinel, the iterator raises StopIteration instead of yielding the sentinel.

from random import randint

def d6():
  return randint(1, 6)

d6_iter = iter(d6, 1)
for roll in d6_iter:
  print(roll)  # prints rolls until a 1 comes up, then stops; the 1 itself is never yielded

Iterables vs. Iterators🔗

  • Iterable: any object from which the iter built-in can obtain an iterator. Objects implementing an __iter__ method that returns an iterator are iterable.
  • Python obtains iterators from iterables.
s = "ABC"
for c in s:
  print(c)
  • In the example above, the str 'ABC' is the iterable; behind the scenes there is an iterator, which the for loop drives for us.

Python's standard interface for an iterator has two methods:

  • __next__ : Returns the next item in the series, raising StopIteration if there are no more.
  • __iter__ : Returns self; this allows iterators to be used where an iterable is expected.
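
As a rough sketch, this is how a for loop drives those two methods under the hood:

s = 'ABC'
it = iter(s)  # obtain an iterator from the iterable
while True:
    try:
        print(next(it))  # fetch the next item
    except StopIteration:  # the iterator is exhausted
        break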
# Lib/types.py module source code in python3.9

# Iterators in Python aren't a matter of type but of protocol.  A large
# and changing number of builtin types implement *some* flavor of
# iterator.  Don't check the type!  Use hasattr to check for both
# "__iter__" and "__next__" attributes instead.
  • The best way to check whether an object x is an iterator is to call isinstance(x, abc.Iterator). Thanks to Iterator.__subclasshook__, this test works even if the class of x is not a real or virtual subclass of Iterator.

Sentence Classes with __iter__🔗

Sentence Take #2: A Classic Iterator🔗

# implementation follows the blueprint of the classic Iterator design pattern from the Design Patterns book.

import re
import reprlib

RE_WORD = re.compile(r'\w+')


class Sentence:

    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __repr__(self):
        return f'Sentence({reprlib.repr(self.text)})'

    def __iter__(self): # now the isinstance check against abc.Iterable passes
        return SentenceIterator(self.words) # fulfills the protocol by returning a new iterator


class SentenceIterator:

    def __init__(self, words):
        self.words = words # holds a reference to the list of words
        self.index = 0 # starting index

    def __next__(self):
        try:
            word = self.words[self.index]
        except IndexError:
            raise StopIteration()
        self.index += 1
        return word

    def __iter__(self):
        return self

Don't Make the Iterable an Iterator for Itself🔗

  • Iterables have an __iter__ method that instantiates a new iterator every time.
  • Iterators implement a __next__ method that returns individual items, and an __iter__ method that returns self.
  • Therefore iterators are also iterable, but iterables are not iterators.
  • NOTE: defining __next__ in addition to __iter__ in the Sentence class would make each Sentence instance an iterable and an iterator over itself at the same time. This is a common anti-pattern: the Iterator pattern is meant to support multiple traversals of the same aggregate object, which is why each iterator must keep its own internal state (see the sketch below).
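
Because Sentence.__iter__ in Take #2 above builds a new SentenceIterator on each call, independent traversals work; a small sketch:

s = Sentence('Pig and Pepper')
list(iter(s))  # ['Pig', 'and', 'Pepper']
list(iter(s))  # ['Pig', 'and', 'Pepper'] -- a fresh iterator each time
# if Sentence returned self from __iter__, the second traversal would come up empty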

Sentence Take #3: A Generator Function🔗

import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:

    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)

    def __iter__(self):
        for word in self.words:
            yield word # yield the current word
        # an explicit return is not required; the generator just falls off the end

How a Generator Works🔗

Any Python function with the yield keyword in its body is a generator function: calling it returns a generator object.

A generator function builds a generator object that wraps the body of the function. When we invoke next() on the generator object, execution advances to the next yield in the function body, and the next() call evaluates to the value yielded when the function body is suspended. Finally, the enclosing generator object created by Python raises StopIteration when the function body returns, in accordance with the Iterator protocol.
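
A minimal illustration of that machinery (gen_123 is a made-up example name):

def gen_123():
    yield 1
    yield 2
    yield 3

g = gen_123()  # calling a generator function returns a generator object
next(g)  # 1
next(g)  # 2
next(g)  # 3
next(g)  # raises StopIteration when the function body returns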

Lazy Sentence🔗

Sentence Take #4: A Lazy Generator🔗

The Iterator interface is designed to be lazy: next(my_iterator) yields one item at a time.

Our implementation so far is not lazy, because __init__ eagerly builds a list of all words in the text and binds it to the self.words attribute; that list may take as much memory as the text itself.

Using a generator function that does the regex matching lazily:

import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:

    def __init__(self, text):
        self.text = text # no need to build a words list up front

    def __repr__(self):
        return f'Sentence({reprlib.repr(self.text)})'

    def __iter__(self):
        for match in RE_WORD.finditer(self.text): # finditer builds a lazy iterator over the matches of RE_WORD
            yield match.group() # extracts the matched text from the MatchObject instance

Sentence Take #5: A Lazy Generator Expression🔗

As a list comprehension builds lists, a generator expression builds generator objects.

def gen_AB():
  print('start')
  yield 'A'
  print('continue')
  yield 'B'
  print('end.')

res1 = [x*3 for x in gen_AB()]  # the list comprehension eagerly consumes gen_AB()
# start
# continue
# end.
for i in res1:
  print('-->', i)
# --> AAA
# --> BBB

res2 = (x*3 for x in gen_AB())  # not consumed here; evaluated lazily as it is iterated
res2    # <generator object <genexpr> at 0x...>
for i in res2:
  print('-->', i)

# start
# --> AAA
# continue
# --> BBB
# end.
import re
import reprlib

RE_WORD = re.compile(r'\w+')

class Sentence:

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return f'Sentence({reprlib.repr(self.text)})'

    def __iter__(self):
        return (match.group() for match in RE_WORD.finditer(self.text)) # generator expression

When to Use Generator Expressions🔗

  • Generator expressions are just syntactic sugar on top of generator functions.
  • Generator expressions are shorthand and don't require defining and calling a function, while generator functions are more flexible, allowing you to code complex logic.

When a generator expression is passed as the single argument to a function or constructor, you don't need to write one set of parentheses for the function call and another to enclose the generator expression. A single pair will do, as in the Vector call from the __mul__ method, reproduced here:

def __mul__(self, scalar):
    if isinstance(scalar, numbers.Real):
        return Vector(n * scalar for n in self)
    else:
        return NotImplemented
  • However, if there are more arguments after the generator expression, you must enclose it in parentheses to avoid a SyntaxError, as the sum() sketch below shows.
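
For example, with sum(), whose optional second argument is a start value (a sketch):

sum(n * n for n in range(4))        # 14 -- single argument: one pair of parentheses is enough
sum((n * n for n in range(4)), 10)  # 24 -- extra argument: the genexp needs its own parentheses
# sum(n * n for n in range(4), 10)  --> SyntaxError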

An Arithmetic Progression Generator🔗

The range built-in generates a bounded arithmetic progression of integers.

class ArithmeticProgression:

    def __init__(self, begin, step, end=None): 
        self.begin = begin
        self.step = step
        self.end = end  # None -> "infinite" series

    def __iter__(self):
        result_type = type(self.begin + self.step) # infer the result type from the sum
        result = result_type(self.begin) # coerce begin to that type
        forever = self.end is None # None means an unbounded series
        index = 0
        while forever or result < self.end: # termination
            yield result
            index += 1
            result = self.begin + self.step * index
# same thing using generator function
def aritprog_gen(begin, step, end=None):
    result = type(begin + step)(begin)
    forever = end is None
    index = 0
    while forever or result < end:
        yield result
        index += 1
        result = begin + step * index

Arithmetic Progression Using itertools🔗

  • itertools has over 20 generator functions that can be combined to build interesting pipelines.
  • Example: itertools.count, which returns a generator that yields numbers.
  • NOTE: count is unbounded when called without arguments; optionally you can pass (start, step). Don't try to materialize it with list(itertools.count()): that would exhaust memory.
import itertools

def aritprog_gen(begin, step, end=None):
    first = type(begin + step)(begin)
    ap_gen = itertools.count(first, step)
    if end is None:
        return ap_gen
    return itertools.takewhile(lambda n: n < end, ap_gen)   # notice we return a generator
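
All the aritprog_gen variants behave the same; a quick sketch with assumed demo values:

list(aritprog_gen(0, .5, 3))  # [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
list(aritprog_gen(1, 2, 9))   # [1, 3, 5, 7]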

Generator Functions in the Standard Library🔗

| Module | Function | Description |
| --- | --- | --- |
| itertools | compress(it, selector_it) | Consumes two iterables in parallel; yields items from it whenever the corresponding item in selector_it is truthy |
| itertools | dropwhile(predicate, it) | Consumes it, skipping items while predicate computes truthy, then yields every remaining item (no further checks are made) |
| built-in | filter(predicate, it) | Applies predicate to each item of it, yielding the item if predicate(item) is truthy; if predicate is None, only truthy items are yielded |
| itertools | filterfalse(predicate, it) | Same as filter, with the predicate logic negated: yields items whenever predicate computes falsy |
| itertools | islice(it, stop) or islice(it, start, stop, step=1) | Yields items from a slice of it, similar to s[:stop] or s[start:stop:step], except it can be any iterable and the operation is lazy |
| itertools | takewhile(predicate, it) | Yields items while predicate computes truthy, then stops; no further checks are made |
import itertools

def vowel(c):
    return c.lower() in 'aeiou'

list(filter(vowel, 'Aardvark'))
# ['A', 'a', 'a']
list(itertools.filterfalse(vowel, 'Aardvark'))
# ['r', 'd', 'v', 'r', 'k']
list(itertools.dropwhile(vowel, 'Aardvark'))
# ['r', 'd', 'v', 'a', 'r', 'k']
list(itertools.takewhile(vowel, 'Aardvark'))
# ['A', 'a']
list(itertools.compress('Aardvark', (1, 0, 1, 1, 0, 1)))
# ['A', 'r', 'd', 'a']
list(itertools.islice('Aardvark', 4))
# ['A', 'a', 'r', 'd']


| Module | Function | Description |
| --- | --- | --- |
| itertools | accumulate(it, [func]) | Yields accumulated sums; if func is provided, yields the result of applying it to the first pair of items, then to the first result and the next item, etc. |
| built-in | enumerate(iterable, start=0) | Yields 2-tuples of the form (index, item), where index is counted from start and item is taken from the iterable |
| built-in | map(func, it1, [it2, …, itN]) | Applies func to each item of it, yielding the result; if N iterables are given, func must take N arguments and the iterables will be consumed in parallel |
| itertools | starmap(func, it) | Applies func to each item of it, yielding the result; the input iterable should yield iterable items iit, and func is applied as func(*iit) |
sample = [5, 4, 2, 8, 7, 6, 3, 0, 9, 1]

import itertools

result1 = list(itertools.accumulate(sample))
# Output: [5, 9, 11, 19, 26, 32, 35, 35, 44, 45]

result2 = list(itertools.accumulate(sample, min))
# Output: [5, 4, 2, 2, 2, 2, 2, 0, 0, 0]

result3 = list(itertools.accumulate(sample, max))
# Output: [5, 5, 5, 8, 8, 8, 8, 8, 9, 9]

import operator

result4 = list(itertools.accumulate(sample, operator.mul))
# Output: [5, 20, 40, 320, 2240, 13440, 40320, 0, 0, 0]

result5 = list(itertools.accumulate(range(1, 11), operator.mul))
# Output: [1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800]
# Example 1
result1 = list(enumerate('albatroz', 1))
# Output: [(1, 'a'), (2, 'l'), (3, 'b'), (4, 'a'), (5, 't'), (6, 'r'), (7, 'o'), (8, 'z')]

import operator

result2 = list(map(operator.mul, range(11), range(11)))
# Output: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

result3 = list(map(operator.mul, range(11), [2, 4, 8]))
# Output: [0, 4, 16]

result4 = list(map(lambda a, b: (a, b), range(11), [2, 4, 8]))
# Output: [(0, 2), (1, 4), (2, 8)]

import itertools

result5 = list(itertools.starmap(operator.mul, enumerate('albatroz', 1)))
# Output: ['a', 'll', 'bbb', 'aaaa', 'ttttt', 'rrrrrr', 'ooooooo', 'zzzzzzzz']

sample = [5, 4, 2, 8, 7, 6, 3, 0, 9, 1]

result6 = list(itertools.starmap(lambda a, b: b / a, enumerate(itertools.accumulate(sample), 1)))
# Output: [5.0, 4.5, 3.6666666666666665, 4.75, 5.2, 5.333333333333333, 5.0, 4.375, 4.888888888888889, 4.5]
| Module | Function | Description |
| --- | --- | --- |
| itertools | chain(it1, …, itN) | Yields all items from it1, then from it2, etc., seamlessly |
| itertools | chain.from_iterable(it) | Yields all items from each iterable produced by it, one after the other, seamlessly; it is an iterable whose items are also iterables, for example, a list of tuples |
| itertools | product(it1, …, itN, repeat=1) | Cartesian product: yields N-tuples made by combining items from each input iterable, like nested for loops could produce; repeat allows the input iterables to be consumed more than once |
| built-in | zip(it1, …, itN, strict=False) | Yields N-tuples built from items taken from the iterables in parallel, silently stopping when the first iterable is exhausted, unless strict=True is given |
| itertools | zip_longest(it1, …, itN, fillvalue=None) | Yields N-tuples built from items taken from the iterables in parallel, stopping only when the last iterable is exhausted, filling the blanks with the fillvalue |
| Module | Function | Description |
| --- | --- | --- |
| itertools | combinations(it, out_len) | Yields combinations of out_len items from the items yielded by it |
| itertools | combinations_with_replacement(it, out_len) | Yields combinations of out_len items from the items yielded by it, including combinations with repeated items |
| itertools | count(start=0, step=1) | Yields numbers starting at start, incremented by step, indefinitely |
| itertools | cycle(it) | Yields items from it, storing a copy of each, then yields the entire sequence repeatedly, indefinitely |
| itertools | pairwise(it) | Yields successive overlapping pairs taken from the input iterable |
| itertools | permutations(it, out_len=None) | Yields permutations of out_len items from the items yielded by it; by default, out_len is len(list(it)) |
| itertools | repeat(item, [times]) | Yields the given item repeatedly, indefinitely unless a number of times is given |
| Module | Function | Description |
| --- | --- | --- |
| itertools | groupby(it, key=None) | Yields 2-tuples of the form (key, group), where key is the grouping criterion and group is a generator yielding the items in the group |
| built-in | reversed(seq) | Yields items from seq in reverse order, from last to first; seq must be a sequence or implement the __reversed__ special method |
| itertools | tee(it, n=2) | Yields a tuple of n generators, each yielding the items of the input iterable independently |
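
A short groupby sketch; the input should already be sorted (or at least grouped) by the grouping criterion:

import itertools

for char, group in itertools.groupby('LLLLAAGGG'):
    print(char, '->', list(group))
# L -> ['L', 'L', 'L', 'L']
# A -> ['A', 'A']
# G -> ['G', 'G', 'G']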

Iterable Reducing Functions🔗

| Module | Function | Description |
| --- | --- | --- |
| built-in | all(it) | Returns True if all items in it are truthy, otherwise False; all([]) returns True |
| built-in | any(it) | Returns True if any item in it is truthy, otherwise False; any([]) returns False |
| built-in | max(it, [key=,] [default=]) | Returns the maximum value of the items in it; key is an ordering function, as in sorted; default is returned if the iterable is empty |
| built-in | min(it, [key=,] [default=]) | Returns the minimum value of the items in it; key is an ordering function, as in sorted; default is returned if the iterable is empty |
| functools | reduce(func, it, [initial]) | Returns the result of applying func to the first pair of items, then to that result and the third item, and so on; if given, initial forms the initial pair with the first item |
| built-in | sum(it, start=0) | The sum of all items in it, with the optional start value added (use math.fsum for better precision when adding floats) |
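
A few quick examples of these reducers (a sketch):

import functools
import operator

all([1, 2, 3])   # True
all([1, 0, 3])   # False
any([])          # False
functools.reduce(operator.add, range(1, 6))       # 15, same as sum(range(1, 6))
functools.reduce(operator.add, range(1, 6), 100)  # 115, with an initial value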

Subgenerators with yield from🔗

  • Before yield from was introduced in Python 3.3, delegating to a subgenerator required an explicit for loop:
def sub_gen():
  yield 1.1
  yield 1.2

def gen():
  yield 1
  for i in sub_gen():
    yield i
  yield 2
  • Since Python 3.3, yield from does the delegation:
def gen():
  yield 1
  yield from sub_gen()
  yield 2

NOTE:

def sub_gen():
  yield 1.1
  yield 1.2
  return "DONE"

def gen():
  yield 1
  result = yield from sub_gen() # gen delegates to sub_gen, which yields until exhausted; its return value becomes the value of the yield from expression
  print("<--", result)
  yield 2

for x in gen():
  print(x)
# Output:
# 1
# 1.1
# 1.2
# <-- DONE
# 2

Reinventing Chain🔗

def chain(*iterables):
    for it in iterables:
        for i in it:
            yield i
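
With yield from, the nested loop disappears (chain2 is a hypothetical name to avoid shadowing the version above):

def chain2(*iterables):
    for it in iterables:
        yield from it

list(chain2('ABC', range(2)))  # ['A', 'B', 'C', 0, 1]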

Traversing a Tree (Printing Python's Exception Hierarchy)🔗

def tree(cls, lvl=0):
    yield cls.__name__, lvl  # yield the class name and its depth
    for sub_cls in cls.__subclasses__(): # recurse into each subclass, one level deeper
        yield from tree(sub_cls, lvl + 1)

def display(cls):
    for cls_name, level in tree(cls):
        indent = ' ' * 4 * level  # some indentation
        print(f'{indent}{cls_name}')

if __name__ == '__main__':
    display(BaseException)
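
Abridged output (a sketch; the exact classes and their order vary with the Python version):

BaseException
    Exception
        ArithmeticError
            FloatingPointError
            OverflowError
            ZeroDivisionError
        AssertionError
        ...
    GeneratorExit
    KeyboardInterrupt
    SystemExit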

Generic Iterable Types🔗

  • We can annotate functions that accept iterable arguments using collections.abc.Iterable or typing.Iterable.
from collections.abc import Iterable

FromTo = tuple[str, str]

def zip_replace(text: str, changes: Iterable[FromTo]) -> str:
  for from_, to in changes:
    text = text.replace(from_, to)
  return text
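
A usage sketch with assumed substitutions:

l33t = [('a', '4'), ('e', '3'), ('i', '1'), ('o', '0')]
zip_replace('mad skilled noob powned leet', l33t)
# 'm4d sk1ll3d n00b p0wn3d l33t'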
from collections.abc import Iterator

def fib() -> Iterator[int]:  # return type for generators coded as functions with yield
  a, b = 0, 1
  while True:
    yield a
    a, b = b, a + b
  • Note: for generators there is a more specific collections.abc.Generator type.
  • Iterator[T] is a shortcut for Generator[T, None, None]: a generator whose SendType and ReturnType are both None.
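
A quick check of fib, bounded with itertools.islice (a sketch):

from itertools import islice

list(islice(fib(), 8))  # [0, 1, 1, 2, 3, 5, 8, 13]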

Classic Coroutines🔗

  • Generators are commonly used as iterators, but they can also be used as coroutines.
  • A classic coroutine is really a generator function, created with the yield keyword in its body; a coroutine object is physically a generator object.
  • The typing documentation describes the formal type parameters of Generator like this:

    Generator[YieldType, SendType, ReturnType]

    The SendType is only relevant when the generator is used as a coroutine. The ReturnType is likewise only meaningful for a coroutine, because iterators don't return values the way regular functions do.

  • The Generator type has the same type parameters as typing.Coroutine: Coroutine[YieldType, SendType, ReturnType].

David Beazley created some of the best talks and most comprehensive workshops about classic coroutines. In his PyCon 2009 course handout, he has a slide titled "Keeping It Straight," which reads:

  • Generators produce data for iteration
  • Coroutines are consumers of data
  • To keep your brain from exploding, don't mix the two concepts together
  • Coroutines are not related to iteration
  • Note: There is a use of having yield produce a value in a coroutine, but it's not tied to iteration.

Coroutine to Compute a Running Average🔗

from collections.abc import Generator

def averager() -> Generator[float, float, None]: # yields floats, receives floats, returns None
    total = 0.0
    count = 0
    average = 0.0
    while True: # runs forever, so this coroutine never returns
        term = yield average # yields the current average, then suspends until a value is sent in
        total += term
        count += 1
        average = total/count

coro_avg = averager()
next(coro_avg)  # 0.0  # "primes" the coroutine, advancing it to the first yield
coro_avg.send(10) # 10.0  # each send() delivers a term and yields the updated average
coro_avg.send(30) # 20.0
coro_avg.send(5) # 15.0

Returning a Value from a Coroutine🔗

from collections.abc import Generator
from typing import Union, NamedTuple

class Result(NamedTuple): # a subclass of tuple
    count: int  # type: ignore
    average: float

class Sentinel:
    def __repr__(self): # a readable repr for the sentinel
        return '<Sentinel>'

STOP = Sentinel()

SendType = Union[float, Sentinel]   # SendType: TypeAlias = float | Sentinel

def averager2(verbose: bool = False) -> Generator[None, SendType, Result]: # yields None, receives SendType, returns Result
    total = 0.0
    count = 0
    average = 0.0
    while True:
        term = yield # yields None, but receives a term from send(term)
        if verbose:
            print('received:', term)
        if isinstance(term, Sentinel): # if term is the sentinel, break out of the loop
            break
        total += term
        count += 1
        average = total / count
    return Result(count, average)



coro_avg = averager2()
next(coro_avg)
coro_avg.send(10) # yields None, so send() shows no result
coro_avg.send(30)
coro_avg.send(6.5)
coro_avg.close() # close() stops the coroutine but does not return the result


coro_avg = averager2()
next(coro_avg)
coro_avg.send(10)
coro_avg.send(30)
coro_avg.send(6.5)
try:
    coro_avg.send(STOP)  # the sentinel makes the loop break and the coroutine return
except StopIteration as exc:
    result = exc.value # the return value is attached to the StopIteration exception

result
# Result(count=3, average=15.5)

Generic Type Hints for Classic Coroutines🔗

# typing.py definition of generator
T_co = TypeVar('T_co', covariant=True)
V_co = TypeVar('V_co', covariant=True)
T_contra = TypeVar('T_contra', contravariant=True)

# many lines omitted

class Generator(Iterator[T_co], Generic[T_co, T_contra, V_co],
                extra=_G_base):

That generic type declaration means that a Generator type hint requires those three type parameters we've seen before:

my_coro: Generator[YieldType, SendType, ReturnType]

Using the notation introduced in "Covariant types", the covariance of the first and third parameters is expressed by the :> symbols pointing in the same direction:

                       float :> int
Generator[float, Any, float] :> Generator[int, Any, int]

The contravariance of the second parameter is expressed by the symbols pointing in opposite directions:

                       float :> int
  Generator[Any, float, Any] <: Generator[Any, int, Any]

Variance rules of thumb

  1. If a formal type parameter defines a type for data that comes out of the object, it can be covariant.
  2. If a formal type parameter defines a type for data that goes into the object after its initial construction, it can be contravariant.