Iterators, Generators, and Classic Coroutines๐
A Sequence of Words๐
- we implement
Sentence
class inSequence
Protocol thats why its iterable
import re
import reprlib
RE_WORD = re.compile(r'\w+')
class Sentence:
def __init__(self, text):
self.text = text
self.words = RE_WORD.findall(text)
def __getitem__(self, index):
return self.words[index]
def __len__(self):
return len(self.words)
def __repr__(self):
return "Sequence(%s)" % reprlib.repr(self.text)
Why Sequence Are Iterable: The iter
Function๐
whenever python needs to iterate over an object x
it calls iter(x)
- Calls
__iter__
on the objectx
- if
__iter__
is not implemented but__get__
is, theniter()
creates an iterator that tries to fetch items by index, from0
- if above all fails python raises
TypeError
, sayingC Object is not iterable
, whereC
is a class or target object.
all python sequences are iterable by definition since they all implement __getitem__
. This is an extreme form of duck typing. An object is considered iterable not only when it implements the special method __iter__
, but also when it implements __getitem__
NOTE : classes which implement __getitem__
, works fine when passed to iter(x)
and return iterator, but when checked with collections.abc.Iterable
for isInstance
check they fail.
In goose-typing approach, the definition for an iterable is simpler but not as flexible : an object is considered iterable if it implements the __iter__
method. No subclassing or registration is required because abc.Iterable
implements __subclasshook__
so your class will pass issubclass
and isinstance
checks on abc.Iterable
Using iter with Callable๐
NOTE: Here func
should be a callable, sentinel raises StopIteration
instead of yielding the sentinel
def d6():
return randint(1, 6)
d6_iter = iter(d6, 1)
for roll in d6_iter:
print(roll) # prints till 1 hits and exits # note doesn't print 1
Iterables v/s Iterators๐
- Iterable : Any object from which the
iter
built-in obtains iterator. Objects implementing an__iter__
method returning an iterator are iterable. - Python obtains iterator from Iterables
- In above example
str
ABC
is iterable, but there is a iterator behind the scene we use the same infor
loop.
Pythonโs standard interface for an iterator has two methods
__next__
: Returns the next item in series, raisingStopIteration
if there are no more.__iter__
: Returnsself
; this allows iterators to be used where an iterable is expected
# Lib/types.py module source code in python3.9
# Iterators in Python aren't a matter of type but of protocol. A large
# and changing number of builtin types implement *some* flavor of
# iterator. Don't check the type! Use hasattr to check for both
# "__iter__" and "__next__" attributes instead.
- the best way to check if an object
x
is an iterator is to callisinstance(x, abc.Iterator)
. Thanks toIterator.__subclasshook__
, this test works even if the class ofx
is not a real or virtual subclass ofIterator
.
Sentence Classes with __iter__
๐
Sentence Take#2 : A Classic Iterator๐
# implementation follows the blueprint of the classic Iterator design pattern from the Design Patterns book.
import re
import reprlib
RE_WORD = re.compile(r'\w+')
class Sentence:
def __init__(self, text):
self.text = text
self.words = RE_WORD.findall(text)
def __repr__(self):
return f'Sentence({reprlib.repr(self.text)})'
def __iter__(self): # isinstance check passes now
return SentenceIterator(self.words) # returns iterator
class SentenceIterator:
def __init__(self, words):
self.words = words # hold reference to word
self.index = 0 # starting index
def __next__(self):
try:
word = self.words[self.index]
except IndexError:
raise StopIteration()
self.index += 1
return word
def __iter__(self):
return self
Donโt Make the Iterable an Iterator for Itself๐
- Iterables have an
__iter__
method that instantiates a new iterator every time. - Iterator implement a
__next
method that returns individual items, and an__iter__
method that returns self. - Therefore iterators are also iterable, but iterables are not interators
- NOTE: defining
__next__
in addition to__iter__
in the Sentence class, makes each Sentence instance at the same time an iterable and iterators over itself. But its a common anti-pattern. An Iterator pattern is supposed to support multiple traversals of aggregate objects. that is why each iterator should maintain its internal state.
Sentence Take#3 : A Generator Function๐
import re
import reprlib
RE_WORD = re.compile(r'\w+')
class Sentence:
def __init__(self, text):
self.text = text
self.words = RE_WORD.findall(text)
def __repr__(self):
return 'Sentence(%s)' % reprlib.repr(self.text)
def __iter__(self):
for word in self.words:
yield word # yield current word
# explicit return in not required
How a generator Works๐
Any python function with yield
keyword in its body is a generator function. So it returns generator object.
A generator function builds a generator object that wraps the body of the function. When we invoke next()
on the generator object, execution advances to the next yield
in the function body, and the next()
call evaluates to the value yielded when the function body is suspended. Finally, the enclosing generator object created by Python raises StopIteration
when the function body returns, in accordance with the Iterator
protocol.
Lazy Sentence๐
Sentence Take #4 : Lazy Generator๐
The Iterator
interface is designed to be lazy : next(my_iterator)
yields one item at a time.
Our Implementation till now is not lazy because the __init__
eagerly builds a list of all words in the text, binding it to the self.words
attribute. This could take as much space as list.
Using an generator function that does regex matching
import re
import reprlib
RE_WORD = re.compile(r'\w+')
class Sentence:
def __init__(self, text):
self.text = text # no need to have words array
def __repr__(self):
return f'Sentence({reprlib.repr(self.text)})'
def __iter__(self):
for match in RE_WORD.finditer(self.text): # builds iter over mathces of RE_WORD
yield match.group() # extract matched text from the MatchObject instance
Sentence Take#5 : Lazy Generator Expression๐
As a list comprehension builds lists, a generator expression builds generator objects.
def gen_AB():
print('start')
yield 'A'
print('continue')
yield 'B'
print('end.')
res1 = [x*3 for x in gen_AB()] # list comprehension immideatily evaluates
# start
# continue
# end
for i in res1:
print('-->', i)
# ---> AAA
# ---> BBB
res2 = (x*3 for x in gen_AB()) # its not consumed instead gets evaluated as called
res2 # generator object <genexp> at 0x...
for i in res2:
print('-->', i)
start
---> AAA
continue
---> BBB
end.
import re
import reprlib
RE_WORD = re.compile(r'\w+')
class Sentence:
def __init__(self, text):
self.text = text
def __repr__(self):
return f'Sentence({reprlib.repr(self.text)})'
def __iter__(self):
return (match.group() for match in RE_WORD.finditer(self.text)) # generator expression
When to Use Generator Expressions๐
- generator expressions are just syntactic sugar on top generator functions
- generator expression are short hand and donโt require calling function while gnerator functions are more flexible allowing coding up complex logic.
When a generator expression is passed as the single argument to a function or constructor, you donโt need to write a set of parentheses for the function call and another to enclose the generator expression. A single pair will do, like in the Vector call from the __mul__
method, reproduced here:
def __mul__(self, scalar):
if isinstance(scalar, numbers.Real):
return Vector(n * scalar for n in self)
else:
return NotImplemented
- However if there are more args, you need to enclose it in parenthesis to avoid a
SyntaxError
An Arithmetic Progression Generator๐
range
built-in generates a bounded arithmetic progression of integers.
class ArithmeticProgression:
def __init__(self, begin, step, end=None):
self.begin = begin
self.step = step
self.end = end # None -> "infinite" series
def __iter__(self):
result_type = type(self.begin + self.step) # get type of their sum
result = result_type(self.begin) # final type
forever = self.end is None # infinity check
index = 0
while forever or result < self.end: # termination
yield result
index += 1
result = self.begin + self.step * index
# same thing using generator function
def aritprog_gen(begin, step, end=None):
result = type(begin + step)(begin)
forever = end is None
index = 0
while forever or result < end:
yield result
index += 1
result = begin + step * index
Arithmetic Progression using itertools๐
itertools
has over 20 generator function that can be combine to make interesting functions- Example :
itertools.count
which return a number yielding generator. - NOTE: above function is unbound when no args are passed. Optinally you should pass
(start, step)
. donโt try to cast it in listlist(itertools.count())
: this will overflow memory.
import itertools
def aritprog_gen(begin, step, end=None):
first = type(begin + step)(begin)
ap_gen = itertools.count(first, step)
if end is None:
return ap_gen
return itertools.takewhile(lambda n: n < end, ap_gen) # notice we return a generator
Generator Functions in the Standard Library๐
Module | Function | Description |
---|---|---|
itertools | compress | Consumes two iterables in parallel; yields items from it whenever the corresponding item in selector_it is truthy |
itertools | dropwhile | Consumes it , skipping items while predicate computes truthy, then yields every remaining item (no further checks are made) |
built-in | filter | Applies predicate to each item of iterable, yielding the item if predicate(item) is truthy; if predicate is None, only truthy items are yielded |
itertools | filterfalse | Same as filter , with the predicate logic negated: yields items whenever predicate computes falsy |
itertools | islice | Yields items from a slice of it , similar to s[:stop] or s[start:stop:step] except it can be any iterable, and the operation is lazy |
itertools | takewhile | Yields items while predicate computes truthy, then stops and no further checks are made |
def vowel(c):
return c.lower() in 'aeiou'
list(filter(vowel, 'Aardvark'))
# ['A', 'a', 'a']
list(itertools.filterfalse(vowel, 'Aardvark'))
# ['r', 'd', 'v', 'r', 'k']
list(itertools.dropwhile(vowel, 'Aardvardk')
# ['r', 'd', 'v', 'a', 'r', 'k']
list(itertools.takewhile(vowel, 'Aardvark'))
# ['A', 'a']
list(itertools.compress('Aardvark', (1, 0, 1, 1, 0, 1)))
# ['A', 'r', 'd', 'a']
list(itertools.islice('Aardvark', 4))
# ['A', 'a', 'r', 'd']
Here's the information converted into a markdown table with the source included:
Module | Function | Description | Source |
---|---|---|---|
itertools | accumulate(it, [func]) | Yields accumulated sums; if func is provided, yields the result of applying it to the first pair of items, then to the first result and next item, etc. | (built-in) |
built-in | enumerate(iterable, start=0) | Yields 2-tuples of the form (index, item), where index is counted from start, and item is taken from the iterable | (built-in) |
built-in | map(func, it1, [it2, โฆ, itN]) | Applies func to each item of it, yielding the result; if N iterables are given, func must take N arguments and the iterables will be consumed in parallel | (built-in) |
itertools | starmap(func, it) | Applies func to each item of it, yielding the result; the input iterable should yield iterable items iit, and func is applied as func(*iit) | itertools |
sample = [5, 4, 2, 8, 7, 6, 3, 0, 9, 1]
import itertools
result1 = list(itertools.accumulate(sample))
# Output: [5, 9, 11, 19, 26, 32, 35, 35, 44, 45]
result2 = list(itertools.accumulate(sample, min))
# Output: [5, 4, 2, 2, 2, 2, 2, 0, 0, 0]
result3 = list(itertools.accumulate(sample, max))
# Output: [5, 5, 5, 8, 8, 8, 8, 8, 9, 9]
import operator
result4 = list(itertools.accumulate(sample, operator.mul))
# Output: [5, 20, 40, 320, 2240, 13440, 40320, 0, 0, 0]
result5 = list(itertools.accumulate(range(1, 11), operator.mul))
# Output: [1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800]
# Example 1
result1 = list(enumerate('albatroz', 1))
# Output: [(1, 'a'), (2, 'l'), (3, 'b'), (4, 'a'), (5, 't'), (6, 'r'), (7, 'o'), (8, 'z')]
import operator
result2 = list(map(operator.mul, range(11), range(11)))
# Output: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
result3 = list(map(operator.mul, range(11), [2, 4, 8]))
# Output: [0, 4, 16]
result4 = list(map(lambda a, b: (a, b), range(11), [2, 4, 8]))
# Output: [(0, 2), (1, 4), (2, 8)]
import itertools
result5 = list(itertools.starmap(operator.mul, enumerate('albatroz', 1)))
# Output: ['a', 'll', 'bbb', 'aaaa', 'ttttt', 'rrrrrr', 'ooooooo', 'zzzzzzzz']
sample = [5, 4, 2, 8, 7, 6, 3, 0, 9, 1]
result6 = list(itertools.starmap(lambda a, b: b / a, enumerate(itertools.accumulate(sample), 1)))
# Output: [5.0, 4.5, 3.6666666666666665, 4.75, 5.2, 5.333333333333333, 5.0, 4.375, 4.888888888888889, 4.5]
Module | Function | Description |
---|---|---|
itertools | chain(it1, โฆ, itN) | Yields all items from it1, then from it2, etc., seamlessly |
itertools | chain.from_iterable(it) | Yields all items from each iterable produced by it, one after the other, seamlessly; it will be an iterable where the items are also iterables, for example, a list of tuples |
itertools | product(it1, โฆ, itN, repeat=1) | Cartesian product: yields N-tuples made by combining items from each input iterable, like nested for loops could produce; repeat allows the input iterables to be consumed more than once |
built-in | zip(it1, โฆ, itN, strict=False) | Yields N-tuples built from items taken from the iterables in parallel, silently stopping when the first iterable is exhausted, unless strict=True is given |
itertools | zip_longest(it1, โฆ, itN, fillvalue=None) | Yields N-tuples built from items taken from the iterables in parallel, stopping only when the last iterable is exhausted, filling the blanks with the fillvalue |
Module | Function | Description |
---|---|---|
itertools | combinations(it, out_len) | Yields combinations of out_len items from the items yielded by it |
itertools | combinations_with_replacement(it, out_len) | Yields combinations of out_len items from the items yielded by it, including combinations with repeated items |
itertools | count(start=0, step=1) | Yields numbers starting at start, incremented by step, indefinitely |
itertools | cycle(it) | Yields items from it, storing a copy of each, then yields the entire sequence repeatedly, indefinitely |
itertools | pairwise(it) | Yields successive overlapping pairs taken from the input iterable |
itertools | permutations(it, out_len=None) | Yields permutations of out_len items from the items yielded by it; by default, out_len is len(list(it)) |
itertools | repeat(item, [times]) | Yields the given item repeatedly, indefinitely unless a number o |
Module | Function | Description |
---|---|---|
itertools | groupby(it, key=None) | Yields 2-tuples of the form (key, group), where key is the grouping criterion and group is a generator yielding the items in the group |
built-in | reversed(seq) | Yields items from seq in reverse order, from last to first; seq must be a sequence or implement the reversed special method |
itertools | tee(it, n=2) | Yields a tuple of n generators, each yielding the items of the input iterable independently |
Iterable Reducing Function๐
Module | Function | Description |
---|---|---|
built-in | all(it) | Returns True if all items in it are truthy, otherwise False; all([]) returns True |
built-in | any(it) | Returns True if any item in it is truthy, otherwise False; any([]) returns False |
built-in | max(it, [key=,] [default=]) | Returns the maximum value of the items in it; a key is an ordering function, as in sorted; default is returned if the iterable is empty |
built-in | min(it, [key=,] [default=]) | Returns the minimum value of the items in it; a key is an ordering function, as in sorted; default is returned if the iterable is empty |
functools | reduce(func, it, [initial]) | Returns the result of applying func to the first pair of items, then to that result and the third item, and so on; if given, initial forms the initial pair with the first item |
built-in | sum(it, start=0) | The sum of all items in it, with the optional start value added (use math.fsum for better precision when adding floats) |
Subgenerators with yield from๐
- Before
yield from
keyword was introduced, subgenerators were declared as
- Now after python3.3
NOTE:
def sub_gen():
yield 1.1
yield 1.2
return "DONE"
def gen():
yield 1
result = yield from sub_gen() # gen delegates yielding to sub_gen, and it yields till its finished.
print("<--", result)
yield 2
for x in gen():
print(x)
# Output :
1
1.1
1.2
<-- Done!
2
Reinventing Chain๐
Traversing a Tree (Printing pythonโs Exception Hierarchy)๐
def tree(cls, lvl = 0):
yield cls.__name__, lvl # yield class and its level
for sub_cls in cls.__subclasses__(): # for each subclass yield subclass and its level
yield from tree(sub_cls, lvl + 1)
def display(cls):
for cls_name, level in tree(cls):
indent = ' ' * 4 * level # some indentation
print(f'{indent}{cls_name}')
if __name__ == '__main__':
display(BaseException)
Generic Iterable Types๐
- we can annotate functions which accept iterable arguments using
collection.abc.Iterable
ortyping.Iterable
from collections.abc import Iterable
FromTo = tuple[str, str]
def zip_replace(text: str, changes: Iterable[FromTo])-> str:
for from_, to in changes:
text = text.replace(from_, to)
return text
from collections.abc import Iterator
def fib() -> Iterator[int] # return type for generators coded as functions with yield
a, b = 0, 1
while True:
yield a
a, b = b, a+b
- Note for generators there is a
collections.abc.Generator
type. Iterator[T]
is a shortcut forGenerator[T, None, None]
: a generator with no args and no return value.
Classic Coroutines๐
- generators are commonly used as iterators, but they are also used for coroutines
- A coroutine is really a generator function, created with
yield
keyword in its body. And a coroutines object is physically a generator object.
-
The typing documentation describes the formal type parameters of Generator like this:
Generator[YieldType, SendType, ReturnType]
The
SendType
is only relevent in case of using it as a coroutine, return type is only relevant for coroutine, because iterators donโt return values like regular function.
- Generator type has the same tyupe parameters as
typing.Coroutine
:Coroutine[YieldType, SendType, ReturnType]
David Beazley created some of the best talks and most comprehensive workshops about classic coroutines. In his PyCon 2009 course handout, he has a slide titled โKeeping It Straight,โ which reads:
- Generators produce data for iteration
- Coroutines are consumers of data
- To keep your brain from exploding, donโt mix the two concepts together
- Coroutines are not related to iteration
- Note: There is a use of having
yield
produce a value in a coroutine, but itโs not tied to iteration.
Coroutine to Compute a Running Average๐
from collections.abc import Generator
def averager() -> Generator[float, float, None]: # return type is none, args is float
total = 0.0
count = 0
average = 0.0
while True: # infinite
term = yield average # yield average
total += term
count += 1
average = total/count
coro_avg = averager()
next(coro_avg) # 0.0 # initial value of average 0
coro_avg.send(10) # 10 # each call to send yield avg
coro_avg.send(30) # 20
coro_avg.send(5) # 15
Returning a Value from a coroutine๐
from collections.abc import Generator
from typing import Union, NamedTuple
class Result(NamedTuple): # subclass of tupe
count: int # type: ignore
average: float
class Sentinel:
def __repr__(self): # readable
return f'<Sentinel>'
STOP = Sentinel()
SendType = Union[float, Sentinel] # SendType: TypeAlias = float | Sentinel
def averager2(verbose: bool = False) -> Generator[None, SendType, Result]: # yield type is none, it recieved data and returns the result
total = 0.0
count = 0
average = 0.0
while True:
term = yield # this yields None, but recieves from send(item)
if verbose:
print('received:', term)
if isinstance(term, Sentinel): # term is a sentinel break from the loop
break
total += term
count += 1
average = total / count
return Result(count, average)
coro_avg = averager2()
next(coro_avg)
coro_avg.send(10) # no results
coro_avg.send(30)
coro_avg.send(6.5)
coro_avg.close() # close stops but doesn't return result
coro_avg = averager2()
next(coro_avg)
coro_avg.send(10)
coro_avg.send(30)
coro_avg.send(6.5)
try:
coro_avg.send(STOP) # makes sentinel break
except StopIteration as exc:
result = exc.value # notice value attached to exc
result
Result(count=3, average=15.5)
Generic Type Hints for Classic Coroutines๐
# typing.py definition of generator
T_co = TypeVar('T_co', covariant=True)
V_co = TypeVar('V_co', covariant=True)
T_contra = TypeVar('T_contra', contravariant=True)
# many lines omitted
class Generator(Iterator[T_co], Generic[T_co, T_contra, V_co],
extra=_G_base):
That generic type declaration means that a Generator
type hint requires those three type parameters weโve seen before:
Using the notation introduced in โCovariant typesโ, the covariance of the first and third parameters is expressed by the :> symbols pointing in the same direction:
float :> int
Generator[float, Any, float] :> Generator[int, Any, int]
# --------------------------------------------------------
float :> int
Generator[Any, float, Any] <: Generator[Any, int, Any]
Variance rules of thumb
- If a formal type parameter defines a type for data that comes out of the object, it can be covariant.
- If a formal type parameter defines a type for data that goes into the object after its initial construction, it can be contravariant.