Ch1 2
An Array of Sequencesπ
There are two types of Sequences
- Container Sequences :
list
,tuple
, andcollections.deque
: holds references to object it contains (of any type) - Flat Sequences :
str
,bytes
, andarray.array
: stores the value of its contents in its own memory space (space efficient but can only hold primitive types)
Another way of grouping sequence is by mutability
- Mutable Sequence (changed) :
list
,bytearray
,array.array
andcollections.deque
- Immutable Sequence :
tuple
,str
andbytes
NOTE: mutable sequences inherit all methods from immutable sequences and implement several additional methods.
List Comprehensionπ
Example : build a list of unicode characters
symbols = 'abcde'
codes = [ord(symbol) for symbol in symbols]
print(codes)
# using a loop
# codes = []
# for symbol in symbols:
# codes.append(ord(symbol))
NOTE : use list comprehension when you are building a list but doing a lot of more than that always use for loops as they are more readible.
Local Scope in List Comprehension
If we use :=
(walrus operator) while assigning value then its accesible even after list comprehension or generator expression.
NOTE: listcomps are as fast as map/filter/reduce operation sometimes outperforming them.
Catersian Productsπ
colors = ['black', 'white']
size = ['S', 'M', 'L']
tshirts = [(color, size) for color in colors for size in sizes]
# it does cartesion product of colors x size
# ouptut : [('black', 'S'), ('black', 'M'), ('black', 'L'), ('white', 'S'),
# ('white', 'M'), ('white', 'L')]
Generator Expressions (genexp)π
you could always use listcomp, but genexp saves memory because it yields items one by one using the iterator protocol instead of building a whole list just to feed to another constructor.
symbols = 'lion'
tuple(ord(symbol) for symbol in symbols) # notice now its not [] instead ()
import array
array.array('I', ord(symbol) for symbol in symbols)
# in previous catesian product lets say if we had 1000 records then its would be very memory heavy to generate all those lists
for tshirts in (f'{c}{s}' for c in colors for s in sizes):
print(tshirt)
Tuplesπ
Tuples are not just immutable list
they can be used for two purpose, immutable list and records with no field names.
lax_coordinates = (33.9425, -118.408056)
# tuple unpacking
city, year, pop, chg, area = ('Tokyo', 2003, 32_450, 0.66, 8014)
# slots based tuple unpacking
traveler_ids = [('USA', '31195855'), ('BRA', 'CE342567')]
for passport in sorted(traveler_ids):
print('%s/%s' % passport)
# nested unpacking
name, _, _, (lat, lon) = ('Tokyo', 'JP', 36.933, (35.689722, 139.691667))
Tuple as Immutable Listsπ
Python interpretor and standard library makes extensive use of tuple as immutable list. It has 2 benefits
- Clarity (length never changes)
- Performance (uses less memory than a list of same length)
NOTE: tuple
immutability applies only to references contained in it but if references are mutable objects then they can be modified but not their reference.
Ex : a = (10, 'alpha', [1,2])
, this is a bad practice as list can be modified in this tuple effectively changing tuple.
An object is only hashable if its value cannot ever change, as a result we canβt use unhashable tuple as a dict key or a set element.
# handy function to quickly check if an object is hashable
def fixed(obj):
try:
hash(obj)
except TypeError:
return False
return True
Are Tuples more efficient than list in Python
a, b = b, a # unpacking based swapping trick
# another use is argument unpacking
t = (20, 8)
# q, r = divmod(20,8)
q, r = divmod(*t)
Using *
to Grab Excess Itemsπ
a, b, *rest = range(5) # (1, 2, [3,4,5])
a, *mid, c = range(5) # (1, [2, 3, 4], 5)
*s , b, c = range(5) # ([1, 2, 3, 4, 5])
Unpacking with *
in Function Calls and Sequence Literalsπ
def fun(a, b, c, d, *rest):
return a, b, c, d, rest
fun(*[1,2], 3, *range(4,7)) # (1, 2, 3, 4, (5, 6))
tupl = *range(4), 4 # (1,2,3,4)
lst = [*range(4), 4] # [1,2,3,4]
st = {*range(4), 4, *(5,6,7)} # {1, 2, 3, 4, 5, 6, 7}
Pattern Matching with Sequencesπ
Matching commands like BEEPER 440 3
def handle_command(self, message):
match message:
case ['BEEPER', frequency, times]:
self.beep(times, frequency)
case ['NECK', angle]:
self.rotate_neck(angle)
case ['LED', ident, intensity]:
self.leds[ident].set_color(ident, red, green, blue)
case _:
raise InvalidCommand(message)
It looks like simple C match-case but its supports destructuring
(commonly used in Scala and Elixir).
# note how [] can be multiline
metro_areas = [
('Tokyo', 'JP', 36.933, (35.689722, 139.691667)),
('Delhi NCR', 'IN', 21.935, (28.613889, 77.208889)),
]
def func():
for record in metro_ares:
match record:
case [name, _, _, (lat, lon)] if lon <= 0:
print(f"{name:15} | {lat:9.4f} | {lon:9.4f}")
Here sequence pattern matches only if
- subject is a sequence
- subject and pattern have same number of items (could be avoided by variable unpacking but should not use it due to unexpected bugs)
- each corresponding items matches, including nested ones
In standard library these types are compatible with sequence patterns : list, memoryview, array.array, tuple, range, collection.deque
. Unlike unpacking patterns donβt destructure iterables that are not sequences like iterators.
We could also use as
keyword in case : case [name, _, _, (lat, lon) as coord]
.
We could make patterns more specific using types : case [str(name), _, _, (float(lat), float(lon))]
or this also works case[str(name), *_, (float(lat), float(lon))]
(ignore all variables in between)
Interesting Example : Link Peter Norvig wrote lis.py
: an interpretor for a subset of Scheme dialect of List Programming language in 132 lines of code. Notice how he move his code commits from elif to match-case as python 3.10 progresses
Alternative patterns for lambdaπ
Syntax of lambda in Scheme : (lambda(params ...) body1 body2)
, this could be written as case ['lambda', [*params], *body] if body
Shortcut syntax for function definitionπ
syntax : define (name param ...) body1 body2 ...)
case ['define', [Symbol() as name, *params], *body] if body:
env[name] = Procedure(param, body, env)
Slicingπ
A common feature of list
, tuple
, str
and all sequence types in Python is support of slicing operation.
NOTE: slices and ranges exclude the last element (following 0 indexing of C)
NOTE: length of slice = stop-start, slicing an array into two equal part at x is arr[:x]
and arr[x:]
NOTE: notation a:b:c
is only valid within []
, it produces a slice object: slice(a, b, c)
which is evaluated usinng seq.__getitem__(slice(a,b,c))
.
Multidimensional Slicing and Ellipsisπ
[]
operator can take multiple indexes or slices separated by commas. Python calls a.__getitem__((i,j))
to evaluate a[i,j]
. Notice how (i,j) is a tuple. Its used NumPy packages a lot, where we can fetch items of 2D numpy.ndarray
like this : a[m:n, k:l]
Except for memoryview
, the built-in sequence types in python are one-dimensional.
The ellipsis (...
) is recognized as a token by the Python parser. It is an alias to Ellipsis
object. It can be passed as an argument to functions and as part of a slice specification, as in f(a, ..., z)
or a[i:...]
.Numpy uses ...
as a shortcut for multidimensional slicing. for e.g. 4 dimensional array, x[i, ...]
is evaluated as x[i, :, :, :]
Assigning to slicingπ
Mutable objects can be grafted, excised or modified in place using slicing
Using + and * with Sequencesπ
+
concatenates the two same type of sequences while *
concatenates the same sequence to itself. (as many times scalar is specified: [1]*4] = [1, 1, 1, 1]
)
NOTE: take note of a*n
expressions as if you do mylist = [[] * 3]
it will create a list with three references to same list, causing weird behaviour.
Augmented Assignment with Sequencesπ
+=
and *=
behave quite differently depending on first operand. +=
internally calls __iadd__
(in-place addition). In case of mutable sequences (list
, bytearray
, array.array
) a
will be changed in-place more like a.extend(b)
. However if __iadd__
is not implemented then expression will have same effect as a = a + b
, evalute addition and assign to a
.
l = [1,2,3]
id(l) # prints object id
l *= 2
id(l) # should be same as before
# tuple
t = (1,2,3)
id(t)
t *= 2
id(t) # should be differnt id (object is augmented)
list.sort v/s sorted Built-inπ
list.sort
method sorts list in-place, without making copy. It return None
to remind that it changes the receiver and doesnβt create a new list. (IMPORTANT API Convention in python).
sorted
creates a new list and return it. It accepts any iterable object as argument, including immutable sequences and generators. Regardless of type sorted always returns sorted list!
Both function accept two optional, keyword only arguments:
reverse
: If true items are in descending order
key
: A one-argument function that applied to each item to produce its sorting key. default is the item itself but commonly used ones are key=str.lower
or key=len
.
Once sorted you can efficiently search using binary search using bisect
module of python. you can use bisect.insort
to keep your sorted sequences sorted. link
When a List is not the Answerπ
list
is flexible and easy to use, but depending on specific requirements there are better options like array
which save lots of memory when you need to handle millions of floating-point values. If you are constantly removing items from opposite ends of a list use collections.deque
Arrayπ
array.array
is more efficient for storing same type of data. it supports all mutable sepquence operations as well as additional methods for fast loading and saving like .frombytes
and .tofile
from array import array
from random import random
floats = array('d', (random() for i in range(10**7)))
print(floats[-1])
# should use context-manager for this
fp = open('floats.bin', 'wb')
floats.tofile(fp)
fp.close()
# with open('floats.bin', 'wb') as fp:
# floats.tofile(fp)
floats2 = array('d')
fp = open('floats.bin', 'rb')
floats2.fromfile(fp, 10**7)
fp.close()
floats2[-1]
NOTE: array
doesnβt support in-place sort method like list.sort()
, we will need to use built-in sorted
function to rebuild array
a = array.array(a.typecode, sorted(a))
Memory Viewsπ
The built-in memoryview
class is a shared memory sequence type that lets you handles slices of arrays without copying bytes.
A memoryview is essentially a generalized NumPy array structure in Python itself (without the math). It allows you to share memory between data-structures (things like PIL images, SQLite databases, NumPy arrays, etc.) without first copying. This is very important for large data sets.
from array import array
octets array('B', range(6))
m1 = memoryview(octets)
m1.tolist() # [0, 1, 2, 3, 4, 5]
m2 = m1.cast('B', [2,3])
m2.tolist() # [[0, 1, 2], [3, 4, 5]]
m3 = m1.case('B', [3,2])
m3.tolist() # [[0,1],[2,3],[4,5]]
You can use it to create batches from array.
Brief introduction to Numpyπ
For advanced array and matrix operation, Numpy is used. Numpy implements multi-dimensional, homogenous arrays and matrix types that hold not only numbers but also use-defined records and provides efficient element-wise operations.
import numpy as np
a = np.arange(12)
type(a)
print(a.shape)
a.shape = 3, 4
print(a) # [[0,1,2,3],[4,5,6,7],[8,9,10,11]]
print(a[:, 1]) # [1, 5, 9]
a.transpose() # [[0,4,8], [1,5,9], [2,6,10], [3,7,11]]
# trick used in ml to get features and labels
X = a[:,:-1]
y = a[:, -1]
Deques and Other Queuesπ
.append
and .pop
methods make a list
usable as a stack or a queue. But inserting and removing from head of a list is costly because the entire list is shifted in memory.
collections.deque
is a thread-safe double ended queue designed for fast inserting and removing from both ends. Its is also the way to keep list of last seen items or something because it can be bounded(fixed size).
when you try to insert in a full bounded deque, it discards item from opposite end and inserts the item
Besides deque there are some other Python standard library implementations of queues
from collection import deque
dq = deque(range(10), maxlen = 10)
print(dq)
dq.rotate(3)
print(dq)
dq.rotate(-4)
print(dq) # deque is restored
dq.appendleft(-1)
dq.extend([11,12,13]) # notice how lenght is fixed
dq.extendleft([10,20,30]) # notice how items on other end are discarded
queue
: This provides the synchronized(thread-safe) classesSimpleQueue
,Queue
,LifoQueue
andPriorityQueue
. These can be used for safe communication between threads. All exceptSimpleQueue
can be bounded by providing amaxsize
argument greater than 0 to the constructor. However they donβt discard items to make room asdeque
does. Instead it blocks the inserts of a new item until some thread makes room in the queue for items by taking from queue. It is usefull for implement n-worker threads models etc.multiprocessing
: Implements its ownSimpleQueue
and boundedQueue
but designed for interprocess communication. A specializedmultiprocessing.JoinableQueue
is provided for task management.asyncio
: ProvidesQueue
,LifoQueue
,PriorityQueue
andJoinableQueue
with APIs inspired by the class inqueue
andmultiprocessing
modules but adapted for asynchronous programming.headpq
: doesnβt implement queue class instead provide functions likeheappush
andheappop
that lets you use mutable sequence as heap queue or priority queue.