Skip to content

Ch1 2

An Array of SequencesπŸ”—

There are two types of Sequences

  • Container Sequences : list, tuple, and collections.deque : holds references to object it contains (of any type)
  • Flat Sequences : str, bytes, and array.array : stores the value of its contents in its own memory space (space efficient but can only hold primitive types)

Another way of grouping sequence is by mutability

  • Mutable Sequence (changed) : list, bytearray, array.array and collections.deque
  • Immutable Sequence : tuple, str and bytes

NOTE: mutable sequences inherit all methods from immutable sequences and implement several additional methods.

List ComprehensionπŸ”—

Example : build a list of unicode characters

symbols = 'abcde'
codes = [ord(symbol) for symbol in symbols]
print(codes)

# using a loop
# codes = []
# for symbol in symbols:
#       codes.append(ord(symbol))

NOTE : use list comprehension when you are building a list but doing a lot of more than that always use for loops as they are more readible.

Local Scope in List Comprehension

If we use := (walrus operator) while assigning value then its accesible even after list comprehension or generator expression.

x = 'ABC'
codes = [last := ord(c) for c in x]
print(codes)    # [65, 66, 67]
print(last)     # 67

NOTE: listcomps are as fast as map/filter/reduce operation sometimes outperforming them.

Catersian ProductsπŸ”—

colors = ['black', 'white']
size = ['S', 'M', 'L']

tshirts = [(color, size) for color in colors for size in sizes]
# it does cartesion product of colors x size
# ouptut : [('black', 'S'), ('black', 'M'), ('black', 'L'), ('white', 'S'),
# ('white', 'M'), ('white', 'L')]

Generator Expressions (genexp)πŸ”—

you could always use listcomp, but genexp saves memory because it yields items one by one using the iterator protocol instead of building a whole list just to feed to another constructor.

symbols = 'lion'
tuple(ord(symbol) for symbol in symbols)    # notice now its not [] instead ()
import array
array.array('I', ord(symbol) for symbol in symbols)

# in previous catesian product lets say if we had 1000 records then its would be very memory heavy to generate all those lists
for tshirts in (f'{c}{s}' for c in colors for s in sizes):
    print(tshirt)

TuplesπŸ”—

Tuples are not just immutable list they can be used for two purpose, immutable list and records with no field names.

lax_coordinates = (33.9425, -118.408056)
# tuple unpacking
city, year, pop, chg, area = ('Tokyo', 2003, 32_450, 0.66, 8014)

# slots based tuple unpacking
traveler_ids = [('USA', '31195855'), ('BRA', 'CE342567')]
for passport in sorted(traveler_ids):
  print('%s/%s' % passport)

# nested unpacking 
name, _, _, (lat, lon) = ('Tokyo', 'JP', 36.933, (35.689722, 139.691667))

Tuple as Immutable ListsπŸ”—

Python interpretor and standard library makes extensive use of tuple as immutable list. It has 2 benefits

  • Clarity (length never changes)
  • Performance (uses less memory than a list of same length)

NOTE: tuple immutability applies only to references contained in it but if references are mutable objects then they can be modified but not their reference.

Ex : a = (10, 'alpha', [1,2]) , this is a bad practice as list can be modified in this tuple effectively changing tuple.

An object is only hashable if its value cannot ever change, as a result we can’t use unhashable tuple as a dict key or a set element.

# handy function to quickly check if an object is hashable
def fixed(obj):
  try:
    hash(obj)
  except TypeError:
    return False
  return True

Are Tuples more efficient than list in Python

a, b = b, a # unpacking based swapping trick
# another use is argument unpacking
t = (20, 8)
# q, r = divmod(20,8)
q, r = divmod(*t)

Using * to Grab Excess ItemsπŸ”—

a, b, *rest = range(5)  # (1, 2, [3,4,5])
a, *mid, c = range(5)       # (1, [2, 3, 4], 5)
*s , b, c = range(5)        # ([1, 2, 3, 4, 5])

Unpacking with * in Function Calls and Sequence LiteralsπŸ”—

def fun(a, b, c, d, *rest):
    return a, b, c, d, rest

fun(*[1,2], 3, *range(4,7)) # (1, 2, 3, 4, (5, 6))

tupl = *range(4), 4     # (1,2,3,4)
lst = [*range(4), 4]    # [1,2,3,4]
st = {*range(4), 4, *(5,6,7)} # {1, 2, 3, 4, 5, 6, 7}

Pattern Matching with SequencesπŸ”—

Matching commands like BEEPER 440 3

def handle_command(self, message):
  match message:
    case ['BEEPER', frequency, times]:
      self.beep(times, frequency)
    case ['NECK', angle]:
      self.rotate_neck(angle)
    case ['LED', ident, intensity]:
      self.leds[ident].set_color(ident, red, green, blue)
    case _:
      raise InvalidCommand(message)

It looks like simple C match-case but its supports destructuring (commonly used in Scala and Elixir).

# note how [] can be multiline
metro_areas = [
    ('Tokyo', 'JP', 36.933, (35.689722, 139.691667)),
    ('Delhi NCR', 'IN', 21.935, (28.613889, 77.208889)),
]
def func():
  for record in metro_ares:
    match record:
      case [name, _, _, (lat, lon)] if lon <= 0:
        print(f"{name:15} | {lat:9.4f} | {lon:9.4f}")

Here sequence pattern matches only if

  • subject is a sequence
  • subject and pattern have same number of items (could be avoided by variable unpacking but should not use it due to unexpected bugs)
  • each corresponding items matches, including nested ones

In standard library these types are compatible with sequence patterns : list, memoryview, array.array, tuple, range, collection.deque. Unlike unpacking patterns don’t destructure iterables that are not sequences like iterators.

We could also use as keyword in case : case [name, _, _, (lat, lon) as coord].

We could make patterns more specific using types : case [str(name), _, _, (float(lat), float(lon))] or this also works case[str(name), *_, (float(lat), float(lon))] (ignore all variables in between)

Interesting Example : Link Peter Norvig wrote lis.py : an interpretor for a subset of Scheme dialect of List Programming language in 132 lines of code. Notice how he move his code commits from elif to match-case as python 3.10 progresses

Alternative patterns for lambdaπŸ”—

Syntax of lambda in Scheme : (lambda(params ...) body1 body2), this could be written as case ['lambda', [*params], *body] if body

Shortcut syntax for function definitionπŸ”—

syntax : define (name param ...) body1 body2 ...)

case ['define', [Symbol() as name, *params], *body] if body:
  env[name] = Procedure(param, body, env)

SlicingπŸ”—

A common feature of list, tuple, str and all sequence types in Python is support of slicing operation.

NOTE: slices and ranges exclude the last element (following 0 indexing of C)

# slice = arr[start:stop:step]

NOTE: length of slice = stop-start, slicing an array into two equal part at x is arr[:x] and arr[x:]

NOTE: notation a:b:c is only valid within [], it produces a slice object: slice(a, b, c) which is evaluated usinng seq.__getitem__(slice(a,b,c)).

Multidimensional Slicing and EllipsisπŸ”—

[] operator can take multiple indexes or slices separated by commas. Python calls a.__getitem__((i,j)) to evaluate a[i,j]. Notice how (i,j) is a tuple. Its used NumPy packages a lot, where we can fetch items of 2D numpy.ndarray like this : a[m:n, k:l]

Except for memoryview, the built-in sequence types in python are one-dimensional.

The ellipsis (...) is recognized as a token by the Python parser. It is an alias to Ellipsis object. It can be passed as an argument to functions and as part of a slice specification, as in f(a, ..., z) or a[i:...].Numpy uses ... as a shortcut for multidimensional slicing. for e.g. 4 dimensional array, x[i, ...] is evaluated as x[i, :, :, :]

Assigning to slicingπŸ”—

Mutable objects can be grafted, excised or modified in place using slicing

l = list(range(10))
l[2:5] = [20,30] # modifying
del l[5:7]  # removing items

Using + and * with SequencesπŸ”—

+ concatenates the two same type of sequences while * concatenates the same sequence to itself. (as many times scalar is specified: [1]*4] = [1, 1, 1, 1])

NOTE: take note of a*n expressions as if you do mylist = [[] * 3] it will create a list with three references to same list, causing weird behaviour.

Augmented Assignment with SequencesπŸ”—

+= and *= behave quite differently depending on first operand. += internally calls __iadd__ (in-place addition). In case of mutable sequences (list, bytearray, array.array) a will be changed in-place more like a.extend(b). However if __iadd__ is not implemented then expression will have same effect as a = a + b , evalute addition and assign to a.

l = [1,2,3]
id(l)   # prints object id
l *= 2
id(l) # should be same as before

# tuple
t = (1,2,3)
id(t)
t *= 2
id(t)   # should be differnt id (object is augmented)

list.sort v/s sorted Built-inπŸ”—

list.sort method sorts list in-place, without making copy. It return None to remind that it changes the receiver and doesn’t create a new list. (IMPORTANT API Convention in python).

sorted creates a new list and return it. It accepts any iterable object as argument, including immutable sequences and generators. Regardless of type sorted always returns sorted list!

Both function accept two optional, keyword only arguments:

reverse : If true items are in descending order

key: A one-argument function that applied to each item to produce its sorting key. default is the item itself but commonly used ones are key=str.lower or key=len.

Once sorted you can efficiently search using binary search using bisect module of python. you can use bisect.insort to keep your sorted sequences sorted. link

When a List is not the AnswerπŸ”—

list is flexible and easy to use, but depending on specific requirements there are better options like array which save lots of memory when you need to handle millions of floating-point values. If you are constantly removing items from opposite ends of a list use collections.deque

ArrayπŸ”—

array.array is more efficient for storing same type of data. it supports all mutable sepquence operations as well as additional methods for fast loading and saving like .frombytes and .tofile

from array import array
from random import random
floats = array('d', (random() for i in range(10**7)))
print(floats[-1])
# should use context-manager for this
fp = open('floats.bin', 'wb')
floats.tofile(fp)
fp.close()

# with open('floats.bin', 'wb') as fp:
#   floats.tofile(fp)
floats2 = array('d')
fp = open('floats.bin', 'rb')
floats2.fromfile(fp, 10**7)
fp.close()
floats2[-1]

NOTE: array doesn’t support in-place sort method like list.sort(), we will need to use built-in sorted function to rebuild array

a = array.array(a.typecode, sorted(a))

Memory ViewsπŸ”—

The built-in memoryview class is a shared memory sequence type that lets you handles slices of arrays without copying bytes.

A memoryview is essentially a generalized NumPy array structure in Python itself (without the math). It allows you to share memory between data-structures (things like PIL images, SQLite databases, NumPy arrays, etc.) without first copying. This is very important for large data sets.

from array import array
octets array('B', range(6))
m1 = memoryview(octets)
m1.tolist() # [0, 1, 2, 3, 4, 5]
m2 = m1.cast('B', [2,3])
m2.tolist() # [[0, 1, 2], [3, 4, 5]]
m3 = m1.case('B', [3,2])
m3.tolist() # [[0,1],[2,3],[4,5]]

You can use it to create batches from array.

Brief introduction to NumpyπŸ”—

For advanced array and matrix operation, Numpy is used. Numpy implements multi-dimensional, homogenous arrays and matrix types that hold not only numbers but also use-defined records and provides efficient element-wise operations.

import numpy as np
a = np.arange(12)
type(a)
print(a.shape)
a.shape = 3, 4
print(a) #  [[0,1,2,3],[4,5,6,7],[8,9,10,11]]
print(a[:, 1]) # [1, 5, 9]
a.transpose() # [[0,4,8], [1,5,9], [2,6,10], [3,7,11]]
# trick used in ml to get features and labels
X = a[:,:-1]
y = a[:, -1]

Deques and Other QueuesπŸ”—

.append and .pop methods make a list usable as a stack or a queue. But inserting and removing from head of a list is costly because the entire list is shifted in memory.

collections.deque is a thread-safe double ended queue designed for fast inserting and removing from both ends. Its is also the way to keep list of last seen items or something because it can be bounded(fixed size).

when you try to insert in a full bounded deque, it discards item from opposite end and inserts the item

Besides deque there are some other Python standard library implementations of queues

from collection import deque
dq = deque(range(10), maxlen = 10)
print(dq)
dq.rotate(3)
print(dq)
dq.rotate(-4)
print(dq) # deque is restored
dq.appendleft(-1)
dq.extend([11,12,13]) # notice how lenght is fixed
dq.extendleft([10,20,30]) # notice how items on other end are discarded
  • queue : This provides the synchronized(thread-safe) classes SimpleQueue, Queue, LifoQueue and PriorityQueue. These can be used for safe communication between threads. All except SimpleQueue can be bounded by providing a maxsize argument greater than 0 to the constructor. However they don’t discard items to make room as deque does. Instead it blocks the inserts of a new item until some thread makes room in the queue for items by taking from queue. It is usefull for implement n-worker threads models etc.
  • multiprocessing : Implements its own SimpleQueue and bounded Queue but designed for interprocess communication. A specialized multiprocessing.JoinableQueue is provided for task management.
  • asyncio : Provides Queue, LifoQueue, PriorityQueue and JoinableQueue with APIs inspired by the class in queue and multiprocessing modules but adapted for asynchronous programming.
  • headpq : doesn’t implement queue class instead provide functions like heappush and heappop that lets you use mutable sequence as heap queue or priority queue.