ADTs in Python
I'm writing a piece of software that consists of a Rust core and a Python GUI wrapper. These two components exchange serialised data through a pair of pipes. In Rust, we can model everything neatly using structs and enums -- but how do we mirror the data model in Python?
Requirements:
- 1-to-1 correspondence of data models on both sides (necessary for serialisation)
- full mypy-compatibility
- compositionality
- efficiency -- run-time metaprogramming and fancy pattern matching metaprograms won't work here
Constraints:
- we need to use tuples to save memory in big data structures
- NamedTuple must be the only base class for any subclass
The best compromise I've found so far is the following. Suppose we're writing a generate&test algorithm with four available modes:
from typing import Union, NamedTuple, List
class Exhaustive(NamedTuple):
tag : int = 0
class UniformSample(NamedTuple):
sample_count : int
tag : int = 1
class ExponentialSample(NamedTuple):
sample_count : int
tag : int = 2
class Fixed(NamedTuple):
samples : List[int]
tag : int = 3
Generator = Union[
Exhaustive,
UniformSample,
ExporentialSample,
Fixed,
]
The tag fields are given with default values so you need not worry about them when using the ADT. The defaults have to be filled in manually but this is actually good because we want to keep the numeric tag values in sync with the Rust side of the binary serialisation protocol, anyway. (More on tags below.)
Then we can use this as follows:
gen1 : Generator = Exhaustive()
gen2 : Generator = UniformSample(sample_count=64)
gen3 : Generator = ExponentialSample(sample_count=64)
gen4 : Generator = Fixed(samples=[1,2,3])
Branching on the ADT is a bit clumsy but works:
def to_string(g : Generator) -> str:
if isinstance(g, Exhaustive):
'exhaustive'
elif isinstance(g, UniformSample):
f'{g.sample_count} uniformly random samples'
elif isinstance(g, ExponentialSample):
f'{g.sample_count} exponentially random samples'
elif isinstance(g, Fixed):
f'samples from {g.samples}'
else:
raise ValueError('not a generator')
This works well with mypy, which recognises that in each branch, g is indeed the appropriate subclass and does not report bogus type errors.
The whole thing composes nicely:
class Options(NamedTuple):
random_seed : int
generator : Generator
If your data structure is a struct, not an enum, like Options, there's no need for the tag.
The tags are important because named tuples are simply tuples. Without tags, Python would confuse values constructed by different constructors:
class UniformSampleX(NamedTuple):
sample_count : int
class ExponentialSampleX(NamedTuple):
sample_count : int
>>> UniformSampleX(sample_count=3) == ExponentialSampleX(sample_count=3)
True
>>> UniformSampleX(sample_count=3) == (3,)
True
>>> ExponentialSampleX(sample_count=3) == (3,)
True
Tags are also useful for metaprogramming, which you'll likely need for serialisation:
def enum_tag(ctor : type) -> int:
return ctor._field_defaults['tag']
>>> enum_tag(UniformSample)
1
>>> enum_tag(ExponentialSample)
2
With this approach, we can have neat and compositional serialisation but I will not talk about that here.
Also, by "metaprogramming", I mean "import time" metaprogramming -- the metaprogram that runs once when a module is imported. This does not impede the performance of the program afterwards.