Verified leftpad in Idris
The theorem
prover showdown gives a description of three simple programs and
asks for their verified implementations. I looked specifically at
leftpad
.
There are implementations in dependently typed languages, but I think they're proving too much. I want a program that's correct by construction, and which therefore does not have to contain proofs at all.
Perhaps the requirements include having external proofs; then my implementation does not qualify. In my reading of the challenge, however, they don't.
Correct by construction
With dependent types, we can explain to the computer what we mean and then, in an interactive dialogue, the computer will guide us through writing the program that does what we said. Because the program was derived from the specification, it naturally conforms to it.
This is very different from the "traditional" approach where we write an implementation, then a specification, and then work hard to show that the two correspond.
What a padding is
Let's start with explaining the rules of padding to the computer. A
value of type Padding s target
describes the padding that
we need to prepend to the string s
to obtain a string with
length target
.
data Padding : (s : List Char) -> (target : Nat) -> Type where
Pad : (pad : Nat) -> Padding s (length s + pad)
Nop : Padding (ys ++ x :: xs) (length ys)
The constructor Pad
describes adding pad
characters to string s
to obtain a string of target length
length s + pad
.
The constructor Nop
represents not padding because the
string is too long already. We represent this by modelling the total
length as the length of a substring of the input string -- the substring
is ys
, whereas the input string is
ys ++ x :: xs
, which is strictly longer, even if
xs = []
.
Computing paddings
Now that we have expressed what we mean, Idris will interactively guide us through writing the following function which computes the padding for any given string and target length.
padding : (s : List Char) -> (target : Nat) -> Padding s target
= Pad t
padding [] t :: xs) Z = Nop {ys = []}
padding (x :: xs) (S t) with (padding xs t)
padding (x :: xs) (S (length xs + pad)) | Pad p = Pad p
padding (x :: ys ++ z :: zs) (S (length ys)) | Nop = Nop {ys=y::ys} padding (y
The first two cases are easy:
- If the string is empty, then we need
t
padding characters; - If the string is non-empty but the target length is zero then we don't pad.
If both s
and target
are
non-empty/non-zero, then we strip one character from both and compute
padding for this smaller case. If we pad in the smaller case, then the
padding does not change in the larger case; if we don't, then the target
length grows by one because we added y
.
The bits to the left of |
in the last two lines may look
somewhat noisy. Fortunately, I didn't have to write them. The computer
did; I just told it "please tell me what the possible cases of padding
here are", and the computer answered Pad p
and
Nop
, simultaneously refining the necessary circumstances to
the left of the vertical bars.
Padding strings
Finally, we can use this view to implement the actual padding function.
pad : (s : List Char) -> (c : Char) -> (target : Nat) -> List Char
with (padding s target)
pad s c target + pad) | Pad pad = replicate pad c ++ s
pad s c (length s ++ x :: xs) c (length ys) | Nop {ys} = ys ++ x :: xs pad (ys
First, we compute the amount of padding with
padding s target
.
- If we need to pad with
pad
characters, we producereplicate pad c ++ s
. - If we should not pad, this is because we requested a string of
length
length ys
but provided a longer string, namelyys ++ x :: xs
. Therefore we just return this string from the input,ys ++ x :: xs
, unchanged.
Again, Idris automatically fills in the circumstances to the left of
the vertical bar in both cases. In the first case, it recognises that
target = length s + pad
. In the second case, it shows that
s = ys ++ x :: xs
and that target = length ys
.
These equalities necessarily follow from the fact that the padding is
either Pad pad
or Nop
, respectively.
Is this verified?
You could argue that one could replace replicate pad c
with replicate
(2*pad) c
and the compiler would not complain. Sure, it wouldn't
-- but replicate pad c
is really glue code (or part of the
spec, if you prefer). The core of the algorithm is the function
padding
-- and there's indeed no space for errors
there.
You could write proofs that pad
indeed does use the
padding computed by padding
and not some other padding but
this is a definitional equality -- the proof is really trivial and I
don't think it would add any value. If you can't trust programs this
elementary then you can't trust your specs, either.