Verified leftpad in Idris

The theorem prover showdown gives a description of three simple programs and asks for their verified implementations. I looked specifically at leftpad.

There are implementations in dependently typed languages, but I think they're proving too much. I want a program that's correct by construction, and which therefore does not have to contain proofs at all.

Perhaps the requirements include having external proofs; then my implementation does not qualify. In my reading of the challenge, however, they don't.

Correct by construction

With dependent types, we can explain to the computer what we mean and then, in an interactive dialogue, the computer will guide us through writing the program that does what we said. Because the program was derived from the specification, it naturally conforms to it.

This is very different from the "traditional" approach where we write an implementation, then a specification, and then work hard to show that the two correspond.

What a padding is

Let's start with explaining the rules of padding to the computer. A value of type Padding s target describes the padding that we need to prepend to the string s to obtain a string with length target.

data Padding : (s : List Char) -> (target : Nat) -> Type where
  Pad : (pad : Nat) -> Padding s (length s + pad)
  Nop : Padding (ys ++ x :: xs) (length ys)

The constructor Pad describes adding pad characters to string s to obtain a string of target length length s + pad.

The constructor Nop represents not padding because the string is too long already. We represent this by modelling the total length as the length of a substring of the input string -- the substring is ys, whereas the input string is ys ++ x :: xs, which is strictly longer, even if xs = [].

Computing paddings

Now that we have expressed what we mean, Idris will interactively guide us through writing the following function which computes the padding for any given string and target length.

padding : (s : List Char) -> (target : Nat) -> Padding s target
padding       []     t  = Pad t
padding (x :: xs)    Z  = Nop {ys = []}
padding (x :: xs) (S t) with (padding xs t)
  padding (x :: xs) (S (length xs + pad))      | Pad p = Pad p
  padding (y :: ys ++ z :: zs) (S (length ys)) | Nop   = Nop {ys=y::ys}

The first two cases are easy:

If the string is empty, then we need t padding characters;
If the string is non-empty but the target length is zero then we don't pad.

If both s and target are non-empty/non-zero, then we strip one character from both and compute padding for this smaller case. If we pad in the smaller case, then the padding does not change in the larger case; if we don't, then the target length grows by one because we added y.

The bits to the left of | in the last two lines may look somewhat noisy. Fortunately, I didn't have to write them. The computer did; I just told it "please tell me what the possible cases of padding here are", and the computer answered Pad p and Nop, simultaneously refining the necessary circumstances to the left of the vertical bars.

Padding strings

Finally, we can use this view to implement the actual padding function.

pad : (s : List Char) -> (c : Char) -> (target : Nat) -> List Char
pad s c target with (padding s target)
  pad s c (length s + pad)          | Pad pad  = replicate pad c ++ s
  pad (ys ++ x :: xs) c (length ys) | Nop {ys} = ys ++ x :: xs

First, we compute the amount of padding with padding s target.

If we need to pad with pad characters, we produce replicate pad c ++ s.
If we should not pad, this is because we requested a string of length length ys but provided a longer string, namely ys ++ x :: xs. Therefore we just return this string from the input, ys ++ x :: xs, unchanged.

Again, Idris automatically fills in the circumstances to the left of the vertical bar in both cases. In the first case, it recognises that target = length s + pad. In the second case, it shows that s = ys ++ x :: xs and that target = length ys. These equalities necessarily follow from the fact that the padding is either Pad pad or Nop, respectively.

Is this verified?

You could argue that one could replace replicate pad c with replicate (2*pad) c and the compiler would not complain. Sure, it wouldn't -- but replicate pad c is really glue code (or part of the spec, if you prefer). The core of the algorithm is the function padding -- and there's indeed no space for errors there.

You could write proofs that pad indeed does use the padding computed by padding and not some other padding but this is a definitional equality -- the proof is really trivial and I don't think it would add any value. If you can't trust programs this elementary then you can't trust your specs, either.