Computer :(

/home/rrix:blog:tags:cgit:rss

Learning Elisp by Solving My Own Problems

I interview a lot of people at work, probably once or twice a week on a quiet week. My questions usually range across the board from operating systems, to shell scripting to programming/CS. I'm not a huge fan of the technical interview lately, but that's a post for another day and until then, we've got to work with what we've got. One of my favorite questions lately is to implement the classic cryptographic cipher the Vigenère cipher. It's a rather simple modified Caeser cipher but it pushes people in to areas they don't know as well and is a good sign of how they tackle more complicated problems.

What follows is my first Emacs Lisp implementation of a naive vigenere cipher, without a decryption function, and without any performance or optomization work done on it. Basically, I spent 20 minutes wading through ElDoc and C-h f to put this together with my meager understanding of Lisps and Emacs Lisp in particular.

Baby's first cipher

There are, generally, two ways to go about solving this problem: the most naive way is to build (Or type out!) a giant dictionary roughly as so (in python for example):

dictionary = {
  'a': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'],
  'b': ['b', 'c', 'd', 'e' ... # get terrible RSI
  .... # oh my god my hands hurt so much
  'z': ['z', 'a', 'b', 'c', #fuck this shit
}

Cool, and now I have a thing where I can iterate over my cipher key, and pull the write dictionary out, and then push in to it the right number of characters and pull out my cipher key. Ouch though. Let's not do that, we can do some nice math to do it:

C_i = E_K(M_i) = (M_i+K_i) \mod {26}

That is to say, the offset of the message character plus the offset of the key character, modulo 26. Pretty simple stuff. Let's write it in Elisp.

Encrypt a single character

Let's build the E_K function, a function that takes a single key character, and a single plaintext character, and ciphers the plaintext character, returning a crypted character.

(defun rrix/make-vigenere-char (keychr plainchr)
  "Convert PLAINCHR to the characeter ciphered by KEYCHR's cipher"
  (let ((key-offset (- keychr ?A))
        (plain-offset (- plainchr ?A)))
    (+ ?A (mod (+ plain-offset key-offset) 26))))

This is ugly, and probably uses the let call unnecessarily. We could write a macro that expands to match (- INPUT ?A) and get rid of the let statement, leaving us with a simple prefix notation of the math above. For now, though, this works, and will convert each character that we need to the cipher characters.

The neat thing about Lisp is that there's no explicit return, it's Elisp functions return the value of the last function called, barring some exceptions provided by the CL emulation layer. The upshot of this is that if I rewrite this using macros, we could even macroize this away. Lots of optimization work could be done here, still.

Iterating over the key and the cipher simultaneously

Cool, so we can encrypt a single character, now how do we go about walking through both of these lists simultaneously?

In Python we have itertools.cycle to push us through the key repeatedly, since we can and should assume that our key is smaller than our plaintext. Iterate over them both simultaneously, and work across them. I couldn't find anything like that in Elisp, and I should find or build one of those myself.

We could pop the first character from the key and push it to the end of the list, and do that until our plaintext is done. That would work, and it'd work nicely, but when I solved this, I solved it a third way, by building a repeated key string that is the same length as the plaintext. For a plaintext of 12 characters, and the initial key of "LEMON" we end up with a string of "LEMONLEMONLE".

 1: (defun rrix/repeat-vigenere-key (len key)
 2:   "Repeat a KEY over LEN characters.
 3: 
 4: Repeating LEMON over 12 characters would yield LEMONLEMONLE"
 5:   (let ((outstr (make-string len ?x))
 6:         (iter 0)
 7:         (keylen (length key)))
 8:     (while (< iter len)
 9:       (aset outstr iter (aref key (mod iter keylen)))
10:       (setq iter (+ iter 1)))
11:     outstr))

I'm not terribly pleased with this one. It's messy and it let s some variables we probably could do with out. Basically, we build in to outstr and return that, by doing modding iter every to keylen. Simple, and it works, but this is an incredibly un-attractive way to do this, and I really should build an itertools.cycle equivalent. I'm also not a fan of the iterator bumping, it's really inelegant.

Putting it all together

With those two building blocks in place, it's pretty easy to combine them:

 1: (defun rrix/vigenere-crypt (key plaintext)
 2:   "Encrypt PLAINTEXT with KEY via the Vigenere cipher and return it as a string.
 3: 
 4: It is assumed, for now, that all input is lowercase and without whitespace or punctuation."
 5:   (let ((pt-length (length plaintext)))
 6:     (let ((repeated-key (rrix/repeat-vigenere-key pt-length key))
 7:           (iter 0)
 8:           (outstr (make-string pt-length ?x)))
 9:       (while (< iter pt-length)
10:         (aset outstr iter (rrix/make-vigenere-char (aref repeated-key iter)
11:                                                    (aref plaintext iter)))
12:         (setq iter (+ iter 1)))
13:       outstr)))

This has a pair of nested let in it. Why do I do this? How can I not do this? usage of pt-length could be macroized away, and I wouldn't need it in the outstr definition. Or I could just find a way to do that all without needing to attack it that way at all.

The meat is the 10, where we pull the correct character out of repeated-key and plaintext, and push them in to rrix/make-vigenere-char. Finally we make sure outstr is the correct value by putting it as the last form in BODY.

What I've learned

I've learned that I can implement a problem that I understand in about 20 minutes in a language I don't understand, so long as it's got good documentation and is decently integrated in to my editor. However, the solution that I came up with is very inefficient and non-idiomatic to the language.

Where can I go to improve this? Let me know.