Stateless Tests in a Stateful Protocol

January 25th, 2010

0

Lets suppose you want to test the correctness of a protocol. I know I do, all the time; get me that input fuzzer. A typical way to start might be a minor variation on how I learned to test my code in CSE 142 and hadn’t really thought too much about since: plug in some “representative” values and see if it does the right thing. But let’s suppose someone came over with a pile of documentation and said “Here. Some genius in the back room made this insanely complex server that apparently speaks this protocol, and somehow managed to do it without so much as a single unit test. Can you test it? By the way, we want some guarantees about security.” Writing a representative case for every little logical edge case of a protocol could take months, and maybe you had pictured yourself doing more glamorous things. You’re a hacker, so you look for a higher-order solution to the problem. For concreteness, lets say they handed us this implementation of a 1-dimensional go server.

I occasionally use a particular heuristic in software design. I can’t remember where I found it and don’t know why it works, but it seems to do well for me. If the code for a program is more than an order of magnitude or so longer than some English that fully specifies it, it’s probably written in the wrong language — not the wrong programming language necessarily, but the wrong primatives, statements, logical connections, etc. Here, we have a nice English protocol spec, so I imagine that the code to test the protocol should only be a little longer, barring whatever code I need to set up the right language and environment for the real core of the test code. If we are testing a network server, our tests will need the same primatives that a client would have, such as code to set up the connection, make well-formed requests, etc. But even after searching CPAN or cannibalizing the client code from the other team, a quick napkin calculation says that any tests composed of just sending representative requests and checking the response would have to be much longer than the protocol specification if we’re going to have any confidence that it’s comprehensive. JUnit or srfi-64 are only going to get us so far. With only these tools, we might imagine testing this protocol with something like this:

I’ve written tests like this. They’re unmaintainable and the read like assembly code. The worst part is, how do you know that the test does what you want? How do you know if your pre-calculated expected responses are correct? If someone doesn’t trust you, how hard would it be for them to read the code and be convinced that the test works, or that it is comprehensive? I think the problem here is that our test above is written in the wrong language. Not only is the sort of basic furniture we expect absent, such as notions of groups and liberties, but we’re forced to deal with things in a stateful, procedural way which over-specifies our real desires. This is like saying “Find a car, fill the gas tank, write down the odometer reading, drive the car for a while, refill the gas tank, write down how much fuel you put in, divide that number by the absolute value of the difference in the odometer readings. If that number is greater than or equal to 40, the car is fuel-efficient.” instead of saying “Fuel-efficient cars get at least 40 miles per gallon.” It feels like my taxes — if only the IRS could just tell me what they really want!

So what language features can we introduce that will help us make the tests easier to understand, easier to write, and more confidence-instilling? If we could dream up an ideal language to write our tests in, what would it be? What are tests supposed to say anyway? What do we want to know? I think the answer can be relatively simple: we want to know that the software does things it is supposed to do, and doesn’t do things it isn’t supposed to do. There are probably also a bunch of things we don’t care about. We ostensibly know what we do care about, because it’s in the specification. Thus, a test is just a set of constraints on the behavior of a program. Could we just rephrase the documentation into a computable form, and have these as the tests for 1ndigo? What about this:

Surely anything the software did within these constraints would be correct, and anything outside of them would be incorrect. This has to be, because all we did was rephrase the documentation and clarify it a bit. The test above still looks suspiciously like English, but hopefully now we can at least imagine writing it in scheme or python without too much extra word-count, assuming we had set up a good environment for talking about ideas about liberties, groups, board states, stone colors, and so on. Granted, setting up this environment may be a royal pain in the ass. In some cases it may be as hard to do as the thing you’re testing — what if the complexity of the server that we’re trying to test lies primarily in how it calculates groups and liberties? Then are we not just implementing our server twice and comparing the implementations? These are important questions, and often the answer will depend on the ratio of the complexity of the conceptual model governing a protocol to the complexity of the implementation of the server. In the case of say, the HTTP protocol, the conceptual model is much simpler than a typical implementation.

And so far, we have still left out a big piece of the testing problem: How do we get the software to do stuff so that we can see whether whatever it does follows our constraints? Presumably, we would at least have to drive a car to measure its fuel efficiency, even if no one gave us a procedure for doing so. What if clear-board radioed the mars lander, put-stone ordered X dozen eggs from Amazon Fresh, and show-board returned the board state and then promptly opened the hyena cage at the zoo and proceeded to put a random stone every minute until the zookeepers wrangled the hyenas back in? If all we knew of the behavior of this server implementation was its response to the initial request to show-board, we would be forced to conclude that it fulfilled all of the constraints outlined above and behaved correctly.

So the next question is, how do we excite the server to do enough different things that we can be confident that it is behaving correctly on all inputs? In the case of our 1ndigo, the space of valid protocol requests is small, so we can just enumerate it: show-board, clear-board, and a put-stone for each of the points — just 21 possibilities. However, making sure that we ran each of these still wouldn’t comprehensively test our protocol, because there are many possible board states, and we expect different responses depending on the board state. Unluckily for us, there could be as many as 1,162,261,467 different possible board states (real go has something like 10^100, to an order of magnitude in order of magnitude). Building up state is an art form, like making a painting. In this case, lets just do this and call it good:

To cover more cases, we could replay the games of great masters, or try to guess at what would break our particular implementation. At that point, we’ve already entered the established art of the input-fuzzing hacker and state space speculator. Perhaps we could use some fuzzing techniques too — why not?

At this point, you are thinking that I forgot that I called the article
“Stateless Tests for a Stateful Protocol”, and wondering whether I’ll get to the punchline. I was going to call it “The Unbearable Lightness of Being” but that name was already taken. The "Stateless"ness is what will allow us to tie all these ideas together. Lets forget our go-protocol and jump back into the abstract.

Testing Constraints on a State Space

If you examine our constraints, you’ll notice that they never make reference to any particular state, so I call them stateless. Of course, coming from a functional programming community, we know that statelessness is bliss, much like selflessness, and non-being. Our constraints are all just logical relations between input and output, so all we need is to be clear about what exact input and output we’re talking about. For any protocol composed of requests that generate responses, such as 1ndigo, we would expect for the constraints to partition the cartesian product of requests and responses. That is, given a request and a response, our constraints should tell us whether they are valid or not. This is true on its own in a stateless protocol, but in a stateful protocol, the validity of a given request and response also depends on, well, the state. That’s fine, because we can just as easily model our constraint function with an “environmental” dependence:

Easily said, but what is the environment exactly? Well, apparently if we’re writing constraints about it, we have a complete conceptual model of it and how it relates to the protocol. That’s all that really matters. In general, however, it may change depending on the requests we send and responses we get, so we may also need to update it. If there are extrinsic factors involved in modeling the environment, maybe we want to record them before, during, or after the request-response exchange.

Then of course, we need a scheme for generating requests in the protocol. In general, these may also depend on the environment, so maybe instead of passing around explicit requests, we can pass around functions that will generate requests by filling in appropriate values from the environment where necessary.

With these ideas, we can fairly straightforwardly chain protocol requests together like so:

This chaining allows us to separate the bulk of our work during testing (elaborating constraints and enumerating ways to change the server state) into two independent components that are both made entirely of pure functions. The only part of our code that has side-effects is neatly buried in run-exchange: dispatch-request. Since run-exchanges is just a fold, it will sequence requests in the same order that it gets them, because each run-exchange depends functionally on the previous exchange via the environment parameter. This bears some resemblance to Haskell’s IO Monad, although here one role is inverted: Rather than inventing an environmental state parameter in order to introduce a functional dependence in the case of the IO Monad, here we are introducing a functional dependence so that we can model the environment explicitly. In both cases, we are reducing all the code that has side-effects to a very restricted quarantine. In the rest of the world, we’ve turned a nasty stateful testing process into a pure-functional, logical expression that reads like the English that specifies it.

At Border Stylo, we are employing a similar technique to separate concerns in testing some high-reliability back-end servers. This technique has allowed us to reduce such an important guarantee as the proper functioning of a permissions system to the almost laughably simple:

Tagged with: pure-function, state, test, Scheme, control-structures

Related Posts

Author

Mitchell Johnson

Small

Mitchell is a furnace engineer. He likes scheme, physics and mathematical biology and speaks (in tongues) for the trees.

0 Comments Leave a comment

Leave a comment

Anonymous
Right now

Your comment preview

Reply to comment






Allowed Tags

_emphasis_
*strong*
??citation??
-deleted text-
+inserted text+
^superscript^
~subscript~
@code@

Add code using a GIST
gist: gistid