1 May 2020

Six Statements About Testing

Testing, 1, 2, 3. And 4 and 5 and 6.

I volunteered to do a learning lab session on testing at work; I’d noticed some common patterns in how we were writing tests, and I thought we could be doing better.

Instead of just coming up with an internal-only presentation that would never see the light of day outside the company (or even outside the immediate team), I thought it might be fun (???) to write a couple of blog posts and later convert those into a presentation. The first one’d be about general observations and things to think about, while the second one would cover Go-specific things we could do.

This is the first of those two posts. The second one is here.¹

This is all, of course, hugely opinionated. It’s also not intended to be a criticism of anyone.

1. Testing isn’t the most fun thing to do.

Why do we write tests? The short answer is probably that we’ve been told it’s The Right Thing To Do, not that we really want to.

We’re told this when learning - whether you came through a CS degree, a bootcamp, teaching yourself, or some other route, chances are that when learning about software engineering you were told that testing is Good. Testing helps reduce bugs, bugs are Bad, therefore we should write tests.

We’re told this at work, too, usually. It’s part of the performance criteria, and everyone else is either writing them or telling us to. They’ve been doing this whole engineering thing for a while, so I guess it’s a good idea if I write some tests too.

But testing’s not glamorous. Tests don’t deliver business value in an obvious way - it’s unlikely that a product manager or customer is going to come to you and ask you to write them. It’s unlikely that you’re getting an Exceeds Expectations or a promotion because you wrote a lot of tests. So we write them because we have to, because we’ve been told it’s The Right Thing To Do, but in the end they’re a chore to get through as part of the process.

Even if testing done well is a net positive, I think it’s still okay to acknowledge that it just straight up isn’t much fun for most folks.

2. Tests don’t get code reviewed, but they should.

You’ll definitely be called out at work for adding an entire feature without any tests, but for the most part if there’s something there that passes?

LGTM, stamped, ship it.

It’s quote-unquote funny that one thing that’s been common to each of the jobs I’ve had so far is that tests don’t get as much attention at code review time (or any time, really) as the feature code that’s being shipped alongside. It’s noticeable both online and in person: test code rarely gets commented on and is often scrolled past in the same way that generated code is.

Tests are just code. It’s code that’s supposed to check that our feature code does what we want it to. That means we should be checking that it’s actually doing so.

That means looking at tests during code review².

3. Tests don’t make your code correct.

You might say that tests make sure your code is correct, but that’s just what we want them to do. What they actually do is make sure the code does what the test is checking for. If it’s checking the wrong thing, then hey. Tests pass; your code’s wrong.

How many times have you written some code and its tests, and later discovered a bug or an edge case you hadn’t handled? Or found that some code wasn’t behaving the way it was supposed to, and that the incorrect behaviour had been enshrined in a test case?

Tests only do what you write them to do.

4. Coverage doesn’t mean anything.

How do we know if our tests are checking the right thing? Well, the answer is definitely not coverage.

Low coverage is possibly-to-likely a bad thing. If a codebase has 0% coverage, that’s a big red flag that should probably get looked at soon. But any “reasonable” amount of coverage, up to and including 100%, is meaningless in and of itself without knowing how that coverage was achieved. You can write tests that hit every line of code in your codebase and still test absolutely nothing of value. Hit every execution path, leave out any assertions, and there you go: 100% coverage, none of it useful.

This is both extreme and contrived, so here’s the real point: just as a test only ensures code does what the test is checking for, coverage tautologically only ensures you’re executing lines of code as part of your test. It doesn’t say anything about whether or not that code is correct. Chasing after coverage encourages writing tests which overfit³ to the implementation and which basically only ensure there’re no runtime errors.

Making sure your code runs is great, but that’s not the same as making sure it’s doing the right thing.

5. Test behaviour, not implementation.

Testing your implementation means you’ve tested that your code runs. It doesn’t mean your code is doing what it should. To do that, you need to be testing its behaviour.

One way to do this is often referred to as black-box testing: treat the code as a black box and ignore everything you know about how it works internally. Given a particular set of inputs, what is the code supposed to do? What is it supposed to return, and what side effects are expected?

You then need to think about the different types or classes of inputs you might expect - this is sometimes called boundary-value analysis⁴ or boundary-value testing. What are some valid inputs to the function? Write a test with one or two. What kinds of invalid input are there? Write a test for each. You want to figure out your range of accepted inputs, determine the edges of that range, and write tests that cover the edge itself, something just outside the range on the “wrong” side of the edge, and something just inside on the “right” side.

For instance, say you have a function which takes a positive integer representing a user’s age. You’d want to test one or two valid/“safe” values (say 24 or 65), the invalid ones (0, -1), and one that’s barely okay (1).

Test tables in Go are perfect for this sort of test, iterating through inputs and expected values.

Preconditions are also inputs of a sort. Let’s say your code is calling an external service to retrieve some data. You’re probably mocking that service as part of your test, so think about all the ways it could behave. What if it returns an error? Or it doesn’t return an error, but it returns an empty result, or one that’s missing some fields?

As an addendum to this point - you probably don’t need to be asserting calls on every single one of your mocks. Unless calling the thing that’s being mocked is an explicit part of the expected behaviour of your code, asserting that the mock gets called is really just checking the implementation again. I think this is more subjective than any of the other points I’m trying to make here, though, so YMMV even further.

6. Done well, tests can be like documentation.

Note that I didn’t say they can be documentation. In no way is a set of tests a substitute for good documentation; in no way is code alone its own documentation.

A good set of tests can, however, supplement documentation by showing how the code was expected to behave at the time it was written. If you’ve done things well, your tests are a set of inputs to your code and the corresponding returned values, changed fields, mocked function calls, or other expected side effects. That’s a pretty good overview of what the code’s supposed to do, or at least of how that was understood at the time.

This of course only works if your tests all pass. Just like documentation, tests which fall out of date and fail because they no longer match the expected behaviour stop being useful, as a reference or otherwise.

  1. import cycle not allowed
    post six-statements-about-testing
     imports tips-to-take-test-tables-to-the-top
     imports six-statements-about-testing
  2. In an ideal world, we’d be holding tests to a similar level of scrutiny as “real” code because they are real code. But this isn’t an ideal world and I’m just some person on the internet, so you do you.

    Unless you’re dealing with mission-critical stuff - nuclear reactors, medical devices, airplanes, etc. - in which case you really should be quite thorough with checking your tests and your code, or at the very least better than certain events would seem to indicate.

    Ahem. ↩︎

  3. If your tests fail every time you make a code change, they’re probably being coupled too tightly to your implementation. ↩︎

  4. The Wikipedia article for this uses a lot of big words and mathematical symbols, if you’re into that. ↩︎