But when it comes to testing, it’s amazing how many people are willing to send in the villains one at a time in a nice, controlled manner that matches our expectations for what the system is supposed to be able to do, as opposed to what it’s going to face in reality. I’ve heard of cases where a design goal of 1000 simultaneous users was ‘simulated’ with 1000 threads, each using a non-overlapping range of ascending identifiers and a sleep command. This has at best an arm’s length relationship with reality, for reasons I list below.
Real life traffic is erratic.
If you look deeply enough you will always find that real traffic is bursty and semi-random. Metrics like load averages and busy percentages hide this from us, but real traffic is fundamentally jittery. This means that the systems handling it will need to cope with transient overloads and activity spikes effortlessly. All “Kung Fu villain” testing will do in this scenario is give you a false sense of security. What you need is pseudo-random load generation that creates spikes: each thread waits a random interval between requests, unless it’s a certain number of seconds past the minute, in which case it doesn’t wait at all. This leads to repeated spikes of activity, so you can see what happens when the system reaches its limits.
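As a rough illustration of that waiting rule, here is a minimal Python sketch. The names `next_wait` and `simulated_user`, the five-second spike window starting 40 seconds past the minute, and the two-second maximum wait are all assumptions made up for this example; `send_request` stands in for whatever call your own test client makes.

```python
import random
import threading
import time


def next_wait(seconds_past_minute, spike_start=40, spike_length=5,
              max_wait=2.0, rng=random):
    """Return how long a simulated user should sleep before its next request.

    Inside the spike window (an assumed 5 seconds starting at :40) the answer
    is always zero, so every thread fires as fast as it can at the same time.
    Outside the window the wait is a random interval up to max_wait seconds.
    """
    # The modulo lets the window wrap across the top of the minute.
    if (seconds_past_minute - spike_start) % 60 < spike_length:
        return 0.0
    return rng.uniform(0.0, max_wait)


def simulated_user(send_request, stop_event, rng, max_wait=2.0):
    """Call send_request (a hypothetical client call) until told to stop."""
    while not stop_event.is_set():
        send_request()
        delay = next_wait(time.time() % 60, max_wait=max_wait, rng=rng)
        if delay:
            # Event.wait doubles as an interruptible sleep.
            stop_event.wait(delay)
```

Because every thread consults the same wall clock, the spikes line up across threads without any coordination. Giving each thread its own `random.Random(seed)` keeps a run reproducible.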
With enough data unlikely events turn into statistical certainties.
Real world data looks homogeneous but contains gotchas. If you are processing a billion records a day, a one in 10 million chance event will turn up about 100 times a day, or once every 14 minutes or so. Bear in mind that the unlikely event doesn’t have to exist in reality. Any complex system that’s moving billions of records around will deliver some of them in the wrong order, some late, some more than once and some not at all. If your application code freezes or throws a wobbly in this situation, you can see odd behavior emerge over time in your real system that “Kung Fu villain” testing will never reproduce. A suitably devious testing regime can mimic real world behavior by occasionally losing, echoing or delaying records. You should also be especially paranoid about locking. “Kung Fu villain” testing will never show up subtle flaws in your locking algorithm, such as not having a plan for when a client loses interest and never clears a lock. If an abandoned lock ‘breaks’ one of your application processing threads, and in real life this happens every few minutes, then you have a de facto resource leak.
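One way to lose, echo or delay records in a test harness is to wrap the record stream in a generator like the Python sketch below. The function name `unreliable`, the default probabilities, and the idea of measuring delay in "number of records later" are assumptions chosen for illustration, not a description of any particular tool.

```python
import random


def unreliable(records, rng, p_drop=0.001, p_dup=0.001, p_delay=0.001,
               max_delay=50):
    """Yield records, occasionally dropping, duplicating or delaying one.

    A delayed record is held back and released between 1 and max_delay
    positions later, so it arrives both late and out of order.
    """
    held = []  # (release_index, record) pairs waiting to be delivered late
    for index, record in enumerate(records):
        # Release any delayed records whose turn has come.
        due = [r for release_at, r in held if release_at <= index]
        held = [(release_at, r) for release_at, r in held if release_at > index]
        yield from due

        roll = rng.random()
        if roll < p_drop:
            continue                      # lost: never delivered
        if roll < p_drop + p_dup:
            yield record                  # echoed: delivered twice
            yield record
        elif roll < p_drop + p_dup + p_delay:
            held.append((index + rng.randint(1, max_delay), record))
        else:
            yield record                  # the normal, boring case

    # Flush whatever is still being held once the input runs out.
    yield from (r for _, r in held)
```

Passing a seeded `random.Random` as `rng` makes a failing run repeatable, which matters when the bug you are chasing only shows up once in ten million records.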
Beware of 100% fake data…
Even the lowest budget Kung Fu movie won’t show the same villain attacking the hero twice, let alone fifty times. Neither should you settle for sending a near-identical copy of the same record over and over again. If you do, you run the risk of getting highly misleading results: somewhere in the code between your client and your server a subroutine (such as Oracle’s SQL*Net) might decide to compress all the identical strings it’s seeing, leading to performance you’ll never see in reality. Another issue is that your persistence engine may not slow down linearly as record sizes increase, something which will not be visible with identical data. If you’re serious about testing you should get the largest set of real data you can, and then make a clear but subtle change to it over and over again to generate a data set big enough for your purposes.
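A minimal sketch of that last step in Python might look like the following. It assumes the real records are available as a list of dictionaries with a string identifier; the function name `expand`, the `id` field and the `-g<n>` suffix are hypothetical choices for this example. Here the "clear but subtle change" is a pass number appended to the identifier, so every copy is distinct and can be traced back to its real original.

```python
import itertools


def expand(seed_records, target_count, key_field="id"):
    """Cycle through real seed records until target_count have been produced.

    Each pass over the seed data tags the key field with a pass number,
    e.g. "A" becomes "A-g0" on the first pass and "A-g1" on the second.
    seed_records must be a sequence (it is measured with len()).
    """
    cycled = itertools.islice(itertools.cycle(seed_records), target_count)
    for n, record in enumerate(cycled):
        generation = n // len(seed_records)
        copy = dict(record)  # shallow copy; the real record is left untouched
        copy[key_field] = f"{record[key_field]}-g{generation}"
        yield copy
```

Note that this only alters one field. If the rest of each record is long and repetitive, you may want to vary more of the payload so that copies of the same seed record don’t end up compressing suspiciously well.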