Design reviews are among the most important activities in the software development process. If a bad design is approved and implemented, correcting it afterwards is very expensive. Therefore, we want to have high confidence in our decisions when reviewing a proposed design.
Suppose someone is presenting to us the design of a server-side system whose non-functional attributes are described as follows:
The server is stateless and session data is stored in a shared database. Each instance of the server will hold a cache for 10,000 active sessions, being able to handle 400 transactions per second (TPS), with average latency of 150 milliseconds. The service is expected to be up 99% of the time.
How does that sound? If you have some experience with server-side development, these numbers probably appear quite standard. If you are reviewing such a design, the two main questions are:
- Are these performance attributes reasonable?
- Do they satisfy the requirement of the Service Level Agreement (SLA)?
If your answer to both questions is affirmative, then you would probably approve it.
But what would be your opinion about a service described as follows:
The server is stateless and session data is stored in a shared database. Each instance of the server will hold a cache for 15,000 active sessions, being able to handle 300 transactions per second (TPS), with average latency of 100 milliseconds. The service is expected to be up 99.9% of the time.
This description sounds very similar to the previous one, right? Actually, they are so similar that if you approved the first one you would probably approve this one as well, and vice versa. But when we compare the two descriptions, some very interesting questions arise. To help with this comparison, I will present the two options in a table:
| Attribute | Option 1 | Option 2 |
|---|---|---|
| Cache Size (# sessions) | 10,000 | 15,000 |
| Throughput (TPS) | 400 | 300 |
| Average Latency (ms) | 150 | 100 |
| Up-time (%) | 99 | 99.9 |
Here are some of the questions raised by comparing these two options:
- What is the expected number of active sessions at any point in time? If this number is 30,000, then the first option would require 3 instances of the server, while in the second option two instances are enough.
- What is the expected TPS per active session? In the first option the ratio is 400/10,000 = 4/100, while in the second option it is 300/15,000 = 2/100. So the ratio in the first option is twice as big as in the second option.
- Which is more important, the latency or the throughput? There is a clear trade-off here, because Option 1 has better throughput while Option 2 has better latency.
- How do you compare the robustness of the two options? If you think about up-time, Option 2 offers an improvement of 0.9 percentage points: from 99% to 99.9%. But if you think about down-time, Option 1 could be down 1% of the time, while Option 2 can only be down 0.1% of the time, so you could say that the second option is ten times better. In practice, over an entire year, the first option could be down for 3 days and 15 hours, while the second option can be down for less than 9 hours.
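The arithmetic behind these comparisons can be checked with a short script. The 30,000-session load is the illustrative figure from the first question above, not part of either design description:

```python
import math

# The two design options from the table above.
options = {
    "Option 1": {"cache": 10_000, "tps": 400, "uptime": 0.99},
    "Option 2": {"cache": 15_000, "tps": 300, "uptime": 0.999},
}

HOURS_PER_YEAR = 365 * 24  # 8,760
EXPECTED_SESSIONS = 30_000  # illustrative load, as assumed above

for name, o in options.items():
    # Server instances needed to cache the expected active sessions.
    instances = math.ceil(EXPECTED_SESSIONS / o["cache"])
    # Throughput available per cached session.
    tps_per_session = o["tps"] / o["cache"]
    # Maximum down-time per year allowed by the up-time target.
    downtime_hours = (1 - o["uptime"]) * HOURS_PER_YEAR
    print(f"{name}: {instances} instances, "
          f"{tps_per_session:.2f} TPS/session, "
          f"{downtime_hours:.1f} h/year down-time")

# Output:
# Option 1: 3 instances, 0.04 TPS/session, 87.6 h/year down-time
# Option 2: 2 instances, 0.02 TPS/session, 8.8 h/year down-time
```

Note that 87.6 hours is exactly the "3 days and 15 hours" mentioned above, and 8.76 hours is the "less than 9 hours" of Option 2.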
I believe this example makes it clear that comparing two options is different from reviewing each option separately. In other words, the reasoning involved in the review of a single option is not the same as in the combined analysis. This is not a special case; these differences have actually been studied by psychologists, who reached interesting conclusions and defined some general phenomena, such as:
- The Distinction Bias
- The Evaluability Hypothesis
- The Framing Effect
Now let’s take a look at each one of these theories and observe how they relate to our concrete design review example.
The Distinction Bias is an important concept that was studied in the context of Decision Theory. This is a definition from Wikipedia:
“Distinction bias is the tendency to view two options as more dissimilar when evaluating them simultaneously than when evaluating them separately.”
In our example above, the two design options initially looked very similar, but when they were compared side-by-side their differences became more apparent.
There are situations in which the Distinction Bias may cause bad choices. For example, this happens when we decide to pay more for a product that has a feature we don't really need. But in the case of design reviews, I believe that the Distinction Bias is normally beneficial, because the people doing the reviews are professionals who should understand the real meaning and importance of the diverse attributes.
The Evaluability Hypothesis describes another phenomenon that was observed by psychologists when studying how people choose between options. The basic observation is that there are attributes that are easy to evaluate in isolation, while others are hard. These hard-to-evaluate attributes need to be compared to something else to be understood.
When people assess options independently, they tend to focus on the easy-to-evaluate attributes. But when comparing two options, they can take into consideration the hard-to-evaluate attributes. This may cause an evaluation reversal: the option ranked as the best in isolated assessments may be considered second-best in a joint assessment.
In our previous example, if we asked software engineers to rank the two options in isolation, it is likely that the second option would be considered the best: it is better in three of four attributes, with a bigger cache, lower latency, and higher up-time. But there is an attribute that may be hard to evaluate in isolation: the TPS per active session, that is, the ratio between the throughput and the cache size. As we saw, it is much higher in the first option than in the second, and thus in a joint evaluation the first option could be chosen.
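This reversal can be sketched with the numbers from our example. The scoring rule below, which simply counts winning easy-to-evaluate attributes, is a deliberately naive assumption for illustration, not a real review method:

```python
# The two options from the table above.
opt1 = {"cache": 10_000, "tps": 400, "latency_ms": 150, "uptime": 0.99}
opt2 = {"cache": 15_000, "tps": 300, "latency_ms": 100, "uptime": 0.999}

# Isolated view: count the easy-to-evaluate attributes Option 2 wins.
easy_wins = sum([
    opt2["cache"] > opt1["cache"],            # bigger cache
    opt2["latency_ms"] < opt1["latency_ms"],  # lower latency
    opt2["uptime"] > opt1["uptime"],          # higher up-time
])
print(f"Option 2 wins {easy_wins} of 3 easy attributes")  # wins all 3

# Joint view: the hard-to-evaluate ratio favors Option 1.
ratio1 = opt1["tps"] / opt1["cache"]  # 0.04 TPS per session
ratio2 = opt2["tps"] / opt2["cache"]  # 0.02 TPS per session
print(f"TPS/session: Option 1 = {ratio1}, Option 2 = {ratio2}")
```

Only the side-by-side comparison surfaces the ratio that makes Option 1 attractive.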
The Framing Effect is a cognitive bias in which people make different decisions when presented with equivalent options depending on the way these options are expressed. More specifically:
“People react differently to a particular choice depending on whether it is presented as a loss or as a gain.”
In our previous example, we observed this phenomenon when analyzing the robustness of the two design options, which can be expressed either as a small increase in the up-time or a big decrease in the down-time.
Possible questions that could cause this Framing Effect would be:
- Are you willing to pay $X to increase the up-time from 99% to 99.9%?
- Are you willing to pay $X to decrease the down-time from 1% to 0.1%?
- How do you rate a system that is up 99% of the time?
- How do you rate a system that is down 1% of the time?
In each pair, the two questions are logically equivalent. But if you present them to groups of people, you can expect that the answers will not be consistent.
Understanding the psychological biases behind people's decisions should help us improve our processes. In summary, these are the lessons we can learn:
- Distinction Bias: For better insight, we should always compare alternatives instead of reviewing a single option.
- Evaluability Hypothesis: To really understand the meaning of hard-to-evaluate attributes, we need diverse options.
- Framing Effect: The answer to a question depends on how it is formulated. Different questions with the same logical meaning may get different answers.
Thus, by making sure we have diverse options and by asking the right questions, it is possible to greatly increase the efficacy of our design reviews.
What do you think? What has been your personal experience with design reviews? Have you observed these psychological biases in practice? Please share your comments below.