The Psychology of Reviews: Distinction Bias, Evaluability Hypothesis and the Framing Effect

Design Reviews are one of the most important activities in the software development process. If a bad design is approved and implemented, it is very expensive to correct it afterwards. Therefore, we want to have high confidence in our decisions when we are reviewing a proposed design.

Suppose someone is presenting us with the design for a server-side system, whose non-functional attributes are described as follows:

The server is stateless and session data is stored in a shared database. Each instance of the server will hold a cache for 10,000 active sessions, being able to handle 400 transactions per second (TPS), with average latency of 150 milliseconds. The service is expected to be up 99% of the time.

How does that sound? If you have some experience with server-side development, these numbers probably appear quite standard. If you are reviewing such a design, the two main questions are:

  • Are these performance attributes reasonable?
  • Do they satisfy the requirement of the Service Level Agreement (SLA)?

If your answer to both questions is affirmative, then you would probably approve it.

But what would be your opinion of a service described as follows:

The server is stateless and session data is stored in a shared database. Each instance of the server will hold a cache for 15,000 active sessions, being able to handle 300 transactions per second (TPS), with average latency of 100 milliseconds. The service is expected to be up 99.9% of the time.

This description sounds very similar to the previous one, right? Actually, they are so similar that if you approved the first one you would probably approve this one as well, and vice versa. But when we compare the two descriptions we can ask some very interesting questions. To help this comparison, I will present the two options in a table:

Attribute                  Option 1    Option 2
Cache Size (# sessions)    10,000      15,000
Throughput (TPS)           400         300
Latency (ms)               150         100
Robustness (up-time)       99%         99.9%

Here are some of the questions raised by comparing these two options (the arithmetic behind them is worked out in the short sketch after the list):

  • What is the expected number of active sessions at any point in time? If this number is 30,000, then the first option would require 3 instances of the server, while in the second option two instances are enough.
  • What is the expected TPS per active session? In the first option the ratio is 400/10,000 = 0.04, while in the second option it is 300/15,000 = 0.02. So the ratio in the first option is twice as large as in the second.
  • What is more important, the latency or the throughput? There is a clear trade-off here, because Option 1 has better throughput while Option 2 has better latency.
  • How do you compare the robustness of the two options? If you think about up-time, there is a 0.9 percentage-point improvement in Option 2: from 99% to 99.9%. But if you think about down-time, Option 1 could be down 1% of the time, while Option 2 can only be down 0.1% of the time. So you could say that the second option is ten times better. In practice, over an entire year, the first option could be down for 3 days and 15 hours, while the second option could be down for less than 9 hours.
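
To make these comparisons concrete, here is a minimal Python sketch of the arithmetic used in the bullets above. The 30,000-session load is the assumption from the first bullet, and the option figures come straight from the table; the names used in the code are purely illustrative.

```python
import math

# Non-functional attributes of the two options, taken from the table above.
options = {
    "Option 1": {"cache_sessions": 10_000, "tps": 400, "latency_ms": 150, "uptime": 0.99},
    "Option 2": {"cache_sessions": 15_000, "tps": 300, "latency_ms": 100, "uptime": 0.999},
}

EXPECTED_SESSIONS = 30_000      # assumed total load, as in the first bullet
HOURS_PER_YEAR = 365 * 24       # 8,760 hours

for name, o in options.items():
    instances = math.ceil(EXPECTED_SESSIONS / o["cache_sessions"])
    tps_per_session = o["tps"] / o["cache_sessions"]
    downtime_hours = (1 - o["uptime"]) * HOURS_PER_YEAR
    print(f"{name}: {instances} instances, "
          f"{tps_per_session:.2f} TPS per session, "
          f"{downtime_hours:.1f} hours of down-time per year")

# Option 1: 3 instances, 0.04 TPS per session, 87.6 hours of down-time per year
# Option 2: 2 instances, 0.02 TPS per session, 8.8 hours of down-time per year
```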

I believe this example makes it clear that comparing two options is different from reviewing each option separately. In other words, the reasoning involved in the review of a single option is not the same as in a combined analysis. This is not a special case: these differences have been studied by psychologists, who have reached interesting conclusions and defined some general phenomena, such as:

  • The Distinction Bias
  • The Evaluability Hypothesis
  • The Framing Effect

Now let’s take a look at each one of these theories and observe how they relate to our concrete design review example.

Distinction Bias

The Distinction Bias is an important concept that was studied in the context of Decision Theory. This is a definition from Wikipedia:

“Distinction bias is the tendency to view two options as more dissimilar when evaluating them simultaneously than when evaluating them separately.”

In our example above, the two design options initially looked very similar, but when they were compared side-by-side their differences became more apparent.

There are situations in which the Distinction Bias may cause bad choices. For example, this happens when we decide to pay more for a product that has a feature we don’t really need. But in the case of design reviews, I believe that the Distinction Bias is normally beneficial, because the people doing the reviews are professionals who should understand the real meaning and importance of the various attributes.

Evaluability Hypothesis

The Evaluability Hypothesis describes another phenomenon that was observed by psychologists when studying how people choose between options. The basic observation is that there are attributes that are easy to evaluate in isolation, while others are hard. These hard-to-evaluate attributes need to be compared to something else to be understood.

When people assess options independently, they tend to focus on the easy-to-evaluate attributes. But when comparing two options they can take into consideration the hard-to-evaluate attributes. This may cause an Evaluation Reversal: the option ranked as the best one in isolated assessments may be considered second-best in a joint assessment.

In our previous example, if we asked software engineers to rank the two options in isolation, it is likely that the second option would be considered the best: it is better in three of the four attributes, with a bigger cache, lower latency and higher up-time. But there is an attribute that may be hard to evaluate in isolation: the TPS per active session, that is, the ratio between the throughput and the cache size. As we saw, it is much higher in the first option than in the second, and thus in a joint evaluation the first option could be chosen.
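
To illustrate this reversal, here is a rough sketch. The scoring rule for the isolated assessment is my own illustrative assumption (the thresholds are not part of the original example): in isolation the easy-to-evaluate attributes favor Option 2, while the joint evaluation exposes the TPS-per-session ratio, which favors Option 1.

```python
# The same two options as before.
options = {
    "Option 1": {"cache_sessions": 10_000, "tps": 400, "latency_ms": 150, "uptime": 0.99},
    "Option 2": {"cache_sessions": 15_000, "tps": 300, "latency_ms": 100, "uptime": 0.999},
}

def easy_attribute_score(o):
    # Isolated assessment: count how many easy-to-evaluate attributes look
    # "good" against absolute thresholds (the thresholds are illustrative).
    return sum([
        o["cache_sessions"] >= 12_000,   # big cache
        o["latency_ms"] <= 120,          # low latency
        o["uptime"] >= 0.999,            # high up-time
    ])

# In isolation, Option 2 wins on every easy-to-evaluate attribute.
isolated_scores = {name: easy_attribute_score(o) for name, o in options.items()}
print(isolated_scores)    # {'Option 1': 0, 'Option 2': 3}

# In a joint assessment, the hard-to-evaluate ratio becomes visible,
# and it favors Option 1.
tps_per_session = {name: o["tps"] / o["cache_sessions"] for name, o in options.items()}
print(tps_per_session)    # {'Option 1': 0.04, 'Option 2': 0.02}
```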

Framing Effect

The Framing Effect is a cognitive bias in which people make different decisions when presented with equivalent options depending on the way these options are expressed. More specifically:

“People react differently to a particular choice depending on whether it is presented as a loss or as a gain.”

In our previous example, we observed this phenomenon when analyzing the robustness of the two design options, which can be expressed either as a small increase in the up-time or as a big decrease in the down-time.

Possible questions that could cause this Framing Effect would be:

  • Are you willing to pay $X to increase the up-time from 99% to 99.9%?
  • Are you willing to pay $X to decrease the down-time from 1% to 0.1%?

Or also:

  • How do you rate a system that is up 99% of the time?
  • How do you rate a system that is down 1% of the time?

In both cases the questions are logically equivalent. But if you present them to groups of people, you can expect that the answers will not be consistent.

Conclusions

Our understanding of the psychological biases behind people’s decisions should be used to improve our processes. In summary, these are the lessons we can learn:

  • Distinction Bias: For better insight, we should always compare alternatives instead of reviewing a single option.
  • Evaluability Hypothesis: To really understand the meaning of hard-to-evaluate attributes, we need diverse options.
  • Framing Effect: The answer to a question depends on how it is formulated. Different questions with the same logical meaning may get different answers.

Thus, by making sure we have diverse options and by asking the right questions it is possible to greatly increase the efficacy of our design reviews.

What do you think? What has been your personal experience with design reviews? Have you observed these psychological biases in practice? Please share your comments below.


About Hayim Makabee

Veteran software developer, enthusiastic programmer, author of a book on Object-Oriented Programming, co-founder and CEO at KashKlik, an innovative Influencer Marketing platform.

10 Responses to The Psychology of Reviews: Distinction Bias, Evaluability Hypothesis and the Framing Effect

  1. Pavel Bekkerman says:

    Thanks, Hayim. Very interesting. Makes me wonder what are other decision making patterns (…of irrationality). Why don’t they teach this stuff to everybody?

    • Thanks, Pavel. I agree that everybody should learn about these basic patterns of decision making. If you are interested in this subject, I recommend the books of Dan Ariely. Enjoy!

  2. Gene Hughson says:

    I have to echo Pavel…when your job is making decisions about the stuff that’s hard to change (architecture), the tools you have to make those decisions are the most important ones in the toolbox. Great post!

  3. Great post, Hayim. I also wonder about a 4th factor. I don’t know what to call it, but perhaps something like “The Dependency Effect”. What I am referring to is the fact that certain factors may be inter-dependent. For example, transactions may have read/write locking conflicts, where throughput, latency and # of sessions might be cross-dependent. The Law of Unintended Consequences might derive from this effect. Thoughts?

    • Thanks, Alfred! I think that your idea of a dependency between factors is very interesting, and certainly in my experience some errors were made because these dependencies were not taken into consideration. This issue is probably related to the concept of the “Relevance Paradox”. I will think about that and it may become the subject of a new post, thanks.

  4. Chris Lang says:

    This is a great topic and psychological concepts such as ‘framing effect’ and ‘distinction bias’ are touched on in graduate level management classes such as org theory, but in my professional experience they have not made their way into decision making processes. Limited resources and time are the obvious reasons why they haven’t, but as your article mentions, if a bad design is signed off on, deficiency corrections down the road can be very expensive. This is something that every software organization understands, but without a complete understanding of how bad designs get approved, they are typically at the mercy of their design teams.

    In fairness, a good designer will demand that the goals or tolerance thresholds of the system be clearly identified and prioritized before any designing takes place, for instance that uptime must be greater than 99% and latency cannot exceed 150 ms. But meeting the minimum criterion does not guarantee a design that is robust. Many tradeoffs should be presented for side by side analysis to ensure that the long term goals of the system are being addressed in the *best* manner possible. With many options to choose from, the effects of innocent but harmful psychological factors (‘groupthink’ is another one) can be minimized. In other words, good article.

  5. Great article Hayim! Such patterns will definitely help make reviews more effective.
