Benchmark Drives Behavior

An anecdotal look at how behavior follows what we measure and how this can lead to aberrant and/or potentially fraudulent outcomes.

Introduction

Guido van Drunen is a forensic accounting expert with decades of experience in high-profile investigative work. Having recently retired from KPMG, he is writing a series of articles about how he has seen benchmarks drive behavior, often with aberrant or potentially perverse outcomes. The purpose of these articles is to make readers think more deeply, and maybe differently, about key metrics: how they are created, how they are measured, and what behaviors they encourage. The first article discusses how poorly conceived benchmarks can misalign interests, cause problems, and limit long-term success while also increasing risk. To learn more about Guido’s insights, read on.

What started years ago as a simple discussion among colleagues arising from anecdotal observations has evolved into this initial article. This is the first in a series of articles looking at the impacts of benchmarks and outcomes. These are issues I have either encountered during my career or learned about from colleagues at other organizations. I will attempt to discern what these benchmarks were designed to achieve and, regrettably, what some of the resulting perverse outcomes were. Benchmarks that are not adequately thought through, properly implemented, and correctly monitored often invoke the law of unintended consequences. These consequences are often harsh and have far-reaching impacts. I will preface this article by saying that I am looking at the issue through the lens of a forensic accountant, with a bias towards outcomes that can be indicators of potential or actual fraud, waste, and abuse. The article is based on a limited data set and primarily on my anecdotal observations. While written by a finance professional, the observations cited are not limited to the finance world. However, who better to talk about benchmarks than one of the bean counters who spent a large portion of his professional career dealing with the fallout from manipulated or poorly designed benchmarks?

I acknowledge that it is both easy and dangerous to establish, anecdotally, a correlation between the benchmarks utilized and the outcomes, whether good, bad, or indifferent, whatever the objective. Correlation does not imply causation. I have often heard it said that “you get what you measure” or, in a line widely attributed to Einstein, “Not everything that counts can be counted, and not everything that can be counted counts.”¹ By contrast, a quote often inaccurately attributed to Peter Drucker (who has significantly influenced management thinking) is “you can’t manage what you cannot measure.”² Drucker’s take was more nuanced, and consistent with the point of view I intend to crystallize via this series: that benchmarks alone aren’t adequate to measure performance. These quotes nonetheless put us on notice that we must focus on the risk of improper metrics and benchmarks. Benchmarks should not be created in a vacuum. I hope this article will help further this message.

What is a Benchmark?

To start with, there is a nuanced distinction between a benchmark and benchmarking of which we should be aware.

A Benchmark is a standard or point of reference against which things may be compared or assessed.³

Benchmarking is a process of measuring the performance of a company’s products, services, or processes against those of another business considered to be the best in the industry, aka “best in class.” The point of benchmarking is to identify internal opportunities for improvement, not to see how we can make ourselves look good. (Note that the first hurdle is the risk in defining who is best.)

For the purpose of this series of articles, a benchmark is defined as a measure or target against which performance is gauged, where achieving or failing to achieve the measure or target has consequences, either positive or negative.

Before we get into specific examples, I should be clear that neither the definitions above nor the observations in this article suggest that there should be no measurements or benchmarks. The point is simply that measurements and benchmarks should be developed carefully, with input from the individuals who will be measured and from other stakeholders, to ensure that they achieve the desired outcome in the long term. History, business, and the social sciences are full of measurements and benchmarks that created an environment not conducive to the long-term health of an organization, a society, or an individual.

Not taking a “Yes sir, no sir, three bags full sir” approach to accepting newly developed and promulgated benchmarks is key. As leaders, stewards, individuals who are measured, and individuals who do the measuring, we must question, appropriately and constructively, benchmarks that may be counterproductive. We need to proactively identify the risks that can arise from existing, amended, or new benchmarks and from failure to implement or monitor them properly. This is everyone’s responsibility, preferably before the benchmark is implemented. Failure to question and understand can lead to problems.

Finally, the creation of benchmarks should identify the behaviors that an organization desires and values, and ensure that, regardless of the benchmarks, those behaviors are nurtured and that organizational ideals and values are not lost or relegated to a nice-to-have or back-seat role. A rhetorical question for the reader: how often, and how well, has the design of HR systems been directly tied to corporate mission statements and codes of conduct, not just in words but supported by a documentary trail of evidence demonstrating clear linkages between benchmarks and corporate values? Questioning to understand existing and new systems and benchmarks is required to manage outcomes effectively. So, let’s talk about “Benchmarks” and what we can do to maximize their benefits and limit their ability to create Kafkaesque situations.

Examples of a Benchmark in action

One example is performance appraisals, which are effectively a system of predetermined measures against which we benchmark individuals and subsequently reward them with money or other perks. This has in many instances led to machinations on the part of managers and employees to achieve the benchmarks set “by hook or by crook.” Again, this might be exactly the desired outcome if the benchmarks are designed properly. However, if, for example, the senior leadership team has 90% of its compensation tied to options and deferred stock, one must be very careful, as this might not always be in the best long-term interests of the company. The clawback regime now in place (although very sparingly used) is a direct result of some of the perceived abuses of these compensation systems that can be tied back to benchmarks: benchmarks that, in many instances, either did not serve the long-term best interests of the organization or were improperly designed, improperly monitored, or both. Another HR-related example is the old GE model of “rank and yank.” It is interesting to note that GE has dispensed with this approach. One issue GE noted was that people who had passed away during the year were being included in the rankings because people were so unenamored with the system. Regardless, some organizations feel this approach may still meet their needs. Year-end reviews often use distinctive words for this benchmarking exercise, such as calibrate, force rank, gauge, assess, and review. Once people get the hang of this process, they either adapt, leave the organization, or are asked to leave for not conforming to, or not achieving, what may not even be the organization’s desired outcomes. Often those departing employees are key contributors who just do not want the hassle. I recall an employee once telling me after a review session that he was surprised by the caliber of people the organization could do without.

Again, it is not that we should remove benchmarks, tracking mechanisms, or reporting of results to gauge the performance of organizations or people. I simply stress that there is often a disconnect between the requirements set by an organization and the reward structures that incentivize certain behaviors, compounded by the ability to manipulate outcomes. We must remember that these processes for gauging performance are always to some extent subjective. Regrettably, achieving the measures or benchmarks set by an organization does not always align with the long-term goals and objectives of that organization.

So how do we deal with the problem of bad benchmarks? Let us first cover some of the poorly designed benchmarks and what they caused or resulted in. If we identify some of the instances where things went wrong in the past, then we can start feeding these experiences and observations into our own personal assessment model and help with questioning the existing benchmarks to ensure that they truly result in what was intended. These examples are anecdotal and serve to highlight “what could go wrong” if we either design a poor benchmark or improperly monitor the achievement of the benchmark as set.

Examples of benchmarks gone awry and insights from some of the individuals impacted by those benchmarks.

To paraphrase Dan Ariely,⁴ do funky benchmarks cause people to behave in a way that results in an improper outcome, or is it the environment created by the benchmark that causes people to act in this way?

By their very nature, benchmarks are created to track performance and ultimately to get people to behave in a certain way. Regrettably, poor benchmarks sometimes push people to act so as to make the person gauging performance believe that they acted in a certain manner, or did certain things, to reach the number or benchmark that was set. This in turn enables them to get what they wanted. By what “they wanted” I mean the following: incentives, bonuses, additional budget, the ability to hire new team members, recognition as the leader in their field, not losing their job, being promoted, etc. In short: I hit my benchmark, so give me what I am due.

To further shed light on the foregoing, here are some examples I encountered during my career:

1. Safety Culture

One organization I was involved with had an ingrained culture of safety and was rightfully proud of it. We have all seen the signs: “X number of days without a safety incident.” Clearly the focus on safety resulted in reduced incidents, and I am sure much of this reduction was because of the training and other programs introduced. However, the tracking mechanism also made having a safety incident a big deal, at times resulting in lower bonuses, impacts on promotion, etc. We learned that at some sites, as a result of the reporting requirements, people who fell off a ladder and got hurt, but not “seriously hurt,” were told to take the rest of the week off, and the incidents were not reported. In short, the program improved reported safety, but the ramifications of missing the benchmark also led people to act in ways that made some of the safety gains illusory.

2. Performance Metrics

With this next example I am probably going to annoy HR and a variety of other professionals. If we blindly accept the “conventional wisdom,” which is often only conventional because everybody does it that way, then we as individuals and our organizations will not be able to maximize our potential, and we effectively limit our prospects for longer-term success. Creating extremely complex benchmarks with the intent of driving certain behaviors and rewarding people fairly can have an impact materially different from what was envisaged. An example is the allocation of performance-related credit for a sale or the completion of a project: how does someone do this fairly? I have observed several organizations where a sale was made and people who had limited or no involvement asked for partial credit. Why? Because that is how their performance was measured. This way of measuring performance also led to situations where someone instrumental in a large sale did not receive credit and was subsequently asked to leave the organization for performance reasons, while credit was given to others with less attributable impact. Granted, this is not solely the fault of a bad benchmark but also of improper oversight and monitoring. In short, it is a confluence of factors. However, the more complex benchmarks are, the harder they are to monitor and the greater the scope for manipulation. The impacts go beyond the transactions alone and can be pernicious in eroding the culture of an organization, as people focus on short-term numbers as opposed to the longer-term goals of the organization. Additionally, many have noted that the best way to lose good employees is to not take action against bad employees, another one of those unintended consequences.

3. Financial Statements and MD&A

I am aware of the pressures and concerns that make challenging the status quo extremely difficult. Take, for example, the quarterly filing requirements for an SEC registrant. Is this information truly key for investors? The Management Discussion and Analysis (“MD&A”) portion of these filings has become so all-encompassing that only analysts have the time to wade through it in detail. It must be filed, but what portion of these filings is utilized for investment decisions? In short, the content of these filings often exists to meet certain “legal or statutory” requirements, but it has become so bland and general that it fulfills no informational purpose other than allowing someone to point to a certain paragraph and say “see, we disclosed it,” even if only in a general sense. Benchmark met? Yes. Value added? Nil.

4. Casualties in War

An extreme example of a metric that truly went off the rails was the use of body counts to determine “success” in the Vietnam War. The metric was designed as part of a war of attrition, under the circumstances that existed at the time, in which the aim was not to take over all of Vietnam but to maintain the status quo in South Vietnam. Researchers and historians have stated that American commanders exaggerated the body counts by up to 100%, as this was the measure of success. Furthermore, the measure was meant to count enemy combatants killed, but it appears that 220,000 civilians killed as “collateral damage” may have been included in those body counts. There were also reports of inflation and fabrication of body counts in “after-action reports.” All of this, driven by a benchmark, meant that information used for key decision-making purposes was incorrect. In short, if you reward me, or if I can present a better picture, based on the number of casualties of my opponent, then this is what I will give you. This is a classic case of a benchmark (how many killed) resulting in positive short-term outcomes (more resources). It may have induced people to provide incorrect data, knowingly or unknowingly.

5. Sales Metrics and Commission Calculation

Bringing this back to the financial realm, another example is sales. Nothing helps a stock price like sales growth, leaving aside the recent GameStop events. As we seek sales growth, we incentivize behavior that helps us achieve that goal. This can be reflected at all levels of the organization, including, for example, low-level sales professionals entering into undisclosed side agreements while benefitting from commissions that are not truly earned. At the opposite end of the spectrum are the two early-2000s frauds at WorldCom and Enron, where revenues were overstated as part of improper accounting schemes with a view to meeting benchmark expectations and simultaneously making certain individuals wealthy, albeit in the short term.

Summary of Initial Observations

The above examples are not just a benchmarking problem; they are also the fault of improper monitoring (which will be covered in future articles). It is a confluence of factors, starting with the benchmark, that results in these behaviors and the related outcomes. The impacts go beyond the transactions alone and can be pernicious in eroding the culture of an organization and causing a disproportionate focus on short-term numbers as opposed to the longer-term goals of the organization. These are just a few of the instances I have observed where benchmarks were designed with the best of intentions (i.e., to drive and measure certain outcomes with a view to conscious improvement) but resulted in things going off the rails. In part this was due to a poor design process, the complexity of the benchmark, gaming of the benchmark, and finally a lack of monitoring and questioning of benchmarking results and components. In short, having a benchmark and not properly monitoring it can create a plethora of problems of its own.

This is where oversight and internal controls come in, along with a willingness to speak out and not compromise or “go along to get along.” It also brings into the picture the need to hold the individuals who develop benchmarks accountable for what they have developed. There should be a requirement to consider, in depth, “what could go wrong.” Once this has been identified, there should be a focused plan for establishing and implementing guardrails and safety bumpers to limit the risks if the decision is made to continue with the benchmark as designed. This could be an area where input from an organization’s compliance function would be useful. I suggest that a few useful exercises in an organization would be to (1) scrutinize the benchmarks that are in place as part of the annual risk assessment; (2) determine what the risks related to those benchmarks are; and (3) conduct compliance audits related to the identified benchmark risks.

Public company auditors already do some of this in the fraud brainstorming sessions required by auditing standards to address fraud risks, but this is limited to the financial statements. That still leaves many benchmarks uncovered by the external auditors. One needs to ask: what is the organizational approach to addressing this risk? For example, we have seen a proliferation of alternative measures used to assess company performance, such as “adjusted EBITDA or EBIT,” “closed deals,” “customer churn,” etc. These metrics are non-GAAP measures and are not generally part of what gets audited, although there is an enhanced focus by regulators on non-GAAP measures. If these measures are utilized as benchmarks for performance and reward purposes, there need to be strong controls and guardrails in place to ensure they incentivize the correct long-term outcomes.

Interim Conclusion

The views and musings contained in this article and subsequent ones are not, at this stage, necessarily supported by rigorously tested social-science-based analytics on any scale. I simply point out some of the pitfalls and perverse outcomes that I have experienced during a long career working in law enforcement, in industry, and as a consultant. The intention of this and subsequent articles is to make people think more deeply, thoughtfully, and maybe differently about the following: “how should we approach the creation of metrics and benchmarks, how do people react to benchmarks, and how do they try to achieve the benchmarks based on organizational reward structures?” Ultimately, how do we develop controls around this to protect everyone?

As noted at the outset, this is intended to be the first of a series of articles and I am early in the process. At this stage, with the limited data and empirical research available, a robust conclusion would not be appropriate. That said, as part of my interim conclusion, I consider it key for all individuals dealing with benchmarks, whether by creating them, assessing them, contributing data to them, being measured against them, or striving to achieve them for their own performance goals, to think long and hard about what they are doing. The need for benchmarking is acknowledged; however, we must also ensure that process does not become more important than outcome. Watch this space as this multifaceted journey into this key area of risk, opportunity, fraud, and operating discipline continues. The next article in this series will deal with four cases of issues negatively impacting corporations or individuals that have strong linkages to benchmarks, along with additional anecdotal observations.

Guido van Drunen

1. Albert Einstein
2. https://www.drucker.institute/thedx/measurement-myopia/
3. Benchmark, Encyclopedia.com
4. https://en.wikipedia.org/wiki/Dan_Ariely