Thursday, January 22, 2015

The Universal Pattern of Huge Software Losses

Apology: This post was supposed to appear automatically as a follow-up to my three cases of large, costly software failures, but evidently I had a software failure of my own, so the Google scheduler didn't do what I thought I asked for. So, here is the first follow-up, a bit late. I hope it's worth waiting for.

To complete my data gathering, I'll present two more loss cases, then proceed to describing the pattern that governs all of these cases, followed by the number one rule for preventing such losses in the first place.

Case history 4: A broker's statement

I know this story from the outside, as a customer of a large brokerage firm:
One month, a spurious line of $100,000.00 was printed on the summary portion of 1,500,000 accounts, and nobody knew why it was there. Twenty percent of clients called about it, using perhaps 50,000 hours of account representative time, or $1,000,000 at least. An unknown amount of customer time was used, and the effect on customer confidence was unknown. The total cost of this failure was at least $2,000,000, and the failure resulted from one of the simplest known errors in COBOL coding: failing to clear a blank line in a printing area.
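
A quick back-of-the-envelope check of those figures, as a minimal sketch in Python. It uses only the numbers quoted above; the variable names are invented for the example, and the per-call and per-hour results are simply what the quoted numbers imply, not figures reported by the brokerage.

```python
# Figures quoted in the case history above.
accounts = 1_500_000
percent_who_called = 20
rep_hours = 50_000
rep_cost_dollars = 1_000_000

# Derived values (implied by the quoted figures, not reported by the firm).
calls = accounts * percent_who_called // 100
minutes_per_call = rep_hours * 60 / calls
dollars_per_hour = rep_cost_dollars / rep_hours

print(f"calls received:   {calls:,}")
print(f"minutes per call: {minutes_per_call:.0f}")
print(f"cost per hour:    ${dollars_per_hour:.2f}")
```

In other words, the estimate amounts to about 300,000 calls of roughly ten minutes each, at about $20 per hour of representative time.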

Case history 5: A buying club statement

I know this story, too, from the outside, as a customer of a mail-order company, and also from the inside, as their consultant:
One month, a new service phone number for customer inquiries was printed on each bill. Unfortunately, the phone number had one digit incorrect, producing the number of a local doctor instead of the mail-order company. The doctor's phone was continuously busy for a week until he could get it disconnected. Many patients suffered, though I don't know if anyone died as a result of not being able to reach the doctor. The total cost of this failure would have been hard to calculate except for the fact that the doctor sued the mail-order company and won a large settlement. One of the terms of the settlement was that the doctor not reveal the amount, but I presume it was big enough. The failure resulted from an even simpler error in COBOL coding: copying a constant wrong.

The Universal Pattern of Huge Losses

I'll stop here, because I suspect you are getting bored with reading all these cases. Let me assure you, however, that they were anything but boring to the top management of the organizations involved. Rather than give a number of similar cases I have in my files, let's consider each case as a data point and try to extract some generalized meaning.

Every such case that I have investigated follows a universal pattern:

1. There is an existing system in operation, and it is considered reliable and crucial to the operation.

2. A quick change to the system is desired, usually from very high in the organization.

3. The change is labeled "trivial."

4. Nobody notices that statement 3 is a statement about the difficulty of making the change, not the consequences of making it, or of making it wrong.

5. The change is made without any of the usual software engineering safeguards, however minimal, that the organization has in place.

6. The change is put directly into the normal operations.

7. The individual effect of the change is small, so that nobody notices immediately.

8. This small effect is multiplied by many uses, producing a large consequence.

The Universal Pattern of Management Coping With a Large Loss

Whenever I have been able to trace management action subsequent to the loss, I have found that the universal pattern continues. After the failure is spotted:

9. Management's first reaction is to minimize its magnitude, so the consequences are continued for somewhat longer than necessary.

10. When the magnitude of the loss becomes undeniable, the programmer who actually touched the code is fired—for having done exactly what the supervisor said.

11. The supervisor is demoted to programmer, perhaps because of a demonstrated understanding of the technical aspects of the job.

12. The manager who assigned the work to the supervisor is slipped sideways into a staff position, presumably to work on software engineering practices.

13. Higher managers are left untouched. After all, what could they have done?

The First Rule of Failure Prevention

Once you understand the Universal Pattern of Huge Losses, you know what to do whenever you hear someone say things like:

• "This is a trivial change."

• "What can possibly go wrong?"

• "This won't change anything."

When you hear someone express the idea that something is too small to be worth observing, always take a look. That's the First Rule of Failure Prevention.


Nothing is too small to be worth observing.


What's Next?

Now that you're familiar with the pattern, we'll take a breather until the next post. There I'll provide other guides for preventing such failures.

Note

This essay is adapted from a portion of Chapter 2 from Responding to Significant Software Events.

This book, in turn, is part of the Quality Software Bundle, which is an economical way to obtain the entire nine volumes of the Quality Software Series (plus two more relevant volumes).


Thursday, January 15, 2015

Some Very Expensive Software Failures

Why Concentrate on Failure?

"So long as a man attends to his business the public does not count his drinks. When he fails they notice if he takes even a glass of root beer." - Corra May Harris

Logically, direct measurement of value should be the first place an organization starts to look at itself, but that's not how it usually happens. Instead, the trigger for most organizations to embark on some self-examination is failure—either one whale of a failure or thousands of annoying failure mosquitoes.

This concentration on failure may seem illogical—and may be illogical in many circumstances—but it does fit with our understanding of quality as subjective value. Of all the troublesome aspects of using computers, failures are by far the most annoying to the most people. Without ever conducting a detailed impact case study, or even a greatest single benefit study, people know that they don't like it when their computer fails. Thus, customers heap abundant praise and appreciation on the software organization that doesn't fail them.

Of course, the definition of failure changes with time, as expectations change. Once customers become accustomed to a certain level of service, a lapse from that level becomes a failure. Some customers have come to expect a succession of "breakthroughs" in software, so that achieving only a modest gain is seen as a failure. Thus, the first step in managing failures is to manage customer expectations—but that's always the first step in managing quality.

What Do Failures Cost?

Some perfectionists in software engineering are overly preoccupied with failure, and most others don't rationally analyze the value they place on failure-free operation. Nonetheless, when we do measure the cost of failure carefully, we generally find that great value can be added by producing more reliable software. In this section, we'll take a look at a few examples that should convince you.

Case history 1: A national bank

The national bank of Country X issued loans to all the banks in the country. Each loan was confirmed by a telegram showing the amount of the loan, the repayment conditions, and the interest rate. The telegram became the legal loan document for the loan. The COBOL program that composed and sent these telegrams had been in operation for almost 15 years, and had worked flawlessly. Somebody noticed, however, that the serial number field would run out of digits and begin duplicating serial numbers within a few months. As each loan was legally identified by the serial number on the telegram, duplication could not be allowed.

Management directed that the serial number field be expanded. The programming manager assigned the job to one of the team leaders, who gave it to a programmer, saying, "Expand the serial number field by two digits." The programmer made this trivial change, ran a few tests, and the system was put into operation the next day. Everything worked fine.

Some time later, a financial analyst noticed a slight discrepancy between estimated loan receipts and actual loan receipts. After much searching, it was discovered that the serial number expansion had overlaid the low order digits of the interest rate field, causing the final two digits of every interest rate to be truncated to "00." Although the difference between 7.3845% and 7.3800% is quite small, when you are lending hundreds of billions of dollars, it quickly adds up to something significant. In this case, it added up to more than a billion dollars that the national bank could never recover.
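
Here is a minimal sketch, in Python rather than COBOL, of how widening one fixed-width field can overlay its neighbor. The record layout, field widths, serial number, and the $100 billion principal are all hypothetical, chosen only to reproduce the 7.3845% to 7.3800% effect described above; they are not taken from the bank's program.

```python
# Hypothetical layout: rate in columns 0-5 (digits with four implied
# decimals), serial number right-aligned so that it ends at column 12.
RATE_START, RATE_END = 0, 6
SERIAL_END = 12

def compose(rate_digits: str, serial: int, serial_width: int) -> str:
    """Build the record, writing the rate first and the serial number second."""
    record = [" "] * SERIAL_END
    record[RATE_START:RATE_END] = rate_digits
    # A wider serial field starts further left. Nothing checks whether it
    # now overlaps the rate field, so the serial's leading zeros win.
    serial_start = SERIAL_END - serial_width
    record[serial_start:SERIAL_END] = str(serial).zfill(serial_width)
    return "".join(record)

def read_rate(record: str) -> float:
    """Read the rate field back as a percentage."""
    return int(record[RATE_START:RATE_END]) / 10_000

before = compose("073845", 123456, serial_width=6)  # original field width
after = compose("073845", 123456, serial_width=8)   # "expand by two digits"

print(before, read_rate(before))
print(after, read_rate(after))

# Interest lost per year on a hypothetical principal of $100 billion.
principal = 100_000_000_000
lost_per_year = principal * (73845 - 73800) // 1_000_000
print(f"lost per year: ${lost_per_year:,}")
```

Both records look perfectly reasonable when printed, which is the point: under these assumptions the only symptom is a rate that reads 7.38 instead of 7.3845, worth $4.5 million a year on each $100 billion lent.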

Case history 2: A public utility

A utility company was changing its billing algorithm to accommodate rate changes (a utility company euphemism for "rate increases"). All this involved was updating a few numerical constants in the existing billing program.

Management directed that the constants be updated. The programming manager assigned the job to one of the team leaders, who gave it to a programmer, saying, "Replace these constants in the program." The programmer made this trivial change, ran a few tests, and the system was put into operation the next day. Everything worked fine.

Some time later, the Comptroller's office noticed a slight discrepancy between estimated receipts and actual receipts. After much searching, it was discovered that two low order digits in one of the constants had been entered with "75" transposed to "57", causing a number of the bills to be short by a small amount. Billing millions of customers, this small difference added up to X dollars that the utility could never recover.

The reason I say "X dollars" is that I've heard this story from four different clients, with different values of X. Estimated losses ranged from a low of $42 million to a high of $1.1 billion. Given that this happened four times to my clients, and given how few public utilities are clients of mine, I'm sure it's actually happened many more times.
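
To see how a transposed pair of digits turns into real money, here is a minimal sketch in Python. Every number in it (the rate constant, the usage, the customer count, and the billing period) is hypothetical and chosen only for illustration; none comes from any of the four clients.

```python
from decimal import Decimal

# All values below are hypothetical, for illustration only.
correct_rate = Decimal("0.0875")  # intended price per kWh
entered_rate = Decimal("0.0857")  # same constant with "75" keyed as "57"
usage_kwh = 600                   # monthly usage per customer
customers = 3_000_000
months = 12

# Each bill is short by only about a dollar...
shortfall_per_bill = (correct_rate - entered_rate) * usage_kwh

# ...but the shortfall is repeated on every bill, every month.
total_shortfall = shortfall_per_bill * customers * months

print(f"short per bill: ${shortfall_per_bill:.2f}")
print(f"short per year: ${total_shortfall:,.2f}")
```

With these made-up inputs, a shortfall of $1.08 per bill grows to $38.88 million in a year. Different rates and customer counts would give different totals, which is consistent with the wide range of X that I heard.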

Case history 3: A state lottery

I know of this one through the public press, so I can tell you that it's about the New York State Lottery:

A few years ago, the New York State legislature authorized a special lottery to raise extra money for some worthy purpose. As this special lottery was a variant of the regular lottery, the program to print the lottery tickets had to be modified. Fortunately, all this involved was changing one digit in the existing program.

Management directed that the change be made. The programming manager assigned the job to one of the team leaders, who gave it to a programmer, saying, "Change this digit to a five." The programmer made this trivial change, ran a few tests, and the system was put into operation the next day. Everything worked fine.


A few weeks later, when ticket sales were in full swing, one of the players bought two tickets and noticed that they had identical numbers. As there were supposed to be no duplicates in this lottery, he brought his tickets to the Daily News, which printed a photo of him and his two tickets on the front page. Public confidence in the lottery plunged, and the explanation that the error was "trivial" did not restore public confidence. In order to satisfy the public outcry, all lotteries were shut down pending the report of a blue ribbon investigating committee (this is government, after all). Altogether, it took 11 months for the matter to be resolved and the lotteries to be reestablished. At that time, the lotteries had been netting the state about $4 million to $5 million per month, so the total loss of revenue was estimated between $44 million and $55 million.

What's Next?

I have many more cases of failure, but to keep this blog short, I'll pause here. In my next blog essay, I'll give a few more cases, then describe the universal pattern of huge losses. After that, I'll provide some guides for preventing such failures.

Note

This essay is adapted from a portion of Chapter 2 from Responding to Significant Software Events. 

This book, in turn, is part of the Quality Software Bundle, which is an economical way to obtain the entire nine volumes of the Quality Software Series (plus two more relevant volumes).

Sunday, January 04, 2015

How You Can Help Your Favorite Author

In spite of popular myths, writing books is a tough way to make a living. The writing itself is tough enough for most people, which is probably why most people don't consider becoming writers themselves. But for all the writers I know, the writing is probably the easiest part of the job. The hard part is promoting your books so people will buy them and, hopefully, will read them.

One of my writing buddies recently released a new book, A Murder of Crows. To accompany the release, she wrote a blog essay in which she said, "I am, of course, in panic mode, because I have a release-day checklist in which I do more than whisper a quiet announcement on Facebook and slink back into my hidey hole, as usual, to make snarky comments and post nerdy links. The dreaded promotion phase of writing. I think I know like two writers who enjoy promoting themselves. The rest of us are terrified."

I'm not one of those two writers. I don't enjoy promoting myself, not because I dread it, but because I don't enjoy doing things I don't do well. I didn't adopt a writing career because I was a good self-promoter. Fortunately, I have done quite well in my writing career because other people do a good job of promoting my stories.

That's true for almost any successful writer. They succeed because their fans promote them. So, if you would like to help your favorite author, here are some options:
1) Buy their book today. With the advent of e-books, this act is easier and cheaper than it's ever been in the past. Various services report the sales of books, and the more that are sold, the more the stores promote them. The more people who buy the book, the better it seems to look to the stores' algorithms and the more visible they make those books to complete strangers who happen to be looking for something to read. Buying the book is not required, but it should be a huge help to your author's reputation.
2) Ask the author for a free copy of the book. For instance, I have formats set up for all major ereaders as well as for your computer. If you have no intention of reading the book but want to pass it on to someone who miiiiiight want it like six years from now, I have no problem with that. Ask away.
3) Review the book at Amazon, Goodreads, LibraryThing, Smashwords, on your blog or other social media, on the back of a napkin, sealed up into a bottle and tossed into the ocean...[Note: Anywhere you mention the book—or link to it—on the Internet helps brainwash Google and other search engines into making it just slightly more visible.]
4) Pass the word. There’s a local commercial that always ends up with, “If you like our service, tell a friend. If you don’t, tell me.” That’s it exactly.
As far as social media goes, you may copy the picture of the cover off the author's blog and use it to help promote the book. (Most social media give extra weight to posts with pictures attached, which is why you see so many dang cats.)
5) Sign up for the author's mailing list, which might produce a newsletter or other forms of information about the book or other books by the same or similar authors.
6) Chat with the author. Dayle says, and I agree: "I will listen to you complain about/praise the book. I will take typo oopsies if you catch any in the book. I will take invitations to write on your blog. I will talk with your book group. I will send free copies of books to libraries. If you have a favor to ask, ask it. Because trading favors is what makes the writing world go ’round."
7) Give the author hugs. Appreciative feedback encourages authors to write more and better books and stories.
In summary, to help your favorite author,
  1. buy books
  2. review someplace
  3. tell a friend
  4. give the author hugs
An Offer of Free Books to Reviewers
That summarizes (or plagiarizes) Dayle's blog post. Now here's something of my own. I've always offered free books to reviewers, but don't always receive reviews in return. In fact, my experience and that of author colleagues is this: Only about one in three free books produces a review. The rest produce nothing. So, I'm going to try a slightly different policy for reviewers, as follows:
  1. You obtain one of my books (buy, beg, borrow, steal, or find lying on the sidewalk).
  2. You read the book, or at least attempt to read it.
  3. You write a review telling what you think others ought to know about the book. The review can be long or short, favorable or unfavorable, serious or funny.
  4. You submit the review to be published somewhere, and also send a copy to me.
  5. As your reward, I will send you two of my ebooks—you have dozens of books to choose from. Tell me your choices when you send me the copy of your review. You will receive your reward a day or two later.
  6. (You can then repeat the process by reviewing one or both of your reward books. You can repeat until I have no more books with which to reward you.)
That's all there is to it. You can find all of my ebooks at <https://leanpub.com/u/jerryweinberg>—there are currently 44 books listed there, so you won't run out quickly.
And by the way, if I'm not one of your favorite authors, check with your favorites and ask what kind of reward they offer. The important thing is to support those writers you love, so the world can share your pleasure.