Baseball pitcher in the wind up

This is the seventh in a series of articles on what we can learn about software architecture by studying the sport of baseball and comparing the two. The series was inspired by the book Management by Baseball.

“I don’t think baseball could survive without all the statistical appurtenances involved in calculating pitching, hitting and fielding percentages. Some people could do without the games as long as they got the box scores.” – John M. Culkin

Did you know that in the current season, right-handed batters facing CC Sabathia (the pitcher in the photo for this article) are batting .218 against him, which is a good opponents' batting average for a pitcher? Or that against left-handed batters, his opponents' batting average is a fantastic .136?

Did you know that the 1888 Washington Senators posted the lowest team batting average, a mere .207?

Did you know that Cal Ripken, Jr., who is best known as baseball’s “iron man” for playing in 2,632 consecutive games, is also the all-time leader in grounding into double plays, with 350 of them over his career?

All professional sports have a battery of statistics that are tracked as part of the game and the subsequent reporting of those games, but most people would agree that baseball statistics are far more numerous and are actually part of the fabric of the game. One of the most fascinating aspects is how we use statistics to compare players from different eras. Because the game has evolved slowly over the years and has not been radically redefined along the way, those comparisons are meaningful. As a result, we can compare the consecutive-game hitting streak that Joe DiMaggio had in the 1940s with the one that Willie Keeler had in the 1890s and the one that Pete Rose had in the 1970s.

More than just a history lesson or a source of interesting footnotes to articles about games, statistics are actually used in the decision-making process before and during a game. A manager, for example, may look at how the batters in his lineup have fared against the probable pitchers for that day’s game. He may choose to give a batter the day off as a result. The general manager will also use statistics to judge whether to draft a player or sign a particular player as a free agent.

As important as statistics are in baseball, we should remember that their role is to enhance the experience, not to become the experience. Don’t spend all your time computing the batting average of the player coming up to bat with runners in scoring position (RISP) and fewer than two outs, a statistic that you will hear frequently. This quote says it best:

“Baseball isn’t statistics – baseball is (Joe) DiMaggio rounding second.” – Jimmy Breslin

Software Architecture Statistics

Software engineering has some really interesting statistics that are used as part of the development process. Some of the most common or the most interesting statistics are:

  • Lines of code – when we are working on an application, particularly one that already exists (a legacy application), the number of lines of code is an interesting piece of data. For example, if you are going to make major changes to an application, it will generally be a lot easier to change one that has 100,000 lines of code than one with 2,000,000 lines of code, simply because of the sheer size of the application.
  • Cyclomatic Complexity – measures how complicated the application is by counting the number of independent paths through the program. This adds some “depth” to the lines of code statistic, because a program with 50,000 lines of code can be much more complex than one with 100,000 lines of code.
  • Function Points – a way to calculate the size of an application by how many things it does. Function points are very user-centric, so the count for “back end” applications tends to be skewed a bit.
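
To make the first two statistics concrete, here is a minimal sketch in Python that counts lines of code and approximates cyclomatic complexity for a piece of Python source. It is an illustration under stated assumptions, not how any particular analysis tool works: the function names, the sample code, and the choice of which constructs count as decision points are all made up for this example, and real tools differ in the details.

```python
import ast

# Node types that each add one decision point. This list is an assumption
# of the sketch; real tools differ in exactly what they count.
_DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp,
                   ast.ExceptHandler, ast.comprehension)


def count_code_lines(source: str) -> int:
    """Count lines that are neither blank nor pure '#' comments."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


def approx_cyclomatic_complexity(source: str) -> int:
    """Approximate complexity as 1 plus the number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' adds one extra branch per additional operand.
            complexity += len(node.values) - 1
    return complexity


# Hypothetical sample code, used only to exercise the two functions.
SAMPLE = '''
def classify(avg):
    # Made-up thresholds, for illustration only.
    if avg >= 0.300:
        return "excellent"
    elif avg >= 0.250 and avg < 0.300:
        return "solid"
    return "weak"
'''

print("lines of code:", count_code_lines(SAMPLE))
print("approx. complexity:", approx_cyclomatic_complexity(SAMPLE))
```

Two programs with the same line count can score very differently on the second number, which is the “depth” that cyclomatic complexity adds to a raw size measure.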

While software architecture has some interesting statistics, we as an industry do not use them as much as we should (this statement is part opinion and part anecdotal observation). We also fail horribly as an industry to track statistics over time, so we cannot make meaningful comparisons to past projects. The best examples of learning from past experience would be comparing estimates to actuals on a project and comparing an upcoming release to a past release. Generally those kinds of statistics are not captured in a meaningful fashion, although modern Application Lifecycle Management and Agile Software Development tools and techniques are changing that for the better.
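
As a rough illustration of what tracking estimates against actuals could look like, the sketch below records a few past releases and uses their average overrun to adjust a new estimate. Everything in it is hypothetical: the `ReleaseRecord` type, the release names, and the numbers are invented for the example, and a simple average is only one of many ways such a history might be used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReleaseRecord:
    """One past release: what we estimated and what it actually took."""
    name: str
    estimated_days: float
    actual_days: float

    @property
    def overrun_ratio(self) -> float:
        # A ratio above 1.0 means the release took longer than estimated.
        return self.actual_days / self.estimated_days


def mean_overrun(history: list[ReleaseRecord]) -> float:
    """Average overrun ratio across all recorded releases."""
    return sum(r.overrun_ratio for r in history) / len(history)


def adjusted_estimate(raw_estimate_days: float,
                      history: list[ReleaseRecord]) -> float:
    """Scale a new raw estimate by the historical average overrun."""
    return raw_estimate_days * mean_overrun(history)


# Made-up history, for illustration only.
HISTORY = [
    ReleaseRecord("release-1", estimated_days=40, actual_days=50),
    ReleaseRecord("release-2", estimated_days=60, actual_days=90),
    ReleaseRecord("release-3", estimated_days=20, actual_days=25),
]

for record in HISTORY:
    print(f"{record.name}: overrun x{record.overrun_ratio:.2f}")
print(f"adjusted estimate for 30 days: {adjusted_estimate(30, HISTORY):.1f}")
```

The point is less the arithmetic than the habit: a comparison to past releases is only possible if the estimate and the actual are both written down at the time.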

As we start to embrace more and better statistics in software architecture, we need to keep in mind that software architecture is still a very human process. Statistics can help us judge our actions and make meaningful decisions, but the statistics cannot become the outcome of the process.

“If you dwell on statistics, you get shortsighted. If you aim for consistency, the numbers will be there at the end.” – Tom Seaver