# Code Metrics - SLOC Count

August 19, 2013

Many programmers will start having flashbacks at the title of this article because it contains the words 'metrics' and 'SLOC'. Newer programmers are probably wondering what all of the fuss is about - most have no negative associations with the term 'code metrics' and some may not even know what SLOC is. While there is much baggage associated with metrics and SLOC, you shouldn't be afraid to gather fundamentally useful data such as SLOC counts from your programming projects, and there are (free) tools out there to help you do just that.

First, some background for everyone who is confused.  SLOC stands for Source Lines Of Code.  It's essentially a count of how many useful (i.e., not comments, not blank) lines of code a particular project contains. It is often used as an indicator of complexity of a project - the more SLOC, the more complex something is. But what exactly are we counting?

First off, I'm discussing this with the C programming language in mind. The general ideas will hold for other languages, although there will always be edge cases that C won't have. The first indicator of lines of code we have is file size. The larger a source file is, the more it must contain. While that's true, it's not helpful. In C one line is generally one statement, one action. What we're aiming for is a rough count of how many actions a particular bit of code performs. The file size only tells us how many characters it contains, not how many lines. So let's get more complex - we'll count the number of lines by counting the number of newline characters the file contains - that definitely gives us the number of lines, no? Not exactly. Consider these two bits of source code:

if(a==1) b=2;

if(a==1)
b=2;


If you assume this is C, a trained eye can see that these two bits of code perform equivalent actions. However, the second snippet has more newline characters. That increases its Physical Lines Of Code (PLOC) measurement. Since both of them do the same thing, though, they have the same number of Logical Lines Of Code (LLOC). Logical Lines Of Code take into account how a particular language actually works to determine the number of statements and thus the complexity. C doesn't care nearly as much about whitespace as Python does. As a result, you can't use the same strategy to calculate LLOC for C as you can for Python. When counting lines of code to determine complexity, LLOC is a better measure than PLOC.

Now, earlier I said that the more SLOC a project has, the more complex it generally is. This simple idea has spawned a variety of objectionable strategies in some business people's brains (this is where the flashbacks come into play). For example, if you're a programmer and you write one million SLOC you obviously did more work than someone who wrote 10,000 SLOC, correct? That seems to make sense. And if there are two programmers on a team where one writes 1000 SLOC in a month and the other writes 500, who wins? Management is always looking out for poor performers. Convenient tools such as SLOC count often give management types justification for denying bonuses and raises, or even firing people. But as convenient as it is, SLOC count is not definitive for all purposes. A good programmer can solve a problem in fewer lines of code than a less-experienced (or even, dare I say, worse) programmer. Additionally, complexity is often very undesirable when writing code. More code means more opportunities for bugs, and poor, large (high-SLOC) algorithms are often slow and difficult to work with. If the only metric that is used to evaluate programmers is SLOC, management tends to promote slow, verbose code written by tedious and ineffective programmers. If this is kept up long enough, programmers who create efficient and succinct code are chastised and eventually laid off when they don't make the 'grade' like their comrades do. Cue angry flashbacks.

Still, SLOC is not a fundamentally evil tool - especially when used by someone who is aware of its shortcomings.  Consider that even if you're a programmer, you're not always writing software from scratch. Sometimes you have to work with existing codebases.  If you're a contractor or work at a company where you're in charge of negotiating contracts you need tools like SLOC to determine the relative difficulty of working with a codebase so you can request the proper amount of money from a client. With metrics like SLOC count on your side you have a justification for the entirely legitimate (and very profitable) prices you charge your clients.

So how can you count SLOC? There are a number of programs out there that can do it for you. Some are free, some are paid, some are closed-source, some are open-source. As a poor and impatient programmer I always tend towards utilities that are free (as in beer), open-source and easily available on SourceForge. In this case, my preferred tool is CLOC (which, oddly enough, stands for Count Lines Of Code). You can find it at http://cloc.sourceforge.net. In addition to the aforementioned reasons, I like it because it is command-line, easy to use, can generate a variety of different reports, and works on a surprising number of languages. There are Windows and Linux executables available and, of course, the source as well.

It's surprisingly easy to use. Once the executable is on your computer you need only specify the input files and options, and it will generate a raw PLOC count for the files you specified, split into blank, comment and code lines:

    cloc-1.5.8 ./*.c ./*.h

    20 text files.
    classified 20 files 20 unique files.
    1 file ignored.

    http://cloc.sourceforge.net v 1.58 T=1.0 s (19.0 files/s, 7645.0 lines/s)
    -------------------------------------------------------------------------------
    Language                     files          blank        comment           code
    -------------------------------------------------------------------------------
    C                                7            904           2252           1953
    C/C++ Header                     9            207            605            213
    -------------------------------------------------------------------------------
    SUM:                            13           1111           2857           2166
    -------------------------------------------------------------------------------

There are, of course, more advanced output options.  While this breaks the output down by file type, you can also break it down by individual files:

    cloc-1.5.8 ./*.c ./*.h --by-file

    20 text files.
    classified 20 files 20 unique files.
    1 file ignored.

    http://cloc.sourceforge.net v 1.58 T=0.5 s (38.0 files/s, 15290.0 lines/s)
    -------------------------------------------------------------------------------
    File                                        blank        comment           code
    -------------------------------------------------------------------------------
    temp_const.c                                   18             40           1029
    main.c                                        546           1414            630
    crc.c                                         123            262            120
    bit.c                                         108            305             84
    main.h                                         28            150             73
    common.h                                       50            171             71
    common.c                                       33             58             50
    rs232.c                                        52            116             35
    bit.h                                          22             50             16
    compile.h                                      22             34             14
    calibration.h                                  15             56             11
    crc.h                                          23             55             11
    rs232.h                                        15             31              9
    ver.c                                          24             57              5
    ver.h                                          21             40              4
    temp_const.h                                   11             18              4
    -------------------------------------------------------------------------------
    SUM:                                         1111           2857           2166
    -------------------------------------------------------------------------------

Obviously, these will look better on the command line or in a real text editor.

If you need them, there are advanced features. You can reassign the automatic file extension recognition (in case you don't use standard extensions like .c for C source), specify files to ignore, use a variety of different output formats, compare diffs of different sources, and much more. I encourage you to download it and play around. SLOC is a fundamental metric you can use to help become a better programmer, and CLOC can help you gather it.
