Hi,

Any pointers to research that can shed light on the robustness/vulnerability of *particular* code metrics in the context of modeling:
- effort required
- "code correctness" (non-bugginess)

And, in practice (for shops regularly *using* metrics in their development/test models), whether the type of metric employed has a measurable impact on the coding styles of developers (consciously or subconsciously). I.e., do they try to "game" the system?

Thx,
--don
Code metrics
Started by ●March 7, 2015
Reply by ●March 7, 2015
On Sat, 07 Mar 2015 11:55:34 -0700, Don Y wrote:
> Any pointers to research that can shed light on the
> robustness/vulnerability of *particular* code metrics in the context of
> modeling:
> - effort required
> - "code correctness" (non-bugginess)
>
> And, in practice (for shops regularly *using* metrics in their
> development/test models), whether the type of metric employed has a
> measurable impact on the coding styles of developers (consciously or
> subconsciously). I.e., do they try to "game" the system?

I don't know if things have changed in the last decade or so, but the last time I really paid attention to this, it was felt that _any_ code metric could be gamed. Search on "Dilbert" and "write me a new mini-van".

There's one that measured overall code complexity that my 2nd favorite manager of all time really put a lot of reliance on -- and he was my absolute favorite manager when it came to software issues and developing his people. I absolutely can't remember the name of it, though.

--
www.wescottdesign.com
Reply by ●March 7, 2015
Hi Tim,

On 3/7/2015 2:52 PM, Tim Wescott wrote:
> I don't know if things have changed in the last decade or so, but the
> last time I really paid attention to this, it was felt that _any_ code
> metric could be gamed. Search on "Dilbert" and "write me a new mini-van".

But, what do they *gain* by doing so? I.e., hindsight indicates how "correct" their code was along with how long it took to write. All their efforts do is obfuscate any data that *might* be used to help make these predictions ahead of time.

I.e., if you actively *track* evolving code metrics along with completeness and correctness, gaming the numbers just lets someone *later* say, effectively, developer X's metrics are worth LESS (predictively) than developer Y's.

> There's one that measured overall code complexity that my 2nd favorite
> manager of all time really put a lot of reliance on -- and he was my
> absolute favorite manager when it came to software issues and developing
> his people. I absolutely can't remember the name of it, though.
Reply by ●March 7, 2015
Don Y <this@is.not.me.com> wrote:
> On 3/7/2015 2:52 PM, Tim Wescott wrote:
>> On Sat, 07 Mar 2015 11:55:34 -0700, Don Y wrote:
>>> Any pointers to research that can shed light on the
>>> robustness/vulnerability of *particular* code metrics in the context of
>>> modeling:
>>> - effort required
>>> - "code correctness" (non-bugginess)

(snip)

>> I don't know if things have changed in the last decade or so, but the
>> last time I really paid attention to this, it was felt that _any_ code
>> metric could be gamed. Search on "Dilbert" and "write me a new mini-van".
>
> But, what do they *gain* by doing so? I.e., hindsight indicates how
> "correct" their code was along with how long it took to write.
> All their efforts do is obfuscate any data that *might* be used to
> help make these predictions ahead of time.

Both might be correct; one might be better or faster or easier to update.

Some years ago, I was working on a program written by someone else (who may or may not have had any code metrics) that had a long series of IF statements, each with one assignment, where I would have written a loop and a look-up table. The loop and look-up table might be three lines; the series of IF statements was 16, but could have been much more, or a little less.

If someone is paid by lines of code produced, or had a productivity measurement done by lines of code per day, they might have incentive to write the IF statements.

> I.e., if you actively *track* evolving code metrics along with
> completeness and correctness, gaming the numbers just lets someone
> *later* say, effectively, developer X's metrics are worth LESS
> (predictively) than developer Y's.

If you have software to measure code complexity, I suppose, but for a large project it is likely too hard to compare. By the time it comes to updating, or otherwise keeping old code running, it will be long forgotten who wrote it and why.

>> There's one that measured overall code complexity that my 2nd favorite
>> manager of all time really put a lot of reliance on -- and he was my
>> absolute favorite manager when it came to software issues and developing
>> his people. I absolutely can't remember the name of it, though.

There are many stories in "The Mythical Man-Month" about OS/360, some related to byte targets. When RAM (magnetic core) cost on the order of $1/byte, keeping things small was pretty important. But without the appropriate metric, the result was moving things to places where they weren't counted. Many important OS control blocks are in user space, where other OSes might have kept them in system space.

I suspect it is about as hard to do the metric right as it is to write the software in the first place.

-- glen
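[Glen's IF-chain versus loop-and-table contrast is easy to sketch. The two functions below are behaviorally identical; the code-to-value mapping is invented for illustration, but the LOC difference is the point: the IF version grows by one line per case, while the table version grows by one data entry.]

```c
#include <assert.h>

/* The 16-line style glen describes: one IF per case.
 * The mapping itself is hypothetical. */
int decode_if(int code)
{
    int value = 0;
    if (code == 0) value = 1;
    if (code == 1) value = 2;
    if (code == 2) value = 4;
    if (code == 3) value = 8;
    /* ...a dozen more single-assignment IFs in the original... */
    return value;
}

/* The look-up-table style: the table carries the data, and the
 * code shrinks to a bounds check and an index. */
int decode_table(int code)
{
    static const int table[] = { 1, 2, 4, 8 };
    int n = (int)(sizeof table / sizeof table[0]);
    return (code >= 0 && code < n) ? table[code] : 0;
}
```

A naive LOC counter rewards the first version roughly 4:1 here, and the ratio only grows with the number of cases.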
Reply by ●March 7, 2015
Hi Glen,

On 3/7/2015 5:55 PM, glen herrmannsfeldt wrote:

(snip)

> Some years ago, I was working on a program written by someone else
> (who may or may not have had any code metrics) that had a long series
> of IF statements, each with one assignment, where I would have written
> a loop and a look-up table. The loop and look-up table might be
> three lines, the series of IF statements was 16, but could have been
> much more, or a little less.
>
> If someone is paid by lines of code produced, or had a productivity
> measurement done by lines of code per day, they might have incentive
> to write the IF statements.

Yes, but are folks *really* paid/incentivized that way? As a regular employee, it was always about the *job* -- no one evaluated your daily effort. As a contractor, it was either "per job" or "per hour" payment. Again, no one knew if you wrote 5 lines of code for that job/hour or 50,000!

> If you have software to measure code complexity, I suppose, but for a
> large project it is likely too hard to compare. By the time it comes
> to updating, or otherwise keeping old code running, it will be long
> forgotten who wrote it and why.

Yes. My point was there's no reason to "game" the numbers ahead of time (i.e., prior to completion when "all the results are in"). [This assumes there is no other incentive; see above.]

> There are many stories in "Mythical Man Month" about OS/360, some
> related to byte targets. When RAM (magnetic core) cost on the order
> of $1/byte, keeping things small was pretty important. But without
> the appropriate metric, the result was moving things to places where
> they weren't counted.

Sure! Ages ago, a network connection was priced based on the class of machine sitting on the wire. So, put a dinky PC there and hide your VAXen behind it! :>

> I suspect it is about as hard to do the metric right as it is to
> write the software in the first place.

I'm not sure you have to get it "right". E.g., after the fact, you can (try to) correlate correctness, effort, etc. to data that you collected *during* the process.

E.g., we each have our own schemes for estimating effort a priori. And, some of us track "defects" during the process to get a feel for "how close to done" we are. The question boils down to the value of having some measure that can be applied to evaluating projects beforehand as well as along the way, instead of just "winging it".

*But*, if folks are going to change their behaviors midstream, this would tend to invalidate those observations: "I've been losing weight at a rate of 1 pound per week. But, I changed my diet, yesterday..." :<
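[One standard way to turn in-process defect counts into a "how close to done" number -- not something Don names here, just one concrete technique -- is capture-recapture estimation. If two independent reviewers find n1 and n2 defects, with c found by both, the Lincoln-Petersen estimator puts the total defect population at roughly n1*n2/c.]

```c
#include <assert.h>

/* Lincoln-Petersen capture-recapture estimate of total defects.
 * n1, n2: defects found by two independent reviewers;
 * c:      defects found by both.
 * Returns -1 when there is no overlap (the estimate is undefined). */
int estimated_total_defects(int n1, int n2, int c)
{
    if (c <= 0)
        return -1;
    return (n1 * n2) / c;
}

/* Remaining defects = estimated total - distinct defects found so far. */
int estimated_remaining_defects(int n1, int n2, int c)
{
    int total = estimated_total_defects(n1, n2, c);
    return (total < 0) ? -1 : total - (n1 + n2 - c);
}
```

If reviewer A finds 10 defects, reviewer B finds 8, and 4 are common to both, the estimated total is 20, so with 14 distinct defects found, roughly 6 remain.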
Reply by ●March 8, 2015
On Sat, 07 Mar 2015 18:34:41 -0700, Don Y <this@is.not.me.com> wrote:

(snip)

>> If someone is paid by lines of code produced, or had a productivity
>> measurement done by lines of code per day, they might have incentive
>> to write the IF statements.
>
> Yes, but are folks *really* paid/incentivized that way? As a regular
> employee, it was always about the *job* -- no one evaluated your daily
> effort. As a contractor, it was either "per job" or "per hour" payment.
> Again, no one knew if you wrote 5 lines of code for that job/hour or
> 50,000!

It has happened. In the early 80s I did some work in a shop where the new programming manager instituted LOCs as a productivity metric, which then factored into raises and bonuses.

We were contractors, so I was often able to ignore some of the politics. But when some of the less bright bulbs working for the company started using their suddenly massively improved "productivity" to take pot-shots at *us*, and we were officially notified that our performance was deficient on that basis, it took only a few printouts of before and after versions of programs* from the VCS to cause *quite* the ruckus.

Needless to say I was not popular with the aforementioned dim bulbs, or the manager in question, but most of the programmers loved me for it (they were obviously being screwed by these guys too), and corporate used us for years for straight answers.

*This may be the only time I ever thought that the English-like readability** of Cobol was actually an asset. When the "after" version had the same exact Cobol sentence spread over more lines, it was pretty obvious what was happening, even if you had no clue about programming.

**Such as it is.
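[Robert's COBOL story translates directly to any free-format language: the same statement, spread over more lines, compiles to identical code. A C rendering of a hypothetical "before" and "after" (the function and its arithmetic are invented for illustration):]

```c
#include <assert.h>

/* Before the LOC bonus: one statement, one line of body. */
int pay_before(int hours, int rate)
{
    return hours * rate;
}

/* After the LOC bonus: the identical statement spread across many
 * lines.  Same object code, several times the "productivity". */
int
pay_after
    (
    int hours,
    int rate
    )
{
    return
        hours
        *
        rate
        ;
}
```

As in Robert's diff, even a non-programmer can see what happened when the before/after versions sit side by side.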
Reply by ●March 8, 2015
On Sat, 07 Mar 2015 18:34:41 -0700, Don Y <this@is.not.me.com> wrote:
> On 3/7/2015 5:55 PM, glen herrmannsfeldt wrote:
>> If someone is paid by lines of code produced, or had a productivity
>> measurement done by lines of code per day, they might have incentive
>> to write the IF statements.
>
> Yes, but are folks *really* paid/incentivized that way? As a regular
> employee, it was always about the *job* -- no one evaluated your daily
> effort. As a contractor, it was either "per job" or "per hour" payment.
> Again, no one knew if you wrote 5 lines of code for that job/hour or
> 50,000!

Writing fast and sloppy has been glorified and institutionalized through the use of "agile" methods, rapid releases, push updating, and using the customer as unpaid testers.

"Hey Joe! Customer had a problem doing ______"
"No problem! I'll push a fix in the morning update."

George

---
Weinberg's Second Law: If builders built buildings the way programmers wrote programs, then the first woodpecker that came along would destroy civilization.
Reply by ●March 8, 2015
Hi George,

On 3/8/2015 12:00 AM, George Neuner wrote:

(snip)

> Writing fast and sloppy has been glorified and institutionalized
> through the use of "agile" methods, rapid releases, push updating, and
> using the customer as unpaid testers.
>
> "Hey Joe! Customer had a problem doing ______"
> "No problem! I'll push a fix in the morning update."

But the numbers you'd gather from *that* effort would (roughly) translate to a similar effort undertaken in that same style. And, would give you a way of comparing a *different* development style to that one with measurable results:

"Yeah, we got the code to the user a lot quicker, the old way. But, it ended up more complex and more costly and we had a customer grumbling all the time we were issuing those endless updates! -- 'when is this thing going to be *done*?'"
Reply by ●March 8, 2015
Hi Robert,

On 3/7/2015 11:50 PM, Robert Wessel wrote:

(snip)

>> Yes, but are folks *really* paid/incentivized that way? As a regular
>> employee, it was always about the *job* -- no one evaluated your daily
>> effort. As a contractor, it was either "per job" or "per hour" payment.
>> Again, no one knew if you wrote 5 lines of code for that job/hour or
>> 50,000!
>
> It has happened.

Does it *still* happen? Haven't people learned anything in the 30 years since? Code reviews?? (or, do you bribe your peers?? :> )

> In the early 80s I did some work in a shop where the
> new programming manager instituted LOCs as a productivity metric,
> which then factored into raises and bonuses.

But that's just an example of a PHB who doesn't understand the technology he's managing! It wouldn't take much effort to talk him/her into a corner: "So, all that matters is how MUCH code I write, correct? Whether it works or not isn't a factor? Likewise, efficiency?" -- followed by a trivial example:

    ASSERT(multiplier > 0, multiplicand > 0)
    product = 0
    for (i = multiplier; i > 0; i--) {
        for (j = multiplicand; j > 0; j--) {
            product++;
        }
    }

> We were contractors, so I was often able to ignore some of the
> politics. But when some of the less bright bulbs working for the
> company started using their suddenly massively improved "productivity"
> to take pot-shots at *us*, and we were officially notified that our
> performance was deficient on that basis, it took only a few printouts
> of before and after versions of programs* from the VCS to cause
> *quite* the ruckus.

Was this a result of them manipulating their programming styles (and, thus, metrics)? Or, were they just sloppier coders to begin with?

I.e., this leads to one of several outcomes -- most of which are bad for the organization (and, immediately, the "manager" involved):
- fire your less productive people and let the "stars" do it all (i.e., shittier developers)
- let those with the lower (better) metrics learn how to inflate their metrics (i.e., shittier code)
- rearrange the task allocation so the "stars" can take on the tougher responsibilities -- as they are obviously more capable of doing so! (i.e., shittier result -- time, money, quality, etc.)

It's just a typical short-term anomaly that comes back to bite folks in the end. E.g., the "stars" are now stuck ALWAYS writing shitty (and shittier!) code lest their metrics start to drop. Even an idiot soon realizes that he's got to "produce" come the end of the day. "You've written 27MB of sources and the product still just sits there 'initializing memory'..."

> Needless to say I was not popular with the aforementioned dim bulbs,
> or the manager in question, but most of the programmers loved me for
> it (they were obviously being screwed by these guys too), and
> corporate used us for years for straight answers.

I don't think it makes sense to compare metrics between developers. Even assuming "honest" folks, there are just too many variables in style and problem/development domain. Two functionally equivalent approaches to a problem could have significantly different metrics.

OTOH, I would think individual developers would benefit from knowing how their coding style, etc. pans out *quantitatively*:

"Oh, I don't keep score, Judge."
"But how do you measure yourself with other golfers?"
"By height."

People are notoriously bad at remembering how much previous efforts "cost":

"It was a three month effort..."
"Yeah, but were you working on it EXCLUSIVELY for those three months?"

And, few places seem to actively quantify bug detection and removal rates:

"We finished in just over 6 months..."
"Yeah, but you were still encountering a bug-per-day at that time. How do you consider that 'finished'? Just because the boss pulled the plug at that point??"

Do you know if a refactor *really* gained you anything -- in terms of performance, correctness, complexity reduction, etc.? Do you know if a change in your coding style produces *measurable* improvement? (i.e., should you even bother recommending it to others?)

> *This may be the only time I ever thought that the English-like
> readability** of Cobol was actually an asset. When the "after"
> version had the same exact Cobol sentence spread over more lines, it
> was pretty obvious what was happening, even if you had no clue about
> programming.
>
> **Such as it is.
Reply by ●March 8, 2015
Don Y wrote:
> Any pointers to research that can shed light on the
> robustness/vulnerability of *particular* code metrics in the context of
> modeling:
> - effort required
> - "code correctness" (non-bugginess)
>
> And, in practice (for shops regularly *using* metrics in their
> development/test models), whether the type of metric employed has a
> measurable impact on the coding styles of developers (consciously or
> subconsciously). I.e., do they try to "game" the system?

There is:

- "The Impact of fault models on software robustness" -- see <http://dl.acm.org/citation.cfm?id=1985793.1985801>
- "Choosing Error Models for OS Robustness Evaluations" by Stefan Winter -- see <http://citeseer.ist.psu/viewdoc/download?doi=10.1.1.159.2001&rep=rep1&type=pdf>
- "Exception Handling" by Charles P. Shelton -- see <http://users.ece.cmu.edu/~koopman/des_s99/exceptions/> (note that this is a paper by one of Phil Koopman's students)

I hope they are useful for what you need.

I rather try to remove problems in the software by getting the requirements specification de-bugged first. I find that specs that result in Clear, Concise, Correct, Complete, Coherent and Confirmable statements of requirements will lead to many fewer problems in the code. This requires that even the cyclomatic complexity of the requirements specifications should be minimised.

I gather metrics, through the review process, for all problems found in specifications and designs and, like any well-managed project, solving the problems early in the lifecycle has a massive benefit. Which is why projects need a good deal of "front-loading" in order to produce a good plan. It seems to me that Systems Engineering and Project Management have a lot in common in that respect.

--
********************************************************************
Paul E. Bennett IEng MIET.....<email://Paul_E.Bennett@topmail.co.uk>
Forth based HIDECS Consultancy.............<http://www.hidecs.co.uk>
Mob: +44 (0)7811-639972 Tel: +44 (0)1392-426688
Going Forth Safely ..... EBA. www.electric-boat-association.org.uk..
********************************************************************
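[Paul's aside about minimising cyclomatic complexity can be made concrete: McCabe's metric for a single routine is the number of decision points plus one. A deliberately crude sketch that counts branch tokens in a source buffer -- real metric tools parse the code, so this substring scan is only illustrative and will miscount tokens buried in strings, comments, or identifiers:]

```c
#include <assert.h>
#include <string.h>

/* Count non-overlapping occurrences of `word` in `src`. */
static int count_occurrences(const char *src, const char *word)
{
    int n = 0;
    size_t len = strlen(word);
    for (const char *p = strstr(src, word); p != NULL; p = strstr(p + len, word))
        n++;
    return n;
}

/* Crude McCabe cyclomatic complexity for one function body:
 * decision points + 1.  Only a sketch -- a substring scan cannot
 * tell a keyword from part of an identifier or a string literal. */
int cyclomatic(const char *src)
{
    static const char *const branch[] = {
        "if", "while", "for", "case", "&&", "||", "?"
    };
    int v = 1;
    for (size_t i = 0; i < sizeof branch / sizeof branch[0]; i++)
        v += count_occurrences(src, branch[i]);
    return v;
}
```

Straight-line code scores 1; each IF, loop, CASE arm, or short-circuit operator adds one. Applying the same idea to numbered requirements clauses (one "shall... unless... except..." branch at a time) is the spirit of Paul's remark.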







