Important Programming Concepts (Even on Embedded Systems) Part IV: Singletons

Jason Sachs, November 11, 2014

Today’s topic is the singleton. This article is unique (pun intended) in that unlike the others in this series, I tried to figure out a word to use that would be a positive concept to encourage, as an alternative to singletons, but it seems like there really isn’t one. The closest I could come up with is reentrancy or modularity, but neither of these is quite the same topic.

Anyway, I’m going to cover singletons in a few different ways.

Singletons, Design Patterns, and the GoF

Back in 1994, an influential book called Design Patterns was written by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides, known as the Gang of Four (GoF). At the time, Java, Ruby, and JavaScript didn’t exist yet, object-oriented programming (OOP) had been a buzzword for a few years, C++ was on the ascent, Microsoft was pushing OLE repackaged as COM, and the Standard Template Library was a fresh new addition to the draft C++ standard. Design Patterns was an eye-opening book full of recipes for how to use various common patterns in OOP: not just a contrived class hierarchy of animals with methods for making noises, but a fresh look at how to use OOP. Objects weren’t just ways to store your application data and couple methods along with them. They were also ways to decouple aspects of software design that were really orthogonal to each other; you just had to figure out how to look for those orthogonalities. (And, of course, it had pictures! UML diagrams! We learned that’s what you drew if you were serious about OOP.)

In any case, one of the Design Patterns is called the Singleton. The Singleton (note the capitalization) is an intentionally-unique object in a software application; by design, the software allows only one instance to exist. The Design Patterns book, in keeping with its methodology for describing all of the various patterns, explained the Singleton Pattern, including its characteristics and a whole discussion of various ways of implementing one in C++. The motivation for the Singleton Pattern, according to the GoF, is as follows:

It’s important for some classes to have exactly one instance. Although there can be many printers in a system, there should be only one printer spooler. There should be only one file system and one window manager. A digital filter will have one A/D converter. An accounting system will be dedicated to serving one company.

How do we ensure that a class has only one instance and that the instance is easily accessible? A global variable makes an object accessible, but it doesn’t keep you from instantiating multiple objects.

A better solution is to make the class itself responsible for keeping track of its sole instance. The class can ensure that no other instance can be created (by intercepting requests to create new objects), and it can provide a way to access the instance. This is the Singleton pattern.

So now, if you mention the word Singleton in the context of software engineering, there is an enduring association with the Design Patterns book.

My approach, on the other hand, is going to be a discussion of singletons (note the lack of capitalization) in general. Besides the Singleton Pattern — which specifically refers to an OOP class managing its own instance and preventing additional instances from being created — there are some other aspects of unique objects in programming and software engineering.

Singletons as a Practical Concept

All this hubbub about the Singleton Pattern tends to get stuck in OOP-speak. Instances, factories, constructors, subclassing, blah blah blah.

If you’ve been programming in C rather than a higher-level language, you may be more familiar with the term global variable, which is an example of a singleton. Both have negative connotations, and I’ll talk about why that is. The basic idea, though, is that singletons are modularity killers. Modularity is generally good, therefore singletons are generally bad.

The discussion of singletons is a little bit different in high-level and low-level languages. I’m going to get the high-level stuff out of the way first; those of you interested in embedded systems should be patient (or skip the next section, if you must).

Singletons in high-level languages

I’m going to use Java as an example here, since I’m more familiar with it than other languages. (The ideas discussed in this section are also valid in C++, to some extent.) In Java, there are a couple of ways something can be a singleton:

  • static member of a class
  • allowing only one instance of a class
  • tight coupling to one particular class

Static class members

Because in Java everything is associated with a class, the only type of data that can be a singleton is a static member of a class. So here’s a good example of a very bad technique using static member data. I’m going to blame this example on someone I’ll call Ornery Mungecraft, who may or may not have been a former coworker.

public class MyRandomGenerator
{
   static private int state = initialize();
   static public int getInt() { 
     state = iterate(state);
     return state;
   }

   /* Initialize random state */
   static private int initialize() {
     /* details omitted */
   }

   /* Compute next state from current state */
   static private int iterate(int oldState) {
     /* details omitted */
   }
}

This is bad on so many levels. First of all everything shares the same global state; there’s no way to instantiate a private copy, which means that different consumers of the random number generator will affect each other. The shared state also means this random number generator is not threadsafe. For example, suppose we have this sequence of events:

  • Thread 1 reads a local copy of integer #k from state
  • Thread 2 reads a local copy of integer #k from state
  • Thread 1 uses its local copy of integer #k to compute integer #k+1, writes it back to state, and uses it
  • Thread 1 reads a local copy of integer #k+1 from state
  • Thread 1 uses its local copy of integer #k+1 to compute integer #k+2, writes it back to state, and uses it
  • Thread 2 uses its local copy of integer #k to compute integer #k+1, writes it back to state, and uses it
  • Thread 1 reads a local copy of integer #k+1 from state
  • Thread 1 uses its local copy of integer #k+1 to compute integer #k+2, writes it back to state, and uses it

Uh oh — thread 1 generates the same random number twice!
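The interleaving above can be replayed deterministically on a single thread, which makes the duplicate easy to see. This is only a sketch, written in C to match the embedded examples later in this article: iterate() here is a hypothetical stand-in (one step of a linear congruential generator), and struct Replay and replay_interleaving() are names invented for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for iterate(): one step of a linear congruential generator. */
static uint32_t iterate(uint32_t old_state)
{
    return old_state * 1664525u + 1013904223u;
}

/* Results each "thread" ends up using. */
struct Replay
{
    uint32_t t1[3];   /* thread 1's three results */
    uint32_t t2;      /* thread 2's single result */
};

/* Replays the interleaving on a single thread, one step per line. */
struct Replay replay_interleaving(uint32_t seed)
{
    struct Replay r;
    uint32_t state = seed;      /* the shared state, holding integer #k */
    uint32_t t1_copy, t2_copy;  /* each thread's local copy */

    t1_copy = state;                            /* thread 1 reads #k */
    t2_copy = state;                            /* thread 2 reads #k */
    state = iterate(t1_copy); r.t1[0] = state;  /* thread 1 writes and uses #k+1 */
    t1_copy = state;                            /* thread 1 reads #k+1 */
    state = iterate(t1_copy); r.t1[1] = state;  /* thread 1 writes and uses #k+2 */
    state = iterate(t2_copy); r.t2 = state;     /* thread 2 writes and uses #k+1 (stale copy) */
    t1_copy = state;                            /* thread 1 reads #k+1 again */
    state = iterate(t1_copy); r.t1[2] = state;  /* thread 1 writes and uses #k+2 again */
    return r;
}
```

For any seed, r.t1[1] and r.t1[2] come out equal (thread 1's repeated number), and r.t2 equals r.t1[0], because thread 2 recomputed #k+1 from its stale copy.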

We could take care of the thread safety of this class by adding synchronization, but that’s just putting lipstick on a pig. There are other disadvantages:

  • We don’t have much control over the initialization of the static state variable: initialize() gets called when the class is first initialized, that is, the first time some code uses it. This means the timing of initialization depends on whichever usage of the class happens to come first.

  • We can’t test this class without affecting the other places it is used. Testing techniques often utilize controlled setup of state variables: we might wish to go in and set state to something we choose, then run getInt() a few times and look at the output. Yeah, we could save the initial value of state, do our dirty work for testing, and then put state back the way we found it. And we could do this all in a synchronized block to keep things threadsafe. But that’s more lipstick on this ugly pig.

The more important question here is, why are static members and methods used at all? Why the hell didn’t the author write it this way:

public class MyRandomGenerator
{
   private int state = initialize();
   public int getInt() { 
     state = iterate(state);
     return state;
   }

   /* Initialize random state */
   private int initialize() {
     /* details omitted */
   }

   /* Compute next state from current state */
   private int iterate(int oldState) {
     /* details omitted */
   }
}

This is the exact same code, but with the keyword static omitted. Now anyone can use their own private instance of the random number generator. It’s still not threadsafe to share one instance, but that’s an odd thing to do anyway.

The takeaway here is that anytime you are writing code that has static member data, ask yourself why you’re doing it. In most cases it would be better to change it to instance data, and if you want to distinguish between the instance data of a single object and some private data corresponding to a group of objects, put that private data into a separate factory class.

Instance singletons

Now in some cases, programmers insist on being able to ensure there is only ever one instance of a class. (And here we are with the Singleton Pattern again.) Here’s an example in Java of a technique that mostly works:

public class MyClass
{
   static private MyClass INSTANCE = createInstance();
   static public MyClass getInstance() { return INSTANCE; }
   static private MyClass createInstance() { ... }

   /* constructor */
   private MyClass() { ... }
}

The reasons why this is not a bulletproof way of creating a singleton object are kind of arcane, and are discussed in Joshua Bloch’s book Effective Java and this article.

The bigger question is why do you require a singleton anyway? If it’s to prevent concurrent usage of a single resource (like a file or a computer peripheral), use synchronization techniques to allow only one object at a time to access that resource. If it’s to share one object of class A with shared state among many instances of class B, then use a Factory class to create all the class B objects, and have that factory provide the same instance of class A to each of those class B objects. There are probably some obscure reasons to have a true singleton instance, but I don’t know what they are, and if you do, then you’re probably experienced enough to implement it correctly.

Tight coupling to a particular class

Remember this rule in high level languages: Always, always, always write your programs around specific interfaces rather than specific implementations. The interface defines how an object behaves. But allowing different implementations of that interface gives you modularity and prevents a lot of the tight coupling and dependency problems that arise in complex programs. In other words, instead of this:

// MyRandomGenerator.java
public class MyRandomGenerator
{
   public int getInt() { ... }
   /* implementation omitted */
}

// RGConsumer.java
public class RGConsumer
{
   final private MyRandomGenerator rgen = new MyRandomGenerator();
   void someMethod()
   {
      int i = this.rgen.getInt();
      ...
   }
}

you should be doing this:

// RandomGenerator.java
public interface RandomGenerator
{
   public int getInt();
}

// MyRandomGenerator.java
public class MyRandomGenerator implements RandomGenerator
{
   @Override public int getInt() { ... }
   /* implementation omitted */
}

// RGConsumer.java
public class RGConsumer
{
   // constructor call (see text below)
   final private RandomGenerator rgen = new MyRandomGenerator();
   void someMethod()
   {
      int i = this.rgen.getInt();
      ...
   }
}

In the first example, every time rgen is used, the compiler has to look at the MyRandomGenerator class, and you have tight coupling with that class. In the second example, the only place we are coupled to the MyRandomGenerator implementation is in the constructor call. When we use the variable rgen, all we have to know is that it’s an instance of some class implementing the RandomGenerator interface and we don’t have to care at all about implementation details. But the constructor call is still a weakness. Even though there’s no global state here, this is still a kind of singleton, since RGConsumer has to know that the MyRandomGenerator class is the one special implementation of RandomGenerator it will ever use.

There’s a school of thought in Java that says that you should avoid calling class constructors directly in your program, because this ties you to a specific class implementation of an interface, and can cause dependencies between separate libraries. If you want to get around this, read about dependency injection and Google Guice. With dependency injection, one way to do it is to pass in the specific instance in the constructor:

// RGConsumer.java
public class RGConsumer
{
   final private RandomGenerator rgen;
   public RGConsumer(RandomGenerator rgen)
   {
      this.rgen = rgen;
   }

   void someMethod()
   {
      int i = this.rgen.getInt();
      ...
   }
}

This puts the responsibility of choosing a particular implementation at a higher level in the application. It’s also a bit clumsy if you have lots of state variables.

Maybe I’ll come around to this way of thinking someday. I understand the motivation, but at the small scale of Java programs I write, it seems a little extreme.

Singletons in low-level languages

Okay, now let’s switch gears to using C, or possibly C++, in an embedded system. Here are the types of singletons we have to watch out for:

  • Static member variables (in C++)
  • Static variables in a function
  • Global variables
  • Machine registers (including peripheral registers)

Let’s handle these in order.

Static member variables (in C++)

This is the same exact use case as in Java, and it has the same problems. Anytime you find yourself using a static member variable, ask yourself whether you really need it.

Static variables in a function

Here’s an example of static variables in a function:

int randomIntInitializer()
{
   return someComplicatedFunctionOfTimeAndOtherStuff();
}
int getNextRandomInt()
{
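   /* C requires constant initializers for statics (C++ does not),
      so initialize on the first call instead */
   static int initialized = 0;
   static int state;
   if (!initialized)
   {
      state = randomIntInitializer();
      initialized = 1;
   }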

   int newState = someComplicatedFunctionOfOldState(state);
   state = newState;
   return newState;
}

Yuck! We have all the same problems of static member variables, but it’s even worse. At least in C++ or Java, we can access static variables from anywhere in their enclosing class. When you have a static variable in the body of a C function, the only place that can access that variable is within that function itself. We have no way of reading or writing state directly. OK, so we could add a mess of pig lipstick here and do this:

enum RandomOp { R_NORMAL, R_WRITE, R_READ };

int getNextRandomInt(enum RandomOp op, int forcedValue)
{
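   /* C requires constant initializers for statics (C++ does not),
      so initialize on the first call instead */
   static int initialized = 0;
   static int state;
   int newState;

   if (!initialized)
   {
      state = randomIntInitializer();
      initialized = 1;
   }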

   switch (op)
   {
      case R_READ:
         return state;
      case R_NORMAL:
         newState = someComplicatedFunctionOfOldState(state);
         break;
      case R_WRITE:
         newState = forcedValue;
         break;
   }
   state = newState;
   return newState;
}

and then call getNextRandomInt(R_NORMAL, 0) during normal usage, getNextRandomInt(R_WRITE, desiredState) when we want to write state, and getNextRandomInt(R_READ, 0) when we want to read state. Got it?

Seriously, you think that is ok? I hope not. It looks like some ramshackle hodgepodge function someone threw together. This version of getNextRandomInt() serves three different purposes, and requires you to pass in extra arguments that you won’t use 99% of the time. Yuck.

The use of static variables in C functions is one of those things that should probably be deprecated. Your compiler should warn about it, or even stop and report it as an error, unless you go out of your way to bless a static variable with the right incantations of #pragma or __attribute__ or something.

If you must use static variables, at least keep them out of functions.
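Here is a minimal sketch of what that could look like for the random generator, assuming the state moves to file scope in its own compilation unit. The names random_state, next_state(), getRandomState(), and setRandomState() are made up for illustration, and next_state() is a hypothetical stand-in for someComplicatedFunctionOfOldState().

```c
#include <stdint.h>

/* File-scope static: private to this compilation unit, but visible to
 * every function in it, unlike a static inside a function body. */
static uint32_t random_state = 1u;

/* Hypothetical stand-in for someComplicatedFunctionOfOldState(). */
static uint32_t next_state(uint32_t old_state)
{
    return old_state * 1664525u + 1013904223u;
}

/* Normal usage: advance the state and return the new value. */
uint32_t getNextRandomInt(void)
{
    random_state = next_state(random_state);
    return random_state;
}

/* Reading and writing the state are now separate, single-purpose functions. */
uint32_t getRandomState(void)
{
    return random_state;
}

void setRandomState(uint32_t forced_value)
{
    random_state = forced_value;
}
```

Each function does one job, and normal callers never have to pass arguments they don’t care about. The state is still shared by everyone who calls these functions, so this only tidies up the interface; it doesn’t make the generator reentrant.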

Global variables

The use of global variables in C isn’t so bad; there’s a lot of debate about it, but there’s nothing inherently wrong if you intend to use a global definition of a variable, especially if all the global definitions are together and well-documented. It’s essentially an inventory of the top-level system state variables:

enum MyState mystate;          // current state of the state machine
int number_of_fools;           // current number of fools in the ship of fools
char *application_name =       // what application are we again? 
        "Reindeer Games";      // oh yeah.
struct BeerBottle bottles[99]; // Ninety-nine bottles of beer on the wall,
                               // ninety-nine bottles of beer;
                               // take one down, pass it around,
                               // ninety-eight bottles of beer on the wall.
Reindeer_t dasher, dancer, prancer, vixen,
           comet, cupid, donner, blitzen,
           rudolph;            // 9 cute little reindeer!

int main()
{
   ... top-level application logic goes here ...
}

And if you look at it this way, it’s really easy to determine whether something needs to be a global variable. If it’s part of the overall system state, it should be a global variable. If it’s not, and it’s just a temporary variable used in part of the program, it shouldn’t be a global variable. That’s not so hard.

Or is it?

Here’s the key point: if you really have a set of top level system state variables, why can’t you just move them inside main()?

int main()
{
   enum MyState mystate;          // current state of the state machine
   int number_of_fools;           // current number of fools in the ship of fools
   char *application_name =       // what application are we again? 
           "Reindeer Games";      // oh yeah.
   struct BeerBottle bottles[99]; // Ninety-nine bottles of beer on the wall,
                                  // ninety-nine bottles of beer;
                                  // take one down, pass it around,
                                  // ninety-eight bottles of beer on the wall.
   Reindeer_t dasher, dancer, prancer, vixen,
              comet, cupid, donner, blitzen,
              rudolph;            // 9 cute little reindeer!

   ... top-level application logic goes here ...
}

Now they belong to the topmost logic in an application. There are a few reasons not to do this. Two of them are semi-legitimate, and one is real, but one is a bad reason, and it illustrates why global variables are “bad”. Let’s start with the bad reason.

But it breaks my code if you move them into main()!

Let’s say I made the change above. The next day, my colleague Ornery Mungecraft came running into my office with an angry look on his face. “You broke my code!” he said. Here’s what he was using:

void add_new_fool()
{
   blink_nose(&rudolph);
   ++number_of_fools;       
}

When rudolph and number_of_fools were global variables, everything was fine. As soon as we moved them inside main(), we introduced a compile error.

“You should have tried compiling it first before you checked it back in!” said Ornery. OK, he was right about that. But the problem here isn’t the fact that we moved these variables; the problem is that Ornery was expecting to access them directly. He was hardcoding the use of rudolph and number_of_fools into his code. It’s much better to access them indirectly:

void add_new_fool(Reindeer_t *preindeer, int *pnumber_of_fools)
{
   blink_nose(preindeer);
   ++*pnumber_of_fools;       
}

Now we have an advantage: we have decoupled the code from the data it is acting upon. Application code and especially library code should be modular, so that we can act upon any of the reindeer, not just Rudolph, and so that we can write test code:

void normal_app_code()
{
   ...
   add_new_fool(&rudolph, &number_of_fools);
   add_new_fool(&blitzen, &number_of_fools);
   ...
}

void test_add_new_fool()
{
   Reindeer_t test_reindeer;
   int number_of_fools = 33;

   init_reindeer(&test_reindeer);
   add_new_fool(&test_reindeer, &number_of_fools);
   assert (number_of_fools == 34);
   check_that_nose_has_blinked(&test_reindeer);
}

So the problem with global variables isn’t really caused when we have global variables; it’s when we access them directly. Change your code to take in a pointer, and you’re fine.

But I have library code! You can’t put my global variables in main()!

If you’re writing library code or a module of functions, and you’re not in charge of the top-level application code, then it doesn’t make sense to move associated state from your library or module, and into main(). But the real question is why does your library need to keep around state variables?

In some cases, library state variables (which are akin to static class member variables in Java) are a legitimate need. And the better way to go about it is to declare them as static inside of one compilation unit, so they don’t risk collision in the global namespace, along with a minimum of functions that are allowed to access them directly:

static int reindeer_count;

int get_reindeer_count() { return reindeer_count; }

Reindeer_t *allocate_reindeer()
{
  Reindeer_t *result = (Reindeer_t *)malloc(sizeof(Reindeer_t));
  if (result != NULL)
  {
     init_reindeer(result);
     ++reindeer_count;   // count only successful allocations
  }
  return result;
}

But I want to debug my system and it’s more reliable when I use global variables!

Debugging is one of those things that sometimes changes the rules in programming. Things that don’t normally matter become important when you run your code inside a debugger: in addition to what the software does, you now care about what you can see when you step through it.

One of them is that statically-allocated variables are rock-solid. Your debugger can always see them, no matter where you happen to be in the program. If rudolph and blitzen are global variables, and you’re 37 levels deep in the stack, you can still add them to a watch window in the debugger. They’re also guaranteed to exist in data memory at a fixed location. If you have local variables on the stack, on the other hand, and you turn up the optimization level of the compiler, sometimes the compiler is smart enough to put them into CPU registers or even to optimize them out altogether, and this may befuddle some debuggers.

volatile int count1 = 0;
void myfunc(int *psum)
{
   int x = count1 + 77;
   int y = count1 + 34;

   *psum += x;
   ++count1;
}

In this case, count1 exists in memory. The compiler might decide to put x on the stack, or it might decide to use a CPU register for x. If it does use a CPU register for x, and you stop the debugger on the ++count1 line, the value of x might not exist anymore because the compiler has used that CPU register for other purposes. And the value of y isn’t used anywhere, so the compiler might decide it doesn’t actually need to allocate storage for y or compute count1 + 34.

Some debuggers are better than others. A good debugger will do the right thing: if x is stored in a register, and the value of x still exists there, and you are showing x in a watch window, the good debugger will obtain the value of x from the register; if x no longer exists, the good debugger will tell you that by displaying a message like No longer available or No longer in scope. A mediocre debugger might display ??? because it’s only capable of watching variables on the stack or in global memory, and it can’t reliably use the value of registers. A bad debugger might display the wrong answer and leave you wondering why x is some completely unexpected value; this wastes your time, and you’ll start to come up with a voodoo rule that you can’t run the debugger whenever you turn optimization on.

So, yes, sometimes it’s appropriate to use global variables for important program state during debugging. But this is only for debugging; if all you care about is what happens to a program during normal operation, it should make no difference whether variables are statically allocated, or whether they are local variables allocated inside main().

But I’m working with an interrupt service routine!

Ding ding ding ding! We have a winner! All the other complaints we’ve made are kind of wishy-washy. Yeah, we could get away without global variables, we just don’t really want to.

Interrupt service routines are different. The way most CPU architectures are defined, an ISR works like this: your application program is merrily executing one step at a time, and then BAM! some external signal comes in and triggers a specific ISR. Normal execution stops, and the program counter and maybe a couple of core machine registers are saved, and execution jumps to the ISR in question. ISRs are usually required to be functions with no arguments and no return value. Hardware design doesn’t seem to embrace functional programming, and takes a minimal approach instead: change the program counter, save the bare minimum of stuff, let the ISR do what it wants, and then resume our merrily executing program.

This means that if you want the ISR to do anything useful, you have to arrange to put data in a place where the ISR can get at it. And therefore at least one piece of data must be placed at a fixed address, like a dead drop waiting for an espionage agent to pick it up. You can let this piece of data contain an address pointing to other data located in the stack, or the heap, or some magic movable feast of a place where leprechauns cavort with unicorns and Jimmy Hoffa and Andy Kaufman and the Abominable Snowman. We don’t really care. But the only way the ISR can find this other data is if we put a pointer to it in a fixed location that the ISR knows about.

So chances are, if you’re working with microcontrollers, sooner or later you’re going to have to use a global variable when you work with an ISR.
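A rough sketch of the dead-drop idea, under some loud assumptions: TimerContext, g_timer_context, timer_isr_attach(), and timer_isr() are hypothetical names, and the compiler- and device-specific step of registering timer_isr() as an interrupt handler is left out entirely.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical application state that the ISR needs to update. */
typedef struct
{
    volatile uint32_t tick_count;
} TimerContext;

/* The one fixed-address "dead drop": a pointer the ISR knows how to find. */
static TimerContext *volatile g_timer_context = NULL;

/* Called by normal code to tell the ISR where its data lives.
 * The context itself can be a local variable in main(). */
void timer_isr_attach(TimerContext *ctx)
{
    ctx->tick_count = 0;
    g_timer_context = ctx;
}

/* The ISR: no arguments, no return value. How it gets installed as an
 * interrupt handler depends on the compiler and device, and is not shown. */
void timer_isr(void)
{
    TimerContext *ctx = g_timer_context;
    if (ctx != NULL)
    {
        ctx->tick_count = ctx->tick_count + 1;
    }
}
```

Only the pointer is global; everything it points to can be owned by main() and passed around explicitly. On real hardware you would also need to check whether writing the pointer is atomic with respect to the interrupt, which depends on the device.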

Machine registers (including peripheral registers)

In microcontrollers we often use machine registers. And I don’t mean the numerical ones associated with the ALU. I’m talking about the ones that control some device function or peripheral. In devices from Microchip, these are called Special Function Registers or SFRs.

These registers are always singletons, because each of them is memory-mapped to a specific fixed address.

The right way to go about using them is to encapsulate them in a hardware abstraction layer (HAL) — instead of using some register TMR1 directly, use a library function. The library function will take care of the ugly register-specific logic.

If you’re working on C code in a microcontroller, and you need to write your own software to use hardware registers rather than using pre-existing library functions, make sure that references to hardware registers aren’t scattered around your code: create your own hardware abstraction layer in an isolated file, by making a series of functions to access them instead.

/*
 * BAD CODE
 * courtesy of Ornery Mungecraft, who doesn't work here anymore
 */
void frobozznicate()
{
    // read the glob count registers
    int32_t x = GLOBCTH;
    x <<= 16;
    x |= (uint32_t)GLOBCTL;

    x += rudolph.antler_point_count;

    // write them back
    GLOBCTH = x >> 16;
    GLOBCTL = x & 0xffff;
}

// point of use:
frobozznicate();

Ugh. Ornery really liked global variables.

/* 
 * Better code! Yay!
 */

/*
 * === begin hardware abstraction layer === 
 */

int32_t get_glob_count()
{
    int32_t x = GLOBCTH;
    x <<= 16;
    x |= (uint32_t)GLOBCTL;
    return x;
}

void set_glob_count(int32_t x)
{
    GLOBCTH = x >> 16;
    GLOBCTL = x & 0xffff;
}

/*
 * === end hardware abstraction layer ===
 */

int32_t frobozznicate(const Reindeer_t *preindeer, int32_t old_count)
{
    return old_count + preindeer->antler_point_count;
}

// point of use:
set_glob_count(frobozznicate(&rudolph, get_glob_count()));

Much better. Here the use of hardware registers is confined to the get_glob_count() and set_glob_count() functions, and the frobozznicate() function is a pure function that acts only on its inputs and returns an output.

Application singletons

So far, all this has been about singletons in software implementation. These types of singletons will almost certainly be unnoticed by the people who use the software. A completely different aspect of singletons is when software users interact with a unique object. Here’s an example from my garden rakes file:

  • When you open two separate documents in Microsoft Word 2007, they show up as separate windows, which you can place anywhere you want. If I have two monitors, I can view one document in Monitor A and another in Monitor B.
  • When you open two separate documents in Microsoft Excel 2007 or PowerPoint 2007, they show up as child windows of a singleton application window, because some dumbass at Microsoft decided that MDI is still preferable. (Hey Microsoft! 1990 called, it wants its MDI back!) If you want to compare them on separate monitors, you’re out of luck.

Let’s ask the question: should only one instance of an application be allowed?

I don’t know that there is one answer. I do know that only allowing one window is a poor design decision. I don’t really care whether there is one copy of the software running, or whether there are several copies. But I don’t want to be stuck with one window. (In addition to one process managing one or more windows, one window can also be managed by multiple processes. The Google Chrome browser is a good example of this: in order to isolate crashes and unresponsive threads, the designers of Chrome decided to give each browser tab its own process, and through some magical feat, multiple processes together can share one window.)

At the opposite extreme, sometimes you want singletons. Here’s an example of this. I run MATLAB at work, and it takes maybe 20-30 seconds for it to start up. MATLAB scripts are called M-files. Every once in a while, when I want to edit an M-file, I make the mistake of double-clicking on that M-file in my filesystem explorer, and it opens up another instance of MATLAB, which takes a long time, and it gets all confusing having two sets of MATLAB windows on my computer. I really just wanted to open up the M-file in the editor in the instance of MATLAB I already had running. But I can’t do that. So in this case, a singleton really would be easier for me to use. At the very minimum this should be a user preference.

The point here is that you should think about whether the users of your software want to work with one central thing, as they see it, or several separate things.

Singletons from Ten Thousand Meters

Now we’ll return to the question of the internals of software design. Let’s take another step back and look at this from another angle. Here’s a really simple diagram:

        | Allocation                 | Usage
Data    | global vs. local variables | direct vs. indirect access
Code    | ????                       | direct vs. indirect access; concrete classes vs. interfaces

The singleton concept shows up in four different ways. Whether you realize it or not, we’ve talked about three of them.

Take a look at this code snippet from Mr. Mungecraft:

Reindeer_t rudolph;

void frobozznicate()
{
    // read the glob count registers
    int32_t x = GLOBCTH;
    x <<= 16;
    x |= (uint32_t)GLOBCTL;

    x += rudolph.antler_point_count;

    // write them back
    GLOBCTH = x >> 16;
    GLOBCTL = x & 0xffff;
}

The global variable rudolph is an example of static allocation. And I said that global variables themselves weren’t necessarily bad. What’s bad with the above snippet is that the frobozznicate() function reaches out and directly accesses rudolph. If you look at its function signature, frobozznicate() has no inputs and no outputs. Blechh! The interfaces in your software should be directly visible in the function signatures, rather than some back-door hidden things that make you look at each line of code to figure out where the real inputs and outputs are.

Global variables allocate singletons. Or rather: global variables allocate storage that has the potential to be singletons. The worse mistake here is writing code that accesses global variables, essentially making them into singletons — like the good Dr. Jekyll turning into Mr. Hyde — by a direct reference to the variables in question. And we can fix this by breaking that direct access:

int32_t frobozznicate(const Reindeer_t *preindeer, int32_t old_count)
{
    return old_count + preindeer->antler_point_count;
}

Tada! No more direct access to global variables. Now we can use this function with all of the other reindeer. Goodbye, Mr. Hyde!
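As a small illustration of that reuse, here is a sketch that applies frobozznicate() to a whole herd. The one-field Reindeer_t is a hypothetical stand-in for the real type, and frobozznicate_all() is a helper invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal Reindeer_t; the real one presumably has more fields. */
typedef struct
{
    int32_t antler_point_count;
} Reindeer_t;

int32_t frobozznicate(const Reindeer_t *preindeer, int32_t old_count)
{
    return old_count + preindeer->antler_point_count;
}

/* Applies frobozznicate() to each reindeer in turn, threading the count
 * through. No globals and no hardware registers are touched. */
int32_t frobozznicate_all(const Reindeer_t *herd, size_t n, int32_t start_count)
{
    int32_t count = start_count;
    for (size_t i = 0; i < n; ++i)
    {
        count = frobozznicate(&herd[i], count);
    }
    return count;
}
```

Because the function depends only on its arguments, it can be tested with a made-up herd and a made-up starting count, with no hardware involved.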

Just as data has the potential to be used as singletons, we can use code as singletons also. In higher level languages, we have several ways to access code. Let’s say we have a few Java classes that look like this:

/**
 * horn-bearing mammals of the order Artiodactyla
 * and the infraorder Pecora
 */
interface Pecoran
{
    public String getFamily();
    public String getGenus();
    public String getSpecies();
    public String getCommonName();
    public String getAnimalName();
    public String getHornName();
    public int getPointCount();
}

abstract class AbstractPecoran implements Pecoran
{
    final private String name;
    final private int pointCount;
    public AbstractPecoran(String name, int pointCount)
    {
        this.name = name;
        this.pointCount = pointCount;
    }
    @Override public String getAnimalName() { return this.name; }
    @Override public int getPointCount() { return this.pointCount; }        
}

abstract class Cervid extends AbstractPecoran
{
    public Cervid(String name, int pointCount)
    {
        super(name, pointCount);
    }
    @Override public String getFamily() { return "Cervidae"; }
    @Override public String getHornName() { return "antler"; }
}

class Reindeer extends Cervid
{
    public Reindeer(String name, int pointCount)
    {
        super(name, pointCount);
    }
    @Override public String getGenus()  { return "Rangifer"; }
    @Override public String getSpecies() { return "tarandus"; }
    @Override public String getCommonName() { return "reindeer"; }
}

class ReindeerGame
{
    public void play(Reindeer reindeer)
    {
        System.out.printf("Reindeer %s has antlers with %d points. Ha ha!\n",
            reindeer.getAnimalName(),
            reindeer.getPointCount());
    }
}

The ReindeerGame class has tight coupling to two other classes. Can you spot them?

One of them is the Reindeer class. What if we wanted to play games with giraffes or oryxes or sheep or moose or antelope, and not just reindeer? Well, we couldn’t if we use the ReindeerGame class. But we could change things to be more general:

class PecoranGame
{
    private String capitalize(String s)
    {
        if (s.isEmpty())
            return s;
        return s.substring(0, 1).toUpperCase() + s.substring(1);
    }
    public void play(Pecoran pecoran)
    {
        System.out.printf("%s (%s %s) %s has %ss with %d points. Ha ha!\n",
            capitalize(pecoran.getCommonName()),
            pecoran.getGenus(),
            pecoran.getSpecies(),
            pecoran.getAnimalName(),
            pecoran.getHornName(),
            pecoran.getPointCount());
    }
}

The other class is java.lang.System! (And actually, we have a third class, java.lang.String, but this is such a fundamental class to Java that you can forget about the whole tight coupling thing. Just assume everyone knows about String.) It would be better to pass in an outside object:

import java.io.PrintStream;

class PecoranGame
{
    private String capitalize(String s)
    {
        if (s.isEmpty())
            return s;
        return s.substring(0, 1).toUpperCase() + s.substring(1);
    }
    public void play(Pecoran pecoran, PrintStream ps)
    {
        ps.printf("%s (%s %s) %s has %ss with %d points. Ha ha!\n",
            capitalize(pecoran.getCommonName()),
            pecoran.getGenus(),
            pecoran.getSpecies(),
            pecoran.getAnimalName(),
            pecoran.getHornName(),
            pecoran.getPointCount());
    }
}

Ha ha, indeed! Now we can pass in mock objects to PecoranGame.play() and test them to our hart’s... er, heart’s content. The standard stream System.out is a widely-used singleton, and it’s one you should be careful of using in your code: it is much better to accept a PrintStream, so that changing the behavior is a choice of the calling application rather than something hard-coded in your class. Again, this is an example of changing direct access to a singleton to indirect access.
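The same trick works outside Java. Here is a minimal Python sketch of the idea, not a translation of the classes above: the dictionary keys, the play() signature, and the name fake_stdout are all made up for illustration. The function writes to whatever stream the caller hands it, so a test can pass in an in-memory stream and inspect what was written:

```python
import io

def capitalize(s):
    # Upper-case only the first character, like the Java helper above.
    return s[:1].upper() + s[1:]

def play(pecoran, out):
    # 'out' is any object with a write() method; this function never
    # names the standard output stream itself.
    out.write("%s (%s %s) %s has %ss with %d points. Ha ha!\n" % (
        capitalize(pecoran["common_name"]),
        pecoran["genus"],
        pecoran["species"],
        pecoran["animal_name"],
        pecoran["horn_name"],
        pecoran["point_count"]))

# Hypothetical test data standing in for a Reindeer object.
rudolph = {
    "common_name": "reindeer",
    "genus": "Rangifer",
    "species": "tarandus",
    "animal_name": "Rudolph",
    "horn_name": "antler",
    "point_count": 12,
}

# In a test, an in-memory stream plays the role of the mock object.
fake_stdout = io.StringIO()
play(rudolph, fake_stdout)
captured = fake_stdout.getvalue()
```

An application that really does want console output would call play(rudolph, sys.stdout), which keeps the decision in the caller's hands.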

The other aspect of singletons in code is referring to interfaces (or at least general abstract classes) rather than concrete classes. This decouples your code from a particular implementation. And here we did this by changing the Reindeer class reference to the more general Pecoran interface.

The fourth square: allocation of code?

So far this is all just a retelling of concepts that were in the first part of our article.

But we have that pesky square in our diagram containing “???” — which represents the allocation of code; what is it?

Is code (as opposed to data) really allocated?

In high-level languages with first-class functions, there’s not much of a difference between code and data. I can do something like this in a Python module:

# module "ornery.py"

foo = 3
def bar(x):
   return x+7

def makeBaz(k):
   def baz(x):
      return x+k
   return baz

There’s much less of a distinction between data and code here. The module ornery contains three attributes:

  • foo bound to the integer 3
  • bar bound to a function
  • makeBaz bound to a function.

In addition, makeBaz is a function that returns a function. There’s nothing magical about a function as a return value. It’s just something that we can’t really do in languages like C, or in earlier versions of C++ and Java which don’t support first-class functions; languages like Javascript and Python and Haskell and all the Lisp derivatives do.
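To make that concrete, here is a small sketch that repeats the definitions from ornery.py so it runs on its own; the names add3, add10, table, and results are made up for illustration. It shows functions being created at run time, bound to names, and stored in a dictionary like any other value:

```python
# Definitions repeated from ornery.py so this sketch is self-contained.
foo = 3

def bar(x):
    return x + 7

def makeBaz(k):
    def baz(x):
        return x + k
    return baz

# Each call to makeBaz creates a new function object at run time.
add3 = makeBaz(foo)    # hypothetical name: a function that adds 3
add10 = makeBaz(10)    # hypothetical name: a function that adds 10

# Functions can sit in a dictionary alongside ordinary data.
table = {"bar": bar, "add3": add3, "add10": add10}

# Call every function in the table with the same argument.
results = {name: func(1) for name, func in table.items()}
print(results)
```

Note that add3 and add10 are two distinct function objects, each carrying its own value of k.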

Anyway, back to the question of allocation. In C when we “create” a function, what we’re really doing is defining it within a namespace at compile time; the compiler churns away and boils it down to machine code stored in an object file and associated with a symbol that has the name we gave it. C only has two choices of namespace: there is the global namespace (external linkage, in the language of the C standard), which is the default case, and there is file-static scope (internal linkage) using the static keyword. This applies to both variable and function definitions. In C, if you want your function to be visible to other compilation units, it has to be a definition in the global namespace. Unlike the makeBaz() function above, in C you can’t create a function at run time, or in any other way than defining it in source code, unless you use some crazy nonportable byte-hackery. (Java and C++ both have namespaces, and both are moving towards support of dynamically-defined functions using lambda syntax in C++11 and Java 8.)

Think about this: when you define a function visible to other compilation units in C, you are creating a singleton!

int32_t frobozznicate(const Reindeer_t *preindeer, int32_t old_count)
{
    return old_count + preindeer->antler_point_count;
}

Here we are trying to do everything right by passing in pointers and using inputs and outputs in a predictable way, but we’re still creating a unique function called frobozznicate in the global C namespace, and clients of this function typically go and access it directly via its name:

Reindeer_t rudolph = ...;
int x = frobozznicate(&rudolph, 3);

Yeah, you can go and work with function pointers

typedef int32_t (*reindeer_func)(const Reindeer_t *, int32_t);
reindeer_func f1 = &frobozznicate;
int x2 = f1(&rudolph, 3);

and this way you could support alternative functions acting on Reindeer_t objects, but unless you want to go to that extent of modularity, it’s not really worth the trouble. We don’t make a big deal of functions and classes which are defined globally, whether they’re truly in the global namespace like in C, or whether they make their home in some sub-namespace like the package system in Java, or the C++ namespace system. If we need to abstract out the usage of a function or class, using function pointers in C or abstract base classes in C++ or interfaces in Java, that’s another story. That lets us generalize and accept an abstract reference to something that fits our needs, rather than requiring the use of a specific function or class. But there’s no real reason to eschew the definition of a function or class in the global namespace, aside from name collision issues.
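For comparison, here is roughly how the same distinction looks in a language with first-class functions. This is a Python sketch with hypothetical names (the Reindeer dataclass, double_frobozznicate, update_direct, and update_indirect are not from the code above); it contrasts a client that hard-codes the function name with one that accepts the function as an argument:

```python
from dataclasses import dataclass

@dataclass
class Reindeer:
    # Hypothetical stand-in for the C struct Reindeer_t.
    name: str
    antler_point_count: int

def frobozznicate(reindeer, old_count):
    return old_count + reindeer.antler_point_count

def double_frobozznicate(reindeer, old_count):
    # Hypothetical alternative with the same signature.
    return old_count + 2 * reindeer.antler_point_count

def update_direct(reindeer, old_count):
    # Direct access to code: the name frobozznicate is hard-coded here.
    return frobozznicate(reindeer, old_count)

def update_indirect(reindeer, old_count, func):
    # Indirect access to code: the caller decides which function to use.
    return func(reindeer, old_count)

rudolph = Reindeer("Rudolph", 12)
print(update_direct(rudolph, 3))
print(update_indirect(rudolph, 3, frobozznicate))
print(update_indirect(rudolph, 3, double_frobozznicate))
```

Both functions are still defined at module level; only update_direct is tied to one of them by name.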

Anyway, that’s something to think about. There really isn’t that much difference between code and data. We just have been trained to think of it that way when we learn languages like C.

Wrapup

Singletons are unique objects in software design. Sometimes they are intentionally and enforceably unique; the Singleton Pattern in the GoF Design Patterns book is an example of this. Other times they’re unique because we use language features that support uniqueness, whether we intend it or not. In general, they are a feature worth discouraging, because they break modularity and tighten the coupling between classes.

We can have singletons at a user-visible level (the one Excel window that can be opened at a time) or at the software implementation level.

Both data and code can be a singleton. We usually think of singletons as data, but functions and classes that are given names, bound to a unique place in the global namespace, and referred to by name, are also singletons. This is one reason that in Java it’s recommended to refer to more general interfaces by name, and pass in specific instances of those general references at runtime, so that clients of our classes don’t really know or care what our classes are named or where they are located, only that they meet the interface requirements we need.

In addition, it takes two actions to make a singleton: allocating it or defining it globally is only part of the way towards a singleton, and is not inherently a bad thing. The main action that makes something a singleton is a hard-coded direct reference to it. If we write code that accepts indirect references (pointers in C/C++, references in C++, or object references in Java), then our code doesn’t care where the data is actually located.

Very low-level code that uses hardware registers and interrupt service routines can’t get around using direct references to singleton data, but good software design encapsulates this in a kind of containment zone, so knowledge of these singletons is limited to a small part of the software application.

Next time we’ll be discussing state machines.


© 2014 Jason M. Sachs, all rights reserved.


Comments:

Comment by Andrew56, January 6, 2015
Two things:
1) Do you find that passing references to top level state makes function signatures overly long? I've been in the case where some important data was included in almost every single function (details about a simulation such as grid size). Moving to a global variable cleaned up the code quite significantly.

2) I use static variables in functions (C++) in a couple ways and would hate to see them go. In particular I use them in a lot of numerical code as: my parallelism is done with MPI (so single thread for each process), many functions will not be called twice on the same stack (BuildStiffnessMatrix(..) will not eventually call itself), and finally, there are large temporary objects on the order of MB that shouldn't be repeatedly allocated and de-allocated (such as calculation intermediaries or MPI buffers). I do recognise such usage is non-typical.
Comment by jms_nh, January 7, 2015
1) Structures allow you to group related data together. At work we have a project with dozens of state variables but they are organized into two structures (one of which has a pointer to the other). If you have important data included in almost every function, why not define a structure called SimulationDetails, and pass a pointer to it into your functions?

2) Pick your poison. If it were me, I'd rather move these to static global variables, and pass in a pointer to them. The point is to decouple the allocation of the variables from the access to them. In functions with static variables, you have tight coupling, and that puts restrictions on their use (even though as you say, in your case you won't be calling them twice). If you move the variables outside the functions but still make them statically-allocated, then you avoid the repeated allocation/deallocation but allow loose coupling.
