Benchmark: STL's list vs. hand-coded one

Hello everyone, in the compiler thread someone (was it Dijkstra?) 
claimed that M$ STL implementation of the doubly linked list was 50x 
slower than a hand-coded implementation. By my measurements, however, 
they are equally fast.

OS: Ubuntu 7.10
CPU: Athlon XP 2600+
compiler: gcc 4.1.3
opts: -O3 -march=athlon-xp

Results:

Custom impl. took 4.000000 seconds of CPU time.
STL impl. took 4.000000 seconds of CPU time.

(I couldn't find a more accurate timing method in glibc.)

Here's the source:

#include <list>
#include <time.h>
#include <stdlib.h>
#include <stdio.h>
#include <assert.h>

#define N 4000000

struct Node
{
  unsigned int data;
  struct Node *prev, *next;
};

struct List
{
  struct Node *head, *tail;
};

/**
 * Allocates a new doubly linked list.
 */
static List *list_New()
{
  List *list = (struct List *)malloc(sizeof(struct List));
  if (list == NULL)
  {
    perror("malloc() failed");
    exit(EXIT_FAILURE); // do not fall through and dereference NULL
  }
  list->head = list->tail = NULL;
  return list;
}

/**
 * Frees a node.
 */
static void node_Delete(struct Node *node)
{
  free(node);
}

/**
 * Frees resources allocated to a list
 */
static void list_Delete(struct List *list)
{
  assert(list != NULL);
  struct Node *node = list->head;
  while (node != NULL)
  {
    struct Node *current = node;
    node = node->next; // read the link before the node is freed
    node_Delete(current);
  }
  free(list);
}

static struct Node *list_GetHead(const struct List *list)
{
  assert(list != NULL);
  return list->head;
}

static struct Node *node_GetNext(const struct Node *node)
{
  assert(node != NULL);
  return node->next;
}

/**
 * Allocates a new node
 */
static struct Node *node_New()
{
  struct Node *node = (struct Node *)malloc(sizeof(struct Node));
  if (node == NULL)
  {
    perror("malloc() failed.");
    exit(EXIT_FAILURE); // do not fall through and dereference NULL
  }
  node->prev = node->next = NULL;
  return node;
}

/**
 * "Accessor"
 */
static void node_SetData(struct Node *node, unsigned int data)
{
  assert(node != NULL);
  node->data = data;
}

/**
 * "Accessor".
 */
static unsigned int node_GetData(const struct Node *node)
{
  assert(node != NULL);
  return node->data;
}

/**
 * Adds a node to the end of the list.
 */
static void list_Add(struct List *list, struct Node *node)
{
  assert(list != NULL);
  assert(node != NULL);
  assert(node->prev == NULL && node->next == NULL);
  node->prev = list->tail;
  if (list->head == NULL)
    list->head = node;
  if (list->tail != NULL)
    list->tail->next = node;
  list->tail = node;
}

int main()
{
  time_t t1, t2;
  t1 = time(NULL);
  struct List *list = list_New();
  int sum1 = 0;
  for (unsigned int i = 0; i < N; i++)
  {
    struct Node *node = node_New(); 
    node_SetData(node, i);
    list_Add(list, node);
  }
  struct Node *node = list_GetHead(list);
  for (int i = 0; i < N; i++)
  {
    sum1 += node_GetData(node);    
    node = node_GetNext(node);
  }
  list_Delete(list);
  t2 = time(NULL);
  double dt = t2 - t1;
  printf("Custom impl. took %f seconds of CPU time.\n", dt);
  
  t1 = time(NULL);
  std::list<unsigned int> *stlList = new std::list<unsigned int>;
  int sum2 = 0;
  for (unsigned int i = 0; i < N; i++)
  {
    stlList->push_back(i);
  }
  std::list<unsigned int>::const_iterator i1 = stlList->begin();
  std::list<unsigned int>::const_iterator i2 = stlList->end();

  while (i1 != i2)
  {
    sum2 += *i1;
    i1++;
  }
  delete stlList;
  t2 = time(NULL);
  dt = t2 - t1;
  printf("STL impl. took %f seconds of CPU time.\n", dt);

  // use the sums so the optimizer cannot remove all the code as dead
  return sum1 + sum2;
}

-- 
Jyrki Saarinen
http://koti.welho.com/jsaari88/

Reply by Patrick de Zeester ● November 20, 2007

Jyrki Saarinen wrote:
> Hello everyone, in the compiler thread someone (was it Dijkstra?) 
> claimed that M$ STL implementation of the doubly linked list was 50x 
> slower than a hand-coded implementation. Now, I get the metrics that 
> they are equally fast.
> 
> OS: Ubuntu 7.10
> CPU: Athlon XP 2600+
> compiler: gcc 4.1.3
> opts: -O3 -march=athlon-xp
> 
> Results:
> 
> Custom impl. took 4.000000 seconds of CPU time.
> STL impl. took 4.000000 seconds of CPU time.
> 
> (I couldn't find a method of more accurate timing in glibc)
> <snip> 

On most platforms the clock() function in <time.h> gives more accurate 
timing information than the time() function.
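
A minimal sketch of what clock()-based timing could look like. The helper time_cpu() and the workload spin() are hypothetical names, not part of the posted benchmark. Note the cast to double before dividing: clock_t is usually an integer type, so an integer division by CLOCKS_PER_SEC would truncate any sub-second interval to 0.

```cpp
#include <assert.h>
#include <stddef.h>
#include <time.h>

// Hypothetical workload, only here so there is something to time.
static void spin()
{
  volatile unsigned long sink = 0;
  for (unsigned long i = 0; i < 1000000UL; i++)
    sink = sink + i;
}

// Hypothetical helper: runs work() once and returns the CPU seconds
// it consumed, as reported by clock().
static double time_cpu(void (*work)())
{
  assert(work != NULL);
  clock_t start = clock();
  work();
  clock_t end = clock();
  // Convert to double first; integer division would drop the fraction.
  return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_cpu(spin) returns the processor time used by the call, not the elapsed wall-clock time, so the result is unaffected by other processes running on the machine.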

Reply by Jyrki Saarinen ● November 20, 2007

Patrick de Zeester <invalid@invalid.invalid> wrote:

>> (I couldn't find a method of more accurate timing in glibc)
>> <snip> 
> 
> On most platforms the clock() function in <time.h> gives more accurate 
> timing information than the time() function.

On Linux it returns 0 for some reason; perhaps it is not implemented.

-- 
Jyrki Saarinen
http://koti.welho.com/jsaari88/

Reply by Patrick de Zeester ● November 20, 2007

Jyrki Saarinen wrote:
> Patrick de Zeester <invalid@invalid.invalid> wrote:
> 
>>> (I couldn't find a method of more accurate timing in glibc)
>>> <snip> 
>> On most platforms the clock() function in <time.h> gives more accurate 
>> timing information than the time() function.
> 
> On Linux, it returns 0 for some reason, perhaps it is not implemented.

On the version of Linux you are running, that is. On the version of Linux 
I use, the clock() function works just fine.

Reply by Jyrki Saarinen ● November 20, 2007

Patrick de Zeester <invalid@invalid.invalid> wrote:

>> On Linux, it returns 0 for some reason, perhaps it is not implemented.
> 
> On the version of Linux you are running that is, on the version of Linux 
> I use the clock() function works just fine.

What distribution, kernel and glibc versions are you using?

-- 
Jyrki Saarinen
http://koti.welho.com/jsaari88/

Reply by Patrick de Zeester ● November 20, 2007

Jyrki Saarinen wrote:
> Patrick de Zeester <invalid@invalid.invalid> wrote:
> 
>>> On Linux, it returns 0 for some reason, perhaps it is not implemented.
>> On the version of Linux you are running that is, on the version of Linux 
>> I use the clock() function works just fine.
> 
> What distribution, kernel and glibc versions are you using?

I checked it on a very old Mandrake distro:
Linux version 2.6.8.1-12mdk (quintela@n5.mandrakesoft.com) (gcc version 
3.4.1 (Mandrakelinux (Alpha 3.4.1-3mdk)) #1 Fri Oct 1 12:53:41 CEST 2004
glibc-2.3.3-23.1.101mdk

Reply by Jyrki Saarinen ● November 20, 2007

Patrick de Zeester <invalid@invalid.invalid> wrote:

> I checked it on a very old Mandrake distro:
> Linux version 2.6.8.1-12mdk (quintela@n5.mandrakesoft.com) (gcc version 
> 3.4.1 (Mandrakelinux (Alpha 3.4.1-3mdk)) #1 Fri Oct 1 12:53:41 CEST 2004
> glibc-2.3.3-23.1.101mdk

Ok, gettimeofday() did the trick. The times for 4000000 nodes are now: 

Custom impl. took 0.946796 seconds of CPU time.
STL impl. took 0.972560 seconds of CPU time.
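
The changed timing code was not posted, so the following is only a sketch of how gettimeofday() might be wrapped. wall_seconds() is a hypothetical helper name. Keep in mind that gettimeofday() reports wall-clock time, not CPU time, so the "seconds of CPU time" label in the output is a loose description.

```cpp
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

// Hypothetical helper: current wall-clock time in seconds,
// with microsecond resolution.
static double wall_seconds()
{
  struct timeval tv;
  int rc = gettimeofday(&tv, NULL);
  assert(rc == 0);
  (void)rc; // avoids an unused-variable warning when NDEBUG is defined
  return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}
```

In the benchmark this would replace the time(NULL) calls: take t1 = wall_seconds() before the loop, t2 = wall_seconds() after it, and print t2 - t1.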

-- 
Jyrki Saarinen
http://koti.welho.com/jsaari88/

Reply by Wilco Dijkstra ● November 20, 2007

"Jyrki Saarinen" <jyrki@NOSPAMwelho.com.invalid> wrote in message news:fhvaou$4iq$1@nyytiset.pp.htv.fi...
> Hello everyone, in the compiler thread someone (was it Dijkstra?)
> claimed that M$ STL implementation of the doubly linked list was 50x
> slower than a hand-coded implementation. Now, I get the metrics that
> they are equally fast.

I didn't claim that, nor did anyone else for that matter. I was talking about
STL strings and in particular string concatenation.

Anyway, there are some obvious flaws here. For starters few would
use a linked list if they had to store such a large number of integers.
Applications that do use large linked lists typically do something very
different. Your implementation uses 24 bytes for just one node (16 bytes
if the heap is 4-byte aligned) compared to just over 4 for a specialized
implementation. But worst of all, do you have any idea where over 99%
of the time is spent? Let me give you a hint - it is NOT the linked list code...

Wilco

Reply by Jyrki Saarinen ● November 21, 2007

Wilco Dijkstra <Wilco_dot_Dijkstra@ntlworld.com> wrote:

>> Hello everyone, in the compiler thread someone (was it Dijkstra?)
>> claimed that M$ STL implementation of the doubly linked list was 50x
>> slower than a hand-coded implementation. Now, I get the metrics that
>> they are equally fast.
> 
> I didn't claim that, nor did anyone else for that matter. I was talking about
> STL strings and in particular string concatenation.

If that is the case, then of course this micro-benchmark is irrelevant. I 
will have to read your post again (if I can still find it) and perhaps do 
another benchmark. It would be easier if you could describe the problem 
with STL string concatenation again.

> Anyway, there are some obvious flaws here. For starters few would
> use a linked list if they had to store such a large number of integers.

Well - int was just chosen, it could be anything else. I'll do the same 
test with a struct that contains a larger amount of data.

> different. Your implementation uses 24 bytes for just one node (16 bytes
> if the heap is 4-byte aligned) compared to just over 4 for a specialized
> implementation. 

What kind of an implementation is this with 4 bytes for a node?

> But worst of all, do you have any idea where over 99%
> of the time is spent? Let me give you a hint - it is NOT the linked list code...

I didn't profile, but very likely it is the memory allocation.

-- 
Jyrki Saarinen
http://koti.welho.com/jsaari88/

Reply by Wilco Dijkstra ● November 21, 2007

"Jyrki Saarinen" <jyrki@NOSPAMwelho.com.invalid> wrote in message news:fi0v23$qfg$1@nyytiset.pp.htv.fi...
> Wilco Dijkstra <Wilco_dot_Dijkstra@ntlworld.com> wrote:
>
>>> Hello everyone, in the compiler thread someone (was it Dijkstra?)
>>> claimed that M$ STL implementation of the doubly linked list was 50x
>>> slower than a hand-coded implementation. Now, I get the metrics that
>>> they are equally fast.
>>
>> I didn't claim that, nor did anyone else for that matter. I was talking about
>> STL strings and in particular string concatenation.
>
> If this is the case, of course this micro-benchmark is irrelevant. I have
> to read your post again (if I can find it still..), and perhaps do another
> benchmark. Easier would be if you could describe the problem with STL
> string concatenation again?

The problem is as follows: you have 4 small character strings (char *),
and create a temporary string by concatenating them with spaces between
each string.

Implementing this in the obvious way makes the STL version about 50 times
slower than strcat/strcpy in VC++.
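
The code that was measured is not shown in the thread. As a rough sketch of the kind of comparison described, assuming the "obvious way" means chaining operator+ on std::string, it might look like the following. All three function names are hypothetical. The operator+ chain creates a temporary string at each step, while the strcpy/strcat version writes into a caller-supplied buffer and the reserve() variant allocates once up front.

```cpp
#include <assert.h>
#include <string.h>
#include <string>

// C version: writes "a b c d" into a buffer supplied by the caller.
static void join_c(char *out, size_t out_size, const char *a,
                   const char *b, const char *c, const char *d)
{
  // three separating spaces plus the terminating NUL
  assert(strlen(a) + strlen(b) + strlen(c) + strlen(d) + 4 <= out_size);
  strcpy(out, a);
  strcat(out, " ");
  strcat(out, b);
  strcat(out, " ");
  strcat(out, c);
  strcat(out, " ");
  strcat(out, d);
}

// Assumed "obvious" STL version: each + yields a temporary string.
static std::string join_stl(const char *a, const char *b,
                            const char *c, const char *d)
{
  return std::string(a) + " " + b + " " + c + " " + d;
}

// STL version that reserves the final length before appending.
static std::string join_stl_reserved(const char *a, const char *b,
                                     const char *c, const char *d)
{
  std::string s;
  s.reserve(strlen(a) + strlen(b) + strlen(c) + strlen(d) + 3);
  s += a;
  s += ' ';
  s += b;
  s += ' ';
  s += c;
  s += ' ';
  s += d;
  return s;
}
```

How large the gap is depends on the compiler, the library implementation, and the string lengths, so any ratio would have to be measured on the platform in question.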

>> Anyway, there are some obvious flaws here. For starters few would
>> use a linked list if they had to store such a large number of integers.
>
> Well - int was just chosen, it could be anything else. I'll do the same
> test with a struct that contains a larger amount of data.

It wouldn't change the result much, you still spend most of the time
in malloc/free. What matters is not the size of the data but the usage
pattern - since the benchmark doesn't do anything but create/destroy a
long list, it only measures the (high) overhead of memory allocation.

>> different. Your implementation uses 24 bytes for just one node (16 bytes
>> if the heap is 4-byte aligned) compared to just over 4 for a specialized
>> implementation.
>
> What kind of an implementation is this with 4 bytes for a node?

A linked list of arrays.
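
A minimal sketch of such a structure (often called an unrolled linked list), under stated assumptions: the names and the chunk capacity of 1024 are hypothetical and not taken from any implementation mentioned in this thread. Each chunk holds many values, so the link pointer and the element count are shared by all values in the chunk, and the per-value cost stays just over the 4 bytes of the value itself.

```cpp
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

enum { CHUNK_CAPACITY = 1024 }; // hypothetical chunk size

struct Chunk
{
  unsigned int data[CHUNK_CAPACITY];
  unsigned int count; // number of used slots in data
  struct Chunk *next;
};

struct ChunkList
{
  struct Chunk *head, *tail;
};

static void chunklist_Init(struct ChunkList *list)
{
  assert(list != NULL);
  list->head = list->tail = NULL;
}

// Appends a value, allocating a new chunk only when the tail is full.
static void chunklist_Add(struct ChunkList *list, unsigned int value)
{
  assert(list != NULL);
  struct Chunk *tail = list->tail;
  if (tail == NULL || tail->count == CHUNK_CAPACITY)
  {
    struct Chunk *chunk = (struct Chunk *)malloc(sizeof(struct Chunk));
    if (chunk == NULL)
    {
      perror("malloc() failed");
      exit(EXIT_FAILURE);
    }
    chunk->count = 0;
    chunk->next = NULL;
    if (tail != NULL)
      tail->next = chunk;
    else
      list->head = chunk;
    list->tail = chunk;
    tail = chunk;
  }
  tail->data[tail->count++] = value;
}

// Sums all values by walking the chunks and then each chunk's array.
static unsigned int chunklist_Sum(const struct ChunkList *list)
{
  assert(list != NULL);
  unsigned int sum = 0;
  for (const struct Chunk *c = list->head; c != NULL; c = c->next)
    for (unsigned int i = 0; i < c->count; i++)
      sum += c->data[i];
  return sum;
}

static void chunklist_Free(struct ChunkList *list)
{
  assert(list != NULL);
  struct Chunk *c = list->head;
  while (c != NULL)
  {
    struct Chunk *next = c->next; // read the link before freeing
    free(c);
    c = next;
  }
  list->head = list->tail = NULL;
}
```

With this capacity, storing 4000000 values takes roughly 3900 calls to malloc() instead of 4000000, which is where most of the saving would be expected to come from.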

>> But worst of all, do you have any idea where over 99%
>> of the time is spent? Let me give you a hint - it is NOT the linked list code...
>
> I didn't profile, but very likely it is the memory allocation.

Correct. This is why big applications use custom allocators that are
much faster than malloc/free - think 2 orders of magnitude.
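
To illustrate the idea, here is a minimal sketch of one kind of custom allocator, a fixed-capacity node pool with a free list. It is an assumption about what such an allocator might look like, not the allocator of any particular application, and the pool_ names are hypothetical. The Node struct has the same layout as in the posted benchmark.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

struct Node
{
  unsigned int data;
  struct Node *prev, *next;
};

struct NodePool
{
  struct Node *slab;      // one large block obtained from malloc()
  size_t capacity;        // number of nodes the slab can hold
  size_t used;            // slab slots handed out so far
  struct Node *free_list; // recycled nodes, chained through next
};

static void pool_Init(struct NodePool *pool, size_t capacity)
{
  assert(pool != NULL && capacity > 0);
  pool->slab = (struct Node *)malloc(capacity * sizeof(struct Node));
  if (pool->slab == NULL)
  {
    perror("malloc() failed");
    exit(EXIT_FAILURE);
  }
  pool->capacity = capacity;
  pool->used = 0;
  pool->free_list = NULL;
}

// Returns a node with cleared links, or NULL when the pool is exhausted.
static struct Node *pool_Alloc(struct NodePool *pool)
{
  assert(pool != NULL);
  struct Node *node;
  if (pool->free_list != NULL)
  {
    node = pool->free_list; // reuse a recycled node first
    pool->free_list = node->next;
  }
  else if (pool->used < pool->capacity)
    node = &pool->slab[pool->used++]; // otherwise take the next fresh slot
  else
    return NULL;
  node->prev = node->next = NULL;
  return node;
}

// Puts a node back on the free list; no call into the heap is made.
static void pool_Free(struct NodePool *pool, struct Node *node)
{
  assert(pool != NULL && node != NULL);
  node->next = pool->free_list;
  pool->free_list = node;
}

// Releases the whole slab with a single free().
static void pool_Destroy(struct NodePool *pool)
{
  assert(pool != NULL);
  free(pool->slab);
  pool->slab = NULL;
  pool->capacity = pool->used = 0;
  pool->free_list = NULL;
}
```

Allocation and release are a few pointer operations each, and tearing down the list becomes one free() of the slab rather than one per node. The actual speedup over malloc/free would need to be measured for a given heap implementation and usage pattern.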

Wilco
