EmbeddedRelated.com
Forums

efficiency change by dynamic allocation in microprocessor

Started by dungglee August 30, 2005
Question>
Hi,
I need to write a scientific program that uses linear algebra with many
matrices.

I would like to know whether it is better to implement a matrix class using
dynamic memory allocation (new/delete or malloc/free) or to use two-D arrays
with a fixed (big enough) size?

Of course, my focus is efficiency in terms of speed. How much of a slowdown
would I see if my matrix class uses dynamic allocation?

If the slowdown is significant, where can I find useful information about
matrix computation in embedded systems?

c.f.> I have already written a matrix class using new/delete and am using it
in my project. However, it does not seem to have enough performance; it is
slower than I expected. I am using a Samsung S3C2410 (ARM920T core with 16K
cache and MMU).

		
This message was sent using the comp.arch.embedded web interface on
www.EmbeddedRelated.com
dungglee wrote:
>> Hi,
>> I need to write a scientific program that uses linear algebra with many
>> matrices.
>> I would like to know whether it is better to implement a matrix class
>> using dynamic memory allocation (new/delete or malloc/free) or to use
>> two-D arrays with a fixed (big enough) size?
If you know the size of your arrays, use the stack. Dynamic memory uses the
heap, and it has a performance impact in both allocation time and data access
time. Only use dynamic memory allocation if you have a good reason for it.

Regards,
db
On Mon, 29 Aug 2005 23:32:56 -0500, "dungglee" <dungglee@hotmail.com>
wrote:


> I would like to know whether it is better to implement a matrix class
> using dynamic memory allocation (new/delete or malloc/free) or to use
> two-D arrays with a fixed (big enough) size?
>
> Of course, my focus is efficiency in terms of speed. How much of a
> slowdown would I see if my matrix class uses dynamic allocation?
This of course depends on how often new/delete is called. If temporary
objects are created for each matrix operation, the dynamic memory handling
could easily consume most of the time. The advantage of allocating arrays of
exactly the correct size is that all the elements are close together, so with
a cache and/or virtual memory, quite high hit rates can be achieved with few
cache conflicts.

A large fixed 2-D array, on the other hand, has no run-time allocation
overhead. However, if only part of each row and column is used, the actual
data is scattered around the big array in small segments, which can be bad
for cache performance. At least make sure that each segment starts at the
beginning of a cache line.

Especially with primitive cache organisations, such as a direct-mapped
(one-to-one) cache, a physical address can only be loaded into one specific
cache location. If the fixed 2-D array allocation is done carelessly, two
adjacent rows or columns that lie in widely separated memory segments could
map to the same cache location, causing frequent reloads and dramatically
dropping performance. Using array dimensions that are _not_ a power of 2
usually avoids this problem. In any case, it is important to know how the
cache on a specific processor works and to design the table allocation
accordingly.

One way around this would be to allocate just a big 1-D vector and, at each
matrix reference, calculate the actual 1-D element index. In this way all
the used elements would be close together. For each matrix, define a macro

#define A(i,j) a[(i)*a_dim1+(j)]

To compare the performance with a fixed 2-D allocation or a dynamically
allocated 2-D array, simply redefine

#define A(i,j) a[i][j]

or even

#define A(i,j) this->a[i][j]

By defining macros for each matrix, you could always write the program
nearly in good old FORTRAN style :-) e.g.

C(1,4) = A(2,4) + B(3,1) ;

After performance tests the best implementation could be selected, without
making the program hard to read.

Paul
Thank you for your kind advice.

I have been writing a matrix library with fixed-size arrays.

For example, I have made a structure like this:

typedef struct {
    int row;
    int col;
    double M[1000];
} Matrix;

And I have used the matrices as follows:

Matrix m_A, m_B, m_C;
..
void function_name(Matrix *A, Matrix *B)
{
..
}
int main(void)
{
..
function_name(&m_A, &m_B);
..
return 0;
}

Of course, I am afraid that the cache miss rate will increase because many
of these matrix structures (each with a big 1000-element array) are used.

Do you mean that I should use just one array, such as A[100000], and then
use the appropriate portion of the array for each matrix via an index
technique?

Or is my method right?

regards.
J.J.

dungglee wrote:

> For example, I have made a structure like this:
>
> typedef struct {
>     int row;
>     int col;
>     double M[1000];
> } Matrix;
>
> And I have used the matrices as follows:
>
> Matrix m_A, m_B, m_C;
> ..
> void function_name(Matrix *A, Matrix *B)
> { .. }
> int main(void)
> { .. function_name(&m_A, &m_B); .. }
>
> Of course, I am afraid that the cache miss rate will increase because
> many of these matrix structures (each with a big 1000-element array)
> are used.
Since you are apparently already doing your own array subscripting based on the array size, you have eliminated the potential cache miss problem.
> Do you mean that I should use just one array, such as A[100000]?
No, that isn't necessary. You might get some small cache advantage from
packing all the arrays into a single area, but not as big as from running
the rows together, as you have apparently done.

Since you posted in comp.arch.embedded, it is worth noting that many
embedded programmers avoid dynamic memory allocation whenever possible,
because memory fragmentation can cause problems in systems that run for
days, weeks, or months without a restart.

Thad