EmbeddedRelated.com
Forums

efficiency change by dynamic allocation in microprocessor

Started by dungglee August 30, 2005
Question>
Hi,
I need to write a scientific program that uses linear algebra with many
matrices.

I would like to know whether it is better to implement a matrix class using
dynamic memory allocation (new/delete or malloc/free) or to use two-D arrays
with a fixed (big enough) size?

Of course, my focus is efficiency in terms of speed. How much of a slowdown
would I see if my matrix class uses dynamic allocation?

If the slowdown is significant, where can I find useful information about
matrix computation in embedded systems?

c.f.> I have already written a matrix class using new/delete and am using it
in my project. However, it does not seem to have enough performance; it is
slower than I expected. I am using a Samsung S3C2410 (ARM920T core with 16K
cache and MMU).

		
This message was sent using the comp.arch.embedded web interface on
www.EmbeddedRelated.com
dungglee wrote:
>> Hi,
>> I need to write a scientific program that uses linear algebra with many
>> matrices.
>> I would like to know whether it is better to implement a matrix class
>> using dynamic memory allocation (new/delete or malloc/free) or to use
>> two-D arrays with a fixed (big enough) size?
If you know the size of your arrays, use the stack. Dynamic memory uses the
heap, and it has a performance impact in both allocation time and data access
time. Only use dynamic memory allocation if you have a good reason for it.

Regards,
db
On Mon, 29 Aug 2005 23:32:56 -0500, "dungglee" <dungglee@hotmail.com>
wrote:


> I would like to know whether it is better to implement a matrix class
> using dynamic memory allocation (new/delete or malloc/free) or to use
> two-D arrays with a fixed (big enough) size?
>
> Of course, my focus is efficiency in terms of speed. How much of a
> slowdown would I see if my matrix class uses dynamic allocation?
This of course depends on how often new/delete is called. If temporary
objects are created for each matrix operation, the dynamic memory handling
could easily consume most of the time. The advantage of allocating arrays of
exactly the correct size is that all the elements are close together, so with
a cache and/or virtual memory, quite high hit rates can be achieved with few
cache conflicts.

A large fixed 2-D array, on the other hand, has no run-time allocation
overhead. However, if only part of each row and column is used, the actual
data is scattered around the big array in small segments, which can be bad
for cache performance. At least make sure that each segment starts at the
beginning of a cache line.

Especially with primitive cache organisations, such as a direct-mapped
(one-to-one) cache, a physical address can only be loaded into one specific
cache location. If the fixed 2-D array allocation is done carelessly, two
adjacent rows or columns that lie in widely separated memory segments could
map to the same cache location, causing frequent reloads and dramatically
dropping performance. Using array dimensions that are _not_ a power of 2
usually avoids this problem. In any case, it is important to know how the
cache on a specific processor works and to design the table allocation
accordingly.

One way around this would be to allocate just a big 1-D vector and, at each
matrix reference, calculate the actual 1-D element index. In this way all
the used elements would be close together. For each matrix, define a macro

#define A(i,j) a[(i)*a_dim1+(j)]

To compare the performance with a fixed 2-D allocation or a dynamically
allocated 2-D array, simply redefine

#define A(i,j) a[i][j]

or even

#define A(i,j) this->a[i][j]

By defining macros for each matrix, you could always write the program
nearly in good old FORTRAN style :-) e.g.

C(1,4) = A(2,4) + B(3,1) ;

After performance tests the best implementation could be selected, without
making the program hard to read.

Paul
Thank you for your kind advice.

I have been writing a matrix library with fixed-size arrays.

For example, I have made a structure like this:

typedef struct {
    int row;
    int col;
    double M[1000];
} Matrix;

And I have used the matrices as follows:

Matrix m_A, m_B, m_C;
..
void function_name(Matrix *A, Matrix *B)
{
..
}
int main(void)
{
..
function_name(&m_A, &m_B);
..
return 0;
}

Of course, I am afraid that the cache miss rate will increase because many
of these matrix structures (each with a big 1000-element array) are used.

Do you mean that I should use just one array, such as A[100000], and then
use the appropriate portion of the array for each matrix via an index
technique?

Or is my method right?

regards.
J.J.

dungglee wrote:

> For example, I have made a structure like this:
>
> typedef struct {
>     int row;
>     int col;
>     double M[1000];
> } Matrix;
>
> And I have used the matrices as follows:
>
> Matrix m_A, m_B, m_C;
> ..
> void function_name(Matrix *A, Matrix *B)
> { .. }
> int main(void)
> { .. function_name(&m_A, &m_B); .. }
>
> Of course, I am afraid that the cache miss rate will increase because
> many of these matrix structures (each with a big 1000-element array)
> are used.
Since you are apparently already doing your own array subscripting based on the array size, you have eliminated the potential cache miss problem.
> Do you mean that I should use just one array, such as A[100000]?
No, that isn't necessary. You might get some small cache advantage from
packing all the arrays into a single area, but not as big as from running
the rows together, as you have apparently done.

Since you posted in comp.arch.embedded, it is worth noting that many
embedded programmers avoid dynamic memory allocation whenever possible,
because memory fragmentation can cause problems in systems that run for
days, weeks, or months without a restart.

Thad