EmbeddedRelated.com
Memfault Beyond the Launch

embedded tcp input only optimizations

Started by akennis May 3, 2006
I'm looking to streamline a tcp stack for use on the Renesas H8 series
microprocessors.  I would like to solicit your opinions on a few stack
optimizations I have in mind, what others might be possible, and the
performance which might be achieved.

For my application only tcp input is required.  This should eliminate
the need to maintain timers and thus allow for a purely interrupt
driven implementation (i.e. no real-time OS).

If possible I would like to design the interrupt handler such that it
pulls packets out of the NIC (RTL8019AS), and simultaneously:

1. Checksums the payload for a final ACK/NAK decision
2. parses the incoming data translating control codes into function
pointers and placing these (or the non-control code data) into the
main-loop's receive buffer

By doing these two tasks simultaneously, the main-loop can be
constructed by simply casting the receive buffer's head into a function
pointer and then calling that function.
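A minimal sketch of this idea in C, under some loud assumptions: the names (`enqueue_cmd`, `commit_segment`, `abort_segment`, `dispatch_pending`) and the two example handlers are hypothetical, each control code is assumed to carry one argument byte, and the queue stores typed function pointers instead of casting raw buffer bytes. Because the checksum verdict is only known at the end of a segment, the sketch stages entries and publishes them to the main loop only once the segment is accepted.

```c
#include <stdint.h>
#include <stddef.h>

typedef void (*handler_fn)(uint8_t arg);

struct cmd { handler_fn fn; uint8_t arg; };

#define QUEUE_SIZE 16u                 /* must be a power of two */
#define QUEUE_MASK (QUEUE_SIZE - 1u)

static struct cmd queue[QUEUE_SIZE];
static uint8_t          q_stage;       /* ISR-private: next slot to fill  */
static volatile uint8_t q_head;        /* published by the ISR            */
static volatile uint8_t q_tail;        /* advanced by the main loop       */

/* Hypothetical application handlers, for illustration only. */
static int led_state, motor_speed;
static void cmd_led(uint8_t arg)   { led_state = arg; }
static void cmd_motor(uint8_t arg) { motor_speed = arg; }

/* Control code -> handler translation table (assumed codes 0 and 1). */
static const handler_fn handlers[] = { cmd_led, cmd_motor };

/* ISR side: stage one parsed command. Returns 0 if the code is unknown
 * or the queue is full, 1 otherwise. Nothing is visible to the main
 * loop until commit_segment() runs. */
static int enqueue_cmd(uint8_t code, uint8_t arg)
{
    uint8_t next = (uint8_t)((q_stage + 1u) & QUEUE_MASK);
    if (code >= sizeof handlers / sizeof handlers[0] || next == q_tail)
        return 0;
    queue[q_stage].fn  = handlers[code];
    queue[q_stage].arg = arg;
    q_stage = next;
    return 1;
}

/* ISR side: checksum passed, publish everything staged so far. */
static void commit_segment(void) { q_head = q_stage; }

/* ISR side: checksum failed, discard everything staged so far. */
static void abort_segment(void)  { q_stage = q_head; }

/* Main loop: call every published handler; returns how many ran. */
static int dispatch_pending(void)
{
    int n = 0;
    while (q_tail != q_head) {
        struct cmd c = queue[q_tail];
        q_tail = (uint8_t)((q_tail + 1u) & QUEUE_MASK);
        c.fn(c.arg);
        n++;
    }
    return n;
}
```

The main loop then reduces to calling `dispatch_pending()` repeatedly. The single-byte indices are assumed to be read and written atomically on the target, which is why no interrupt masking appears here.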

Does such an implementation sound possible, and if so, how effective
might it be in increasing device performance?

Are any other optimizations possible?

Thanks very much,

Albert Kennis

>For my application only tcp input is required.  This should eliminate
>the need to maintain timers
Hmm, that depends on who is initiating the connection, but either way, with today's SYN flood prevention algorithms (SYN cookies) in place, I think you will not get away without retransmitting either the initial SYN or the SYN/ACK if the need arises.
>and thus allow for a purely interrupt
>driven implementation (i.e. no real-time OS).
Um, I don't see why you should not be able to also generate an interrupt based on a timer and deal with segment retransmission there. Since this happens only occasionally anyway, I don't think the penalty would be that noticeable.
>If possible I would like to design the interrupt handler such that it
>pulls packets out of the NIC (RTL8019AS), and simultaneously:
>
>1. Checksums the payload for a final ACK/NAK decision
>2. parses the incoming data translating control codes into function
>pointers and placing these (or the non-control code data) into the
>main-loop's receive buffer
Sounds good to me. Consider writing this function in assembly, because the main performance gain comes from loading each word of data from the controller into the CPU exactly once, adding it to the checksum in a CPU register at the same time, and then storing it in the buffer you intend to pass to the main loop. If you code that in C (on the H8), I bet the data will be stored on the stack, which will obviously need many more cycles. Also, don't forget that your answering segments must have a proper checksum too. You could prepare a default segment for this and adjust its otherwise precalculated checksum only for the fields you change.
>By doing these two tasks simultaneously, the main-loop can be
>constructed by simply casting the receive buffer's head into a function
>pointer and then calling that function.
>
>Does such an implementation sound possible, and if so, how effective
>might it be in increasing device performance?
Sure.
>Are any other optimizations possible?
IMHO it depends on what you intend to optimize. If it's pure TCP throughput, then using a bigger receive window could help, at the cost of more segment buffering. A very important question is whether simultaneous connections in parallel are possible or whether your stack should only deal with one connection at a time. In the latter case there will obviously be less overhead involved with buffering. If you are after optimizing CPU resource usage, you will want to have fewer buffers and rely on the remote end to do more retransmits if needed, etc. You could also consider whether or not you need SYN flood protection (assuming you accept incoming connections), and whether or not you want to accept early segments.

HTH

Markus
