EmbeddedRelated.com
Forums

PIC based Data Logger - Need slick search algorithm - any ideas ?

Started by Anbeyon November 13, 2007
Hi

I'm hoping someone out there might have a good idea how I might solve
my dilema.

I have a PIC18F4620 and am using it as a data logger.  Storing logged
data in an external SPI bus connected Serial Flash memory.   The
serial Flash is 16Mbits or 2Mbyte.

The Serial Flash is to be used as a circular buffer and end users will
request the logged data by Record number or by timestamp.   (Timestamp
is DWORD [4byte] seconds since 1990)

i.e. return all records from timestamp A to Timestamp B or all records
from record A to Record B.

Any way I need a really slick way of searching the logged data records
by record number or timestamp and then return the data as quickly as
possible.

If any one has any ideas I'd really appreciate them.

many thanks

Anbeyon

On Tue, 13 Nov 2007 10:08:44 -0800, I said, "Pick a card, any card"
and Anbeyon <anbeyon@btinternet.com> instead replied:

>I'm hoping someone out there might have a good idea how I might solve >my dilema. > >I have a PIC18F4620 and am using it as a data logger. Storing logged >data in an external SPI bus connected Serial Flash memory. The >serial Flash is 16Mbits or 2Mbyte. > >The Serial Flash is to be used as a circular buffer and end users will >request the logged data by Record number or by timestamp. (Timestamp >is DWORD [4byte] seconds since 1990) > >i.e. return all records from timestamp A to Timestamp B or all records >from record A to Record B. > >Any way I need a really slick way of searching the logged data records >by record number or timestamp and then return the data as quickly as >possible.
You need much more RAM than the PIC has to do it quickly. You could use a reserved portion of the Flash but to sort even 200,000 random records will involve a very high number of rewrites in that Flash causing its early demise. Try some static RAM. -- Ray
In article <1194977324.044882.289020@o80g2000hse.googlegroups.com>,
Anbeyon  <anbeyon@btinternet.com> wrote:
>The Serial Flash is to be used as a circular buffer and end users will >request the logged data by Record number or by timestamp. (Timestamp >is DWORD [4byte] seconds since 1990)
>i.e. return all records from timestamp A to Timestamp B or all records >from record A to Record B.
>Any way I need a really slick way of searching the logged data records >by record number or timestamp and then return the data as quickly as >possible.
Are the records fixed length? Lets say they're variable length. The format could be this: start <record number in hex><timestamp in hex> some string \n <record number in hex><timestamp in hex> some string \n ... end Start and end are pointers stored in flash. You have to be careful about the order of writing to the flash so that things don't get corrupted during power outage: write start first write new string write end last The data will be valid if the power goes out between any of the steps above. The data is all ASCII including the timestamp and record number so that you can easily find the record boundaries by looking at the newline character \n. The records just wrap when the end of the flash is reached. When end hits start, start is advanced to the beginning of the next record so that you always have complete records. You can always dump the entire log by reading from start to end. So lets say you are looking for record with timestamp 'key'. The records are in order, so use a binary search for O(log2(N)) lookup speed: low = 0 high = end - start old = -1 loop: mid = (high-low)/2 goto_start_of_record(mid) if mid == old goto done old = mid if get_timestamp(mid) >= key // Will find first record of set with same key high = mid else low = mid done: while mid != end - start && get_timestamp(mid) == key print_record(mid) goto_start_of_next_record(mid) Goto_start_of_record searches backwards through the log to find the first character of the record. It stops at NUL or start. Goto_start_of_next_record searches forwards through the log to find the first character of the next record or end. It could be combined with print_record. -- /* jhallen@world.std.com AB1GO */ /* Joseph H. Allen */ int a[1817];main(z,p,q,r){for(p=80;q+p-80;p-=2*a[p])for(z=9;z--;)q=3&(r=time(0) +r*57)/7,q=q?q-1?q-2?1-p%79?-1:0:p%79-77?1:0:p<1659?79:0:p>158?-79:0,q?!a[p+q*2 ]?a[p+=a[p+=q]=q]=q:0:0;for(;q++-1817;)printf(q%79?"%c":"%c\n"," #"[!a[q-1]]);}
Oops, I missed that you are looking for a range.  I've edited it in below:

In article <fhd0nr$ro8$1@pcls6.std.com>,
Joseph H Allen <jhallen@TheWorld.com> wrote:
>In article <1194977324.044882.289020@o80g2000hse.googlegroups.com>, >Anbeyon <anbeyon@btinternet.com> wrote: >>The Serial Flash is to be used as a circular buffer and end users will >>request the logged data by Record number or by timestamp. (Timestamp >>is DWORD [4byte] seconds since 1990) > >>i.e. return all records from timestamp A to Timestamp B or all records >>from record A to Record B. > >>Any way I need a really slick way of searching the logged data records >>by record number or timestamp and then return the data as quickly as >>possible. > >Are the records fixed length? Lets say they're variable length. The format >could be this: > > start <record number in hex><timestamp in hex> some string \n > <record number in hex><timestamp in hex> some string \n > ... > end > >Start and end are pointers stored in flash. You have to be careful about >the order of writing to the flash so that things don't get corrupted during >power outage: > > write start first > > write new string > > write end last > >The data will be valid if the power goes out between any of the steps above. > >The data is all ASCII including the timestamp and record number so that you >can easily find the record boundaries by looking at the newline character >\n. > >The records just wrap when the end of the flash is reached. When end hits >start, start is advanced to the beginning of the next record so that you >always have complete records. > >You can always dump the entire log by reading from start to end. > >So lets say you are looking for record with timestamp 'A'. The records >are in order, so use a binary search for O(log2(N)) lookup speed: > >low = 0 >high = end - start >old = -1 > >loop: > mid = (high-low)/2 > goto_start_of_record(mid) > if mid == old > goto done > old = mid > if get_timestamp(mid) >= A // Will find first record of set with same key > high = mid > else > low = mid > >done: // Print records from A to B > while mid != end - start && get_timestamp(mid) < B > print_record(mid) > goto_start_of_next_record(mid) > >Goto_start_of_record searches backwards through the log to find the first >character of the record. It stops at \n or start. > >Goto_start_of_next_record searches forwards through the log to find the >first character of the next record or end. It could be combined with >print_record.
-- /* jhallen@world.std.com AB1GO */ /* Joseph H. Allen */ int a[1817];main(z,p,q,r){for(p=80;q+p-80;p-=2*a[p])for(z=9;z--;)q=3&(r=time(0) +r*57)/7,q=q?q-1?q-2?1-p%79?-1:0:p%79-77?1:0:p<1659?79:0:p>158?-79:0,q?!a[p+q*2 ]?a[p+=a[p+=q]=q]=q:0:0;for(;q++-1817;)printf(q%79?"%c":"%c\n"," #"[!a[q-1]]);}
> I have a PIC18F4620 and am using it as a data logger. Storing logged > data in an external SPI bus connected Serial Flash memory. The > serial Flash is 16Mbits or 2Mbyte. > > The Serial Flash is to be used as a circular buffer and end users will > request the logged data by Record number or by timestamp. (Timestamp > is DWORD [4byte] seconds since 1990) > > i.e. return all records from timestamp A to Timestamp B or all records > from record A to Record B. > > Any way I need a really slick way of searching the logged data records > by record number or timestamp and then return the data as quickly as > possible.
Firstly, how would you measure "as quickly as possible"? We need a number to be really helpful. Maybe I'm missing something but exploiting some known properties of record number and timestamp will give you fast indexing *if* you use fixed length records. You obviously are maintaining your head and tail pointers in RAM and even possibly in external flash memory (you'll need wear leveling if you maintain your pointers in flash) so the pointers provide a frame from which you can calculate where the records can be found. Knowing record size is fixed, record numbers are sequential, and record numbers increment by one, you can easily calculate where you expect any record to be located. That's the simple case. The more difficult case is searching by timestamp. I'm guessing records are not necessarily evenly spaced in time. Knowing timestamps are sequential, you can do a binary search to find the timestamp (or timestamp range) you're searching for. Depending on record size, that's definitely no more than 22 search branches before you find your target (16M, 24 bits, minus timestamp size, 2 bits = 22 branches). That's freakin' quick! :) If the binary search method meets the "quickly" requirement, a binary search by record number would suffice as well. The only way that I can think of to improve the read speed is to cache blocks of serial flash into RAM so you don't need to get it again once you found it. Classic access time vs RAM tradeoff. JJS
"Anbeyon" <anbeyon@btinternet.com> wrote in message
news:1194977324.044882.289020@o80g2000hse.googlegroups.com...
> Hi > > I'm hoping someone out there might have a good idea how I might solve > my dilema. > > I have a PIC18F4620 and am using it as a data logger. Storing logged > data in an external SPI bus connected Serial Flash memory. The > serial Flash is 16Mbits or 2Mbyte. > > The Serial Flash is to be used as a circular buffer and end users will > request the logged data by Record number or by timestamp. (Timestamp > is DWORD [4byte] seconds since 1990) > > i.e. return all records from timestamp A to Timestamp B or all records > from record A to Record B.
First of all if you know 1/ Maximum memory size 2/ Records are fixed length You don't need to store record number *IF* your timestamp is always valid. If using circular storage method you can find highest and lowest record number by binary search on Timestamp (effectively record number in manner of speaking). Keeping records as fixed length and INTEGER sub-multiple of Flash page size WILL help you as no doubt you will have to erase pages before writing new records. Otherwise you have to handle partial records across Flash page boundaries where half of the record is *erased*. Use the erased state to determine full/erased pages (which you will need to keep track of). WHEN unit gets powered up do sanity check at power up. Hopefully a Flash page erase time is LESS than the amount of buffering space you have for records waiting to be written.
> Any way I need a really slick way of searching the logged data records > by record number or timestamp and then return the data as quickly as > possible. > > If any one has any ideas I'd really appreciate them.
Binary search in a circular buffer for lowest and highest Tiemstamp gives first and last record. As buffer is circular (with possible gap), with fixed size records it becomes easy to list record A to record B. With Timestamp it becomes a case of read highest and lowest (you will probably be keeping a record of this anyway for updates and parameter checks) and work on you binary search from there to find start and end timestamp record numbers. If you are logging at regular intervals Timestamp A to Timestamp B becomes even easier when you know the first and last records' time stamps. That way you are maintaining very little information in RAM! You only need to read the timestamp of the record when doing searches. -- Paul Carpenter | paul@pcserviceselectronics.co.uk <http://www.pcserviceselectronics.co.uk/> PC Services <http://www.pcserviceselectronics.co.uk/fonts/> Timing Diagram Font <http://www.gnuh8.org.uk/> GNU H8 <http://www.badweb.org.uk/> For those web sites you hate
Thanks every one for you comments and thoughts.

Anbeyon