PIC based Data Logger - Need slick search algorithm

Hi

I'm hoping someone out there might have a good idea how I might solve
my dilema.

I have a PIC18F4620 and am using it as a data logger.  Storing logged
data in an external SPI bus connected Serial Flash memory.   The
serial Flash is 16Mbits or 2Mbyte.

The Serial Flash is to be used as a circular buffer and end users will
request the logged data by Record number or by timestamp.   (Timestamp
is DWORD [4byte] seconds since 1990)

i.e. return all records from timestamp A to Timestamp B or all records
from record A to Record B.

Any way I need a really slick way of searching the logged data records
by record number or timestamp and then return the data as quickly as
possible.

If any one has any ideas I'd really appreciate them.

many thanks

Anbeyon

Reply by Ray Haddad ●November 13, 20072007-11-13

On Tue, 13 Nov 2007 10:08:44 -0800, I said, "Pick a card, any card"
and Anbeyon <anbeyon@btinternet.com> instead replied:

>I'm hoping someone out there might have a good idea how I might solve
>my dilema.
>
>I have a PIC18F4620 and am using it as a data logger.  Storing logged
>data in an external SPI bus connected Serial Flash memory.   The
>serial Flash is 16Mbits or 2Mbyte.
>
>The Serial Flash is to be used as a circular buffer and end users will
>request the logged data by Record number or by timestamp.   (Timestamp
>is DWORD [4byte] seconds since 1990)
>
>i.e. return all records from timestamp A to Timestamp B or all records
>from record A to Record B.
>
>Any way I need a really slick way of searching the logged data records
>by record number or timestamp and then return the data as quickly as
>possible.

You need much more RAM than the PIC has to do it quickly. You could
use a reserved portion of the Flash but to sort even 200,000 random
records will involve a very high number of rewrites in that Flash
causing its early demise. Try some static RAM.
--
Ray

Reply by Joseph H Allen ●November 13, 20072007-11-13

In article <1194977324.044882.289020@o80g2000hse.googlegroups.com>,
Anbeyon  <anbeyon@btinternet.com> wrote:
>The Serial Flash is to be used as a circular buffer and end users will
>request the logged data by Record number or by timestamp.   (Timestamp
>is DWORD [4byte] seconds since 1990)

>i.e. return all records from timestamp A to Timestamp B or all records
>from record A to Record B.

>Any way I need a really slick way of searching the logged data records
>by record number or timestamp and then return the data as quickly as
>possible.

Are the records fixed length?  Lets say they're variable length.  The format
could be this:

 start   <record number in hex><timestamp in hex> some string \n
         <record number in hex><timestamp in hex> some string \n
         ...
 end

Start and end are pointers stored in flash.  You have to be careful about
the order of writing to the flash so that things don't get corrupted during
power outage:

   write start first

   write new string

   write end last

The data will be valid if the power goes out between any of the steps above.

The data is all ASCII including the timestamp and record number so that you
can easily find the record boundaries by looking at the newline character
\n.

The records just wrap when the end of the flash is reached.  When end hits
start, start is advanced to the beginning of the next record so that you
always have complete records.

You can always dump the entire log by reading from start to end.

So lets say you are looking for record with timestamp 'key'.  The records
are in order, so use a binary search for O(log2(N)) lookup speed:

low = 0
high = end - start
old = -1

loop:
  mid = (high-low)/2
  goto_start_of_record(mid)
  if mid == old
    goto done
  old = mid
  if get_timestamp(mid) >= key // Will find first record of set with same key
    high = mid
  else
    low = mid

done:
  while mid != end - start && get_timestamp(mid) == key
    print_record(mid)
    goto_start_of_next_record(mid)

Goto_start_of_record searches backwards through the log to find the first
character of the record.  It stops at NUL or start.

Goto_start_of_next_record searches forwards through the log to find the
first character of the next record or end.  It could be combined with
print_record.

-- 
/*  jhallen@world.std.com AB1GO */                        /* Joseph H. Allen */
int a[1817];main(z,p,q,r){for(p=80;q+p-80;p-=2*a[p])for(z=9;z--;)q=3&(r=time(0)
+r*57)/7,q=q?q-1?q-2?1-p%79?-1:0:p%79-77?1:0:p<1659?79:0:p>158?-79:0,q?!a[p+q*2
]?a[p+=a[p+=q]=q]=q:0:0;for(;q++-1817;)printf(q%79?"%c":"%c\n"," #"[!a[q-1]]);}

Reply by Joseph H Allen ●November 13, 20072007-11-13

Oops, I missed that you are looking for a range.  I've edited it in below:

In article <fhd0nr$ro8$1@pcls6.std.com>,
Joseph H Allen <jhallen@TheWorld.com> wrote:
>In article <1194977324.044882.289020@o80g2000hse.googlegroups.com>,
>Anbeyon  <anbeyon@btinternet.com> wrote:
>>The Serial Flash is to be used as a circular buffer and end users will
>>request the logged data by Record number or by timestamp.   (Timestamp
>>is DWORD [4byte] seconds since 1990)
>
>>i.e. return all records from timestamp A to Timestamp B or all records
>>from record A to Record B.
>
>>Any way I need a really slick way of searching the logged data records
>>by record number or timestamp and then return the data as quickly as
>>possible.
>
>Are the records fixed length?  Lets say they're variable length.  The format
>could be this:
>
> start   <record number in hex><timestamp in hex> some string \n
>         <record number in hex><timestamp in hex> some string \n
>         ...
> end
>
>Start and end are pointers stored in flash.  You have to be careful about
>the order of writing to the flash so that things don't get corrupted during
>power outage:
>
>   write start first
>
>   write new string
>
>   write end last
>
>The data will be valid if the power goes out between any of the steps above.
>
>The data is all ASCII including the timestamp and record number so that you
>can easily find the record boundaries by looking at the newline character
>\n.
>
>The records just wrap when the end of the flash is reached.  When end hits
>start, start is advanced to the beginning of the next record so that you
>always have complete records.
>
>You can always dump the entire log by reading from start to end.
>
>So lets say you are looking for record with timestamp 'A'.  The records
>are in order, so use a binary search for O(log2(N)) lookup speed:
>
>low = 0
>high = end - start
>old = -1
>
>loop:
>  mid = (high-low)/2
>  goto_start_of_record(mid)
>  if mid == old
>    goto done
>  old = mid
>  if get_timestamp(mid) >= A // Will find first record of set with same key
>    high = mid
>  else
>    low = mid
>
>done: // Print records from A to B
>  while mid != end - start && get_timestamp(mid) < B
>    print_record(mid)
>    goto_start_of_next_record(mid)
>
>Goto_start_of_record searches backwards through the log to find the first
>character of the record.  It stops at \n or start.
>
>Goto_start_of_next_record searches forwards through the log to find the
>first character of the next record or end.  It could be combined with
>print_record.
-- 
/*  jhallen@world.std.com AB1GO */                        /* Joseph H. Allen */
int a[1817];main(z,p,q,r){for(p=80;q+p-80;p-=2*a[p])for(z=9;z--;)q=3&(r=time(0)
+r*57)/7,q=q?q-1?q-2?1-p%79?-1:0:p%79-77?1:0:p<1659?79:0:p>158?-79:0,q?!a[p+q*2
]?a[p+=a[p+=q]=q]=q:0:0;for(;q++-1817;)printf(q%79?"%c":"%c\n"," #"[!a[q-1]]);}

Reply by John Speth ●November 13, 20072007-11-13

> I have a PIC18F4620 and am using it as a data logger.  Storing logged
> data in an external SPI bus connected Serial Flash memory.   The
> serial Flash is 16Mbits or 2Mbyte.
>
> The Serial Flash is to be used as a circular buffer and end users will
> request the logged data by Record number or by timestamp.   (Timestamp
> is DWORD [4byte] seconds since 1990)
>
> i.e. return all records from timestamp A to Timestamp B or all records
> from record A to Record B.
>
> Any way I need a really slick way of searching the logged data records
> by record number or timestamp and then return the data as quickly as
> possible.

Firstly, how would you measure "as quickly as possible"?  We need a number 
to be really helpful.

Maybe I'm missing something but exploiting some known properties of record 
number and timestamp will give you fast indexing *if* you use fixed length 
records.  You obviously are maintaining your head and tail pointers in RAM 
and even possibly in external flash memory (you'll need wear leveling if you 
maintain your pointers in flash) so the pointers provide a frame from which 
you can calculate where the records can be found.

Knowing record size is fixed, record numbers are sequential, and record 
numbers increment by one, you can easily calculate where you expect any 
record to be located.  That's the simple case.

The more difficult case is searching by timestamp.  I'm guessing records are 
not necessarily evenly spaced in time.  Knowing timestamps are sequential, 
you can do a binary search to find the timestamp (or timestamp range) you're 
searching for.  Depending on record size, that's definitely no more than 22 
search branches before you find your target (16M, 24 bits, minus timestamp 
size, 2 bits = 22 branches).  That's freakin' quick! :)

If the binary search method meets the "quickly" requirement, a binary search 
by record number would suffice as well.

The only way that I can think of to improve the read speed is to cache 
blocks of serial flash into RAM so you don't need to get it again once you 
found it.  Classic access time vs RAM tradeoff.

JJS

Reply by Paul Carpenter ●November 13, 20072007-11-13

"Anbeyon" <anbeyon@btinternet.com> wrote in message
news:1194977324.044882.289020@o80g2000hse.googlegroups.com...
> Hi
>
> I'm hoping someone out there might have a good idea how I might solve
> my dilema.
>
> I have a PIC18F4620 and am using it as a data logger.  Storing logged
> data in an external SPI bus connected Serial Flash memory.   The
> serial Flash is 16Mbits or 2Mbyte.
>
> The Serial Flash is to be used as a circular buffer and end users will
> request the logged data by Record number or by timestamp.   (Timestamp
> is DWORD [4byte] seconds since 1990)
>
> i.e. return all records from timestamp A to Timestamp B or all records
> from record A to Record B.

First of all if you know

1/    Maximum memory size

2/    Records are fixed length

You don't need to store record number *IF* your timestamp is always valid.

If using circular storage method you can find highest and lowest record
number
by binary search on Timestamp (effectively record number in manner of
speaking).

Keeping records as fixed length and INTEGER sub-multiple of Flash page size
WILL help you as no doubt you will have to erase pages before writing new
records.
Otherwise you have to handle partial records across Flash page boundaries
where
half of the record is *erased*.

Use the erased state to determine full/erased pages (which you will need to
keep
track of). WHEN unit gets powered up do sanity check at power up. Hopefully
a
Flash page erase time is LESS than the amount of buffering space you have
for
records waiting to be written.

> Any way I need a really slick way of searching the logged data records
> by record number or timestamp and then return the data as quickly as
> possible.
>
> If any one has any ideas I'd really appreciate them.

Binary search in a circular buffer for lowest and highest Tiemstamp gives
first
and last record.

As buffer is circular (with possible gap), with fixed size records it
becomes easy
to list record A to record B.

With Timestamp it becomes a case of read highest and lowest (you will
probably
be keeping a record of this anyway for updates and parameter checks) and
work on
you binary search from there to find start and end timestamp record numbers.

If you are logging at regular intervals Timestamp A to Timestamp B becomes
even easier when you know the first and last records' time stamps.

That way you are maintaining very little information in RAM!

You only need to read the timestamp of the record when doing searches.

--
Paul Carpenter          | paul@pcserviceselectronics.co.uk
<http://www.pcserviceselectronics.co.uk/>    PC Services
<http://www.pcserviceselectronics.co.uk/fonts/>    Timing Diagram Font
<http://www.gnuh8.org.uk/>              GNU H8
<http://www.badweb.org.uk/>             For those web sites you hate

Reply by Anbeyon ●November 14, 20072007-11-14

Thanks every one for you comments and thoughts.

Anbeyon

PIC based Data Logger - Need slick search algorithm - any ideas ?

Sign in

You might also like...

Search forums

Free PDF Downloads

Blogs - Hall of Fame

Discussion Groups

Quick Links

About EmbeddedRelated.com

Social Networks

The Related Media Group