Camera interfaces

Started by Don Y December 29, 2022
On 12/30/22 2:27 AM, Don Y wrote:
> Hi George! > > [Hope you are faring well... enjoying the COLD!  ;) ] > > On 12/29/2022 10:29 PM, George Neuner wrote: >>>>> But, most cameras seem to have (bit- or word-) serial interfaces >>>>> nowadays.  Are there any (mainstream/high volume) devices that >>>>> "look" like a chunk of memory, in their native form? > >>> I built my prototypes (proof-of-principle) using COTS USB cameras. >>> But, getting the data out of the serial data stream and into RAM so >>> it can be analyzed consumes memory bandwidth. >>> >>> I'm currently trying to sort out an approximate cost factor "per >>> camera" (per video stream) and looking for ways that I can cut costs >>> (memory bandwidth requirements) to allow greater numbers of >>> cameras or higher frame rates. >> >> You aren't going to find anything low cost ... if you want bandwidth >> for multiple cameras, you need to look into bus based frame grabbers. >> They still exist, but are (relatively) expensive and getting harder to >> find. > > So, my options are: > - reduce the overall frame rate such that N cameras can >   be serviced by the USB (or whatever) interface *and* >   the processing load > - reduce the resolution of the cameras (a special case of the above) > - reduce the number of cameras "per processor" (again, above) > - design a "camera memory" (frame grabber) that I can install >   multiply on a single host > - develop distributed algorithms to allow more bandwidth to >   effectively be applied >
The fact that you are starting from the concept of using "USB Cameras" sort of starts you off with that limit.

My personal thought on your problem is that you want to put a "cheap" processor right on each camera, using a processor with a direct camera interface to pull in the image, do your processing there, and send the results over some comm link to the central core.

It is unclear what your actual image requirements per camera are, so it is hard to say what level of camera and processor you will need.

My first feeling is that you are assuming a fairly cheap camera and doing some fairly simple processing over a partial image, in which case you might even be able to live with a camera that uses a crude SPI interface to bring the frame in, and a very simple processor.
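As a rough sketch of the kind of per-camera node described above -- the frame geometry and the spi_read_frame_line()/link_send() calls are hypothetical stand-ins for whatever the real camera and comm-link drivers provide, not any particular part's API:

#include <stdint.h>
#include <stdlib.h>

#define FRAME_W  320            /* assumed sensor geometry */
#define FRAME_H  240
#define DIFF_THRESHOLD  25      /* per-pixel delta considered "changed" */
#define PIXEL_COUNT_ALARM 500   /* changed pixels needed to raise a flag */

/* Hypothetical board-support routines -- stand-ins for the real
 * camera/SPI and comm-link drivers. */
extern void spi_read_frame_line(uint8_t *dst, int row, int width);
extern void link_send(const void *msg, size_t len);

static uint8_t reference[FRAME_H][FRAME_W];  /* "unobstructed" image */
static uint8_t line[FRAME_W];

void scan_frame_and_report(void)
{
    uint32_t changed = 0;

    /* Pull the frame in a line at a time so only one row is ever
     * buffered -- the point of doing the processing at the camera. */
    for (int row = 0; row < FRAME_H; row++) {
        spi_read_frame_line(line, row, FRAME_W);
        for (int col = 0; col < FRAME_W; col++) {
            int delta = (int)line[col] - (int)reference[row][col];
            if (abs(delta) > DIFF_THRESHOLD)
                changed++;
        }
    }

    /* Send only the result, not the imagery, back to the central core. */
    uint8_t report = (changed > PIXEL_COUNT_ALARM) ? 1 : 0;
    link_send(&report, sizeof report);
}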
On 12/30/2022 9:24 AM, Richard Damon wrote:
> On 12/30/22 2:27 AM, Don Y wrote: >> Hi George! >> >> [Hope you are faring well... enjoying the COLD!  ;) ] >> >> On 12/29/2022 10:29 PM, George Neuner wrote: >>>>>> But, most cameras seem to have (bit- or word-) serial interfaces >>>>>> nowadays.  Are there any (mainstream/high volume) devices that >>>>>> "look" like a chunk of memory, in their native form? >> >>>> I built my prototypes (proof-of-principle) using COTS USB cameras. >>>> But, getting the data out of the serial data stream and into RAM so >>>> it can be analyzed consumes memory bandwidth. >>>> >>>> I'm currently trying to sort out an approximate cost factor "per >>>> camera" (per video stream) and looking for ways that I can cut costs >>>> (memory bandwidth requirements) to allow greater numbers of >>>> cameras or higher frame rates. >>> >>> You aren't going to find anything low cost ... if you want bandwidth >>> for multiple cameras, you need to look into bus based frame grabbers. >>> They still exist, but are (relatively) expensive and getting harder to >>> find. >> >> So, my options are: >> - reduce the overall frame rate such that N cameras can >>    be serviced by the USB (or whatever) interface *and* >>    the processing load >> - reduce the resolution of the cameras (a special case of the above) >> - reduce the number of cameras "per processor" (again, above) >> - design a "camera memory" (frame grabber) that I can install >>    multiply on a single host >> - develop distributed algorithms to allow more bandwidth to >>    effectively be applied > > The fact that you are starting for the concept of using "USB Cameras" sort of > starts you with that sort of limit. > > My personal thought on your problem is you want to put a "cheap" processor > right on each camera using a processor with a direct camera interface to pull > in the image and do your processing and send the results over some comm-link to > the center core.
If I went the frame-grabber approach, that would be how I would address the hardware.  But, it doesn't scale well.  I.e., at what point do you throw in the towel and say there are too many concurrent images in the scene to pile them all onto a single "host" processor?

ISTM that the better solution is to develop algorithms that can process portions of the scene, concurrently, on different "hosts".  Then, coordinate these "partial results" to form the desired result.

I already have a "camera module" (host+USB camera) that has adequate processing power to handle a "single camera scene".  But, these all assume the scene can be easily defined to fit in that camera's field of view.  E.g., point a camera across the path of a garage door and have it "notice" any deviation from the "unobstructed" image.

When the scene gets too large to represent in enough detail in a single camera's field of view, then there needs to be a way to coordinate multiple cameras to a single (virtual?) host.  If those cameras were just "chunks of memory", then the *imagery* would be easy to examine in a single host -- though the processing power *might* need to increase geometrically (depending on your current goal).

Moving the processing to a "host per camera" implementation gives you more MIPS.  But, it makes coordinating partial results tedious.
> It is unclear what you actual image requirements per camera are, so it is hard > to say what level camera and processor you will need. > > My first feeling is you seem to be assuming a fairly cheep camera and then > doing some fairly simple processing over the partial image, in which case you > might even be able to live with a camera that uses a crude SPI interface to > bring the frame in, and a very simple processor.
I use A LOT of cameras.  But, I should be able to swap the camera (upgrade/downgrade) and still rely on the same *local* compute engine. E.g., some of my cameras have IR illuminators; it's not important in others; some are PTZ; others fixed.

Watching for an obstruction in the path of a garage door (open/close) has different requirements than trying to recognize a visitor at the front door.  Or, identify the locations of the occupants of a facility.
On 12/30/22 12:04 PM, Don Y wrote:
> On 12/30/2022 9:24 AM, Richard Damon wrote: >> On 12/30/22 2:27 AM, Don Y wrote: >>> Hi George! >>> >>> [Hope you are faring well... enjoying the COLD!  ;) ] >>> >>> On 12/29/2022 10:29 PM, George Neuner wrote: >>>>>>> But, most cameras seem to have (bit- or word-) serial interfaces >>>>>>> nowadays.  Are there any (mainstream/high volume) devices that >>>>>>> "look" like a chunk of memory, in their native form? >>> >>>>> I built my prototypes (proof-of-principle) using COTS USB cameras. >>>>> But, getting the data out of the serial data stream and into RAM so >>>>> it can be analyzed consumes memory bandwidth. >>>>> >>>>> I'm currently trying to sort out an approximate cost factor "per >>>>> camera" (per video stream) and looking for ways that I can cut costs >>>>> (memory bandwidth requirements) to allow greater numbers of >>>>> cameras or higher frame rates. >>>> >>>> You aren't going to find anything low cost ... if you want bandwidth >>>> for multiple cameras, you need to look into bus based frame grabbers. >>>> They still exist, but are (relatively) expensive and getting harder to >>>> find. >>> >>> So, my options are: >>> - reduce the overall frame rate such that N cameras can >>>    be serviced by the USB (or whatever) interface *and* >>>    the processing load >>> - reduce the resolution of the cameras (a special case of the above) >>> - reduce the number of cameras "per processor" (again, above) >>> - design a "camera memory" (frame grabber) that I can install >>>    multiply on a single host >>> - develop distributed algorithms to allow more bandwidth to >>>    effectively be applied >> >> The fact that you are starting for the concept of using "USB Cameras" >> sort of starts you with that sort of limit. >> >> My personal thought on your problem is you want to put a "cheap" >> processor right on each camera using a processor with a direct camera >> interface to pull in the image and do your processing and send the >> results over some comm-link to the center core. > > If I went the frame-grabber approach, that would be how I would address the > hardware.  But, it doesn't scale well.  I.e., at what point do you throw in > the towel and say there are too many concurrent images in the scene to > pile them all onto a single "host" processor?
That's why I didn't suggest that method. I was suggesting each camera has its own tightly coupled processor that handles the needs of THAT camera.
> > ISTM that the better solution is to develop algorithms that can > process portions of the scene, concurrently, on different "hosts". > Then, coordinate these "partial results" to form the desired result. > > I already have a "camera module" (host+USB camera) that has adequate > processing power to handle a "single camera scene".  But, these all > assume the scene can be easily defined to fit in that camera's field > of view.  E.g., point a camera across the path of a garage door and have > it "notice" any deviation from the "unobstructed" image.
And if one camera can't fit the full scene, you use two cameras, each with their own processor, and they each process their own image.

The only problem is if your image processing algorithm needs to compare parts of the images between the two cameras, which seems unlikely.

It does mean that if you are trying to track something across the cameras, you need enough overlap to allow them to hand off the object while it is in the overlap.
> > When the scene gets too large to represent in enough detail in a single > camera's field of view, then there needs to be a way to coordinate > multiple cameras to a single (virtual?) host.  If those cameras were just > "chunks of memory", then the *imagery* would be easy to examine in a single > host -- though the processing power *might* need to increase geometrically > (depending on your current goal)
Yes, but your "chunks of memory" model just doesn't exist as a viable camera model. The CMOS cameras with addressable pixels have "access times" significantly lower than your typical memory (and is read once) so doesn't really meet that model. Some of them do allow for sending multiple small regions of intererst and down loading just those regions, but this then starts to require moderate processor overhead to be loading all these regions and updating the grabber to put them where you want. And yes, it does mean that there might be some cases where you need a core module that has TWO cameras connected to a single processor, either to get a wider field of view, or to combine two different types of camera (maybe a high res black and white to a low res color if you need just minor color information, or combine a visible camera to a thermal camera). These just become another tool in your tool box.
> > Moving the processing to "host per camera" implementation gives you more > MIPS.  But, makes coordinating partial results tedious.
Depends on what sort of partial results you are looking at.
> >> It is unclear what you actual image requirements per camera are, so it >> is hard to say what level camera and processor you will need. >> >> My first feeling is you seem to be assuming a fairly cheep camera and >> then doing some fairly simple processing over the partial image, in >> which case you might even be able to live with a camera that uses a >> crude SPI interface to bring the frame in, and a very simple processor. > > I use A LOT of cameras.  But, I should be able to swap the camera > (upgrade/downgrade) and still rely on the same *local* compute engine. > E.g., some of my cameras have Ir illuminators; it's not important > in others; some are PTZ; others fixed.
Doesn't sound reasonable. If you downgrade a camera, you can't count on it being able to meet the same requirements, or you over-specced the initial camera.

You put on a camera a processor capable of handling the tasks you expect out of that set of hardware. One type of processor can likely handle a variety of different camera setups.
> > Watching for an obstruction in the path of a garage door (open/close) > has different requirements than trying to recognize a visitor at the front > door.  Or, identify the locations of the occupants of a facility. >
Yes, so you don't want to "Pay" for the capability to recognize a visitor in your garage door sensor, so you use different levels of sensor/processor.
On 12/30/2022 11:02 AM, Richard Damon wrote:
>>>> So, my options are: >>>> - reduce the overall frame rate such that N cameras can >>>>    be serviced by the USB (or whatever) interface *and* >>>>    the processing load >>>> - reduce the resolution of the cameras (a special case of the above) >>>> - reduce the number of cameras "per processor" (again, above) >>>> - design a "camera memory" (frame grabber) that I can install >>>>    multiply on a single host >>>> - develop distributed algorithms to allow more bandwidth to >>>>    effectively be applied >>> >>> The fact that you are starting for the concept of using "USB Cameras" sort >>> of starts you with that sort of limit. >>> >>> My personal thought on your problem is you want to put a "cheap" processor >>> right on each camera using a processor with a direct camera interface to >>> pull in the image and do your processing and send the results over some >>> comm-link to the center core. >> >> If I went the frame-grabber approach, that would be how I would address the >> hardware.  But, it doesn't scale well.  I.e., at what point do you throw in >> the towel and say there are too many concurrent images in the scene to >> pile them all onto a single "host" processor? > > Thats why I didn't suggest that method. I was suggesting each camera has its > own tightly coupled processor that handles the need of THAT
My existing "module" handles a single USB camera (with a fairly heavy-weight processor). But, being USB-based, there is no way to look at *part* of an image. And, I have to pay a relatively high cost (capturing the entire image from the serial stream) to look at *any* part of it. *If* a "camera memory" was available, I would site N of these in the (64b) address space of the host and let the host pick and choose which parts of which images it wanted to examine... without worrying about all of the bandwidth that would have been consumed deserializing those N images into that memory (which is a continuous process)
>> ISTM that the better solution is to develop algorithms that can >> process portions of the scene, concurrently, on different "hosts". >> Then, coordinate these "partial results" to form the desired result. >> >> I already have a "camera module" (host+USB camera) that has adequate >> processing power to handle a "single camera scene".  But, these all >> assume the scene can be easily defined to fit in that camera's field >> of view.  E.g., point a camera across the path of a garage door and have >> it "notice" any deviation from the "unobstructed" image. > > And if one camera can't fit the full scene, you use two cameras, each with > there own processor, and they each process their own image.
That's the above approach, but...
> The only problem is if your image processing algoritm need to compare parts of > the images between the two cameras, which seems unlikely.
Consider watching a single room (e.g., a lobby at a business) and tracking the movements of "visitors".  It's unlikely that an individual's movements would always be constrained to a single camera field.  There will be times when he/she is "half-in" a field (and possibly NOT in the other, HALF in the other, or ENTIRELY in the other).  You can't ignore cases where the entire object (or, your notion of what that object's characteristics might be) is not entirely in the field, as that leaves a vulnerability.

For example, I watch our garage door with *four* cameras.  A camera is positioned on each side (door jamb) of the door "looking at" the other camera.  This is because a camera can't likely see the full height of the door opening ON ITS SIDE OF THE DOOR (so, the opposing camera watches "my side" and I'll watch *its* side!).

[The other two cameras are similarly positioned on the overhead *track* onto which the door rolls, when open]

An object in (or near) the doorway can be visible in one (either) or both cameras, depending on where it is located.  Additionally, one of those manifestations may be only "partial" with regard to where it is located and how it intersects the cameras' fields of view.

The "cost" of watching the door is only the cost of the actual *cameras*. The cost of the compute resources is amortized over the rest of the system as those can be used for other, non-camera, non-garage related activities.
> It does say that if trying to track something across the cameras, you need > enough overlap to allow them to hand off the object when it is in the overlap.
And, objects that consume large portions of a camera's field of view require similar handling (unless you can always guarantee that cameras and targets are "far apart").
>> When the scene gets too large to represent in enough detail in a single >> camera's field of view, then there needs to be a way to coordinate >> multiple cameras to a single (virtual?) host.  If those cameras were just >> "chunks of memory", then the *imagery* would be easy to examine in a single >> host -- though the processing power *might* need to increase geometrically >> (depending on your current goal) > > Yes, but your "chunks of memory" model just doesn't exist as a viable camera > model.
Apparently not -- in the COTS sense.  But, that doesn't mean I can't build a "camera memory emulator".

The downside is that this increases the cost of the "actual camera" (see my above comment wrt amortization).

And, it just moves the point at which a single host (of fixed capabilities) can no longer handle the scene's complexity.  (when you have 10 cameras?)
> The CMOS cameras with addressable pixels have "access times" significantly > lower than your typical memory (and is read once) so doesn't really meet that > model. Some of them do allow for sending multiple small regions of intererst > and down loading just those regions, but this then starts to require moderate > processor overhead to be loading all these regions and updating the grabber to > put them where you want.
You would, instead, let the "camera memory emulator" capture the entire image from the camera and place the entire image in a contiguous region of memory (from the perspective of the host). The cost of capturing the portions that are not used is hidden *in* the cost of the "emulator".
> And yes, it does mean that there might be some cases where you need a core > module that has TWO cameras connected to a single processor, either to get a > wider field of view, or to combine two different types of camera (maybe a high > res black and white to a low res color if you need just minor color > information, or combine a visible camera to a thermal camera). These just > become another tool in your tool box.
I *think* (uncharted territory) that the better investment is to develop algorithms that let me distribute the processing among multiple (single) "camera modules/nodes".  How would your "two camera" exemplar address an application requiring *three* cameras?  etc.

I can, currently, distribute this processing by treating the region of memory into which a (local) camera's imagery is deserialized as a "memory object" and then exporting *access* to that object to other similar "camera modules/nodes".

But, the access times of non-local memory are horrendous, given that the contents are ephemeral (if accesses could be *cached* on each host needing them, then these costs diminish).

So, I need to come up with algorithms that let me export abstractions instead of raw data.
>> Moving the processing to "host per camera" implementation gives you more >> MIPS.  But, makes coordinating partial results tedious. > > Depends on what sort of partial results you are looking at.
"Bob's *head* is at X,Y+H,W in my image -- but, his body is not visible" "Ah! I was wondering whose legs those were in *my* image!"
>>> It is unclear what you actual image requirements per camera are, so it is >>> hard to say what level camera and processor you will need. >>> >>> My first feeling is you seem to be assuming a fairly cheep camera and then >>> doing some fairly simple processing over the partial image, in which case >>> you might even be able to live with a camera that uses a crude SPI interface >>> to bring the frame in, and a very simple processor. >> >> I use A LOT of cameras.  But, I should be able to swap the camera >> (upgrade/downgrade) and still rely on the same *local* compute engine. >> E.g., some of my cameras have Ir illuminators; it's not important >> in others; some are PTZ; others fixed. > > Doesn't sound reasonable. If you downgrade a camera, you can't count on it > being able to meet the same requirements, or you over speced the initial camera.
Sorry, I was using up/down relative to "nominal camera", not "specific camera previously selected for application".  I'd *really* like to just have a single "camera module" (module = CPU+I/O) instead of one for camera type A and another for camera type B, etc.
> You put on a camera a processor capable of handling the tasks you expect out of > that set of hardware.  One type of processor likely can handle a variaty of > different camera setup with
Exactly. If a particular instance has an IR illuminator, then you include controls for that in *the* "camera module". If another instance doesn't have this ability, then those controls go unused.
>> Watching for an obstruction in the path of a garage door (open/close) >> has different requirements than trying to recognize a visitor at the front >> door.  Or, identify the locations of the occupants of a facility. > > Yes, so you don't want to "Pay" for the capability to recognize a visitor in > your garage door sensor, so you use different levels of sensor/processor.
Exactly. But, the algorithms that do the scene analysis can be the same; you just parameterize the image and the objects within it that you seek.

There will likely be some combinations that exceed the capabilities of the hardware to process in real-time. So, you fall back to lower frame rates or let the algorithms drop targets ("You watch Bob, I'll watch Tom!").
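A sketch of what "parameterizing" a per-camera task might amount to, with purely invented names and fields -- the same analysis engine, configured differently for a garage-door watcher versus a lobby tracker:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Per-camera configuration handed to one generic scene-analysis engine.
 * All names and fields here are invented for illustration. */
struct camera_task {
    uint8_t  frame_rate_hz;      /* nominal rate; lowered first under load */
    uint8_t  min_frame_rate_hz;  /* floor before targets get dropped instead */
    uint8_t  max_targets;        /* how many tracks this node will carry */
    bool     detect_only;        /* true: flag any change; false: track objects */
    bool     has_ir_illuminator; /* controls present but unused if false */
    const uint8_t *mask;         /* which pixels matter (door opening, lobby floor) */
    uint16_t min_object_px;      /* ignore deltas smaller than this */
};

/* Example: the garage-door watcher cares about any change in a narrow
 * masked strip; a lobby tracker would run the same engine with
 * detect_only = false and a higher target count. */
static const struct camera_task garage_door = {
    .frame_rate_hz      = 15,
    .min_frame_rate_hz  = 5,
    .max_targets        = 1,
    .detect_only        = true,
    .has_ir_illuminator = true,
    .mask               = NULL,   /* door-opening mask supplied at init */
    .min_object_px      = 200,
};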
On 12/30/22 4:59 PM, Don Y wrote:
> On 12/30/2022 11:02 AM, Richard Damon wrote: >>>>> So, my options are: >>>>> - reduce the overall frame rate such that N cameras can >>>>>    be serviced by the USB (or whatever) interface *and* >>>>>    the processing load >>>>> - reduce the resolution of the cameras (a special case of the above) >>>>> - reduce the number of cameras "per processor" (again, above) >>>>> - design a "camera memory" (frame grabber) that I can install >>>>>    multiply on a single host >>>>> - develop distributed algorithms to allow more bandwidth to >>>>>    effectively be applied >>>> >>>> The fact that you are starting for the concept of using "USB >>>> Cameras" sort of starts you with that sort of limit. >>>> >>>> My personal thought on your problem is you want to put a "cheap" >>>> processor right on each camera using a processor with a direct >>>> camera interface to pull in the image and do your processing and >>>> send the results over some comm-link to the center core. >>> >>> If I went the frame-grabber approach, that would be how I would >>> address the >>> hardware.  But, it doesn't scale well.  I.e., at what point do you >>> throw in >>> the towel and say there are too many concurrent images in the scene to >>> pile them all onto a single "host" processor? >> >> Thats why I didn't suggest that method. I was suggesting each camera >> has its own tightly coupled processor that handles the need of THAT > > My existing "module" handles a single USB camera (with a fairly > heavy-weight > processor). > > But, being USB-based, there is no way to look at *part* of an image. > And, I have to pay a relatively high cost (capturing the entire > image from the serial stream) to look at *any* part of it.
Yep, having chosen USB as your interface, you have limited yourself.

Since you say you have a fairly heavy-weight processor, that frame grab likely isn't your limiting factor.
> > *If* a "camera memory" was available, I would site N of these > in the (64b) address space of the host and let the host pick > and choose which parts of which images it wanted to examine... > without worrying about all of the bandwidth that would have been > consumed deserializing those N images into that memory (which is > a continuous process)
But such a camera would almost certainly be designed for the processor to be on the same board as the camera (or be VERY slow to access), so it is much less apt to allow you to add multiple cameras to one processor.
> >>> ISTM that the better solution is to develop algorithms that can >>> process portions of the scene, concurrently, on different "hosts". >>> Then, coordinate these "partial results" to form the desired result. >>> >>> I already have a "camera module" (host+USB camera) that has adequate >>> processing power to handle a "single camera scene".  But, these all >>> assume the scene can be easily defined to fit in that camera's field >>> of view.  E.g., point a camera across the path of a garage door and have >>> it "notice" any deviation from the "unobstructed" image. >> >> And if one camera can't fit the full scene, you use two cameras, each >> with there own processor, and they each process their own image. > > That's the above approach, but... > >> The only problem is if your image processing algoritm need to compare >> parts of the images between the two cameras, which seems unlikely. > > Consider watching a single room (e.g., a lobby at a business) and > tracking the movements of "visitors".  It's unlikely that an individual's > movements would always be constrained to a single camera field.  There will > be times when he/she is "half-in" a field (and possibly NOT in the other, > HALF in the other or ENTIRELY in the other).  You can't ignore cases where > the entire object (or, your notion of what that object's characteristics > might be) is not entirely in the field as that leaves a vulnerability.
Sounds like you aren't overlapping your cameras enough or have insufficient coverage.  Maybe your problem is the wrong field of view for your lens. Maybe you need fewer but better cameras with wider fields of view.

This might be due to trying to use "stock" inexpensive USB cameras.
> > For example, I watch our garage door with *four* cameras.  A camera is > positioned on each side ("door jam"?) of the door "looking at" the other > camera.  This because a camera can't likely see the full height of the door > opening ON ITS SIDE OF THE DOOR (so, the opposing camera watches "my side" > and I'll watch *its* side!).
Right, and if ANY see a problem, you stop. So no need for inter-camera coordination.
> > [The other two cameras are similarly positioned on the overhead *track* > onto which the door rolls, when open] > > An object in (or near) the doorway can be visible in one (either) or > both cameras, depending on where it is located.  Additionally, one of > those manifestations may be only "partial" as regards to where it is > located and intersects the cameras' fields of view.
But since you aren't trying to ID, only Detect, there still isn't a need for camera-to-camera processing, just camera-to-door-controller communication.
> > The "cost" of watching the door is only the cost of the actual *cameras*. > The cost of the compute resources is amortized over the rest of the system > as those can be used for other, non-camera, non-garage related activities. > >> It does say that if trying to track something across the cameras, you >> need enough overlap to allow them to hand off the object when it is in >> the overlap. > > And, objects that consume large portions of a camera's field of view > require similar handling (unless you can always guarantee that cameras > and targets are "far apart") > >>> When the scene gets too large to represent in enough detail in a single >>> camera's field of view, then there needs to be a way to coordinate >>> multiple cameras to a single (virtual?) host.  If those cameras were >>> just >>> "chunks of memory", then the *imagery* would be easy to examine in a >>> single >>> host -- though the processing power *might* need to increase >>> geometrically >>> (depending on your current goal) >> >> Yes, but your "chunks of memory" model just doesn't exist as a viable >> camera model. > > Apparently not -- in the COTS sense.  But, that doesn't mean I can't > build a "camera memory emulator". > > The downside is that this increases the cost of the "actual camera" > (see my above comment wrt ammortization).
Yep, implementing this likely costs more than giving the camera a dedicated moderate processor to do the major work. It might not handle the actual ID problem of your doorbell, but it could likely process the live video, take a snapshot of a region with a good view of the approaching visitor, and send just that to your master system for ID.
> > And, it just moves the point at which a single host (of fixed capabilities) > can no longer handle the scene's complexity.  (when you have 10 cameras?) > >> The CMOS cameras with addressable pixels have "access times" >> significantly lower than your typical memory (and is read once) so >> doesn't really meet that model. Some of them do allow for sending >> multiple small regions of intererst and down loading just those >> regions, but this then starts to require moderate processor overhead >> to be loading all these regions and updating the grabber to put them >> where you want. > > You would, instead, let the "camera memory emulator" capture the entire > image from the camera and place the entire image in a contiguous > region of memory (from the perspective of the host).  The cost of capturing > the portions that are not used is hidden *in* the cost of the "emulator".
Yep, you could build your system with a two-port memory buffer between the frame grabber loading through one port and the decoding processor on the other.

The most cost-effective way to do this is likely a commercial frame grabber with built-in "two-port" memory that sits in a slot of a PC-type computer. These would likely not work with a "USB camera" (why would you need a frame grabber for a camera that has one built in), so it would totally change your cost models.

If your current design method is based on using USB cameras, trying to do a full custom interface may be out of your field of operation.
> >> And yes, it does mean that there might be some cases where you need a >> core module that has TWO cameras connected to a single processor, >> either to get a wider field of view, or to combine two different types >> of camera (maybe a high res black and white to a low res color if you >> need just minor color information, or combine a visible camera to a >> thermal camera). These just become another tool in your tool box. > > I *think* (uncharted territory) that the better investment is to develop > algorithms that let me distribute the processing among multiple > (single) "camera modules/nodes".  How would your "two camera" exemplar > address an application requiring *three* cameras?  etc.
The first question is: what processing are you thinking of that needs images from 3 cameras?

Note, my two-camera example was a case where the processing that needed to be done did need data from two cameras.

If you have another task that needs a different camera, you just build a system with one two-camera module and one one-camera module, relaying back to a central control, or you nominate one of the modules to be central control if the load there is light enough.

Your garage door example would be built from 4 separate and independent one-camera modules, either going to one of them as the master, or to a 5th module acting as the master.

The cases I can think of for needing to process three cameras together would be:

1) A system stitching images from 3 cameras and generating a single image out of it, but that totally breaks your concept of needing only bits of the images; that inherently uses most of each camera and does some stitching processing on the overlaps.

2) A multi-spectrum system, where again, you are taking the ENTIRE scene from the three cameras and producing a merged "false-color" image from them. Again, this also breaks your partial-image model.
> > I can, currently, distribute this processing by treating the > region of memory into which a (local) camera's imagery is > deserialized as a "memory object" and then exporting *access* > to that object to other similar "camera modules/nodes". > > But, the access times of non-local memory are horrendous, given > that the contents are ephemeral (if accesses could be *cached* > on each host needing them, then these costs diminish). > > So, I need to come up with algorithms that let me export abstractions > instead of raw data.
Sounds like your current design is very centralized. This limits its scalability.
> >>> Moving the processing to "host per camera" implementation gives you more >>> MIPS.  But, makes coordinating partial results tedious. >> >> Depends on what sort of partial results you are looking at. > > "Bob's *head* is at X,Y+H,W in my image -- but, his body is not visible" > > "Ah!  I was wondering whose legs those were in *my* image!" > >>>> It is unclear what you actual image requirements per camera are, so >>>> it is hard to say what level camera and processor you will need. >>>> >>>> My first feeling is you seem to be assuming a fairly cheep camera >>>> and then doing some fairly simple processing over the partial image, >>>> in which case you might even be able to live with a camera that uses >>>> a crude SPI interface to bring the frame in, and a very simple >>>> processor. >>> >>> I use A LOT of cameras.  But, I should be able to swap the camera >>> (upgrade/downgrade) and still rely on the same *local* compute engine. >>> E.g., some of my cameras have Ir illuminators; it's not important >>> in others; some are PTZ; others fixed. >> >> Doesn't sound reasonable. If you downgrade a camera, you can't count >> on it being able to meet the same requirements, or you over speced the >> initial camera. > > Sorry, I was using up/down relative to "nominal camera", not "specific > camera > previously selected for application".  I'd 8really* like to just have a > single "camera module" (module = CPU+I/O) instead of one for camera type A > and another for camera type B, etc. >
That only works if you are willing to spend for the sports car, even if you just need it to go around the block.

It depends a bit on how much span of capability you need. A $10 camera will likely have a very different interface than a $30,000 camera, so it will need a different board. Some boards might handle multiple camera interface types if that doesn't add a lot to the board, but you are apt to find that you need to make some choices.

Then some tasks will just need a lot more compute power than others. Yes, you can put too much compute power on the simple tasks (and it might make sense to design the higher-end processor early), but ultimately you are going to want the less expensive, lower-end processors.
>> You put on a camera a processor capable of handling the tasks you >> expect out of that set of hardware.  One type of processor likely can >> handle a variaty of different camera setup with > > Exactly.  If a particular instance has an Ir illuminator, then you include > controls for that in *the* "camera module".  If another instance doesn't > have > this ability, then those controls go unused.
Yes, auxiliary functionality is often cheap to include the hooks for.
> >>> Watching for an obstruction in the path of a garage door (open/close) >>> has different requirements than trying to recognize a visitor at the >>> front >>> door.  Or, identify the locations of the occupants of a facility. >> >> Yes, so you don't want to "Pay" for the capability to recognize a >> visitor in your garage door sensor, so you use different levels of >> sensor/processor. > > Exactly.  But, the algorithms that do the scene analysis can be the same; > you just parameterize the image and the objects within it that you seek.
Actually, "Tracking" can be a very different type of algorithm then "Detecting". You might be able to use a Tracking base algorithm to Detect, but likely a much simpler algorithm can be used (needing less resources) to just detect.
> > There will likely be some combinations that exceed the capabilities of > the hardware to process in real-time.  So, you fall back to lower > frame rates or let the algorithms drop targets ("You watch Bob, I'll > watch Tom!") > >
On 12/30/2022 10:39 PM, Richard Damon wrote:
> On 12/30/22 4:59 PM, Don Y wrote: >> On 12/30/2022 11:02 AM, Richard Damon wrote: >>>>>> So, my options are: >>>>>> - reduce the overall frame rate such that N cameras can >>>>>>    be serviced by the USB (or whatever) interface *and* >>>>>>    the processing load >>>>>> - reduce the resolution of the cameras (a special case of the above) >>>>>> - reduce the number of cameras "per processor" (again, above) >>>>>> - design a "camera memory" (frame grabber) that I can install >>>>>>    multiply on a single host >>>>>> - develop distributed algorithms to allow more bandwidth to >>>>>>    effectively be applied >>>>> >>>>> The fact that you are starting for the concept of using "USB Cameras" sort >>>>> of starts you with that sort of limit. >>>>> >>>>> My personal thought on your problem is you want to put a "cheap" processor >>>>> right on each camera using a processor with a direct camera interface to >>>>> pull in the image and do your processing and send the results over some >>>>> comm-link to the center core. >>>> >>>> If I went the frame-grabber approach, that would be how I would address the >>>> hardware.  But, it doesn't scale well.  I.e., at what point do you throw in >>>> the towel and say there are too many concurrent images in the scene to >>>> pile them all onto a single "host" processor? >>> >>> Thats why I didn't suggest that method. I was suggesting each camera has its >>> own tightly coupled processor that handles the need of THAT >> >> My existing "module" handles a single USB camera (with a fairly heavy-weight >> processor). >> >> But, being USB-based, there is no way to look at *part* of an image. >> And, I have to pay a relatively high cost (capturing the entire >> image from the serial stream) to look at *any* part of it. > > Yep, having chosen USB as your interface, you have limited yourself.
Doesn't matter. Any serial interface poses the same problem; I can't examine the image until I can *look* at it.
> Since you say you have a fairly heavy-weight processor, that frame grab likely > isn't you limiting factor.
It becomes an issue when the number of cameras on a single host increases significantly. I have one scene that requires 11 cameras to capture completely.
>> *If* a "camera memory" was available, I would site N of these >> in the (64b) address space of the host and let the host pick >> and choose which parts of which images it wanted to examine... >> without worrying about all of the bandwidth that would have been >> consumed deserializing those N images into that memory (which is >> a continuous process) > > But such a camera would almost certainly be designed for the processor to be on > the same board as the camera, (or be VERY slow in access), so much less apt > allow you to add multiple cameras to one processor.
Yes. But, if the module is small, then siting the assembly "someplace convenient" isn't a big issue. I.e., my modules are smaller than most webcams/dashcams.
>>>> ISTM that the better solution is to develop algorithms that can >>>> process portions of the scene, concurrently, on different "hosts". >>>> Then, coordinate these "partial results" to form the desired result. >>>> >>>> I already have a "camera module" (host+USB camera) that has adequate >>>> processing power to handle a "single camera scene".  But, these all >>>> assume the scene can be easily defined to fit in that camera's field >>>> of view.  E.g., point a camera across the path of a garage door and have >>>> it "notice" any deviation from the "unobstructed" image. >>> >>> And if one camera can't fit the full scene, you use two cameras, each with >>> there own processor, and they each process their own image. >> >> That's the above approach, but... >> >>> The only problem is if your image processing algoritm need to compare parts >>> of the images between the two cameras, which seems unlikely. >> >> Consider watching a single room (e.g., a lobby at a business) and >> tracking the movements of "visitors".  It's unlikely that an individual's >> movements would always be constrained to a single camera field.  There will >> be times when he/she is "half-in" a field (and possibly NOT in the other, >> HALF in the other or ENTIRELY in the other).  You can't ignore cases where >> the entire object (or, your notion of what that object's characteristics >> might be) is not entirely in the field as that leaves a vulnerability. > > Sounds like you aren't overlapping your cameras enough or have insufficent > coverage.  Maybe your problem is wrong field of view for your lens. Maybe you > need fewer but better cameras with wider fields of view.
Distance from camera to target means you have to play games with optics that can distort images. I also can't rely on "professional installers", *or* on the cameras remaining aimed in their original configurations.
> This might be due to try to use "stock" inexpensive USB cameras. > >> For example, I watch our garage door with *four* cameras.  A camera is >> positioned on each side ("door jam"?) of the door "looking at" the other >> camera.  This because a camera can't likely see the full height of the door >> opening ON ITS SIDE OF THE DOOR (so, the opposing camera watches "my side" >> and I'll watch *its* side!). > > Right, and if ANY see a problem, you stop. So no need for inter-camera > coordination.
But you don't know there is a problem until you can identify *where* the obstruction exists and if that poses a problem for the vehicle or the "obstructing item". Doing so requires knowing what the object likely is. E.g., SWMBO frequently stands in the doorway as I pull the car in or out (not enough room between vehicles *in* the garage to allow for ease of entry/egress). I'd not want this to be flagged as a problem (signalling an alert in the vehicle). Likewise, an obstruction on one vehicle-side of the garage shouldn't interfere with access to the other side.
>> [The other two cameras are similarly positioned on the overhead *track* >> onto which the door rolls, when open] >> >> An object in (or near) the doorway can be visible in one (either) or >> both cameras, depending on where it is located.  Additionally, one of >> those manifestations may be only "partial" as regards to where it is >> located and intersects the cameras' fields of view. > > But since you aren't trying to ID, only Detect, there still isn't a need for > camera-camera processing, just camera-door controller
The cameras need to coordinate to resolve the location of the object. A "toy wagon" would present differently, visually, than a tall person.
>>>> When the scene gets too large to represent in enough detail in a single >>>> camera's field of view, then there needs to be a way to coordinate >>>> multiple cameras to a single (virtual?) host.  If those cameras were just >>>> "chunks of memory", then the *imagery* would be easy to examine in a single >>>> host -- though the processing power *might* need to increase geometrically >>>> (depending on your current goal) >>> >>> Yes, but your "chunks of memory" model just doesn't exist as a viable camera >>> model. >> >> Apparently not -- in the COTS sense.  But, that doesn't mean I can't >> build a "camera memory emulator". >> >> The downside is that this increases the cost of the "actual camera" >> (see my above comment wrt ammortization). > > Yep, implementing this likely costs more than giving the camera a dedicated > moderate processor to do the major work. Might not handle the actual ID problem > of your Door bell, but could likely process the live video, take a snapshot of > a region with a good view of the vistor coming, and send just that to your > master system for ID.
But, then I could just use one of my existing "modules". If the target fits entirely within its field of view, then it has everything that it needs for the assigned functionality. If not, then it needs to consult with other cameras.
>>> The CMOS cameras with addressable pixels have "access times" significantly >>> lower than your typical memory (and is read once) so doesn't really meet >>> that model. Some of them do allow for sending multiple small regions of >>> intererst and down loading just those regions, but this then starts to >>> require moderate processor overhead to be loading all these regions and >>> updating the grabber to put them where you want. >> >> You would, instead, let the "camera memory emulator" capture the entire >> image from the camera and place the entire image in a contiguous >> region of memory (from the perspective of the host).  The cost of capturing >> the portions that are not used is hidden *in* the cost of the "emulator". > > Yep, you could build you system with a two-port memory buffer between the frane > grabber loading with one port, and the decoding processor on the other.
Yes. But large *true* dual-port memories are costly. Instead, you would emulate such a device either by time-division multiplexing a single physical memory *or* sharing alternate memories (fill one, view the other).
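A minimal sketch of the "fill one, view the other" arrangement -- a plain ping-pong buffer, where capture_frame_into() is a hypothetical stand-in for whatever actually fills a buffer from the sensor:

#include <stdint.h>
#include <stdatomic.h>

#define FRAME_BYTES (640u * 480u)   /* assumed frame size */

static uint8_t buffer[2][FRAME_BYTES];
static atomic_int fill_index = 0;   /* buffer currently being filled */

/* Hypothetical capture routine: blocks until one full frame has been
 * written into dst (DMA from the sensor, deserialized USB data, ...). */
extern void capture_frame_into(uint8_t *dst);

/* Capture side: always writes into the "fill" buffer, then flips. */
void capture_task(void)
{
    for (;;) {
        int f = atomic_load(&fill_index);
        capture_frame_into(buffer[f]);
        atomic_store(&fill_index, f ^ 1);   /* publish the completed frame */
    }
}

/* Analysis side: reads whichever buffer is NOT being filled, so the
 * host sees a stable, contiguous image while the next one streams in. */
const uint8_t *current_frame(void)
{
    return buffer[atomic_load(&fill_index) ^ 1];
}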
> The most cost effective way to do this is likely a commercial frame-grabber > with built "two-port" memory, that sits in a slot of a PC type computer. These > would likely not work with a "USB Camera" (why would you need a frame grabber > with a camera that has it built in) so would be totally changing your cost models.
Yes, I have a few of these intended for medical imaging apps. Way too big; way too expensive. Designed for the wrong type of "host".
> IF your current design method is based on using USB cameras, trying to do a > full custom interface may be out of your field of operation. > >>> And yes, it does mean that there might be some cases where you need a core >>> module that has TWO cameras connected to a single processor, either to get a >>> wider field of view, or to combine two different types of camera (maybe a >>> high res black and white to a low res color if you need just minor color >>> information, or combine a visible camera to a thermal camera). These just >>> become another tool in your tool box. >> >> I *think* (uncharted territory) that the better investment is to develop >> algorithms that let me distribute the processing among multiple >> (single) "camera modules/nodes".  How would your "two camera" exemplar >> address an application requiring *three* cameras?  etc. > > The first question comes, what processing are you thinking of that needs images > from 3 cameras. > > Note, my two camera example was a case where the processing needed to be done > did need data from two cameras. > > If you have another task that needs a different camera, you just build a system > with one two camera model and one 1 camera module, relaying back to a central > control, or you nominate one of the modules to be central control if the load > there is light enough. > > Your garage doer example would be built from 4 seperate and independent 1 > camera modules, either going to one as the master, or to a 5th module acting as > the master.
Yes, but they have to share image data (either raw or abstracted) to make deductions about the targets present.
> The cases I can think of for needing to process three cameras together would be: > > 1) a system stiching images from 3 cameras and generating a single image out of > it, but that totally breaks your concept of needing only bits of the images, > that inherently is using most of each camera, and doing some stiching > processing on the overlaps. > > 2) A Multi-spectrum system, where again, you are taking the ENTIRE scene from > the three cameras and producing a merged "false-color" image from them. Again, > this also breaks you partial image model.
Or, tracking multiple actors in an "arena" -- visitors in a business, occupants in a home, etc.

In much the same way that the two garage door cameras conspire to locate the obstruction's position along the line from the left doorjamb to the right, pairs of cameras can resolve a target in an arena, and *sets* of cameras (freely paired, as needed) can track all locations (and targets) in the arena.
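The geometry behind "pairs of cameras can resolve a target" is just intersecting two bearings. A sketch, assuming each camera's position and the bearing it reports for the target are already expressed in a shared floor-plan coordinate frame (converting a pixel column to a bearing from the camera's pose and field of view is assumed to have been done already):

#include <math.h>
#include <stdbool.h>

/* A camera at (x, y) reporting the target along absolute bearing
 * 'theta' (radians) in the shared floor-plan frame. */
struct bearing {
    double x, y;
    double theta;
};

/* Intersect the two rays; returns false if they are (nearly) parallel. */
bool locate_target(struct bearing a, struct bearing b,
                   double *tx, double *ty)
{
    double dax = cos(a.theta), day = sin(a.theta);
    double dbx = cos(b.theta), dby = sin(b.theta);
    double denom = dax * dby - day * dbx;

    if (fabs(denom) < 1e-9)
        return false;   /* the two cameras see the target along ~parallel lines */

    /* Solve a.pos + t*da == b.pos + s*db for t, then plug back in. */
    double t = ((b.x - a.x) * dby - (b.y - a.y) * dbx) / denom;
    *tx = a.x + t * dax;
    *ty = a.y + t * day;
    return true;
}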
>> I can, currently, distribute this processing by treating the >> region of memory into which a (local) camera's imagery is >> deserialized as a "memory object" and then exporting *access* >> to that object to other similar "camera modules/nodes". >> >> But, the access times of non-local memory are horrendous, given >> that the contents are ephemeral (if accesses could be *cached* >> on each host needing them, then these costs diminish). >> >> So, I need to come up with algorithms that let me export abstractions >> instead of raw data. > > Sounds like you current design is very centralized. This limits its scalability,
The current design is completely distributed. The only "shared components" are the network switch through which the nodes converse and the RDBMS that acts as the persistent store.

If a site realizes that it needs additional coverage to track <whatever>, it just adds another camera module and lets the RDBMS know about its general location/functionality (i.e., how it can relate to any other cameras covering the same arena).
>>>>> My first feeling is you seem to be assuming a fairly cheep camera and then >>>>> doing some fairly simple processing over the partial image, in which case >>>>> you might even be able to live with a camera that uses a crude SPI >>>>> interface to bring the frame in, and a very simple processor. >>>> >>>> I use A LOT of cameras.&nbsp; But, I should be able to swap the camera >>>> (upgrade/downgrade) and still rely on the same *local* compute engine. >>>> E.g., some of my cameras have Ir illuminators; it's not important >>>> in others; some are PTZ; others fixed. >>> >>> Doesn't sound reasonable. If you downgrade a camera, you can't count on it >>> being able to meet the same requirements, or you over speced the initial >>> camera. >> >> Sorry, I was using up/down relative to "nominal camera", not "specific camera >> previously selected for application".&nbsp; I'd 8really* like to just have a >> single "camera module" (module = CPU+I/O) instead of one for camera type A >> and another for camera type B, etc. > > That only works if you are willing to spend for the sports car, even if you > just need it to go around the block.
If the "extra" bits of the sports car can be used by other elements, then those costs aren't directly borne by the camera module, itself. E.g., when the garage door is closed, there's no reason the modules in the garage can't be busy training speech models or removing commercials from recorded broadcast content. If, OTOH, you detect objects with a photo-interrupter across the door's path, there's scant little it can do when not needed.
> It depends a bit on how much span you need of capability. A $10 camera is > likely having a very different interface to a $30,000 camera, so will need a > different board. Some boards might handle multiple camera interface types if it > doesn't add a lot to the board, but you are apt to find that you need to make > some choice.
I don't ever see a need for a $30,000 camera. There may be a need for a PTZ model. Or, a low-lux model. Or, one with a longer focal length. Or, shorter (I'd considered putting one *in* the mailbox to examine its contents instead of just detecting that it had been "visited"). Instead of a 4K device, I'd opt for multiple simpler devices, better positioned. But, not radically different in terms of cost, size, etc.

If you walk into a bank lobby, you don't see *one* super-high-resolution, wide-field camera surveilling the lobby but, rather, half a dozen or more watching specific portions of the lobby. Similarly, if you use the self-check at the store, there is a camera per checkout station instead of one "really good" camera located centrally trying to take it all in.

This gives installers more leeway in terms of how they cover an arena.
> Then some tasks will just need a lot more computer power than others. Yes, you > can just put too much computer power on the simple tasks, (and that might make > sense to early design the higher end processor), but ultimately you are going > to want the less expensive lower end processors.
I can call on surplus processing power from other nodes in the system in much the same way that they can call on surplus capabilities from a camera module that isn't "seeing" anything interesting, at the moment. There will always be limits on what can be done; I'm not going to be able to VISUALLY verify that you have the right wrench in your hand as you set about working on the car. Or, that you are holding an eating utensil instead of a random piece of plastic as you traverse the kitchen. But, I'll know YOU are in the kitchen and likely the person whose voice I hear (to further reinforce the speaker identification algorithms).
>>> You put on a camera a processor capable of handling the tasks you expect out >>> of that set of hardware.  One type of processor likely can handle a variaty >>> of different camera setup with >> >> Exactly.  If a particular instance has an Ir illuminator, then you include >> controls for that in *the* "camera module".  If another instance doesn't have >> this ability, then those controls go unused. > > Yes, Auxilary functionality is often cheap to include the hooks for.
But, it often requires looking at your TOTAL needs instead of designing for specific (initial) needs. E.g., my camera modules now include audio capabilities as there are instances where I want an audio pickup in the same arena that I am monitoring. Silly to have to add an "audio module" just because I didn't have the foresight to include it with the camera!
>>>> Watching for an obstruction in the path of a garage door (open/close) >>>> has different requirements than trying to recognize a visitor at the front >>>> door.  Or, identify the locations of the occupants of a facility. >>> >>> Yes, so you don't want to "Pay" for the capability to recognize a visitor in >>> your garage door sensor, so you use different levels of sensor/processor. >> >> Exactly.  But, the algorithms that do the scene analysis can be the same; >> you just parameterize the image and the objects within it that you seek. > > Actually, "Tracking" can be a very different type of algorithm then > "Detecting". You might be able to use a Tracking base algorithm to Detect, but > likely a much simpler algorithm can be used (needing less resources) to just > detect.
My current detection algorithm (e.g., garage) just looks for deltas between "clear" and "obstructed" imagery, conditioned by masks. There is some image processing required as things look different at night vs. day, etc.

I don't have to "get it right". All I have to do is demonstrate "proof of concept". And, be able to indicate why a particular approach is superior to others/existing ones.

E.g., if you drive a "pickup-on-steroids", you'd need to locate a photointerrupter "obstruction detector" pretty high up off the ground to catch the case where the truck bed was in the way of the door. Or, some lumber overhanging the end of the bed that you forgot you'd brought home! And, you'd likely need *another* detector down low to catch toddlers or toy wagons in the path of the door. OTOH, doing the detection with a camera catches these use conditions in addition to the "nominal" one for which the photointerrupter was designed.

Tracking two/four occupants of a home *suggests* that you can track 6 or 8. Or, dozens of employees in a business conference room, etc. I have no desire to spend my time perfecting any of these technologies (I have other goals); just lay the groundwork and the framework to make them possible.
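A minimal sketch of that masked delta check, assuming 8-bit grayscale frames and a mask marking the pixels that matter (the reference frame would be re-captured as lighting conditions change); the names and thresholds are illustrative only:

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define W 640
#define H 480
#define PIXEL_DELTA   30     /* per-pixel change considered significant */
#define BLOB_MIN_PX  150     /* changed pixels needed to call it an obstruction */

/* reference: the "clear" image; mask: nonzero where the door opening is. */
bool obstruction_detected(const uint8_t *frame,
                          const uint8_t *reference,
                          const uint8_t *mask)
{
    uint32_t changed = 0;

    for (size_t i = 0; i < (size_t)W * H; i++) {
        if (!mask[i])
            continue;                       /* ignore pixels outside the door path */
        if (abs((int)frame[i] - (int)reference[i]) > PIXEL_DELTA)
            changed++;
    }
    return changed > BLOB_MIN_PX;
}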
>> There will likely be some combinations that exceed the capabilities of
>> the hardware to process in real-time.  So, you fall back to lower
>> frame rates or let the algorithms drop targets ("You watch Bob, I'll
>> watch Tom!")
On 12/30/2022 5:32, Don Y wrote:
> On 12/29/2022 5:40 PM, Richard Damon wrote:
>> On 12/29/22 5:57 PM, Don Y wrote:
>>> On 12/29/2022 2:09 PM, Richard Damon wrote:
>>>> On 12/29/22 2:26 PM, Don Y wrote:
>>>>> On 12/29/2022 10:06 AM, Richard Damon wrote:
>>>>>> On 12/29/22 8:16 AM, Don Y wrote:
>>>>>>> ISTR playing with de-encapsulated DRAMs as image sensors
>>>>>>> back in school (DRAM being relatively new technology, then).
>>>>>>>
>>>>>>> But, most cameras seem to have (bit- or word-) serial interfaces
>>>>>>> nowadays.  Are there any (mainstream/high volume) devices that
>>>>>>> "look" like a chunk of memory, in their native form?
>>>>>>
>>>>>> Using a DRAM in that manner would only give you a single bit value
>>>>>> for each pixel (maybe some more modern memories store multiple
>>>>>> bits in a cell so you get a few grey levels).
>>>>>
>>>>> I mentioned the DRAM reference only as an exemplar of how a "true"
>>>>> parallel, random access interface could exist.
>>>>
>>>> Right, and cameras based on parallel random access do exist, but
>>>> tend to be on the smaller and slower end of the spectrum.
>>>>
>>>>>
>>>>>> There are some CMOS sensors that let you address pixels
>>>>>> individually and in a random order (like you got with the DRAM)
>>>>>> but by its nature, such a readout method tends to be slow, and
>>>>>> space inefficient, so these interfaces tend to be only available
>>>>>> on smaller camera arrays.
>>>>>
>>>>> But, if you are processing the image, such an approach can lead to
>>>>> higher throughput than having to transfer a serial data stream into
>>>>> memory (thus consuming memory bandwidth).
>>>>
>>>> My guess is that in almost all cases, the need to send the address
>>>> to the camera and then get back the pixel value is going to use up
>>>> more total bandwidth than getting the image in a stream. The one
>>>> exception would be if you need just a very small percentage of the
>>>> array data, and it is scattered over the array so a Region of
>>>> Interest operation can't be used.
>>>
>>> No, you're missing the nature of the DRAM example.
>>>
>>> You don't "send" the address of the memory cell desired *to* the DRAM.
>>> You simply *address* the memory cell, directly.  I.e., if there are
>>> N locations in the DRAM, then N addresses in your address space are
>>> consumed by it; one for each location in the array.
>>
>> No, look at your DRAM timing again, the transaction begins with the
>> sending of the address over typically two clock edges with RAS and
>> CAS, and then a couple of clock cycles and then you get back on the
>> data bus the answer.
>
> But it's a single memory reference.  Look at what happens when you
> deserialize a USB video stream into that same DRAM.  The DMAC has
> tied up the bus for the same amount of time that the processor
> would have if it read those same N locations.
>
>> Yes, the addresses come from an address bus, using address space out
>> of the processor, but it is a multi-cycle operation. Typically, you
>> read back a "burst" with some minimal caching on the processor side,
>> but that is more a minor detail.
>>
>>> I'm looking for *that* sort of "direct access" in a camera.
>>
>> It's been a while, but I thought some CMOS cameras could work on a
>> similar basis, strobe a Row/Column address from pins on the camera,
>> and a few clock cycles later you got a burst out of the camera
>> starting at the address cell.
>
> I don't want the camera to decide which pixels *it* thinks I want to see.
> It sends me a burst of a row -- but the next part of the image I may have
> wanted to access may have been down the same *column*.  Or, in another
> part of the image entirely.
>
> Serial protocols inherently deliver data in a predefined pattern
> (often intended for display).  Scene analysis doesn't necessarily
> conform to that same pattern.
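For concreteness, the "camera that looks like a chunk of memory" access pattern described in the quote above would look roughly like this from the host side; CAMERA_BASE, the geometry, and the function names are hypothetical, a sketch rather than any particular sensor's interface:

    #include <stdint.h>

    #define CAM_W        640u
    #define CAM_H        480u
    #define CAMERA_BASE  0x40000000UL     /* hypothetical: sensor mapped here */
    #define FRAME        ((volatile const uint8_t *)CAMERA_BASE)

    /* One address per pixel: the CPU reads only the pixels the analysis
     * wants, in whatever order it wants -- no DMA of the whole frame into
     * DRAM first. */
    static inline uint8_t pixel(unsigned x, unsigned y)
    {
        return FRAME[(unsigned long)y * CAM_W + x];
    }

    /* e.g., walk *down a column* -- the access pattern a row-burst serial
     * interface can't deliver without also shipping the rows around it */
    static unsigned column_hits(unsigned x, uint8_t thresh)
    {
        unsigned hits = 0;
        for (unsigned y = 0; y < CAM_H; y++)
            if (pixel(x, y) > thresh)
                hits++;
        return hits;
    }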
Isn't there a camera doing a protocol which allows you to request a specific area only to be transferred? RFB like, VNC does that all the time.
On 12/31/2022 4:15 AM, Dimiter_Popoff wrote:
>> Serial protocols inherently deliver data in a predefined pattern
>> (often intended for display).  Scene analysis doesn't necessarily
>> conform to that same pattern.
>
> Isn't there a camera doing a protocol which allows you to request
> a specific area only to be transferred? RFB like, VNC does that
> all the time.
That only makes sense if you know, a priori, which part(s) of the image you might want to examine. E.g., it would work for "exposing" just the portion of the field that "overlaps" some other image. I can get fixed parts of partial frames from *other* cameras just by ensuring the other camera puts that portion of the image in a particular memory object and then export that memory object to the node that wants it.

But, if a target can move into or out of the exposed area, then you have to make a return trip to the camera to request MORE of the field.

When your targets are "far away" (like a surveillance camera monitoring a parking lot), targets don't move from their previously noted positions considerably from one frame to the next.

But, when the camera and targets are in close proximity, there's greater (apparent) relative motion in the same frame-interval. So, knowing where (x,y + WxH) the portion of the image of interest lay, previously, is less predictive of where it may lie currently.

Having the entire image available means the software can look <wherever> and <whenever>.
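To make the close-proximity problem concrete: a region request has to be inflated by the worst-case apparent motion per frame, and that margin grows with proximity until the "region" is effectively the whole frame. A minimal sketch, with assumed dimensions and names:

    #define CAM_W 640
    #define CAM_H 480

    typedef struct { int x, y, w, h; } roi_t;

    static int clampi(int v, int lo, int hi)
    {
        return v < lo ? lo : (v > hi ? hi : v);
    }

    /* Inflate last frame's target box by the worst-case apparent motion
     * (pixels per frame).  For a distant parking-lot target that margin is
     * a few pixels; for a nearby, fast-moving target it can approach the
     * frame size -- at which point "request a region" degenerates into
     * "request the whole frame". */
    static roi_t next_request(roi_t last, int max_motion_px)
    {
        roi_t r;
        r.x = clampi(last.x - max_motion_px, 0, CAM_W - 1);
        r.y = clampi(last.y - max_motion_px, 0, CAM_H - 1);
        r.w = clampi(last.w + 2 * max_motion_px, 1, CAM_W - r.x);
        r.h = clampi(last.h + 2 * max_motion_px, 1, CAM_H - r.y);
        return r;
    }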
On 12/31/2022 20:16, Don Y wrote:
> On 12/31/2022 4:15 AM, Dimiter_Popoff wrote:
>>> Serial protocols inherently deliver data in a predefined pattern
>>> (often intended for display).  Scene analysis doesn't necessarily
>>> conform to that same pattern.
>>
>> Isn't there a camera doing a protocol which allows you to request
>> a specific area only to be transferred? RFB like, VNC does that
>> all the time.
>
> That only makes sense if you know, a priori, which part(s) of the
> image you might want to examine.  E.g., it would work for
> "exposing" just the portion of the field that "overlaps" some
> other image.  I can get fixed parts of partial frames from
> *other* cameras just by ensuring the other camera puts that
> portion of the image in a particular memory object and then
> export that memory object to the node that wants it.
>
> But, if a target can move into or out of the exposed area, then
> you have to make a return trip to the camera to request MORE of
> the field.
>
> When your targets are "far away" (like a surveillance camera
> monitoring a parking lot), targets don't move from their
> previously noted positions considerably from one frame to the
> next.
>
> But, when the camera and targets are in close proximity,
> there's greater (apparent) relative motion in the same
> frame-interval.  So, knowing where (x,y + WxH) the portion of
> the image of interest lay, previously, is less predictive
> of where it may lie currently.
>
> Having the entire image available means the software
> can look <wherever> and <whenever>.
>
Well yes, obviously so, but this is valid whatever the interface. Direct access to the sensor cells can't be double buffered so you will have to transfer anyway to get the frame you are analyzing static.

Perhaps you could find a way to make yourself some camera module using an existing one, MIPI or even USB, since you are looking for low overall cost; and add some MCU board to it to do the buffering and transfer areas on request. Or maybe put enough CPU power together with each camera to do most if not all of the analysis... Depending on which achieves the lowest cost. But I can't say much on cost, that's pretty far from me (as you know).
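If one went the "MCU next to the camera, serving rectangles on request" route, the wire format could be as simple as the following; the message layout is entirely hypothetical, just to make the idea concrete:

    #include <stdint.h>

    /* Request the host sends to the camera-side MCU, which holds the last
     * completed frame in its own buffer and replies with just the rectangle
     * asked for (row-major, one byte per pixel).  Packed for the wire
     * (GCC/Clang attribute). */
    typedef struct {
        uint32_t frame_id;    /* which buffered frame (0 = most recent)     */
        uint16_t x, y;        /* top-left corner of the requested rectangle */
        uint16_t w, h;        /* size of the rectangle                      */
    } __attribute__((packed)) region_req_t;

    /* Header the MCU sends back, followed by w*h payload bytes. */
    typedef struct {
        uint32_t frame_id;    /* echoes the request                         */
        uint16_t x, y, w, h;  /* actual region returned (may be clipped)    */
    } __attribute__((packed)) region_rsp_t;

The MCU's double buffering hides the sensor readout; the host only ever pays for the rectangles it actually asks for.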
On 12/31/2022 1:13 PM, Dimiter_Popoff wrote:
> On 12/31/2022 20:16, Don Y wrote:
>> On 12/31/2022 4:15 AM, Dimiter_Popoff wrote:
>>>> Serial protocols inherently deliver data in a predefined pattern
>>>> (often intended for display).  Scene analysis doesn't necessarily
>>>> conform to that same pattern.
>>>
>>> Isn't there a camera doing a protocol which allows you to request
>>> a specific area only to be transferred? RFB like, VNC does that
>>> all the time.
>>
>> That only makes sense if you know, a priori, which part(s) of the
>> image you might want to examine.  E.g., it would work for
>> "exposing" just the portion of the field that "overlaps" some
>> other image.  I can get fixed parts of partial frames from
>> *other* cameras just by ensuring the other camera puts that
>> portion of the image in a particular memory object and then
>> export that memory object to the node that wants it.
>>
>> But, if a target can move into or out of the exposed area, then
>> you have to make a return trip to the camera to request MORE of
>> the field.
>>
>> When your targets are "far away" (like a surveillance camera
>> monitoring a parking lot), targets don't move from their
>> previously noted positions considerably from one frame to the
>> next.
>>
>> But, when the camera and targets are in close proximity,
>> there's greater (apparent) relative motion in the same
>> frame-interval.  So, knowing where (x,y + WxH) the portion of
>> the image of interest lay, previously, is less predictive
>> of where it may lie currently.
>>
>> Having the entire image available means the software
>> can look <wherever> and <whenever>.
>
> Well yes, obviously so, but this is valid whatever the interface.
> Direct access to the sensor cells can't be double buffered so
> you will have to transfer anyway to get the frame you are analyzing
> static.
I would assume the devices would have evolved an "internal buffer" by now (as I said, my experience with *DRAM* in this manner was 40+ years ago).
> Perhaps you could find a way to make yourself some camera module
> using an existing one, MIPI or even USB, since you are looking for low
> overall cost; and add some MCU board to it to do the buffering and
> transfer areas on request. Or maybe put enough CPU power together with
> each camera to do most if not all of the analysis... Depending on
> which achieves the lowest cost. But I can't say much on cost, that's
> pretty far from me (as you know).
My current approach gives me that -- MIPS, size, etc. But, the cost of transferring parts of the image (without adding a specific mechanism) is a "shared page" (DSM). So, a host (on node A) references part of node *B*'s frame buffer and the page (on B) containing that memory address gets shipped back to node A and mapped into A's memory.

An agency on A could "touch" a "pixel-per-block" and cause the entire frame to be transferred to A, from B (or, I can treat the entire frame as a coherent object and arrange for ALL of it to be transferred when ANY of it is referenced). Some process on B could alternate between multiple such "memory objects" ("this one is complete, but I'm busy filling this OTHER one with data from the camera interface") to give me a *virtual* "camera memory device".

But, transport delays make this unsuitable for real-time work; a megabyte of imagery would require 100ms to transfer, in "raw" form. (I could encode it on the originating host, transfer it, and then decode it on the receiving host -- at the expense of MIPS. This is how I "record" video without saturating the network.)

So, you (B) want to "abstract" the salient features of the image while it is on B and then transfer just those to A. *Use* them, on A, and then move on to the next set of features (that B has computed while A was busy chewing on the last set).

Or, give A direct access to the native data (without A having to capture video streams from each of the cameras that it wants to potentially examine).
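For reference, the 100ms figure is consistent with simple link arithmetic: 1 MB of raw frame is 8 Mbits, which takes roughly 80-100 ms over ~100 Mb/s of usable bandwidth. The kind of "salient features" record B might ship to A instead of pixels could be on this order (the fields are purely illustrative, not the actual interface):

    #include <stdint.h>

    /* A few tens of bytes per tracked target, instead of ~1 MB of pixels
     * per frame.  B extracts these while A is consuming the previous set. */
    typedef struct {
        uint32_t frame_seq;     /* frame the features were extracted from  */
        uint16_t target_id;     /* B's label for the blob/target           */
        uint16_t x, y, w, h;    /* bounding box in B's image coordinates   */
        uint16_t centroid_x;
        uint16_t centroid_y;
        uint32_t area_px;       /* pixel count of the detected blob        */
    } target_features_t;        /* ~24 bytes vs. ~1,000,000 for raw pixels */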
