> [Hope you are faring well... enjoying the COLD! ;) ]
Not very. Don't think I have your latest email.
On Fri, 30 Dec 2022 14:59:39 -0700, Don Y
<blockedofcourse@foo.invalid> wrote:
>On 12/30/2022 11:02 AM, Richard Damon wrote:
>>>>> So, my options are:
>>>>> - reduce the overall frame rate such that N cameras can
>>>>> be serviced by the USB (or whatever) interface *and*
>>>>> the processing load
>>>>> - reduce the resolution of the cameras (a special case of the above)
>>>>> - reduce the number of cameras "per processor" (again, above)
>>>>> - design a "camera memory" (frame grabber) that I can install
>>>>> multiply on a single host
>>>>> - develop distributed algorithms to allow more bandwidth to
>>>>> effectively be applied
>>>>
>>>> The fact that you are starting from the concept of using "USB Cameras"
>>>> starts you off with that sort of limit.
>>>>
>>>> My personal thought on your problem is you want to put a "cheap" processor
>>>> right on each camera using a processor with a direct camera interface to
>>>> pull in the image and do your processing and send the results over some
>>>> comm-link to the center core.
>>>
>>> If I went the frame-grabber approach, that would be how I would address the
>>> hardware. But, it doesn't scale well. I.e., at what point do you throw in
>>> the towel and say there are too many concurrent images in the scene to
>>> pile them all onto a single "host" processor?
>>
>> That's why I didn't suggest that method. I was suggesting each camera has its
>> own tightly coupled processor that handles the needs of THAT camera.
>
>My existing "module" handles a single USB camera (with a fairly heavy-weight
>processor).
>
>But, being USB-based, there is no way to look at *part* of an image.
>And, I have to pay a relatively high cost (capturing the entire
>image from the serial stream) to look at *any* part of it.
>
>*If* a "camera memory" was available, I would site N of these
>in the (64b) address space of the host and let the host pick
>and choose which parts of which images it wanted to examine...
>without worrying about all of the bandwidth that would have been
>consumed deserializing those N images into that memory (which is
>a continuous process)
That's the way all cameras work - at least at a low level. The camera
captures a field (or a frame, depending) on its CCD, and then the CCD
pixel data is read out serially by a controller.
What you are looking for is some kind of local frame buffering at the
camera. There are some "smart" cameras that provide that ... and also
generally a bunch of image analysis functions that you may or may not
find useful. I haven't played with any of them in a long time, and
when I did the image functions were too primitive for my purpose, so I
really can't recommend anything.
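If you do wind up rolling your own "camera memory", the host side can stay
pretty dumb: one mapped window per camera, and ROI reads against it. Something
like this - a sketch only, with the addresses, sizes and names all invented:

  /* Sketch: assumes each "camera memory" board exposes its most recent
   * frame as a flat 8bpp buffer at a fixed physical address. The layout
   * and addresses below are invented for illustration.
   */
  #include <fcntl.h>
  #include <stdint.h>
  #include <sys/mman.h>
  #include <sys/types.h>
  #include <unistd.h>

  #define FRAME_W   1920
  #define FRAME_H   1080
  #define FRAME_SZ  ((size_t)FRAME_W * FRAME_H)

  /* Map camera 'n', assuming the boards sit in 4 MiB-aligned slots. */
  static uint8_t *map_camera(int n)
  {
      off_t phys = (off_t)0xD0000000UL + (off_t)n * 0x400000;
      int fd = open("/dev/mem", O_RDONLY);
      if (fd < 0)
          return NULL;
      uint8_t *p = mmap(NULL, FRAME_SZ, PROT_READ, MAP_SHARED, fd, phys);
      close(fd);
      return (p == MAP_FAILED) ? NULL : p;
  }

  /* The host touches only the pixels it cares about; the cost of
   * deserializing the rest of the frame stays inside the grabber. */
  static long roi_sum(const uint8_t *frame, int x, int y, int w, int h)
  {
      long sum = 0;
      for (int r = y; r < y + h; r++)
          for (int c = x; c < x + w; c++)
              sum += frame[(size_t)r * FRAME_W + c];
      return sum;
  }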
>>> ISTM that the better solution is to develop algorithms that can
>>> process portions of the scene, concurrently, on different "hosts".
>>> Then, coordinate these "partial results" to form the desired result.
>>>
>>> I already have a "camera module" (host+USB camera) that has adequate
>>> processing power to handle a "single camera scene". But, these all
>>> assume the scene can be easily defined to fit in that camera's field
>>> of view. E.g., point a camera across the path of a garage door and have
>>> it "notice" any deviation from the "unobstructed" image.
>>
>> And if one camera can't fit the full scene, you use two cameras, each with
>> their own processor, and they each process their own image.
>
>That's the above approach, but...
>
>> The only problem is if your image processing algorithm needs to compare parts of
>> the images between the two cameras, which seems unlikely.
>
>Consider watching a single room (e.g., a lobby at a business) and
>tracking the movements of "visitors". It's unlikely that an individual's
>movements would always be constrained to a single camera field. There will
>be times when he/she is "half-in" a field (and possibly NOT in the other,
>HALF in the other or ENTIRELY in the other). You can't ignore cases where
>the entire object (or, your notion of what that object's characteristics
>might be) is not entirely in the field as that leaves a vulnerability.
I've done simple cases following objects from one camera to another,
but not dealing with different angles/points of view - the cameras had
contiguous views with a bit of overlap. That made it relatively easy.
Following a person, e.g., seen quarter-behind in one camera, and
tracking them to another camera that sees a side view - from the
/other/ side - is a much harder problem.
Just following a person is easy, but tracking a specific person,
particularly when multiple people are present, gets very complicated
very quickly.
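FWIW, if both modules report tracks in a shared floor coordinate system, the
hand-off bookkeeping itself stays small; it's getting the projection and the
association right that hurts. A rough sketch, with the types, thresholds and
nearest-neighbour match all just illustrative:

  /* Rough sketch of overlap hand-off between two adjacent camera nodes.
   * Assumes both nodes already project detections into a common floor
   * coordinate system; structure and field names are invented.
   */
  #include <math.h>
  #include <stdbool.h>

  typedef struct {
      int    id;          /* locally assigned track id            */
      double x, y;        /* position in shared floor coordinates */
      double vx, vy;      /* crude velocity estimate              */
  } track_t;

  /* Is this track inside the strip that both cameras can see? */
  static bool in_overlap(const track_t *t, double x_min, double x_max)
  {
      return t->x >= x_min && t->x <= x_max;
  }

  /* Match a track handed over by the neighbour against our own tracks
   * by nearest position; return the index, or -1 if nothing is close. */
  static int match_handoff(const track_t *incoming,
                           const track_t *mine, int n_mine, double max_dist)
  {
      int best = -1;
      double best_d = max_dist;
      for (int i = 0; i < n_mine; i++) {
          double dx = mine[i].x - incoming->x;
          double dy = mine[i].y - incoming->y;
          double d  = sqrt(dx * dx + dy * dy);
          if (d < best_d) { best_d = d; best = i; }
      }
      return best;
  }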
>For example, I watch our garage door with *four* cameras. A camera is
>positioned on each side ("door jamb") of the door "looking at" the other
>camera. This is because a camera likely can't see the full height of the door
>opening ON ITS SIDE OF THE DOOR (so, the opposing camera watches "my side"
>and I'll watch *its* side!).
>
>[The other two cameras are similarly positioned on the overhead *track*
>onto which the door rolls, when open]
>
>An object in (or near) the doorway can be visible in one (either) or
>both cameras, depending on where it is located. Additionally, one of
>those manifestations may be only "partial" as regards where it is
>located and intersects the cameras' fields of view.
>
>The "cost" of watching the door is only the cost of the actual *cameras*.
>The cost of the compute resources is amortized over the rest of the system
>as those can be used for other, non-camera, non-garage related activities.
>
>> It does mean that if trying to track something across the cameras, you need
>> enough overlap to allow them to hand off the object when it is in the overlap.
>
>And, objects that consume large portions of a camera's field of view
>require similar handling (unless you can always guarantee that cameras
>and targets are "far apart")
>
>>> When the scene gets too large to represent in enough detail in a single
>>> camera's field of view, then there needs to be a way to coordinate
>>> multiple cameras to a single (virtual?) host. If those cameras were just
>>> "chunks of memory", then the *imagery* would be easy to examine in a single
>>> host -- though the processing power *might* need to increase geometrically
>>> (depending on your current goal)
>>
>> Yes, but your "chunks of memory" model just doesn't exist as a viable camera
>> model.
>
>Apparently not -- in the COTS sense. But, that doesn't mean I can't
>build a "camera memory emulator".
>
>The downside is that this increases the cost of the "actual camera"
>(see my above comment wrt amortization).
>
>And, it just moves the point at which a single host (of fixed capabilities)
>can no longer handle the scene's complexity. (when you have 10 cameras?)
>
>> The CMOS cameras with addressable pixels have "access times" significantly
>> longer than your typical memory (and are read-once), so they don't really meet
>> that model. Some of them do allow for sending multiple small regions of interest
>> and downloading just those regions, but this then starts to require moderate
>> processor overhead to be loading all these regions and updating the grabber to
>> put them where you want.
>
>You would, instead, let the "camera memory emulator" capture the entire
>image from the camera and place the entire image in a contiguous
>region of memory (from the perspective of the host). The cost of capturing
>the portions that are not used is hidden *in* the cost of the "emulator".
>
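If you do build the emulator, the one thing I'd insist on is double
buffering, so the host always sees a complete, stable frame while the next
one streams in. Roughly like this (names invented, and assuming a single
writer flips the buffers):

  /* Sketch of the emulator side: two frame buffers, one being filled by
   * the capture engine, one stable and visible to the host. All names
   * here are invented; the "fill" would really be the USB/MIPI machinery.
   */
  #include <stdint.h>

  #define FRAME_W  1920
  #define FRAME_H  1080

  static uint8_t       buf[2][FRAME_W * FRAME_H];
  static volatile int  host_visible = 0;    /* index the host may read */

  /* Called by the capture engine when a full frame has been deserialized.
   * Single writer, so a plain flip is good enough for a sketch. */
  void frame_complete(void)
  {
      host_visible ^= 1;
  }

  /* The capture engine writes into the buffer the host is NOT reading. */
  uint8_t *capture_target(void)
  {
      return buf[host_visible ^ 1];
  }

  /* The host (or the exported memory window) reads the stable buffer. */
  const uint8_t *current_frame(void)
  {
      return buf[host_visible];
  }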
>> And yes, it does mean that there might be some cases where you need a core
>> module that has TWO cameras connected to a single processor, either to get a
>> wider field of view, or to combine two different types of camera (maybe a high
>> res black and white with a low res color if you need just minor color
>> information, or combine a visible camera with a thermal camera). These just
>> become another tool in your tool box.
>
>I *think* (uncharted territory) that the better investment is to develop
>algorithms that let me distribute the processing among multiple
>(single) "camera modules/nodes". How would your "two camera" exemplar
>address an application requiring *three* cameras? etc.
>
>I can, currently, distribute this processing by treating the
>region of memory into which a (local) camera's imagery is
>deserialized as a "memory object" and then exporting *access*
>to that object to other similar "camera modules/nodes".
>
>But, the access times of non-local memory are horrendous, given
>that the contents are ephemeral (if accesses could be *cached*
>on each host needing them, then these costs diminish).
>
>So, I need to come up with algorithms that let me export abstractions
>instead of raw data.
>
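Agreed - and the abstractions don't have to be fancy to pay off. Even a
per-frame summary like the sketch below (field names invented) is a few
hundred bytes instead of megabytes, so it ships and caches trivially; a
"truncated at edge" flag tells the neighbour to go looking for the rest of
the object:

  /* Sketch of the kind of "abstraction" a camera node might export
   * instead of raw pixels. Field names and sizes are invented.
   */
  #include <stdint.h>

  typedef struct {
      uint16_t x, y, w, h;     /* bounding box in this camera's image      */
      uint8_t  class_id;       /* e.g. person / vehicle / unknown          */
      uint8_t  confidence;     /* 0..255                                   */
      uint8_t  truncated;      /* nonzero if the box touches a field edge  */
  } detection_t;

  typedef struct {
      uint32_t    camera_id;
      uint64_t    frame_timestamp_us;
      uint8_t     n_detections;
      detection_t det[16];     /* hundreds of bytes vs. megabytes/frame    */
  } frame_summary_t;

A node that receives one of these can decide whether it needs any pixels at
all before it reaches across the network for them.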
>>> Moving the processing to "host per camera" implementation gives you more
>>> MIPS. But, makes coordinating partial results tedious.
>>
>> Depends on what sort of partial results you are looking at.
>
>"Bob's *head* is at X,Y+H,W in my image -- but, his body is not visible"
>
>"Ah! I was wondering whose legs those were in *my* image!"
>
>>>> It is unclear what your actual image requirements per camera are, so it is
>>>> hard to say what level camera and processor you will need.
>>>>
>>>> My first feeling is you seem to be assuming a fairly cheap camera and then
>>>> doing some fairly simple processing over the partial image, in which case
>>>> you might even be able to live with a camera that uses a crude SPI interface
>>>> to bring the frame in, and a very simple processor.
>>>
>>> I use A LOT of cameras. But, I should be able to swap the camera
>>> (upgrade/downgrade) and still rely on the same *local* compute engine.
>>> E.g., some of my cameras have Ir illuminators; it's not important
>>> in others; some are PTZ; others fixed.
>>
>> Doesn't sound reasonable. If you downgrade a camera, you can't count on it
>> being able to meet the same requirements, or you over-specced the initial camera.
>
>Sorry, I was using up/down relative to "nominal camera", not "specific camera
>previously selected for application". I'd *really* like to just have a
>single "camera module" (module = CPU+I/O) instead of one for camera type A
>and another for camera type B, etc.
>
>> You put on a camera a processor capable of handling the tasks you expect out of
>> that set of hardware. One type of processor can likely handle a variety of
>> different camera setups.
>
>Exactly. If a particular instance has an Ir illuminator, then you include
>controls for that in *the* "camera module". If another instance doesn't have
>this ability, then those controls go unused.
>
>>> Watching for an obstruction in the path of a garage door (open/close)
>>> has different requirements than trying to recognize a visitor at the front
>>> door. Or, identify the locations of the occupants of a facility.
>>
>> Yes, so you don't want to "Pay" for the capability to recognize a visitor in
>> your garage door sensor, so you use different levels of sensor/processor.
>
>Exactly. But, the algorithms that do the scene analysis can be the same;
>you just parameterize the image and the objects within it that you seek.
>
>There will likely be some combinations that exceed the capabilities of
>the hardware to process in real-time. So, you fall back to lower
>frame rates or let the algorithms drop targets ("You watch Bob, I'll
>watch Tom!")
>
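Shedding targets rather than frames seems like the right instinct. The
bookkeeping for "you watch Bob, I'll watch Tom" can be as dumb as this
(the cost model and names are made up):

  /* Sketch of handing alternate targets to a peer node when this node's
   * per-frame processing budget is exceeded. Names and costs are invented.
   */
  #include <stdint.h>

  typedef struct {
      int      target_id;
      uint32_t cost_us;        /* estimated per-frame processing cost */
      int      owner;          /* node currently responsible          */
  } assignment_t;

  /* Offload targets in array order until we fit the budget; a smarter
   * version would sort by cost or by how "interesting" the target is. */
  void shed_load(assignment_t *a, int n, uint32_t budget_us,
                 int self, int peer)
  {
      uint32_t total = 0;
      for (int i = 0; i < n; i++)
          if (a[i].owner == self)
              total += a[i].cost_us;

      for (int i = 0; i < n && total > budget_us; i++) {
          if (a[i].owner == self) {
              a[i].owner = peer;           /* "you watch this one" */
              total -= a[i].cost_us;
          }
      }
  }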