Camera interfaces

Started by Don Y December 29, 2022
On 12/30/22 2:27 AM, Don Y wrote:
> Hi George! > > [Hope you are faring well... enjoying the COLD!  ;) ] > > On 12/29/2022 10:29 PM, George Neuner wrote: >>>>> But, most cameras seem to have (bit- or word-) serial interfaces >>>>> nowadays.  Are there any (mainstream/high volume) devices that >>>>> "look" like a chunk of memory, in their native form? > >>> I built my prototypes (proof-of-principle) using COTS USB cameras. >>> But, getting the data out of the serial data stream and into RAM so >>> it can be analyzed consumes memory bandwidth. >>> >>> I'm currently trying to sort out an approximate cost factor "per >>> camera" (per video stream) and looking for ways that I can cut costs >>> (memory bandwidth requirements) to allow greater numbers of >>> cameras or higher frame rates. >> >> You aren't going to find anything low cost ... if you want bandwidth >> for multiple cameras, you need to look into bus based frame grabbers. >> They still exist, but are (relatively) expensive and getting harder to >> find. > > So, my options are: > - reduce the overall frame rate such that N cameras can >   be serviced by the USB (or whatever) interface *and* >   the processing load > - reduce the resolution of the cameras (a special case of the above) > - reduce the number of cameras "per processor" (again, above) > - design a "camera memory" (frame grabber) that I can install >   multiply on a single host > - develop distributed algorithms to allow more bandwidth to >   effectively be applied >
The fact that you are starting from the concept of using "USB Cameras" sort of starts you off with that limit.

My personal thought on your problem is that you want to put a "cheap" processor right on each camera, using a processor with a direct camera interface to pull in the image, do your processing there, and send the results over some comm link to the central core.

It is unclear what your actual image requirements per camera are, so it is hard to say what level of camera and processor you will need.

My first feeling is that you are assuming a fairly cheap camera and doing some fairly simple processing over a partial image, in which case you might even be able to live with a camera that uses a crude SPI interface to bring the frame in, and a very simple processor.
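As a rough sketch of the kind of per-camera node described above -- the frame geometry and the spi_read_frame_line()/link_send() calls are hypothetical stand-ins for whatever the real camera and comm-link drivers provide, not any particular part's API:

#include <stdint.h>
#include <stdlib.h>

#define FRAME_W  320            /* assumed sensor geometry */
#define FRAME_H  240
#define DIFF_THRESHOLD  25      /* per-pixel delta considered "changed" */
#define PIXEL_COUNT_ALARM 500   /* changed pixels needed to raise a flag */

/* Hypothetical board-support routines -- stand-ins for the real
 * camera/SPI and comm-link drivers. */
extern void spi_read_frame_line(uint8_t *dst, int row, int width);
extern void link_send(const void *msg, size_t len);

static uint8_t reference[FRAME_H][FRAME_W];  /* "unobstructed" image */
static uint8_t line[FRAME_W];

void scan_frame_and_report(void)
{
    uint32_t changed = 0;

    /* Pull the frame in a line at a time so only one row is ever
     * buffered -- the point of doing the processing at the camera. */
    for (int row = 0; row < FRAME_H; row++) {
        spi_read_frame_line(line, row, FRAME_W);
        for (int col = 0; col < FRAME_W; col++) {
            int delta = (int)line[col] - (int)reference[row][col];
            if (abs(delta) > DIFF_THRESHOLD)
                changed++;
        }
    }

    /* Send only the result, not the imagery, back to the central core. */
    uint8_t report = (changed > PIXEL_COUNT_ALARM) ? 1 : 0;
    link_send(&report, sizeof report);
}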
On 12/30/2022 9:24 AM, Richard Damon wrote:
> On 12/30/22 2:27 AM, Don Y wrote: >> Hi George! >> >> [Hope you are faring well... enjoying the COLD!  ;) ] >> >> On 12/29/2022 10:29 PM, George Neuner wrote: >>>>>> But, most cameras seem to have (bit- or word-) serial interfaces >>>>>> nowadays.  Are there any (mainstream/high volume) devices that >>>>>> "look" like a chunk of memory, in their native form? >> >>>> I built my prototypes (proof-of-principle) using COTS USB cameras. >>>> But, getting the data out of the serial data stream and into RAM so >>>> it can be analyzed consumes memory bandwidth. >>>> >>>> I'm currently trying to sort out an approximate cost factor "per >>>> camera" (per video stream) and looking for ways that I can cut costs >>>> (memory bandwidth requirements) to allow greater numbers of >>>> cameras or higher frame rates. >>> >>> You aren't going to find anything low cost ... if you want bandwidth >>> for multiple cameras, you need to look into bus based frame grabbers. >>> They still exist, but are (relatively) expensive and getting harder to >>> find. >> >> So, my options are: >> - reduce the overall frame rate such that N cameras can >>    be serviced by the USB (or whatever) interface *and* >>    the processing load >> - reduce the resolution of the cameras (a special case of the above) >> - reduce the number of cameras "per processor" (again, above) >> - design a "camera memory" (frame grabber) that I can install >>    multiply on a single host >> - develop distributed algorithms to allow more bandwidth to >>    effectively be applied > > The fact that you are starting for the concept of using "USB Cameras" sort of > starts you with that sort of limit. > > My personal thought on your problem is you want to put a "cheap" processor > right on each camera using a processor with a direct camera interface to pull > in the image and do your processing and send the results over some comm-link to > the center core.
If I went the frame-grabber approach, that would be how I would address the hardware.  But, it doesn't scale well.  I.e., at what point do you throw in the towel and say there are too many concurrent images in the scene to pile them all onto a single "host" processor?

ISTM that the better solution is to develop algorithms that can process portions of the scene, concurrently, on different "hosts".  Then, coordinate these "partial results" to form the desired result.

I already have a "camera module" (host+USB camera) that has adequate processing power to handle a "single camera scene".  But, these all assume the scene can be easily defined to fit in that camera's field of view.  E.g., point a camera across the path of a garage door and have it "notice" any deviation from the "unobstructed" image.

When the scene gets too large to represent in enough detail in a single camera's field of view, then there needs to be a way to coordinate multiple cameras to a single (virtual?) host.  If those cameras were just "chunks of memory", then the *imagery* would be easy to examine in a single host -- though the processing power *might* need to increase geometrically (depending on your current goal).

Moving the processing to a "host per camera" implementation gives you more MIPS.  But, it makes coordinating partial results tedious.
> It is unclear what you actual image requirements per camera are, so it is hard > to say what level camera and processor you will need. > > My first feeling is you seem to be assuming a fairly cheep camera and then > doing some fairly simple processing over the partial image, in which case you > might even be able to live with a camera that uses a crude SPI interface to > bring the frame in, and a very simple processor.
I use A LOT of cameras.  But, I should be able to swap the camera (upgrade/downgrade) and still rely on the same *local* compute engine. E.g., some of my cameras have IR illuminators; it's not important in others; some are PTZ; others fixed.

Watching for an obstruction in the path of a garage door (open/close) has different requirements than trying to recognize a visitor at the front door.  Or, identify the locations of the occupants of a facility.
On 12/30/22 12:04 PM, Don Y wrote:
> On 12/30/2022 9:24 AM, Richard Damon wrote: >> On 12/30/22 2:27 AM, Don Y wrote: >>> Hi George! >>> >>> [Hope you are faring well... enjoying the COLD!  ;) ] >>> >>> On 12/29/2022 10:29 PM, George Neuner wrote: >>>>>>> But, most cameras seem to have (bit- or word-) serial interfaces >>>>>>> nowadays.  Are there any (mainstream/high volume) devices that >>>>>>> "look" like a chunk of memory, in their native form? >>> >>>>> I built my prototypes (proof-of-principle) using COTS USB cameras. >>>>> But, getting the data out of the serial data stream and into RAM so >>>>> it can be analyzed consumes memory bandwidth. >>>>> >>>>> I'm currently trying to sort out an approximate cost factor "per >>>>> camera" (per video stream) and looking for ways that I can cut costs >>>>> (memory bandwidth requirements) to allow greater numbers of >>>>> cameras or higher frame rates. >>>> >>>> You aren't going to find anything low cost ... if you want bandwidth >>>> for multiple cameras, you need to look into bus based frame grabbers. >>>> They still exist, but are (relatively) expensive and getting harder to >>>> find. >>> >>> So, my options are: >>> - reduce the overall frame rate such that N cameras can >>>    be serviced by the USB (or whatever) interface *and* >>>    the processing load >>> - reduce the resolution of the cameras (a special case of the above) >>> - reduce the number of cameras "per processor" (again, above) >>> - design a "camera memory" (frame grabber) that I can install >>>    multiply on a single host >>> - develop distributed algorithms to allow more bandwidth to >>>    effectively be applied >> >> The fact that you are starting for the concept of using "USB Cameras" >> sort of starts you with that sort of limit. >> >> My personal thought on your problem is you want to put a "cheap" >> processor right on each camera using a processor with a direct camera >> interface to pull in the image and do your processing and send the >> results over some comm-link to the center core. > > If I went the frame-grabber approach, that would be how I would address the > hardware.  But, it doesn't scale well.  I.e., at what point do you throw in > the towel and say there are too many concurrent images in the scene to > pile them all onto a single "host" processor?
That's why I didn't suggest that method. I was suggesting each camera has its own tightly coupled processor that handles the needs of THAT camera.
> > ISTM that the better solution is to develop algorithms that can > process portions of the scene, concurrently, on different "hosts". > Then, coordinate these "partial results" to form the desired result. > > I already have a "camera module" (host+USB camera) that has adequate > processing power to handle a "single camera scene".  But, these all > assume the scene can be easily defined to fit in that camera's field > of view.  E.g., point a camera across the path of a garage door and have > it "notice" any deviation from the "unobstructed" image.
And if one camera can't fit the full scene, you use two cameras, each with their own processor, and they each process their own image.

The only problem is if your image processing algorithm needs to compare parts of the images between the two cameras, which seems unlikely.

It does mean that if you are trying to track something across the cameras, you need enough overlap to allow them to hand off the object while it is in the overlap.
> > When the scene gets too large to represent in enough detail in a single > camera's field of view, then there needs to be a way to coordinate > multiple cameras to a single (virtual?) host.  If those cameras were just > "chunks of memory", then the *imagery* would be easy to examine in a single > host -- though the processing power *might* need to increase geometrically > (depending on your current goal)
Yes, but your "chunks of memory" model just doesn't exist as a viable camera model. The CMOS cameras with addressable pixels have "access times" significantly lower than your typical memory (and is read once) so doesn't really meet that model. Some of them do allow for sending multiple small regions of intererst and down loading just those regions, but this then starts to require moderate processor overhead to be loading all these regions and updating the grabber to put them where you want. And yes, it does mean that there might be some cases where you need a core module that has TWO cameras connected to a single processor, either to get a wider field of view, or to combine two different types of camera (maybe a high res black and white to a low res color if you need just minor color information, or combine a visible camera to a thermal camera). These just become another tool in your tool box.
> > Moving the processing to "host per camera" implementation gives you more > MIPS.  But, makes coordinating partial results tedious.
Depends on what sort of partial results you are looking at.
> >> It is unclear what you actual image requirements per camera are, so it >> is hard to say what level camera and processor you will need. >> >> My first feeling is you seem to be assuming a fairly cheep camera and >> then doing some fairly simple processing over the partial image, in >> which case you might even be able to live with a camera that uses a >> crude SPI interface to bring the frame in, and a very simple processor. > > I use A LOT of cameras.  But, I should be able to swap the camera > (upgrade/downgrade) and still rely on the same *local* compute engine. > E.g., some of my cameras have Ir illuminators; it's not important > in others; some are PTZ; others fixed.
Doesn't sound reasonable. If you downgrade a camera, you can't count on it being able to meet the same requirements, or you over-specced the initial camera.

You put on a camera a processor capable of handling the tasks you expect out of that set of hardware. One type of processor can likely handle a variety of different camera setups.
> > Watching for an obstruction in the path of a garage door (open/close) > has different requirements than trying to recognize a visitor at the front > door.  Or, identify the locations of the occupants of a facility. >
Yes, so you don't want to "Pay" for the capability to recognize a visitor in your garage door sensor, so you use different levels of sensor/processor.
On 12/30/2022 11:02 AM, Richard Damon wrote:
>>>> So, my options are: >>>> - reduce the overall frame rate such that N cameras can >>>>    be serviced by the USB (or whatever) interface *and* >>>>    the processing load >>>> - reduce the resolution of the cameras (a special case of the above) >>>> - reduce the number of cameras "per processor" (again, above) >>>> - design a "camera memory" (frame grabber) that I can install >>>>    multiply on a single host >>>> - develop distributed algorithms to allow more bandwidth to >>>>    effectively be applied >>> >>> The fact that you are starting for the concept of using "USB Cameras" sort >>> of starts you with that sort of limit. >>> >>> My personal thought on your problem is you want to put a "cheap" processor >>> right on each camera using a processor with a direct camera interface to >>> pull in the image and do your processing and send the results over some >>> comm-link to the center core. >> >> If I went the frame-grabber approach, that would be how I would address the >> hardware.  But, it doesn't scale well.  I.e., at what point do you throw in >> the towel and say there are too many concurrent images in the scene to >> pile them all onto a single "host" processor? > > Thats why I didn't suggest that method. I was suggesting each camera has its > own tightly coupled processor that handles the need of THAT
My existing "module" handles a single USB camera (with a fairly heavy-weight processor). But, being USB-based, there is no way to look at *part* of an image. And, I have to pay a relatively high cost (capturing the entire image from the serial stream) to look at *any* part of it. *If* a "camera memory" was available, I would site N of these in the (64b) address space of the host and let the host pick and choose which parts of which images it wanted to examine... without worrying about all of the bandwidth that would have been consumed deserializing those N images into that memory (which is a continuous process)
>> ISTM that the better solution is to develop algorithms that can >> process portions of the scene, concurrently, on different "hosts". >> Then, coordinate these "partial results" to form the desired result. >> >> I already have a "camera module" (host+USB camera) that has adequate >> processing power to handle a "single camera scene".  But, these all >> assume the scene can be easily defined to fit in that camera's field >> of view.  E.g., point a camera across the path of a garage door and have >> it "notice" any deviation from the "unobstructed" image. > > And if one camera can't fit the full scene, you use two cameras, each with > there own processor, and they each process their own image.
That's the above approach, but...
> The only problem is if your image processing algoritm need to compare parts of > the images between the two cameras, which seems unlikely.
Consider watching a single room (e.g., a lobby at a business) and tracking the movements of "visitors".  It's unlikely that an individual's movements would always be constrained to a single camera field.  There will be times when he/she is "half-in" a field (and possibly NOT in the other, HALF in the other, or ENTIRELY in the other).  You can't ignore cases where the entire object (or, your notion of what that object's characteristics might be) is not entirely in the field, as that leaves a vulnerability.

For example, I watch our garage door with *four* cameras.  A camera is positioned on each side (door jamb) of the door "looking at" the other camera.  This is because a camera can't likely see the full height of the door opening ON ITS SIDE OF THE DOOR (so, the opposing camera watches "my side" and I'll watch *its* side!).

[The other two cameras are similarly positioned on the overhead *track* onto which the door rolls, when open]

An object in (or near) the doorway can be visible in one (either) or both cameras, depending on where it is located.  Additionally, one of those manifestations may be only "partial" with regard to where it is located and how it intersects the cameras' fields of view.

The "cost" of watching the door is only the cost of the actual *cameras*. The cost of the compute resources is amortized over the rest of the system as those can be used for other, non-camera, non-garage related activities.
> It does say that if trying to track something across the cameras, you need > enough overlap to allow them to hand off the object when it is in the overlap.
And, objects that consume large portions of a camera's field of view require similar handling (unless you can always guarantee that cameras and targets are "far apart").
>> When the scene gets too large to represent in enough detail in a single >> camera's field of view, then there needs to be a way to coordinate >> multiple cameras to a single (virtual?) host.  If those cameras were just >> "chunks of memory", then the *imagery* would be easy to examine in a single >> host -- though the processing power *might* need to increase geometrically >> (depending on your current goal) > > Yes, but your "chunks of memory" model just doesn't exist as a viable camera > model.
Apparently not -- in the COTS sense.  But, that doesn't mean I can't build a "camera memory emulator".

The downside is that this increases the cost of the "actual camera" (see my above comment wrt amortization).

And, it just moves the point at which a single host (of fixed capabilities) can no longer handle the scene's complexity.  (when you have 10 cameras?)
> The CMOS cameras with addressable pixels have "access times" significantly > lower than your typical memory (and is read once) so doesn't really meet that > model. Some of them do allow for sending multiple small regions of intererst > and down loading just those regions, but this then starts to require moderate > processor overhead to be loading all these regions and updating the grabber to > put them where you want.
You would, instead, let the "camera memory emulator" capture the entire image from the camera and place the entire image in a contiguous region of memory (from the perspective of the host). The cost of capturing the portions that are not used is hidden *in* the cost of the "emulator".
> And yes, it does mean that there might be some cases where you need a core > module that has TWO cameras connected to a single processor, either to get a > wider field of view, or to combine two different types of camera (maybe a high > res black and white to a low res color if you need just minor color > information, or combine a visible camera to a thermal camera). These just > become another tool in your tool box.
I *think* (uncharted territory) that the better investment is to develop algorithms that let me distribute the processing among multiple (single) "camera modules/nodes".  How would your "two camera" exemplar address an application requiring *three* cameras?  etc.

I can, currently, distribute this processing by treating the region of memory into which a (local) camera's imagery is deserialized as a "memory object" and then exporting *access* to that object to other similar "camera modules/nodes".

But, the access times of non-local memory are horrendous, given that the contents are ephemeral (if accesses could be *cached* on each host needing them, then these costs diminish).

So, I need to come up with algorithms that let me export abstractions instead of raw data.
>> Moving the processing to "host per camera" implementation gives you more >> MIPS.  But, makes coordinating partial results tedious. > > Depends on what sort of partial results you are looking at.
"Bob's *head* is at X,Y+H,W in my image -- but, his body is not visible" "Ah! I was wondering whose legs those were in *my* image!"
>>> It is unclear what you actual image requirements per camera are, so it is >>> hard to say what level camera and processor you will need. >>> >>> My first feeling is you seem to be assuming a fairly cheep camera and then >>> doing some fairly simple processing over the partial image, in which case >>> you might even be able to live with a camera that uses a crude SPI interface >>> to bring the frame in, and a very simple processor. >> >> I use A LOT of cameras.  But, I should be able to swap the camera >> (upgrade/downgrade) and still rely on the same *local* compute engine. >> E.g., some of my cameras have Ir illuminators; it's not important >> in others; some are PTZ; others fixed. > > Doesn't sound reasonable. If you downgrade a camera, you can't count on it > being able to meet the same requirements, or you over speced the initial camera.
Sorry, I was using up/down relative to "nominal camera", not "specific camera previously selected for application".  I'd *really* like to just have a single "camera module" (module = CPU+I/O) instead of one for camera type A and another for camera type B, etc.
> You put on a camera a processor capable of handling the tasks you expect out of > that set of hardware.  One type of processor likely can handle a variaty of > different camera setup with
Exactly. If a particular instance has an IR illuminator, then you include controls for that in *the* "camera module". If another instance doesn't have this ability, then those controls go unused.
>> Watching for an obstruction in the path of a garage door (open/close) >> has different requirements than trying to recognize a visitor at the front >> door.  Or, identify the locations of the occupants of a facility. > > Yes, so you don't want to "Pay" for the capability to recognize a visitor in > your garage door sensor, so you use different levels of sensor/processor.
Exactly. But, the algorithms that do the scene analysis can be the same; you just parameterize the image and the objects within it that you seek.

There will likely be some combinations that exceed the capabilities of the hardware to process in real-time. So, you fall back to lower frame rates or let the algorithms drop targets ("You watch Bob, I'll watch Tom!").
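A sketch of what "parameterizing" a per-camera task might amount to, with purely invented names and fields -- the same analysis engine, configured differently for a garage-door watcher versus a lobby tracker:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Per-camera configuration handed to one generic scene-analysis engine.
 * All names and fields here are invented for illustration. */
struct camera_task {
    uint8_t  frame_rate_hz;      /* nominal rate; lowered first under load */
    uint8_t  min_frame_rate_hz;  /* floor before targets get dropped instead */
    uint8_t  max_targets;        /* how many tracks this node will carry */
    bool     detect_only;        /* true: flag any change; false: track objects */
    bool     has_ir_illuminator; /* controls present but unused if false */
    const uint8_t *mask;         /* which pixels matter (door opening, lobby floor) */
    uint16_t min_object_px;      /* ignore deltas smaller than this */
};

/* Example: the garage-door watcher cares about any change in a narrow
 * masked strip; a lobby tracker would run the same engine with
 * detect_only = false and a higher target count. */
static const struct camera_task garage_door = {
    .frame_rate_hz      = 15,
    .min_frame_rate_hz  = 5,
    .max_targets        = 1,
    .detect_only        = true,
    .has_ir_illuminator = true,
    .mask               = NULL,   /* door-opening mask supplied at init */
    .min_object_px      = 200,
};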
On 12/30/22 4:59 PM, Don Y wrote:
> On 12/30/2022 11:02 AM, Richard Damon wrote: >>>>> So, my options are: >>>>> - reduce the overall frame rate such that N cameras can >>>>>    be serviced by the USB (or whatever) interface *and* >>>>>    the processing load >>>>> - reduce the resolution of the cameras (a special case of the above) >>>>> - reduce the number of cameras "per processor" (again, above) >>>>> - design a "camera memory" (frame grabber) that I can install >>>>>    multiply on a single host >>>>> - develop distributed algorithms to allow more bandwidth to >>>>>    effectively be applied >>>> >>>> The fact that you are starting for the concept of using "USB >>>> Cameras" sort of starts you with that sort of limit. >>>> >>>> My personal thought on your problem is you want to put a "cheap" >>>> processor right on each camera using a processor with a direct >>>> camera interface to pull in the image and do your processing and >>>> send the results over some comm-link to the center core. >>> >>> If I went the frame-grabber approach, that would be how I would >>> address the >>> hardware.  But, it doesn't scale well.  I.e., at what point do you >>> throw in >>> the towel and say there are too many concurrent images in the scene to >>> pile them all onto a single "host" processor? >> >> Thats why I didn't suggest that method. I was suggesting each camera >> has its own tightly coupled processor that handles the need of THAT > > My existing "module" handles a single USB camera (with a fairly > heavy-weight > processor). > > But, being USB-based, there is no way to look at *part* of an image. > And, I have to pay a relatively high cost (capturing the entire > image from the serial stream) to look at *any* part of it.
Yep, having chosen USB as your interface, you have limited yourself.

Since you say you have a fairly heavy-weight processor, that frame grab likely isn't your limiting factor.
> > *If* a "camera memory" was available, I would site N of these > in the (64b) address space of the host and let the host pick > and choose which parts of which images it wanted to examine... > without worrying about all of the bandwidth that would have been > consumed deserializing those N images into that memory (which is > a continuous process)
But such a camera would almost certainly be designed for the processor to be on the same board as the camera (or be VERY slow to access), so it is much less apt to allow you to add multiple cameras to one processor.
> >>> ISTM that the better solution is to develop algorithms that can >>> process portions of the scene, concurrently, on different "hosts". >>> Then, coordinate these "partial results" to form the desired result. >>> >>> I already have a "camera module" (host+USB camera) that has adequate >>> processing power to handle a "single camera scene".  But, these all >>> assume the scene can be easily defined to fit in that camera's field >>> of view.  E.g., point a camera across the path of a garage door and have >>> it "notice" any deviation from the "unobstructed" image. >> >> And if one camera can't fit the full scene, you use two cameras, each >> with there own processor, and they each process their own image. > > That's the above approach, but... > >> The only problem is if your image processing algoritm need to compare >> parts of the images between the two cameras, which seems unlikely. > > Consider watching a single room (e.g., a lobby at a business) and > tracking the movements of "visitors".  It's unlikely that an individual's > movements would always be constrained to a single camera field.  There will > be times when he/she is "half-in" a field (and possibly NOT in the other, > HALF in the other or ENTIRELY in the other).  You can't ignore cases where > the entire object (or, your notion of what that object's characteristics > might be) is not entirely in the field as that leaves a vulnerability.
Sounds like you aren't overlapping your cameras enough or have insufficient coverage.  Maybe your problem is the wrong field of view for your lens. Maybe you need fewer but better cameras with wider fields of view.

This might be due to trying to use "stock" inexpensive USB cameras.
> > For example, I watch our garage door with *four* cameras.  A camera is > positioned on each side ("door jam"?) of the door "looking at" the other > camera.  This because a camera can't likely see the full height of the door > opening ON ITS SIDE OF THE DOOR (so, the opposing camera watches "my side" > and I'll watch *its* side!).
Right, and if ANY see a problem, you stop. So no need for inter-camera coordination.
> > [The other two cameras are similarly positioned on the overhead *track* > onto which the door rolls, when open] > > An object in (or near) the doorway can be visible in one (either) or > both cameras, depending on where it is located.  Additionally, one of > those manifestations may be only "partial" as regards to where it is > located and intersects the cameras' fields of view.
But since you aren't trying to ID, only Detect, there still isn't a need for camera-to-camera processing, just camera-to-door-controller communication.
> > The "cost" of watching the door is only the cost of the actual *cameras*. > The cost of the compute resources is amortized over the rest of the system > as those can be used for other, non-camera, non-garage related activities. > >> It does say that if trying to track something across the cameras, you >> need enough overlap to allow them to hand off the object when it is in >> the overlap. > > And, objects that consume large portions of a camera's field of view > require similar handling (unless you can always guarantee that cameras > and targets are "far apart") > >>> When the scene gets too large to represent in enough detail in a single >>> camera's field of view, then there needs to be a way to coordinate >>> multiple cameras to a single (virtual?) host.  If those cameras were >>> just >>> "chunks of memory", then the *imagery* would be easy to examine in a >>> single >>> host -- though the processing power *might* need to increase >>> geometrically >>> (depending on your current goal) >> >> Yes, but your "chunks of memory" model just doesn't exist as a viable >> camera model. > > Apparently not -- in the COTS sense.  But, that doesn't mean I can't > build a "camera memory emulator". > > The downside is that this increases the cost of the "actual camera" > (see my above comment wrt ammortization).
Yep, implementing this likely costs more than giving the camera a dedicated moderate processor to do the major work. It might not handle the actual ID problem of your doorbell, but it could likely process the live video, take a snapshot of a region with a good view of the approaching visitor, and send just that to your master system for ID.
> > And, it just moves the point at which a single host (of fixed capabilities) > can no longer handle the scene's complexity.  (when you have 10 cameras?) > >> The CMOS cameras with addressable pixels have "access times" >> significantly lower than your typical memory (and is read once) so >> doesn't really meet that model. Some of them do allow for sending >> multiple small regions of intererst and down loading just those >> regions, but this then starts to require moderate processor overhead >> to be loading all these regions and updating the grabber to put them >> where you want. > > You would, instead, let the "camera memory emulator" capture the entire > image from the camera and place the entire image in a contiguous > region of memory (from the perspective of the host).  The cost of capturing > the portions that are not used is hidden *in* the cost of the "emulator".
Yep, you could build your system with a two-port memory buffer between the frame grabber loading through one port and the decoding processor on the other.

The most cost-effective way to do this is likely a commercial frame grabber with built-in "two-port" memory that sits in a slot of a PC-type computer. These would likely not work with a "USB camera" (why would you need a frame grabber for a camera that has one built in), so it would totally change your cost models.

If your current design method is based on using USB cameras, trying to do a full custom interface may be out of your field of operation.
> >> And yes, it does mean that there might be some cases where you need a >> core module that has TWO cameras connected to a single processor, >> either to get a wider field of view, or to combine two different types >> of camera (maybe a high res black and white to a low res color if you >> need just minor color information, or combine a visible camera to a >> thermal camera). These just become another tool in your tool box. > > I *think* (uncharted territory) that the better investment is to develop > algorithms that let me distribute the processing among multiple > (single) "camera modules/nodes".  How would your "two camera" exemplar > address an application requiring *three* cameras?  etc.
The first question is: what processing are you thinking of that needs images from 3 cameras?

Note, my two-camera example was a case where the processing that needed to be done did need data from two cameras.

If you have another task that needs a different camera, you just build a system with one two-camera module and one one-camera module, relaying back to a central control, or you nominate one of the modules to be central control if the load there is light enough.

Your garage door example would be built from 4 separate and independent one-camera modules, either going to one of them as the master, or to a 5th module acting as the master.

The cases I can think of for needing to process three cameras together would be:

1) A system stitching images from 3 cameras and generating a single image out of it, but that totally breaks your concept of needing only bits of the images; that inherently uses most of each camera and does some stitching processing on the overlaps.

2) A multi-spectrum system, where again, you are taking the ENTIRE scene from the three cameras and producing a merged "false-color" image from them. Again, this also breaks your partial-image model.
> > I can, currently, distribute this processing by treating the > region of memory into which a (local) camera's imagery is > deserialized as a "memory object" and then exporting *access* > to that object to other similar "camera modules/nodes". > > But, the access times of non-local memory are horrendous, given > that the contents are ephemeral (if accesses could be *cached* > on each host needing them, then these costs diminish). > > So, I need to come up with algorithms that let me export abstractions > instead of raw data.
Sounds like your current design is very centralized. This limits its scalability.
> >>> Moving the processing to "host per camera" implementation gives you more >>> MIPS.  But, makes coordinating partial results tedious. >> >> Depends on what sort of partial results you are looking at. > > "Bob's *head* is at X,Y+H,W in my image -- but, his body is not visible" > > "Ah!  I was wondering whose legs those were in *my* image!" > >>>> It is unclear what you actual image requirements per camera are, so >>>> it is hard to say what level camera and processor you will need. >>>> >>>> My first feeling is you seem to be assuming a fairly cheep camera >>>> and then doing some fairly simple processing over the partial image, >>>> in which case you might even be able to live with a camera that uses >>>> a crude SPI interface to bring the frame in, and a very simple >>>> processor. >>> >>> I use A LOT of cameras.  But, I should be able to swap the camera >>> (upgrade/downgrade) and still rely on the same *local* compute engine. >>> E.g., some of my cameras have Ir illuminators; it's not important >>> in others; some are PTZ; others fixed. >> >> Doesn't sound reasonable. If you downgrade a camera, you can't count >> on it being able to meet the same requirements, or you over speced the >> initial camera. > > Sorry, I was using up/down relative to "nominal camera", not "specific > camera > previously selected for application".  I'd 8really* like to just have a > single "camera module" (module = CPU+I/O) instead of one for camera type A > and another for camera type B, etc. >
That only works if you are willing to spend for the sports car, even if you just need it to go around the block.

It depends a bit on how much span of capability you need. A $10 camera will likely have a very different interface than a $30,000 camera, so it will need a different board. Some boards might handle multiple camera interface types if that doesn't add a lot to the board, but you are apt to find that you need to make some choices.

Then some tasks will just need a lot more compute power than others. Yes, you can put too much compute power on the simple tasks (and it might make sense to design the higher-end processor early), but ultimately you are going to want the less expensive, lower-end processors.
>> You put on a camera a processor capable of handling the tasks you >> expect out of that set of hardware.  One type of processor likely can >> handle a variaty of different camera setup with > > Exactly.  If a particular instance has an Ir illuminator, then you include > controls for that in *the* "camera module".  If another instance doesn't > have > this ability, then those controls go unused.
Yes, auxiliary functionality is often cheap to include the hooks for.
> >>> Watching for an obstruction in the path of a garage door (open/close) >>> has different requirements than trying to recognize a visitor at the >>> front >>> door.  Or, identify the locations of the occupants of a facility. >> >> Yes, so you don't want to "Pay" for the capability to recognize a >> visitor in your garage door sensor, so you use different levels of >> sensor/processor. > > Exactly.  But, the algorithms that do the scene analysis can be the same; > you just parameterize the image and the objects within it that you seek.
Actually, "Tracking" can be a very different type of algorithm then "Detecting". You might be able to use a Tracking base algorithm to Detect, but likely a much simpler algorithm can be used (needing less resources) to just detect.
> > There will likely be some combinations that exceed the capabilities of > the hardware to process in real-time.  So, you fall back to lower > frame rates or let the algorithms drop targets ("You watch Bob, I'll > watch Tom!") > >
On 12/30/2022 10:39 PM, Richard Damon wrote:
> On 12/30/22 4:59 PM, Don Y wrote: >> On 12/30/2022 11:02 AM, Richard Damon wrote: >>>>>> So, my options are: >>>>>> - reduce the overall frame rate such that N cameras can >>>>>>    be serviced by the USB (or whatever) interface *and* >>>>>>    the processing load >>>>>> - reduce the resolution of the cameras (a special case of the above) >>>>>> - reduce the number of cameras "per processor" (again, above) >>>>>> - design a "camera memory" (frame grabber) that I can install >>>>>>    multiply on a single host >>>>>> - develop distributed algorithms to allow more bandwidth to >>>>>>    effectively be applied >>>>> >>>>> The fact that you are starting for the concept of using "USB Cameras" sort >>>>> of starts you with that sort of limit. >>>>> >>>>> My personal thought on your problem is you want to put a "cheap" processor >>>>> right on each camera using a processor with a direct camera interface to >>>>> pull in the image and do your processing and send the results over some >>>>> comm-link to the center core. >>>> >>>> If I went the frame-grabber approach, that would be how I would address the >>>> hardware.  But, it doesn't scale well.  I.e., at what point do you throw in >>>> the towel and say there are too many concurrent images in the scene to >>>> pile them all onto a single "host" processor? >>> >>> Thats why I didn't suggest that method. I was suggesting each camera has its >>> own tightly coupled processor that handles the need of THAT >> >> My existing "module" handles a single USB camera (with a fairly heavy-weight >> processor). >> >> But, being USB-based, there is no way to look at *part* of an image. >> And, I have to pay a relatively high cost (capturing the entire >> image from the serial stream) to look at *any* part of it. > > Yep, having chosen USB as your interface, you have limited yourself.
Doesn't matter. Any serial interface poses the same problem; I can't examine the image until I can *look* at it.
> Since you say you have a fairly heavy-weight processor, that frame grab likely > isn't you limiting factor.
It becomes an issue when the number of cameras on a single host increases significantly. I have one scene that requires 11 cameras to capture completely.
>> *If* a "camera memory" was available, I would site N of these >> in the (64b) address space of the host and let the host pick >> and choose which parts of which images it wanted to examine... >> without worrying about all of the bandwidth that would have been >> consumed deserializing those N images into that memory (which is >> a continuous process) > > But such a camera would almost certainly be designed for the processor to be on > the same board as the camera, (or be VERY slow in access), so much less apt > allow you to add multiple cameras to one processor.
Yes. But, if the module is small, then siting the assembly "someplace convenient" isn't a big issue. I.e., my modules are smaller than most webcams/dashcams.
>>>> ISTM that the better solution is to develop algorithms that can >>>> process portions of the scene, concurrently, on different "hosts". >>>> Then, coordinate these "partial results" to form the desired result. >>>> >>>> I already have a "camera module" (host+USB camera) that has adequate >>>> processing power to handle a "single camera scene".  But, these all >>>> assume the scene can be easily defined to fit in that camera's field >>>> of view.  E.g., point a camera across the path of a garage door and have >>>> it "notice" any deviation from the "unobstructed" image. >>> >>> And if one camera can't fit the full scene, you use two cameras, each with >>> there own processor, and they each process their own image. >> >> That's the above approach, but... >> >>> The only problem is if your image processing algoritm need to compare parts >>> of the images between the two cameras, which seems unlikely. >> >> Consider watching a single room (e.g., a lobby at a business) and >> tracking the movements of "visitors".  It's unlikely that an individual's >> movements would always be constrained to a single camera field.  There will >> be times when he/she is "half-in" a field (and possibly NOT in the other, >> HALF in the other or ENTIRELY in the other).  You can't ignore cases where >> the entire object (or, your notion of what that object's characteristics >> might be) is not entirely in the field as that leaves a vulnerability. > > Sounds like you aren't overlapping your cameras enough or have insufficent > coverage.  Maybe your problem is wrong field of view for your lens. Maybe you > need fewer but better cameras with wider fields of view.
Distance from camera to target means you have to play games with optics that can distort images. I also can't rely on "professional installers", *or* on the cameras remaining aimed in their original configurations.
> This might be due to try to use "stock" inexpensive USB cameras. > >> For example, I watch our garage door with *four* cameras.  A camera is >> positioned on each side ("door jam"?) of the door "looking at" the other >> camera.  This because a camera can't likely see the full height of the door >> opening ON ITS SIDE OF THE DOOR (so, the opposing camera watches "my side" >> and I'll watch *its* side!). > > Right, and if ANY see a problem, you stop. So no need for inter-camera > coordination.
But you don't know there is a problem until you can identify *where* the obstruction exists and if that poses a problem for the vehicle or the "obstructing item". Doing so requires knowing what the object likely is. E.g., SWMBO frequently stands in the doorway as I pull the car in or out (not enough room between vehicles *in* the garage to allow for ease of entry/egress). I'd not want this to be flagged as a problem (signalling an alert in the vehicle). Likewise, an obstruction on one vehicle-side of the garage shouldn't interfere with access to the other side.
>> [The other two cameras are similarly positioned on the overhead *track* >> onto which the door rolls, when open] >> >> An object in (or near) the doorway can be visible in one (either) or >> both cameras, depending on where it is located.  Additionally, one of >> those manifestations may be only "partial" as regards to where it is >> located and intersects the cameras' fields of view. > > But since you aren't trying to ID, only Detect, there still isn't a need for > camera-camera processing, just camera-door controller
The cameras need to coordinate to resolve the location of the object. A "toy wagon" would present differently, visually, than a tall person.
>>>> When the scene gets too large to represent in enough detail in a single >>>> camera's field of view, then there needs to be a way to coordinate >>>> multiple cameras to a single (virtual?) host.  If those cameras were just >>>> "chunks of memory", then the *imagery* would be easy to examine in a single >>>> host -- though the processing power *might* need to increase geometrically >>>> (depending on your current goal) >>> >>> Yes, but your "chunks of memory" model just doesn't exist as a viable camera >>> model. >> >> Apparently not -- in the COTS sense.  But, that doesn't mean I can't >> build a "camera memory emulator". >> >> The downside is that this increases the cost of the "actual camera" >> (see my above comment wrt ammortization). > > Yep, implementing this likely costs more than giving the camera a dedicated > moderate processor to do the major work. Might not handle the actual ID problem > of your Door bell, but could likely process the live video, take a snapshot of > a region with a good view of the vistor coming, and send just that to your > master system for ID.
But, then I could just use one of my existing "modules". If the target fits entirely within its field of view, then it has everything that it needs for the assigned functionality. If not, then it needs to consult with other cameras.
>>> The CMOS cameras with addressable pixels have "access times" significantly >>> lower than your typical memory (and is read once) so doesn't really meet >>> that model. Some of them do allow for sending multiple small regions of >>> intererst and down loading just those regions, but this then starts to >>> require moderate processor overhead to be loading all these regions and >>> updating the grabber to put them where you want. >> >> You would, instead, let the "camera memory emulator" capture the entire >> image from the camera and place the entire image in a contiguous >> region of memory (from the perspective of the host).  The cost of capturing >> the portions that are not used is hidden *in* the cost of the "emulator". > > Yep, you could build you system with a two-port memory buffer between the frane > grabber loading with one port, and the decoding processor on the other.
Yes. But large *true* dual-port memories are costly. Instead, you would emulate such a device either by time-division multiplexing a single physical memory *or* sharing alternate memories (fill one, view the other).
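A minimal sketch of the "fill one, view the other" arrangement -- a plain ping-pong buffer, where capture_frame_into() is a hypothetical stand-in for whatever actually fills a buffer from the sensor:

#include <stdint.h>
#include <stdatomic.h>

#define FRAME_BYTES (640u * 480u)   /* assumed frame size */

static uint8_t buffer[2][FRAME_BYTES];
static atomic_int fill_index = 0;   /* buffer currently being filled */

/* Hypothetical capture routine: blocks until one full frame has been
 * written into dst (DMA from the sensor, deserialized USB data, ...). */
extern void capture_frame_into(uint8_t *dst);

/* Capture side: always writes into the "fill" buffer, then flips. */
void capture_task(void)
{
    for (;;) {
        int f = atomic_load(&fill_index);
        capture_frame_into(buffer[f]);
        atomic_store(&fill_index, f ^ 1);   /* publish the completed frame */
    }
}

/* Analysis side: reads whichever buffer is NOT being filled, so the
 * host sees a stable, contiguous image while the next one streams in. */
const uint8_t *current_frame(void)
{
    return buffer[atomic_load(&fill_index) ^ 1];
}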
> The most cost effective way to do this is likely a commercial frame-grabber > with built "two-port" memory, that sits in a slot of a PC type computer. These > would likely not work with a "USB Camera" (why would you need a frame grabber > with a camera that has it built in) so would be totally changing your cost models.
Yes, I have a few of these intended for medical imaging apps. Way too big; way too expensive. Designed for the wrong type of "host".
> IF your current design method is based on using USB cameras, trying to do a > full custom interface may be out of your field of operation. > >>> And yes, it does mean that there might be some cases where you need a core >>> module that has TWO cameras connected to a single processor, either to get a >>> wider field of view, or to combine two different types of camera (maybe a >>> high res black and white to a low res color if you need just minor color >>> information, or combine a visible camera to a thermal camera). These just >>> become another tool in your tool box. >> >> I *think* (uncharted territory) that the better investment is to develop >> algorithms that let me distribute the processing among multiple >> (single) "camera modules/nodes".  How would your "two camera" exemplar >> address an application requiring *three* cameras?  etc. > > The first question comes, what processing are you thinking of that needs images > from 3 cameras. > > Note, my two camera example was a case where the processing needed to be done > did need data from two cameras. > > If you have another task that needs a different camera, you just build a system > with one two camera model and one 1 camera module, relaying back to a central > control, or you nominate one of the modules to be central control if the load > there is light enough. > > Your garage doer example would be built from 4 seperate and independent 1 > camera modules, either going to one as the master, or to a 5th module acting as > the master.
Yes, but they have to share image data (either raw or abstracted) to make deductions about the targets present.
> The cases I can think of for needing to process three cameras together would be: > > 1) a system stiching images from 3 cameras and generating a single image out of > it, but that totally breaks your concept of needing only bits of the images, > that inherently is using most of each camera, and doing some stiching > processing on the overlaps. > > 2) A Multi-spectrum system, where again, you are taking the ENTIRE scene from > the three cameras and producing a merged "false-color" image from them. Again, > this also breaks you partial image model.
Or, tracking multiple actors in an "arena" -- visitors in a business, occupants in a home, etc.

In much the same way that the two garage door cameras conspire to locate the obstruction's position along the line from the left doorjamb to the right, pairs of cameras can resolve a target in an arena, and *sets* of cameras (freely paired, as needed) can track all locations (and targets) in the arena.
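The geometry behind "pairs of cameras can resolve a target" is just intersecting two bearings. A sketch, assuming each camera's position and the bearing it reports for the target are already expressed in a shared floor-plan coordinate frame (converting a pixel column to a bearing from the camera's pose and field of view is assumed to have been done already):

#include <math.h>
#include <stdbool.h>

/* A camera at (x, y) reporting the target along absolute bearing
 * 'theta' (radians) in the shared floor-plan frame. */
struct bearing {
    double x, y;
    double theta;
};

/* Intersect the two rays; returns false if they are (nearly) parallel. */
bool locate_target(struct bearing a, struct bearing b,
                   double *tx, double *ty)
{
    double dax = cos(a.theta), day = sin(a.theta);
    double dbx = cos(b.theta), dby = sin(b.theta);
    double denom = dax * dby - day * dbx;

    if (fabs(denom) < 1e-9)
        return false;   /* the two cameras see the target along ~parallel lines */

    /* Solve a.pos + t*da == b.pos + s*db for t, then plug back in. */
    double t = ((b.x - a.x) * dby - (b.y - a.y) * dbx) / denom;
    *tx = a.x + t * dax;
    *ty = a.y + t * day;
    return true;
}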
>> I can, currently, distribute this processing by treating the >> region of memory into which a (local) camera's imagery is >> deserialized as a "memory object" and then exporting *access* >> to that object to other similar "camera modules/nodes". >> >> But, the access times of non-local memory are horrendous, given >> that the contents are ephemeral (if accesses could be *cached* >> on each host needing them, then these costs diminish). >> >> So, I need to come up with algorithms that let me export abstractions >> instead of raw data. > > Sounds like you current design is very centralized. This limits its scalability,
The current design is completely distributed. The only "shared components" are the network switch through which the nodes converse and the RDBMS that acts as the persistent store.

If a site realizes that it needs additional coverage to track <whatever>, it just adds another camera module and lets the RDBMS know about its general location/functionality (i.e., how it can relate to any other cameras covering the same arena).
>>>>> My first feeling is you seem to be assuming a fairly cheep camera and then >>>>> doing some fairly simple processing over the partial image, in which case >>>>> you might even be able to live with a camera that uses a crude SPI >>>>> interface to bring the frame in, and a very simple processor. >>>> >>>> I use A LOT of cameras.&nbsp; But, I should be able to swap the camera >>>> (upgrade/downgrade) and still rely on the same *local* compute engine. >>>> E.g., some of my cameras have Ir illuminators; it's not important >>>> in others; some are PTZ; others fixed. >>> >>> Doesn't sound reasonable. If you downgrade a camera, you can't count on it >>> being able to meet the same requirements, or you over speced the initial >>> camera. >> >> Sorry, I was using up/down relative to "nominal camera", not "specific camera >> previously selected for application".&nbsp; I'd 8really* like to just have a >> single "camera module" (module = CPU+I/O) instead of one for camera type A >> and another for camera type B, etc. > > That only works if you are willing to spend for the sports car, even if you > just need it to go around the block.
If the "extra" bits of the sports car can be used by other elements, then those costs aren't directly borne by the camera module, itself. E.g., when the garage door is closed, there's no reason the modules in the garage can't be busy training speech models or removing commercials from recorded broadcast content. If, OTOH, you detect objects with a photo-interrupter across the door's path, there's scant little it can do when not needed.
> It depends a bit on how much span you need of capability. A $10 camera is > likely having a very different interface to a $30,000 camera, so will need a > different board. Some boards might handle multiple camera interface types if it > doesn't add a lot to the board, but you are apt to find that you need to make > some choice.
I don't ever see a need for a $30,000 camera. There may be a need for a PTZ model. Or, a low-lux model. Or, one with a longer focal length. Or, shorter (I'd considered putting one *in* the mailbox to examine its contents instead of just detecting that it had been "visited"). Instead of a 4K device, I'd opt for multiple simpler devices, better positioned. But, not radically different in terms of cost, size, etc.

If you walk into a bank lobby, you don't see *one* super-high-resolution, wide-field camera surveilling the lobby but, rather, half a dozen or more watching specific portions of the lobby. Similarly, if you use the self-check at the store, there is a camera per checkout station instead of one "really good" camera located centrally trying to take it all in.

This gives installers more leeway in terms of how they cover an arena.
> Then some tasks will just need a lot more computer power than others. Yes, you > can just put too much computer power on the simple tasks, (and that might make > sense to early design the higher end processor), but ultimately you are going > to want the less expensive lower end processors.
I can call on surplus processing power from other nodes in the system in much the same way that they can call on surplus capabilities from a camera module that isn't "seeing" anything interesting, at the moment. There will always be limits on what can be done; I'm not going to be able to VISUALLY verify that you have the right wrench in your hand as you set about working on the car. Or, that you are holding an eating utensil instead of a random piece of plastic as you traverse the kitchen. But, I'll know YOU are in the kitchen and likely the person whose voice I hear (to further reinforce the speaker identification algorithms).
>>> You put on a camera a processor capable of handling the tasks you expect out >>> of that set of hardware.  One type of processor likely can handle a variaty >>> of different camera setup with >> >> Exactly.  If a particular instance has an Ir illuminator, then you include >> controls for that in *the* "camera module".  If another instance doesn't have >> this ability, then those controls go unused. > > Yes, Auxilary functionality is often cheap to include the hooks for.
But, it often requires looking at your TOTAL needs instead of designing for specific (initial) needs. E.g., my camera modules now include audio capabilities as there are instances where I want an audio pickup in the same arena that I am monitoring. Silly to have to add an "audio module" just because I didn't have the foresight to include it with the camera!
>>>> Watching for an obstruction in the path of a garage door (open/close) >>>> has different requirements than trying to recognize a visitor at the front >>>> door.  Or, identify the locations of the occupants of a facility. >>> >>> Yes, so you don't want to "Pay" for the capability to recognize a visitor in >>> your garage door sensor, so you use different levels of sensor/processor. >> >> Exactly.  But, the algorithms that do the scene analysis can be the same; >> you just parameterize the image and the objects within it that you seek. > > Actually, "Tracking" can be a very different type of algorithm then > "Detecting". You might be able to use a Tracking base algorithm to Detect, but > likely a much simpler algorithm can be used (needing less resources) to just > detect.
My current detection algorithm (e.g., garage) just looks for deltas between "clear" and "obstructed" imagery, conditioned by masks. There is some image processing required as things look different at night vs. day, etc.

I don't have to "get it right". All I have to do is demonstrate "proof of concept". And, be able to indicate why a particular approach is superior to others/existing ones.

E.g., if you drive a "pickup-on-steroids", you'd need to locate a photointerrupter "obstruction detector" pretty high up off the ground to catch the case where the truck bed was in the way of the door. Or, some lumber overhanging the end of the bed that you forgot you'd brought home! And, you'd likely need *another* detector down low to catch toddlers or toy wagons in the path of the door. OTOH, doing the detection with a camera catches these use conditions in addition to the "nominal" one for which the photointerrupter was designed.

Tracking two/four occupants of a home *suggests* that you can track 6 or 8. Or, dozens of employees in a business conference room, etc. I have no desire to spend my time perfecting any of these technologies (I have other goals); just lay the groundwork and the framework to make them possible.
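A minimal sketch of that masked delta check, assuming 8-bit grayscale frames and a mask marking the pixels that matter (the reference frame would be re-captured as lighting conditions change); the names and thresholds are illustrative only:

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define W 640
#define H 480
#define PIXEL_DELTA   30     /* per-pixel change considered significant */
#define BLOB_MIN_PX  150     /* changed pixels needed to call it an obstruction */

/* reference: the "clear" image; mask: nonzero where the door opening is. */
bool obstruction_detected(const uint8_t *frame,
                          const uint8_t *reference,
                          const uint8_t *mask)
{
    uint32_t changed = 0;

    for (size_t i = 0; i < (size_t)W * H; i++) {
        if (!mask[i])
            continue;                       /* ignore pixels outside the door path */
        if (abs((int)frame[i] - (int)reference[i]) > PIXEL_DELTA)
            changed++;
    }
    return changed > BLOB_MIN_PX;
}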
>> There will likely be some combinations that exceed the capabilities of
>> the hardware to process in real-time.  So, you fall back to lower
>> frame rates or let the algorithms drop targets ("You watch Bob, I'll
>> watch Tom!")
On 12/30/2022 5:32, Don Y wrote:
> On 12/29/2022 5:40 PM, Richard Damon wrote:
>> On 12/29/22 5:57 PM, Don Y wrote:
>>> On 12/29/2022 2:09 PM, Richard Damon wrote:
>>>> On 12/29/22 2:26 PM, Don Y wrote:
>>>>> On 12/29/2022 10:06 AM, Richard Damon wrote:
>>>>>> On 12/29/22 8:16 AM, Don Y wrote:
>>>>>>> ISTR playing with de-encapsulated DRAMs as image sensors
>>>>>>> back in school (DRAM being relatively new technology, then).
>>>>>>>
>>>>>>> But, most cameras seem to have (bit- or word-) serial interfaces
>>>>>>> nowadays.  Are there any (mainstream/high volume) devices that
>>>>>>> "look" like a chunk of memory, in their native form?
>>>>>>
>>>>>> Using a DRAM in that manner would only give you a single bit value
>>>>>> for each pixel (maybe some more modern memories store multiple
>>>>>> bits in a cell so you get a few grey levels).
>>>>>
>>>>> I mentioned the DRAM reference only as an exemplar of how a "true"
>>>>> parallel, random access interface could exist.
>>>>
>>>> Right, and cameras based on parallel random access do exist, but
>>>> tend to be on the smaller and slower end of the spectrum.
>>>>
>>>>>
>>>>>> There are some CMOS sensors that let you address pixels
>>>>>> individually and in a random order (like you got with the DRAM)
>>>>>> but by its nature, such a readout method tends to be slow, and
>>>>>> space inefficient, so these interfaces tend to be only available
>>>>>> on smaller camera arrays.
>>>>>
>>>>> But, if you are processing the image, such an approach can lead to
>>>>> higher throughput than having to transfer a serial data stream into
>>>>> memory (thus consuming memory bandwidth).
>>>>
>>>> My guess is that in almost all cases, the need to send the address
>>>> to the camera and then get back the pixel value is going to use up
>>>> more total bandwidth than getting the image in a stream. The one
>>>> exception would be if you need just a very small percentage of the
>>>> array data, and it is scattered over the array so a Region of
>>>> Interest operation can't be used.
>>>
>>> No, you're missing the nature of the DRAM example.
>>>
>>> You don't "send" the address of the memory cell desired *to* the DRAM.
>>> You simply *address* the memory cell, directly.  I.e., if there are
>>> N locations in the DRAM, then N addresses in your address space are
>>> consumed by it; one for each location in the array.
>>
>> No, look at your DRAM timing again, the transaction begins with the
>> sending of the address over typically two clock edges with RAS and
>> CAS, and then a couple of clock cycles and then you get back on the
>> data bus the answer.
>
> But it's a single memory reference.  Look at what happens when you
> deserialize a USB video stream into that same DRAM.  The DMAC has
> tied up the bus for the same amount of time that the processor
> would have if it read those same N locations.
>
>> Yes, the addresses come from an address bus, using address space out
>> of the processor, but it is a multi-cycle operation. Typically, you
>> read back a "burst" with some minimal caching on the processor side,
>> but that is more a minor detail.
>>
>>> I'm looking for *that* sort of "direct access" in a camera.
>>
>> It's been a while, but I thought some CMOS cameras could work on a
>> similar basis, strobe a Row/Column address from pins on the camera,
>> and a few clock cycles later you got a burst out of the camera
>> starting at the address cell.
>
> I don't want the camera to decide which pixels *it* thinks I want to see.
> It sends me a burst of a row -- but the next part of the image I may have
> wanted to access may have been down the same *column*.  Or, in another
> part of the image entirely.
>
> Serial protocols inherently deliver data in a predefined pattern
> (often intended for display).  Scene analysis doesn't necessarily
> conform to that same pattern.
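For concreteness, the "camera that looks like a chunk of memory" access pattern described in the quote above would look roughly like this from the host side; CAMERA_BASE, the geometry, and the function names are hypothetical, a sketch rather than any particular sensor's interface:

    #include <stdint.h>

    #define CAM_W        640u
    #define CAM_H        480u
    #define CAMERA_BASE  0x40000000UL     /* hypothetical: sensor mapped here */
    #define FRAME        ((volatile const uint8_t *)CAMERA_BASE)

    /* One address per pixel: the CPU reads only the pixels the analysis
     * wants, in whatever order it wants -- no DMA of the whole frame into
     * DRAM first. */
    static inline uint8_t pixel(unsigned x, unsigned y)
    {
        return FRAME[(unsigned long)y * CAM_W + x];
    }

    /* e.g., walk *down a column* -- the access pattern a row-burst serial
     * interface can't deliver without also shipping the rows around it */
    static unsigned column_hits(unsigned x, uint8_t thresh)
    {
        unsigned hits = 0;
        for (unsigned y = 0; y < CAM_H; y++)
            if (pixel(x, y) > thresh)
                hits++;
        return hits;
    }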
Isn't there a camera doing a protocol which allows you to request a specific area only to be transferred? RFB like, VNC does that all the time.
On 12/31/2022 4:15 AM, Dimiter_Popoff wrote:
>> Serial protocols inherently deliver data in a predefined pattern
>> (often intended for display).  Scene analysis doesn't necessarily
>> conform to that same pattern.
>
> Isn't there a camera doing a protocol which allows you to request
> a specific area only to be transferred? RFB like, VNC does that
> all the time.
That only makes sense if you know, a priori, which part(s) of the image you might want to examine. E.g., it would work for "exposing" just the portion of the field that "overlaps" some other image. I can get fixed parts of partial frames from *other* cameras just by ensuring the other camera puts that portion of the image in a particular memory object and then export that memory object to the node that wants it.

But, if a target can move into or out of the exposed area, then you have to make a return trip to the camera to request MORE of the field.

When your targets are "far away" (like a surveillance camera monitoring a parking lot), targets don't move from their previously noted positions considerably from one frame to the next.

But, when the camera and targets are in close proximity, there's greater (apparent) relative motion in the same frame-interval. So, knowing where (x,y + WxH) the portion of the image of interest lay, previously, is less predictive of where it may lie currently.

Having the entire image available means the software can look <wherever> and <whenever>.
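To make the close-proximity problem concrete: a region request has to be inflated by the worst-case apparent motion per frame, and that margin grows with proximity until the "region" is effectively the whole frame. A minimal sketch, with assumed dimensions and names:

    #define CAM_W 640
    #define CAM_H 480

    typedef struct { int x, y, w, h; } roi_t;

    static int clampi(int v, int lo, int hi)
    {
        return v < lo ? lo : (v > hi ? hi : v);
    }

    /* Inflate last frame's target box by the worst-case apparent motion
     * (pixels per frame).  For a distant parking-lot target that margin is
     * a few pixels; for a nearby, fast-moving target it can approach the
     * frame size -- at which point "request a region" degenerates into
     * "request the whole frame". */
    static roi_t next_request(roi_t last, int max_motion_px)
    {
        roi_t r;
        r.x = clampi(last.x - max_motion_px, 0, CAM_W - 1);
        r.y = clampi(last.y - max_motion_px, 0, CAM_H - 1);
        r.w = clampi(last.w + 2 * max_motion_px, 1, CAM_W - r.x);
        r.h = clampi(last.h + 2 * max_motion_px, 1, CAM_H - r.y);
        return r;
    }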
On 12/31/2022 20:16, Don Y wrote:
> On 12/31/2022 4:15 AM, Dimiter_Popoff wrote:
>>> Serial protocols inherently deliver data in a predefined pattern
>>> (often intended for display).  Scene analysis doesn't necessarily
>>> conform to that same pattern.
>>
>> Isn't there a camera doing a protocol which allows you to request
>> a specific area only to be transferred? RFB like, VNC does that
>> all the time.
>
> That only makes sense if you know, a priori, which part(s) of the
> image you might want to examine.  E.g., it would work for
> "exposing" just the portion of the field that "overlaps" some
> other image.  I can get fixed parts of partial frames from
> *other* cameras just by ensuring the other camera puts that
> portion of the image in a particular memory object and then
> export that memory object to the node that wants it.
>
> But, if a target can move into or out of the exposed area, then
> you have to make a return trip to the camera to request MORE of
> the field.
>
> When your targets are "far away" (like a surveillance camera
> monitoring a parking lot), targets don't move from their
> previously noted positions considerably from one frame to the
> next.
>
> But, when the camera and targets are in close proximity,
> there's greater (apparent) relative motion in the same
> frame-interval.  So, knowing where (x,y + WxH) the portion of
> the image of interest lay, previously, is less predictive
> of where it may lie currently.
>
> Having the entire image available means the software
> can look <wherever> and <whenever>.
>
Well yes, obviously so, but this is valid whatever the interface. Direct access to the sensor cells can't be double buffered so you will have to transfer anyway to get the frame you are analyzing static.

Perhaps you could find a way to make yourself some camera module using an existing one, MIPI or even USB, since you are looking for low overall cost; and add some MCU board to it to do the buffering and transfer areas on request. Or maybe put enough CPU power together with each camera to do most if not all of the analysis... Depending on which achieves the lowest cost. But I can't say much on cost, that's pretty far from me (as you know).
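If one went the "MCU next to the camera, serving rectangles on request" route, the wire format could be as simple as the following; the message layout is entirely hypothetical, just to make the idea concrete:

    #include <stdint.h>

    /* Request the host sends to the camera-side MCU, which holds the last
     * completed frame in its own buffer and replies with just the rectangle
     * asked for (row-major, one byte per pixel).  Packed for the wire
     * (GCC/Clang attribute). */
    typedef struct {
        uint32_t frame_id;    /* which buffered frame (0 = most recent)     */
        uint16_t x, y;        /* top-left corner of the requested rectangle */
        uint16_t w, h;        /* size of the rectangle                      */
    } __attribute__((packed)) region_req_t;

    /* Header the MCU sends back, followed by w*h payload bytes. */
    typedef struct {
        uint32_t frame_id;    /* echoes the request                         */
        uint16_t x, y, w, h;  /* actual region returned (may be clipped)    */
    } __attribute__((packed)) region_rsp_t;

The MCU's double buffering hides the sensor readout; the host only ever pays for the rectangles it actually asks for.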
On 12/31/2022 1:13 PM, Dimiter_Popoff wrote:
> On 12/31/2022 20:16, Don Y wrote:
>> On 12/31/2022 4:15 AM, Dimiter_Popoff wrote:
>>>> Serial protocols inherently deliver data in a predefined pattern
>>>> (often intended for display).  Scene analysis doesn't necessarily
>>>> conform to that same pattern.
>>>
>>> Isn't there a camera doing a protocol which allows you to request
>>> a specific area only to be transferred? RFB like, VNC does that
>>> all the time.
>>
>> That only makes sense if you know, a priori, which part(s) of the
>> image you might want to examine.  E.g., it would work for
>> "exposing" just the portion of the field that "overlaps" some
>> other image.  I can get fixed parts of partial frames from
>> *other* cameras just by ensuring the other camera puts that
>> portion of the image in a particular memory object and then
>> export that memory object to the node that wants it.
>>
>> But, if a target can move into or out of the exposed area, then
>> you have to make a return trip to the camera to request MORE of
>> the field.
>>
>> When your targets are "far away" (like a surveillance camera
>> monitoring a parking lot), targets don't move from their
>> previously noted positions considerably from one frame to the
>> next.
>>
>> But, when the camera and targets are in close proximity,
>> there's greater (apparent) relative motion in the same
>> frame-interval.  So, knowing where (x,y + WxH) the portion of
>> the image of interest lay, previously, is less predictive
>> of where it may lie currently.
>>
>> Having the entire image available means the software
>> can look <wherever> and <whenever>.
>
> Well yes, obviously so, but this is valid whatever the interface.
> Direct access to the sensor cells can't be double buffered so
> you will have to transfer anyway to get the frame you are analyzing
> static.
I would assume the devices would have evolved an "internal buffer" by now (as I said, my experience with *DRAM* in this manner was 40+ years ago).
> Perhaps you could find a way to make yourself some camera module
> using an existing one, MIPI or even USB, since you are looking for low
> overall cost; and add some MCU board to it to do the buffering and
> transfer areas on request. Or maybe put enough CPU power together with
> each camera to do most if not all of the analysis... Depending on
> which achieves the lowest cost. But I can't say much on cost, that's
> pretty far from me (as you know).
My current approach gives me that -- MIPS, size, etc. But, the cost of transferring parts of the image (without adding a specific mechanism) is a "shared page" (DSM). So, a host (on node A) references part of node *B*'s frame buffer and the page (on B) containing that memory address gets shipped back to node A and mapped into A's memory.

An agency on A could "touch" a "pixel-per-block" and cause the entire frame to be transferred to A, from B (or, I can treat the entire frame as a coherent object and arrange for ALL of it to be transferred when ANY of it is referenced). Some process on B could alternate between multiple such "memory objects" ("this one is complete, but I'm busy filling this OTHER one with data from the camera interface") to give me a *virtual* "camera memory device".

But, transport delays make this unsuitable for real-time work; a megabyte of imagery would require 100ms to transfer, in "raw" form. (I could encode it on the originating host, transfer it, and then decode it on the receiving host -- at the expense of MIPS. This is how I "record" video without saturating the network.)

So, you (B) want to "abstract" the salient features of the image while it is on B and then transfer just those to A. *Use* them, on A, and then move on to the next set of features (that B has computed while A was busy chewing on the last set).

Or, give A direct access to the native data (without A having to capture video streams from each of the cameras that it wants to potentially examine).
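For reference, the 100ms figure is consistent with simple link arithmetic: 1 MB of raw frame is 8 Mbits, which takes roughly 80-100 ms over ~100 Mb/s of usable bandwidth. The kind of "salient features" record B might ship to A instead of pixels could be on this order (the fields are purely illustrative, not the actual interface):

    #include <stdint.h>

    /* A few tens of bytes per tracked target, instead of ~1 MB of pixels
     * per frame.  B extracts these while A is consuming the previous set. */
    typedef struct {
        uint32_t frame_seq;     /* frame the features were extracted from  */
        uint16_t target_id;     /* B's label for the blob/target           */
        uint16_t x, y, w, h;    /* bounding box in B's image coordinates   */
        uint16_t centroid_x;
        uint16_t centroid_y;
        uint32_t area_px;       /* pixel count of the detected blob        */
    } target_features_t;        /* ~24 bytes vs. ~1,000,000 for raw pixels */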
