Multicasting and Switches

Started by D Yuniskis November 8, 2010
Hi Clifford,

Clifford Heath wrote:
> D Yuniskis wrote:
>> Clifford Heath wrote:
>>> When I implemented a P2P file distribution platform, I
>>> decided that multicast wasn't useful and went instead for
>> Why? --------------------^^^^^^^^^^^^^

Sorry, I should have been more precise in my comment. I intended "multicast" to encompass broadcast :-/ I should have said "non-unicast" to be more clear.

> For this app, the computers targeted are almost always
> on the same subnet, so broadcasts naturally get propagated
> to just the places they're needed (by default, broadcast
> traffic stops at subnet boundaries). LAN traffic is regarded
> as essentially free, it's WAN traffic that needs to be
> limited and shared.

Understood. See below.
>>> broadcasts. The LDSS protocol (Local Download Sharing
>> I'll have to look at the RFC's...
>
> It got to a Draft, which has expired, but I've attached it
> below.
>
> Very simple protocol; two messages only (since files are
> only identified by SHA-1 hash); just NeedFile and WillSend,
> having the same packet structure.
The control packets are broadcast. But, it is unclear as to how the actual payload is delivered (seems to be left to the application to decide?). E.g., the protocol seems to imply each "Need"-ing host can use whatever (supported) protocol to fetch the payload from the "Have"-ing host(s). I don't see anything akin to reliable multicast inherent in the protocol (though, conceivably, a host that loses/drops payload can subsequently reissue a "NeedFile" request).
>>> normal system operations. It was an interesting project!
>> This is geared towards asynchronous sharing, no doubt.
>
> Software distribution. All machines fetch an individual
> policy file saying what software they should install, and
> in most cases there is overlap; more than one machine
> needs the same software. Rather than all downloading a
> separate copy, they announce their plans, progress, and
> ETA, so others know they can wait for a LAN transfer when
> it's done.
OK, different usage model than what I was targeting. E.g., consider N (N being large) diskless workstations powering up simultaneously and all wanting to fetch the (identical) image from a single/few server(s). Clearly, a broadcast/reliable multicast scheme would best utilize the network bandwidth (in this case).
>> How would you (re)consider your design choices in a
>> *synchronous* environment?
>
> In the same-subnet scenario, I'd probably still use broadcast,
> unless the utilization is likely to reach a significant
> percentage (say, >25%) of the media's capability, or there
> is a likelihood of multiple synchronized groups which won't
> necessarily have to pass traffic across the same link.
>
> The latter case is pretty rare, actually - a home media subnet
> is likely all going through one switch and hence limited by
> its capability. Using an IGMP-aware router is unlikely to help.
Here, I'm looking at the scenario where many devices are powered up simultaneously (as above) and need to fetch images over a network that is already being used for other traffic (MM and otherwise). Or, when their collective "mode of operation" changes at run-time and they need to (all) load an "overlay", etc. Unicast transfers (of any sort) mean that the "image server" sees a higher load as it has to push the same image out N times. (A P2P scheme shares that load among the nodes themselves but you still have a longer theoretical time until all nodes have valid images -- unless your P2P algorithm carefully schedules which packets go where to maximize network utilization). Ideally, the "image server" would coincide with the "media server" (or at least *one* such media server) so that host is already busy with some sort of load. I *think* (unsupported) that reliable multicast/broadcast gives you the shortest time to "everyone having a clean image" -- of course, cases exist where any protocol can be boundless.
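As a rough back-of-the-envelope (Python; the numbers are invented, just to show the scale of the difference between N unicast pushes and one multicast):

    # Time for N clients to receive an S-byte image over a link of
    # R bytes/sec, ignoring protocol overhead, loss, and repair.
    N = 50                # diskless workstations (hypothetical)
    S = 64 * 2**20        # 64 MiB image
    R = 12.5 * 10**6      # ~100 Mb/s Ethernet, in bytes/sec

    unicast = N * S / R   # server pushes the same image N times
    multicast = S / R     # server sends it once; everyone listens

    print(f"unicast:   {unicast:.1f} s")    # ~268 s
    print(f"multicast: {multicast:.1f} s")  # ~5.4 s

Any repair traffic eats into the multicast figure, of course, but it starts roughly two orders of magnitude ahead.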
>> How would you (re)consider that same scenario in a wireless
>> network (with nodes closely located -- "tight weave"
>> instead of a "loose mesh")?
>
> I'm not familiar with the implementation of broadcast/multicast
> IP in a wireless environment, but I can't imagine that it would
> change very much.
I was only mentioning wireless in the sense that it can exploit broadcast easily if there is an underlying protocol to govern access to the "medium" (hence the distinction between loose/tight meshes).
Resend, last didn't appear. Sorry if you get it twice.

D Yuniskis wrote:
>> Very simple protocol; two messages only (since files are
>> only identified by SHA-1 hash); just NeedFile and WillSend,
>> having the same packet structure.
>
> The control packets are broadcast. But, it is unclear as
> to how the actual payload is delivered (seems to be left
> to the application to decide?).
TCP - it's documented under the heading "TCP file transfers". Either a raw stream is sent in response to a NeedFile request (if the needer advertised a port number), or a simplified HTTP-style raw request-response, if the needer responds to an advertised port in a WillSend promise. The two forms are needed in case one system has a (software?) firewall.
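For illustration, the two forms might look roughly like this (a Python sketch; the request line is invented, the draft defines the real wire format):

    import socket

    def push_file(needer_addr, needer_port, data):
        # Form 1: the needer advertised a port in its NeedFile,
        # so the file holder connects and sends the raw stream.
        with socket.create_connection((needer_addr, needer_port)) as s:
            s.sendall(data)

    def pull_file(sender_addr, sender_port, sha1_hex):
        # Form 2: the holder advertised a port in WillSend, so the
        # needer connects and issues a simplified HTTP-style request.
        with socket.create_connection((sender_addr, sender_port)) as s:
            s.sendall(f"GET {sha1_hex}\r\n\r\n".encode())
            chunks = []
            while chunk := s.recv(65536):
                chunks.append(chunk)
        return b"".join(chunks)

Having both directions available means the transfer can still happen when either endpoint, but not both, is behind a firewall that blocks inbound connections.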
> I don't see anything akin to reliable multicast inherent
> in the protocol (though, conceivably, a host that loses/drops
> payload can subsequently reissue a "NeedFile" request).
Everything is checked through SHA-1 hashes. If a file reaches its expected size but the hash doesn't match, the whole file is dropped (since there's no way to know where the error occurred) and the search starts afresh.
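In outline, the whole-file check amounts to something like this (a Python sketch, not the actual LDSS code):

    import hashlib

    def file_ok(path, expected_sha1_hex):
        # Whole-file integrity check: there are no per-piece hashes,
        # so on a mismatch the entire file is discarded and the
        # search starts afresh.
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest() == expected_sha1_hex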
> OK, different usage model than what I was targeting.
Yes.
> E.g., consider N (N being large) diskless workstations
> powering up simultaneously and all wanting to fetch
> the (identical) image from a single/few server(s).
> Clearly, a broadcast/reliable multicast scheme would
> best utilize the network bandwidth (in this case).
Yes, but beware that a single dropped packet at the source will cause every recipient to send a NACK. This is the problem with massive wide-area reliable multicast protocols: they get you out of the data fan-out problem but replace it with a NACK fan-in problem instead. That's also why some routers have been taught how to aggregate such NACKs.

If you're dealing with tens or hundreds of machines on a LAN, it's probably not an issue - the error rate is low and it's possible to cope with the NACKs. Even if many of them are dropped, it only requires one to get through and the requested section will be multicast again.
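One classic way to blunt the fan-in (not part of LDSS; this is the SRM-style suppression trick) is for each receiver to wait a random interval before NACKing, and to stay silent if it hears someone else NACK the same block first. A minimal Python sketch:

    import random, threading

    class NackScheduler:
        # Delay each NACK by a random interval; cancel it if another
        # receiver's NACK for the same block is overheard first.
        def __init__(self, send_nack):
            self.send_nack = send_nack   # callable taking a block id
            self.pending = {}            # block_id -> Timer

        def block_missing(self, block_id, max_wait=0.5):
            if block_id in self.pending:
                return                   # already scheduled
            t = threading.Timer(random.uniform(0.0, max_wait),
                                self._fire, args=(block_id,))
            self.pending[block_id] = t
            t.start()

        def _fire(self, block_id):
            self.pending.pop(block_id, None)
            self.send_nack(block_id)     # nobody beat us to it

        def heard_nack(self, block_id):
            t = self.pending.pop(block_id, None)
            if t:
                t.cancel()   # suppressed; the retransmission will be
                             # multicast and satisfy us too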
> Here, I'm looking at the scenario where many devices are powered
> up simultaneously (as above) and need to fetch images over
> a network
I expect that due to startup timing differences, you'd need to sit and listen for a relevant multicast to start for a few seconds before requesting it... or to wait after receiving such a request for a few seconds before starting to send. Include identification inside the stream so latecomers realise what they're missing and can re-fetch the earlier parts.

The other thing we considered doing with massive multicast involving overlapping sets of parties was to allocate 2**N multicast groups, and take N bits of the SHA-1 of the file (or channel ID, if not using content-addressing) to decide which IGMP group to send it to. That way a smart router can refrain from sending packets down links where no-one might be interested.

Clifford Heath
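A sketch of that group-selection idea (Python; the group count and address range are invented for illustration):

    import hashlib

    GROUP_BITS = 4                 # 2**4 = 16 groups (hypothetical)
    BASE_GROUP = "239.255.8."      # administratively-scoped range

    def group_for(digest: bytes) -> str:
        # Take the top N bits of the SHA-1 (or channel ID) to pick
        # one of 2**N multicast groups, so an IGMP-aware router only
        # forwards a stream down links with subscribers for it.
        n = digest[0] >> (8 - GROUP_BITS)
        return BASE_GROUP + str(n)    # 239.255.8.0 .. 239.255.8.15

    digest = hashlib.sha1(b"example file contents").digest()
    print(group_for(digest))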
On Tue, 16 Nov 2010 12:38:18 -0700, D Yuniskis
<not.going.to.be@seen.com> wrote:

> OK, different usage model than what I was targeting.
> E.g., consider N (N being large) diskless workstations
> powering up simultaneously and all wanting to fetch
> the (identical) image from a single/few server(s).
> Clearly, a broadcast/reliable multicast scheme would
> best utilize the network bandwidth (in this case).
One way would be to break up the message into numbered blocks with CRCs and use a simple carousel to repeatedly broadcast those blocks. This is used e.g. for firmware updates for TV STBs, in which no return channel is available. Each receiver accumulates blocks, and if you did not get all blocks during the first cycle, you wait for the next carousel cycle to pick up the missing blocks.

If there is a return channel, each slave could initially request all blocks, after one full cycle check which blocks are missing, and only request those missing blocks. After each full cycle, the server would check all the update requests received during the cycle and drop those blocks from the carousel which have not been requested, thus speeding up the update cycle, finally shutting down the carousel.

If the expected error rate is low, the missing blocks could be requested even with unicasts.

If the expected error rate is high, such as in some radio links with large blocks, a memory ARQ system could be used, in which blocks failing the CRC are stored, and if subsequent reception(s) of the same block also fail the CRC check, the previously received blocks are accumulated until the accumulated block passes the CRC.

Alternatively, the few missing blocks could be transmitted again with a better ECC coding, or just send the actual error correction bits to be combined with the ordinary received data block bits (assuming proper interleaving) at the receiver.
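A minimal sketch of the receive side of such a carousel (Python; the 8-byte header layout is invented for illustration):

    import struct, zlib

    HDR = struct.Struct("!HHI")   # block number, total blocks, CRC32

    def parse_block(pkt):
        # Returns (seq, total, payload), or None if the CRC fails
        # (a damaged block simply waits for the next carousel cycle).
        seq, total, crc = HDR.unpack_from(pkt)
        payload = pkt[HDR.size:]
        if zlib.crc32(payload) != crc:
            return None
        return seq, total, payload

    def receive(packets):
        # Accumulate numbered blocks across cycles until complete.
        got, total = {}, None
        for pkt in packets:
            parsed = parse_block(pkt)
            if parsed is None:
                continue
            seq, total, payload = parsed
            got[seq] = payload
            if len(got) == total:
                return b"".join(got[i] for i in range(total))
        return None   # still missing blocks; keep listening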
Hi Clifford,

Clifford Heath wrote:
> Resend, last didn't appear. Sorry if you get it twice.
>
> D Yuniskis wrote:
>>> Very simple protocol; two messages only (since files are
>>> only identified by SHA-1 hash); just NeedFile and WillSend,
>>> having the same packet structure.
>>
>> The control packets are broadcast. But, it is unclear as
>> to how the actual payload is delivered (seems to be left
>> to the application to decide?).
>
> TCP - it's documented under the heading "TCP file transfers".
> Either a raw stream is sent in response to a NeedFile
> request (if the needer advertised a port number), or a
> simplified HTTP-style raw request-response, if the needer
> responds to an advertised port in a WillSend promise.
Yes, but all "unicast" (i.e., a connection-oriented protocol)... the "need-er" and the "have-er" engage in a dedicated dialog.
> The two forms are needed in case one system has a (software?)
> firewall.
Hmmm... not sure I see why (though my brain is frozen from lying on the roof for the past hour :-/ )
>> I don't see anything akin to reliable multicast inherent
>> in the protocol (though, conceivably, a host that loses/drops
>> payload can subsequently reissue a "NeedFile" request).
>
> Everything is checked through SHA-1 hashes. If a file reaches
> its expected size but the hash doesn't match, the whole file
> is dropped (since there's no way to know where the error occurred)
> and the search starts afresh.
Understood. You don't split the file into "pieces" (though, conceivably, one could "pre-split" the REAL *files* into smaller "files" at the expense of a tiny bit more overhead).
>> OK, different usage model than what I was targeting.
>
> Yes.
>
>> E.g., consider N (N being large) diskless workstations
>> powering up simultaneously and all wanting to fetch
>> the (identical) image from a single/few server(s).
>> Clearly, a broadcast/reliable multicast scheme would
>> best utilize the network bandwidth (in this case).
>
> Yes, but beware that a single dropped packet at the source
> will cause every recipient to send a NACK. This is the
> problem with massive wide-area reliable multicast protocols:
> they get you out of the data fan-out problem but replace it
> with a NACK fan-in problem instead. That's also why some routers
> have been taught how to aggregate such NACKs.
Yes.
> If you're dealing with tens or hundreds of machines on a
> LAN, it's probably not an issue - the error rate is low and it's
> possible to cope with the NACKs. Even if many of them are
> dropped, it only requires one to get through and the requested
> section will be multicast again.
I'm looking at a hybrid approach. As always, the initial assumptions drive the design...

Let "the" image server multicast (or even broadcast, depending on the domain of the recipients) *THE* image. Let hosts that end up "missing" parts of that image request those parts from their peers (assuming *some* peers have received the parts). So, this can happen concurrent with the rest of the "main" image's delivery (i.e., it doesn't need to be a serial activity).

I have to look at the model and how it would apply in mesh networks (where your peer is often responsible for forwarding traffic from other nodes -- i.e., the "image server") to see what the overall traffic pattern looks like. It might be a win for that peer to broadcast/multicast that "piece" in the event other hosts (i.e., those downstream from *you*) have missed the piece as well.

<frown> Things seem to get harder, not easier :>
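A toy sketch of the repair-assignment step in that hybrid (Python; all names are illustrative):

    import random

    def repair_plan(missing, peer_pieces, image_server="server"):
        # For each missing piece, pick a random peer known to hold
        # it (spreading the repair load); fall back to the image
        # server only when no peer has the piece.
        plan = {}
        for piece in missing:
            holders = [p for p, have in peer_pieces.items()
                       if piece in have]
            plan[piece] = (random.choice(holders) if holders
                           else image_server)
        return plan

    # e.g. we lost pieces 2 and 7; peers A and B caught most of it
    print(repair_plan({2, 7}, {"A": {1, 2, 3}, "B": {2, 5}}))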
>> Here, I'm looking at the scenario where many devices are powered
>> up simultaneously (as above) and need to fetch images over
>> a network
>
> I expect that due to startup timing differences, you'd need
> to sit and listen for a relevant multicast to start for a
> few seconds before requesting it... or to wait after receiving
> such a request for a few seconds before starting to send.
Yes. A short delay, and allow "need-ers" to pick up the stream at arbitrary points (by cutting it into pieces) instead of having to listen from the beginning (and request the parts they missed, later).
> Include identification inside the stream so latecomers realise
> what they're missing and can re-fetch the earlier parts.
Exactly. And, if they can request those parts from peers to distribute the traffic better...
> The other thing we considered doing with massive multicast
> involving overlapping sets of parties was to allocate 2**N
> multicast groups, and take N bits of the SHA-1 of the file
> (or channel ID, if not using content-addressing) to decide
> which IGMP group to send it to. That way a smart router can
> refrain from sending packets down links where no-one might
> be interested.
I'm not sure I follow -- isn't *everyone* interested?
On Tue, 09 Nov 2010 23:57:21 +0100, D Yuniskis
<not.going.to.be@seen.com> wrote:
> Jim Stewart wrote:
>> D Yuniskis wrote:
>>> Jim Stewart wrote:
>>
>>>> Falling back to the educated guess disclaimer,
>>>> I'd say the maximum latency is indeterminate.
>>>>
>>>> It seems that by definition, that if the multicast
>>>> packet collides with another packet, the latency
>>>> will be indeterminate.
>>>
>>> That depends on the buffering in the switch. And,
>>> how the multicast packet is treated *by* the switch.
>> Since to the best of my knowledge, in the event of
>> an ethernet collision, both senders back off a random
>> amount of time then retransmit, I can't see how the
>> switch buffering would make any difference.
>
> The time a packet (*any* packet) spends buffered in
> the switch looks like an artificial transport delay
> (there's really nothing "artificial" about it :> ).
> Hence my comment re: "speed of light" delays.
>
> When you have multicast traffic, the delay through
> the switch can vary depending on the historical
> traffic seen by each targeted port. I.e., if port A
> has a packet already buffered/queued while port B
> does not, then the multicast packet will get *to*
> the device on port B quicker than on port A.
>
> If you have two or more streams and are hoping to
> impose a temporal relationship on them, you need to
> know how they will get to their respective consumers.
Or use RTCP timestamps to synchronize the streams.
>> For that matter, does the sender even monitor for
>> collisions and retransmit in a multicast environment.
>> I guess I don't know...
>
> Multicast is like "shouting from the rooftop -- WITH A
> DEAF EAR". If it gets heard, great. If not, <shrug>.
>
> There are reliable multicast protocols that can be built
> on top of this. They allow "consumers" to request
> retransmission of portions of the "broadcast" that they
> may have lost (since the packet may have been dropped
> at their doorstep or anyplace along the way).
>
> With AV use, this gets to be problematic because you
> want to reduce buffering in the consumers, minimize
> latency,
Latency? Why would you have noticeable latency? You can start playing the media before the buffer is full, then stretch it a bit to allow the buffer to catch up.
> etc. So, the time required to detect a
> missing packet, request a new copy of it and accept
> that replacement copy (there is no guarantee that
> you will receive this in a fixed time period!) conflicts
> with those other goals (assuming you want to avoid
> audio dropouts, video pixelation, etc.).
>
> Remember that any protocol overhead you *add* contributes
> to the problem, to some extent (as it represents more
> network traffic and more processing requirements).
> The "ideal" is just to blast UDP packets down the pipe
> and *pray* they all get caught.
--
Made with Opera's revolutionary e-mail program: http://www.opera.com/mail/
(remove the obvious prefix to reply by mail)
Hi Paul,

Paul Keinanen wrote:
> On Tue, 16 Nov 2010 12:38:18 -0700, D Yuniskis
> <not.going.to.be@seen.com> wrote:
>
>> OK, different usage model than what I was targeting.
>> E.g., consider N (N being large) diskless workstations
>> powering up simultaneously and all wanting to fetch
>> the (identical) image from a single/few server(s).
>> Clearly, a broadcast/reliable multicast scheme would
>> best utilize the network bandwidth (in this case).
>
> One way would be to break up the message into numbered blocks with
> CRCs and use a simple carousel to repeatedly broadcast those blocks.
> This is used e.g. for firmware updates for TV STBs, in which no return
> channel is available.
The back channel, here, would see very little traffic (when compared to the forward channel). So, aside from the requirement that it places on the "image server", its impact is relatively small.

I was thinking of a protocol that could offload this "missed block" portion of the process to peers who *may* have correctly received the block. This should fare well when transposed to a mesh topology -- where your peer may, in fact, be your actual "upstream link" (so why propagate the request all the way upstream if your peer can -- and, ultimately *will* -- handle it?)
> Each receiver accumulates blocks, and if you did not get all blocks
> during the first cycle, you wait for the next carousel cycle to pick
> up the missing blocks.
>
> If there is a return channel, each slave could initially request all
> blocks, after one full cycle check which blocks are missing, and only
> request those missing blocks. After each full cycle, the server would
> check all the update requests received during the cycle and drop those
> blocks from the carousel which have not been requested, thus speeding
> up the update cycle, finally shutting down the carousel.
Rather than "requesting all blocks", I envision requesting a larger object (e.g., a file or an entire image). *Assume* it will arrive intact at each consumer (concurrently). Then, handle the missing parts as I described above. So, the image server is, effectively, the "peer of last resort" if no other *true* peer can satisfy the request -- this might be handled with something as simple as a timeout (i.e., the image server deliberately ignores these requests for some period of time to allow "peers" to attempt to satisfy it, instead).
> If the expected error rate is low, the missing blocks could be
> requested even with unicasts.
>
> If the expected error rate is high, such as in some radio links with
> large blocks, a memory ARQ system could be used, in which blocks
> failing the CRC are stored, and if subsequent reception(s) of the same
> block also fail the CRC check, the previously received blocks are
> accumulated until the accumulated block passes the CRC.
Huh? Perhaps you meant "until the failing block also passes the CRC"?
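For what it's worth, one common flavour of the accumulation Paul describes is bitwise majority voting across the stored (corrupt) copies, kept until the combined block finally passes the CRC. A Python sketch, assuming equal-length copies and (ideally) an odd number of them:

    import zlib

    def majority_combine(copies):
        # Bitwise majority vote across several corrupt receptions
        # of the same block.
        n = len(copies)
        out = bytearray(len(copies[0]))
        for i in range(len(out)):
            for bit in range(8):
                ones = sum((c[i] >> bit) & 1 for c in copies)
                if ones * 2 > n:
                    out[i] |= 1 << bit
        return bytes(out)

    def try_recover(copies, expected_crc):
        # Keep calling this as further bad copies arrive.
        combined = majority_combine(copies)
        return combined if zlib.crc32(combined) == expected_crc else None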
> Alternatively, the few missing blocks could be transmitted again with
> a better ECC coding, or just send the actual error correction bits to
> be combined with the ordinary received data block bits (assuming
> proper interleaving) at the receiver.
Now you've lost me. Why change ECC (you've got horsepower on the Rx end so a "less effective" CRC doesn't really buy you much of anything)?
Hi Boudewijn,

Boudewijn Dijkstra wrote:
> On Tue, 09 Nov 2010 23:57:21 +0100, D Yuniskis
> <not.going.to.be@seen.com> wrote:
>> Jim Stewart wrote:
>>> D Yuniskis wrote:
>>>> Jim Stewart wrote:
>>>
>>>>> Falling back to the educated guess disclaimer,
>>>>> I'd say the maximum latency is indeterminate.
>>>>>
>>>>> It seems that by definition, that if the multicast
>>>>> packet collides with another packet, the latency
>>>>> will be indeterminate.
>>>>
>>>> That depends on the buffering in the switch. And,
>>>> how the multicast packet is treated *by* the switch.
>>> Since to the best of my knowledge, in the event of
>>> an ethernet collision, both senders back off a random
>>> amount of time then retransmit, I can't see how the
>>> switch buffering would make any difference.
>>
>> The time a packet (*any* packet) spends buffered in
>> the switch looks like an artificial transport delay
>> (there's really nothing "artificial" about it :> ).
>> Hence my comment re: "speed of light" delays.
>>
>> When you have multicast traffic, the delay through
>> the switch can vary depending on the historical
>> traffic seen by each targeted port. I.e., if port A
>> has a packet already buffered/queued while port B
>> does not, then the multicast packet will get *to*
>> the device on port B quicker than on port A.
>>
>> If you have two or more streams and are hoping to
>> impose a temporal relationship on them, you need to
>> know how they will get to their respective consumers.
>
> Or use RTCP timestamps to synchronize the streams.
But you can only synchronize to the granularity that the "buffering discrepancy" in the switch allows.
>>> For that matter, does the sender even monitor for
>>> collisions and retransmit in a multicast environment.
>>> I guess I don't know...
>>
>> Multicast is like "shouting from the rooftop -- WITH A
>> DEAF EAR". If it gets heard, great. If not, <shrug>.
>>
>> There are reliable multicast protocols that can be built
>> on top of this. They allow "consumers" to request
>> retransmission of portions of the "broadcast" that they
>> may have lost (since the packet may have been dropped
>> at their doorstep or anyplace along the way).
>>
>> With AV use, this gets to be problematic because you
>> want to reduce buffering in the consumers, minimize
>> latency,
>
> Latency? Why would you have noticeable latency?
Each consumer would need to be designed with a "deep enough" buffer to be able to handle any dropped packets, short-term network overload, etc. I.e., if, statistically, it requires T time to restart/resume an interrupted stream, then your buffer has to be able to maintain the integrity of the "A/V signal" for that entire duration (else the failure to do so becomes a "noticeable event" to the user).

Consider that some causes of "missed packets" can be system-wide. I.e., *many* nodes could have lost the same packet -- or packets adjacent (temporally). In that case, multiple (N) retransmission requests can be destined for the server simultaneously. If those are processed as unicast requests, then multiple packet times (N) may elapse before a particular node's request is satisfied.
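As a crude illustration of the sizing argument (Python; all the timings are invented):

    # How deep must the playout buffer be to ride out one repair?
    detect  = 0.020   # s to notice the gap in the sequence numbers
    request = 0.005   # s for the retransmission request to get out
    resend  = 0.030   # s until the replacement arrives (queuing!)
    margin  = 2.0     # headroom for N near-simultaneous repairs

    depth = margin * (detect + request + resend)
    print(f"playout buffer >= {depth * 1000:.0f} ms of media")  # ~110 ms

Every term there is stochastic, which is why the buffer has to cover the statistical worst case you're willing to tolerate, not the mean.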
> You can start playing
> the media before the buffer is full, then stretch it a bit to allow the
> buffer to catch up.
With video, the user can *tolerate* (though not *enjoy*!) the occasional "frozen frame" -- as long as it doesn't become frequent (persistence of vision). With audio, it's much harder to span any gaps. You can't just replay the past T of the audio stream without it *really* being noticeable.

You also have to be able to re-synchronize the streams *after* the "dropout" (imagine two displays/speakers side by side; the image/sound from each must be in phase for a "pleasing A/V experience" :>). So, you can't just "stretch time" to span the dropout.
>> etc. So, the time required to detect a
>> missing packet, request a new copy of it and accept
>> that replacement copy (there is no guarantee that
>> you will receive this in a fixed time period!) conflicts
>> with those other goals (assuming you want to avoid
>> audio dropouts, video pixelation, etc.).
>>
>> Remember that any protocol overhead you *add* contributes
>> to the problem, to some extent (as it represents more
>> network traffic and more processing requirements).
>> The "ideal" is just to blast UDP packets down the pipe
>> and *pray* they all get caught.