Hello fellow developers,

I would like your opinion about the case below:

Mid 2014 I finished the prototypes of a new product using a WiFi module from a manufacturer I will call 'X', and some other peripherals on an SPI bus. I did the usual tests I perform on a new board: check whether some test code runs on the micro and whether this test code can access all peripherals and I/O on the board. This was the case: I could talk to all SPI devices, including the WiFi module. I could send commands to the WiFi module and the WiFi module responded correctly.

So I sent the design to our manufacturer to have 240 boards produced. This is a procedure I have always followed successfully; it buys me time to finish the firmware.

So I continued developing the firmware and soon found that the WiFi module did not work according to the specifications. First of all, once selected with the SPI CS signal, it would stay selected and block access to all other devices on the SPI bus. An e-mail to manufacturer 'X' resulted in a firmware update which fixed that problem. Interesting detail: this module had been on the market for over 2 years already...

Then it appeared that the module would stop communicating seemingly at random, requiring a reset to get it back on track. This module works "synchronously", meaning you send a command and the module responds with a status, data or a simple "OK". It is then ready for a new command. That is what the datasheet and the command reference say.

After tens of e-mails to X with trace files of the SPI communications, I got the message "you're doing everything right, the problem seems to be in our module". By then we were 2 months further along.

After some more testing and exchanging e-mails and trace files, the response from X was: "if you receive the "OK" from the module, you need to wait 20ms before sending a new command". WTF???? This is not in the datasheet, and it severely throttles the bandwidth if you need to keep 4 TCP connections active, where each connection needs a command to send, wait for the OK, wait 20ms, poll for received data, wait for the OK, wait another 20ms, and then repeat this 3 times for the other 3 sockets. And X claimed a sustained data rate of 4.5Mb/s ....

It appeared that X had never discovered this problem in 2 years because they NEVER built an embedded system to test their own module in a real situation. They had always tested using some sort of USB-to-SPI gateway connected to a PC, running Python scripts that executed the commands. This process has an inherent system delay of around 16ms!

To cut an already long story short: X and I exchanged about 260 e-mails, I received 5 firmware updates, and still X could not solve the problem and put the blame on Broadcom and the combo FreeRTOS/LwIP.

Funnily enough, I now use a Lantronix xPico WiFi module, which is based on the same Broadcom chip, and this module runs flawlessly.

This entire adventure has cost me over a year, including a complete redesign with the new xPico WiFi. Not to mention the trust of our customers, who have been waiting for this new product for an extra year.

Meindert
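To put a rough number on the throttling described above, here is a back-of-the-envelope sketch. It assumes that each socket needs two commands per round (one send, one poll), that only the 20 ms waits cost time (SPI transfer and module processing are ignored), and that one send command carries a hypothetical 1460-byte payload. The function names and the payload size are illustrative only and do not come from X's documentation.

```python
# Rough upper bound on throughput when every "OK" must be followed by a wait.
# Assumptions (not from the datasheet): 2 commands per socket per round,
# and all time other than the guard waits is ignored.

GUARD_MS = 20            # wait after each "OK", per X's reply
SOCKETS = 4              # TCP connections kept active
COMMANDS_PER_SOCKET = 2  # one send, one poll for received data


def min_cycle_ms(sockets=SOCKETS, commands=COMMANDS_PER_SOCKET, guard_ms=GUARD_MS):
    """Shortest possible round over all sockets, counting only guard waits."""
    return sockets * commands * guard_ms


def max_throughput_bps(payload_bytes, sockets=SOCKETS,
                       commands=COMMANDS_PER_SOCKET, guard_ms=GUARD_MS):
    """Upper bound on aggregate send throughput in bit/s."""
    cycle = min_cycle_ms(sockets, commands, guard_ms)
    return sockets * payload_bytes * 8 * 1000 / cycle


def payload_needed_bytes(target_bps, sockets=SOCKETS,
                         commands=COMMANDS_PER_SOCKET, guard_ms=GUARD_MS):
    """Payload per send command needed to reach target_bps under the same bound."""
    cycle = min_cycle_ms(sockets, commands, guard_ms)
    return target_bps * cycle / (8 * 1000 * sockets)


if __name__ == "__main__":
    print("round time:", min_cycle_ms(), "ms")
    print("bound at 1460 bytes/command:", max_throughput_bps(1460), "bit/s")
    print("payload for 4.5 Mb/s:", payload_needed_bytes(4_500_000), "bytes")
```

Under these assumptions a round over the four sockets takes at least 160 ms, which caps the aggregate rate at roughly 0.29 Mb/s. Reaching the claimed 4.5 Mb/s would need about 22,500 bytes per send command.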
WiFi module Woes
Started by ●March 9, 2016
Reply by ●March 9, 2016
On 09.3.2016 11:16, customwarempx@gmail.com wrote:
> [...]

The secrecy wifi is covered with is so deep that one cannot help thinking in conspiracy theory mode, I suppose. The modules one could write a driver for, so that they would look like an Ethernet cable to the TCP/IP stack, are just secret; only politburo members may access their firmware command syntax. And I don't know whether all the politburo members may access it or only the most senior ones (Google, MS, Apple).

So the rest of the world is left using crippled thingies like the one you describe, which want to sell you a defunct TCP/IP stack over (as you have found out) defunct hardware.

Wifi is just not a viable option for a product unless you are a senior politburo member, that's how life is.

Sorry for the rant, I know it won't help you much, but you pressed my red button with your post...

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/
Reply by ●March 9, 2016
On 3/9/2016 4:16 AM, customwarempx@gmail.com wrote:
> [...]

I feel your pain. The lesson here is that a third-party product should never be assumed to work correctly until you have used it in a design. Organizing your project to optimize development time rather than mitigate risk was the problem. Had you properly wrung out the module before you sent out for boards, you could have found the problem and designed out the module when it didn't work up to snuff.

--
Rick
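One way to wring out a command/response module before committing to boards is to sweep the gap between commands, starting from back-to-back, rather than relying on whatever delay the test setup happens to impose. The sketch below does this against a purely hypothetical `FakeModule` that locks up deterministically when a command arrives too soon after its last "OK". The module in the original post failed seemingly at random, so a real test would need real hardware and far more iterations. All names here are made up for illustration, and time is simulated rather than measured.

```python
# Gap-sweep sketch against a simulated module. Nothing here talks to hardware;
# FakeModule is a hypothetical stand-in, and time is a simple counter in ms.


class FakeModule:
    """Hangs if a command arrives less than required_gap_ms after the last "OK"."""

    def __init__(self, required_gap_ms):
        self.required_gap_ms = required_gap_ms
        self.last_ok_ms = None
        self.hung = False

    def command(self, now_ms):
        if self.hung:
            return None  # no response until reset
        too_soon = (self.last_ok_ms is not None
                    and now_ms - self.last_ok_ms < self.required_gap_ms)
        if too_soon:
            self.hung = True
            return None
        self.last_ok_ms = now_ms
        return "OK"


def survives(module, gap_ms, n_commands=1000):
    """Issue n_commands spaced gap_ms apart; True if every one returns "OK"."""
    now_ms = 0
    for _ in range(n_commands):
        if module.command(now_ms) != "OK":
            return False
        now_ms += gap_ms
    return True


def smallest_safe_gap_ms(make_module, gaps_ms):
    """Return the smallest tested gap that a fresh module survives, else None."""
    for gap_ms in sorted(gaps_ms):
        if survives(make_module(), gap_ms):
            return gap_ms
    return None


if __name__ == "__main__":
    print(smallest_safe_gap_ms(lambda: FakeModule(20), range(0, 41, 5)))
```

A test like this only helps if the host can actually issue commands back-to-back; a setup with a built-in delay between commands would never exercise the short gaps.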
Reply by ●March 9, 2016
On 3/9/2016 12:26 PM, rickman wrote:
> On 3/9/2016 4:16 AM, customwarempx@gmail.com wrote:
>> [...]
>
> [...]

A real-world case where early evaluation saved our butts...

We were designing a product with a GPS function using a module. We narrowed our choices to two vendors and got evals. We tested them in real-world conditions (well, real signals in a lab with an antenna on the roof) and found one very significant difference. One brand could not meet its time-to-first-fix spec. We invited them to show us what we were doing wrong and they declined. So we went with the guys whose module worked. Until then my preference had been to use the product that failed our test.

Modules are great if they work. Even large vendors with a significant presence can produce crap.

--
Rick







