Bugzilla – Bug 1726
minstrel rate manager doesn't save state
Last modified: 2015-01-25 15:08:13 EST
It seems that the station state isn't completely saved; i.e., the MinstrelRate table should be stored in MinstrelWifiRemoteStation rather than in the rate manager class itself. This means that the rate manager bases decisions only on the last 100 ms, rather than using the exponential weighting as it should. Am I missing something?
(In reply to comment #0)
> It seems that the station state isn't completely saved, i.e. MinstrelRate table
> should be stored in MinstrelWifiRemoteStation rather in the rate manager class
> itself.
>
> This means that the rate manager bases decisions only on the last 100ms, rather
> than using the exponential weighting like it should.
>
> Am I missing something ?

Hi. I think you are right. I was going to report a bug and found yours. In particular, I found two problems:
- The table is maintained in the manager and is therefore global for the node, but it should be a member of the RemoteStation, i.e. one table per associated node.
- The table is reset at the end of UpdateStats, losing information for the next update.

Matías
Matías,

Thanks for your confirmation. I will move the table into the MinstrelWifiRemoteStation class and post an update for testing.

> The table is reseted at the end of UpdateStats losing information for the next update.

I noticed that removing this line allows correct operation when the STA or AP has only one connection. This might be helpful in some cases...
Created attachment 1642 [details] Move the MinstrelTable to the remote station
(In reply to comment #2)
> Thanks for your confirmation. I will move the table into the
> MinstrelWifiRemoteStation class, and post an update for testing.
>
> I noticed removing this line allows correct operation when the STA or AP has
> only one connection.

Hi Jonathan. I've already made some modifications, and they seem to work. I've attached them. Tell me what you think.
Matías,

I would say that this seems to work. However, our troubles are not over yet. If you look at the minstrel table, it isn't updated correctly when there are packet retries. I've uploaded proposed changes (unfortunately not in patch form), along with extra code to print out the table for debugging purposes. Let me know if you agree.
Created attachment 1644 [details] Patch for retries
Created attachment 1645 [details] Minstrel retry patch - 4 files in zip
(In reply to comment #7)
> Created attachment 1645 [details]
> Minstrel retry patch - 4 files in zip

Hi Jonathan. I think I understand your changes, but could you explain a bit more what the implications are, or what was wrong before?

What I understand of the current minstrel implementation is that numRateAttempt is updated only when all the retransmissions (MaxSlrc times) for a given frame have failed. That's why it is updated in FinalDataFail.

It is difficult to determine whether the implementation is correct (i.e. whether it follows the original design), as this implementation is based on a low-latency system while Minstrel is designed for a high-latency system (see https://www.nsnam.org/bugzilla/show_bug.cgi?id=889).

Do your changes imply a different behaviour of the algorithm?
Apologies for not being more specific. Basically, for each packet sent at a particular rate, we want the success or failure to be reflected in the minstrel table, i.e. in the counters and the EWMA throughput. This is then used to choose the next rate.

The current implementation does something odd when it comes to retries. It seems to ignore the different rates chosen for the 2nd, 3rd, etc. retries, and only the original rate gets updated, incorrectly. This results in wrong throughput estimates. I will try to get an example out using the new debugging mechanism to print the minstrel table.
Hi everyone,

Sorry for the late response; I'm not very familiar with Minstrel, so it took me some time to understand what's going on here. Thanks for the patch.

I looked through the patch, and moving the table to WifiRemoteStation makes sense. I think that is one obvious bug here.

I'm not sure about the low/high latency distinction, though. I thought (please correct me if I'm wrong) that UpdateStats runs roughly every 100 ms? Does that make it high latency already? After all, the first thing UpdateStats does is check whether it's time to update the statistics or not.
As Jonathan pointed out, the current implementation incorrectly updates retry statistics (e.g. a failed look-around rate is counted towards the "normal" rate).

Also, I think the current retransmission counts are incorrect. My understanding is that, for each rate in the retry chain, you can retry multiple times within a 6 ms time limit (for a total of 24 ms across the retry chain). The current implementation seems to change the rate as soon as a transmission fails. (The code has an if-statement to prevent that, but I think the limit is always 1?)
(In reply to Daniel L. from comment #11)
> As Jonathan pointed out, the the current implementation is incorrectly
> updating retry statistics (e.g. the failed look around rate is counted
> towards the "normal" rate).
>
> Also, I think that the current re-transmission counts are incorrect? My
> understanding is that, for each rate in the retry chain, you can retry
> multiple times within a 6ms time limit (for a total of 24ms in the retry
> chain). The current implementation seems to change the rate as soon as the
> transmission failed. (The code has an if-statement to prevent that, but I
> think the limit is always 1?)

Hi Daniel. Sorry, I haven't had time to look at this sooner. I think you are right in everything you say. It seems that the ns-3 implementation is missing the calculation of the retry count for each segment of the retry chain, so it is always 1. In the original minstrel code in linux-wireless, the retry counts are calculated so that the hardware does not stay more than 6 ms at each rate, as you mentioned.

I'll try to implement a patch with all the corrections we have been discussing.
Fixed in changeset 11059:2fe905238013 (pushed patch found in Daniel's code review issue here: https://codereview.appspot.com/59710043/). This also resolved bug 1989.
We need to reopen this due to valgrind issues with the recently committed patch, and the need to clean up some commented out code (and code that calls std::cout) in the previously committed patch. I'll add the valgrind output as an attachment.
Created attachment 1918 [details] valgrind output (text file)
It appears that the memory problem is solved in changeset 11066:d78d77e4b920. I attach a patch with the corrections for the commented-out code and the couts.
Created attachment 1937 [details] Correct commented code and couts
(In reply to Matías Richart from comment #16)
> It appears that the memory problem is solved in changeset 11066 d78d77e4b920.
> I attach a patch with the corrections of the commented code and the couts.

Matias, if we were to apply the cleanup patch that you just submitted, do you think that this bug can now be closed? Are you aware of any other Minstrel issues?

We don't have any test scripts for validating minstrel. Do you have a good reference/example for using Minstrel that we might be able to add to the codebase?
(In reply to Tom Henderson from comment #18)
> Matias, if we were to apply your cleanup patch that you just submitted, do
> you think that this bug can now be closed? Are you aware of any other
> Minstrel issues?
>
> We don't have any test scripts for validating minstrel. Do you have a good
> reference/example for using Minstrel that we might be able to add to the
> codebase?

Tom,

I have just found a bug while running an example: it was giving a segmentation fault when testing two nodes out of range with logging at debug level. I attach a new patch with the cleanup and this correction.

Yes, I have an example with which it is possible to reproduce the tests done here: http://www.nsnam.org/wiki/GSOC2009Minstrel#Validation_Plan. I couldn't find the scripts used for that validation anywhere, so I attach the example as a patch. The results are similar to the ones on that page.

I think a test with 1 AP and 2 associated STAs is necessary to exercise some of the corrections from this bug. The algorithm seems to work well; however, I found it difficult to come up with a thorough example that tests the correctness of the implementation.
Created attachment 1938 [details] Cleanup commented code and correct bug mentioned in comment #19
Created attachment 1939 [details] Example for using rate control algorithms
Latest patches pushed in changeset 11154:305803f4125c.