Bug 1726 - minstrel rate manager doesn't save state
Status: RESOLVED FIXED
Product: ns-3
Classification: Unclassified
Component: wifi
Version: ns-3.17
Hardware: All
OS: All
Importance: P5 normal
Assigned To: Daniel L.
Reported: 2013-07-01 14:05 EDT by Jonathan Ling
Modified: 2015-01-25 15:08 EST
CC: 6 users



Attachments
- Move the MinstrelTable to the remote station (18.40 KB, patch), 2013-07-17 16:10 EDT, Matías Richart
- Patch for retries (27.05 KB, text/x-c++src), 2013-07-18 14:50 EDT, Jonathan Ling
- Minstrel retry patch - 4 files in zip (19.42 KB, application/octet-stream), 2013-07-18 14:54 EDT, Jonathan Ling
- valgrind output (text file) (11.47 KB, application/octet-stream), 2014-11-14 13:04 EST, Tom Henderson
- Correct commented code and couts (2.50 KB, patch), 2015-01-07 18:12 EST, Matías Richart
- Cleanup commented code and correct bug mentioned in comment #19 (3.83 KB, patch), 2015-01-08 17:12 EST, Matías Richart
- Example for using rate control algorithms (9.54 KB, patch), 2015-01-08 17:13 EST, Matías Richart

Description Jonathan Ling 2013-07-01 14:05:40 EDT
It seems that the station state isn't completely saved, i.e. the MinstrelRate table should be stored in MinstrelWifiRemoteStation rather than in the rate manager class itself.

This means that the rate manager bases its decisions only on the last 100 ms, rather than using the exponential weighting as it should.

Am I missing something?
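
To make the idea concrete, here is a rough sketch of where I think the table belongs (simplified stand-in types, not the actual ns-3 declarations):

#include <cstdint>
#include <vector>

// Simplified stand-ins for the ns-3 types involved.
struct RateInfo                 // statistics kept per supported rate
{
  double   ewmaProb;            // exponentially weighted success probability
  uint32_t numRateAttempt;      // attempts since the last stats update
  uint32_t numRateSuccess;      // successes since the last stats update
};

typedef std::vector<RateInfo> MinstrelRate;

// The point of the fix: the table lives in the per-station state, so each
// associated peer keeps its own history across UpdateStats intervals,
// instead of one table shared by the whole rate manager.
struct MinstrelWifiRemoteStation
{
  MinstrelRate m_minstrelTable;
};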
Comment 1 Matías Richart 2013-07-15 14:50:23 EDT
(In reply to comment #0)
> It seems that the station state isn't completely saved, i.e. MinstrelRate table
> should be stored in MinstrelWifiRemoteStation rather in the rate manager class
> itself.
> 
> This means that the rate manager bases decisions only on the last 100ms, rather
> than using the exponential weighting like it should.
> 
> Am I missing something ?

Hi. 
I think you are right. I was going to report a bug and found yours.

In particular I found 2 problems:
- The table is maintained in the Manager and so is global for the node, but it should be a member of the RemoteStation, i.e. one table for each associated station.
- The table is reset at the end of UpdateStats, losing information for the next update.

Matías
Comment 2 Jonathan Ling 2013-07-17 14:41:57 EDT
Matías,

Thanks for your confirmation.  I will move the table into the MinstrelWifiRemoteStation class, and post an update for testing.

> The table is reset at the end of UpdateStats losing information for the
> next update.

I noticed removing this line allows correct operation when the STA or AP has only one connection.

This might be helpful in some cases...
Comment 3 Matías Richart 2013-07-17 16:10:37 EDT
Created attachment 1642 [details]
Move the MinstrelTable to the remote station
Comment 4 Matías Richart 2013-07-17 16:11:17 EDT
(In reply to comment #2)
> Matías,
> 
> Thanks for your confirmation.  I will move the table into the
> MinstrelWifiRemoteStation class, and post an update for testing.
> 
> > The table is reset at the end of UpdateStats losing information for the
> > next update.
> 
> I noticed removing this line allows correct operation when the STA or AP has
> only one connection.
> 
> This might be helpful in some cases...

Hi Jonathan.
I've already made some modifications and they seem to work.
I've attached them; tell me what you think.
Comment 5 Jonathan Ling 2013-07-18 14:49:18 EDT
Matías,
I would say that this seems to work.  However, our troubles are not over yet.  If you look at the minstrel table, it isn't updated correctly when there are packet retries.  I've uploaded proposed changes (unfortunately not in patch form), along with extra code to print out the table for debugging purposes.

Let me know if you agree.
Comment 6 Jonathan Ling 2013-07-18 14:50:56 EDT
Created attachment 1644 [details]
Patch for retries
Comment 7 Jonathan Ling 2013-07-18 14:54:09 EDT
Created attachment 1645 [details]
Minstrel retry patch - 4 files in zip
Comment 8 Matías Richart 2013-07-19 13:20:13 EDT
(In reply to comment #7)
> Created attachment 1645 [details]
> Minstrel retry patch - 4 files in zip

Hi Jonathan.
I think I understand your changes, but could you explain a bit more what the implications are, or what was wrong before?

What I understand of the current minstrel implementation is that numRateAttempt is updated when all the retransmissions (MaxSlrc times) for a given frame have failed. That's why it is updated in FinalDataFail.
It is difficult to determine whether the implementation is correct (whether it follows the original design), as this implementation is based on a low latency system and Minstrel is designed for a high latency system (see https://www.nsnam.org/bugzilla/show_bug.cgi?id=889).

Do your changes imply a different behaviour of the algorithm?
Comment 9 Jonathan Ling 2013-07-23 09:57:48 EDT
Apologies for not being more specific.

Basically, for each packet sent at a particular rate, we want the success or failure to be reflected in the minstrel table, i.e. in the counters and the EWMA throughput.  This is then used to choose the next rate.

The current implementation does something odd when it comes to retries.  It seems to ignore the different rates chosen for the 2nd, 3rd, etc. retries, and only the original rate gets updated, incorrectly.  This results in wrong throughput estimates.

I will try to get an example out using the new debugging mechanism to print the minstrel table.
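
For reference, this is the kind of per-rate update I expect once per stats interval (a sketch only; the 75% EWMA weighting is the default of the original Minstrel and is an assumption here, not necessarily what ns-3 uses):

#include <cstdint>

// EWMA update of the success probability for one rate, run once per stats
// interval. ewmaLevel = 75 puts 75% of the weight on history and 25% on the
// new sample.
double UpdateEwmaProb (double oldEwma, uint32_t attempts, uint32_t successes,
                       double ewmaLevel = 75.0)
{
  if (attempts == 0)
    {
      return oldEwma; // no new samples: keep the history, do not reset it
    }
  double sampleProb = 100.0 * successes / attempts; // percent
  return (sampleProb * (100.0 - ewmaLevel) + oldEwma * ewmaLevel) / 100.0;
}

The expected throughput for a rate then follows from this probability and the per-rate transmission time, and that is what should drive the choice of the next rate.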
Comment 10 Daniel L. 2013-11-06 10:33:22 EST
Hi everyone,

Sorry for the late response, I'm not very familiar with Minstrel so it took me some time to understand what's going on here.

Thanks for the patch. I looked through the patch and moving the table to WifiRemoteStation makes sense. I think that is one obvious bug here.

I'm not sure about the low/high latency, though. I thought (please correct me if I'm wrong) that UpdateStats is done roughly every 100 ms? Does that already make it high latency? I mean, the first thing UpdateStats does is check whether it's time to update the statistics or not.
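
Something like this guard at the top of UpdateStats is what I mean (paraphrased and simplified, not the actual ns-3 code; the names and microsecond bookkeeping are illustrative):

#include <cstdint>

struct StationState
{
  uint64_t nextStatsUpdateUs; // simulation time of the next allowed update, in microseconds
};

// Nothing is recomputed until the update interval (default 100 ms) has elapsed.
void UpdateStats (StationState &st, uint64_t nowUs, uint64_t updateIntervalUs = 100000)
{
  if (nowUs < st.nextStatsUpdateUs)
    {
      return; // not time yet
    }
  st.nextStatsUpdateUs = nowUs + updateIntervalUs;
  // ... recompute EWMA probabilities and re-rank the rates here ...
}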
Comment 11 Daniel L. 2013-11-11 11:52:20 EST
As Jonathan pointed out, the current implementation is incorrectly updating retry statistics (e.g. the failed look-around rate is counted towards the "normal" rate).

Also, I think that the current re-transmission counts are incorrect? My understanding is that, for each rate in the retry chain, you can retry multiple times within a 6 ms time limit (for a total of 24 ms over the retry chain). The current implementation seems to change the rate as soon as a transmission fails. (The code has an if-statement to prevent that, but I think the limit is always 1?)
Comment 12 Matías Richart 2014-06-04 16:27:47 EDT
(In reply to Daniel L. from comment #11)
> As Jonathan pointed out, the the current implementation is incorrectly
> updating retry statistics (e.g. the failed look around rate is counted
> towards the "normal" rate).
> 
> Also, I think that the current re-transmission counts are incorrect? My
> understanding is that, for each rate in the retry chain, you can retry
> multiple times within a 6ms time limit (for a total of 24ms in the retry
> chain). The current implementation seems to change the rate as soon as the
> transmission failed. (The code has an if-statement to prevent that, but I
> think the limit is always 1?)

Hi Daniel. Sorry, I haven't had time to look at this sooner.

I think you are right in everything you say. It seems that the ns-3 implementation is missing the calculation of the retry count for each segment of the retry chain, so it is always 1.
In the original minstrel code at linux-wireless, the retry counts are calculated so that the hardware does not stay more than 6 ms at each rate, as you mentioned.

I'll try to implement a patch with all the corrections we have been discussing.
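
Roughly, the calculation I have in mind looks like this (a sketch only; the 6 ms segment budget and the per-attempt overhead model are assumptions taken from the linux-wireless code, not from ns-3):

#include <cstdint>

// Choose how many retries to allow at one rate of the retry chain so that the
// time spent at that rate stays within a segment budget (~6 ms), as the
// linux-wireless Minstrel does. All times are in microseconds.
uint32_t RetryCountForRate (uint32_t txTimeUs,     // duration of one attempt at this rate
                            uint32_t overheadUs,   // per-attempt overhead (ACK timeout, IFS, ...)
                            uint32_t segmentUs = 6000)
{
  if (txTimeUs + overheadUs == 0)
    {
      return 1; // guard against a degenerate input
    }
  uint32_t count = 1;                       // always at least one attempt
  uint32_t elapsed = txTimeUs + overheadUs;
  while (elapsed + txTimeUs + overheadUs <= segmentUs)
    {
      count++;
      elapsed += txTimeUs + overheadUs;
    }
  return count; // slow rates get 1, fast rates get several retries
}

With slow rates this gives a single attempt and with fast rates several, which is what keeps the whole 4-segment chain around the 24 ms Daniel mentioned.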
Comment 13 Tom Henderson 2014-11-12 13:51:52 EST
Fixed in changeset 11059:2fe905238013 (pushed the patch from Daniel's code review issue here: https://codereview.appspot.com/59710043/).

This also resolved bug 1989.
Comment 14 Tom Henderson 2014-11-14 13:03:20 EST
We need to reopen this due to valgrind issues with the recently committed patch, and the need to clean up some commented out code (and code that calls std::cout) in the previously committed patch.  I'll add the valgrind output as an attachment.
Comment 15 Tom Henderson 2014-11-14 13:04:45 EST
Created attachment 1918 [details]
valgrind output (text file)
Comment 16 Matías Richart 2015-01-07 18:11:09 EST
It appears that the memory problem is solved in changeset 11066:d78d77e4b920.
I attach a patch with the corrections for the commented code and the couts.
Comment 17 Matías Richart 2015-01-07 18:12:10 EST
Created attachment 1937 [details]
Correct commented code and couts
Comment 18 Tom Henderson 2015-01-07 18:18:51 EST
(In reply to Matías Richart from comment #16)
> It appears that the memory problem is solved in changeset 11066	d78d77e4b920.
> I attach a patch with the corrections of the commented code and the couts.

Matias, if we were to apply your cleanup patch that you just submitted, do you think that this bug can now be closed?  Are you aware of any other Minstrel issues?

We don't have any test scripts for validating minstrel.  Do you have a good reference/example for using Minstrel that we might be able to add to the codebase?
Comment 19 Matías Richart 2015-01-08 17:10:41 EST
(In reply to Tom Henderson from comment #18)
> (In reply to Matías Richart from comment #16)
> > It appears that the memory problem is solved in changeset 11066	d78d77e4b920.
> > I attach a patch with the corrections of the commented code and the couts.
> 
> Matias, if we were to apply your cleanup patch that you just submitted, do
> you think that this bug can now be closed?  Are you aware of any other
> Minstrel issues?
> 
> We don't have any test scripts for validating minstrel.  Do you have a good
> reference/example for using Minstrel that we might be able to add to the
> codebase?

Tom, I have just found a bug while running an example: it was giving a segmentation fault when testing two nodes out of range with logging at debug level. I'm attaching a new patch with the cleanup and this correction.

Yes, I have an example with which it is possible to reproduce the tests done here: http://www.nsnam.org/wiki/GSOC2009Minstrel#Validation_Plan. I couldn't find the scripts used for that validation anywhere.
I attach the example as a patch.
The results are similar to the ones on that page.

I think a test with 1 AP and 2 associated STAs is necessary in order to exercise some of the corrections of this bug.

The algorithm seems to work well. However, I found it difficult to come up with a thorough example that tests the correctness of the implementation.
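
In case it helps, selecting the rate control algorithm in a script only requires naming the remote station manager; below is a minimal sketch of an ad hoc setup using the standard helpers of this ns-3 era (applications and traffic omitted; this is not the attached example itself):

#include "ns3/core-module.h"
#include "ns3/network-module.h"
#include "ns3/mobility-module.h"
#include "ns3/wifi-module.h"

using namespace ns3;

int main (int argc, char *argv[])
{
  NodeContainer nodes;
  nodes.Create (2);

  // Minstrel is selected simply by naming it as the remote station manager.
  WifiHelper wifi = WifiHelper::Default ();
  wifi.SetStandard (WIFI_PHY_STANDARD_80211a);
  wifi.SetRemoteStationManager ("ns3::MinstrelWifiManager");

  YansWifiChannelHelper channel = YansWifiChannelHelper::Default ();
  YansWifiPhyHelper phy = YansWifiPhyHelper::Default ();
  phy.SetChannel (channel.Create ());

  NqosWifiMacHelper mac = NqosWifiMacHelper::Default ();
  mac.SetType ("ns3::AdhocWifiMac");

  NetDeviceContainer devices = wifi.Install (phy, mac, nodes);

  MobilityHelper mobility;
  mobility.SetMobilityModel ("ns3::ConstantPositionMobilityModel");
  mobility.Install (nodes);

  Simulator::Stop (Seconds (10.0));
  Simulator::Run ();
  Simulator::Destroy ();
  return 0;
}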
Comment 20 Matías Richart 2015-01-08 17:12:28 EST
Created attachment 1938 [details]
Cleanup commented code and correct bug mentioned in comment #19
Comment 21 Matías Richart 2015-01-08 17:13:38 EST
Created attachment 1939 [details]
Example for using rate control algorithms
Comment 22 Tom Henderson 2015-01-25 15:08:13 EST
Latest patches pushed in changeset 11154:305803f4125c.