Bug 2080 - MPTCP - FullMesh: queue_delayed_work is only successful on the first call.
MPTCP - FullMesh: queue_delayed_work is only successful on the first call.
Status: RESOLVED FIXED
Product: dce
Classification: Unclassified
Component: kernel
unspecified
PC Linux
: P5 normal
Assigned To: Hajime Tazaki
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2015-03-18 07:56 EDT by Richard Withnell
Modified: 2015-05-25 10:43 EDT (History)
1 user (show)

See Also:


Attachments
Patch for arch/sim/workqueue.c to fix bug. (488 bytes, patch)
2015-03-18 07:56 EDT, Richard Withnell
Details | Diff
Simulation file to replicate bug. (5.59 KB, text/x-c++src)
2015-03-18 08:25 EDT, Richard Withnell
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Richard Withnell 2015-03-18 07:56:36 EDT
Created attachment 1995 [details]
Patch for arch/sim/workqueue.c to fix bug.

I'm trying to add multiple addresses to a nodes net device at arbitrary time values throughout a simulation. 

In mptcp_fullmesh.c, if mptcp_address_worker() exits, subsequent addresses that are added to a node do not get put on to the work queue, resulting in the second address not being added to the meta socket. 

To replicate, add an address to a net device at Second(x), then add another address at Second(x + 1). The second address will not cause mptcp_address_worker() to be called. 

If multiple addresses are added at Second(x), mptcp_address_worker() processes all events before exiting, and they are added successfully.  

In add_pm_event():

880	if (!delayed_work_pending(&fm_ns->address_worker))
881		queue_delayed_work(mptcp_wq, &fm_ns->address_worker,
882				   msecs_to_jiffies(500));

Delayed_work_pending() returns the expected result, but queue_delayed_work() returns false. 

I've fixed this with the following patch; Is there any reason for using WORK_STRUCT_PENDING instead of WORK_STRUCT_PENDING_BIT here?

diff --git a/arch/sim/workqueue.c b/arch/sim/workqueue.c
index fead47b..47dd75f 100644
--- a/arch/sim/workqueue.c
+++ b/arch/sim/workqueue.c
@@ -144,7 +144,7 @@ bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
       return queue_work (wq, work);
     }
 
-  if (!test_and_set_bit(WORK_STRUCT_PENDING, work_data_bits(work)))
+  if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work)))
     {
       sim_assert (!timer_pending (timer));
       dwork->wq = wq;
Comment 1 Richard Withnell 2015-03-18 08:25:56 EDT
Created attachment 1996 [details]
Simulation file to replicate bug.

I have attached my simulation file that demonstrates the bug. 

Running the simulation file, files-0/var/log/messages shows;

addr4_event_handler created event for 192.168.0.10, code 1 prio 0
addr4_event_handler created event for 10.1.0.10, code 1 prio 0
mptcp_alloc_mpcb: created mpcb with token 0x23b56742
mptcp_add_sock: token 0x23b56742 pi 1, src_addr:192.168.0.10:41546 dst_addr:172.16.1.1:5001, cnt_subflows now 1
mptcp_close: Close of meta_sk with tok 0x23b56742
mptcp_del_sock: Removing subsock tok 0x23b56742 pi:1 state 7 is_meta? 0


After applying the patch I get the following, which matches the behavior outside of NS3;

addr4_event_handler created event for 192.168.0.10, code 1 prio 0
addr4_event_handler created event for 10.1.0.10, code 1 prio 0
mptcp_alloc_mpcb: created mpcb with token 0x23b56742
mptcp_add_sock: token 0x23b56742 pi 1, src_addr:192.168.0.10:41546 dst_addr:172.16.1.1:5001, cnt_subflows now 1
mptcp_add_sock: token 0x23b56742 pi 2, src_addr:0.0.0.0:0 dst_addr:0.0.0.0:0, cnt_subflows now 2
mptcp_init4_subsockets: token 0x23b56742 pi 2 src_addr:10.1.0.10:0 dst_addr:172.16.1.1:5001
mptcp_close: Close of meta_sk with tok 0x23b56742
mptcp_del_sock: Removing subsock tok 0x23b56742 pi:2 state 7 is_meta? 0
mptcp_del_sock: Removing subsock tok 0x23b56742 pi:1 state 7 is_meta? 0


Assuming there are no issues with my suggested patch, all instances of test_set_bit should probably be changed to use WORK_STRUCT_PENDING_BIT.

As follows;

diff --git a/arch/sim/workqueue.c b/arch/sim/workqueue.c
index fead47b..d3f8b70 100644
--- a/arch/sim/workqueue.c
+++ b/arch/sim/workqueue.c
@@ -96,7 +96,7 @@ bool queue_work_on(int cpu, struct workqueue_struct *wq,
 {
   int ret = 0;
 
-  if (!test_and_set_bit (WORK_STRUCT_PENDING, work_data_bits(work))) {
+  if (!test_and_set_bit (WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
       list_add_tail (&work->entry, &wq->list);
       sim_task_wakeup (workqueue_task (wq));
       ret = 1;
@@ -119,7 +119,7 @@ void flush_workqueue(struct workqueue_struct *wq)
 bool cancel_work_sync(struct work_struct *work)
 {
   int retval = 0;
-  if (!test_and_set_bit(WORK_STRUCT_PENDING, work_data_bits(work)))
+  if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work)))
     {
       // work was not yet queued
       return 0;
@@ -144,7 +144,7 @@ bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
       return queue_work (wq, work);
     }
 
-  if (!test_and_set_bit(WORK_STRUCT_PENDING, work_data_bits(work)))
+  if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work)))
     {
       sim_assert (!timer_pending (timer));
       dwork->wq = wq;
Comment 2 Hajime Tazaki 2015-03-18 22:39:54 EDT
thanks for the report.

which version (i.e., branch) of net-next-sim tree are you using ?


I noticed this issue and fixed it in the HEAD branch (sim-ns3-dev-branch), but not backported yet..

https://github.com/direct-code-execution/net-next-sim/blob/sim-ns3-dev-branch/arch/lib/workqueue.c
Comment 3 Richard Withnell 2015-03-19 11:35:09 EDT
I'm currently on sim-ns3-3.14.0-branch.
Comment 4 Hajime Tazaki 2015-05-25 10:43:39 EDT
I've fixed this issue on the sim-ns3-3.14.0-branch: no other branches are ported yet.

https://github.com/direct-code-execution/net-next-sim/commit/d6ed19f948135e486bc55aaa886f26d0d7ae2b91

In order to track this issue, I've updated the test case for mptcp in ns-3-dce, based
on the example provided by Richard.

http://code.nsnam.org/ns-3-dce/rev/03b20ee32e51

Thanks, Richard !