[Iccrg] Re: [tcpm] RFC5681: why halving FlightSize not cwnd?

Fri Sep 7 08:42:16 BST 2012

Hi all,

Not too long ago, Yuchung pointed out a very similar problem to the one here with our own work, draft-hurtig-tcpm-rtorestart-02.txt, which aims to make the RTO timer more aggressive when the window actually in use is very small. We used FlightSize in our algorithm to represent this "actually used window". As with RFC 5681, our rationale is that FlightSize captures the amount of data that has actually been sent into the network, which may be significantly smaller than cwnd if the sender is application-limited.
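To make the difference concrete, here is a minimal sketch of the RFC 5681 rule, ssthresh = max(FlightSize / 2, 2*SMSS), next to a hypothetical variant that halves cwnd instead. Everything is counted in whole segments (SMSS = 1) for simplicity, the function names are mine, and the numbers (cwnd of 20, FlightSize of 4 when the loss is detected) are the ones used in the thread quoted below.

```python
def ssthresh_rfc5681(flight_size, smss=1):
    """RFC 5681 equation (4): ssthresh = max(FlightSize / 2, 2 * SMSS).

    Units are segments here (smss=1), which is a simplification;
    the RFC works in bytes.
    """
    return max(flight_size // 2, 2 * smss)


def ssthresh_from_cwnd(cwnd, smss=1):
    """Hypothetical alternative that halves cwnd (NOT what RFC 5681 says)."""
    return max(cwnd // 2, 2 * smss)


cwnd = 20         # window used recently
flight_size = 4   # data outstanding when the loss is detected

print("halving FlightSize:", ssthresh_rfc5681(flight_size))
print("halving cwnd:      ", ssthresh_from_cwnd(cwnd))
```

With these numbers the FlightSize-based rule falls to the 2*SMSS floor, while halving cwnd gives 10 segments.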

Yuchung has now shown us, twice, that this choice is bad. FlightSize captures what we want, but a small FlightSize is also always reached towards the end of an arbitrarily large window. (Yes, this is also an application-limited case; the distinction is between an application with a low constant sending rate and the end of a transfer or, similarly, an application with stop-and-go behavior.) Neither our draft nor RFC 5681 really captures this situation. Indeed, it seems to me the same problem appears in draft-fairhurst-tcpm-newcwv-03.txt, which also relies on FlightSize in a similar way:

Section 4.2:
***
      Non-validated phase: FlightSize <(2/3)*cwnd.  This is the phase
      where the cwnd has a value based on a previous measurement of the
      available capacity, and the usage of this capacity has not been
      validated in the previous RTT.
***

If FlightSize reaches this value at the end of a larger window, I think this statement is wrong. Indeed, in accordance with Yuchung's examples, this would make the mechanisms in this draft kick in every time at the end of e.g. a 20-packet-long web flow (or indeed at the end of any transfer, with arbitrary length). I don't think that's the intention?
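A rough sketch of why this triggers at the end of any transfer, under simplifying assumptions of my own: cwnd stays fixed at 20 segments, the last window is sent in full, and each ACK removes exactly one segment from FlightSize with no new data to send. The function names are made up for illustration, and the draft may evaluate the condition at a different granularity (e.g. per RTT rather than per ACK).

```python
def is_non_validated(flight_size, cwnd):
    """Condition quoted from Section 4.2: FlightSize < (2/3)*cwnd.

    Written as 3*FlightSize < 2*cwnd to stay in integer arithmetic.
    """
    return 3 * flight_size < 2 * cwnd


def tail_of_transfer(cwnd):
    """FlightSize after each ACK while the last window drains.

    Assumption: one segment is ACKed at a time and nothing new is sent.
    """
    return list(range(cwnd, -1, -1))


cwnd = 20
for flight_size in tail_of_transfer(cwnd):
    if is_non_validated(flight_size, cwnd):
        print("non-validated phase entered at FlightSize =", flight_size)
        break
```

In this model the condition becomes true as soon as FlightSize drops to 13, although the sender was using the whole window a moment earlier.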

I suspect that we can find more examples of RFCs or drafts with this issue. The problem is indeed more fundamental - as Matt said, we're using the wrong state variables. I can well imagine that the problem automatically disappears with TCP Laminar, but as a smaller update to existing / ongoing work, what we really need is probably a means to differentiate Yuchung's case from the case of concern in at least RFC 5681, draft-hurtig-tcpm-rtorestart and draft-fairhurst-tcpm-newcwv-03.txt.

Here's a simple suggestion: can we identify Yuchung's case by saying that FlightSize has been continuously decreasing?
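As a very rough sketch of what I mean (the function name and the idea of keeping a short list of FlightSize samples, e.g. one per ACK, are only assumptions for illustration, not a worked-out proposal):

```python
def continuously_decreasing(samples):
    """True if every FlightSize sample is strictly smaller than the previous one.

    `samples` is a hypothetical history of FlightSize values, oldest first.
    """
    if len(samples) < 2:
        return False
    return all(later < earlier for earlier, later in zip(samples, samples[1:]))


# End of a transfer: the window drains and FlightSize only goes down.
tail = [20, 17, 13, 9, 4]
# Low constant sending rate: FlightSize is small but hovers around a value.
rate_limited = [4, 3, 4, 4, 3]

print("tail of transfer:", continuously_decreasing(tail))
print("rate-limited app:", continuously_decreasing(rate_limited))
```

A small FlightSize would then only be treated as the "actually used window" when the second kind of pattern is seen. How many samples to keep, and how to deal with stop-and-go applications, is left open here.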

Cheers,
Michael

On 7. sep. 2012, at 08:53, gorry at erg.abdn.ac.uk wrote:

> If FlightSize is not ~cwnd, then at the time, the flow is not fully
> utilising cwnd, but there are a wide range of (app) behaviours that result
> in this condition. My thoughts are that we should update the behaviour
> both in standards-track TCP and also in Laminar.
> 
> This has been the topic of the new-cwv draft that we have been proposing
> as an update to TCP. It's been discussed on the ICCRG list, and we're
> re-structuring our draft and expect to publish this revision in a week or
> so.
> 
> Gorry
> 
> 
>> On Thu, Sep 6, 2012 at 2:25 PM, Ethan Blanton <eblanton at cs.ohiou.edu>
>> wrote:
>>> Mark Allman spake unto us the following wisdom:
>>>> A few things ...
>>>> 
>>>>  - I don't buy Ethan's argument that the burden on the network is 4
>>>>    packets if you lose the 17th.  It seems to me the burden is
>>>>    measured from the front of the window not the back.  So, in this
>>>>    case it was a burden of 17 packets that caused the loss.
>>> 
>>> Note that this was not intended to be my argument; my argument is that
>>> a TCP that doesn't "remember" such things (and 5681 does not) only
>>> knows about the 4 packets at the time of the loss, so it *thinks* the
>>> burden is 4.  This is clearly not optimal, and I would not argue that
>>> it is.  Perhaps I stated this poorly.
>> 
>> I would say that you are using the wrong state variables:  at this
>> point cwnd is simultaneously being used to suppress bursts and
>> remember the congestion state from before the application pause.   If
>> you parameterize it differently (e.g. Laminar) this becomes a
>> non-problem.
>> 
>>>>  - So, without additional schemes we're left with being too aggressive
>>>>    (using cwnd) or too conservative (using FlightSize).  But, if we're
>>>>    going to err, that is probably the right direction.
>>> 
>>> Agreed.
>> 
>> Yes exactly.   One solution would be to pace from FS up to cwnd....
>> This case is described in the Laminar draft.
>> 
>>>>  - I probably would not have all that much heartburn making the
>>>>    ssthresh 10 in the case you describe as long as there was some
>>>>    knowledge that a cwnd of 20 was used recently.  I.e., it isn't the
>>>>    result of some large storage of permission to send that was built
>>>>    up over time, but was in fact the result of the application's
>>>>    sending pattern.  I think one could design some rules around that
>>>>    notion that would be OK.
>>> 
>>> Also agreed.  I think it's reasonable to assume that the FlightSize in
>>> effect at the time a lost packet was *sent* is safe, for sure.
>> 
>> I point out this logic assumes that it was the instantaneous queue
>> length that triggered the losses (as it is for drop tail).  With RED
>> or CoDel, the losses are normally triggered by a persistent queue,
>> which may or may not depend on the instantaneous window at the time
>> the packet was either sent or received.   With CoDel, drops are
>> triggered when the minimum window size is still large enough to
>> sustain a queue....  The peak window has no effect on the drops
>> (unless the queue overflows).
>> 
>> Thanks,
>> --MM--
>> The best way to predict the future is to create it.  - Alan Kay
>> _______________________________________________
>> tcpm mailing list
>> tcpm at ietf.org
>> https://www.ietf.org/mailman/listinfo/tcpm
>> 