[Iccrg] Re: Review draft-fairhurst-tcpm-newcwv-03 - response

Tue Aug 7 21:43:58 BST 2012

Hi Gorry,

Two responses and a comment; the comment first: Did you announce your draft
on the rtcweb-cc list? It should be interesting for the people there as
well.

1) cwnd = min(max(FlightSize*2, IW),cwnd)
I'd say this is needed to cover the case where 2*FlightSize > cwnd.

2) halving the window on loss/congestion
Your draft should only address how to adapt the congestion window when 
application-limited. It should not prescribe what kind of congestion control 
is done otherwise. Halving the window on loss is part of the congestion 
control algorithm itself. You can say that in case of loss/congestion the 
cwnd should be reset to FlightSize-R and then the normal congestion response 
should be applied. In the case of RFC5681 this means halving. Does this make 
sense to you?
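
As a rough illustration of points 1) and 2) above, here is a minimal Python
sketch. It is not text from the draft: the function names are hypothetical,
windows are counted in whole segments for simplicity, R is treated as a
plain non-negative amount subtracted from FlightSize, and halve() is only a
simplified stand-in for the RFC5681 response.

```python
def clamp_cwnd(cwnd, flight_size, iw):
    """Point 1: cwnd = min(max(FlightSize*2, IW), cwnd).

    The outer min() ensures the adjustment never raises cwnd, which
    would otherwise happen whenever 2*FlightSize > cwnd.
    """
    return min(max(2 * flight_size, iw), cwnd)


def halve(window):
    """Simplified stand-in for an RFC5681-style response (halving)."""
    return window // 2


def cwnd_after_congestion(flight_size, r, response=halve):
    """Point 2: reset cwnd to FlightSize - R, then apply whatever
    response the congestion control algorithm in use defines."""
    return response(flight_size - r)


print(clamp_cwnd(cwnd=10, flight_size=8, iw=3))    # 10 (2*FlightSize > cwnd)
print(clamp_cwnd(cwnd=20, flight_size=4, iw=3))    # 8
print(cwnd_after_congestion(flight_size=12, r=2))  # 5
```

Passing the response in as a parameter mirrors the argument in 2): the
window adaptation is fixed, while the reduction itself stays with the
congestion control algorithm.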

Mirja

On Monday 06 August 2012 20:42:42 Gorry Fairhurst wrote:
> Mirja,
>
> Many thanks for your review. Since a number of people provided comments,
> we have tried to respond to many of the issues raised by updating the
> draft, but we may have missed something important, so see the specific
> responses below.
>
> Gorry
>
>  > Mirja Kuehlewind mirja.kuehlewind at ikr.uni-stuttgart.de
>  >
>  > Hi Gorry,
>  >
>  > Here as promised the more detailed comments:
>  >
>  > - section 4: Please add a reference to the later chapter that the
>  > value of 5 minutes will be further discussed.
>  >
>  > - section 4.2: Please explain where the 2/3*cwnd comes from. I would have
>  > thought the validated phase is when FlightSize >= cwnd or maybe >= cwnd -
>  > maxburstsize (as in today's implementations the cwnd will be limited to
>  > FlightSize+maxburstsize anyway)
>
> These values were to reduce the effects of variations in the measured
> FlightSize.
> There is not a significant impact on performance for the flow, so we wished
> to introduce hysteresis and avoid flapping between the two states. We
> wondered if the choice of 1/4 and 3/4 was more consistent, and could be
> handled more easily with integer arithmetic.
>
>  > - section 4.3: Should the case that the 5 minutes are over be mentioned
>  > here as well...?
>  >
>  > - section 4.3: in 2. bulletpoint the reference is wrong; should be to
>  > section 4.3.2.
>
> OK checked these and rearranged section 4.3
>
>  > - section 4.3: I would merge the last paragraph into the first bullet point
>  > as I was asking myself exactly that question when reading the first
>  > bullet point
>
> Done.
>
>  > - section 4.3.1: "by setting a burst size limit, using a pacing algorithm,
>  > or some other method." -> I think there should be one (or more) mechanisms
>  > specified here as part of the algorithm
>
> The community has to date been quite resistant to specifying how to do
> pacing etc., but I would agree to add more detail here if there is
> consensus. For the moment, we cite Joe Touch et al.'s draft; although old,
> it provides a starting point.
>
>  > - section 4.3.1: "cwnd = max(FlightSize*2, IW)" -> Why FlightSize*2?
>  > This can still be a very large non-validated window. I would go for
>  > FlightSize+maxburstsize (at least have a static offset here).
>
> It's true this could be large, we could rethink this, but the rationale
> was that in this current RTT the sender could have sent more, and given
> an application is likely to be variable rate, we should not try to cap
> at the mean-rate or anything like that. The value 2 may be more applicable
> for SS though. We're willing to discuss this one.
>
>  > - section 4.3.1: "cwnd = max(FlightSize*2, IW)"
>  > -> If you keep this, it should
>  > be  "cwnd = min(max(FlightSize*2, IW),cwnd)"
>
> We thought about this, but is it right? cwnd < IW, and hence the difference
> between the two terms is only important for some range of cwnd/2<FD<cwnd?
>
>  > - section 4.3.1: "ssthresh = max(ssthresh, 3*cwnd/4)" -> Is this the new or
>  > the old cwnd? Why are you taking a different value than usual?
>
> This is hopefully now clearer in the rearrangement.
>
>  > - section 4.3.2: Why is the reaction to congestion different
>  > than what should be done when leaving the nonvalidated phase?
>
> Because the sender needs to be more conservative
>
>  > - section 4.3.2: "cwnd = ((FlightSize - R)/2)" -> I wouldn't want to
>  > specify the concrete congestion response in this document.
>  > There might be congestion control algorithms like DCTCP that do
>  > something different than halving (at least for ECN). The document
>  > should just say something like "perform the normal congestion response
>  > specified by the used congestion control algorithm after the adaptation
>  > of the congestion window". You might even think about having two window
>  > variables here. While the congestion control only controls the cwnd,
>  > your second window, which then determines the allowed packets in
>  > flight, might be larger. Moreover, you have to make sure that the
>  > congestion control will not raise the cwnd any further while being
>  > application-limited!
>
> I think DCTCP could be different, but this document should directly
> target TCP.
>
>  > - section 4.4: I would make this its own section 5
>
> Done.
>
>  > - section 4.4: Are there references available on what the "idle intervals
>  > of common applications"
>
> We only know what we have seen - more information would be useful, but I
> think that any app that goes idle for periods of many minutes >>5 is not
> going to suffer if it needs to do SS.
>
>  > and "period for which the capacity of an Internet path
>  > may commonly be regarded as stable" are?
>
> This caused us to look a lot, and it was a best guess. The worst case,
> probably some pathological failure or a marginal wireless link, is
> probably not that interesting. So this is a compromise.
>
>  > - section 4.4: 3. paragraph: Maybe be even more strict and say something
>  > like "If rapid changes in the path characteristics are detected, the new
>  > method SHOULD NOT be used anymore".
>
> That would be cool with me, but I really do not think this signal exists
> unless the end host is the last system before the "problematic"
> link/router. I hesitate to try to predict what to do more generally.
>
>  > Maybe even require monitoring the RTTs
>  > and give a threshold for when to disable this method.
>
> I think monitoring RTTs with variable rate traffic is not necessarily
> going to help a lot to detect anomalies, but combining several mechanisms
> together could help. Is this just a little complicated for this case?
>
>  > - section 4.4: Also refer to the respective TCP mechanisms that use a
>  > 5-minute timer.
>
> We included an RFC ref for one.
>
>  > Mirja
>
> I do not know how best to deal with the burst issue you note, but I agree
> this is something that is desirable, though it has evaded standards so far.
>
> On Wednesday 18 July 2012 19:55:14 Mirja Kühlewind wrote:
>  > Hi Gorry,
>  >
>  > I believe this draft addresses a very important problem which exists and
>  > needs to be solved. But I'm not too sure about the solution that is
>  > proposed. But I also don't know what the right/perfect solution would
>  > be.
>  >
>  > My biggest concern is regarding potential bursts you might introduce.
>  > Maybe I'm too conservative but I believe whatever you are going to
>  > change in TCP, one thing to avoid is the chance to cause large bursts of
>  > packets. Recently, it was actually recognized that the block sending
>  > behavior of YouTube (even though it is using normal standard-conformant
>  > TCP) leads to bursts and large loss probabilities. Actually that is a
>  > problem that exists in today's TCP and needs to be addressed too.
>  >
>  > So I guess what I propose is to limit the burst size. This is not
>  > completely thought through yet but you could allow some slow-start-like
>  > behavior in CA if the flow has been application limited. So maybe your
>  > mechanism could be extended to allow only bursts of 2 (or 3) packets.
>  > That would mean that you would be able to increase your sending rate from
>  > 1*flightsize to 2*(or 3*)flightsize in one RTT if the flow was
>  > application limited before. Don't you think that might be already
>  > sufficient?
>  > In case of idle times this would (unfortunately) lead back to normal
>  > slow start behavior but the IW10 draft also specifies a restart window
>  > of 10. This might help the problem already a lot...?
>  > Btw. in Linux (and potentially in some RFC?) there is already a maximum
>  > burst size of 3 implemented but this is only used to limit the
>  > congestion window to flightsize + maxburstsize.
>  >
>  > I will send some more detailed comments on the draft tomorrow (don't
>  > have my notes with me at the moment). But one more point is the values
>  > you have chosen for the several thresholds. I know there is a whole
>  > section reasoning why 5 minutes have been chosen but that didn't really
>  > convince me. One more thought on my proposal above (which is still not
>  > thought through): If you would track the max cwnd since the last
>  > congestion notification event and use this as a threshold up to which
>  > the kind of slow-start-behavior in CA (as explained above) is allowed,
>  > I don't think that there is any time limit needed at all...
>  >
>  > I guess this whole idea leads to something similar to what is proposed
>  > by TCP Laminar. As Yuchung mentioned, those two proposals should be
>  > considered together.
>  >
>  > Some more detailed comments tomorrow...
>  >
>  > Mirja

-- 
-------------------------------------------------------------------
Dipl.-Ing. Mirja Kühlewind
Institute of Communication Networks and Computer Engineering (IKR)
University of Stuttgart, Germany
Pfaffenwaldring 47, D-70569 Stuttgart

tel: +49(0)711/685-67973
email: mirja.kuehlewind at ikr.uni-stuttgart.de
web: www.ikr.uni-stuttgart.de
-------------------------------------------------------------------