[Iccrg] LT-TCP followup

Michael Welzl michael.welzl at uibk.ac.at
Tue Aug 7 07:49:23 BST 2007


Hi Lachlan,

> Thanks for taking the time to try to understand this, and for your
> clear statements of your objections.  This is really clarifying things
> for me.

You're welcome - but it's really me who has to thank you for
taking the time to explain it in such detail! This conversation
is getting very interesting - as you know, I'm very interested
in the response to corruption and would love to see a sound
proposal emerge, which is why I'm very happy that you brought
us this one.

You made things quite clear here - this last email of yours is
much easier for me to understand than the prior ones. There are
several explanations from you in this email to which I would
only answer "Ah, okay, now I get it, thanks!" (e.g. the
economic analogy). Instead of doing so, I'll speed things up
by simply removing all these parts.


> Applications certainly can produce variable rate data.  It might be a
> particulary active scene in a video.  It might be that you're logging
> data from an experiment where something suddenly happens.  It might be
> that you're using telnet and suddenly 'cat' a file.

Now I understand that your scenario is: "first the application limits
the rate, then, all of a sudden, the rate jumps to a higher value
because the limit no longer exists". This seems to be the same idea
as further below, where you describe re-routing to a new path (which
I'd consider a more plausible example of a flow suddenly causing
a certain amount of congestion, as a congestion control mechanism
would normally prevent an application from making such "jumps").


> > This assumes
> > that the packet loss ratio experienced by the two-hop flow is
> > exactly 2 times the packet loss ratio experienced by the one-hop
> > flows, which isn't true.
> > Even if we assume that p doesn't depend on the rate of the
> > flows under consideration (which is a story in itself...),
> > then p only becomes approximately 2p in the 2-hop case if
> > p is very small.
> > ... 2p-p^2...
> 
> I agree that this is not exactly true for loss rates, but it is as
> good an approximation as replacing bursty rates by a fluid limit.  For
> sources which produce congestion,  p  *is* very small.  A window of a
> mere 10 packets needs roughly p = 10^-2.  Approximating, 0.0199 by
> 0.02 is trivial compared to assuming AIMD has a well-defined rate.

"For sources which produce congestion", p can be anything.
The larger p is, the more important it becomes to estimate
it reasonably.

In the real Internet, p is very small indeed. However, p is
also under the influence of your own flow, which will make
the 2p estimate worse.
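To make the approximation question concrete, here is a small sketch (my own framing, not from your draft) comparing the exact two-hop loss 1 - (1-p)^2 with the additive approximation 2p, and checking your window-of-10 figure with the standard Reno approximation w ~= sqrt(3/(2p)):

```python
# Sketch: how good is the 2p approximation, and what loss rate does a
# Reno window of ~10 packets correspond to?  (Reno relation assumed:
# w ~= sqrt(3 / (2p)), hence p ~= 3 / (2 w^2).)

def two_hop_loss(p):
    """Exact end-to-end loss over two independent hops, each with loss p."""
    return 1 - (1 - p) ** 2   # = 2p - p^2

def reno_loss_for_window(w):
    """Loss rate that sustains an average Reno window of w packets."""
    return 3.0 / (2.0 * w ** 2)

for p in (0.001, 0.01, 0.1):
    exact = two_hop_loss(p)
    print(f"p={p}: exact={exact:.6f}, approx 2p={2*p:.6f}, "
          f"error={2*p - exact:.6f}")

# Your window-of-10 example: p on the order of 10^-2, as you say.
print(f"window 10 -> p ~= {reno_loss_for_window(10):.4f}")
```

For p = 0.01 this reproduces your 0.0199-vs-0.02 example; the approximation only degrades once p gets large, which is exactly the regime I'm worried about.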


> > Since in fact p of the second hop will be less than
> > p of the first hop
> 
> a) Replace "loss" by "ECN" and this doesn't apply.

Why? Just like packets can only be dropped once along
a path, they can also only be ECN-marked once.


> > with all these weird
> > assumptions as a basis, it makes it hard for me to intuitively
> > understand what you say.
> 
> The assumptions are:
> - flow rate is roughly constant through the life of the path -- which
> is a 1% or better approximation for current networks

Who says that? Since the first hop may well be the bottleneck,
the outgoing rate after the first router can be significantly lower.


> - loss probability is cumulative through the network -- again 1% or better

Who says that? Do you have measurements to back this up?

My personal guess is that this is quite wrong. In the simple
parking lot scenario that we discussed, where you argue that
p of the two-hop flow is 2*p of the one-hop flows, we also
have the assumption that p is constant throughout the network,
which I absolutely don't believe to be true in reality.


> - each "lost packet" causes one "loss event"

Quite improbable. Losses are typically clustered in the
Internet; in recent measurements that we did, we usually
lost 1, 2 or 3 packets in a row. If that happens within
an RTT, it's all one "loss event" (one event that a sender
should react to).
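To illustrate what I mean by one "loss event" (a sketch with invented timestamps; the one-RTT grouping rule is the TFRC-style convention I have in mind, not something from your draft):

```python
# Hypothetical sketch: count loss *events* rather than lost packets,
# grouping losses that fall within one RTT of the event's first loss.
# All numbers below are invented for illustration.

RTT = 0.1  # seconds (assumed)

def count_loss_events(loss_times, rtt=RTT):
    """Group losses within one RTT of an event's first loss into one event."""
    events = 0
    event_start = None
    for t in sorted(loss_times):
        if event_start is None or t - event_start > rtt:
            events += 1
            event_start = t
    return events

# Three clustered bursts: 8 lost packets, but only 3 loss events.
losses = [1.00, 1.01, 1.02, 2.50, 2.51, 4.00, 4.03, 4.05]
print(count_loss_events(losses))  # -> 3
```

So with clustering, "one lost packet = one loss event" can overcount the events a sender should react to by a factor of 2-3.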


> - packet loss probability is equal for all flows

I could live with this one if you'd assume that all
flows traverse exactly the same path. In our parking
lot scenario, they don't, and as I said above, I don't
buy the assumption that the packet loss probability
is equal throughout the network - so I don't buy this
assumption either.


> I believe they're all pretty standard, and more intuitive than
> replacing the last two by something more accurate.

No, I think they're all completely wrong, and it's up
to you to show that they're not (i.e. back them up
with measurements).

Note that I'm not playing devil's advocate - I honestly
believe that they're completely wrong. For instance,
you'll probably lose some packets at some bottleneck
along the path, and lose nothing in most other routers,
as opposed to losing the same amount everywhere - if
that's the case, this renders some of your assumptions
above wrong.


> > > when the flow
> > > suddenly starts losing 90% of its packets, the TCP-fair thing to do
> > > would be to reduce its payload by *less* than a factor of 10.  Of
> > > course, to do that it needs to increase the sending rate slightly.
> >
> > If I understand you
> > correctly, what you're saying is, in order for the TCP receiver
> > to get exactly as many packets as it would get if there was
> > no corruption, the sender would have to increase the sending rate.
> No, I didn't mean that.  If we suddenly cause 10 times the congestion
> per received packet (re-routed over a path with 10 times the number of
> equally-congested links, but the same RTT), TCP would start giving us
> data at about 1/3  ( 1/sqrt(10) ) the rate.
> 
> If we want to have the same response when we suddenly cause 10 times
> the congestion per received packet because of 90% corruption, then we
> need to send at a rate about 10/3 higher, so we still get data at a
> rate of about 1/3 the original.

I'm honestly very sorry, but I still don't get it. Why 10/3?
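If I try to reconstruct the arithmetic myself (assuming the usual TCP relation, throughput proportional to 1/sqrt(p)), I arrive at the following - please tell me where my reconstruction differs from yours:

```python
import math

# My attempt at Lachlan's arithmetic, assuming TCP throughput ~ 1/sqrt(p).

congestion_factor = 10   # 10x congestion per received packet
rate_factor = 1 / math.sqrt(congestion_factor)   # TCP delivers ~1/3 the data

# With 90% corruption, only 1 packet in 10 gets through.  To *deliver*
# data at rate_factor of the original, the *sending* rate must then be:
send_factor = congestion_factor * rate_factor    # = 10/sqrt(10) = sqrt(10)

print(rate_factor)   # ~0.316 ("about 1/3")
print(send_factor)   # ~3.16
```

So presumably the "10/3" is 10 * (1/sqrt(10)) = sqrt(10) ~= 3.16, loosely rounded - is that the reasoning?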


> > > This flow causes more congestion than the lossless flow, the same as a
> > > two-hop flow causes more congestion than a one-hop flow.  That is just
> > > what TCP's fairness scheme implies.
> >
> > Of course it causes more congestion, but why would TCP-friendliness
> > imply that you do that? It's all about the sending rate, not the
> > throughput as seen by the receiver.
> 
> I agree this is the biggest conceptual leap.  Where's Bob Briscoe when
> you need him? :)
> 
> TCP originally specified sending rates because it specified an
> algorithm run at the sender, not a fairness paradigm.  In a
> non-collapsed world with reliable links, sending rate is approximately
> receiving rate, so TCP essentially specifies both sending rates and
> receiving rates.  (If anyone on this list remembers discussion of why

I can't live with the "non-collapsed world with reliable links"
assumption here. We're talking about severe cases in our examples -
severe corruption, possibly severe congestion.


> VJ introduced CWND at the sender, instead of getting the receiver to
> limit the window it advertised, I'd be happy to hear it.)
> 
> When we start looking at networks where "sending rate" and "receiving
> rate" are significantly different, we must choose whether to keep TCPs
> receiving rate or its sending rate.  Let's look at reasons for each:
> 
> Sending rate:  reflects actual harm done to the network.
> Receiving rate:  determines actual benefit done to the user.
> 
> In Kelly's framework, the "harm done to the network" is correctly
> captured by the *congestion level*.  The sending rate is at best a
> surrogate for that, since the same sending rate causes different
> amounts of harm in different settings.  In the tradeoff between
> congestion caused and data delivered, the sensible (to me) "rate" of
> the flow is the data delivered.

As a measure of "harm done to the network", I agree, but not
as the rate that we want to control.


> > > In that case, being TCP-friendly implies we  should
> > > increasing the send-rate of flows suffering packet corruption.
> >
> > If I understood you correctly, I disagree. That's because
> > TCP-friendliness is about the sending rate. I think that
> > the earliest RFC to define it is RFC 2309:
> >
> >    We introduce the term "TCP-compatible" for a flow that behaves under
> >    congestion like a flow produced by a conformant TCP.  A TCP-
> >    compatible flow is responsive to congestion notification, and in
> >    steady-state it uses no more bandwidth than a conformant TCP running
> >    under comparable conditions (drop rate, RTT, MTU, etc.)
> >
> > Despite the slightly different term, it's clear that this is
> > about the bandwidth that is used, not about the rate at
> > which the receiver is supposed to get its data.
> 
> True.  It also answers the question that we're revisiting: "what
> impact on rate should the drop rate have?"  This says that if a packet
> is dropped due to a corrupt checksum, then it has to slow the source
> at least as much as Reno does for congestion loss.  That's clearly not
> the right response to corruption.
> 
> This is the RFC that we're moving forward from.  Let's not be
> constrained by its wording, but by what is sensible/fair (and then of
> course a backward-compatible implementation).  If a good reason can be
> put forward why the tradeoff between   congestion and sending rate
> (a "cost/cost" tradeoff) is more appropriate than the tradeoff between
>   congestion and receiving rate   (a "cost/benefit" tradeoff), I'm all
> ears.

I agree about moving forward from that RFC - but anyhow,
that's the common understanding of TCP-friendliness (as
a ton of other material shows), so you really can't call
your design TCP-friendly if it doesn't match that. That
was my point.


> > Somewhat - but at that point, I'd say that either I still
> > don't fully understand it, or the main message is simply:
> > "in order for a receiver to get as much as it would
> > get from a standard TCP sender without corruption in
> > the network, the sender has to increase its rate".
> 
> As above, this is definitely  not  what I'm calling the TCP-friendly
> case.  (It's the "max-min" case in my draft, which is very different.)
>  We should definitely get less data through when we cause more
> congestion.  It's purely a question of how much less.

Okay, I get that now, but still don't understand the calculation
(the 10/3 above).


> I think there are two parts:
> 1) Using Mo and Walrand's  "alpha=2 fairness",  what is the "fair" rate?
> 2) Should TCP-friendliness refer to the receive rate or the send rate?
> 
> Should we focus on the first of these first?  Or do you think that
> would be a waste of time if we'll never agree on the second?

I think that you really can't use the term "TCP-friendliness"
if you refer to the receive rate. It's been defined otherwise,
and used a lot in that fashion. If you want to match the
receive rate of TCP, I'd suggest inventing a new term  :)
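On your point (1), to make sure we mean the same thing: here is a sketch of Mo and Walrand's alpha-fairness as I understand it (my notation; the congestion-price framing is my assumption, not taken from your draft). A flow facing price p maximizes x^(1-alpha)/(1-alpha) - p*x, whose optimum is x = p^(-1/alpha); alpha=2 recovers TCP's 1/sqrt(p).

```python
# Sketch of alpha-fair rates (my notation, hypothetical framing):
# maximizing  x^(1-alpha)/(1-alpha) - p*x  gives  x = p^(-1/alpha).

def alpha_fair_rate(p, alpha):
    """Optimal rate for an alpha-fair utility at congestion price p."""
    return p ** (-1.0 / alpha)

p = 0.01
print(alpha_fair_rate(p, alpha=2))   # ~10  (TCP-like, 1/sqrt(p))
print(alpha_fair_rate(p, alpha=1))   # ~100 (proportional fairness, 1/p)
```

If that matches your usage, then yes, discussing (1) first makes sense to me - the disagreement about (2) is about terminology, not about the math.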

Cheers,
Michael




