[Iccrg] LT-TCP followup
lachlan.andrew at gmail.com
Mon Aug 6 17:58:00 BST 2007
Thanks for taking the time to try to understand this, and for your
clear statements of your objections. This is really clarifying things.
On 06/08/07, Michael Welzl <michael.welzl at uibk.ac.at> wrote:
> > > > Lachlan Wrote
> > Intuitively, the wireless links being the last hop is the worst case,
> > because it causes the most congestion for a given received rate.
> you assume that the receiving
> end point of the wireless link drops any corrupt packet
> anyway (otherwise I absolutely don't understand why
> this would be the worst case)
I assume that the "packet lost" indicator is no larger than the
original packet. If it is equal in size (like a changed checksum),
then loss anywhere is "equal worst-case". If the packet is replaced
by a small ack-like "corruption flag", then loss on/after the last
congested link is strictly the worst case.
The latter approach is motivated by the suggestion earlier in this
list that burst-errors make packets either totally garbled or totally
reliable, but a link can still tell that a garbled packet was
received. (It doesn't deal with the fact that we wouldn't know which
flow the packet belonged to, but that may be able to come from the
"link layer prediction" that Wes said was being used for
interplanetary IP header compression.)
> > flow which gets 1/k of its packets through is exactly analogous to a
> > flow which uses k congested hops. For a given receive rate, both
> > use k times "their share" of the resources, compared with a
> > single-hop lossless flow.
> Sorry: I just tried hard, but I don't understand that analogy.
> Maybe that's the problem. I can think of proportional fairness
> in terms of multiple congested resources, but I never understood
> how this relates to flows which have their packets dropped because
> of corruption.
It might help to think in terms of the economic analogy: You get
charged for each packet you send through each link (at a level that
depends on how congested each link is), and you get benefit from each
packet you receive. The amount you get charged is the same whether it
is charged at k links, or you are charged k times as much at one
link. For a given "willingness to pay", you'll send at the same rate
in either case.
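To make the charging analogy concrete, here is a toy calculation (hypothetical prices, in the spirit of Kelly's framework) showing that k links each charging a price, and one link charging k times as much, lead to the same sending rate:

```python
# A user with willingness-to-pay w sends at rate x = w / q, where q is
# the total congestion charge per packet along the path.
w = 1.0                 # willingness to pay per unit time (assumed)

# Case A: k = 3 congested links, each charging 0.05 per packet
price_per_link = 0.05
q_a = 3 * price_per_link

# Case B: a single link charging 3 times as much per packet
q_b = 3 * price_per_link

rate_a = w / q_a
rate_b = w / q_b
assert rate_a == rate_b   # same total charge => same sending rate
```

The point is that only the *total* congestion charge along the path matters, not how it is split across links.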
> > Imagine a two-hop "parking lot" topology, in
> > which the two-hop flow is application limited.
> What do you mean with application limited?
> Normally, to me, this means that the application can't send
> at the rate at which it should (according to what the network
> allows - i.e. a TCP can't increase its rate as the congestion
> control mechanism normally would). Then why would it increase
> its sending rate at all? It can't!
Applications certainly can produce variable rate data. It might be a
particularly active scene in a video. It might be that you're logging
data from an experiment where something suddenly happens. It might be
that you're using telnet and suddenly 'cat' a file.
> > "Proportional fairness" (which in our contexts allocates equal send
> > rate to users, regardless of drop rate) would allocate 2/3 of each
> > bottleneck capacity to the one-hop flows, and 1/3 of each capacity to
> > the two-hop flow. That corresponds to setting a window proportional
> > to 1/p for a drop rate of p. In this simple case, the long flow
> > causes twice the congestion per unit throughput (the same amount on
> > two routers), and consequently gets half the rate. Good.
> > TCP is "more fair", and gives a higher rate to the low-throughput
> > flow, by setting the window proportional to 1/sqrt(p). (Is that
> > "acceptable"? That's not for me to justify...) This gives a ratio of
> > 1:sqrt(2) instead of 1:2 in the rates. Thus, the flow with the lower
> > rate is allowed to create more total congestion. This is "fair",
> > because it is the underdog and needs help.
> This assumes
> that the packet loss ratio experienced by the two-hop flow is
> exactly 2 times the packet loss ratio experienced by the one-hop
> flows, which isn't true.
> Even if we assume that p doesn't depend on the rate of the
> flows under consideration (which is a story in itself...),
> then p only becomes approximately 2p in the 2-hop case if
> p is very small.
> ... 2p-p^2...
I agree that this is not exactly true for loss rates, but it is as
good an approximation as replacing bursty rates by a fluid limit. For
sources which produce congestion, p *is* very small. A window of a
mere 10 packets needs roughly p = 10^-2. Approximating 0.0199 by
0.02 is trivial compared to assuming AIMD has a well-defined rate.
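The size of the approximation can be checked directly; with a per-hop loss rate of p = 10^-2 (roughly what a window of 10 packets implies), replacing 1 - (1-p)^2 by 2p is about a 0.5% error:

```python
p = 1e-2                          # per-hop loss probability (assumed)
two_hop_exact = 1 - (1 - p)**2    # = 2p - p^2 = 0.0199
two_hop_approx = 2 * p            # = 0.02

# relative error of the 2p approximation: about 0.5%
rel_error = abs(two_hop_exact - two_hop_approx) / two_hop_exact
```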
> Since in fact p of the second hop will be less than
> p of the first hop
a) Replace "loss" by "ECN" and this doesn't apply.
b) Again, this is a 1% approximation, trivial compared with the 50%
approximation of AIMD, or the x% approximation that RTT fairness or
jumbo frames are somehow "fair".
> with all these weird
> assumptions as a basis, it makes it hard for me to intuitively
> understand what you say.
The assumptions are:
- flow rate is roughly constant through the life of the path -- which
is a 1% or better approximation for current networks
- loss probability is cumulative through the network -- again 1% or better
- each "lost packet" causes one "loss event"
- packet loss probability is equal for all flows
I believe they're all pretty standard, and more intuitive than
replacing the last two by something more accurate.
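Under these assumptions, the parking-lot rate ratios quoted above fall out of the window rules directly. A minimal sketch (loss rate p assumed, RTTs equal):

```python
import math

p = 0.01            # loss rate seen by each one-hop flow (assumed)
p_long = 2 * p      # two-hop flow sees roughly twice the loss

# Proportional fairness: window (hence rate, for a fixed RTT) ~ 1/p
prop_ratio = (1 / p_long) / (1 / p)          # long:short rate ratio 1:2

# TCP (Reno-like): window ~ 1/sqrt(p)
tcp_ratio = (1 / math.sqrt(p_long)) / (1 / math.sqrt(p))  # 1:sqrt(2)
```

So TCP gives the long (high-congestion-cost) flow a larger share than proportional fairness does, which is the "more fair" behaviour described above.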
> > when the flow
> > suddenly starts losing 90% of its packets, the TCP-fair thing to do
> > would be to reduce its payload by *less* than a factor of 10. Of
> > course, to do that it needs to increase the sending rate slightly.
> If I understand you
> correctly, what you're saying is, in order for the TCP receiver
> to get exactly as many packets as it would get if there was
> no corruption, the sender would have to increase the sending rate.
No, I didn't mean that. If we suddenly cause 10 times the congestion
per received packet (re-routed over a path with 10 times the number of
equally-congested links, but the same RTT), TCP would start giving us
data at about 1/3 (1/sqrt(10)) the rate.
If we want to have the same response when we suddenly cause 10 times
the congestion per received packet because of 90% corruption, then we
need to send at a rate about 10/3 higher, so we still get data at a
rate of about 1/3 the original.
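The arithmetic in this paragraph can be written out as a small sketch (90% corruption assumed; the "1/3" and "10/3" in the text are the rounded forms of 1/sqrt(10) and 10/sqrt(10)):

```python
import math

corruption = 0.9                      # 90% of packets corrupted (assumed)
delivery = 1 - corruption             # 10% of packets get through
congestion_factor = 1 / delivery      # 10x congestion per received packet

# TCP-friendly response: receive rate scales as 1/sqrt(congestion)
receive_scale = 1 / math.sqrt(congestion_factor)   # ~0.316, "about 1/3"

# to deliver that, the send rate must rise by receive_scale / delivery
send_scale = receive_scale / delivery              # ~3.16, "about 10/3"
```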
> > This flow causes more congestion than the lossless flow, the same as a
> > two-hop flow causes more congestion than a one-hop flow. That is just
> > what TCP's fairness scheme implies.
> Of course it causes more congestion, but why would TCP-friendliness
> imply that you do that? It's all about the sending rate, not the
> throughput as seen by the receiver.
I agree this is the biggest conceptual leap. Where's Bob Briscoe when
you need him? :)
TCP originally specified sending rates because it specified an
algorithm run at the sender, not a fairness paradigm. In a
non-collapsed world with reliable links, sending rate is approximately
receiving rate, so TCP essentially specifies both sending rates and
receiving rates. (If anyone on this list remembers discussion of why
VJ introduced CWND at the sender, instead of getting the receiver to
limit the window it advertised, I'd be happy to hear it.)
When we start looking at networks where "sending rate" and "receiving
rate" are significantly different, we must choose whether to keep TCPs
receiving rate or its sending rate. Let's look at reasons for each:
Sending rate: reflects actual harm done to the network.
Receiving rate: determines actual benefit done to the user.
In Kelly's framework, the "harm done to the network" is correctly
captured by the *congestion level*. The sending rate is at best a
surrogate for that, since the same sending rate causes different
amounts of harm in different settings. In the tradeoff between
congestion caused and data delivered, the sensible (to me) "rate" of
the flow is the data delivered.
> > In that case, being TCP-friendly implies we should
> > increase the send-rate of flows suffering packet corruption.
> If I understood you correctly, I disagree. that's because
> TCP-friendliness is about the sending rate. I think that
> the earliest RFC to define it is RFC 2309:
> We introduce the term "TCP-compatible" for a flow that behaves under
> congestion like a flow produced by a conformant TCP. A TCP-
> compatible flow is responsive to congestion notification, and in
> steady-state it uses no more bandwidth than a conformant TCP running
> under comparable conditions (drop rate, RTT, MTU, etc.)
> Despite the slightly different term, it's clear that this is
> about the bandwidth that is used, not about the rate at
> which the receiver is supposed to get its data.
True. It also answers the question that we're revisiting: "what
impact on rate should the drop rate have?" This says that if a packet
is dropped due to a corrupt checksum, then it has to slow the source
at least as much as Reno does for congestion loss. That's clearly not
the right response to corruption.
This is the RFC that we're moving forward from. Let's not be
constrained by its wording, but by what is sensible/fair (and then of
course a backward-compatible implementation). If a good reason can be
put forward why the tradeoff between congestion and sending rate
(a "cost/cost" tradeoff) is more appropriate than the tradeoff between
congestion and receiving rate (a "cost/benefit" tradeoff), I'm all ears.
> Somewhat - but at that point, I'd say that either I still
> don't fully understand it, or the main message is simply:
> "in order for a receiver to get as much as it would
> get from a standard TCP sender without corruption in
> the network, the sender has to increase its rate".
As above, this is definitely not what I'm calling the TCP-friendly
case. (It's the "max-min" case in my draft, which is very different.)
We should definitely get less data through when we cause more
congestion. It's purely a question of how much less.
I think there are two parts:
1) Using Mo and Walrand's "alpha=2 fairness", what is the "fair" rate?
2) Should TCP-friendliness refer to the receive rate or the send rate?
Should we focus on the first of these first? Or do you think that
would be a waste of time if we'll never agree on the second?
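On part (1): for the parking-lot example above, Mo and Walrand's alpha = 2 fairness can be computed numerically. A sketch under assumed unit link capacities (alpha = 2 utility is U(x) = -1/x; the two one-hop flows each get the capacity left by the two-hop flow):

```python
import math

C = 1.0   # assumed capacity of each of the two links

def alpha2_total_utility(x_long):
    # total utility: the two-hop flow plus two one-hop flows,
    # each one-hop flow getting the leftover C - x_long
    x_short = C - x_long
    return -1 / x_long - 2 / x_short

# crude grid search for the utility-maximizing allocation
best = max((alpha2_total_utility(i / 10000), i / 10000)
           for i in range(1, 10000))
x_long = best[1]

# closed form: x_long = C / (1 + sqrt(2)), so long:short = 1:sqrt(2)
assert abs(x_long - C / (1 + math.sqrt(2))) < 1e-3
```

Note that the resulting 1:sqrt(2) ratio matches the TCP (1/sqrt(p)-window) allocation discussed earlier, which is why alpha = 2 is a natural starting point for part (1).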
Lachlan Andrew Dept of Computer Science, Caltech
1200 E California Blvd, Mail Code 256-80, Pasadena CA 91125, USA
Phone: +1 (626) 395-8820 Fax: +1 (626) 568-3603