[Iccrg] CUBIC I-D feedback from an implementor's perspective

Thu Nov 6 09:18:20 GMT 2008

Hi Lisong,

To clarify something that I did not explain in my previous email, all of
the work I'm doing is to create clean room implementations of the
algorithms without referring to any existing code. I'm sitting down with
the draft, the FreeBSD modular congestion control framework [1,2] and
creating a completely separate BSD licenced implementation.

This hopefully puts into context why some of my
suggestions/questions/clarifications may seem obvious given that an
existing implementation is already available.

Now, on with the show. Additional comments inline...

Lisong Xu wrote:
> Thanks, Lawrence!
> 
> Please see my comments below.
> Lisong
> 
> Lawrence Stewart wrote:
>> Hi All,
>>

[snip]

>>
>> All comments below relate to draft-rhee-tcpm-cubic-01.txt
>>
>>
>> Units for variables
>> -------------------
>>
>> The correct units for variables appear to be ambiguous. (For example, I
>> presume cwnd is in pkts, and time t is in seconds - but I couldn't
>> find that clearly stated in the I-D.)
>>
>> For example in the BSD stack, cwnd is calculated and used in
>> bytes, and time is calculated in terms of kernel ticks. However putting
>> cwnd in bytes into the CUBIC equations produces results that I believe
>> are demonstrably incorrect, whereas using pkts seems to produce expected
>> results. Similar problems arise using ticks instead of regular time.
>>
>> It might even be worth providing equivalent functions for both pkt and
>> byte based cwnd so that the implementor doesn't have to think too hard.
>> At the moment, I'm just converting to pkts by dividing cwnd by smss, but
>> it's not ideal. And again, similar issues exist using ticks for time.
>>
> 
> 
> The unit of Cwnd does not matter for any equation in Section 3. When we 
> give the specific example of Cwnd in Section 4, such tables 1 and 2, the 
> unit of Cwnd is MSS, as stated in the tables.
> 
> 

I have to disagree with you on this point. A quick example will
hopefully illustrate why.

Let's use t=1, SMSS=1448 bytes and W_max=100000 bytes or 100000/1448 =
69 pkts.

Bytes case:

W(t) = 0.4 * [ 1 - cubic_root(100000 * 0.2 / 0.4)]^3 + 100000
      = 81584 bytes
      = 81584 / 1448 = 56 pkts

Pkts case:

W(t) = 0.4 * [ 1 - cubic_root(69 * 0.2 / 0.4)]^3 + 69
      = 64 pkts
      = 64 * 1448 = 92672 bytes

For what you say to be true, both values of W(t) should be identical,
which they are not.

The constants have been chosen based on some implicit assumptions
regarding the units of the various variables in the equations.

The constant C for example can have units of pkts *or* bytes per sec^3,
but not both. Therefore the value of C must be chosen appropriately
based on the units in use.

If we assume that C=0.4 was chosen based on pkts and time in secs, then
changing C to 460 and redoing the bytes case results in the following:

W(t) = 460 * [ 1 - cubic_root(100000 * 0.2 / 460)]^3 + 100000
      = 92670 bytes
      = 92670 / 1448 = 64 pkts

Changing units of cwnd and time both affect the outcome of the equations
in section 3. SMSS also effects the outcome of the calculations and yet
there is no discussion on this.

As Lachlan pointed out in his reply, there is the possibility of
introducing unfairness between flows using different SMSS which is
directly applicable to the I-Ds of proposed congestion control
protocols. I haven't as yet read much in the way of discussion on the
matter.

I believe all of these issues need to be addressed in a much more
thorough and explicit manner, perhaps partially in the draft and more so
in an external reference.

> 
>>
>> Convention for "beta"
>> ---------------------
>>
>> The I-D does not use the convention that beta represents the
>> multiplicative decrease factor (i.e. cwnd = cwnd * beta on congestion).
>> It instead uses beta in the following context: cwnd = cwnd * (1-beta) on
>> congestion.
>>

[snip]

> 
> You are right that maybe we should standardize it. I myself is sometimes 
> confused with the definition of this "multiplicative decrease factor" :-)
> 

For me, the terminology "multiplicative decrease factor" when applied to
beta implies very strongly "cwnd = cwnd*beta" on congestion. To my mind,
you can no longer call beta the multiplicative decrease factor if you
have to do "cwnd = cwnd*(1-beta)" on congestion. Beta is something else.
1-beta is the multiplicative decrease factor.

However, as I said I'm happy to be told otherwise if there is consensus
on another interpretation.

> 
>>
>> Overloaded variables
>> --------------------

[snip]

>>
>> Parentheses in equations
>> ------------------------
>>
>> I'd suggest adding parentheses to equations (e.g. 3 and 4 in particular)
>> to completely rule out any possibility of ambiguous interpretation. For
>> example with equation 4, is the last term [(3*beta)/(2-beta)] * (t/RTT)
>> or is it (3*beta)/[(2-beta) * (t/RTT)]?
>>
> 
>    (alpha/2 * (2-beta)/beta * 1/p)^0.5 (Eq. 3)
> is
>    [(alpha/2) * ((2-beta)/beta) * (1/p)]^0.5 (Eq. 3)
> 
> 
>    W_tcp(t) = W_max*(1-beta) + 3*beta/(2-beta)* t/RTT (Eq. 4)
> is
>    W_tcp(t) = W_max*(1-beta) + [3*beta/(2-beta)]*(t/RTT) (Eq. 4)
> 
> 

These sorts of minor clarifications help to ensure obvious mistakes are
avoided and leave no doubt in the reader's mind what the author really
wanted to convey.

> 
>>
>> Explicit help for implementors
>> ------------------------------
>>
>> I'd suggest adding a new (sub)section somewhere in the document
>> focusing on implementation related discussion.
>>
>> Calculating the cubic root of a number in the kernel is a non trivial
>> task because you need to be able to do it using fixed point math.
>>

[snip]

> 
> Our original CUBIC code used the bisection method to calculate the cubic 
> root. Later it was replaced by a Newton-Raphson method with table 
> loopups for small values. This results in more than 10 times performance 
> improvement in the cubic root calculation. On average, the bisection
> method costs 1032 clocks while the improved version costs only 79 clocks.
> 
> You can find more information at
> * Hemminger, S. Cubic root benchmark code. 
> http://lkml.org/lkml/2007/3/13/331
> * Tarreau, W. Cubic optimization.
> http://git.kernel.org/?p=linux/kernel/git/davem/net2.6.git;a=commit;h=7e58886b45bc4a309aeaa8178ef89ff767daaf7f 
> 

This is useful information and should be discussed or at least
referenced in the I-D. I'd be wary of referring people to GPL'd code
because of the licensing issues, hence my suggestion of including public
domain code fragments/pseudo code in the I-D.

Alternatively, you could consider dual-licencing the CUBIC code with GPL
and a more liberal licence (e.g. BSD).

I believe the method I implemented is somewhat different to what you
describe. I found the Newton-Raphson method in my reading but it seemed
overly complicated. My implementation currently uses the method
described in [3].

When you refer to "1032 clocks vs 79 clocks", what is a "clock" exactly?
I'd be curious to do a quick comparison between my current method and
the figures you quote. My goal has been correctness and not optimisation
thus far, but still good to know.

> 
>>
>> TCP friendly region
>> -------------------
>>
>> The TCP friendly region (section 2.2) concept is still fuzzy for me.
>> During which periods of a congestion epoch would one expect the TCP
>> friendly region to actually have any effect?
>>
>> I think this concept and its importance needs to be explained more
>> clearly in the document.
>>
> 
> 
> I see. As described in Section 4.1, TCP friendly region has an effect 
> when standard TCP performs well. For example, in the following two types 
> of networks:
>       1. networks with a small bandwidth-delay product (BDP).
>       2. networks with a short RTT, but not necessarily a small BDP
> 

I assume you're referring to section 3.1 and not 4.1.

Discussing section 3.1, there is no mention of the terminology "TCP
friendly region", so as a reader, I have no concept of what this
subsection is relating to. Reading sections 2.2 and 3.1 still leaves me
with very little material understanding of why the CUBIC TCP friendly
region is relevant or how it is likely to affect communications over a
standard congestion epoch.

Assuming we now have the understanding that 3.1 is referring to CUBIC's
TCP friendly region, simply stating the two sets of network conditions
it helps in is not the same as discussing why/how it is relevant. I
don't think in depth discussion needs to be in the I-D, but a reference
to a paper that discusses the concepts in depth would I suspect be a
worthwhile addition.

> 
>>
>> Fast convergence
>> ----------------
>>
>> Why is the fast convergence heuristic so brutal in it's attempted
>> detection of new flows starting up?
>>
>> Having the hard check of "is cwnd < cwnd_prev" in place seems far too
>> black and white, particularly in BSD where cwnd is in bytes, so a 1 byte
>> difference will trigger the more aggressive backoff.
>>
>> Perhaps a "is cwnd < 0.97*prev_cwnd" or similar check would soften the
>> edge just a little bit?
>>
> 
> Actually we have tested something like "cwnd < 0.97*prev_cwnd", but 
> there is no obvious difference. This is why we are still using it.
> 

Were you testing with cwnd measured in units of pkts or bytes?

Yes, I'm playing devil's advocate here, but I hope this discussion will 
improve the document which is on its way to becoming a standard.

Cheers,
Lawrence

[1] http://caia.swin.edu.au/reports/071218A/CAIA-TR-071218A.pdf

[2] http://svn.freebsd.org/viewvc/base/projects/tcp_cc_8.x/

[3] http://www.worldserver.com/turk/computergraphics/CubeRoot.pdf