[Iccrg] CUBIC I-D feedback from an implementor's perspective

Fri Nov 7 03:43:07 GMT 2008

Hi Lawrence,

Please use the following program to check the performance of cubic
root calculation. Check it out from http://lkml.org/lkml/2007/3/13/331
In the code, 'bictcp' uses a bisection method while ncubic_tab0 uses
precomputed values in the table.
They show 1032 and 79 clocks, respectively (more than 10 times
performance difference).
It might be useful to check how your cubic root calculation costs
compared to current optimizations.

Thanks,
Sangtae

On Thu, Nov 6, 2008 at 10:21 PM, Lisong Xu <xu at cse.unl.edu> wrote:
> Lawrence,
>
> Thanks! Please see my comments inline.
>
> Sangtae, could you please comment on the cubic root implementation? ("1032
> clocks vs 79 clocks") Thanks.
>
> Lisong
>
>>>> Units for variables
>>>> -------------------
>>
>> The constant C for example can have units of pkts *or* bytes per sec^3,
>> but not both. Therefore the value of C must be chosen appropriately
>> based on the units in use.
>>
>> If we assume that C=0.4 was chosen based on pkts and time in secs, then
>> changing C to 460 and redoing the bytes case results in the following:
>>
>> I believe all of these issues need to be addressed in a much more
>> thorough and explicit manner, perhaps partially in the draft and more so
>> in an external reference.
>
> Now I understand what you mean. You are right!
>
>
>>>> Explicit help for implementors
>>>> ------------------------------
>>>
>>> Our original CUBIC code used the bisection method to calculate the cubic
>>> root. Later it was replaced by a Newton-Raphson method with table loopups
>>> for small values. This results in more than 10 times performance improvement
>>> in the cubic root calculation. On average, the bisection
>>> method costs 1032 clocks while the improved version costs only 79 clocks.
>>>
>>> You can find more information at
>>> * Hemminger, S. Cubic root benchmark code.
>>> http://lkml.org/lkml/2007/3/13/331
>>> * Tarreau, W. Cubic optimization.
>>>
>>> http://git.kernel.org/?p=linux/kernel/git/davem/net2.6.git;a=commit;h=7e58886b45bc4a309aeaa8178ef89ff767daaf7f
>>
>> This is useful information and should be discussed or at least
>> referenced in the I-D. I'd be wary of referring people to GPL'd code
>> because of the licensing issues, hence my suggestion of including public
>> domain code fragments/pseudo code in the I-D.
>>
>> Alternatively, you could consider dual-licencing the CUBIC code with GPL
>> and a more liberal licence (e.g. BSD).
>>
>> I believe the method I implemented is somewhat different to what you
>> describe. I found the Newton-Raphson method in my reading but it seemed
>> overly complicated. My implementation currently uses the method
>> described in [3].
>>
>> When you refer to "1032 clocks vs 79 clocks", what is a "clock" exactly?
>> I'd be curious to do a quick comparison between my current method and
>> the figures you quote. My goal has been correctness and not optimisation
>> thus far, but still good to know.
>
> Sangtae, could you please comment on this?
>
>
>>>> TCP friendly region
>>>> -------------------
>>>>
>>>> The TCP friendly region (section 2.2) concept is still fuzzy for me.
>>>> During which periods of a congestion epoch would one expect the TCP
>>>> friendly region to actually have any effect?
>>>>
>>>> I think this concept and its importance needs to be explained more
>>>> clearly in the document.
>>>>
>>> I see. As described in Section 4.1, TCP friendly region has an effect
>>> when standard TCP performs well. For example, in the following two types of
>>> networks:
>>>      1. networks with a small bandwidth-delay product (BDP).
>>>      2. networks with a short RTT, but not necessarily a small BDP
>>>
>>
>> I assume you're referring to section 3.1 and not 4.1.
>
> I am referring to Section 4.1 "Fairness to standard TCP" of the following
> draft.
> http://www.ietf.org/internet-drafts/draft-rhee-tcpm-cubic-02.txt
>
>
>>>> Fast convergence
>>>> ----------------
>>>>
>>>> Why is the fast convergence heuristic so brutal in it's attempted
>>>> detection of new flows starting up?
>>>>
>>>> Having the hard check of "is cwnd < cwnd_prev" in place seems far too
>>>> black and white, particularly in BSD where cwnd is in bytes, so a 1 byte
>>>> difference will trigger the more aggressive backoff.
>>>>
>>>> Perhaps a "is cwnd < 0.97*prev_cwnd" or similar check would soften the
>>>> edge just a little bit?
>>>>
>>>
>>> Actually we have tested something like "cwnd < 0.97*prev_cwnd", but there
>>> is no obvious difference. This is why we are still using it.
>>>
>>
>> Were you testing with cwnd measured in units of pkts or bytes?
>
> in units of pkts
>
>
>> Yes, I'm playing devil's advocate here, but I hope this discussion will
>> improve the document which is on its way to becoming a standard.
>
> Thank you for your comments!
> Lisong
>