[Iccrg] Explicit feedback

Mon Aug 7 17:36:16 BST 2006

Tony Li <tli at tropos.com> wrote:
> 
> IGRP has carried loading information since the early 80's.  Similarly,
> one of the early Arpanet routing algorithms (SPREAD, I think) carried
> load information.  Both were used in routing decisions and resulted in
> oscillation.

   Oscillation is a serious problem in routing. In a research environment,
it can be measured and appropriate damping can be calculated. In actual
deployment, oscillation must be avoided.

> The control loop time constant for routing protocols is simply too long
> to be useful at conveying high frequency congestion event notification.
> However, very low frequency utilization information might be possible.
> I would tend to consider this first for IGPs, given their more limited
> scope than BGP.

   IGPs are a simpler problem, certainly; but oscillation must be avoided
there as well.

   IGRP is an excellent starting point to understand the issues. It
correctly identifies the issues of latency, bandwidth, load, and
reliability. More research, IMHO, would help determine a better way
of evaluating these and adjusting forwarding decisions.

- Latency is the minimum delay to reach the origin of the route. It
  obviously can be incremented by the (known) latency of each step
  along the way (not to be confused with the actual delay experienced
  by packets being queued along the way). I would suggest floating-point
  units of seconds.

- Bandwidth is the upper limit of bytes per second. It obviously can
  be evaluated and reduced if necessary at each forwarding step. I
  would suggest floating-point units of bytes per second.

- Load can be expressed either as bytes per second used or bytes per
  second unused. I would suggest the latter. I think it's clear that
  this should be a long-term average, say one hour. Further, I suggest
  that it be two (floating-point) numbers, such as mean and standard
  deviation.

- Reliability is a harder problem. Research is needed. As a starting
  point, I'd think in terms of reporting each failure to forward,
  with coding for the reason (discard, loss of carrier, etc.) and a
  floating-point estimate of bytes failed per bytes sent. Clearly
  such reports should not be propagated very far unless they show
  a substantial change in reliability.
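
   A minimal sketch of how these four values might be carried and
updated at each forwarding step, under assumptions that go beyond
the list above: the names HopMetrics, summarize_unused and extend
are hypothetical, load for a path is taken from the hop with the
least mean unused capacity, and failures on successive hops are
treated as independent. None of this is IGRP's actual metric
computation.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class HopMetrics:
    latency_s: float          # known minimum latency, seconds
    bandwidth_Bps: float      # upper limit, bytes per second
    unused_mean_Bps: float    # long-term mean of unused capacity
    unused_stddev_Bps: float  # std deviation of unused capacity
    failure_ratio: float      # bytes failed per bytes sent


def summarize_unused(samples_Bps):
    """Return (mean, std deviation) of unused-capacity samples."""
    return mean(samples_Bps), pstdev(samples_Bps)


def extend(route, hop):
    """Combine a route's advertised metrics with one more hop."""
    # Assumption: the hop with the least mean unused capacity
    # stands in for the load of the whole path.
    tightest = min(route, hop, key=lambda m: m.unused_mean_Bps)
    # Assumption: failures on successive hops are independent.
    delivered = (1.0 - route.failure_ratio) * (1.0 - hop.failure_ratio)
    return HopMetrics(
        latency_s=route.latency_s + hop.latency_s,
        bandwidth_Bps=min(route.bandwidth_Bps, hop.bandwidth_Bps),
        unused_mean_Bps=tightest.unused_mean_Bps,
        unused_stddev_Bps=tightest.unused_stddev_Bps,
        failure_ratio=1.0 - delivered,
    )
```

   Latency adds and bandwidth takes the minimum, as described
above; how load and reliability ought to combine along a path is
exactly the sort of thing that would need research.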

   All of these numbers should be reported based upon the actual path
that a router will use for packets matching that route. We will have
to support environments in which different classes of packets are
routed differently. I assume routing updates will identify classes
to which the reported route applies. I also expect that multiple
paths may be used (without changing the routing updates sent).

   "Reliability", IMHO, should not be confused with route withdrawal.
If "reliability" actually reaches zero, obviously the route should
have been withdrawn; I would guess that "reliability" below 90%
might well justify route withdrawal, but perhaps research would show
otherwise. I only mean to suggest a benefit from reporting reduced
reliability appreciably before a route would be withdrawn.
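
   To illustrate the distinction, a small sketch of the decision a
router might make when its measured reliability for a route changes.
The 90% figure is only the guess given above; REPORT_DELTA, the
function name, and the returned labels are hypothetical placeholders
for values research would have to determine.

```python
WITHDRAW_BELOW = 0.90  # guess from the text, not a measured value
REPORT_DELTA = 0.02    # hypothetical "substantial change" threshold


def reliability_action(last_reported, current):
    """Choose what to do with a new reliability estimate (0.0-1.0)."""
    if current < WITHDRAW_BELOW:
        return "withdraw"  # too unreliable to keep advertising
    if abs(current - last_reported) >= REPORT_DELTA:
        return "report"    # substantial change: send an update
    return "hold"          # small change: do not propagate
```

   For example, reliability_action(0.999, 0.95) returns "report",
well before the route would be withdrawn.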

   It's not clear what action a router "should" take when reduced
reliability is reported. In practice, we've found dropping one
packet per thousand to signal congestion can adequately maintain
basically dependable connections via end-to-end congestion avoidance.
I suspect some improvement might be possible through queueing and
redundancy (to avoid the need for retransmission), but I have no
data to back that up.

   (I hope it's clear I'm not talking about deploying changes to
backbone routers: I'm talking about research to find the sort of
changes which might be worth trying to deploy in the future.)

--
John Leslie <john at jlc.net>