December 18, 2012

Nicer, but stricter

Lately I've been working on making the redirector nicer to the mirrors and to some potential users, specifically those behind a caching proxy.

The redirector is now nicer to traditional web proxies: objects that are known not to change are redirected with a "moved permanently" code (HTTP status code 301). This applies to files in the pool/ directory and ".pdiff" files, among others.
Previously, a traditional caching web proxy would sometimes end up with multiple copies of the same object, fetched from different mirrors; and the redirection would not be cached at all. With this change, this is no longer the case.
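
To illustrate the idea, here is a minimal sketch of how such a decision could be made. It is not the redirector's actual code: the function name, the path-matching rules, and the use of 302 for everything else are assumptions made for the example.

```python
from http import HTTPStatus


def redirect_status(path: str) -> HTTPStatus:
    """Pick a redirect status code for a requested repository path.

    Hypothetical helper: the matching rules below are assumptions,
    not the rules actually used by the redirector.
    """
    parts = path.lstrip("/").split("/")

    # Files under pool/ never change once published, so a proxy
    # may cache the redirection itself.
    if "pool" in parts[:-1]:
        return HTTPStatus.MOVED_PERMANENTLY

    # Assumption: pdiff patches live in a "<name>.diff" directory,
    # next to an Index file that does change over time.
    if len(parts) >= 2 and parts[-2].endswith(".diff") and parts[-1] != "Index":
        return HTTPStatus.MOVED_PERMANENTLY

    # Anything else (Release, Packages, ...) may change, so the
    # redirection stays temporary (302 is an assumption here).
    return HTTPStatus.FOUND
```

With these rules a request for a .deb under pool/ gets a 301, while a request for a Release file under dists/ gets a temporary redirect.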

However, a caching proxy that is aware of the Debian repository design is still likely to yield better results. If my memory serves correctly, apt-cacher is able to update the Packages, Sources, and similar files with the ".pdiff"s on the server side. Apt-Cacher-NG apparently can use debdelta, and so on.
Check my blog post about one APT caching proxy not being efficient for some comments related to those tools.

Another recent change is that mirrors that can't be used by the redirector will no longer be monitored as often as the other mirrors. For instance, if a mirror doesn't generate a trace file (used for monitoring) then the redirector will gradually limit the rate at which the mirror is checked.
This rate-limiting mechanism applies to different kinds of errors, and should reduce the amount of wasted time and bandwidth while still allowing automatic detection of mirrors that recover.


Projection of a rate-limited mirror over six weeks. The mirror would have to fail on every attempt for that to happen.
N.B. there's a bump in the scale.

The rate limiter applies an initial exception so that temporary errors do not affect the use of the mirror by the redirector. After that exception, it is pretty much linear. However, that chart doesn't really reflect the effect of the rate limiter, so here it is compared with the normal checking behaviour:


Comparison of the two behaviours over an eight-week period using a logarithmic scale.
Nice chart colours by LibreOffice.
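
For the curious, the behaviour described above (an initial exception followed by a roughly linear back-off) could be sketched as follows. This is only an illustration: the function names, the grace threshold of three failures, and the unit of one skipped run per extra failure are all made up for the example and are not the values used by the redirector.

```python
def runs_to_skip(consecutive_failures: int, grace: int = 3) -> int:
    """Number of scheduled checks to skip after the latest failure.

    Hypothetical parameters: `grace` is the initial exception, the
    number of failures tolerated before any rate limiting kicks in.
    """
    if consecutive_failures <= grace:
        return 0
    # Past the grace period the gap grows linearly with each failure.
    return consecutive_failures - grace


def checks_performed(total_runs: int, grace: int = 3) -> int:
    """Count the checks actually made on a mirror that always fails."""
    failures = 0
    skip = 0
    performed = 0
    for _ in range(total_runs):
        if skip > 0:
            # Rate-limited: this scheduled run ignores the mirror.
            skip -= 1
            continue
        performed += 1
        failures += 1  # worst case: every attempt fails
        skip = runs_to_skip(failures, grace)
    return performed
```

In a real checker a successful attempt would reset the failure count, which is what allows a recovered mirror to be picked up again automatically.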

The code to detect mirrors that don't perform a two-stage sync, which I talked about in a previous post, has not been integrated yet: the current implementation would be too expensive on the mirrors to just add it as-is.

While tracking down problems exposed to users, I decided to take a stricter approach as to which mirrors are used by the redirector. Suffice it to say that the remaining mirrors using the obsolete anonftpsync are going to be ignored entirely. ftpsync has been around for a few years now and it is the standard tool.
Whether you are mirroring Debian, Raspbian, Ubuntu, or any other Debian-like package repository, ftpsync is the right tool to use.

Most of the issues I've been discovering, and sometimes working around, affect direct users of the mirrors and are not related to the http.debian.net redirector. When not detected beforehand they end up being exposed by the redirector but, like I said, I plan to be stricter in order to increase the redirector's reliability. Once a strict and reliable foundation is built, more workarounds might find their way in to make better use of the available resources.

That's it for now. The road is long, the challenge is great, and being an observer in an uncontrolled environment makes it even more interesting.
