December 24, 2012
Short notice: due to the holidays and people rightfully paying little attention to the online world, this Wednesday there won't be a post from the "a bashism a week" series.
Enjoy the break.
December 19, 2012
A bashism a week: testing for equality
Well known, yet easy to find just about everywhere: using the "test"/"[" commands to test for equality with two equals signs (==).
Unlike in many programming languages, to test for equality in a shell script you must use a single equals sign.
Keep this in mind: under a shell that implements only what POSIX:2001 requires, you may hit the unexpected in the following code.
if [ foo == foo ]; then
    echo expected
else
    echo unexpected
fi
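The portable fix is to use a single equals sign, which every POSIX shell understands:

if [ foo = foo ]; then
    echo expected
else
    echo unexpected
fi

Under dash, for instance, "[" rejects "==" with an "unexpected operator" error, the test fails, and the first snippet prints "unexpected".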
December 18, 2012
Nicer, but stricter
Lately I've been working on making the redirector nicer to the mirrors and to some potential users; more specifically, to those behind a caching proxy.
The redirector is now nicer to traditional web proxies: it redirects to objects that are known not to change with a "moved permanently" code (HTTP status code 301). This applies to files in the pool/ directory and ".pdiff" files, among others.
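This is easy to check from the command line; a quick sketch (the package path below is hypothetical, any file under pool/ will do):

# Print only the HTTP status code of the redirector's response.
curl -s -o /dev/null -w '%{http_code}\n' \
    'http://http.debian.net/debian/pool/main/h/hello/hello_2.8-1_amd64.deb'
# A permanent redirect prints: 301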
Previously, a traditional caching web proxy would sometimes end up with multiple copies of the same object, each fetched from a different mirror, and the redirection itself would not be cached at all. With this change, this is no longer the case.
Using a caching proxy that is aware of the Debian repository layout is still likely to yield better results, however. If my memory serves correctly, apt-cacher can update the Packages, Sources, and similar files with the ".pdiff"s on the server side; Apt-Cacher-NG apparently can use debdelta; and so on.
See my blog post about one APT caching proxy not being efficient for some comments related to those tools.
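As a side note, pointing APT at such a caching proxy is a one-liner; the proxy host and port here are made up for the example:

# Hypothetical proxy address; adjust to your setup.
printf 'Acquire::http::Proxy "http://apt-proxy.example.net:3142/";\n' \
    > /etc/apt/apt.conf.d/01proxy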
Another recent change is that mirrors that can't be used by the redirector will no longer be monitored as often as the other mirrors. For instance, if a mirror doesn't generate a trace file (used for monitoring) then the redirector will gradually limit the rate at which the mirror is checked.
This rate-limiting mechanism applies to different kinds of errors, and should reduce the amount of wasted time and bandwidth while still allowing automatic detection of mirrors that recover.
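To illustrate the idea, here is a minimal sketch in shell; the grace period, base interval, and linear growth are assumptions for the example, not the redirector's actual constants:

# next_check_interval FAILURES: print the seconds until the next check.
# Assumed policy: tolerate GRACE consecutive failures at the normal rate,
# then grow the interval linearly with every further failure.
GRACE=3
BASE=3600   # normal check interval, in seconds
next_check_interval() {
    failures=$1
    if [ "$failures" -le "$GRACE" ]; then
        echo "$BASE"
    else
        echo $(( BASE * (failures - GRACE + 1) ))
    fi
}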
[Chart: projection of a rate-limited mirror's check interval over six weeks; the mirror would have to fail in every attempt for that to happen. N.b. there's a bump in the scale.]
The rate limiter grants an initial grace period so that temporary errors don't affect the redirector's use of the mirror. After that period, the limit grows pretty much linearly. That chart alone doesn't really convey the effect of the rate limiter, though, so here it is compared with the normal checking behaviour:
[Chart: comparison of the two behaviours over an eight-week period, on a logarithmic scale. Nice chart colours by LibreOffice.]
The code to detect mirrors that don't perform a two-stage sync, which I talked about in a previous post, has not yet been integrated: the current implementation would put too much load on the mirrors to be added as-is.
While tracking down problems exposed to users, I decided to take a stricter approach as to which mirrors are used by the redirector. Suffice it to say that the remaining mirrors using the obsolete anonftpsync are going to be ignored entirely. ftpsync has been around for a few years now and is the standard tool.
Whether you are mirroring Debian, Raspbian, Ubuntu, or any other Debian-like package repository, ftpsync is the right tool to use.
Most of the issues I've been discovering, and sometimes working around, affect direct users of the mirrors and are not related to the http.debian.net redirector. When not detected beforehand they happen to be exposed by the redirector but, like I said, I plan to be stricter in order to increase the redirector's reliability. Once a strict and reliable foundation is built, more workarounds might make their way in to better use the available resources.
That's it for now. The road is long, the challenge is great, and being an observer in an uncontrolled environment makes it even more interesting.
December 12, 2012
A bashism a week: $RANDOM numbers
Commonly used to sleep a random amount of time or to create unique temporary file names, $RANDOM is one of those bashisms that you are best off avoiding altogether.
It is not uncommon to see scripts generate a "unique" temporary file name with code like tempf="/tmp/foo.$RANDOM" or tempf="/tmp/foo.$$.$RANDOM".
Under some shells, the "unique" temporary file name produced by the first example will simply be "/tmp/foo.". So much for randomness, right?
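dash, for instance, does not implement $RANDOM, so the variable expands to nothing (assuming RANDOM hasn't been exported in the environment; see the last tip below):

$ dash -c 'tempf="/tmp/foo.$RANDOM"; echo "$tempf"'
/tmp/foo.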
Even if you work around it by setting $RANDOM to the output of cksum after reading some bytes from /dev/urandom, please: don't do that. Use the mktemp command instead.
When creating temporary files there's more to it than just generating a file name. Just don't do it on your own: use mktemp (a sketch follows the tips below). Really, use it; the list of those who weren't using mktemp (or similar) is long enough as it is.
Don't even dare to mention the Linux kernel-based protection against symlink attacks. There's no excuse for not using mktemp.
Tip: If you are going to use multiple temporary files, create a temporary directory instead. Use mktemp -d.
Tip: Don't reuse a temporary file's name, even if you unlink/remove it. Generate a new one with mktemp.
Tip: Reusing also means doing things like tmp="$(mktemp)"; some_command > "$tmp.stdout" 2> "$tmp.stderr" — the derived names are predictable and are not created safely by mktemp.
Tip: Even if $RANDOM is not empty, don't use it. It could have been exported as an environment variable. Again, just use mktemp.
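Putting the tips together, a minimal sketch of the safe pattern (some_command is a placeholder):

# Create a private temporary directory and remove it on exit.
tmpdir=$(mktemp -d) || exit 1
trap 'rm -rf "$tmpdir"' EXIT
# Any number of temporary files, all inside the mktemp'd directory.
some_command > "$tmpdir/stdout" 2> "$tmpdir/stderr"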
For the remaining cases where you may want a pseudo-random number, such as for sleeping a random number of seconds, you can use something as simple as $$. Use shell arithmetic to adjust it as needed: apply the modulo operator, multiply it, etc.
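For instance, to sleep somewhere between 0 and 59 seconds:

# $$ is the process id; the modulo keeps the value in range.
sleep $(( $$ % 60 ))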
If you think you need something more "random" than the process id, then you should probably not be using $RANDOM in the first place.
December 05, 2012
Introducing: a bashism a week
No matter how many scripting languages exist, it appears that shell programming is here to stay. In many cases it is fast, it "does the job", and best of all: it is available "everywhere". The shell is used by makefiles, on every call to system(), and whatnot.
However, it is a real pain: implementations differ from the standards, some implementations still in use pre-date them, the standards leave room for undefined behaviour, and bugs in the implementations are anything but unknown. You can't just specify a given shell interpreter and consider the problem dealt with. Writing shell scripts that are portable across many platforms is a nightmare, if at all possible.
Surprisingly, in spite of all that, a great number of shell scripts appear to work flawlessly on many systems.
The switch from bash to dash as the default shell interpreter in Debian wasn't done without quite some work (more if you include archived bug reports), and the work ain't over.
Over the following months I will be writing about a different "bashism" every Wednesday, hopefully helping people write slightly-more-portable shell scripts. The posts will focus on widely-seen bashisms, probably ignoring those that Debian's policy requires shells to implement.
The term "bashism" must be understood as any feature or behaviour not required by SUSv3 (aka POSIX:2001), no matter what its origins are or even if the behaviour is not exhibited by the bash shell.
One of the key points is documenting the script's requirements, starting by specifying the right shell interpreter in the shebang.
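For example, a script that sticks to the standard should say so in its very first line:

#!/bin/sh
# Requires only a POSIX:2001 sh; no bashisms below this line.
echo 'portable hello'

Conversely, a script that genuinely depends on bash features should use #!/bin/bash instead of #!/bin/sh.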
Let's see what comes out of this experiment.
As a matter of fact, I have a few months' worth of posts written already. They are all going to be published on scheduled dates, just like this very post.
December 03, 2012
Some things you wanted to know about http.debian.net
After quite a bit of very welcome feedback, I've put together a FAQ page in an attempt to answer the most common questions about http.debian.net.
Emails have been accumulating for a few weeks now, but I will get to them. So please be patient if you send me an email, or if you have sent me one.