Raphael's blog

April 08, 2013

How the world ended up in Costa Rica

Even though I haven't had much time to dedicate to http.debian.net lately, it has been up and running, or should I say serving?

Part of its job is to detect mirrors that have temporary issues or are entirely gone, down, unavailable. It does so, and many other things, by monitoring the so-called "trace files". A very important one being the "master" (or "origin") trace file.

With the recent integration of backports into the main archive, the master trace file of the backports mirrors also changed. Long story short, this change caused backports mirrors to no longer be considered by the mirror redirector as candidates. As long as they were up to date.

After the usual mirror synchronisation delay, more and more mirrors were disabled and subsets of "up to date" candidates re-calculated. This reached a critical point when only one mirror was left in the database. The mirror had not been synchronised for a couple of weeks.

This mirror is located in Costa Rica, and as the only candidate left in the database it was the only one used to serve requests for the backports archive. No matter where the client was located in the world.

The issue was later noticed and the necessary updates to the mirrors master list made. Mirrors started to be re-considered as they were re-checked (with some delay due to the rate limiter) and the subsets re-calculated. In a few hours everything was back to normality.

Correctness and fault-tolerance don't always get together very well...

March 29, 2013

Chocolate quote

Kinda appropriate for some recent events:

Il y a autant de générosité à recevoir qu'à donner

- Julien Green

Roughly translated to

There is only as much generosity to receive as there is to give.

March 27, 2013

A bashism a week: substrings (dynamic offset and/or length)

Last week I talked about the substring expansion bashism and left writing a portable replacement of dynamic offset and/or length substring expansion as an exercise for the readers.

The following was part of the original blog post, but it was too long to have everything in one blog post. So here is one way to portably replace said code.

Let's consider that you have the file name foo_1.23-1.dsc of a given Debian source package; you could easily find its location under the pool/ directory with the following non-portable code:

file=foo_1.23-1.dsc
echo ${file:0:1}/${file%%_*}/$file

Which can be re-written with the following, portable, code:

file=foo_1.23-1.dsc
echo ${file%${file#?}}/${file%%_*}/$file

Now, in the Debian archive source packages with names with the lib prefix are further split, so the code would need to take that into consideration if file is libbar_3.2-1.dsc.

Here's a non-portable way to do it:

file=libbar_3.2-1.dsc
if [ lib = "${file:0:3}" ]; then
    length=4
else
    length=1
fi

# Note the use of a dynamic length:
echo ${file:0:$length}/${file%%_*}/$file

While here's one portable way to do it:

file=libbar_3.2-1.dsc
case "$file" in
    lib*)
        length=4
    ;;
    *)
        length=1
    ;;
esac

length_pattern=
while [ 0 -lt $length ]; do
    length_pattern="${length_pattern}?"
    length=$(($length-1))
done

echo ${file%${file#$length_pattern}}/${file%%_*}/$file

The idea is to compute the number of interrogation marks needed and use them where needed. Here are two functions that can replace substring expansion as long as values are not negative (which are also supported by bash.)

genpattern() {
    local pat=
    local i="${1:-0}"

    while [ 0 -lt $i ]; do
        pat="${pat}?"
        i=$(($i-1))
    done
    printf %s "$pat"
}

substr() {
    local str="${1:-}"
    local offset="${2:-0}"
    local length="${3:-0}"

    if [ 0 -lt $offset ]; then
        str="${str#$(genpattern $offset)}"
        length="$((${#str} - $length))"
    fi

    printf %s "${str%${str#$(genpattern $length)}}"
}

Note that it uses local variables to avoid polluting global variables. Local variables are not required by POSIX:2001.

Enough about substrings!

Remember, if you rely on non-standard behaviour or feature make sure you document it and, if feasible, check for it at run-time.

March 20, 2013

A bashism a week: substrings

Sometimes obtaining a substring in a shell script is needed. The bashism of this week comes handy as it allows one to obtain a substring by indicating the offset and even the length of the substring. This is the ${varname:offset:length} bashism, also known as substring expansion.

The portable "replacements" are simple if the offset (and the length) are static. For example, the following code would print the substring of "foo" consisting of only the last two characters:

var=foo
# Replace the bashism ${var:1} with:
echo ${var#?}

The length can then be limited with additional pattern-matching removal expansions:

var="portable code"
# Replace the bashism ${var:3:5} with the following code

# Offset is 3, so we use three ? (interrogation) characters:
part=${var#???}

# Length is 5, so we use five ? characters:
echo ${part%${part#?????}}

As it can be seen, it is not impossible to replace a substring expansion.

The portable code becomes slightly more complex if the offset and/or the length are dynamic. I leave that as an exercise for the readers.

Feel free to post your code as a comment (use the <pre> tags, please) or in another public way. My own response is already scheduled to be published next week at the same time as usual.

Note: substring expansions can also be replaced with a wide variety of external commands. This is a pure-POSIX shell scripting example.

March 13, 2013

A bashism a week: assigning to variables and special built-ins

Assigning a value to a variable when executing a command is a way to populate the command's environment, without the variable assignment persisting after the command completes. This is not true, however, when a special built-in is the command being executed.

POSIX:2001 states that "Variable assignments specified with special built-in utilities remain in effect after the built-in completes".

Not only this is tricky because it depends on whether a utility is a special built-in or not, but the bash interpreter does not respect that behaviour of the POSIX standard. That is, special built-ins are not so "special" to the bash interpreter.

This leaves two things to take into account when assigning to a variable when executing a command: whether the command is a special built-in, and whether bash is interpreting the script.

Now, the list of special built-ins is rather short and it would be a bit unusual to perform variable assignments when calling them, except for some cases: "exec", "eval", "." (dot), and ":" (colon).

It is important to note that ":" and "true" differ in this regard; the former is a special built-in, the latter is just a utility. Watch out for this kind of differences when using ":" or "true" to nullify a command. E.g.

Compare

$ dash -c '
method=sed
# some condition or user setting ends up making:
method=true
# later:
foo=bar $method
echo foo: $foo'
foo:

To (redacted for brevity):

$ dash -c '
method=:
foo=bar $method
echo foo: $foo'
foo: bar

March 06, 2013

A bashism a week: returning

Inspired by Thorsten Glaser's comment about where you can break from, this "bashism a week" is about a behaviour not implemented by bash.

return is a special built-in utility, and it should only be used on functions and scripts executed by the dot utility. That's what the POSIX:2001 specification requires.

If you return from any other scope, for example by accidentally calling it from a script that was not sourced but executed directly, the bash shell won't forgive you: it does not abort the execution of commands. This can lead to undesired behaviour.

A wide variety of shell interpreters silently handle such calls to return as if exit had been called.

An easy way to avoid such undesired behaviours is to follow the best practice of setting the e option, i.e.

set -e

. With that option set at the moment of calling return outside of the allowed scopes, bash will abort the execution, as desired.

The POSIX specification does not guarantee the above behaviour either as the result in such cases is "unspecified", however.

February 27, 2013

A bashism a week: appending

The very well known appending operator += is a bashism commonly found in the wild. Even though it can be used for things such as adding to integers (when the variable is declared as such) or appending to arrays, it is usually used for appending to a string variable.

As I previously blogged about it, the appending operator bashism is only useful when programming for the bash shell.

Whenever you want to append a string to a variable, repeating the name of the variable is the portable way. I.e.

foo=foo
foo="${foo} bar"
# Instead of foo+=" bar", which is a bashism

See? Replacing the += operator is not rocket science.

Note: One should be aware that makefiles do have a += operator which is safe to use when appending to a make variable. But don't let this "exception" fool you: code in configure.ac and similar files is executed by the shell interpreter. So don't use the appending operator there.