Showing posts with label bashisms. Show all posts

November 13, 2013

A bashism a week: heredocs

One great feature of POSIX shells is the so-called heredoc. Heredocs are even available in languages such as Perl, PHP, and Ruby.

So where is the bashism?

It's in the implementation. What odd thing do you see below?


$ strace -fqe open bash -c 'cat <<EOF
foo
EOF' 2>&1 | grep -v /lib
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/dev/tty", O_RDWR|O_NONBLOCK|O_LARGEFILE) = 3
open("/proc/meminfo", O_RDONLY|O_CLOEXEC) = 3
[pid 6696] open("/tmp/sh-thd-1384296303", O_WRONLY|O_CREAT|O_EXCL|O_TRUNC|O_LARGEFILE, 0600) = 3
[pid 6696] open("/tmp/sh-thd-1384296303", O_RDONLY|O_LARGEFILE) = 4
[pid 6696] open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
foo
--- SIGCHLD (Child exited) @ 0 (0) ---


Yes, it uses temporary files!

So do ksh, pdksh, mksh, posh and possibly other shells. Busybox's sh and dash do not use temporary files, though:


$ strace -fqe open dash -c 'cat <<EOF
foo
EOF' 2>&1 | grep -v /lib
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
[pid 6767] open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
foo
--- SIGCHLD (Child exited) @ 0 (0) ---


Next time you want data to never hit a hard disk, beware that heredocuments and herestrings are best avoided.
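
One possible alternative, as a minimal sketch: pass the data through a pipe instead of a heredoc. This assumes printf is a shell built-in (it is in bash and dash, but POSIX does not require that), and the variable names are made up for the example.

```shell
# Hypothetical data that should not be written to a temporary file
secret='foo'

# A pipe instead of a heredoc; cat stands in for the real consumer
received=$(printf '%s\n' "$secret" | cat)
echo "$received"
```

If printf were an external command in your shell, the data would show up in that process's arguments instead, so check your shell before relying on this.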

October 09, 2013

A bashism a week: maths

You've probably already done some basic maths in shell scripts, but do you know what else you can actually do?

Pick at least 4 operations that you can do in bashisms-free shell scripts:

$((n+1))
$((n>8))
$((n^4))
$((--n))
$((n*=5))
$((n++))
$((n==1?2:3))

The POSIX:2001 standard defines the arithmetic expansion requirements, which leads us to selecting all of the above operations except two:

$((--n))
$((n++))

"--" and "++" are not required to be implemented, and in some cases they may lead to unexpected results, such as the following:


$ bash -c 'n=1; echo $((++n))'
2
$ dash -c 'n=1; echo $((++n))'
1
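
A small sketch of how the increment can be written without relying on "++" or "--"; the variable name is only an example:

```shell
n=1

# Instead of $((++n)) or $((n++)): spell out the addition
n=$((n + 1))
echo "$n"

# Assignment operators are part of the required set, so this also works
: "$((n += 1))"
echo "$n"
```

The ":" utility is only there to discard the result of the expansion; the assignment happens as a side effect of the arithmetic.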


Remember, if you rely on any non-standard behaviour or feature make sure you document it and, if feasible, check for it at run-time.

October 02, 2013

A bashism a week: dangerous exports

As a user of a shell you have most likely had the need to export a variable to another process; i.e. set/modify an environment variable.

Now, how do you stop exporting an environment variable? Can you export anything else?

The bash shell offers the -n option of the export built-in, claiming it "remove[s] the export property". However, this feature is not part of the POSIX:2001 standard and is, therefore, a bashism.

A portable way to stop exporting an environment variable is to unset it. E.g. the effect of "export MY_VAR=foo" can be reverted by calling "unset MY_VAR" - surely enough, this will also destroy the content of the variable.

An equivalent could then be:

# to stop exporting/"unexport" the MY_VAR environment variable:
my_var="$MY_VAR" ; unset MY_VAR ;
MY_VAR="$my_var" ; unset my_var ;


The above code makes a copy of the variable before destroying it, and then restores its content.
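
The same steps could be wrapped in a function. This is only a sketch: the name unexport and the temporary variable are made up, it relies on eval, and it assumes the argument is a valid variable name. It also turns an unset variable into an empty one.

```shell
unexport() {
    # Copy the value, drop the variable (and its export flag), then restore it
    eval "_unexport_value=\${$1-}"
    unset "$1"
    eval "$1=\$_unexport_value"
    unset _unexport_value
}

export MY_VAR=foo
unexport MY_VAR

# A child process no longer sees the variable, the current shell still does
child_view=$(sh -c 'echo "${MY_VAR-not exported}"')
echo "parent: $MY_VAR, child: $child_view"
```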

How about exporting other things? Did you know that you can export shell functions?

With the bash shell, you can export a function with the -f parameter of the export built-in. Needless to say, this is a bashism. Its secret? It's just an environment variable with the name of the function and the rest of the function definition as its value.

Try this:

$ echo="() { /bin/echo 'have some bash' ; }" bash -c 'echo "Hello World!"'
have some bash


Yes, this means that if you can control the content of an environment variable passed to bash you can probably execute whatever code you want. It comes in handy when you want to alter a script's behaviour without modifying the script itself.

Possibilities are endless thanks to bash's support for non-standard characters in function names. Functions with slashes can also be exported, for example:

/sbin/ifconfig() {
echo "some people say you should be using ip(1) instead" ;
}


Are you into bug hunting? export exec='() { echo mount this ; }'

September 25, 2013

A bashism a week: aliases

In a response to my blogpost about bashisms in function names, reader Detlef L pointed out in a comment that aliases allow non-standard characters in their names, contrary to functions. They could then be used to, for example, set an alias of the run-parts(1) command (cf. the blog post).

Aliases indeed allow characters such as commas ( , ) to be used in the alias name. However, aliases are an extension to the POSIX:2001 specification and are therefore bashisms. Moreover, the character set defined by POSIX does not include dashes.

Last but not least, aliases belong to the list of shell features that are usually "disabled" when the shell is run in non-interactive mode. I.e.


$ bash <<EOF
alias true=false;
if true; then echo alias disabled; else echo alias enabled; fi
EOF

alias disabled
$ bash -i <<EOF # force interactive mode
alias true=false;
if true; then echo alias disabled; else echo alias enabled; fi
EOF

$ alias true=false;
$ if true; then echo alias disabled; else echo alias enabled; fi
alias enabled
$ exit


To add to the fun of different behaviours, other shells always expand aliases.

If you decide to play with aliases you should note one thing: they are only enabled from the line after the definition. E.g. note the difference below:

$ dash -c 'alias foo="echo foo"; foo'
dash: foo: not found
$ dash -c 'alias foo="echo foo";
foo'
foo

September 18, 2013

A bashism a week: testing strings

The bashism of this week deals again with tests. How do you compare two strings to tell which one goes before or after the other in the alphabet in a shell script?
(also known as lexicographic comparison)

If you are familiar with perl and the relationship between its comparison operators and that of shell scripts, you would probably write the test as follows:


$ test bar '<' foo
$ test foo '>' bar


And yes, that would work, but not with all shells. I'm afraid to tell you that the < and > comparison operators are not required by the POSIX:2001 specification and are, therefore, bashisms.

What does this have to do with perl? Well, the shell way is the inverse of the perl way.
What? In Perl you test the equality of two numbers with '==', in shell you use '-eq'; in perl the equality of strings is 'eq', in shell it is '='. In shell scripts you can compare two numbers with -gt (test 1 -gt 0) while in perl the equivalent comparison is with > (1 > 0).

So in this case the perl way is "bar lt foo" and based on the above the shell way should be "bar < foo". However, care must be taken when using such operators in shell scripts. Since < and > are used for redirections in shell scripts, one must quote them even when using bash. An alternative is to use the [[ special test function of bash which alters the shell syntax.

This time I didn't write a function to portably (or "somewhat portably") replace such comparisons. Feel free to share your solution in the comments, with bonus points if you come up with a solution without using external commands.
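
For reference, here is one possible sketch that does not earn the bonus points, since it uses the external expr command (which POSIX does specify). The function names str_lt and str_gt are made up for this example, and the ordering follows the collation rules of the current locale.

```shell
# Succeeds if $1 sorts before $2
str_lt() {
    # The "x" prefix keeps expr from treating operands as numbers or operators
    expr "x$1" '<' "x$2" >/dev/null
}

# Succeeds if $1 sorts after $2
str_gt() {
    expr "x$1" '>' "x$2" >/dev/null
}

if str_lt bar foo; then
    echo "bar goes before foo"
fi
```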

September 04, 2013

A bashism a week: tilde expansion

Did you know that you can get a user's home directory with the "~username" tilde expansion? You probably do, but how about other tilde expansions?

Thanks to a few bashisms you can make your script even more difficult to read by using "~+" to get $PWD, and "~-" to get $OLDPWD.
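
The readable and portable spelling is simply the variables themselves. A tiny sketch, with made-up variable names:

```shell
# Portable replacement for ~+
current_dir=$PWD

# Portable replacement for ~- ; OLDPWD may be unset before the first cd
previous_dir=${OLDPWD-}

echo "current: $current_dir"
echo "previous: $previous_dir"
```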

Are you using bash and want to access the directory stack created when using the pushd and popd bashisms? The ~i (where i is an integer) expansion can give you that. It gives you forward (~+i) and backward (~-i) access to the directory stack.

But beware, if the directory stack is smaller than the number you used, there won't be any expansion.

When using tildes in shell scripts, make sure you quote to avoid unwanted expansions. Note that the posh shell in wheezy and older does support those non-POSIX expansions.

Note: in the first examples the tilde is quoted for easier reading; the expansion doesn't occur if the tilde is quoted.

August 21, 2013

A bashism a week: function names

The bashism of this week is easy to hit when overriding the execution of a command with a shell function. Think of the following example scenario:

Replacing the yes(1) command with a shell function:
$ exec /bin/bash
$ type -t yes
file
$ yes() { while :; do echo ${1:-y}; done; }
$ type -t yes
function

Now every time yes is called the above-defined shell function will be called instead of /usr/bin/yes.

Apply the same principle to replace the run-parts(8) command with the following overly-simplified shell function:

$ run-parts() { 
    if [ "$1" = "--test" ]; then
        shift;
        simulate=true;
    else
        simulate=false;
    fi;
    for f in "$1"/*; do
        [ -x "$f" ] && [ -f "$f" ] || continue;
        case "$(basename "$f")" in 
            *[!a-zA-Z0-9_/-]*)
                :
            ;;
            *)
                if $simulate; then
                    echo "$f";
                else
                    "$f";
                fi
            ;;
        esac;
    done
}
$ type -t run-parts
function
(note the use of negative matching)

It also works as expected. However, when running it under a shell that only supports the function names required by the POSIX:2001 specification it will fail. One such shell is dash, which aborts with a "Syntax error: Bad function name", another is posh which aborts with a "run-parts: invalid function name".

If you ever want to have function names with dashes, equal signs, commas, and other unusual characters make sure you use bash and ksh-like shells (and keep that code to yourself). Yes, you can even have batch-like silencing of stdout with
function @ { "$@" > /dev/null ; }

Update: there were missing quotation marks in the example @ function.

August 14, 2013

A bashism a week is back

After a while without posts, the "a bashism a week" series is coming back!

Next week, at the usual time and day of the week, the series of blog posts about bashisms will be back for at least one more month. Subscribe via Atom so you don't miss any post, and check all the previous posts.

The "a bashism a week" series covers some of the differences between bash and the behaviour of other shells, and the requirements of the POSIX standard regarding shell scripting. Or put simply: it is a guide to common bashisms, allowing you to identify them and avoid their use for more compatible and portable code.

Happy reading!

May 22, 2013

Dealing with bashisms in proprietary software

Sometimes it happens that for one reason or another there's a need to use a proprietary application (read: can not be modified due to its licence) that contains bashisms. Since the application can not be modified and it might not be desirable to change the default /bin/sh, dealing with such applications can be a pain. Or not.

The switchsh program (available in Debian) by Marco d'Itri can be used to execute said application under a namespace where bash is bind-mounted on /bin/sh. The result:

$ sh --help
sh: Illegal option --
$ switchsh sh --help | head -n1
GNU bash, version 4.1.5(1)-release-(i486-pc-linux-gnu)

Simple, yet handy.

March 27, 2013

A bashism a week: substrings (dynamic offset and/or length)

Last week I talked about the substring expansion bashism and left writing a portable replacement of dynamic offset and/or length substring expansion as an exercise for the readers.

The following was part of the original blog post, but it was too long to have everything in one blog post. So here is one way to portably replace said code.

Let's consider that you have the file name foo_1.23-1.dsc of a given Debian source package; you could easily find its location under the pool/ directory with the following non-portable code:
file=foo_1.23-1.dsc
echo ${file:0:1}/${file%%_*}/$file

Which can be re-written with the following, portable, code:
file=foo_1.23-1.dsc
echo ${file%${file#?}}/${file%%_*}/$file

Now, in the Debian archive source packages with names with the lib prefix are further split, so the code would need to take that into consideration if file is libbar_3.2-1.dsc.

Here's a non-portable way to do it:
file=libbar_3.2-1.dsc
if [ lib = "${file:0:3}" ]; then
    length=4
else
    length=1
fi

# Note the use of a dynamic length:
echo ${file:0:$length}/${file%%_*}/$file

While here's one portable way to do it:
file=libbar_3.2-1.dsc
case "$file" in
    lib*)
        length=4
    ;;
    *)
        length=1
    ;;
esac

length_pattern=
while [ 0 -lt $length ]; do
    length_pattern="${length_pattern}?"
    length=$(($length-1))
done

echo ${file%${file#$length_pattern}}/${file%%_*}/$file

The idea is to compute the number of question marks needed and use them in the pattern. Here are two functions that can replace substring expansion as long as the values are not negative (negative values are also supported by bash).

genpattern() {
    local pat=
    local i="${1:-0}"

    while [ 0 -lt $i ]; do
        pat="${pat}?"
        i=$(($i-1))
    done
    printf %s "$pat"
}

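substr() {
    local str="${1:-}"
    local offset="${2:-0}"
    local length="${3:-}"

    # Drop the first $offset characters
    if [ 0 -lt "$offset" ]; then
        str="${str#$(genpattern "$offset")}"
    fi

    # Without a length (or with one that is too long), take the rest
    if [ -z "$length" ] || [ "$length" -gt "${#str}" ]; then
        length=${#str}
    fi

    # Quote the remainder so that it is matched literally
    printf %s "${str%"${str#$(genpattern "$length")}"}"
}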

Note that it uses local variables to avoid polluting global variables. Local variables are not required by POSIX:2001.

Enough about substrings!

Remember, if you rely on any non-standard behaviour or feature make sure you document it and, if feasible, check for it at run-time.

March 20, 2013

A bashism a week: substrings

Sometimes obtaining a substring in a shell script is needed. The bashism of this week comes in handy as it allows one to obtain a substring by indicating the offset and even the length of the substring. This is the ${varname:offset:length} bashism, also known as substring expansion.

The portable "replacements" are simple if the offset (and the length) are static. For example, the following code would print the substring of "foo" consisting of only the last two characters:
var=foo
# Replace the bashism ${var:1} with:
echo ${var#?}

The length can then be limited with additional pattern-matching removal expansions:
var="portable code"
# Replace the bashism ${var:3:5} with the following code

# Offset is 3, so we use three ? (interrogation) characters:
part=${var#???}

# Length is 5, so we use five ? characters:
echo ${part%${part#?????}}

As can be seen, it is not impossible to replace a substring expansion.

The portable code becomes slightly more complex if the offset and/or the length are dynamic. I leave that as an exercise for the readers.

Feel free to post your code as a comment (use the <pre> tags, please) or in another public way. My own response is already scheduled to be published next week at the same time as usual.

Note: substring expansions can also be replaced with a wide variety of external commands. This is a pure-POSIX shell scripting example.

March 13, 2013

A bashism a week: assigning to variables and special built-ins

Assigning a value to a variable when executing a command is a way to populate the command's environment, without the variable assignment persisting after the command completes. This is not true, however, when a special built-in is the command being executed.

POSIX:2001 states that "Variable assignments specified with special built-in utilities remain in effect after the built-in completes".

Not only is this tricky because it depends on whether a utility is a special built-in or not, but the bash interpreter also does not respect that behaviour of the POSIX standard. That is, special built-ins are not so "special" to the bash interpreter.

This leaves two things to take into account when assigning to a variable when executing a command: whether the command is a special built-in, and whether bash is interpreting the script.

Now, the list of special built-ins is rather short and it would be a bit unusual to perform variable assignments when calling them, except for some cases: "exec", "eval", "." (dot), and ":" (colon).

It is important to note that ":" and "true" differ in this regard; the former is a special built-in, the latter is just a utility. Watch out for this kind of difference when using ":" or "true" to nullify a command. E.g.

Compare
$ dash -c '
method=sed
# some condition or user setting ends up making:
method=true
# later:
foo=bar $method
echo foo: $foo'
foo: 
To (shortened for brevity):
$ dash -c '
method=:
foo=bar $method
echo foo: $foo'
foo: bar

March 06, 2013

A bashism a week: returning

Inspired by Thorsten Glaser's comment about where you can break from, this "bashism a week" is about a behaviour not implemented by bash.

return is a special built-in utility, and it should only be used in functions and in scripts executed by the dot utility. That's what the POSIX:2001 specification requires.

If you return from any other scope, for example by accidentally calling it from a script that was not sourced but executed directly, the bash shell won't forgive you: it does not abort the execution of commands. This can lead to undesired behaviour.

A wide variety of shell interpreters silently handle such calls to return as if exit had been called.

An easy way to avoid such undesired behaviour is to follow the best practice of setting the e option, i.e. "set -e". With that option set at the moment of calling return outside of the allowed scopes, bash will abort the execution, as desired.

However, the POSIX specification does not guarantee the above behaviour either, as the result in such cases is "unspecified".

February 27, 2013

A bashism a week: appending

The very well known appending operator += is a bashism commonly found in the wild. Even though it can be used for things such as adding to integers (when the variable is declared as such) or appending to arrays, it is usually used for appending to a string variable.

As I previously blogged about it, the appending operator bashism is only useful when programming for the bash shell.

Whenever you want to append a string to a variable, repeating the name of the variable is the portable way. I.e.
foo=foo
foo="${foo} bar"
# Instead of foo+=" bar", which is a bashism

See? Replacing the += operator is not rocket science.

Note: One should be aware that makefiles do have a += operator which is safe to use when appending to a make variable. But don't let this "exception" fool you: code in configure.ac and similar files is executed by the shell interpreter. So don't use the appending operator there.

February 20, 2013

A bashism a week: pushing and pop'ing directories

Want to switch back-and-forth between directories in your shell script?
The bashism of this week can be of some help, but for most needs, the cd utility is more than enough.

pushd, popd, and the extra built-in dirs are bashisms that allow one to create and manipulate a stack of directory entries. For a simple, temporary, switch of directories the following code is portable as far as POSIX:2001 is concerned:

cd /some/directory
  touch some files
  unlink others
  # etc
cd - >/dev/null
# We are now back at where we were before the first 'cd'

Which is equivalent to the following, also portable, code:

cd /some/directory
  touch some files
  unlink others
  # etc
cd "$OLDPWD"
# We are now back at where we were before the first 'cd'

Multiple switches can also be implemented portably without storing the name of the directories in variables at the expense of using subshells (and their side-effects).
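
A minimal sketch of the subshell approach; the directories and variable names are only examples. Each "cd" is confined to its own subshell, so leaving the parentheses plays the role of popd.

```shell
start_dir=$PWD

(
    # First switch: only this subshell moves to /tmp
    cd /tmp || exit 1
    first_dir=$PWD
    (
        # Second, nested switch
        cd / || exit 1
        echo "innermost: $PWD"
    )
    # Back in the first directory without any 'cd' or 'popd'
    [ "$PWD" = "$first_dir" ] && echo "middle: unchanged"
)

# The parent shell never left its directory
[ "$PWD" = "$start_dir" ] && echo "outer: unchanged"
```

The side-effect to keep in mind is that variables set inside the parentheses (first_dir above) are gone once the subshell ends.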

However, if you think you can solve your problem more conveniently by using "pushd" and "popd" don't forget to document the need of those built-ins and to adjust the shebang of your script to that of a shell that implements them, such as bash.

February 13, 2013

A bashism a week: negative matches

Probably due to the popular way of expressing the negation of a character class in regular expressions, it is common to see negative patterns such as [^b] in shell scripts.

However, using an expression such as [^b] where the shell is the one processing the pattern will cause trouble with shells that don't support that extension. The right way to express the negation is using an exclamation mark, as in: [!b]

Big fat note: this only applies to patterns that the shell is responsible for processing. Some such cases are:

case foo in
    [!m]oo)
        echo bar
    ;;
esac
and
# everything but backups:
for file in documents/*[!~]; do
    echo doing something with "$file" ...
done

If the pattern is processed by another program, beware that most won't interpret the exclamation the way the shell does. E.g.

$ printf "foo\nbar\nbaz\n" | grep '^[^b]'
foo
$ printf "foo\nbar\nbaz\n" | grep '^[!b]'
bar
baz

February 06, 2013

A bashism a week: short-circuiting tests

The test/[ command is home to several bashisms, but as I believe I have demonstrated: incompatible behaviour is to be expected.

The "-a" and "-o" binary logical operators are no exception, even if documented by the Debian Policy Manual.

One feature of writing something like the following code is that upon success of the first command, the second won't be executed: it will be short-circuited.
[ -e /dev/urandom ] || [ -e /dev/random ]

Now, using the "-a" or "-o" bashisms even in shell interpreters that support them can result in unexpected behaviour: some interpreters will short-circuit the second test, others won't.

For example, bash doesn't short-circuit:
$ strace bash -c '[ -e /dev/urandom -o -e /dev/random ]' 2>&1|grep /dev
stat64("/dev/urandom", ...) = 0
stat64("/dev/random", ...) = 0
Neither does dash:
$ strace dash -c '[ -e /dev/urandom -o -e /dev/random ]' 2>&1|grep /dev
stat64("/dev/urandom", ...) = 0
stat64("/dev/random", ...) = 0
But posh does:
$ strace posh -c '[ -e /dev/urandom -o -e /dev/random ]' 2>&1|grep /dev
stat64("/dev/urandom", ...) = 0
And so does pdksh:
$ strace pdksh -c '[ -e /dev/urandom -o -e /dev/random ]' 2>&1|grep /dev
stat64("/dev/urandom", ...) = 0

(output of strace shortened for brevity)

So even in Debian, where the feature can be expected to be implemented, its semantics are not very well defined. So much for using this bashism... better avoid it.
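
As a sketch of the replacement, both operators can be expressed with separate test commands joined by the shell's own "||" and "&&", which always short-circuit. The variable name is made up for the example.

```shell
# Instead of: [ -e /dev/urandom -o -e /dev/random ]
if [ -e /dev/urandom ] || [ -e /dev/random ]; then
    echo "a random device exists"
fi

# Instead of: [ -e /dev/null -a -c /dev/null ]
if [ -e /dev/null ] && [ -c /dev/null ]; then
    null_is_char_device=yes
else
    null_is_char_device=no
fi
echo "null_is_char_device: $null_is_char_device"
```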

Remember, if you rely on any non-standard behaviour or feature make sure you document it and, if feasible, check for it at run-time.

January 30, 2013

A bashism a week: sleep

To delay execution of some commands in a shell script, the sleep command comes in handy.
Many shells do not provide it as a built-in and rely on an external command such as GNU sleep. Even so, there are a couple of things to note:

  • Suffixes may not be supported. E.g. 1d (1 day), 2m (2 minutes), 3s (3 seconds), 4h (4 hours).
  • Fractions of units (seconds, by default) may not be supported. E.g. sleeping for 1.5 seconds may not work under all implementations.

This of course is regarding what is required by POSIX:2001; it only requires the sleep command to take an unsigned integer. FreeBSD's sleep command does accept fractions of seconds, for example.

Remember, if you rely on any non-standard behaviour or feature make sure you document it and, if feasible, check for it at run-time.

In this case, since the sleep command is not required to be a built-in, it does not matter what shell you specify in your script's shebang. Moreover, calling /bin/sleep doesn't guarantee you anything. The exception is if you specify a shell that has its own sleep built-in, then you could probably rely on it.

The easiest replacement for suffixes is calculating the desired amount of time in seconds. As for the second case, you may want to reconsider your use of a shell script.
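
A sketch of that calculation, assuming whole numbers only. The function name to_seconds is made up, and it does no validation of its argument.

```shell
# Convert a duration with an optional s/m/h/d suffix into seconds
to_seconds() {
    case "$1" in
        *s) echo "${1%s}" ;;
        *m) echo $(( ${1%m} * 60 )) ;;
        *h) echo $(( ${1%h} * 3600 )) ;;
        *d) echo $(( ${1%d} * 86400 )) ;;
        *)  echo "$1" ;;
    esac
}

to_seconds 2m
to_seconds 4h
```

It would then be used as: sleep "$(to_seconds 2m)"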

January 23, 2013

A bashism a week: output redirection

Redirecting stdout and stderr to the same file or file descriptor with &> is common and nice, except that it is not required to be supported by POSIX:2001. Moreover, trying to use it with shells not supporting it will do exactly the opposite:

  1. The command's output (to stdout and stderr) won't be redirected anywhere.
  2. The command will be executed in the background.
  3. The file will be truncated, if redirecting to a file and not using >>.

Are the characters saved worth those effects? I don't think so. Just use this instead: "> file 2>&1". Make sure you get the first redirection right, "&> file 2>&1" isn't going to do the trick.
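
A short sketch of the portable form in action. It assumes mktemp is available, which POSIX does not require; the file and variable names are only examples.

```shell
# mktemp is assumed to be available; it is not required by POSIX
log=$(mktemp) || exit 1

# stdout goes to the file first, then stderr is pointed at the same place
{ echo "to stdout"; echo "to stderr" >&2; } > "$log" 2>&1

captured=$(cat "$log")
rm -f "$log"
echo "$captured"
```

The order matters: "2>&1 > file" would duplicate stderr onto the original stdout before stdout is redirected to the file.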

January 16, 2013

A bashism a week: ulimit

Setting resource limits from a shell script is commonly done with the ulimit command.
Shells provide it as a built-in, if they provide it at all. As far as I know, there is no non-built-in ulimit command. One could be implemented with the Linux-specific prlimit system call, but even that requires a fairly "recent" kernel version (circa 2010).

Depending on the kind of resource you want to limit, you may get away with what some shells such as dash provide: CPU, FSIZE, DATA, STACK, CORE, RSS, MEMLOCK, NPROC, NOFILE, AS, LOCKS. I.e. options tfdscmlpnvw, plus H for the hard limit, S for the soft limit, and a for all. Bash allows other resources to be limited.

Remember, if you rely on any non-standard behaviour or feature make sure you document it and, if feasible, check for it at run-time. ulimit is not required by POSIX:2001 to be implemented for the shell.
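
One way such a run-time check could look, as a sketch only. It probes the n option (NOFILE) in a subshell so that a failure cannot affect the rest of the script; the variable name is made up.

```shell
# Probe in a subshell: a missing built-in or option only fails the probe
if (ulimit -n) >/dev/null 2>&1; then
    have_ulimit_n=yes
else
    have_ulimit_n=no
fi

echo "ulimit -n supported: $have_ulimit_n"
```

A script could then skip the limit, or abort with a clear message, when the answer is "no".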