BashPitfalls - Greg's Wiki - Jay Taylor's notes

Jay Taylor's notes

back to listing index

BashPitfalls - Greg's Wiki

[web search]

Original source (mywiki.wooledge.org)

Tags: programming bash pitfalls mywiki.wooledge.org

Clipped on: 2020-09-07

Search:

most of the time

word=abcde
expr "$word" : ".\(.*\)"
bcde

But WILL fail for the word "match"

word=match
expr "$word" : ".\(.*\)"

The problem is "match" is a keyword. Solution (GNU only) is prefix with a '+'

word=match
expr + "$word" : ".\(.*\)"
atch

Or, y'know, stop using expr. You can do everything expr does by using Parameter Expansion. What's that thing up there trying to do? Remove the first letter of a word? That can be done in POSIX shells using PE or Substring Expansion:

$ word=match
$ echo "${word#?}"    # PE
atch
$ echo "${word:1}"    # SE
atch

Seriously, there's no excuse for using expr unless you're on Solaris with its non-POSIX-conforming /bin/sh. It's an external process, so it's much slower than in-process string manipulation. And since nobody uses it, nobody understands what it's doing, so your code is obfuscated and hard to maintain.

40. On UTF-8 and Byte-Order Marks (BOM)

In general: Unix UTF-8 text does not use BOM. The encoding of plain text is determined by the locale or by mime types or other metadata. While the presence of a BOM would not normally damage a UTF-8 document meant only for reading by humans, it is problematic (often syntactically illegal) in any text file meant to be interpreted by automated processes such as scripts, source code, configuration files, and so on. Files starting with BOM should be considered equally foreign as those with MS-DOS linebreaks.

In shell scripting: 'Where UTF-8 is used transparently in 8-bit environments, the use of a BOM will interfere with any protocol or file format that expects specific ASCII characters at the beginning, such as the use of "#!" of at the beginning of Unix shell scripts.'

CategoryShell CategoryBashguide

BashPitfalls (last edited 2020-06-18 19:11:42 by GreyCat)