Aa!! It's a zorting zombie!

I came across an interesting Bash issue today, as I was trying to restore a zstd-compressed CloneZilla Partclone image to a raw file in order to extract some data from it. For some reason, none of the solutions on the internet worked, and searching for the error message turned up no useful results. This was the command line I had constructed:

$ zstdcat nvme0n1p3.ntfs-ptcl-img.zst.* | sudo partclone.ntfs -C -r -W -s - -O image.img

Notice the glob at the end of the only argument to zstdcat. This only gave me:

Partclone v0.3.17 http://partclone.org
Starting to restore image (-) to device (image.img)
This is not partclone image.
Partclone fail, please check /var/log/partclone.log !

partclone kept saying This is not partclone image no matter what I did. I did some sanity checking with the commands:

$ ls nvme0n1p3.ntfs-ptcl-img.zst.*

and also

$ ls nvme0n1p3.ntfs-ptcl-img.zst.a[a-x]

Both returned the following list:

nvme0n1p3.ntfs-ptcl-img.zst.ab  nvme0n1p3.ntfs-ptcl-img.zst.ah  nvme0n1p3.ntfs-ptcl-img.zst.an  nvme0n1p3.ntfs-ptcl-img.zst.at
nvme0n1p3.ntfs-ptcl-img.zst.ac  nvme0n1p3.ntfs-ptcl-img.zst.ai  nvme0n1p3.ntfs-ptcl-img.zst.ao  nvme0n1p3.ntfs-ptcl-img.zst.au
nvme0n1p3.ntfs-ptcl-img.zst.ad  nvme0n1p3.ntfs-ptcl-img.zst.aj  nvme0n1p3.ntfs-ptcl-img.zst.ap  nvme0n1p3.ntfs-ptcl-img.zst.av
nvme0n1p3.ntfs-ptcl-img.zst.ae  nvme0n1p3.ntfs-ptcl-img.zst.ak  nvme0n1p3.ntfs-ptcl-img.zst.aq  nvme0n1p3.ntfs-ptcl-img.zst.aw
nvme0n1p3.ntfs-ptcl-img.zst.af  nvme0n1p3.ntfs-ptcl-img.zst.al  nvme0n1p3.ntfs-ptcl-img.zst.ar  nvme0n1p3.ntfs-ptcl-img.zst.ax
nvme0n1p3.ntfs-ptcl-img.zst.ag  nvme0n1p3.ntfs-ptcl-img.zst.am  nvme0n1p3.ntfs-ptcl-img.zst.as  nvme0n1p3.ntfs-ptcl-img.zst.aa

Wait... What? Notice how ab is the first and aa is the last substitution. Obviously, aa needs to come first! The parts have to be fed to zstdcat in the right order for partclone to recognize the image, as I assume there's a file signature (magic bytes) at the start of the raw file contained within the archives.

My question was, why is my alphabetization seemingly broken? According to a web search, Bash globs will always return an ordered/alphabetized list. I was able to start the restore process by manually entering the list of files in order instead of globbing, but curiosity got the better of me and I had an inkling this was related to my system locale.
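For what it's worth, typing the names by hand isn't strictly necessary. Here is a small sketch using Bash brace expansion, which generates the words in sequence and never sorts them. The `prefix` and `parts` variable names are my own, and the `a{a..x}` range assumes the parts run from aa to ax with no gaps, as in the listing above:

```shell
# Brace expansion produces .aa, .ab, ... .ax in sequence order.
# No sorting is involved, so the result is the same in every locale.
prefix=nvme0n1p3.ntfs-ptcl-img.zst
parts=( "$prefix".a{a..x} )

# Print the generated names, one per line, to check them before use.
printf '%s\n' "${parts[@]}"
```

The array could then stand in for the glob, as in `zstdcat "${parts[@]}" | sudo partclone.ntfs ...`. Unlike a glob, brace expansion does not check that the files exist, so a missing part only shows up when zstdcat tries to open it.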

SPOILERS: It was.

$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=nb_NO.UTF-8
LC_TIME=en_GB.UTF-8
LC_COLLATE=nb_NO.UTF-8
LC_MONETARY=nb_NO.UTF-8
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=nb_NO.UTF-8
LC_NAME=nb_NO.UTF-8
LC_ADDRESS=nb_NO.UTF-8
LC_TELEPHONE=nb_NO.UTF-8
LC_MEASUREMENT=nb_NO.UTF-8
LC_IDENTIFICATION=nb_NO.UTF-8
LC_ALL=

Observe how LC_COLLATE=nb_NO.UTF-8. I have my system language set to English, but most other locale settings set to Norwegian. In Norwegian, Aa/aa is a common substitution for Å/å, the last character of the Norwegian alphabet. The sorting algorithm, in its infinite wisdom, seems to have decided that a file extension of *.aa should be sorted at the very end because of this, which breaks the argument list.
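The effect can be reproduced without any image files. This is a rough sketch: `sort_with_locale` is a helper name I made up, and the Norwegian run only behaves differently if the nb_NO.UTF-8 locale is actually generated on the machine (check with `locale -a`); otherwise it will most likely fall back to plain byte order.

```shell
# sort_with_locale LOCALE WORD...
# Print the words one per line, sorted by the collation rules of LOCALE.
# The body runs in a subshell, so no variables leak into the caller.
sort_with_locale() (
    loc=$1
    shift
    printf '%s\n' "$@" | LC_ALL="$loc" sort
)

# C locale: plain byte order, so aa comes first.
sort_with_locale C ab aa ac

# Norwegian collation: on a system like mine, aa should land after ac.
sort_with_locale nb_NO.UTF-8 ab aa ac
```

LC_ALL is used here rather than LC_COLLATE because it overrides every other locale variable, which keeps the comparison independent of whatever else is set in the environment.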

To fix this, I set LC_COLLATE to C by issuing:

$ sudo localectl set-locale LC_COLLATE=C

This worked for about three seconds before KDE decided that my opinion is wrong, and promptly overwrote it, resurrecting nb_NO.UTF-8 like a digital zombie.

In KDE's System Settings > Region & Language section, there are a few locale related settings, but nothing about sorting. Presumably, it uses one of the other fields to assume the value of LC_COLLATE, and you know what they say about assuming.

To actually fix it, I added export LC_COLLATE="C" to my ~/.bashrc, which seems to work, and persists between terminal sessions.
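If changing the collation for every shell feels too broad, the override can also be scoped to one expansion. One thing to watch out for: as far as I can tell, prefixing the command like `LC_COLLATE=C zstdcat nvme0n1p3.ntfs-ptcl-img.zst.*` does not help, because Bash expands the glob with its own locale before that assignment reaches zstdcat. The sketch below sets the locale inside a subshell first and expands the glob afterwards. `glob_c_order` is a hypothetical helper name, not an existing tool.

```shell
# glob_c_order PREFIX
# Print every existing path that starts with PREFIX, one per line,
# in C (byte value) order. The locale change is confined to the subshell.
glob_c_order() (
    LC_ALL=C
    export LC_ALL
    # The glob is expanded here, after the locale has been switched.
    for f in "$1"*; do
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
)

# Possible use with the image parts from this post:
#   glob_c_order nvme0n1p3.ntfs-ptcl-img.zst.
```

For a one-off, the same idea fits on one line by wrapping the original pipeline in parentheses with `export LC_ALL=C;` in front of it, which leaves the interactive shell's locale untouched.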
