I am playing with the Unix hexdump utility. My input file is UTF-8 encoded and contains a single character, ñ, which is C3 B1 in hexadecimal UTF-8.
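For reference, that encoding can be double-checked outside hexdump; a minimal Python sketch:

# Confirm that ñ (U+00F1) encodes to the two bytes C3 B1 in UTF-8.
encoded = "ñ".encode("utf-8")
print(encoded.hex())  # -> c3b1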
hexdump Test.txt
0000000 b1c3
0000002
Huh? It shows B1C3, the inverse of what I expected! Can anyone explain?
To get the output I expected, I do this:
hexdump -C test.txt
00000000  c3 b1                                             |..|
00000002
I guess I don't understand the system's encoding as well as I thought.
The reason is that hexdump defaults to 16-bit words, and you are running on a little-endian architecture. The byte sequence C3 B1 is therefore interpreted as the hex word B1C3. Use the -C option to make hexdump work with bytes instead of words.
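For illustration, a minimal Python sketch that interprets the two bytes as a little-endian 16-bit word reproduces the b1c3 seen in the default output:

import struct

data = bytes([0xC3, 0xB1])            # the UTF-8 bytes of ñ
word = struct.unpack("<H", data)[0]   # read as one little-endian 16-bit word
print(f"{word:04x}")                  # -> b1c3, what plain hexdump prints
print(data.hex())                     # -> c3b1, the byte order hexdump -C shows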