The saturated unsigned integer
Published: November 29, 2024
During some work with a radar antenna I found myself in a situation where I didn’t understand something. It felt like I understood it, but it didn’t make sense. This false sense of confidence was grounded in experience from working with hardware integrations, because hardware work always involves unknowns and detective work. For some reason, the documentation for the hardware and its APIs is never accurate. It’s just a matter of where it is on the It Sucks™ scale. There will be things that surprise you, that differ from what’s written down, or that simply do not work. It turns out that you mostly trust the actual behaviour, with some pointers and general guidance from the documentation. The only things that are usually specified and trustworthy are the data types of the byte protocol for the configurations and settings that you send over the socket. They are usually trustworthy because they map directly onto the functionality of the device, and both ends (me and the hardware) must agree on what the values mean.
Because of this, I assumed things and these things turned out to be wrong.
The work was about acquiring data from an antenna (I’m simplifying) to figure out the phase and amplitude of the incoming signal. The data comes in as UDP packets at 25 Gb/s (there’s a joke here about swallowing, but I can’t find it). I unpack the header with a timestamp that’s wrong (what else, eh?), the payload is 2 bytes per sample and there are something like 2k samples per packet.
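The unpacking step can be sketched roughly as below. The real header layout is not described in this post, so the 8-byte big-endian timestamp and the names `HEADER_SIZE` and `split_packet` are purely hypothetical placeholders. Only the 2 bytes per sample comes from the description above.

```python
import struct

# Hypothetical layout: the real header format is not given in this post.
HEADER_SIZE = 8        # assumed: a single 64-bit big-endian timestamp
BYTES_PER_SAMPLE = 2   # from the description above


def split_packet(packet: bytes) -> tuple[int, bytes]:
    """Split one UDP datagram into (timestamp, raw sample bytes)."""
    if len(packet) < HEADER_SIZE:
        raise ValueError("packet shorter than the assumed header")
    (timestamp,) = struct.unpack_from(">Q", packet, 0)
    payload = packet[HEADER_SIZE:]
    if len(payload) % BYTES_PER_SAMPLE:
        raise ValueError("payload is not a whole number of samples")
    return timestamp, payload


# Fake packet for illustration: timestamp 1234 followed by 3 samples.
packet = struct.pack(">Q", 1234) + bytes([0x00, 0x00, 0xFF, 0xFB, 0x00, 0x0C])
timestamp, payload = split_packet(packet)
print(timestamp, len(payload) // BYTES_PER_SAMPLE)
```

The sketch deliberately leaves the payload as raw bytes. How those bytes should be interpreted is a separate decision.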
print(np.frombuffer(payload, dtype=">u2"))

[ 0 65531 0 0 0 65529 0 65526 0 12 0 3
 0 65516 0 65520 1 65534 1 65535 0 65524 1 65530
 1 65524 1 65534 0 11 0 65527 0 65514 0 65527
 1 65534 0 65535 0 65523 0 65532 1 65530 0 0
 1 65528 0 65523 1 65522 1 65527 0 2 1 65525
 1 65531 1 65527 0 65531 1 65521 0 0 0 65526
 0 65534 0 65534 0 0 1 65516 1 65520 0 65531
 0 65531 0 65525 1 65525 1 65528 1 65524 0 0
 1 65521 1 7 1 65526 1 1 0 65521 0 2
 0 1 0 1 1 0 65535 65535 3 3 0 2
 0 65534 1 1 0 2 1 1 0 1 1 65535
 1 0 0 0 1 0 1 0 0 1 0 0
 0 0 0 1 1 65535 1 0 0 1 0 1
 0 65535 0 65535 1 1 0 1 1 65535 0 0
 0 0 65535 0 0 1 0 0 0 0 0 0
 65534 0 1 0 65535 1 0 0 0 65535 65535 0]
The string ">u2" is Python+NumPy magic for big endian unsigned int 16.
See this? It’s basically two discrete values: one is 0xFFFF (the ~65535s) and
the other is 0x0000 (the ~zeroes), with fluctuations of up to 20-something for
both values. So, either a completely saturated number or a completely “empty”
number.
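To put a number on that observation, here is a minimal sketch that counts how many samples sit close to either extreme. The values are the first 16 samples copied from the output above. The `MARGIN` threshold of 32 is an arbitrary assumption, picked to sit a bit above the observed fluctuation.

```python
# First 16 samples copied from the printed output above.
samples = [0, 65531, 0, 0, 0, 65529, 0, 65526,
           0, 12, 0, 3, 0, 65516, 0, 65520]

UINT16_MAX = 0xFFFF
MARGIN = 32  # assumed threshold, a bit above the observed fluctuation

near_zero = [s for s in samples if s <= MARGIN]
near_max = [s for s in samples if s >= UINT16_MAX - MARGIN]

# Every sample lands in one of the two clusters, nothing in between.
print(len(near_zero), len(near_max), len(samples))  # 11 5 16
```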
I was scratching my head. For hours. For more than one day!
Obviously there’s a pattern, but it was all so strange. There were fluctuations that correspond to noise in the atmosphere (what you would expect from an antenna), but what was wrong? Remember earlier, how the software engineer (me) and the hardware had to agree? Here, on the data-out part, we don’t have to agree. The hardware just throws whatever it has at me and it’s now my responsibility. My assumption is spelled ‘big endian unsigned int 16’.
My colleague then pointed out that obviously it was signed integers.
print(np.frombuffer(payload, dtype=">i2"))

[ 0 -5 0 0 0 -7 0 -10 0 12 0 3 0 -20 0 -16 1 -2
 1 -1 0 -12 1 -6 1 -12 1 -2 0 11 0 -9 0 -22 0 -9
 1 -2 0 -1 0 -13 0 -4 1 -6 0 0 1 -8 0 -13 1 -14
 1 -9 0 2 1 -11 1 -5 1 -9 0 -5 1 -15 0 0 0 -10
 0 -2 0 -2 0 0 1 -20 1 -16 0 -5 0 -5 0 -11 1 -11
 1 -8 1 -12 0 0 1 -15 1 7 1 -10 1 1 0 -15 0 2
 0 1 0 1 1 0 -1 -1 3 3 0 2 0 -2 1 1 0 2
 1 1 0 1 1 -1 1 0 0 0 1 0 1 0 0 1 0 0
 0 0 0 1 1 -1 1 0 0 1 0 1 0 -1 0 -1 1 1
 0 1 1 -1 0 0 0 0 -1 0 0 1 0 0 0 0 0 0
 -2 0 1 0 -1 1 0 0 0 -1 -1 0 2 0 0 0 0 0]
I’ll be damned.
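The two printouts come from the same bytes; only the interpretation changes. A small sketch of that relationship, using four hand-built samples that resemble the data above (they are not real capture data):

```python
import numpy as np

# Hand-built bytes for illustration: 0x0000, 0xFFFB, 0x000C, 0xFFEC.
payload = bytes([0x00, 0x00, 0xFF, 0xFB, 0x00, 0x0C, 0xFF, 0xEC])

unsigned = np.frombuffer(payload, dtype=">u2")  # values 0, 65531, 12, 65516
signed = np.frombuffer(payload, dtype=">i2")    # values 0, -5, 12, -20

# The same mapping by hand: anything with the top bit set is off by 2**16.
by_hand = [int(u) - 0x10000 if u >= 0x8000 else int(u) for u in unsigned]

print(unsigned.tolist())
print(signed.tolist())
print(by_hand)
```

So the "saturated" values near 65535 are just small negative numbers, each sitting exactly 65536 above its signed reading.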
Signed and unsigned numbers
It became painfully obvious that there was something I didn’t know about how
signed and unsigned numbers worked. The thing that I did know for sure is that
there’s one bit in the byte(s) that is set to 1 that determines the sign,
given that you interpret the value as a signed number.
0b0xxxxxxx -> Positive number
0b1xxxxxxx -> Negative number
This is right, however there are details behind all the x-es that I didn’t know.
I assumed that -1 would be encoded as 0b10000001 for a signed int, but it’s
not.
import numpy as np

def list_to_string(items, fmt):
    # Format every item with the same format spec and join with spaces.
    return " ".join(fmt.format(s) for s in items)

dA = bytes([0xFF, 0xFE, 0xFD, 0xFC, 0xFB])
dB = bytes([0x01, 0x02, 0x03, 0x04, 0x05])

# Same bytes, three views: unsigned, signed and the raw bit pattern.
print("Unsigned |", list_to_string(np.frombuffer(dA, dtype="u1"), "{:8d}"))
print("Signed |", list_to_string(np.frombuffer(dA, dtype="i1"), "{:8d}"))
print("Bit pattern |", list_to_string(dA, "{:08b}"))
print()
print("Unsigned |", list_to_string(np.frombuffer(dB, dtype="u1"), "{:8d}"))
print("Signed |", list_to_string(np.frombuffer(dB, dtype="i1"), "{:8d}"))
print("Bit pattern |", list_to_string(dB, "{:08b}"))
Unsigned | 255 254 253 252 251
Signed | -1 -2 -3 -4 -5
Bit pattern | 11111111 11111110 11111101 11111100 11111011
Unsigned | 1 2 3 4 5
Signed | 1 2 3 4 5
Bit pattern | 00000001 00000010 00000011 00000100 00000101
It looks like the encoding of a negative number is something completely different than I expected.
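The rule behind the table above can be sketched in plain Python. In an 8-bit signed integer a negative value is stored as that value modulo 2**8, which is the same as inverting the bits of the positive number and adding one. The helper names `encode_twos_complement` and `decode_twos_complement` are made up for this illustration.

```python
BITS = 8


def encode_twos_complement(value: int, bits: int = BITS) -> int:
    """Return the bit pattern (as an unsigned int) that stores `value`."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value does not fit in the given number of bits")
    return value & ((1 << bits) - 1)  # same as value modulo 2**bits


def decode_twos_complement(pattern: int, bits: int = BITS) -> int:
    """Interpret an unsigned bit pattern as a signed value."""
    sign_bit = 1 << (bits - 1)
    return pattern - (1 << bits) if pattern & sign_bit else pattern


for value in (-1, -2, -5, 5):
    print(f"{value:3d} -> {encode_twos_complement(value):08b}")

# The pattern guessed earlier for -1 decodes to a very different number.
print(decode_twos_complement(0b10000001))  # -127
```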
Conclusion
I will forgive the documentation on this specific issue, because it did not explicitly say that the values were unsigned, but it also did not say that they were signed. Based on my experience with hardware, I assumed they were unsigned, so I can only complain about incompleteness rather than error.
I also realize that the interpreting end of any communication (the destination) has an intrinsic need to be the pickier one, because it’s the one doing the interpretation. It needs to understand what it gets. When I send bytes to the hardware to do things, the hardware is picky about how they are structured and what means what. However, when the hardware sends me data, it’s not inherently equally picky about what it sends, because I’m the one doing the interpretation.
In the end, I learned something. In the future I will be able to spot this pattern much more quickly. Additionally, there is apparently more than one way of representing signed numbers in a binary format. The one that I encountered in my example above is called two’s complement and seems to be the most common one.
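For comparison, here is a minimal sketch that decodes one byte under three well-known signed representations. Only two’s complement is named in this post; sign-magnitude and ones’ complement are the usual textbook alternatives, and the helper `decode_byte` is made up for this illustration.

```python
def decode_byte(pattern: int) -> dict[str, int]:
    """Decode one 8-bit pattern under three signed representations."""
    negative = bool(pattern & 0x80)   # top bit is the sign in all three
    magnitude = pattern & 0x7F        # remaining seven bits
    return {
        "sign-magnitude": -magnitude if negative else magnitude,
        "ones' complement": -(~pattern & 0xFF) if negative else pattern,
        "two's complement": pattern - 0x100 if negative else pattern,
    }


for pattern in (0b00000101, 0b10000001, 0b11111110, 0b11111111):
    print(f"{pattern:08b}", decode_byte(pattern))
```

The pattern 0b10000001 is -1 only under sign-magnitude. Under two’s complement it is -127, and -1 is 0b11111111 instead.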