There are two kinds of strings:
BIN (binary) and STR (human-readable, printable).
The tag's most significant bit is set. The seventh bit tells whether it
is a UTF-8 string, binary otherwise. The next six bits contain the
length of the string.
        len
    +---------+
   /           \
1 U L L L L L L
  ^
   \-is it UTF-8?
If the length value equals:
    0-60 => use as is.
    61   => 61 + the next 8-bit value.
    62   => 62 + 255 + the next big-endian 16-bit value.
    63   => 63 + 255 + 65535 + the next big-endian 64-bit value.
The string's length *must* be encoded in the shortest possible form.
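The tag and length rules above can be sketched as follows (a minimal
Python sketch; `encode_str_header` is a hypothetical helper name, not
part of the specification):

```python
import struct

def encode_str_header(data: bytes, utf8: bool) -> bytes:
    """Build the tag byte plus length prefix for a BIN/STR item.

    Bit 7 is always set; bit 6 marks UTF-8; the low six bits hold the
    length value, extended as described above for lengths over 60.
    """
    l = len(data)
    tag = 0x80 | (0x40 if utf8 else 0x00)
    if l <= 60:
        return bytes([tag | l])
    if l <= 61 + 255:
        return bytes([tag | 61, l - 61])
    if l <= 62 + 255 + 65535:
        return bytes([tag | 62]) + struct.pack(">H", l - 62 - 255)
    return bytes([tag | 63]) + struct.pack(">Q", l - 63 - 255 - 65535)
```

Because each longer form starts where the previous one ends (61+255 = 316,
so 62-form lengths begin at 317), every length has exactly one
shortest encoding, which is what the *must* above demands.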
UTF-8 strings *must* be valid UTF-8 sequences, except that the null
byte *is not* allowed. They should be normalized Unicode strings.
Example representations:
BIN "" | 80
BIN [binary decode hex "01 02 03 04"] | 84 01 02 03 04
BIN [string repeat "A" 64] | BD 03 41 41 ... 41
STR "hello world" | CB 68656C6C6F 20 776F726C64
STR "привет мир" | D3 D0BFD180D0B8D0B2D0B5D182 20 D0BCD0B8D180
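As a cross-check against the representations above, here is a minimal
decoder sketch (Python; `decode_string` is a hypothetical name, and it
omits the bounds and shortest-form validation a real parser needs):

```python
def decode_string(buf: bytes):
    """Decode one BIN/STR item from the start of buf.

    Returns (value, bytes_consumed): str for STR, bytes for BIN.
    """
    tag = buf[0]
    if not tag & 0x80:
        raise ValueError("not a string tag")
    utf8 = bool(tag & 0x40)
    l = tag & 0x3F
    off = 1
    if l == 61:
        l = 61 + buf[1]
        off = 2
    elif l == 62:
        l = 62 + 255 + int.from_bytes(buf[1:3], "big")
        off = 3
    elif l == 63:
        l = 63 + 255 + 65535 + int.from_bytes(buf[1:9], "big")
        off = 9
    data = buf[off:off + l]
    if utf8:
        text = data.decode("utf-8")  # rejects invalid UTF-8
        if "\x00" in text:
            raise ValueError("null byte in STR")
        return text, off + l
    return data, off + l
```

Feeding it the example bytes reproduces the example values, e.g.
`CB 68656C6C6F 20 776F726C64` decodes back to the STR "hello world".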