There are two kinds of strings: BIN (binary) and STR (human-readable,
printable ones). The tag's most significant bit is set. The seventh bit
tells whether it is a UTF-8 string (set) or a binary one (unset). The
remaining six bits contain the length of the string.

           len
        +-------+
       /         \
  1 U L L L L L L
    ^
     \- is it UTF-8?

If the length value equals:

    0-60 => use as is
      61 => 61 + next 8-bit value
      62 => 62 + 255 + next big-endian 16-bit value
      63 => 63 + 255 + 65535 + next big-endian 64-bit value

The string's length *must* be encoded in the shortest possible form.
UTF-8 strings *must* be valid UTF-8 sequences, except that the null
byte *is not* allowed. They should be normalized Unicode strings.

Example representations:

    BIN ""                               | 80
    BIN [binary decode hex "01 02 03 04"] | 84 01 02 03 04
    BIN [string repeat "A" 64]           | BD 03 41 41 ... 41
    STR "hello world"                    | CB 68656C6C6F 20 776F726C64
    STR "привет мир"                     | D3 D0BFD180D0B8D0B2D0B5D182 20 D0BCD0B8D180
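As a sanity check, the tag and length rules above can be sketched as a
small decoder. This is only an illustration: the spec defines no API,
and the function name and Python language are assumptions.

```python
import struct


def decode_string(buf: bytes):
    """Decode one BIN/STR item from buf.

    Illustrative sketch only. Returns (is_utf8, payload, bytes_consumed).
    """
    tag = buf[0]
    if not tag & 0x80:
        raise ValueError("not a string tag: most significant bit must be set")
    is_utf8 = bool(tag & 0x40)      # seventh bit: UTF-8 vs binary
    l = tag & 0x3F                  # low six bits: length field
    if l <= 60:
        length, offset = l, 1
    elif l == 61:
        length, offset = 61 + buf[1], 2
    elif l == 62:
        length = 62 + 255 + struct.unpack_from(">H", buf, 1)[0]
        offset = 3
    else:  # l == 63
        length = 63 + 255 + 65535 + struct.unpack_from(">Q", buf, 1)[0]
        offset = 9
    payload = bytes(buf[offset:offset + length])
    if len(payload) != length:
        raise ValueError("truncated string")
    if is_utf8:
        if b"\x00" in payload:
            raise ValueError("null byte is not allowed in STR")
        payload.decode("utf-8")     # raises UnicodeDecodeError if invalid
    return is_utf8, payload, offset + length
```

Running it against the example representations reproduces them: `84 01
02 03 04` yields a 4-byte BIN, and `BD 03` followed by 64 bytes yields a
64-byte one (61 + 3). A strict decoder would additionally reject
non-shortest-form lengths and non-normalized Unicode; both checks are
omitted here for brevity.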