There are two kinds of strings: binary ones and human-readable UTF-8
ones (STR). The tag's most significant bit is set for them. The seventh
bit tells whether it is a UTF-8 string (binary otherwise). The next six
bits contain the length of the string.

           len
        +------+
       /        \
  1 U L L L L L L
    ^
     \-is it UTF-8?

If the length value equals:

    0-60 => use as is
    61   => 61 + next 8-bit value
    62   => 62 + 255 + next big-endian 16-bit value
    63   => 63 + 255 + 65535 + next big-endian 64-bit value

The string's length *must* be encoded in the shortest possible form.

UTF-8 strings *must* be valid UTF-8 sequences; additionally, the null
byte *is not* allowed. They should be normalized Unicode strings.

Example representations:

    0-byte binary string => 80
    4-byte binary string 0x01 0x02 0x03 0x04 => 84 01 02 03 04
    64-byte binary string filled with 0x41 => BD 03 41 41 ... 41
    UTF-8 string "привет мир" ("hello world" in Russian) =>
        D3 D0 BF D1 80 D0 B8 D0 B2 D0 B5 D1 82 20 D0 BC D0 B8 D1 80
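
To make the tag and length rules concrete, here is a minimal Python
sketch that encodes a string item (tag byte, optional extra length
bytes, payload) and checks it against the example representations
above. The helper name encode_str and the choice of NFC as the
normalization form are illustrative assumptions, not taken from any
reference implementation.

    import struct
    import unicodedata

    def encode_str(data: bytes, utf8: bool = False) -> bytes:
        """Encode a binary (U=0) or UTF-8 (U=1) string item."""
        if utf8:
            text = data.decode("utf-8")        # must be valid UTF-8
            if "\x00" in text:
                raise ValueError("null byte is not allowed")
            # Normalization form is assumed to be NFC; the text above
            # only says the string should be normalized.
            data = unicodedata.normalize("NFC", text).encode("utf-8")
        tag = 0x80 | (0x40 if utf8 else 0x00)  # 1 U L L L L L L
        l = len(data)
        if l <= 60:                            # length fits in the tag itself
            hdr = bytes([tag | l])
        elif l <= 61 + 255:                    # 61 => 61 + next 8-bit value
            hdr = bytes([tag | 61, l - 61])
        elif l <= 62 + 255 + 65535:            # 62 => 62 + 255 + 16-bit BE
            hdr = bytes([tag | 62]) + struct.pack(">H", l - 62 - 255)
        else:                                  # 63 => 63 + 255 + 65535 + 64-bit BE
            hdr = bytes([tag | 63]) + struct.pack(">Q", l - 63 - 255 - 65535)
        return hdr + data

    assert encode_str(b"") == bytes.fromhex("80")
    assert encode_str(bytes([1, 2, 3, 4])) == bytes.fromhex("8401020304")
    assert encode_str(b"\x41" * 64) == bytes.fromhex("bd03") + b"\x41" * 64
    assert encode_str("привет мир".encode("utf-8"), utf8=True).hex().startswith("d3d0bf")

As the rules are written above, the biases make the four length forms
cover disjoint ranges (0-60, 61-316, 317-65852, 65853 and up), so the
shortest-form requirement leaves exactly one valid encoding for any
given length.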