[containers-users] Possible additions to Containers and Friends
peter frey
pjfrey at sympatico.ca
Thu Mar 1 18:44:45 GMT 2018
> I'm not sure I understand, what is the point of supporting "more" than
> utf8?
In the original UTF-8 standard the encoding is:
A code is encoded as a string of length 1 + additional length.
The leading bits of the first char give the additional length in unary:
'0' (no additional chars), then '110' to '1111110' (1 to 5 additional
chars), so a code takes 1 to 6 chars in total.
The first char supplies 1 to 7 bits; the following chars supply 6 bits each.
The maximal number of bits is 31 (5 * 6 + the low bit of the first byte).
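
To make the layout above concrete, here is a minimal OCaml sketch of such an encoder. It is not the code discussed in this thread: the name encode31 is made up for illustration, and it assumes a 64-bit OCaml where int can hold 31-bit values.

```ocaml
(* Hypothetical sketch of the original 1..6 byte scheme.
   Assumes 64-bit OCaml (the literal 0x7FFFFFFF does not fit a 31-bit int). *)
let encode31 (buf : Buffer.t) (c : int) : unit =
  if c < 0 || c > 0x7FFFFFFF then invalid_arg "encode31: not a 31-bit code";
  if c < 0x80 then Buffer.add_char buf (Char.chr c)
  else begin
    (* n = number of following 10xxxxxx bytes (1..5) *)
    let n =
      if c < 0x800 then 1
      else if c < 0x10000 then 2
      else if c < 0x200000 then 3
      else if c < 0x4000000 then 4
      else 5
    in
    (* first byte: n+1 one-bits, a zero bit, then the top payload bits *)
    let prefix = (0xFF lsl (7 - n)) land 0xFF in
    Buffer.add_char buf (Char.chr (prefix lor (c lsr (6 * n))));
    (* each following byte carries 6 payload bits, most significant first *)
    for i = n - 1 downto 0 do
      Buffer.add_char buf (Char.chr (0x80 lor ((c lsr (6 * i)) land 0x3F)))
    done
  end

(* Convenience wrapper (also hypothetical) returning a fresh string. *)
let encode31_to_string (c : int) : string =
  let b = Buffer.create 6 in
  encode31 b c;
  Buffer.contents b
```

With this sketch, the largest 31-bit code takes six bytes, and a code just above 0x10FFFF still encodes (in four bytes) even though current UTF-8 rejects it.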
I am using this encoding, but it is no longer 'standard'. The current
standard instead excludes the surrogate range 0xD800 .. 0xDFFF from the
total range 0 .. 0x10FFFF.
In Uutf8 only this restricted range is accepted.
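
For illustration, the range accepted by the current standard can be written as a one-line predicate. The name is_scalar_value is made up here; the standard library's Uchar.is_valid performs the same test.

```ocaml
(* Hypothetical helper: true iff c is in 0 .. 0x10FFFF and is not a
   surrogate (0xD800 .. 0xDFFF). *)
let is_scalar_value (c : int) : bool =
  (0 <= c && c <= 0xD7FF) || (0xE000 <= c && c <= 0x10FFFF)
```

A decoder for the wider 31-bit scheme could apply such a predicate afterwards as a separate, optional check.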
All I tried to say is: my code does not implement the current standard; in
fact it does little checking (encodes more, checks less).
Calling it utf31 would be an informal way of signaling this;
we can call it whatever we want.
I will write a filter that does verification; in particular:
A code of length 1 + n must be followed by n bytes of the form
10xxxxxx; if the decoder encounters 0xxxxxxx, 11xxxxxx or the end of the
string instead, that is an error.
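
A rough sketch of that check in OCaml, decoding a single code. This is not the planned filter itself: the name decode31 and the choice of returning the offending index are assumptions, and overlong encodings are deliberately not rejected.

```ocaml
(* Hypothetical sketch: decode one code starting at index i of s.
   Returns Ok (code, next_index), or Error j where j is the index of the
   offending byte (the first byte if it is invalid or the string ends early). *)
let decode31 (s : string) (i : int) : (int * int, int) result =
  let len = String.length s in
  if i < 0 || i >= len then Error i
  else
    let b0 = Char.code s.[i] in
    (* n = number of following bytes announced by the first byte; -1 = invalid *)
    let n, payload =
      if b0 < 0x80 then (0, b0)
      else if b0 < 0xC0 then (-1, 0) (* stray 10xxxxxx *)
      else if b0 < 0xE0 then (1, b0 land 0x1F)
      else if b0 < 0xF0 then (2, b0 land 0x0F)
      else if b0 < 0xF8 then (3, b0 land 0x07)
      else if b0 < 0xFC then (4, b0 land 0x03)
      else if b0 < 0xFE then (5, b0 land 0x01)
      else (-1, 0) (* 0xFE and 0xFF are never first bytes *)
    in
    if n < 0 || i + n >= len then Error i (* bad first byte or end of string *)
    else
      let rec go k acc =
        if k > n then Ok (acc, i + n + 1)
        else
          let b = Char.code s.[i + k] in
          (* anything but 10xxxxxx here is 0xxxxxxx or 11xxxxxx: an error *)
          if b land 0xC0 <> 0x80 then Error (i + k)
          else go (k + 1) ((acc lsl 6) lor (b land 0x3F))
      in
      go 1 payload
```

A filter could call this repeatedly from next_index, and on Error either stop or substitute a replacement code.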
Uutf8 replaces such sequences with an error code.
peter