[containers-users] Possible additions to Containers and Friends

peter frey pjfrey at sympatico.ca
Thu Mar 1 18:44:45 GMT 2018


> I'm not sure I understand, what is the point of supporting "more" than
> utf8?

In the original utf8 standard the encoding is:
The code is encoded as a string of length 1 + additional length.
The additional length is given by a unary prefix in the first char: '0'
means no additional chars, and '110' to '1111110' mean 1 to 5 additional
chars (total length 2 .. 6).
The first char supplies 1 to 7 bits; the following chars supply 6 bits each.
The maximal # bits is 31 bits (5 * 6 + the low bit of the first char).
I am using this encoding but it is no longer 'standard'. Instead the
surrogate range 0xD800 .. 0xDFFF (everything strictly between 0xD7FF and
0xE000) is now excluded, and the TOTAL range is limited to 0 .. 0x10FFFF.
In Uutf8 only this restricted range is accepted.
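
To make the layout above concrete, here is a minimal OCaml sketch of an encoder for the original 31-bit scheme, plus a check for the range the current standard allows. The names encode31 and is_scalar_value are made up for illustration and are not taken from my code or from any library. The sketch assumes a 64-bit OCaml (so that 0x7FFFFFFF fits in an int) and does no checking beyond the 31-bit range.

```ocaml
(* Hypothetical sketch: encode a code of up to 31 bits using the original
   UTF-8 layout (1 lead byte + 0..5 continuation bytes).
   Assumes a 64-bit OCaml so that 0x7FFFFFFF is a valid int literal. *)
let encode31 (c : int) : string =
  if c < 0 || c > 0x7FFFFFFF then invalid_arg "encode31: code out of range";
  if c < 0x80 then String.make 1 (Char.chr c)
  else begin
    (* n = number of continuation bytes (1 .. 5). *)
    let n =
      if c < 0x800 then 1
      else if c < 0x10000 then 2
      else if c < 0x200000 then 3
      else if c < 0x4000000 then 4
      else 5
    in
    let b = Bytes.create (n + 1) in
    (* Lead byte: n + 1 one-bits, a zero bit, then the top payload bits. *)
    let prefix = (0xFF lsl (7 - n)) land 0xFF in
    Bytes.set b 0 (Char.chr (prefix lor (c lsr (6 * n))));
    (* Each continuation byte is 10xxxxxx and carries 6 payload bits. *)
    for i = 1 to n do
      let shift = 6 * (n - i) in
      Bytes.set b i (Char.chr (0x80 lor ((c lsr shift) land 0x3F)))
    done;
    Bytes.to_string b
  end

(* The current standard only accepts 0 .. 0x10FFFF minus the surrogates
   0xD800 .. 0xDFFF. The standard library's Uchar.is_valid tests the
   same condition. *)
let is_scalar_value (c : int) : bool =
  (0 <= c && c <= 0xD7FF) || (0xE000 <= c && c <= 0x10FFFF)
```

For codes that pass is_scalar_value, encode31 yields the same bytes as a standard utf8 encoder; the two only differ for surrogates and for codes above 0x10FFFF.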


All I tried to say is: my code does not implement the current standard; in
fact it does little checking (encodes more, checks less).

Calling it utf31 would be an informal way of signaling this;
we can call it whatever we want.

I will write a filter that does verification; especially:
A code that has length 1 + n must have n bytes following with format
10xxxxxx; if the decoder encounters 0xxxxxxx or 11xxxxxx or end of
string, that is an error.
Uutf8 replaces such sequences with an error code.
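
A rough sketch of what such a filter could look like in OCaml is given here. It only checks the structural rule described above (the right number of 10xxxxxx bytes after each lead byte); it does not reject overlong forms, surrogates or codes above 0x10FFFF. The names continuation_count and check are hypothetical, and returning the offset of the bad sequence is just one possible design, not what Uutf8 does.

```ocaml
(* Number of continuation bytes announced by a lead byte, or None if the
   byte cannot start a sequence (10xxxxxx, 0xFE or 0xFF). *)
let continuation_count (b : int) : int option =
  if b < 0x80 then Some 0
  else if b < 0xC0 then None
  else if b < 0xE0 then Some 1
  else if b < 0xF0 then Some 2
  else if b < 0xF8 then Some 3
  else if b < 0xFC then Some 4
  else if b < 0xFE then Some 5
  else None

(* Ok () if every sequence is structurally complete; otherwise Error i,
   where i is the offset of the lead byte of the first bad sequence. *)
let check (s : string) : (unit, int) result =
  let len = String.length s in
  let rec go i =
    if i >= len then Ok ()
    else
      match continuation_count (Char.code s.[i]) with
      | None -> Error i
      | Some n ->
        (* Bytes i+1 .. i+n must exist and have the form 10xxxxxx. *)
        let rec tail k =
          if k > n then true
          else if i + k >= len then false
          else if Char.code s.[i + k] land 0xC0 <> 0x80 then false
          else tail (k + 1)
        in
        if tail 1 then go (i + n + 1) else Error i
  in
  go 0
```

For example, check "a\xC3b" reports offset 1, because the lead byte 0xC3 announces one continuation byte but is followed by a 0xxxxxxx byte.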

peter



