Feature #14

Handle different encodings and UTF8

Added by Sputnick about 17 years ago. Updated almost 15 years ago.



We should support automatic encoding selection for both receiving and sending. Unicode detection is crucial.


#1 Updated by Sputnick over 16 years ago

I have recently read up on how utf8 works and gather that utf8 detection should be no real problem. In particular, any string that passes as valid utf8 can be treated as utf8. The chance of a false positive is very slim: text in another 8-bit encoding would only validate if its non-ASCII bytes happened to form well-formed multi-byte sequences, and such character combinations should not occur in any language.

So we would check any incoming string for being utf8, and use that if it actually is utf8, and otherwise apply a default/configured decoding.
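The check-then-fall-back rule above could look roughly like the following. This is only a minimal Python sketch for illustration, not the code in util.*; the function name `decode_incoming` and its `fallback` parameter are made up for the example.

```python
def decode_incoming(raw: bytes, fallback: str = "iso-8859-15") -> str:
    """Decode raw bytes as UTF-8 if they are well-formed, else use the fallback.

    Hypothetical helper: the name and the fallback default are assumptions.
    """
    try:
        # Strict decoding rejects any byte sequence that is not well-formed UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not UTF-8: apply the default/configured encoding instead.
        # errors="replace" keeps the function from raising on unmappable bytes.
        return raw.decode(fallback, errors="replace")


print(decode_incoming("grüße".encode("utf-8")))        # valid UTF-8 path
print(decode_incoming("grüße".encode("iso-8859-15")))  # fallback path
```

Pure ASCII input always passes the utf8 check, which is harmless because ASCII bytes decode to the same characters under ISO-8859-15.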

Encodings for both sending and receiving should be configurable globally (i.e. as a default) as well as per channel. We would also need encoding settings for the communication between client and server. This should only affect channel names though (think channel names with non-ASCII characters in them).
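One possible shape for such settings, a global default with optional per-channel overrides, is sketched below in Python. Everything here is an assumption for illustration: the class name `EncodingSettings`, its fields, and the default values are hypothetical and not taken from the actual code base.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class EncodingSettings:
    """Hypothetical settings holder: global defaults plus per-channel overrides."""

    # Assumed defaults, chosen only for the example.
    default_send: str = "utf-8"
    default_receive: str = "iso-8859-15"
    # Maps a channel name to the encoding that overrides the default.
    send_overrides: Dict[str, str] = field(default_factory=dict)
    receive_overrides: Dict[str, str] = field(default_factory=dict)

    def send_encoding(self, channel: Optional[str] = None) -> str:
        """Return the per-channel send encoding, or the global default."""
        if channel is None:
            return self.default_send
        return self.send_overrides.get(channel, self.default_send)

    def receive_encoding(self, channel: Optional[str] = None) -> str:
        """Return the per-channel receive encoding, or the global default."""
        if channel is None:
            return self.default_receive
        return self.receive_overrides.get(channel, self.default_receive)


settings = EncodingSettings()
settings.receive_overrides["#русский"] = "koi8-r"
print(settings.receive_encoding("#русский"))  # per-channel override
print(settings.receive_encoding("#quassel"))  # falls back to the global default
```

The lookup uses the channel name exactly as given; a real implementation would likely have to normalize names first, since IRC treats channel names case-insensitively.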

#2 Updated by Sputnick over 16 years ago

Just a quick update: utf8 detection is done (as a function in util.*) and works as expected. The conversion still needs to be applied where appropriate, which mainly means some work in server.*. Or maybe string decoding should move to the GUI instead?

#3 Updated by Sputnick almost 16 years ago

We have the basic infrastructure in place, and utf8 detection for receiving messages works. A UI for configuring encodings is not yet available (we have hardcoded ISO-8859-15 as the fallback for the time being), but that will come in due time. Closing this bug, since the crucial issues are taken care of.
