8/12/2012

08-12-12 - Unicode on Windows Summary Page

Making another summary page for myself to link to.

Posts about the disaster of Unicode on Windows : (mainly with respect to old apps and/or console apps)

cbloom rants 06-14-08 - 3
cbloom rants 06-15-08 - 2
cbloom rants 06-21-08 - 3
cbloom rants 11-06-09 - IsSameFile
cbloom rants 06-07-10 - Unicode CMD Code Page Checkup
cbloom rants 10-11-10 - DeUnicode v1.0
cbloom rants 10-11-10 - Windows 1252 to ASCII best fit
cbloom rants 07-28-12 - DeUnicode 1.1

Brief summary : correctly handling Unicode (*) file names in a console app on Windows is almost impossible. cblib has some functions that do the best I believe you can do (MakeUnicodeNameFullMatch), but it's so complicated and error-prone that I suggest you not try it. Also, never use printf with wchars; it's badly broken. Do your own conversion.

(* = actually the problem occurs even for non-Unicode 8-bit character names, e.g. any time the "A", "OEM", and "ConsoleCP" encodings could be different. Windows console apps only work reliably on file names that are 7-bit ASCII.)
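
To make "do your own conversion" concrete, here is a minimal sketch of converting a UTF-16 string to UTF-8 bytes by hand and writing the bytes out, instead of passing wchars to printf. This is not cblib's code; the names utf16_to_utf8 and print_utf16 are made up for illustration, and UTF-8 as the output encoding is an assumption. On a real Windows console the bytes only display correctly if the console output code page matches what you emit (for example after SetConsoleOutputCP(CP_UTF8)), otherwise you would convert to the console's code page instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: convert a NUL-terminated UTF-16 string to UTF-8.
   Writes at most dst_size bytes including the terminating NUL and stops
   early if the next character would not fit. Unpaired surrogates are
   replaced with U+FFFD. Returns the number of bytes written, not
   counting the NUL. */
size_t utf16_to_utf8(const uint16_t *src, char *dst, size_t dst_size)
{
    size_t n = 0;
    assert(src != NULL && dst != NULL && dst_size > 0);

    while (*src != 0) {
        uint32_t cp = *src++;

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: needs a low surrogate right after it. */
            if (*src >= 0xDC00 && *src <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10)
                   + (uint32_t)(*src++ - 0xDC00);
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD; /* low surrogate with no high surrogate before it */
        }

        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (n + need >= dst_size)
            break; /* keep room for the terminating NUL */

        if (need == 1) {
            dst[n++] = (char)cp;
        } else if (need == 2) {
            dst[n++] = (char)(0xC0 | (cp >> 6));
            dst[n++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            dst[n++] = (char)(0xE0 | (cp >> 12));
            dst[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[n++] = (char)(0x80 | (cp & 0x3F));
        } else {
            dst[n++] = (char)(0xF0 | (cp >> 18));
            dst[n++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            dst[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[n++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    dst[n] = '\0';
    return n;
}

/* Hypothetical helper: write a UTF-16 string as raw UTF-8 bytes.
   Names longer than the fixed buffer are truncated in this sketch. */
void print_utf16(FILE *out, const uint16_t *s)
{
    char buf[1024];
    size_t n = utf16_to_utf8(s, buf, sizeof buf);
    fwrite(buf, 1, n, out);
}
```

The point of going through fwrite on bytes you converted yourself is that the C runtime never gets a chance to apply its own locale-dependent wide-to-narrow conversion. You decide the encoding, and you can see exactly which bytes reach the console.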
