08 June 2010

Optimal Unicode range for Vietnamese font libraries

As an old-school tip, to display Vietnamese Unicode characters correctly, we must embed these character sets: Basic Latin + Latin I + Latin Extended A + Latin Extended Additional.

Although those 4 sets contain 994 glyphs, Vietnamese characters are scattered across them and take up only about one third of that number. It is a waste to include other Latin characters that the Vietnamese language never uses.

So I’ve done a quick experiment and come up with an optimal Unicode range for Vietnamese characters (for [Embed] tags, compiled with the Flex SDK):

//This Unicode range includes: Basic Latin + Vietnamese Unicode
//(243 glyphs)
'U+0020-U+002F,U+0030-U+0039,U+003A-U+0040,U+0041-U+005A,U+005B-U+0060,U+0061-U+007A,U+007B-U+007E,U+00C0-U+00C3,U+00C8-U+00CA,U+00CC-U+00CD,U+00D0,U+00D2-U+00D5,U+00D9-U+00DA,U+00DD,U+00E0-U+00E3,U+00E8-U+00EA,U+00EC-U+00ED,U+00F2-U+00F5,U+00F9-U+00FA,U+00FD,U+0102-U+0103,U+0110-U+0111,U+0128-U+0129,U+0168-U+0169,U+01A0-U+01B0,U+1EA0-U+1EF9'
//To support additional Vietnamese composite Unicode characters,
//append this range (total 337 glyphs)
U+02C6-U+0323
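
As a sanity check on the glyph counts quoted above, here is a minimal Python sketch that parses the two range strings from this post and counts the code points they name. The helper names (parse_unicode_range, uncovered) are made up for illustration and are not part of Flex or Flash; the script only checks the arithmetic and whether a sample text falls inside the range.

```python
import re

# Range strings copied from the post above.
BASE_RANGE = (
    "U+0020-U+002F,U+0030-U+0039,U+003A-U+0040,U+0041-U+005A,U+005B-U+0060,"
    "U+0061-U+007A,U+007B-U+007E,U+00C0-U+00C3,U+00C8-U+00CA,U+00CC-U+00CD,"
    "U+00D0,U+00D2-U+00D5,U+00D9-U+00DA,U+00DD,U+00E0-U+00E3,U+00E8-U+00EA,"
    "U+00EC-U+00ED,U+00F2-U+00F5,U+00F9-U+00FA,U+00FD,U+0102-U+0103,"
    "U+0110-U+0111,U+0128-U+0129,U+0168-U+0169,U+01A0-U+01B0,U+1EA0-U+1EF9"
)
COMPOSITE_RANGE = "U+02C6-U+0323"

ENTRY = re.compile(r"U\+([0-9A-Fa-f]+)(?:-U\+([0-9A-Fa-f]+))?")


def parse_unicode_range(spec):
    """Return the set of code points named by a 'U+XXXX-U+YYYY,...' string."""
    code_points = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        match = ENTRY.fullmatch(part)
        if match is None:
            raise ValueError(f"unrecognized range entry: {part!r}")
        start = int(match.group(1), 16)
        # A single entry such as U+00D0 has no end value.
        end = int(match.group(2), 16) if match.group(2) else start
        code_points.update(range(start, end + 1))
    return code_points


def uncovered(text, code_points):
    """Return the sorted characters of text that the range does not cover."""
    return sorted({ch for ch in text if ord(ch) not in code_points})


base = parse_unicode_range(BASE_RANGE)
full = base | parse_unicode_range(COMPOSITE_RANGE)

print(len(base))  # 243
print(len(full))  # 337
# "Tiếng Việt" written with precomposed characters.
print(uncovered("Ti\u1ebfng Vi\u1ec7t", base))  # []
```

The same uncovered() check can be run over your own text before compiling, to spot characters that would fall back to a device font.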

However, Flash Player renders composite Unicode characters very poorly (see demo), so the best advice is to avoid using them. BTW, you cannot embed composite characters in Flash CS* even if you select all Latin* sets, weird!
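
One way to avoid composite characters, assuming "composite" here means a base letter followed by combining marks and that you can preprocess the text before it reaches Flash, is to normalize it to precomposed form (Unicode NFC). The sketch below uses Python's standard unicodedata module; the function name to_precomposed is hypothetical.

```python
import unicodedata


def to_precomposed(text):
    """Normalize text to NFC so letter + combining marks become one character."""
    return unicodedata.normalize("NFC", text)


# "Tiếng Việt" typed with combining marks (composite form).
decomposed = "Tie\u0302\u0301ng Vie\u0323\u0302t"
precomposed = to_precomposed(decomposed)

print(len(decomposed), len(precomposed))  # 14 10
# Any combining marks (U+0300-U+036F) left over after normalization.
leftover = [ch for ch in precomposed if 0x0300 <= ord(ch) <= 0x036F]
print(leftover)  # []
```

After normalization the sample text only uses characters from the 243-glyph range, so the extra composite range is not needed for it.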

Below is a simple demo which displays the full range of Vietnamese characters (you can also try entering or pasting some Vietnamese text):

The demo takes 23KB (18KB without composite characters), compared to 50KB for the same demo embedding the 4 full character sets. And this is only 1 style of 1 font.

You can download the demo source code for usage reference here.

[Vietnamese tag, translated: minimal list of Unicode characters for Vietnamese when embedding dynamic fonts]
