08 June 2010

Optimal Unicode range for Vietnamese font libraries

An old-school tip: to display Vietnamese Unicode characters correctly, we must embed these character sets: Basic Latin + Latin I + Latin Extended A + Latin Extended Additional.

Although those 4 sets contain 994 glyphs, Vietnamese characters are scattered across them and account for only about one third of that number. Embedding the other Latin characters, which the Vietnamese language never uses, is a waste.

So I did a quick experiment and came up with an optimal Unicode range for Vietnamese characters (for [Embed] tags, compiled with the Flex SDK):

//This unicode range includes: Basic Latin + Vietnamese Unicode
//(243 glyphs)
//To support additional Vietnamese composite unicode characters,
//append this range (total 337 glyphs)
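
For anyone who wants to derive a similar range themselves, here is a rough Python sketch. It is not the script used for the experiment above; it assumes the Vietnamese alphabet is the 12 vowel letters combined with the 5 tone marks, plus đ, and the helper names (`vietnamese_code_points`, `to_unicode_range`, `build_range`) are made up for illustration. The output only mimics the comma-separated `U+XXXX-U+YYYY` style of the `unicodeRange` attribute, and the set it produces may not match the 243 glyphs counted above, which could include extra symbols.

```python
import unicodedata

# Assumption: the 12 Vietnamese vowel letters and the 5 tone marks
# (grave, acute, tilde, hook above, dot below) as combining code points.
BASE_VOWELS = "aăâeêioôơuưy"
TONE_MARKS = ("\u0300", "\u0301", "\u0303", "\u0309", "\u0323")


def vietnamese_code_points():
    """Return the sorted non-ASCII code points of precomposed Vietnamese letters."""
    letters = set(BASE_VOWELS + "đ")
    for vowel in BASE_VOWELS:
        for tone in TONE_MARKS:
            # NFC folds "vowel + combining tone mark" into one precomposed letter.
            letters.add(unicodedata.normalize("NFC", vowel + tone))
    # Add the uppercase counterpart of every letter collected so far.
    letters |= {ch.upper() for ch in letters}
    # Plain a, e, i, o, u, y are already covered by Basic Latin.
    return sorted(ord(ch) for ch in letters if ord(ch) > 0x7F)


def to_unicode_range(code_points):
    """Collapse sorted code points into a 'U+XXXX-U+YYYY,...' string."""
    runs = []
    for cp in code_points:
        if runs and cp == runs[-1][1] + 1:
            runs[-1][1] = cp          # extend the current consecutive run
        else:
            runs.append([cp, cp])     # start a new run
    return ",".join(
        f"U+{lo:04X}" if lo == hi else f"U+{lo:04X}-U+{hi:04X}"
        for lo, hi in runs
    )


def build_range():
    """Printable Basic Latin (U+0020 to U+007E) plus the Vietnamese letters."""
    basic_latin = list(range(0x20, 0x7F))
    return to_unicode_range(basic_latin + vietnamese_code_points())


print(build_range())
```

The printed string can be compared against whatever range you are currently embedding to spot missing or unnecessary characters.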

However, Flash Player renders composite Unicode characters very poorly (see demo), so the best advice is to avoid using them. BTW, you cannot embed composite characters in Flash CS* even if you select all the Latin* sets. Weird!
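
One way to avoid composite characters is to convert text to its precomposed form before it ever reaches the player. The Python sketch below shows the idea, assuming "composite" means a base letter followed by combining marks (the decomposed form) and that the text can be preprocessed outside Flash; `to_precomposed` and `has_combining_marks` are hypothetical helper names.

```python
import unicodedata


def to_precomposed(text):
    """Normalize to NFC so 'base letter + combining marks' become single code points."""
    return unicodedata.normalize("NFC", text)


def has_combining_marks(text):
    """True if any character in the text is a combining mark."""
    return any(unicodedata.combining(ch) for ch in text)


# Build a decomposed ("composite") sample, then convert it back.
decomposed = unicodedata.normalize("NFD", "Tiếng Việt")
clean = to_precomposed(decomposed)
print(has_combining_marks(decomposed), has_combining_marks(clean))
```

Text cleaned this way only needs the precomposed range, so the extra composite range can be left out of the embedded font.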

Below is a simple demo which displays the full range of Vietnamese characters (you can also try entering or pasting some Vietnamese text):

The demo takes 23KB (18KB without composite characters), compared to 50KB for the same demo embedding the 4 full character sets. And this is only 1 style of 1 font.

You can download the demo source code for usage reference here.

[Vietnamese tag: danh sách tối thiểu các ký tự unicode cho Tiếng Việt khi nhúng font động (minimal list of Unicode characters for Vietnamese when embedding fonts dynamically)]
