Software/CJKUnifonts/Resources/Tutorial

This is a short tutorial about how to contribute to the fonts.

0. Get ready

  1. I strongly recommend doing the development on a native Linux system, because most distributions already ship all the necessary tools. It doesn't really matter which distribution you use (I personally use Debian Etch and Ubuntu 8.04 Hardy Heron), as long as it is fairly recent.
  2. The fonts are produced using Fontforge, a superb open source font editor. Get a recent copy of it (from 2008-03-30 or later) and install it on your development machine. As the fonts we work on are fairly large and therefore use a lot of memory when opened in Fontforge, I recommend having at least 1GB of RAM.
    The following screenshots show my recommended Fontforge preferences: Screenshot 01 Screenshot 02 Screenshot 03 Screenshot 04 Screenshot 05 Screenshot 06

  3. Get the development tarball of the fonts (they are already in fontforge's native sfd format): http://people.ubuntu.com/~arne/cjk-unifonts/uming_ukai_sfd.tar.bz2 (55MB)

  4. Untar them into your working directory: tar xvfj uming_ukai_sfd.tar.bz2

  5. Never modify the original sfd files in place! Use the templates instead: http://people.ubuntu.com/~arne/cjk-unifonts/uming/umingTPL.sfd and http://people.ubuntu.com/~arne/cjk-unifonts/ukai/ukaiTPL.sfd

  6. As the UMing and UKai fonts are based on the original free Arphic fonts, it's a good idea to have them on the system, too. On Debian / Ubuntu, use apt-get install ttf-arphic-bsmi00lp ttf-arphic-gbsn00lp for the Ming/Song style fonts and apt-get install ttf-arphic-bkai00mp ttf-arphic-gkai00mp for the Kai style fonts. You will find the ttf files in /usr/share/fonts/truetype/arphic/. With other distributions the font packages might have different names.

  7. Get agrep. It allows you to grep a text file with an AND search. (On Debian / Ubuntu: apt-get install agrep)

  8. Get one very handy text file, which lists all unique Han characters in Unicode 5.0 together with their radicals, radical index, remaining strokes and the components they are built of in IDS (Ideographic Description Sequence) format: ids_rs.tar.gz (tarball) or ids_rs.zip (zipfile)

    Caveat: This ids_rs.txt file is not final, nor is it official or error free. It is not intended to be redistributed. If you want to have better data, use the CHISE project and the Unihan.txt file provided by Unicode.

  9. Untar the text file in your working directory: tar xvfz ids_rs.tar.gz

  10. Han characters can have different glyph shapes, depending on the region they are used in. Glyphs usually look different in China, Hong Kong, Taiwan, Japan, Korea and Vietnam, even if they share the same codepoint. PDF files that show these different shapes for each codepoint in the Unicode CJK Unified Ideographs and CJK Unified Ideographs Extension A blocks are available here: http://standards.iso.org/ittf/PubliclyAvailableStandards/c039921_ISO_IEC_10646_2003(E).zip (Please note that these glyph shapes were submitted to Unicode some time ago. Japan has revised some glyph shapes in its latest JIS X0213-2004 standard.)
    In this PDF, the glyphs for Hong Kong are missing. Therefore I produced a similar PDF for Hong Kong glyphs myself, using the Ming font provided by the HKSAR government: http://people.ubuntu.com/~arne/cjk-unifonts/CJK_Glyphs_HK.tar.bz2 (Untar it with: tar xvfj CJK_Glyphs_HK.tar.bz2)

1. Basics

Now that you've downloaded all this stuff, we are ready to begin.

1.1. ids_rs.txt

If you have downloaded the ids_rs.txt file, now is the time to take a look at it. You can open it with any text editor. However, as this text file contains all Han characters from Unicode 5.0, including those of Extension B, you need to have a font installed that can actually display the characters. I found the Hannom fonts (they are free to download and use, but not free to modify or redistribute) to be perfect for this task. To install them on Linux, just copy them to your ~/.fonts/ directory and call fc-cache ~/.fonts/ to update the cache.

The text file has multiple columns:

1.2. IDS (Ideographic Description Sequence)

The IDS describes what a Han character looks like. That is, if a specific Han character is not encoded in Unicode yet, you can use an IDS to describe its shape. To do this, an IDS consists of description characters (IDCs, Ideographic Description Characters) and Han characters. The description characters are:

The Han characters are the components the desired character is made of. The syntax is: first the IDC, followed by the components (from outside to inside, from left to right, from top to bottom).

Examples:

The description for U+55C0 嗀 reads: ⿰(⿱(⿳士冖一)口)殳.
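
Because an IDS is written in prefix order (IDC first, then its components), it can be read back mechanically. The following Python sketch is not part of the CJK Unifonts tooling. It assumes that the IDS uses the twelve IDC characters U+2FF0 to U+2FFB, that U+2FF2 and U+2FF3 take three components while all others take two, and that parentheses like those in the example above are only there for readability. The names ARITY and parse_ids are made up for this illustration.

```python
# Number of components each IDC takes: U+2FF0..U+2FFB take two,
# except U+2FF2 and U+2FF3, which take three.
ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
ARITY["\u2FF2"] = 3
ARITY["\u2FF3"] = 3


def parse_ids(ids):
    """Turn an IDS string into a nested tuple (IDC, component, ...).

    A plain Han character is returned as a one-character string.
    Parentheses and spaces are treated as optional decoration (assumption).
    """
    tokens = [ch for ch in ids if ch not in "() \t"]

    def parse(pos):
        if pos >= len(tokens):
            raise ValueError("IDS ends before all components are given")
        ch = tokens[pos]
        pos += 1
        arity = ARITY.get(ch)
        if arity is None:          # an ordinary component
            return ch, pos
        parts = []
        for _ in range(arity):     # read the components of this IDC
            part, pos = parse(pos)
            parts.append(part)
        return (ch, *parts), pos

    tree, end = parse(0)
    if end != len(tokens):
        raise ValueError("trailing characters after a complete IDS")
    return tree


if __name__ == "__main__":
    print(parse_ids("⿰(⿱(⿳士冖一)口)殳"))
```

Read this way, the outermost IDC of the example is the left-to-right one, whose left component is itself an above-to-below composition.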

1.3. How to use this?

Most of the Han characters in Unicode can be composed out of other Han characters. By far the most common case is a LEFT TO RIGHT composition (⿰); the second most common is ABOVE TO BELOW (⿱). Almost all of these use a radical in either the LEFT, RIGHT, TOP or BOTTOM position (e.g. LEFT: 亻 冫 口 土 女 山 彳 忄 扌 日 月 氵 ..., RIGHT: 刂 支 攵 阝 ..., TOP: 入 八 冖 宀 艹 ..., BOTTOM: 乙 灬 ...).
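
If you want to check for yourself which composition types dominate, a tally of the outermost IDC of each IDS is enough. This is only a rough Python sketch: the function names are invented here, and the five IDS strings are a tiny hand-picked sample, not data taken from ids_rs.txt, so the counts say nothing about the real distribution.

```python
from collections import Counter

IDC_FIRST, IDC_LAST = 0x2FF0, 0x2FFB  # range of the description characters


def top_level_idc(ids):
    """Return the outermost IDC of an IDS, or None for an atomic character."""
    for ch in ids:
        if ch in "() \t":
            continue  # skip optional decoration
        return ch if IDC_FIRST <= ord(ch) <= IDC_LAST else None
    return None


def tally_compositions(ids_list):
    """Count how often each IDC appears in the outermost position."""
    counts = Counter()
    for ids in ids_list:
        idc = top_level_idc(ids)
        if idc is not None:
            counts[idc] += 1
    return counts


if __name__ == "__main__":
    sample = ["⿰木几", "⿱冖几", "⿱亠几", "⿰女子", "⿰氵工"]  # illustrative only
    for idc, count in tally_compositions(sample).most_common():
        print(idc, count)
```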


Our goal is now to find an existing glyph in the font with an arrangement similar to the character we want to add, U+8281 (radical on top, 几 on bottom, the radical does not use much space). Therefore, we can use agrep to filter our ids_rs.txt file: agrep '⿱;几' ids_rs.txt | less. This means we search for all lines which contain both ⿱ AND 几 and display them with the pager less.

The result in this case is quite long, so we can filter some more. As we are looking for a 几 with a radical component on TOP, we know that the number of additional strokes (that is, strokes in addition to the radical component) is 2. Let's put this into our agrep search string: agrep '\.2;⿱;几' ids_rs.txt. Et voilà: the list is much shorter now. (Screenshot)
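
If agrep is not at hand, the same AND search is easy to imitate. The Python sketch below is an illustration, not a replacement shipped with the fonts: and_search is a made-up name, the patterns are treated as Python regular expressions, and the sample lines only mimic the shape of the one line quoted in this tutorial (the tab separator is an assumption about the file layout).

```python
import re


def and_search(lines, patterns):
    """Keep only the lines that match every pattern (agrep's ';' operator)."""
    compiled = [re.compile(p) for p in patterns]
    return [line for line in lines if all(rx.search(line) for rx in compiled)]


# Illustrative sample in the assumed column layout:
# codepoint, character, radical, radical index.remaining strokes, IDS
SAMPLE = [
    "U+5197\t冗\t冖\t014.2\t⿱冖几",
    "U+4EA2\t亢\t亠\t008.2\t⿱亠几",
    "U+51F3\t凳\t几\t016.12\t⿱登几",
    "U+673A\t机\t木\t075.2\t⿰木几",
]

if __name__ == "__main__":
    # Same idea as: agrep '\.2;⿱;几' ids_rs.txt
    for line in and_search(SAMPLE, [r"\.2", "⿱", "几"]):
        print(line)
```

To run it against the real file, read the lines with open("ids_rs.txt", encoding="utf-8") instead of using SAMPLE. Note that a pattern like \.2 would also match a stroke count such as .20, just as it does with agrep.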

From this search result the character U+5197 冗 冖 014.2 ⿱冖几 stands out immediately and seems to be a perfect candidate. Assuming we have already loaded the font (e.g. UKai) and the matching template (e.g. ukaiTPL) in Fontforge, we can check whether U+5197 already exists in our font. View -> Goto -> U+5197 reveals that we are lucky (Screenshot). Now we open the glyph with a double click and select the 几 part by carefully double clicking on the spline (Screenshot). Then we copy the selection to the clipboard by pressing CTRL+C. In the template file window we go to U+8281, our missing character: View -> Goto -> U+8281 (Screenshot).

Double click on the empty character and paste the 几 component into it (CTRL+V) (Screenshot). You can see that we have 3 layers available in the editing mode: Front, Back and Guide. Now we have to find a suitable radical component which fits the 几 we already pasted in size and slant. For this, we switch back to our main font window (e.g. UKai) and go to the same codepoint as in the template: View -> Goto -> U+8281 (which is empty of course) (Screenshot). Now we look around this position to see whether any of the surrounding glyphs has a promising radical component which we could borrow. In this case U+8293 芓 looks like a good candidate (Screenshot).

Double click on that character, select the radical component by double clicking on it and copy it to the clipboard (CTRL+C) (Screenshot). Now switch back to the template file, double click on the character we want to edit (if you have closed that window before), select the Back layer and paste the radical component into it (CTRL+V) (Screenshot). We can now see if the radical component matches our BOTTOM component. As we used two layers, they won't conflict with each other, which means that moving one of the parts around won't disturb the other. In this case it's a perfect match and we don't need to do any further modification. Let's move the radical component onto the Front layer into its final position: select the radical component, press CTRL+X, switch to the Front layer and paste with CTRL+V (Screenshot). If the result looks good, we can take care of the next glyph (Screenshot).

Now, what if we need to move a component around to make it fit? For this, I found it a good idea not to drag the component around with the mouse, but to use the arrow keys on the keyboard. Pressing an arrow key moves the selection by one point; holding down the ALT key while pressing an arrow key moves the selection by 10 points.


2. More advanced stuff

For an exhaustive tutorial about how to use Fontforge, please see the Fontforge website.

2.1. Resizing

In this example, we create the character U+2A6A5 𪚥, which simply consists of 4 "dragons" (龍), using the UMing font.

This glyph could be divided into two horizontal parts (⿰龍龍, stacked on top of each other) or two vertical parts (⿱龍龍, side by side). If we scan through our ids_rs.txt file (grep 龍 ids_rs.txt), we will find the perfect candidate: U+9F98 龘, which consists of 3 "dragons", one on top and two side by side below. This arrangement ensures that the upper and lower parts are about the same height, which is the ideal situation for our target glyph.

  1. We open our source font (UMing in this case) and find our candidate glyph: View -> Goto -> U+9F98

  2. We open the editing window by double clicking on that character (Screenshot 1)

  3. As we only want to copy the lower half of the glyph, we select the parts we want to copy. The bottom right "dragon" is conjoined with the "dragon" on top, so select a rectangle around the area we want to copy. Then press CTRL+C. (Screenshot 2)

  4. In our template file, we find our target character: View -> Goto -> U+2A6A5

  5. We open the editing window of that character by double clicking on it.
  6. Now, we can paste our selection into it. CTRL+V (Screenshot 3)

  7. With the pasted parts still selected, we zoom in a bit to get a better look at the splines and points involved. Press CTRL+ALT+SHIFT+= .
  8. Now we can remove those points and splines we don't need. Select them and press Delete. (Screenshot 4)

  9. If you compare the right "dragon" with the left one, you will notice that at those points where the bottom right "dragon" was conjoined with the top "dragon", the splines are now open (Screenshot 5). We need to fix this:

    1. Copy the missing parts from the left "dragon" one by one: CTRL+C. (Screenshot 6)

    2. Paste the selection at the same position: CTRL+V
    3. Move the pasted part across to the right "dragon" by using the arrow keys. The arrow keys on your keyboard move the selection exactly one point at a time in the desired direction. Using the mouse to drag would probably distort the glyph when you connect the parts. So, be careful! Holding down the ALT key while using the arrow keys will move the selection 10 points at a time in the desired direction. (Screenshot 7)

    4. Connect the pasted selection with the right "dragon" by using the arrow keys on your keyboard. (Screenshot 8)

    5. Do the same for the other missing part. (Screenshot 9) (Screenshot 10) (Screenshot 11)

  10. Now, the bottom part of our target glyph is complete. We can try to duplicate it and stack it on top: select the whole thing by drawing a rectangle around the glyph with your mouse, then press CTRL+C and CTRL+V to paste it at the same position (Screenshot 12). Use ALT+Arrow Up to move the copy on top (Screenshot 13). Zoom out to get a better overview of what the result looks like (View -> Fit).

  11. You will notice that our result is now too tall and sticks out of our bounding box (Screenshot 14). We need to fix this. The best way to do this is to "resize" the bottom part and then duplicate it again.

    1. Remove the upper half of the glyph by selecting the upper part and pressing the Delete key. If that part is still selected from our moving step, just press the Delete key.
    2. Zoom in to get a better view: CTRL+ALT+SHIFT+=
    3. Now we need to compress the two "dragons" vertically. Don't use any of fontforge's builtin Transformation functions, because we want to keep the stem widths intact! Instead, find a good position to do the resizing manually. I suggest reducing the space between the horizontal strokes of the "meat" part ⺼ and the corresponding horizontal strokes of the right part of the "dragon".

    4. Select the upper part of the two "dragons" carefully, then press the Arrow Down key 5 times (Screenshot 15).

    5. You'll see that the "meat" part has two points on the spline we want to resize. As they have no function here, we can delete them. Select the two points in each "meat" part and press Delete (Screenshot 16). Now the spline is open (Screenshot 17). Select the lower point of the spline, select the "Add corner point" tool http://people.ubuntu.com/~arne/cjk-unifonts/png/ff_corner_point.png from the toolbar and click on the upper point to close the spline (Screenshot 18). Select the "Pointer" tool http://people.ubuntu.com/~arne/cjk-unifonts/png/ff_pointer.png to finish the task. Repeat for the other "meat" part on the right side.

    6. Now include one more horizontal stroke in each part and repeat, again moving the selection 5 points down (Screenshot 19).

    7. In total, we have now compressed the bottom half of the character by 10 points. Let's see if it fits.
  12. Select the whole thing, press CTRL+C followed by CTRL+V, and move the pasted selection up by using ALT+Arrow Up a few times. Et voilà, it fits (Screenshot 20).

Now, why not just use the single "dragon" U+9F8D 龍, scale it by 50% and duplicate it three times and move the parts into their correct position? Would be a lot easier, wouldn't it?

Well, let's compare the result: (Screenshot 21)

On the left hand side is the proper glyph, on the right hand side the "scaled" one. As we can easily see, the strokes of the "scaled" glyph are much thinner than those of the other glyphs in the font. When reading a text containing this character, this glyph will stick out as being too thin compared to the other ones and will probably look blurry because of that. In other words: it looks ugly!
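
The arithmetic behind this is simple. The following Python sketch is a deliberately simplified model, not Fontforge code: each horizontal stroke is reduced to a (bottom, top) pair of y coordinates, the numbers are hypothetical, and the function names are invented for this illustration. Scaling multiplies every coordinate, so the stroke thickness shrinks with it, while moving whole strokes down (as we did with the arrow keys) only shrinks the gaps between them.

```python
def scale_strokes(strokes, factor):
    """Scale every coordinate, as a Transformation -> Scale would do."""
    return [(bottom * factor, top * factor) for bottom, top in strokes]


def shift_strokes_above(strokes, cut_y, dy):
    """Move all strokes starting at or above cut_y by dy (manual resizing)."""
    return [
        (bottom + dy, top + dy) if bottom >= cut_y else (bottom, top)
        for bottom, top in strokes
    ]


def thicknesses(strokes):
    """Thickness of each stroke, i.e. top minus bottom."""
    return [top - bottom for bottom, top in strokes]


# Three hypothetical horizontal strokes, each 40 points thick.
strokes = [(0, 40), (100, 140), (200, 240)]

# Scaling to 50% halves the thickness of every stroke.
scaled = scale_strokes(strokes, 0.5)

# Manual resizing in two passes of 5 points, like steps 4 and 6 above:
# first only the topmost stroke, then the top two strokes.
resized = shift_strokes_above(strokes, 200, -5)
resized = shift_strokes_above(resized, 100, -5)

if __name__ == "__main__":
    print("scaled :", scaled, thicknesses(scaled))
    print("resized:", resized, thicknesses(resized))
```

In this model the manually resized part is 10 points shorter but every stroke keeps its thickness of 40, whereas the scaled strokes end up only 20 thick.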

2.2. Editing splines

2.3. Glyph variants

CJK characters are used in different regions and can look different in each region, although they share the same codepoint. If you have downloaded the PDFs I mentioned in Section 0, item 10, you can see which glyph has been submitted to Unicode by each national body for each codepoint used in that region.

The regions are:

Region      Internal abbreviation   UMing font flavor   UKai font flavor
China       C                       AR PL UMing CN      AR PL UKai CN
Taiwan      T                       AR PL UMing TW      AR PL UKai TW
Japan       J                       AR PL UMing JP      AR PL UKai JP
Korea       K                       AR PL UMing KR      AR PL UKai KR
Vietnam     V                       AR PL UMing VN      AR PL UKai VN
Hong Kong   H                       AR PL UMing HK      AR PL UKai HK

To produce these font flavors, we collect all regionally flavored glyphs in the same font file and assign tags to them, depending on the regions they should be used for. For example: U+4EE4 令 has 3 different glyph shapes: C, TVH and JK. Therefore, we have three glyphs: uni4EE4.C, uni4EE4.TVH and uni4EE4.JK (Screenshot).
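
The naming scheme can be written down in a few lines. This Python sketch only illustrates the convention seen in the example above (uni plus four hex digits, a dot, then the region letters). The function names are invented here, the sketch is limited to codepoints up to U+FFFF because the tutorial shows no example beyond that, and it is not how the fonts are actually built.

```python
def variant_glyph_name(codepoint, regions):
    """Build a name like 'uni4EE4.TVH' from a codepoint and its region letters."""
    if not 0 <= codepoint <= 0xFFFF:
        raise ValueError("this sketch only covers codepoints up to U+FFFF")
    return "uni%04X.%s" % (codepoint, regions)


def glyph_for_region(codepoint, variant_tags, region):
    """Pick the variant whose tag contains the one-letter region abbreviation.

    Returns None if no variant is tagged for that region.
    """
    for tag in variant_tags:
        if region in tag:
            return variant_glyph_name(codepoint, tag)
    return None


if __name__ == "__main__":
    tags = ["C", "TVH", "JK"]  # the three shapes of U+4EE4 from the example
    for region in "CTJKVH":
        print(region, glyph_for_region(0x4EE4, tags, region))
```

For the Hong Kong flavor (H), for instance, this picks uni4EE4.TVH, and for the Korean flavor (K) it picks uni4EE4.JK.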