
International Journal of Advanced Intelligence
Volume 3, Number 2, pp. 229-241, July, 2011.
© AIA International Advanced Information Institute

Chinese-English Translation of Company Names and Addresses in a Large-scale Database

Tiejun Zhao, Conghui Zhu, Muyun Yang
School of Computer Science and Technology, Harbin Institute of Technology, 92 West Dazhi St., Harbin, 150001, China
{tjzhao, chzhu, ymy}@mtlab.hit.edu.cn

Yingjian Gu, Gang Zhou, Youyou Hong
National Administration for Code Allocation to Organizations, 46 Yuzhong Xili, Yumin Rd., Beijing, 100029, China
{guyingjian, zhougang, olyvia}@nacao.org.cn

Received (9 January 2011)
Revised (7 April 2011)

This article presents a large-scale application of named entity translation to company names and addresses in a database of ten million entries managed by the National Administration for Code Allocation to Organizations in China. Based on a structural analysis of company names and addresses, a rule-based approach is applied with strategies of active learning and stepwise refinement. Enhanced by translation knowledge acquired for domain terms and specific region names, the system achieves promising performance in experimental tests.

Keywords: Machine Translation; Named Entity; Active Learning; Stepwise Refinement.

1. Background

With China’s accelerated economic growth, it has become pressing to share industrial and commercial information between China and its trading partners. The basic information about an industrial or commercial company is a record describing its name, category, address, etc. In China, the National Administration for Code Allocation to Organizations (NACAO) is in charge of collecting and managing all such records for industrial and commercial companies registered in the mainland. The database now holds up to ten million records. As international trade creates a high demand for translating these Chinese records into English, a desirable solution is to provide company information through a bilingual NACAO website.

To provide a bilingual website for searching information on Chinese companies, one of the crucial tasks is to translate their names and addresses from Chinese into English. Example 1 shows the name and address of a company:

Company Name (CN): 苏州市华佳照明器材有限公司
(in English: Suzhou Huajia Lighting Appliance Co. Ltd.)


Address: 苏州市三香路159号嘉理大厦14楼D座
(in English: Unit D, Floor 14, Jiali Mansion, 159 Sanxiang Rd., Suzhou)

Obviously, it is impractical to translate such a large number of names and addresses manually. The structural regularity of Chinese company names and addresses and the huge number of records make the task a “killer application” for machine translation (MT), or more specifically for Named Entity (NE) translation. These names and addresses are recorded in the database rather than identified in running text. Although the task therefore skips a difficult phase, NE identification, there remain at least two challenges for this large-scale engineering of Chinese-English (C-E) translation: (1) analyzing the complex structures of CNs and addresses, which contain multiple sub-NEs; (2) identifying the large number of terms contained in the names where necessary.

This paper presents a solution to these challenges and is organized as follows. Section 2 briefly introduces NE translation approaches and then gives an overview of the techniques and strategies adopted in the research and development of the translation system. Sections 3 and 4 describe in detail the structure analysis and translation of company names and addresses, respectively. The experiments and tests of the translation system are summarized in Section 5, and the conclusion is given in the last section.

2. Methodology

2.1. Brief review of NE translation approaches

The NE was first introduced in 1995 by the Message Understanding Conference (MUC-6) and mainly covers four types of names or expressions: the name of a person, an organization, or a location (or address), and numeric expressions including time, date, money and percentage. NE translation has drawn wide attention since the end of the last century (Ref. 1), as it can effectively improve the performance of translation systems and cross-language information retrieval applications, e.g. cross-language question answering (Refs. 2, 3, 4). The main approach to NE translation is transliteration; a combination of transliteration and free translation is needed for translating complex NEs such as organization names and addresses (Ref. 5).

The basic approach to translating NEs is to use pre-compiled NE translation dictionaries, but such static dictionaries, even with large vocabularies, cannot guarantee coverage of new names, especially rarely occurring ones. Usually, statistical transliteration models are trained on bilingual NE dictionaries or corpora to overcome this limitation. Many approaches have been proposed in the Natural Language Processing (NLP) community to enhance the dictionary approach to NE translation. A NE and its translation may co-occur widely on the Web or in corpora, so such resources and search engines are often used as a means of acquiring the correct translation. Ref. 6 proposed comparing the Chinese NE with English NE candidates in retrieved documents by calculating their phonetic and semantic similarities, and then selecting the best-matched candidate as the translation on the target side. Ref. 7 presented learning algorithms that obtained promising experimental results for NE matching (i.e. translation) between Chinese and English. Ref. 8 showed that distinguishing different classes of NEs helps to produce better NE translations. A typological model for handling NE types was applied in a rule-oriented NE translation system for a specific domain (Ref. 9).

A company name is one type of organization name with a composite structure, and so is an address. To translate such a NE, the first step is to divide it into several basic units, each of which is translated either by transliteration (if the unit belongs to a given name or a location name) or by free translation (if the unit indicates the category or the business scope of a company). The crucial technique for translating these NEs correctly is the correct segmentation of the basic translation units. Ref. 10 adopted a chunk-based approach to segment Chinese organization names and applied Web mining to find the best English counterparts. Ref. 11 analyzed the structures of Chinese organization names and proposed a structure-based approach for translating organization names directly.

All of the above approaches to NE translation need bilingual translation resources and do not guarantee high precision for NE translation, especially CN translation.

2.2. Overview of the proposed techniques and strategies

NACAO’s database contains Chinese CNs and addresses but no proper C-E bilingual corpus, so we do not use statistical machine translation to build an NE translation model. In fact, the low precision of automatic alignment (e.g. GIZA++ output) cannot satisfy the practical translation needs for CNs and addresses. The basic techniques of our translation system are therefore based on classes of rules and dictionaries.

As with general organization names, it is crucial to segment CNs and addresses into proper translation units. We can then apply translation rules or look up the proper dictionaries to produce accurate translations. We face the challenges mentioned in Section 1: what strategies should be adopted to guarantee full coverage in identifying terms and sub-structures in a nearly open-ended set of CNs and addresses when we use rules and dictionaries? In other words, how do we build an architecture that keeps translation knowledge growing as new input arrives?

The proposed strategies for addressing these challenges comprise three components:

(i) Initial construction of translation knowledge bases (KBs). First, we manually analyzed a set of CNs and addresses of a certain size to confirm their internal structures. According to the naming rules announced by the State Administration for Industry and Commerce (SAIC) of China, the name of a registered company should contain the following four parts in sequence: region name, given name, business domain, and company category. Generally, a complete Chinese address consists of three parts in the following order: region name, street name and number, and apartment number (sometimes with the building name). A complex address may also contain an organization name as one of its components (see more detail in Sec. 3 and 4). For translating these parts of CNs or addresses, we can create one class of translation KBs for the closed sets, e.g. the set of region names from the administrative divisions and their simple translations in Chinese Pinyin. For the semi-closed sets, e.g. the names of business domains, translations can be extracted from an initial C-E dictionary. For the open sets, e.g. given names of companies or street names, each character is translated into Chinese Pinyin.

(ii) Active learning for the identification of incorrect translations. Because of the limitations of the translation KBs for semi-closed and open sets, the initial KBs cannot translate the complete company name for most entries in the CN database. Almost all of the incorrect translations are caused by segmentation errors in the business domain or the given name. Similarly, most incorrect translations of addresses occur in the street name, if it is preceded by a local region name that is not in the administrative division list, or in the building name, if there is a segmentation ambiguity between it and the street name. We propose a segmentation strategy based on the principle of active learning to identify the error candidates in such parts and verify them manually.

(iii) Iteration of stepwise refinement for the enlargement of KBs. Because the initial translation KBs were small, a large number of translation errors occurred at the beginning, when we applied them to large-scale translation over the whole CN and address database. In practice, we controlled the number of translation errors awaiting manual correction with a frequency threshold. Those candidates are verified to acquire more translation knowledge. After accumulating translation knowledge from many small-scale translation experiments, we ran several large-scale translation tests on half-size or full-size databases. During every iteration of an experiment or test, the translation KBs are extended and refined. The process of stepwise refinement stops when no more knowledge can be added to the KBs. Fig. 1 illustrates this process.

Fig. 1. The process of Stepwise Refinement to improve the translation system. [Diagram not reproduced; recoverable labels: initial translation KBs, system running, translation errors, manual verifying, extraction, expansion of KBs, iterations for system improvements.]
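The refinement loop in (iii) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: each record is assumed to be already split into candidate units, and `verify` is a hypothetical stand-in for the manual verification step (it returns a translation, or `None` if the candidate is rejected).

```python
from collections import Counter


def unknown_units(records, kb):
    """Count the candidate units that the current KB cannot translate."""
    counts = Counter()
    for units in records:
        for unit in units:
            if unit not in kb:
                counts[unit] += 1
    return counts


def refine(records, kb, verify, threshold):
    """Grow the KB until a full pass adds no new knowledge.

    Only unknown units seen at least `threshold` times are sent to
    `verify`, which keeps the manual workload bounded.
    """
    kb = dict(kb)
    rounds = 0
    while True:
        counts = unknown_units(records, kb)
        candidates = [u for u, n in counts.items() if n >= threshold]
        verified = {u: verify(u) for u in candidates}
        new_entries = {u: t for u, t in verified.items() if t is not None}
        if not new_entries:          # nothing more can be added: stop
            return kb, rounds
        kb.update(new_entries)
        rounds += 1


# Toy data: the "human" is simulated by a small lookup table.
human = {"照明器材": "Lighting Appliance", "五金": "Hardware"}
records = [["照明器材"], ["照明器材"], ["五金"]]
kb, rounds = refine(records, {}, human.get, threshold=2)
print(kb, rounds)
```

With a threshold of 2, only the unit seen twice is sent for verification, so the low-frequency unit stays out of the KB. In a real system, new KB entries would also change the segmentation on the next pass, which is why several rounds are typically needed.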


3. The translation of CNs

3.1. The structures of Chinese CNs

We have mentioned that the key to translating Chinese CNs and addresses is to analyze their structures correctly. Chinese company names are formed according to the naming rules announced by SAIC and include four parts in sequence:

(i) The name of the region where the company is located. In general, a region can be a province, a city, or a county listed in the administrative divisions of China, or two regions of which the latter is a sub-division of the former. For instance, “苏州” is a city name in Ex. 1. Sometimes such a region name is followed by an administrative division mark, e.g. “市” (city) in this instance, “省” (province), or “县” (county). All region names of counties or larger administrative divisions are included in a closed set with Chinese Pinyin as their translations.

(ii) The given name of the company. Any combination of Chinese characters can be used, but conventionally the given name has two (over 50 percent), three, or four characters. Although it is an open set, the translation is simple, just Pinyin, e.g. “华佳” (Huajia) in the above example.

(iii) The name of the business domain, which indicates the business scope of the company. This part usually contains one or more terms representing the kind of products produced or sold by the company or the kind of service it provides. In Ex. 1, “照明器材” (lighting appliance) describes the trading scope of the company.

(iv) The general category name of the company, e.g. factory (“厂”) or Co. Ltd. (“有限公司” in the above example). It is a closed set, but further organization category names are included in this translation KB, e.g. “局” (bureau), because a company may sometimes be subordinate to another organization.

The four parts of the CN in Ex. 1 are illustrated in Fig. 2. At the bottom of the figure are the translation resources that support the translation of Part 1, Part 3 and Part 4.

Part 1: 苏州市 → Suzhou (translation KB of administrative divisions)
Part 2: 华佳 → Huajia
Part 3: 照明器材 → Lighting Appliance (bilingual terminology)
Part 4: 有限公司 → Co. Ltd. (translation KB of company categories)

Fig. 2. The CN structure and corresponding translation resource.



3.2. Pattern matching for CN segmentation and translation

Based on the structure analysis of CNs, the basic translation units should match the four parts of a CN. Correctly segmenting a CN into these four parts is crucial for its translation. We implement the Max-Matching segmentation approach. As the two ends of a CN, i.e. Part 1 and Part 4, each match a closed set of elements, exact segmentation can be achieved in these two parts. A “two-ends to center” strategy is therefore employed during segmentation: the matching starts from the beginning and the end of a CN string and then moves toward the center of the string. The pattern matching proceeds as follows:

(i) Segment Part 1 and Part 4 by matching the translation KBs for the administrative divisions and the company categories, respectively, and return the translations of the two parts.

(ii) Segment Part 3 by matching a term translation KB, whose size is expanded over the iterations, and return the translations.

(iii) Check the remaining string after excluding Parts 1, 3 and 4: if it is longer than 4 characters, an additional translation KB is used to identify a local region name, i.e. one below the county level in the hierarchy of administrative divisions. Such a name generally follows Part 1.

(iv) Otherwise, the remaining string is taken as the given name of the company and translated into the Pinyin of its characters; in particular, a 4-character name is divided into 2 words in the output to follow the two-syllable convention of Chinese.
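The steps above can be sketched on Example 1 as follows. This is a simplified illustration, not the system's code: the KBs are tiny toy dictionaries, `PINYIN` is a hypothetical per-character table, Part 3 is assumed to be a single term ending the remainder, and step (iii) (local region names) is omitted.

```python
# Toy KBs: entries come from Example 1 and the text; all are illustrative.
REGION_KB = {"苏州市": "Suzhou", "苏州": "Suzhou"}         # Part 1, closed set
CATEGORY_KB = {"有限公司": "Co. Ltd.", "厂": "Factory"}     # Part 4, closed set
TERM_KB = {"照明器材": "Lighting Appliance"}                # Part 3, grows over iterations
PINYIN = {"华": "hua", "佳": "jia"}                         # hypothetical character table


def longest_prefix(text, kb):
    """Longest KB entry that starts `text` (max-matching from the left end)."""
    for end in range(len(text), 0, -1):
        if text[:end] in kb:
            return text[:end]
    return ""


def longest_suffix(text, kb):
    """Longest KB entry that ends `text` (max-matching from the right end)."""
    for start in range(len(text)):
        if text[start:] in kb:
            return text[start:]
    return ""


def to_pinyin(chars):
    """Transliterate a given name; a 4-character name is output as 2 words."""
    syllables = [PINYIN[c] for c in chars]
    groups = [syllables[:2], syllables[2:]] if len(syllables) == 4 else [syllables]
    return " ".join("".join(g).capitalize() for g in groups)


def translate_cn(name):
    region = longest_prefix(name, REGION_KB)         # step (i), left end
    rest = name[len(region):]
    category = longest_suffix(rest, CATEGORY_KB)     # step (i), right end
    rest = rest[:len(rest) - len(category)]
    term = longest_suffix(rest, TERM_KB)             # step (ii)
    given = rest[:len(rest) - len(term)]             # step (iv): the remainder
    parts = [
        REGION_KB.get(region, ""),
        to_pinyin(given) if given else "",
        TERM_KB.get(term, ""),
        CATEGORY_KB.get(category, ""),
    ]
    return " ".join(p for p in parts if p)


print(translate_cn("苏州市华佳照明器材有限公司"))
```

On Example 1 this prints `Suzhou Huajia Lighting Appliance Co. Ltd.`. Matching the closed sets at both ends first leaves only the middle of the string to the less reliable term KB, which is the point of the two-ends-to-center strategy.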

Each translation rule in the KBs has the following form:

MatchUnit → TransAssign + UnitMark [+ PositionAssign] (1)

In Form 1, MatchUnit is a matched Chinese word in any part, and TransAssign gives its translation. UnitMark indicates the class of the matched word, e.g. a city or a county, a company or a branch, etc. The mark helps to recognize sequential region names and company names in one CN. PositionAssign is the sequence number in the English output and is used in address translation to reorder the translation units.
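One way to picture Form 1 is as a small record type. The sketch below is only an assumption about how such rules might be stored and applied: the field names mirror Form 1, the three rules are toy entries built from Example 1, and the convention that a smaller PositionAssign comes earlier in the English output is assumed rather than taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    match_unit: str                        # MatchUnit: the matched Chinese word
    trans_assign: str                      # TransAssign: its translation
    unit_mark: str                         # UnitMark: class of the matched word
    position_assign: Optional[int] = None  # PositionAssign: optional output rank


# Toy rules from Example 1; the position values are assumptions.
RULES = {
    r.match_unit: r
    for r in [
        Rule("苏州市", "Suzhou", "city", 3),
        Rule("三香路", "Sanxiang Rd.", "street", 2),
        Rule("159号", "No. 159", "doorplate", 1),
    ]
}


def render(units, rules=RULES, sep=", "):
    """Translate pre-segmented units and order them by PositionAssign."""
    matched = [rules[u] for u in units]
    # Rules without a position share key 0, so the stable sort keeps
    # them in source order.
    ordered = sorted(matched, key=lambda r: r.position_assign or 0)
    return sep.join(r.trans_assign for r in ordered)


print(render(["苏州市", "三香路", "159号"]))
```

The units are matched in Chinese order (large to small) and emitted as `No. 159, Sanxiang Rd., Suzhou`, i.e. in the reversed English order.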

4. The translation of addresses

4.1. The structures of Chinese addresses

Usually, a Chinese address includes the following 3 parts, ordered from larger region units to smaller ones:

(i) The region names, in the order of the hierarchy of the administrative divisions: first the name of a province, an autonomous region or a municipality; then the name of a city or county belonging to it; and then the name of a district belonging to the city or of a town belonging to the county.

(ii) The street name and the doorplate number, e.g. “三香路” (Sanxiang Rd.) and “159号” (No. 159) in Ex. 1.

(iii) As most Chinese people in cities live in apartments, a doorplate may refer not only to a single house but also to a building with many apartment units. The address should therefore further indicate the unit number with the floor number, and even the building name or number if several buildings share the same number in the street. In Ex. 1, “嘉理大厦14楼D座” (Unit D, Floor 14, Jiali Mansion) is a typical instance.

The three parts of the address in Ex. 1 are illustrated in Fig. 3. As in Fig. 2, several translation KBs are displayed at the bottom of the figure. Among these KBs, the translation KB for the administrative divisions is the same as the one for CN translation. The Mark KB is a closed set of location words, e.g. “街” (street), “路” (road), “大厦” (mansion), “小区” (quarters), etc. Some new names for special administrative regions, such as “经济技术开发区” (Economy and Technology Development Zone), often appear in addresses, and they are collected in one translation KB.

Part 1: 苏州市 → Suzhou
Part 2: 三香路159号 → No. 159, Sanxiang Road
Part 3: 嘉理大厦14楼D座 → Tower D, Floor 14, Jiali Mansion
Translation resources: translation KB of administrative divisions; Mark KB and special location translation KB

Fig. 3. The address structure and corresponding translation resource.
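To illustrate how a closed set of location words can delimit address units, the sketch below cuts the address of Ex. 1 after every mark it finds. It is a naive illustration, not the system's segmentation: the marks “市”, “号”, “楼” and “座” are assumptions added to the examples listed above, and a mark character that happens to occur inside a name would cause a wrong cut.

```python
# Location marks: "街", "路", "大厦", "小区" are from the text; the rest are assumed.
MARKS = ["大厦", "小区", "市", "街", "路", "号", "楼", "座"]


def split_by_marks(address, marks=MARKS):
    """Scan left to right and close a unit after each location mark."""
    ordered = sorted(marks, key=len, reverse=True)   # try longer marks first
    units, start, i = [], 0, 0
    while i < len(address):
        for mark in ordered:
            if address.startswith(mark, i):
                i += len(mark)
                units.append(address[start:i])
                start = i
                break
        else:
            i += 1                                   # no mark here: keep scanning
    if start < len(address):
        units.append(address[start:])                # trailing text without a mark
    return units


units = split_by_marks("苏州市三香路159号嘉理大厦14楼D座")
print(units)
```

For Ex. 1 this yields the six units 苏州市, 三香路, 159号, 嘉理大厦, 14楼 and D座, which group into the three parts shown in Fig. 3.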

For some non-standard addresses in the NACAO database, the address contains a CN. In such cases, the CN identification process should be triggered when an organization category name is found. The CN translation is then returned from the sub-process and merged into the overall translation of the address.

4.2. Pattern matching for address segmentation and translation

Following the convention of Chinese addresses, pattern matching proceeds from left to right, and the order from large to small address units is taken as the normal sequence. The pattern matching process is also the process of segmenting an address into basic translation units with the Max-Matching approach. Once the units are correctly segmented, their English translations are output in reverse order, i.e. from small to large address units. The pattern matching for an address proceeds as follows:

(i) The segmentation start<strong>in</strong>g from the leftmost with match<strong>in</strong>g <strong>of</strong> the adm<strong>in</strong>istrative divisions.<br />

It is similar to the match<strong>in</strong>g <strong>of</strong> Part 1 <strong>in</strong> CN translation.<br />

(ii) The matching of local region names, if any. An address often contains a region name that sits below a county or a city district in the hierarchy of administrative divisions. These small region names can be identified by location words. Here is an example: “上海市松江区新桥镇莘松路1115弄145号” (No. 145, Lane 1115, Xinsong Road, Xinqiao Town, Songjiang District, Shanghai). “新桥镇” (Xinqiao Town) is a town belonging to Songjiang District, which can be recognized by the function word “镇” (town).

(iii) The matching of specific region names, if any. Some new names of special zones may appear in an address, for instance: “天津新技术产业园区华苑产业区鑫茂科技园综合楼5层511室” (Room 511, Floor 5, Comprehensive Building, Xinmao Sci-Tech Park, Huayuan Industrial Zone, Tianjin New Technology Industrial Park). Here, “新技术产业园区” (New Technology Industrial Park), “产业区” (Industrial Zone), and “科技园” (Sci-Tech Park) are names of this type. For both (ii) and (iii), translations come from the related KBs, with Pinyin used for given names.

(iv) The identification and translation of streets and doorplates. This is an easy step. Sometimes it is skipped and processing moves directly to the next step, if an exact local region name precedes it, e.g. “北京市海淀区五道口东升园华清嘉园7楼1804号” (No. 1804, Floor 7, Huaqing Quarters, Dongshengyuan, Wudaokou, Haidian District, Beijing). In this example, the two small region names “Wudaokou” and “Dongshengyuan” already give the detailed address.

(v) The matching and translation of the building name and apartment number. An instance has already appeared in (iv): “华清嘉园7楼1804号” (No. 1804, Floor 7, Huaqing Quarters). Sometimes this part is rather complex: “广州市天河区五山路五山科技广场A座第六层自编619号” (No. 619, Self-arranging, Floor 6, Tower A, Wushan Science & Technology Plaza, Wushan Road, Tianhe District, Guangzhou). In this instance, “五山科技广场A座” (Tower A, Wushan Science & Technology Plaza) is the building name and number; “第六层自编619号” (No. 619, Self-arranging, Floor 6) is the exact location in the building.
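The left-to-right matching and the reversed English order can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny knowledge base `UNIT_KB`, the two regex rules in `NUMBER_RULES`, and the function name `segment_and_translate` are all hypothetical stand-ins for the much larger KBs and rule set described in the paper.

```python
import re

# Hypothetical toy KB: Chinese address unit -> English translation.
UNIT_KB = {
    "上海市": "Shanghai",
    "松江区": "Songjiang District",
    "新桥镇": "Xinqiao Town",
    "莘松路": "Xinsong Road",
}
MAX_UNIT_LEN = max(len(unit) for unit in UNIT_KB)

# Hypothetical rules for numbered units (lane and doorplate).
NUMBER_RULES = [
    (re.compile(r"(\d+)弄"), "Lane {}"),
    (re.compile(r"(\d+)号"), "No. {}"),
]


def segment_and_translate(address):
    """Segment left to right with maximum matching, then reverse the order."""
    translations = []
    i = 0
    while i < len(address):
        # Numbered units are tried first via the regex rules.
        for pattern, template in NUMBER_RULES:
            match = pattern.match(address, i)
            if match:
                translations.append(template.format(match.group(1)))
                i = match.end()
                break
        else:
            # Otherwise take the longest KB entry starting at position i.
            for size in range(min(MAX_UNIT_LEN, len(address) - i), 0, -1):
                piece = address[i:i + size]
                if piece in UNIT_KB:
                    translations.append(UNIT_KB[piece])
                    i += size
                    break
            else:
                raise ValueError(f"no rule or KB entry matches {address[i:]!r}")
    # Chinese lists units large to small; English lists them small to large.
    return ", ".join(reversed(translations))


print(segment_and_translate("上海市松江区新桥镇莘松路1115弄145号"))
```

In this sketch an unmatched span raises an error; a production system would more plausibly mark it for later inspection instead.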

There are two difficulties in the pattern matching of addresses. One is that an address may contain a CN, e.g. “天津市北辰区津围公路西汽车检测站内” (In Automobile Testing Station, West of Jinwei Highway, Beichen District, Tianjin). To identify the CN “汽车检测站” (Automobile Testing Station) in this example, the key word “Testing Station” is found and the CN translation system is then called as a subroutine. The other is that an address may be in abnormal order, i.e. a larger region name follows a smaller region name, e.g. “高桥镇大同北路” (North Datong Rd., Gaoqiao Town). Here “大同” (Datong) is a city, but “高桥镇” is a small town. When a city name appears in such a position, it must be followed by a street mark, e.g. “街” or “路”. An order-control list works with UnitMark in the rules to guarantee the correct processing of abnormal orders.
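Both difficulties come down to simple checks before the normal matching continues. The sketch below is only an illustration under stated assumptions: the lists `CITY_NAMES`, `STREET_MARKS` and `ORG_KEYWORDS`, the function names, and the `window` heuristic are hypothetical, and the paper's order-control list and UnitMark mechanism are not reproduced here.

```python
# Hypothetical toy lists; the real system keeps such knowledge in its KBs.
CITY_NAMES = ["大同"]
STREET_MARKS = ["街", "路"]
ORG_KEYWORDS = ["检测站"]


def find_org_keyword(address):
    """Return the first organization category keyword in the address, or None.

    A hit would be the signal to call the CN translation subroutine.
    """
    for keyword in ORG_KEYWORDS:
        if keyword in address:
            return keyword
    return None


def city_used_as_street(address, window=3):
    """Return a city name that is really part of a street name, or None.

    Assumed heuristic: the city name counts as a street name when a street
    mark appears within `window` characters after it.
    """
    for city in CITY_NAMES:
        pos = address.find(city)
        if pos < 0:
            continue
        tail = address[pos + len(city):pos + len(city) + window]
        if any(mark in tail for mark in STREET_MARKS):
            return city
    return None


print(find_org_keyword("天津市北辰区津围公路西汽车检测站内"))
print(city_used_as_street("高桥镇大同北路"))
```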



5. Tests for the translation system

5.1. Evaluation standard

For a large-scale application of an MT system, BLEU-style evaluation on a fixed-size test set is not appropriate because of its limited coverage. In our case, it is necessary to test the coverage of the translation KBs and the robustness of the rule-based approach by random sampling from the large volume of system output for human evaluation. We adopt a popular criterion for human evaluation of MT output, designed and used by the State High-Tech Development Plan of China (863 Program) [12]. It provides an adequacy score and a fluency score for each translation, each ranging from 0 to 5. Based on its definitions, we define a new criterion for our evaluation task and map it to a centesimal (100-point) grade for manual scoring. As the unit orders are fixed for CN and address translations, the fluency criterion is omitted. We focus on whether the information is correctly translated, i.e. accuracy. Table 1 describes the 5-score and 100-score criteria. Note: an “error” in Table 1 means an obviously incorrect translation caused by a segmentation error or a translation error; machine-style translations are tolerated.

Table 1. Descriptions of the MT evaluation criteria.

5-Score   Description of adequacy and fluency               100-Score   Description of accuracy
5         All information, flawless                         100         No error
4         Most information, good                            80          One error
3         More information, non-native                      60          Two errors
2         Little information, disfluent                     40          -
1         Almost no information, difficult to comprehend    20          Otherwise
0         No output or incomprehensible                     0           -

The human judgments were made jointly by 2-3 system developers and 2-5 bilingual engineers holding master's degrees in English. The final evaluation score for the system output is the average of their scores.
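The 100-point mapping of Table 1 and the averaging over judges can be written down in a few lines. This is a sketch under assumptions: the function names are hypothetical, and the paper does not state exactly how per-judge scores are aggregated beyond taking their average.

```python
from statistics import mean


def accuracy_score(error_count):
    """Map the number of obvious errors in one translation to the 100-point scale."""
    if error_count == 0:
        return 100
    if error_count == 1:
        return 80
    if error_count == 2:
        return 60
    return 20  # the "Otherwise" row of Table 1


def final_score(judge_error_counts):
    """Average the 100-point scores given by several judges to one translation."""
    return mean(accuracy_score(count) for count in judge_error_counts)


# Two judges: one finds no error, the other finds one error.
print(final_score([0, 1]))
```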

5.2. Test results

After a series of experiments on the CN and address translation system in the early development phase, the translation KBs expanded and translation performance improved gradually. Once the translation KBs had reached a certain size, the active learning strategy was applied to extract a large number of translation error candidates from millions of system outputs. Such large-scale tests were run several times during the late phase. The error candidates were checked and confirmed; for real errors, the corresponding translation knowledge was added to the KBs manually.



How are translation error candidates found automatically? Two types of error candidates are defined:

(i) a Chinese character or string without any translation or processing mark;

(ii) two or more consecutive single-character segments at a position that would normally be occupied by a multi-character word.

Any element satisfying the above criteria is denoted by C_x and is found by the debugging program. As the translation system ran on millions of records in the tests, the number of error candidates was too large. A control threshold based on the frequency of C_x is therefore used to limit the number of candidates output. The control threshold is given in Eq. (2). Only output containing a C_x whose frequency is larger than the value T (default = 2) is extracted as an error candidate.

$$\mathrm{Freq}(C_x) > T \qquad (2)$$
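A minimal sketch of this filter for type (i) candidates follows. It assumes that an untranslated element shows up as a run of Chinese characters left in the English output and that the frequency is counted over all sampled outputs; the regex `CJK_RUN` and the function names are hypothetical, and type (ii) candidates are not covered.

```python
import re
from collections import Counter

# Assumption: a type (i) element C_x is a run of Chinese characters left in the output.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")


def find_cx(translation):
    """Return all untranslated Chinese strings found in one system output."""
    return CJK_RUN.findall(translation)


def select_error_candidates(translations, threshold=2):
    """Keep outputs containing some C_x with Freq(C_x) > threshold."""
    per_record = [find_cx(text) for text in translations]
    freq = Counter(cx for elements in per_record for cx in elements)
    return [
        text
        for text, elements in zip(translations, per_record)
        if any(freq[cx] > threshold for cx in elements)
    ]


outputs = [
    "No. 5, 莘松 Road",
    "No. 7, 莘松 Road",
    "No. 9, 莘松 Road",
    "Room 2, 鑫茂 Park",
    "No. 1, Wushan Road",
]
print(select_error_candidates(outputs))
```

With the default threshold, only the three outputs sharing the same untranslated string are kept, since a frequent gap in the KB is worth more of a human checker's time than a one-off.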

The number of candidates passed to human verification was kept at about 10,000 to 30,000 in each iteration step. Over one and a half years, several iterations of Test → KB Enlargement → System Improvement were carried out. The output errors found in one iteration were eliminated in the next cycle through this process of stepwise refinement. Meanwhile, we re-checked the new output on records that had been correct in the previous iteration, to avoid rule conflicts between iterations. Table 2 gives the test results of the translation system. During these iterations, more than 5,000 terms were added to the main KB for CN translation, the term translation KB. For the main KBs of address translation, the translation rule set and the local region KB, the numbers of entries increased from about 300 to 1,050 and from fewer than 200 to 1,600, respectively.
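The re-check on previously correct records amounts to a regression test between two iterations. The sketch below assumes that outputs are stored as dictionaries from record ID to translation and that the set of verified-correct IDs is known; the function name and data layout are hypothetical.

```python
def find_regressions(previous_output, current_output, verified_ids):
    """Return IDs verified correct last time whose translation has changed."""
    return sorted(
        record_id
        for record_id in verified_ids
        if current_output.get(record_id) != previous_output.get(record_id)
    )


previous = {1: "Xinqiao Town", 2: "Wushan Road", 3: "Huayuan Zone"}
current = {1: "Xinqiao Town", 2: "Wu Shan Road", 3: "Huayuan Industrial Zone"}
# Record 3 was an error last time, so its change is expected; record 2 is a regression.
print(find_regressions(previous, current, verified_ids={1, 2}))
```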

Table 2. Test results.

Test time    Size of sampling records    Translation accuracy
Jan. 2009    CN: 35,000                  CN: 93%
             Addr.: 34,500               Addr.: 87%
Oct. 2009    CN: 30,000                  CN: 94%
             Addr.: 30,000               Addr.: 89%
Jun. 2010    CN: 10,000                  CN: 97%
             Addr.: 3,000                Addr.: 92%

In our tests, most errors fall into two types. For CN records, Part 3 often outputs null because translations are missing from the term translation KB. For address records, there are many segmentation errors in Parts 2 and 3 because of the difficulties mentioned in Section 4.2, which makes the accuracy of address translation lower than that of CN translation.



6. Conclusion

Compared with general NE translation, C-E translation of company names and addresses in a large-scale database poses different challenges: correctly analyzing the complex structures of these specific NEs, and accurately translating the large number of special phrases, including terms, local region names, etc. In this paper we put forward a rule-based approach with novel applications of active learning and stepwise refinement, carried out in cooperation between system developers and language engineers, all of which have achieved promising results.

The lesson we have learned from this translation engineering project is that the rule-based approach plays a very important role in some MT tasks, especially those that lack sufficient bilingual resources and place high requirements on the accuracy of the system output. The rule-based approach has at least two advantages here: exact translations through translation rules provided by humans, and an easy and flexible control mechanism for output quality.

For practical application, the CN and address C-E translation system still needs further improvement. Continued checks of the system output and extensions to the translation KBs are scheduled for the near future.

Acknowledgments

The authors sincerely thank the master's students from Harbin Institute of Technology who have worked on the development of the NE and address translation system since the autumn of 2007. They are Song Wang, Kaiming Lin, Qingxuan Chen, Wenyan Miao, Xianggen Zhou and Shasha Liu. Their continuous efforts contributed to the success of the system. The fundamental research related to this work is supported partly by the National Natural Science Foundation of China (Grant No. 60736014 and 60773069) and the State High-Tech Development Plan of China (Grant No. 2006AA010108).

References

1. Y. Al-Onaizan and K. Knight. Named Entity Translation: Extended Abstract. In Proceedings of the Second International Conference on Human Language Technology Research (HLT-02), pp. 122-124, 2002.

2. L. Jiang, M. Zhou, L-C. Chien, C. Niu. Named Entity Translation with Web Mining and Transliteration. In Proceedings of Nineteenth International Joint Conference on Artificial Intelligence (IJCAI-07), pp. 1629-1634, 2007.

3. Y-C. Wang, R. T-H. Tsai, W-L. Hsu. Learning Patterns from the Web to Translate Named Entities for Cross Language Information Retrieval. In Proceedings of the Third International Joint Conference on Natural Language Processing, pp. 677-684, 2008.

4. L-X. Tang, S. Geva, A. Trotman, Y. Xu. A Voting Mechanism for Named Entity Translation in English-Chinese Question Answering. In Proceedings of the 4th International Workshop on Cross Lingual Information Access at COLING 2010, pp. 43-51, 2010.

5. T-J. Zhao, D-Q. Zheng, C-Q. Zong. The Progress of Computational Linguistics Research in China. In China Computer Federation Proceedings CCFP 0010, pp. 150-192, 2009. (in Chinese)

6. F. Huang, S. Vogel, A. Waibel. Improving Named Entity Translation Combining Phonetic and Semantic Similarities. In Proceedings of HLT-NAACL'04, pp. 281-288, 2004.

7. W. Lam, S-K. Chan, R. Huang. Named Entity Translation Matching and Learning: with Application for Mining Unseen Translations. ACM Trans. on Information Systems, 25(1), pp. 31-62, 2007.



8. S. R. Maskey, M. Cmejrek, B. Zhou, Y. Gao. Class-Based Named Entity Translation in A Speech to Speech Translation System. In Proceedings of IEEE Spoken Language Technology Workshop (SLT 2008), pp. 253-256, 2008.

9. A. B. Hamadou, O. Piton, H. Fehri. Recognition and translation Arabic-French of Named Entities: case of the Sport places. Computing Research Repository (CoRR), Electronic Edition, 1002.0481, 2010.

10. F. Yang, J. Zhao, B. Zou, K. Liu. A Chinese-English Organization Name Translation System Using Heuristic Web Mining and Asymmetric Alignment. In Proceedings of ACL-IJCNLP 2009, pp. 387-395, 2009.

11. Y. Chen, C. Zong. A Structure-based Model for Chinese Organization Name Translation. ACM Transactions on Asian Language Information Processing, 7(1), pp. 1-30, 2008.

12. The 2005 HTRDP Evaluation Guidelines for Machine Translation, http://nlp.ict.ac.cn/demo/cwmt/year pages/2005.html

Tiejun Zhao (Member)

He received the Ph.D. degree in 1997 from the Dept. of Computer Science and Engineering, Harbin Institute of Technology (HIT), China. He has worked at the School of Computer Science and Technology, HIT, since 1990 as a lecturer, an associate professor, and a professor. His research interests include Natural Language Processing and Applied Artificial Intelligence. He is a council member of the Chinese Information Processing Society of China (CIPSC), the vice director of the Machine Translation Subject Committee of CIPSC, and a member of the editorial board of the Journal of Chinese Information Processing (JCIP).

Conghui Zhu (Member)

He received the Ph.D. degree in 2009 from the Dept. of Computer Science and Engineering, Harbin Institute of Technology (HIT), China. He has worked at the School of Computer Science and Technology, HIT, since 2009 as a lecturer. His research interests include Natural Language Processing and Machine Learning.

Muyun Yang (Member)

He received the Ph.D. degree in computer science and engineering from Harbin Institute of Technology in 2003. He is an associate professor in the School of Computer Science and Technology, Harbin Institute of Technology. He is a member of IEEE, CCF (Chinese Computer Federation) and CIPSC (Chinese Information Processing Society of China). His research interests include machine translation, information retrieval and e-learning.



Yingjian Gu (Member)

As a research fellow and the director of the National Administration for Code Allocation to Organizations (NACAO), he is responsible for establishing and implementing the network system of the National Organization Code (NOC), with almost 3000 local branches in China. He serves as the group leader of the project of the Information Database of Legal Entities (IDLE), contributing to the development of the national credit system. He has successfully promoted the NOC's application among more than 20 governmental agencies. Books he has written and compiled include the Practical Course of IC Card of the NOC Certificate, the System Engineering of NOC, the Guidelines for the Preparation of the Quality Management Documents in NOC Management Departments, etc.

Gang Zhou (Member)

He received his graduate degree from the Social Research Institute of China. He is a senior engineer at NACAO and works as the director of the Department of Application and Strategic Development of NACAO. He is mainly involved in promotion and strategic research in the fields of E-government, E-commerce and international cooperation, and he is also in charge of promoting the IDLE project.

Youyou Hong (Member)

She received a graduate degree in project management in 2008 from Beihang University and a bachelor's degree in English Literature in 2002 from Sichuan University. She works as an engineer in the Department of Application and Strategic Development at NACAO, where she is responsible for international cooperation and is involved in fields such as the NOC's application and promotion, the C-E translation project, IDLE, etc.
