
Word Expert Translation from German into Chinese in the Slow Intelligence Framework

Abstract—This paper presents a novel approach to translating German sentences into Chinese using word expert translators, thereby extending the application area of the slow intelligence architecture. The word expert perspective on natural language understanding is reviewed. The motivation for word expert translation is presented: it is shown in detail that the Chinese language depends crucially on the topic-comment relation and is therefore especially suitable to be understood from the word expert perspective. Five principles for communication among Chinese word experts are proposed. The main activities of word expert translators consist of enumerating the possible Chinese lexemes of a German lexeme, determining the linear ordering among Chinese lexemes within one word expert translator, determining topic-comment relations among word experts, constructing nested topic-comment relations among word expert translators, and choosing among the possible Chinese lexemes. All of these activities fit well within the slow intelligence framework.

I. INTRODUCTION

Machine translation (MT) is the transformation of texts from one natural language into another by computer. The worldwide market for machine translation is large and steadily increasing, with a growth rate of around 20% per year, in part because manual translation by human translators is expensive and slow. The main technologies for MT are the interlingua method, the statistical method, and the rule-based approach. The rule-based method assumes that languages can be governed by rules. However, every rule has exceptions, and the mapping between rules of different languages can be complicated, sometimes even impossible. The statistical method [1] focuses on the probability of translating a sentence in the source language into a sentence in the target language; it therefore requires a highly qualified and extensive sample translation corpus. Even if such a sample corpus is available, what the statistical method guarantees is only a probability. The interlingua approach ideally assumes a common semantic representation for all natural languages. Translation is then a process of transforming text in the source language into the interlingua, followed by a process of transforming the interlingua into the target language. The difficulty is that such an interlingua is hard to develop, at least at the time the ALPAC report was written. Recent research on the semantic representation of natural language has made fruitful progress, so it is time to reconsider MT with the interlingua approach, e.g. [8]. The present paper adopts the MultiNet representation [5] as the interlingua for the semantic representation of natural languages, and applies the word expert perspective to machine translation.

Tiansi Dong and Ingo Glöckner
Department of Mathematics and Computer Science
University of Hagen
Email: {tiansi.dong|ingo.gloeckner}@fernuni-hagen.de

The rest of the paper is structured as follows: Section 2 reviews the word expert perspective and the successful application of the WOCADI parser developed by [6]; Section 3 presents the topic-comment structure of the Chinese language and motivates the method of word expert translators; Section 4 presents the main activities of word expert translators and shows how these activities are carried out in the slow intelligence framework.

II. WORD EXPERT PERSPECTIVE

A. The Perspective

The traditional perspective views words as passive data (with knowledge of part of speech and meaning) and languages as infinite sets of sequences of words (satisfying grammatical rules). The word expert perspective pioneered by Rieger [9] views individual words as active procedures: each word of a language is seen as an active lexical agent called a word expert, which participates in the overall control of the parsing process through its internal actions and its interactions with other such agents [10, p. 1]. The word expert view advocates an integrated syntax-semantics coupling approach to language understanding. Traditional syntax is viewed as an artifact describing patterns of lexical interactions, and cannot be used to model comprehension because of the rich semantic particularities of lexemes [10, p. 3]. The word expert perspective views text understanding as a process of interactions among word experts that results in a disambiguation. The work of parsing is designed as the decision-making process by which each word expert chooses one suitable interpretation for the word that it represents. Comprehension is therefore simulated as an activity of looking for the best possible fit among the word experts. This view differs not only from the traditional rule-based approach (in which words are passive) but also from the statistical approach (in which the best fit is guessed by looking back into an existing sample corpus).

B. Word Expert Parser for English

Following Wilks' parsing system [13], Small [10] developed one of the most influential word expert parsers for English. His theoretical position is that words have no meaning per se, but rather that fragments of lexical items mean something through their interrelationships [11, p. 70]. That is, each lexical item is viewed as having certain interactions with its neighboring items, and meaning is produced through these interactions.


C. Word Class Expert Parser for German

Based on the idea of Word Class Functions [7], Helbig and Hartrumpf [6] developed the first semantically oriented word class expert parser for the German language: WOCADI. In contrast to Small's distributed-interaction approach, WCFA describes the grammatical functions of whole classes of words [6, p. 313]. The parser transforms German sentences into the MultiNet formalism [5]. It has been tested with all of the texts in the German Wikipedia. The MultiNet formalism and the WOCADI parser have been successfully applied in the LogAnswer question answering (QA) system, which scored second among non-English QA systems in the CLEF competition [4].

D. Word Experts for Translation

As pointed out by Small, the word expert perspective suggests a new way to look at translation [10, p. 15]. The generation step, in particular, requires word experts to arrange themselves into a meaningful sequence by communicating with each other.¹ For example, let three word experts represent drive, Joanie and car. The drive expert would send out the message: in front of me there shall be someone, behind me there shall be a vehicle; who fits? The Joanie expert would reply: I can stay in front of you; the car expert would reply: I can stay behind you. The sequence Joanie drives car is therefore formed.
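The message exchange above can be sketched as a toy simulation. The class name, the role slots and the matching rule below are our own illustrative assumptions, not Small's actual implementation.

```python
# Toy sketch of word experts negotiating linear order by message passing
# (class and slot names are illustrative, not Small's implementation).

class WordExpert:
    def __init__(self, word, offers=None, needs=None):
        self.word = word
        self.offers = offers or set()   # roles this expert can fill
        self.needs = needs or {}        # position -> required role

    def ask(self, others):
        """Fill each required position with the first expert offering the role."""
        slots = {}
        for position, role in self.needs.items():
            for other in others:
                if role in other.offers:
                    slots[position] = other.word
                    break
        return slots

drive = WordExpert("drives", needs={"before": "agent", "after": "vehicle"})
joanie = WordExpert("Joanie", offers={"agent"})
car = WordExpert("car", offers={"vehicle"})

slots = drive.ask([joanie, car])
sentence = [slots["before"], drive.word, slots["after"]]
print(" ".join(sentence))   # Joanie drives car
```

Here the drive expert broadcasts its two positional needs and the other experts answer by offering roles, which fixes the linear order without any global grammar rule.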

Translation from the word expert perspective consists of two processes: the first is word expert parsing from the source language into a meaning representation, and the second is word expert generation from the meaning representation into the target language. We focus on the translation from German into Chinese. The first process is carried out by the WOCADI parser. The second process transforms the MultiNet semantic representation [5] into Chinese.

III. THE CHINESE LANGUAGE AND CHINESE WORD EXPERTS

The grammar of Chinese differs radically from the grammars of German or English: most parts of speech can serve as both the subject and the predicate in a Chinese sentence. Therefore, an attempt to map grammar rules between German and Chinese only makes things complicated. We will show that the word expert perspective is a very suitable way to explain the Chinese language.

A. Subject and Predicate as Topic and Comment

Chao [3] studied the Chinese language and concluded that the relation between the subject and the predicate in a Chinese sentence is a topic-comment relation. This relation holds in all Chinese dialects [12], e.g., Mandarin (used in the Peking area), Wu (used around the Shanghai area), Cantonese (used in and around the Canton area), and WenYan (used in ancient China). For example, in ����(��/John, ��/dead), ��/John is the topic and ��/dead is the comment, which presents a comment on ��: he is dead. In ������(��/John, ��/dead, ��/father), ��/John is the topic and ����(��/dead, ��/father) is the comment, which presents a comment on ��: his father is dead. In ���(�/water, ��/boil), �/water is the topic and ��/boil comments on the water; in ����(�/book, �/read, ��/finish), �/book is the topic and ���(�/read, ��/finish) is the comment: (I) have finished reading the book.

The topic-comment relation introduces a question-answer relation between the subject and the predicate [3, p. 81]. Imagine a man who returns home after work and asks his wife, ��? (where (is the) rice?). His wife answers, ����(all eaten). The man introduces the topic �/rice in a question; his wife comments on the topic by answering all eaten.

¹ For simplicity, we adopt Small's individual word experts view here, though an abstraction to word class experts would again make sense.

B. Full Chinese Sentences

A full Chinese sentence has a topic and a comment.

1) nominal expressions as comments: In ����(he is an American), �/he is a pronoun and ���/American is a noun. There is no link verb �/is between �/he and ���/American. The Chinese word expert � (abbreviated 'CWE�') asks its surrounding experts, for example: who can be my property? The CWE��� answers: I can be your property of nationality. If we view Chinese words as such active agents, instead of passive data as in traditional rule-based grammar, the link verb �/be is not necessary. In ������(��/inside of the room, ��/many, ��/mosquito), the nominal expression ���� is the predicate. The CWE�� asks: what is inside of the room? The CWE���� answers: many mosquitoes. The meaning of the sentence is there are many mosquitoes inside the room.

2) active verbs as comments: In �������(this matter has long been published), ���(this matter) is the topic and ��/publish is a verb in the active form. In Chinese, the passive form ���(be published) need not be used to mark the passive action, a construction taken for granted in English or German. From the word expert perspective, Chinese is pragmatically more efficient in expressing meanings: the CWE�� asks: what can be published? The CWE��� answers: this matter. So in Chinese the passive form is indeed not necessary. In ���, ��(wine, (I) do not drink; tobacco, (I) smoke), �/wine and �/tobacco are subjects, and �/drink and �/smoke are verbs in the active form. The CWE�� asks: what not to drink? The CWE� answers: wine. The CWE� asks: what to smoke? The CWE� answers: tobacco.

3) adjectives as comments: In ��(�/I, �/poor), the whole comment is the single adjective �. The meaning of the sentence is I am poor. The CWE� asks: how about me? The CWE� answers: poor. In ��(�/dish, �/salty), the whole comment is the single adjective �. The CWE� asks: who can serve as my property? The CWE� answers: salty. The meaning of the sentence is the dish is salty.

4) full sentences as comments: In ������(��/this, �/man, ��/ear, �/soft), the comment is the full sentence ���. The direct translation is (as for) this man, the ear is soft. The CWE��� and the CWE�� both ask: what is my property? The CWE� answers both: soft. As the CWE� is nearer to the CWE�� than to the CWE���, its answer is first accepted by the CWE��. As a result, a new word expert CWE��� is formed, which means gullible. This new expert answers the question raised by the CWE���. The meaning of the sentence is this man is gullible. From this example we propose two principles for communication among Chinese word experts:

Chinese WE Principle 1: Neighboring word experts have priority in communication.

Chinese WE Principle 2: New word experts may appear after successful communications and play roles in communication with other experts.

In �������(��/this, �/man, ��/mind, ��/simple), the comment is the full sentence ����. The direct translation is this man (is such that his) mind is simple. The meaning of the sentence is the mind of this man is simple. With the two principles, the CWE�� first communicates with the CWE�� and forms a new word expert CWE����, which answers the question of the CWE���. The whole process can be simulated by two question-answer rounds: how about the mind? Simple. How about this man? The mind is simple.

5) verbal expressions as topics: In ��, ����(�/go, �/all right, �/not, �/also), the verbal expressions � and �� are topics. The meaning of the sentence is to go is all right, not to go is also all right. Bare verbs as subjects are allowed neither in English nor in German, but they are perfectly normal in Chinese and can easily be explained from the word expert perspective. The CWE� asks: shall I perform? The CWE� answers: all right. The CWE�� asks: can I not perform? The CWE� answers: all right.

6) spatial-temporal expressions as topics: In ���(��/today, �/cold), the temporal expression �� is the topic. The meaning of the sentence is today is cold. The CWE�� asks: how is today? The CWE� answers: cold. In �����? (��/here, �/is, ��/where), the spatial expression �� is the whole topic. The meaning of the sentence is where is here? The CWE�� asks: where is here? The CWE�� will communicate with word experts in the next sentence for an answer.

7) conditional expressions as topics: In ������������(�/he, ��/dead, ��/if, ��/simply, �����/unthinkable), the topic is the conditional expression �����. The meaning of the sentence is the supposition that he should die is simply unthinkable. The CWE� asks: how is he? The CWE�� answers: dead. The CWE�� asks: what will be the result under which condition? The CWE��� serves as the condition; the CWE���� serves as the result.

8) prepositional expressions as topics: In �������(�/through, ��/chairman, ��/convene, ��/meeting), the topic is the prepositional expression ���. The meaning of the sentence is the meeting is convened through the chairman. The CWE� asks: through what? By whom? The CWE�� answers: chairman. A new word expert CWE��� is formed. The CWE�� asks: who convenes what? How is it convened? The CWE��� answers the how question, the CWE�� answers the who question, and the CWE�� answers the what question.

9) full sentences as topics and comments: In �������(�/he, ��/dead, �/I, �/awfully, ��/feel bad), the topic is the full sentence ���(he is dead) and the comment is also a full sentence, ����(I feel awfully bad). The direct translation is he is dead, I feel awfully bad. The meaning of the sentence is that he is dead is something about which I feel awfully bad. The CWE� asks: how is he? The CWE�� answers: dead. The CWE��� asks: how? What is the result? The CWE� asks: how about myself? The CWE��� answers: awfully bad. The CWE���� answers the question: what is the result? If the sentence is ���, ��(�/he, ��/dead, ��/traffic accident), the CWE�� will answer the question of the CWE���: how come?

We conclude that the structures of full Chinese sentences violate many important grammar rules of Western languages, and that understanding full Chinese sentences can be achieved by communications among Chinese word experts following certain communication principles. The meaning of a full Chinese sentence can be represented by a dialog process among the word experts.

C. Minor Sentences

In conversation, it is normal that one speaker introduces a topic and the other makes a comment. This motivates the term minor sentence [3, p. 60]. In contrast to a full sentence, a minor sentence does not have both a topic and a comment. The above conclusion is supported by the structure of Chinese minor sentences, in that a speaker in a conversation may say only a few words to answer the questions raised by the word experts of the other speaker.

D. Compound and Complex Sentences

By parallelizing two or more sentences, we can construct compound Chinese sentences. For example, �����, �����(you do not know me and I do not know you) is a compound sentence formed by parallelizing two 'A ��� B' (A does not know B) sentences. By nesting a full sentence either in the topic or in the comment, we can construct complex sentences. For example, �������(�/I, ��/dead, ��/funeral, ��/simple) is a complex sentence, meaning when I die, the funeral should be simple, constructed by using the full sentence ���(I die) as the topic. Both structures can easily be explained in the word expert framework with the second communication principle (Chinese WE Principle 2) and the following third principle.

Chinese WE Principle 3: A newly formed word expert has priority over old ones in communication.

With this principle, although both the CWE�� and the CWE��� answer the CWE����'s conditional question, the CWE��� is newly formed and has priority.
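The three communication principles can be illustrated with a toy simulation of the gullible example from Section III-B. Pinyin labels stand in for the Chinese characters, and the data structures are our own assumptions, not the paper's system.

```python
# Toy sketch of the three communication principles among Chinese word
# experts; pinyin labels stand in for the Chinese characters, and the
# data structures are our own illustration, not the paper's system.

def nearest(asker_pos, candidates):
    """Principle 1: the neighboring (closest) expert answers first."""
    return min(candidates, key=lambda pos_name: abs(pos_name[0] - asker_pos))

def merge(left, right):
    """Principle 2: a successful exchange forms a new word expert."""
    return left + "+" + right

def answerer(candidates, birth_order):
    """Principle 3: a newly formed expert outranks older ones."""
    return max(candidates, key=lambda name: birth_order[name])

# "zhe ren erduo ruan" (this man, ear, soft): both zhe-ren (pos 0) and
# erduo (pos 1) ask for a property; ruan (pos 2) offers the only answer.
winner = nearest(2, [(0, "zhe-ren"), (1, "erduo")])
gullible = merge(winner[1], "ruan")      # the merged 'gullible' expert

# zhe-ren's open question is now answered by the newest expert:
birth_order = {"ruan": 0, gullible: 1}   # larger value = formed later
print(answerer(["ruan", gullible], birth_order))   # erduo+ruan
```

The nearest neighbor wins the first exchange, the successful exchange yields a merged expert, and that freshly merged expert then takes priority in answering the remaining question.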

E. Pivotal Constructions

A Chinese sentence with a pivotal construction normally has two verbs and a nominal expression. This nominal expression serves as the object of the first verb and as the subject of the second verb [3, pp. 124-125]. For example, in �������(��/we, �/send, �/he, �/serve, ��/representative), there are two verbs, �/send and �/serve, and the pronoun �/he is the object of �/send and the subject of �/serve (in Chinese, �/he has the same form as subject and as object). The direct translation of the sentence is 'we send he serve as representative'. The meaning is that we delegate him to be our representative. Within the word expert perspective, we can explain this special construction without introducing any new terminology (as is done in rule-based grammar theories). The CWE� asks: who is sent? The CWE� asks: who serves? The CWE� answers both: �/he.

IV. GERMAN-CHINESE WORD EXPERT TRANSLATORS

Given a sentence in German, the WOCADI parser can be used to generate a corresponding semantic representation in the MultiNet formalism. We need to design word expert translators that communicate with each other, linearize themselves to form a (nested) topic-comment structure, and transform the MultiNet representation into Chinese sentences.

We start with a simple example to introduce the main idea. Suppose the German sentence is Er ist ein Deutscher (He is a German). The WOCADI parser delivers the MultiNet semantic representation illustrated in Figure 1: er.1.1 is the word sense² of the lexeme er/he; c275 represents the word expert for a concrete individual³ subordinate (SUB) to er.1.1; similarly, c287 represents the word expert whose concept is subordinate to deutsche.1.1/German. The word expert c278 has two arguments: one is the topic, c275, pointed to by ARG1; the second is the comment, c287, pointed to by ARG2; the temporal status (TEMP) of c278 is present (present.0). The word expert c275 posts a message: if I am the topic, who can be my comment? The word expert c278 answers: as far as I know, the word expert c287 is your comment. If word expert c275 knows that its Chinese lexeme is �/he, and word expert c287 knows that its Chinese lexeme is ���/German, they will know that in the linearization of the Chinese sentence c275 comes before c287. The simplest case is ����(He German), which is indeed a valid Chinese sentence with the same meaning as Er ist ein Deutscher. If word expert c278 knows that its Chinese lexeme is �/is and knows that it shall stay between the topic and the comment in the Chinese sentence, the Chinese sentence will be �����(he is German). If c278 knows that ��/now is the Chinese lexeme meaning present and decides that its temporal knowledge shall also be encoded in the Chinese sentence, the Chinese sentence could be �������, �������, �������, or �������. If it knows that its temporal label shall be in front of it, the two linearizations will be ������� and �������, both of which are valid Chinese sentences. These word experts are now also experts for translating. The activities of each word expert translator comprise: transforming its German lexeme into possible Chinese lexemes, determining a linear ordering relation among the Chinese lexemes of a single word expert translator, determining whether its Chinese lexemes shall appear in the Chinese translation, establishing a linear ordering relation with other word expert translators by communication, and choosing the most appropriate Chinese lexemes.

² Our lexicon uses a double indexing scheme to distinguish word senses.
³ Entities mentioned in the text are represented by constants c1, c2, etc.

Fig. 1. MultiNet representation of the sentence Er ist ein Deutscher (he is a German)
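These activities can be outlined in a small sketch for the Er ist ein Deutscher example. The tiny lexicon, the pinyin lexemes and all function names below are our own hypothetical illustration, not the paper's implementation; it covers the enumeration, appearance, ordering and choice activities.

```python
# Hypothetical sketch of a word expert translator's activities
# (the lexicon, the pinyin lexemes and the function names are our own
# illustration, not the paper's implementation).

LEXEMES = {                       # enumerate candidate Chinese lexemes
    "er.1.1": ["ta"],             # er/he
    "deutsche.1.1": ["deguoren"], # deutsche/German
    "sein.1.1": ["shi", ""],      # the link verb may also be dropped
}

def choose(candidates):
    """Choose the most appropriate candidate lexeme (first, here)."""
    return candidates[0]

def appears(lexeme):
    """Decide whether the lexeme appears in the Chinese translation."""
    return lexeme != ""

def linearize(topic, copula, comment):
    """Order the experts as topic (copula) comment, dropping empties."""
    return [w for w in (topic, copula, comment) if appears(w)]

# Er ist ein Deutscher -> "ta deguoren" or "ta shi deguoren":
short = linearize(choose(LEXEMES["er.1.1"]),
                  choose(LEXEMES["sein.1.1"][1:]),   # drop the copula
                  choose(LEXEMES["deutsche.1.1"]))
full = linearize(choose(LEXEMES["er.1.1"]),
                 choose(LEXEMES["sein.1.1"]),        # keep the copula
                 choose(LEXEMES["deutsche.1.1"]))
print(" ".join(short), "/", " ".join(full))   # ta deguoren / ta shi deguoren
```

Both linearizations correspond to valid Chinese renderings of the sentence, mirroring the two variants discussed above.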

A. Transforming into Chinese Lexemes

This task is to find Chinese lexemes for a given German lexeme such that they represent the same concept. This may not always be feasible. Some German lexemes have no corresponding native Chinese lexeme, e.g. names of cheeses, beers, and chocolates: in German, Franziskaner can refer to a kind of beer, and there is no corresponding native Chinese lexeme. For those that do have corresponding native Chinese lexemes, these may differ across Chinese dialects. For example, the German lexeme wir (we) can be mapped to 我们 in Mandarin, 阿拉 in the Shanghai dialect, and 我哋 in the Canton dialect. Chao [3] suggests that a complete lexicon shall be constructed to make selection applicable in grammar. For the translation from German into Chinese, we need to embed the Chinese lexical ontology system into the German lexical ontology system and mark each Chinese lexeme with its dialect group, while neglecting fine-grained Chinese ontology lexemes which have no corresponding German lexemes. German lexemes which have no native Chinese lexemes will be translated separately.
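The embedding described above can be sketched as a lookup table. This is a minimal illustration, not the system's actual data structure: each German lexeme maps to its Chinese candidates tagged with a dialect group, and lexemes without a native Chinese counterpart (such as Franziskaner as a beer) map to an empty list and are handled by the separate translation step mentioned above.

```python
# Hypothetical lexicon fragment: Chinese candidates per German lexeme,
# each tagged with its dialect group.
LEXICON = {
    "wir.1.1": [
        ("我们", "Mandarin"),
        ("阿拉", "Shanghai"),
        ("我哋", "Canton"),
    ],
    "franziskaner.1.1": [],  # no native Chinese lexeme
}

def chinese_candidates(german_lexeme, dialect="Mandarin"):
    """Return the Chinese lexemes of a German lexeme in one dialect."""
    return [lex for lex, d in LEXICON.get(german_lexeme, []) if d == dialect]
```

For instance, `chinese_candidates("wir.1.1")` yields `["我们"]`, while `chinese_candidates("franziskaner.1.1")` is empty, signalling that the lexeme must be translated separately.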

B. Lexeme Ordering within a Word Expert Translator

A word expert in the analysis of a German sentence may have a temporal property representing the tense. In Chinese, tense is expressed by particles: � � � � �, as in ����� (er hat gegessen / he has eaten). The word expert essen.1.1/eat has a temporal property past.0, which shall be translated into the Chinese particles 过 and 了. The corresponding Chinese lexemes of the word expert essen.1.1 are �����, whose linear ordering in the Chinese sentence is stated as follows.

Chinese WE Principle 4: Let L be a word expert which may have 过 or 了 as a particle. (1) If 过 is the only particle, it must occur directly after L; (2) if 了 is the only particle, it shall occur after L; (3) if both 过 and 了 are particles, 过 shall come before 了, besides obeying rules (1) and (2); (4) if 了 is used twice, besides obeying rule (2), the two occurrences must be separated by lexemes of another word expert, and one must directly follow L.

With the above principle, the following orderings are all understandable translations: ��� – (1), ��� – (2), ��� – (2), ���� – (1)(2)(3), ���� – (1)(2)(3), ���� – (2)(4).

Fig. 2. MultiNet representation of the sentence Das Kind hat Angst und fängt an zu weinen (The child is scared and begins to cry)
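Principle 4 can be read as a validity check over candidate token sequences. The sketch below is our own formalization, under the assumption that the two particles are the aspect markers 过 (rules 1 and 3) and 了 (rules 2 and 4); the verb 吃 (eat) and object 饭 (meal) are illustrative tokens, and any token other than the particles stands for a lexeme of some word expert.

```python
def obeys_principle4(tokens, L):
    """Check a token sequence against Chinese WE Principle 4.

    tokens: linear sequence containing the lexeme of word expert L,
    optional particles 过/了, and lexemes of other word experts.
    """
    i = tokens.index(L)
    guo = [j for j, t in enumerate(tokens) if t == "过"]
    le = [j for j, t in enumerate(tokens) if t == "了"]
    if len(guo) > 1 or len(le) > 2:
        return False                       # at most one 过, at most two 了
    if guo and guo[0] != i + 1:
        return False                       # (1)/(3): 过 directly after L
    if any(j < i for j in le):
        return False                       # (2): 了 occurs after L
    if guo and le and guo[0] > le[0]:
        return False                       # (3): 过 comes before 了
    if len(le) == 2:
        a, b = le
        if b == a + 1:
            return False                   # (4): the two 了 must be separated
        if not guo and a != i + 1:
            return False                   # (4): one 了 directly follows L
    return True
```

For example, `obeys_principle4(["吃", "过", "了"], "吃")` is `True` (rules (1)(2)(3)), while `obeys_principle4(["吃", "了", "了"], "吃")` is `False`, because the two 了 are not separated by another word expert's lexeme.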

C. Determining to Appear or Not

Not all the word experts of a German sentence shall appear in the translated Chinese sentence. For example, the German sentence Das Kind hat Angst und fängt an zu weinen (The child is scared and begins to cry) shall be translated into the Chinese ����� (��/Kind/child, �/Angst/scare, �/weinen/cry, �/particle), as illustrated in Figure 2. The following word experts appear in the translated Chinese sentence: kind.1.1/child (c97), weinen.1.1/cry (c138), angst.1.1/scare (c101), and haben.1.1/have (c98). The carrier (SCAR) of haben.1.1/have is kind.1.1/child, which is the actor (AGT) of both weinen.1.1/cry and anfangen.1.2/begin. The affected object (AFF) of anfangen.1.2/begin is weinen.1.1/cry, which is the second argument (ARG2) of anfangen.1.1.⁴

Considering the topic-comment relations among the word experts, we find that the comment of angst.1.1/scare is weinen.1.1/cry, and that the two word experts anfangen.1.2/begin and anfangen.1.1/begin simply duplicate this topic-comment relation. Therefore, they shall not appear in the translated Chinese sentence; the redundancy is only an artifact of our deep semantic analysis. A principle is stated as follows.

Chinese WE Principle 5: A word expert shall not appear in the translated Chinese sentence if it duplicates an existing topic-comment relation.

⁴ This analysis reflects that if someone starts something (anfangen.1.1), then this starting action causes something to start (anfangen.1.2).
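Principle 5 can be sketched as a filter over the word experts, under the simplifying assumption (ours, not the paper's data model) that each word expert asserts at most one topic-comment pair; an expert whose pair has already been asserted is dropped.

```python
def apply_principle5(word_experts, asserted_relation):
    """Drop word experts that merely duplicate an existing
    topic-comment relation (Chinese WE Principle 5)."""
    seen, kept = set(), []
    for we in word_experts:
        rel = asserted_relation.get(we)
        if rel is not None and rel in seen:
            continue  # duplicates an existing topic-comment relation
        if rel is not None:
            seen.add(rel)
        kept.append(we)
    return kept

# The Figure 2 example: anfangen.1.2 and anfangen.1.1 only duplicate
# the relation between angst.1.1 and weinen.1.1.
experts = ["kind.1.1", "angst.1.1", "weinen.1.1",
           "anfangen.1.2", "anfangen.1.1", "haben.1.1"]
relations = {
    "angst.1.1":    ("angst.1.1", "weinen.1.1"),
    "anfangen.1.2": ("angst.1.1", "weinen.1.1"),  # duplicate
    "anfangen.1.1": ("angst.1.1", "weinen.1.1"),  # duplicate
}
```

Here `apply_principle5(experts, relations)` keeps kind.1.1, angst.1.1, weinen.1.1, and haben.1.1, matching the word experts that appear in the translation.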

D. Ordering of Word Expert Translators

For those word experts that will appear in the translated Chinese sentence, a nested topic-comment ordering shall be constructed. As we have semantic representations from WOCADI and lexemes tagged with semantic roles, the topic-comment relations among word experts are not difficult to obtain: we only need to examine the description of the lexeme and the possible MultiNet constructions to see which elements can be the comment of which other elements in the meaning representation. In Figure 2, angst.1.1/scare is the topic for weinen.1.1/cry, so we have the list (WE-angst.1.1/scare WE-weinen.1.1/cry); kind.1.1/child is the topic for angst.1.1/scare, so we have (WE-kind.1.1 (WE-angst.1.1 WE-weinen.1.1)). By flattening this nested structure, we obtain a linear ordering of the word experts: (WE-kind.1.1 WE-angst.1.1 WE-weinen.1.1). Chinese sentences can be obtained by replacing each word expert with its Chinese lexeme(s). For example, �����, �����, and ������ are all understandable translations, where � is the particle of �, whose ordering follows Chinese WE Principle 4.
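The flattening step can be sketched directly: a nested topic-comment structure is represented as a tuple whose first element is the topic and whose remaining elements are its comments, and flattening is a left-to-right traversal.

```python
def flatten(structure):
    """Flatten a nested topic-comment structure into a linear
    ordering of word experts (each topic precedes its comments)."""
    if isinstance(structure, str):
        return [structure]
    order = []
    for part in structure:
        order.extend(flatten(part))
    return order

# The nested structure from the text above.
nested = ("WE-kind.1.1", ("WE-angst.1.1", "WE-weinen.1.1"))
```

`flatten(nested)` yields the linear ordering `["WE-kind.1.1", "WE-angst.1.1", "WE-weinen.1.1"]` stated in the text.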

E. Choosing Chinese Lexemes

A word expert may have more than one corresponding Chinese lexeme even within a single Chinese dialect. Communication among word experts is required to select the most suitable one, or to delete incompatible ones. For example, the word expert ein.1.1/a in the German phrase ein Baum (a tree) can be mapped to �����������, among others. By communicating with the word expert baum.1.1/tree, the word expert ein.1.1/a learns that it can only be mapped to ��. This requires word experts of countable objects to carry measurement information.
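This negotiation can be sketched as follows. The classifier (measure word) inventory here is our own illustration, not the system's dictionary: 棵 is the standard Chinese classifier for trees and 把 the classifier for handled tools, and each countable noun's word expert exposes the classifier it accepts.

```python
# Illustrative measurement information carried by word experts of
# countable objects (an assumption for this sketch).
ACCEPTED_CLASSIFIER = {
    "baum.1.1": "棵",  # classifier for trees
    "axt.1.1": "把",   # classifier for tools with a handle
}

def negotiate_classifier(ein_candidates, noun_expert):
    """ein.1.1 keeps only the classifier the counted noun accepts."""
    required = ACCEPTED_CLASSIFIER.get(noun_expert)
    return [c for c in ein_candidates if c == required]
```

For example, `negotiate_classifier(["个", "只", "棵", "条", "把"], "baum.1.1")` leaves only `["棵"]`: after communicating with the tree's word expert, all other candidate classifiers of ein.1.1 are deleted.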

F. A Slow Intelligence (SIS) Workflow

The word expert translation system can be organized in the slow intelligence architecture [2]. Each word expert that occurs in the German parse is a unit slow intelligence system: it enumerates possible Chinese lexemes, determines possible linear orderings among them, and determines the topic-comment relations of its arguments. The word experts communicate among themselves and form a larger slow intelligence system: nested topic-comment structures are first enumerated, then duplicated topic-comment relations are removed, incorrect lexemes are pruned, and the ordering of particles is linearized with the lexemes of other word experts. Finally, the possible Chinese sentences are produced.
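The enumeration-then-elimination workflow can be sketched as a chain of phases, each mapping the current set of candidate translations to a refined one. The phase bodies below are toy stand-ins (the real phases operate on word experts, their Chinese lexemes, and MultiNet topic-comment structures); only the chaining pattern is the point.

```python
def run_sis_workflow(candidates, phases):
    """Apply each slow intelligence phase in order; every phase maps
    the candidate set to a refined candidate set."""
    for phase in phases:
        candidates = phase(candidates)
    return candidates

def enumerate_phase(cands):
    # enumeration: expand every candidate with its alternatives
    # (here each alternative is just marked with "*")
    return [variant for c in cands for variant in (c, c + "*")]

def eliminate_phase(cands):
    # elimination: prune the candidates marked as incompatible
    return [c for c in cands if not c.endswith("*")]
```

`run_sis_workflow(["S"], [enumerate_phase, eliminate_phase])` returns `["S"]`: alternatives are first enumerated and then the incompatible ones are pruned, mirroring the phases described above.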

For the German sentence Ich fällte einen Baum mit einer Axt (I cut down a tree with an ax), the MultiNet analysis results in four word experts: WE-ich.1.1/I (abbreviated WE-I in the following), WE-baum.1.1/tree (WE-tree), WE-axt.1.1/ax (WE-ax), and WE-fällen.1.1/cut-down (WE-cut-down). The system first creates one CWE for each word expert WE. All the CWEs are unit SISs, searching for possible Chinese lexemes and enumerating linear orders based on the CWE principles. In our current German-Chinese dictionary, CWE-I has one Chinese lexeme: 我; CWE-cut-down has five Chinese lexemes: �� � � � � � � � � � � and two particles � and �; CWE-tree has one Chinese lexeme: ���; CWE-ax has one Chinese lexeme: �.

Fig. 3. Word expert translation results for the input sentence Ich fällte einen Baum mit einer Axt

Only CWE-cut-down needs to carry out an enumeration-elimination process to linearize the orders of its Chinese lexemes and particles. The orders among the four CWEs will be enumerated based on the topic-comment relations as follows: (1) CWE-I is the actor of CWE-cut-down; therefore, CWE-I is the topic and CWE-cut-down is the comment. (2) CWE-tree is the affected object of CWE-cut-down; therefore, CWE-cut-down is the topic and CWE-tree is the comment. (3) CWE-ax is analysed as the association of CWE-tree, but their features delivered from the WOCADI parser differ: CWE-ax is an instrument, while CWE-tree is a non-instrument. Therefore, CWE-ax is the instrument commenting both on the action and on the actor. This introduces the particle 用/using before the lexeme of CWE-ax: �� (using an ax). The two nested topic-comment structures are (CWE-I ((CWE-cut-down CWE-tree) CWE-ax)) and (CWE-I (CWE-ax (CWE-cut-down CWE-tree))). In the adaptation phase the CWEs communicate with each other to prune lexemes. �� and �� of CWE-cut-down are removed, as they are incompatible with the instrument CWE-ax. A demo translation system has been developed; its translation results are illustrated in Figure 3.

V. CONCLUSIONS

The work reported in this paper is mainly based on a classic research work on Chinese grammar [3]. We argue that the structure of the Chinese language is more suitably understood from the word expert perspective than from the traditional rule-based perspective. Machine translation from German into Chinese is outlined from the word expert perspective, and a slow intelligence workflow for machine translation is illustrated with an example.

ACKNOWLEDGMENTS

Hermann Helbig introduced the topic to us and critically commented on a draft of this paper; Peiling Cui criticized the Chinese translations. Financial support from the DFG is gratefully acknowledged.

REFERENCES

[1] P. F. Brown, J. Cocke, S. A. Della Pietra, V. J. Della Pietra, F. Jelinek, J. D. Lafferty, R. L. Mercer, and P. S. Roossin. A Statistical Approach to Machine Translation. Computational Linguistics, 16(2):79–85, 1990.
[2] S. K. Chang. A general framework for slow intelligence systems. International Journal of Software Engineering and Knowledge Engineering, 20(1):1–18, 2010.
[3] Y. R. Chao. A Grammar of Spoken Chinese. University of California Press, 1968.
[4] I. Glöckner and B. Pelzer. The LogAnswer project at ResPubliQA 2010. In CLEF 2010 Working Notes, September 2010.
[5] H. Helbig. Knowledge Representation and the Semantics of Natural Language. Springer-Verlag, 2006.
[6] H. Helbig and S. Hartrumpf. Word class functions for syntactic-semantic analysis. In Proceedings of the 2nd International Conference on Recent Advances in Natural Language Processing, pages 312–317, Tzigov Chark, Bulgaria, 1997.
[7] H. Helbig. Syntactic-semantic analysis of natural language by a new word-class controlled functional analysis. Computers and Artificial Intelligence, 5(1):53–59, 1986.
[8] V. Mihailevschi. Machine Translation Interlingua Based on MultiNet. VDM Verlag Dr. Müller, 2008.
[9] C. Rieger. Viewing Parsing as Word Sense Discrimination. In Dingwall, editor, A Survey of Linguistic Science. Greylock Publishers, 1977.
[10] S. Small. Word Expert Parsing: A Theory of Distributed Word-Based Natural Language Understanding. PhD thesis, Department of Computer Science, University of Maryland, 1980.
[11] S. Small. Viewing Word Expert Parsing as Linguistic Theory. In Proceedings of the 7th International Joint Conference on Artificial Intelligence, pages 70–76. Erlbaum, Hillsdale, 1981.
[12] L. Wang. History of Chinese Language. Zhonghua Book Co. Press, 1980. ��, ����, ������.
[13] Y. Wilks. Making Preferences More Active. AI Memo 206, Artificial Intelligence Laboratory, Stanford University, 1973.
