CNGL Annual Report 2012
<strong>Annual</strong> <strong>Report</strong> <strong>2012</strong><br />
Director: Prof. Josef van Genabith<br />
School of Computing<br />
Dublin City University<br />
Dublin 9<br />
Deputy Director: Prof. Vincent Wade<br />
School of Computer Science and Statistics<br />
Trinity College Dublin<br />
Dublin 2<br />
Associate Director: Dr. Páraic Sheridan<br />
School of Computing<br />
Dublin City University<br />
Dublin 9<br />
INFO@<strong>CNGL</strong>.IE<br />
WWW.<strong>CNGL</strong>.IE<br />
Dublin City University Trinity College Dublin University College Dublin University of Limerick
Preface<br />
THE CENTRE FOR NEXT GENERATION LOCALISATION (<strong>CNGL</strong>) IS A CENTRE FOR SCIENCE, ENGINEERING<br />
AND TECHNOLOGY (CSET) FUNDED BY SCIENCE FOUNDATION IRELAND (SFI) AND INDUSTRY PARTNERS.<br />
Centres for Science, Engineering and Technology (CSETs) help link scientists and engineers in partnerships across<br />
academia and industry to address crucial research questions, foster the development of new and existing Irish-based<br />
technology companies, attract industry that could make an important contribution to Ireland and its economy, and<br />
expand educational and career opportunities in Ireland in science and engineering. CSETs are expected to exhibit<br />
outstanding research quality, intellectual breadth, active collaboration, flexibility in responding to new research<br />
opportunities, and integration of research and education in the fields that SFI supports. Science Foundation Ireland<br />
(SFI) is a key organisation in the implementation of Ireland’s National Development Plan (NDP 2007-2013) and the<br />
Strategy for Science, Technology and Innovation 2006-2013 (SSTI). A sum of €8.2 billion has been allocated for scientific<br />
research under the NDP and SSTI, of which SFI is responsible for investing €1.4 billion. SFI will continue to invest in<br />
academic researchers and research teams who are most likely to generate new knowledge, leading edge technologies<br />
and competitive enterprises in the fields of science and engineering.<br />
SFI Vision<br />
Ireland will be a global knowledge leader that places scientific and engineering research at the core of its society<br />
to power economic development and social progress.<br />
This centre is supported by Science Foundation Ireland (grant 07/CE/I1142)<br />
and the National Development Plan 2007–2013.<br />
Table of Contents<br />
Executive Summary 5<br />
CSET Leadership 7<br />
Management Team Biosketches 9<br />
<strong>CNGL</strong> Overview 17<br />
Integrated Language Technologies 27<br />
Digital Content Management 45<br />
Next Generation Localisation 57<br />
Systems Framework 71<br />
Year 5 Demonstrator Programme 81<br />
Industry Partnerships and Technology Transfer 89<br />
Management and Governance 99<br />
Education and Outreach 107<br />
Appendix 1: People and Partnerships 115<br />
Appendix 2: Outputs 124
Executive Summary
Centre for Next Generation Localisation <strong>Annual</strong> <strong>Report</strong> <strong>2012</strong> 5<br />
“Our work is guided by the vision of enabling people to interact with content, products, services and other people<br />
in their own language, according to their own culture, and according to their own personal needs.”<br />
Localisation is the process of adapting digital content<br />
to culture, locale and linguistic environment. It is<br />
a key enabling multiplier technology of the global<br />
manufacturing, software, services and content creation<br />
and distribution industries, unlocking markets otherwise<br />
unavailable. Localisation has a social dimension as<br />
many communities find themselves on the wrong side<br />
of the “digital divide” with vital information (health,<br />
hygiene, food, education etc.) not available in the local<br />
languages, with potentially disastrous consequences.<br />
Localisation technologies and processes can make<br />
a significant contribution to bridging this divide.<br />
The <strong>CNGL</strong> partnership has focused on both the<br />
commercial and the societal dimensions of localisation,<br />
concentrating on the challenges of volume, access and<br />
personalisation. Volume: the amount of content to be<br />
localised massively outstrips human translation capacity.<br />
Access: mobile devices enable ubiquitous access to<br />
perishable and frequently updated information on the<br />
go, involving interaction modalities such as speech and<br />
image, corporate as well as user-generated content.<br />
Personalisation: information is most useful if adapted to<br />
the user, device, background information, knowledge<br />
and task at hand. In terms of a slogan: “the person is the<br />
ultimate locale”.<br />
Over the last five years (2007-<strong>2012</strong>) <strong>CNGL</strong> has made<br />
great strides in connecting the localisation industry<br />
with cutting-edge research in language technologies,<br />
content management, workflow, community and human<br />
factors and software engineering: today the question is<br />
no longer whether or not to use machine translation but<br />
how best to. Today the question is no longer whether or<br />
not to use user-generated content in customer support,<br />
but how best to. Today the question is no longer whether<br />
or not to use collaborative community-based localisation<br />
models, but how best to. These step-changes are based<br />
on scientific progress. Over its first funding period <strong>CNGL</strong><br />
has produced more than 400 peer-reviewed research<br />
papers, 21 PhD graduates, 39 innovation and software<br />
disclosures and 9 patent applications, and has secured €15.8M<br />
in additional research income, growing the <strong>CNGL</strong> research<br />
eco-system.<br />
Key to the success of <strong>CNGL</strong> is close collaboration with<br />
the <strong>CNGL</strong> industry partners, focusing and sharpening the<br />
research. Without this, the step-change in localisation<br />
would not have been possible. Taking research out of the<br />
lab is a core objective of <strong>CNGL</strong>: to date, four <strong>CNGL</strong> start-up<br />
and spin-out companies (Xcelerator Machine<br />
Translations, Digital Linguistics, Scream Technologies<br />
and Emizar) and the not-for-profit social localisation<br />
organisation The Rosetta Foundation are strong testimony to this.<br />
Additionally, spinout candidate Wripl is preparing for<br />
launch in 2013.<br />
<strong>CNGL</strong> is preparing for the future: <strong>2012</strong> saw the successful<br />
<strong>CNGL</strong>II application coordinated and led by <strong>CNGL</strong> Deputy<br />
Director Prof. Vincent Wade secure core SFI funding<br />
of €10.5M for the next 30 months. <strong>CNGL</strong>II focuses on<br />
Global Intelligent Content based on the concept of the<br />
Global Content Value Chain, where services interact<br />
with content to make it self-describing, self-aware<br />
and self-adapting across language barriers, modalities<br />
and interaction platforms, tuned to context and user.<br />
Prof. Wade will take over as <strong>CNGL</strong> Director in March<br />
2013. Prof. Wade is an experienced and accomplished<br />
international research leader. Please give him all your<br />
support.<br />
To conclude, I would like to say to all our research<br />
students, postdoctoral researchers, principal<br />
investigators, technical, operations and education and<br />
outreach team staff, to all our industry partners, all our<br />
start-up companies and the researchers and staff in our<br />
extended <strong>CNGL</strong> research eco-system: thank you! You<br />
make this happen!<br />
Prof. Josef van Genabith<br />
Director, <strong>CNGL</strong>
CSET Leadership
CSET Contact Information<br />
<strong>CNGL</strong><br />
School of Computing<br />
Dublin City University<br />
Dublin 9<br />
Phone: +353 1 700 6700<br />
Fax: +353 1 700 6702<br />
Email: info@cngl.ie<br />
Management Team<br />
Director, Co-Leader: Integrated<br />
Language Technologies Track<br />
Prof. Josef van Genabith<br />
School of Computing<br />
Dublin City University<br />
Dublin 9<br />
Phone: +353 1 700 6700<br />
Fax: +353 1 700 6702<br />
Email: josef@computing.dcu.ie<br />
Deputy Director, Track Leader:<br />
Digital Content Management<br />
Prof. Vincent Wade<br />
School of Computer Science and Statistics<br />
Trinity College Dublin<br />
Dublin 2<br />
Phone: +353 1 896 1765<br />
Fax: +353 1 677 2204<br />
Email: vincent.wade@cs.tcd.ie<br />
Associate Director<br />
Dr. Páraic Sheridan<br />
School of Computing<br />
Dublin City University<br />
Dublin 9<br />
Phone: +353 1 700 6706<br />
Fax: +353 1 700 6702<br />
Email: psheridan@computing.dcu.ie<br />
Track Leaders<br />
Co-Track Leader:<br />
Integrated Language Technologies<br />
Prof. Nick Campbell<br />
Centre for Language and Communication Studies<br />
Trinity College Dublin<br />
Dublin 2<br />
Phone: +353 1 896 1626<br />
Fax: +353 1 896 2941<br />
Email: nick.campbell@tcd.ie<br />
Track Leader:<br />
Systems Framework<br />
Dr. Saturnino Luz<br />
School of Computer Science and Statistics<br />
Trinity College Dublin<br />
Dublin 2<br />
Phone: +353 1 896 3686<br />
Fax: +353 1 677 2204<br />
Email: luzs@cs.tcd.ie<br />
Track Leader:<br />
Next Generation Localisation<br />
Mr. Reinhard Schäler<br />
Department of Computer Science<br />
and Information Systems<br />
University of Limerick<br />
Limerick<br />
Phone: +353 61 202 881<br />
Fax: +353 61 202 734<br />
Email: reinhard.schaler@ul.ie<br />
Operations Team<br />
Commercial Development Manager<br />
Mr. Steve Gotz<br />
School of Computing<br />
Dublin City University<br />
Dublin 9<br />
Phone: +353 1 700 6710<br />
Fax: +353 1 700 6702<br />
Email: sgotz@computing.dcu.ie
LRC Administrator<br />
Ms. Geraldine Harrahill<br />
Department of Computer Science<br />
and Information Systems<br />
University of Limerick<br />
Limerick<br />
Phone: +353 61 202 881<br />
Fax: +353 61 202 734<br />
Email: geraldine.harrahill@ul.ie<br />
Financial Administrator<br />
Ms. Fiona Maguire<br />
School of Computing<br />
Dublin City University<br />
Dublin 9<br />
Phone: +353 1 700 6708<br />
Fax: +353 1 700 6702<br />
Email: fmaguire@computing.dcu.ie<br />
Centre Administrator<br />
Ms. Sophie Matabaro<br />
School of Computing<br />
Dublin City University<br />
Dublin 9<br />
Phone: +353 1 700 6707<br />
Fax: +353 1 700 6702<br />
Email: smatabaro@computing.dcu.ie<br />
Centre Secretary<br />
Ms. Eithne McCann<br />
School of Computing<br />
Dublin City University<br />
Dublin 9<br />
Phone: +353 1 700 6700<br />
Fax: +353 1 700 6702<br />
Email: emccann@computing.dcu.ie<br />
Project Manager<br />
Ms. Hilary McDonald<br />
School of Computer Science and Statistics<br />
O’Reilly Institute<br />
Trinity College Dublin<br />
Dublin 2<br />
Phone: +353 1 896 4244<br />
Fax: +353 1 677 2204<br />
Email: mcdonah@scss.tcd.ie<br />
Intellectual Property Manager<br />
Mr. Stephen Roantree<br />
School of Computing<br />
Dublin City University<br />
Dublin 9<br />
Phone: +353 1 700 6720<br />
Fax: +353 1 700 6702<br />
Email: sroantree@computing.dcu.ie<br />
Systems Administrator<br />
Mr. Joachim Wagner<br />
School of Computing<br />
Dublin City University<br />
Dublin 9<br />
Phone: +353 1 700 6915<br />
Fax: +353 1 700 6702<br />
Email: jwagner@computing.dcu.ie<br />
Education and Outreach Team<br />
Education and Outreach Manager<br />
Ms. Cara Greene<br />
School of Computing<br />
Dublin City University<br />
Dublin 9<br />
Phone: +353 1 700 6704<br />
Fax: +353 1 700 6702<br />
Email: cgreene@computing.dcu.ie<br />
Marketing and Communications Officer<br />
Ms. Laura Grehan<br />
School of Computing<br />
Dublin City University<br />
Dublin 9<br />
Phone: +353 1 700 6705<br />
Fax: +353 1 700 6702<br />
Email: lgrehan@computing.dcu.ie<br />
LRC Manager<br />
Mr. Karl Kelly<br />
Department of Computer Science<br />
and Information Systems<br />
University of Limerick<br />
Limerick<br />
Phone: +353 61 202 748<br />
Fax: +353 61 202 734<br />
Email: karl.kelly@ul.ie
Management Team Biosketches<br />
Centre Director and Co-Leader,<br />
Integrated Language Technologies Track:<br />
Prof. Josef van Genabith<br />
Department: School of Computing<br />
University: Dublin City University<br />
Brief Biography<br />
Prof. Josef van Genabith is the founder and Director<br />
of the Centre for Next Generation Localisation (<strong>CNGL</strong>)<br />
and an Associate Professor in the DCU School of Computing.<br />
He graduated in Electronic Engineering and English<br />
at RWTH Aachen (Germany) in 1988 and received his<br />
PhD in Linguistics from the University of Essex (U.K.)<br />
in 1993. He worked as a researcher at the University of<br />
Essex (1991–1992) and at the Institut für Maschinelle<br />
Sprachverarbeitung (IMS), Universität Stuttgart (Germany)<br />
(1992–1996). He joined the School of Computing at<br />
DCU as Lecturer in 1996, became Senior Lecturer in<br />
1999 and Associate Professor in 2002. He was Chair<br />
of the Programme Board for the B.Sc. in Applied<br />
Computational Linguistics (DCU) from 1997 to 2001. In 2001<br />
he became Director of the National Centre for Language<br />
Technology (NCLT) and developed the NCLT to its<br />
current 40+ members and research grant income of<br />
over €5M since 2001 (excluding <strong>CNGL</strong>). He has been<br />
leading Science Foundation Ireland (SFI), Enterprise<br />
Ireland (EI) and European Union (EU) funded research<br />
projects and was awarded an SFI Principal Investigator<br />
award in 2004. He became a Visiting Researcher at IBM’s<br />
Dublin Center for Advanced Studies (CAS) in 2003 and<br />
a Faculty Fellow in 2004. He has graduated 18 PhD<br />
and 6 M.Sc. by Research students. He is (joint) author<br />
of more than 150 peer-reviewed international research<br />
publications (including publications in the journals<br />
Computational Linguistics, Machine Translation, Artificial<br />
Intelligence, Research on Language and Computation,<br />
Natural Language Engineering and the ACL, EACL,<br />
COLING, EMNLP and IJCNLP conferences).<br />
Research Interests<br />
Prof. van Genabith works on localisation, machine<br />
translation, multilingual treebank-based deep grammar<br />
acquisition, and statistical parsing and generation.<br />
Career Highlights<br />
} <strong>2012</strong>: General Chair COLING 2014, Dublin, Ireland<br />
} <strong>2012</strong>: Recipient of the DCU <strong>2012</strong> President’s<br />
Research Award for Science and Engineering<br />
} 2010–present: META-NET (Multilingual Europe<br />
Technology Alliance EU Network of Excellence)<br />
Executive Board and Technology Council member<br />
} 2007–present: Advisory Board, European Association<br />
for Computational Linguistics (EACL)<br />
} 2007–present: Director and Lead-PI of SFI <strong>CNGL</strong><br />
CSET Award €16.8M<br />
} 2005–present: Faculty Fellow, IBM Center for<br />
Advanced Studies (CAS), Dublin<br />
} 2004–2005: Visiting Scientist, IBM Center for<br />
Advanced Studies (CAS), Dublin<br />
} 2004–2009: SFI Principal Investigator, Science<br />
Foundation Ireland, GramLab, €839K<br />
} 2001–2008: Director, National Centre for Language<br />
Technology (NCLT), DCU<br />
} 1997–2001: Chair of Programme Board, B.Sc. in<br />
Applied Computational Linguistics (ACL), DCU<br />
Deputy Centre Director, Track Leader:<br />
Digital Content Management:<br />
Prof. Vincent P. Wade<br />
Department: Discipline of Intelligent Systems,<br />
School of Computer Science and Statistics<br />
University: Trinity College Dublin<br />
Brief Biography<br />
Prof. Vincent Wade is Deputy Director of the Centre<br />
for Next Generation Localisation (<strong>CNGL</strong>) and Head of<br />
the Discipline of Intelligent Systems at the School of<br />
Computer Science and Statistics, Trinity College Dublin.<br />
The Discipline of Intelligent Systems comprises four<br />
research groups: the Knowledge and Data Engineering<br />
Group, the Computational Linguistics Group, the<br />
Graphics Vision and Visualisation Group, and the<br />
Artificial Intelligence Group. The Discipline comprises<br />
21 academics and more than 150 full-time postgraduate<br />
(PhD) students and research fellows.<br />
Prof. Wade graduated from UCD with a B.Sc. (Hons)<br />
in Computer Science (1987) and received his M.Sc.<br />
and PhD postgraduate degrees in Computer Science<br />
from TCD. He holds the position of Associate Professor<br />
in the School of Computer Science and Statistics and<br />
in 2002 was awarded Fellowship of Trinity College for<br />
his contribution to research in the areas of knowledge<br />
management and adaptive technologies. In 1999 he<br />
founded the Centre for Learning Technology, which<br />
has pioneered the innovation and development of<br />
eLearning technologies in the University. He was also<br />
awarded the position of Visiting Scientist in the Center<br />
for Advanced Studies at IBM for his research in adaptive<br />
hypermedia and knowledge management (2005-2008).<br />
He was Research Director of the Knowledge and Data<br />
Engineering Research Group (1995-2007).<br />
Prof. Wade is author of over 150 scientific papers<br />
in peer-reviewed research journals and international<br />
conferences and has received eight ‘best paper’ awards<br />
for publications in IEEE, IFIP and AACE Conferences<br />
within the last nine years. He has been guest editor of<br />
IEEE Communications as well as a reviewer for many<br />
IEEE and ACM journals including IEEE Communications,<br />
IEEE Network, IEEE Intelligent Systems, ACM Transactions<br />
on the Web, and IEEE Transactions on Learning<br />
Technologies. Prof. Wade serves on the programme committees<br />
of many prestigious international conferences<br />
including IEEE’s IM and NOMS, ACM Hypertext and<br />
WWW Conference series. He was co-chair of the<br />
Adaptive Hypermedia Conference (AH2006) that<br />
was held in Dublin in June 2006, and General Co-chair<br />
for IEEE IM 2011, which was held at TCD in May<br />
2011. He has been responsible for fourteen major EU<br />
research projects under the EU ACTS and IST Research<br />
Programmes as well as national research projects<br />
funded under the SFI PI Programme, HEA PRTLI and<br />
several Science Foundation Ireland/Enterprise Ireland<br />
Technology Innovation Development Awards. He has<br />
been responsible for the commercialisation of research<br />
and is a co-founder of ‘Empower The User’, an innovative<br />
start-up company in the area of personalisation and soft<br />
skills training.<br />
Research Interests<br />
Prof. Wade’s research focuses on Knowledge<br />
Engineering, in particular adaptive web systems,<br />
dynamic personalisation, adaptive management and<br />
control systems, and process management. His research<br />
has been applied in several technology application<br />
areas including eLearning and Management Systems<br />
for next generation networks and distributed services.<br />
Since 1991, he has been TCD’s Principal Investigator for<br />
over fifteen EU research projects under the EU RACE,<br />
Telematics, ESPRIT, ACTS, and IST research programmes.<br />
He was also PI for ADAPT (2005–2007) and Pudecas<br />
(2005–2007), funded under the Technology Innovation<br />
Research Programme (Enterprise Ireland) and PI for the<br />
HEA-sponsored MZONES project (2002–2006).
Associate Director:<br />
Dr. Páraic Sheridan<br />
Department: School of Computing<br />
University: Dublin City University<br />
Brief Biography<br />
Dr. Páraic Sheridan is Associate Director at <strong>CNGL</strong>.<br />
He received his B.Sc. degree (1st class honours) in<br />
Computer Applications from Dublin City University<br />
(DCU) in 1989. He then completed an M.Sc. degree<br />
by research in Computer Applications at DCU in 1991,<br />
studying the use of Natural Language Processing in<br />
Information Retrieval. This was followed in 1994 by an<br />
M.S. degree in Computational Linguistics at Carnegie<br />
Mellon University (CMU) in Pittsburgh, PA. His study<br />
at CMU was funded by Claris Corporation (Dublin) for<br />
whom he researched the use of Translation Memories<br />
in the software localisation process. He completed his<br />
doctoral work in 1998 at the Swiss Federal Institute of<br />
Technology (ETH) Zürich with a dissertation on the topic<br />
of Cross-Language Information Retrieval. While at ETH<br />
he also helped develop the SPIDER information retrieval<br />
system which was commercialised and spun out from<br />
ETH into the EuroSpider company.<br />
Dr. Sheridan then joined TextWise LLC, a start-up<br />
company in Syracuse, NY, which was a spin-out from<br />
Syracuse University based on research by Prof. Elizabeth<br />
Liddy in the area of Natural Language Processing and<br />
Information Retrieval. Over the course of a 10-year career<br />
at TextWise, Dr. Sheridan held a variety of positions in<br />
research management, programme management and<br />
product management, ultimately achieving the position<br />
of Chief Scientist at the company. This reflected his work<br />
on the CINDOR cross-language search system, initially<br />
as a government-funded research project which was<br />
then commercialised and marketed by TextWise in the<br />
enterprise search space. Dr. Sheridan also led the effort<br />
in adapting the CINDOR product to the needs of the<br />
U.S. Intelligence Community, developing a cross-language<br />
English-Arabic query translation module to<br />
integrate with standard enterprise search platforms.
Co-Track Leader: Integrated Language Technologies:<br />
Prof. Nick Campbell<br />
Department: Centre for Language<br />
and Communication Studies (CLCS)<br />
University: Trinity College Dublin<br />
Brief Biography<br />
Prof. Nick Campbell is SFI Stokes Professor of Speech<br />
& Communication Technology at Trinity College Dublin.<br />
He received his Ph.D. degree in Experimental Psychology<br />
from the University of Sussex in the U.K., and was<br />
previously engaged at the Japanese National Institute<br />
of Information and Communications Technology, and<br />
as Chief Researcher in the Department of Acoustics<br />
and Speech Research, Advanced Telecommunications<br />
Research Institute International (ATR), Kyoto, Japan, where<br />
he also served as Research Director for the JST/CREST<br />
Expressive Speech Processing and the SCOPE “Robot’s<br />
Ears” projects. He was first invited as a Research Fellow<br />
at the IBM U.K. Scientific Centre, where he developed<br />
algorithms for speech synthesis, and later at the AT&T<br />
Bell Laboratories, where he worked on the synthesis of<br />
Japanese. He served as Senior Linguist at the Edinburgh<br />
University Centre for Speech Technology Research before<br />
joining ATR in 1990. His research interests are based on<br />
large speech databases, and include nonverbal speech<br />
processing, concatenative speech synthesis, and prosodic<br />
information modelling. He spends his spare time working<br />
with postgraduate students as Visiting Professor at the<br />
School of Information Science, Nara Institute of Science<br />
and Technology (NAIST), Nara, Japan, and was also<br />
Visiting Professor at Kobe University, Kobe, Japan for<br />
10 years.<br />
Research Interests<br />
Prof. Nick Campbell’s background is in experimental<br />
psychology and linguistics, but most of his experience<br />
is in speech technology. Prof. Campbell is an advocate<br />
of corpus-based approaches and he has pioneered<br />
advanced (and paradigm-shifting) methods of speech<br />
synthesis and natural conversational speech collection<br />
in a multimodal environment. His principal interest is<br />
in speech prosody, extending this research to social<br />
interaction to show how the voice is used in discourse<br />
to express personal relations as well as propositional<br />
content. Most of his previous work has used speech<br />
materials collected in Japan and, through his move to<br />
Ireland, he can confirm the universality of his previous<br />
findings – both for Irish and for Hiberno-English.<br />
Ultimately, Prof. Campbell is working to produce a<br />
friendlier speech-based human-machine interface for<br />
web-based information, customer services, games,<br />
and robotics, while trying to understand how humans<br />
achieve such often-perfect communication.<br />
Career Highlights<br />
} 2010-2015: Science Foundation Ireland Principal<br />
Investigator, FastNet Summary Focus on Actions in<br />
Social Talk; Network Enabling Technology (€1.23M)<br />
} Oct. 2011 – Present: Member, Spoken Language<br />
Technical Committee, IEEE Signal Processing Society<br />
} Feb. 2011: Vice President, European Language<br />
Resources Association<br />
} Nov. 2010 – Present: Board Member, European<br />
Language Resources Association (ELRA)<br />
} 2009 – Present: Board member, International<br />
Speech Communication Association<br />
} 2005 – Present: Board member, Japan British<br />
Association of the Kansai<br />
} Member, International Phonetic Association<br />
} Member, Coordinating Committee on Speech<br />
I/O Database Assessment<br />
} Member, International Committee of Acoustic<br />
Society of Japan<br />
} Member, International Speech Communication<br />
Association Institute of Acoustics (adherent) U.K.<br />
} Member, Acoustic Society of America<br />
} Member, Acoustic Society of Japan<br />
} Member, IEEE Signal Processing Society
Track Leader: Next Generation Localisation:<br />
Mr. Reinhard Schäler<br />
Department: Department of Computer<br />
Science and Information Systems<br />
University: University of Limerick<br />
Brief Biography<br />
Reinhard Schäler has been involved in the localisation<br />
industry in a variety of roles since 1987. He is the founder<br />
and editor of Localisation Focus – The International<br />
Journal of Localisation, a founding editor of the Journal<br />
of Specialised Translation (JosTrans), a former member of<br />
the editorial board of Multilingual Computing (October<br />
1997 to January 2007, covering 70 issues), a founder<br />
and CEO of The Institute of Localisation Professionals<br />
(TILP), and a member of OASIS. He has attracted more<br />
than €5.5M in research funding and has published more<br />
than 50 articles, book chapters and conference papers<br />
on language technologies and localisation. He has been<br />
an invited speaker at EU and international government-organised<br />
conferences in Africa, the Middle East, South<br />
America and Asia. In 2009, he founded The Rosetta<br />
Foundation, a non-profit organisation and charity aiming<br />
to make knowledge available in every language. He is<br />
a lecturer at the Department of Computer Science and<br />
Information Systems (CSIS), University of Limerick (UL),<br />
and the founder and director of the Localisation Research<br />
Centre (LRC) at UL, established in 1995.<br />
Research Interests<br />
Schäler’s main research area is the automation of<br />
localisation workflows and the application of tools<br />
and technologies to the localisation of digital content,<br />
including translation, engineering and testing. He has<br />
been researching approaches to Machine Translation<br />
(MT) and Computer Assisted Translation (CAT) systems<br />
since 1990 and has investigated different approaches<br />
to Example-Based Machine Translation (EBMT), which<br />
contributed to the work now carried out by The Rosetta<br />
Foundation, using translation tools and technologies<br />
for the provision of translation and localisation services,<br />
supported by volunteer translators, project managers and<br />
engineers.<br />
Career Highlights<br />
} Establishment of the Localisation Research Centre<br />
(LRC), 1995, £250K.<br />
} Establishment of the Grad. Dip./M.Sc. in Software<br />
Localisation at University of Limerick in 1997.<br />
} EU-funded IGNITE project on Linguistic Infrastructure<br />
for Localisation: Language Data, Tools and Standards,<br />
together with four European industrial partners, total<br />
budget: €3.5M, 2005-2007.<br />
} Invited keynotes: Localisation and<br />
Internationalisation of Software for Export,<br />
Florianópolis, Brazil (November 2004);<br />
Manufacturers’ Association for Information<br />
Technology (MAIT), New Delhi, India (December<br />
2004); The First International Conference on Persian<br />
Script & Language Localisation, Supreme Council of<br />
ICT and Iran Telecom Research Centre, Tehran, Iran<br />
(May 2005); The IEEE Professional Communication<br />
Society, International Professional Communication<br />
Conference, Limerick, Ireland (July 2005); LISA<br />
Forum Cairo, The Localisation Industry Standards<br />
Association, Cairo, Egypt (December 2005);<br />
Multilingual Web, Madrid, Spain (October 2010).<br />
} Establishment of The Rosetta Foundation in the<br />
summer of 2009, a not-for-profit organisation<br />
(charity) promoting equality via language and<br />
cultural diversity through access to digital knowledge<br />
and information independent of language.<br />
} Establishment of the Dynamic Coalition for a Global<br />
Localisation Platform: Localisation4all, under the<br />
umbrella of the United Nations Internet Governance<br />
Forum (IGF) in 2009.
Track Leader: Systems Framework:<br />
Dr. Saturnino Luz<br />
Department: School of Computer Science and Statistics<br />
University: Trinity College Dublin<br />
Brief Biography<br />
Dr. Saturnino Luz has worked on the development of<br />
novel technologies for human-computer interfaces in the<br />
areas of computer-supported cooperative work, spoken<br />
language systems, natural language processing, dialogue<br />
management, and design support tools for multimodal<br />
systems. He has been a Lecturer in Computer Science<br />
at Trinity College since 2001, where he supervises PhD<br />
and M.Sc. students in the areas of natural language<br />
processing, computer-supported cooperative work,<br />
human-computer interaction and machine learning.<br />
Dr. Luz has participated in a number of Irish- and<br />
EU-funded research projects, working on computing<br />
support for connected communities, dialogue systems<br />
engineering, technology for medical team meetings, as<br />
well as various topics in machine learning. He has served<br />
on the programme committees of several international<br />
conferences and the editorial boards of international<br />
journals. He has been a member of the Association for<br />
Computing Machinery (ACM) since 1994 and contributes<br />
regularly to ACM Computing Reviews.<br />
Research Interests<br />
Dr. Luz’s research focuses on the theoretical bases of<br />
computer-supported collaboration, more specifically<br />
processes related to information structuring and<br />
retrieval, in scenarios encompassing multimedia data<br />
and multimodal interaction. He is also interested in<br />
natural language parsing, text classification, and dialogue<br />
systems, particularly human-factors research on dialogue systems.<br />
Career Highlights<br />
} Acted as Principal Investigator ECOMMET<br />
project on Enhanced Computing Support for<br />
Multidisciplinary Medical Team Meetings, funded<br />
by Science Foundation Ireland.<br />
} Principal Investigator of a Basic Research project<br />
on content indexing for multimedia meeting<br />
recordings, funded by Enterprise Ireland.<br />
} Review selected as a Computing Review highlight;<br />
featured as profiled reviewer in acknowledgement<br />
of his contributions to that publication (2004).<br />
} Invited talks at the University of Ulster (2002),<br />
at the German Research Centre for Artificial<br />
Intelligence (2003), at the University of South Africa<br />
(2004), at the Seminar on New Trends in Corpus<br />
Linguistics for Language Teaching and Translation<br />
Studies (Granada, Spain, 2008), and at KTH<br />
(Stockholm, Sweden, 2010).<br />
} Chaired the programme committee of the Irish<br />
Human-Computer Interaction Conference (2009)<br />
and co-chaired the Special Track on Supporting<br />
Collaboration among Healthcare Workers at the<br />
IEEE International Symposium on Computer-Based<br />
Medical Systems (2008-2010).<br />
} Served as member of the Editorial Board of<br />
Information from 2000 to 2003.
Education and Outreach Manager:<br />
Ms. Cara Greene<br />
Department: School of Computing<br />
University: Dublin City University<br />
Research Interests<br />
Greene has a B.Sc. in Applied Computational Linguistics
from Dublin City University. She then became a learning
support and resource teacher before returning to DCU
to undertake a PhD in Information and Communication
Technology (ICT). She is currently writing up her PhD
thesis part-time on integrating ICT into the secondary
school curriculum. Greene’s PhD thesis investigates
whether integrating ICT into the curriculum can produce
inclusive curricula that cater to the needs of all students
(with and without learning difficulties). After completing
her PhD, Greene would like to research the impact of
education programmes provided by large research centres
on the number of students taking up related subjects at
third level.<br />
Brief Biography<br />
Cara Greene is Education and Outreach (E&O) Manager<br />
in the Centre for Next Generation Localisation (<strong>CNGL</strong>).<br />
The Education and Outreach Programme is split into<br />
two areas: Education and Outreach. The Education<br />
Programme aims to provide educational and training<br />
opportunities at all levels of education in key areas in the<br />
localisation industry. These range from primary school<br />
courses to professional localisation training courses. It
also provides professional development and research<br />
support to <strong>CNGL</strong> students and staff as well as others<br />
in the localisation industry. The Outreach Programme<br />
encompasses developing public-facing projects, hosting<br />
conferences and industry events, and promoting <strong>CNGL</strong><br />
research in the media.<br />
Career Highlights<br />
} Nominated for the DCU President’s Award for Civic<br />
Engagement 2010.<br />
} Member of the Third Level Education and Outreach<br />
(TREO) Communications and Evaluation working<br />
groups.<br />
} Research paper selected to be presented at the<br />
Young Researchers Consortium at ICCHP 2006.<br />
} Awarded the DCU Chancellor’s Medal at Graduation<br />
2002.
<strong>CNGL</strong> Overview<br />
Localisation: Global Challenges and Opportunities<br />
Localisation is the industrial process of adapting (digital)
content to culture, locale and linguistic environment,
at high quality, speed and low cost. It is a key enabling
multiplier technology of the global manufacturing,
software, services and content distribution industries,
unlocking markets otherwise unavailable. Importantly,
the true potential of localisation goes well beyond
opening up business opportunities across the globe:
many communities find themselves on the wrong side
of the “digital divide”, with vital information (hygiene,
health, food, education etc.) not available in the local
languages, with potentially disastrous consequences.
Localisation technologies and processes can make a
considerable contribution to bridging this divide. The
<strong>CNGL</strong> partnership and research focus on both the
commercial and the societal dimensions of localisation.<br />
The Centre for Next Generation Localisation (<strong>CNGL</strong>,
2007-<strong>2012</strong>) is an Industry-Academia partnership funded
jointly by Science Foundation Ireland (SFI) and industry
partners. The university partners are DCU (Dublin City
University, lead institution), TCD (Trinity College Dublin),
UCD (University College Dublin) and UL (University of
Limerick). Industry partners include Microsoft Ireland,
Symantec Ireland, Dai Nippon Printing (Japan), SDL,
Translations.com (Alchemy), CAPITA (Applied Language
Solutions), Welocalize, VistaTEC and SpeechStorm,
bringing some of the world’s leading software,
publishing and localisation companies into the <strong>CNGL</strong>
partnership.<br />
The <strong>CNGL</strong> vision is to enable people to interact with
content, products, services and each other, in their own
language, culture, context and according to their own
personal needs.<br />
To realise this vision, the <strong>CNGL</strong> research programme
concentrates on the challenges of Volume, Access
and Personalisation. Volume: the amount of content
is growing dramatically and massively outstrips human
translation capacity. Access: while traditional localisation
is text, print and (full) screen/keyboard based, mobile
devices enable ubiquitous access to information on
the go, involving additional interaction modalities such
as speech and image, and both corporate and user-generated
content. Personalisation: while traditional
localisation is coarse-grained (focusing on a geographic
locale and language: e.g. the Middle East), information
is most useful if adapted to the user, device, background
information/knowledge and task at hand. In terms of a
slogan, “the person is the ultimate locale”.<br />
The three axes Volume, Access and Personalisation
define the “Localisation Cube” (Figure 1). The <strong>CNGL</strong>
mission (derived from its vision) is to develop processes
and technologies that can address each point in the cube
at configurable quality and speed.<br />
Figure 1. The Localisation Cube (and traditional
Enterprise Localisation technologies)<br />
Traditional enterprise localisation technologies tend to
focus on large and well-managed localisation workflows,
with predictable corporate content, targeting the lower,
front, right-most part of the localisation cube (Figure
1) and leaving large parts of the Localisation Cube
unaddressed.<br />
Next Generation Localisation, by contrast, is based on a
set of flexible and adaptive technologies and processes
that allow us to address each point in the Localisation
Cube, at configurable quality and speed. The <strong>CNGL</strong>
research programme concentrates on three focal points
in the Cube (Figure 2), described below.¹<br />
¹ Note that volume here refers to a single localisation request: while
traditional bulk or enterprise localisation projects may involve the
translation of millions of words into many languages, a single customer
care interaction may only involve a few hundred words and one or two
languages. However, the total effect is, of course, cumulative: millions of
customer care interactions will generate very large total volumes.
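To make the three axes concrete, the Python sketch below models a localisation request as a point in a discretised Localisation Cube. The three-level scale (`Level`), the class and function names, and the rule marking the "traditional enterprise" corner are illustrative assumptions and are not taken from any CNGL software. Following the footnote, `volume` measures a single request rather than cumulative volume.

```python
from dataclasses import dataclass
from enum import IntEnum
from itertools import product


class Level(IntEnum):
    """Coarse, illustrative positions along one axis of the cube (an assumption)."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2


@dataclass(frozen=True)
class CubePoint:
    """One localisation request placed on the three axes of the Localisation Cube."""
    volume: Level           # size of a single request, not cumulative volume
    access: Level           # LOW: text/keyboard only; HIGH: mobile, speech, image
    personalisation: Level  # LOW: locale-level; HIGH: adapted to user, device, task

    def is_traditional_enterprise(self) -> bool:
        # Hypothetical rule for the corner served by traditional enterprise
        # localisation: large, text-based, coarse-grained (locale-level) requests.
        return (
            self.volume == Level.HIGH
            and self.access == Level.LOW
            and self.personalisation == Level.LOW
        )


def all_points() -> list[CubePoint]:
    """Enumerate every cell of a 3 x 3 x 3 discretised cube."""
    return [CubePoint(v, a, p) for v, a, p in product(Level, repeat=3)]


if __name__ == "__main__":
    manual = CubePoint(Level.HIGH, Level.LOW, Level.LOW)         # bulk product documentation
    support_chat = CubePoint(Level.LOW, Level.HIGH, Level.HIGH)  # a single customer care interaction
    print(manual.is_traditional_enterprise())        # True
    print(support_chat.is_traditional_enterprise())  # False
    uncovered = [p for p in all_points() if not p.is_traditional_enterprise()]
    print(len(uncovered))                            # 26
```

Even on this coarse 3 × 3 × 3 grid, 26 of the 27 cells fall outside the corner targeted by traditional enterprise tools. This is one way to read the observation that large parts of the cube remain unaddressed.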
Figure 2: <strong>CNGL</strong> Focal Points in the Localisation Cube<br />
The Bulk Localisation Workflow (BLW) scenario targets
large-volume localisation tasks, with and without human
pre- and post-editing, familiar from large localisation
projects. The focus is on both corporate and NGO
content, automation (translation technologies, in
particular machine translation), and the optimal integration
of novel social and collaborative localisation models
(including crowd-sourcing), supported by open standards
and a flexible, open and web-services-based localisation
platform that supports a wide range of workflows
(standard corporate as well as novel
collaborative workflows).<br />
The Personalised Multilingual Customer Care (PMCC)
scenario focuses on supporting global customers
interacting with online and perishable corporate and
user-generated multilingual content (e.g. product blogs),
providing for frequent content updates, multi-modal
access (speech and image, in addition to the more
traditional text-based modalities) and increased levels of
personalisation in real-time interactions, without (or with
minimal) human pre- and post-processing interventions.<br />
The Personalised Multilingual Social Networking (PMSN)
scenario focuses fully on user-generated (UGC, in
contrast to corporate) and highly perishable content
prevalent on social networking and messaging sites, with
high levels of personalisation and full use of all access
modalities, developing <strong>CNGL</strong> technologies to monitor
and manage information for customer support and to link
social networking activities across linguistic barriers.<br />
Supporting the Global Customer and Promoting the Multilingual Society<br />
This conceptualisation (the Localisation Cube), the
factoring of challenges and opportunities into three
dimensions (Volume, Access and Personalisation) and
the implementation of the Next Generation Localisation
technologies in terms of the flexible and adaptive <strong>CNGL</strong>
Components Framework (rather than a single monolithic
one-size-fits-all system) have served us well: they have guided
the <strong>CNGL</strong> research programme and have allowed us to
anticipate and respond flexibly to many of the recent
challenges and opportunities in the localisation space,
including the:<br />
} massive increase in multilingual user-generated content (UGC)
(in addition to professionally edited corporate
content) from user forums and social networking sites<br />
} growing importance of UGC in localisation and
community-based customer support models<br />
} emergence and impact of novel social and community-based
localisation in both for-profit and not-for-profit
localisation operations<br />
} increasing number of non-governmental organisations
(NGOs) world-wide targeting the global “digital
divide”, striving to provide access to information in the
local language as a basic human right<br />
} increasing number of SMEs (rather than just
Multinationals) targeting global markets with
localisation needs markedly different from those
of the Multinationals<br />
In particular, in <strong>CNGL</strong> project Year 5 (<strong>2012</strong>) we focus on
two related themes, representing the key commercial
and social dimensions of <strong>CNGL</strong> research: (i) Supporting
the Global Customer and (ii) Promoting the Multilingual
Society.<br />
Figure 3: Organisation of the <strong>CNGL</strong> Research Programme<br />
Addressing the Challenges and Making the<br />
Most of the Opportunities: Charting the<br />
<strong>CNGL</strong> Research Map<br />
The <strong>CNGL</strong> mission is to develop flexible and adaptive<br />
next-generation localisation technologies and processes<br />
that allow us to address any point in the space defined<br />
by the Localisation Cube (Figure 1), at configurable<br />
quality and speed, realising the <strong>CNGL</strong> vision to enable<br />
people to interact with content, products, services and<br />
other people in their own language, according to their<br />
own culture, and according to their own personal needs.<br />
This mission directly determines the structure of the core<br />
<strong>CNGL</strong> research programme (Figure 3):<br />
The <strong>CNGL</strong> research programme intertwines four major
research tracks (as well as a demonstrator programme):
two of the tracks, Integrated Language Technologies (ILT)
and Digital Content Management (DCM), are basic research
tracks, and the remaining two, Next Generation
Localisation (LOC) and Systems Framework (SF), are
more applied, integrating research tracks.<br />
LOC: technological advances from ILT and DCM
need to be integrated into workflows and blueprints
of Next Generation Localisation. LOC researches the
life-cycle of digital content, including content design,
development and standards; evaluates sophisticated
language and content management technologies for
integration into novel collaborative, community-driven
and social localisation models; and provides technology
support for such models in terms of an open, modular,
component- and web-services-based architecture, based
on the SOLAS technology platform.<br />
SF: SF research focuses on underexplored software<br />
engineering aspects of complex multilingual digital<br />
content management, including requirements analysis,<br />
user interface design, the development of WebWOZ,<br />
a web-based Wizard-of-Oz technology platform, rapid<br />
prototyping systems, semantic interoperability, adaptive<br />
workflows, and web-based service architectures. SF
coordinates the development of the evolving series of <strong>CNGL</strong>
demonstrator systems.<br />
ILT: ILT research focuses on Machine Translation (MT),<br />
Speech Technology and Text Analytics to provide the<br />
support technologies for translation and interaction<br />
automation across language and modality (text and<br />
speech) barriers, based on the MaTrEX MT and MUSE<br />
Speech Technology platforms.<br />
DCM: DCM research focuses on combining Adaptive<br />
Hypermedia (AH) with Cross-Lingual and Multimodal<br />
(Text, Image and Speech) Information Retrieval (IR)<br />
technologies to find, slice, dice and recompose
content to support the <strong>CNGL</strong> information access and<br />
personalisation agenda in a multilingual setting, based<br />
on the Adaptive Engine technology platform.<br />
<strong>CNGL</strong> Demonstrator Systems<br />
Demonstrator systems are a core part of <strong>CNGL</strong> research.<br />
The demonstrators provide focal points for project<br />
cohesion and collaboration, combining technologies<br />
and teams from across <strong>CNGL</strong> research tracks and<br />
academic and industry partners. The demonstrators are<br />
an essential component in overall project evaluation and
contribute platforms for research and experimentation
across the whole of <strong>CNGL</strong>. They showcase <strong>CNGL</strong> technologies to
the outside world and ground <strong>CNGL</strong> research outputs<br />
in commercial as well as non-profit societal applications.
During <strong>CNGL</strong> project Years 1 to 3, the demonstrator<br />
systems focused on the three core use scenarios in<br />
the space defined by the Localisation Cube (Figure<br />
2): the Bulk Localisation Workflow (BLW) scenario,<br />
the Personalised Multilingual Customer Care (PMCC)<br />
scenario and the Personalised Multilingual Social<br />
Networking (PMSN) scenario. Based on this work, during<br />
project Year 4 (2011), the demonstrators showcased a<br />
broad industry story line around the “Supporting the<br />
Global Customer” theme, while for <strong>CNGL</strong> project Year 5<br />
(<strong>2012</strong>) the focus was on advancing and showcasing those<br />
demonstrators with the most commercial and industry<br />
impact and those demonstrators showing promising<br />
research directions for the future.<br />
<strong>CNGL</strong> Outreach and Technology-Transfer<br />
Activities<br />
Technology transfer is a key <strong>CNGL</strong> objective to convert<br />
research outputs into economic and social impact.<br />
<strong>CNGL</strong> carefully manages IP in close collaboration with<br />
the researchers and industry partners and fosters an<br />
entrepreneurial spirit within the <strong>CNGL</strong> researcher<br />
community.<br />
Fostering interest in science and technology, in particular<br />
information technology, in education (within and outside<br />
<strong>CNGL</strong>) and the public in general is a further key objective<br />
for <strong>CNGL</strong>. We offer a wide range of activities, including<br />
projects for first, second, third and fourth level education;<br />
professional development and communication within<br />
<strong>CNGL</strong>; and communication and dissemination in relevant<br />
professional research and industry sectors as well as the<br />
public in general.<br />
Changes and Developments in the <strong>CNGL</strong><br />
Consortium<br />
<strong>CNGL</strong> operates in a dynamic and fast-changing<br />
environment, both in our research and business sectors,<br />
in particular in the localisation space: <strong>2012</strong> saw a strongly<br />
increased focus on commercialisation of <strong>CNGL</strong> research<br />
expertise, in particular in the form of the growth and<br />
traction of <strong>CNGL</strong> spin-out and start-up companies and<br />
not-for-profit organisations:<br />
ILT technologies underpin three start-up companies:<br />
} Xcelerator Machine Translations, through its<br />
KantanMT product (www.kantanmt.com), operates<br />
in the space of cloud-based and scalable provision of<br />
personalised and adaptive MT services that are easy<br />
to configure, manage and operate<br />
} Scream Technologies (www.screamtechnologies.com)<br />
specialises in creating synthetic voices from human<br />
actors, enabling the end user to create human-sounding
synthetic speech and control how it sounds.<br />
Scream’s product enables enterprise customers to<br />
find a voice that represents them, and then to use<br />
that voice for all announcements, interactive voice<br />
response, telephone, or advertising without ever<br />
needing to return to a recording studio<br />
} Digital Linguistics (www.digitallinguistics.com) uses<br />
machine learning based text classification technologies<br />
for quality assurance (QA) for localisation projects<br />
DCM technologies underpin two start-up companies:<br />
} Emizar (www.emizar.com) focuses on customer care<br />
applications based on adaptive and personalised<br />
dicing, slicing and recomposing digital content<br />
} Wripl (www.wripl.com) offers Personalisation-as-a-<br />
Service across websites, improving a user’s experience<br />
as they browse across multiple different CMS systems<br />
to solve a particular task. Wripl is currently in spin-out
preparation mode.<br />
LOC technologies underpin:<br />
} The Rosetta Foundation (www.therosettafoundation.org),
a not-for-profit organisation that provides
localisation services to NGOs and social causes based<br />
on novel, community-based localisation models (to<br />
date involving 2,600+ volunteers), supported by the<br />
<strong>CNGL</strong> SOLAS technology platform.
These developments (including the 2,600 volunteers<br />
engaging with The Rosetta Foundation and the €1.25M<br />
in venture capital raised by Xcelerator and Scream<br />
Technologies) clearly show the social and economic<br />
relevance of the <strong>CNGL</strong> research programme.<br />
Commercialisation activities are strongly underpinned<br />
by SFI and Enterprise Ireland funded activities<br />
(including 6 Technology Innovation Development<br />
Award grants, 5 Enterprise Ireland Feasibility Awards,<br />
3 Commercialisation Fund Awards, and 2 Innovation<br />
Partnerships) taking research all the way from the labs<br />
into the market.<br />
On the academic and research side, <strong>CNGL</strong> extends a<br />
strong welcome to Prof. Qun Liu, formerly Director of<br />
the Natural Language Processing Lab of the Chinese<br />
Academy of Sciences in Beijing, as the new Professor<br />
of Machine Translation in DCU. Prof. Liu’s expertise<br />
in cutting-edge machine translation and language<br />
technology research and his international standing in<br />
the research community make a key contribution to<br />
<strong>CNGL</strong> and substantially strengthen <strong>CNGL</strong>’s expertise in<br />
multilingual technologies.<br />
Research Highlights <strong>2012</strong><br />
Space permits only a brief preview of a few selected
highlights below; for full details, please consult the
subsequent sections of the <strong>2012</strong> <strong>CNGL</strong> <strong>Annual</strong> <strong>Report</strong>.<br />
Research Outputs <strong>2012</strong><br />
Research performance and output in <strong>2012</strong> have been
strong: <strong>CNGL</strong> has again substantially outperformed its
research KPI targets (Table 1) with 92 conference and
26 journal, book and book chapter publications, a total
of 118 against a combined target of 62 for the reporting
period. Since 2007, <strong>CNGL</strong> has published a total of 411<br />
research publications, against a target of 291 (Table 2),<br />
outperforming overall targets by a factor of 1.5.<br />
Table 1: <strong>CNGL</strong> <strong>2012</strong> Research KPIs against Targets<br />
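<table>
<tr><th><strong>CNGL</strong> Research Outputs <strong>2012</strong></th><th>Actuals</th><th>Targets</th></tr>
<tr><td>Journal papers, book chapters and books</td><td>26</td><td>12</td></tr>
<tr><td>Conference publications</td><td>92</td><td>50</td></tr>
<tr><td>Conferences/workshops hosted</td><td>17</td><td>8</td></tr>
</table>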
Table 2: <strong>CNGL</strong> 2007–<strong>2012</strong> Cumulative Research KPIs against Targets<br />
<table>
<tr><th><strong>CNGL</strong> Research Outputs 2007–<strong>2012</strong></th><th>Actuals</th><th>Targets</th></tr>
<tr><td>Journal papers, book chapters and books</td><td>63</td><td>43</td></tr>
<tr><td>Conference publications</td><td>348</td><td>237</td></tr>
<tr><td>Conferences/workshops hosted</td><td>58</td><td>39</td></tr>
</table>
ILT: highlights include best paper awards (Vogel and
Mamani Sánchez, <strong>2012</strong>; Emms and Franco-Penya,
<strong>2012</strong>); winning the SANCL-<strong>2012</strong> Web Parsing challenge
organised by Google at NAACL-HLT <strong>2012</strong> (Le Roux,
Foster, Wagner, Kaljahi and Bryl, <strong>2012</strong>); strong speech
technology publications, with 6 journal papers, 2 book
chapters and 5 conference papers at ICASSP and
Interspeech <strong>2012</strong>; a strong <strong>CNGL</strong> presence at
COLING <strong>2012</strong> in Mumbai, India, with a total of 15 full, short
and workshop MT and Text Analytics papers; and the
award of the hosting of COLING 2014 in Dublin to <strong>CNGL</strong> partner
DCU, with <strong>CNGL</strong> Director Prof. Josef van Genabith as
General Chair. ILT researchers have worked in close
cooperation with <strong>CNGL</strong> industry partners and start-up
companies: with VistaTEC and Digital Linguistics on text
classification for MT quality assessment, with Symantec
on tuning MT to User-Generated Content, and with
Xcelerator and Welocalize on integrating MT and TM
technologies. ILT researchers are involved in 2 new EU
FP7 MT projects (QTLaunchPad and the EXPERT Marie
Curie PhD Graduate School) and, through Dr. Antonio Toral,
lead the Abu-MaTran FP7 Academia-Industry partnership
project.
DCM: highlights include the publication of over 30<br />
peer-reviewed papers in international journals (e.g. ACM<br />
CSUR, UMUAI and Journal IR) and at major international<br />
conferences (ACM Hypertext, SIGIR <strong>2012</strong>, CIKM <strong>2012</strong>,<br />
COLING 2013, AAAI <strong>2012</strong>, ACL <strong>2012</strong>, and TPDL 2013).<br />
DCM research has made significant advances in the<br />
personalisation and dynamic aggregation of user-generated
content, corporate content, and open content<br />
harvested from the open web. This has led to industry<br />
trials in the application areas of Personalised Multilingual<br />
Customer Care and Personalisation as a Service.<br />
Research has progressed on structural content analysis<br />
for web slicing, and the MOODfinger framework for<br />
affective news retrieval has been further developed. DCM<br />
researchers (Prof. Owen Conlan and Prof. Vincent Wade)<br />
won two SFI TIDA grants for research in personalisation,<br />
which have led to the planning of spinout companies<br />
Emizar and Wripl. A third TIDA grant – for work in<br />
automated slicing of content for reuse and repurposing<br />
– has been secured (Prof. Vincent Wade) and work will<br />
commence in early 2013. Trials and evaluations of DCM<br />
technology, including the Personalised Multilingual<br />
Customer Care portal and the Personalised Multilingual<br />
Information Retrieval demonstrator, were conducted in<br />
collaboration with Microsoft and Symantec. TCD (Prof.<br />
Vincent Wade) also established the Enterprise Ireland<br />
Technology Centre for Technology Enhanced Learning<br />
(Learnovate Centre) which is allied to <strong>CNGL</strong>.<br />
LOC: highlights include the continued development of
a flexible, open-source, open-standards-, components- and
web-services-based platform (SOLAS) supporting
standard as well as innovative social, collaborative and
distributed localisation workflows. SOLAS consists of two
main strands: SOLAS Match and SOLAS Productivity.<br />
SOLAS Productivity makes use of a standardised data
container, open web service APIs, and a common
orchestration and process management module, which
connect to any number of component technologies
developed by academic and industrial partners within
<strong>CNGL</strong> as well as to third-party technologies and tools.<br />
SOLAS Match provides ground-breaking and intuitive
technology that allows for the seamless and user-friendly
matching of community translation tasks with volunteer
translators. The close collaboration between LOC and
the Rosetta Foundation makes <strong>CNGL</strong> technologies
directly available to social localisation operations and,
in return, allows <strong>CNGL</strong> technologies to be tested by the
Foundation’s community of currently 2,600+ volunteers.<br />
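The report does not describe how SOLAS Match pairs tasks with volunteers. As a rough illustration of the idea, the Python sketch below assumes a simple rule: a volunteer is eligible if they offer the task's source-target language pair, and eligible volunteers are ranked by how many of the task's subject tags match their interests. All names (`Volunteer`, `Task`, `match`) and the ranking rule are hypothetical and are not the SOLAS Match API.

```python
from dataclasses import dataclass, field


@dataclass
class Volunteer:
    name: str
    language_pairs: set[tuple[str, str]]              # (source, target) pairs offered
    interests: set[str] = field(default_factory=set)  # subject areas the volunteer prefers


@dataclass
class Task:
    title: str
    source: str
    target: str
    tags: set[str] = field(default_factory=set)       # e.g. the NGO's subject area


def match(task: Task, volunteers: list[Volunteer]) -> list[Volunteer]:
    """Return volunteers who cover the task's language pair, best interest overlap first."""
    eligible = [v for v in volunteers if (task.source, task.target) in v.language_pairs]
    # Python's sort is stable (also with reverse=True), so ties keep their input order.
    return sorted(eligible, key=lambda v: len(v.interests & task.tags), reverse=True)


if __name__ == "__main__":
    volunteers = [
        Volunteer("A", {("en", "fr")}, {"health"}),
        Volunteer("B", {("en", "fr"), ("en", "pt")}, {"health", "education"}),
        Volunteer("C", {("en", "sw")}, {"health"}),
    ]
    leaflet = Task("Hygiene leaflet", "en", "fr", {"health", "education"})
    print([v.name for v in match(leaflet, volunteers)])  # ['B', 'A']
```

A production matcher would likely weigh further signals, such as availability, deadlines or past quality. The single-criterion ranking keeps the sketch short.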
SF: highlights include strong progress in human
factors and interaction design research, substantial
contributions to standardisation (ITS (W3C) and XLIFF
(OASIS)), and interoperability research for systems
services architectures. Doherty, Karamanis and Luz
(<strong>2012</strong>) investigate the impact of work contexts on the
use of MT in localisation operations. The <strong>CNGL</strong> Wizard
of Oz platform has been made open source and is<br />
available online (www.webwoz.com). A Linked Open<br />
Data approach has been used for end-to-end content<br />
management and localisation integration (Lewis et al.,<br />
<strong>2012</strong>) involving SOLAS and the MaTrEx <strong>CNGL</strong> platform<br />
technologies, provenance tracking and visualisation,<br />
in close collaboration with <strong>CNGL</strong> partners Microsoft<br />
and VistaTEC. Substantial progress has been achieved<br />
in instrumenting CAT tools to capture post-editing of<br />
MT outputs as well as in the visualisation of online<br />
community analytics, closely collaborating with <strong>CNGL</strong><br />
partners Welocalize and Symantec.<br />
Commercialisation<br />
Translating research outputs into economic and social<br />
impact is a key objective for <strong>CNGL</strong>: Table 3 shows a<br />
total of 10 invention and software disclosures, 1 patent<br />
application and 1 spin-out company (against targets<br />
of 20, 4 and 2, respectively) for <strong>2012</strong>. <strong>CNGL</strong> engages<br />
strongly in spin-out and start-up companies as well<br />
as in not-for-profit social operations. The Rosetta<br />
Foundation (www.therosettafoundation.org) focuses<br />
on localisation support for NGOs (and other not-for-profit<br />
organisations) using a novel social and collaborative<br />
localisation platform. Emizar (www.emizar.com) focuses<br />
on digital content and personalisation technologies for<br />
customer support. Xcelerator Machine Translations,<br />
through its KantanMT product (www.kantanmt.com),<br />
provides cloud-based MT technology that automatically
produces highly scalable custom MT engines from
uploaded data resources, requiring minimal technical
expertise on the part of the client. Digital Linguistics
(www.digitallinguistics.com) uses stylometrics and
text classification technologies developed in ILT for
translation quality review. Scream Technologies
(www.screamtechnologies.com) offers custom text-to-speech
systems based on ILT technologies. Additionally, spinout
candidate Wripl (www.wripl.com) offers personalisation-as-a-service
across websites, drawing on DCM research.
Table 3: 2012 IP KPIs against Targets
CNGL KPIs | 2012 Actuals | Targets
Patent applications | 1 | 4
Invention and software disclosures | 10 | 20
Spin-outs | 1 | 2
Outreach<br />
The CNGL Education and Outreach Programme concentrates on first-, second-, third- and fourth-level education, and on outreach to industry and the general public.
CNGL founded the All Ireland Linguistics Olympiad (AILO) in 2009 and has organised the competition since then. In 2012, more than 400 secondary-level students participated in the national competitions, and the top four students represented Ireland at the International Linguistics Olympiad (IOL) 2012 in Slovenia.
To promote African languages in the Information Society, the University of Limerick’s MSc in Multilingual Computing and Localisation will be delivered through distance learning and co-hosted by the United Nations Economic Commission for Africa at its Information Training Centre for Africa in Addis Ababa, Ethiopia.
Growing the CNGL Research Eco-System 2012
CNGL has been highly successful in attracting competitive research funding nationally and internationally, rapidly developing a research eco-system clustered around the CNGL core. This eco-system comprises a large number of affiliated EU projects (under the FP7 programme), SFI-funded programmes, CNGL business-development activities funded through Enterprise Ireland programmes, and direct contract research co-operations. Major currently active projects are listed in Table 4. These provide further evidence of the rapidly growing international research standing and recognition of CNGL, as well as of the relevance and commercialisation potential of the CNGL research programme.
Planning for the Future<br />
With the end of project Year 5 in 2012, CNGL has completed its original funding cycle (2007–2012) and is completing a number of key ongoing research, commercialisation and outreach projects in a no-cost extension in 2013. CNGL has been a resounding success, generating to date more than 400 peer-reviewed publications, 21 PhD theses, 39 invention and software disclosures, 9 patent applications, 4 commercial spin-out and start-up companies, 1 not-for-profit spin-out, strong industry-academia partnerships and a total of €15.8m in additional competitive research, development and commercialisation funding, growing the CNGL Research Eco-System.
At the same time, CNGL has been successful in winning further substantial competitive funding from Science Foundation Ireland, initially for 30 months, to continue CNGL into the future with a core grant of €10.5M (CNGLII: March 2013 – September 2016). CNGLII is an evolution of CNGL, expanding its remit from localisation to a broader focus on Digital Content Management in a Global Intelligent Content setting. It is based on the concept of a Global Content Value Chain, in which services interact with content to make it self-describing, self-aware and self-adapting across language barriers, modalities and interaction platforms, tuned to context and user. The CNGLII application was successfully led by Prof. Vincent Wade (TCD), who will take over as Director of CNGL on 1 March 2013.
Table 4: CNGL Research Eco-System: Income Received from Active Affiliated Research Projects 2012
Project | Funding Body | € contribution (to CNGL partner)
QT Launch Pad | EC – FP7 | 477,960
LT-Web – Language Technology in the Web | EC – FP7 | 396,391
CENDARI (Collaborative European Digital Archive Infrastructure) | EC – FP7 | 120,000
EXPERT (EXPloiting Empirical appRoaches to Translation) | EC – FP7 – Marie Curie | 481,000
Abu-MaTran (Automatic building of Machine Translation) | EC – FP7 – Marie Curie | 365,966
IRCSET Data Mining for Industrial Apps – PhD Sponsorship | Phorest | 24,000
IRCSET Data Mining for Industrial Apps – PhD Sponsorship | Irish Research Council for Science, Engineering and Technology (IRCSET) | 48,000
Learning Technology Centre | Enterprise Ireland (EI) | 3,000,000
EI Commercialisation with Xcelerator Machine Translations | Enterprise Ireland (EI) | 152,000
EI Feasibility Grant – Adaptive Solutions for Patent Translation | Enterprise Ireland (EI) | 15,000
EI Innovation Voucher with Cipherion Translations | Enterprise Ireland (EI) | 5,000
EI Innovation Voucher with IntelImpact | Enterprise Ireland (EI) | 5,000
EI Feasibility Study Critical Data Auditor Feasibility Study | Enterprise Ireland (EI) | 8,827
EI Innovation Voucher with FFiG | Enterprise Ireland (EI) | 5,000
EI Feasibility Grant – Wripl | Enterprise Ireland (EI) | 15,000
EI Innovation Partnership Programme with Pixalert – Critical Data Auditor | Enterprise Ireland (EI) | 40,400
EI Commercialisation Fund Ata-Bot | Enterprise Ireland (EI) | 244,381
PoliMon4Cloud | Technology Innovation Development Award (TIDA) | 76,384
Integrated Software Suite to provide Next Generation Personalised Multilingual Customer Care | Technology Innovation Development Award (TIDA) | 67,748
MT & TM Integration | Technology Innovation Development Award (TIDA) | 86,427
UNITE (Personalised Cross-site Personalisation) | Technology Innovation Development Award (TIDA) | 60,000
Iterative Retraining of Machine Translation with Post-edits to Increase Post-Editing Productivity in Localisation Workflows | Technology Innovation Development Award (TIDA) | 99,218
Linguabox: Automated Open Content Repurposing Service to support Personalized eLearning | Technology Innovation Development Award (TIDA) | 87,768
iOmegaT – An Instrumented Replayable Computer-Aided-Translation Tool | Technology Innovation Development Award (TIDA) | 92,273
Total | | 5,973,743
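As a quick arithmetic check, the sketch below re-adds the contributions listed in Table 4 and groups them by funding body. The short project labels and the grouping keys (for example, counting the two Marie Curie awards under EC FP7) are illustrative choices made here, not categories used in the report.

```python
from collections import defaultdict

# (project, funding body, EUR contribution to CNGL partner), as listed in Table 4.
# Short project labels and funder keys are illustrative abbreviations.
TABLE_4 = [
    ("QT Launch Pad", "EC FP7", 477_960),
    ("LT-Web", "EC FP7", 396_391),
    ("CENDARI", "EC FP7", 120_000),
    ("EXPERT", "EC FP7", 481_000),       # Marie Curie
    ("Abu-MaTran", "EC FP7", 365_966),   # Marie Curie
    ("Data Mining PhD Sponsorship", "Phorest", 24_000),
    ("Data Mining PhD Sponsorship", "IRCSET", 48_000),
    ("Learning Technology Centre", "EI", 3_000_000),
    ("Xcelerator commercialisation", "EI", 152_000),
    ("Patent Translation feasibility", "EI", 15_000),
    ("Cipherion voucher", "EI", 5_000),
    ("IntelImpact voucher", "EI", 5_000),
    ("Critical Data Auditor feasibility", "EI", 8_827),
    ("FFiG voucher", "EI", 5_000),
    ("Wripl feasibility", "EI", 15_000),
    ("Pixalert partnership", "EI", 40_400),
    ("Ata-Bot commercialisation", "EI", 244_381),
    ("PoliMon4Cloud", "TIDA", 76_384),
    ("Multilingual Customer Care suite", "TIDA", 67_748),
    ("MT & TM Integration", "TIDA", 86_427),
    ("UNITE", "TIDA", 60_000),
    ("Iterative Retraining with Post-edits", "TIDA", 99_218),
    ("Linguabox", "TIDA", 87_768),
    ("iOmegaT", "TIDA", 92_273),
]


def subtotals(rows):
    """Sum the contributions per funding body."""
    totals = defaultdict(int)
    for _project, funder, amount in rows:
        totals[funder] += amount
    return dict(totals)


if __name__ == "__main__":
    by_funder = subtotals(TABLE_4)
    # Largest funder first.
    for funder, amount in sorted(by_funder.items(), key=lambda kv: -kv[1]):
        print(f"{funder:8s} {amount:>12,}")
    print(f"{'Total':8s} {sum(by_funder.values()):>12,}")
```

Running it prints per-funder subtotals and a grand total of 5,973,743, matching the total row of the table.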
Integrated Language Technologies
Strand Name: Integrated Language Technologies<br />
AREA CO-ORDINATORS:<br />
PROF. JOSEF VAN GENABITH, DUBLIN CITY UNIVERSITY<br />
PROF. NICK CAMPBELL, TRINITY COLLEGE DUBLIN<br />
Participant Names and Affiliation
Industrial Collaborators
Mr. Takeshi Fukunaga, Dai Nippon Printing
Mr. Tom Gray, SpeechStorm
Mr. John Dixon, Applied Language Solutions
Dr. Fred Hollowood, Symantec
Mr. Paul McManus, SDL
Mr. Enda McDonnell, Alchemy Software Development
Mr. Phil Ritchie, VistaTEC
Dr. Johann Roturier, Symantec
Mr. Dag Schmidtke, Microsoft
International Collaborators
Prof. Walter Daelemans, Antwerp, Belgium
Prof. Mikel Forcada, Alicante, Spain
Prof. Bernd Möbius, Stuttgart, Germany
Prof. Khalil Sima’an, Amsterdam, Netherlands
Prof. Eiichiro Sumita, ATR, Japan
Prof. Antal van den Bosch, Tilburg, Netherlands
Prof. François Yvon, Paris, France
Faculty<br />
Prof. Nick Campbell Trinity College Dublin ILT Co-Leader, ILT2 Leader<br />
Dr. Peter Cahill University College Dublin ILT2 Co-Leader<br />
Prof. Julie Carson-Berndsen University College Dublin ILT2 Co-Leader<br />
Dr. Martin Emms Trinity College Dublin ILT3<br />
Dr. Christer Gobl Trinity College Dublin ILT2<br />
Prof. Qun Liu Dublin City University ILT1<br />
Dr. Dorothy Kenny Dublin City University ILT1<br />
Dr. Saturnino Luz Trinity College Dublin ILT3<br />
Prof. Ailbhe Ní Chasáide Trinity College Dublin ILT2<br />
Dr. Sharon O’Brien Dublin City University ILT1<br />
Prof. Josef van Genabith Dublin City University ILT Co-Leader, ILT1 Leader, ILT3<br />
Dr. Carl Vogel Trinity College Dublin ILT3 Leader<br />
Research Integration Officer<br />
Dr. Declan Groves<br />
Dublin City University
INTEGRATED LANGUAGE TECHNOLOGIES<br />
Postdoctoral Researchers<br />
Dr. Ergun Biçici Dublin City University ILT1<br />
Dr. Joao Cabral University College Dublin ILT2<br />
Dr. Yvette Graham Dublin City University Affiliated<br />
Dr. Ingmar Steiner University College Dublin ILT2<br />
Dr. Erwan Moreau Trinity College Dublin ILT3<br />
Dr. Sara Morrissey Dublin City University ILT1<br />
Dr. Sudip Kumar Naskar Dublin City University ILT1<br />
Dr. Irena Yanushevskaya Trinity College Dublin ILT2<br />
Dr. Xiaofeng Wu Dublin City University ILT1<br />
Dr. Junhui Li Dublin City University ILT1<br />
PhD Students<br />
Mr. Mohamed Abou-Zleikha University College Dublin ILT2<br />
Mr. Zeeshan Ahmed University College Dublin ILT2<br />
Ms. Hala Al-Maghout Dublin City University ILT1<br />
Mr. Pratyush Banerjee Dublin City University ILT1<br />
Ms. Hanna Béchara Dublin City University ILT1<br />
Mr. Sandipan Dandapat Dublin City University ILT1<br />
Mr. Stephen Doherty Dublin City University ILT1<br />
Ms. Amelie Dorn Trinity College Dublin ILT2<br />
Mr. Hector Hugo Franco Penya Trinity College Dublin ILT3<br />
Mr. John Kane Trinity College Dublin ILT2<br />
Mr. Mark Kane University College Dublin ILT2<br />
Mr. Gerard Lynch Trinity College Dublin ILT3<br />
Mr. Alfredo Maldonado Guerra Trinity College Dublin ILT3<br />
Ms. Liliana Mamani Sanchez Trinity College Dublin ILT3<br />
Ms. Neasa Ní Chiaráin Trinity College Dublin ILT2<br />
Mr. Udochukwu Kalu Ogbureke University College Dublin ILT2<br />
Ms. Maria O’Reilly Trinity College Dublin ILT2<br />
Mr. Ankit Srivastava Dublin City University ILT1<br />
Ms. Eva Szekely University College Dublin ILT2<br />
Mr. Christoph Wendler Trinity College Dublin ILT2<br />
Ms. Amalia Zahra University College Dublin ILT2
Funding<br />
2012 Funding from SFI
CNGL (07/CE/I1142): €1,064,77
SFI TIDA Award Iterative Retraining of Machine Translation with Post-edits to Increase Post-Editing Productivity in Localisation Workflows: €99,218
SFI TIDA Award MT & TM Integration: €86,427
2012 Funding from Other Sources
van Genabith: EU FP7 QT Launch Pad: €477,960
van Genabith: EU FP7 LT Web: €87,290
van Genabith: EU FP7 EXPERT Marie Curie PhD Training: €481,000
Toral: EU FP7 PEOPLE Abu-MaTran: €365,966
Research Overview: Integrated Language Technologies (ILT)
Goals<br />
Human languages are a core medium for representing,<br />
storing and sharing knowledge and information. The<br />
objective of the ILT track is to perform basic and applied<br />
research in language technologies (LTs) supporting<br />
content processing and management across languages<br />
and modalities (text and speech). ILT1 focuses on<br />
advancing machine translation (MT), ILT2 on speech<br />
input and output as well as speech translation, and<br />
ILT3 on text classification and annotation. The three<br />
groups work closely together on integrated technologies<br />
providing core CNGL language-based services.
Research Barriers and Methodologies to Address Them
ILT1: Machine Translation<br />
Statistical Machine Translation (SMT), in particular Phrase-Based SMT (PB-SMT, as implemented for example in the Moses platform), has been a game-changer in both research and commercial applications of MT. At the same time, SMT is reaching a performance plateau, with disruptive improvements in translation quality requiring massive increases in training data. Traditional PB-SMT uses string-based information. Substantial improvements are expected through the use of richer (linguistically or distributionally motivated) signals, including syntactic and semantic information, in machine learning-based approaches to MT. Mining and translating noisy user-generated content (UGC) is becoming increasingly important in global business intelligence and customer support operations. However, UGC is highly challenging for MT trained on “clean”, professionally edited data. MT is applied to increasing numbers of domains and text types, and novel domain adaptation techniques are required to ensure optimal MT output quality. Improvements in MT components (such as alignment) can improve overall MT performance. Most system combination and hybrid MT approaches can profit from better machine learning technologies. Technologies need to be developed to support fully language-independent quality estimation/prediction (without access to a reference translation) that treats the MT system as a black box. Finally, optimal integration of translation technologies requires full consideration of the human in the loop.
Almaghout et al. (2012a, b) show how sophisticated, linguistically motivated syntactic information enriching synchronous context-free grammars (SCFGs) can improve state-of-the-art hierarchical phrase-based SMT (HPB-SMT) systems. Graham and van Genabith (2012) present a statistical, deep-syntax, LFG-based decoder and MT system. Banerjee et al. (2012a) develop a translation-quality-driven supplementary training data selection model for tuning MT to user-generated content. Banerjee et al. (2012b) compare normalisation-based and supplementary training data-based approaches to MT of UGC. Pecina et al. (2012) present approaches to adapting log-linear weight vectors to achieve optimal translation for different domains given a generic training set, without retraining. Tu et al. (2012) show how compact representations of alignment alternatives can improve MT. Dandapat et al. (2012) develop an efficient system combination approach integrating EBMT, SMT, TM and IR-based technologies. The Second Workshop and Shared Task on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid MT (ML4HMT-12) was co-organised by CNGL (van Genabith, Badia, Federmann, Melero, Costa-jussà and Okita, 2012), and CNGL research teams contributed four submissions (Wu et al., 2012; Okita et al., 2012a; Okita et al., 2012b; Okita, 2012) to the shared task. Bicici et al. (2013, accepted for publication) show how quality prediction can be performed using language-independent features, treating MT systems as a black box. Doherty et al. (2012), Doherty and O’Brien (2012) and Doherty and Moorkens (2013) investigate human factors in translation technology integration, using eye-tracking experiments as well as studies on integrating SMT into professional translator training syllabi.
ILT2: Speech and Machine Translation<br />
The analysis of voice characteristics, synthesis of expressive voices, linking speech with other modalities (such as facial expressions) and speech-to-speech translation are some of the core challenges in speech research.
Kane et al. (2012) develop algorithms for automatically detecting creaky voice and facilitating its inclusion in speech synthesis. An Invention Disclosure has been filed for a new method for tracking changes in the voice, with applications in speaker identity tracking and emotion detection. Székely et al. (2012) detect voice styles in audiobooks and build synthetic voices for those voice styles. Abou-Zleikha et al. (2012) present novel work on pitch and duration modelling. Cabral et al. (2012) improve the modelling of vocal cord vibration for better voice quality in speech synthesis. Székely et al. (2012) link synthetic speech voice style and facial expression. Ahmed et al. (2012) develop state-of-the-art phone-based hierarchical phrase-based machine translation (HPB-SMT) models.
ILT3: Analytics<br />
Language data provide “unstructured” representations of information. Language technologies (LTs) are required to automatically extract structure from language data and to record this structure in the form of mark-up, annotation, metadata and explicit representations of information content, across languages and domains. To address these challenges, ILT3 develops sophisticated classification-based language technologies using a wide variety of features and approaches in document, sentence and sub-sentential classification problems in syntax, semantics and pragmatics, often with a focus on supporting MT. In addition, ILT3 has a strong focus on domain adaptation, concentrating in particular on user-generated content.
Text classification developed by Dr. Carl Vogel’s team has produced two Invention Disclosures and a Patent Application (application no. 11169673.8-1527) with the European Patent Office, as well as a commercial licence for Digital Linguistics, a CNGL start-up company. Moreau and Vogel (2012) compare supervised and semi-supervised approaches to MT quality estimation. Lynch, Moreau and Vogel (2012) develop accurate classifiers to decide whether or not a text is a translation; for texts that are translations, Lynch and Vogel (2012) predict the source language. Emms (2012) and Emms and Franco Penya (2012a, b) explore stochastic tree distance similarity measures and employ them for semantic role labelling (Emms and Franco Penya, 2012c). Maldonado-Guerra and Emms (2012) develop methods to investigate the complex translation behaviour of multi-word expressions. Vogel and Mamani Sanchez (2012) predict the complex interplay between emoticons and hedges as social signals in user fora. The DCU-Paris 13 parsing team won the Web-Parsing Challenge and Shared Task organised by Google as part of SANCL-2012 at NAACL-HLT 2012 (Le Roux et al., 2012), using the DCU LORG parser platform and domain adaptation techniques. In a collaboration between DCU, Heinrich Heine University in Düsseldorf, Charles University Prague and the British University in Dubai, Attia et al. (2012a, 2012b) show how a combination of finite-state and machine learning-based technologies can be used to produce wider-coverage lexical resources for Modern Standard Arabic from Arabic Gigaword corpus data, and how spell checking for Arabic can be improved. In collaboration with the Chinese Academy of Sciences and New York University, the DCU ILT3 team investigates the granularity of syntactic information required to improve sentiment analysis (Tu et al., 2012).
Hector-Hugo Franco-Penya, Dr. Alexandru Ceausu and Dr. Antonio Toral were among the many participants in the Hadoop Hackathon run by CNGL in March
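Classifiers that decide whether a text is a translation (as in Lynch, Moreau and Vogel, 2012, above) are frequently built on function-word frequencies, because such features are largely independent of topic. The sketch below is a toy nearest-centroid classifier over those frequencies. The word list, training sentences and labels are hypothetical, and it does not reproduce the CNGL classifiers or their features.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List, Tuple

# Hypothetical feature set: a handful of English function words.
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that", "it", "is", "which", "this"]

# Toy, made-up training data; real systems use large labelled corpora.
TRAINING_DATA: List[Tuple[str, str]] = [
    ("the report of the committee which the minister presented", "translated"),
    ("the analysis of the data which the team of the institute made", "translated"),
    ("it is clear that it works and it is fast", "original"),
    ("it is true that this is what we wanted", "original"),
]


def profile(text: str) -> List[float]:
    """Relative frequency of each function word in a lower-cased text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    n = max(len(tokens), 1)
    return [counts[w] / n for w in FUNCTION_WORDS]


def distance(u: List[float], v: List[float]) -> float:
    """Euclidean distance between two frequency profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def train(labelled: List[Tuple[str, str]]) -> Dict[str, List[float]]:
    """Nearest-centroid training: average the profiles of each class."""
    by_label: Dict[str, List[List[float]]] = {}
    for text, label in labelled:
        by_label.setdefault(label, []).append(profile(text))
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in by_label.items()
    }


def classify(model: Dict[str, List[float]], text: str) -> str:
    """Assign the label whose centroid is closest to the text's profile."""
    p = profile(text)
    return min(model, key=lambda label: distance(model[label], p))


if __name__ == "__main__":
    model = train(TRAINING_DATA)
    print(classify(model, "the decision of the board which the chair of the group took"))
```

A nearest-centroid rule is used only because it fits in a few lines; a realistic system would use many more features and a stronger learner.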
Year 5 Progress<br />
The final year of the initial CNGL funding cycle (2007–2012) has been dominated by strong research and publication outputs, the writing-up of PhD theses (leading to six successful PhD completions), increased commercialisation activities translating research outputs into IP (Invention Disclosures, Patent Applications and Licences), and considerable time and effort spent on the CNGL final review and on preparing the CNGLII application. Despite the loss of some members of the research team who have taken up new positions in industry and academia, all research tracks in ILT continue to run ahead of schedule, in close collaboration with CNGL industry partners and with increased engagement with additional commercial entities.
Progress in ILT1: Machine Translation<br />
The ILT1 group continues to have an impressive publication record. Papers have been accepted at a number of world-renowned conferences, including the Annual Meeting of the Association for Computational Linguistics (ACL-2012, Jeju, Korea), the International Conference on Computational Linguistics (COLING-2012, Mumbai, India), the Annual Conference of the European Association for Machine Translation (EAMT-2012, Trento, Italy), the Conference of the Association for Machine Translation in the Americas (AMTA-2012, San Diego, CA) and the Workshop on Statistical Machine Translation (WMT-2012, Montreal, Canada). COLING-2012 was particularly successful, with a total of 9 MT papers at the main conference and COLING workshops.
Data-driven statistical MT technologies are able to provide translations suitable for use in commercial settings, as evidenced by the dramatic increase in the adoption and provision of MT services in the localisation industry. The question is no longer whether or not to use MT technologies, but how best to integrate MT into localisation and content management workflows. At the same time, statistical MT, in particular phrase-based statistical MT (PB-SMT), is reaching a performance plateau, with disruptive improvements in translation quality requiring massive increases in training data. Traditional PB-SMT uses string-based information. Substantial improvements are expected through the use of richer (linguistically or distributionally motivated) signals, including syntactic and semantic information, in machine learning-based approaches to MT. ILT1 has made key contributions to this research challenge, as evidenced by CNGL publications from the DCU MT group at EAMT-2012, ACL-2012 and WMT-2012. Almaghout et al. (2012) and Li et al. (2012a, b) show how sophisticated, linguistically motivated syntactic information enriching synchronous context-free grammars (SCFGs) can improve state-of-the-art hierarchical phrase-based SMT (HPB-SMT) systems. Graham and van Genabith (2012) develop a deep-syntax (Lexical-Functional Grammar)-based statistical MT system.
With increasing volumes of content being generated by users (rather than professional writers), the need to mine this content (user fora, blogs, tweets) and make it available across multiple languages has increased significantly. Coping with potentially noisy user-generated content (UGC) presents a major challenge for MT, and novel training data selection models are crucial for tuning MT models to UGC. Working in close cooperation with CNGL industry partner Symantec, the DCU MT group presented a key publication at COLING-2012 (Banerjee et al., 2012a) describing a translation-quality-driven supplementary training data selection model for tuning MT to UGC, while Banerjee et al. (2012b) investigate whether text normalisation techniques are more productive in the automatic translation of UGC than adding suitable supplementary training data. Tuning MT to diverse text types and content domains is a crucial factor in ensuring optimal quality. In many real-world application scenarios, however, completely retraining the MT system on domain-specific training material is not an option: it may be too costly, or suitable training material may simply not be available. A joint COLING-2012 publication by the DCU MT group and Charles University Prague (Pecina et al., 2012) presents approaches to adapting log-linear weight vectors to achieve improved translation for different domains given a generic training set, without the need for full retraining.
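In a standard log-linear SMT model, the best translation ê of a source sentence f is the candidate e that maximises Σ_m λ_m h_m(e, f), where the h_m are feature functions (translation model, language model, reordering and so on) and the λ_m are their weights. Since domain adaptation in this setting can mean changing only the weight vector λ, the trained models that produce the h_m can be left untouched. The sketch below illustrates this general idea with made-up feature values and two hypothetical weight vectors; it does not reproduce the specific adaptation method of Pecina et al. (2012).

```python
from typing import Dict, List

# Hypothetical feature values h_m(e, f) for three candidate translations of
# one source sentence: [translation model, language model, reordering].
# In a real PB-SMT system these come from trained models that stay unchanged.
CANDIDATES: Dict[str, List[float]] = {
    "candidate_a": [-2.0, -6.0, -1.0],
    "candidate_b": [-3.0, -4.0, -1.5],
    "candidate_c": [-2.5, -5.0, -0.5],
}

# Hypothetical weight vectors (lambda): one tuned on generic data, one
# re-tuned for a target domain. Only the weights differ.
GENERIC_WEIGHTS = [1.0, 0.5, 0.5]
DOMAIN_WEIGHTS = [0.5, 1.0, 0.5]


def loglinear_score(features: List[float], weights: List[float]) -> float:
    """Unnormalised log-linear score: sum over m of lambda_m * h_m(e, f)."""
    return sum(w * h for w, h in zip(weights, features))


def best_translation(candidates: Dict[str, List[float]], weights: List[float]) -> str:
    """Return the candidate with the highest weighted score (the argmax)."""
    return max(candidates, key=lambda e: loglinear_score(candidates[e], weights))


if __name__ == "__main__":
    print("generic weights:", best_translation(CANDIDATES, GENERIC_WEIGHTS))
    print("domain weights: ", best_translation(CANDIDATES, DOMAIN_WEIGHTS))
```

With these numbers the generic weights select candidate_c and the domain weights select candidate_b, although no feature value has changed.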
System combination and hybrid MT can improve MT quality: in partnership with DFKI (the German Research Center for Artificial Intelligence) and Barcelona Media (BM), the DCU CNGL MT group organised the Second Workshop and Shared Task on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid MT (ML4HMT-12) in Mumbai, India, as a COLING-2012 workshop (van Genabith, Badia, Federmann, Melero, Costa-jussà and Okita, 2012). The DCU CNGL MT research teams contributed four submissions (Wu et al., 2012; Okita et al., 2012a; Okita et al., 2012b; Okita, 2012) to the shared task. System combination is usually most effective when the MT systems involved are quite diverse. Dandapat et al. (2012) develop an efficient system combination approach integrating EBMT, SMT, TM and IR-based technologies.
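In its simplest form, system combination can choose, for each sentence, the output that agrees most with the other systems’ outputs (a consensus, minimum-Bayes-risk-style selection). The sketch below uses clipped unigram F1 as the agreement measure. Both the measure and the example outputs are illustrative assumptions, and this is not the approach of Dandapat et al. (2012).

```python
from collections import Counter
from typing import Dict

# Hypothetical outputs from three different systems for one source sentence.
EXAMPLE_OUTPUTS: Dict[str, str] = {
    "EBMT": "the printer is not responding",
    "SMT": "the printer does not respond",
    "TM": "the printer is not responding now",
}


def unigram_f1(hyp: str, ref: str) -> float:
    """Clipped unigram-overlap F1 between two whitespace-tokenised strings."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())  # Counter intersection clips counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def select_consensus(outputs: Dict[str, str]) -> str:
    """Return the name of the system whose output agrees most with the others."""
    if len(outputs) < 2:
        raise ValueError("need outputs from at least two systems")

    def agreement(name: str) -> float:
        others = [o for n, o in outputs.items() if n != name]
        return sum(unigram_f1(outputs[name], o) for o in others) / len(others)

    return max(outputs, key=agreement)


if __name__ == "__main__":
    best = select_consensus(EXAMPLE_OUTPUTS)
    print(best, "->", EXAMPLE_OUTPUTS[best])
```

Selection of this kind benefits from diverse systems for the reason given in the text: when outputs differ, agreement carries more information about which words are likely to be correct.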
Improvements in components of statistical MT systems can lead to better translation outputs. In a collaboration with the Chinese Academy of Sciences, Tsinghua University and New York University, Tu et al. (2012) show how compact representations of alignment alternatives (rather than a single alignment) can improve MT.
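One way to keep several alignment alternatives in a compact form is a weighted alignment matrix, in which each source-target link carries the total probability of the candidate alignments that contain it instead of a hard yes/no decision. The sketch below builds such per-link probabilities from a hypothetical list of scored alignments; it illustrates the general idea and is not necessarily the representation used by Tu et al. (2012).

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Link = Tuple[int, int]  # (source position, target position)

# Hypothetical n-best alignments for a two-word source sentence, with scores.
NBEST: List[Tuple[FrozenSet[Link], float]] = [
    (frozenset({(0, 0), (1, 1)}), 0.6),
    (frozenset({(0, 0), (1, 2)}), 0.3),
    (frozenset({(0, 1), (1, 2)}), 0.1),
]


def link_posteriors(alignments: List[Tuple[FrozenSet[Link], float]]) -> Dict[Link, float]:
    """Collapse scored alignments into a per-link posterior probability."""
    total = sum(score for _, score in alignments)
    if total <= 0:
        raise ValueError("alignment scores must sum to a positive value")
    posteriors: Dict[Link, float] = defaultdict(float)
    for links, score in alignments:
        for link in links:
            posteriors[link] += score / total  # normalise scores into probabilities
    return dict(posteriors)


if __name__ == "__main__":
    for link, p in sorted(link_posteriors(NBEST).items()):
        print(link, round(p, 2))
```

Downstream, rule or phrase extraction could then weight each link by its posterior, or keep only links above a threshold, instead of committing to the single best alignment.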
MT and other translation technologies can only deliver if full consideration is given to the human in the loop: Doherty et al. (2012) develop and validate a syllabus to teach translators SMT and related skills. Doherty and O’Brien (2012) examine the usability of MT output using eye tracking, and find its quality for some target languages to be as good as that of the source text, but detrimental to the user experience for others. Doherty and Moorkens (2013) present an evaluation of teaching translation technology to translators and identify several hurdles and solutions. Moorkens et al. (2013) use SMT output to remove inconsistencies in TM data and demonstrate resulting improvements in both TM and SMT quality.
MT quality estimation is the task of predicting the quality of MT output without access to a reference translation. Ideally, this can be done without access to the internals of the MT system involved and in a language-independent way, i.e. without relying on language-specific resources that may require costly supervised training. Bicici et al. (2013, accepted for publication) show how quality prediction can be performed using language-independent features, treating MT systems as a black box. Parts of this research have been submitted as an Invention Disclosure.
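To make the black-box, language-independent setting concrete, the sketch below computes a few features from the source sentence and the MT output alone, with no access to system internals and no language-specific resources. The feature set is a hypothetical illustration, not the one used by Bicici et al.; in practice such features would be fed to a regression model trained on human quality judgements.

```python
import re
from typing import Dict

NUMBER = re.compile(r"\d+(?:[.,]\d+)*")
PUNCT = re.compile(r"[^\w\s]")


def blackbox_qe_features(source: str, target: str) -> Dict[str, float]:
    """Hypothetical language-independent QE features from surface strings only."""
    src_tokens = source.split()
    tgt_tokens = target.split()
    src_vocab = set(src_tokens)
    copied = sum(1 for tok in tgt_tokens if tok in src_vocab)
    return {
        "src_tokens": float(len(src_tokens)),
        "tgt_tokens": float(len(tgt_tokens)),
        # very short or very long outputs relative to the source are suspicious
        "length_ratio": len(tgt_tokens) / max(len(src_tokens), 1),
        # difference in punctuation counts between source and output
        "punct_diff": float(abs(len(PUNCT.findall(source)) - len(PUNCT.findall(target)))),
        # 1.0 if the numbers in source and output differ as multisets
        "number_mismatch": float(sorted(NUMBER.findall(source)) != sorted(NUMBER.findall(target))),
        # share of output tokens copied verbatim from the source
        # (names and numbers, but possibly also untranslated words)
        "copied_ratio": copied / max(len(tgt_tokens), 1),
    }


if __name__ == "__main__":
    feats = blackbox_qe_features(
        "Click OK to save 3 files.",
        "Cliquez sur OK pour enregistrer 3 fichiers.",
    )
    for name, value in feats.items():
        print(f"{name:16s} {value:.3f}")
```

Because none of the features depend on a particular language pair or on the MT system’s internals, the same extractor could be applied unchanged to output from any system.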
Dr. Sharon O’Brien of DCU and CNGL alumnus Dr. Sergio Penkale of CAPITA pictured at the AMTA-2012 Workshop on Post-editing Technology and Practice (WPTP) in San Diego, USA
Progress in ILT2<br />
Although the PhD students have mainly focused on thesis write-up, the ILT2 Speech Technology research groups at UCD and TCD have made significant progress in Year 5. Building on research conducted in previous years, methodologies for the analysis of voice characteristics and for text-to-speech synthesis of expressive voices were developed significantly further. The ILT2 group at TCD has developed algorithms for automatically detecting creaky voice and provided mechanisms to facilitate its inclusion in speech synthesis (Kane et al., 2012). Progress on this topic is reflected in two publications at the 2012 Interspeech conference and in one journal article. John Kane (TCD) has also filed an Invention Disclosure for a new method for tracking changes in the voice, which may be deployed in a wide range of applications, from improved speech synthesis to speaker identity tracking and even emotion detection.
Significant developments on synthesis of expressive<br />
voices were made by researchers at the Speech<br />
Technology Group at UCD. One of the major<br />
contributions looks at exploring the variability in voice<br />
qualities in audiobook corpora by detecting voice styles<br />
in this type of corpora and building synthetic voices for<br />
those voice styles (Székely et al., <strong>2012</strong>). Work on pitch<br />
and duration modelling using novel techniques based<br />
on exemplar-based generation also contributed to the<br />
improvement of the prosodic aspect and expressiveness<br />
of the synthetic speech (Abou-Zleikha et al., <strong>2012</strong>).<br />
Research on modelling other aspects of the voice source<br />
than pitch, using the LF-model to represent the signal<br />
produced by vibration of the vocal cords in human<br />
speech production, has also been further investigated<br />
to permit better control of voice quality in speech<br />
synthesis (Cabral et al., <strong>2012</strong>). One of the outcomes<br />
of the research on expressive speech synthesis is the<br />
WinkTalk system developed at UCD as part of the <strong>CNGL</strong><br />
Demonstrator Programme. This system is a multimodal<br />
speech synthesis platform which links facial expression to<br />
expressive voices (Székely et al., <strong>2012</strong>). It allows the user<br />
to control the voice style of the synthetic speech by facial<br />
expression, with the help of a web camera and tools for<br />
facial expression analysis. Another interesting application<br />
of expressive speech synthesis developed at UCD is its<br />
integration into speech-to-speech translation (Székely<br />
et al., <strong>2012</strong>). The resulting prototype system, FEAST
(Facial Expression-based Affective Speech), classifies the
emotional state of the user and uses it to render the<br />
translated output in an appropriate voice style.<br />
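For reference, the LF (Liljencrants-Fant) model referred to in the voice-source work above is commonly written as a piecewise description of the derivative of the glottal flow over one period: an exponentially growing sinusoid during the open phase, followed by an exponential return phase. The notation below follows common usage in the speech literature; it is not taken from Cabral et al. (<strong>2012</strong>), whose exact parameterisation this report does not give.

```latex
E(t) =
\begin{cases}
E_0 \, e^{\alpha t} \sin(\omega_g t), & 0 \le t \le t_e, \\
-\dfrac{E_e}{\varepsilon t_a} \left[ e^{-\varepsilon (t - t_e)} - e^{-\varepsilon (t_c - t_e)} \right], & t_e < t \le t_c,
\end{cases}
\qquad
\omega_g = \frac{\pi}{t_p},
\qquad
\varepsilon t_a = 1 - e^{-\varepsilon (t_c - t_e)}.
```

Here $t_p$ is the instant of maximum glottal flow, $t_e$ the instant of main excitation (where the derivative reaches its negative peak $-E_e$), $t_a$ the effective duration of the return phase and $t_c$ the instant of closure. The last condition defines $\varepsilon$ and makes the return branch start at $-E_e$ when $t = t_e$ and reach zero at $t = t_c$.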
The successful collaboration between researchers
from UCD (ILT2) and DCU (ILT3) in previous years on
the integration of speech recognition with machine
translation continued this year with work on phone-based
hierarchical phrase-based machine translation,
which outperforms conventional speech translation
approaches (Ahmed et al., <strong>2012</strong>).<br />
Progress in ILT3<br />
ILT3 continues to provide mark-up, annotation,<br />
metadata, and knowledge through automatic linguistic<br />
analysis for the discovery, transformation and delivery<br />
of unstructured information across languages. ILT3 has<br />
maintained a strong focus on user-generated and possibly<br />
“noisy” content as found on blogs, forums, tweets and<br />
generally on social media, and continues to expand<br />
close collaboration with industry partners, concentrating<br />
on customer care, event detection and sentiment<br />
tracking scenarios. This strand of research focuses on
text classification and annotation, and holds that texts
have infinitely many uses, each of which calls for its own
classification decisions. No single form of
annotation has maximal useful impact across all actual
or potential uses. It is frequently useful to annotate
syntax, semantics and pragmatic function; however,
not every application that needs syntactic labels, for
example, will benefit from the same class of labels or the
same level of detail within a class
(sometimes LFG c-structure with f-structure annotation
is necessary; sometimes part-of-speech tagging of lexical
stems alone suffices). ILT3 research has addressed
document, sentence and sub-sentential classification
problems in syntax, semantics and pragmatics.<br />
Text classification is a core technology in <strong>CNGL</strong>: it is the
subject of basic research on extending classification
methods and is applied in various contexts, including
domain tuning and translation quality assessment. The
ILT3 team performs text classification from
the perspective of linguistic theory, testing theories of
language use in conjunction with other strands of ILT
and <strong>CNGL</strong>. Tools developed by Dr. Carl Vogel’s team at<br />
TCD for <strong>CNGL</strong> external purposes and deployed within<br />
our demonstrator activities have formed the basis of two<br />
Invention Disclosures, one collaborative with Mr. Phil<br />
Ritchie of VistaTEC and Dr. David Lewis (<strong>CNGL</strong> SF2).<br />
The IP disclosures have culminated in both a Patent<br />
Application with the European Patent Office (application<br />
no. 11169673.8-1527) “Data processing system and<br />
method for assessing quality of a translation” and a<br />
Commercial Licence of this intellectual property to<br />
Digital Linguistics. This work has been developed further.
First, we have compared supervised and less-supervised
classification methods for the task of quality
estimation (Moreau and Vogel, <strong>2012</strong>; Moreau and Vogel,
under review) in order to identify the parameters
that determine which method is preferable. Second, we have
successfully deployed the same method to select
items for training MT engines on the basis of similarity
between potential training items and the intended
material for translation. This work is a collaboration
with the DCU MT team and is being prepared
for formal peer review. Third, we have studied
baselines for the automated processing of texts produced
by language learners to identify particular
error types, such as errors in preposition use (Lynch et al.,
<strong>2012</strong>). Finally, we have used automatically discoverable
features of texts to analyse potential translations, achieving
approximately 80% accuracy not only on the binary
problem of deciding whether a text is a
translation or was originally written in English, but also
on identifying the language from which the
text was translated (Lynch and Vogel, <strong>2012</strong>). In this case,
the texts were not learner texts but professional literary
translations.<br />
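The selection of MT training items by similarity to the material to be translated can be illustrated with a minimal sketch. It assumes a plain bag-of-words representation and cosine similarity, which are stand-ins chosen only for illustration: the report does not specify the features or similarity measure used in the TCD/DCU work, and the function names below are hypothetical.

```python
import math
from collections import Counter

# Hypothetical sketch: bag-of-words cosine ranking, not the CNGL classification method.

def bow(text):
    """Lower-cased bag-of-words counts for a piece of text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def select_training_items(candidates, target_texts, k):
    """Return the k candidate sentences most similar to the material to be translated."""
    target = Counter()
    for text in target_texts:
        target.update(bow(text))  # pool the target material into a single profile
    ranked = sorted(candidates, key=lambda c: cosine(bow(c), target), reverse=True)
    return ranked[:k]
```

For example, given the target text "printer driver error after update", the candidates "the printer driver failed to install" and "reinstall the driver and restart the printer" are ranked above "the weather is sunny today", which shares no words with the target.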
Additional basic advances in text classification methods
have been made in relation to the structural analysis of the
sentences that make up texts, and to subsequent computation
over the trees that model this structure.
Emms (<strong>2012</strong>) explored stochastic tree distances and
their training with expectation-maximisation. Emms and
Franco Penya (<strong>2012</strong>a, <strong>2012</strong>b) identify empirical and
analytical differences between the tree distance and
similarity metrics proposed in the literature.
Emms and Franco Penya (<strong>2012</strong>c) demonstrate how
mappings between trees can be used to identify
the fillers of the semantic roles of predicates.<br />
Multi-word expressions (MWEs) are challenging in<br />
text analytics and automatic translation. In the case of<br />
non-compositional MWEs, the meaning of the MWE
is not simply a combination of the meanings of its constituent
parts. This has a strong impact on translation, for
example. <strong>CNGL</strong> research (Maldonado-Guerra, 2011)
on automatically assessing the compositionality of
meaning in fixed-word expressions (collocations) has
been productive: the system exploits the intuition that<br />
a highly compositional collocation would tend to have<br />
a considerable semantic overlap with its constituents,<br />
whereas a collocation with low compositionality would<br />
share little semantic content with its constituents. This<br />
intuition is operationalised via three configurations that<br />
exploit cosine similarity measures to detect the semantic<br />
overlap between the collocation and its constituents.<br />
The system performed competitively in its evaluation. There
are first-order and second-order approaches to vector
encodings of word meanings. Maldonado-Guerra and
Emms (<strong>2012</strong>) consider these systematically, introducing
a matrix multiplication perspective on the second-order
construction, and exploring both the geometry it induces
and its performance on supervised and unsupervised
word sense disambiguation and discrimination tasks.
Motivated in part by the matrix multiplication perspective,
work has also been carried out on a variety of matrix
consolidation and dimensionality reduction techniques.<br />
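The compositionality intuition and the matrix multiplication view of second-order vectors can both be sketched with a small co-occurrence matrix. This is a minimal illustration under stated assumptions: the window-based counts, the use of `C @ C` for second-order vectors and the mean-cosine score are simplifications chosen here, not the three configurations of Maldonado-Guerra (2011) or the exact construction of Maldonado-Guerra and Emms (<strong>2012</strong>), and all function names are hypothetical.

```python
import numpy as np

def cooccurrence_matrix(sentences, vocab, window=2):
    """First-order vectors: C[i, j] counts how often vocab[j] occurs
    within `window` tokens of vocab[i]."""
    index = {w: i for i, w in enumerate(vocab)}
    C = np.zeros((len(vocab), len(vocab)))
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            if word not in index:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] in index:
                    C[index[word], index[tokens[j]]] += 1
    return C

def second_order(C):
    """Second-order vectors as a matrix product: row i of C @ C is the sum of the
    first-order vectors of i's neighbours, weighted by co-occurrence counts."""
    return C @ C

def cosine(u, v):
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

def compositionality(M, index, collocation, constituents):
    """Mean cosine between the collocation's vector and each constituent's vector:
    high values suggest a compositional collocation, low values an idiomatic one."""
    row = M[index[collocation]]
    return float(np.mean([cosine(row, M[index[c]]) for c in constituents]))
```

In use, a collocation such as "red tape" would first be joined into a single token (for example `red_tape`) so that it receives its own row, and `compositionality` could then be applied either to the first-order matrix `C` or to `second_order(C)`.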
Ongoing work in ILT3 includes, for example, assessing
whether information about linguistic hedges can be
used as a feature that predicts whether
postings in online fora provided by industry partners come
from individuals who will ultimately be rated as forum
leaders. This is a natural development of our success in<br />
this area using Combinatory Categorial Grammar (CCG)<br />
representations of syntactic structures in combination<br />
with n-grams of sub-lexical (orthography and<br />
morphology) features, as well as sentence-level linguistic<br />
features. This work has been successful (Mamani
Sanchez and Vogel, 2013; Vogel and Mamani Sanchez,
<strong>2012</strong>). First, we have noted that emoticon use is a kind
of social signal: there are significant positive correlations
between the use of positive emoticons and the propensity of
posts to be rated as useful (and ultimately the within-forum
rank of posters), and between the use of negative emoticons
and un-ranked posters (presumably, individuals posting
queries to expert users). Second, we have noted
interacting effects of the use of linguistic hedges such as
epistemic qualifiers (technical forum users who rate posts
appear to prefer hedged responses).<br />
Parsing web data is challenging due to the scale and
variety of the data. To ascertain the current state of the art
with respect to domain adaptation, Google organised a<br />
shared task at the SANCL-<strong>2012</strong> workshop at NAACL-HLT<br />
<strong>2012</strong> (Montreal, Canada). The DCU-Paris 13 parsing team<br />
won the Web-Parsing Challenge and Shared Task (Le<br />
Roux et al., <strong>2012</strong>), using the DCU LORG parser platform<br />
and domain adaptation techniques. Lexical resources<br />
are a crucial ingredient of many LT applications and are<br />
challenging to obtain automatically for highly inflecting<br />
languages such as Arabic. Attia et al. (<strong>2012</strong>a, <strong>2012</strong>b)<br />
show how a combination of finite-state and
machine-learning-based technologies can be used to produce
wide-coverage lexical resources for Modern Standard
Arabic (MSA) using the Arabic Giga-Word corpus<br />
together with data crawled from the Al Jazeera web site,<br />
as well as how spell checking for MSA can be improved.<br />
Sentiment analysis is a key task in many LT applications.<br />
The DCU ILT3 team investigates the granularity of<br />
syntactic information required to improve sentiment<br />
analysis (Tu et al., <strong>2012</strong>).<br />
Collaborations<br />
Collaboration is at the core of <strong>CNGL</strong>, including<br />
close engagement with <strong>CNGL</strong> industry partners,<br />
university-based <strong>CNGL</strong> researchers, and international<br />
collaborators, as well as <strong>CNGL</strong> participation in international
research projects (including EU FP7-funded projects).<br />
Collaboration is also particularly visible in our<br />
demonstrator systems, which draw on and combine<br />
research from the four <strong>CNGL</strong> research tracks focusing on<br />
industry partner needs and requirements.
Some of the ILT collaboration highlights in <strong>2012</strong> include:<br />
} In partnership with DCU and the National Centre for<br />
Language Technology (NCLT), <strong>CNGL</strong> was successful<br />
in a Marie Curie Mobility grant application for the<br />
EXPERT PhD Graduate School (€481K, DCU PI Prof.<br />
Josef van Genabith) with a total of 15 PhD Marie<br />
Curie fellowships (two of them at DCU) and three<br />
postdoctoral researchers. Led by the University of<br />
Wolverhampton (UK), EXPERT focuses on empirical<br />
approaches to (machine) translation, and as part of<br />
their training PhD students will spend time at DCU’s<br />
EXPERT university and industry partners.<br />
} In partnership with DCU and the National Centre for<br />
Language Technology (NCLT), <strong>CNGL</strong> was successful<br />
in an FP7 SA (Support Action) application called<br />
QTLaunchPad (€426K, DCU PI Prof. Josef van<br />
Genabith). Led by DFKI (German Research Centre<br />
for Artificial Intelligence), QTLaunchPad targets High<br />
Quality MT and funds two postdoctoral researcher<br />
positions at DCU.<br />
} In partnership with DCU, the National Centre for
Language Technology and international collaborators,
<strong>CNGL</strong> was successful in attracting €1M funding
as lead partner (DCU Lead PI Dr. Antonio Toral)
in the EU FP7 Abu-MaTran project, which focuses on
enhancing industry-academia cooperation as a key
means of tackling one of Europe’s biggest challenges:
multilinguality.
} <strong>CNGL</strong> and ILT1, in partnership with DCU and the
National Centre for Language Technology, are
continuing their strong engagement in the
EU FP7 Machine Translation projects PANACEA,
CoSyne, PLuTO and MultilingualWeb-LT, as well
as in the META-NET/T4ME Network of Excellence:
The CoSyne project focuses on multilingual
content synchronisation for wikis. The project
is led by the University of Amsterdam. DCU’s
involvement centres on diagnostic, linguistically based
evaluation of MT systems between multiple
European languages.
The PANACEA project aims to develop a platform
for automatic, normalised annotation and cost-effective
acquisition of language resources
for human language technologies centred on
interoperable web services. The Universitat
Pompeu Fabra (Spain) is co-ordinating the STREP
project. DCU’s role centres on developing word- and
phrase-aligned data resources (including bilingual
dictionaries and transfer grammars) from the
acquired parallel corpora and using these data to
build MT systems.
META-NET aims to mobilise and build a network
between various language technology research
groups within Europe, including commercial
providers of applications and services and other
relevant stakeholders. DCU is heavily involved
in dissemination activities, in organising
workshops and in providing data sets and
annotations for applying machine learning
techniques to MT system combination. In
this way, the project hopes to bridge the gap
between the machine learning community and
the MT research community. The network is led
by DFKI (Germany). <strong>CNGL</strong> industry partners
DNP, Microsoft, Symantec and Applied Language
Solutions are members of META-NET.
Pictured at the launch of the META-NET White Paper The Irish
Language in the Digital Age are its authors, including Prof. Ailbhe
Ní Chasaide of <strong>CNGL</strong> (second from right), with Mr. Dinny McGinley T.D.,
Minister of State for the Gaeltacht (centre)
The PLuTO (Patent Translations Online) project is
a PSP project focused on delivering a solution for
online patent translation, including the use of MT
and TM technologies tuned to the patent domain.
This project is co-ordinated by DCU, which also leads
research and development of patent-tuned
MT systems for multiple languages.
<strong>CNGL</strong>, represented jointly by ILT1, LOC
and SF2 (TCD, DCU and UL), together
with <strong>CNGL</strong> industry partners Microsoft and
VistaTEC, was successful in attracting funding
for the EU FP7 Support Action MultilingualWeb-LT.
Coordinated by Prof. Felix Sasaki (DFKI, Germany),
this international consortium of senior
representatives of the translation industry and of
standards bodies supports research on the
interoperability of language technologies on the
web by defining new metadata standards.<br />
} Funding from DCU’s Ireland-India Fund and the
Government of India’s India-Ireland Cooperative
Science Programme is facilitating collaboration
between <strong>CNGL</strong> and IIIT Hyderabad on English-Indian
translation systems, and has enabled DCU to co-organise
a COLING-<strong>2012</strong> workshop on Machine
Translation and Parsing in Indian Languages
(MTPIL-<strong>2012</strong>) in Mumbai, India.<br />
} ILT teams have forged extensive international<br />
research collaborations and have published widely<br />
with colleagues from prestigious institutions in many<br />
countries, including China, USA, Czech Republic,<br />
Germany, Spain, UAE, Belgium, Italy and Hungary.<br />
} Dr. Carl Vogel has been increasingly engaged in the
EU COST Action IS1004, WebDataNet, which concerns
conducting iScience that exploits the opportunities arising
from Internet access to raw data and to research
participants.<br />
} <strong>CNGL</strong> demonstrator systems are combining research<br />
teams across <strong>CNGL</strong> tracks, partner universities and<br />
industry partners:<br />
KantanMT – Moses on the Cloud: involves close<br />
collaboration between ILT and <strong>CNGL</strong> spinout<br />
Xcelerator Machine Translations Ltd.<br />
PLuTO – Facilitating Patent Search with Machine<br />
Translation: involves active collaboration with the<br />
PLuTO FP7 project at DCU<br />
Rapid MT Retraining: involves tight collaboration<br />
between ILT, SF, the FP7 PANACEA project at DCU,<br />
and the Multilingual Web-LT project<br />
WebWOZ – A Wizard of Oz Platform: involves<br />
close collaboration between ILT and SF<br />
The <strong>CNGL</strong> Demonstrators Programme has<br />
promoted strong collaboration between ILT1 and<br />
ILT2 researchers in the demonstrator Personalising<br />
Speech for Interpersonal Communication<br />
(MySpeech). One highlight of this collaboration<br />
was to use the Wizard-of-Oz framework to conduct<br />
a preliminary evaluation of the MySpeech system<br />
for foreign-language pronunciation training
(Cabral et al., <strong>2012</strong>).<br />
} ILT3 has collaborated closely with DCM, ILT1, SF and
<strong>CNGL</strong> industry partners (particularly Symantec and<br />
VistaTEC) and affiliates (Digital Linguistics) on text<br />
classification for particular applications.<br />
} ILT1 and ILT3 have been collaborating closely with<br />
researchers at the National Centre for Language<br />
Technology (NCLT) on using the LFG AA output to<br />
improve MT evaluation, extending the German LFG<br />
AA feature set to improve parsing the German side<br />
of the EuroParl data, improving the LFG-inspired<br />
constituency to dependency conversion, integrating<br />
multi-word expressions in the LFG AA, integrating<br />
MWEs into constituency parsing, and tuning a number<br />
of statistical parsing architectures to user-generated<br />
data (including Twitter data and user forum data).<br />
} ILT1 and ILT3, in collaboration with the National<br />
Centre for Language Technology (NCLT), are<br />
continuing their close research cooperation with<br />
<strong>CNGL</strong> industry partner Symantec on tuning MT and<br />
text analytics technologies to analyse user-generated<br />
content: in addition to the existing collaboration<br />
(Pratyush Banerjee, PhD student with ILT1), Symantec
is funding research on tuning language technologies
to user-generated text in partnership with IRCSET
(Irish Research Council for Science, Technology and
Engineering) through a project, led by Dr. Jennifer Foster,
involving one PhD student and one postdoctoral researcher.<br />
} ILT3 (Dr. Carl Vogel) is continuing collaborations<br />
with VistaTEC and Digital Linguistics, including<br />
preparations for joint publications. Engagement
with Microsoft has commenced and will lead in 2013 to the
joint development of text classification methods and
tools for detecting offensive content in user fora
(both linguistic and non-linguistic content).<br />
Text Classification for Bulk Localisation Review:<br />
involves active collaboration between ILT3, SF2<br />
and industry partner VistaTEC.
} <strong>CNGL</strong>, through the DCU MT team in ILT1, is<br />
continuing to work with Prof. Mikel Forcada from<br />
Universitat d’Alacant, Spain, following the success of
his Walton Fellowship at <strong>CNGL</strong> in 2010. Prof. Forcada<br />
has continued to work with PhD student Sandipan<br />
Dandapat and visited <strong>CNGL</strong> again in <strong>2012</strong>. Prof. Mikel<br />
Forcada and Prof. Khalil Sima’an from the University<br />
of Amsterdam partner DCU in an EU FP7 application<br />
DELIQAT in the area of High Quality MT.<br />
Year 5 also saw the arrival of PhD students who joined<br />
ILT2. Christoph Wendler and Maria O’Reilly joined the<br />
<strong>CNGL</strong> speech group at TCD in April and May <strong>2012</strong><br />
respectively.<br />
People<br />
Year 5 has been a very dynamic year in terms of arrivals<br />
and departures. Following Prof. Andy Way’s departure to an
industry appointment in June 2011, and while a new Professor
of Machine Translation was being recruited, Prof. Josef
van Genabith assumed the position
of ILT co-track leader (along with Prof. Nick Campbell
of TCD) as an interim arrangement, in addition to his
roles as <strong>CNGL</strong> Director and ILT1 lead.<br />
Prof. Qun Liu has joined DCU, <strong>CNGL</strong> and the NCLT as<br />
Professor of Machine Translation and leader of the MT<br />
group. Prof. Liu was the Director of the Natural Language<br />
Processing Research Group in the Institute of Computing<br />
Technology at the Chinese Academy of Sciences (CAS)<br />
in Beijing. He has over 150 research publications and his<br />
work is widely cited internationally. He has produced<br />
ground-breaking research in many aspects of statistical and<br />
rule-based machine translation as well as in Chinese word<br />
segmentation and NLP. He has successfully led a large<br />
number of research projects at CAS. His research interests
span Chinese Natural Language Processing, Machine
Translation and Information Extraction. Prof. Liu has
quickly become embedded in <strong>CNGL</strong> and made key contributions
to an EU FP7 application currently under review.<br />
Some of the 11 visiting MSc and PhD scholars who worked with ILT<br />
during <strong>2012</strong> under <strong>CNGL</strong>’s postgraduate internship programme<br />
Eleven visiting MSc and PhD interns joined ILT over five<br />
months in <strong>2012</strong>, under <strong>CNGL</strong>’s postgraduate internship<br />
programme. The programme enables students to gain<br />
valuable experience as part of a highly-regarded and<br />
continually-growing research centre. This year’s<br />
programme attracted interns from institutions across the<br />
globe, including Italy, France, China and India. The<br />
internships covered a wide range of topics in Natural<br />
Language Processing and Machine Translation.<br />
Dr. Ergun Biçici joined <strong>CNGL</strong>, NCLT and the DCU MT<br />
team as a postdoctoral researcher from Koç University<br />
(Turkey) and is working on regression-based approaches<br />
for MT and parse quality estimation. Dr. Biçici has a<br />
strong background in machine learning and is<br />
contributing key expertise to the <strong>CNGL</strong> research teams.<br />
Dr. Ingmar Steiner joined the ILT2 group at UCD in June<br />
<strong>2012</strong> and worked jointly with the Speech Communication<br />
group at TCD. In December <strong>2012</strong>, he moved to
the Computational Linguistics and Phonetics department<br />
at DFKI, Saarbrücken, Germany, as a senior researcher, to<br />
set up an Independent Research Group.<br />
Prof. Qun Liu joined <strong>CNGL</strong> at DCU as Professor of Machine Translation<br />
in <strong>2012</strong>
A number of ILT researchers moved on to roles at other<br />
academic institutions or transitioned to industry during<br />
<strong>2012</strong>.<br />
Achievements<br />
Postdoctoral researcher Dr. Junhui Li (ILT1 DCU) took up<br />
a position as postdoctoral researcher at the University of<br />
Maryland (USA), where he is continuing his research on<br />
machine translation.<br />
Dr. Pavel Pecina (ILT1 DCU) accepted a position as
Associate Professor in Machine Translation at
Charles University in Prague (Czech Republic).<br />
Dr. Yifan He (former <strong>CNGL</strong> ILT1 PhD student and<br />
MT postdoctoral researcher) accepted a postdoctoral
researcher position at New York University (USA).<br />
Hala Almaghout, Sandipan Dandapat, Pratyush<br />
Banerjee, Ankit Srivastava, Stephen Doherty and Irena<br />
Yanushevskaya successfully defended their PhD theses in
<strong>2012</strong>.<br />
Dr. Sandipan Dandapat has taken up a lecturing position<br />
at IIT-Guwahati, Assam, India.<br />
After a dedicated contribution to <strong>CNGL</strong> over the past<br />
five years, Dr. Peter Cahill departed the ILT2 group to<br />
become engaged full-time in his spin-out company,<br />
Scream Technologies. His start-up company develops<br />
speech synthesis technology products which have<br />
valuable applications in areas as diverse as video games,<br />
customer support and advertising.<br />
John Kane from ILT2 submitted his thesis in September
<strong>2012</strong> and is awaiting his defence. In the meantime, he
left <strong>CNGL</strong> in October to take up a research
position with the Fastnet project at TCD. PhD
fellow Amelie Dorn left TCD in November <strong>2012</strong>.<br />
Stephen Doherty was one of six ILT1 doctoral students to successfully<br />
defend their PhD theses during <strong>2012</strong><br />
Awards and Prizes<br />
} Prof. Josef van Genabith was recipient of the<br />
DCU President’s Research Award for Science and<br />
Engineering <strong>2012</strong>.<br />
} Prof. Carl Vogel and Liliana Mamani Sanchez (TCD)<br />
were awarded a best paper prize for their work<br />
“Epistemic Signals and Emoticons Affect Kudos”<br />
at the 3rd IEEE International Conference on Cognitive
Infocommunications in December <strong>2012</strong>.<br />
} Dr. Martin Emms and Hector Franco-Penya (TCD)<br />
were recipients of a best paper award at the<br />
International Conference on Pattern Recognition<br />
Applications and Methods (ICPRAM <strong>2012</strong>) in February
<strong>2012</strong>.<br />
} The DCU-Paris 13 team won the Web-Parsing<br />
Challenge and Shared Task organised by Google as<br />
part of SANCL-<strong>2012</strong> at NAACL-HLT <strong>2012</strong> (Le Roux,<br />
Foster, Wagner, Kaljahi and Bryl <strong>2012</strong>), using the<br />
DCU LORG parser platform and domain adaptation<br />
techniques.<br />
} Prof. Josef van Genabith (<strong>CNGL</strong>/NCLT/DCU) has<br />
been appointed as general chair of COLING 2014, to<br />
be held in Dublin in August 2014.<br />
UCD hosts Innovation and Applications in Speech Technology (IAST)<br />
Workshop in March
Prof. Josef van Genabith delivers his address before accepting the DCU<br />
President’s Research Award for Science and Engineering in February<br />
International Collaborations<br />
The EU FP7 projects are continuing on track, with<br />
PANACEA, PLuTO, CoSyne and T4ME/META-NET<br />
successfully passing their second-year reviews. Work
carried out within the PLuTO project on patent-language
MT has gained a significant amount of
commercial interest and press coverage. EU FP7 Support<br />
Action MultilingualWeb-LT is running at full strength. The<br />
new EU FP7 support action (SA) project QTLaunchPad<br />
commenced in June <strong>2012</strong> and recruitment for the new<br />
Marie Curie EXPERT PhD programme is under way.<br />
Industry Engagement<br />
During <strong>2012</strong>, considerable effort was devoted to
exploring avenues for commercialisation of ILT research<br />
and on developing industrially-relevant prototype and<br />
proof-of-concept systems. In turn, there has been a<br />
significant increase in commercial interest in the research<br />
we are carrying out in <strong>CNGL</strong>.<br />
Technology Innovation Development Award (TIDA)<br />
projects are funded by Science Foundation Ireland to<br />
support the transition of basic research outputs from the<br />
lab to industrial applications, primarily through industry<br />
strength implementations and road-testing in commercial<br />
environments.<br />
ILT1 (MT) has been successful in attracting funding for<br />
two TIDA projects. TMTPrime (Machine Translation<br />
and Translation Memory Integration in a Localisation<br />
Workflow, Dr. Declan Groves, DCU) started in mid-<br />
<strong>2012</strong> and is focusing on developing an industry-strength<br />
application to optimally combine the outputs of Machine<br />
Translation (MT) systems with Translation Memory (TM)<br />
(fuzzy) matches, based on <strong>CNGL</strong> ILT1 basic research<br />
reported in (He et al., 2010). The technology uses<br />
translation quality prediction to recommend either MT<br />
or TM output based on estimated post-editing effort.<br />
The project is particularly important as TMs are still the<br />
mainstay technology in many localisation operations
and pricing models are based on TM reuse. TMTPrime<br />
technology guarantees that the MT/TM combination<br />
will have TM-based pricing as an upper bound, with<br />
potentially substantial savings through the use of MT.<br />
Project partners include DCU, Symantec, VistaTEC<br />
and Welocalize. The second ILT1 TIDA (Dr. Antonio<br />
Toral, DCU) focuses on Iterative Retraining of an MT<br />
System with Post-Edits. This is particularly important as
corrections made to MT output by professional human
translators should be fed back to the MT system as
additional training material as quickly as possible, to
prevent similar mistakes in future. Two challenges
need to be overcome: (i) full retraining of a statistical MT
system is time-consuming and computationally expensive,
and (ii) post-edits generally constitute a small amount of
additional data, unlikely to sway a large statistical
MT model. Both challenges are addressed in the TIDA,<br />
partly based on previous <strong>CNGL</strong> ILT1 basic research<br />
reported in Banerjee et al. (<strong>2012</strong>). Recruitment for the<br />
Retraining TIDA is under way.<br />
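The MT/TM recommendation step and the pricing guarantee described for TMTPrime can be sketched as follows, assuming that a quality-estimation model (not shown) has already produced a predicted post-editing effort for both the MT output and the best TM fuzzy match of each segment. The effort scale, the `Segment` type and the effort-proportional pricing model are hypothetical illustrations, not TMTPrime's actual interfaces or the method of He et al. (2010).

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """Predicted post-editing effort for one source segment (hypothetical 0..1 scale)."""
    mt_effort: float  # effort predicted for the MT output
    tm_effort: float  # effort predicted for the best TM fuzzy match

def recommend(seg: Segment) -> str:
    """Recommend MT only when it is predicted to need less post-editing than the TM match."""
    return "MT" if seg.mt_effort < seg.tm_effort else "TM"

def combined_cost(segments, rate):
    """Hypothetical pricing: each segment costs `rate` times the predicted
    effort of whichever output was recommended."""
    return sum(rate * (s.mt_effort if recommend(s) == "MT" else s.tm_effort)
               for s in segments)

def tm_only_cost(segments, rate):
    """Cost if every segment were post-edited from its TM match."""
    return sum(rate * s.tm_effort for s in segments)
```

Because each segment is routed to whichever output has the lower predicted effort, the combined cost under this pricing model can never exceed the TM-only cost, which mirrors the upper-bound property described above; in practice the saving is only as reliable as the effort predictions themselves.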
Parts of <strong>CNGL</strong>’s MT technology have been successfully<br />
licensed for evaluation to a new spin-out company,<br />
Xcelerator Machine Translations Ltd. Founded by Tony<br />
O’Dowd, previously CEO of <strong>CNGL</strong> industry partner<br />
Alchemy, Xcelerator provides cloud-based MT solutions<br />
to individual translators and mid-sized localisation<br />
service providers through its KantanMT cloud-based<br />
MT platform. The company’s vision is to make machine<br />
translation simple to use for everyone.
Centre for Next Generation Localisation <strong>Annual</strong> <strong>Report</strong> <strong>2012</strong> 41<br />
A former full-time academic with <strong>CNGL</strong> ILT2 at UCD,<br />
Dr. Peter Cahill headed up the <strong>CNGL</strong> spin-out company,<br />
Scream Technologies. It specialises in creating synthetic<br />
voices from human actors, enabling companies to create<br />
human-sounding synthetic speech and control how<br />
it sounds. Dr. Peter Cahill has been named as one of<br />
Ireland’s top technology and start-up leaders.<br />
Continued collaborations between ILT3 and VistaTEC,<br />
Digital Linguistics and Symantec are planned. 2013 will<br />
also deliver direct engagement with Microsoft, through<br />
the deployment of text classification tools developed in<br />
<strong>CNGL</strong>I in the context of <strong>CNGL</strong>II, particularly in the area<br />
of classifying offensive content (this will address both<br />
linguistic and non-linguistic content).<br />
Plans for 2013<br />
For ILT, <strong>2012</strong>, <strong>CNGL</strong> Year 5, was dominated by high<br />
research and publication output, a large number of ILT<br />
PhD students completing, strong industry engagement<br />
and extensive preparations for the second-cycle <strong>CNGL</strong><br />
(<strong>CNGL</strong>II) application and site-review.<br />
ILT technologies will be spread into three key <strong>CNGL</strong>II<br />
themes supporting the Global Content Value Chain-based
architecture of <strong>CNGL</strong>II: ILT3 (Text Analytics) will<br />
move into the <strong>CNGL</strong>II Curation theme, ILT1 (MT) will<br />
move into the Translation and Localisation Theme, while<br />
ILT2 (Speech) will move to the Delivery and Interaction<br />
theme.<br />
2013 will see the completion of a number of ILT-affiliated<br />
EU FP7 projects including the CoSyne and Panacea<br />
STREPs, the META-NET/T4ME Network of Excellence,<br />
and the PLuTO Public Private Partnership, all with key<br />
involvement and successful contributions from project<br />
partner DCU.<br />
At the same time, the ILT-affiliated EU FP7 Support<br />
Action QTLaunchPad will be under full steam in 2013.<br />
QTLaunchPad is charged with developing research and
innovation scenarios including community mobilisation<br />
and technology support for shared tasks in the area<br />
of high-quality machine translation, focusing on novel<br />
quality metrics, quality estimation and targeting specific<br />
MT quality barriers. QTLaunchPad partner DCU is<br />
contributing key expertise. Likewise, the prestigious<br />
EXPERT EU Marie Curie PhD graduate school and<br />
mobility programme was launched at the end of <strong>2012</strong><br />
and PhD candidates will start in early 2013. EXPERT<br />
partner DCU will host 2 PhD students working on<br />
MT system combination and human-centric aspects<br />
of MT technology development. The EU FP7-funded<br />
MultilingualWeb-LT support action involves <strong>CNGL</strong><br />
partners TCD, DCU, UL, Microsoft and VistaTEC,<br />
and continues to focus on developing important<br />
standards and interoperability for multilingual content<br />
management. The EU FP7 Abu-MaTran project (Dr.<br />
Antonio Toral) will tackle the multilingualism challenge<br />
through an Industry-Academia partnership.<br />
The first <strong>CNGL</strong> funding cycle is going into a non-costed<br />
extension phase (December <strong>2012</strong> – November 2013),<br />
completing a small number of <strong>CNGL</strong> research and PhD<br />
projects and preparing and supporting the transition to<br />
<strong>CNGL</strong>II.<br />
ILT1: Machine Translation<br />
Prof. Qun Liu has fully taken charge of the DCU MT
Group and will drive cooperation with research partners,
in particular at the Chinese Academy of Sciences, as well
as explore commercial opportunities in the area of
localisation with Chinese industry partners.
Walid Aransa (LIUM, France), Luong Ngoc Quang (LIG, France),<br />
Dr. Antonio Toral (DCU) pictured at the MT Marathon <strong>2012</strong> in<br />
Edinburgh in September
INTEGRATED LANGUAGE TECHNOLOGIES<br />
Commercial activities will remain a strong focus for
ILT1 in 2013, in particular extended collaborations with
CNGL start-up company Xcelerator and CNGL industry
partners Welocalize and Symantec.
The TMTPrime TIDA (Dr. Declan Groves) has produced<br />
mature TM/MT combination technologies based on<br />
automatic quality prediction. TMTPrime technologies<br />
will be showcased at global localisation industry events
including GALA (Miami, 2013). The second TIDA on<br />
efficient MT retraining technologies (Dr. Antonio Toral)<br />
will provide new opportunities to immediately use user<br />
feedback (such as post-editing corrections) to improve<br />
MT.<br />
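The per-segment recommendation idea behind TMTPrime (use whichever of the TM fuzzy match or the MT output is predicted to need less post-editing) can be sketched as follows. This is a hypothetical illustration: the effort estimates are passed in as plain numbers, whereas in the approach reported in (He et al., 2010) they come from a trained quality-prediction model that is not shown here, and `margin` is an assumed tuning parameter that biases the choice towards the TM. Because MT is only chosen when its predicted effort is strictly lower, the combined predicted effort can never exceed the TM-only total, which mirrors the TM-pricing upper bound described earlier, assuming cost is proportional to predicted effort.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    source: str
    tm_effort: float  # predicted post-editing effort for the TM fuzzy match
    mt_effort: float  # predicted post-editing effort for the MT output


def recommend(seg: Segment, margin: float = 0.0) -> str:
    """Recommend MT only if it is predicted to beat the TM match by more than `margin`.

    Keep margin >= 0: that is what guarantees MT is never chosen when it is
    predicted to be worse, and hence that TM-only effort is an upper bound.
    """
    return "MT" if seg.mt_effort + margin < seg.tm_effort else "TM"


def combined_effort(segments, margin: float = 0.0) -> float:
    """Total predicted effort when each segment uses the recommended output."""
    return sum(
        s.mt_effort if recommend(s, margin) == "MT" else s.tm_effort
        for s in segments
    )


def tm_only_effort(segments) -> float:
    return sum(s.tm_effort for s in segments)


# Hypothetical segments with made-up effort predictions.
segments = [
    Segment("Click Save.", tm_effort=0.2, mt_effort=0.5),
    Segment("Restart the backup service.", tm_effort=0.6, mt_effort=0.3),
    Segment("Scan complete.", tm_effort=0.4, mt_effort=0.4),
]
print([recommend(s) for s in segments])  # ['TM', 'MT', 'TM']
print(combined_effort(segments), tm_only_effort(segments))
```

Ties go to the TM here, a conservative choice given that pricing models are based on TM reuse.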
ILT2: Speech<br />
ILT2 will see the completion of on-going PhD theses<br />
on prosody, speech-to-speech translation and emotive<br />
speech. Due to Dr. Peter Cahill’s (UCD) departure in<br />
order to lead the Scream Technologies <strong>CNGL</strong> spin-out<br />
company, the remaining UCD speech group (Dr. Joao<br />
Cabral) will transition to Prof. Nick Campbell’s Delivery<br />
and Interaction theme at TCD early in 2013.<br />
ILT3: Text Analytics<br />
ILT3 will complete the documentation of results from
using ILT3 text-classification methods to select
suitable items for training MT systems in settings
where little directly appropriate training material is otherwise available.
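The report does not say which classifier ILT3 used to select MT training material. Purely as a hypothetical illustration of classification-based data selection, the sketch below trains two add-one smoothed unigram models (a two-class Naive Bayes classifier with equal priors) on a few in-domain and general-domain sentences, then keeps the candidate sentences with the highest log-likelihood ratio in favour of the in-domain class. All sentences, function names and the `keep` parameter are assumptions.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Unigram counts and total token count over lower-cased, whitespace-split sentences."""
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    return counts, sum(counts.values())


def log_likelihood(sentence, model, vocab_size):
    """Add-one smoothed log P(sentence) under a unigram model."""
    counts, total = model
    return sum(
        math.log((counts[tok] + 1) / (total + vocab_size))
        for tok in sentence.lower().split()
    )


def select_in_domain(candidates, in_domain, general, keep):
    """Keep the `keep` candidates with the highest
    log P(s | in-domain) - log P(s | general), i.e. the Naive Bayes score."""
    in_model, gen_model = train_unigram(in_domain), train_unigram(general)
    vocab = set(in_model[0]) | set(gen_model[0])
    vocab |= {tok for s in candidates for tok in s.lower().split()}

    def score(sentence):
        return (log_likelihood(sentence, in_model, len(vocab))
                - log_likelihood(sentence, gen_model, len(vocab)))

    return sorted(candidates, key=score, reverse=True)[:keep]


# Hypothetical in-domain (software support) and general-domain samples.
in_domain = ["restart the backup service",
             "the antivirus scan found a threat",
             "update the virus definitions"]
general = ["the match ended in a draw",
           "she cooked dinner for the family",
           "the weather is sunny today"]
candidates = ["run a full virus scan",
              "the team won the match",
              "restart the service after the update"]
print(select_in_domain(candidates, in_domain, general, keep=2))
# ['restart the service after the update', 'run a full virus scan']
```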
The analysis of epistemic markers and social signals<br />
in expert forum contexts has shown promise. ILT3 will<br />
continue to develop these analyses and seek additional<br />
ways to fund further follow-on study, including through<br />
exploitation of the methods developed and conclusions
drawn from the study of linguistic and pragmatic<br />
behaviours in the Symantec user forum.<br />
Participation in text classification tasks is already planned
in the areas of spotting predatory contributions in social
networks and other authorship attribution exercises, as part
of an upcoming CLEF shared task.
Work on text classification methods extends into <strong>CNGL</strong>II,<br />
with collaborations planned with VistaTEC, Digital<br />
Linguistics, Symantec and Microsoft.
Digital Content Management
Strand Name: Digital Content Management
AREA CO-ORDINATOR: PROF. VINCENT WADE
Industrial Collaborators
Dr. Fred Hollowood Symantec
Dr. Johann Roturier Symantec
Mr. Jason Rickard Symantec
Mr. Dag Schmidtke Microsoft
Dr. Alexander Troussov IBM
Mr. Takeshi Fukunaga Dai Nippon Printing
Mr. Hideyuki Suzuki Dai Nippon Printing
International Collaborators
Prof. Helen Ashman University of Southern Australia
Dr. Prasenjit Majumder DAIICT, Gandhinagar, India
Participant Names and Affiliation
Faculty<br />
Dr. Owen Conlan Trinity College Dublin DCM3<br />
Dr. Gareth Jones Dublin City University DCM1 Workpackage Leader<br />
Prof. Declan O’Sullivan Trinity College Dublin DCM2<br />
Dr. Claus Pahl Dublin City University DCM2<br />
Ms. Mary Sharp Trinity College Dublin DCM3<br />
Dr. Tony Veale University College Dublin DCM2 Workpackage Leader<br />
Prof. Vincent Wade Trinity College Dublin DCM3 Workpackage Leader<br />
Postdoctoral Researchers<br />
Dr. Declan Dagger Trinity College Dublin DCM3<br />
Dr. Yanfen Hao University College Dublin DCM2<br />
Prof. Séamus Lawless Trinity College Dublin DCM3<br />
Dr. Johannes Leveling Dublin City University DCM1<br />
Dr. Alexander O’Connor Trinity College Dublin DCM2<br />
Mr. Ian O’Keeffe Trinity College Dublin DCM3<br />
Dr. Melike Sah Trinity College Dublin DCM2<br />
Dr. Dong Zhou Trinity College Dublin DCM1
DIGITAL CONTENT MANAGEMENT<br />
PhD Students<br />
Mr. Yalemisew Mintesinot Abgaz Dublin City University DCM2<br />
Ms. Yi Chen Dublin City University DCM3<br />
Mr. Mourad El Moueddeb University College Dublin DCM2<br />
Ms. Bo Fu Trinity College Dublin DCM2<br />
Mr. Debasis Ganguly Dublin City University DCM3<br />
Mr. Mohammed Rami Ghorab Trinity College Dublin DCM1<br />
Mr. Brendan Spillane Trinity College Dublin DCM2<br />
Mr. Muhammad Javed Dublin City University DCM2<br />
Mr. Kevin Koidl Trinity College Dublin DCM3<br />
Mr. Killian Levacher Trinity College Dublin DCM2<br />
Mr. Guofu Li University College Dublin DCM2<br />
Ms. Wei Li Dublin City University DCM2<br />
Ms. Alejandra López Fernández University College Dublin DCM2<br />
Mr. Walid Magdy Dublin City University DCM1<br />
Mr. Jinming Min Dublin City University DCM1<br />
Ms. Catherine Mulwa Trinity College Dublin DCM3<br />
Mr. Neil Peirce Trinity College Dublin DCM3<br />
Mr. Ben Steichen Trinity College Dublin DCM3<br />
Research Assistants<br />
Mr. David Foley Trinity College Dublin DCM3<br />
Mr. Brian Gallagher Trinity College Dublin DCM3<br />
Ms. Yang Yang Trinity College Dublin DCM3<br />
Funding<br />
<strong>2012</strong> Funding from SFI<br />
<strong>CNGL</strong> (07/CE/I1142): €680,141<br />
SFI TIDA Award – UNITE: Personalised Cross-site<br />
Personalisation €60,000<br />
<strong>2012</strong> Funding from Other Sources<br />
EC FP7 Cendari TCD €120,000<br />
Enterprise Ireland Learning Technology Centre<br />
€3,000,000<br />
SFI TIDA Award – Linguabox: Automated Open Content<br />
Repurposing Service to support Personalised eLearning<br />
€87,768<br />
SFI TIDA Award – An Integrated Software Suite to<br />
provide Next Generation Personalised Multilingual<br />
Customer Care €67,748
Research Overview: Digital Content Management (DCM)
Goals<br />
The key challenge of the DCM research track is to<br />
provide a step change in multilingual digital content<br />
management to enable the delivery of next generation<br />
localisation². DCM focuses on three areas: (i) user query
enhancement; (ii) content metadata and knowledge<br />
model development; and (iii) adaptive content<br />
retrieval and dynamic composition of localised content<br />
(customised for the user’s needs and context of use).<br />
Because users need to gain access to information across<br />
many content boundaries, the DCM research entails not<br />
just traditional corporate content but also open corpus<br />
content, user-generated content (blogs, discussion fora,
wikis) and social networking interactions (tweets,
postings, shared ‘walls’, location check-ins, etc.). The<br />
DCM research track is divided across three work areas,<br />
called DCM1, DCM2 and DCM3:<br />
• Enhancement of user queries based on user context
information and feedback (DCM1)
• Automation and semi-automation of the generation
of knowledge models, metadata and identification of
sentiment required for digital content management
and personalised (re)composition (DCM2)
• Support for dynamic composition of personalised
digital content, customised for the user’s needs
and context, across such diverse content areas as
corporate content, open corpora, user-generated
content and content generated via social networking
(DCM3)
This research is integrated across the <strong>CNGL</strong> research<br />
tracks via combined prototypes, experiments and the<br />
CNGL Demonstrators. DCM has demonstrated its
ground-breaking technologies within many application
domains, such as Personalised Multilingual Customer
Care, Personalised Multilingual Social Networking, and
Personalised Information and Learning Portals.
Such demonstrator systems allow the DCM research
to illustrate the impact of its technology as well as
demonstrate the benefits of integration with all other
CNGL research tracks. For example, DCM researchers
collaborate with ILT’s experts on multilingual translation,
speech recognition/synthesis for multimodal operation,
and text analysis for enhanced understanding of the
content.
² Next Generation Localisation seeks to enable people to interact with
digital content, products, services and each other, in their own language,
according to their own culture, and according to their own personal needs
and preferences.
Research Barriers and Methodologies<br />
to Address Them<br />
With the increasing volume of digital content and<br />
the diversity of sources from which it is created
(e.g. corporate content, user-generated content,<br />
social networking, community content), it is becoming<br />
impossible to discover, manually annotate, slice and<br />
compose appropriate digital content, rendered in the<br />
language and device suited to the intended users.<br />
In addition, next generation localisation is not just<br />
about corporate localisation but must be adapted to<br />
the individual user’s context, languages, preferences
and means of access. Therefore, next generation<br />
localisation must not only be adaptive to specific<br />
corporate localisation requirements, but also satisfy<br />
the individual user’s need for information by adapting<br />
it to the context, language, preferences and preferred<br />
delivery device of the individual. DCM research in Year<br />
5 focused increasingly on addressing the problems of<br />
dynamic user-generated (multilingual) content as well<br />
as corporate and open web content. This increased<br />
integration of global social media into the DCM research<br />
is a significant development in CNGL research.
The three principal areas of DCM research relate to the<br />
challenges of (i) more accurately identifying and selectively<br />
retrieving appropriate content; (ii) capturing and modelling<br />
knowledge in a structured, reusable way so that the<br />
multilingual, heterogeneous content can be more easily<br />
managed and transformed; and (iii) supporting the user<br />
by harnessing adaptivity/personalisation (based on the<br />
user’s context) to give the user significantly improved<br />
exploration of the information he/she needs. Also<br />
involved in this research is the development of new ways<br />
to evaluate the impact and performance of adaptive<br />
(personalised) systems. A central theme running through<br />
all of these challenges is the need to provide the<br />
information in a form that is tailored to the user’s<br />
requirements, preferences and context, and which<br />
includes not only the direct response to his/her initial<br />
queries, but delivers a unique information presentation<br />
tailored to his/her context, preferences and task.
The approach taken in the DCM research is to enhance<br />
and combine key aspects of Adaptive Hypermedia<br />
(AH) and Information Retrieval (IR) research to provide<br />
techniques, technology and prototype systems to<br />
implement advanced retrieval, slicing and adaptive<br />
composition of multilingual digital content. The DCM1<br />
work package addresses the issues of personalised and<br />
contextualised multilingual IR and, more specifically,<br />
query enhancement. DCM1 research includes the<br />
application of cross-lingual techniques to permit users<br />
to gain access to information which is not in their native<br />
tongues. It also focuses on Personalised IR (PIR) to<br />
incorporate the use of user modelling techniques to<br />
alter the behaviour of IR systems. The approach employs<br />
techniques from IR and AH to produce hybrid Adaptive<br />
IR systems.<br />
The focus of DCM2 is on the metadata and knowledge
models required by systems to provide this more
intelligent behaviour. DCM2 includes work on generating,
managing and linking structured knowledge in the form
of ontologies, content knowledge models and metadata
descriptions. The main focus of this work is on addressing
the shortcomings of current approaches to creating and sharing
metadata between different intelligent systems, slicing
content so that it can be more easily reused and recomposed
(for personalisation), and deriving knowledge models to
determine aspects of the content and user context, e.g.
sentiment.
Finally, DCM3 focuses directly on recomposing and
aggregating content and evaluating the quality and
impact of adaptive systems. A key aspect of this
challenge is the source of the content. DCM3 investigates
the automatic re-composition and aggregation of
content from corporate information repositories,
open documents, user fora and discussion lists, blogs,
shared community content (wikis), social networking
interactions and social media (tweets, postings, shared
‘walls’, etc.). DCM3 focuses on the aggregation and re-composition
of these different forms of digital content to
provide personalised responses for a user.
Although presented separately above, the three Work
Packages are highly integrated. For example, the
metadata models and knowledge models produced
in DCM2 are utilised in DCM3 and DCM1. Also, the
techniques developed in DCM1 for multilingual query
enhancement and Personalised IR are
integrated with adaptive hypermedia composition and
social media aggregation techniques developed within
DCM3.
DCM undergraduate intern Ciarán Porter of Trinity College Dublin
(above right) presents his work on ‘Crowd Sourcing for Query
Development and Relevance Judgement’ at the CNGL undergraduate
intern showcase
Year 5 Progress<br />
DCM research in Year 5 has achieved significant impact<br />
both in the quality of its scientific breakthroughs and the<br />
demonstration of industrial potential. DCM has published<br />
over 30 peer-reviewed journal and international<br />
conference papers this year. Journal publications
include scientific papers in ACM Computing Surveys,<br />
UMUAI, Journal of IR, Web Semantic Journal, while<br />
international conference papers included publications<br />
in ACM Hypertext, SIGIR, AAAI, UMAP, COLING, CIKM,<br />
and TPDL.<br />
Progress in DCM1<br />
The research conducted in DCM1 has continued to<br />
deliver significant advancements in the area of adaptive<br />
information retrieval (IR). These advancements are<br />
achieved by enhancing both the queries a user submits<br />
to a search engine, and the results that are returned.<br />
The techniques developed by DCM1 use contextual
information about individual users and implicit and<br />
explicit feedback to create more accurate or more<br />
appropriate queries, to improve the relevancy of search<br />
results and to tailor the presentation of results to that<br />
individual.
Continued progress has been made in enhancing the<br />
existing Personalised Multilingual IR (PMIR) framework,<br />
which employs a number of algorithms to perform both<br />
query and result adaptation. This work is part of the<br />
DCM1 focus on intelligent content discovery and delivery<br />
which is both multilingual and personalised. The PMIR<br />
framework allows multilingual resource discovery and<br />
delivery using on-the-fly machine translation of user<br />
queries and content. The Microsoft Bing API is used<br />
to perform multilingual searches on the open web.<br />
Result lists are then personalised for the individual user<br />
before being presented. The framework is designed<br />
to enable new components and approaches to be<br />
easily integrated and tested as part of an overall search<br />
process. Development of the framework was completed
in 2012, and the framework has been thoroughly evaluated
in an experiment with real users in an authentic search
setting. This evaluation
demonstrated improvements in multilingual information<br />
retrieval using query and result adaptation based upon a<br />
multilingual user model. This research has been detailed<br />
in the high-impact Journal of User Modeling and User-<br />
Adapted Interaction, UMUAI (Ghorab et al., <strong>2012</strong>a).<br />
The framework has also been successfully showcased<br />
at the 20th International Conference on User Modeling,<br />
Adaptation and Personalization, UMAP <strong>2012</strong>, in<br />
Montréal, Canada (Ghorab et al., <strong>2012</strong>b). An additional<br />
publication has been submitted to the 22nd International
World Wide Web Conference and is currently under
review.
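The result-adaptation step of the PMIR framework (personalising an open-web result list for an individual user) can be pictured with the following minimal sketch. It is not the published PMIR algorithm: it assumes a hypothetical user model given as weighted interest terms, represents each result only by the words of its title and snippet, and blends the engine's original rank with cosine similarity to the user model through an assumed interpolation weight `alpha`. Query translation and the Bing API call are left out.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-weight vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def personalise(results, user_model: Counter, alpha: float = 0.5):
    """Re-rank (title, snippet) results returned in the engine's order.

    score = alpha * 1 / original_rank + (1 - alpha) * cosine(user_model, result terms)
    """
    def score(item):
        rank, (title, snippet) = item
        terms = Counter((title + " " + snippet).lower().split())
        return alpha / rank + (1 - alpha) * cosine(user_model, terms)

    ranked = sorted(enumerate(results, start=1), key=score, reverse=True)
    return [result[0] for _, result in ranked]


# Hypothetical engine results for the query "python" and a hypothetical user model.
results = [
    ("Python tutorial", "learn python programming"),
    ("Python snakes", "python species in the wild"),
    ("Snake care", "how to care for a pet python snake"),
]
user_model = Counter({"snake": 2, "species": 1, "pet": 1})
print(personalise(results, user_model, alpha=0.3))
# ['Snake care', 'Python tutorial', 'Python snakes']
```

With `alpha=1.0` the original engine order is returned unchanged, so the weight controls how far personalisation may override the engine's ranking.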
DCM1 has continued research in cross-language IR
and IR for low-resourced languages such as Bengali or
Hindi. A variant of PRES (Patent Retrieval Evaluation Score),
a metric for evaluating patent retrieval effectiveness
developed previously by DCM1, was developed as an
evaluation metric for the speech retrieval domain. PRES
has had continued successful take-up in official patent
retrieval benchmarking tasks at international conferences
and competitions, e.g. CLEF 2012 (CLEF-IP).
DCM research has increasingly focused on
processing user-generated queries and content such
as tweets and SMS, as well as processing noisy domain-specific
data. DCM researchers have found that
information retrieval tasks on such user-generated
content can benefit from error correction (e.g. of
OCR and spelling errors) and from handling domain terminology
(e.g. abbreviations, acronyms, and technical terms).
DCM established a simple but strong retrieval baseline
(without domain adaptation) which would have ranked
among the top five participating groups at the
international TREC Medical Records track (TRECMed) in 2011.
Collaborative research has continued with DCM3 to
enhance techniques for personalising web search
using social tagging data. Personalised query expansion
helps to address the vocabulary mismatch
problem (Zhou et al., 2012b). A novel query expansion
framework has been developed which generates
individual user models based upon data mined from
the annotations a user has made and the resources the user has
bookmarked on the social bookmarking platform
Del.icio.us. This approach has been extensively evaluated
using test collections created by crawling authentic social
media data from the web. This has resulted in a high-impact
publication in a leading IR venue, the
Journal of Information Retrieval (Zhou et al., 2012a).
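The idea of personalised query expansion from social bookmarking data can be illustrated with a simple tag co-occurrence heuristic. This is a hypothetical sketch, not the framework of Zhou et al.: it assumes the user model is nothing more than the user's own bookmarks, each represented as a set of tags, and expands the query with the tags that most often co-occur with the query terms in those bookmarks. The function name, the `k` parameter and the sample bookmarks are invented.

```python
from collections import Counter


def expand_query(query, user_bookmarks, k=2):
    """Expand `query` with the k tags that co-occur most often with its terms
    in this user's own bookmarks (each bookmark given as a set of tags)."""
    terms = query.lower().split()
    term_set = set(terms)
    co_occurring = Counter()
    for tags in user_bookmarks:
        tags = {t.lower() for t in tags}
        if tags & term_set:                       # bookmark is about the query topic
            co_occurring.update(tags - term_set)  # count the other tags on it
    # Highest count first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(co_occurring.items(), key=lambda kv: (-kv[1], kv[0]))
    return terms + [tag for tag, _ in ranked[:k]]


# Hypothetical bookmarks of one user.
bookmarks = [
    {"python", "programming", "tutorial"},
    {"python", "django", "web"},
    {"python", "programming", "numpy"},
    {"cooking", "recipes"},
]
print(expand_query("python", bookmarks))  # ['python', 'programming', 'django']
```

Because the expansion terms come from one user's own tags, two users issuing the same query can receive different expansions, which is the sense in which the expansion is personalised.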
Prof. Séamus Lawless of <strong>CNGL</strong> presents research on Web Search<br />
Personalization Using Social Data at TPDL <strong>2012</strong> in September <strong>2012</strong><br />
in Paphos, Cyprus<br />
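The PRES metric mentioned above can be written, as we understand its published definition, as PRES = 1 - (Σ r_i / n - (n + 1) / 2) / N_max, where r_i are the ranks of the n relevant documents and N_max is the number of results the user is assumed to check. Relevant documents missing from the top N_max are placed immediately after the cut-off (rank N_max + i for the i-th relevant document), so that PRES is 1 for a perfect ranking and 0 when nothing relevant is found; this detail is stated here as an assumption to be checked against the original paper. The sketch below is not DCM1's evaluation code, and the function and argument names are hypothetical.

```python
def pres(relevant_ranks, n_relevant, n_max):
    """PRES for one query (sketch).

    relevant_ranks: 1-based ranks at which relevant documents were retrieved.
    n_relevant:     total number of relevant documents for the query (n).
    n_max:          maximum number of results the user is assumed to check.
    Relevant documents not found within n_max are assumed to sit just after
    the cut-off, at ranks n_max + i for the i-th relevant document.
    """
    found = sorted(r for r in relevant_ranks if r <= n_max)
    missing = [n_max + i for i in range(len(found) + 1, n_relevant + 1)]
    mean_rank = sum(found + missing) / n_relevant
    return 1 - (mean_rank - (n_relevant + 1) / 2) / n_max


print(pres([1, 2], n_relevant=2, n_max=10))  # 1.0 (perfect ranking)
print(pres([3, 5], n_relevant=2, n_max=10))  # 0.75
```

Under this placement rule, a run that finds k of n relevant documents at the very top scores exactly k/n, so PRES never exceeds recall and penalises relevant documents that appear lower in the list.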
DCM1 researchers have also been involved in the<br />
organisation of various important IR events and<br />
workshops – none more so than the 36th <strong>Annual</strong> ACM<br />
SIGIR Conference, which <strong>CNGL</strong> will host in Dublin in July<br />
2013. SIGIR has significant leadership drawn from <strong>CNGL</strong><br />
(DCM) academics and staff:<br />
• General Chair – Dr. Gareth Jones
• Workshops Co-Chair – Prof. Vincent Wade
• Tutorials Co-Chair – Prof. Séamus Lawless
• Local Organising Chair – Prof. Séamus Lawless
• Publications Chairs – Dr. Liadh Kelly and Dr. Lorraine Goeuriot
Further work on adapting search to mobile devices is
reported in (Min et al., 2012). One experiment explored a novel
method for rewriting textual content to make it fit on
devices with limited screen size (i.e. mobile devices)
while retaining its readability.
DCM has also cooperated with researchers in other
CNGL tracks and with commercial partners. DCM1
continued to integrate with research components
produced by DCM2 and DCM3, as well as components
from ILT, LOC and SF, as part of the overall CNGL
Demonstrator Programme. The ClipArt search demo
system was showcased in the SFI review and at the
Localisation Innovation Showcase in 2011. It was adapted
to provide image search on mobile devices such as iPhones
and iPads, and was showcased at SIGIR 2012.
Dr. Páraic Sheridan and Dr. Gareth Jones of <strong>CNGL</strong> introduce SIGIR 2013<br />
to attendees at SIGIR <strong>2012</strong> in August <strong>2012</strong> in Portland, Oregon, USA.<br />
<strong>CNGL</strong> will host SIGIR 2013 in Dublin in July<br />
In terms of publications, DCM1 has maintained<br />
significant publication success in top journals and<br />
high impact conferences including ACM Computing<br />
Surveys (#1 ranked journal in computer science in the<br />
world), Journal of Information Retrieval, Journal of User<br />
Modeling and User Adapted Interaction, UMAP <strong>2012</strong>,<br />
TPDL <strong>2012</strong>, DocEng <strong>2012</strong>, etc. DCM has continued<br />
research in cross-language IR and IR for low-resourced<br />
languages such as Bengali or Hindi (Ganguly et al.,
<strong>2012</strong>), (Ganguly et al., <strong>2012</strong>b), (Ganguly et al., <strong>2012</strong>c),<br />
(Leveling, <strong>2012</strong>).<br />
Topic models, a recent research topic in IR, can
be used to model topical cohesion in digital content and
to enhance IR effectiveness in general (Ganguly et al.,
2012b), (Ganguly et al., 2012c). On-going work in DCM
aims at improving the user’s search experience, query
formulation, and navigation in search results through
topic model visualisation. DCM research continues to focus
on domain adaptation and domain-specific IR. In the
medical domain, we conducted retrieval experiments on
patient records from the TREC Medical Records
track.
DCM established a simple but strong retrieval baseline
(without domain adaptation) which would have ranked
among the top five participating groups on 2011 data
(Leveling et al., 2012). DCM also investigated adaptation
of search to mobile devices (Leveling and Jones, 2012).
In addition, collaboration with the machine translation<br />
research group in the ILT track investigated the<br />
combination of techniques from information retrieval<br />
and machine translation to speed up fuzzy matching for<br />
machine translation (Leveling et al., <strong>2012</strong>b).<br />
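One common way to combine IR and fuzzy matching, shown here as an assumption rather than as the method of Leveling et al. (2012b), is to use an inverted index to shortlist translation-memory segments that share words with the input sentence, and to compute the expensive edit-distance-based fuzzy score only for that shortlist. The TM segments, the class name, the `shortlist_size` parameter and the word-level fuzzy score below are hypothetical.

```python
from collections import Counter, defaultdict


def edit_distance(a, b):
    """Word-level Levenshtein distance between token lists a and b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution
        prev = cur
    return prev[-1]


def fuzzy_score(a, b):
    """1 - normalised edit distance, in [0, 1]."""
    return 1 - edit_distance(a, b) / max(len(a), len(b), 1)


class FuzzyMatcher:
    def __init__(self, tm_sources):
        self.tm = [s.lower().split() for s in tm_sources]
        self.index = defaultdict(set)  # word -> ids of TM segments containing it
        for seg_id, tokens in enumerate(self.tm):
            for tok in tokens:
                self.index[tok].add(seg_id)

    def best_match(self, sentence, shortlist_size=10):
        """Return (fuzzy score, TM segment id) of the best match, or None."""
        query = sentence.lower().split()
        # IR step: count shared words per TM segment via the inverted index.
        overlap = Counter()
        for tok in set(query):
            for seg_id in self.index.get(tok, ()):
                overlap[seg_id] += 1
        shortlist = [seg_id for seg_id, _ in overlap.most_common(shortlist_size)]
        # Fuzzy-matching step: exact edit distance only on the shortlist.
        scored = [(fuzzy_score(query, self.tm[i]), i) for i in shortlist]
        return max(scored) if scored else None


matcher = FuzzyMatcher([
    "Click Save to store the file",
    "Click Cancel to close the dialog",
    "The scan found no threats",
])
print(matcher.best_match("Click Save to store the document"))  # segment 0, score 5/6
```

The shortlist is a heuristic: a segment that shares few words with the input but has a low edit distance could in principle be missed, which is the usual trade-off between speed and exhaustive matching.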
The approaches above have been extensively evaluated
on benchmark data provided by the organisers of TREC,
CLEF, INEX and FIRE, as well as on collections created by
crawling social media data (Leveling, 2012), (Ganguly
et al., 2012), (Leveling et al., 2012), (Leveling and Jones,
2012).
Two of our PhD students have completed internships
at Microsoft Ireland in the area of multilingual querying and
personalisation.
DCM1 researchers also organised various important IR<br />
events and workshops. <strong>CNGL</strong> co-organised the second<br />
workshop on Personalised Multilingual Hypertext<br />
Retrieval (PMHR <strong>2012</strong>) at Web Science <strong>2012</strong>. DCM1<br />
researchers also organised an evaluation task on<br />
personalised and collaborative information retrieval (PIR)<br />
at FIRE <strong>2012</strong>.<br />
A significant number of CNGL-supervised Masters
dissertations and final year undergraduate projects were<br />
submitted and graded in <strong>2012</strong>. A DCM1-specific Masters<br />
dissertation is currently underway in Trinity College<br />
Dublin under the supervision of Prof. Séamus Lawless<br />
investigating “Selecting Appropriate Verticals for Web<br />
Search Results”.
Progress in DCM2<br />
DCM2, the work package concerned with digital content<br />
knowledge modelling, extraction and organisation,<br />
recorded several key successes during <strong>2012</strong>.<br />
The research in structural content analysis for web<br />
slicing has progressed significantly. This has included<br />
the development and evaluation of a slicing tool that<br />
can extract important textual content from open-corpus<br />
web content. The system was evaluated in a successful<br />
user trial, which demonstrated the applicability of the<br />
technique in the area of language learning. This work<br />
will be further developed in 2013 under the SFI TIDA<br />
programme to support language learning through user-relevant
resources harvested from the open web. It is
expected that this research will result in a completed<br />
PhD in early 2013.<br />
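The slicing tool's task of extracting the important textual content from open-corpus web pages can be illustrated with a widely used heuristic: split the page into text blocks and keep blocks that are long enough and not dominated by link text. This is a hypothetical sketch, not the DCM2 slicing tool; the tag sets, the `min_words` and `max_link_density` thresholds and the sample page are all assumptions. It uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

BLOCK_TAGS = {"p", "div", "li", "td", "h1", "h2", "h3", "article", "section"}
SKIP_TAGS = {"script", "style", "nav", "footer"}


class BlockCollector(HTMLParser):
    """Splits an HTML page into text blocks and records how much of each
    block's text sits inside links."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # (text, number of characters of link text)
        self._parts = []
        self._link_chars = 0
        self._in_link = 0
        self._skip = 0

    def flush(self):
        text = " ".join("".join(self._parts).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._parts, self._link_chars = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self._skip:
            return
        self._parts.append(data)
        if self._in_link:
            self._link_chars += len(data.strip())


def slice_main_text(html, min_words=8, max_link_density=0.3):
    """Keep blocks that are long enough and not dominated by link text."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    parser.flush()
    return [
        text for text, link_chars in parser.blocks
        if len(text.split()) >= min_words and link_chars / len(text) <= max_link_density
    ]


page = """<html><body>
<nav><a href="/">Home</a> <a href="/news">News</a></nav>
<div class="menu"><a href="/a">Sports</a> | <a href="/b">Weather</a> | <a href="/c">Travel</a> | <a href="/d">Business</a> | <a href="/e">Culture</a></div>
<p>Learners of Irish can practise reading with short articles taken from the open web, matched to their interests.</p>
<p>Share this: <a href="#">Facebook</a> <a href="#">Twitter</a></p>
<script>var x = 1;</script>
<footer>Copyright 2012</footer>
</body></html>"""
print(slice_main_text(page))
```

The menu block is dropped for its high link density and the share block for its length, leaving only the article paragraph.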
The collaboration between DCM2 and DCM3 has<br />
continued in the area of Personalised Multilingual<br />
Customer Care, resulting in an SFI feasibility project,<br />
which developed a commercial-strength version of the<br />
research software. The Emizar system provides users<br />
with a modern, supportive environment for personalised<br />
access to federated content across several support<br />
repositories.<br />
In terms of other collaboration, DCM2 researchers have<br />
continued to work in the Digital Humanities domain,<br />
collaborating with the CULTURA EU FP7 affiliate project,<br />
co-ordinated at Trinity College Dublin.<br />
DCM2 researchers have published at several key<br />
conferences in areas such as eLearning, Hypertext and<br />
Hypermedia, and have had several successful grant<br />
applications, including an SFI Technology Innovation<br />
Development Award (TIDA).<br />
Research and development in DCM2 on lightweight<br />
subject models matured and coalesced in interesting<br />
ways in <strong>2012</strong>. This cohesion was achieved via the<br />
development of the MOODfinger framework for affective<br />
news retrieval. MOODfinger conducts continuous<br />
gathering and indexing of daily news from major web<br />
news sites, and performs affective analysis of each new<br />
story to facilitate future affective retrieval. Lightweight<br />
stereotypical models of familiar ideas are automatically<br />
acquired from the web, and are used to identify the<br />
most interesting and most affect-rich areas of a news<br />
story. These models support powerful affective query<br />
expansion and subsequent affective summarisation of<br />
any retrieved news. Publications on MOODfinger were<br />
presented at top natural language processing (NLP) and<br />
web conferences in <strong>2012</strong>, including ACL <strong>2012</strong> and WWW<br />
<strong>2012</strong>, while the MOODfinger prototype (and related<br />
natural language technologies developed within DCM2)<br />
was also showcased in public demonstrations at these<br />
conferences. MOODfinger represents both a culmination<br />
of work in DCM2 and a sound foundation for future work<br />
in affective text understanding. MOODfinger continues<br />
to be vigorously maintained and developed.<br />
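Here is a small Python sketch of stereotype-driven affective query expansion. It assumes a tiny hand-written stereotype lexicon, where each familiar idea maps to typical properties labelled +1 (positive affect) or -1 (negative affect). The lexicon, `expand_query` and `affect_score` are hypothetical illustrations. They are not MOODfinger's web-acquired models or its actual code.

```python
# Hypothetical sketch, not MOODfinger's implementation.
# Stereotype lexicon: each familiar idea maps to typical properties,
# each labelled +1 (positive affect) or -1 (negative affect).
STEREOTYPES = {
    "hero": {"brave": 1, "strong": 1, "reckless": -1},
    "politician": {"persuasive": 1, "corrupt": -1, "evasive": -1},
    "scientist": {"curious": 1, "rigorous": 1, "aloof": -1},
}

# Flat lookup from property to affect label, used when scoring stories.
PROPERTY_AFFECT = {p: a for props in STEREOTYPES.values() for p, a in props.items()}


def expand_query(terms, mood):
    """Add stereotype properties whose affect matches `mood` (+1 or -1).

    Terms without a stereotype entry are kept but not expanded.
    """
    expanded = []
    for term in terms:
        expanded.append(term)
        props = STEREOTYPES.get(term, {})
        # sorted() keeps the added properties in a deterministic order
        expanded.extend(p for p in sorted(props) if props[p] == mood)
    return expanded


def affect_score(text):
    """Crude affect of a story: sum of the labels of known properties it mentions."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return sum(PROPERTY_AFFECT.get(w, 0) for w in words)
```

For example, a query for "politician" with a negative mood expands to include "corrupt" and "evasive". A real system would take these properties from models acquired from the web rather than from a fixed table.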
Much of this (MOODfinger) work in DCM2 has focused<br />
on the challenges posed by creative language use (which<br />
is to say, the non-obvious use of familiar words and<br />
ideas). Several publications showcase our achievements<br />
in this area, such as the monograph Exploding the<br />
Creativity Myth: The Computational Foundations<br />
of Linguistic Creativity (T. Veale, from Bloomsbury<br />
Academic) and the collected volume Creativity and the<br />
Agile Mind (principal co-editor T. Veale, from Mouton<br />
de Gruyter). We have helped shape European policy<br />
on computational creativity by contributing to expert<br />
consultation sessions with the European Commission,<br />
and have influenced the latest EU ICT call, which now<br />
explicitly lists Computational Creativity as a fundable<br />
objective. Building on work in DCM2, we have secured<br />
EU funding for an international coordination action<br />
to promote the field of computational creativity<br />
(PROSECCO: PROmoting the Scientific Exploration<br />
of Computational Creativity). The project will run for<br />
three years under the scientific leadership of T. Veale<br />
in UCD, and will – through its organisation of contact<br />
forums, summer schools and code camps – serve as a<br />
force multiplier for disseminating the results of DCM2<br />
research. The leadership role of DCM researchers<br />
in the computational creativity community was<br />
further emphasised by UCD’s organisation of the 3rd<br />
International Conference on Computational Creativity<br />
(ICCC <strong>2012</strong>) in Dublin in May <strong>2012</strong>, which received<br />
logistical and financial support from <strong>CNGL</strong>.
Centre for Next Generation Localisation <strong>Annual</strong> <strong>Report</strong> <strong>2012</strong><br />
DIGITAL CONTENT MANAGEMENT<br />
Mr. Seán Sherlock, T.D., Minister for Research and Innovation,<br />
launches the <strong>CNGL</strong>-affiliated Learnovate Centre in June <strong>2012</strong><br />
Katrin Drescher of award sponsors Symantec presents the LRC Best<br />
Thesis Award to Prof. Vincent Wade, who accepts the award on behalf<br />
of Dr. Ben Steichen. Also pictured is Reinhard Schäler of LRC/<strong>CNGL</strong><br />
Progress in DCM3<br />
DCM3 is responsible for the development of systems<br />
which provide dynamic aggregation and composition<br />
of content, customised for the user’s needs and context.<br />
Such content can be drawn from diverse sources, namely<br />
corporate knowledge bases, open web content and<br />
user-generated content. DCM research in personalised<br />
multilingual content has resulted in significant<br />
publications as well as industry collaboration. DCM<br />
has progressed both the personalisation and dynamic<br />
aggregation of user-generated content (e.g. blogs,<br />
forum postings, messages), corporate content (e.g.<br />
product manuals, how-to guides, technical<br />
documentation), and content harvested from the<br />
open web. This research has led to the development<br />
of demonstrators and to international evaluation of the<br />
technology across multiple languages and countries. This<br />
research and evaluation have resulted in an international<br />
prize for DCM researcher Ben Steichen and his<br />
supervisor Prof. Vincent Wade (LRC Best Thesis Award<br />
<strong>2012</strong>).<br />
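The following sketch shows one way to aggregate content dynamically across different sources. It merges ranked hits from several repositories (for example a corporate knowledge base, a user forum and harvested web content) into a single list. Two choices here are assumptions made for brevity: scores are min-max normalised within each repository, and documents in the user's preferred language get a simple boost. The `Hit` type, `federated_search` and the boost value are hypothetical and do not describe the DCM systems.

```python
# Hypothetical sketch of federated aggregation, not the DCM implementation.
from dataclasses import dataclass


@dataclass
class Hit:
    repo: str      # source repository, e.g. "kb", "forum", "web"
    doc_id: str
    score: float   # repository-specific relevance score
    lang: str      # language of the document


def normalise(hits):
    """Min-max normalise scores within one repository so sources are comparable."""
    if not hits:
        return []
    lo = min(h.score for h in hits)
    hi = max(h.score for h in hits)
    span = (hi - lo) or 1.0  # all-equal scores would otherwise divide by zero
    return [Hit(h.repo, h.doc_id, (h.score - lo) / span, h.lang) for h in hits]


def federated_search(results_by_repo, preferred_lang, lang_boost=0.2):
    """Merge per-repository hit lists into one ranking, boosting the user's language."""
    merged = []
    for hits in results_by_repo.values():
        for h in normalise(hits):
            bonus = lang_boost if h.lang == preferred_lang else 0.0
            merged.append((h.score + bonus, h))
    # sort on the combined score only; ties keep their insertion order
    merged.sort(key=lambda pair: pair[0], reverse=True)
    return [h for _, h in merged]
```

Normalising per repository keeps one source with large raw scores from dominating the merged list. The language boost stands in for the richer user and context models a personalised system would use.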
Likewise, the ‘Personalisation as a Service’ research in<br />
DCM3 has reached maturity, with invention disclosures<br />
lodged and demonstrators being evaluated across<br />
multiple third-party websites.<br />
A key impact of DCM3 research has been industry<br />
engagement in the evaluation of the technology and the<br />
resulting plans for two <strong>CNGL</strong> spinout companies.<br />
These spinout companies are in the area of Multilingual<br />
Personalised Customer Care (Emizar www.emizar.com)<br />
and Personalisation-as-a-service (Wripl www.wripl.com).<br />
The Wripl cross-site personalisation system has<br />
undergone several refinements, and plugins for major<br />
content management system platforms, including<br />
WordPress, have been released. The Wripl team<br />
has concluded its SFI TIDA-funded programme and<br />
is collaborating with Enterprise Ireland on further<br />
developing the company and its product. From the<br />
research perspective, this work has been successfully<br />
evaluated in several experiments, and it is expected that<br />
a PhD will be completed in early 2013.<br />
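Cross-site personalisation can be pictured as one interest profile that every participating site updates and queries. The sketch below assumes topic-tagged pages and an exponential decay of older interests. `InterestProfile`, its decay factor and the ranking rule are hypothetical. They are not Wripl's actual model or API.

```python
# Hypothetical sketch of a cross-site interest profile, not Wripl's model.
from collections import Counter


class InterestProfile:
    """One profile is updated by page views on any participating site,
    and older interests decay with each new view."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.weights = Counter()

    def record_view(self, topics):
        """Decay all existing interests, then reinforce the viewed page's topics."""
        for topic in self.weights:
            self.weights[topic] *= self.decay
        for topic in topics:
            self.weights[topic] += 1.0

    def rank(self, items):
        """Order (item_id, topics) pairs by the summed interest in their topics."""
        return sorted(items, key=lambda item: sum(self.weights[t] for t in item[1]),
                      reverse=True)
```

The decay lets the profile follow a user's changing interests. A missing topic contributes zero because `Counter` returns 0 for unknown keys.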
The Emizar project will complete its SFI TIDA feasibility<br />
study in 2013, and is collaborating with Enterprise Ireland<br />
to develop the product and company. A full launch and<br />
technology licence agreement are expected in early<br />
2013.
Neil Peirce presents his PhD work at the national final of the Thesis in 3 competition<br />
Industry Engagement<br />
Two of DCM’s PhD students (Rami Ghorab and<br />
Jinming Min) successfully completed internships<br />
at Microsoft Ireland in the area of multilingual querying<br />
and personalisation. The resulting Personalised Multilingual<br />
Information Retrieval demonstrator showcased<br />
enhanced retrieval performance on Microsoft’s Clip Art<br />
collection, and was fully integrated with Microsoft Bing<br />
search and machine translation tools.<br />
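Personalised multilingual retrieval combines two steps: translating the query, then ranking results with the user in mind. The toy sketch below uses a small bilingual dictionary in place of a machine translation service and a set of user interest keywords in place of a full user model. All names and the scoring weight are hypothetical. No Bing or Microsoft API is used or implied.

```python
# Hypothetical toy sketch; the dictionary stands in for a machine translation service.
DICTIONARY = {("de", "en"): {"katze": "cat", "hund": "dog", "schnee": "snow"}}


def translate_query(terms, src, tgt):
    """Word-by-word query translation; unknown terms pass through (lower-cased)."""
    table = DICTIONARY.get((src, tgt), {})
    return [table.get(t.lower(), t.lower()) for t in terms]


def personalised_search(query, src, collection, user_tags, weight=0.5):
    """Rank items (id -> set of English keywords) by overlap with the translated
    query, plus a bonus for keywords matching the user's interests."""
    terms = set(translate_query(query, src, "en"))
    scored = []
    for item_id, keywords in collection.items():
        match = len(terms & keywords)
        if match == 0:
            continue  # only items matching the query are retrieved
        bonus = weight * len(user_tags & keywords)
        scored.append((match + bonus, item_id))
    # highest score first; ties broken alphabetically by id for determinism
    return [item_id for _, item_id in sorted(scored, key=lambda s: (-s[0], s[1]))]
```

For example, a German query "Katze" from a user interested in cartoons ranks a cartoon cat image above a plain one. Both are still retrieved, because both match the translated query.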
Additionally, there is close and on-going collaboration<br />
with Symantec in conducting user trials for the<br />
Personalised Multilingual Customer Care portal. This<br />
has led to <strong>CNGL</strong> DCM researchers making multiple<br />
presentations to senior vice presidents within Symantec<br />
and the planning for a comprehensive trial of <strong>CNGL</strong><br />
technology using Symantec customer care content.<br />
Research in DCM has led to invention disclosures and<br />
one patent application in <strong>2012</strong>. As previously mentioned,<br />
two spinout companies have been planned for <strong>CNGL</strong>,<br />
namely Emizar and Wripl. These spinouts will involve<br />
technologies developed in DCM2 and DCM3.<br />
SFI TIDA funding was sought for a third project<br />
(Linguabox) to investigate the potential for DCM2<br />
technology to support the dynamic slicing and right-sizing<br />
of multimedia and user-generated content for learning.<br />
This application was successful and work will begin in<br />
2013.<br />
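Here is a miniature version of dynamic slicing and right-sizing. A document is split into self-contained slices, and only the slices relevant to a learner's topic are kept, within a length budget. The sketch assumes plain text in which lines starting with `#` mark section headings, and it uses keyword matching as the relevance test. `slice_document` and `rightsize` are hypothetical helpers, not the Linguabox technology.

```python
# Hypothetical sketch of content slicing and right-sizing, not Linguabox itself.
def slice_document(text):
    """Split text into (heading, body) slices at lines starting with '#'.
    Any text before the first heading becomes a slice with an empty heading."""
    slices, heading, body = [], "", []
    for line in text.splitlines():
        if line.startswith("#"):
            if heading or body:
                slices.append((heading, "\n".join(body).strip()))
            heading, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    if heading or body:
        slices.append((heading, "\n".join(body).strip()))
    return slices


def rightsize(slices, keywords, max_words):
    """Keep slices mentioning any keyword, skipping any slice that would push
    the total body length over `max_words`."""
    selected, used = [], 0
    for heading, body in slices:
        words = len(body.split())
        text = f"{heading} {body}".lower()
        if any(k.lower() in text for k in keywords) and used + words <= max_words:
            selected.append((heading, body))
            used += words
    return selected
```

A production system would need richer structure detection and relevance models than headings and keywords. The two-stage split-then-select design stays the same either way.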
‘Team wripl’ visits Silicon Valley to connect with local entrepreneurs<br />
and companies. The visit was hosted by the Irish Technology<br />
Leadership Group (ITLG) thanks to wripl’s joint win in the SFI/TIDA<br />
Entrepreneurship course.<br />
Achievements<br />
} DCM research was published in over 30 international<br />
journals and conferences in <strong>2012</strong>. Conference<br />
highlights included ACM Hypertext, COLING <strong>2012</strong>,<br />
AAAI <strong>2012</strong>, ACL <strong>2012</strong>, CIKM <strong>2012</strong>, TPDL <strong>2012</strong>, SIGIR<br />
<strong>2012</strong>. Journal highlights include ACM CSUR, UMUAI<br />
and Journal IR publications.<br />
} DCM researchers were involved in the organisation of<br />
various important IR and Personalisation events and<br />
workshops during <strong>2012</strong> including FIRE <strong>2012</strong>, NOMS<br />
<strong>2012</strong>, UMAP <strong>2012</strong>, as well as planning for SIGIR 2013<br />
to be hosted in TCD.<br />
} Prof. Vincent Wade was invited to deliver the keynote<br />
address at ICWL <strong>2012</strong> on Personalisation across Open<br />
Content and Social Media.<br />
} Three PhD students graduated in <strong>2012</strong> in the areas<br />
of Multilingual IR, Adaptive Systems, and Multilingual<br />
Personalisation. A further two students submitted<br />
PhD theses which are currently under examination.<br />
} Two patents are pending from DCM research in the<br />
areas of dynamic content slicing and personalisation.<br />
} Significant industry engagement with joint trials<br />
and joint evaluations of multilingual personalisation<br />
technology, e.g. with Symantec and Microsoft.
} Two SFI TIDA grants were awarded to DCM Principal<br />
Investigators Prof. Vincent Wade and Prof. Owen<br />
Conlan for research in personalisation.<br />
} Prof. Vincent Wade established the Enterprise Ireland<br />
Technology Centre for Technology Enhanced Learning,<br />
the Learnovate Centre. This centre, which is allied<br />
to <strong>CNGL</strong>, focuses on content technologies and<br />
innovative communication tools for informal learning in<br />
schools, universities and corporate training. DCM has<br />
established close collaboration with the new Centre,<br />
which provides a means of exploiting <strong>CNGL</strong> research results in<br />
the vertical sector of Learning and Education.<br />
} Dr. Tony Veale was Local Chair for the 3rd<br />
International Conference on Computational Creativity<br />
(ICCC <strong>2012</strong>) at UCD.<br />
} Dr. Tony Veale delivered a keynote (on creative<br />
uses of WordNets) at an event in Oslo hosted by the<br />
National Library of Norway, an invited talk on affective<br />
stereotype acquisition at the ILIKS event in Toulouse,<br />
and a one-week invited course on linguistic creativity at<br />
an autumn school on Computational Creativity in<br />
Helsinki.<br />
} Two spinout companies – Emizar (Aggregation and<br />
Personalisation of Multilingual open content, user-generated<br />
content and corporate content for self-service<br />
customer care) and Wripl (Personalisation-as-a-service)<br />
– were prepared for spinout in 2013.<br />
} A new SFI TIDA award has been won by Prof. Wade<br />
for the DCM research in automated slicing of content<br />
for reuse and repurposing. This award will further<br />
the development of the technology for informal and<br />
automated e-learning content.<br />
} Two industry internships were successfully completed<br />
in Microsoft by PhD students from TCD and DCU.<br />
Plans<br />
<strong>CNGL</strong> II will be led by Prof. Wade, and in the new<br />
<strong>CNGL</strong> II, DCM will be split principally into three research<br />
themes, namely Personalisation; Delivery and<br />
Interaction; and Search and Discovery. <strong>CNGL</strong> II will<br />
progress the research topics from DCM and build on the<br />
success of the DCM research.<br />
Prof. Séamus Lawless pitched Emizar’s investor-ready technology to<br />
hundreds of potential investors and business partners at Enterprise<br />
Ireland’s Big Ideas Showcase <strong>2012</strong>. Emizar was subsequently profiled<br />
in the Sunday Business Post newspaper<br />
<strong>CNGL</strong> I has been granted a no-cost extension to<br />
complete the work and provide a rigorous evaluation of the DCM<br />
technology. This work (January – November 2013)<br />
will see the completion of a number of DCM PhDs as<br />
well as the trialling and evaluation of DCM technology,<br />
specifically in the areas of the Personalised Multilingual<br />
Information Retrieval Framework, Multilingual User<br />
Models, Personalised Multilingual Customer Care trial,<br />
and Evaluation Framework and Tools for Adaptive<br />
(Personalised) Systems.<br />
Conclusion<br />
<strong>2012</strong> was an extremely productive year for DCM in<br />
the two aspects crucial to <strong>CNGL</strong>, namely scientific<br />
excellence and industry impact. DCM researchers have<br />
also maintained and strengthened their leadership in<br />
the respective research areas, and we have seen DCM<br />
PhD students complete their Doctorates and progress<br />
to positions in industry and academia.
Next Generation Localisation
Strand Name: Next Generation Localisation<br />
AREA CO-ORDINATOR:<br />
MR. REINHARD SCHÄLER<br />
Participant Names and Affiliation<br />
Industrial Collaborators<br />
Dr. Fred Hollowood, Symantec<br />
Mr. Enda McDonnell, Alchemy Software Development<br />
Mr. Phil Ritchie, VistaTEC<br />
Mr. Dag Schmidtke, Microsoft<br />
International Collaborators<br />
Dr. Lynne Bowker, University of Ottawa, Canada<br />
Mr. José Eduardo de Lucca, Universidade Federal de Santa Catarina, Brazil<br />
Prof. Patrick Hall, Professor Emeritus, Open University, UK<br />
Dr. James Hogan, Queensland University of Technology, Australia<br />
Mr. Mahesh Kulkarni, CDAC Pune, India<br />
Ms. Stefanie Scheeder, The Rosetta Foundation, Germany<br />
Mr. Francis Tsang, Adobe, USA<br />
Faculty<br />
Dr. Jim Buckley University of Limerick LOC3 Leader<br />
Ms. Yvonne Cleary University of Limerick LOC1<br />
Mr. J.J. Collins University of Limerick LOC2 Leader<br />
Dr. Chris Exton University of Limerick LOC1 Leader<br />
Dr. Dorothy Kenny Dublin City University LOC2<br />
Dr. Liam Murray University of Limerick LOC2<br />
Dr. Sharon O’Brien Dublin City University LOC2<br />
Mr. Reinhard Schäler University of Limerick LOC1, LOC2, LOC3, PI<br />
Postdoctoral Researchers<br />
Dr. David Filip University of Limerick LOC1<br />
Dr. Eoin Ó Conchúir University of Limerick LOC3<br />
Dr. Ian O’Keeffe University of Limerick LOC2
NEXT GENERATION LOCALISATION<br />
PhD Students<br />
Mr. Solomon Gizaw University of Limerick LOC3.2<br />
Mr. Rajat Gupta University of Limerick LOC2.4<br />
Mr. Joss Moorkens University of Limerick LOC2.2<br />
Ms. Lucía Morado Vázquez University of Limerick LOC1.2<br />
Mr. Aram Morera-Mesa University of Limerick LOC3.3<br />
Mr. Naoto Nishio University of Limerick LOC3.1<br />
Mr. Lorcan Ryan University of Limerick LOC1.1<br />
Mr. Asanka Wasala University of Limerick LOC2.1<br />
Funding<br />
Funding from SFI<br />
<strong>CNGL</strong> (07/CE/I1142): €306,611
Research Strand Overview: Next Generation Localisation (LOC)<br />
Since its inception in the 1980s, the localisation industry<br />
has been strongly anchored to the expertise and quality<br />
of both industrial and academic assets in Ireland. Many of<br />
the key defining elements of the current industry either<br />
originated with or have strong roots in the pioneering<br />
activities of Irish players in the industry. Indeed, research<br />
continues to take place in Ireland in industry, in academia<br />
and in joint industrial-academic ventures such as<br />
<strong>CNGL</strong>, paving the way for the industry to adapt<br />
and evolve as it moves further into the 21st<br />
century and meets the challenges that await it.<br />
The Next Generation Localisation (LOC) track in <strong>CNGL</strong><br />
has a mission to produce world-leading research in (i)<br />
localisation content analysis, (ii) localisation component<br />
technologies evaluation, and (iii) service-oriented<br />
localisation architecture solutions, in collaboration with<br />
the academic and the industrial partners in <strong>CNGL</strong> and<br />
beyond, validated by user communities in (for-profit and<br />
not-for-profit) enterprise localisation.<br />
Taking a view that reaches beyond traditional avenues<br />
for profit and expansion, the LOC track focuses on<br />
using this research to ensure that Ireland retains its<br />
status in the field of localisation since, as stated by<br />
the independent international review panel that<br />
conducted <strong>CNGL</strong>’s Mid-Term Review in July 2011, it is “a<br />
key industry for Ireland” and Ireland “must remain at the<br />
technological forefront in order to retain and grow this<br />
highly remunerative activity”.<br />
LOC’s view that flexible architectures, as investigated<br />
by LOC researchers in the Service-Oriented Localisation<br />
Architecture Solution (SOLAS), are key to future<br />
innovative technology frameworks supporting emerging<br />
and future localisation scenarios was also confirmed by<br />
another independent international review panel, which<br />
conducted <strong>CNGL</strong>’s Final Review (July <strong>2012</strong>) and<br />
commented that “the SOLAS architecture offers a solid<br />
reference implementation that addresses integration and<br />
workflow issues that companies like Adobe, Dell, and<br />
Intel are currently trying to address on their own”.<br />
Having recruited four “additional high-end professional<br />
programmers” and allocated “more budgets to<br />
workflow”, as recommended by the reviewers in 2011,<br />
LOC has pressed ahead with developing the Service-Oriented<br />
Localisation Architecture Solution (SOLAS). By splitting<br />
the solution into two distinct frameworks, SOLAS Match<br />
and SOLAS Productivity, LOC is developing, in parallel,<br />
solutions that cover the needs of both the traditional,<br />
return-on-investment-driven localisation industry and the<br />
increasingly important non-profit and non-market<br />
localisation communities.<br />
Indeed, it is this approach that has led the independent<br />
review panel to note in the <strong>CNGL</strong> final review that<br />
LOC, and by extension the spin-off The Rosetta Foundation,<br />
are pioneering “a novel, comprehensive localization<br />
model for organizations seeking to translate content<br />
for underserved communities. The panel feels this<br />
accomplishment has great societal impact that<br />
transcends the boundaries of Ireland and even the EU.”<br />
The views of the international experts on both of these<br />
panels reflect the view of industry thought leaders<br />
consulted by LOC at conferences, such as GALA,<br />
Localisation World and, most recently, the LRC’s 17th<br />
<strong>Annual</strong> Internationalisation and Localisation Conference.<br />
The following is a brief summary of the LOC track’s vision<br />
and goals agreed and realised in <strong>2012</strong>.<br />
Vision<br />
We empower innovative community and social<br />
localisation efforts driving the most significant growth<br />
opportunity for the industry.<br />
Goals<br />
} Provide content authors with feedback on the quality<br />
(localisability) and re-usability of their content,<br />
demonstrating the impact of good/bad quality source<br />
content on the localisation effort, specifically in the<br />
context of user-generated content<br />
} Assess and evaluate component technologies for<br />
SOLAS, demonstrating the suitability and adaptability<br />
requirements for components, specifically in the<br />
context of community and social localisation<br />
enterprise
} Develop the Service-Oriented Localisation<br />
Architecture Solution (SOLAS) as<br />
1. A demonstrator and testbed for innovative<br />
localisation solutions, demonstrating the<br />
innovative aspects of this framework in relation to<br />
existing mainstream paradigms, especially in the<br />
context of the emerging collaborative and social<br />
localisation enterprise.<br />
2. A unique suite of open source technologies that<br />
will be made available to serve the needs of all<br />
clients requiring flexible, efficient and<br />
fully standards-compliant localisation solutions.<br />
3. The de facto localisation and translation<br />
technology for not-for-profit and development<br />
localisation activities.<br />
} Integrate <strong>CNGL</strong>, third party and open source<br />
components into SOLAS<br />
} Publish research results in world-leading journals, both<br />
related and localisation-specific, according to agreed<br />
targets<br />
} Continue to provide a forum for the publication of<br />
high-impact, innovative and scientific localisation<br />
research through the indexed, peer-reviewed and<br />
dedicated Localisation journal, Localisation Focus –<br />
The International Journal of Localisation<br />
} <strong>Report</strong> on research activities and solicit feedback<br />
at world-leading conferences, both related and<br />
localisation-specific, according to agreed targets<br />
} Actively contribute to and provide leadership<br />
for international localisation initiatives (industry<br />
associations, standards groups, conferences)<br />
} Expand the open source SOLAS code repository<br />
} Build large and significant developer and user<br />
communities around the LOC effort within <strong>CNGL</strong> and<br />
beyond, according to agreed targets<br />
} Demonstrate the industrial value and impact of the<br />
LOC research by active user engagement and trials<br />
with reference to agreed metrics<br />
} Work with <strong>CNGL</strong> towards a re-allocation of budgets to<br />
support a targeted SOLAS demonstrator development<br />
effort<br />
Fundamental Research Barriers and Methodologies to Address Them<br />
In order to convince large multinational content<br />
publishers to join open, standards-based, industry-wide<br />
initiatives, small and medium-sized publishers<br />
to invest in state-of-the-art technologies, and non-profit<br />
organisations to take advantage of a localisation<br />
framework, what is required is a solution that is scalable,<br />
modularised, interoperable and affordable, together with<br />
a demonstrator framework capable of proving that the<br />
vision of an open localisation framework can be achieved.<br />
The risks involved in building such a system are<br />
considerable. Leading globalisation management systems<br />
have been developed by companies such as Idiom and<br />
GlobalSight (Ambassador). However, while these systems<br />
aimed to be comprehensive, they were not: for example,<br />
some services such as machine translation (MT) never<br />
became part of the core offering of these systems;<br />
additional service modules required by customers<br />
generally cannot be integrated (or only with significant<br />
investment); and re-configuring workflows and adapting<br />
to increasingly dynamic localisation environments often<br />
incur prohibitive costs. While these systems attracted<br />
significant investment for their development (in the<br />
region of $50 million in some cases), they never realised<br />
their projected market potential and return on<br />
investment.<br />
Although existing systems demonstrate a good<br />
understanding of basic technologies required for a stable<br />
corporate localisation framework, our research has<br />
shown that their overall architecture is not suitable as<br />
the backbone for a modularised, extensible and dynamic<br />
framework that enables seamless data flows and allows<br />
for the automatic configuration and execution of tasks.<br />
During Years 3 and 4 of <strong>CNGL</strong>, the original <strong>CNGL</strong> Bulk<br />
Localisation Workflow (BLW) demonstrator and the work<br />
within the Next Generation Localisation research area<br />
produced a first version of a service-oriented localisation<br />
architecture solution (SOLAS) that addresses the need<br />
for an open, highly-configurable, loosely-coupled<br />
aggregation of heterogeneous services that can meet the<br />
varying demands of the enterprise, SMEs and the non-profit<br />
sector. At the same time, it enables organisations<br />
with software engineering competencies to leverage the<br />
infrastructure encapsulated in the demonstrator
framework and tailor it to their specific needs through<br />
further component development. During Year 5 of<br />
<strong>CNGL</strong>, work on this demonstrator was split<br />
into two parallel streams, allowing more rapid<br />
development, deployment and testing, coverage<br />
of additional use-case scenarios, closer integration of<br />
component technologies from across <strong>CNGL</strong> research areas,<br />
and connections with third-party technologies such as<br />
commercial MT and established web technology systems.<br />
This branching allows the two resulting technologies,<br />
SOLAS Productivity and SOLAS Match, to move even<br />
further away from the family of platforms whose large<br />
footprints have often proven cost-prohibitive, and to refine<br />
the Service-Oriented Architecture (SOA) philosophy that<br />
enables the development of a component marketplace<br />
for the platform. SOLAS Productivity makes use of a<br />
standardised data container, open web service APIs,<br />
and a common orchestration and process management<br />
module, which connect to any number of component<br />
technologies developed by academic and industrial<br />
partners within <strong>CNGL</strong>, as well as to third-party<br />
technologies and tools. SOLAS Match provides groundbreaking<br />
and intuitive technology that allows for the<br />
seamless, user-friendly matching of community<br />
translation tasks with volunteer translators. This open<br />
source technology revolutionises the distribution<br />
and management of translation tasks using simplified<br />
web interfaces matched with sophisticated back-end<br />
technologies. Because SOLAS Match speeds up these<br />
translation tasks and reduces their overhead, it is<br />
well positioned for adoption by any number of<br />
not-for-profit and non-market localisation<br />
organisations.<br />
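The core idea of matching community translation tasks with volunteers can be sketched in two steps: filter volunteers by language pair, then rank them by shared interests. The `Volunteer` and `Task` types, the interest-overlap ranking and the tie-breaking rule are assumptions for illustration only. The actual matching logic of SOLAS Match is not described here.

```python
# Hypothetical sketch of task-volunteer matching, not the SOLAS Match algorithm.
from dataclasses import dataclass, field


@dataclass
class Volunteer:
    name: str
    language_pairs: set                      # e.g. {("en", "fr")}
    interests: set = field(default_factory=set)


@dataclass
class Task:
    task_id: str
    source: str
    target: str
    tags: set = field(default_factory=set)


def match_task(task, volunteers, limit=3):
    """Return up to `limit` volunteers able to handle the task's language pair,
    most shared interests first, then alphabetically by name."""
    pair = (task.source, task.target)
    eligible = [v for v in volunteers if pair in v.language_pairs]
    eligible.sort(key=lambda v: (-len(v.interests & task.tags), v.name))
    return [v.name for v in eligible[:limit]]
```

The language-pair filter is a hard constraint. The interest overlap only orders the eligible candidates, so a task with no tags still reaches every qualified volunteer.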
With SOLAS, researchers have gained access<br />
to a common, standards-based and interoperable open<br />
source localisation eco-system for their research, similar<br />
to those available to the MT community through Moses<br />
and to the speech community through platforms such<br />
as the Festival Speech Synthesis System or the MuSE<br />
speech technology platform. SOLAS is the first working<br />
innovation platform developed in its entirety within<br />
<strong>CNGL</strong>.<br />
Research Strand Overview: Next Generation Localisation<br />
In LOC, research concentrates on the improvement<br />
of key areas of localisation automation, such as the<br />
construction of a common, standards-based data<br />
model to develop, process and maintain localisation<br />
knowledge (LOC1) (Ryan, 2010; Morado Vázquez<br />
and Mooney, 2010; Anastasiou and Morado Vázquez,<br />
2010; Anastasiou, 2010); the interoperability of suitable<br />
tools and technologies, the assessment of quality<br />
measurement methodologies, and the facilitation of<br />
crowdsourcing and collaboration (LOC2) (Nishio et al.,<br />
2010; Wasala et al., 2010; Gupta and Aouad, 2010; Exton<br />
et al., 2010); and the modelling of intelligent localisation<br />
processes, workflows and process management (LOC3)<br />
(Filip and O’Conchúir, 2011; Lenker et al., 2010; Lenker,<br />
2010; Lenker and Anastasiou, 2010). The availability<br />
of a demonstrator system has been a pre-requisite for<br />
advancing this research and for measuring its success.<br />
The Service-Oriented Localisation Architecture Solution<br />
(SOLAS) has become an important focus for research in<br />
LOC for several reasons (Aouad et al., 2011; Ó Conchúir,<br />
2011). It offers a common standards-based (meta-)<br />
data container, web services API for Next Generation<br />
Localisation communication and connectivity, and an<br />
orchestration and process management module all<br />
shared across the framework (Morado and del Rey, 2011;<br />
Morado et al., 2011). Component technologies from<br />
industrial partners and third parties as well as research<br />
components coming from across <strong>CNGL</strong> (Wasala et<br />
al., 2011) can be integrated into SOLAS with relative<br />
ease, demonstrating in very real terms the benefits of<br />
individual components in an end-to-end localisation<br />
workflow, as well as providing a showcase for cross-<strong>CNGL</strong><br />
industrial and academic collaboration. While the origins of SOLAS<br />
lie in our research on the development of a<br />
demonstrator system for bulk localisation workflows, it<br />
now transcends this narrow field and aims to offer<br />
frameworks for a whole open localisation eco-system,<br />
addressing the needs not just of commercial large and<br />
medium-sized enterprises but also those of non-profit<br />
organisations which require solutions that can easily<br />
adapt to new languages, actors and workflows in a<br />
highly collaborative and dynamic environment.
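A shared data container plus an orchestration module can be sketched as a pipeline of components that each read and return the same document. This sketch assumes an XLIFF 1.2 file as the standards-based container; the report itself only calls the container "standards-based", so the choice of XLIFF 1.2 here is an assumption. It uses two hypothetical components: a glossary-based stand-in for MT and a QA check. The function names are illustrative and do not reflect the SOLAS web service API.

```python
# Hypothetical sketch of container-plus-orchestration, not the SOLAS API.
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # serialise XLIFF elements without a prefix


def q(tag):
    """Qualify a tag name with the XLIFF 1.2 namespace."""
    return f"{{{XLIFF_NS}}}{tag}"


def make_container(segments, src, tgt):
    """Wrap source segments in a minimal XLIFF 1.2 document (the shared container)."""
    root = ET.Element(q("xliff"), version="1.2")
    file_el = ET.SubElement(root, q("file"), {
        "source-language": src, "target-language": tgt,
        "datatype": "plaintext", "original": "example.txt"})
    body = ET.SubElement(file_el, q("body"))
    for i, text in enumerate(segments, start=1):
        unit = ET.SubElement(body, q("trans-unit"), id=str(i))
        ET.SubElement(unit, q("source")).text = text
    return root


def glossary_mt(root):
    """Hypothetical MT stand-in: fills <target> for segments found in a tiny glossary."""
    glossary = {"Hello": "Hallo", "Thank you": "Danke"}
    for unit in root.iter(q("trans-unit")):
        source = unit.find(q("source")).text
        if source in glossary and unit.find(q("target")) is None:
            ET.SubElement(unit, q("target")).text = glossary[source]
    return root


def flag_untranslated(root):
    """Hypothetical QA component: marks units without a target as not approved."""
    for unit in root.iter(q("trans-unit")):
        if unit.find(q("target")) is None:
            unit.set("approved", "no")
    return root


def orchestrate(root, pipeline):
    """Run the components in order; each reads and returns the same container."""
    for component in pipeline:
        root = component(root)
    return root
```

Each component depends only on the container. New components can therefore be added to, or reordered in, the `pipeline` list without changes to the others, which is the loose coupling the architecture aims for.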
Initially, against this background, the main objective<br />
was to develop a heterogeneous, loosely coupled<br />
platform. This was achieved through Component-Based<br />
Development (CBD) techniques, in which SOLAS integrates<br />
components connected through web services<br />
to realise a Service-Oriented Architecture (SOA). These<br />
components are also capable of operating in stand-alone<br />
mode, further increasing the flexibility of this approach.<br />
The architecture also permits the easy integration of any<br />
future component developments from across <strong>CNGL</strong>.<br />
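A component that works both stand-alone and as a web service can be sketched with Python's standard WSGI interface. The word-count component, the JSON request format and the helper names are hypothetical. They show the pattern, not any actual SOLAS component.

```python
# Hypothetical sketch: one component, usable directly or behind a web service.
import io
import json


def word_count_component(segments):
    """A stand-alone component: a plain function usable without any network layer."""
    return {"segments": len(segments), "words": sum(len(s.split()) for s in segments)}


def as_web_service(component):
    """Wrap a component as a WSGI app that accepts a JSON list of segments via POST."""
    def app(environ, start_response):
        if environ.get("REQUEST_METHOD") != "POST":
            start_response("405 Method Not Allowed", [("Content-Type", "text/plain")])
            return [b"POST a JSON list of segments"]
        length = int(environ.get("CONTENT_LENGTH") or 0)
        segments = json.loads(environ["wsgi.input"].read(length))
        body = json.dumps(component(segments)).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/json"),
                                  ("Content-Length", str(len(body)))])
        return [body]
    return app


def call_locally(app, payload):
    """Invoke a WSGI app in-process (no server), e.g. for testing; returns (status, JSON)."""
    statuses = []
    data = json.dumps(payload).encode("utf-8")
    environ = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(data)),
               "wsgi.input": io.BytesIO(data)}
    body = b"".join(app(environ, lambda status, headers: statuses.append(status)))
    return statuses[0], json.loads(body)
```

To expose the same component over HTTP, it could be served with `wsgiref.simple_server.make_server("", 8000, as_web_service(word_count_component)).serve_forever()`. Port 8000 is arbitrary. The component function itself does not change.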
The collaboration with The Rosetta Foundation and the<br />
move of <strong>CNGL</strong> IP generated by LOC researchers to the<br />
Foundation, with its 2,600+ volunteers, have been very<br />
successful (Wasala et al., 2011). Uptake and trials of<br />
<strong>CNGL</strong> output by the Foundation provide very valuable<br />
feedback to <strong>CNGL</strong> researchers and evidence of the value<br />
of this output to potential commercial parties, especially<br />
in the SME sector. As noted by the independent international<br />
review panel in <strong>CNGL</strong>’s Final Review (July <strong>2012</strong>),<br />
further evidence of the value of this<br />
collaboration comes from the increase in “International<br />
reach and exposure to government and industry outside<br />
of the usual Ireland and EU-centric bodies: the creation<br />
of AGIS conferences, growing presence at W3C, OASIS<br />
(XLIFF TC), GALA, and Localization World, including<br />
diversification of funding (Canada Research Council)”. A<br />
Partner Group with The Rosetta Foundation was created<br />
to raise the visibility and develop the involvement of<br />
enterprises (for-profit and not-for-profit) that collaborate<br />
with and support community translation efforts, to find ways<br />
to connect this effort to economic criteria that resonate<br />
with industrial partners, to look for new enterprise<br />
partners, and to seek alliances with non-profit and other<br />
organisations to promote these efforts.<br />
In this regard, the initial SOLAS technology was<br />
demonstrated during the <strong>CNGL</strong> SFI Mid-Term Review<br />
of 2011, at the Localisation Innovation Showcase 2011<br />
at Croke Park, and at the Autumn Scientific<br />
Committee Meeting in TCD. It generated significant<br />
interest from industrial collaborators and invited industry<br />
representatives, multinational publishers, SMEs, and the<br />
non-profit and government sector.<br />
However, as research and development progressed on<br />
SOLAS in <strong>2012</strong>, and as collaboration with The Rosetta<br />
Foundation deepened, leading UL to grant The Rosetta<br />
Foundation an exclusive licence for SOLAS Match (aka<br />
Translation eXchange), it became apparent that the<br />
technology had potential beyond this initial offering.<br />
The decision was therefore made to branch SOLAS development<br />
into two distinct yet connectable technologies: SOLAS<br />
Productivity, a continuation of the initial technology path<br />
described above, and SOLAS Match, a new paradigm for<br />
enabling volunteer translation and localisation through<br />
intuitive and user-friendly interfaces backed by dynamic<br />
and powerful back-end technologies.<br />
Other Relevant Work in the Field and How This Compares<br />
There are commercial efforts under way to develop
proprietary automated localisation platforms that integrate
process automation and management functionality
with localisation and translation automation, such as
terminology management, translation memory systems
and machine translation. Large multinational content
publishers, among them Oracle, SAP and Microsoft, have
demonstrated the commercial viability of such solutions
with their proprietary in-house systems. However, they
have also shown the limits of proprietary solutions and
have started exploring ways to connect their proprietary
systems with third-party tools and technologies. One
example is the interoperability between the open XML
Localisation Interchange File Format (XLIFF) and the
Microsoft proprietary Localisation Exchange Format (LCX),
reported at the LRC XV conference in 2010 by Microsoft
and LOC researchers (Wasala et al., 2010). Oracle also
presented its use of XLIFF in its localisation strategies at
the LRC XVI conference in 2011. At FEISGILTT <strong>2012</strong> it
became known that, based in large part on the research
initiated by Wasala et al. (2010 and <strong>2012</strong>), Microsoft will
adopt XLIFF as a primary file format going forward.<br />
In this regard, <strong>CNGL</strong> research is at the forefront of many
industry concerns. The SOLAS platform represents a
head-start through its highly innovative approach to
addressing a wide variety of localisation requirements
that, as noted by the <strong>2012</strong> international independent
review panel, “companies like Adobe, Dell, and Intel are
currently trying to address on their own”.<br />
SOLAS is the first open, standards-based framework of its
kind in the localisation space anywhere in the world. It
already provides an integrated plug-and-play framework
in which configurable component technologies can
interoperate. As development and refinement continue,
SOLAS Productivity will allow complementary
technologies to be connected seamlessly and integrated
into a core, functional, industrial-scale platform that is
itself highly modular and extensible, while SOLAS Match,
with its open source translation space, will redefine the
technological landscape of volunteer and development
localisation.<br />
Achievements<br />
Work Package LOC1<br />
The overall aim of LOC1 is to embed internationalisation<br />
and localisation issues into the design and development<br />
cycle of digital content production (Ryan, 2010), moving<br />
localisation up the value chain. The Work Package<br />
is divided into two sections, LOC1.1 Digital Content<br />
Production for Localisation and LOC1.2 Localisation<br />
Knowledge – Capture, Organisation, Use.<br />
LOC1 has produced highly innovative research results
on an XLIFF-based (meta-)data container tasked
with identifying, classifying and leveraging localisation
knowledge encapsulated in previous processes
(Anastasiou and Morado Vázquez, 2010). The result is
a localisation memory container (LMC), conceptually
similar to established translation memory technology,
but focused directly on localisation rather than “just” on
translation requirements (Morado Vázquez and Mooney,
2010). The LMC is intended to improve the quality and
consistency of the localisation process itself and to
minimise errors in the final product. This work is closely
linked to ILT and SF2 (data access, exchange and integrity issues).<br />
LOC1 researchers have also investigated the benefits
of developing a (meta-)data container, the Localisation
Knowledge Repository (LKR) (Ryan, 2010; Ryan, 2011).
The highly innovative LKR developed as part of this
research is based on a localisation taxonomy that
allows the storage, maintenance and reuse of
localisation-relevant data during content development.<br />
Lucía Morado Vázquez (LOC1.2) successfully passed her<br />
PhD viva in September. Lucía completed her PhD under<br />
the supervision of Reinhard Schäler. Lucía has now taken<br />
up a postdoctoral position at the Multilingual Information<br />
Processing Department at the Faculty of Translation and<br />
Interpretation, University of Geneva.<br />
Pictured at the launch of UL’s MSc in Multilingual Computing and<br />
Localisation co-hosted by the UN in Africa are (L-R) Solomon Gizaw,<br />
<strong>CNGL</strong>, Reinhard Schäler, <strong>CNGL</strong>, Prof. Don Barry, President, University<br />
of Limerick and Ms. Aida Opoku-Mensah, United Nations Economic<br />
Commission for Africa (UNECA)<br />
The research into internationalisation and localisation
knowledge leveraging aims to increase the quality,
consistency and accessibility of content throughout the
localisation process. It addresses the need for standards
and guidelines for content developers. In an environment
that increasingly deals with (often) low-quality,
user-generated content, this will facilitate the preparation
of content that is more usable and readable for source
language speakers, and more translatable for localisation
professionals and technologies. The guidelines are
drawn from both academic research and industrial
best practice. LOC1 also has two representatives on the
XLIFF Technical Committee.<br />
Three articles by Lorcan Ryan, on ‘Global Authoring
Techniques’, ‘Global Diversity and Localisation Issues’
and ‘Global Authoring Resources’, were published in
Communicator during <strong>2012</strong>.<br />
Work Package LOC2<br />
The Work Package is divided into four sections, LOC2.1<br />
Addressing the Problem of Interoperability in Localisation<br />
Process Management, LOC2.2 Technology Evaluation<br />
– The User Perspective, LOC2.3 Service Descriptor<br />
Development (Web Services) and LOC2.4 Collaborative<br />
Localisation Platform: Crowdsourcing.
LOC2 addresses quality assessment of translations in
a crowdsourced and distributed localisation context
(Gupta and Aouad, 2010; Exton et al., 2010; Anastasiou
and Gupta, 2011). The specification of evaluation metrics
specifically targets quantitative and qualitative
evaluation of translation memories (TMs) in order
to verify the existence of inconsistency propagation
(Moorkens, 2011a; Moorkens, 2011b). It also addresses
more general metrics in evaluation methodologies
throughout the localisation process. Joss Moorkens
successfully defended his PhD thesis, entitled “Measuring
Consistency in Translation Memories: A Mixed-Methods
Case Study”, in July. Joss was supervised at DCU by
Dr. Dorothy Kenny and Dr. Sharon O’Brien.<br />
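As a concrete illustration of what inconsistency in a TM means, the Python sketch below flags source segments that have been stored with more than one distinct translation and reports the share of such segments. It is a deliberately simple sketch under our own assumptions (a TM represented as a list of (source, target) string pairs, exact matching after whitespace normalisation, invented sample data); it is not the metric or method used in the research cited above.

```python
from __future__ import annotations

from collections import defaultdict


def normalise(segment: str) -> str:
    """Collapse runs of whitespace so trivial spacing differences are ignored."""
    return " ".join(segment.split())


def find_inconsistencies(tm: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Group TM entries by normalised source segment and keep only the
    sources that map to more than one distinct target segment."""
    targets: dict[str, set[str]] = defaultdict(set)
    for source, target in tm:
        targets[normalise(source)].add(normalise(target))
    return {src: tgts for src, tgts in targets.items() if len(tgts) > 1}


def inconsistency_rate(tm: list[tuple[str, str]]) -> float:
    """Share of distinct source segments that have conflicting translations."""
    sources = {normalise(source) for source, _ in tm}
    if not sources:
        return 0.0
    return len(find_inconsistencies(tm)) / len(sources)


if __name__ == "__main__":
    # Hypothetical English-German TM excerpt, invented for illustration.
    tm = [
        ("Click Save.", "Klicken Sie auf Speichern."),
        ("Click  Save.", "Klicken Sie auf Sichern."),
        ("Open the file.", "Öffnen Sie die Datei."),
    ]
    print(find_inconsistencies(tm))
    print(inconsistency_rate(tm))  # 0.5
```

Exact matching is the simplest possible grouping; a study of real TMs would also need to consider fuzzy matches, inline tags and legitimate context-dependent variation before treating a difference as an error.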
Finally, research is also being carried out in the area of<br />
cultural adaptation, with a particular focus on multimedia<br />
content, and how this might be supported in interchange<br />
formats such as XLIFF (O’Keeffe, 2011b).<br />
Dr. David Filip played a central role in the organisation<br />
and delivery of the inaugural FEISGILTT event, which<br />
took place on 16th-17th October in Seattle, USA. The<br />
FEISGILTT <strong>2012</strong> (Federated Event for Interoperability<br />
Standardization in Globalization, Internationalization,<br />
Localization, and Translation Technologies) brought<br />
together experts from the language services industry,<br />
R&D labs that are exploring new interoperability<br />
solutions, and the various standards bodies instrumental<br />
in making such solutions accessible as conformable<br />
specifications. It offered a neutral venue where these<br />
stakeholders exchanged knowledge and experiences<br />
and discussed future directions for addressing the<br />
interoperability challenges facing the industry. FEISGILTT<br />
incorporated the 3rd International XLIFF Symposium.<br />
Lucía Morado Vázquez, Aram Morera Mesa, Dr. Chris Exton and Karl<br />
Kelly pictured at the LRC Summer School <strong>2012</strong>. The theme of this year’s<br />
Summer School was Mobile Application Development and Localisation<br />
LOC2 is addressing component and data interoperability
in order to allow efficient information exchange,
specifically through the specification and use of
standardised metadata (Wasala et al., 2010). Research
from this work package continues to drive the
development of several components within SOLAS as
well as feeding back into ILT (development of automated
translation technologies) and SF2. LOC2 is also specifying
templates for the service descriptions necessary
for Service Level Agreements between localisation-oriented
service providers and consumers. Web Services
contract negotiation and agreement protocols will then
be used to map abstract localisation units onto concrete
services and components (Nishio et al., 2010).<br />
Dr. David Filip has also led work on Internationalization
Tag Set (ITS) Version 2.0 as co-chair of the
MultilingualWeb-LT (Language Technology) Working
Group. The Working Group aims to develop new W3C
(World Wide Web Consortium) standards to support
the translation and adaptation of Web content to local
needs, from its creation through to its delivery to end
users. In doing so, the new standards are intended to help
remove language barriers to international trade and to
facilitate the free flow of information across language borders.<br />
At the <strong>CNGL</strong> Localisation Innovation Showcase in
Limerick in September, Dr. David Filip demonstrated
the <strong>CNGL</strong> demonstrator system CMS-LIONSolas Integration: Full Content Lifecycle Metadata
Interoperability TestBed. Developed in collaboration with
the SF track, this is a unique platform for testing complex
metadata designs spanning process areas over the full
multilingual content life cycle. David showed how an RDF-based
provenance store is used between a Web Content
Management System (CMS) and XLIFF-based translation
workflows. This demonstrates use cases for the round-tripping
of Internationalization Tag Set (ITS) metadata
between content generation and publication in HTML5/XML
and localisation processes in XLIFF, and so provides
directly testable input to the standardisation working
groups currently developing ITS, XLIFF and HTML5.<br />
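The following Python sketch illustrates one half of such a round trip under simplifying assumptions: the content is well-formed XHTML, each block element (p, h1 to h3, li) becomes one XLIFF 1.2 trans-unit, inline markup is flattened to plain text, and the ITS 2.0 Translate data category, expressed through the HTML5 translate attribute and inherited by descendants, is copied to the trans-unit's translate attribute. The file name, block list and function names are hypothetical; the sketch does not reproduce the CNGL test bed or its RDF provenance store.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
BLOCK_TAGS = {"p", "h1", "h2", "h3", "li"}  # assumption: these become units

ET.register_namespace("", XLIFF_NS)  # serialise XLIFF elements without a prefix


def local_name(tag: str) -> str:
    """Strip an optional '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def effective_translate(elem: ET.Element, parent_of: dict) -> str:
    """ITS 2.0 Translate is inherited: the nearest element (self or ancestor)
    with an explicit translate="yes"/"no" wins; the default is "yes"."""
    node = elem
    while node is not None:
        value = node.get("translate")
        if value in ("yes", "no"):
            return value
        node = parent_of.get(node)
    return "yes"


def xhtml_to_xliff(xhtml: str, source_lang: str = "en") -> ET.Element:
    """Extract block-level text from well-formed XHTML into XLIFF 1.2
    trans-units, carrying the ITS Translate flag across."""
    root = ET.fromstring(xhtml)
    parent_of = {child: parent for parent in root.iter() for child in parent}
    xliff = ET.Element(f"{{{XLIFF_NS}}}xliff", version="1.2")
    file_el = ET.SubElement(xliff, f"{{{XLIFF_NS}}}file", {
        "original": "page.html",  # hypothetical file name
        "source-language": source_lang,
        "datatype": "html",
    })
    body = ET.SubElement(file_el, f"{{{XLIFF_NS}}}body")
    blocks = [e for e in root.iter() if local_name(e.tag) in BLOCK_TAGS]
    for number, elem in enumerate(blocks, start=1):
        unit = ET.SubElement(body, f"{{{XLIFF_NS}}}trans-unit", {
            "id": str(number),
            "translate": effective_translate(elem, parent_of),
        })
        ET.SubElement(unit, f"{{{XLIFF_NS}}}source").text = "".join(elem.itertext())
    return xliff


if __name__ == "__main__":
    page = ('<html><body><p>Welcome to our site.</p>'
            '<p translate="no">ACME Widget 3000</p>'
            '<div translate="no"><p>Legal boilerplate</p></div></body></html>')
    print(ET.tostring(xhtml_to_xliff(page), encoding="unicode"))
```

A complete round trip would also merge translated targets back into the HTML by unit id and preserve inline markup, both of which this sketch omits.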
Dr. Ian O’Keeffe, postdoctoral researcher, left the University
of Limerick during Quarter 3 <strong>2012</strong>. He is now Manager
of Software Engineering/Development at Fidelity
Investments. Also departing the University of Limerick
in Quarter 3 was postdoctoral researcher Dr. Eoin Ó
Conchúir. Eoin is now participating in the New Frontiers
entrepreneur development programme.<br />
Joss Moorkens submitted his thesis, “Measuring
Consistency in Translation Memories: A Mixed-Methods
Case Study”, in <strong>2012</strong>. His work questioned the
widely held assumption that human-made translation
memories lead to higher-quality, faster and cheaper
translations because they provide access to a large body
of high-quality bilingual or multilingual language
resources produced by professional human translators.
His research, which combined an examination of large
volumes of authentic translation memories acquired from
<strong>CNGL</strong> partners with qualitative research involving
industry experts, clearly challenges this view and
suggests caution. Joss’s thesis has already attracted
enquiries and significant interest from academia and
industry alike.<br />
Dr. Thomas Arend, International Product Lead at Twitter, addresses the
LRC Conference <strong>2012</strong> on the theme “Social Localisation at Twitter –
translating the world in 140 Characters”
Work Package LOC3
The LOC3 Work Package is divided into three
sections: LOC3.1 Localisation Workflow Specifications
for Enterprise Localisation, LOC3.2 Taxonomy of
Personalisation for Generating Personalised Content,
and LOC3.3 Localisation Workflow Mining.<br />
The overall aim of LOC3 is to focus on localisation
workflow re-engineering and recommendation, and to
define empirically the attributes and terms relevant to
generating personalised localised content (Morera,
Aouad et al., 2011a; Morera, Aouad et al., 2011b). This
research has included an empirical evaluation of
proposed localisation workflows against current
industry practice (Lenker et al., 2010; Lenker,
2011a; Lenker, 2011b).<br />
Another focus is the research, design and experimental
implementation of a workflow recommendation system.
This system takes into account a list of the most relevant
tasks in a localisation process, and uses a decision tree
to select those that should be part of the workflow
according to the specific quality requirements, time
constraints and cost constraints of the project at
hand. Aram Morera has advanced his research on the
identification and description of workflow patterns in
social localisation, leading to a workflow recommender
for specific social localisation scenarios ranging from
charitable, to non-profit, to for-profit approaches. The
identification of these patterns has revealed serious
shortcomings in current technologies, which are being
addressed by the SOLAS development team in LOC.
Aram is expected to submit his thesis reporting on this
research in the first half of 2013.<br />
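The report does not specify the tasks, attributes or thresholds used by the workflow recommendation system, so the Python sketch below is only a hand-built illustration of the general idea: a decision tree whose internal nodes test a project's quality requirement, time constraint and cost constraint, and whose leaves are candidate task lists. Every task name, attribute scale and threshold is hypothetical; in practice such a tree might instead be learned from historical project data.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Project:
    quality: str            # hypothetical scale: "high" or "standard"
    days_available: int     # time constraint
    budget_per_word: float  # cost constraint, hypothetical unit (EUR per word)


@dataclass
class Leaf:
    tasks: list[str]        # the workflow recommended at this leaf


@dataclass
class Node:
    test: Callable[[Project], bool]  # question asked about the project
    if_true: Tree
    if_false: Tree


Tree = Union[Node, Leaf]


def recommend(tree: Tree, project: Project) -> list[str]:
    """Walk from the root, following the branch chosen by each test,
    until a leaf (a list of workflow tasks) is reached."""
    node = tree
    while isinstance(node, Node):
        node = node.if_true if node.test(project) else node.if_false
    return node.tasks


# Hypothetical tree: task names and thresholds are invented for illustration.
WORKFLOW_TREE = Node(
    test=lambda p: p.quality == "high",
    if_true=Node(
        test=lambda p: p.days_available >= 5,
        if_true=Leaf(["translation", "terminology check", "review", "QA", "sign-off"]),
        if_false=Leaf(["translation", "review", "QA"]),
    ),
    if_false=Node(
        test=lambda p: p.budget_per_word < 0.05,
        if_true=Leaf(["machine translation", "light post-editing"]),
        if_false=Leaf(["translation", "spot check"]),
    ),
)

if __name__ == "__main__":
    rush_job = Project(quality="high", days_available=2, budget_per_word=0.20)
    print(recommend(WORKFLOW_TREE, rush_job))  # ['translation', 'review', 'QA']
```

Keeping each test as a plain function makes the tree easy to inspect and to extend with further constraints without changing the traversal code.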
The final area of research concerns personalisation
issues in localisation. This involves considering individual
preferences, gathered explicitly or implicitly, to go
beyond the traditional ‘locale’ or ‘community interest’.
The aim here is to create an empirical definition of
personalisation attributes and to demonstrate their
feasibility and relevance for generating adequate
personalised content. Research conducted within this
work package includes the specification and development
of demonstrator crowdsourcing localisation environments
and platforms (Lenker, 2010; Lenker and Anastasiou,
2010). Solomon Gizaw has focused on the identification
of communication patterns in cross-cultural information
exchange and the application of personalisation
techniques to a community-based translation and
localisation environment. Solomon has analysed a large
amount of actual user data from live communication
exchanges and plans to use the results of this
analysis to adapt SOLAS to the requirements
and needs of specific users, rather than just locales.
Solomon plans to submit his thesis in the first half
of 2013.<br />
Industry Engagement<br />
LOC has collaborated closely with its main industrial
partners, especially Symantec, VistaTEC and
Microsoft. Collaboration with The Rosetta Foundation's
international collaborators also provided valuable input.
Following the open sourcing of GlobalSight and the
establishment of The Rosetta Foundation as a spin-off
from the University of Limerick and <strong>CNGL</strong>, LOC also
collaborated closely with The Rosetta Foundation and
Welocalize. Engagement with industrial partners took
place through site visits and focused one-to-one meetings
between partner staff and LOC researchers.<br />
Through the SOLAS platform, LOC supports the development
of a <strong>CNGL</strong> open localisation platform that will, in
addition to serving as a test bed for <strong>CNGL</strong> research in
the different work packages, provide large multinational
publishers with a solid case study for the viability of
open standards for the negotiation of localisation
data and localisation knowledge, giving them the
arguments necessary for a migration from an
enclosed proprietary localisation scenario to a more
open, interconnected and interoperable framework. This
platform will also encourage the uptake of localisation
and process automation solutions by small and medium-sized
enterprises, create new business opportunities and
support the up-scaling of localisation offerings by smaller
firms. More than 40 individuals and companies have so
far joined the Dynamic Coalition for a Global Localisation
Platform: Localisation for All, initiated by LOC and The
Rosetta Foundation. We expect the platform to generate
increased activity in sectors of the localisation industry;
early indicators suggest that, in certain sectors, growth
by a factor of 100 is not out of reach. Subsequently, we
expect employment in these sectors to rise, driven by
growth in translation and localisation as well as in
technical support and development.<br />
The opportunities and the requirements for SOLAS,
especially in the non-profit sector, are significant. In 2007,
almost 1.5 million non-profits were registered with the US
tax authorities, and non-profits reported US$1.9 trillion in
revenue and US$4.3 trillion in assets. From 1998 to 2005,
non-profit employment in the US grew by 16.4 per cent,
compared to 6.2 per cent for overall employment.<br />
It is in the nature of non-profits to deal with multilingual
and multicultural constituencies. Surprisingly, no adequate
technology is available to support their localisation and
translation activities.<br />
In Ireland, the non-profit sector employs more than<br />
100,000 people with pay costs in the order of €3.5bn, has<br />
revenues of more than €6bn, and holds assets valued at<br />
more than €3.5bn. The sector is, perhaps, the principal<br />
source of social capital in Irish society, with more than<br />
560,000 people engaged as volunteers, and more than<br />
50,000 people engaged in their governance. In scale,<br />
the non-profit sector in Ireland is at least comparable to<br />
if not greater than agriculture or tourism as a source of<br />
employment.<br />
Research into SOLAS by <strong>CNGL</strong>, with subsequent
development of this framework through The Rosetta
Foundation, has the potential to make Ireland the hub
for internationally traded localisation and translation
services for the worldwide non-profit sector, whose
revenues exceed US$1.9 trillion in the USA alone.
Indeed, as the international independent review panel
stated in its review of <strong>CNGL</strong> in July <strong>2012</strong>, “<strong>CNGL</strong>’s goal
of making significant societal impact is illustrated by the
potentially ground-breaking social localization concept,
embodied in a spinout (The Rosetta Foundation).”<br />
Achievements (grouped by category)<br />
Operational Management and Governance<br />
} On-going research collaboration with <strong>CNGL</strong> ILT<br />
Track, e.g. in the area of MT; with DCM, e.g. in the<br />
area of personalisation; and SF, e.g. in the area of<br />
interoperability and metadata<br />
} On-going active engagement with LOC’s international<br />
collaborators<br />
} On-going engagement with world-leading standards
associations, including Unicode and the World Wide Web
Consortium (W3C)<br />
} Participation and programme input to the world’s<br />
leading localisation events, including Localization<br />
World and GALA<br />
} Engagement with the non-profit sector, including
the Irish umbrella body for non-profits, The Wheel,
representing close to 2,000 Irish non-profit enterprises,
and Dóchas, representing the Irish-based overseas aid
organisations<br />
} Collaboration with Translate.org.za, one of the developers
of one of the most widely used open source localisation
tools, and its principal Dwayne Bailey<br />
Research Programme<br />
LOC1<br />
} Continuing contributions to the development of<br />
the XLIFF standard of OASIS (members of Technical<br />
Committee)<br />
} RDF-XLIFF mapping (contacts from LREC: Thierry<br />
Declerck, Tobias Wunner and John McCrae (DERI),<br />
also Dr. David Lewis SF, Dr. Alex O’Connor SF)<br />
} Successful PhD defence by Lucía Morado Vázquez,<br />
who is now employed as postdoctoral researcher at<br />
the University of Geneva, Switzerland<br />
} Significant contributions to the knowledge of content<br />
development for global markets<br />
} Filing of invention disclosures<br />
} Integration of LOC1 components into the overall<br />
SOLAS framework<br />
} Successful PhD defence by Lorcan Ryan<br />
LOC2
} Significant contribution to the knowledge of<br />
localisation resource evaluation and interoperability<br />
} Further research and implementation of LocConnect<br />
component<br />
} Integration of LOC2 components into the overall<br />
SOLAS framework<br />
} Further research and assessment of quality and<br />
consistency in Translation Memories, including<br />
successful PhD defence by Joss Moorkens of his thesis<br />
“Measuring Consistency in Translation Memories:<br />
A Mixed-Methods Case Study”. This work involved<br />
significant industry input and has led to substantial<br />
industry interest in its outcomes.<br />
} Further research and implementation of Localisation<br />
Service Descriptor component<br />
} Further research and implementation of Quality<br />
Assessment Engine component<br />
} Cross-strand collaboration with ILT1<br />
} Asanka Wasala writing up PhD thesis<br />
} Filing of invention disclosures for several research
demonstrators<br />
LOC3<br />
} Research and implementation of Workflow<br />
Recommendation Engine component<br />
} Investigation of industrial workflows<br />
} Investigation of data transfer practices for Term Bases<br />
and Glossaries<br />
} PhD students approaching write-up stage, reporting<br />
very significant results on their research into<br />
localisation service descriptors, strategies to surpass<br />
the established concept of locale in localisation, and<br />
community-based social localisation workflows.<br />
} Filing of invention disclosures for several research
demonstrators<br />
} Integration of LOC3 components into the overall<br />
SOLAS framework<br />
LOC Overall<br />
} Collaboration with the United Nations Internet<br />
Governance Forum (IGF)<br />
} Support for the University of Limerick and the United<br />
Nations Economic Commission for Africa’s launch of<br />
the MSc in Multilingual Computing and Localisation<br />
to be delivered through distance learning and co-hosted
by UNECA at its Information Training Centre<br />
for Africa (ITCA) in Addis Ababa, Ethiopia. The aim of<br />
the programme is to promote African languages in the<br />
Information Society.<br />
} Invention Disclosures for Localisation Knowledge<br />
Repository (LKR), Automated Optimal Machine<br />
Translation System Selection supporting XLIFF, XLIFF
Phoenix, LocConnect and Workflow Recommender.<br />
} Further development of SOLAS integrated system and<br />
branching into SOLAS Productivity and SOLAS Match<br />
products.<br />
Industry Partner Engagement<br />
} Alchemy Software Development played an integral<br />
part in the <strong>2012</strong> LRC Summer School, preparing<br />
and presenting materials related to mobile device<br />
localisation.<br />
} LRC <strong>Annual</strong> Conference featured contributions from<br />
industry partners Symantec and Welocalize, as well<br />
as presentations from <strong>CNGL</strong> spin-out The Rosetta
Foundation.
} LRC Best Thesis Award <strong>2012</strong> was sponsored by<br />
Symantec Ireland.<br />
Tech Transfer Activities<br />
The following invention disclosures have been filed with<br />
the Technology Transfer office at UL:<br />
2006167 – Deed of Assignment of Intellectual<br />
Property Rights (1702<strong>2012</strong>), <strong>2012</strong>/February –<br />
Localisation Knowledge Repository (LKR)<br />
2006166 – Deed of Assignment of Intellectual<br />
Property Rights (1702<strong>2012</strong>), <strong>2012</strong>/February –<br />
Automated Optimal Machine Translation System<br />
Selection Supporting XML Localization Interchange<br />
File Format (XLIFF)<br />
2006165 – Deed of Assignment of Intellectual<br />
Property Rights (1702<strong>2012</strong>), <strong>2012</strong>/February –<br />
XLIFF Phoenix<br />
2006164 – Deed of Assignment of Intellectual<br />
Property Rights (1702<strong>2012</strong>), <strong>2012</strong>/February<br />
– LocConnect – Localisation Orchestration<br />
Framework<br />
2006163 – Deed of Assignment of Intellectual<br />
Property Rights (1702<strong>2012</strong>), <strong>2012</strong>/February –<br />
Workflow Recommender<br />
} Localization World Paris and Seattle – Rosetta<br />
Foundation invited to exhibit at both European and<br />
American events.<br />
} LOC postdoctoral researcher Dr. David Filip launched<br />
FEISGILTT <strong>2012</strong>, a new federated event dedicated<br />
to Interoperability Standardization in Globalization,<br />
Internationalization, Localization, and Translation
Technologies.<br />
} Launch of AGIS Africa initiative, in collaboration with<br />
The Rosetta Foundation, United Nations Economic
Commission for Africa, GALA and the University of<br />
Limerick.<br />
Plans<br />
The Next Generation Localisation area will work with<br />
The Rosetta Foundation as well as with the United<br />
Nations’ Internet Governance Forum (IGF) working
group Dynamic Coalition for a Global Open Localization<br />
Platform: Localization for All on the further development<br />
of SOLAS leading to its deployment as an Open<br />
Localisation Platform, supported by the SF1 and SF2<br />
<strong>CNGL</strong> research areas.<br />
Education and Outreach<br />
} 10th <strong>Annual</strong> LRC Internationalisation and Localisation<br />
Summer School took place from 13-15 June <strong>2012</strong> in<br />
Limerick. The Summer School focused on Mobile<br />
Application development and localisation and was<br />
presented by a mix of <strong>CNGL</strong> industrial partners<br />
(Alchemy Software Development), PhD Students,<br />
academic staff and UL students.<br />
} Localisation Focus – The International Journal of<br />
Localisation published and sent out to libraries and<br />
subscribers, as well as being made available online<br />
for free at www.localisation.ie. Direct download links<br />
were sent to all members of the LRC mailing list<br />
(approximately 2,500).<br />
} LRC XVII, the annual LRC conference, was held on 20-21
September <strong>2012</strong> in Limerick. The conference also featured
the <strong>CNGL</strong> Innovation Showcase <strong>2012</strong>.<br />
} Launch and support of the MSc in Global Computing<br />
and Localisation by distance learning.<br />
Reinhard Schäler (second from left) presented on “Opportunities and<br />
Growth in Africa” at GALA <strong>2012</strong> in Monaco in March. Pictured with<br />
Reinhard are Renée Salzman (GALA Co-Founder), Hans Fenstermacher<br />
(GALA CEO) and María José Velasco (GALA founding member and<br />
Mondragón Lingua)<br />
The Rosetta Foundation was launched in 2009 by the<br />
President of UL and is supported by <strong>CNGL</strong> through<br />
formal decisions by its Integration and Management<br />
Committees. It works with more than 2,600 volunteers<br />
in over 40 languages and with 50 partner organisations<br />
including Special Olympics Europe Eurasia/International,<br />
Trócaire, the London School of Hygiene and Tropical
Medicine, and Ruhama. IP developed by <strong>CNGL</strong> has been
transferred to The Rosetta Foundation to support its<br />
technology platform and The Rosetta Foundation has<br />
become a UL Campus Company. The Rosetta Foundation<br />
has already provided very valuable feedback into <strong>CNGL</strong><br />
research which has resulted in joint publications (ASLIB<br />
Translation and the Computer, 2010). The platform<br />
serves as a test bed for the SOLAS research carried out in<br />
LOC, specifically with regard to SOLAS Match, allowing<br />
it to demonstrate the viability and to measure the<br />
improvements achieved in the localisation process. This<br />
work has been documented in at least two MSc theses
in <strong>2012</strong> that were not funded by <strong>CNGL</strong>.<br />
The LOC track publishes ‘Localisation Focus – the International Journal<br />
of Localisation’<br />
In line with this research, the platform is being open<br />
sourced with the aim of allowing SOLAS Match to<br />
become the de facto platform for non-industrial, nonprofit<br />
and non-market localisation and translation<br />
activities, driving social localisation as defined by LOC<br />
researchers and in support of the social agenda of <strong>CNGL</strong>.<br />
The results of this work were noted in the CNGL Final Review, where the independent international review panel commented that “The most visible success
here is without a doubt the Rosetta Foundation spinoff,<br />
which pioneers a novel, comprehensive localization<br />
model for organizations seeking to translate content<br />
for underserved communities. The panel feels this<br />
accomplishment has great societal impact that<br />
transcends the boundaries of Ireland and even the EU.”<br />
Testing is underway with a focus on demonstrating<br />
the viability of the SOLAS Match platform with a<br />
subset of projects within the Rosetta Foundation.<br />
Ongoing priorities in this area are the publication of specifications and an invitation for “open” contributions (such as from the African Network for Localisation; the Centre for the Development of Advanced Computing (CDAC) in Pune, India; the micro-lending organisation KIVA; and other organisations such as TechSoup Global or Zafen), the creation of the component repository, and the demonstration of “open” interoperability in collaboration with industry associations such as GALA and Interoperability Now.
Improvements will be demonstrated and measured in<br />
relation to particular tasks, e.g. MT and MT post-editing,<br />
and in relation to the overall process, e.g. user interaction<br />
evaluation, (re-)use of localisation knowledge and flexible<br />
workflow specification supported by the platform. Each<br />
section in each LOC work package is associated with<br />
one particular aspect of this demonstrator and each will<br />
contribute to an improvement in the performance of the<br />
overall platform with component technologies from LOC<br />
sections connected to the localisation platform. This will<br />
enable us to measure the impact of these technologies<br />
on the performance of the overall localisation workflow.<br />
The LOC research track will support The Rosetta Foundation in the development and deployment of SOLAS, which, in turn, will provide highly valuable feedback from a concrete implementation scenario into the scientific research carried out within LOC
and other <strong>CNGL</strong> areas. Now that the platform can be<br />
demonstrated, additional component technologies from<br />
other <strong>CNGL</strong> research areas are being considered for<br />
integration.<br />
The LOC research strand of <strong>CNGL</strong> will be subsumed<br />
in <strong>CNGL</strong>II under the Translation and Localisation<br />
Challenge (T&L), and the Interoperability and Analytics<br />
Challenge (I&A). T&L2 will focus on Social Localisation<br />
and continue with the research and development of<br />
the service-oriented localisation architecture solution<br />
(SOLAS) initiated under <strong>CNGL</strong> as the bulk localisation<br />
demonstrator. The work will focus on the identification<br />
and resolution of current problems around the correct<br />
identification of resources for localisation (SOLAS Match),<br />
as well as the identification and development of an<br />
adequate support infrastructure in terms of language<br />
technologies and resources in an ad hoc and dynamic<br />
setting.
Systems Framework
Strand Name: Systems Framework<br />
AREA CO-ORDINATOR: DR. SATURNINO LUZ<br />
Participant Names and Affiliation<br />
Industrial Collaborators
Prof. Andy Way Capita
Mr. Takeshi Fukunaga Dai Nippon Printing
Mr. Dag Schmidtke Microsoft
Dr. Alexander Troussov IBM
Mr. David Clarke Welocalize
Dr. Olga Beregovaya Welocalize
Mr. Phil Richie VistaTEC
Dr. Fred Hollowood Symantec
Mr. Jason Rickard Symantec
International Collaborators
Dr. Alistair Edwards University of York
Dr. Masood Masoodian The University of Waikato
Prof. Michael McTear University of Ulster
Prof. Chris Mellish University of Aberdeen
Faculty<br />
Prof. Julie Carson-Berndsen University College Dublin SF1<br />
Dr. Gavin Doherty Trinity College Dublin SF1, SF2<br />
Prof. Josef van Genabith Dublin City University SF2<br />
Dr. David Lewis Trinity College Dublin SF2 Leader<br />
Dr. Saturnino Luz Trinity College Dublin SF1 Leader<br />
Mr. Reinhard Schäler University of Limerick SF1, SF2<br />
Prof. Vincent Wade Trinity College Dublin SF1, SF2<br />
Postdoctoral Researchers<br />
Mr. Dominic Jones Trinity College Dublin SF2<br />
Dr. Nikiforos Karamanis* Trinity College Dublin SF1<br />
Dr. Anton Gerdelan* Trinity College Dublin SF2
SYSTEMS FRAMEWORK<br />
PhD Students<br />
Mr. John McAuley Trinity College Dublin SF2<br />
Mr. John Moran Trinity College Dublin SF2<br />
Ms. Ilana Rozanes Trinity College Dublin SF1<br />
Ms. Anne Schneider Trinity College Dublin SF1<br />
Mr. Stephan Schlögl Trinity College Dublin SF1<br />
Technicians<br />
Mr. Leroy Finn Trinity College Dublin SF2<br />
* Affiliated postdoctoral researchers<br />
Funding<br />
2012 Funding from SFI: €342,924
SFI TIDA ‘iOmegaT: Instrumented CAT Tool’ (12/TIDA/I2424): €92,273 over 12 months
2012 Funding from Other Sources
EC FP7 Coordination and Support Action Language Technology Web: €149,280 to TCD over two years (UL, DCU, Microsoft and VistaTEC are also partners)
Research Overview: Systems Framework (SF)
Goals<br />
The Systems Framework track seeks to ensure that basic language technologies can be effectively integrated to form next generation localisation systems that meet high standards of usability, and to facilitate the use of such technologies in advanced research prototypes that creatively explore novel design spaces for interactive systems. SF aims to produce a system services architecture and a system design methodology to support the integration of linguistic technologies, localisation workflow and digital content management. The ultimate goal is to enable rapid, iterative and instrumented integration of industrial software and academic research prototypes, and to support their evaluation through the provision of: a software integration platform based on open standards; guidelines and tools for developing workflows and applications using this platform; and methods for iterative prototyping and user studies. From a research perspective, SF focuses on the study of users (and potential users) of language technology-enabled systems in real work contexts, on the investigation of novel interaction design techniques, and on system support for the development of speech- and language-enabled applications.
The two work packages, SF1 and SF2, pursue these objectives from different perspectives. The Interaction Design
Work Package (SF1) deals primarily with human-factors<br />
research and it explores the design of novel systems<br />
incorporating language technology. The Systems<br />
Service Architecture Work Package (SF2) has a dual<br />
role in <strong>CNGL</strong>: it acts as a coordinator and facilitator of<br />
practical systems integration for the <strong>CNGL</strong> Demonstrator<br />
Programme and it conducts research into service<br />
integration and service management techniques. These<br />
two roles are interrelated in that the Demonstrator<br />
Programme, due to its size and variety, offers a unique<br />
interoperability and evaluation laboratory that operates<br />
over a wide range of linguistic and digital content<br />
processing services and applications.<br />
The specific goals for 2012 were to: (1) provide continued support for the demonstrator activities and incorporate lessons learned into service and metadata models that are contributing to international standards activities; (2) analyse and build theories from the workplace studies conducted in various work contexts, with a focus on the work of medical interpreters; (3) report research results in journal and conference publications; (4) further disseminate and evaluate the Wizard-of-Oz system; and (5) conduct further evaluation of language technologies in interactive contexts (e.g. speech-to-speech systems). These goals were satisfactorily met: several papers were published, substantive contributions were made to extensions of the ITS (W3C) and XLIFF (OASIS) standards, and three PhD theses were submitted.
Research Barriers and Methodologies to Address Them
As noted in previous reports, we have identified a gap<br />
between language technology and systems development<br />
methodologies (including both systems and interaction<br />
design issues) which seems to extend beyond the<br />
usual issues in putting together demonstrator systems<br />
and research prototypes. The research done by SF has<br />
attempted to bridge this gap.<br />
John Moran of TCD presents at AMTA-<strong>2012</strong> Workshop on Post-editing<br />
Technology and Practice (WPTP) in San Diego, USA.
From an interaction design perspective, SF has investigated methods for incorporating work contexts into the analysis of requirements for natural language generation systems, with special focus on MT technology in a localisation context (Doherty, Karamanis and Luz, 2012; Karamanis, Luz and Doherty, 2011); ethnographic methods for the study of multilingual situations, with a focus on the work of medical interpreters (Rozanes, Luz and Doherty, 2011); and rapid prototyping and evaluation methods for interactive language technologies, such as systems that combine speech input/output with MT (Schlögl et al., 2011; Schneider and Luz, 2011).
From a software engineering perspective, SF has successfully promoted the adoption of a Service Oriented Architecture (SOA) approach across CNGL, integrating different technologies into a range of applications spanning the use scenarios addressed by CNGL. This has allowed individual components, tools and platforms to retain autonomy in their choice of software technology, provided they adhere to some common interoperability models. The overall strategy was to employ existing standards as much as possible, by defining a common model based on standard languages from the W3C addressing provenance, internationalisation and the semantic web, and to interface with standards used in localisation, such as XLIFF, for integration with work done in the LOC strand. SF also examines specific service management issues in the context of its support for the demonstrator systems, namely: support for the management of online communities; unobtrusive monitoring of post-editing effort with regard to different configurations of SMT and other localisation support technologies; and interoperable linked data formats for content management and localisation integration.
Dominic Jones presents his PhD work at the national final of the Thesis in 3 competition
Year 5 Progress
SF activities in Year 5 consisted largely of analysing<br />
and publishing results of research work conducted in<br />
the last 18 months, with a focus on the completion<br />
of PhD theses. Several papers have been written.<br />
Two papers appeared in major HCI journals (van der<br />
Sluis, Luz et al., <strong>2012</strong>; Doherty, Karamanis and Luz,<br />
<strong>2012</strong>), one will appear in the proceedings of the ACM<br />
Computer Supported Cooperative Work conference<br />
(Kane, Toussaint and Luz, 2013) and four others are<br />
under preparation for publication (to be submitted to<br />
journals ‘Interacting with Computers’ and ‘Computer<br />
Supported Cooperative Work’ and the conferences ACL<br />
2013 and Interact 2013). Research on service, content and language resource interoperability was published at WWW 2012 (Filip, Lewis and Sasaki, 2012) and LREC 2012 (Lewis et al., 2012), and a paper and
book chapter on community management were also<br />
published. Several presentations and talks were given,<br />
including presentations at the <strong>CNGL</strong> review meeting<br />
and <strong>CNGL</strong> Scientific Committee Meeting. In addition,<br />
several presentations were made at industrially-focused<br />
events, including Multilingual Web workshops. Two<br />
international workshops were organised. The first,<br />
focused on Multilingual Web and Linked Open Data,<br />
was held in June in Dublin. The second (FEISGILT <strong>2012</strong>),<br />
organised in collaboration with LOC and co-located with<br />
Localization World in Seattle in September, was focused<br />
on standardisation and interoperability issues around<br />
globalisation, internationalisation, localisation and<br />
translation. SF members also contributed significantly to<br />
the <strong>CNGL</strong>II proposal.
Accomplishments, Impact and Plans<br />
We have concluded the study on support for collaborative aspects of translation work and the analysis of the impact of language technology, particularly MT, and reported this work in two major journal papers, in the Machine Translation journal and the Computer Supported Cooperative Work journal (Karamanis, Luz and Doherty, 2011; Doherty, Karamanis and Luz, 2012). The work on language generation in cross-cultural settings has also been concluded and published in a premier HCI journal (van der Sluis et al., 2012). The
Wizard-of-Oz platform has been fully deployed online<br />
and released as an open source project. Experiments and<br />
interviews to assess wizard performance and the usability<br />
of the platform have been successfully concluded and<br />
a paper is currently in preparation for submission to<br />
the journal ‘Interacting with Computers’. Community<br />
management trials with Symantec were successfully<br />
completed, and MT post-editing trials with professional<br />
translators at Welocalize and with crowdsource translators<br />
were completed. The former resulted in post-editing<br />
machine translation (PEMT) analytics solutions being<br />
licensed to Welocalize, while the latter demonstrated<br />
strong MT improvement resulting from selective training<br />
based on PEMT logging. Further details of on-going<br />
activities and plans for future work are given below.<br />
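The PEMT analytics licensed to Welocalize are not specified in this report. As a hedged sketch of one question this line of work asks (whether the string distance between raw MT output and its post-edited version tracks post-editing time), the code below computes a normalised character-level Levenshtein distance per segment and its Pearson correlation with seconds per post-edited word. The `Segment` record, the function names and the per-word time normalisation are assumptions for illustration; they do not describe the iOmegaT log format or the licensed solution.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical per-segment record (not the real iOmegaT log schema)."""
    mt_output: str      # raw machine translation shown to the translator
    post_edited: str    # the translator's final target text
    seconds: float      # editing time recorded for the segment


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def normalised_edit_distance(mt: str, pe: str) -> float:
    """Edit distance divided by the longer string's length, in [0, 1]."""
    longest = max(len(mt), len(pe))
    return levenshtein(mt, pe) / longest if longest else 0.0


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


def distance_time_correlation(segments: List[Segment]) -> float:
    """Correlate edit distance with post-editing seconds per target word."""
    distances = [normalised_edit_distance(s.mt_output, s.post_edited) for s in segments]
    secs_per_word = [s.seconds / max(len(s.post_edited.split()), 1) for s in segments]
    return pearson(distances, secs_per_word)
```

With real production logs, several distance measures and automatic MT metrics would typically be compared side by side; normalising time by segment length keeps long segments from dominating the correlation.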
Fieldwork for Language Technologies in Work Contexts
SF PhD student Ilana Rozanes has completed the development of a grounded theory of the work of medical interpreters. This work spanned two years of extensive observation of medical interpreters at work, interviews, data collection and data coding. Results are currently being written up for publication in journal and HCI conference papers. These papers will explore the data and theory in the context of designing language-technology applications for use by interpreters in medical settings, drawing on CNGL technology.
Wizard-of-Oz Platform
The CNGL Wizard-of-Oz platform (WebWOZ) was made available online in 2011 (http://www.webwoz.com). Since then it has been used to gather data on patterns of usage of the tool and on wizard performance. Specific projects (e.g. HCI coursework projects) were designed to assess the tool under controlled conditions. A paper describing the results of these activities is being written for submission to a journal. Stephan Schlögl has submitted his PhD thesis, and his viva is scheduled for January 2013. The WebWOZ software has now been released under an open source licence and we plan to use and extend it in CNGLII.
Figure 4: CNGL Wizard-of-Oz Homepage
Interaction Design for Speech-to-Speech Translation
Complementing our published work (Schneider and Luz, 2011; Schneider), a further experiment has been conducted on the use of speech recognition in an instructional task. Results are currently being written up for a paper to be submitted to ACL 2013 or SIGdial 2013. The overall aim of this line of research is to assess potential mismatches between intrinsic and extrinsic evaluation methods for component language technologies. In this case we focused on how well the intrinsic metric of word-error rate correlates with extrinsic measures of task success, and proposed alternative methods for identifying potential communication difficulties in automatic speech recognition (ASR)-mediated communication.
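Word-error rate (WER), the intrinsic metric mentioned above, is the word-level edit distance (substitutions, deletions and insertions) between an ASR hypothesis and a reference transcript, divided by the number of reference words. The sketch below computes it and, as one simple way of setting it against extrinsic outcomes, averages WER separately over successful and failed task trials. It is a minimal illustration only; the trial tuple format and function names are assumptions, not the method used in the experiment.

```python
from typing import Dict, List, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j] holds the edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


def mean_wer_by_outcome(trials: List[Tuple[str, str, bool]]) -> Dict[bool, float]:
    """Average WER over successful (True) and failed (False) task trials.

    Each trial is a hypothetical (reference, hypothesis, task_succeeded) tuple."""
    sums: Dict[bool, float] = {}
    counts: Dict[bool, int] = {}
    for reference, hypothesis, succeeded in trials:
        sums[succeeded] = sums.get(succeeded, 0.0) + word_error_rate(reference, hypothesis)
        counts[succeeded] = counts.get(succeeded, 0) + 1
    return {outcome: sums[outcome] / counts[outcome] for outcome in sums}
```

For example, `word_error_rate("turn the red knob left", "turn the bed knob")` returns 0.4 (one substitution and one deletion over five reference words). Whether such a score predicts task success is exactly what an extrinsic evaluation has to test.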
Figure 5: End-to-end localisation workflow monitoring using RDF Provenance, through integration of CMS-LION, SOLAS, MaTrex and other components
Linked Data for Content Management and Localisation Integration
Integration of Content Management Systems and localisation workflows remains a challenge with no established standard. Meanwhile, content is increasingly generated and revised in a continuous stream, including user-generated content, while existing push-based integration between content management and localisation systems constrains both agility in supporting new content processing modes (e.g. crowdsourcing) and upstream feedback from translators. This activity therefore provides a standards-based, linked data-oriented approach to agile multilingual content management. The approach supports both push- and pull-based CMS-Localisation interactions via a common Resource Description Framework (RDF) Provenance Model. This is implemented in a system, CMS-LION, which populates the RDF Provenance Model from exchanges of XLIFF files within a localisation workflow operated by LOC’s SOLAS platform. The model uses the RDF Open Provenance Vocabulary to log all CMS-Localisation interactions and content transformations. This allows standard SPARQL queries to be used for workflow monitoring and for extracting translation corpora from fresh post-edits, enabling immediate retraining of an MT engine based on DCU’s MaTrex and the bi-text corpus processing chains developed in the PANACEA project. Over several retraining iterations, this approach showed a strong 25% improvement in BLEU scores within a single crowdsourced translation job.
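BLEU, the metric behind the reported improvement, combines clipped ("modified") n-gram precisions p_1 to p_4 through a geometric mean and multiplies the result by a brevity penalty: BLEU = BP · exp((1/4) Σ log p_n), with BP = 1 if the candidate length c exceeds the reference length r, and exp(1 − r/c) otherwise. The sketch below is a minimal corpus-level version assuming one reference per segment, whitespace tokenisation and no smoothing; it is for illustration only, since reported scores are normally produced with established evaluation scripts whose tokenisation choices affect the numbers.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references: List[str], hypotheses: List[str], max_n: int = 4) -> float:
    """Corpus-level BLEU with one reference per segment and no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        r, h = ref.split(), hyp.split()
        ref_len += len(r)
        hyp_len += len(h)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped precision: a hypothesis n-gram is credited at most as
            # many times as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # some n-gram order has no match, so the geometric mean is 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)
```

Scoring the same held-out segments after each retraining iteration gives the kind of before/after comparison the report describes; the function itself says nothing about how the test set was chosen.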
To demonstrate this approach, a crowd-sourced translation application has been implemented with a Drupal front end through which users can create and contribute to translation jobs in XLIFF. An RDFLogger component converts the XLIFF document into RDF provenance statements and then logs these to a triple store. The triple store used, Sesame, is an open source Java framework for storing, querying and reasoning with RDF. An RDF Provenance Visualiser has been implemented for exploring the outcomes of process steps. This platform was also used in a prototype integration with translation quality assurance data gathered by translation review processes conducted by VistaTEC.
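The logging-and-query pattern described above can be sketched in a few lines. This is a hypothetical, self-contained illustration and not CMS-LION or RDFLogger code: an in-memory set of triples stands in for the Sesame store, a simple triple-pattern matcher stands in for SPARQL, the `urn:job:` identifiers and the `ex:text` predicate are invented, and the provenance predicates borrow W3C PROV-O term names rather than the Open Provenance Vocabulary terms used in the actual model. It assumes XLIFF 1.2 input with plain-text segments.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}  # XLIFF 1.2 namespace


def xliff_to_triples(xliff: str, job_id: str, step: str) -> Set[Triple]:
    """Log one workflow step: each target segment is linked to its source
    and to the activity (e.g. 'mt' or 'postedit') that produced it."""
    root = ET.fromstring(xliff)
    activity = f"urn:job:{job_id}:activity:{step}"  # hypothetical URI scheme
    triples: Set[Triple] = {(activity, "rdf:type", "prov:Activity")}
    for unit in root.iterfind(".//x:trans-unit", NS):
        seg = f"urn:job:{job_id}:seg:{unit.get('id')}"
        source = unit.find("x:source", NS)
        target = unit.find("x:target", NS)
        src_node = seg + ":source"
        # .text keeps only leading text, so inline markup is ignored here
        triples.add((src_node, "ex:text", "" if source is None else source.text or ""))
        if target is not None:
            out = f"{seg}:target:{step}"
            triples.add((out, "prov:wasGeneratedBy", activity))
            triples.add((out, "prov:wasDerivedFrom", src_node))
            triples.add((out, "ex:text", target.text or ""))
    return triples


def match(store: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Triple-pattern lookup; None acts like an unbound SPARQL variable."""
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def bitext_for_step(store: Set[Triple], job_id: str, step: str) -> List[Tuple[str, str]]:
    """Extract (source text, target text) pairs produced by one workflow step."""
    activity = f"urn:job:{job_id}:activity:{step}"
    pairs = []
    for out, _, _ in match(store, p="prov:wasGeneratedBy", o=activity):
        src_node = match(store, s=out, p="prov:wasDerivedFrom")[0][2]
        src_text = match(store, s=src_node, p="ex:text")[0][2]
        tgt_text = match(store, s=out, p="ex:text")[0][2]
        pairs.append((src_text, tgt_text))
    return sorted(pairs)
```

Logging an MT step and a post-editing step for the same job into one store, and then asking for the pairs generated by the post-editing step, yields the (source, post-edit) bitext that would be handed to retraining, while the MT-step pairs remain available for monitoring.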
Figure 6: Petri – A community analytics tool developed for identifying and tracking community guru behaviour<br />
at Symantec<br />
This experience and demonstration using CMS-LION<br />
and SOLAS have enabled <strong>CNGL</strong> to promote a strong<br />
vision of end-to-end interoperability and monitoring<br />
of localisation workflows. This, in turn, has fed into new<br />
metadata definitions related to translation provenance<br />
in the new version of the Internationalization Tag Set<br />
being developed by the Language Technology-Web<br />
project via the W3C’s Multilingual Web-Language<br />
Technology working group. Feedback on this approach<br />
has also been provided to the XLIFF Technical<br />
Committee and the W3C PROV working group. This capability has also positioned CNGL well for continued international collaboration at the intersection of linked data and language resource technology research, both through collaboration on workshops in the area and through two project proposals submitted for EU funding.
Visual Analytics for Online Communities<br />
Visual analytics can help users to extract knowledge<br />
from massive amounts of data, make sound decisions<br />
based on evidence and increase understanding of<br />
complex online processes. However, applications are<br />
generally developed with a focus on the researcher or the<br />
analyst, and lack a clear context for the end-user. This<br />
research seeks to investigate the potential<br />
of visual analytics for online communities. It<br />
has evaluated how to extract knowledge from<br />
communication data and represent this visually<br />
to support evidence-based decision making and<br />
understanding complex processes in online communities.<br />
An initial visual analytics tool was developed for the<br />
Stack Exchange Super-User meta community. The<br />
tool visualises the community’s social and temporal<br />
interaction patterns and provides collaboration support<br />
in the form of visualisation bookmarking, view sharing<br />
and threaded discussion. Based on this experience, a revised tool was tailored to the community management requirements of customer support staff at Symantec, enabling the evaluation of innovative methodologies and tools for developing such systems.
The tool, Petri, was designed to encourage a more<br />
analytical approach to online community management<br />
that is based on cycles of observation and intervention.<br />
We conducted several interviews and design workshops<br />
with Symantec’s online community team to help<br />
formalise a set of requirements. These requirements<br />
were then used to inform the design. Petri enables<br />
the community manager to analyse their community<br />
from multiple perspectives, shifting between phases
of explorative and confirmative analysis, and to identify<br />
users that could prove valuable to the community<br />
over time. Explorative evaluation, conducted with five<br />
members of the Symantec community management<br />
team, found the visualisation tool to be both useful<br />
and usable. This work therefore proposes a new approach to online community management, built upon cycles of analysis and informed intervention. It is supported by the implementation of advanced visual analytics technologies and has established a set of design requirements that can be revisited by other researchers interested in online community visualisation.
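Petri's internal metrics are not described in this report. Purely as a hypothetical illustration of turning community communication data into an indicator a community manager could track over time, the sketch below counts, for each month, how many distinct members each user has replied to; a consistently high count is one plausible (assumed) signal of "guru" behaviour. The reply-record format and the function names are invented for this example.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Set, Tuple

# (answerer, asker, day): a hypothetical, minimal view of forum reply data
Reply = Tuple[str, str, date]


def helpers_by_month(replies: Iterable[Reply]) -> Dict[str, Dict[str, int]]:
    """For each 'YYYY-MM' month, count the distinct members each user replied to."""
    helped: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for answerer, asker, day in replies:
        if answerer != asker:  # ignore users replying in their own threads
            helped[day.strftime("%Y-%m")][answerer].add(asker)
    return {month: {user: len(askers) for user, askers in users.items()}
            for month, users in helped.items()}


def top_helpers(replies: Iterable[Reply], month: str, k: int = 3) -> List[Tuple[str, int]]:
    """Rank users by distinct members helped in one month (ties broken by name)."""
    counts = helpers_by_month(replies).get(month, {})
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:k]
```

Counting distinct askers rather than raw replies keeps a single long back-and-forth from inflating a user's score; a visual analytics front end would then plot these monthly series rather than print them.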
Instrumenting CAT Tools to evaluate Post-editing of SMT
Machine translation (MT) evaluation metrics based on n-gram co-occurrence statistics are inexpensive to run, and their value in comparative research is well documented. However, their value as a standalone measure of MT output quality is questionable. In contrast, manual methods of MT evaluation are expensive. This work is developing a low-cost means of acquiring MT evaluation data in an operationalised manner in a commercial post-edited MT context. To this end, OmegaT, a popular open source CAT tool, has been augmented to capture post-editing keystrokes and other CAT tool actions, and to record these in an open XML log file so that they can be analysed by workflow managers. The resulting tool, termed instrumented OmegaT or iOmegaT, has been deployed at a large CNGL industrial partner, Welocalize. As a result, large quantities of translation process field data have been gathered from production tests in which individual translator speed was measured for segments translated using MT and for segments translated without it (human translation). So far, over half a million words in approximately 60,000 sentences have been translated by more than 50 translators using iOmegaT, and this number is growing monthly as more productivity tests are carried out. This uniquely large data set is currently being analysed to determine: whether automated MT metrics or string distance calculations correlate with post-editing (PE) time; whether patterns in keystroke and other translation process (TP) field data provide insight into the MT post-editing process; whether features of source sentences that correlate with increased post-editing time across multiple languages can be identified; and what volume of PEMT data needs to be gathered to form reliable analyses of MT engines.
Prof. Felix Sasaki of DFKI and Dag Schmidtke of Microsoft Ireland confer with Dr. Mark Davis, President of the Unicode Consortium, via video link at the W3C Multilingual Web Workshop at TCD
Industry Engagement and Future Plans
Industry engagement has been strong, through the deployment and trialling of tools with Welocalize and Symantec, resulting in one technology licence. Further close collaboration is being undertaken together with LOC and ILT through the W3C’s MultilingualWeb-Language Technology working group. These and earlier engagements are resulting in ongoing industry collaboration at the level of proposal writing, especially
<strong>CNGL</strong>II, FP7 and Science Foundation Ireland/Enterprise<br />
Ireland Technology Innovation Development Award<br />
(TIDA). The WOZ and CMS-LION systems now also<br />
form core platforms for research in interactivity,<br />
interoperability and analytics in <strong>CNGL</strong>II.
Year 5 Demonstrator Programme<br />
Goals<br />
The <strong>CNGL</strong> Demonstrator Programme aims to: promote<br />
and guide collaborative scientific work between <strong>CNGL</strong><br />
partners and between research tracks; showcase the<br />
relevance of <strong>CNGL</strong> research to industry and society in<br />
general; and provide regular milestones for assessing the<br />
collective progress and impact of <strong>CNGL</strong>.<br />
Research Challenges and Methods<br />
The Demonstrator Programme has achieved these<br />
objectives through a rolling programme of engagement<br />
by multiple teams of collaborating researchers across<br />
<strong>CNGL</strong> tracks that address specific use scenarios in<br />
response to industry needs. Each team developed a demonstrator system iteratively and presented it at twice-yearly showcases. The Demonstrator Programme balanced the scientific needs of individual researchers and PhD topics, the diverse and evolving requirements of industry partners, and the Centre’s aims of advancing collaborative research and IP commercialisation. It has also tracked and assessed the progress made visible through demonstrator systems, and communicated it internally, to reviewers and advisers, and to industry and the general public at CNGL Localisation Innovation Showcase events. CNGL carefully developed and resourced a flexible organisational structure for coordination, which enabled the programme to address its challenges effectively and in a timely fashion.
Work on advancing the demonstrator systems was<br />
conducted by demonstration teams with members<br />
drawn from across universities, research tracks and<br />
industry partners. Demonstrator systems must exhibit<br />
potential industrial impact, but are also vehicles for<br />
scientific collaboration and instances of model-driven<br />
interoperability. These three factors therefore form the<br />
basis for evaluating demonstrator systems. Evaluations are recorded in order to track increasing maturity across the Demonstrator Programme, as well as links to the peer-reviewed publications produced by the Centre.
Achievements in Year 5 (<strong>2012</strong>)<br />
The Demonstrator Programme accomplished four major<br />
milestones in <strong>2012</strong>:<br />
1. In July, a showcase of selected demonstrator<br />
systems was presented to a panel of distinguished<br />
international reviewers as part of <strong>CNGL</strong>’s Year 5<br />
Review site visit.<br />
2. The Programme’s work on Metadata Semantics for<br />
Next Generation Localisation and its instantiation<br />
in demonstrator systems is receiving international<br />
recognition and is exerting a coordinated impact on<br />
both of the major extant international standardisation efforts in localisation, namely the W3C MultilingualWeb-Language Technology working group and the OASIS XLIFF Technical Committee.
3. A large set of the demonstrator systems was<br />
showcased at a final public event at the Localisation<br />
Research Centre conference in Limerick in<br />
September.<br />
4. Several of the demonstrator systems have successfully<br />
graduated to the <strong>CNGL</strong> Commercialisation<br />
Programme and are now receiving seed funding from<br />
various sources to further develop their commercial<br />
potential.<br />
Another important achievement for the Demonstrator Programme has been demonstrating the real benefits of active resource curation and its role in improving the quality and performance of language technology components.
As shown below in Figure 7, the CNGL Demonstrator Programme collectively covers a range of content processing scenarios, from community management to multimodal interaction to personalised discovery and consumption of content. Processes both to translate and to slice/recompose content are core to these activities. The role of language technology (e.g. text analytics, machine translation and speech processing) in these scenarios is supported by the active curation of language resources.
Centre for Next Generation Localisation <strong>Annual</strong> <strong>Report</strong> <strong>2012</strong><br />
YEAR 5 DEMONSTRATOR PROGRAMME<br />
Figure 7: Content Processing Scenarios covered by the Demonstrator Programme
Curating language resources as a secondary output of content processing activities promises significant, progressive improvement of language technology components through the systematic targeting and reuse of these resources. Such active curation has already yielded significant improvements in Statistical Machine Translation (SMT) performance within a crowd-sourced translation project, where rapid retraining of the SMT engine was made possible by actively curating and reusing human corrections to its translations. This work has secured SFI TIDA feasibility funding to develop further rapid SMT retraining techniques.
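The curation loop described above, in which human corrections of machine translation output are accumulated and reused to retrain the SMT engine, can be sketched as a buffer that requests retraining once enough new corrected segments have arrived. This is a minimal sketch, assuming a hypothetical fixed threshold and a placeholder `retrain()` method; it is not the retraining technique developed in the project and does not call any real SMT toolkit.

```python
from dataclasses import dataclass, field

@dataclass
class PostEdit:
    source: str        # source-language segment
    mt_output: str     # raw machine translation
    post_edited: str   # human-corrected translation

@dataclass
class CurationBuffer:
    """Curates human corrections of MT output as reusable parallel data.

    retrain_threshold is a hypothetical parameter: the number of new,
    genuinely corrected segments that triggers a retraining request.
    """
    retrain_threshold: int = 3
    corpus: list = field(default_factory=list)  # curated (source, target) pairs
    pending: int = 0                             # new pairs since the last retrain
    retrain_count: int = 0

    def add(self, edit: PostEdit) -> bool:
        """Curate one post-edit; return True if it triggered a retrain."""
        # In this sketch, MT output accepted unchanged is not counted as new data
        if edit.post_edited.strip() == edit.mt_output.strip():
            return False
        self.corpus.append((edit.source, edit.post_edited))
        self.pending += 1
        if self.pending >= self.retrain_threshold:
            self.retrain()
            return True
        return False

    def retrain(self) -> None:
        # Placeholder: a real system would pass self.corpus to an SMT
        # training or incremental-update pipeline here.
        self.retrain_count += 1
        self.pending = 0

buf = CurationBuffer(retrain_threshold=2)
edits = [
    PostEdit("Click Save", "Cliquez Sauver", "Cliquez sur Enregistrer"),
    PostEdit("Cancel", "Annuler", "Annuler"),
    PostEdit("Open file", "Ouvrir dossier", "Ouvrir le fichier"),
]
triggered = [buf.add(e) for e in edits]
print(triggered)  # [False, False, True]
```

Counting only changed segments towards the threshold is one possible design choice; a real deployment might also reuse confirmed, unchanged segments as additional training data.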
Demonstrator Showcases
As mentioned above, the demonstrator systems were showcased at two events in 2012: the CNGL Year 5 Review (July) and the CNGL Localisation Innovation Showcase at the Localisation Research Centre conference (September). The following provides an overview of the key systems showcased at these events.
This initial set of demonstrators highlights the commercialisation outputs of CNGL that have emanated from the Demonstrator Programme:
• Text Classification for Bulk Localisation Review [Digital Linguistics/TCD – ILT/SF]: Phil Ritchie (Digital Linguistics) and Gerard Lynch demonstrated Review Sentinel, a software-as-a-service offering from CNGL spinout Digital Linguistics for scalable and consistent language quality management. The product, a direct licensing and commercialisation of a CNGL academic/industrial collaboration, reduces linguistic review costs while ensuring the highest levels of style and brand consistency.
• Wripl – Personalisation-as-a-Service across Websites [TCD – DCM]: Kevin Koidl and Brian Gallagher demonstrated non-invasive cross-site personalisation. This work improves users’ experience as they browse across sites built on multiple different content management systems (CMSs) to complete a particular task. As the user moves from site to site, the system learns about their task and gives the CMS hints about which content to recommend. Wripl has been developed with the support of Science Foundation Ireland (SFI)/Enterprise Ireland (EI) Technology Innovation Development Award (TIDA) funding, and its development is now supported by the Enterprise Ireland Commercialisation Development Fund.
• Emizar – Personalised Retrieval, Composition and Presentation [TCD/Symantec – DCM]: Dr. Alex O’Connor presented the Personalised Multilingual Customer Care Adaptive Portal. This system combines formal technical content with social content harvested from user forums to present tailored, task-specific solutions to customer support and technical support problems, across languages and levels of expertise. The system has already received SFI/EI TIDA funding and is undergoing trials with industrial reference customers.
• KantanMT – Moses on the Cloud [DCU/Xcelerator – ILT]: Tony O’Dowd (Xcelerator) and Dr. Declan Groves demonstrated how ILT-based machine translation technology and know-how are being leveraged commercially to provide cloud-based MT services. Tony O’Dowd has formed a DCU spin-out, Xcelerator, to commercialise this technology; the company already has over 400 mid-sized LSPs as clients. Xcelerator has secured US$1.2 million in funding from a syndicate of investors, which will allow for the creation of 25 new development jobs.
• iOmegaT – An Instrumented CAT Tool and its use in a Commercial Machine Translation Study [TCD/Welocalize – SF]: John Moran demonstrated how his instrumented version of the CAT tool OmegaT was used to collate post-editing time data during commercial MT evaluation projects conducted by Welocalize. Such data is potentially vital for assessing the post-editing effort and quality of machine translation, and for comparing the performance of different MT offerings in a commercial translation setting. This technology has now also secured SFI/EI TIDA feasibility funding for 2013.
• PLuTO – Facilitating Patent Search with Machine Translation [DCU/FP7 – ILT]: Dr. John Tinsley demonstrated work by the EU-funded PLuTO project, which has developed in-browser software that allows patent search professionals to carry out personalised translations on the fly. The technology uses statistical machine translation that has been adapted to the patent domain and deployed as a web service. This technology is now supported by the EI Commercialisation Development Fund for further development at DCU.
• SOLAS Match – Leveraging community translation [UL/Rosetta Foundation – LOC]: Dr. Eoin Ó Conchúir demonstrated how SOLAS Match is used as a collaborative localisation platform for community-based volunteer translators. It is being rolled out in the non-profit CNGL spin-out, the Rosetta Foundation, where it supports a cohort of 6,000 volunteer translators.
• Rapid SMT Re-training [DCU/TCD/MLW-LT/PANACEA – ILT/SF]: Dr. Antonio Toral (affiliated project PANACEA) and Leroy Finn showed how a statistical machine translation (SMT) engine is re-trained using post-edits from non-professional translators. CNGL provides CMS-LION, which offers crowd-sourced post-editing integrated with Content Management Systems (CMSs), while PANACEA provides a web service for machine translation and workflows for re-training the SMT engine. This work has resulted in additional SFI/EI TIDA feasibility funding to develop more rapid SMT re-training techniques.
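Post-editing time data of the kind iOmegaT collects is commonly reduced to throughput (source words per hour) for each MT condition and compared against translation from scratch. The sketch below assumes a hypothetical log format of (condition, source word count, seconds) tuples; the condition names and timings are made-up illustrative numbers, not Welocalize results.

```python
from collections import defaultdict

# Hypothetical log records: (condition, source_word_count, editing_seconds).
# "HT" stands for human translation from scratch; "MT-A"/"MT-B" are MT systems.
def words_per_hour(log):
    """Aggregate throughput (source words per hour) for each condition."""
    words = defaultdict(int)
    seconds = defaultdict(float)
    for condition, n_words, secs in log:
        words[condition] += n_words
        seconds[condition] += secs
    return {c: words[c] * 3600 / seconds[c] for c in words if seconds[c] > 0}

def speed_gain(throughput, baseline="HT"):
    """Relative speed-up of each condition over the baseline (0.5 = 50% faster)."""
    base = throughput[baseline]
    return {c: t / base - 1 for c, t in throughput.items() if c != baseline}

log = [
    ("HT", 100, 360.0),
    ("HT", 50, 180.0),
    ("MT-A", 120, 288.0),
    ("MT-B", 80, 240.0),
]
tp = words_per_hour(log)
gains = speed_gain(tp)
print(tp)                                          # {'HT': 1000.0, 'MT-A': 1500.0, 'MT-B': 1200.0}
print({c: round(g, 2) for c, g in gains.items()})  # {'MT-A': 0.5, 'MT-B': 0.2}
```

Summing words and seconds per condition before dividing weights long segments appropriately, unlike averaging per-segment speeds.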
The following demonstrators showcased a high degree of industrial engagement and impact:
• Visual Analytics for the Management of Online Communities [TCD/Symantec – SF]: John McAuley showed how visual analytics can make the analysis of online interactions accessible to all members of an online community. This actively supports online communities in discussing and planning the evolution of their policies and processes, thereby increasing member engagement. The approach has been trialled through the development of a tool that enables members of Symantec’s customer support communities to observe and gain insight into the behaviour of key community members, or ‘gurus’.
• Multilingual User Modelling for Personalised Multilingual Information Retrieval [TCD/DCU/Microsoft – DCM]: M. Rami Ghorab demonstrated a framework for multilingual search personalisation. The framework allows different combinations of the functional elements of a personalised, multilingual information retrieval system, such as user modelling, query adaptation, results adaptation and translation, to be delivered and evaluated. This demonstrator was advanced through a placement at Microsoft’s Ireland offices.
• CMS-LION/SOLAS Integration: Full Content Lifecycle Metadata Interoperability TestBed [UL/TCD/MLW-LT – SF/LOC]: Dr. David Filip demonstrated a unique platform for testing complex metadata designs spanning process areas over the full multilingual content life cycle. He showed how an RDF-based provenance store is used between a Web Content Management System (CMS) and XLIFF-based translation workflows. The platform demonstrates use cases for round-tripping Internationalisation Tag Set (ITS) metadata between content generation and publication in HTML5/XML and localisation processes in XLIFF. It therefore provides directly testable input to the standardisation working groups currently developing ITS, XLIFF and HTML5.
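The ITS round-tripping exercised by the testbed can be illustrated with the `translate` data category, which ITS 2.0 expresses in HTML5 through the native `translate` attribute and which maps naturally onto the `translate` attribute of an XLIFF 1.2 `trans-unit`. The sketch below uses only the Python standard library and is a simplified illustration, not the testbed’s implementation: it omits the XLIFF namespace, `file` and header elements, assumes every paragraph carries an `id`, and its function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def html_to_xliff(html: str) -> ET.Element:
    """Extract each <p> into a simplified XLIFF 1.2-style trans-unit.

    The HTML5/ITS translate flag is carried across; an element without its
    own flag inherits the value of its nearest ancestor (default "yes").
    """
    root = ET.fromstring(html)
    body = ET.Element("body")

    def walk(elem, inherited):
        flag = elem.get("translate", inherited)
        if elem.tag == "p":
            unit = ET.SubElement(body, "trans-unit", id=elem.get("id"), translate=flag)
            ET.SubElement(unit, "source").text = elem.text or ""
        for child in elem:
            walk(child, flag)

    walk(root, "yes")
    return body

def fill_targets(body: ET.Element, translations: dict) -> None:
    """Add <target> elements, but only to units marked translatable."""
    for unit in body.iter("trans-unit"):
        uid = unit.get("id")
        if unit.get("translate") == "yes" and uid in translations:
            ET.SubElement(unit, "target").text = translations[uid]

def xliff_to_html(html: str, body: ET.Element) -> str:
    """Merge targets back into the original HTML; other units stay as-is."""
    root = ET.fromstring(html)
    paragraphs = {p.get("id"): p for p in root.iter("p")}
    for unit in body.iter("trans-unit"):
        target = unit.find("target")
        if target is not None:
            paragraphs[unit.get("id")].text = target.text
    return ET.tostring(root, encoding="unicode")

html = ('<div><p id="p1">Save your file</p>'
        '<p id="p2" translate="no">SOLAS</p>'
        '<section translate="no"><p id="p3">v2.1</p></section></div>')
xliff = html_to_xliff(html)
fill_targets(xliff, {"p1": "Sauvegardez votre fichier", "p2": "SOLES"})
flags = {u.get("id"): u.get("translate") for u in xliff.iter("trans-unit")}
out = xliff_to_html(html, xliff)
print(flags)  # {'p1': 'yes', 'p2': 'no', 'p3': 'no'}
print(out)
```

Because the flag is resolved during extraction, protected units never receive a target, so the merge step cannot overwrite them even when a translation is supplied.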
The final set of demonstrators highlights promising research directions that have influenced the focus of CNGLII:
• MOODfinger – An Affective Search Engine [UCD – DCM]: Alejandra López-Fernández and Yanfen Hao presented their initial prototype of a search engine that retrieves texts expressing a certain mood for a given query and then ranks them according to the degree to which they exhibit that mood. As part of this work, an affective lexicon is built that can help retrieve, filter and rank web content in the most emotionally useful ways. The affective qualities of content, especially user-generated content, underpin several research activities in CNGLII.
• WebWOZ – A Wizard of Oz Platform [TCD/UCD – SF/ILT]: Stephan Schlögl demonstrated a web-based system that allows designers to intervene dynamically online while testing user interactions with application prototypes that will later incorporate language processing components. This provides a flexible tool for rapidly iterating low-fidelity application prototypes using Wizard-of-Oz techniques. Stephan also discussed how ILT researchers in UCD leveraged WebWOZ for user evaluations of their MySpeech system. Released as an open-source system, WebWOZ forms a key platform for multimodal interaction and dialogue research in CNGLII.
• WinkTalk – Linking Facial Expressions to Expressive Synthetic Voices [UCD – ILT]: Éva Székely and Zeeshan Ahmed presented their work on using facial gestures to select automatically between expressive voice styles for synthetic voices and speech-generating devices. The expressive features of the synthetic voices represent dimensions of emotional intensity rather than distinct emotions. This work shows potential for supporting affect-driven dialogue systems.
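MOODfinger’s retrieve-then-rank idea can be sketched with a small affective lexicon. The lexicon entries, the simple word-matching retrieval step and the choice to score a text by the average mood strength of its tokens are all assumptions made for illustration; the prototype’s actual lexicon and ranking function are not described in this report.

```python
import re

# Hypothetical affective lexicon: mood -> {word: strength in 0..1}.
LEXICON = {
    "joy": {"delighted": 1.0, "happy": 0.8, "pleased": 0.6, "fine": 0.2},
    "anger": {"furious": 1.0, "annoyed": 0.6, "irritated": 0.5},
}

def tokenize(text):
    """Lower-case word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def mood_score(text, mood):
    """Average lexicon strength of the mood over all tokens in the text."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    lexicon = LEXICON[mood]
    return sum(lexicon.get(t, 0.0) for t in tokens) / len(tokens)

def affective_search(query, docs, mood):
    """Retrieve docs containing every query term, then rank them by mood degree."""
    terms = set(tokenize(query))
    hits = [d for d in docs if terms <= set(tokenize(d))]
    return sorted(hits, key=lambda d: mood_score(d, mood), reverse=True)

docs = [
    "The new phone arrived and I am delighted",
    "The phone works fine",
    "Furious that the phone broke",
    "The weather is lovely",
]
for doc in affective_search("phone", docs, "anger"):
    print(doc)
```

Separating retrieval from ranking means the same result set can be re-ranked for a different mood without re-running the query.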
Metadata Semantics for Next Generation Localisation
In addition to developing and showcasing a set of demonstrator systems, the Demonstrator Programme provides a basis for examining and modelling interoperability problems across the scope of end-to-end content processing. The components used and integrated in the demonstrator systems derive from a number of different research and industrial communities, in which metadata was typically either not formally defined or specified across a fragmented set of standards.
The Metadata Group (MDG) was established to concentrate and integrate the metadata knowledge from these different communities, including statistical machine translation and text analytics research, adaptive content and personalisation research, and localisation workflow and interoperability expertise. To address the universal trend towards web-based content, and to offer a well-supported, community-neutral approach to the semantic modelling of metadata, the MDG used the standardised languages of the W3C Semantic Web initiative, specifically the Resource Description Framework (RDF). This allowed multiple existing metadata standards and component metadata requirements to be incorporated into a single model. The model demonstrates the interrelation and utility of such interlinked metadata and provides a focus for wider consensus building on a semantic model that combines content management, localisation, natural language processing and content adaptation/personalisation. Such an approach enables existing service-oriented system integration to be enhanced through semantic annotation of differing interfaces, e.g. those used in SOLAS or for SMT integration. It also supports linked-data provenance annotation for the pull-based interoperability approach used in the CMS-LION system.
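To make the provenance idea concrete, the sketch below stores a few triples using terms from the W3C PROV-O vocabulary (`prov:wasDerivedFrom`, `prov:wasGeneratedBy`, `prov:wasAssociatedWith`) and answers the kind of pull-style question a downstream tool might ask: where did this segment come from, and who or what produced it? The `ex:` resources are hypothetical, and plain Python tuples stand in for a real RDF triple store and SPARQL queries; this is not the MDG’s model itself.

```python
# (subject, predicate, object) triples with prefixed names.
# prov: and rdf: terms are standard vocabulary; ex: resources are hypothetical.
TRIPLES = {
    ("ex:segment1-fr", "prov:wasDerivedFrom", "ex:segment1-en"),
    ("ex:segment1-fr", "prov:wasGeneratedBy", "ex:mtRun42"),
    ("ex:mtRun42", "rdf:type", "prov:Activity"),
    ("ex:mtRun42", "prov:wasAssociatedWith", "ex:smtEngine"),
    ("ex:segment1-fr-pe", "prov:wasDerivedFrom", "ex:segment1-fr"),
    ("ex:segment1-fr-pe", "prov:wasGeneratedBy", "ex:postEdit7"),
    ("ex:postEdit7", "prov:wasAssociatedWith", "ex:volunteer3"),
}

def objects(subject, predicate, triples=TRIPLES):
    """All objects for a subject/predicate pair, sorted for determinism."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)

def lineage(entity, triples=TRIPLES):
    """Follow prov:wasDerivedFrom back to the original source content."""
    chain = [entity]
    while True:
        parents = objects(chain[-1], "prov:wasDerivedFrom", triples)
        # Stop at the source, or if a cycle would revisit an entity;
        # this sketch assumes at most one parent per entity.
        if not parents or parents[0] in chain:
            return chain
        chain.append(parents[0])

def agents(entity, triples=TRIPLES):
    """Agents associated with each generating activity along the lineage."""
    result = []
    for e in lineage(entity, triples):
        for activity in objects(e, "prov:wasGeneratedBy", triples):
            result.extend(objects(activity, "prov:wasAssociatedWith", triples))
    return result

print(lineage("ex:segment1-fr-pe"))
print(agents("ex:segment1-fr-pe"))
```

Here the post-edited French segment traces back through the raw MT output to the English source, and the agents query recovers both the volunteer post-editor and the SMT engine involved.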
Figure 8: WinkTalk Workflow
With strong input from the demonstrator teams, the MDG has developed semantic models of content and process taxonomies that span the content processing scenarios (see Figure 7) supported by the Demonstrator Programme. This provides a broad and validated semantic model that will guide future interoperability solutions across the Global Intelligent Content space.
This broad view of interoperability, based on semantic metadata and its mapping to existing standards, has allowed CNGL to influence international standardisation efforts. In addition to UL’s established participation in the OASIS XLIFF Technical Committee, in 2012 UL, TCD, DCU, Microsoft and VistaTEC, together with several international academic and industrial collaborators and with the support of EU funding, founded a new W3C working group on Multilingual Web – Language Technology. This working group addresses the interoperability challenges of integrating content management systems, localisation systems and machine translation services. Interoperability use cases being addressed include: CMS-based content translation and quality assurance; CMS-LSP metadata round-tripping; and content metadata for machine translation training and on-demand content translation. The consortium is led by DFKI (Germany) and includes other academic experts, a CMS vendor (Cocomore), and several LSPs and language technology providers (Moravia, Enlaso, LinguaServ, ]Init[, Logrus, Tilde and Lucy Software), and it has attracted further participation from large localisation clients including Adobe, SAP, Intel and IBM.
Input from the Demonstrator Programme has taken the form of integration between CMS-LION, SOLAS, MaTrEx, PANACEA MT training services and localisation quality assurance from Digital Linguistics and VistaTEC. UL (LOC) and TCD (SF) have also been instrumental in driving round-trip scenarios between ITS in HTML5/XML files and XLIFF-based workflows, thereby helping to harmonise parallel specification activities in the MLW-LT working group at the W3C and the XLIFF Technical Committee at OASIS, as well as contributing to those groups individually as editors and co-chairs.
Figure 9: Overview of the Metadata Group’s semantic modelling and its influence on international standards
The Metadata Group’s chairing and organisation of the inaugural FEISGILTT 2012 interoperability and standards harmonisation workshop, co-located with Localization World in Seattle in October, played a key role in this harmonisation; the workshop will be repeated at Localization World in London in 2013. In addition, the Metadata Group organised, in collaboration with the MLW-LT working group, a workshop in the Multilingual Web series on the role of Linked Open Data in the development of the multilingual web. Together with committee involvement in the Multilingual Semantic Web workshop in Boston and the Multilingual Linked Open Data for Enterprises workshop in Leipzig, this demonstrates that the CNGL Metadata Group is playing a significant role in guiding the convergence of language and localisation technologies with the linked data cloud. This role will continue in CNGLII through the Interoperability and Analytics theme as well as through proposed new EU projects.
Industry Partnerships and Technology Transfer
Overview
For the last twenty years the localisation industry has focused on delivering valuable solutions that adapt content for specific geographic regions and cultures. The next twenty years will be about inventing real-time solutions that operate across the global content value chain to transform content into small, actionable bits of information personalised to specific individuals, regardless of their current location. To accommodate this industry shift, the CNGL research programme has expanded to address key aspects of the end-to-end content value chain.
Knowledge transfer within the Centre operates under an industry-standard Collaborative Research and IP Agreement. The IP agreement was signed by all parties in May 2008, while the Collaborative Research Agreement was signed in May 2009 at an event held at the IBM campus in Dublin. The Collaborative Research Agreement clearly defines how intellectual property generated by the Centre is managed and ultimately commercialised.
CNGL is working with our industrial partners to deliver a range of solutions across the global content value chain that provide consistently fine-grained analysis and services to an ever more empowered and demanding group of global consumers.
As CNGL’s fifth year draws to a close, we can report progress on multiple fronts, particularly in our commercialisation and industry outreach efforts. During the past year the CNGL Centre Management team has placed significant emphasis on maturing and deepening relationships with our current industry partners, as well as engaging with the broader ecosystem. At the same time, our Intellectual Property portfolio and commercialisation pipeline have come together and are demonstrating significant market potential. To date CNGL spinouts have raised in excess of €1.25M in venture capital funding and are projecting the creation of 25+ private-sector jobs in the coming year.
In 2012 CNGL continued its successful Localisation Innovation Showcase series, which has attracted consistently strong attendance since its launch in 2009. The event, which draws upwards of 100 attendees, is an opportunity to showcase emerging CNGL innovations. In addition, it has become a catalyst for an expanding range of interactions between CNGL researchers and practitioners from the broader industrial ecosystem.
CNGL Spinout Showcase at Symantec’s offices in Ballycoolin, Dublin
As a commercially focused research centre, CNGL depends upon its industrial partners to provide candid guidance on the research agenda and to continually assess our progress towards key project milestones. Industrial partners have representatives on every significant management committee within the CNGL organisational structure, which gives them formal top-down communication channels through which to influence the research agenda. Furthermore, our corporate engagement strategy emphasises one-on-one reciprocal relationships between academic researchers in CNGL and their corporate counterparts, which provide equally important and effective bottom-up communication channels.
TCD Winners of EI/SFI Technology Innovation Development Awards (TIDA) 2012 include Prof. Séamus Lawless, Dr. Alex O’Connor and Prof. Vincent Wade of CNGL
Current Industrial Partnerships
CNGL currently has 10 diverse corporate partners who maintain a strong commitment to the long-term success of our research efforts. Our partners include multinational companies such as DNP, IBM, Microsoft and Symantec, as well as indigenous and regional SMEs including Alchemy, SDL, SpeechStorm, Applied Language Solutions, Welocalize and VistaTEC.
The diversity of our partners reflects both the challenges facing CNGL and the importance of our research to the Irish economy and the global marketplace. Successful realisation of CNGL’s objectives will not only help drive the development and productisation of novel early-stage technologies but also consolidate Ireland’s position as the centre of excellence for multilingual localisation research and development.
Alchemy
Alchemy Software Development is one of the world’s foremost and most recognised localisation technology providers. The company was founded as an Irish SME in 2000 and, as a result of its phenomenal growth and success, in 2008 completed a merger with Translations.com, a leading provider of software, website and enterprise-wide localisation services as well as localisation-related technology products.
Alchemy’s initial 5-year commitment to CNGL was valued at €630K, comprising software licences and consulting expertise. The company has already contributed the full complement of software to our research tracks, valued at over €600K. In addition to software licences, Alchemy personnel have dedicated a significant number of hours to working directly with CNGL staff.
During 2012 Alchemy Software Development was particularly active in the use of machine translation technology and was a key supporter of the successful SFI TIDA application that secured additional funding for research prototyping in the area of rapid machine translation retraining.
Capita Translation & Interpreting (Previously Applied Language Solutions)
Capita Translation & Interpreting became a full member of CNGL in January 2012 through its acquisition of Applied Language Solutions, which had in turn previously acquired original CNGL partner Traslán. The company employs more than 150 staff worldwide and provides language solutions to customers in over 90 countries, in more than 200 different languages. Traslán’s initial 5-year commitment to CNGL had been valued at €958K, a combination of software licences and consulting expertise. Applied Language Solutions has taken up the mantle and is already making significant contributions of translation memories. One of the key benefits of CNGL membership is talent acquisition: access to highly skilled researchers. Applied Language Solutions has hired three CNGL researchers, enabling the fast-growing translation services provider to further improve its industry-leading service through the development of its machine-assisted translation solution.
DNP<br />
Founded in Japan in 1876, Dai Nippon Printing (DNP) has grown to become one of the world’s leading comprehensive printing companies. DNP has developed a unique vision of the future of multilingual, multimodal digital media based on its significant expertise in managing global multilingual content distribution. As the company predicts the coexistence of paper and digital media along with the creation of new forms of media, DNP’s participation in CNGL is of particular strategic importance to its long-term objectives.
Despite the distance, DNP is actively involved in the strategic direction of the Centre. During 2012 DNP sent representatives to Dublin to discuss commercialisation strategy and hosted a CNGL delegation to discuss CNGL’s new research programme.
IBM<br />
IBM is one of the world’s leading technology and service providers, dedicated to helping clients deliver business value by becoming more efficient and competitive through the use of business insight and information technology. As a multinational firm, IBM takes a globally integrated approach to innovation, with a network of more than 60 software development and research laboratories that explore, test and support a wide range of emerging technologies. IBM first set up operations in Ireland over 50 years ago, and since then the region has become the hub of its worldwide research into linguistic technologies. Furthermore, the recently established IBM Dublin Centre for Advanced Studies (CAS) has made Human Language Technologies one of its core research priorities. IBM launched the LanguageWare project in 2001 with the vision of creating a componentised linguistic platform with applications across the company’s entire product portfolio. LanguageWare is now the most broadly used linguistic technology across IBM.
Over the initial five years of CNGL’s operation, IBM has committed a total of €8.65M to the programme: €7.7M in software licences and 1.75 FTEs valued at €950K. To date we have integrated €6.9M worth of IBM software licences.
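As a quick check on the IBM figures above, the stated components add up to the stated total; the share of the licence commitment integrated to date is derived from those same figures rather than reported directly.

```latex
\begin{aligned}
\text{Total commitment (EUR M)} &= \underbrace{7.7}_{\text{software licences}} + \underbrace{0.95}_{1.75\ \text{FTEs}} = 8.65 \\
\text{Share of licences integrated to date} &= \frac{6.9}{7.7} \approx 0.896 \approx 90\%
\end{aligned}
```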
Microsoft<br />
Founded in 1975, Microsoft is the global leader in software, services and solutions that help people and businesses realise their full potential. The company first set up operations in Ireland in 1985 and has steadily expanded its base of activity, now employing almost 2,000 full-time and contract staff. For a company that localises products and services into 60+ languages, the need for integrated enterprise and personalised localisation tools is one of the fundamental challenges stretching across each of Microsoft’s business units.
The company’s participation in CNGL provides our researchers with a unique industry perspective on the challenges of international product development. Microsoft has already delivered its full originally proposed contribution to the research tracks, translation memories valued at over €2M, helping researchers in both Bulk Enterprise Localisation and Personalised Multilingual Customer Care. Microsoft filled two intern positions with CNGL researchers during 2012 and continues to play a proactive role on the industrial committee.
SDL<br />
SDL was founded in 1992 and has since grown to become one of the world’s foremost localisation providers to businesses with a global market presence. SDL is at the forefront of research and development in the fields of machine translation and global information management technologies. SDL’s industry-leading position in the translation supply chain offers CNGL researchers unparalleled access to the tools and expertise used to serve over 400 of the world’s leading enterprises.
SDL’s initial commitment to CNGL included a localisation management system (Idiom WorldServer) valued in excess of €300K over the life of the project. This software was delivered during the first year of the Centre’s operation and formed the backbone of the baseline CNGL Demonstrator System.
SpeechStorm
SpeechStorm is a solutions provider that specialises in integrating market-leading voice platforms and speech recognition software with in-house application development expertise. The company is an SME based in Northern Ireland and serves a range of customers including multiple government agencies, utility providers and financial services firms. The company’s expertise in integrating multiple voice platforms and speech recognition systems is particularly relevant to the research work packages on Speech Technology within the Integrated Language Technologies track.
SpeechStorm’s initial five-year commitment to CNGL was valued at €140K, comprising €80K worth of software services and 0.10 FTEs valued at €60K. To date SpeechStorm has interacted primarily through direct research engagements with the Speech Technology groups at UCD and TCD.
Symantec
Symantec is a global forerunner in providing solutions that help individuals and enterprises assure the security, availability and integrity of their digital information. The Symantec Shared Engineering Services group is responsible for company-wide localisation management along with ongoing research and development efforts.
Symantec’s localisation-related research focuses primarily on machine translation, MT customer satisfaction studies, and techniques to enhance Rule-Based MT (RBMT) performance. During 2012 Symantec funded an additional PhD and postdoctoral research in the area of natural language parsing.
Symantec’s initial commitments to <strong>CNGL</strong> have been<br />
exceeded, valued at €2.25M comprised of €2.0M worth<br />
of multiple translation memories and 2.15 FTEs valued<br />
at €225K. <strong>CNGL</strong> has seen additional commitments<br />
of content and translation memory resources from<br />
the company during <strong>2012</strong>. Symantec has also helped<br />
the researchers with specification of use scenarios for<br />
Demonstrator Systems and provided cash contributions<br />
to further the research and development in the area<br />
of Domain Adaption and Personalised Multilingual<br />
Customer Care.<br />
VistaTEC
VistaTEC is a supplier of premier-quality translation, linguistic review and other language-related business services to leading high-tech companies throughout the world. Its sophisticated service delivery platforms add significant value for customers by providing enterprise solutions that are scalable, time-efficient, cost-effective, synergistic and innovative.
As a prominent provider of language services, VistaTEC has committed to an extensive programme of research and development that keeps the firm at the forefront of the localisation industry and allows it to offer its customers the highest added value. VistaTEC is a founding Industrial Partner of the Centre for Next Generation Localisation. VistaTEC’s research activities during 2012 centred on Text Analytics for translation review, and the company contributed to this research by providing large test datasets and access to human translation quality review. This commitment from VistaTEC has resulted in a very successful commercialisation of the research.
New Industrial Partnerships
In the past year, CNGL has continued its extensive programme of industry outreach, following a strategy of targeting specific industry verticals where CNGL has developed robust, rapidly transferable expertise. This has resulted in one-on-one discussions with an array of companies and a number of new industrial collaborations that extend the reach of our activities and help diversify funding to complement the initial investment made by Science Foundation Ireland. As a result of these activities, we were pleased to welcome Intel/McAfee to CNGL’s research consortium as our newest industrial partner, starting in 2013.
The industrial outreach efforts of CNGL emphasise two main pillars:
• Ireland as a centre of excellence for high-value R&D (top-20 globally), with a critical mass of industry participants and ancillary activities
• CNGL’s critical mass of applied academic research expertise in localisation and related industries, which is valuable for partners and collaborators.
Welocalize
Welocalize became the tenth industrial partner of CNGL during 2011. Founded in 1997, Welocalize is a privately held, venture-backed company with more than 500 employees in 11 offices located in the USA, UK, Ireland, Germany, China and Japan. Its clients include eight of the world’s top ten software and hardware companies. Welocalize provides next-generation translation supply chain management that delivers market-ready translated content when and where users demand it, at higher output, a faster pace and an affordable price. Welocalize supports organisations throughout the entire global content lifecycle, from authoring and product development, through translation and quality assurance, to complete business process outsourcing and market validation.
Welocalize’s contribution to CNGL will take the form of software development resources and support for researchers through access to GlobalSight, a collaborative, flexible and sustainable translation management system. During 2012, Welocalize was a key supporter of the successful TIDA application, which secured €98K in funding for research on rapid retraining of machine translation systems.
Mr. Phil Ritchie presents Digital Linguistics at the CNGL Spinout Showcase in September 2012
In conjunction with our industry outreach efforts, we have launched the CNGL Collaboration Framework, which provides mechanisms for new partners to engage with the Centre. The framework is designed to foster the flow of information among trusted partners while respecting the intellectual property obligations set out in the CNGL Collaborative Research Agreement. It defines three broad classes of collaboration: Full Members, Collaborators and Associates.
Figure 10: CNGL Collaboration Framework
Full Members
Full Members are industrial and academic partners who have agreed to be bound by the terms of the CNGL Collaborative Research and IP Agreements. Full Membership is available on a limited basis to third parties who have a long-term strategic interest in CNGL and the wherewithal to contribute substantial resources to ongoing research activities within the Centre. Full Membership provides preferential IP access, committee membership, and direct access to researchers and staff within CNGL.
Collaborators
Collaborators engage directly with CNGL on issues of strategic importance to them. Collaborators can be industrial or academic entities that are either (1) a CNGL Full Member that has sponsored a specific research project or (2) a legal entity not previously affiliated with CNGL. Collaborator projects are governed by separate, individual Collaborative Research, IP and Confidentiality Agreements, which provide a range of structural options. Although collaborators operate under separate agreements, integrating them under the broader CNGL umbrella facilitates valuable interactions and the sharing of expertise.
Associate Members
Associate Membership provides a springboard for organisations that may be interested in establishing deeper ties with CNGL. In exchange for a small membership fee, Associates are granted an array of benefits, the most noteworthy being access to the pre-screened CNGL publication stream. While Associates are not granted preferential access to IP generated in CNGL, this group is expected to play a critical role in the commercialisation and licensing of emerging technologies.
Commercialisation
CNGL is entering its fifth year with a rich pipeline of business opportunities. Previously, to support the maturation of our commercial pipeline, CNGL management placed significant emphasis on developing the Centre’s entrepreneurial ecosystem. In 2012, as part of our Commercialisation Strategy, CNGL initiated a comprehensive outbound effort to engage with the broader entrepreneurial ecosystem. This effort was made possible through the continued support of the Enterprise Ireland Commercial Development Manager (CDM) programme, which has provided CNGL with a full-time staff member focused specifically on partnering strategies, open innovation initiatives, fund-raising and business development activities within the Centre.
Mr. Tony O’Dowd of CNGL spinout Xcelerator Machine Translation Solutions discusses the KantanMT product with Mr. Steve Gotz, CNGL Commercial Development Manager
CNGL finished 2012 with two actively trading spinout companies: Xcelerator Machine Translation Solutions and Scream Technologies. To date, these companies have raised a combined €1.25M in venture capital funding from an array of investors, including Delta Partners, Enterprise Ireland and two private family offices. The companies project the creation of over 25 private-sector jobs in the coming year.
Scream Technologies
Scream Technologies is a CNGL spinout company that specialises in creating synthetic voices from human actors, enabling companies to create human-sounding synthetic speech and control how it sounds. The service, which can run as a standalone installation, an embedded solution or a web application, has valuable applications in areas as diverse as video games, customer support and advertising.
Scream Technologies is currently located in DogPatch Labs Dublin, a startup incubator funded by Polaris Venture Partners. The company is promoted by Dr. Peter Cahill, a funded CNGL researcher who joined the company full-time in 2012. During 2012, Dr. Cahill was named one of Ireland’s top technology and startup leaders by well-known entrepreneurs Dylan Collins and Sean Blanchfield.
Xcelerator Machine Translation Solutions
The localisation services market generates over US$20BN in annual revenues and has a robust and resilient annual growth rate of 7.5%. However, while demand for translation services is surging, there is downward pressure on prices and shrinking margins. This is coupled with customer demand for shorter turnaround times on translation projects. Essentially, clients want more for less, faster.
Professional translators need to find new ways of improving productivity and reducing project turnaround times while maintaining exacting quality standards and linguistic consistency for their clients. Downward pressure on pricing and constraints on client budgets make this a daunting challenge. Xcelerator is a spinout, promoted by Tony O’Dowd, that is developing software solutions to help professional translators address these challenges head-on, improving quality and consistency and reducing project turnaround times and costs.
SME Collaboration Spotlight
Reverbeo is a startup company using technology to help companies of all sizes grow their global audience. The company’s novel technology can harvest monolingual websites, translate them into multiple languages using a range of services including machine translation, crowdsourcing and professional translators, and then republish them with minimal effort.
During 2012, while the company participated in the NDRC Launchpad Programme, a team of CNGL researchers worked with the founders to help refine their minimum viable product and extend their product development roadmap. During 2013, CNGL is expanding the collaboration with Reverbeo, supported by Enterprise Ireland, and applying CNGL expertise to the challenge of domain-tuned machine translation systems.
Beyond startups, CNGL research and expertise have helped a range of external companies that are launching new products and services. These collaborations have leveraged key Enterprise Ireland funding schemes (Innovation Partnerships, Innovation Vouchers, Commercialisation Fund) to bridge the gap between research and the market. During 2012, Digital Linguistics, a CNGL licensee, launched its first product, ReviewSentinel. The product leverages core CNGL research in text analytics to perform linguistic quality assurance testing automatically, in a scalable and cost-efficient manner.
Intellectual Property Management
Three agreements provide the legal framework within which CNGL operates. The Funding Agreement outlines the financial arrangements between SFI and the lead institution. The IP Agreement outlines how IP is managed within CNGL, and the Collaborative Research Agreement is the all-encompassing agreement on how the programme is governed and managed.
One of CNGL’s core missions is excellence in research, expanding the state of the art through dissemination of research results. At the same time, CNGL is required to protect valuable IP and make it available for commercial exploitation. Balancing these goals requires careful management, and our researchers operate under a publication code of practice. Before a paper is submitted to a conference, it is uploaded to a publication tracking system, which automatically emails it to the CNGL IP Committee for review for valuable IP. CNGL is a large research centre that generates over 100 publications each year, and this review is one of the ways in which all partners and PIs can identify IP across all research tracks.
Another way to identify IP is through relationship and event-driven audits. Formal mechanisms such as software disclosures, invention disclosures and publication reviews are in place. Nonetheless, one of the best ways to identify IP is through continuous engagement with researchers in both informal and formal meetings across the four universities. This engagement enables the IP team to identify patterns of activity across the research streams before material is published. It also promotes awareness of IP at an early stage, before formal disclosures are made, and helps the commercialisation team bridge any gaps between research and industry.
One of CNGL’s mandates is to diversify its funding base through affiliate collaborations. This presents certain challenges with regard to IP management. Nevertheless, a framework is in place that allows us to manage these collaborative projects in a way that protects the rights of CNGL members as well as those of our affiliated partners. This year saw the successful application of this framework across multiple EU FP7 projects, IRCSET- and EI-funded projects, and directly industry-funded engagements. The collaboration framework is designed to foster and control the flow of information between each affiliated project and core CNGL, while respecting the IP obligations set out in the original Collaborative Research Agreement (CRA).
Spinouts panel at the CNGL Spring Scientific Committee Meeting, which took place in Dublin in May
To facilitate successful implementation of our commercialisation strategy, we have been at the forefront of developing internal platforms that allow us to better collect, identify and manage all of the IP generated by our researchers. This is evident in the roll-out of a new product called LabJam, which is currently in beta. The system is designed to provide a more detailed view of our research streams and activities and to give our industry partners and SFI visibility into our innovation pipeline.
Management and Governance
Management Overview
The Centre for Next Generation Localisation believes that clear and simple Management and Governance structures are essential to ensure the scientific, commercial and operational success of the Centre. Our Management and Governance structures are designed to support a world-class research environment based on:
• simple, effective and efficient planning and decision-making
• clear responsibility
• open and transparent communication structures
• balanced and comprehensive representation and involvement of all partners and stakeholders
• provision of a point of contact and procedures for conflict resolution
• flexibility to respond quickly and appropriately to changing environments
• structures and support for Intellectual Property management, Technology Transfer and commercial exploitation
• regular appraisal of the scientific programme by international experts
• regular appraisal of management and governance structures
• structures reflecting best practice in the management and governance of large collaborative research centres.
The Centre Director, Prof. Josef van Genabith, provides overall scientific leadership and is responsible for the running of the Centre. A number of boards and committees support the Director in the management, integration and oversight of the Centre’s research and operations, following the principles set out above. In particular, the Centre’s research involves a considerable amount of cross-site collaboration and interdependency between our four academic and ten industrial partners, which requires a strong emphasis on cross-site coordination.
The overall management and governance of the Centre is organised as follows:
Figure 11: CNGL Governance and Management Structure
Research Co-ordination
The CNGL research programme is organised in a hierarchy of research tracks, work-packages and sub-work-packages. The four main research tracks relate to work in Integrated Language Technologies (ILT), Digital Content Management (DCM), Next Generation Localisation (LOC) and Systems Framework (SF). Within these four research tracks, the research programme is organised into 11 main work-packages, with individual research projects then organised into 50 sub-work-packages. Following this structure, co-ordination of CNGL research activities operates across four interrelated levels:
• CSET Coordination
• Research Track Coordination
• Main Work-package Coordination
• Sub-Work-package Coordination
Overall CSET Coordination is the responsibility of the Centre Director, Prof. Josef van Genabith. Research track coordination is the responsibility of the four Track Coordinators:
• Integrated Language Technologies (ILT): Prof. Nick Campbell, TCD
• Digital Content Management (DCM): Prof. Vincent Wade, TCD
• Next Generation Localisation (LOC): Mr. Reinhard Schäler, UL
• Systems Framework (SF): Dr. Saturnino Luz, TCD
Each of the eleven main work-packages within the four research tracks has a work-package co-ordinator who liaises with the relevant research track leader. The structure of the four research tracks, 11 main work-packages and 50 individual sub-work-packages is shown below:
Figure 12: CNGL Research Organisation
Integration Committee
The CNGL research programme is highly collaborative, with two basic (ILT and DCM) and two applied (LOC and SF) research tracks and a demonstrator systems programme centred on shared use scenarios. Given the level of research co-ordination and integration across the four research tracks and main work-packages, and the level of integration involved in building demonstrator systems from research outputs, the CNGL Integration Committee is the main body dealing with CNGL operations, with particular emphasis on scientific matters. The Integration Committee is composed of the Centre Director (who chairs the committee), the Associate Director, all four track leaders, Prof. Julie Berndsen from UCD, and a representative of each industry partner, to ensure maximum engagement of industry partners in oversight of the research programme. The Integration Committee meets on a bi-monthly schedule, with additional ad hoc meetings called when necessary.
Scientific Committee
The CNGL Scientific Committee comprises all members of the Centre across all levels and functions. The full Scientific Committee typically meets twice a year in a two- or three-day plenary session to review and share research progress and outcomes. Scientific Committee meetings also provide an opportunity for engagement with our International Collaborators and External Scientific Advisory Board.
The inaugural CNGL Innovation Charrette at the Spring Scientific Committee Meeting
The CNGL Spring Scientific Meeting was held over two days (17th–18th May) at Chartered Accountants House near Trinity College Dublin. With participation from across the entire CSET and Industry Partners, the Meeting focused on the past and future of language and content research as well as ways to further catalyse collaboration with industry. It included presentations on key scientific areas, including rapid-prototyping tools, personalised search using social media, and open-source localisation frameworks. It also featured demonstrations by CNGL spinout companies, a hands-on session on CNGL’s LabJam research activity platform, and the inaugural CNGL Innovation Charrette. A charrette is an intense collaborative session that gives participants the opportunity to work together in a close setting to discuss real-world challenges and potential solutions. After a vigorous period of interaction, each team presented a three-minute pitch, and the audience then had the opportunity to “invest” in the best ideas. The charrette encouraged participants to imagine inspirational products that the Centre’s members could create with their knowledge, and it proved an excellent vehicle for fostering imaginative thinking.
CNGL researchers and industrial collaborators share their research highlights at the CNGL Spring Scientific Committee Meeting
Due to significant Centre-wide planning for CNGL II and preparations for the CNGL Localisation Innovation Showcase in September, the Centre did not host an Autumn Scientific Committee Meeting in 2012.
Operational Management
Centre Operations Team
The day-to-day implementation of the Centre’s operational decisions and policies, financial management, activity co-ordination, tracking and reporting is carried out by the Centre Operations Team in close co-operation with the Centre Director. The Centre Operations Team is led by Dr. Páraic Sheridan and meets weekly with the Centre Director and Deputy Director to continually monitor and prioritise activities across all operational functions, including finance, human resources, reporting, system administration, and software and IP management. The composition of the Centre Operations Team is as follows:
• Dr. Páraic Sheridan, Associate Director
• Ms. Hilary McDonald, Project Manager
• Mr. Steve Gotz, Commercial Development Manager
• Mr. Stephen Roantree, IP Manager (departed CNGL in Quarter 2 2012)
• Ms. Sophie Matabaro, Centre Administrator (on maternity leave from September 2012)
• Mr. Joachim Wagner, Systems Administrator
• Ms. Fiona Maguire, Financial Administrator
• Ms. Eithne McCann, Centre Secretary
• Ms. Cara Greene, Education and Outreach Manager
• Ms. Laura Grehan, Marketing and Communications Officer
Mr. Stephen Roantree left his position as CNGL Intellectual Property Manager during Quarter 2 to take up a senior management role with Lionbridge in Dublin. Stephen now leads Lionbridge’s Quality and Innovation, Engineering, Testing, DTP and Web Publishing Groups. He continues to engage with CNGL.
In addition to the day-to-day work of the Centre Operations Team in executing the operational policies and activities of CNGL, several Management Boards and Committees provide direction and prioritisation of the Centre’s various activities.
Management Committee
The Management Committee is CNGL’s decision-making body and is responsible for leadership, policy, strategy, resource allocation, performance monitoring and review, management of CSET membership, and conflict resolution. The Management Committee meets quarterly and is chaired by the Centre Director. Its membership is made up of the Centre’s Co-Principal Investigators; Industry Partner representatives are invited to participate in Management Committee meetings but do not hold a vote. The membership of the Management Committee for 2012 included:
• Prof. Josef van Genabith, DCU (Director) [Chair]
• Prof. Vincent Wade, TCD (Deputy Director)
• Prof. Nick Campbell, TCD
• Mr. Reinhard Schäler, UL
• Dr. Saturnino Luz, TCD
Education and Outreach Board
The Education and Outreach Board provides leadership, policy, strategy, objectives and resource allocation for the Centre’s Education and Outreach Programme. The Board meets quarterly and reports to the CNGL Management Committee. It is chaired by the Education and Outreach Manager and consists of representatives of the academic partners with funded E&O programmes (TCD and UL) and one nominee from the Centre’s industrial partners. The membership of the Education and Outreach Board in 2012 included:
• Ms. Cara Greene, DCU [Chair]
• Dr. Páraic Sheridan, DCU
• Mr. Karl Kelly, UL
• Dr. Seamus Lawless, TCD
• Ms. Laura Grehan, DCU
• Dr. Fred Hollowood, Symantec
IP Management Board
The IP Management Board manages the Intellectual Property of the Centre and facilitates Technology Transfer and commercial exploitation of IP generated by the Centre. The IP Management Board advises the Centre on all IP issues and, in particular, evaluates proposed publications and invention disclosures in accordance with the Centre’s IP agreement. The IP Management Board meets quarterly and consists of nominees from each of the participating university and industrial partners plus the Centre’s Associate Director. It was chaired by the IP Manager, Mr. Stephen Roantree, until his departure from CNGL to Lionbridge in Quarter 2. Its academic membership draws both on the Research Leaders (Co-PIs) and on representatives of the respective University Technology Transfer Offices (TTOs). The IP Management Board reports to the Management Committee.
External Oversight<br />
Following SFI guidelines and best practice for the<br />
oversight and governance of large research centres,<br />
<strong>CNGL</strong> has two external advisory and oversight boards<br />
that meet regularly to review the scientific and<br />
operational progress of the Centre.<br />
Mr. Steve Gotz, <strong>CNGL</strong> Commercial Development Manager, H.E. Mr. John<br />
Neary, Ambassador of Ireland to Japan, Dr. Páraic Sheridan, Associate<br />
Director, <strong>CNGL</strong>, and Ms. Diane Foley, IDA Ireland Deputy-Director<br />
Japan. <strong>CNGL</strong> delivered a seminar to Japanese businesses in Tokyo in<br />
April, which was hosted by the Irish Ambassador to Japan and facilitated<br />
by IDA Ireland’s Japan Office.<br />
External Scientific Advisory Board<br />
Mr. Stephen Roantree, previously <strong>CNGL</strong> Intellectual Property Manager,<br />
now with Lionbridge Dublin<br />
Commercialisation Committee<br />
The Commercialisation Committee promotes and<br />
oversees the agenda of research commercialisation,<br />
which is a core part of the Centre’s strategy. The<br />
Committee meets on a quarterly basis and its meetings<br />
are co-located with meetings of the IP Management<br />
Board and the Industry Advisory Board at <strong>CNGL</strong> Industry<br />
Partner sites.<br />
The External Scientific Advisory Board provides review of<br />
the long-term scientific direction, impact and progress of<br />
the Centre. It advises, challenges and provides guidance<br />
to the Management Committee on both the overall<br />
scientific goals and objectives of the Centre as well as on<br />
the on-going management of the Centre. The External<br />
Scientific Advisory Board aims to meet bi-annually and<br />
work in close co-operation with the Executive Committee<br />
and the Centre Director. The <strong>CNGL</strong> External Scientific<br />
Advisory Board consists of recognised world leaders from<br />
both academia and industry in the fields of Language<br />
Technology, Machine Translation, Speech, Adaptive<br />
Hypermedia, Information Retrieval, and Localisation.<br />
The External Scientific Advisory Board is chaired by an<br />
expert from the area of Localisation, Mr. Francis Tsang.<br />
Mr. Tsang is Director of Globalisation at Adobe Systems<br />
Inc. He is responsible for the strategy and delivery of all<br />
localised Adobe product releases and the development<br />
of tools and libraries in the internationalisation area. Mr.<br />
Tsang has spent the last twenty years building software<br />
for various international markets. He holds degrees in<br />
computing and business management.
Centre for Next Generation Localisation <strong>Annual</strong> <strong>Report</strong> <strong>2012</strong><br />
MANAGEMENT AND GOVERNANCE<br />
The <strong>CNGL</strong> External Scientific Advisory Board actively<br />
participates in the bi-annual <strong>CNGL</strong> Scientific Committee<br />
meetings and reports back to the Centre Director<br />
and Management Committee. The board is currently<br />
composed of the following members:<br />
} Mr. Francis Tsang, Adobe Corporation, USA<br />
[Localisation] (Chair)<br />
} Dr. Andrew Bredenkamp, Acrolinx GmbH, Germany<br />
[Language Technology]<br />
} Prof. Lauri Karttunen, PARC, USA [Language<br />
Technology]<br />
} Prof. Makoto Nagao, President, NIST, Japan<br />
[Machine Translation]<br />
} Prof. Carol Espy-Wilson, University of Maryland, USA<br />
[Speech Technology]<br />
} Prof. Peter Brusilovsky, University of Pittsburgh, USA<br />
[Adaptive Hypermedia]<br />
} Prof. Elizabeth Liddy, Syracuse University, USA<br />
[Information Retrieval and NLP]<br />
} Dr. Mike Dillinger, Principal, TOPs Globalization<br />
Consulting<br />
External Oversight Board<br />
In accordance with SFI requirements, the President of<br />
DCU, as the host institution, has appointed an External<br />
Oversight Board to help with the oversight and<br />
assessment of the Centre’s progress. The Oversight Board<br />
reports to SFI on a quarterly basis. The Oversight Board<br />
is composed of members drawn from a mix of academic<br />
partners, a representative from the <strong>CNGL</strong> Industry<br />
Partners, and other external independent members.<br />
The board currently consists of the following members:<br />
} Mr. David MacDonald [Chair]<br />
} Prof. Josef van Genabith, Centre Director<br />
} Prof. Alan Harvey (VP Research, DCU)<br />
} Prof. Vinny Cahill (Dean of Research, TCD)<br />
} Mr. Gearóid Mooney (Enterprise Ireland)<br />
} Mr. Aidan Sweeney (IBEC)<br />
In addition to the full members of the External<br />
Oversight Board (which include Centre Director Prof.<br />
Josef van Genabith), <strong>CNGL</strong> is represented at quarterly<br />
meetings by:<br />
} Prof. Vincent Wade, Deputy Director<br />
} Dr. Páraic Sheridan, Associate Director<br />
The Oversight Board met quarterly during <strong>2012</strong> to review<br />
<strong>CNGL</strong>’s progress against its scientific and operational<br />
targets, review Key Performance Indicators (KPIs) and<br />
report back to SFI.<br />
<strong>2012</strong> Significant Accomplishments<br />
In the fifth year of the Centre for Next Generation<br />
Localisation, the following management and governance<br />
accomplishments have been recorded:<br />
} <strong>CNGL</strong> successfully passed its SFI Final Review and<br />
succeeded in its application for a second cycle of<br />
funding from SFI. The Review and funding application<br />
appraisal were conducted over two days in July at<br />
Trinity College Dublin. The review panel, which<br />
comprised senior figures from industry and academia,<br />
assessed the Centre’s performance and future<br />
potential on a range of criteria, including scientific<br />
excellence and social and economic impact. In its<br />
report the panel stated that “<strong>CNGL</strong> successfully built<br />
the infrastructure for a fully functioning, professional<br />
research centre, including strong capabilities in<br />
overall research direction, reporting, professional<br />
administration, outreach, budget allocation, and<br />
more.” The panel also acknowledged the Centre’s<br />
“mature change management approach” and its<br />
“forward-thinking, strong IP management and tech<br />
transfer capability”, and “was impressed by the<br />
educational outreach at all levels”.<br />
} The Centre Operations Team excelled at the challenging<br />
task of final reporting for <strong>CNGL</strong>I while also<br />
providing significant input into the preparation of the<br />
<strong>CNGL</strong>II proposal and coordinating the review panel’s<br />
Site Visit in July. The Site Visit included an<br />
exhibition of posters and demos of <strong>CNGL</strong> research<br />
to date. This substantial additional workload was<br />
managed while still maintaining quality delivery of the<br />
day-to-day operations of the Centre and roll-out of a<br />
number of new initiatives in the areas of education<br />
and outreach, commercialisation, reporting and<br />
finance.
Demonstrator showcase at the <strong>CNGL</strong> SFI Site Visit in July at Trinity College Dublin<br />
} The Centre Operations Team has continued to adapt<br />
to the evolving needs of the Centre with changes in<br />
<strong>2012</strong> reflecting in particular the greater emphasis on<br />
commercial engagement. The Associate Director and<br />
Commercial Development Manager spearheaded<br />
a coordinated campaign to attract new industrial<br />
collaborators. Supported by industry-facing Marketing<br />
and Communications resources, the campaign team<br />
stepped up the Centre’s presence at and input into<br />
key industry events in the Intelligent Content area<br />
and delivered pitches to high priority targets. The<br />
campaign has led to Intel signing up as an Industry<br />
Partner for <strong>CNGL</strong>II and it has also generated a<br />
number of other promising active leads.<br />
Operational and Management plans for the coming<br />
year focus on ensuring a smooth transition to <strong>CNGL</strong>’s<br />
second cycle of funding. Priorities include rollout of<br />
the Centre’s novel research programme centred on the<br />
Global Intelligent Content theme, attracting talented new<br />
recruits at all levels, establishing the Centre’s new Design<br />
and Innovation Lab, finalising and signing off on renewed<br />
IP and collaborative research agreements, and securing<br />
additional industry partners. There will also be significant<br />
input from the Centre Operations team into the running<br />
of SIGIR 2013 – The 36th <strong>Annual</strong> Conference of the ACM<br />
Special Interest Group on Information Retrieval. <strong>CNGL</strong><br />
will co-host SIGIR 2013 in Dublin in July-August 2013.<br />
} The ‘Localisation Innovation Showcase’ event<br />
co-located in Limerick with the LRC <strong>Annual</strong><br />
Conference in September was a huge success,<br />
drawing in more than 70 industry representatives<br />
from companies based in Ireland and abroad. The<br />
Showcase event included 10 individual stations of<br />
<strong>CNGL</strong> demonstrator systems as well as a multitude<br />
of research posters, industry partner booths, and<br />
display of education and outreach activities.
Education and Outreach
The <strong>CNGL</strong> Education and Outreach Programme<br />
encompasses a broad range of activities from internal<br />
communications and professional development, public<br />
relations and marketing to public-facing projects and<br />
education programmes to foster the next generation of<br />
professionals in content-related industries. We aim to<br />
raise the profile of scientific research within Ireland by<br />
highlighting education and career opportunities in key<br />
areas in the content field. By carrying out world-class<br />
research and commercialisation activities at <strong>CNGL</strong>,<br />
we are promoting Ireland as a global leader in the<br />
localisation industry. Below is an overview of activities<br />
under each Programme.<br />
Overview of <strong>CNGL</strong> Education and Outreach: Education and Human Capital Development; Strategic Marketing and Communications; Reach and Impact<br />
Education and Human Capital Development<br />
The aim of our Education Programme is to provide<br />
education and promote career opportunities in key areas<br />
of content intelligence, computer science and language<br />
technology. We aim to engage young people in these<br />
areas to build a strong Irish base of future computer<br />
scientists in content-related industries.<br />
<strong>CNGL</strong> offers a comprehensive set of education<br />
programmes aimed at all age groups, ranging from<br />
courses for primary school students and secondary<br />
school programmes to undergraduate and postgraduate<br />
programmes and internal professional development for<br />
our <strong>CNGL</strong> researchers and staff. The Education Programme<br />
Aims and Targets overview summarises the programme’s aims for each target level.<br />
Education and Human Capital Development<br />
Highlights from <strong>2012</strong><br />
Fourth Level Education: <strong>CNGL</strong> supports a number of<br />
seminar series across individual component research<br />
disciplines, including a popular series with the National<br />
Centre for Language Technology, seminars hosted by<br />
each of the member research groups and the Dublin<br />
Computational Linguistics Research Seminars series.<br />
<strong>CNGL</strong> operates internal member-focused training<br />
programmes on presentation skills, Intellectual Property,<br />
commercialisation and entrepreneurship, and project<br />
management. <strong>CNGL</strong> also provides “101” sessions for all<br />
staff on key <strong>CNGL</strong> topics. PhD students are also given<br />
opportunities to undertake an internship with industry<br />
partners.<br />
Eleven visiting MSc and PhD interns joined ILT<br />
over a period of five months in <strong>2012</strong>, under <strong>CNGL</strong>’s<br />
postgraduate internship programme. The programme<br />
enables students to gain valuable experience as part of a<br />
highly-regarded and continually-growing research centre.<br />
This year’s programme attracted interns from institutions<br />
across the globe, including ones in Italy, France, China and India.<br />
The internships covered a wide range of topics in Natural<br />
Language Processing and Machine Translation.<br />
Education Programme Aims and Targets (Primary, Second, Third and Fourth Level, and Career Opportunities): encourage ICT and language awareness; promote study of STEM disciplines; promote focus on <strong>CNGL</strong> research topics; prepare graduates for careers
In partnership with DCU and the National Centre for<br />
Language Technology, <strong>CNGL</strong> was successful in a Marie<br />
Curie Mobility grant application for the EXPERT PhD<br />
Graduate School with a total of 15 PhD Marie Curie<br />
fellowships (two of them at DCU) and three postdoctoral<br />
researchers. EXPERT comprises DCU and five other<br />
university partners and five industry partners. It focuses<br />
on empirical approaches to (machine) translation, and<br />
as part of their training PhD students will spend time at<br />
DCU’s EXPERT university and industry partners.
Table 1: <strong>2012</strong> undergraduate summer internships
Project name | Student | Supervisor | Track
Using Biometric Response to Locate Personally Interesting Digital Content | Robert Lis | Liadh Kelly | DCM
Implementing new methods for speech retrieval | Tom Mason | Liadh Kelly | DCM
Exploring Personalised and Collaborative Information Retrieval | Paul Redmond | Liadh Kelly | DCM
Visualisation of Topic Models | Conor O’Gorman | Liadh Kelly | DCM
Crowd-sourcing for query development and relevance judgment | Ciaran Porter | Liadh Kelly | DCM
Communications and Education | Siobhan O’Mara | Cara Greene/Laura Grehan | E&O
Facial recognition for real-time content personalisation (Kinect) | Thomas Dunne | Steve Gotz | CM
Facial recognition for real-time content personalisation (Kinect) | Emer Hedderman | Steve Gotz | CM
Building ontology-based content management (OCM) system | James Mark Hender | Yalemisew Abgaz | DCM
Generation of interactive infographics from Semantic and Open Data | Erika Duriakova | Alex O’Connor | DCM
Yodle – Generating Presentations from Wikipedia | Alla Kovaleva | Alex O’Connor | DCM
Query-biased summarization | Shane McQuillan | Gareth Jones | DCM
Communications and Education | Siobhan Swords | Cara Greene/Laura Grehan | E&O
Real-time Web Annotation | Kristo Mikkonen | Dominic Jones | E&O
Another exciting development on the fourth level<br />
education front was the announcement in November<br />
that University of Limerick’s MSc in Multilingual<br />
Computing and Localisation is to be delivered through<br />
distance learning and co-hosted by the United Nations<br />
Economic Commission for Africa at its Information<br />
Training Centre for Africa in Addis Ababa, Ethiopia.<br />
The aim of the programme is to promote African<br />
languages in the Information Society.<br />
Finally, the LRC Best Thesis Award <strong>2012</strong> was presented<br />
in September to former <strong>CNGL</strong> PhD student Ben<br />
Steichen for his thesis “Adaptive Retrieval, Composition<br />
& Presentation of Closed-Corpus and Open-Corpus<br />
Information”. Katrin Drescher of award sponsor<br />
Symantec praised the scientific excellence and industrial<br />
relevance of Ben’s work.<br />
Ms. Aida Opoku-Mensah of the United Nations Economic Commission<br />
for Africa (UNECA) speaks at the launch of University of Limerick’s MSc<br />
in Multilingual Computing and Localisation to be co-hosted by UNECA<br />
in Ethiopia.<br />
Third Level Education: The <strong>CNGL</strong> Undergraduate<br />
Internship Programme continued to attract top students<br />
in <strong>2012</strong>. The primary aim of the <strong>CNGL</strong> undergraduate<br />
internship is to offer exceptional undergraduate students<br />
the opportunity to participate in and contribute to<br />
exciting research projects at <strong>CNGL</strong>. The programme<br />
enables interns to use leading research facilities and<br />
we aim to inspire these students to take the first step<br />
on a path to a research career. It is also an important<br />
opportunity to promote taught Masters programmes<br />
at <strong>CNGL</strong> universities and to host interns at our<br />
industrial partners. The internships consist of six-month<br />
INTRA/Co-op placements and eight-week summer<br />
internships. Many of these interns go on to do<br />
<strong>CNGL</strong>-themed third- and fourth-year projects with<br />
<strong>CNGL</strong> supervisors.<br />
<strong>CNGL</strong> hosted ten undergraduate interns across a wide<br />
range of research areas. Table 1 shows the list of <strong>2012</strong><br />
Undergraduate Summer internships.<br />
<strong>CNGL</strong> is currently creating an online graduate brochure<br />
aimed at third level students with information on the<br />
Taught Masters and PhD programmes available in each<br />
of the <strong>CNGL</strong> universities. The brochure also includes<br />
profiles of our graduated PhD students and former<br />
postdoctoral researchers. The profiles detail the education<br />
and career paths of <strong>CNGL</strong> alumni since they left<br />
<strong>CNGL</strong>.<br />
Some of the 450 second level students who completed the <strong>CNGL</strong>-supported<br />
‘ComputeTY’ programme at DCU in January <strong>2012</strong><br />
<strong>CNGL</strong> continued to support the ComputeTY <strong>2012</strong><br />
Programme at DCU. ComputeTY students select<br />
one of two streams: Web Design or Introduction to<br />
Programming. The overall content offers a broad<br />
range of computing skills from the creative aspect of<br />
website design to the problem-solving challenges of the<br />
programming stream. 450 students attended the course<br />
over 4 weeks in January <strong>2012</strong> with the same number due<br />
to complete the course in January 2013. Since its launch<br />
in 2005, ComputeTY has been completed by almost<br />
3,500 Transition Year students from Dublin schools.<br />
The programme has a strong track record of recruiting<br />
students to study computing at third level.<br />
<strong>CNGL</strong>’s undergraduate interns showcase the outcomes of their work<br />
at a poster and demo display at DCU<br />
Second Level Education: Secondary school students are<br />
a key demographic for the education programme, with<br />
more than 1,500 secondary school students engaging<br />
with <strong>CNGL</strong> education programmes and competitions.<br />
<strong>CNGL</strong> aims to attract students to study fields related to<br />
content intelligence by running programmes that foster<br />
key problem-solving skills that are needed for<br />
this industry.<br />
The outstanding success of the Education Programme is<br />
the <strong>CNGL</strong> All Ireland Linguistics Olympiad (AILO). Over<br />
3,500 secondary school students from 167 schools in<br />
the Republic of Ireland and Northern Ireland have taken<br />
part in AILO since the first competition in 2009. The<br />
competition challenges secondary school students to<br />
apply logic and computational thinking to solve complex<br />
puzzles in unfamiliar languages. Past participants have<br />
gone on to pursue studies in computer science, maths<br />
and linguistics at third level, which suggests that the<br />
competition is meeting its goal of fostering the next<br />
generation of problem solvers.<br />
More than 400 students from 44 schools in 23 counties<br />
competed in the preliminary round of AILO <strong>2012</strong>. The top<br />
100 performers were allocated a <strong>CNGL</strong> researcher, who<br />
acted as a tutor for the national final in March at DCU.
The top four individual students went on to represent<br />
Ireland at the International Linguistics Olympiad (ILO) in<br />
Slovenia in July <strong>2012</strong>.<br />
Also targeting the second level market is <strong>CNGL</strong>’s<br />
Language Trap: An Adaptive Language Learning Video<br />
Game. The game was initially designed to aid students<br />
in preparing for the Leaving Certificate German Oral<br />
Examinations by means of an interactive dialogue<br />
system. The Irish-language version of the game has<br />
been evaluated with schools. The German game is now<br />
available at http://seriousgames.cs.tcd.ie/.<br />
<strong>CNGL</strong> promoted its research and education programmes at the SFI<br />
booth at the BT Young Scientist & Technology Exhibition in January
Education Programme plans for 2013 will focus on<br />
the transition to a second cycle of funding for <strong>CNGL</strong>,<br />
in which the Centre will pioneer the concept “Global<br />
Intelligent Content”. Opportunities in this area include<br />
app competitions for secondary school students, and<br />
establishing a Masters programme in Intelligent Content.<br />
Ms. Mary Mitchell-O’Connor, T.D. attended the national final of the All<br />
Ireland Linguistics Olympiad at DCU. Deputy Mitchell-O’Connor urged<br />
students to use their aptitude for problem-solving to pursue careers at<br />
the intersection of computing, language and linguistics<br />
The <strong>CNGL</strong> Education Programmes are complemented<br />
by Transition Year internships in the <strong>CNGL</strong> labs and<br />
by the high-quality Careers Brochure focused on the<br />
commercial career opportunities at the intersection of<br />
Computing, Languages, Culture and Business. The guide<br />
was distributed to guidance counsellors in 729 secondary<br />
schools. <strong>CNGL</strong> exhibited at the BT Young Scientist <strong>2012</strong><br />
competition in the RDS in January <strong>2012</strong>. Students got<br />
the chance to try out demos and also test their problem-solving<br />
skills with AILO puzzles.<br />
Strategic Marketing and Communications<br />
<strong>CNGL</strong>’s Outreach Programme aims to highlight <strong>CNGL</strong><br />
achievements, to engage with the public and to promote<br />
Ireland as a world leader in localisation. The programme<br />
spans public relations and marketing, the hosting of industry<br />
and academic events, the publication of ‘Localisation Focus – the<br />
International Journal of Localisation’, and attendance at the<br />
BT Young Scientist and Technology Exhibition.<br />
Strategic Marketing and Communications<br />
Highlights from <strong>2012</strong><br />
<strong>CNGL</strong> has raised its media profile with 84 media<br />
mentions recorded in <strong>2012</strong>. The <strong>CNGL</strong> newsletter<br />
was published on a quarterly basis and has proved an<br />
effective means through which to communicate <strong>CNGL</strong><br />
news, events, success stories and researcher profiles to<br />
government, media, industry and academic stakeholders.
Front page of <strong>CNGL</strong>News, the quarterly newsletter of the Centre for Next Generation Localisation (Vol. 01, Issue 07, Quarter 2 <strong>2012</strong>), with the lead stories “Irish Ambassador to Japan hosts <strong>CNGL</strong> Seminar in Tokyo” and “<strong>CNGL</strong>: Contributing to a Strategic Research Agenda for Europe”<br />
The Centre’s international reach has been enhanced<br />
through closer engagement with international<br />
organisations including the Globalization and<br />
Localization Association (GALA). A new industry<br />
prospectus is in production, and this will support the<br />
Centre’s drive to attract additional industry partners and<br />
clients.<br />
The <strong>CNGL</strong> quarterly newsletter, available in both e-zine and print format<br />
The Marketing and Communications Officer has worked<br />
closely with the Centre’s Commercial Development<br />
Manager to strengthen industry outreach efforts.<br />
Significant progress has been made on the customer<br />
relationship management front, including further<br />
development of <strong>CNGL</strong>’s mailing list, which now includes<br />
over 2,000 subscribers. <strong>CNGL</strong> exhibited at a significant<br />
number of industry and commercialisation events,<br />
including Localization World (in Seattle, USA in October),<br />
DCU Tech Transfer Exhibition in June, and Enterprise<br />
Ireland’s ‘Big Ideas’ showcase in November. <strong>CNGL</strong> also<br />
presented a panel on Global Content Intelligence at<br />
the Gilbane Conference in Boston in November, and a<br />
seminar for Japanese businesses in Tokyo in April, which<br />
was hosted by the Ambassador of Ireland to Japan and<br />
supported by IDA Ireland Japan.<br />
<strong>CNGL</strong> booth at Localization World Seattle in October<br />
<strong>CNGL</strong> continued to host conferences and workshops<br />
for the international research community this year<br />
in the computational linguistics, digital content<br />
management and localisation areas. The 17th<br />
<strong>Annual</strong> LRC Internationalisation and Localisation<br />
Conference took place in Limerick in September<br />
with 70 participants from localisation companies and<br />
academia. The conference was co-located with the<br />
<strong>2012</strong> <strong>CNGL</strong> Localisation Innovation Showcase, which<br />
has now been established as a “must attend” event<br />
for professionals in Ireland involved in localisation and<br />
multilingual customer care. This year’s keynote address<br />
was delivered by Dr. Thomas Arend, International<br />
Product Lead at Twitter.<br />
Irish Times coverage of <strong>CNGL</strong>’s work on sign language machine<br />
translation (left) and an opinion piece by Prof. Josef van Genabith in the<br />
Irish Independent (right)
Feature on localisation careers in ‘Education’ magazine<br />
<strong>CNGL</strong>’s Localisation Innovation Showcase was co-located with the 17th<br />
<strong>Annual</strong> LRC Internationalisation and Localisation Conference in Limerick<br />
Other significant scientific events organised by <strong>CNGL</strong> in<br />
<strong>2012</strong> include the International Postgraduate Conference<br />
in Translating and Interpreting, the Workshop on<br />
Innovation and Applications in Speech Technology,<br />
and the Workshop on Best Practices in Post-editing<br />
(in association with the Translation Automation Users’<br />
Society) at Localization World in Paris. The Centre<br />
was successful in its bid to bring COLING 2014, one of<br />
the world’s largest and most influential computational<br />
linguistics conferences, to Dublin in 2014.<br />
<strong>CNGL</strong>’s strong second-level education programmes were<br />
this year strengthened significantly by the production<br />
of a guide to ‘Careers in Next Generation Localisation’.<br />
This high-quality brochure focuses on commercial<br />
career opportunities at the intersection of Computing,<br />
Languages, Culture and Business. The guide was<br />
distributed to guidance counsellors in 729 secondary<br />
schools and has generated over 1,400 unique views<br />
of our careers web page to date. The brochure was<br />
launched in February by Mr. Seán Sherlock, T.D., Minister<br />
for Research and Innovation, and generated substantial<br />
media interest including spreads in ‘Education’ magazine<br />
and ‘Guideline’ – the official magazine of the Institute of<br />
Guidance Counsellors.<br />
The social impact of <strong>CNGL</strong>’s research programmes<br />
is evident in its social spinout activity, The Rosetta<br />
Foundation. The Foundation now has more than 2,600<br />
registered volunteer translators and the number of NGO<br />
partners increased fourfold during <strong>2012</strong>, allowing it to<br />
further its goal of facilitating access to information and<br />
knowledge to those who really need it. The Rosetta<br />
Foundation’s first NGO partner, Special Olympics – the<br />
world’s largest sports organisation for children and adults<br />
with intellectual disabilities – remained the most active in<br />
<strong>2012</strong>, with over fifty translation projects submitted. Other<br />
partners benefitting from the work of the Foundation’s<br />
volunteers include Community Eye Journal, The World<br />
Association of Girl Guides and Girl Scouts, Ruhama and<br />
Trócaire.<br />
Mr. Seán Sherlock, T.D., Minister for Research and Innovation, and Prof
Josef van Genabith, Director of <strong>CNGL</strong>, pictured at the launch of <strong>CNGL</strong>’s
Next Generation Localisation careers guide in February
Centre for Next Generation Localisation <strong>Annual</strong> <strong>Report</strong> <strong>2012</strong> 113<br />
<strong>CNGL</strong> coordinated Thesis in Three <strong>2012</strong> with the
Systems Biology Ireland and CLARITY research centres.
The aim of the competition is for PhD students to give
an elevator pitch for their PhD thesis: three slides in
just three minutes. On the night, centre directors and
principal investigators also delivered elevator pitches for
their research centres. The event, held in collaboration
with Innovation Dublin, attracted an audience of more
than two hundred and celebrated the best of Irish
science and innovation in bite-sized chunks.<br />
Plans for 2013<br />
Strategic marketing and communication plans for 2013
will focus on the transition to a second cycle of funding
for <strong>CNGL</strong>, in which the Centre will pioneer the concept
of “Global Intelligent Content”. Creation of a new brand
for <strong>CNGL</strong> is already underway. This brand will reflect
both the broadening of the Centre’s research programme and
its greater emphasis on industrial engagement. A new website
that communicates our vision of global intelligent content is
in development, and the new branding will be rolled out across
a suite of marketing materials designed to support business
development efforts.<br />
Significant events planned for 2013 include SIGIR<br />
2013 – the 36th <strong>Annual</strong> ACM SIGIR Conference, which<br />
<strong>CNGL</strong> will co-host in July/August 2013, and Think Latin<br />
America, which will take place at Carton House, Kildare<br />
in April 2013.<br />
Jonathan McCrea, host of Newstalk’s ‘Futureproof’ show, is MC<br />
for Thesis in 3
Appendices
Appendix 1: People and Partnerships<br />
CSET RESEARCH TEAMS<br />
Team Members Associated with the CSET During the <strong>Report</strong>ing Period<br />
First Name Surname Type Institution Research Strand Highest Degree Gender Nationality CSET Funded Supervisor<br />
Yalemisew Abgaz PhD DCU DCM MSc M Ethiopian Yes Dr Claus Pahl<br />
Mohamed Abou-Zleikha PhD UCD ILT MSc M Syrian Yes Prof Julie Carson-Berndsen<br />
Zeeshan Ahmed PhD UCD ILT MSc M Pakistani Yes Prof Julie Carson-Berndsen<br />
Dimitra Anastasiou Postdoctoral Researcher UL LOC PhD F Greek Yes Mr Reinhard Schäler<br />
Lamine Aouad Postdoctoral Researcher UL LOC PhD M Algerian Yes Mr Reinhard Schäler<br />
Ruwan Asanka Wasala PhD UL LOC MSc M Sri Lankan No Mr Reinhard Schäler<br />
Akshat Bakliwal PhD Intern DCU ILT MSc M Indian Yes Prof Josef van Genabith<br />
Renu Balyan PhD Intern DCU ILT MSc M Indian Yes Prof Josef van Genabith<br />
Pratyush Banerjee PhD DCU ILT MSc M Indian Yes Prof Josef van Genabith<br />
Jonathan Barr Graphics Designer DCU E&O BA M Irish Yes Prof Josef van Genabith<br />
Hanna Béchara PhD DCU ILT BA F Irish Yes Prof Josef van Genabith<br />
Urvesh Bhowan Research Assistant TCD DCM MSc M South African Yes Prof Vincent Wade<br />
Arianna Bisazza PhD Intern DCU ILT MSc M Italian Yes Prof Josef van Genabith<br />
Anton Bryl Postdoctoral Researcher DCU ILT PhD M Belarusian Yes Prof Josef van Genabith<br />
Jim Buckley Co-Supervisor UL LOC PhD M Irish No N/A<br />
Joao Cabral Postdoctoral Researcher UCD ILT PhD M Portuguese Yes Prof Julie Carson-Berndsen<br />
Nick Campbell Co-Principal Investigator TCD ILT PhD M British No N/A<br />
Julie Carson-Berndsen Co-Principal Investigator UCD ILT DPhil F Irish No N/A<br />
Özlem Çetinoglu Postdoctoral Researcher DCU ILT PhD F Turkish Yes Prof Josef van Genabith<br />
Yi Chen PhD DCU DCM MSc F Chinese Yes Dr Gareth Jones<br />
Yvonne Cleary Co-Supervisor UL LOC PhD F Irish No N/A<br />
JJ Collins Co-Supervisor UL LOC PhD M Irish No N/A<br />
Declan Dagger Postdoctoral Researcher TCD DCM PhD M Irish Yes Prof Vincent Wade<br />
Sandipan Dandapat Postdoctoral Researcher DCU ILT PhD M Indian Yes Prof Josef van Genabith<br />
Domenico De Feo Research Assistant TCD DCM MSc M Italian Yes Prof Vincent Wade<br />
Gavin Doherty Co-Principal Investigator TCD SF PhD M Irish No N/A<br />
Amelie Dorn PhD TCD ILT MSc F French Yes Prof Ailbhe Ní Chasaide<br />
Thomas Dunne Intern DCU CM UnderGrad M Irish Yes Prof Josef van Genabith<br />
Erika Duriak Intern TCD DCM UnderGrad F Slovakian Yes Prof Josef van Genabith<br />
Mohammed Rami ElHussein Ghorab PhD TCD DCM MSc M Egyptian Yes Prof Vincent Wade<br />
Martin Emms Co-Principal Investigator TCD ILT PhD M Irish No N/A<br />
Maria Eskevich PhD DCU DCM MSc F Russian Yes Prof Gareth Jones<br />
Chris Exton Co-Supervisor UL LOC PhD M Australian/Irish No N/A<br />
David Filip Postdoctoral Researcher UL LOC PhD M Czech Yes Mr Reinhard Schäler<br />
Ríona Finn Administrative DCU CM MSc F Irish Yes N/A<br />
Hector Hugo Franco Penya PhD TCD ILT BSc M Spanish Yes Dr Martin Emms<br />
Brian Gallagher Technician TCD DCM MSc M Irish Yes Prof Vincent Wade<br />
Debasis Ganguly PhD DCU DCM MTech M Indian Yes Dr Gareth Jones<br />
Solomon Gizaw PhD UL LOC MSc M Ethiopian Yes Mr Reinhard Schäler<br />
Christer Gobl Co-Principal Investigator TCD ILT PhD M American No N/A<br />
Yvette Graham Postdoctoral Researcher DCU ILT PhD F Irish Yes Prof Josef van Genabith<br />
Cara Nicole Greene E&O Manager DCU E&O BSc F Irish Yes N/A<br />
Laura Grehan Marketing and Communications Officer DCU E&O MSc F Irish Yes N/A<br />
Alfredo Guerra Maldonado PhD TCD ILT BSc M Mexican/Irish Yes Dr Carl Vogel<br />
Rajat Gupta PhD UL LOC BSc M Indian Yes Mr Reinhard Schäler<br />
Yanfen Hao Postdoctoral Researcher UCD DCM PhD M Chinese Yes Dr Tony Veale<br />
Geraldine Harrahill Administrative UL CM FETAC F Irish Yes N/A<br />
Emer Hedderman Intern DCU CM UnderGrad F Irish Yes Prof Josef van Genabith<br />
James Mark Hender Intern DCU DCM UnderGrad M Irish Yes Prof Josef van Genabith<br />
Yu Hui PhD Intern DCU ILT MSc M Chinese Yes Prof Josef van Genabith<br />
Muhammad Javed PhD DCU DCM MSc M Pakistani Yes Dr Claus Pahl
Gareth Jones Co-Principal Investigator DCU DCM PhD M British No N/A<br />
Amir Kamran PhD Intern DCU ILT MSc M Indian Yes Prof Josef van Genabith<br />
John Kane PhD TCD ILT MPhil M Irish Yes Prof Ailbhe Ní Chasaide<br />
Mark Kane PhD UCD ILT MSc M Irish Yes Prof Julie Carson-Berndsen<br />
Bridget Kane Postdoctoral Researcher TCD SF PhD F Irish Yes Dr Saturnino Luz<br />
Karl Kelly Administrative UL E&O Grad Dip M Irish Yes N/A<br />
Dorothy Kenny Co-Principal Investigator DCU ILT PhD F Irish No N/A<br />
Kevin Koidl PhD TCD DCM MSc M Irish Yes Prof Vincent Wade<br />
Alla Kovaleva Intern TCD DCM UnderGrad F Kazakhstani Yes Prof Josef van Genabith<br />
Ru Kuang PhD Intern DCU ILT MSc M Chinese Yes Prof Josef van Genabith<br />
Sudip Kumar Naskar Postdoctoral Researcher DCU ILT PhD M Indian Yes Prof Josef van Genabith<br />
Séamus Lawless Assistant Professor TCD DCM PhD M Irish Yes Prof Vincent Wade<br />
Madeleine Lenker PhD UL LOC MA F German Yes Mr Reinhard Schäler<br />
Killian Levacher PhD TCD DCM MSc M French/Irish Yes Prof Vincent Wade<br />
Johannes Leveling Postdoctoral Researcher DCU DCM PhD M German Yes Dr Gareth Jones<br />
David Lewis Funded Investigator TCD SF PhD M English Yes N/A<br />
Wei Li PhD DCU DCM MSc F Chinese Yes Dr Gareth Jones<br />
Junhui Li Postdoctoral Researcher DCU ILT PhD M Chinese Yes Prof Josef van Genabith<br />
Robert Lis Intern DCU DCM UnderGrad M Irish Yes Prof Josef van Genabith<br />
Qun Liu Co-Principal Investigator DCU ILT PhD M Chinese No N/A<br />
Luca Longa Research Assistant TCD DCM MSc M Italian Yes Prof Vincent Wade<br />
Alejandra Lopez Fernandez PhD UCD DCM MSc F Mexican Yes Dr Tony Veale<br />
Juan Luo PhD Intern DCU ILT MSc M Chinese Yes Prof Josef van Genabith<br />
Saturnino Luz Co-Principal Investigator TCD SF PhD M Brazilian No N/A<br />
Gerard Lynch PhD TCD ILT MSc M Irish Yes Dr Carl Vogel<br />
Walid Magdy Postdoctoral Researcher DCU DCM PhD M Egyptian Yes Dr Gareth Jones<br />
Fiona Maguire Finance Administrative DCU CM CIMA F Irish Yes N/A<br />
Liliana Mamani-Sanchez PhD TCD ILT MSc F Peruvian Yes Dr Carl Vogel<br />
Tom Mason Intern DCU DCM UnderGrad M Irish Yes Prof Josef van Genabith<br />
Sophie Matabaro Centre Administrative DCU CM F Irish Yes N/A<br />
John McAuley PhD TCD SF MPhil M Irish Yes Dr David Lewis<br />
Eithne McCann PA to Director DCU E&O National Cert F Irish Yes N/A<br />
Hilary McDonald Project Manager TCD CM MSc F Irish Yes N/A<br />
Shane McQuillan Intern DCU DCM UnderGrad M Irish Yes Prof Josef van Genabith<br />
Kristos Mikkonen Intern TCD E&O UnderGrad M Finnish Yes Dr David Lewis<br />
Jinming Min PhD DCU DCM MSc M Chinese Yes Dr Gareth Jones<br />
Joss Moorkens Postdoctoral Researcher DCU LOC PhD M Irish Yes Dr Sharon O’Brien<br />
Lucía Morado Vásquez PhD UL LOC MSc F Spanish Yes Mr Reinhard Schäler<br />
John Moran PhD TCD SF BSc M Irish Yes Dr David Lewis<br />
Erwan Moreau Postdoctoral Researcher TCD ILT PhD M French Yes Dr Carl Vogel<br />
Aram Morera Mesa PhD UL LOC Grad Dip M Spanish Yes Mr Reinhard Schäler<br />
Sara Morrissey Postdoctoral Researcher DCU ILT PhD F Irish Yes Prof Josef van Genabith<br />
Catherine Mulwa PhD TCD DCM MSc F Kenyan Yes Prof Vincent Wade<br />
Dat Tien Nguyen PhD Intern DCU ILT MSc M Vietnamese Yes Prof Josef van Genabith<br />
Dat Quoc Nguyen PhD Intern DCU ILT MSc M Vietnamese Yes Prof Josef van Genabith<br />
Ailbhe Ní Chasaide Co-Principal Investigator TCD ILT PhD F Irish No N/A<br />
Neasa Ní Chiaráin PhD TCD ILT MSc F Irish Yes Prof Ailbhe Ní Chasaide<br />
Naoto Nishio PhD UL LOC Grad Dip M Japanese Yes Mr Reinhard Schäler<br />
Conor O Gorman Intern DCU DCM UnderGrad M Irish Yes Prof Josef van Genabith<br />
Siobhan O Mara Research Assistant DCU E&O BA F Irish Yes Prof Josef van Genabith<br />
Sharon O’Brien Co-Principal Investigator DCU ILT PhD F Irish No N/A<br />
Eoin Ó’Conchuir Technician UL CM PhD M Irish Yes Mr Reinhard Schäler<br />
Alexander O’Connor Postdoctoral Researcher TCD DCM PhD M Irish Yes Prof Vincent Wade<br />
Udochukwu Kalu Ogbureke PhD UCD ILT MPhil M Nigerian Yes Prof Julie Carson-Berndsen<br />
Ian O’Keeffe Postdoctoral Researcher UL LOC PhD M Irish Yes Mr Reinhard Schäler<br />
Declan O’Sullivan Co-Principal Investigator TCD DCM PhD M Irish No N/A<br />
Claus Pahl Co-Principal Investigator DCU DCM PhD M German No N/A<br />
Santanu Pal PhD Intern DCU ILT MSc M Indian Yes Prof Josef van Genabith<br />
Ciaran Porter Intern DCU DCM UnderGrad M Irish Yes Prof Josef van Genabith<br />
Enda Quigley PhD UL LOC BSc M Irish Yes Mr Reinhard Schäler<br />
Paul Redmond Intern DCU DCM UnderGrad M Irish Yes Prof Josef van Genabith<br />
Corentin Ribeyre PhD Intern DCU ILT MSc M French Yes Prof Josef van Genabith<br />
Stephen Roantree IP Manager DCU CM Grad Dip M Irish Yes N/A<br />
Ilana Rozanes PhD TCD SF MSc F American Yes Dr Saturnino Luz<br />
Lorcan Ryan PhD UL LOC MSc M Irish Yes Mr Reinhard Schäler<br />
Melike Sah Postdoctoral Researcher TCD DCM PhD F Cypriot Yes Prof Vincent Wade<br />
Reinhard Schäler Lead Principal Investigator UL LOC MSc M German No N/A<br />
Stephan Schlögl PhD TCD SF MSc M Austrian Yes Dr Saturnino Luz<br />
Anne Schneider PhD TCD SF BSc F German Yes Dr Saturnino Luz<br />
Mary Sharp Co-Principal Investigator TCD DCM BSc F Irish No N/A<br />
Páraic Sheridan Associate Director DCU CM PhD M Irish Yes N/A<br />
Harold Somers E&O DCU E&O PhD M British Yes N/A<br />
Brendan Spillane PhD TCD DCM MSc M Irish Yes Prof Vincent Wade<br />
Ben Steichen Postdoctoral Researcher TCD DCM PhD M Luxembourgish Yes Prof Vincent Wade<br />
Siobhan Swords Intern DCU E&O UnderGrad F Irish Yes Prof Josef van Genabith<br />
Eva Szekely PhD UCD ILT MA F Hungarian Yes Prof Julie Carson-Berndsen<br />
Josef van Genabith Lead Principal Investigator DCU CM PhD M German Yes N/A<br />
Tony Veale Co-Principal Investigator UCD DCM PhD M Irish No N/A<br />
Carl Vogel Co-Principal Investigator TCD ILT PhD M American No N/A<br />
Joris Vreeke Programmer DCU E&O M Dutch Yes Prof Josef van Genabith<br />
Vincent Wade Lead Principal Investigator TCD DCM PhD M Irish No N/A<br />
Joachim Wagner Systems Administrator DCU CM MA M German Yes Prof Josef van Genabith<br />
Andy Way Lead Principal Investigator DCU ILT PhD M British No N/A<br />
Xiaofeng Wu Postdoctoral Researcher DCU ILT PhD M Chinese Yes Prof Josef van Genabith<br />
Irena Yanushevskaya Postdoctoral Researcher TCD ILT PhD F Russian Yes Prof Ailbhe Ní Chasaide<br />
Amalia Zahra PhD UCD ILT BSc F Indonesian Yes Prof Julie Carson-Berndsen<br />
Dong Zhou Postdoctoral Researcher TCD DCM PhD M Chinese Yes Prof Vincent Wade<br />
Affiliated Members and Collaborators Not Receiving Funds<br />
First Name Surname Type Institution Research Strand Highest Degree Gender Nationality CSET Funded Supervisor<br />
Hala Al Maghout Postdoctoral Researcher DCU Affiliated PhD F Syrian No Prof Josef van Genabith<br />
Mohammed Attia Postdoctoral Researcher DCU Affiliated PhD M Egyptian No Prof Josef van Genabith<br />
Eoin Bailey PhD TCD Affiliated MSc M Irish No Prof Vincent Wade<br />
Ergun Biçici Postdoctoral Researcher DCU Affiliated PhD M Cypriot No Prof Josef van Genabith<br />
Peter Cahill Co-Principal Investigator UCD Affiliated PhD M Irish No Prof Julie Carson-Berndsen<br />
Oscar Cassetti PhD TCD Affiliated MSc M Italian No Dr Saturnino Luz<br />
Alexandru Ceausu Postdoctoral Researcher DCU Affiliated PhD M Romanian No Dr Páraic Sheridan<br />
Yi Chen PhD DCU Affiliated PhD F Chinese No Prof Gareth Jones<br />
Owen Conlan Assistant Professor TCD Affiliated PhD M Irish No N/A<br />
Seamus Coogan Marketing Lead TCD Affiliated BSc M Irish No Prof Vincent Wade<br />
Stephen Curran Research Assistant/Programmer TCD Affiliated MSc M Irish Yes Dr David Lewis<br />
Aswarth Dara Postdoctoral Researcher DCU Affiliated PhD M Indian No Prof Josef van Genabith<br />
Stephen Doherty Postdoctoral Researcher DCU Affiliated PhD M Irish No Prof Josef van Genabith<br />
David Faherty Research Assistant TCD Affiliated BSc M Irish No Prof Vincent Wade<br />
Leroy Finn Programmer TCD Affiliated MSc M Irish No Dr David Lewis<br />
Frank Fowley Research Assistant DCU Affiliated MSc M Irish No Dr Claus Pahl<br />
Manisha Ganguly Programmer DCU Affiliated F Indian No Prof Gareth Jones
Federico Gaspari Postdoctoral Researcher DCU Affiliated PhD M Italian No Prof Josef van Genabith<br />
Anton Gerdelan Postdoctoral Researcher TCD Affiliated PhD M New Zealand No Dr Dave Lewis<br />
Lorraine Goeuriot Postdoctoral Researcher DCU Affiliated MSc F French No Dr Gareth Jones<br />
Steve Gotz Commercialisation Manager DCU Affiliated MSc M American No N/A<br />
Declan Groves Research Integration Officer DCU Affiliated PhD M Irish No N/A<br />
Cormac Hampson Postdoctoral Researcher TCD Affiliated PhD M No Prof Vincent Wade<br />
Deirdre Hogan Postdoctoral Researcher DCU Affiliated PhD F Irish No N/A<br />
Dominic Jones Postdoctoral Researcher TCD Affiliated MSc M British No Dr David Lewis<br />
John Judge Postdoctoral Researcher DCU Affiliated PhD M Irish No Prof Josef van Genabith<br />
Liadh Kelly Postdoctoral Researcher DCU Affiliated MSc F Irish No Dr Gareth Jones<br />
Alex Killen Programmer DCU Affiliated BSc M Irish No Prof Josef van Genabith<br />
Kris McGlinn Research Assistant TCD Affiliated MSc M Irish No Dr David Lewis<br />
Brenda McGuirk Project Co-ordinator TCD Affiliated F Irish No Prof Vincent Wade<br />
Gavin Mendel-Gleeson Postdoctoral Researcher DCU Affiliated PhD M Irish No Dr Deirdre Hogan<br />
Sebastian Molines Research Assistant TCD Affiliated MSc M French Yes Dr David Lewis<br />
Adam Moore Postdoctoral Researcher TCD Affiliated PhD M Irish No Prof Vincent Wade<br />
Lynda O Donovan Pedagogical Lead TCD Affiliated MSc F Irish No Prof Vincent Wade<br />
Ian O’Keeffe Postdoctoral Researcher TCD Affiliated PhD M Irish No Prof Vincent Wade<br />
Tsuyoshi Okita PhD DCU Affiliated MSc M Japanese No Prof Josef van Genabith<br />
Neil Peirce Research Assistant TCD Affiliated PhD M Irish No Prof Vincent Wade<br />
Raphael Rubino Postdoctoral Researcher DCU Affiliated PhD M French No Dr Jennifer Foster<br />
Rasoul Samad Zadeh Postdoctoral Researcher DCU Affiliated PhD M Iranian No Dr Jennifer Foster<br />
Eduardo Shanahan Programmer DCU Affiliated BSc M Argentine No Prof Josef van Genabith<br />
Ankit Srivastava Postdoctoral Researcher DCU Affiliated PhD M Indian No Prof Josef van Genabith<br />
Thanos Staikopoulos Postdoctoral Researcher TCD Affiliated PhD M Greek No Prof Vincent Wade<br />
John Tinsley Project Coordinator DCU Affiliated PhD M Irish No Dr Páraic Sheridan<br />
Antonio Toral Postdoctoral Researcher DCU Affiliated PhD M Spanish No Prof Andy Way<br />
Lamia Tounsi Postdoctoral Researcher DCU Affiliated PhD F Algerian No Prof Josef van Genabith<br />
Eddie Walsh PhD TCD Affiliated MSc M Irish No Prof Vincent Wade<br />
Rachel Wrafter Postdoctoral Researcher TCD Affiliated PhD F Irish No Prof Vincent Wade<br />
Lei Xu Postdoctoral Researcher DCU Affiliated PhD M Chinese No Dr Claus Pahl<br />
Hong Yi Wang Research Assistant DCU Affiliated BSc F Chinese No Dr Deirdre Hogan<br />
Bilal Yousuf PhD TCD Affiliated MSc M No Prof Vincent Wade<br />
Jian Zhang Technician DCU Affiliated BSc M Chinese No Dr Páraic Sheridan<br />
Industry Partners and Contact Names<br />
Organisation Type Organisation Name Location Date Joined CSET Date Departed Contact First Name Contact Surname Contact Position<br />
SME Alchemy Software Development Dublin, Ireland 04/12/2007 N/A Enda McDonnell Director of Engineering<br />
MNC Dai Nippon Printing Tokyo, Japan 04/12/2007 N/A Takeshi Fukunaga Advisor of Headquarters<br />
MNC IBM Dublin, Ireland 04/12/2007 N/A Brian O’Donovan Program Director, IBM Dublin Centre for Advanced Studies<br />
MNC Microsoft Dublin, Ireland 04/12/2007 N/A Dag Schmidtke Program Manager for Language Technology Strategy<br />
MNC SDL Wicklow, Ireland 04/12/2007 N/A Paul McManus General Manager<br />
SME SpeechStorm Belfast, Northern Ireland 04/12/2007 N/A Oliver Lennon Chief Executive Officer<br />
MNC Symantec Dublin, Ireland 04/12/2007 N/A Fred Hollowood Research Director, Shared Engineering Services<br />
MNC CAPITA (Formerly Applied Language Solutions) Manchester, U.K. 04/12/2007 N/A Gavin Wheeldon Chief Executive Officer<br />
SME VistaTEC Dublin, Ireland 04/12/2007 N/A Phil Ritchie Chief Technology Officer<br />
MNC Welocalize Dublin, Ireland 23/02/2011 N/A Derek Coffey Vice President, Technology and Professional Services<br />
Governance Committee Members<br />
Role First Name Surname Organisation Position<br />
Chair David MacDonald IMS Maxims Chairman<br />
Member Alan Harvey DCU Vice President for Research<br />
Member Vinny Cahill TCD Dean of Research<br />
Member Gearóid Mooney Enterprise Ireland Director, Informatics Research and Commercialisation<br />
Member Phil Ritchie VistaTEC Chief Technical Officer<br />
Member Aidan Sweeney IBEC R&D Policy Executive<br />
Member Josef van Genabith DCU <strong>CNGL</strong> Director<br />
In Attendance Páraic Sheridan DCU <strong>CNGL</strong> Associate Director<br />
In Attendance Vincent Wade TCD <strong>CNGL</strong> Deputy Director<br />
Scientific Advisory Board Members<br />
Role First Name Surname Organisation Position<br />
Chair Francis Tsang Adobe Systems Director of Globalisation<br />
Member Andrew Bredenkamp Acrolinx Chief Executive Officer<br />
Member Carol Espy-Wilson University of Maryland, Department of Electrical and Computer Engineering Professor<br />
Member Lauri Karttunen Palo Alto Research Center Computational Linguist<br />
Member Makoto Nagao NIST President
Appendix 2: Outputs<br />
PhDs Awarded<br />
Name Nationality Gender Institute<br />
Dominic Jones English M TCD<br />
Joachim Wagner German M DCU
Ben Steichen Luxembourgish M TCD<br />
Lucia Morado Vazquez Spanish F UL<br />
Stephen Doherty Irish M UL<br />
Joss Moorkens Irish M DCU<br />
Ian O’Keeffe Irish M TCD<br />
Zohar Etzioni Israeli M TCD<br />
Walid Magdy Egyptian M DCU<br />
Sandipan Dandapat Indian M DCU
Ankit Srivastava Indian M DCU<br />
Pratyush Banerjee Indian M DCU<br />
Hala Al Maghout Syrian F DCU<br />
All CSET Publications<br />
All <strong>CNGL</strong> publications are stored in a central document management system and are available through the institutional<br />
Open Access repositories.<br />
Refereed Conference and Workshop Papers<br />
Abgaz, Y., Javed, M., Pahl, C. (<strong>2012</strong>). Dependency Analysis in Ontology-driven Content-based Systems. In 12th International Conference on Artificial<br />
Intelligence and Soft Computing (ICAISC<strong>2012</strong>), Zakopane, Poland<br />
Abou-Zleikha, M., Cahill, P., Carson-Berndsen, J. (<strong>2012</strong>). Pitch Recovery of Missing Syllables Using Sparse Representation in Exemplar-based Pitch<br />
Generation. In Proceedings of the 11th International Conference on Information Sciences, Signal Processing and their Applications, Montreal, Canada<br />
Abou-Zleikha, M., Cahill, P., Carson-Berndsen, J. (<strong>2012</strong>). Exemplar-based pitch contour generation using DOP for syntactic tree decomposition.<br />
In Proceedings 37th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP <strong>2012</strong>, Kyoto, Japan<br />
Abou-Zleikha, M., Szekely, E., Cahill, P., Carson-Berndsen, J. (<strong>2012</strong>). Multi-level Exemplar-based Duration Generation for Expressive Speech Synthesis.<br />
In Proceedings 6th International Conference on Speech Prosody <strong>2012</strong>, Shanghai, China<br />
Almaghout, H., Jiang, J., Way, A. (<strong>2012</strong>). Extending CCG-based Syntactic Constraints in Hierarchical Phrase-Based SMT. In Proceedings of the 16th <strong>Annual</strong><br />
Conference of the European Association for Machine Translation (EAMT-<strong>2012</strong>), Trento, Italy<br />
Asanka Wasala, A., Schäler, R., Weerasinghe, R., Exton, C. (<strong>2012</strong>). Collaboratively Building Language Resources while Localising the Web. In Proceedings<br />
of ACL <strong>2012</strong>: 3rd workshop on the People’s Web meets NLP: Collaboratively Constructed Semantic Resources and their Applications to NLP, Jeju,<br />
Republic of Korea<br />
Attia, M., Pecina, P., Samih, Y., Shaalan, K., van Genabith, J. (<strong>2012</strong>). Improved Spelling Error Detection and Correction for Arabic, COLING <strong>2012</strong>, Mumbai, India<br />
Attia, M., Samih, Y., Shaalan, K., van Genabith, J. (<strong>2012</strong>). The Floating Arabic Dictionary: An Automatic Method for Updating a Lexical Database through the<br />
detection and lemmatization of the Unknown Word. In The International Conference on Computational Linguistics (COLING), December <strong>2012</strong>, Mumbai, India<br />
Banerjee, P., Naskar, S., Way, A., van Genabith, J., Roturier, J. (<strong>2012</strong>). Supplementary Data Selection by Incremental Update of Translation Models. In the<br />
24th International Conference on Computational Linguistics, Mumbai, India
Banerjee, P., Naskar, S., Roturier, J., Way, A., van Genabith, J. (<strong>2012</strong>). Domain Adaptation in SMT of User-Generated Forum Content Guided by OOV Word<br />
Reduction: Normalization and/or Supplementary Data. In Proceedings of the 16th <strong>Annual</strong> Conference of the European Association for Machine Translation<br />
(EAMT-<strong>2012</strong>), Trento, Italy<br />
Cabral, J. P., Kane, M., Ahmed, Z., Abou-Zleikha, M., Szekely, E., Zahra, A., Ogbureke, K. U., Cahill, P., Carson-Berndsen, J., Schlögl, S. (<strong>2012</strong>). Rapidly<br />
Testing the Interaction Model of a Pronunciation Training System via Wizard-of-Oz. In Proceedings of the LREC International Conference on Language<br />
Resources and Evaluation (LREC), Istanbul, Turkey<br />
Cabral, J. P. and Carson-Berndsen, J. (<strong>2012</strong>). Controlling Voice Source Parameters to Transform Characteristics of Synthetic Voices. In Listening Talker<br />
(LISTA) Workshop, Edinburgh, UK<br />
Dandapat, S., Morrissey, S., Way, A., van Genabith, J. (<strong>2012</strong>). Combining EBMT, SMT, TM and IR Technologies for Quality and Scale. In Proceedings of the<br />
Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation<br />
(HyTra), a workshop in EACL <strong>2012</strong>, Avignon, France<br />
Doherty, S., Kenny, D., Way, A. (<strong>2012</strong>). Taking Statistical Machine Translation to the Student Translator. AMTA <strong>2012</strong>, San Diego, USA<br />
Doherty, S. and Moorkens, J. (<strong>2012</strong>). An Experiential Analysis of Translation Technology Labs. 2nd <strong>Annual</strong> Conference of Education and Humanities,<br />
30 March <strong>2012</strong>, St. Patrick’s College, Ireland<br />
Doherty, S. and O’Brien, S. (<strong>2012</strong>). A User-Based Usability Assessment of Raw Machine Translated Technical Instructions. Conference of the Association<br />
for Machine Translation in the Americas (AMTA <strong>2012</strong>), San Diego, USA<br />
Drugman, T., Kane, J., Gobl, C. (<strong>2012</strong>). Resonator-based creaky voice detection. In Proceedings of Interspeech <strong>2012</strong>, Portland, Oregon, USA<br />
Emms, M. (<strong>2012</strong>). On Stochastic Tree Distances and their training via Expectation-Maximisation. In Proceedings of ICPRAM <strong>2012</strong> International Conference<br />
on Pattern Recognition Applications and Methods, Portugal<br />
Eskevich, M., Magdy, W., Jones, G.J.F. (<strong>2012</strong>). New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval. ECIR <strong>2012</strong>, pages 170-181<br />
Filip, D. (<strong>2012</strong>). Managing Industry Wisdom as a Portfolio of Technical Standards, in Management Re-Imagined. Presented at the International Federation<br />
of Scholarly Associations of Management (IFSAM <strong>2012</strong>), Limerick, Ireland.<br />
Filip, D. (<strong>2012</strong>). Using Business Process Management and Modelling to Analyse the Role of Human Translators and Reviewers in Bitext Management<br />
Workflows. In International Association for Translation and Intercultural Studies (IATIS <strong>2012</strong>), Presented at the IATIS <strong>2012</strong>, Belfast, UK<br />
Filip, D., Lewis, D., Sasaki, F. (<strong>2012</strong>). The Multilingual Web. In Proceedings of the 21st World Wide Web Conference WWW<strong>2012</strong>, April 16-20, <strong>2012</strong>, Lyon,<br />
France, ACM proceedings 978-1-4503-1229-5/11/04<br />
Ganguly, D., Jones, G. (<strong>2012</strong>). Cross-Lingual Topical Relevance Models. The 24th International Conference on Computational Linguistics (COLING <strong>2012</strong>),<br />
Mumbai, India<br />
Ganguly, D., Leveling, J., Jones, G. (<strong>2012</strong>). Topical Relevance Models. CIKM <strong>2012</strong>, Hawaii, USA<br />
Ganguly, D., Leveling, J., Jones, G. (<strong>2012</strong>). Approximate Sentence Retrieval for Scalable and Efficient Example-based Machine Translation. The 24th<br />
International Conference on Computational Linguistics (COLING <strong>2012</strong>), Mumbai, India<br />
Ganguly, D., Leveling, J., Jones, G.J.F. (<strong>2012</strong>). DCU@FIRE <strong>2012</strong>: Rule-based stemmers for Bengali and Hindi. In FIRE <strong>2012</strong>, Fourth Workshop of the Forum<br />
for Information Retrieval Evaluation, pages 37-42, Kolkata, India, <strong>2012</strong>. ISI.<br />
Ganguly, D., Leveling, J., Jones, G.J.F. (<strong>2012</strong>). DCU@INEX-<strong>2012</strong>: Exploring sentence retrieval for tweet contextualization. In Pamela Forner, Jussi Karlgren,<br />
and Christa Womser-Hacker, editors, CLEF <strong>2012</strong> Evaluation Labs and Workshop, Online Working Notes, 17-20 September, Rome, Italy<br />
Ganguly, D., Leveling, J., Jones, G.J.F. (<strong>2012</strong>). Bengali (Bangla) Information Retrieval. In Technical Challenges and Design Issues in Bangla Language<br />
Processing. IGI Global (to appear)<br />
Ghorab, M. R., Zhou, D., Lawless, S., Wade, V. (<strong>2012</strong>). Multilingual User Modeling for Personalized Re-ranking of Multilingual Web Search Results.<br />
In Conference on User Modeling, Adaptation, and Personalization (UMAP <strong>2012</strong>), Montreal, Canada<br />
Graham, Y. (<strong>2012</strong>). Deep Syntax in Statistical Machine Translation. Lexical Functional Grammar Conference, Udayana University, Bali, Indonesia<br />
Javed, M., Abgaz, Y., Pahl, C. (<strong>2012</strong>). Composite Ontology Change Operators and their Customizable Evolution Strategies, 2nd Joint Workshop on<br />
Knowledge Evolution and Ontology Dynamics, Boston, USA<br />
Ogbureke, U. K., Cabral, J., Carson-Berndsen, J. (<strong>2012</strong>). Using Noisy Speech to Study the Robustness of a Continuous F0 Modelling Method in<br />
HMM-based Speech Synthesis<br />
Kane, B., Toussaint, P., Luz, S. Shared decision making needs a communication record. To appear in Proceedings of the 16th ACM Conference on Computer<br />
Supported Cooperative Work and Social Computing (CSCW 2013), San Antonio, Texas
Scherer, S., Kane, J., Gobl, C., Schwenker, F. (<strong>2012</strong>). The Effect of Fuzzy Training Targets on Voice Quality Classification, Interspeech <strong>2012</strong>,<br />
Portland, USA<br />
Kane, J. and Gobl, C. (<strong>2012</strong>). Identifying regions of non-modal phonation using features of the wavelet transform. In Proceedings of Interspeech <strong>2012</strong>,<br />
Florence, Italy<br />
Kane, J., Papay, K., Hunyadi, L., Gobl, C. (<strong>2012</strong>). On the use of creak in Hungarian spontaneous speech. In Proceedings of ICPhS 2011, Hong Kong, China<br />
Kane, J., Scherer, S., Layher, G., Neumann, H. (<strong>2012</strong>). An audiovisual political speech analysis incorporating eye-tracking and perception data. The eighth<br />
international conference on Language Resources and Evaluation (LREC <strong>2012</strong>), Istanbul, Turkey<br />
Kane, J., Yanushevskaya, I., Ní Chasaide, A., Gobl, C. (<strong>2012</strong>). Exploiting time and frequency domain measures for precise voice source parameterisation. In<br />
Proceedings of Speech Prosody <strong>2012</strong>, Shanghai, China<br />
Kane, M., Ahmed, Z., Carson-Berndsen, J. (<strong>2012</strong>). Underspecification in Pronunciation Variation. In Proceedings of the International Symposium on<br />
Automatic Detection of Errors in Pronunciation Training (IS ADEPT), Stockholm, Sweden<br />
Kane, J., Oertel, C. (<strong>2012</strong>). Conversational involvement and multimodal cues: summary and outlook. Fonetik <strong>2012</strong>, Gothenburg, Sweden<br />
Levacher, K., Lawless, S., Wade, V. (<strong>2012</strong>). Slicepedia: Towards Long Tail Resource Production through Open Corpus Reuse. In Proceedings of International<br />
Conference on Web-based Learning (ICWL <strong>2012</strong>), Sinaia, Romania<br />
Levacher, K., Lawless, S., Wade, V. (<strong>2012</strong>). Slicepedia: Automating the Production of Learning Objects from Open Corpus Content. In Proceedings of<br />
The European Conference on Technology Enhanced Learning (EC-TEL), September <strong>2012</strong>, Paphos, Cyprus<br />
Levacher, K., Lawless, S., Wade, V. (<strong>2012</strong>). Slicepedia: Providing Customized Reuse of Open-Web Resources for Adaptive Hypermedia. In Proceedings<br />
of the 23rd ACM Conference on Hypertext and Social Media (HT ’12), Milwaukee, USA<br />
Leveling, J. (<strong>2012</strong>). DCU@FIRE <strong>2012</strong>: Monolingual and crosslingual SMS-based FAQ retrieval. In FIRE <strong>2012</strong>, Fourth Workshop of the Forum for Information<br />
Retrieval Evaluation, pages 37-42, Kolkata, India, <strong>2012</strong>. ISI.<br />
Leveling, J. (<strong>2012</strong>). On the effect of stopword removal for SMS-based FAQ retrieval. In Gosse Bouma, Ashwin Ittoo, Elisabeth Métais, and Hans Wortmann,<br />
editors, Natural Language Processing and Information Systems – 17th International Conference on Applications of Natural Language to Information<br />
Systems, NLDB <strong>2012</strong>, 26-28 June, Groningen, The Netherlands, Proceedings, volume 7337 of Lecture Notes in Computer Science (LNCS), pages 128-139.<br />
Springer, <strong>2012</strong>.<br />
Leveling, J., Goeuriot, L., Kelly, L., Jones, G.J.F. (<strong>2012</strong>). DCU@TRECMed <strong>2012</strong>: Using ad-hoc baselines for domain-specific retrieval. In Proceedings of<br />
TREC <strong>2012</strong>. NIST, <strong>2012</strong>.<br />
Leveling, J., Jones, G., Ganguly, D. (<strong>2012</strong>). Topical Relevance Models. In Proceedings of the Eighth Asia Information Retrieval Societies Conference<br />
(AIRS <strong>2012</strong>), December <strong>2012</strong>, Tianjin, China<br />
Leveling, J., Jones, G.J.F. (<strong>2012</strong>). Making Results Fit Into 40 Characters: A Study in Document Rewriting. In Proceedings of the Thirty-Fifth <strong>Annual</strong><br />
International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR <strong>2012</strong>), August <strong>2012</strong>, Portland, USA<br />
Lewis, D., O’Connor, A., Zydroń, A., Sjögren, G., Choudhury (<strong>2012</strong>). On Using Linked Data for Language Resource Sharing in the Long Tail of the<br />
Localisation Market. In Proceedings of Language Resources and Evaluation Conference (LREC), May <strong>2012</strong>, Istanbul, Turkey<br />
Lewis, D., O’Connor, A., Molines, S., Finn, L., Jones, D., Curran, S. and Lawless, S. (<strong>2012</strong>). Linking localisation and language resources, Linked Data in<br />
Linguistics. Lecture Notes in Computer Science (LNCS), 7-9 March <strong>2012</strong>, Frankfurt/Main, Germany, Springer-Verlag<br />
Li, J., Tu, Z., Zhou, G., van Genabith, J. (<strong>2012</strong>). Head-Driven Hierarchical Phrase-based Translation. In Proceedings of the 50th <strong>Annual</strong> Meeting of the<br />
Association for Computational Linguistics (ACL-<strong>2012</strong>), Jeju, Korea, Association for Computational Linguistics<br />
Li, J., Tu, Z., Zhou, G., van Genabith, J. (<strong>2012</strong>). Using Syntactic Head Information in Hierarchical Phrase-based Translation. In Proceedings of the Seventh<br />
Workshop on Statistical Machine Translation (WMT <strong>2012</strong>), Montreal, Canada<br />
Lynch, G., Moreau, E., Vogel, C. (<strong>2012</strong>). A Naïve Bayes classifier for automatic correction of preposition and determiner errors in ESL text. In Proceedings<br />
of the Seventh Workshop on Innovative Use of NLP for Building Educational Applications, June <strong>2012</strong>, Montreal, Canada<br />
Lynch, G., Vogel, C. (<strong>2012</strong>). Towards the Automatic Detection of the Source Language of a Literary Translation. In Proceedings of the 24th International<br />
Conference on Computational Linguistics (Coling <strong>2012</strong>), Mumbai, India<br />
Maldonado-Guerra, A., and Emms, M. (<strong>2012</strong>). First-order and second-order context representations: geometrical considerations and performance in<br />
word-sense disambiguation and discrimination. In Proceedings of the 11es Journées internationales d’Analyse statistique des Données Textuelles<br />
(JADT <strong>2012</strong>), Liège.<br />
Mamani Sanchez, L. and Vogel, C. (<strong>2012</strong>). Emoticons Signal Expertise in Technical Web Fora. Special Session: Computational Intelligence in Emotional<br />
or Affective Systems. In Proceedings of the 22nd Italian Workshop on Neural Networks. Smart Innovation, Systems and Technologies, Salerno, Italy
Mamani Sanchez, L. and Vogel, C. (<strong>2012</strong>). Epistemic Signals and Emoticons Affect Kudos. In 3rd IEEE International Conference on Cognitive<br />
Infocommunications, Kosice, Slovakia<br />
McAuley, J., Lewis, D., O’Connor, A. (<strong>2012</strong>). Exploring reflection in online communities. In Learning Analytics and Knowledge (LAK12), Vancouver,<br />
Canada: ACM<br />
Min, J., Lopes, C., Leveling, J., Schmidtke, D., Jones, G.J.F. (<strong>2012</strong>). Multi-Platform Image Search using Tag Enrichment. In Proceedings of the Thirty-Fifth<br />
<strong>Annual</strong> International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR <strong>2012</strong>), August <strong>2012</strong>, Portland, USA<br />
Moreau, E. (<strong>2012</strong>). Quality Estimation: an experimental study using unsupervised similarity measures. In Proceedings of the Seventh Workshop on Statistical<br />
Machine Translation, Montreal, Canada<br />
Mulwa, C., Lawless, S., Sharp, M., Wade, V. (<strong>2012</strong>). The Evaluation of Adaptive Technology Enhanced Learning Systems, E-LEARN <strong>2012</strong> – World Conference<br />
on E-Learning in Corporate, Government, Healthcare and Higher Education, Montreal, Canada<br />
Ogbureke, U. K., Cabral, J., Carson-Berndsen, J. (<strong>2012</strong>). Explicit Duration Modelling in HMM-based Speech Synthesis Using Continuous Hidden Markov<br />
Model. In The 11th International Conference on Information Sciences, Signal Processing and their Applications (ISSPA <strong>2012</strong>), 3-5 July <strong>2012</strong>, Montreal,<br />
Canada<br />
Ogbureke, U. K., Cabral, J., Carson-Berndsen, J. (<strong>2012</strong>). Using Multilayer Perceptron for Voicing Strength Estimation in HMM-based Speech Synthesis.<br />
In The 11th International Conference on Information Sciences, Signal Processing and their Applications, 3-5 July <strong>2012</strong>, Montreal, Canada<br />
O’Keeffe, I. (<strong>2012</strong>). Multimedia Localisation: Cultural Implications for XLIFF. In The 2nd International XLIFF Symposium, Warsaw, Poland<br />
O’Keeffe, I. (<strong>2012</strong>). Multimedia Localisation: Cultural Implications for the Adaptation of Multimedia Content. In Proceedings of 4th Conference of the<br />
International Association for Translation and Intercultural Studies, Queen’s University Belfast, Northern Ireland, UK<br />
O’Keeffe, I. (<strong>2012</strong>). A Mechanism for Facilitating Emotional Regulation through Music. <strong>2012</strong> CUES <strong>Annual</strong> Conference – Regulating Emotions:<br />
Contemporary Understandings and Interdisciplinary Perspective, Limerick, Ireland<br />
O’Keeffe, I., O’Connor, A., Lawless, S., Wade, V. (<strong>2012</strong>). Linked Open Corpus Models, Leveraging the Semantic Web for Adaptive Hypermedia.<br />
In Proceedings of the 23rd ACM Conference on Hypertext and Social Media, HT <strong>2012</strong>, Milwaukee, USA<br />
Pecina, P., Toral, A., van Genabith, J. (<strong>2012</strong>). Simple and Effective Parameter Tuning for Domain Adaptation of Statistical Machine Translation, COLING<br />
<strong>2012</strong>, Mumbai, India<br />
Sah, M. and Wade, V. (<strong>2012</strong>). A Novel Concept-based Search for the Web of Data using UMBEL and a Fuzzy Retrieval Model. In Proceedings of 9th<br />
Extended Semantic Web Conference (ESWC12), May <strong>2012</strong>, Crete, Greece<br />
Sah, M. and Wade, V. (<strong>2012</strong>). A Novel Concept-based Search for the Web of Data. In Proceedings of the 8th International I-SEMANTICS Conference Posters<br />
& Demonstrations Track, Graz, Austria<br />
Schneider, A., Luz, S. (<strong>2012</strong>). Speaker alignment in synthesised, machine translated communication. In International Workshop on Spoken Language<br />
Translation, December 2011, San Francisco, USA<br />
Szekely, E., Ahmed, Z., Steiner, I., Carson-Berndsen, J. (<strong>2012</strong>). Facial expressions as an input annotation modality for affective speech-to-speech<br />
translation, Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction, Santa Cruz, USA<br />
Szekely, E., Cabral, J., Carson-Berndsen, J. (<strong>2012</strong>). WinkTalk: a demonstration of a multimodal speech synthesis platform linking facial expressions to<br />
expressive synthetic voices. In Third Workshop on Speech and Language Processing for Assistive Technologies (SLPAT), Montreal, Canada<br />
Szekely, E. (<strong>2012</strong>). Detecting a Targeted Voice Style in an Audiobook using Voice Quality Features. In Proceedings of IEEE International Conference on<br />
Acoustics, Speech and Signal Processing (ICASSP <strong>2012</strong>), March <strong>2012</strong>, Kyoto, Japan<br />
Szekely, E., Abou-zleikha, M., Cabral, J., Carson-Berndsen, J. (<strong>2012</strong>). Evaluating expressive speech synthesis from audiobooks in conversational phrases,<br />
LREC <strong>2012</strong>, Istanbul, Turkey<br />
Szekely, E., Csapó, T., Toth, B., Mihajlik, P., Carson-Berndsen, J. (<strong>2012</strong>). Synthesizing expressive speech from amateur audiobook recordings.<br />
In Proceedings of IEEE Workshop on Spoken Language Technology, December <strong>2012</strong>, Florida, USA<br />
Szekely, E., Kane, J., Scherer, S., Gobl, C., Carson-Berndsen, J. (<strong>2012</strong>). Detecting a targeted voice style in an audiobook using voice quality features.<br />
In Proceedings of ICASSP, Kyoto, Japan<br />
Truran, M., Georg, G., Cavazza, M., Zhou, D. (<strong>2012</strong>). A Section Title Authoring Tool for Clinical Guidelines. In Proceedings of 12th ACM Symposium<br />
on Document Engineering (DocEng <strong>2012</strong>), 4-7 September, Paris, France, 41-44.<br />
Tu, Z., He, Y., Foster, J., van Genabith, J., Liu, Q. and Lin, S. (<strong>2012</strong>). Identifying High-Impact Sub-Structures for Convolution Kernels in Document-level<br />
Sentiment Classification. In Proceedings of the 50th <strong>Annual</strong> Meeting of the Association for Computational Linguistics, July <strong>2012</strong>, Jeju, Republic of Korea
Tu, Z., Liu, Y., He, Y., van Genabith, J., Liu, Q., Lin, S. (<strong>2012</strong>). Combining Multiple Alignments to Improve Machine Translation, COLING <strong>2012</strong>, Mumbai,<br />
India<br />
Veale, T. (<strong>2012</strong>). Detecting and Generating Ironic Comparisons: An Application of Creative Information Retrieval. AAAI Fall Symposium Series <strong>2012</strong><br />
Veale, T. and Hao, Y. (<strong>2012</strong>). In the Mood for Affective Search. In Proceedings of WWW’<strong>2012</strong>, the 21st World-Wide-Web conference, Lyon, France<br />
Veale, T. and Li, G. (<strong>2012</strong>). Specifying Viewpoint and Information Need with Affective Metaphors: A System Demonstration of Metaphor Magnet.<br />
In Proceedings of ACL’<strong>2012</strong>, the 50th <strong>Annual</strong> Conference of the Association for Computational Linguistics, Jeju, South Korea<br />
Wagner, J., Foster, J., Cetinoglu, O., Nivre, J., Hogan, D., Le Roux, J., van Genabith, J. (<strong>2012</strong>). From News to Comment: Resources and Benchmarks for<br />
Parsing the Language of Web 2.0. In 5th International Joint Conference on Natural Language Processing (IJCNLP), Chiang Mai, Thailand<br />
Wagner, J., Bryl, A., Foster, J., Le Roux, J., Kaljahi, R. (<strong>2012</strong>). DCU-Paris13 Systems for the SANCL <strong>2012</strong> Shared Task, First Workshop on Syntactic Analysis of<br />
Non-Canonical Language (SANCL), Montreal, Canada<br />
Wagner, J., Cetinoglu, O., Foster, J., Hogan, S., Le Roux, J. (<strong>2012</strong>). #hardtoparse: POS Tagging and Parsing the Twitterverse. In Workshop on Analyzing<br />
Microtext at the Twenty-Fifth AAAI Conference on Artificial Intelligence (AAAI-11), San Francisco, USA<br />
Wagner, J., Cetinoglu, O., van Genabith, J., Foster, J. (<strong>2012</strong>). Comparing the use of edited and unedited text in parser self-training. The 12th International<br />
Conference on Parsing Technologies (IWPT 2011), Dublin, Ireland<br />
Zahra, A., Carson-Berndsen, J. (<strong>2012</strong>). English to Indonesian Transliteration to Support English Pronunciation Practice. In Proceedings of the eighth<br />
international conference on Language Resources and Evaluation (LREC), Istanbul, Turkey<br />
Zahra, A., Cabral, J., Carson-Berndsen, J., Kane, M. (<strong>2012</strong>). Automatic Classification of Pronunciation Errors Using Decision Trees and Speech Recognition<br />
Technology. In Proceedings of International Symposium on Automatic Detection of Errors in Pronunciation Training (IS ADEPT), Stockholm, Sweden<br />
Ahmed, Z., Cahill, P., Carson-Berndsen, J. (<strong>2012</strong>). Phonetically aided Syntactic Parsing of Spoken Language. In Proceedings of KONVENS <strong>2012</strong>, the 11th<br />
Conference on Natural Language Processing, Vienna, Austria<br />
Ahmed, Z., Cahill, P., Carson-Berndsen, J., Jiang, J., Way, A. (<strong>2012</strong>). Hierarchical Phrase-Based MT for Phonetic Representation-Based Speech<br />
Translation. In Proceedings of the Tenth Biennial Conference of the Association for Machine Translation in the Americas, San Diego, California<br />
Zhou, D., Lawless, S., Wade, V. (<strong>2012</strong>). Web Search Personalization Using Social Data. In Proceedings of the 16th International Conference on Theory<br />
and Practice of Digital Libraries (TPDL <strong>2012</strong>), Pafos, Cyprus<br />
Not Recorded in 2011 <strong>Annual</strong> <strong>Report</strong><br />
Refereed Conference and Workshop Papers<br />
Doherty, S. Exploring the Cognitive Elements of Think-Aloud Protocols. Show and Tell: Proceedings of the 2011 SALIS Postgraduate Showcase<br />
Abgaz, Y., Javed, M., Pahl, C. A Framework for Change Impact Analysis of Ontology-driven Content-based Systems. In Proceedings of On the Move to<br />
Meaningful Internet Systems: OTM Workshops. 7th International IFIP Workshop on Semantic Web and Web Semantics (SWWS), October, 2011, Crete,<br />
Greece<br />
Book Chapters<br />
Asanka Wasala, R., Buckley, J., Exton, C., Schaler, R., Weerasinghe, A. R. (<strong>2012</strong>). Building Multilingual Language Resources in Web Localisation:<br />
A Crowdsourcing Approach. In I. Gurevych and J. Kim (Eds.), The People’s Web Meets NLP: Collaboratively Constructed Language Resources,<br />
Springer Verlag Berlin Heidelberg [In Press]<br />
Banerjee, P. (<strong>2012</strong>). In Alexander Clark, Chris Fox and Shalom Lappin (eds.): Handbook of computational linguistics and natural language processing.<br />
Machine Translation. 10.1007/s10590-012-9124-2 (OnlineFirst)<br />
Kane, M., Mauclair, J. and Carson-Berndsen, J. (2011). Automatic Identification of Phonetic Similarity based on Underspecification. Human Language<br />
Technology: Challenges for Computer Science and Linguistics. Lecture Notes in Computer Science (LNCS 6562) Poznan, pp.47-58<br />
Morera Mesa, A., Collins, J.J., Aouad, L. (<strong>2012</strong>). Assessing Support for Community Workflows in Localisation. In Florian Daniel, Kamel Barkaoui and<br />
Schahram Dustdar (Eds.) Business Process Management Workshops, BPM 2011 International Workshops Clermont-Ferrand, France, August 29, 2011,<br />
Revised Selected Papers, Part I, Lecture Notes in Business Information Processing (LNBIP) volume 99, part 3, pp 195-206, Springer Berlin Heidelberg<br />
O’Keeffe, I., Aouad, L., Collins, J.J., Asanka Wasala, R., Nishio, N., Morera Mesa, A., Morado Vázquez, L., Ryan, L., Gupta, R., Schaler, R. (<strong>2012</strong>).<br />
A View of Future Technologies and Challenges for the Automation of Localisation Processes: Visions and Scenarios. ICHIT (2) 2011: 371-382
Refereed Original Articles<br />
Kane, J. and Gobl, C. Wavelet maxima dispersion for breathy to tense voice discrimination, IEEE Transactions on Audio, Speech and Language Processing<br />
[In Press]<br />
Doherty, G., Karamanis, N., Luz, S. (<strong>2012</strong>). Collaboration in translation: The impact of increased reach on cross-organisational work. Computer Supported<br />
Cooperative Work (CSCW), August <strong>2012</strong><br />
Ghorab, R.M., Zhou, D., O’Connor, A., Wade, V. (<strong>2012</strong>). Personalised Information Retrieval: survey and classification. User Modeling and User-Adapted<br />
Interaction (UMUAI), <strong>2012</strong>.<br />
Kane, J., Drugman, T., Gobl, C. Improved automatic detection of creak, Computer Speech and Language [In Press]<br />
Karamanis, N., Doherty, G., Luz, S. (<strong>2012</strong>). Collaboration in translation: The impact of increased reach on cross-organisational work, Computer Supported<br />
Cooperative Work (CSCW), August <strong>2012</strong><br />
Lambert, P., Petitrenaud, S., Ma, Y., Way, A. (<strong>2012</strong>). What types of word alignment improve statistical machine translation? Machine Translation, Volume<br />
26(4), pp. 289-323, Springer, <strong>2012</strong><br />
Moorkens, J. (<strong>2012</strong>). A mixed-methods study of consistency in Translation Memories. Localisation Focus, Volume 11(1)<br />
Morera Mesa, A. (<strong>2012</strong>). Translation and localization project management: the art of the possible. In Keiran J. Dunne and Elena S. Dunne (eds.).<br />
Translation and localization project management: the art of the possible<br />
Mulwa, C., Lawless, S., O’Keeffe, I., Sharp, M., Wade, V. (<strong>2012</strong>). A Recommender Framework for the Evaluation of End User Experience in Adaptive<br />
Technology Enhanced Learning Systems. International Journal of Technology Enhanced Learning (IJTEL), Special Issue on “Datasets and Data Supported<br />
Learning in Technology-Enhanced Learning”, Vol. 4, Nos. 1/2, pp. 67-84, <strong>2012</strong><br />
O’Keeffe, I. (<strong>2012</strong>). Soundtrack Localisation: Culturally Adaptive Music Content for Computer Games, Journal of Internationalisation and Localisation<br />
Rami Ghorab, M., Zhou, D., O’Connor, A., Wade, V. (<strong>2012</strong>). Personalised Information Retrieval: Survey and Classification. In User Modeling and<br />
User-Adapted Interaction (UMUAI) Journal, 1-63 (Published Online First: http://dx.doi.org/10.1007/s11257-012-9124-1), Springer.<br />
Ryan, L. (<strong>2012</strong>). Global Authoring Resources, Communicator, Spring <strong>2012</strong>, ISTC<br />
Ryan, L. (<strong>2012</strong>). Global Authoring Techniques. Communicator, Autumn <strong>2012</strong>, ISTC<br />
Ryan, L. (<strong>2012</strong>). Global Diversity and Localisation Issues. Communicator, Summer <strong>2012</strong>, ISTC XXX<br />
Sah, M. and Wade, V. (<strong>2012</strong>). Automatic Metadata Mining from Multilingual Enterprise Content. In Web Semantics: Science, Services and Agents on the<br />
World Wide Web, Volume 11 (March <strong>2012</strong>), pp. 41-62<br />
Van Der Sluis, I., Luz, S., Breitfus, W., Ishizuka, M., Prendinger, H. (<strong>2012</strong>). Cross-cultural assessment of automatically generated multimodal referring<br />
expressions in a virtual world. International Journal of Human-Computer Studies, Volume 70, Issue 9, <strong>2012</strong>, Pages 611-619<br />
Van der Sluis, I., Luz, S., Breitfuß, W., Ishizuka, M., Prendinger, H. (<strong>2012</strong>). Cross-cultural assessment of automatically generated multimodal referring<br />
expressions in a virtual world. International Journal of Human-Computer Studies, 70(9):611-629, <strong>2012</strong>.<br />
Wasala, A., Schmidtke, D., Schaler, R. (<strong>2012</strong>). XLIFF and LCS Format: A Comparison. Localisation Focus, Volume 11(1)<br />
Zhou, D., Lawless, S., Wade, V. (<strong>2012</strong>). Improving search via personalized query expansion using social media, Information Retrieval, 15(3-4), 218-242<br />
Zhou, D., Truran, M., Brailsford, T., Wade, V., Ashman, H. (<strong>2012</strong>). Translation Techniques in Cross-Language Information Retrieval. ACM Computing<br />
Surveys (CSUR), 45(1), Article 1, 1-44. <strong>2012</strong><br />
Conference Presentations<br />
Abgaz, Y., Javed, M., Pahl, C. (<strong>2012</strong>). Dependency Analysis in Ontology-driven Content-based Systems. In 12th International Conference on Artificial<br />
Intelligence and Soft Computing (ICAISC <strong>2012</strong>), Zakopane, Poland<br />
Abou-Zleikha, M., Cahill, P., Carson-Berndsen, J. (<strong>2012</strong>). Pitch Recovery of Missing Syllables Using Sparse Representation in Exemplar-based Pitch<br />
Generation. In Proceedings of the 11th International Conference on Information Sciences, Signal Processing and their Applications, Montreal, Canada<br />
Abou-Zleikha, M., Cahill, P., Carson-Berndsen, J. (<strong>2012</strong>). Exemplar-based pitch contour generation using DOP for syntactic tree decomposition.<br />
In Proceedings 37th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP <strong>2012</strong>, Kyoto, Japan<br />
Abou-Zleikha, M., Szekely, E., Cahill, P., Carson-Berndsen, J. (<strong>2012</strong>). Multi-level Exemplar-based Duration Generation for Expressive Speech Synthesis.<br />
In Proceedings 6th International Conference on Speech Prosody <strong>2012</strong>, Shanghai, China<br />
Almaghout, H., Jiang, J., Way, A. (<strong>2012</strong>). Extending CCG-based Syntactic Constraints in Hierarchical Phrase-Based SMT. In Proceedings of the 16th <strong>Annual</strong><br />
Conference of the European Association for Machine Translation (EAMT-<strong>2012</strong>), Trento, Italy
Asanka Wasala, A., Schaler, R., Weerasinghe, R., Exton, C. (<strong>2012</strong>). Collaboratively Building Language Resources while Localising the Web. In Proceedings<br />
of ACL <strong>2012</strong>: 3rd workshop on the People’s Web meets NLP: Collaboratively Constructed Semantic Resources and their Applications to NLP, Jeju,<br />
Republic of Korea<br />
Attia, M., Pecina, P., Samih, Y., Shaalan, K., van Genabith, J. (<strong>2012</strong>). Improved Spelling Error Detection and Correction for Arabic, COLING <strong>2012</strong>, Mumbai, India<br />
Attia, M., Samih, Y., Shaalan, K., van Genabith, J. (<strong>2012</strong>). The Floating Arabic Dictionary: An Automatic Method for Updating a Lexical Database through the<br />
detection and lemmatization of the Unknown Word. In The International Conference on Computational Linguistics (COLING), December <strong>2012</strong>, Mumbai,<br />
India<br />
Banerjee, P., Naskar, S., Way, A., van Genabith, J., Roturier, J. (<strong>2012</strong>). Supplementary Data Selection by Incremental Update of Translation Models. In Proceedings of the<br />
24th International Conference on Computational Linguistics (COLING <strong>2012</strong>), Mumbai, India<br />
Banerjee, P., Naskar, S., Roturier, J., Way, A., van Genabith, J. (<strong>2012</strong>). Domain Adaptation in SMT of User-Generated Forum Content Guided by OOV Word<br />
Reduction: Normalization and/or Supplementary Data. In Proceedings of the 16th <strong>Annual</strong> Conference of the European Association for Machine Translation<br />
(EAMT-<strong>2012</strong>), Trento, Italy<br />
Cabral, J., Kane, M., Ahmed, Z., Abou-Zleikha, M., Szekely, E., Zahra, A., Ogbureke, U. K., Cahill, P., Carson-Berndsen, J., Schlogl, S. (<strong>2012</strong>). Rapidly<br />
Testing the Interaction Model of a Pronunciation Training System via Wizard-of-Oz. In Proceedings of the International Conference on Language<br />
Resources and Evaluation (LREC <strong>2012</strong>), Istanbul, Turkey<br />
Cabral, J. P. and Carson-Berndsen, J. (<strong>2012</strong>). Controlling Voice Source Parameters to Transform Characteristics of Synthetic Voices. In Listening Talker<br />
(LISTA) Workshop, Edinburgh, UK<br />
Dandapat, S., Morrissey, S., Way, A., van Genabith, J. (<strong>2012</strong>). Combining EBMT, SMT, TM and IR Technologies for Quality and Scale. In Proceedings of the<br />
Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation<br />
(HyTra), a workshop in EACL <strong>2012</strong>, Avignon, France<br />
Doherty, S., Kenny, D., Way, A. (<strong>2012</strong>). Taking Statistical Machine Translation to the Student Translator, AMTA <strong>2012</strong>, San Diego, USA<br />
Doherty, S. and Moorkens, J. (<strong>2012</strong>). An Experiential Analysis of Translation Technology Labs. 2nd <strong>Annual</strong> Conference of Education and Humanities,<br />
30 March <strong>2012</strong>, St. Patrick’s College, Ireland<br />
Doherty, S. and O’Brien, S. (<strong>2012</strong>). A User-Based Usability Assessment of Raw Machine Translated Technical Instructions. Conference of the Association<br />
for Machine Translation in the Americas (AMTA <strong>2012</strong>), San Diego, USA<br />
Drugman, T., Kane, J., Gobl, C. (<strong>2012</strong>). Resonator-based creaky voice detection. In Proceedings of Interspeech <strong>2012</strong>, Oregon, USA<br />
Emms, M. (<strong>2012</strong>). On Stochastic Tree Distances and their training via Expectation-Maximisation. In Proceedings of ICPRAM <strong>2012</strong> International Conference<br />
on Pattern Recognition Applications and Methods, Portugal<br />
Filip, D. (<strong>2012</strong>). Managing Industry Wisdom as a Portfolio of Technical Standards, in Management Re-Imagined. Presented at the International Federation<br />
of Scholarly Associations of Management (IFSAM <strong>2012</strong>), Limerick, Ireland.<br />
Filip, D. (<strong>2012</strong>). Using Business Process Management and Modelling to Analyse the Role of Human Translators and Reviewers in Bitext Management<br />
Workflows. Presented at the Conference of the International Association for Translation and Intercultural Studies (IATIS <strong>2012</strong>), Belfast, UK<br />
Filip, D., Lewis, D., Sasaki, F. (<strong>2012</strong>). The Multilingual Web. In Proceedings of the 21st World Wide Web Conference WWW<strong>2012</strong>, April 16-20, <strong>2012</strong>, Lyon,<br />
France, ACM proceedings 978-1-4503-1229-5/11/04<br />
Filip, D., Lewis, D., Wasala, A., Jones, D., Finn, L. (<strong>2012</strong>). CMSL10n SOLAS Integration as an ITS 2.0 XLIFF test bed. Paper presented at the W3C<br />
MultilingualWeb (ITS 2.0) Track, FEISGILTT <strong>2012</strong> (collocated with Localization World <strong>2012</strong>), Seattle, USA.<br />
Ganguly, D., Leveling, J., Jones, G. (<strong>2012</strong>). Topical Relevance Models, CIKM <strong>2012</strong>, Hawaii, USA<br />
Ganguly, D., Leveling, J., Jones, G. (<strong>2012</strong>). Approximate Sentence Retrieval for Scalable and Efficient Example-based Machine Translation. The 24th<br />
International Conference on Computational Linguistics (COLING <strong>2012</strong>), Mumbai, India<br />
Ganguly, D., Jones, G. (<strong>2012</strong>). Cross-Lingual Topical Relevance Models. The 24th International Conference on Computational Linguistics (COLING <strong>2012</strong>),<br />
Mumbai, India<br />
Ghorab, M. R., Zhou, D., Lawless, S., Wade, V. (<strong>2012</strong>). Multilingual User Modeling for Personalized Re-ranking of Multilingual Web Search Results.<br />
In Conference on User Modeling, Adaptation, and Personalization (UMAP <strong>2012</strong>), Montreal, Canada<br />
Graham, Y. (<strong>2012</strong>). Deep Syntax in Statistical Machine Translation. Lexical Functional Grammar Conference, Udayana University, Bali, Indonesia<br />
Javed, M., Abgaz, Y., Pahl, C. (<strong>2012</strong>). Composite Ontology Change Operators and their Customizable Evolution Strategies, 2nd Joint Workshop on<br />
Knowledge Evolution and Ontology Dynamics, Boston, USA<br />
Ogbureke, U. K., Cabral, J., Carson-Berndsen, J. (<strong>2012</strong>). Using Noisy Speech to Study the Robustness of a Continuous F0 Modelling Method in HMM-based<br />
Speech Synthesis<br />
Scherer, S., Kane, J., Gobl, C., Schwenker, F. (<strong>2012</strong>). The Effect of Fuzzy Training Targets on Voice Quality Classification, Interspeech <strong>2012</strong>, Portland, USA
Centre for Next Generation Localisation <strong>Annual</strong> <strong>Report</strong> <strong>2012</strong> 131<br />
Conference Presentations<br />
Kane, B., Toussaint, P., Luz, S. (2013). Shared decision making needs a communication record. To appear in Proceedings of the 16th ACM Conference on Computer<br />
Supported Cooperative Work and Social Computing (CSCW 2013), San Antonio, Texas<br />
Kane, J. and Gobl, C. (<strong>2012</strong>). Identifying regions of non-modal phonation using features of the wavelet transform. In Proceedings of Interspeech <strong>2012</strong>,<br />
Florence, Italy<br />
Kane, J., Papay, K., Hunyadi, L., Gobl, C. (<strong>2012</strong>). On the use of creak in Hungarian spontaneous speech. In Proceedings of ICPhS 2011, Hong Kong, China<br />
Kane, J., Scherer, S., Layher, G., Neumann, H. (<strong>2012</strong>). An audiovisual political speech analysis incorporating eye-tracking and perception data. The eighth<br />
international conference on Language Resources and Evaluation (LREC <strong>2012</strong>), Istanbul, Turkey<br />
Kane, J., Yanushevskaya, I., Ní Chasaide, A., Gobl, C. (<strong>2012</strong>). Exploiting time and frequency domain measures for precise voice source parameterisation.<br />
In Proceedings of Speech Prosody <strong>2012</strong>, Shanghai, China<br />
Kane, M., Ahmed, Z., Carson-Berndsen, J. (<strong>2012</strong>). Underspecification in Pronunciation Variation. In Proceedings of the International Symposium on<br />
Automatic Detection of Errors in Pronunciation Training (IS ADEPT), Stockholm, Sweden<br />
Kane, J., Oertel, C. (<strong>2012</strong>). Conversational involvement and multimodal cues: summary and outlook. Fonetik <strong>2012</strong>, Gothenburg, Sweden<br />
Levacher, K., Lawless, S., Wade, V. (<strong>2012</strong>). Slicepedia: Towards Long Tail Resource Production through Open Corpus Reuse. In Proceedings of International<br />
Conference on Web-based Learning (ICWL <strong>2012</strong>), Sinaia, Romania<br />
Levacher, K., Lawless, S., Wade, V. (<strong>2012</strong>). Slicepedia: Automating the Production of Learning Objects from Open Corpus Content. In Proceedings of The<br />
European Conference on Technology Enhanced Learning (EC-TEL), September <strong>2012</strong>, Paphos, Cyprus<br />
Levacher, K., Lawless, S., Wade, V. (<strong>2012</strong>). Slicepedia: Providing Customized Reuse of Open-Web Resources for Adaptive Hypermedia. In Proceedings<br />
of the 23rd ACM Conference on Hypertext and Social Media (HT ‘12), Milwaukee, USA<br />
Leveling, J., Jones, G., Ganguly, D. (<strong>2012</strong>). Topical Relevance Models. In Proceedings of the Eighth Asia Information Retrieval Societies Conference (AIRS<br />
<strong>2012</strong>), December <strong>2012</strong>, Tianjin, China<br />
Leveling, J., Jones, G.J.F. (<strong>2012</strong>). Making Results Fit Into 40 Characters: A Study in Document Rewriting. In Proceedings of the Thirty-Fifth <strong>Annual</strong><br />
International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR <strong>2012</strong>), August <strong>2012</strong>, Portland, USA<br />
Lewis, D., O’Connor, A., Zydroń, A., Sjögren, G., Choudhury (<strong>2012</strong>). On Using Linked Data for Language Resource Sharing in the Long Tail of the<br />
Localisation Market. In Proceedings of Language Resources and Evaluation Conference (LREC), May <strong>2012</strong>, Istanbul, Turkey<br />
Lewis, D., O’Connor, A., Molines, S., Finn, L., Jones, D., Curran, S. and Lawless, S. (<strong>2012</strong>). Linking localisation and language resources. In Linked Data in<br />
Linguistics, Lecture Notes in Computer Science (LNCS), 7-9 March <strong>2012</strong>, Frankfurt/Main, Germany, Springer-Verlag<br />
Li, J., Tu, Z., Zhou, G., van Genabith, J. (<strong>2012</strong>). Head-Driven Hierarchical Phrase-based Translation. In Proceedings of the 50th <strong>Annual</strong> Meeting of the<br />
Association for Computational Linguistics (ACL-<strong>2012</strong>), Jeju, Korea, Association for Computational Linguistics<br />
Li, J., Tu, Z., Zhou, G., van Genabith, J. (<strong>2012</strong>). Using Syntactic Head Information in Hierarchical Phrase-based Translation. In Proceedings of the Seventh<br />
Workshop on Statistical Machine Translation (WMT <strong>2012</strong>), Montreal, Canada<br />
Lynch, G., Moreau, E., Vogel, C. (<strong>2012</strong>). A Naïve Bayes classifier for automatic correction of preposition and determiner errors in ESL text. In Proceedings<br />
of the Seventh Workshop on Innovative Use of NLP for Building Educational Applications, June <strong>2012</strong>, Montreal, Canada<br />
Lynch, G., Vogel, C. (<strong>2012</strong>). Towards the Automatic Detection of the Source Language of a Literary Translation. In Proceedings of the 24th International<br />
Conference on Computational Linguistics (COLING <strong>2012</strong>), Mumbai, India<br />
Maldonado-Guerra, A., and Emms, M. (<strong>2012</strong>). First-order and second-order context representations: geometrical considerations and performance in<br />
word-sense disambiguation and discrimination. In Proceedings of the 11es Journées internationales d’Analyse statistique des Données Textuelles (JADT<br />
<strong>2012</strong>), Liège.<br />
Mamani Sanchez, L. and Vogel, C. (<strong>2012</strong>). Emoticons Signal Expertise in Technical Web Fora. Special Session: Computational Intelligence in Emotional<br />
or Affective Systems. In Proceedings of the 22nd Italian Workshop on Neural Networks. Smart Innovation, Systems and Technologies, Salerno, Italy<br />
Mamani Sanchez, L. and Vogel, C. (<strong>2012</strong>). Epistemic Signals and Emoticons Affect Kudos. In 3rd IEEE International Conference on Cognitive<br />
Infocommunications, Košice, Slovakia<br />
McAuley, J., Lewis, D., O’Connor, A. (<strong>2012</strong>). Exploring reflection in online communities. In Learning Analytics and Knowledge (LAK12), Vancouver,<br />
Canada: ACM<br />
Min, J., Lopes, C., Leveling, J., Schmidtke, D., Jones, G.J.F. (<strong>2012</strong>). Multi-Platform Image Search using Tag Enrichment. In Proceedings of the Thirty-Fifth<br />
<strong>Annual</strong> International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR <strong>2012</strong>), August <strong>2012</strong>, Portland, USA<br />
Moorkens, J., Doherty, S., Kenny, D., O’Brien, S. (2013, forthcoming). A Virtuous Circle: Laundering Translation Memory Data using Statistical Machine<br />
Translation. Tralogy Conference, January 2013, Paris, France<br />
Moreau, E. (<strong>2012</strong>). Quality Estimation: an experimental study using unsupervised similarity measures. In Proceedings of the Seventh Workshop on Statistical<br />
Machine Translation, Montreal, Canada
APPENDIX 2: OUTPUTS<br />
Mulwa, C., Lawless, S., Sharp, M., Wade, V. (<strong>2012</strong>). The Evaluation of Adaptive Technology Enhanced Learning Systems, E-LEARN <strong>2012</strong> – World Conference<br />
on E-Learning in Corporate, Government, Healthcare and Higher Education, Montreal, Canada<br />
Ogbureke, U. K., Cabral, J., Carson-Berndsen, J. (<strong>2012</strong>). Explicit Duration Modelling in HMM-based Speech Synthesis Using Continuous Hidden Markov<br />
Model. In The 11th International Conference on Information Sciences, Signal Processing and their Applications (ISSPA <strong>2012</strong>), 3-5 July <strong>2012</strong>, Montreal, Canada<br />
Ogbureke, U. K., Cabral, J., Carson-Berndsen, J. (<strong>2012</strong>). Using Multilayer Perceptron for Voicing Strength Estimation in HMM-based Speech Synthesis.<br />
In The 11th International Conference on Information Sciences, Signal Processing and their Applications, 3-5 July <strong>2012</strong>, Montreal, Canada<br />
O’Keeffe, I. (<strong>2012</strong>). Multimedia Localisation: Cultural Implications for XLIFF. In The 2nd International XLIFF Symposium, Warsaw, Poland<br />
O’Keeffe, I. (<strong>2012</strong>). Multimedia Localisation: Cultural Implications for the Adaptation of Multimedia Content. In Proceedings of the 4th Conference of the<br />
International Association for Translation and Intercultural Studies, Queen’s University Belfast, Northern Ireland, UK<br />
O’Keeffe, I. (<strong>2012</strong>). A Mechanism for Facilitating Emotional Regulation through Music. <strong>2012</strong> CUES <strong>Annual</strong> Conference – Regulating Emotions:<br />
Contemporary Understandings and Interdisciplinary Perspective, Limerick, Ireland<br />
O’Keeffe, I., O’Connor, A., Lawless, S., Wade, V. (<strong>2012</strong>). Linked Open Corpus Models, Leveraging the Semantic Web for Adaptive Hypermedia.<br />
In Proceedings of the 23rd ACM Conference on Hypertext and Social Media, HT <strong>2012</strong>, Milwaukee, USA<br />
Pecina, P., Toral, A., van Genabith, J. (<strong>2012</strong>). Simple and Effective Parameter Tuning for Domain Adaptation of Statistical Machine Translation,<br />
COLING <strong>2012</strong>, Mumbai, India<br />
Sah, M. and Wade, V. (<strong>2012</strong>). A Novel Concept-based Search for the Web of Data using UMBEL and a Fuzzy Retrieval Model. In Proceedings of 9th<br />
Extended Semantic Web Conference (ESWC12), May <strong>2012</strong>, Crete, Greece<br />
Sah, M. and Wade, V. (<strong>2012</strong>). A Novel Concept-based Search for the Web of Data. In Proceedings of the 8th International I-SEMANTICS Conference Posters<br />
& Demonstrations Track, Graz, Austria<br />
Schneider, A., Luz, S. (<strong>2012</strong>). Speaker alignment in synthesised, machine translated communication. In International Workshop on Spoken Language<br />
Translation, December 2011, San Francisco, USA<br />
Szekely, E., Ahmed, Z., Steiner, I., Carson-Berndsen, J. (<strong>2012</strong>). Facial expressions as an input annotation modality for affective speech-to-speech<br />
translation, Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction, Santa Cruz, USA<br />
Szekely, E., Cabral, J., Carson-Berndsen, J. (<strong>2012</strong>). WinkTalk: a demonstration of a multimodal speech synthesis platform linking facial expressions to<br />
expressive synthetic voices. In Third Workshop on Speech and Language Processing for Assistive Technologies (SLPAT), Montreal, Canada<br />
Szekely, E. (<strong>2012</strong>). Detecting a Targeted Voice Style in an Audiobook using Voice Quality Features. In Proceedings of IEEE International Conference on<br />
Acoustics, Speech and Signal Processing (ICASSP <strong>2012</strong>), March <strong>2012</strong>, Kyoto, Japan<br />
Szekely, E., Abou-zleikha, M., Cabral, J., Carson-Berndsen, J. (<strong>2012</strong>). Evaluating expressive speech synthesis from audiobooks in conversational phrases,<br />
LREC <strong>2012</strong>, Istanbul, Turkey<br />
Szekely, E., Csapó, T., Toth, B., Mihajlik, P., Carson-Berndsen, J. (<strong>2012</strong>). Synthesizing expressive speech from amateur audiobook recordings.<br />
In Proceedings of IEEE Workshop on Spoken Language Technology, December <strong>2012</strong>, Florida, USA<br />
Szekely, E., Kane, J., Scherer, S., Gobl, C., Carson-Berndsen, J. (<strong>2012</strong>). Detecting a targeted voice style in an audiobook using voice quality features.<br />
In Proceedings of ICASSP, Kyoto, Japan<br />
Tu, Z., He, Y., Foster, J., van Genabith, J., Liu, Q. and Lin, S. (<strong>2012</strong>). Identifying High-Impact Sub-Structures for Convolution Kernels in Document-level<br />
Sentiment Classification. In Proceedings of the 50th <strong>Annual</strong> Meeting of the Association for Computational Linguistics, July <strong>2012</strong>, Jeju, Republic of Korea<br />
Tu, Z., Liu, Y., He, Y., van Genabith, J., Liu, Q., Lin, S. (<strong>2012</strong>). Combining Multiple Alignments to Improve Machine Translation, COLING <strong>2012</strong>, Mumbai, India<br />
Veale, T. and Hao, Y. (<strong>2012</strong>). In the Mood for Affective Search. In Proceedings of WWW’<strong>2012</strong>, the 21st World-Wide-Web conference, Lyon, France<br />
Veale, T. and Li, G. (<strong>2012</strong>). Specifying Viewpoint and Information Need with Affective Metaphors: A System Demonstration of Metaphor Magnet.<br />
In Proceedings of ACL’<strong>2012</strong>, the 50th <strong>Annual</strong> Conference of the Association for Computational Linguistics, Jeju, South Korea<br />
Wagner, J., Foster, J., Cetinoglu, O., Nivre, J., Hogan, D., Le Roux, J., van Genabith, J. (<strong>2012</strong>). From News to Comment: Resources and Benchmarks for<br />
Parsing the Language of Web 2.0. In 5th International Joint Conference on Natural Language Processing (IJCNLP), Chiang Mai, Thailand<br />
Wagner, J., Bryl, A., Foster, J., Le Roux, J., Kaljahi, R. (<strong>2012</strong>). DCU-Paris13 Systems for the SANCL <strong>2012</strong> Shared Task, First Workshop on Syntactic Analysis<br />
of Non-Canonical Language (SANCL), Montreal, Canada<br />
Wagner, J., Cetinoglu, O., Foster, J., Hogan, D., Le Roux, J. (<strong>2012</strong>). #hardtoparse: POS Tagging and Parsing the Twitterverse. In Workshop on Analyzing<br />
Microtext at the Twenty-Fifth AAAI Conference on Artificial Intelligence (AAAI-11), San Francisco, USA<br />
Wagner, J., Cetinoglu, O., van Genabith, J., Foster, J. (<strong>2012</strong>). Comparing the use of edited and unedited text in parser self-training. The 12th International<br />
Conference on Parsing Technologies (IWPT 2011), Dublin, Ireland<br />
Wasala, A., Filip, D., Exton, C., Schäler, R. (<strong>2012</strong>). Making Data Mining of XLIFF Artefacts Relevant for the Ongoing Development of the XLIFF Standard.<br />
Paper presented at the 3rd International XLIFF Symposium, FEISGILTT <strong>2012</strong> (collocated with Localization World <strong>2012</strong>), Seattle, USA.
Zahra, A., Carson-Berndsen, J. (<strong>2012</strong>). English to Indonesian Transliteration to Support English Pronunciation Practice. In Proceedings of the eighth<br />
international conference on Language Resources and Evaluation (LREC), Istanbul, Turkey<br />
Zahra, A., Cabral, J., Carson-Berndsen, J., Kane, M. (<strong>2012</strong>). Automatic Classification of Pronunciation Errors Using Decision Trees and Speech Recognition<br />
Technology. In Proceedings of International Symposium on Automatic Detection of Errors in Pronunciation Training (IS ADEPT), Stockholm, Sweden<br />
Ahmed, Z., Cahill, P., Carson-Berndsen, J. (<strong>2012</strong>). Phonetically aided Syntactic Parsing of Spoken Language. In Proceedings of KONVENS <strong>2012</strong>, the 11th<br />
Conference on Natural Language Processing, Vienna, Austria<br />
Ahmed, Z., Cahill, P., Carson-Berndsen, J., Jiang, J., Way, A. (<strong>2012</strong>). Hierarchical Phrase-Based MT for Phonetic Representation-Based Speech<br />
Translation. In Proceedings of the Tenth Biennial Conference of the Association for Machine Translation in the Americas, San Diego, California<br />
Zhou, D., Lawless, S., Wade, V. (<strong>2012</strong>). Web Search Personalization Using Social Data. In Proceedings of the 16th International Conference on Theory and<br />
Practice of Digital Libraries (TPDL <strong>2012</strong>), Pafos, Cyprus<br />
Workshops and Conferences Hosted<br />
Date Event Title Location<br />
09/03/<strong>2012</strong> - 10/03/<strong>2012</strong> Workshop on Innovation and Applications in Speech Technology (IAST) University College Dublin<br />
11/03/<strong>2012</strong> <strong>CNGL</strong> Hadoop Hackathon Dublin City University<br />
16/05/<strong>2012</strong> - 17/05/<strong>2012</strong> <strong>CNGL</strong> Spring Scientific Committee Meeting (incorporating inaugural Innovation Charrette) Chartered Accountants House, Dublin<br />
30/05/<strong>2012</strong> - 01/06/<strong>2012</strong> International Conference on Computational Creativity (ICCC) University College Dublin<br />
04/06/<strong>2012</strong> - 06/06/<strong>2012</strong> Workshop on Best Practices in Post-editing (in association with TAUS) at Localization World Paris, France<br />
13/06/<strong>2012</strong> - 15/06/<strong>2012</strong> LRC Summer School <strong>2012</strong> – Mobile App Localisation Carlton Castletroy Park Hotel, Limerick<br />
25/06/<strong>2012</strong> Trinity Access Programme ‘Editing Wikipedia’ Workshop Trinity College Dublin<br />
20/09/<strong>2012</strong> - 21/09/<strong>2012</strong> 17th <strong>Annual</strong> Localisation & Internationalisation Conference (LRC XVII) Carlton Castletroy Park Hotel, Limerick<br />
11/06/<strong>2012</strong> - 13/06/<strong>2012</strong> W3C Multilingual Web Workshop Trinity College Dublin<br />
08/10/<strong>2012</strong> - 09/10/<strong>2012</strong> International Workshop on Intelligent Exploration of Semantic Data (IESD <strong>2012</strong>) at the 18th International Conference on Knowledge Engineering and Knowledge Management (EKAW <strong>2012</strong>) Galway, Ireland<br />
16/10/<strong>2012</strong> - 17/10/<strong>2012</strong> FEISGILTT <strong>2012</strong> (Federated Event for Interoperability Standardization in Globalization, Internationalization, Localization, and Translation Technologies) Seattle, USA<br />
01/11/<strong>2012</strong> Workshop on Monolingual Translation at AMTA <strong>2012</strong> San Diego, USA<br />
28/10/<strong>2012</strong> Workshop on Post-editing Technology and Practice (WPTP-12) at AMTA <strong>2012</strong> San Diego, USA<br />
08/11/<strong>2012</strong> - 11/11/<strong>2012</strong> International Postgraduate Conference in Translating and Interpreting (IPCITI) Dublin City University<br />
24/11/<strong>2012</strong> The Multimodality and Cyberpsychology Conference Dublin City University<br />
09/12/<strong>2012</strong> Second Workshop on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid MT (ML4HMT-12 WS and Shared Task) [<strong>CNGL</strong> co-organiser] Mumbai, India<br />
08/12/<strong>2012</strong> - 15/12/<strong>2012</strong> Machine Translation and Parsing in Indian Languages (MTPIL-<strong>2012</strong>) [<strong>CNGL</strong> co-organiser] Mumbai, India
Equipment Valued at Over €50,000 Funded by <strong>CNGL</strong><br />
Item Price Description<br />
No equipment valued over €50,000 was purchased by <strong>CNGL</strong> in <strong>2012</strong><br />
Invention and Software Disclosures Filed<br />
Date Track Title<br />
25/01/<strong>2012</strong> ILT A Measurement method for detecting changes in tone-of-voice (voice quality) from recorded speech signals<br />
25/01/<strong>2012</strong> ILT AlignRank: An Evidence Propagation Algorithm for Word Alignment<br />
12/03/<strong>2012</strong> LOC WorkFlow Recommender<br />
12/03/<strong>2012</strong> LOC LocConnect – Localisation Orchestration Framework<br />
12/03/<strong>2012</strong> LOC Localisation Knowledge Repository<br />
12/03/<strong>2012</strong> LOC XLIFF Phoenix<br />
12/03/<strong>2012</strong> LOC MT Mapper<br />
19/04/<strong>2012</strong> ILT IR Retrieval Model which combines the integrated Recommendation Results with IR retrieval results<br />
30/05/<strong>2012</strong> SF CAT Tool Instrumentation<br />
08/10/<strong>2012</strong> ILT Machine Translation Performance Predictor<br />
Patent Applications Submitted or Granted, and Licence Agreements Signed<br />
Date Title Application number Inventor Track Status<br />
30/05/<strong>2012</strong> Automatic Metadata Extraction from Multilingual Enterprise Content 61/656,499 Melike Sah DCM US Provisional<br />
Licensed Technologies<br />
Licensed To Technology Track<br />
Xcelerator Data Visualisation Dashboard ILT<br />
Xcelerator Data Health Estimator for Machine Translation ILT<br />
Xcelerator Predictive Performance Estimator for Machine Translation ILT<br />
Welocalize A System for Tracking and Analysing Translator Behaviour in an Instrumented Post-editing Environment ILT<br />
Spinout Companies Created<br />
Company: Emizar Customer Solutions Ltd.<br />
Incorporation Date: 8th November 2011<br />
Registration #: 505776<br />
Website: www.emizar.com
Awards and Honours Received<br />
Name | Award Body | Award Type | Date<br />
Prof. Josef van Genabith | Dublin City University | DCU President’s Research Award for Science and Engineering | February <strong>2012</strong><br />
Dr. Martin Emms, Hector Franco-Penya | International Conference on Pattern Recognition Applications and Methods (ICPRAM <strong>2012</strong>) | Best Paper Award | February <strong>2012</strong><br />
Joachim Wagner (<strong>CNGL</strong>/DCU), Dr. Jennifer Foster (NCLT/DCU), Rasul Samad Zadeh Kaljahi (NCLT/Symantec), Dr. Anton Bryl (Systransis AG, formerly <strong>CNGL</strong>), Dr. Joseph Le Roux (Université Paris 13, formerly NCLT/DCU) | First Workshop on Syntactic Analysis of Non-Canonical Language (SANCL) | Shared Task Win | June <strong>2012</strong><br />
Ben Steichen | Localisation Research Centre | Best Thesis Award | September <strong>2012</strong><br />
Liliana Mamani Sanchez, Dr. Carl Vogel | 3rd IEEE International Conference on Cognitive Infocommunications | Best Paper Award | December <strong>2012</strong><br />
Debasis Ganguly | Morpheme Extraction Task (MET) of FIRE <strong>2012</strong> | Bengali (Best), Hindi (Second Best) | December <strong>2012</strong><br />
Debasis Ganguly, Dr. Johannes Leveling, Dr. Gareth Jones | Eighth Asia Information Retrieval Societies Conference (AIRS <strong>2012</strong>) Steering Committee | AIRS’12 Best Poster Paper Award | December <strong>2012</strong><br />
Media Coverage<br />
Date | Media Outlet | Event | Headline | Link<br />
05/01/<strong>2012</strong> | Techcentral.ie | Influence of LRC in attracting Cetra European base to Limerick | Cetra to grow Limerick Operation (translation services company attracted by third level research) | http://www.techcentral.ie/article.aspx?id=18040<br />
05/01/<strong>2012</strong> | Siliconrepublic.com | Influence of LRC in attracting Cetra European base to Limerick | 20 new jobs as Cetra rolls into town | http://www.siliconrepublic.com/careers-centre/item/25210-20-new-jobs-for-limerick-as/<br />
05/01/<strong>2012</strong> | Businessandfinance.ie | Influence of LRC in attracting Cetra European base to Limerick | 120 new jobs for Dublin and Limerick | http://www.businessandfinance.ie/news/120newjobsfordublinandlimerick<br />
05/01/<strong>2012</strong> | Businessandleadership.com | Influence of LRC in attracting Cetra European base to Limerick | Cetra to locate European centre in Limerick creating 20 jobs | http://www.businessandleadership.com/business/item/33541-cetra-to-locate-european/<br />
14/01/<strong>2012</strong> | Limerick Leader | Influence of LRC in attracting Cetra European base to Limerick | New Limerick firm translates into 20 positions | Page 6<br />
14/01/<strong>2012</strong> | Limerick Post | Influence of LRC in attracting Cetra European base to Limerick | Translation services company to create 20 jobs | Page 14<br />
24/01/<strong>2012</strong> | Siliconrepublic.com | Launch of <strong>CNGL</strong> careers guide | Students urged to consider a career in localisation | http://www.siliconrepublic.com/careers-centre/item/25466-students-urged-to-consider/<br />
24/01/<strong>2012</strong> | Education Matters (www.educationmatters.ie) | Launch of <strong>CNGL</strong> careers guide | High demand for graduates in localisation | http://www.educationmatters.ie/2012/01/24/high-demand-for-graduates-in-localisation/<br />
25/01/<strong>2012</strong> | Scoop It! Language Blog (www.scoop.it) | Launch of <strong>CNGL</strong> careers guide | <strong>CNGL</strong> Localisation Careers | http://www.scoop.it/t/translation-and-localization/p/1050392099/cngl-localisation-careers<br />
25/01/<strong>2012</strong> | www.science.ie | Launch of <strong>CNGL</strong> careers guide | A world of opportunities in localisation | http://www.science.ie/science-news/opportunities-in-localisation.html<br />
25/01/<strong>2012</strong> | Waterford Institute of Technology website (www.wit.ie) | Launch of <strong>CNGL</strong> careers guide | Graduates in high demand in Ireland’s 16,000-job localisation sector | http://www.wit.ie/News/News/MainBody,48355,en.html<br />
26/01/<strong>2012</strong> | Lingport i18nblog | Launch of <strong>CNGL</strong> careers guide | <strong>CNGL</strong> Launches Localization Careers Guide | http://i18nblog.com/2012/01/26/cngl-launches-localization-careers-guide/<br />
26/01/<strong>2012</strong> | Gradireland.com (Ireland’s official graduate jobs and careers website) | Launch of <strong>CNGL</strong> careers guide | Localisation – a growth area and a career opportunity | http://gradireland.wordpress.com/2012/01/26/localisation-a-growth-area-and-a-career-opportunity/<br />
26/01/<strong>2012</strong> | www.mysciencecareer.ie | Launch of <strong>CNGL</strong> careers guide | Ireland becoming a global expert in localisation | http://www.mysciencecareer.ie/resources/news-and-events/localisation-in-ireland<br />
01/02/<strong>2012</strong> | Education Matters ezine | Launch of <strong>CNGL</strong> careers guide | High demand for graduates in localisation area |<br />
01/02/<strong>2012</strong> | Siliconrepublic.com | All Ireland Linguistics Olympiad | AILO fosters next generation of Irish computational linguists | http://www.siliconrepublic.com/innovation/item/25584-skillsfeb/<br />
01/02/<strong>2012</strong> | egovmonitor.com | Launch of <strong>CNGL</strong> careers guide | “Ireland Is Recognised As A Leader In The Localisation And Global Services Sector But We Need To Do More” – Sherlock | http://www.egovmonitor.com/node/46072<br />
06/02/<strong>2012</strong> | Evening Echo | Launch of <strong>CNGL</strong> careers guide | With technology and a second language you will be a professional in demand | Page 33<br />
08/02/<strong>2012</strong> | Irish Independent | Fostering foreign language skills | Teaching languages at primary level will be a key to our economic future | Page 15<br />
08/02/<strong>2012</strong> | Irish Independent website (www.independent.ie) | Fostering foreign language skills | Teaching languages at primary level will be a key to our economic future | http://www.independent.ie/lifestyle/education/features/in-my-opinion-teaching-languages-at-primary-level-will-be-a-key-to-our-economic-future-3012676.html<br />
09/02/<strong>2012</strong> | Tipperary Star | All Ireland Linguistics Olympiad | All Ireland Linguistics Olympiad | Page 15<br />
14/02/<strong>2012</strong> | Roscommon Herald | All Ireland Linguistics Olympiad | Budding Strokestown linguists seek to decode the languages of the world | Page 53<br />
07/03/<strong>2012</strong> | Dublin City of Science website (www.dublinscience2012.ie) | All Ireland Linguistics Olympiad | All Ireland Linguistics Olympiad | http://www.dublinscience2012.ie/2012/03/all-ireland-linguistics-olympiad/<br />
07/03/<strong>2012</strong> | Evening Echo | All Ireland Linguistics Olympiad | Students have strategy to solve problems |<br />
09/03/<strong>2012</strong> | Céist website (www.ceist.ie) | All Ireland Linguistics Olympiad | All Ireland Linguistics Olympiad (AILO) | http://www.ceist.ie/news_events/view_article.cfm?loadref=2&id=595<br />
13/02/<strong>2012</strong> | Techcentral.ie | ComputeTY transition year programme | Transition year students decode Web design | http://www.techcentral.ie/article.aspx?id=18301
14/03/<strong>2012</strong> | Sunday Business Post website (www.businesspost.ie) | Xcelerator/DCU licence | Startup of the day: Xcelerator | http://www.businesspost.ie/#!story/Home/News/Startup+of+the+day%3A+Xcelerator/id/86478410-84f6-07f6-0e90-4e777652148<br />
18/03/<strong>2012</strong> Sunday Business Post Xcelerator/DCU licence Startup of the day:<br />
Xcelerator<br />
22/03/<strong>2012</strong> Guideline Magazine Next Generation<br />
Localisation Careers<br />
21/03/<strong>2012</strong> Irish Examiner All Ireland Linguistics<br />
Olympiad<br />
21/03/<strong>2012</strong> Irish Examiner Online All Ireland Linguistics<br />
Olympiad<br />
Graduates in high demand<br />
in Ireland’s localisation<br />
sector<br />
Pupils pit wits against<br />
language puzzles<br />
Pupils pit wits against<br />
language puzzles<br />
Cover & Page 3<br />
http://www.irishexaminer.com/ireland/pupils-pit-wits-against-language-puzzles-187781.html<br />
27/03/<strong>2012</strong> Roscommon Herald All Ireland Linguistics<br />
Olympiad<br />
AILO Olympiad Page 55<br />
28/03/<strong>2012</strong> Irish Independent Language Advocacy Adios Espanol – Quinn<br />
dumps languages in primary<br />
schools (Cara Greene<br />
comment)<br />
Page 17<br />
29/03/<strong>2012</strong> South Tipp Today All Ireland Linguistics<br />
Olympiad<br />
29/03/<strong>2012</strong> Tipperary Star All Ireland Linguistics<br />
Olympiad<br />
30/03/<strong>2012</strong> www.sam-xlation.de SAM Xlation GmbH tests<br />
KantanMT product of <strong>CNGL</strong><br />
spinout Xcelerator<br />
02/04/<strong>2012</strong> Education Magazine Next Generation<br />
Localisation Careers<br />
Scoil Ruain student in<br />
Linguistics Olympiad<br />
Scoil Ruain Student in<br />
Linguistics Olympiad<br />
Machine Translation Testing<br />
Graduates in high demand<br />
in Ireland’s localisation<br />
sector<br />
Page 31<br />
Page SS 3<br />
http://www.sam-xlation.de/index.php/de/aktuelles#MachineTranslationTesting<br />
Pages 12-13<br />
22/04/<strong>2012</strong> LANGTECHNEWS Innovation Voucher<br />
collaboration with Cipherion<br />
Translations<br />
Irish localisation company to<br />
add MT, crowd-sourcing and<br />
gamification<br />
24/04/<strong>2012</strong> Irish Times Insight<br />
supplement<br />
30/04/<strong>2012</strong> Department of Jobs,<br />
Enterprise & Innovation<br />
website (http://www.enterprise.gov.ie)<br />
Sign Language Machine<br />
Translation<br />
wripl winning pitch at Get<br />
Started Technology Venture<br />
Programme<br />
Lost in translation Page 13<br />
SFI-funded scientists head to<br />
Silicon Valley<br />
http://www.enterprise.gov.ie/News/Irish_researchers_secure_coveted_prize_of_trip_to_Silicon_Valley_.html<br />
30/04/<strong>2012</strong> Techcentral.ie wripl winning pitch at Get<br />
Started Technology Venture<br />
Programme<br />
30/04/<strong>2012</strong> TechCentral ezine wripl winning pitch at Get<br />
Started Technology Venture<br />
Programme<br />
Irish researchers secure trip<br />
to Silicon Valley<br />
Irish researchers secure trip<br />
to Silicon Valley<br />
http://www.techcentral.ie/article.aspx?id=18832<br />
30/04/<strong>2012</strong> www.studentnews.ie wripl winning pitch at Get<br />
Started Technology Venture<br />
Programme<br />
Irish science researchers<br />
land key trip to Silicon Valley<br />
to meet technology chiefs<br />
http://langtechnews.hivefire.com/articles/146423/irish-localisation-company-to-add-mt-crowd-sourcin/<br />
http://www.studentnews.ie/irish-science-researchers-land-key-trip-to-silicon-valley-to-meet-technology-chiefs-5724<br />
April/<br />
May <strong>2012</strong><br />
edition<br />
Multilingual Magazine Localisation standards The localization standards<br />
ecosystem (article by Dr<br />
David Filip, <strong>CNGL</strong> at UL)
03/05/<strong>2012</strong> Department of Jobs,<br />
Enterprise & Innovation<br />
website (http://www.enterprise.gov.ie)<br />
All Ireland Linguistics<br />
Olympiad<br />
Double Gold for Belfast<br />
Schools in All Ireland<br />
Linguistics Olympiad<br />
http://www.enterprise.gov.ie/News/Double_Gold_for_Belfast_Schools_in_All_Ireland_Linguistics_Olympiad.html<br />
03/05/<strong>2012</strong> Roscommon Herald All Ireland Linguistics<br />
Olympiad<br />
All-Ireland Linguistics Final Page SS 6<br />
12/05/<strong>2012</strong> South Belfast News All Ireland Linguistics<br />
Olympiad<br />
17/05/<strong>2012</strong> Northern Standard All Ireland Linguistics<br />
Olympiad<br />
Olympiad gold for<br />
Wellington team<br />
Photo: Mrs Geraldine Kelly<br />
making a presentation to<br />
Zoe Vance for her success<br />
in the Linguistics Olympiad<br />
© Rory Geary/Northern<br />
Standard<br />
Page 17<br />
Page 24<br />
24/05/<strong>2012</strong> Department of Jobs,<br />
Enterprise & Innovation<br />
website (http://www.enterprise.gov.ie)<br />
Google parsing challenge<br />
DCU-Paris 13 Team excels in<br />
Google parsing challenge<br />
http://www.enterprise.gov.ie/News/DCU-Paris_13_Team_excels_in_Google_Parsing_Challenge.html<br />
31/05/<strong>2012</strong> Ballincollig Today All Ireland Linguistics<br />
Olympiad<br />
31/05/<strong>2012</strong> Mid Cork Today All Ireland Linguistics<br />
Olympiad<br />
Photo: Among the<br />
winners at the Ballincollig<br />
Community School’s <strong>Annual</strong><br />
Awards Night was Grainne<br />
Hutchinson (Ovens),<br />
bronze award at All Ireland<br />
Linguistics Olympiad<br />
Photo: Among the<br />
winners at the Ballincollig<br />
Community School’s <strong>Annual</strong><br />
Awards Night was Grainne<br />
Hutchinson (Ovens),<br />
bronze award at All Ireland<br />
Linguistics Olympiad<br />
Page 8<br />
Page 8<br />
13/06/<strong>2012</strong> Silicon Republic (www.siliconrepublic.com)<br />
LRC Summer School<br />
Irish mobile app developers<br />
urged to localise their apps<br />
http://www.siliconrepublic.com/new-media/item/27729-irish-mobile-app-developers/<br />
13/06/<strong>2012</strong> Techcentral.ie LRC Summer School Irish mobile app developers<br />
must think global, says LRC<br />
http://www.techcentral.ie/19122/irish-mobile-app-developers-must-think-global-says-lrc#ixzz1xfU9YK00<br />
13/06/<strong>2012</strong> TechCentral ezine LRC Summer School Irish mobile app developers<br />
must think global, says LRC<br />
13/06/<strong>2012</strong> Department of Jobs,<br />
Enterprise & Innovation<br />
website (http://www.enterprise.gov.ie)<br />
LRC Summer School<br />
Irish mobile app developers<br />
must think global, says<br />
Localisation Research Centre<br />
http://www.enterprise.gov.ie/News/Irish_mobile_app_developers_must_think_global_says_Localisation_Research_Centre.html<br />
13/06/<strong>2012</strong> Polish Interpreting (www.polish-interpreting.co.uk)<br />
LRC Summer School<br />
Irish mobile app developers<br />
urged to localise …<br />
http://polish-interpreting.co.uk/2012/06/13/irish-mobile-app-developers-urged-to-localise/<br />
14/06/<strong>2012</strong> Silicon Republic (www.siliconrepublic.com)<br />
W3C Multilingual Web<br />
Workshop<br />
Internet experts in Dublin to<br />
talk about multilingual web<br />
http://www.siliconrepublic.com/innovation/item/27759-internet-experts-in-dublin/<br />
14/06/<strong>2012</strong> Department of Jobs,<br />
Enterprise & Innovation<br />
website (http://www.enterprise.gov.ie)<br />
W3C Multilingual Web<br />
Workshop<br />
<strong>CNGL</strong> researchers at<br />
heart of efforts to facilitate<br />
Internationalisation of Web<br />
http://www.enterprise.gov.ie/News/CNGL_researchers_at_heart_of_efforts_to_facilitate_Internationalisation_of_Web.html<br />
17/06/<strong>2012</strong> Sunday Business Post Xcelerator/DCU<br />
collaboration<br />
Translation is finally brought<br />
up to speed
30/06/<strong>2012</strong> Limerick Leader – County<br />
Edition<br />
Influence of LRC in<br />
attracting Cetra European<br />
base to Limerick<br />
Cetra Ireland’s new office Page 20<br />
30/06/<strong>2012</strong> Limerick Leader Influence of LRC in<br />
attracting Cetra European<br />
base to Limerick<br />
Cetra Ireland’s new office Page 20<br />
30/06/<strong>2012</strong> Limerick Leader West<br />
Edition<br />
Influence of LRC in<br />
attracting Cetra European<br />
base to Limerick<br />
Cetra Ireland’s new office Page 20<br />
11/07/<strong>2012</strong> Multilingual E-Zine Xcelerator/DCU<br />
collaboration<br />
02/08/<strong>2012</strong> East Cork Journal International Linguistics<br />
Olympiad<br />
Commercialisation Fund<br />
Project<br />
Cork participating in<br />
International Linguistics<br />
Olympiad<br />
http://www.multilingual.com/mlNewsArchiveDetail.php?id=2521<br />
Page 16<br />
03/08/<strong>2012</strong> Silicon Republic (www.siliconrepublic.com)<br />
International Linguistics<br />
Olympiad<br />
Four Irish students in<br />
Slovenia to battle it out in<br />
Linguistics Olympiad<br />
http://www.siliconrepublic.com/innovation/item/28660-four-irish-students-in/<br />
03/08/<strong>2012</strong> World Irish (www.worldirish.com)<br />
International Linguistics<br />
Olympiad<br />
Four Irish Students Compete<br />
in International Linguistics<br />
Olympiad in Slovenia<br />
http://m.worldirish.com/listening-post/view/four-irish-students-compete-in-international-linguistics-olympiad-in-slovenia-1641<br />
10/09/<strong>2012</strong> Silicon Republic (www.siliconrepublic.com)<br />
Innovation Showcase<br />
<strong>CNGL</strong> Localisation<br />
Innovation Showcase <strong>2012</strong><br />
http://www.siliconrepublic.com/events/event/2859-cngl-localisation-in
11/09/<strong>2012</strong> Silicon Republic (www.siliconrepublic.com)<br />
LRC Conference<br />
Localisation conference in<br />
Limerick to focus on social<br />
trends<br />
http://www.siliconrepublic.com/innovation/item/29199-localisation-conference-in/<br />
13/09/<strong>2012</strong> www.newswhip.com LRC Conference Localisation conference in<br />
Limerick to focus on social<br />
trends<br />
21/09/<strong>2012</strong> Irish Independent Language Advocacy Only one in 25 primary<br />
pupils learn a language<br />
21/09/<strong>2012</strong> Limerick Post LRC Conference Twitter trends to aid<br />
translation<br />
http://www.newswhip.com/MoreInfo/Localisation-conference-in-Limerick-to-f/7480567<br />
11<br />
Page 86<br />
21/09/<strong>2012</strong> Galway City Tribune KantanMT spinout<br />
recruitment drive<br />
Cloud-based operation<br />
seeks people ‘hungry for a<br />
challenge’<br />
Page 10<br />
25/09/<strong>2012</strong> Tech Central (www.techcentral.ie)<br />
META-NET White Paper<br />
Most European languages<br />
not ready for ‘digital age’<br />
http://www.techcentral.ie/article.aspx?id=19947<br />
25/09/<strong>2012</strong> Silicon Republic (www.siliconrepublic.com)<br />
Qun Liu joins <strong>CNGL</strong><br />
Prof Qun Liu, Professor Of<br />
Machine Translation<br />
http://www.siliconrepublic.com/careers/appointments/984-prof-qun-liu-centre-for<br />
26/09/<strong>2012</strong> The Sociable (http://sociable.co)<br />
META-NET White Paper<br />
Most European languages<br />
“unlikely to survive in the<br />
digital age”<br />
http://sociable.co/technology/most-european-languages-unlikely-to-survive-in-the-digital-age/<br />
26/09/<strong>2012</strong> Multilingual E-Zine Qun Liu joins <strong>CNGL</strong> Centre for Next Generation<br />
Localisation appoints<br />
Professor of Machine<br />
Translation<br />
26/09/<strong>2012</strong> Gaelport META-NET White Paper Bagairt don Ghaeilge sa ré<br />
dhigiteach (Threat to Irish in the digital age)<br />
http://www.multilingual.com/mlNewsArchiveDetail.php?id=2526#8441<br />
http://www.gaelport.com/nuacht?NewsItemID=8677
27/09/<strong>2012</strong> Radio na Gaeltachta META-NET White Paper Cormac ag a cuig http://www.rte.ie/radio/radioplayer/rteradioweb.html#!rii=17%3A3402740%3A11598%3A27%2D09%2D2012%3A<br />
28/09/<strong>2012</strong> Newstalk – Splanc META-NET White Paper Agallamh le Ailbhe Ní<br />
Chasaide (Interview with Ailbhe Ní Chasaide)<br />
30/09/<strong>2012</strong> The Sunday Times META-NET White Paper Briefing Digital Irish: Lost for<br />
Words<br />
http://www.newstalk.ie/programmes/all/splanc/<br />
Page 16<br />
October/<br />
November<br />
<strong>2012</strong> Issue<br />
Multilingual Magazine Localisation Localization for the long tail:<br />
Part 1 (article by Dr David<br />
Filip, <strong>CNGL</strong> at UL)<br />
03/10/<strong>2012</strong> Siliconrepublic.com META-NET White Paper Irish language at risk of<br />
digital extinction, research<br />
shows<br />
19/11/<strong>2012</strong> South East Radio Cipherion Translations Mark Rodgers of Cipherion<br />
Translations on fruits of<br />
collaboration with <strong>CNGL</strong> at<br />
DCU (17 mins 45 secs)<br />
http://www.siliconrepublic.com/innovation/item/29483-irish-language-at-risk-of/<br />
https://www.youtube.com/watch?v=zEhEPaPzZXU<br />
December<br />
<strong>2012</strong> Issue<br />
Multilingual Magazine Localisation Localization for the long tail:<br />
Part 2 (article by Dr David<br />
Filip, <strong>CNGL</strong> at UL)<br />
02/12/<strong>2012</strong> Sunday Business Post Emizar Emizar<br />
19/12/<strong>2012</strong> Multilingual E-Zine LORG parser LORG natural language<br />
parser<br />
http://www.multilingual.com/mlNewsArchiveDetail.php?id=2532#8521<br />
25/12/<strong>2012</strong> Antrim Times All Ireland Linguistics<br />
Olympiad<br />
Successful year for Antrim<br />
Grammar<br />
Page 6
Centre for Next Generation Localisation<br />
Dublin City University<br />
Dublin 9, Ireland<br />
Tel: +353-1-700 6700<br />
Fax: +353-1-700 6702<br />
Email: info@cngl.ie<br />
www.cngl.ie