Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...
Preface
About six years ago, almost by accident, I ended up pursuing an academic research career. It all started with my Master's dissertation, the final, and probably the most important, stage of my Master's degree. At the time, I was not planning to dedicate more than one year of my life to research. Even one year later, when I started working as a researcher for Linguateca, it was far from my thoughts that I would soon enroll in a PhD.
Briefly, the main goal of my Master's work was to generate lyrics in Portuguese that matched a given rhythmic sequence. I always intended to work with my mother tongue, not only because I felt that the results would be more understandable and more fun for the people around me, but also because I used to write Portuguese lyrics for my former band. I was thus very interested in investigating how far an automatic lyricist could go.
However, working with Portuguese proved to be challenging. From the beginning of the work, we noticed that there was a lack of language resources for Portuguese, and that the few existing ones were not easy to find. For instance, at that time, we could not find a comprehensive public lexicon providing words along with information on their morphology and possible inflections, let alone a semantics-oriented lexicon. It was then that I decided I wanted to contribute something useful that would hopefully help to address this shortage of resources.
At more or less the same time, I had my first contact with Linguateca, a distributed language resource centre for Portuguese, responsible not only for cataloguing existing resources but also for developing resources and providing free access to them.
I was very lucky that, before I finished my Master's, Linguateca opened a position, which I applied for. The main goal of this position was to develop PAPEL, a lexical-semantic resource for Portuguese, automatically extracted from a dictionary. After my Master's, I was hired for precisely that task. While working for Linguateca, I came into closer contact with other researchers working on the computational processing of Portuguese. I started to gain some experience in natural language processing (NLP), especially in semantic information extraction, and I became passionate about research in this area, so much so that today I cannot see myself doing something completely unrelated.
The work with Linguateca was very important for my training as a researcher in NLP. It was so enriching that I felt that, with what I had learned, I could do, and learn, more. There is still so much to be done to contribute to the development of Portuguese NLP that I wanted to continue my work, which I did by embarking on a PhD. This thesis presents the results of a four-year PhD in which, building on what we learned with PAPEL, we created a larger resource, Onto.PT, by exploiting other sources, and developed a model for organising this resource in an alternative way, which might better suit concept-oriented NLP.