Statistical Language Models based on Neural Networks - Faculty of ...
of the task, as often the easiest way to obtain good results is to choose crude but fast techniques and train models on as much data as is available. This strategy, however, does not get any closer to solving the problems; it merely postpones them for as long as possible. For many tasks today, the amount of available training data is already so large that further progress by simply adding more data is unlikely.
Another reason why advanced techniques are not used in practice is the significance of the achieved results: it is commonly known that most published papers report only negligible improvements over basic baselines. Even the best techniques rarely reduce the word error rate of speech recognition systems by more than 10% relative (for example, from 10% WER to 9%), and such a difference is hardly observable from the user's perspective. However, even a small difference can be important in the long term: competitions are often won by a slight margin, and even if the improvements are small and hardly observable, the majority of users will likely tend to prefer the best system in the longer term.
While I see integration of neural network language models into production systems as the next step for language modeling research, there is still much to do in basic research. Since the history of language modeling research has often been rather chaotic, it might be fruitful to first define a roadmap. While a detailed proposal for future research is out of the scope of this work, the main points are:
• The involved models should be computationally much less restricted than the traditional ones; it should be clear that a compact solution to simple problems can exist in the model space
• The progress should be measured on increasingly complex tasks (for example, finding the most likely word in an incomplete sentence, as in [83])
• The tasks and the training data should be coherent and publicly available
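The sentence-completion task mentioned in the second point can be illustrated with a minimal sketch: a language model assigns a probability to each candidate sentence, and the candidate word yielding the highest-scoring sentence is chosen. The sketch below uses a toy add-one-smoothed bigram model on an illustrative corpus; the corpus, candidates, and function names are mine, not from the cited work, where far larger models and corpora are used.

```python
from collections import Counter

# Illustrative toy corpus; real sentence-completion benchmarks train on
# large book corpora.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Maximum-likelihood bigram and unigram counts.
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def bigram_prob(prev, word):
    """P(word | prev) with add-one smoothing over the toy vocabulary."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def sentence_score(template, candidate):
    """Product of bigram probabilities for the sentence with the blank filled."""
    words = [candidate if w == "___" else w for w in template.split()]
    p = 1.0
    for prev, word in zip(words, words[1:]):
        p *= bigram_prob(prev, word)
    return p

# Pick the candidate that makes the completed sentence most probable.
incomplete = "the cat sat on the ___ ."
candidates = ["mat", "dog", "on"]
best = max(candidates, key=lambda c: sentence_score(incomplete, c))
```

Here the bigram statistics favor "mat", since "the mat" and "mat ." both occur in the toy corpus; a neural language model would be scored in exactly the same way, only with a different probability estimator.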
While such research would not be competitive with the common techniques in the short term, progress beyond models such as finite state machines is certainly needed. It has been popular to claim that we need orders of magnitude more powerful computers, and also much more training data, to make progress towards AI; I find this doubtful. In my opinion, what needs to be addressed is the ability of machine learning techniques to efficiently discover new patterns.