Figure 3. Iterated learning model (ILM). The initial learner (L i) forms a hypothesis from the input, then uses that hypothesis to produce output for the next generation (L i+1), and the pattern repeats to the (i+n)th generation.

Learners exposed to all of the possible input can simply memorize the data in order to be able to reproduce it correctly, and if it is reproduced correctly, it will remain unstructured. However, providing only a subset of the total possible input to the learners creates a "transmission bottleneck" and produces a different result. Because the transmission bottleneck prevents exposure to all the meaning-signal pairs possible in a language, learners are forced to invent new signals as output when prompted to reproduce a signal for an unfamiliar meaning. These inventions become part of the input for the next learner, changing the language at each generation and providing opportunities for accidental similarities between the signals of similar meanings to creep into the data. Learners can then interpret these similarities as weakly structured patterns, leading to a semi-structured hypothesis of the language. The learners then use that hypothesis when prompted to invent new output for unfamiliar meanings, thus extending the structure into the new inventions. After many generations, the structure becomes well enough established that a learner can form a hypothesis that correctly reproduces the entire language even when exposed to only a subset of the language as input, and will produce output that reliably leads a new learner to the same hypothesis.

4. CRITERIA FOR APPLICATION OF THE ILM

The ILM can explain the evolution of structure in language because of three important criteria that are also true of Wikipedia, making the ILM an excellent candidate to explain the evolution of structure and the increasing numbers of section headings and infoboxes, despite poorly recognized benefits and the protests of infobox critics. The first important criterion is iterated learning, meaning that the output of one learner forms the input for another. Just as all language users learned to speak from exposure to the language of others, Wikipedia editors learn how to make a Wikipedia article from exposure to existing Wikipedia articles. If their input includes structure, such as section headings and infoboxes, then their hypothesis of how to make an article will capture patterns of when and how to use that structure, and that structure will later be added when creating output. Wikipedia now provides templates, or formalized hypotheses of appropriate structure, making it easier to include section headings and infoboxes, thus reinforcing the patterns in the data.
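To make the generational mechanism concrete, the core ILM loop described in the previous section can be sketched in a few lines of Python. This is a minimal illustration only: the Learner class, its memorize-and-invent behavior, and the bottleneck and generation parameters are assumptions of the sketch, not the implementations used in the simulations cited here (e.g. Kirby, 2002; Smith, 2003).

    import random

    SYMBOLS = "abcd"

    def random_signal(length=4):
        # Signals are strings of characters from a limited set.
        return "".join(random.choice(SYMBOLS) for _ in range(length))

    class Learner:
        """Memorizes observed meaning-signal pairs; invents signals otherwise."""
        def __init__(self):
            self.lexicon = {}

        def learn(self, pairs):
            # Form a hypothesis from the input. A real ILM learner would
            # generalize (induce a grammar); this sketch only memorizes,
            # so it shows the bottleneck mechanics, not emergent structure.
            for meaning, signal in pairs:
                self.lexicon[meaning] = signal

        def produce(self, meaning):
            # Reproduce a known signal, or invent one for an unfamiliar meaning.
            if meaning not in self.lexicon:
                self.lexicon[meaning] = random_signal()
            return self.lexicon[meaning]

    def iterate(meanings, generations=10, bottleneck=0.5):
        # Generation 0: an unstructured (random) language.
        language = [(m, random_signal()) for m in meanings]
        for _ in range(generations):
            learner = Learner()
            # Transmission bottleneck: the learner sees only a subset of the data.
            sample = random.sample(language, int(len(language) * bottleneck))
            learner.learn(sample)
            # The learner's output over ALL meanings becomes the next input.
            language = [(m, learner.produce(m)) for m in meanings]
        return language

    # Example run: 20 arbitrary meanings, 50 generations.
    final_language = iterate([f"m{i}" for i in range(20)], generations=50)

Because this toy learner only memorizes, the language keeps churning rather than stabilizing; it is the generalization step of real ILM learners, omitted here, that lets weakly structured patterns compound into stable structure across generations.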
Second, like language, Wikipedia has a transmission bottleneck. In many ILM simulations, the transmission bottleneck is a formal limit on the percentage of all possible training data a learner receives before being asked to reproduce the language. There is no limit on the number of well-formed utterances in any natural language (Nowak, Komarova, & Niyogi, 2002), essentially making natural languages an infinite set of meaning-signal pairs. Since all speakers produce output before being exposed to the infinite possible meaning-signal pairs, there is always an "implied bottleneck" (Vogt, 2005). Likewise, although there is not an infinite number of Wikipedia articles, exponential growth since 2002 (Voss, 2005) means that editors cannot possibly be exposed to all the available input before creating new articles. The implied bottlenecks for language users and Wikipedia editors force them to produce output based on a hypothesis formed from exposure to a subset of the total possible input, rather than exposure to a correct or ideal form to be copied.

Third, the perception of structure in the meaning space may be a requirement for the evolution of structure in the signal (Smith, 2003). Many ILM simulations represent language as a mapping from meanings, such as combinations of objects and actions (Kirby, 2002) or points in a multi-dimensional meaning space (Smith, 2003), to signals, usually strings of characters from a limited set. The meaning space is structured because relationships between pairs of meanings that share actions or objects, or that are near each other in the multi-dimensional space, make them more similar than other pairs of meanings that do not share features or proximity. Through the interaction of iterated learning and the transmission bottleneck, the relationship across meanings comes to be mirrored by structure in the signal. Like meaning in languages, …
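A meaning-to-signal mapping of the kind just described can be illustrated with a short toy sketch. The syllable tables and the object-action meanings below are invented for the example, loosely following the object-and-action meaning spaces of Kirby (2002); they are not data from any cited simulation.

    # Toy compositional mapping: each feature of a meaning contributes its
    # own substring, so meanings that share a feature share part of their
    # signal. The syllable tables are hypothetical, chosen for illustration.
    object_syllables = {"dog": "wa", "cat": "mi", "bird": "tu"}
    action_syllables = {"runs": "ko", "sleeps": "re"}

    def signal_for(obj, action):
        # Compose a signal from per-feature substrings.
        return object_syllables[obj] + action_syllables[action]

    # "dog runs" -> "wako" and "dog sleeps" -> "ware" share the prefix "wa",
    # mirroring the object the two meanings have in common.
    print(signal_for("dog", "runs"), signal_for("dog", "sleeps"))

In this sketch, similarity in the meaning space (a shared object or action) is mirrored by similarity in the signal (a shared substring), which is exactly the kind of structure the ILM gradually produces.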
