27.06.2013 Views

Proceedings of the 12th European Conference on Knowledge ...

Benedikt Schmidt, Todor Stoitsev and Max Mühlhäuser

sentences in natural language) and unstructured textual elements (e.g. single words in natural language). The text has been separated into two sets: application-related text, such as menu labels and help texts, and text that is part of the work process, e.g. the content of a text file viewed during work execution. In the following, we focus on textual information that is part of the work process.

Generally, document-centric views on textual content dominate the literature (Steyvers & Griffiths, 2010). In a work process, textual information is displayed as passages of resources from different repositories: the private system, shared repositories of a group, community or company, or the internet. It is generally unforeseeable which kind of content might occur in a work process. As the textual content of a work process is a complex mixture of different, unconnected information, the document-centric view is not applicable. One difference is especially important: the temporalization of text in the work process. A piece of text is associated with the time span during which it was in the focus of an application (e.g. Microsoft Word displaying the text in a focused window). This duration can be interpreted as an indicator of the relevance of the text (assuming that the user was actively working with the textual content).
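The temporalization described above can be sketched as a small data structure. This is a minimal illustration, not the authors' implementation: the `FocusEvent` type, its field names, and the sample identifiers are hypothetical, and the sketch simply sums the time a passage spent in a focused window.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FocusEvent:
    """One span during which a passage of text was shown in the focused window."""
    text_id: str   # hypothetical identifier of a text passage
    start: float   # seconds since the start of the recording
    end: float     # seconds since the start of the recording


def focus_durations(events):
    """Sum the focus time per text passage, in seconds."""
    totals = defaultdict(float)
    for event in events:
        if event.end < event.start:
            raise ValueError(f"event for {event.text_id!r} ends before it starts")
        totals[event.text_id] += event.end - event.start
    return dict(totals)


# Hypothetical recording: the form is focused twice, the search results once.
events = [
    FocusEvent("application_form", 0.0, 90.0),
    FocusEvent("search_results", 90.0, 105.0),
    FocusEvent("application_form", 105.0, 165.0),
]
durations = focus_durations(events)
```

Under the assumption that focus time reflects active work, the accumulated duration per passage can then serve as a relevance weight.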

3.3 Topics and Text similarity<br />

In the following, two types of regularities of textual content that occur within work processes are examined. The first regularity addresses the similarity of the textual content of different users when they are executing the same task. As there are no restrictions on the information that users may use during task execution, it is an important question whether people working in similar environments and having comparable backgrounds use the same type of information. The second regularity addresses topics that are hidden in the textual content. A topic could be a guiding theme which structures a task execution process.

3.3.1 Similarity of textual content

To examine the similarity of the text involved in the task execution processes, we modeled the textual content of tasks in a vector space, using Latent Semantic Analysis (LSA). LSA realizes a bag-of-words approach by using a matrix whose columns represent task instances and whose rows hold the word counts for all distinct words. Time was considered as a relevance factor, i.e. text presented to a user for a longer period of time was considered more important than text presented for a short period of time. Text display duration was used to weight the word counts in the rows. Based on the matrix, a semantic space of lower dimension is created with a vector for each task instance. Task instances are compared by calculating the cosine similarity of the respective vectors (with 1 denoting total similarity and 0 denoting no similarity). The idea is that instances of the same task should have a high similarity value compared to instances of different tasks.
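The pipeline described above can be sketched with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the text does not give the exact weighting scheme, tokenization, or number of dimensions, so the sketch assumes that each word occurrence contributes the display duration of its passage in seconds, that words are split on whitespace, and that `k` is chosen by the caller. All function names and the sample data are hypothetical.

```python
import numpy as np


def weighted_term_task_matrix(tasks):
    """Build a words x task-instances matrix of duration-weighted word counts.

    `tasks` is a list of task instances; each instance is a list of
    (text, seconds_displayed) pairs. Assumption: every word occurrence
    adds the display duration of its passage to the matrix cell.
    """
    vocab = sorted({w for task in tasks for text, _ in task for w in text.lower().split()})
    row = {word: i for i, word in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(tasks)))
    for col, task in enumerate(tasks):
        for text, seconds in task:
            for word in text.lower().split():
                matrix[row[word], col] += seconds
    return matrix, vocab


def lsa_task_vectors(matrix, k):
    """Map each task instance to a k-dimensional vector via truncated SVD."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (np.diag(s[:k]) @ vt[:k]).T  # one row per task instance


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty instances
    unit = vectors / norms
    return unit @ unit.T


# Hypothetical data: two instances each of two different tasks.
tasks = [
    [("travel expense form", 120.0), ("hotel invoice", 30.0)],
    [("travel expense form", 90.0), ("flight invoice", 20.0)],
    [("topic model paper", 200.0), ("semantic analysis paper", 60.0)],
    [("topic model survey", 150.0), ("semantic paper", 40.0)],
]
matrix, vocab = weighted_term_task_matrix(tasks)
vectors = lsa_task_vectors(matrix, k=2)
similarity = cosine_similarity_matrix(vectors)
```

In this toy example, `similarity[0, 1]` (two instances of the same task) comes out higher than `similarity[0, 2]` (instances of different tasks). Note that cosine similarity in the reduced space can in principle be negative, so the 0-to-1 range mentioned in the text is not guaranteed by this sketch.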

Each task instance has been compared with all other task instances. The resulting similarities are shown in Figure 5 as a 40x40 matrix with an entry for each comparison. The red boxes in the figure mark the comparisons of a task instance with other instances of the same task. From left to right and top to bottom, the eight instances of task 1 are shown, followed by the eight instances of task 2, etc. The black diagonal shows the comparison of each task instance with itself, resulting in a similarity of 1.

The figure shows that for tasks 1, 3, 4 and 5 the mean similarity of instances of the same task is notably higher than the mean similarity of a task instance with instances of other tasks. For tasks 1, 3 and 5 the topic was precisely defined in the task execution processes and a set of initial resources was highlighted (e.g. application forms). Task 4 instances have smaller similarity values than those of tasks 1, 3 and 5, as no resources were proposed and the individuals only had a similar goal. Task 2 does not show a high similarity among its task instances. As task 2 asked the participants to collect papers from their research domain, very different types of papers were searched, reviewed and selected.

Overall, instances of the same task have a high similarity value despite being executed by different people. The level of detail of the task description influences the similarity.
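The comparison of within-task and between-task mean similarity can be sketched as follows. This is an illustrative sketch only: the function name, the toy 4x4 matrix, and the labels are hypothetical and are not the values from Figure 5. Self-comparisons on the diagonal are excluded so that they do not inflate the within-task mean.

```python
def mean(values):
    """Arithmetic mean, or NaN when there is nothing to average."""
    return sum(values) / len(values) if values else float("nan")


def within_and_between_means(sim, labels):
    """Return {task: (mean within-task similarity, mean between-task similarity)}.

    `sim` is a square similarity matrix (list of lists) and `labels[i]`
    is the task that instance i belongs to.
    """
    result = {}
    for task in sorted(set(labels)):
        within, between = [], []
        for i, label_i in enumerate(labels):
            if label_i != task:
                continue
            for j, label_j in enumerate(labels):
                if i == j:
                    continue  # skip the self-comparison on the diagonal
                (within if label_j == task else between).append(sim[i][j])
        result[task] = (mean(within), mean(between))
    return result


# Hypothetical similarity values for two instances each of two tasks.
sim = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.3, 0.2],
    [0.1, 0.3, 1.0, 0.4],
    [0.2, 0.2, 0.4, 1.0],
]
labels = [1, 1, 2, 2]
means = within_and_between_means(sim, labels)
```

For a matrix laid out as described for Figure 5, the labels would be eight consecutive entries per task, e.g. `[t for t in range(1, 6) for _ in range(8)]`.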

