NASA Scientific and Technical Aerospace Reports
We explore the use of Optimal Mixture Models to represent topics. We analyze two broad classes of mixture models:
set-based and weighted. We provide an original proof that estimation of set-based models is NP-hard, and therefore not
feasible. We argue that weighted models are superior to set-based models, and the solution can be estimated by a simple
gradient descent technique. We demonstrate that Optimal Mixture Models can be successfully applied to the task of document
retrieval. Our experiments show that weighted mixtures outperform a simple language modeling baseline. We also observe that
weighted mixtures are more robust than other approaches of estimating topical models.
DTIC
Information Retrieval; Mathematical Models; Optimization
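The abstract above says that weighted mixtures can be estimated by simple gradient descent. The following is a minimal sketch of that general idea, not the authors' estimator: it assumes the topic is given as a target word distribution, the mixture components are fixed word distributions, and the objective is cross-entropy between the target and the mixture. The function names, the softmax parameterization of the weights, and the step size are all illustrative assumptions.

```python
import math

def softmax(z):
    """Map unconstrained scores to weights that are positive and sum to 1."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def mixture(weights, components):
    """Weighted sum of component distributions over a shared vocabulary."""
    n = len(components[0])
    return [sum(w * c[j] for w, c in zip(weights, components)) for j in range(n)]

def fit_weights(target, components, steps=2000, lr=0.5, eps=1e-12):
    """Gradient descent on cross-entropy(target, mixture) w.r.t. softmax scores.

    Assumed objective (not taken from the report): -sum_j t_j * log(mix_j).
    """
    z = [0.0] * len(components)
    for _ in range(steps):
        w = softmax(z)
        mix = mixture(w, components)
        # Gradient of the loss with respect to each raw weight w_i.
        g = [-sum(t * c[j] / (mix[j] + eps) for j, t in enumerate(target))
             for c in components]
        # Chain rule through the softmax: dL/dz_i = w_i * (g_i - sum_k w_k g_k).
        avg = sum(wi * gi for wi, gi in zip(w, g))
        z = [zi - lr * wi * (gi - avg) for zi, wi, gi in zip(z, w, g)]
    return softmax(z)

if __name__ == "__main__":
    comps = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
    topic = [0.25, 0.2, 0.55]  # built as 0.25 * comps[0] + 0.75 * comps[1]
    print(fit_weights(topic, comps))
```

Parameterizing the weights through a softmax keeps them on the probability simplex without a projection step, which is one common way to make plain gradient descent applicable here.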
20060001853 Massachusetts Univ., Amherst, MA USA
A Conditional Random Field for Discriminatively-Trained Finite-State String Edit Distance
McCallum, Andrew; Bellare, Kedar; Pereira, Fernando; Jan. 1, 2005; 9 pp.; In English
Report No.(s): AD-A440386; No Copyright; Avail.: Defense Technical Information Center (DTIC)
The need to measure sequence similarity arises in information extraction, object identity, data mining, biological sequence
analysis, and other domains. This paper presents discriminative string-edit CRFs, a finite-state conditional random field model
for edit sequences between strings. Conditional random fields have advantages over generative approaches to this problem,
such as pair HMMs or the work of Ristad and Yianilos, because as conditionally-trained methods, they enable the use of
complex, arbitrary actions and features of the input strings. As in generative models, the training data does not have to specify
the edit sequences between the given string pairs. Unlike generative models, however, our model is trained on both positive
and negative instances of string pairs. We present positive experimental results on several data sets.
DTIC
Random Variables; Strings
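For orientation, the classical string edit distance that the report above builds on can be sketched as a dynamic program with fixed, hand-set operation costs. This is not the CRF model of the report, which learns its parameters from positive and negative string pairs; the function name and the default unit costs are assumptions chosen for illustration.

```python
def edit_distance(a, b, ins=1.0, dele=1.0, sub=1.0):
    """Minimum total cost of insertions, deletions, and substitutions turning a into b.

    Costs are fixed constants here (an assumption); a learned model would
    replace them with trained, feature-dependent scores.
    """
    # prev[j] holds the cost of transforming the processed prefix of a into b[:j].
    prev = [j * ins for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        cur = [i * dele]  # transforming a[:i] into the empty string
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + dele,                           # delete ca
                cur[j - 1] + ins,                         # insert cb
                prev[j - 1] + (0.0 if ca == cb else sub)  # match or substitute
            ))
        prev = cur
    return prev[-1]

if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

Keeping only two rows of the table holds memory at O(len(b)) while the running time stays O(len(a) * len(b)).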
20060001854 Connecticut Univ., Storrs, CT USA<br />
Mapping Flows onto Networks to Optimize Organizational Processes<br />
Levchuk, Georgiy M.; Levchuk, Yuri N.; Pattipati, Krishna R.; Kleinman, David L.; Jan. 1, 2005; 25 pp.; In English; Original<br />
contains color illustrations<br />
Contract(s)/Grant(s): N00014-00-1-0101<br />
Report No.(s): AD-A440387; No Copyright; Avail.: Defense Technical Information Center (DTIC)
Interdependence of tasks in a mission necessitates information flow among the organizational elements (agents) assigned<br />
to these tasks. This information flow introduces communication delays. An effective task schedule that minimizes the total<br />
execution time, including task processing and coordination delays, is an important issue in designing an organization and its
task processing strategy. This paper defines the structure of information-dependent tasks, and describes an approach to map
this structure to a network of organizational elements (agents). Since the general problem of scheduling tasks with
communication is NP-hard, only fast heuristic (e.g., list scheduling and linear clustering) algorithms are discussed. The authors
modify the priority calculation for list scheduling methods, matching the critical path with a network of heterogeneous agents.
They then present their algorithm, termed Heterogeneous Dynamic Bottom Level (HDBL), and compare it with various
list-scheduling heuristics. The results show that HDBL exhibits superior performance to all list scheduling algorithms,<br />
providing an improvement of over 25% in schedule length for communication-intensive task graphs.<br />
DTIC<br />
Command and Control; Information Transfer; Mapping; Networks; Optimization; Organizations; Scheduling
20060001863 Maryland Univ., College Park, MD USA
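The list-scheduling family mentioned in the abstract above can be illustrated with a generic sketch. This is not the HDBL algorithm, whose details the abstract does not give; it is a plain bottom-level list scheduler that assumes strictly positive processing times, a cost entry for every task on every agent, and a single fixed communication delay whenever a predecessor ran on a different agent. All names and the data layout are hypothetical.

```python
def bottom_levels(tasks, succ, avg_cost):
    """Length of the longest path from each task to an exit task, using average costs."""
    memo = {}

    def bl(t):
        if t not in memo:
            memo[t] = avg_cost[t] + max((bl(s) for s in succ.get(t, [])), default=0.0)
        return memo[t]

    for t in tasks:
        bl(t)
    return memo

def list_schedule(cost, succ, comm_delay):
    """cost[t][a]: time of task t on agent a; succ[t]: successors of t in the task graph."""
    tasks = list(cost)
    agents = sorted({a for t in cost for a in cost[t]})
    pred = {t: [] for t in tasks}
    for t, successors in succ.items():
        for s in successors:
            pred[s].append(t)
    avg = {t: sum(cost[t].values()) / len(cost[t]) for t in tasks}
    level = bottom_levels(tasks, succ, avg)
    # With positive costs, descending bottom level is a valid topological order.
    order = sorted(tasks, key=lambda t: -level[t])

    free = {a: 0.0 for a in agents}  # time at which each agent becomes idle
    placed = {}                      # task -> (agent, start, finish)
    for t in order:
        best = None
        for a in agents:
            # Data from a predecessor on another agent arrives comm_delay later.
            ready = max((placed[p][2] + (0.0 if placed[p][0] == a else comm_delay)
                         for p in pred[t]), default=0.0)
            start = max(ready, free[a])
            finish = start + cost[t][a]
            if best is None or finish < best[2]:
                best = (a, start, finish)
        placed[t] = best
        free[best[0]] = best[2]
    makespan = max(f for _, _, f in placed.values())
    return placed, makespan

if __name__ == "__main__":
    cost = {"A": {"x": 2, "y": 3}, "B": {"x": 3, "y": 1}, "C": {"x": 2, "y": 2}}
    succ = {"A": ["B", "C"]}
    print(list_schedule(cost, succ, comm_delay=1))
```

The greedy earliest-finish choice is a heuristic: it gives no optimality guarantee, which is consistent with the abstract's remark that the general problem is NP-hard.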
Searching the Web with SHOE<br />
Heflin, Jeff; Hendler, James; Jan. 1, 2000; 7 pp.; In English; Original contains color illustrations<br />
Contract(s)/Grant(s): DAAL01-97-K-0135<br />
Report No.(s): AD-A440405; No Copyright; Avail.: Defense Technical Information Center (DTIC)
Although search engine technology has improved in recent years, there are still many types of searches that return<br />
unsatisfactory results. This situation can be greatly improved if web pages use a semantic markup language to describe their<br />
content. We have developed SHOE, a language for this purpose, and in this paper describe a scenario for how the language
could be used by search engines of the future. A major challenge to this system is designing a query tool that can exploit the<br />
power of a knowledge base while still being simple enough for the casual user. We present the SHOE Search tool, which<br />