22.04.2013 Views

Dr. Vasant Honavar Department of Computer Science honavar@cs ...



Topics in Grammatical Inference and Computational Learning Theory

Grammatical inference, variously referred to as automata induction, grammar induction, and automatic language acquisition, is the process of learning grammars and languages from data. Machine learning of grammars finds a variety of applications in syntactic pattern recognition, adaptive intelligent agents, diagnosis, computational biology, systems modeling, prediction, natural language acquisition, data mining, and knowledge discovery. Regular grammars are the simplest class of formal grammars in the Chomsky hierarchy. An understanding of the issues and problems encountered in learning regular languages (or, equivalently, in identifying the corresponding deterministic finite automaton (DFA)) is therefore likely to provide insights into the problem of learning more general classes of languages. Under the standard complexity-theoretic assumption P ≠ NP, Pitt and Warmuth showed that no polynomial-time algorithm can be guaranteed to produce, from a set of labeled examples corresponding to a DFA, a DFA that has approximately the same number of states as the target DFA. When examples are drawn at random (as in the PAC setting), results proved by Kearns and Valiant suggest that an efficient algorithm for learning DFA would entail efficient algorithms for solving problems such as breaking the RSA cryptosystem, factoring Blum integers, and detecting quadratic residues, all of which are hard under standard cryptographic assumptions.
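To make the learning problem concrete, the following is a minimal sketch of a DFA and of what it means for a DFA to be consistent with a set of labeled examples. The names `DFA` and `is_consistent`, the convention that a missing transition rejects, and the example language (strings over {a, b} with an even number of a's) are illustrative assumptions, not taken from the original text.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple


class DFA:
    """A deterministic finite automaton (illustrative representation)."""

    def __init__(
        self,
        start: Hashable,
        accepting: Set[Hashable],
        transitions: Dict[Tuple[Hashable, str], Hashable],
    ) -> None:
        self.start = start
        self.accepting = set(accepting)
        self.transitions = dict(transitions)

    def accepts(self, string: str) -> bool:
        """Run the automaton on `string` and report whether it ends in an accepting state."""
        state = self.start
        for symbol in string:
            key = (state, symbol)
            if key not in self.transitions:
                # Assumption: a missing transition is treated as rejection.
                return False
            state = self.transitions[key]
        return state in self.accepting


def is_consistent(dfa: DFA, labeled_examples: Iterable[Tuple[str, bool]]) -> bool:
    """Return True if the DFA labels every example the way the sample does."""
    return all(dfa.accepts(s) == label for s, label in labeled_examples)


# Hypothetical target: strings over {a, b} containing an even number of a's.
even_a = DFA(
    start=0,
    accepting={0},
    transitions={(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1},
)

# A small labeled sample (string, is_member) for that target.
sample = [("", True), ("a", False), ("aa", True),
          ("ab", False), ("bab", False), ("abba", True)]

print(is_consistent(even_a, sample))
```

Checking consistency, as above, takes time linear in the size of the sample. The negative results cited in the text concern the much harder task of finding a small DFA that is consistent with a given sample.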

Against the background of these strong negative results, we investigated the feasibility of learning regular languages from examples under additional assumptions concerning the distribution from which the examples are drawn, thereby addressing the problem posed by Pitt in his seminal paper: Are DFA PAC-identifiable if examples are drawn from the uniform distribution, or some other known simple distribution? We showed that:

(a) The class of simple DFA (i.e., DFA whose canonical representations have logarithmic Kolmogorov complexity) is efficiently PAC learnable under the Solomonoff-Levin universal distribution (Parekh and Honavar, 1999).

(b) If the examples are sampled at random according to the universal distribution by a teacher that is knowledgeable about the target concept, the entire class of DFA is efficiently PAC learnable under the universal distribution; that is, DFA are efficiently learnable under the PACS model (Parekh and Honavar, 1999; Parekh and Honavar, 2001).

(c) Any concept that is learnable under Gold’s model for learning from characteristic samples, Goldman and Mathias’ polynomial teachability model, and the model for learning from example-based queries is also learnable under the PACS model (Parekh and Honavar, 2000; 2001).
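For readers unfamiliar with the notation behind these results, the quantities involved are usually written as follows. This is a reminder of the standard definitions, stated here as an assumption about how the cited papers formalize them and not quoted from the original text: $K(\cdot)$ denotes (prefix) Kolmogorov complexity, $\mathbf{m}$ the universal distribution, $A$ a DFA with $N$ states, and $r$ a representation of the target concept known to the teacher.

```latex
% Universal (Solomonoff-Levin) distribution, up to a multiplicative constant:
\mathbf{m}(x) = 2^{-K(x) + O(1)}

% "Simple" DFA: the canonical representation has logarithmic complexity
K(A) = O(\log N)

% PACS setting: examples drawn from the universal distribution
% conditioned on the teacher's knowledge r of the target concept
\mathbf{m}_r(x) \propto 2^{-K(x \mid r)}
```

Intuitively, strings that are easy to describe given the target receive high probability under $\mathbf{m}_r$, so a knowledgeable teacher's sample is likely to contain the examples that characterize the target.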

Multi-Agent Systems for Integrated Host and Network Based Intrusion Detection (in collaboration with Johnny Wong and Les Miller, funded in part by a Department of Defense grant)

This research was aimed at the development of approaches for monitoring complex distributed systems (e.g., computer systems, communication networks, power systems) for coordinated attacks using information from multiple sources equipped with sensors and measurement devices. Both host-based and network-based approaches were investigated as part of this research. Results of this research include:

(a) New tools for formal specification of intrusions using colored Petri nets and software fault trees (Helmer et al., 2002).

(b) Design and implementation of a multi-agent system for detecting coordinated or concerted attacks on distributed computing systems, in particular by monitoring different processes, resources, users, and events, and by extracting and integrating relevant information from disparate sources over multiple space and time scales (Wong et al., 2001; Helmer et al., 2003; Wang et al., 2005).

(c) Development of data mining approaches for learning predictive rules for anomaly and misuse detection (Helmer et al., 2002).

Agent-Based Approaches to Cooperative Traffic Management in Large Communication Networks (in collaboration with Johnny Wong and Armin Mikler)

With the unprecedented growth in the size and complexity of modern communication networks, the development of intelligent and adaptive approaches to system management (including such functions as routing, congestion control, and traffic and load management) presents several research challenges. Routing in a communication network refers to the task of

Dec 2005<br />

