Dr. Vasant Honavar Department of Computer Science honavar@cs ...


Topics in Grammatical Inference and Computational Learning Theory

Grammatical inference, variously referred to as automata induction, grammar induction, and automatic language acquisition, is the process of learning grammars and languages from data. Machine learning of grammars finds a variety of applications in syntactic pattern recognition, adaptive intelligent agents, diagnosis, computational biology, systems modeling, prediction, natural language acquisition, data mining, and knowledge discovery. Regular grammars are the simplest class of formal grammars in the Chomsky hierarchy. An understanding of the issues and problems encountered in learning regular languages (or, equivalently, in identifying the corresponding deterministic finite automaton (DFA)) is therefore likely to provide insights into the problem of learning more general classes of languages.

Under the standard complexity-theoretic assumption P ≠ NP, Pitt and Warmuth showed that no polynomial-time algorithm can be guaranteed to produce, from a set of labeled examples consistent with a target DFA, a DFA that has approximately the same number of states as the target. When examples are drawn at random (as in the PAC setting), results proved by Kearns and Valiant suggest that an efficient algorithm for learning DFA would entail efficient algorithms for breaking the RSA cryptosystem, factoring Blum integers, and detecting quadratic residues, all of which are believed to be hard under standard cryptographic assumptions.

Against the background of these strong negative results, we investigated the feasibility of learning regular languages from examples under additional assumptions concerning the distribution from which the examples are drawn, thereby addressing the problem posed by Pitt in his seminal paper: Are DFA PAC-identifiable if examples are drawn from the uniform distribution, or some other known simple distribution?
We showed that:
(a) The class of simple DFA (i.e., DFA whose canonical representations have logarithmic Kolmogorov complexity) is efficiently PAC learnable under the Solomonoff-Levin universal distribution (Parekh and Honavar, 1999).
(b) If the examples are sampled at random according to the universal distribution by a teacher that is knowledgeable about the target concept, the entire class of DFA is efficiently PAC learnable under the universal distribution; that is, DFA are efficiently learnable under the PACS model (Parekh and Honavar, 1999; Parekh and Honavar, 2001).
(c) Any concept that is learnable under Gold’s model for learning from characteristic samples, Goldman and Mathias’ polynomial teachability model, and the model for learning from example-based queries is also learnable under the PACS model (Parekh and Honavar, 2000; 2001).

Multi-Agent Systems for Integrated Host and Network Based Intrusion Detection (in collaboration with Johnny Wong and Les Miller, funded in part by a Department of Defense grant)

This research was aimed at developing approaches for monitoring complex distributed systems (e.g., computer systems, communication networks, power systems) for coordinated attacks, using information from multiple sources that are equipped with sensors and measurement devices. Both host-based and network-based approaches were investigated as part of this research.
Results of this research include:
(a) New tools for formal specification of intrusions using colored Petri nets and software fault trees (Helmer et al., 2002).
(b) Design and implementation of a multi-agent system for detecting coordinated or concerted attacks on distributed computing systems, in particular by monitoring different processes, resources, users, and events, and by extracting and integrating relevant information from disparate sources over multiple space and time scales (Wong et al., 2001; Helmer et al., 2003; Wang et al., 2005).
(c) Development of data mining approaches for learning predictive rules for anomaly and misuse detection (Helmer et al., 2002).

Agent-Based Approaches to Cooperative Traffic Management in Large Communication Networks (in collaboration with Johnny Wong and Armin Mikler)

With the unprecedented growth in the size and complexity of modern communication networks, the development of intelligent and adaptive approaches to system management (including such functions as routing, congestion control, and traffic and load management) presents several research challenges. Routing in a communication network refers to the task of propagating a message from its source towards its destination. A routing algorithm may be required to meet a diverse set of often conflicting performance requirements (e.g., average message delay, network utilization), making routing an instance of a multi-criterion optimization problem. In practice, routing decisions in large communication networks are based on imprecise and uncertain knowledge of the current network state. This imprecision is a function of the network dynamics, the memory available for storing network state information at each node, and the frequency of, and propagation delay associated with, updates of that state information.

Motivated by these considerations, we set out to investigate efficient strategies for routing in very large networks that do not rely on the maintenance and update of a global network state. This research draws on techniques in knowledge representation, decision-theoretic methods, heuristics, and adaptive control to develop powerful tools for the design of intelligent, adaptive, and autonomous communication networks.

Some specific results of this research include (Mikler, Honavar, and Wong, 1996; Mikler, Honavar, and Wong, 2001):
(a) A novel knowledge representation scheme that enables each node in the network to maintain and update a small knowledge base of constant size (independent of the size of the network). This knowledge base summarizes the state of the network from the point of view of the routing agent for that node. It provides an accurate picture of the network in the immediate neighborhood of the agent and a spatio-temporally averaged summary of the network state in distant neighborhoods.
(b) A utility-theoretic approach to routing that allows a flexible tradeoff between the delay for a specific message and the overall network load (and hence the expected delay for all routed messages).
This mechanism takes advantage of the fact that the number of available paths (and hence the flexibility of routing decisions) grows as a function of the distance between the source and destination.
(c) Theoretical and experimental results demonstrating several desirable properties of the proposed approach, including minimization of delay and load balancing over the entire network without access to accurate global network state information.

PUBLICATIONS AND PRESENTATIONS

Books
1. Caragea, D. and Honavar, V. (2005). Knowledge Acquisition from Heterogeneous, Distributed, Autonomous Data Sources. To appear.
2. Patel, M., Honavar, V. & Balakrishnan, K. (Ed.) (2001). Advances in Evolutionary Synthesis of Intelligent Agents. Cambridge, MA: MIT Press.
3. Honavar, V. & Slutzki, G. (Ed.) (1998). Grammatical Inference. Lecture Notes in Computer Science, Vol. 1433. Berlin: Springer-Verlag.
4. Honavar, V. & Uhr, L. (Ed.) (1994). Artificial Intelligence and Neural Networks: Steps Toward Principled Integration. New York, NY: Academic Press.
5. Banzhaf, W., Daida, J., Eiben, A., Garzon, M., Honavar, V., Jakiela, M., & Smith, R. (Ed.) (1999). Proceedings of the Genetic and Evolutionary Computation Conference. San Mateo, CA: Morgan Kaufmann.
6. W. Langdon, E. Cantu-Paz, K. Mathias, R. Roy, D. Davis, R. Poli, K. Balakrishnan, V. Honavar, G. Rudolph, J. Wegener, L. Bull, M. Potter, A. Schultz, J. Miller, E. Burke, N. Jonoska (Ed.) (2002). Proceedings of the Genetic and Evolutionary Computation Conference. Palo Alto, CA: Morgan Kaufmann.
7. H. J. Caulfield, S.-H. Chen, H.-D. Cheng, R. Duro, V. Honavar, E. E. Kerre, M. Lu, M. G. Romay, T. K. Shih, D. Ventura, P. P. Wang, and Y. Yang (Ed.) (2002). Proceedings 6th Joint Conference on Information Sciences, JCIS / Association for Intelligent Machinery.

Refereed Journal Papers
1. Andorf, C., Dobbs, D., and Honavar, V. (2005). Reduced Alphabet Representations of Amino Acid Sequences for Protein Function Classification. Information Sciences. In press.
2. Terribilini, M., Lee, J.H., Yan, C., Jernigan, R., Honavar, V., and Dobbs, D. (2005). Computational Prediction of Protein-RNA Interfaces. RNA Journal. To appear.

Dec 2005