Selected Publications

SELECTED PUBLICATIONS (with abstracts)

• “Evaluating Contents-Link Coupled Web Page Clustering for Web Search Results”, Yitong Wang and Masaru Kitsuregawa, in Proceedings of the 11th International Conference on Information and Knowledge Management (CIKM 2002), McLean, VA, USA, pp. 499-506, ACM Press.
Abstract: Clustering is currently one of the most important and efficient techniques for dealing (e.g. resource locating, information interpreting) with the massive amount of heterogeneous information on the web. Unlike clustering in other fields, web page clustering separates unrelated pages and clusters related pages (to a specific topic) into semantically meaningful groups, which is useful for discrimination, summarization, organization and navigation of unstructured web pages. We have proposed a contents-link coupled clustering algorithm that clusters web pages by combining contents and link analysis. In this paper, we particularly study the effects of out-links (from the web page), in-links (to the web page) and terms on the final clustering results, as well as how to effectively combine these three parts to improve the quality of clustering results. We apply the algorithm to cluster web search results. Preliminary experiments and evaluations are conducted on various topics. As the experimental results show, the proposed clustering algorithm is effective and promising.

• “On Combining Link and Contents Information for Web Page Clustering”, Yitong Wang and Masaru Kitsuregawa, in Proceedings of the 12th International Conference on Database and Expert Systems Applications (DEXA 2002), Aix-en-Provence, France, Sep. 2nd-6th, pp. 902-913, Springer (Lecture Notes in Computer Science), Vol. 2118.
Abstract: Clustering is currently one of the most crucial techniques for dealing (e.g. resource locating, information interpreting) with the massive amount of heterogeneous information on the web, which is beyond human beings’ capacity to digest.
In this paper, we discuss the shortcomings of previous approaches and present a unifying clustering algorithm to cluster web search results for a specific query topic by combining link and contents information. In particular, we investigate how to combine link and contents analysis in the clustering process to improve the quality and interpretation of web search results. The proposed approach automatically clusters the web search results into high-quality, semantically meaningful groups in a concise, easy-to-interpret hierarchy with tagging terms. Preliminary experiments and evaluations are conducted, and the experimental results show that the proposed approach is effective and of high quality.

• “Examining the Quality of Link-Contents Coupled Clustering for Web Pages”, Yitong Wang and Masaru Kitsuregawa, in the Database Workshop of Information Processing Society (DBWS 2002), Vol. 102, No. 208, pp. 143-148, July 2002, Tokyo, Japan.
Abstract: While web data has created big challenges for data management (retrieval and interpretation) and data mining, efforts from different fields have been attempted. Current well-known clustering algorithms cannot offer a good solution to web page clustering in either efficiency or effectiveness. In this paper, we first examine link-based clustering and term-based clustering separately and in detail, especially their advantages and weaknesses, and then present a unifying clustering algorithm that combines link and contents information and apply it to web search results. In particular, we examine the quality of the proposed link-contents coupled clustering approach. Experiments and comparisons are conducted on various topics to show that the proposed approach is very effective.

• “Combining Link and Contents in Clustering Web Search Results to Improve Information Interpretation”, Yitong Wang and Masaru Kitsuregawa, in the Proceedings of the 2002 Data Engineering Workshop (DEWS'2002), pp. C4-2, March 4-6, Kurasiki, Japan.
Abstract: With information proliferating on the web, it is far beyond human ability to digest this huge, heterogeneous information, e.g. to locate related resources and interpret them accordingly. While a web search engine can retrieve information on the Web for a specific topic, users have to step through a long ordered list in order to locate the needed information, which is often tedious and frustrating. In this paper, we investigate how to combine link and contents analysis in clustering web search results to improve information interpretation for a specific topic. By filtering some irrelevant pages, the proposed approach clusters high-quality pages in web search results into semantically meaningful groups with additional tagging to facilitate users’ accessing and understanding. We especially study the effect of the similarity threshold and then give a heuristic yet effective way to set the initial similarity threshold. Preliminary experiments and evaluations are conducted to investigate its effectiveness.

• “Use Link-based Clustering to Improve Web Search Results”, Yitong Wang and Masaru Kitsuregawa, in Proceedings of the 2nd International Conference on Web Information Systems Engineering (WISE 2001), Dec., Japan, pp. 115-124, IEEE Computer Society.
Abstract: While a web search engine can retrieve information on the Web for a specific topic, users have to step through a long ordered list in order to locate the needed information, which is often tedious and inefficient. In this paper, we propose a new link-based clustering approach to cluster search results returned from a Web search engine by exploring both co-citation and coupling. Unlike document clustering algorithms in IR that are based on common words/phrases shared among documents, our approach is based on common links shared by pages. We also extend the standard clustering algorithm, K-means, to make it more natural to handle noise and apply it to web search results. In particular, we study the effect of the parameters introduced in the algorithm, the similarity threshold and the merging threshold, and then give our recommendation. By filtering some irrelevant pages, our approach clusters high-quality pages in web search results into semantically meaningful groups.
Preliminary experiments and evaluations are conducted to investigate its effectiveness. The experimental results show that link-based clustering of web search results is promising and beneficial.

• “Link Based Clustering of Web Search Results”, Yitong Wang and Masaru Kitsuregawa, in Proceedings of the 2nd International Conference on Advances in Web-Age Information Management (WAIM 2001), China, pp. 225-236, Springer (Lecture Notes in Computer Science), Vol. 2118.
Abstract: With information proliferation on the Web, how to obtain high-quality information from the Web has become one of the hot research topics in many fields such as Database, IR and AI. The Web search engine is the most commonly used tool for information retrieval; however, its current status is far from satisfactory. In this paper, we propose a new approach to cluster search results returned from a Web search engine using link analysis. Unlike document clustering algorithms in IR that are based on common words/phrases shared between documents, our approach is based on common links shared by pages, using co-citation and coupling analysis. We also extend the standard clustering algorithm K-means to make it more natural to handle noise and apply it to web search results. By filtering some irrelevant pages, our approach clusters high-quality pages into groups to facilitate users’ accessing and browsing. Preliminary experiments and evaluations are conducted to investigate its effectiveness. The experimental results show that clustering web search results via link analysis is promising.

• “Clustering Web Search Results with Link Analysis”, Yitong Wang and Masaru Kitsuregawa, in the Proceedings of the 2001 Data Engineering Workshop (DEWS'2001), March 8-10, pp. C4-3, Izu Atagawa, Japan.
Abstract: How to obtain high-quality information from the Web has been one of the core research challenges in many fields such as Database, IR and AI.
The Web search engine is the most commonly used tool for information retrieval; however, its current status is far from satisfactory. In this paper, we propose a new approach to cluster search results returned from a Web search engine using link analysis. Our approach is based on common links shared by pages, using co-citation and coupling analysis. We also extend the standard clustering algorithm K-means to make it more natural to handle noise and apply it to web search results. By filtering some irrelevant pages, our approach clusters high-quality pages into groups to facilitate users’ accessing and browsing. Preliminary experiments are conducted to investigate its effectiveness. The experimental results show that clustering web search results via link analysis is potentially beneficial.

• “Query on Semi-Structured Data”, Yitong Wang, Jinhai Chen and Baile Shi, Journal of Software, Vol. 6, pp. 358-362, June 1999 (in Chinese).
Abstract: With the development of Internet and multimedia techniques, more and more data are available electronically, e.g. SGML documents, email etc. In this paper, we systematically explore querying of semi-structured data, including query syntax and semantics. We first present some weaknesses of current techniques for querying semi-structured data and propose a novel method called TOS (Target Object Set) transformation, which binds the common prefix of the select clause and the where clause to one object variable. By doing so, we can use the query processor and optimizer of SQL and OQL directly to process the transformed query and effectively solve the problems of the OQL transformation currently used for semi-structured data: query efficiency, the expressive power of queries and the construction of query results. We also extend the Projection operator to package query results so that queries are closed.

• “Semi-Structured Data's Object Schema and Its Extension”, Yitong Wang, Xuebiao Xu and Baile Shi, Journal of Software, Vol. 6, pp. 339-344, 1999 (in Chinese).
Abstract: A database schema is an important tool for understanding the database structure and for guiding query processing and query optimization. Semi-structured data, however, has no pre-defined, stable, separate schema. In this paper, we present two ways to learn the structure of a semi-structured database: the schema guide and the basic component guide. We also explore algorithms for generating and maintaining them. Our algorithms make use of previously computed results and are more efficient than the ones currently used. The schema guide is dynamic and sensitive, and its generation is very costly, especially when there are many shared objects and cycles; to address this, we present an extended model called the β-schema guide.
β-Schema’s generation and maintenance are more efficient, and its structure is also simpler and more stable. The cost of β-schema is accuracy, a loss which we prove to be acceptable and feasible.

• “The Design and Implementation of JB-OODB: an Object-Oriented Database Toolkit for JB/CASE”, Xuebiao Xu, Ning Gu, Yitong Wang and Baile Shi, Chinese Journal of Advanced Software Research, Vol. 6, No. 3, pp. 218-235, 1999.
Abstract: In this paper, the authors present several aspects of the design and implementation of JB-OODBT, an Object-Oriented Database Toolkit for JB/CASE, which is an integrated CASE system under development at Peking University, P. R. China. Since most of the data in JB/CASE’s repositories is stored in a variety of database systems, the aim of JB-OODBT is not only to provide the APIs for the database applications in JB/CASE, but also to establish a uniform object paradigm and to provide a set of tools for efficient object manipulation/query above these underlying heterogeneous, autonomous, distributed data sources in a client/server environment. First, the architecture of JB-OODBT and its kernel implementation mechanisms are introduced briefly. Then JBCOM, a common object model adopted in JB-OODBT and compatible with ODMG 2.0’s Object Model and OMG CORBA’s Data Model, is presented; it provides basic OO features such as attribute, method, inheritance and reference to JB/CASE users. As to the interface, the authors give a detailed description of three types of interface, namely JB Integrator, Embedded JB/C++ API and JBOQL, which provide CASE users with a set of tools and functionalities for every phase of OODBMS application development. In addition, several kernel implementation techniques of JB-OODBT, including schema translation, buffer management, version management and transaction management, are given in this paper.

• “Semi-Structured Data Model and Object Schema”, Yitong Wang, Wei Wang and Baile Shi, Journal of Computer Science, pp. 1-5, Oct. 1998. Best Paper at the 15th National Database Conference (in Chinese).
Abstract: With the development of the Internet and related techniques, more and more data are available electronically. These data are irregular and have no schema fixed in advance, so traditional database technology cannot be applied to them directly. In this paper, we explore the data model and object schema of semi-structured data. The paper mainly covers the definition and features of semi-structured data, its modeling and, more importantly, schema discovery for it to facilitate querying and other database services. Finally, we conclude the paper with some suggestions for further work.

• “Schema merging for the Multiple Data Sources Interoperation based on Attribute Equivalence”, Xuebiao Xu, Hongliang Liu, Yitong Wang, Ning Gu and Baile Shi, Journal of Computer Research and Advancement, Vol. 36, No. 7, 1998, pp. 859-864 (in Chinese).
Abstract: In multiple data sources interoperation systems, schema integration of heterogeneous data sources is a main way to provide a uniform interface between data query/manipulation and API (Application Program Interface) for users. However, it still lacks a relatively mature theory. We begin with the smallest unit, the attribute, and define the concepts of attribute equivalence and class equivalence on different classes bottom up. Then we make use of equivalence classes and equivalence attributes to present the basic ways of integration top down, and moreover explore a new way to build a uniform object view schema of heterogeneous data sources.

• “The Application and Its Extension of the Theory of Rough Set in Data Mining”, Yitong Wang, Xuebiao Xu and Baile Shi, Computer Engineering, Vol. 23, pp. 69-74, 1997 (in Chinese).
Abstract: Data mining is a new technique that finds non-trivial knowledge in a database. Rough Set is a tool to handle uncertainty. We explore the application of Rough Set in data mining in two ways. First, when generating relatively deterministic rules, Rough Set theory offers a tool to simplify the mining process, improve the mining efficiency, and make the generated rules simpler and more concise. Second, when generating relatively non-deterministic rules, it helps us to create “non-deterministic information”, which depends on the discernibility function of the attributes.
To reduce sensitivity to noise, we give an extended Rough Set model that makes the algorithm more robust against noise.

• “The Research on the View in the OO System”, Xu Xuebiao, Wang Peiqiang, Yitong Wang and Baile Shi, Computer Engineering, Vol. 23, pp. 63-68, 1997 (in Chinese).
Abstract: In this paper, we give a detailed comparison of views in relational databases and OODB, and then present the role of views in both systems. We also describe the requirements for views in OODB in more detail, in order to match the function model of views in OODB with its implementation, which is compatible with the ODMG standard and will be accomplished in our OODB system.
