Selected Publications

information interpretation for a specific topic. By filtering some irrelevant pages, the proposed approach clusters high-quality pages in web search results into semantically meaningful groups with additional tagging to facilitate users’ access and understanding. We especially study the effect of the similarity threshold and then give a heuristic yet effective way to set the initial similarity threshold. Preliminary experiments and evaluations are conducted to investigate its effectiveness.
• “Use Link-based Clustering to Improve Web Search Results”, Yitong Wang and Masaru Kitsuregawa, Proceedings of the 2nd International Conference on Web Information System Engineering (WISE’2001), Dec, Japan, pp. 115-124, IEEE Computer Society
Abstract: While a web search engine can retrieve information on the Web for a specific topic, users have to step through a long ordered list in order to locate the needed information, which is often tedious and inefficient. In this paper, we propose a new link-based clustering approach to cluster search results returned from a Web search engine by exploring both co-citation and coupling. Unlike document clustering algorithms in IR that are based on common words/phrases shared among documents, our approach is based on common links shared by pages. We also extend the standard clustering algorithm, K-means, to make it more natural to handle noise and apply it to web search results. In particular, we study the effect of the parameters introduced in the algorithm, the similarity threshold and the merging threshold, and then give our recommendation. By filtering some irrelevant pages, our approach clusters high-quality pages in web search results into semantically meaningful groups.
Preliminary experiments and evaluations are conducted to investigate its effectiveness. The experimental results show that link-based clustering of web search results is promising and beneficial.
• “Link Based Clustering of Web Search Results”, Yitong Wang and Masaru Kitsuregawa, in Proceedings of the 2nd International Conference on Advances in Web-Age Information Management (WAIM’2001), China, pp. 225-236, Springer (Lecture Notes in Computer Science), Vol. 2118.
Abstract: With information proliferation on the Web, how to obtain high-quality information from the Web has been one of the hot research topics in many fields such as Database, IR and AI. The web search engine is the most commonly used tool for information retrieval; however, its current status is far from satisfactory. In this paper, we propose a new approach to cluster search results returned from a Web search engine using link analysis. Unlike document clustering algorithms in IR that are based on common words/phrases shared between documents, our approach is based on common links shared by pages, using co-citation and coupling analysis. We also extend the standard clustering algorithm K-means to make it more natural to handle noise and apply it to web search results. By filtering some irrelevant pages, our approach clusters high-quality pages into groups to facilitate users’ accessing and browsing. Preliminary experiments and evaluations are conducted to investigate its effectiveness. The experimental results show that clustering web search results via link analysis is promising.
• “Clustering Web Search Results with Link Analysis”, Yitong Wang and Masaru Kitsuregawa, in the Proceedings of the 2001 Data Engineering Workshop (DEWS'2001), March 8-10, pp. C4-3, Izu Atagawa, Japan
Abstract: How to obtain high-quality information from the Web has been one of the core research challenges in many fields such as Database, IR and AI.
The web search engine is the most commonly used tool for information retrieval; however, its current status is far from satisfactory. In this paper, we propose a new approach to cluster search results returned from a Web search engine using link analysis. Our approach is based on common links shared by pages, using co-citation and coupling analysis. We also extend the standard clustering algorithm K-means to make it more natural to handle noise and apply it to web search results. By filtering some irrelevant pages, our approach clusters high-quality pages into groups to facilitate users’ accessing and browsing. Preliminary experiments are conducted to investigate its effectiveness. The experimental results show that clustering web search results via link analysis is potentially beneficial.
• “Query on Semi-Structured Data”, Yitong Wang, Jinhai Chen and Baile Shi, Journal of Software, vol. 6, pp. 358-362, June 1999 (in Chinese)
Abstract: With the development of Internet and multimedia techniques, more and more data are available electronically, e.g. SGML documents, email, etc. In this paper, we systematically explore semi-structured data queries, including their syntax and semantics. We first present some weaknesses of current techniques for querying semi-structured data and propose a novel method called TOS (Target Object Set) transformation, which binds the common prefix of the select clause and the where clause to one object variable. By doing so, we can use the query processor and optimizer of SQL and OQL directly to process the transformed query and effectively solve the problems of the OQL transformation currently used for semi-structured data: query efficiency, the expressive power of queries and the construction of query results. We also extend the Projection operator to package query results for the closure property of queries.
• “Semi-Structured Data's Object Schema and Its Extension”, Yitong Wang, Xuebiao Xu and Baile Shi, Journal of Software, Vol. 6, pp. 339-344, 1999 (in Chinese)
Abstract: A database schema is an important tool for understanding the database structure and guiding query processing and query optimization. Semi-structured data, however, has no pre-defined, stable, separate schema. In this paper, we present two ways to learn the structure of a semi-structured database: the schema guide and the basic component guide. We also explore algorithms for generating and maintaining them. Our algorithms make use of computed results and are more efficient than the ones currently used. Because the schema guide is dynamic and sensitive, and its generation becomes very costly when many objects are shared or cycles exist, we present an extended model called the β-schema guide.
The generation and maintenance of the β-schema are more efficient, and its structure is also simpler and more stable. The cost of the β-schema is accuracy, a trade-off that we prove to be acceptable and feasible.
• “The Design and Implementation of JB-OODB: an Object-Oriented Database Toolkit for JB/CASE”, Xuebiao Xu, Ning Gu, Yitong Wang and Baile Shi, Chinese Journal of Advanced Software Research, Vol. 6, No. 3, pp. 218-235, 1999
Abstract: In this paper, the authors present several aspects of the design and implementation of JB-OODBT, an Object-Oriented Database Toolkit for JB/CASE, which is an integrated CASE system under development at Peking University, P. R. China. Since most of the data in JB/CASE's repositories is stored in a variety of database systems, the aim of JB-OODBT is not only to provide the APIs for the database applications in JB/CASE, but also to establish a uniform object paradigm and to provide a set of tools for efficient object manipulation and querying above these underlying heterogeneous, autonomous, distributed data sources in a client/server environment. First, the architecture of JB-OODBT and its kernel implementation mechanisms are introduced briefly. Then JBCOM, a common object model adopted in JB-OODBT that is compatible with ODMG 2.0’s Object Model and OMG CORBA’s Data Model, is presented; it provides basic OO features such as attributes, methods, inheritance and references to JB/CASE users. As to the interface, the authors give a detailed description of three types of interface, namely JB Integrator, Embedded JB/C++ API and JBOQL, which provide CASE users with a set of tools and functionalities for every phase of OODBMS application development. In addition, several kernel implementation techniques of JB-OODBT, including schema translation, buffer management, version management and transaction management, are given in this paper.
• “Semi-Structured Data Model and Object Schema”, Yitong Wang, Wei Wang and Baile Shi, Journal of Computer Science, pp. 1-5, Oct 1998.
Best Paper at the 15th National Database Conference (in Chinese)
Abstract: With the development of the Internet and related techniques, more and more data are available electronically. These data are irregular and have no fixed schema in advance, so traditional database technology cannot be applied to them directly. In this paper, we explore the data model and object schema of