11.04.2013 Views

Semantic Matching - University of New Brunswick


<strong>Semantic</strong> <strong>Matching</strong><br />

Dr. Virendra C. Bhavsar<br />

Professor and Director, Advanced Computational Res. Lab.<br />

Faculty <strong>of</strong> Computer Science<br />

<strong>University</strong> <strong>of</strong> <strong>New</strong> <strong>Brunswick</strong> (UNB)<br />

Fredericton, Canada<br />

bhavsar@unb.ca<br />

Thanks:<br />

BCS Student: Marcel Ball<br />

MCS Students: Anurag Singh, Jin Jing, Sebastien Mathieu, Jie Li<br />

PhD Student: Lu Yang<br />

Post-Doctoral Fellows: Dr. Biplab Sarker and Dr. Manish Joshi<br />

Collaborators: Dr. Riyanarto Sarno and Dr. Harold Boley<br />

June 14, 2010<br />

1


Virendra C. Bhavsar<br />

• UNB: since 1983; > 35 years of software research and development experience<br />

• Interests: real-time embedded systems, computer<br />

graphics, s<strong>of</strong>tware engineering, natural language<br />

processing, databases, bioinformatics, parallel computing,<br />

artificial intelligence, …<br />

• Bioinformatics - Canadian Potato Genomics Project<br />

• Atlantic Computational Excellence Network (ACEnet):<br />

~30 million Atlantic Canada project in high performance<br />

computing<br />

• <strong>Semantic</strong> <strong>Matching</strong><br />

2


Outline<br />

• Syntactic <strong>Matching</strong><br />

• <strong>Semantic</strong> <strong>Matching</strong><br />

• <strong>Semantic</strong> <strong>Matching</strong>: Taxonomy, Ontology and<br />

Partonomy<br />

• UNB <strong>Semantic</strong> <strong>Matching</strong> Engines – Applications<br />

• Conclusion<br />

3


Syntactic Matching<br />

Exact String Matching<br />

• Binary result 0.0 or 1.0<br />

Permutation of strings<br />

Example 1<br />

“Java Programming” versus “Programming in Java”<br />

Similarity = (number of identical words) / (maximum length of the two strings, in words)<br />

For two node labels “a b c” and “a b d e”, their similarity is 2/4 = 0.5<br />
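
The word-overlap measure above (number of identical words divided by the maximum length of the two strings) can be sketched in a few lines of Python. This is a minimal illustration, not the engine's code: it assumes labels are split on whitespace, words are compared case-sensitively, and each shared word counts once. The function name `syntactic_similarity` is hypothetical.

```python
def syntactic_similarity(label_a: str, label_b: str) -> float:
    """Word-overlap similarity: shared words / length of the longer label.

    Assumptions (not stated on the slide): labels are split on whitespace,
    words are compared case-sensitively, and each shared word counts once.
    """
    words_a = label_a.split()
    words_b = label_b.split()
    if not words_a and not words_b:
        return 1.0  # two empty labels are treated as identical
    shared = set(words_a) & set(words_b)
    return len(shared) / max(len(words_a), len(words_b))


print(syntactic_similarity("a b c", "a b d e"))  # 0.5
print(syntactic_similarity("Java Programming", "Programming in Java"))
```

Under these assumptions the permuted labels “Java Programming” and “Programming in Java” share two words out of a maximum length of three, so they score about 0.67 rather than 1.0.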

4


Syntactic Matching<br />

Example 2<br />

Node labels “electric chair” and “committee chair”: similarity is 1/2 = 0.5. Is this meaningful?<br />

• Syntactic matching does not consider additional domain knowledge<br />

• Semantic matching techniques are needed for the above problems<br />

5


<strong>Semantic</strong> <strong>Matching</strong> Applications<br />

• <strong>Semantic</strong> searching, e.g. Google<br />

• e-Business<br />

• e-Learning<br />

• Matchmaking portals<br />

• Information Retrieval<br />

• Web Services<br />

• Information Integration<br />

• <strong>Semantic</strong> Web<br />

6


<strong>Semantic</strong> <strong>Matching</strong><br />

• Examples<br />

{Car : Truck} {Toyota Corolla : Toyota Camry}<br />

{Car : Automobile} {Car : Apple}<br />

• <strong>Semantic</strong> Similarity versus <strong>Semantic</strong> Distance<br />

<strong>Matching</strong> <strong>of</strong>: words, short texts, documents,<br />

schemas/structures, pictures, videos<br />

• Taxonomy<br />

• Partonomy<br />

• Ontology<br />

7


Taxonomy<br />

• Practice and science <strong>of</strong> classification<br />

8


Ontology<br />

• Domain Ontology: Explicit formal<br />

specifications <strong>of</strong> the terms in a domain and<br />

relations among them<br />

Upper Ontology: Across domains<br />

9


Concept Similarity in a Taxonomy<br />

Given a taxonomy and two concepts (e.g., A and B), find the semantic similarity of the two concepts<br />

[Figure: a taxonomy tree with two marked concepts, A and B]<br />

10


Concept Similarity in a Taxonomy<br />

[Figure: taxonomy fragment with a numeric value beside each concept: {Produce, Green goods} 3.034, {Fruit} 3.374, {Apple} 3.945, {Berry} 4.907, {Banana} 5.267, {Cranberry} 6.285, {Boxberry} 7.576]<br />
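
One possible reading of this figure, offered only as a sketch: if the number beside each concept is an information-content (IC) value, a Resnik-style measure would score two concepts by the IC of their most informative shared ancestor. Both the IC interpretation and the parent links in `PARENT` are assumptions. The slide does not name the measure, and the tree's edges did not survive extraction.

```python
# Values copied from the slide; treating them as information-content (IC)
# scores is an assumption.
IC = {
    "Produce": 3.034, "Fruit": 3.374, "Apple": 3.945, "Banana": 5.267,
    "Berry": 4.907, "Cranberry": 6.285, "Boxberry": 7.576,
}

# Hypothetical parent links, assumed for illustration only.
PARENT = {
    "Fruit": "Produce", "Apple": "Fruit", "Banana": "Fruit",
    "Berry": "Fruit", "Cranberry": "Berry", "Boxberry": "Berry",
}


def ancestors(concept: str) -> list:
    """Return the concept followed by its ancestors, nearest first."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def resnik_similarity(a: str, b: str) -> float:
    """IC of the most informative concept that subsumes both a and b."""
    common = set(ancestors(a)) & set(ancestors(b))
    return max(IC[c] for c in common)


print(resnik_similarity("Cranberry", "Boxberry"))  # 4.907
print(resnik_similarity("Apple", "Banana"))        # 3.374
```

Under these assumptions two berries come out more similar to each other (4.907, via Berry) than an apple and a banana (3.374, via Fruit).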

11


Motivation<br />

• More and more on-line transactions (e.g. eBay, Kijiji)<br />

• Buyers and sellers input key words and/or specify values for some product features<br />

• A list of recommended sellers (with product advertisements) and/or buyers (with product requests) is presented<br />

• Flat representation of products cannot represent the hierarchical “part-of” relationship of product parts<br />

• Match-making is not precise<br />

• Negotiation space is large<br />

12


e-Business Applications<br />

• e-business, e-learning …<br />

• Buyer-Seller matching<br />

• Metadata for buyers and sellers<br />

• Keywords/keyphrases<br />

[Figure: system architecture. Components shown: User, Web Browser, User Info, User Profiles, User Agents, Agents, Matcher 1 … Matcher n, Main Server, e-Market, To other sites (network)]<br />

13


Semantic Matching ─ A Taxonomy Tree<br />

• The taxonomy tree of “Programming Techniques” according to the ACM Computing Classification System<br />

• Arc Weights<br />

[Figure: taxonomy tree rooted at Programming Techniques with children General, Applicative Programming, Automatic Programming, Concurrent Programming, Sequential Programming and Object-Oriented Programming; Distributed Programming and Parallel Programming are children of Concurrent Programming. Arc weights shown: 0.6, 0.8, 0.5, 0.5, 0.7, 0.9, 0.7, 0.5]<br />

14


Partonomy<br />

• Tree representation for product/service descriptions<br />

• Weights<br />

[Figure: weighted tree rooted at Car with three arcs: Color (weight 0.3) → Black, Make (weight 0.2) → Ford, Year (weight 0.5) → 2002]<br />
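
The weighted Car tree on this slide could be held in a small data structure like the one below. This is a sketch only: the `Node` class and `weights_normalised` helper are hypothetical names, and the rule that sibling arc weights sum to 1 is an assumption inferred from the example (0.3 + 0.2 + 0.5), not something the slide states.

```python
from dataclasses import dataclass, field
import math


@dataclass
class Node:
    """A node in a weighted tree; `arcs` maps an arc label to (weight, child)."""
    label: str
    arcs: dict = field(default_factory=dict)

    def is_leaf(self) -> bool:
        return not self.arcs


def weights_normalised(node: Node) -> bool:
    """Check that the arc weights below every inner node sum to 1 (assumed rule)."""
    if node.is_leaf():
        return True
    total = sum(weight for weight, _ in node.arcs.values())
    return math.isclose(total, 1.0) and all(
        weights_normalised(child) for _, child in node.arcs.values()
    )


# The Car description from the slide: arc label -> (weight, leaf value).
car = Node("Car", {
    "Color": (0.3, Node("Black")),
    "Make": (0.2, Node("Ford")),
    "Year": (0.5, Node("2002")),
})

print(weights_normalised(car))  # True
```

Keeping the weight on the arc rather than on the child lets the same leaf value appear under different weights in a buyer's and a seller's tree.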

15


Similarity of Buyer and Sellers<br />

[Figure: a buyer tree and three seller trees (seller1, seller2, seller3), each rooted at Car with weighted arcs Color, Make and Year, and each with Make = Ford and Year = 2002. One tree has Color = Black and arc weights 0.1, 0.1, 0.8; the other three have Color = Red, with arc-weight sets (Color 0.2, Make 0.2, Year 0.6), (0.05, 0.05, 0.9) and (Color 0.6, Make 0.1, Year 0.3). Buyer-seller similarity values shown: 0.925, 0.85, 0.65]<br />
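
The three similarity values on this slide can be reproduced by hand under a few assumptions: the weighted sum $\sum_i s_i (w_i + w'_i)/2$ from the later Partonomy Tree Similarity Algorithm slide, a leaf similarity $s_i$ of 1 for identical leaves and 0 otherwise, and the buyer's weights read as Color 0.1, Make 0.1, Year 0.8 with Color = Black. Black never matches a seller's Red, so only Make and Year contribute. The seller labels cannot be paired reliably with the weight sets in the extracted figure, so each row is identified by the seller's (Color, Make, Year) weights:

```latex
\begin{align*}
(0.2,\ 0.2,\ 0.6):&\quad 0\cdot\tfrac{0.1+0.2}{2} + 1\cdot\tfrac{0.1+0.2}{2} + 1\cdot\tfrac{0.8+0.6}{2} = 0.15 + 0.70 = 0.85\\
(0.05,\ 0.05,\ 0.9):&\quad 0\cdot\tfrac{0.1+0.05}{2} + 1\cdot\tfrac{0.1+0.05}{2} + 1\cdot\tfrac{0.8+0.9}{2} = 0.075 + 0.85 = 0.925\\
(0.6,\ 0.1,\ 0.3):&\quad 0\cdot\tfrac{0.1+0.6}{2} + 1\cdot\tfrac{0.1+0.1}{2} + 1\cdot\tfrac{0.8+0.3}{2} = 0.10 + 0.55 = 0.65
\end{align*}
```

The seller who weights Year most heavily, as the buyer does, scores highest even though every seller offers the same Make and Year.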

16


<strong>Semantic</strong> <strong>Matching</strong> ─ Local Similarity<br />

• Local similarity measures for leaf nodes<br />

• “Price” type<br />

• “Date” type<br />

• . . .<br />

17


<strong>Semantic</strong> <strong>Matching</strong> ─ Price <strong>Matching</strong> Algorithm<br />

• Pseudo code <strong>of</strong> the price-range similarity algorithm<br />

PriceRangeSim ([B pref, B max], [S min, S pref])<br />

Begin<br />

If S pref


UNB Similarity Engines -<br />

Implementation<br />

• Java Implementation<br />

• Testing on systematically varied cases<br />

19


Partonomy Tree Similarity Engine ─ eLearning Application<br />

• eduSource e-Learning project<br />

• Learning Object Metadata Generator: LOMGen<br />

[Figure: architecture of the e-learning application. Components shown: End user, UI (Java), Similarity Engine (Java), Translator (XSLT), Prefilter (SQL), Keyword Table, CANLOM (XML), LOMGen (Java), DATABASE (Access), LOR (HTML), Administrator. Labelled flows: user input (1), WOO RuleML file (2), prefilter parameters (Query URI) (3), steps (4) to (7) involving CanCore files, prefiltered CanCore files and WOO RuleML files, Recommended results (8), Search Results; on the administrator side: HTML files (a), (b), partial CanCore files (c), Administrator input]<br />

20


Partonomy Tree Similarity Algorithm ─ Similarity Algorithm<br />

• Partonomy similarity [Bhavsar et al. 2004]<br />

Fragments of learning object trees [Boley et al. 2005] for learning object matching (http://www.cs.unb.ca/agentmatcher)<br />

$\sum_i s_i (w_i + w'_i)/2$<br />

$A(s_i) \ge s_i$: $\sum_i A(s_i)(w_i + w'_i)/2$<br />

[Figure: two weighted lom trees, t and t´. t has branches educational (edu-set), general (gen-set) and technical (tec-set) with arc weights 0.3334, 0.3333 and 0.3333; its leaves include language = en, title = Introduction to Oracle, format = HTML and platform = WinXP, with four arc weights of 0.5. t´ has branches general (gen-set) and technical (tec-set); its leaves include language = en, title = Basic Oracle, format = * and platform = WinXP, with arc weights 0.2, 0.3, 0.8, 0.7, 0.1 and 0.9]<br />

* : Don’t Care<br />
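
The weighted sum on this slide, A(s_i)(w_i + w'_i)/2 added up over matching arcs, lends itself to a short recursive sketch. The code below is a simplified illustration, not the UNB engine: the tuple-based tree encoding is hypothetical, leaves score 1 when identical and 0 otherwise, an arc missing from one tree contributes nothing, and `adjust` stands in for A with the identity as default. Any function with A(s) ≥ s on [0, 1], such as `math.sqrt`, could be passed instead.

```python
DONT_CARE = "*"


def tree_similarity(t1, t2, adjust=lambda s: s):
    """Simplified weighted-tree similarity.

    A tree is a leaf (a string) or a pair (label, arcs), where arcs maps
    an arc label to (weight, subtree). `adjust` plays the role of A and
    should satisfy adjust(s) >= s on [0, 1]; the default is the identity.
    """
    # Leaf case: a don't-care matches anything, otherwise require equality.
    if isinstance(t1, str) or isinstance(t2, str):
        if DONT_CARE in (t1, t2):
            return 1.0
        return 1.0 if t1 == t2 else 0.0

    (label1, arcs1), (label2, arcs2) = t1, t2
    if label1 != label2:
        return 0.0  # assumption: differently labelled roots do not match

    total = 0.0
    # Assumption: arcs present in only one tree are skipped (contribute 0).
    for arc in arcs1.keys() & arcs2.keys():
        (w1, sub1), (w2, sub2) = arcs1[arc], arcs2[arc]
        s = tree_similarity(sub1, sub2, adjust)
        total += adjust(s) * (w1 + w2) / 2  # average the two arc weights
    return total


# Fragment modelled on the "technical" branches of t and t'; which weight
# sits on which arc of t' is an assumption.
t = ("technical", {"format": (0.5, "HTML"), "platform": (0.5, "WinXP")})
t_prime = ("technical", {"format": (0.1, "*"), "platform": (0.9, "WinXP")})

print(round(tree_similarity(t, t_prime), 3))
```

Averaging the two arc weights lets both parties' priorities influence the score, so a mismatch on an arc that either side weights heavily costs more than one on an arc both consider minor.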

21


Partonomy Tree Similarity Engine<br />

─ Matchmaking Application<br />

• Teclantic portal: http://www.teclantic.ca<br />


22


Current Work<br />

• Weighted Tree <strong>Semantic</strong> Tree Similarity Engines<br />

•<strong>Semantic</strong> searching<br />

• Weighted Graph Similarity Engines<br />

• Multi-core and cluster implementations<br />

• Matchmaking portals<br />

23


Conclusion<br />

• UNB Weighted Tree Similarity Engines<br />

• <strong>Semantic</strong> Global and Local <strong>Matching</strong><br />

• Applications: e-Learning, e-Business, Matchmaking<br />

portals, …<br />

• Looking to license and adapt the UNB technology for commercial partners<br />

24


Publications<br />

5 Journal papers<br />

10 Conference papers<br />

1 Book Chapter<br />

4 MCS Theses<br />

1 PhD Thesis<br />

25


Looking for a Post-doctoral Fellow<br />

to start working right now!<br />

Thank you !<br />

26
