Semantic Matching - University of New Brunswick
Semantic Matching<br />
Dr. Virendra C. Bhavsar<br />
Professor and Director, Advanced Computational Res. Lab.<br />
Faculty of Computer Science<br />
University of New Brunswick (UNB)<br />
Fredericton, Canada<br />
bhavsar@unb.ca<br />
Thanks:<br />
BCS Student: Marcel Ball<br />
MCS Students: Anurag Singh, Jin Jing, Sebastien Mathieu, Jie Li<br />
PhD Student: Lu Yang<br />
Post-Doctoral Fellows: Dr. Biplab Sarker and Dr. Manish Joshi<br />
Collaborators: Dr. Riyanarto Sarno and Dr. Harold Boley<br />
June 14, 2010<br />
1
Virendra C. Bhavsar<br />
• UNB: since 1983; > 35 years of software research and<br />
development experience<br />
• Interests: real-time embedded systems, computer<br />
graphics, s<strong>of</strong>tware engineering, natural language<br />
processing, databases, bioinformatics, parallel computing,<br />
artificial intelligence, …<br />
• Bioinformatics - Canadian Potato Genomics Project<br />
• Atlantic Computational Excellence Network (ACEnet):<br />
~$30 million Atlantic Canada project in high-performance<br />
computing<br />
• Semantic Matching<br />
2
Outline<br />
• Syntactic Matching<br />
• Semantic Matching<br />
• Semantic Matching: Taxonomy, Ontology and<br />
Partonomy<br />
• UNB Semantic Matching Engines – Applications<br />
• Conclusion<br />
3
Syntactic Matching<br />
• Exact String Matching: binary result, 0.0 or 1.0<br />
• Permutation of strings, e.g. “Java Programming” versus “Programming in Java”<br />
similarity = (number of identical words) / (maximum length of the two strings)<br />
Example 1<br />
For two node labels “a b c” and “a b d e”, their similarity is 2/4 = 0.5<br />
4
Syntactic Matching<br />
Example 2<br />
Node labels “electric chair” and “committee chair”: similarity is 1/2 = 0.5. Is this meaningful?<br />
• Syntactic Matching does not consider additional<br />
domain knowledge<br />
• Semantic matching techniques are needed for the<br />
above problems<br />
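The word-overlap measure used in the two examples can be sketched in a few lines of Python. This is a minimal illustration, not the UNB engine: the function name is hypothetical, and it assumes that "length" is counted in words and that matching ignores case.

```python
from collections import Counter


def word_overlap_similarity(label_a: str, label_b: str) -> float:
    """Shared words divided by the word count of the longer label.

    Assumptions (not stated on the slides): labels are split on
    whitespace, case is ignored, and length is measured in words.
    """
    words_a = label_a.lower().split()
    words_b = label_b.lower().split()
    if not words_a or not words_b:
        return 0.0
    # Multiset intersection: a repeated word counts only as often
    # as it occurs in both labels.
    shared = sum((Counter(words_a) & Counter(words_b)).values())
    return shared / max(len(words_a), len(words_b))


print(word_overlap_similarity("a b c", "a b d e"))                    # Example 1
print(word_overlap_similarity("electric chair", "committee chair"))  # Example 2
print(word_overlap_similarity("Java Programming", "Programming in Java"))
```

Both examples score 0.5, although only the first pair is plausibly "half similar"; the measure sees the shared word "chair" but not its two unrelated meanings.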
5
Semantic Matching Applications<br />
• Semantic searching, e.g. Google<br />
• e-Business<br />
• e-Learning<br />
• Matchmaking portals<br />
• Information Retrieval<br />
• Web Services<br />
• Information Integration<br />
• Semantic Web<br />
6
Semantic Matching<br />
• Examples<br />
{Car : Truck} {Toyota Corolla : Toyota Camry}<br />
{Car : Automobile} {Car : Apple}<br />
• Semantic Similarity versus Semantic Distance<br />
• Matching of: words, short texts, documents,<br />
schemas/structures, pictures, videos<br />
• Taxonomy<br />
• Partonomy<br />
• Ontology<br />
7
Taxonomy<br />
• Practice and science of classification<br />
8
Ontology<br />
• Domain Ontology: Explicit formal<br />
specifications of the terms in a domain and<br />
relations among them<br />
• Upper Ontology: Across domains<br />
9
Concept Similarity in a Taxonomy<br />
• Given a taxonomy and two concepts (e.g., A and B),<br />
find the semantic similarity of the two concepts<br />
[Figure: a taxonomy with two concepts, A and B, marked]<br />
10
Concept Similarity in a Taxonomy<br />
[Figure: taxonomy fragment with a numeric value attached to each concept]<br />
{Produce, Green goods} 3.034<br />
{Fruit} 3.374<br />
{Apple} 3.945<br />
{Berry} 4.907<br />
{Banana} 5.267<br />
{Cranberry} 6.285<br />
{Boxberry} 7.576<br />
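One way to read the numbers attached to the concepts is as information-content values, in which case a Resnik-style similarity of two concepts is the value of their most specific common ancestor. The sketch below rests on two assumptions that the slide does not confirm: that the numbers are information content, and that the concepts are linked by the hypothetical parent relation in `PARENT`. All names are illustrative.

```python
# Values copied from the slide. Treating them as information content
# is an assumption; the slide does not say what they measure.
VALUE = {
    "Produce": 3.034,
    "Fruit": 3.374,
    "Apple": 3.945,
    "Berry": 4.907,
    "Banana": 5.267,
    "Cranberry": 6.285,
    "Boxberry": 7.576,
}

# Hypothetical parent links (child -> parent), assumed for illustration.
PARENT = {
    "Fruit": "Produce",
    "Apple": "Fruit",
    "Banana": "Fruit",
    "Berry": "Fruit",
    "Cranberry": "Berry",
    "Boxberry": "Berry",
}


def ancestors(concept: str) -> list:
    """Return the concept followed by all of its ancestors up to the root."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def resnik_style_similarity(a: str, b: str) -> float:
    """Largest value among the concepts that subsume both a and b."""
    common = set(ancestors(a)) & set(ancestors(b))
    return max((VALUE[c] for c in common), default=0.0)


print(resnik_style_similarity("Cranberry", "Boxberry"))  # value of Berry
print(resnik_style_similarity("Apple", "Banana"))        # value of Fruit
```

Under these assumptions the two berries are more similar to each other than an apple is to a banana, because their shared ancestor is more specific.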
11
Motivation<br />
• More and more on-line transactions (e.g. eBay, Kijiji)<br />
• Buyers and sellers input keywords and/or specify values<br />
for some product features<br />
• A list of recommended sellers (with product advertisements)<br />
and/or buyers (with product requests) is presented<br />
• Flat representation of products cannot represent the<br />
hierarchical “part-of” relationship of product parts<br />
• Match-making is not precise<br />
• Negotiation space is large<br />
12
e-Business Applications<br />
• e-business, e-learning …<br />
• Buyer-Seller matching<br />
• Metadata for buyers and sellers<br />
• Keywords/keyphrases<br />
[Figure: architecture diagram with components User, Web Browser, User Info, User Profiles, User Agents, Main Server, e-Market, Agents, Matcher 1 … Matcher n, and a link to other sites (network)]<br />
13
Semantic Matching ─ A Taxonomy Tree<br />
• The taxonomy tree of “Programming Techniques” according<br />
to the ACM Computing Classification System<br />
• Arc Weights<br />
[Figure: taxonomy tree rooted at Programming Techniques, with nodes General, Applicative Programming, Automatic Programming, Concurrent Programming, Object-Oriented Programming, Sequential Programming, Distributed Programming and Parallel Programming. Arc weights shown: 0.6, 0.8, 0.5, 0.5, 0.7, 0.9, 0.7, 0.5]<br />
14
Partonomy<br />
• Tree representation for product/service descriptions<br />
• Weights<br />
[Figure: weighted tree rooted at Car with arcs Color (0.3) → Black, Make (0.2) → Ford, Year (0.5) → 2002]<br />
15
Similarity of Buyer and Sellers<br />
[Figure: weighted Car trees, each with arcs Color, Make and Year, for a buyer and for seller1, seller2 and seller3; leaf values include Black, Red, Ford and 2002. Arc weights appearing in the figure: (0.1, 0.1, 0.8), (0.2, 0.2, 0.6), (0.05, 0.05, 0.9) and (0.6, 0.1, 0.3). The values 0.925, 0.85 and 0.65 appear alongside the trees]<br />
Semantic Matching ─ Local Similarity<br />
• Local similarity measures for leaf nodes<br />
• “Price” type<br />
• “Date” type<br />
• . . .<br />
17
Semantic Matching ─ Price Matching Algorithm<br />
• Pseudo code of the price-range similarity algorithm<br />
PriceRangeSim ([B pref, B max], [S min, S pref])<br />
Begin<br />
If S pref
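The pseudocode breaks off after its first condition, so the body of PriceRangeSim is not recoverable here. The Python sketch below is therefore not that algorithm; it only illustrates, under assumptions of my own, how a similarity in [0, 1] could be derived from a buyer range [B pref, B max] and a seller range [S min, S pref]. The linear interpolation and the function name are hypothetical.

```python
def price_range_sim(b_pref: float, b_max: float,
                    s_min: float, s_pref: float) -> float:
    """Illustrative price-range similarity (an assumed scheme, not the
    algorithm from the slide).

    Buyer: prefers to pay b_pref, will pay at most b_max.
    Seller: prefers to receive s_pref, will accept at least s_min.
    """
    if b_pref > b_max or s_min > s_pref:
        raise ValueError("expected b_pref <= b_max and s_min <= s_pref")
    if s_pref <= b_pref:
        # The seller's preferred price already suits the buyer.
        return 1.0
    if s_min > b_max:
        # Even the seller's lowest price exceeds the buyer's limit.
        return 0.0
    # Gap between the two preferred prices, relative to how far both
    # parties are willing to move. Here room >= gap > 0, so the result
    # lies in [0, 1) and meets the two cases above continuously.
    gap = s_pref - b_pref
    room = (b_max - b_pref) + (s_pref - s_min)
    return 1.0 - gap / room


print(price_range_sim(100, 150, 80, 90))    # seller prefers less than buyer offers
print(price_range_sim(100, 150, 120, 160))  # partial match
print(price_range_sim(100, 150, 200, 250))  # no feasible price
```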
UNB Similarity Engines -<br />
Implementation<br />
• Java Implementation<br />
• Testing on systematically varied cases<br />
19
Partonomy Tree Similarity Engine ─<br />
eLearning Application<br />
• eduSource e-Learning project<br />
• Learning Object Metadata Generator: LOMGen<br />
[Figure: system architecture. End-user side, steps (1) to (8): user input, UI (Java), WOO RuleML file, Similarity Engine (Java), prefilter parameters (Query URI), Prefilter (SQL), CANLOM (XML), CanCore files, prefiltered CanCore files, Translator (XSLT), WOO RuleML files, recommended results, search results. Administrator side, steps (a) to (c): LOR (HTML), HTML files, LOMGen (Java), DATABASE (Access), Keyword Table, partial CanCore files, Administrator input]<br />
20
Partonomy Tree Similarity Algorithm<br />
─ Similarity Algorithm<br />
• Partonomy similarity [Bhavsar et al. 2004]<br />
• Fragments of learning object trees [Boley et al. 2005] for learning object<br />
matching (http://www.cs.unb.ca/agentmatcher)<br />
[Figure: two weighted learning-object trees, t and t´, both rooted at lom. Tree t has arcs educational, general and technical (weights 0.3334, 0.3333, 0.3333) leading to edu-set, gen-set and tec-set; gen-set has arcs language (en) and title (Introduction to Oracle), tec-set has arcs format (HTML) and platform (WinXP), each with weight 0.5. Tree t´ has arcs general and technical leading to gen-set and tec-set; gen-set has arcs language (en) and title (Basic Oracle), tec-set has arcs format (*) and platform (WinXP). Arc weights shown for t´: 0.2, 0.3, 0.8, 0.7, 0.1, 0.9]<br />
* : Don’t Care<br />
Weighted sum of subtree similarities: $\sum_i s_i (w_i + w'_i)/2$<br />
With an adjusting function $A(s_i) \ge s_i$: $\sum_i A(s_i)(w_i + w'_i)/2$<br />
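The recursive idea behind the expression A(si)(wi + w'i)/2 can be sketched as follows. This is a simplified illustration rather than the published algorithm: it assumes leaves match only on identical labels, that "*" matches anything, that arcs present in only one tree contribute nothing, and that the adjusting function defaults to the identity. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class Node:
    label: str
    # arc label -> (arc weight, child); weights at a node should sum to 1
    arcs: Dict[str, Tuple[float, "Node"]] = field(default_factory=dict)


def tree_similarity(t1: Node, t2: Node,
                    adjust: Callable[[float], float] = lambda s: s) -> float:
    """Sum of adjust(s_i) * (w_i + w'_i) / 2 over arcs shared by both trees.

    Simplifying assumptions: '*' is a don't-care label, different root
    labels give 0, and arcs missing from one tree contribute 0.
    """
    if "*" in (t1.label, t2.label):
        return 1.0
    if t1.label != t2.label:
        return 0.0
    if not t1.arcs and not t2.arcs:
        return 1.0  # two identical leaves
    total = 0.0
    for name in t1.arcs.keys() & t2.arcs.keys():
        w1, child1 = t1.arcs[name]
        w2, child2 = t2.arcs[name]
        s = tree_similarity(child1, child2, adjust)
        total += adjust(s) * (w1 + w2) / 2
    return total


def car(color: str, make: str, year: str) -> Node:
    """Car tree with the arc weights of the Partonomy slide."""
    return Node("Car", {
        "Color": (0.3, Node(color)),
        "Make": (0.2, Node(make)),
        "Year": (0.5, Node(year)),
    })


buyer = car("Black", "Ford", "2002")
print(tree_similarity(buyer, car("Red", "Ford", "2002")))  # colors differ
print(tree_similarity(buyer, car("*", "Ford", "2002")))    # color is don't care
```

Passing a different `adjust`, for example a square root, raises intermediate similarities before they are weighted, which is consistent with the condition A(si) ≥ si for values in [0, 1].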
21
Partonomy Tree Similarity Engine<br />
─ Matchmaking Application<br />
• Teclantic portal http://www.teclantic.ca<br />
22
Current Work<br />
• Weighted Tree Semantic Tree Similarity Engines<br />
• Semantic searching<br />
• Weighted Graph Similarity Engines<br />
• Multi-core and cluster implementations<br />
• Matchmaking portals<br />
23
Conclusion<br />
• UNB Weighted Tree Similarity Engines<br />
• Semantic Global and Local Matching<br />
• Applications: e-Learning, e-Business, Matchmaking<br />
portals, …<br />
• Seeking commercial partners to license and adapt the<br />
UNB technology<br />
24
Publications<br />
5 Journal papers<br />
10 Conference papers<br />
1 Book Chapter<br />
4 MCS Theses<br />
1 PhD Thesis<br />
25
Looking for a Post-doctoral Fellow<br />
to start working right now!<br />
Thank you !<br />
26