7 IR models based on predicate logic

7.1 General considerations 

7.1.1 IR as inference

q - query 

d – document 

retrieval: 

search for documents which imply the query: 

d → q 

example: 

d = {t1, t2, t3}

q = {t1, t3}

logical view:

d = t1 ∧ t2 ∧ t3

q = t1 ∧ t3

⇒ d → q

Norbert Fuhr


advantage of inference-based approach:

step from term-based to knowledge-based retrieval

e.g. easy incorporation of additional knowledge 

example: 

d: ’squares’ 

q: ’rectangles’ 

thesaurus: ’squares’ → ’rectangles’ 

⇒: d → q 
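
The implication test d → q can be sketched in Python. This is a minimal sketch, assuming documents and queries are plain term sets and the thesaurus is a dictionary of term implications; the names `closure` and `implies` are illustrative and not part of the lecture.

```python
def closure(terms, thesaurus):
    """Return all terms derivable from `terms` via thesaurus implications."""
    derived = set(terms)
    frontier = list(terms)
    while frontier:
        term = frontier.pop()
        for broader in thesaurus.get(term, ()):
            if broader not in derived:
                derived.add(broader)
                frontier.append(broader)
    return derived


def implies(doc, query, thesaurus=None):
    """d -> q holds if every query term is derivable from the document."""
    return set(query) <= closure(doc, thesaurus or {})


# Term-based example: d = {t1, t2, t3}, q = {t1, t3}
print(implies({"t1", "t2", "t3"}, {"t1", "t3"}))

# Knowledge-based example: thesaurus 'squares' -> 'rectangles'
thesaurus = {"squares": ["rectangles"]}
print(implies({"squares"}, {"rectangles"}, thesaurus))
print(implies({"squares"}, {"rectangles"}))
```

Without the thesaurus entry the last call fails, which is the gap that knowledge-based retrieval closes.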



7.1.2 IR as uncertain inference

d: ’quadrangles’ 

q: ’rectangles’ 

⇒ uncertain knowledge required 

’quadrangles’ → ’rectangles’ with probability 0.3

[Rijsbergen 86]: 

IR as uncertain inference

Retrieval ≙ estimate probability P(d → q) = P(q|d)

[Figure: query q, document d and the terms t1 to t6]



7.1.3 Propositional vs. predicate logic 

spatio-temporal relationships: 

• document attributes 

query: documents published after 1990? 

?- pubyear(D,Y) & Y>1990 

• multimedia retrieval 


conventional indexing (based on propositional logic):

d = {tree, house} 

query: Is there a tree on the left of the house? 

⇒ query cannot be expressed in propositional logic 

predicate logic: 

d: tree(t1). house(h1). tree(t2). 

left(t1, h1). left(h1,t2). 

?- tree(X) & house(Y) & left(X,Y).
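
The predicate-logic query can be read as a join over the stored facts. A minimal Python sketch, assuming each predicate is stored as a set of constants or tuples (this representation is an assumption, not part of the lecture):

```python
# Facts of document d
trees = {"t1", "t2"}
houses = {"h1"}
left = {("t1", "h1"), ("h1", "t2")}  # left(X, Y): X is to the left of Y

# ?- tree(X) & house(Y) & left(X,Y).
answers = [(x, y) for (x, y) in sorted(left) if x in trees and y in houses]
print(answers)
```

Only the binding X = t1, Y = h1 satisfies all three subgoals; the fact left(h1,t2) is rejected because h1 is not a tree.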


Ontologies 

Thesaurus 

[Figure: thesaurus hierarchy with the concepts polygon, regular polygon, triangle, quadrangle, ..., rectangle, regular triangle and square]

thesaurus knowledge: 

can be expressed in propositional logic 

square = quadrangle ∧ regular-polygon 

description logics 

• instances of concepts 

• roles (relationships) between concepts/instances 



Advantages of predicate logic: 

• modelling of spatial and temporal relationships 

(e.g. for multimedia retrieval) 

• instances of concepts 

≈ combination of controlled vocabulary and free text search 

• roles/relationships between concepts or instances 

→ higher expressiveness for concept definition and description of document content



7.2 RDF 

7.2.1 RDF: basic concepts 

Resource: object on the WWW, e.g. Web page, database

naming of resources: Uniform Resource Identifier (URI)

Literal: special type of resource, with string value, no explicit URI

Property: aspect / attribute / characteristics / relation

Statement: resource + named property + value of property

(subject, predicate, object)

example: (Norbert, visits, Pisa)



RDF example 

[Figure: RDF graph with the resources ISSDL, IR-Course, M.Agosti and N.Fuhr, the properties organized-by, isPartOf, teaches, title, Name and Email, and the literals "Maristella Agosti", "agosti@...", "Introduction to IR", "Norbert Fuhr" and "fuhr@cs.uni-..."]



RDF schemas 

similar to semantic networks / description logics 

describes relationships between types of resources and/or properties 

• fundamental concepts 

– rdfs:Resource 

– rdf:Property 

– rdfs:Class 

• schema definition concepts 

– rdf:type 

– rdfs:subClassOf 

– rdfs:subPropertyOf 

– rdfs:seeAlso 

– rdfs:isDefinedBy 



RDFS example: resource hierarchy 

[Figure: class hierarchy over rdfs:Resource, rdfs:Class, xyz:MotorVehicle, xyz:Van, xyz:Truck, xyz:PassengerVehicle and xyz:MiniVan; edges are labelled rdfs:subClassOf or rdf:type]



RDFS example: resource and property hierarchies 

[Figure: resource and property hierarchies. Nodes: rdf:Property, rdfs:Class, visits, tourist-visit, business-visit, Person, Place, ISSDL-Tutor, Conf.-Loc., N. Fuhr, Pisa; edge labels: rdf:type, rdfs:subPropertyOf, visits, business-visit]



RDF example: image description 

[Figure: RDF description of a picture, with the nodes picture, parc, artifact, sculpture, man, woman, swan, cherub and socle, and the properties contains, rdf:type, above and right-of]



Retrieval with RDF 

[Figure: query graph with the variables ?X, ?Y and ?Z, the nodes artifact, man and woman, and the property right-of]



Subsumption 

retrieval as inference (implication) in description logic: subsumption 

find implicit subclasses of query concept 

Subsumption in RDF: 

resource r2 has property rdfs:subClassOf r1 if 

1. r2 is subclass of all superclasses of r1 

2. each property of r1 subsumes the corresponding property of r2 

a property p2 is subsumed by a property p1 if 

1. the properties are equal, or the statement p2 rdfs:subPropertyOf p1 holds.

2. the range of p2 is subsumed by the range of p1. 



7.3 Modelling IR in Datalog

7.3.1 Introduction 

Datalog: 

• Horn predicate logic

(most IR models are based on propositional logic)

• no functions 

• restricted forms of negation allowed 

• sound and complete evaluation algorithms 



ground facts: 

docTerm(d1,ir). 

docTerm(d1,db). 

docTerm(d2,ir). 

docTerm(d2,oop). 

rules: 

irdoc(D) :- docTerm(D,ir). 

iranddb(D) :- docTerm(D,ir) & docTerm(D,db). 

irnotdb(D) :- docTerm(D,ir) & not(docTerm(D,db)). 

recursive rules: 

link(d1,d2). link(d2,d3). link(d3,d1). 

linked(X,Y) :- link(X,Y). 

linked(X,Y) :- linked(X,Z) & link(Z,Y). 

queries: 

?- docTerm(D,ir). 

?- docTerm(D,ir) & docTerm(D,db). 

?- docTerm(D,ir) & not(docTerm(D,db)).
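
The negated rule and the recursive rules above can be evaluated bottom-up. A minimal Python sketch, assuming relations are stored as sets of tuples and recursion is evaluated by naive fixpoint iteration; real Datalog engines use more refined strategies, and `linked_closure` is an illustrative name.

```python
doc_term = {("d1", "ir"), ("d1", "db"), ("d2", "ir"), ("d2", "oop")}
link = {("d1", "d2"), ("d2", "d3"), ("d3", "d1")}

# irnotdb(D) :- docTerm(D,ir) & not(docTerm(D,db)).
irnotdb = {d for (d, t) in doc_term if t == "ir" and (d, "db") not in doc_term}


def linked_closure(link):
    """Least fixpoint of
         linked(X,Y) :- link(X,Y).
         linked(X,Y) :- linked(X,Z) & link(Z,Y).
    """
    linked = set(link)
    while True:
        new = {(x, y) for (x, z) in linked for (z2, y) in link if z == z2}
        if new <= linked:  # nothing new derived: fixpoint reached
            return linked
        linked |= new


linked = linked_closure(link)
print(sorted(irnotdb))
print(len(linked))
```

Because the three links form a cycle, every document ends up linked to every document, including itself. The iteration still terminates, since only finitely many facts can be derived.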



7.3.2 Hypertext structure 

docTerm(d1,ir). docTerm(d1,db). 

link(d1,d2). link(d2,d3). link(d3,d1). 

about(D,T) :- docTerm(D,T). 

about(D,T) :- link(D,D1) & about(D1,T). 

[Figure: graph of the documents d1, d2 and d3, with link edges between documents and docterm edges to the terms ir and db]

?- about(D,ir) 



7.3.3 Aggregation 

[Figure: document hierarchy with the levels book, chapter and section]

part(D,P) :- chapter(D,P). 

part(D,P) :- section(D,P). 

retrieve node if at least one part is about the search term:

about(D,T) :- part(D,P) & about(P,T).

retrieve node if all its parts are about the search term:

about(D,T) :- part(D,X) & about(X,T) & not(anypart(D,T)).

anypart(D,T) :- part(D,P) & not(about(P,T)).
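
The two aggregation strategies can be compared on a small hierarchy. This is a minimal Python sketch under stated assumptions: the book `b1`, its chapters and sections, and their index terms are hypothetical, a node with its own index term counts as being about that term, and the part hierarchy is assumed to be acyclic.

```python
# Hypothetical hierarchy: book b1 has chapters c1 and c2; c1 has sections s1, s2
parts = {"b1": ["c1", "c2"], "c1": ["s1", "s2"]}
leaf_terms = {"s1": {"ir"}, "s2": {"ir", "db"}, "c2": {"ir"}}


def about_any(node, term):
    """about(D,T) if at least one part is about T."""
    if term in leaf_terms.get(node, ()):
        return True
    return any(about_any(p, term) for p in parts.get(node, ()))


def about_all(node, term):
    """about(D,T) if D has parts and no part fails to be about T."""
    if term in leaf_terms.get(node, ()):
        return True
    children = parts.get(node, ())
    return bool(children) and all(about_all(p, term) for p in children)


print(about_any("b1", "db"), about_all("b1", "db"))
print(about_any("b1", "ir"), about_all("b1", "ir"))
```

Under the "all parts" reading, b1 is not retrieved for db because section s1 is not about db, whereas the "at least one part" reading retrieves it.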



7.4 Probabilistic Datalog 

7.4.1 Motivation 

powerful retrieval logic 

• expressiveness of Datalog 

– predicate logic

(spatio-temporal relationships, instances of concepts)

– recursion

(structured documents, hypertext links, terminological structures)

• uncertain inference: 

probabilistic inference 



Syntax 

ground facts with probabilistic weights 

0.9 docTerm(d1,ir).

0.5 docTerm(d1,db).

0.8 docTerm(d2,ir).

0.3 docTerm(d2,oop).

?- docTerm(D,ir).

gives

d1 0.9

d2 0.8

?- docTerm(D,ir) & docTerm(D,db).

gives

d1 0.45
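
The answer weights can be reproduced with a few lines of Python. A minimal sketch, assuming that the ground facts are independent events, so that the weight of a conjunction is the product of the fact weights; `query_and` is an illustrative helper, not part of any Datalog system.

```python
import math

# Probabilistic ground facts: (document, term) -> weight
doc_term = {
    ("d1", "ir"): 0.9,
    ("d1", "db"): 0.5,
    ("d2", "ir"): 0.8,
    ("d2", "oop"): 0.3,
}


def query_and(terms):
    """Answer ?- docTerm(D,t1) & ... & docTerm(D,tn), assuming the
    underlying facts are independent events."""
    answers = {}
    for doc in sorted({d for d, _ in doc_term}):
        if all((doc, t) in doc_term for t in terms):
            answers[doc] = math.prod(doc_term[(doc, t)] for t in terms)
    return answers


print(query_and(["ir"]))
print(query_and(["ir", "db"]))
```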



Example: Image retrieval 

IRIS (Univ. Bremen): automatic indexing of images with semantic concepts

imgobj(O,I,N,L,R,B,T)

O image object

I image

N name of semantic concept

L,R,B,T bounding rectangle of image object

query: images with water in front of stones 

?- imgobj(OA,I,water,L1,R1,B1,T1) , 

imgobj(OB,I,stone,L2,R2,B2,T2), 

B1




7.4.2 Semantics of probabilistic Datalog 

Extensional vs. intensional semantics 



0.9 docTerm(d1,ir). 0.5 docTerm(d1,db).

0.7 link(d2,d1).

about(D,T) :- docTerm(D,T).

about(D,T) :- link(D,D1) & about(D1,T).

q(D) :- about(D,ir) & about(D,db).

extensional semantics: 

weight of derived fact as function of weights of subgoals 

P(q(d2)) = P(about(d2,ir)) · P(about(d2,db)) = (0.7 · 0.9) · (0.7 · 0.5)

Problem:

“improper treatment of correlated sources of evidence” [Pearl]

→ extensional semantics only correct for tree-like inference structures

intensional semantics:

weight of IDB fact as function of weights of underlying ground facts



Implementation of intensional semantics 

event keys and event expressions 

0.9 docTerm(d1,ir). [dT(d1,ir)] 

0.5 docTerm(d1,db). [dT(d1,db)] 

0.8 docTerm(d2,ir). [dT(d2,ir)] 

0.3 docTerm(d2,oop). [dT(d2,oop)] 

0.7 link(d2,d1). [l(d2,d1)] 


?- docTerm(D,ir) & docTerm(D,db).

gives

d1 [dT(d1,ir) & dT(d1,db)] 0.9 · 0.5 = 0.45

about(D,T) :- docTerm(D,T).

about(D,T) :- link(D,D1) & about(D1,T).

?- about(D,ir) & about(D,db).

gives

d2 [l(d2,d1) & dT(d1,ir) & l(d2,d1) & dT(d1,db)] 0.7 · 0.9 · 0.5 = 0.315

d1 [dT(d1,ir) & dT(d1,db)] 0.9 · 0.5 = 0.45
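
The difference between the two semantics for q(d2) can be checked numerically. A minimal Python sketch, assuming event expressions that are pure conjunctions can be represented as sets of event keys, and that the keys are independent; `prob_conj` is an illustrative helper.

```python
import math

weights = {"dT(d1,ir)": 0.9, "dT(d1,db)": 0.5, "l(d2,d1)": 0.7}

# Event expressions (conjunctions of event keys) for the two subgoals of q(d2)
about_d2_ir = {"l(d2,d1)", "dT(d1,ir)"}
about_d2_db = {"l(d2,d1)", "dT(d1,db)"}


def prob_conj(keys):
    """Probability of a conjunction of independent event keys."""
    return math.prod(weights[k] for k in keys)


# Extensional: multiply the subgoal weights, which counts l(d2,d1) twice
extensional = prob_conj(about_d2_ir) * prob_conj(about_d2_db)

# Intensional: combine the event expressions first; the set union
# merges the duplicate key l(d2,d1) before any weight is multiplied
intensional = prob_conj(about_d2_ir | about_d2_db)

print(round(extensional, 4), round(intensional, 4))
```

The extensional value (0.7 · 0.9) · (0.7 · 0.5) = 0.2205 is too low because both subgoals depend on the same link, while the intensional value is 0.315.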




about(D,T) :- link(D,D1) & about(D1,T). 

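[Figure: graph of the documents d1, d2 and d3 with weighted link edges and weighted docterm edges to the terms ir and db; weights shown: 0.8, 0.4, 0.5, 0.9, 0.5]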

?- about(D,ir) 

d1 [dT(d1,ir) | l(d1,d2) & l(d2,d3) & l(d3,d1) & dT(d1,ir) | ...] 0.900

d3 [l(d3,d1) & dT(d1,ir)] 0.720

d2 [l(d2,d3) & l(d3,d1) & dT(d1,ir)] 0.288

?- about(D,ir) & about(D,db)

d1 [dT(d1,ir) & dT(d1,db)] 0.450

d3 [l(d3,d1) & dT(d1,ir) & l(d3,d1) & dT(d1,db)] 0.360

d2 [l(d2,d3) & l(d3,d1) & dT(d1,ir) & dT(d1,db)] 0.144



computation of probabilities for event expressions 

1. transformation of expression into disjunctive normal form

2. application of sieve formula:

c_i – conjunct of event keys

$$P(c_1 \lor \dots \lor c_n) = \sum_{i=1}^{n} (-1)^{i-1} \sum_{1 \le j_1 < \dots < j_i \le n} P(c_{j_1} \land \dots \land c_{j_i})$$
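
The sieve (inclusion-exclusion) computation over a disjunctive normal form can be sketched as follows. This is a minimal Python sketch, assuming each conjunct is a set of independent event keys. The weights in the example are assumptions chosen to match the results of the hypertext example, and `prob_dnf` is an illustrative name.

```python
from itertools import combinations
import math


def prob_dnf(conjuncts, weights):
    """P(c_1 or ... or c_n) by the sieve formula.

    Each conjunct is a set of event keys; keys are assumed independent,
    so a conjunction's probability is the product of its key weights.
    """
    total = 0.0
    for i in range(1, len(conjuncts) + 1):
        sign = (-1) ** (i - 1)
        for subset in combinations(conjuncts, i):
            # Conjunction of the chosen conjuncts; duplicate keys are merged
            keys = set().union(*subset)
            total += sign * math.prod(weights[k] for k in keys)
    return total


# Assumed weights for illustration
weights = {"dT(d1,ir)": 0.9, "l(d1,d2)": 0.5, "l(d2,d3)": 0.4, "l(d3,d1)": 0.8}
dnf = [
    {"dT(d1,ir)"},
    {"l(d1,d2)", "l(d2,d3)", "l(d3,d1)", "dT(d1,ir)"},
]
p_d1 = prob_dnf(dnf, weights)
print(round(p_d1, 3))
```

The second conjunct (the path around the link cycle) contains the first, so its contribution cancels and the result for d1 remains 0.9. The number of terms grows exponentially with the number of conjuncts, so this direct form is only practical for small expressions.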


Interpretation of probabilistic weights 

possible worlds semantics 


0.9 docTerm(d1,ir).

P(W1) = 0.9: {docTerm(d1,ir)}

P(W2) = 0.1: {}





0.9 docTerm(d1,ir). 0.5 docTerm(d1,db).

possible interpretations:

I1:

P(W1) = 0.45: {docTerm(d1,ir)}

P(W2) = 0.45: {docTerm(d1,ir), docTerm(d1,db)}

P(W3) = 0.05: {docTerm(d1,db)}

P(W4) = 0.05: {}

I2:

P(W1) = 0.5: {docTerm(d1,ir)}

P(W2) = 0.4: {docTerm(d1,ir), docTerm(d1,db)}

P(W3) = 0.1: {docTerm(d1,db)}

I3:

P(W1) = 0.4: {docTerm(d1,ir)}

P(W2) = 0.5: {docTerm(d1,ir), docTerm(d1,db)}

P(W3) = 0.1: {}

probabilistic logic:

0.4 ≤ P(docTerm(d1,ir) & docTerm(d1,db)) ≤ 0.5

probabilistic Datalog with independence assumptions:

P(docTerm(d1,ir) & docTerm(d1,db)) = 0.45
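
The interval and the point value can both be derived from the two marginal weights. A minimal Python sketch, assuming only P(docTerm(d1,ir)) = 0.9 and P(docTerm(d1,db)) = 0.5 are given; the bounds are the standard Fréchet bounds for a conjunction, and the variable names are illustrative.

```python
from itertools import product

p_ir, p_db = 0.9, 0.5

# Probabilistic logic: only the marginals are known, so the probability
# of the conjunction is bounded but not determined
lower = max(0.0, p_ir + p_db - 1.0)
upper = min(p_ir, p_db)

# Probabilistic Datalog: the independence assumption fixes one distribution
# over the four possible worlds
worlds = {}
for has_ir, has_db in product([True, False], repeat=2):
    p = (p_ir if has_ir else 1 - p_ir) * (p_db if has_db else 1 - p_db)
    worlds[(has_ir, has_db)] = p

print(round(lower, 2), round(upper, 2), round(worlds[(True, True)], 2))
```

The world probabilities obtained under independence are those of interpretation I1.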



Disjoint events 

example: imprecise attribute values 

# py(dk,av). 

0.2 py(d3,89). 

0.7 py(d3,90). 

0.1 py(d3,91). 

interpretation: 

P(W1) = 0.2: {py(d3,89)}

P(W2) = 0.7: {py(d3,90)}

P(W3) = 0.1: {py(d3,91)}

?- py(X,Y) & Y > 89.

d3 [py(d3,90) | py(d3,91)] 0.7 + 0.1 = 0.8



Probabilistic search term weighting 

via disjoint events 

0.8 docTerm(d1,db). 0.7 docTerm(d1,ir). 

# qtw(av). 

0.4 qtw(db). 0.6 qtw(ir). 

s(D) :- qtw(X) & docTerm(D,X). 

?- s(D). 

[Figure: derivation of s(d1) from 0.4 qtw(db), 0.6 qtw(ir), 0.7 docTerm(d1,ir) and 0.8 docTerm(d1,db)]

d1 [q(db) & dT(d1,db) | q(ir) & dT(d1,ir)] 0.4 · 0.8 + 0.6 · 0.7 = 0.74
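
With disjoint query term events, the disjunction reduces to a weighted sum. A minimal Python sketch, assuming the qtw events are mutually exclusive and each is independent of the docTerm facts; `score` is an illustrative name.

```python
doc_term = {("d1", "db"): 0.8, ("d1", "ir"): 0.7}
qtw = {"db": 0.4, "ir": 0.6}  # disjoint events, at most one holds per world


def score(doc):
    """s(D) :- qtw(X) & docTerm(D,X).

    Since the qtw events are disjoint, the probabilities of the
    alternative derivations are simply added.
    """
    return sum(w * doc_term.get((doc, t), 0.0) for t, w in qtw.items())


print(round(score("d1"), 2))
```

This is the familiar scalar product of query and document term weights, obtained here as a special case of the probabilistic rule.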



Probabilistic rules 

rules for deterministic facts: 

0.7 likes-sports(X) :- man(X). 

0.4 likes-sports(X) :- woman(X). 

man(peter). 


P(W1) = 0.7: {man(peter), likes-sports(peter)}

P(W2) = 0.3: {man(peter)}

rules for uncertain facts: 

# sex(dk,av). 

0.7 l-s(X) :- sex(X,male). 

0.4 l-s(X) :- sex(X,female). 

0.5 sex(X,male) :- human(X). 

0.5 sex(X,female) :- human(X). 

human(peter). 


P(W1) = 0.35: {sex(peter,male), l-s(peter)}

P(W2) = 0.15: {sex(peter,male)}

P(W3) = 0.20: {sex(peter,female), l-s(peter)}

P(W4) = 0.30: {sex(peter,female)}
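
The four worlds can be enumerated mechanically. A minimal Python sketch, assuming the two sex values are disjoint events and each rule weight is the conditional probability of l-s given that sex value; the variable names are illustrative.

```python
p_sex = {"male": 0.5, "female": 0.5}    # disjoint events for sex(peter, .)
p_likes = {"male": 0.7, "female": 0.4}  # rule weights for l-s(X)

worlds = []
for sex, p in p_sex.items():
    # World in which the rule fires, and world in which it does not
    worlds.append((p * p_likes[sex], {f"sex(peter,{sex})", "l-s(peter)"}))
    worlds.append((p * (1 - p_likes[sex]), {f"sex(peter,{sex})"}))

# Probability of l-s(peter): total weight of the worlds containing the fact
p_ls = sum(p for p, facts in worlds if "l-s(peter)" in facts)

for p, facts in worlds:
    print(round(p, 2), sorted(facts))
print(round(p_ls, 2))
```

Summing the worlds that contain l-s(peter) gives 0.35 + 0.20 = 0.55.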



Vague predicates 

pc(m1,486/dx50,8,540,900). 

pc(m2,pe60,16,250,1000). 

pc(m3,pe90,16,540,1100). 

?- pc(MOD, CPU, MEM, DISK, PRICE), PRICE < 1000 

vague predicate ˆ< (builtin) 

1.00 ˆ


applications of vague predicates: 

• vague fact conditions 

• proper name search (string similarity) 

(also OCRed text) 

• multimedia IR

(e.g. audio retrieval, image retrieval) 
