Learning to Rank for Information Retrieval
Tie-Yan Liu
Lead Researcher
Microsoft Research Asia

4/23/2008 Tie-Yan Liu @ Renmin University
Outline

• Ranking in Information Retrieval
• Learning to Rank
  – Introduction to Learning to Rank
  – Pointwise approach
  – Pairwise approach
  – Listwise approach
• Summary
Introduction to Ranking in IR
Inside Search Engine

[Architecture diagram. Online part: the user interface receives the query, backed by caching and relevance ranking over the inverted index and page ranks. Offline part: the crawler fetches pages; the web page parser extracts links & anchors into a link map and cached pages; the index builder constructs the inverted index; the web graph builder constructs the web graph for link analysis (page ranks); page & site statistics are collected.]
Ranking in Information Retrieval

• Document index: D = {d_i}
• Query: q = "www2008"
• The ranking model sorts the documents by relevance to q, producing a ranked list d_{q,1}, d_{q,2}, …, d_{q,k}, e.g.:
  1. www2008.org
  2. www2008.org/submissions/index.html
  3. www.iw3c2.org/blog/welcome/www2008
Widely-used Judgments

• Binary judgment
  – Relevant vs. irrelevant
• Multi-level ratings
  – Perfect > Excellent > Good > Fair > Bad
• Pairwise preferences
  – Document A is more relevant than document B with respect to query q
Evaluation Measures

• MRR (Mean Reciprocal Rank)
  – For query q_i, let r_i be the rank position of its first relevant document.
  – MRR: average of 1/r_i over all queries.
• WTA (Winners Take All)
  – 1 if the top-ranked document is relevant, 0 otherwise; averaged over all queries.
• MAP (Mean Average Precision)
• NDCG (Normalized Discounted Cumulative Gain)
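The two simplest measures above can be sketched in a few lines. A minimal sketch over binary relevance lists (1 = relevant, 0 = irrelevant), ranked best-first; the function names are illustrative:

```python
def reciprocal_rank(ranked_rels):
    """1/r_i for the first relevant document; 0 if none is relevant."""
    for pos, rel in enumerate(ranked_rels, start=1):
        if rel:
            return 1.0 / pos
    return 0.0

def mrr(queries):
    """Average of 1/r_i over all queries."""
    return sum(reciprocal_rank(r) for r in queries) / len(queries)

def wta(queries):
    """Winners Take All: 1 if the top document is relevant, else 0."""
    return sum(1.0 if r and r[0] else 0.0 for r in queries) / len(queries)

queries = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
print(mrr(queries))  # (1/2 + 1 + 0) / 3 = 0.5
```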
MAP

• Precision at position n:
  P@n = #{relevant documents in top n results} / n
• Average precision:
  AP = Σ_n ( P@n · I{document at position n is relevant} ) / #{relevant documents}
  – Example: relevant documents at positions 1, 3, and 5 give
    AP = (1/1 + 2/3 + 3/5) / 3 ≈ 0.76
• MAP: AP averaged over all queries in the test set.
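The AP definition above can be sketched directly; this reproduces the slide's worked example (relevant documents at positions 1, 3, and 5):

```python
def average_precision(ranked_rels):
    """AP = mean of P@n taken at each relevant position (binary labels)."""
    hits, precisions = 0, []
    for pos, rel in enumerate(ranked_rels, start=1):
        if rel:
            hits += 1
            precisions.append(hits / pos)  # P@n at this relevant position
    return sum(precisions) / hits if hits else 0.0

def mean_average_precision(queries):
    return sum(average_precision(r) for r in queries) / len(queries)

ap = average_precision([1, 0, 1, 0, 1])
print(round(ap, 2))  # 0.76, as in the slide's example
```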
NDCG

• NDCG at position n:
  N(n) = Z_n · Σ_{j=1}^{n} (2^{r(j)} − 1) / log(1 + j)
  where r(j) is the rating of the document at position j, and Z_n normalizes so that the perfect ranking gets N(n) = 1.
• NDCG is averaged over all queries in the test set.
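A minimal NDCG@n sketch following the slide's formula, with the normalizer Z_n obtained from the ideally ordered list:

```python
import math

def dcg(ratings, n):
    """Sum of (2^r(j) - 1) / log(1 + j) over the top n positions."""
    return sum((2 ** r - 1) / math.log(1 + j)
               for j, r in enumerate(ratings[:n], start=1))

def ndcg(ratings, n):
    """Normalize by the DCG of the ideal (descending-rating) ordering."""
    ideal = dcg(sorted(ratings, reverse=True), n)
    return dcg(ratings, n) / ideal if ideal > 0 else 0.0

perfect = ndcg([3, 3, 2, 1, 0], 5)
print(perfect)  # 1.0 for an ideally ordered list
```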
More about Evaluation Measures

• Query-level
  – Averaged over all queries.
  – The measure is bounded, and every query contributes equally to the average.
• Position
  – Position discounts are used throughout the measures.
• Rating
  – Different ratings may correspond to different gains.
Learning to Rank for IR

Overview
Pointwise Approach
Pairwise Approach
Listwise Approach
Conventional IR Models

Category                 Example Models
Similarity-based models  Boolean model, Vector space model, Latent Semantic Indexing, …
Probabilistic models     BM25 model, Language model for IR, …
Hyperlink-based models   HITS, PageRank, …
Problems with Conventional Ranking Models

• For a particular model
  – Parameter tuning is difficult, especially when there are many parameters to tune.
• For comparison between two models
  – Given a test set, when one model is over-tuned (overfitting) while the other is not, how can we compare them?
• For a collection of models
  – Hundreds of models have been proposed in the literature; it is non-trivial to combine them.
Machine Learning Can Help

• Machine learning is an effective tool
  – To automatically tune parameters
  – To combine multiple evidences
  – To avoid over-fitting (regularization framework, structural risk minimization, etc.)
Framework of Learning to Rank for IR

• Training data: queries with labeled documents (labels: multiple-level ratings, for example), e.g.
  q^(1): (d_1^(1), 4), (d_2^(1), 2), …, (d_n^(1), 1)
  …
  q^(m): (d_1^(m), 5), (d_2^(m), 3), …, (d_n^(m), 2)
• Learning system: learns a model f(q, d, w) from the training data.
• Test data: a new query q with unlabeled documents (d_1, ?), (d_2, ?), …, (d_n, ?).
• Ranking system: sorts the test documents by the model, f(q, d_{i1}, w) ≥ f(q, d_{i2}, w) ≥ … ≥ f(q, d_{in}, w).
Connection with Conventional Machine Learning

• With certain assumptions, learning to rank can be transformed (reduced) to conventional machine learning problems:
  – Regression (assume labels to be scores)
  – Classification (assume labels to have no order)
  – Pairwise classification (assume document pairs to be independent of each other)
Difference from Conventional Machine Learning

• Unique properties of ranking in IR
  – Role of query
    • The query determines the logical structure of the ranking problem.
    • The loss function should be defined on ranked lists w.r.t. a query, since IR measures are computed at the query level.
  – Relative order is important
    • No need to predict a category, or the value of f(x).
  – Position
    • Top-ranked objects are more important.
Categorization of Learning to Rank Methods

• Reduced to classification or regression
  – Pointwise: Discriminative model for IR (SIGIR 2004), McRank (NIPS 2007), …
  – Pairwise: Ranking SVM (ICANN 1999), RankBoost (JMLR 2003), LDM (SIGIR 2005), RankNet (ICML 2005), FRank (SIGIR 2007), GBRank (SIGIR 2007), QBRank (NIPS 2007), MPRank (ICML 2007), …
• Treated as a new problem
  – Pairwise: IRSVM (SIGIR 2006), MHR (SIGIR 2007), …
  – Listwise: AdaRank (SIGIR 2007), SVM-MAP (SIGIR 2007), SoftRank (LR4IR 2007), GPRank (LR4IR 2007), CCA (SIGIR 2007), LambdaRank (NIPS 2006), RankCosine (IP&M 2007), ListNet (ICML 2007), ListMLE (ICML 2008), …
Overview of Pointwise Approach

• Reduce ranking to regression or classification on single documents.
• Examples:
  – Discriminative model for IR.
  – McRank: learning to rank using multi-class classification and gradient boosting.
Discriminative Model for IR
(R. Nallapati, SIGIR 2004)

• Features are extracted for each document.
• Treat relevant documents as positive examples, irrelevant documents as negative examples.
• Use SVM or MaxEnt to perform discriminative learning.
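The pointwise recipe above can be sketched with a plain-Python logistic regression in place of SVM/MaxEnt: fit a classifier on per-document feature vectors, then rank by the learned score. The feature values below are illustrative, not from the paper:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=500):
    """Stochastic gradient ascent on the log-likelihood of binary labels."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, t in zip(X, y):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            w = [wi + lr * (t - p) * xi for wi, xi in zip(w, x)]
    return w

# Toy per-document features, e.g. (BM25-like score, term frequency).
X = [[2.0, 3.0], [1.8, 2.5], [0.2, 0.4], [0.1, 0.2]]
y = [1, 1, 0, 0]  # relevant = positive, irrelevant = negative
w = train_logistic(X, y)
score = lambda x: sum(wi * xi for wi, xi in zip(w, x))
# At test time, rank a query's documents by score, descending.
ranked = sorted(range(len(X)), key=lambda i: -score(X[i]))
print(ranked[0] in (0, 1))  # a relevant document ranks first
```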
McRank
(P. Li, et al. NIPS 2007)

• The loss in terms of DCG is upper bounded (although loosely) by the multi-class classification error.
• Use gradient boosting to perform multi-class classification:
  – K trees per boosting iteration.
  – Tree outputs are converted to probabilities using the logistic function.
  – Convert the classification results into a ranking.
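A hedged sketch of the last step: given per-document class probabilities over K relevance levels (as the boosted classifier would output), one way to obtain a ranking score, consistent with McRank's approach, is the expected relevance Σ_k k·p_k. The probability values are illustrative:

```python
def expected_relevance(probs):
    """Sum_k k * p_k over relevance levels k = 0 .. K-1."""
    return sum(k * p for k, p in enumerate(probs))

docs = {
    "d1": [0.1, 0.2, 0.7],  # likely highly relevant
    "d2": [0.6, 0.3, 0.1],  # likely irrelevant
    "d3": [0.2, 0.6, 0.2],
}
# Score each document by expected relevance and sort descending.
ranking = sorted(docs, key=lambda d: -expected_relevance(docs[d]))
print(ranking)  # ['d1', 'd3', 'd2']
```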
Discussions

• Assumption
  – Relevance is absolute and query-independent.
• However, in practice, relevance is query-dependent:
  – Relevance labels are assigned by comparing the documents associated with the same query.
  – An irrelevant document for a popular query might have a higher term frequency than a relevant document for a rare query.
Overview of Pairwise Approach

• No longer assumes absolute relevance.
• Reduce ranking to classification on document pairs (i.e. pairwise classification) w.r.t. the same query.
• Examples: RankNet, FRank, RankBoost, Ranking SVM, etc.
• Transform: for query q^(i), labeled documents (d_1^(i), 5), (d_2^(i), 3), …, (d_n^(i), 2) become preference pairs such as (d_1^(i), d_2^(i)), (d_1^(i), d_n^(i)), (d_2^(i), d_n^(i)), since 5 > 3, 5 > 2, and 3 > 2.
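The transform above can be sketched as a few lines that turn per-query ratings into preference pairs (a, b), meaning "a should rank above b":

```python
def to_pairs(labeled_docs):
    """labeled_docs: list of (doc_id, rating) for one query."""
    pairs = []
    for i, (da, ra) in enumerate(labeled_docs):
        for db, rb in labeled_docs[i + 1:]:
            if ra > rb:
                pairs.append((da, db))
            elif rb > ra:
                pairs.append((db, da))  # ties generate no pair
    return pairs

docs = [("d1", 5), ("d2", 3), ("d3", 2)]
print(to_pairs(docs))  # [('d1', 'd2'), ('d1', 'd3'), ('d2', 'd3')]
```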
RankNet
(C. Burges, et al. ICML 2005)

• New loss function: cross entropy (CE loss) between target posteriors (from the pairwise preference labels) and modeled posteriors.
• The model outputs a pairwise probability via the logistic function of the score difference.
RankNet (2)

• Use a neural network as the model, and gradient descent as the algorithm, to optimize the cross-entropy loss.
• Evaluate on single documents: output a relevance score for each document w.r.t. a new query.
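A hedged sketch of the RankNet pairwise loss: model the probability that document i beats document j with a logistic function of the score difference, and take cross entropy against the target posterior. Scores here are plain numbers standing in for neural-network outputs:

```python
import math

def pair_prob(s_i, s_j):
    """Modeled posterior P(i beats j) = logistic of the score difference."""
    return 1.0 / (1.0 + math.exp(-(s_i - s_j)))

def ce_loss(s_i, s_j, target):
    """target = 1 if i should rank above j, 0 if below, 0.5 if tied."""
    p = pair_prob(s_i, s_j)
    return -target * math.log(p) - (1 - target) * math.log(1 - p)

# A correctly ordered pair (preferred doc scored higher) has lower loss.
print(ce_loss(2.0, 0.0, 1.0) < ce_loss(0.0, 2.0, 1.0))  # True
```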
RankBoost
(Y. Freund, et al. JMLR 2003)

Use AdaBoost to perform pairwise classification:

• Given: initial distribution D_1 over document pairs (x_0, x_1).
• For t = 1, …, T:
  – Train a weak learner using distribution D_t.
  – Get weak ranking h_t : X → R.
  – Choose α_t ∈ R.
  – Update:
    D_{t+1}(x_0, x_1) = D_t(x_0, x_1) · exp(α_t (h_t(x_0) − h_t(x_1))) / Z_t
    where Z_t = Σ_{x_0, x_1} D_t(x_0, x_1) · exp(α_t (h_t(x_0) − h_t(x_1)))
• Output the final ranking: H(x) = Σ_{t=1}^{T} α_t h_t(x)
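A hedged sketch of this loop on 1-D scores with threshold weak rankers h(x) = 1[x > θ]. Pairs are (x_0, x_1) with x_1 preferred, so the update up-weights pairs the chosen weak ranker misorders; the α formula follows the usual RankBoost choice 0.5·ln((1+r)/(1−r)). Data and thresholds are illustrative:

```python
import math

def rankboost(pairs, thresholds, T=10):
    D = {p: 1.0 / len(pairs) for p in pairs}  # initial distribution D_1
    ensemble = []                              # list of (alpha_t, theta_t)
    for _ in range(T):
        # Weak ranker maximizing |r|, r = sum D * (h(x1) - h(x0)).
        def r_of(th):
            h = lambda x: 1.0 if x > th else 0.0
            return sum(D[(x0, x1)] * (h(x1) - h(x0)) for x0, x1 in pairs)
        theta = max(thresholds, key=lambda th: abs(r_of(th)))
        r = r_of(theta)
        if abs(r) >= 1.0 or r == 0.0:
            break
        alpha = 0.5 * math.log((1 + r) / (1 - r))
        h = lambda x, th=theta: 1.0 if x > th else 0.0
        # Distribution update with normalizer Z_t.
        newD = {(x0, x1): D[(x0, x1)] * math.exp(alpha * (h(x0) - h(x1)))
                for x0, x1 in pairs}
        Z = sum(newD.values())
        D = {p: v / Z for p, v in newD.items()}
        ensemble.append((alpha, theta))
    def H(x):  # final ranking H(x) = sum_t alpha_t h_t(x)
        return sum(a * (1.0 if x > th else 0.0) for a, th in ensemble)
    return H

# Preference pairs on a 1-D feature (second element preferred), with one
# contradictory pair so no single threshold is perfect.
pairs = [(0.1, 0.9), (0.2, 0.8), (0.7, 0.3)]
H = rankboost(pairs, thresholds=[0.25, 0.5, 0.75])
print(H(0.9) > H(0.1))  # True
```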
Ranking SVM
(R. Herbrich, et al. ICANN 1999)

Use SVM to perform pairwise classification:

  min_{w,ξ} (1/2)||w||² + C Σ_i ξ_i
  s.t. z_i ⟨w, x_i^(1) − x_i^(2)⟩ ≥ 1 − ξ_i, ξ_i ≥ 0

• Treat each difference vector x^(1) − x^(2) as an instance of learning.
• Use SVM to perform binary classification on these instances, to learn the model parameter w.
• Use f(x; ŵ) = ⟨ŵ, x⟩ for testing.
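A hedged sketch of the same objective as a pairwise hinge loss on difference vectors, optimized by subgradient descent in place of a QP solver. Features and pairs are illustrative:

```python
def rank_svm(pairs, dim, C=1.0, lr=0.01, epochs=200):
    """pairs: list of (x_pref, x_other) feature vectors, x_pref preferred."""
    w = [0.0] * dim
    for _ in range(epochs):
        grad = list(w)                      # gradient of (1/2)||w||^2
        for xp, xo in pairs:
            diff = [a - b for a, b in zip(xp, xo)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            if margin < 1:                  # hinge loss is active
                grad = [g - C * d for g, d in zip(grad, diff)]
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w

# Toy 2-D features; the first document of each pair should rank higher.
pairs = [([2.0, 1.0], [1.0, 0.5]), ([1.5, 2.0], [0.5, 1.0])]
w = rank_svm(pairs, dim=2)
score = lambda x: sum(wi * xi for wi, xi in zip(w, x))
print(score([2.0, 1.0]) > score([1.0, 0.5]))  # True
```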
Other Work in this Category

• FRank: A Fidelity Loss for Information Retrieval (M. Tsai, T. Liu, et al. SIGIR 2007)
• Linear discriminant model for information retrieval (LDM) (J. Gao, H. Qi, et al. SIGIR 2005)
• A Regression Framework for Learning Ranking Functions using Relative Relevance Judgments (GBRank) (Z. Zheng, H. Zha, et al. SIGIR 2007)
• A General Boosting Method and its Application to Learning Ranking Functions for Web Search (QBRank) (Z. Zheng, H. Zha, et al. NIPS 2007)
• Magnitude-preserving Ranking Algorithms (MPRank) (C. Cortes, M. Mohri, et al. ICML 2007)
Discussions

• Progress made as compared to the pointwise approach
  – No longer assumes absolute relevance.
• However,
  – The unique properties of ranking in IR have not been fully modeled.
Problems with Pairwise Approach (1)

• The number of instance pairs varies from query to query, so queries with many documents dominate the pairwise loss.
Problems with Pairwise Approach (2)

• Negative effects of making errors at top positions are not captured. Example (d: definitely relevant, p: partially relevant, n: not relevant):
  – ranking 1: p d p n n n n — one wrong pair, at the top: worse.
  – ranking 2: d p n p n n n — one wrong pair, lower down: better.
Problems with Pairwise Approach (3)

• Different rating pairs correspond to different ranking criteria:
  – Perfect: uniqueness, domain knowledge, authority.
  – Excellent / Good: content match.
  – Bad: detrimental.
As a Result, …

• Users only see the ranked list of documents, not document pairs.
• It is not clear how the pairwise loss correlates with IR evaluation measures, which are defined at the query level.

[Figure: pairwise loss vs. (1 − NDCG@5) on a TREC dataset.]
Incremental Solution to Problems (1)(2)
IRSVM: Change the Loss of Ranking SVM
(Y. Cao, J. Xu, T. Liu, et al. SIGIR 2006)

• Query-level normalizer.
• Implicit position discount: larger weight for more critical pairs.
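The two modifications can be sketched on top of a pairwise hinge loss: normalize each query by its pair count, and weight pairs by importance (e.g. pairs involving top ratings). The weights and the one-feature scorer below are illustrative:

```python
def irsvm_loss(queries, score):
    """queries: list of per-query pair lists [(x_pref, x_other, weight), ...]."""
    total = 0.0
    for pairs in queries:
        q_loss = sum(w * max(0.0, 1.0 - (score(xp) - score(xo)))
                     for xp, xo, w in pairs)
        total += q_loss / len(pairs)    # query-level normalizer
    return total / len(queries)

score = lambda x: x[0]                  # trivial one-feature scorer
q1 = [([2.0], [0.0], 2.0),              # critical (top-rating) pair, weight 2
      ([1.0], [0.5], 1.0)]
q2 = [([0.5], [1.5], 1.0)]              # a misordered pair
print(irsvm_loss([q1, q2], score))
```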
Incremental Solution to Problem (3)
Multi-Hyperplane Ranker
(T. Qin, T. Liu, et al. SIGIR 2007)

• Suppose there are k-level ratings:
  – m = k(k−1)/2 different types of document pairs:
    • (rank 1, rank 2), (rank 1, rank 3), (rank 2, rank 3), …
• Divide-and-conquer approach:
  – Train m hyperplanes using Ranking SVM, with each hyperplane only focusing on one type of document pair.
  – Combine the ranking results given by these hyperplanes to compose the final ranked list, by means of rank aggregation.
More Fundamental Solution:
Listwise Approach to Ranking

• Directly operate on ranked lists
  – Directly optimize IR evaluation measures: AdaRank, SVM-MAP, SoftRank, RankGP, …
  – Define listwise loss functions: RankCosine, ListNet, ListMLE, …
Overview of Listwise Approach

• Instead of reducing ranking to regression or classification, perform learning directly on the document list.
  – Treats ranked lists as learning instances.
• Two major categories:
  – Directly optimize IR evaluation measures.
  – Define listwise loss functions.
Directly Optimize IR Evaluation Measures

• It is a natural idea to directly optimize what is used to evaluate the ranking results.
• However, it is non-trivial:
  – Evaluation measures such as NDCG are usually non-smooth and non-differentiable.
  – It is challenging to optimize such objective functions, since most optimization techniques were developed for the smooth and differentiable case.
Tackle the Challenges

• Use optimization technologies designed for smooth and differentiable problems:
  – Fit the optimization of the evaluation measure into the logic of an existing optimization technique (AdaRank).
  – Optimize a smooth and differentiable upper bound of the evaluation measure (SVM-MAP).
  – Soften (approximate) the evaluation measure so as to make it smooth, differentiable, and easy to optimize (SoftRank).
• Use optimization technologies designed for non-smooth and non-differentiable problems:
  – Genetic programming (RankGP).
AdaRank
(J. Xu, et al. SIGIR 2007)

• Use Boosting to perform the optimization.
• Embed the IR evaluation measure in the distribution update of Boosting.
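A hedged sketch of this boosting loop: maintain a distribution over queries and up-weight queries on which the current ranker scores poorly under the IR measure E (a stand-in returning values in [0, 1], e.g. NDCG or MAP). The α formula follows AdaRank's choice; the distribution update is simplified here to use only the newest weak ranker rather than the full ensemble, and the rankers/measure values are illustrative:

```python
import math

def adarank(queries, weak_rankers, E, T=5):
    """queries: query ids; weak_rankers: candidates; E(h, q) in [0, 1]."""
    P = [1.0 / len(queries)] * len(queries)   # distribution over queries
    ensemble = []                              # list of (alpha, ranker)
    for _ in range(T):
        # Pick the weak ranker maximizing the weighted measure.
        h = max(weak_rankers,
                key=lambda hr: sum(p * E(hr, q) for p, q in zip(P, queries)))
        num = sum(p * (1 + E(h, q)) for p, q in zip(P, queries))
        den = sum(p * (1 - E(h, q)) for p, q in zip(P, queries))
        alpha = 0.5 * math.log(num / max(den, 1e-12))
        ensemble.append((alpha, h))
        # Embed the measure in the update: hard queries gain weight.
        w = [math.exp(-E(h, q)) for q in queries]
        Z = sum(w)
        P = [wi / Z for wi in w]
    return ensemble

# Toy measure values per (ranker, query).
scores = {("h1", "q1"): 0.9, ("h1", "q2"): 0.3,
          ("h2", "q1"): 0.5, ("h2", "q2"): 0.6}
E = lambda h, q: scores[(h, q)]
ens = adarank(["q1", "q2"], ["h1", "h2"], E, T=3)
print(ens[0][1])  # h1 is picked first (highest average measure)
```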
Theoretical Analysis

• The training error is continuously reduced during the learning phase when a certain condition on the weak rankers holds. [Condition formula omitted in the slide.]

Practical Solution

• In practice, we cannot guarantee that this condition holds. [Remainder of the slide omitted.]
SVM-MAP
(Y. Yue, et al. SIGIR 2007)

• Use the SVM framework for optimization.
• Incorporate IR evaluation measures in the listwise constraints:

  min (1/2)||w||² + (C/N) Σ_i ξ_i
  s.t. ∀y' ≠ y: F(q, d, y) ≥ F(q, d, y') + (1 − E(y', y)) − ξ_i

• The sum of slacks Σ ξ_i upper bounds 1 − E(y, y').
Challenges in SVM-MAP

• For Average Precision, the true labeling is a ranking where the relevant documents are all ranked in the front.
• An incorrect labeling would be any other ranking.
• There is an exponential number of rankings, and thus an exponential number of constraints!
Structural SVM Training

• STEP 1: Perform optimization on only the current working set of constraints.
• STEP 2: Use the model learned in STEP 1 to find the most violated constraint from the exponential set of constraints.
  – This step can also be very fast, since MAP is invariant to the order of documents within a relevance class.
• STEP 3: If the constraint returned in STEP 2 is more violated than the most violated constraint in the working set, add it to the working set.

Steps 1–3 are guaranteed to loop for at most a polynomial number of iterations [Tsochantaridis et al. 2005].
SoftRank
(M. Taylor, et al. LR4IR 2007)

• Add additive noise to the ranking result, and thus make NDCG "soft" enough for optimization.
• Define the probability of a document being ranked at a particular position.
From Scores to Rank Distribution

• Consider the score s_j of each document as a random variable (the deterministic score plus Gaussian noise).
• Compute the probability that doc i is ranked before doc j.
• From the pairwise probabilities, derive the expected rank of each document.
• Derive the full rank distribution p_j(r).
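The first two steps can be sketched in closed form: with Gaussian scores s_j ~ N(μ_j, σ²), the probability that doc i beats doc j is Φ((μ_i − μ_j)/(σ√2)), and the expected rank of doc j sums the probabilities of losing to each other document. The score means are illustrative:

```python
import math

def prob_i_before_j(mean_i, mean_j, sigma=1.0):
    """P(s_i > s_j) for independent Gaussian scores with equal variance."""
    z = (mean_i - mean_j) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF

def expected_rank(means, j, sigma=1.0):
    """Expected (0-based) rank of doc j = sum of probabilities of losing."""
    return sum(prob_i_before_j(mi, means[j], sigma)
               for i, mi in enumerate(means) if i != j)

means = [2.0, 0.5, 0.0]
print(expected_rank(means, 0) < expected_rank(means, 2))  # True
```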
Soft NDCG

• From hard to soft: replace the deterministic position discount with its expectation under the rank distribution.
• Use neural networks for optimization.
RankGP
(J. Yeh, et al. LR4IR 2007)

• Define the ranking function as a tree.
• Use genetic programming to learn the optimal tree.
  – GP is widely used to optimize non-smooth, non-differentiable objectives.
  – Crossover, mutation, reproduction, tournament selection.
  – Fitness function = IR evaluation measure.
Other Work in this Category

• Learning to Rank with Nonsmooth Cost Functions (C.J.C. Burges, R. Ragno, et al. NIPS 2006)
• A combined component approach for finding collection-adapted ranking functions based on genetic programming (CCA) (H. Almeida, M. Goncalves, et al. SIGIR 2007)
Discussions

• Progress has been made as compared to the pointwise and pairwise approaches.
• Still many unsolved problems:
  – Is the upper bound in SVM-MAP tight enough?
  – Is SoftNDCG highly correlated with NDCG?
  – Is there any theoretical justification of the correlation between the implicit loss function used in LambdaRank and NDCG?
  – Multiple measures are not necessarily consistent with each other: which one should we optimize?
Define Listwise Loss Functions

• Indirectly optimize IR evaluation measures
  – Consider the properties of ranking for IR.
• Examples:
  – RankCosine: uses the cosine similarity between score vectors as the loss function.
  – ListNet: uses the KL divergence between permutation probability distributions as the loss function.
  – ListMLE: uses the negative log likelihood of the ground-truth permutation as the loss function.
• Theoretical analysis of listwise loss functions.
RankCosine
(T. Qin, T. Liu, et al. IP&M 2007)

• The loss function is defined via the cosine similarity between the predicted score vector and the ground-truth score vector.
• Properties:
  – Insensitive to the number of documents / document pairs per query.
  – Emphasizes correct ranking at the top, by setting high scores for top documents in the ground truth.
  – Bounded.
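A hedged sketch of a RankCosine-style loss: one minus the cosine similarity between predicted and ground-truth score vectors, scaled here to [0, 1] (the exact normalization in the paper may differ):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rankcosine_loss(pred, truth):
    """0 for a perfect match; bounded above since cosine is in [-1, 1]."""
    return 0.5 * (1.0 - cosine(pred, truth))

truth = [6.0, 4.0, 3.0]
print(rankcosine_loss(truth, truth))                 # 0.0 for a perfect match
print(rankcosine_loss([3.0, 0.0, 1.0], truth) > 0)   # True
```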
Optimization

• Additive model as the ranking function.
• Loss summed over queries.
• Gradient descent for optimization.
More Investigation on Loss Functions

• A simple example:
  – function f: f(A)=3, f(B)=0, f(C)=1, giving the ranking ACB
  – function h: h(A)=4, h(B)=6, h(C)=3, giving the ranking BAC
  – ground truth g: g(A)=6, g(B)=4, g(C)=3, giving the ranking ABC
• Question: which function is closer to the ground truth?
  – Based on pointwise similarity: sim(f,g) < sim(g,h).
  – Based on pairwise similarity: sim(f,g) = sim(g,h).
  – Based on cosine similarity between score vectors: sim(f,g) = 0.85 < sim(g,h) = 0.93, so h appears closer.
• However, according to position discount, f should be closer to g (f puts A first, as g does)!
Permutation Probability Distribution

• Question: how to represent a ranked list?
• Solution:
  – Ranked list ↔ permutation probability distribution.
  – A more informative representation of a ranked list: permutations and ranked lists have a 1-1 correspondence.
Luce Model: Defining Permutation Probability

• The probability of permutation π is defined as

  P_s(π) = Π_{j=1}^{n} φ(s_{π(j)}) / Σ_{k=j}^{n} φ(s_{π(k)})

• Example:

  P_f(ABC) = [φ(f(A)) / (φ(f(A)) + φ(f(B)) + φ(f(C)))]
           · [φ(f(B)) / (φ(f(B)) + φ(f(C)))]
           · [φ(f(C)) / φ(f(C))]

  – First factor: P(A ranked No.1).
  – Second factor: P(B ranked No.2 | A ranked No.1) = P(B ranked No.1) / (1 − P(A ranked No.1)).
  – Third factor: P(C ranked No.3 | A ranked No.1, B ranked No.2).
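The definition above can be sketched directly with φ = exp, using the slide's ground-truth scores; the most and least probable permutations come out as the next slide claims:

```python
import math

def perm_prob(scores, pi, phi=math.exp):
    """scores: dict doc -> score; pi: permutation as a list of docs."""
    p = 1.0
    vals = [phi(scores[d]) for d in pi]
    for j in range(len(vals)):
        p *= vals[j] / sum(vals[j:])  # doc at position j wins among the rest
    return p

g = {"A": 6.0, "B": 4.0, "C": 3.0}
probs = {perm: perm_prob(g, list(perm))
         for perm in ("ABC", "ACB", "BAC", "BCA", "CAB", "CBA")}
print(max(probs, key=probs.get))  # ABC
print(min(probs, key=probs.get))  # CBA
```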
Properties of Luce Model

• Nice properties:
  – Continuous, differentiable, and convex w.r.t. the score s.
  – Suppose g(A)=6, g(B)=4, g(C)=3; then P(ABC) is largest and P(CBA) is smallest. For the ranking ABC, swapping any two documents decreases the probability, e.g. P(ABC) > P(ACB).
  – Translation invariant, when φ(s) = exp(s).
  – Scale invariant, when φ(s) = a·s.
Distance between Ranked Lists

• Use KL divergence (with φ = exp) to measure the difference between the permutation probability distributions induced by the score vectors:
  – dis(f, g) = 0.46
  – dis(g, h) = 2.56
• Now f is correctly judged closer to g.
ListNet Algorithm
(Z. Cao, T. Qin, T. Liu, et al. ICML 2007)

• Loss function = KL divergence between the two permutation probability distributions (φ = exp). With a linear scoring function s = w·x and top-k subgroups g = (j_1, …, j_k) ∈ G_k:

  L(w) = − Σ_{q∈Q} Σ_{g∈G_k} P_y(g) · log Π_{t=1}^{k} [ exp(w·x_{j_t}) / Σ_{u=t}^{n} exp(w·x_{j_u}) ]

• Model = neural network.
• Algorithm = gradient descent.
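The k = 1 case of the loss above reduces to cross entropy between two softmax distributions over scores, which can be sketched in a few lines; the score vectors below are illustrative:

```python
import math

def top1_dist(scores):
    """Top-1 permutation probabilities with phi = exp (a softmax)."""
    exps = [math.exp(s) for s in scores]
    Z = sum(exps)
    return [e / Z for e in exps]

def listnet_top1_loss(pred_scores, truth_scores):
    """Cross entropy between ground-truth and modeled top-1 distributions."""
    p_truth = top1_dist(truth_scores)
    p_pred = top1_dist(pred_scores)
    return -sum(pt * math.log(pp) for pt, pp in zip(p_truth, p_pred))

truth = [6.0, 4.0, 3.0]
good = [3.0, 2.0, 1.0]   # same ordering as the ground truth
bad = [1.0, 2.0, 3.0]    # reversed ordering
print(listnet_top1_loss(good, truth) < listnet_top1_loss(bad, truth))  # True
```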
Experimental Results

[Figure: training performance on a TREC dataset. The listwise loss (ListNet) is more closely correlated with the IR measure than the pairwise loss (RankNet).]
Limitations of ListNet

• Too complex in practice.
• The performance of ListNet depends on the mapping function from ground-truth labels to scores:
  – Ground truths are permutations (ranked lists) or groups of permutations.
  – Ground-truth labels are not naturally scores.
Solution

• Never assume the ground truth to be scores.
• Given the ranking model, compute the likelihood of a ground-truth permutation, and maximize it to learn the model parameters.
Likelihood Loss

• Top-k likelihood.
• Regularized likelihood loss.
ListMLE Algorithm
(F. Xia, T. Liu, et al. ICML 2008)

• Loss function = likelihood loss.
• Model = neural network.
• Algorithm = stochastic gradient descent.

[Figure: results on the TD2004 dataset.]
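The likelihood loss can be sketched as the negative log likelihood of the ground-truth permutation under the Luce model (φ = exp) of the predicted scores; only the ground-truth order is needed, no label-to-score mapping. Scores are illustrative:

```python
import math

def listmle_loss(pred_scores, truth_order):
    """truth_order: document indices from best to worst."""
    s = [pred_scores[i] for i in truth_order]
    loss = 0.0
    for j in range(len(s)):
        # -log P(this doc ranked here | earlier docs already placed)
        loss += math.log(sum(math.exp(v) for v in s[j:])) - s[j]
    return loss

truth_order = [0, 1, 2]
good = [3.0, 2.0, 1.0]   # scores agreeing with the ground-truth order
bad = [1.0, 2.0, 3.0]
print(listmle_loss(good, truth_order) < listmle_loss(bad, truth_order))  # True
```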
Summary

• Ranking is a new application, and also a new machine learning problem.
• Learning to rank: from the pointwise and pairwise approaches to the listwise approach.
• Many open questions.
[Timeline, 2000–2007, of learning-to-rank methods by category (* = papers published by our group). Pointwise: Discriminative IR model, McRank. Pairwise: Ranking SVM, RankBoost, LDM, RankNet, IRSVM, MHR, FRank, MPRank, QBRank, GBRank, Rank Refinement. Listwise: LambdaRank, RankCosine, AdaRank, SVM-MAP, SoftRank, RankGP, CCA, ListNet, ListMLE, SRA, Relational Ranking.]
References<br />
• R. Herbrich, T. Graepel, et al. Support Vector Learning for Ordinal Regression, ICANN 1999.<br />
• T. Joachims, Optimizing Search Engines Using Clickthrough Data, KDD 2002.<br />
• Y. Freund, R. Iyer, et al. An Efficient Boosting Algorithm for Combining Preferences, JMLR 2003.<br />
• R. Nallapati, Discriminative Models for Information Retrieval, SIGIR 2004.<br />
• J. Gao, H. Qi, et al. Linear Discriminant Model for Information Retrieval, SIGIR 2005.<br />
• C.J.C. Burges, T. Shaked, et al. Learning to Rank Using Gradient Descent, ICML 2005.<br />
• I. Tsochantaridis, T. Joachims, et al. Large Margin Methods for Structured and Interdependent<br />
Output Variables, JMLR 2005.<br />
• F. Radlinski, T. Joachims, Query Chains: Learning to Rank from Implicit Feedback, KDD 2005.<br />
• I. Matveeva, C.J.C. Burges, et al. High Accuracy Retrieval with Multiple Nested Ranker, SIGIR 2006.<br />
• C.J.C. Burges, R. Ragno, et al. Learning to Rank with Nonsmooth Cost Functions, NIPS 2006.<br />
• Y. Cao, J. Xu, et al. Adapting Ranking SVM to Information Retrieval, SIGIR 2006.<br />
• Z. Cao, T. Qin, et al. Learning to Rank: From Pairwise to Listwise Approach, ICML 2007.<br />
References (2)<br />
• T. Qin, T.-Y. Liu, et al. Multiple Hyperplane Ranker, SIGIR 2007.<br />
• T.-Y. Liu, J. Xu, et al. LETOR: Benchmark Dataset for Research on Learning to Rank for Information<br />
Retrieval, LR4IR 2007.<br />
• M.-F. Tsai, T.-Y. Liu, et al. FRank: A Ranking Method with Fidelity Loss, SIGIR 2007.<br />
• Z. Zheng, H. Zha, et al. A General Boosting Method and its Application to Learning Ranking<br />
Functions for Web Search, NIPS 2007.<br />
• Z. Zheng, H. Zha, et al. A Regression Framework for Learning Ranking Functions Using Relative<br />
Relevance Judgments, SIGIR 2007.<br />
• M. Taylor, J. Guiver, et al. SoftRank: Optimising Non-Smooth Rank Metrics, LR4IR 2007.<br />
• J.-Y. Yeh, J.-Y. Lin, et al. Learning to Rank for Information Retrieval Using Genetic Programming,<br />
LR4IR 2007.<br />
• T. Qin, T.-Y. Liu, et al. Query-level Loss Functions for Information Retrieval, Information Processing<br />
and Management, 2007.<br />
• X. Geng, T.-Y. Liu, et al. Feature Selection for Ranking, SIGIR 2007.<br />
• Y. Yue, T. Finley, et al. A Support Vector Method for Optimizing Average Precision, SIGIR 2007.<br />
References (3)<br />
• Y.-T. Liu, T.-Y. Liu, et al. Supervised Rank Aggregation, WWW 2007.<br />
• J. Xu and H. Li, AdaRank: A Boosting Algorithm for Information Retrieval, SIGIR 2007.<br />
• F. Radlinski, T. Joachims, Active Exploration for Learning Rankings from Clickthrough Data, KDD 2007.<br />
• P. Li, C. Burges, et al. McRank: Learning to Rank Using Multiple Classification and Gradient Boosting, NIPS 2007.<br />
• C. Cortes, M. Mohri, et al. Magnitude-Preserving Ranking Algorithms, ICML 2007.<br />
• H. Almeida, M. Goncalves, et al. A Combined Component Approach for Finding Collection-Adapted Ranking<br />
Functions Based on Genetic Programming, SIGIR 2007.<br />
• T. Qin, T.-Y. Liu, et al. Learning to Rank Relational Objects and Its Application to Web Search, WWW 2008.<br />
• R. Jin, H. Valizadegan, et al. Ranking Refinement and Its Application to Information Retrieval, WWW<br />
2008.<br />
• F. Xia, T.-Y. Liu, et al. Listwise Approach to Learning to Rank – Theory and Algorithm, ICML 2008.<br />
• Y. Lan, T.-Y. Liu, et al. On Generalization Ability of Learning to Rank Algorithms, ICML 2008.<br />
Advertisement<br />
• SIGIR 2008 workshop on learning to rank<br />
– http://research.microsoft.com/users/LR4IR-2008/<br />
– Submission due: May 16.<br />
• SIGIR 2008 tutorial on learning to rank<br />
– Includes more new algorithms<br />
– Includes more theoretical discussion<br />
Advertisement (2)<br />
• LETOR: Benchmark Dataset for Research on Learning to Rank<br />
– http://research.microsoft.com/users/LETOR/<br />
– T.-Y. Liu, J. Xu, et al. LR4IR 2007<br />
tyliu@microsoft.com<br />
http://research.microsoft.com/users/tyliu/<br />