Pair comparing method - VTI


Publisher: Swedish National Road and Transport Research Institute (VTI), SE-581 95 Linköping Sweden
Publication: VTI rapport 495A
Published: 2004
Project code: 80355
Author: Mats Wiklund
Project: Vehicle Speed and Flow in Different Roadway Conditions
Sponsor: Swedish National Road Administration
Title: Pair comparing method

Abstract (background, aims, methods, results) max 200 words:

When studying how a response variable depends on an explanatory variable it is convenient to create pairs of matched observations, i.e. pairs consisting of two observations that are homogeneous except for the studied explanatory variables and the response variable. The matching procedure is thus a method for controlling for nuisance variables. Normally the analysis of such studies is straightforward, but it can become complicated.

Usually an expert who is not necessarily a statistician creates the matched pairs. It is then common for this expert to find one pair member suitable for several pairs, which causes trouble for the analysing statistician. In most cases, however, the expert's understanding of the statistical problems is limited.

It has therefore been necessary to develop a statistical method for analysing the results of a matched pair study in which an observation unit is allowed to be a member of several pairs. An obvious consequence is that results from pairs with a common member will be dependent. The analysis method must therefore be applicable when there are dependencies between different pairs.

Another consequence of allowing the same object to occur in several pairs is that the data set may contain redundancy. This is the case if objects a and b form one pair, a and c another and b and c a third. The third pair is just the difference between the first two. A method is therefore also needed that removes redundant pairs such as b and c.

ISSN: 0347-6030
Language: English
No. of pages: 22




Foreword

In the project “Vehicle Speed and Flow in Different Roadway Conditions”, drivers’ behaviour under prevailing roadway conditions was studied. The design for gathering field data was chosen such that pairs of comparable observations were established. In order to analyse the results of the study it was necessary to develop a suitable statistical method. Mats Wiklund was asked to develop such a method. This report contains the documentation of that statistical method.

Linköping, August 2004
Carl-Gustaf Wallman




Contents

Summary
Sammanfattning
1 Background and purpose
2 The model
3 Elimination of redundant observations
4 Fitting procedure
4.1 Creating a tree
4.2 Cholesky decomposition
4.3 Statistical analysis
5 Some final comments and suggestions of further development
6 Acknowledgements
7 References




Pair comparing method
by Mats Wiklund
Swedish National Road and Transport Research Institute (VTI)
SE-581 95 Linköping Sweden

Summary

When studying how a response variable depends on an explanatory variable it is convenient to create pairs of matched observations, i.e. pairs consisting of two observations that are homogeneous except for the studied explanatory variables and the response variable. Sometimes it is common to find one object suitable as a member of several different pairs. It has therefore been necessary to develop a statistical method for analysing the results of a matched pair study in which one observation unit is permitted to be a member of several pairs. An obvious consequence is that results from pairs with a common member may be statistically dependent. Another is that an object that occurs in several pairs may cause redundancy in the data.

The data set is redundant if objects a and b form one pair, a and c another and b and c a third. The third pair is just the difference between the first two. A method is therefore needed that removes redundant pairs such as b and c. In the statistical analysis the outcomes of the response variable for the two members of a pair are compared; here the difference within each pair is determined. The differences for pair (a, b) and for pair (a, c) are then statistically dependent.

A method based on graph theory for finding and deleting redundant pairs is described. The objects define nodes in a graph, with an edge between two objects that form a pair. Redundancy in the data is equivalent to the presence of cycles in the graph. Cycles are identified by traversing the graph with a depth-first search.

In the statistical analysis a linear model is fitted to the observed data by the least squares method. However, the least squares expression involves a non-diagonal covariance matrix, since there are dependencies between pairs. Homoscedasticity is assumed, i.e. the response variables have constant variance, say $\sigma^2$. The variance of the difference of the response variable between the two members of a pair is then $2\sigma^2$, and the covariance between two pairs that have one common member is either $\sigma^2$ or $-\sigma^2$. It is then possible to find the correlation matrix for the pairs.

A Cholesky decomposition of the correlation matrix is carried out. The inverse of the resulting lower triangular matrix $L$ is then used to weight the observations. After that the statistical analysis is straightforward: the coefficients of the linear model are estimated by ordinary least squares.




Parjämförelsemetoden (Pair comparing method)
by Mats Wiklund
Statens väg- och transportforskningsinstitut (VTI)
581 95 Linköping

Sammanfattning (summary in Swedish)

When studying how a dependent variable depends on explanatory variables it can be appropriate to form matched pairs of observations, i.e. pairs whose members are homogeneous except with respect to the dependent variable and the explanatory variables being studied. Sometimes it is desirable to let one observation object be a member of several different matched pairs. It has therefore been necessary to develop a statistical method for analysing matched pair studies in which an observation object is allowed to be a member of several pairs. An obvious consequence of allowing several pairs to share a member is that the results from different pairs may be statistically dependent. Another is that an observation object that occurs in several pairs may cause redundancy in the data set.

The data set is redundant if observation objects a and b form one pair, a and c another and b and c a third. The third pair is then the difference of the first two. It is therefore necessary to develop a method for finding and excluding redundant pairs such as b and c. In the statistical analysis the outcome of the dependent variable is compared for the two members of a pair; here the difference of the dependent variable is determined. The differences for pair (a, b) and for pair (a, c) are then statistically dependent.

A method based on graph theory has been developed for finding and excluding redundant pairs. The objects define nodes in a graph, with an edge between two objects that form a pair. Redundancy is equivalent to the graph containing cycles. The cycles are found by searching through the graph with a depth-first method.

In the statistical analysis a linear regression model is fitted to the observed data by the least squares method. However, the least squares expression contains a non-diagonal covariance matrix, since some pairs are dependent. Homoscedasticity is assumed, i.e. the dependent variable has constant variance, say $\sigma^2$. The variance of the difference of the dependent variable between the two members of a pair is then $2\sigma^2$, and the covariance between two pairs that have a common member is either $\sigma^2$ or $-\sigma^2$. It is then possible to determine the correlation matrix for the pairs.

A Cholesky decomposition of the correlation matrix is carried out. The inverse of the resulting lower triangular matrix $L$ is used to weight the observations. The statistical analysis can then be carried out in the standard way: the coefficients of the linear model are estimated by ordinary least squares.




1 Background and purpose

The costs of winter maintenance of Swedish roads are very high. In order to make the operations as effective as possible it is important to have a thorough knowledge of how different winter road conditions affect the behaviour of road users.

For instance, how do road users react when there is black ice on the road? The poorer road grip should mean that they reduce their speed. It is, however, interesting to know whether they reduce their speed to such an extent that the accident risk is kept at the same level. In fact, it is possible that the speed reduction is so large that the accident risk decreases. If that is the case, one might ask whether it is reasonable to clear the ice, e.g. by spreading salt. Not clearing the ice will result in higher costs for transport and travel, i.e. cancelled or delayed trips, but there will be a benefit since the number of traffic accidents will decrease. On the other hand, clearing the ice by spreading salt might result in environmental damage but no delayed or cancelled trips. It is at least not obvious how to choose the most effective winter maintenance action.

One should therefore conclude that it is necessary to know what impact different winter road conditions have on vehicle speed and flow.

An ambitious study started late in the autumn of 1998. In a sample of road sections the winter road surface condition was observed as frequently as once every hour when necessary. For every hour the vehicle flow and the mean vehicle speed were measured as well. Matched pairs of hours were then created in such a way that it could be assumed that the mean speed and flow would have been the same if the winter road conditions had been the same. For example, Tuesday from 9 to 10 a.m. could be paired with Tuesday from 9 to 10 a.m. one week later.

One purpose of this report is to develop a statistical method for analysing matched pair studies in which different matched pairs may be statistically dependent. Another purpose is to find a numerical algorithm that identifies redundant pairs.


2 The model

The pairs are numbered by $i = 1, 2, \ldots, n$. In order to keep track of which objects belong to the different pairs an index variable, $u_{il}$, $l = 1, 2$, is introduced such that $u_{i1}$ indicates the identity of member 1 in pair $i$ and $u_{i2}$ indicates the identity of pair member 2. We don’t allow any object to form a pair with itself, i.e. $u_{i1} \neq u_{i2}$. Further, we only allow a pair once in the data set, so two pairs comprise at least three different individuals.

Let $Y_{i1}$ be the value of the response variable for member 1 in pair $i$ and $Y_{i2}$ the value for pair member 2. For instance, $Y_{il}$ is the mean speed for vehicles during one hour. Now $Y_{il}$ is assumed to have linear regression on a set of explanatory variables such that

\[ E[Y_{il}] = \alpha_i + \beta_1 x_{il1} + \cdots + \beta_p x_{ilp} = \alpha_i + x_{il}'\beta \]

In this case many of the explanatory variables are dummies for different types of winter road condition, where preferably dry bare ground is chosen as the reference level. The intercept $\alpha_i$ contains the effect of the matching variables and is therefore the same for both members of a pair. The intercept $\alpha_i$ may, however, differ between pairs.

Let $D_i = Y_{i1} - Y_{i2}$ be the difference of the response variable between the two pair members. Then the expected value of the difference between the first and the second pair member is

\[ E[D_i] = \beta_1 (x_{i11} - x_{i21}) + \cdots + \beta_p (x_{i1p} - x_{i2p}) = (x_{i1}' - x_{i2}')\beta \]

So the differences have linear regression on the corresponding differences of the explanatory variables. In fact it is handier to define new explanatory variables as the corresponding differences between the original variables, i.e. letting

\[ z_{ij} = x_{i1j} - x_{i2j}. \]

Then the model is

\[ E[D_i] = \beta_1 z_{i1} + \cdots + \beta_p z_{ip} = z_i'\beta. \tag{1} \]

Note that there is no constant or intercept in the linear model.

Suppose for instance that $x_{ilj}$ is 1 if road surface condition $j$ prevails for member $l$ in pair $i$ and 0 otherwise. In that case $z_{ij}$ is defined by

\[ z_{ij} = \begin{cases} 1 & \text{if road surface condition } j \text{ prevails at } u_{i1} \text{ but not at } u_{i2} \\ -1 & \text{if road surface condition } j \text{ prevails at } u_{i2} \text{ but not at } u_{i1} \\ 0 & \text{else.} \end{cases} \]
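The construction of the difference variables can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `difference_dummies`, the use of NumPy and the three example pairs are hypothetical and are not taken from the report.

```python
import numpy as np

def difference_dummies(cond_first, cond_second, conditions):
    """Build Z with z_ij = x_i1j - x_i2j for indicator variables x.

    cond_first[i]  : road surface condition observed for member 1 of pair i
    cond_second[i] : road surface condition observed for member 2 of pair i
    conditions     : list of condition labels, one column per label
    """
    column = {name: j for j, name in enumerate(conditions)}
    Z = np.zeros((len(cond_first), len(conditions)))
    for i, (c1, c2) in enumerate(zip(cond_first, cond_second)):
        Z[i, column[c1]] += 1.0   # condition prevails at member 1
        Z[i, column[c2]] -= 1.0   # condition prevails at member 2
    return Z

# Hypothetical example: three pairs, three possible conditions.
conditions = ["DB", "WB", "PS"]
Z = difference_dummies(["DB", "DB", "WB"], ["WB", "DB", "PS"], conditions)
print(Z)
```

A pair whose two members have the same condition gets a row of zeros and therefore carries no information about the coefficients. The column of the reference condition can be dropped before fitting.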


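Let the reference road surface condition be 1 and therefore not included in model (1). Then the parameter $\beta_j$ is interpreted as the expected change of the response variable $Y$, e.g. speed, when the road surface condition changes from 1 to $j$.

The differences $D_1, D_2, \ldots$ are, however, not necessarily independent. Assume homoscedasticity, i.e.

\[ \mathrm{Var}[Y_{il}] = \sigma^2 \]

for all $i$ and $l$. Then the covariance of two pair differences is

\[ \mathrm{Cov}(D_i, D_j) = \mathrm{Cov}(Y_{i1}, Y_{j1}) - \mathrm{Cov}(Y_{i1}, Y_{j2}) - \mathrm{Cov}(Y_{i2}, Y_{j1}) + \mathrm{Cov}(Y_{i2}, Y_{j2}) \]

where

\[ \mathrm{Cov}(Y_{ik}, Y_{jl}) = \begin{cases} \sigma^2 & \text{if } u_{ik} = u_{jl} \text{, i.e. the two observations refer to the same object} \\ 0 & \text{else.} \end{cases} \]

That gives a correlation matrix defined by

\[ \mathrm{Corr}(D_i, D_j) = c_{ij} = \begin{cases} -0.5 & \text{if either } u_{i1} = u_{j2} \text{ or } u_{i2} = u_{j1} \\ 0 & \text{if } u_{ik} \neq u_{jl} \text{ for } k, l = 1, 2 \\ 0.5 & \text{if either } u_{i1} = u_{j1} \text{ or } u_{i2} = u_{j2} \\ 1 & \text{if } i = j. \end{cases} \tag{2} \]

The column vector containing the pair differences,

\[ D = (D_1, D_2, \ldots, D_n)', \]

will then have the correlation matrix $C = \{c_{ij}\}_{i,j}$ and the covariance matrix $V = 2\sigma^2 C$. The regression coefficients $\beta$ are estimated, see for instance McCullagh and Nelder (1989), by finding the $\hat{\beta}$ that minimises the sum of squares

\[ (D - Z\beta)' V^{-1} (D - Z\beta) = \frac{1}{2\sigma^2} (D - Z\beta)' C^{-1} (D - Z\beta) \]

where $Z$ is the design matrix

\[ Z = \begin{bmatrix} z_{11} & z_{12} & \cdots & z_{1p} \\ z_{21} & z_{22} & \cdots & z_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ z_{n1} & z_{n2} & \cdots & z_{np} \end{bmatrix} \]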
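As an illustration of rule (2), the following minimal Python sketch builds the correlation matrix from a list of pairs. The function name `pair_correlation_matrix` and the example objects are hypothetical, and the sketch assumes, as the model does, that no pair occurs twice and that no object is paired with itself.

```python
import numpy as np

def pair_correlation_matrix(pairs):
    """Correlation matrix of the pair differences according to rule (2).

    pairs[i] = (u_i1, u_i2) holds the identities of the two pair members.
    """
    n = len(pairs)
    C = np.eye(n)                      # c_ii = 1
    for i in range(n):
        for j in range(i + 1, n):
            (a1, a2), (b1, b2) = pairs[i], pairs[j]
            if a1 == b1 or a2 == b2:   # common member in the same position
                c = 0.5
            elif a1 == b2 or a2 == b1: # common member in opposite positions
                c = -0.5
            else:                      # no common member
                c = 0.0
            C[i, j] = C[j, i] = c
    return C

# Hypothetical example with objects a..f.
pairs = [("a", "b"), ("a", "c"), ("d", "a"), ("e", "f")]
C = pair_correlation_matrix(pairs)
print(C)
```

The double loop is quadratic in the number of pairs, which is acceptable for a sketch. For a large data set it would be natural to build one such matrix per connected component instead.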


A Cholesky decomposition of the correlation matrix $C$ can then be carried out, see for instance Dahlquist and Björck (1974). Then $C = LL'$, where $L$ is a lower triangular matrix whose inverse is rather straightforward to determine. The sum of squares above may then be rewritten as

\[ (L^{-1}D - L^{-1}Z\beta)' (L^{-1}D - L^{-1}Z\beta) \tag{3} \]

i.e. after weighting the observations by $L^{-1}$, the studied variable as well as the explanatory variables, we are faced with a standard linear problem, which can be analysed with standard statistical software packages. The minimum is found by solving the normal equations

\[ (L^{-1}Z)' L^{-1}D = (L^{-1}Z)' L^{-1}Z\beta \tag{4} \]

and all statistical inference follows standard procedures. However, this presupposes that redundant observations have been deleted from the data set.
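The weighting in (3) and the solution of (4) can be sketched as follows. This is a minimal illustration, not the report's implementation: the function name `fit_weighted` and all numbers are made up, and NumPy is used in place of the software mentioned in the report.

```python
import numpy as np

def fit_weighted(D, Z, C):
    """Estimate beta by weighting with the inverse Cholesky factor of C."""
    L = np.linalg.cholesky(C)          # lower triangular, C = L L'
    Dw = np.linalg.solve(L, D)         # L^{-1} D without forming the inverse
    Zw = np.linalg.solve(L, Z)         # L^{-1} Z
    # Ordinary least squares without intercept on the weighted variables.
    beta, *_ = np.linalg.lstsq(Zw, Dw, rcond=None)
    return beta

# Made-up example: pairs 1 and 2 share their first member, pair 3 is separate.
C = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
Z = np.array([[1.0, 0.0],
              [1.0, -1.0],
              [0.0, 1.0]])
D = np.array([2.0, 5.0, -3.0])

beta = fit_weighted(D, Z, C)

# Direct minimiser of (D - Z b)' C^{-1} (D - Z b), for comparison.
Cinv = np.linalg.inv(C)
beta_direct = np.linalg.solve(Z.T @ Cinv @ Z, Z.T @ Cinv @ D)
print(beta, beta_direct)
```

Both routes give the same estimate. The weighted route has the practical advantage that the weighted data can be handed to any ordinary regression routine.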


3 Elimination of redundant observations

As mentioned above, if objects a and b are considered a matching pair, as are objects a and c, then b and c can’t be included as a matching pair, since the observed result of that pair is fully determined by the two others. To see this, let

\[ u_{11} = a, \quad u_{12} = b, \quad u_{21} = a, \quad u_{22} = c, \quad u_{31} = b, \quad u_{32} = c. \]

That implies

\[ Y_{11} = Y_{21}, \quad Y_{12} = Y_{31}, \quad Y_{22} = Y_{32} \]

and then it follows that

\[ D_3 = Y_{31} - Y_{32} = Y_{12} - Y_{22} = Y_{11} - Y_{22} - (Y_{11} - Y_{12}) = D_2 - D_1. \]

If all three pairs are included then the correlation matrix $C$ as well as the Hessian matrix $Z'Z$ will be singular. If $Z'Z$ is singular then $(L^{-1}Z)' L^{-1}Z$ in (4) above is also singular. Then the set of equations in (4) will have infinitely many solutions.

So whenever the data set includes pairs that are fully determined by the others, such pairs need to be excluded. We need a method to find them. One approach is to use graph theory.

The pairs can be represented in an undirected graph $G = (V, E)$, where $V$ is the set of nodes, which represent the objects or individuals, and $E$ is the set of edges, which represent the pairs, i.e. if $u, v \in V$ form a matched pair then there is an edge between them and they are called neighbours. Heggernes and Matstoms (1998) give a concise introduction to graph theory.

A path between nodes $u$ and $v$ is a sequence of nodes $[u, w_1, w_2, \ldots, w_k, v]$ that are connected by edges $(u, w_1), (w_1, w_2), \ldots, (w_k, v)$. If there’s a path from $u$ to $v$ then $v$ is reachable from $u$, and vice versa in an undirected graph.

A graph is connected if there is a path between every pair of nodes in the graph. If $U \subseteq V$ and $F \subseteq E$ then $H = (U, F)$ is a subgraph of $G = (V, E)$. If $G$ isn’t connected then it can be partitioned in a unique way into connected subgraphs called the connected components of $G$.

When the edges represent pairs it is possible to determine any pair difference along a path, and consequently there will be two ways to do that if there are two different paths connecting two nodes or objects.


If there are two non-identical paths between two nodes then there is a cycle in the graph. A graph without cycles is called a tree (strictly, a forest when it has several connected components, each of which is a tree). It can therefore be concluded that the graph in this case must be a tree in order to ensure that the correlation matrix $C$ is non-singular.

It is necessary to find an algorithm that searches through a graph and identifies redundant edges. Unnecessary edges can then be removed so that there are no cycles, while pairs of nodes that were reachable in the original graph are still reachable.
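The link between a cycle and a singular correlation matrix can be checked numerically for the three pairs (a, b), (a, c) and (b, c). The sketch below rests on one observation that follows from the model: if $A$ is the matrix with one row per pair, holding $+1$ in the column of member 1 and $-1$ in the column of member 2, then $D = AY$ and, under homoscedasticity with uncorrelated objects, $C = AA'/2$. The function name `correlation_from_incidence` is hypothetical.

```python
import numpy as np

def correlation_from_incidence(pairs, objects):
    """Return C = A A' / 2, where row i of A is +1 at u_i1 and -1 at u_i2."""
    index = {obj: k for k, obj in enumerate(objects)}
    A = np.zeros((len(pairs), len(objects)))
    for i, (u1, u2) in enumerate(pairs):
        A[i, index[u1]] = 1.0
        A[i, index[u2]] = -1.0
    return A @ A.T / 2.0

objects = ["a", "b", "c"]
C_cycle = correlation_from_incidence([("a", "b"), ("a", "c"), ("b", "c")], objects)
C_tree = correlation_from_incidence([("a", "b"), ("a", "c")], objects)

# With the cycle the 3 x 3 matrix is rank deficient; without it the
# 2 x 2 matrix has full rank.
print(C_cycle)
print(np.linalg.matrix_rank(C_cycle), np.linalg.matrix_rank(C_tree))
```

The entries of `C_cycle` agree with rule (2), and its rank is one less than its size because the third row of $A$ equals the second row minus the first.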


4 Fitting procedure

Each pair is represented by a row in the data matrix. Two columns contain the identifications of the pair members, while the others contain the difference in the response variable, $D_i$, and the differences of the explanatory variables, $z_{i1}, \ldots, z_{ip}$.

4.1 Creating a tree

In the first part redundant pairs are excluded. In order to do that the two columns containing the identifications of the pair members are extracted. A symmetric matrix, $G$, is created, where the number of rows (and columns) equals the number of unique pair members, i.e. each individual corresponds to one row and one column. If individuals $i$ and $j$ belong to the same pair, then elements $i, j$ and $j, i$ of $G$ are 1, otherwise 0. Since the members of a pair must not be the same individual, $G$ has zeros as diagonal elements.

Now edges in $G$ should be eliminated so that the resulting graph $T$ is a tree with the same nodes as $G$, and so that any pair of nodes that are mutually reachable in $G$ are mutually reachable in $T$ as well. To this end the graph $G$ is searched or traversed, i.e. each node is visited. In this case an algorithm called depth-first search is used, which is described by Heggernes and Matstoms (1998). In depth-first search the graph is searched recursively.

Assume that the search has currently reached node $u$. Then $u$ is marked as “visited”. If there are neighbours of $u$ that aren’t marked “visited”, then one of those, say $v$, is chosen arbitrarily. The edge between $u$ and $v$ is included in $T$ and the recursive procedure starts all over at node $v$. If $v$ has no neighbours that are unmarked then the recursion steps back to node $u$. This continues until no unmarked neighbours are left. Then a connected component in $T$ is completed. If there are unmarked nodes left in $G$, then one of those is chosen arbitrarily and a new connected component in $T$ is created. The procedure continues until all nodes in $G$ are marked. The new graph $T$ is a tree since only one visit to each node is permitted. It is also easy to check that nodes connected in $G$ are connected in $T$ as well. The idea of depth-first search is illustrated in figure 1.

[Figure 1: a graph with the four nodes “1”, “2”, “3” and “4”.]

Figure 1 The depth-first search procedure has traversed along the path “1”-“2”-“3”-“4”. Since the neighbour “2” of “4” has already been marked as “visited” the edge between them will not be included in the tree, T. Since there are no more neighbours of “4” to investigate the procedure steps back to “3”.

Since each pair is represented by a 1 in the graph $G$ it is straightforward to identify the pairs that are not represented in $T$ and eliminate the corresponding rows in the data matrix.
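The depth-first elimination described above can be sketched in Python. Two deliberate deviations from the description should be noted: the sketch stores the graph as adjacency lists instead of the matrix $G$, and it uses an explicit stack instead of recursion so that large data sets do not hit the recursion limit. The function name `spanning_forest` is hypothetical, and this is not the Matlab code referred to in the report.

```python
def spanning_forest(pairs):
    """Return the sorted row indices of the pairs kept in T.

    pairs[i] = (u, v) are the two member identities of pair i.
    Pairs that would close a cycle are left out.
    """
    neighbours = {}
    for row, (u, v) in enumerate(pairs):
        neighbours.setdefault(u, []).append((v, row))
        neighbours.setdefault(v, []).append((u, row))

    visited = set()
    kept = []
    for root in neighbours:                # one pass per connected component
        if root in visited:
            continue
        visited.add(root)
        stack = [iter(neighbours[root])]   # iterators remember their position
        while stack:
            for node, row in stack[-1]:
                if node not in visited:
                    visited.add(node)
                    kept.append(row)       # edge u-v is included in T
                    stack.append(iter(neighbours[node]))
                    break                  # go deeper from the new node
            else:
                stack.pop()                # no unmarked neighbours: step back
    return sorted(kept)

# The situation of figure 1: edges 1-2, 2-3, 3-4 and 4-2.
figure_1 = [(1, 2), (2, 3), (3, 4), (4, 2)]
print(spanning_forest(figure_1))
```

For the graph of figure 1 the edge between “4” and “2” is the one left out. Which redundant pair is dropped generally depends on the order in which neighbours are visited, but the number of pairs kept always equals the number of nodes minus the number of connected components.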


Table 1 shows the first 45 pairs from a data set that contains 7,233 pairs before eliminating redundant pairs. Only 11 of the first 45 pairs remain after eliminating redundant pairs, see table 2. In fact, only 801 of the 7,233 pairs in the data set remained after the elimination. It can be seen that in both tables time is used as the identification variable. The explanatory variables are different winter road surface conditions, described in table 3. There are more road surface variables, but for the first 45 pairs they all have the outcome zero, i.e. those road surface conditions didn’t occur among the first 45 pairs.

4.2 Cholesky decomposition

In the second part the pairs that remain in $T$ determine the correlation matrix $C$ by (2). Pairs that belong to different connected components of $T$ will be statistically independent, while there will be dependence between pairs within the same component. The matrix $C$ will then consist of submatrices on the diagonal, each corresponding to a connected component of $T$. Assume, for instance, that $T$ consists of $p$ connected components and that connected component $k$, $k = 1, 2, \ldots, p$, consists of $m_k$ pairs. Then the correlation matrix can be expressed as

\[ C = \begin{bmatrix} C_1 & 0 & \cdots & 0 \\ 0 & C_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & C_p \end{bmatrix} \]

where $C_k$ is the correlation matrix corresponding to connected component $k$ of $T$. $C_k$ is a symmetric $m_k \times m_k$ matrix. A Cholesky decomposition of each submatrix $C_k$ is carried out, using a standard routine in the Matlab software system, resulting in lower triangular matrices $L_k$.

The Cholesky decomposition of $C$ is a lower triangular matrix $L$ such that $C = LL'$, where

\[ L = \begin{bmatrix} L_1 & 0 & \cdots & 0 \\ 0 & L_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & L_p \end{bmatrix} \]
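That the Cholesky factor of a block-diagonal matrix can be assembled from the factors of its blocks is easy to confirm numerically. The sketch below uses NumPy rather than Matlab, and the helper `block_diag` as well as the two example blocks are made up for the illustration.

```python
import numpy as np

def block_diag(blocks):
    """Place square blocks on the diagonal of an otherwise zero matrix."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    start = 0
    for b in blocks:
        m = b.shape[0]
        out[start:start + m, start:start + m] = b
        start += m
    return out

# Two made-up components: one with two pairs, one with three pairs.
C1 = np.array([[1.0, 0.5],
               [0.5, 1.0]])
C2 = np.array([[1.0, -0.5, 0.0],
               [-0.5, 1.0, 0.5],
               [0.0, 0.5, 1.0]])

C = block_diag([C1, C2])
L_from_blocks = block_diag([np.linalg.cholesky(C1), np.linalg.cholesky(C2)])
L_full = np.linalg.cholesky(C)

print(np.allclose(L_from_blocks, L_full))
```

Working block by block matters in practice: with several hundred pairs spread over many components, the factorisations involve only small matrices instead of one large one.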


Table 1 The first 45 pairs from a data set, where redundant pairs haven’t been eliminated and observations haven’t been weighted. ID1 and ID2 are the pair members’ identifications, SPEED is the studied variable, D, and the other columns are explanatory variables referring to different winter road conditions.

No.  ID1                ID2                DB  WB  PS  R(B, MISC)   SPEED
  1  16-NOV-1999 00:00  23-NOV-1999 23:00   1   0   0          -1  -10.30
  2  16-NOV-1999 00:00  24-NOV-1999 00:00   1   0   0          -1    9.40
  3  16-NOV-1999 00:00  24-NOV-1999 01:00   1   0   0          -1   11.40
  4  16-NOV-1999 00:00  24-NOV-1999 23:00   1  -1   0           0    7.30
  5  16-NOV-1999 00:00  25-NOV-1999 00:00   1  -1   0           0  -11.30
  6  16-NOV-1999 00:00  25-NOV-1999 01:00   1  -1   0           0   -6.60
  7  16-NOV-1999 01:00  24-NOV-1999 00:00   1   0   0          -1   11.20
  8  16-NOV-1999 01:00  24-NOV-1999 01:00   1   0   0          -1   13.20
  9  16-NOV-1999 01:00  24-NOV-1999 02:00   1   0   0          -1    8.50
 10  16-NOV-1999 01:00  25-NOV-1999 00:00   1  -1   0           0   -9.50
 11  16-NOV-1999 01:00  25-NOV-1999 01:00   1  -1   0           0   -4.80
 12  16-NOV-1999 04:00  24-NOV-1999 03:00   1   0   0          -1    1.10
 13  16-NOV-1999 04:00  24-NOV-1999 04:00   1   0   0          -1   13.30
 14  16-NOV-1999 04:00  24-NOV-1999 05:00   1   0   0          -1    3.40
 15  16-NOV-1999 04:00  25-NOV-1999 03:00   1  -1   0           0  -15.50
 16  16-NOV-1999 04:00  25-NOV-1999 04:00   1  -1   0           0    1.30
 17  16-NOV-1999 04:00  25-NOV-1999 05:00   1  -1   0           0    3.20
 18  16-NOV-1999 05:00  18-NOV-1999 06:00   1   0  -1           0   13.80
 19  16-NOV-1999 05:00  24-NOV-1999 04:00   1   0   0          -1   18.50
 20  16-NOV-1999 05:00  24-NOV-1999 05:00   1   0   0          -1    8.60
 21  16-NOV-1999 05:00  24-NOV-1999 06:00   1   0   0          -1    7.80
 22  16-NOV-1999 05:00  25-NOV-1999 04:00   1  -1   0           0    6.50
 23  16-NOV-1999 05:00  25-NOV-1999 05:00   1  -1   0           0    8.40
 24  16-NOV-1999 05:00  25-NOV-1999 06:00   1  -1   0           0    1.80
 25  16-NOV-1999 05:00  29-NOV-1999 06:00   1   0  -1           0   14.70
 26  16-NOV-1999 06:00  18-NOV-1999 06:00   1   0  -1           0    8.60
 27  16-NOV-1999 06:00  18-NOV-1999 07:00   1   0  -1           0    7.20
 28  16-NOV-1999 06:00  22-NOV-1999 07:00   1   0   0          -1   10.10
 29  16-NOV-1999 06:00  23-NOV-1999 07:00   1   0   0          -1    6.50
 30  16-NOV-1999 06:00  24-NOV-1999 05:00   1   0   0          -1    3.40
 31  16-NOV-1999 06:00  24-NOV-1999 06:00   1   0   0          -1    2.60
 32  16-NOV-1999 06:00  24-NOV-1999 07:00   1   0   0          -1    8.90
 33  16-NOV-1999 06:00  25-NOV-1999 05:00   1  -1   0           0    3.20
 34  16-NOV-1999 06:00  25-NOV-1999 06:00   1  -1   0           0   -3.40
 35  16-NOV-1999 06:00  25-NOV-1999 07:00   1  -1   0           0     .70
 36  16-NOV-1999 06:00  29-NOV-1999 06:00   1   0  -1           0    9.50
 37  16-NOV-1999 06:00  29-NOV-1999 07:00   1   0  -1           0   23.80
 38  16-NOV-1999 06:00  30-NOV-1999 07:00   1   0  -1           0   13.80
 39  16-NOV-1999 07:00  18-NOV-1999 06:00   1   0  -1           0   10.50
 40  16-NOV-1999 07:00  18-NOV-1999 07:00   1   0  -1           0    9.10
 41  16-NOV-1999 07:00  18-NOV-1999 08:00   1   0  -1           0    2.20
 42  16-NOV-1999 07:00  22-NOV-1999 07:00   1   0   0          -1   12.00
 43  16-NOV-1999 07:00  22-NOV-1999 08:00   1   0   0          -1    3.70
 44  16-NOV-1999 07:00  23-NOV-1999 07:00   1   0   0          -1    8.40
 45  16-NOV-1999 07:00  23-NOV-1999 08:00   1   0   0          -1    6.90


Table 2 The remaining pairs from table 1, after eliminating redundant pairs and weighting the observations. ID1 and ID2 are the pair members’ identifications, SPEED is the studied variable, D, and the other columns are explanatory variables referring to different winter road conditions.

No.  ID1                ID2                  DB    WB     PS  R(B, MISC)   SPEED
  1  16-NOV-1999 00:00  23-NOV-1999 23:00  1.00   .00    .00       -1.00  -10.30
  7  16-NOV-1999 01:00  24-NOV-1999 00:00  1.00   .00    .00       -1.00   11.20
  9  16-NOV-1999 01:00  24-NOV-1999 02:00   .58   .00    .00        -.58    3.35
 12  16-NOV-1999 04:00  24-NOV-1999 03:00  1.00   .00    .00       -1.00    1.10
 13  16-NOV-1999 04:00  24-NOV-1999 04:00   .58   .00    .00        -.58   14.72
 18  16-NOV-1999 05:00  18-NOV-1999 06:00  1.00   .00  -1.00         .00   13.80
 19  16-NOV-1999 05:00  24-NOV-1999 04:00   .26   .00    .77       -1.03    4.80
 27  16-NOV-1999 06:00  18-NOV-1999 07:00  1.00   .00  -1.00         .00    7.20
 28  16-NOV-1999 06:00  22-NOV-1999 07:00   .58   .00    .58       -1.15    7.51
 39  16-NOV-1999 07:00  18-NOV-1999 06:00   .77   .00   -.26        -.52    7.05
 41  16-NOV-1999 07:00  18-NOV-1999 08:00   .65   .00  -1.09         .44   -3.08

Table 3 Description of the variables displayed in tables 1, 2 and 4. All variables except speed are indicator or dummy variables.

Abbreviation    Translation
DB              Dry bare roadway
MB              Moist bare roadway
WB              Wet bare roadway
BI              Black ice
PS              Hard-packed snow
TI              Thick ice
LS              Loose snow
SL              Slush
R(B, LS/SL)     Ruts with bare surface in the ruts and hard-packed snow/thick ice outside the ruts
R(B, MISC)      Ruts with bare surface in the ruts and miscellaneous layers outside the ruts
R(BI, PS/TI)    Ruts with black ice in the ruts and hard-packed snow/thick ice outside the ruts
R(BI, LS/SL)    Ruts with black ice in the ruts and loose snow/slush outside the ruts
R(BI, MISC)     Ruts with black ice in the ruts and miscellaneous layers outside the ruts
SPEED           Mean speed for passing passenger cars

In order to accomplish the weighting in (3) it then remains to find the inverse of $L$. This is somewhat simplified by the fact that

\[ L^{-1} = \begin{bmatrix} L_1^{-1} & 0 & \cdots & 0 \\ 0 & L_2^{-1} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & L_p^{-1} \end{bmatrix}. \]

Then $D$ and $Z$ are weighted by multiplying them from the left by the inverse of $L$.
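As a worked check of the weighting, the sketch below applies it to pairs 7 and 9 of table 1. It assumes that these two pairs form a connected component of their own, with correlation 0.5 because they share their first member (16-NOV-1999 01:00); that assumption is not stated in the report, but the result can be compared with table 2. NumPy is used in place of Matlab.

```python
import numpy as np

# Correlation matrix of the assumed component: pairs 7 and 9 share member 1.
C_k = np.array([[1.0, 0.5],
                [0.5, 1.0]])
L_k = np.linalg.cholesky(C_k)

# Unweighted rows 7 and 9 of table 1.
# Columns: DB, WB, PS, R(B, MISC), SPEED
raw = np.array([[1.0, 0.0, 0.0, -1.0, 11.2],
                [1.0, 0.0, 0.0, -1.0, 8.5]])

# Multiply by the inverse of L_k by solving a triangular system.
weighted = np.linalg.solve(L_k, raw)
print(np.round(weighted, 2))
```

The first row is unchanged, and the second row comes out as roughly .58, .00, .00, -.58 and 3.35, which agrees with pair 9 in table 2 to the two decimals shown there.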


When comparing the dependent and explanatory variables in tables 1 and 2 one immediately finds that they differ as a consequence of the weighting procedure, see pairs number 9, 13, 19, 28, 39 and 41. After the weighting the dependent and explanatory variables take new values. In most rows the sign is unchanged, but not in all: for pair 41, SPEED changes from 2.20 to -3.08.

4.3 Statistical analysis

The weighted variables, $L^{-1}D$ and $L^{-1}Z$, can be transferred to any standard statistical software environment, e.g. SPSS, and analysed there. It should be pointed out that it follows from (1) that no constant or intercept may be included in the fitted model.

Table 4 shows an example of the output from the statistical analysis. The result indicates, for instance, that the mean speed decreases by 5.1 km/h on thick ice compared to dry bare roadway.

Table 4 Result of fitting the total data set from tables 1 and 2 to model (1). Note that there is no intercept. The interpretation of the coefficients is the expected difference in mean speed between the corresponding road surface condition and the condition DB.

Coefficients (a, b)

                Unstandardized coefficients    Standardized coefficients
Model 1         B             Std. Error       Beta          t          Sig.
MB              -4.82E-02     1.250            -.002         -.039      .969
WB              2.460         1.248            .082          1.972      .049
BI              -4.888        .943             -.344         -5.185     .000
PS              -10.074       .996             -.563         -10.110    .000
TI              -5.092        1.114            -.289         -4.572     .000
LS              -13.626       1.714            -.275         -7.949     .000
SL              -9.779        1.915            -.179         -5.106     .000
R(B, LS/SL)     4.917         4.940            .030          .995       .320
R(B, MISC)      -2.352        1.102            -.101         -2.135     .033
R(BI, PS/TI)    -6.001        1.140            -.310         -5.262     .000
R(BI, LS/SL)    -5.592        1.485            -.153         -3.766     .000
R(BI, MISC)     -5.359        1.614            -.125         -3.321     .001

a. Dependent Variable: SPEED
b. Linear Regression through the Origin


5 Some final comments and suggestions of further development

The described method is very flexible for the analysis of matched pair designs, because the researcher conducting the study does not need to consider the statistical problems. However, in this case, where the matching procedure follows strict and well-defined rules, there might be other and perhaps less complicated approaches. Nevertheless, they should all end up with the same results and conclusions.

The homoscedasticity assumption, i.e. constant variance, can be doubted in this case, since it implies that the variance of the mean speed does not depend on the number of passing vehicles. The variance should be smaller for means based on more vehicles. However, the speeds of two vehicles should not be considered independent random variables either, so the variance cannot be inversely proportional to the number of passing vehicles. More research on this matter is needed.


6 Acknowledgements

I want to thank the Swedish National Road Administration (SNRA), which funded the project Winter Maintenance Management Model.

I would also like to thank Pontus Matstoms, VTI, who proposed graph theory and the depth-first search method when I, in despair, first consulted him on the problem of redundancy in the data set. Pontus also put me in contact with Pinar Heggernes. Pinar advised me on how to use the depth-first search method and also adapted Matlab code that she had developed to my purposes.


7 References

Dahlquist, G. and Björck, Å. (1974) Numerical methods. Prentice-Hall, Englewood Cliffs, N.J.

Heggernes, P. and Matstoms, P. (1998) Partitioning a set of functions into correlated subsets. VTI rapport 416A, Swedish National Road and Transport Research Institute. Linköping.

McCullagh, P. and Nelder, J. A. (1989) Generalized linear models. (2nd ed.) Chapman and Hall, London.
