13.07.2015 Views

Zoran Pasaric: Polychoric correlation coefficient in forecast ... - FMI

Zoran Pasaric: Polychoric correlation coefficient in forecast ... - FMI

Zoran Pasaric: Polychoric correlation coefficient in forecast ... - FMI

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

<strong>Polychoric</strong> <strong>correlation</strong> <strong>coefficient</strong> <strong>in</strong><strong>forecast</strong> verification based onKxK cont<strong>in</strong>gency tables<strong>Zoran</strong> Pasarić and Josip JurasGeophysical Institute, Faculty of ScienceUniversity of Zagreb, CROATIAFourth International Verification Methods WorkshopHels<strong>in</strong>ki, 8 -10 June 2009


Outl<strong>in</strong>e The bivariate normal distribution (BND) and KxK table Example:- PCC for 11x11 tables of temperature change <strong>forecast</strong>s- Additional <strong>in</strong>formation: Biases and base rates- Reconstruction- Residual Summary of PCC More examples- QPF for the United States (6x6 tables)


CC = 0.9Bivariate normal distribution (BND)and K x K cont<strong>in</strong>gency table


Bivariate normal distribution (BND)and K x K cont<strong>in</strong>gency tableCC = 0.9 From CC towards the table


Bivariate normal distribution (BND)and K x K cont<strong>in</strong>gency tableCC = 0.9 From CC towards the table• From table towards the CC(ML method)• <strong>Polychoric</strong> Corelation Coefficient,PCC (Ritchie-Scott, 1918,Pearson,1922)


ExampleBrooks & Doswell (W&F,1996): Four 11 x 11 tables of temperature changesObservationCOLDER N.C. WARMERForecastWARMER N.C. COLDERNWSFOLFM-MOSNGM-MOSCONNWSFOCONForecast<strong>in</strong>gsystemForecast<strong>in</strong>gsystemLFM-MOSNGM-MOSCC(from B&D)0.910.870.880.90TCC, 5x5(ML method)0.9030.8480.8560.884TCC(ML method)0.9020.8560.8720.896PCC, 3x3(ML method)0.8830.8450.8250.862


ExampleBrooks & Doswell (W&F,1996): Four 11 x 11 tables of temperature changesObservationCOLDER N.C. WARMERForecastWARMER N.C. COLDERNWSFOLFM-MOSNGM-MOSCONNWSFOCONForecast<strong>in</strong>gsystemForecast<strong>in</strong>gsystemLFM-MOSNGM-MOSCC(from B&D)0.910.870.880.90TCC, 5x5(ML method)0.9030.8480.8560.884TCC(ML method)0.9020.8560.8720.896PCC, 3x3(ML method)0.8830.8450.8250.862


Differences: NWSFO - CONCOLDER N.C. WARMERWARMER N.C. COLDERTCC: 0.896 → 0.902


TCC measuresthe association, onlyAdditional <strong>in</strong>formation:Biases and marg<strong>in</strong>al frequencies of observations


ReconstructionForecastThe KxK cont<strong>in</strong>gency table:C 1C 2… … …C KP K1P K2P O,1P O,2ObservationC 1P 11P 12… P 1KP F,1C 2…P 21P 22…………C KP 2KP F,2… …P KKP F,KP O,K• Consider the table obta<strong>in</strong>ed by partition<strong>in</strong>ga normalized BND accord<strong>in</strong>g to some thresholds• From CC and marg<strong>in</strong>al frequencies it ispossible to reconstruct the whole table!Bias = (P F,1/P 1., …, P F,K-1/P O,K-1),P O= (P O,1, …, P O,K-1)K x K table (TCC, Bias, P OBS)+ residualK 2 1 + (K-1) + (K-1) + 1Total no. of elements


The residuals: OverallResidual table = Orig<strong>in</strong>al m<strong>in</strong>us theoretical (BND) tableNWSFONWSFOSums of absolute differences [%]LFMMOSNGMMOSCONNWSFO11 x 11 20.3 20.2 17.4 21.25 x 5 15.0 13.5 10.1 14.23 x 3 8.6 5.5 4.5 10.8


The residuals, cont.CON: TCC=0.896,resid=17.4%,N=590COLDER N.C. WARMERNWSFO: TCC=0.902,resid=21.2%COLDER N.C. WARMERWARMER N.C. COLDERWARMER N.C. COLDERPCC: 0.896 → 0.902Correction of 2 three-class errors improves the association ascorrection of 20, or so, one-class errors


Question• Sampl<strong>in</strong>g variability due to <strong>in</strong>sufficient sample size?or• Real features of the prognostic system ? Measure oriented Distribution oriented}Complementary approaches


Summary of PCC• Partition of <strong>in</strong>formationK x K table (PCC, Bias, P OBS) + residual• Reduction <strong>in</strong> dimensionalityK 2 2*K• The PCC, Biases and P OBSare <strong>in</strong>dependent of each other• Us<strong>in</strong>g them, the table could be essentially reconstructed• The distribution oriented approach could be applied to (usuallysmall) residual


Monte Carlo, cc=PCCMore examplesQPF, USA CONUShttp://www.hpc.ncep.noaa.gov/npvu/qpfv/PCC = 0.883


QPF, USA CONUS 2008Biases, frequencies of observations, residualPCC = 0.883sum(abs(differences)) = 1.15 %< 0.01 0.01 –0.10.1 –0.250.25 –0.50.5 – 1 1


Seasonal variation Slowly but constantly <strong>in</strong>creas<strong>in</strong>g trend Year to year variationsQPF, USA CONUS monthlyTime evolution of PCC for 6x6 tables


QPF, USA monthly tables, 2005-2009PCC-s and residualsAll 12 RFCs, togetherPCC Residual sum [%]


QPF, USA CONUS 2008Various scores for 2x2 tables fromDependence on the base rateORSSEDSTCCHSSPSSETS


GE 0.01 <strong>in</strong>chUSA CONUS 2001-2008Trends of various scoresGE 1 <strong>in</strong>chThe TCC approximations (Pearson,1900)


Many details not mentioned here, and especially so for the TCC, could be f<strong>in</strong>d <strong>in</strong>:J. Juras and Z. Pasarić (2006):Application of tetrachoric and polychoric <strong>correlation</strong><strong>coefficient</strong>s to <strong>forecast</strong> verification. Geofizika, 23, 59-82.(http://geofizika-journal.gfz.hr)Thanks for your attention!!

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!