
SSO 2006 Symposium April 20, 2006 New Frontiers in Statistics Pre ...


STATISTICAL SOCIETY OF OTTAWA
SOCIÉTÉ STATISTIQUE D'OTTAWA

CHAPTER - ASA
SECTION - ASA
REGIONAL ASSOCIATION - SSC
ASSOCIATION RÉGIONALE - SSC

President/Président
Timothy Ramsay
tramsay@ohri.ca

Vice-President/Vice-président
John Nash
(613) 562-5800 x4796 (work/travail)
email: nashjc@uottawa.ca

President-Elect/Président-désigné

Past-President/Président-sortant
Eric Rancourt
(613) 951-5046 (work/travail)
(613) 951-1462 (FAX/télécopieur)
email: Eric.Rancourt@statcan.ca

Program Coordinator/Coordonnateur de programme
Dena Schanzer
(613) 946-0461 (work/travail)
(613) 954-5414 (FAX/télécopieur)
email: Dena_Schanzer@phac-aspc.gc.ca

Treasurer/Trésorier
Manchun Fang
(613) 737-7600 x4107 (work/travail)
(613) 738-4800 (FAX/télécopieur)
email: mfang@cheo.on.ca

Secretary/Secrétaire
Cynthia Bocci
(613) 951-4885 (work/travail)
(613) 951-1462 (FAX/télécopieur)
email: cynthia.bocci@statcan.ca

SSO 2006 Symposium
April 20, 2006
New Frontiers in Statistics

Statistics Canada
120 Parkdale Ave.
R.H. Coats Building, Tunney’s Pasture
Simon Goldberg Room (basement level)
Ottawa, Ontario

The theme of this symposium is new frontiers of statistics. Nowadays, much of the most exciting statistical work revolves around problems with large, complex databases for which the standard linear models are not particularly useful. Progress with these types of problems usually comes from applying so-called 'statistical thinking' to the issue at hand while developing novel methods of analysis that may superficially not even seem to qualify as statistical models. Usually, there is a heavy computational aspect to the analysis, with a machine learning or data mining flavor. This symposium features speakers from many areas of statistics.

Pre-registration is HIGHLY RECOMMENDED (see form).

The confirmed speakers (abstracts follow):

Giles Hooker, McGill University, Department of Psychology
An ODE to Statistics: Data Analysis for System Dynamics

Alan F. Karr, National Institute of Statistical Sciences (North Carolina)
Secure Statistical Analysis of Distributed Databases

David Martell, University of Toronto, Faculty of Forestry
Statistical Analysis of Forest Fire Activity

Steven Wang, York University, Department of Mathematics and Statistics
Clustering Categorical Data in Large Databases

Douglas B. Woolford, University of Western Ontario, Department of Statistics and Actuarial Sciences
Convergent Data Sharpening Applied to Lightning

Mu Zhu, University of Waterloo, Department of Statistics and Actuarial Sciences
Rare Target Detection with LAGO

Agenda:

9:00-9:20    Welcoming remarks
9:20-10:00   Alan F. Karr’s presentation
10:00-10:30  Coffee Break
10:30-11:10  Mu Zhu’s presentation
11:10-11:50  Steven Wang’s presentation
11:50-13:00  Lunch
13:00-13:40  David Martell’s presentation
13:40-14:20  Douglas B. Woolford’s presentation
14:20-14:50  Coffee Break
14:50-15:30  Giles Hooker’s presentation


An ODE to Statistics: Data Analysis for System Dynamics
Giles Hooker, McGill University

Abstract

Ordinary differential equations (ODEs) have a long history in modelling the evolution of systems over time. They have been widely used in the physical sciences and engineering and are the object of increasing interest in the biological sciences: modelling disease dynamics (both in populations and in individuals), neural firing processes, human kinematics and ecological systems. These new applications correspond to systems with less accurate measurements than are found in the physical sciences, and the proposed models can only be described as approximations, no longer being derived from first principles. As such, there is a new need for statistical methodology for such models.

In this talk, I will showcase the power of ODEs to intuitively describe a wide range of complex behavior and discuss some of the difficulties associated with fitting them to data and methods for doing so. I will present the development of diagnostic techniques for understanding poorly fit models and show that there is a correspondence between the techniques of data analysis in linear regression and those needed to understand system dynamics. Finally, I will list some important open problems that touch on a wide range of traditional statistical concerns.

Secure Statistical Analysis of Distributed Databases
Alan F. Karr, National Institute of Statistical Sciences (North Carolina)

Abstract

A continuing need in contexts from national statistics to homeland security to business is for statistical analyses that "integrate" data stored in multiple, distributed databases. However, barriers to actually integrating the databases, which include data confidentiality, proprietary data and scale, are numerous and not easy to overcome.

For many analyses, however, it is not actually necessary to integrate the data. Instead, using methods based on techniques from computer science known as secure multi-party computation, the database holders can share analysis-specific sufficient statistics anonymously, but in a way that the desired analysis can be performed in a statistically valid manner. Four illustrative analyses will be presented: regression for horizontally partitioned data, secure data integration, secure contingency tables and secure maximum likelihood for exponential family models.

Partially trusted third parties (PTTPs) will also be introduced. PTTPs hold some information not available to the database holders, but to their mutual benefit, removing or at least attenuating unilateral incentives for database holders to "cheat" by reporting false data or sufficient statistics.

Statistical Analysis of Forest Fire Activity
David Martell, University of Toronto

Abstract

Forest fires are common in many of the forest regions of Canada, and fire and forest managers must account for their uncertain occurrence and their potential impact on people, property and natural ecosystem processes. I will present a brief overview of forest fire management with emphasis on decision-making and the importance of understanding and predicting fire activity across broad spatial and temporal scales. I will then describe some of the statistical analyses of fire activity that have been completed and important challenges that remain.
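
As a rough illustration of the idea in Alan F. Karr's abstract above (regression for horizontally partitioned data through shared sufficient statistics), the following Python sketch pools per-holder sums for a one-predictor regression using a simulated ring-style secure summation. It is not taken from the talk: the function names, the modulus, the fixed-point scale and the toy data are all assumptions made for this example, and a real protocol would run across separate parties rather than in one process.

```python
import random
from fractions import Fraction

MODULUS = 2**61 - 1  # assumed public modulus, far larger than any true total
SCALE = 1000         # assumed fixed-point scale (three decimal places)


def local_stats(xs, ys):
    """One holder's sufficient statistics for simple linear regression."""
    xi = [round(x * SCALE) for x in xs]
    yi = [round(y * SCALE) for y in ys]
    return [
        len(xi),                             # n
        sum(xi),                             # sum of x
        sum(yi),                             # sum of y
        sum(a * a for a in xi),              # sum of x^2
        sum(a * b for a, b in zip(xi, yi)),  # sum of x*y
    ]


def secure_sum(values, rng):
    """Simulate ring-style secure summation in a single process.

    The first holder adds a secret random mask, each holder adds its own
    value modulo MODULUS, and the first holder removes the mask at the end,
    so intermediate running totals reveal nothing about individual values.
    """
    mask = rng.randrange(MODULUS)
    running = mask
    for v in values:
        running = (running + v) % MODULUS
    total = (running - mask) % MODULUS
    # Decode with a centred representation so negative totals survive.
    return total - MODULUS if total > MODULUS // 2 else total


def fit_pooled(holders, rng):
    """Least-squares slope and intercept from securely summed statistics."""
    per_holder = [local_stats(xs, ys) for xs, ys in holders]
    n, sx, sy, sxx, sxy = (secure_sum(col, rng) for col in zip(*per_holder))
    slope = Fraction(n * sxy - sx * sy, n * sxx - sx * sx)
    intercept = (sy - slope * sx) / (n * SCALE)
    return slope, intercept


# Hypothetical data: three holders, each with different rows of (x, y).
holders = [
    ([1, 2, 3], [3, 5, 7]),
    ([4, 5], [9, 11]),
    ([6, 7, 8, 9], [13, 15, 17, 19]),
]
slope, intercept = fit_pooled(holders, random.Random(0))
print(slope, intercept)
```

The toy data lie exactly on y = 2x + 1, so the pooled fit recovers that line even though no holder's rows are ever combined directly; only the masked running totals are passed along.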


Clustering Categorical Data in Large Databases
Steven Wang, York University

Abstract

We introduce two algorithms to cluster categorical data. The first clustering algorithm is designed to handle nominal categorical data based on Hamming distance vectors. The proposed method is conceptually simple and straightforward, as it does not require any statistical model. It can also detect the number of clusters automatically without any user input. The significance of a possible cluster is determined by a modified Pearson chi-square test. The second algorithm tries to handle ordinal categorical data sets with dependent structures. It was initially based on the empirical probability distribution. Although it is more general than the first algorithm, it is computationally intensive and not applicable to large data sets. We then propose a less rigorous but more efficient algorithm based on the empirical probability distribution. Comparisons with well-known clustering algorithms such as K-modes and AutoClass show that the proposed algorithms outperform their competitors for some well-known real data sets. The computational complexity and future work will also be discussed.

Convergent Data Sharpening Applied to Lightning
Douglas B. Woolford and W. John Braun, University of Western Ontario

Abstract

We wish to relate forest fire ignitions to lightning strike occurrences through the analysis of Ontario lightning and fire data, supplied by the Ontario Ministry of Natural Resources. However, due to the sheer volume of the lightning data, as well as accuracy and missing data issues, changes to the data are required prior to any such investigation. Initial explorations of the data indicate that it may be useful to cluster the lightning strokes in space-time. We propose a mode-seeking clustering algorithm based on a convergent form of data sharpening methods. Data sharpening, as an algorithm, nudges observations closer to their nearest local mode(s) at each iteration. We propose to iterate the algorithm until convergence, showing that the data will converge to either local or global modes. The usefulness of the algorithm in the lightning context is threefold: first, the lightning data can be reduced to corresponding local spatial-temporal modes; second, slight modifications result in a noise-reduction method that can be applied to estimate short-term spatial track(s) of lightning storm system(s); third, the sharpened data provide a means for a bootstrap-based simulation of spatial lightning strike patterns.

Rare Target Detection with LAGO
Mu Zhu, University of Waterloo

Abstract

LAGO is a computationally efficient tool for finding rare targets in a database. To do so, LAGO scores every item in the database with a specialized radial basis function network (RBFnet), trained with some learning data. Suppose p1 is the density function of the rare class and p0 the density function of the background class. The RBFnet constructed by LAGO is an adaptive-bandwidth kernel density estimator of p1, adjusted locally by a factor that approximates p0 to first order. The resulting scoring function f(x) is thus approximately a monotonic transformation of the posterior probability that item x belongs to the rare class.

The original LAGO (now called eLAGO) uses elliptical radial basis functions, which can adapt more flexibly to the training data. A simpler (but perhaps more generally useful) variation that uses spherical radial basis functions (sLAGO) is now available.
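
To make the mode-seeking idea in the Woolford and Braun abstract above more concrete, here is a minimal one-dimensional Python sketch that repeatedly moves each observation to a Gaussian-kernel weighted average of the data until the points stop moving. It uses a mean-shift style update as a stand-in: the authors' actual sharpening step, kernel and stopping rule are not given in the abstract, so the update rule, bandwidth, tolerance and toy data here are all assumptions.

```python
import math


def sharpen_step(points, data, bandwidth):
    """Move each point to the kernel-weighted mean of the original data."""
    moved = []
    for p in points:
        weights = [math.exp(-0.5 * ((p - d) / bandwidth) ** 2) for d in data]
        moved.append(sum(w * d for w, d in zip(weights, data)) / sum(weights))
    return moved


def sharpen_until_converged(data, bandwidth, tol=1e-9, max_iter=1000):
    """Iterate the sharpening step until no point moves by more than tol."""
    points = list(data)
    for _ in range(max_iter):
        new_points = sharpen_step(points, data, bandwidth)
        shift = max(abs(a - b) for a, b in zip(points, new_points))
        points = new_points
        if shift < tol:
            break
    return points


# Hypothetical data: two well-separated groups of observations.
data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
modes = sharpen_until_converged(data, bandwidth=0.5)
print([round(m, 3) for m in modes])
```

With this toy data each observation collapses onto the centre of its own group, so the six points reduce to two local modes, which is the kind of data reduction the abstract describes for lightning strokes.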


Registration Form

ANNUAL SYMPOSIUM OF THE STATISTICAL SOCIETY OF OTTAWA
New Frontiers in Statistics

Thursday, April 20, 2006
8:30 a.m. - 4:00 p.m.

Statistics Canada
R.H. Coats Building, Tunney’s Pasture
Simon Goldberg Room (basement level)
Ottawa, Ontario

Please use the Holland Street entrance and note that ID is required to access the building.

Registration Fees - Cash or cheque only

$75 for members if received by April 7, 2006, $85 late-registration fee
$75 + $10 membership for non-members if received by April 7, 2006, $95 late-registration fee
$50 for full-time students if received by April 7, 2006, $60 late-registration fee

Registration fee includes coffee breaks and a catered lunch.

Please make cheques payable to "The Statistical Society of Ottawa" and provide the following information:

Name: __________________________________________
Affiliation: _______________________________________
Telephone: _______________________________________
I am a member of ASA ( ), SSC ( ), SSO ( )

Please send completed registration form along with cheque to:

Carole Jean-Marie
Statistics Canada
Business Survey Methods Division
11th Floor, R.H. Coats Building, Tunney’s Pasture
Ottawa, Ontario K1A 0T6
Tel. (613) 951-0827
Fax. (613) 951-1462
Email: Carole.Jean-Marie@statcan.ca
