Agenda
• UrbanSim Overview
• Simplest zone model
• Complete zone model
• Parcel model
• Computational Platform: OPUS
• Data Development

What Is UrbanSim?
UrbanSim is an integrated microsimulation model system for planning and analysis of urban development, incorporating the interactions between land use, transportation, and public policy.
[Figure: diagram linking Services, Governments, Transportation, Land, Housing, Developers, Floorspace, Households, Labor, and Business. Legend: flow of consumption from supplier to consumer; regulation or pricing.]

UrbanSim Objectives
Must assess the potential effects of policies
• Transportation projects (transit, highway, tolls)
– Massive investments with long lead times (10-30 years)
• Land use regulations (zoning, environmental restrictions)
– Very dispersed decision-making
Must be credible
• Reasonable accuracy of predictions
• Lack of bias
• Plausible precision
• Sensitive to relevant policies
• Understandable behavior

Primary UrbanSim Databases for Microsimulation
Primary inputs and outputs

Table         Primary key     Links to                             Records
Parcels       Parcel id       Zones, cities, zipcode, etc.         1.18 million parcels
Buildings     Building id     Parcel id                            1.0 million buildings
Households    Household id    Building id                          1.28 million households
Persons       Person id       Household id / Job id (if worker)    3.2 million people
Jobs          Job id          Building id                          1.85 million jobs

Counts are from Puget Sound model application

UrbanSim Models: Über-Über-Simplified Zonal Version
Household Location Models
• Household Transition Model
• Household Location Choice Model
Employment Location Models
• Employment Transition Model
• Employment Location Choice Model
No representation of supply side of real estate market, or prices. No relocation of agents once placed. Becomes an ‘incremental’ model, allocating growth.

UrbanSim Models: Über-Simplified Zonal Version
Household Location Models
• Household Transition Model
• Household Relocation Model
• Household Location Choice Model
Employment Location Models
• Employment Transition Model
• Employment Relocation Model
• Employment Location Choice Model
Being used in Research Triangle Park, North Carolina. No representation of supply side of real estate market, or prices. Last resort when there is no data on supply.

UrbanSim Models: Zonal Version
Land Development Models
• Real Estate Price Model
• Residential Development Project Location Choice Model
• Nonresidential Development Project Location Choice Model
• Building Construction Model
Household Location Models
• Household Transition Model
• Household Relocation Model
• Household Location Choice Model
Employment Location Models
• Employment Transition Model
• Employment Relocation Model
• Employment Location Choice Model

UrbanSim Models: Full Parcel Version
Land Development Models
• Process Pipeline Events
• Real Estate Price Model
• Expected Sale Price Model
• Development Proposal Choice Model
• Building Construction Model
Household Location Models
• Household Transition Model
• Household Relocation Model
• Household Location Choice Model
Employment Location Models
• Employment Transition Model
• Employment Relocation Model
• Employment Location Choice Model
Economic Transition Model
Workplace Location Models
• Home-based Job Choice Model
• Workplace Location Choice Model
• Job Change Model

Model Specification: Multinomial Logit
Household Location Choice
Predicts grid cell or parcel location choice
Applies to new and moving households
Multinomial Logit specification
Used 1999 Household Travel Survey
• 2,364 households who moved within 5 yrs
Variables used:
• Housing cost to income ratio
• Income * improvement value/unit
• Trip-weighted utility for HBW by SOV; Transit
• Near arterial road
• Housing density within walking distance
• Housing density if HH has children
• High density if HH is young
• Mixed use development if HH is young
• Percent High Income if HH is high income
• Percent Mid Income if HH is mid income
• Percent Low Income if HH is low income
• Percent Minority if HH is minority
• Percent Minority if HH is not minority
$U_i = u_i + \varepsilon_i$
$u_i = \beta_i \cdot x_i$
$P_i = \dfrac{e^{\beta_i \cdot x_i}}{\sum_{\forall i'} e^{\beta_{i'} \cdot x_{i'}}}$
Sample specification from Puget Sound

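A minimal sketch of how the multinomial logit probabilities above could be computed and used to draw a location. It assumes a single coefficient vector shared across alternatives, which is a simplification of the alternative-specific β_i on the slide. The function names, coefficient values, and attribute values are hypothetical, not taken from the Puget Sound specification.

```python
import math
import random


def mnl_probabilities(betas, alternatives):
    """Return P_i = exp(u_i) / sum_i' exp(u_i') with u_i = beta . x_i."""
    utilities = [sum(b * x for b, x in zip(betas, xs)) for xs in alternatives]
    top = max(utilities)  # subtract the maximum for numerical stability
    exp_u = [math.exp(u - top) for u in utilities]
    total = sum(exp_u)
    return [e / total for e in exp_u]


def choose_location(betas, alternatives, rng):
    """Draw one alternative index with probability proportional to exp(u_i)."""
    probs = mnl_probabilities(betas, alternatives)
    return rng.choices(range(len(alternatives)), weights=probs, k=1)[0]


# Hypothetical example: two variables per location
# (housing cost to income ratio, housing density within walking distance).
example_betas = [-2.0, 0.5]
example_locations = [(0.3, 1.0), (0.5, 2.0), (0.3, 1.0)]
example_probs = mnl_probabilities(example_betas, example_locations)
example_choice = choose_location(example_betas, example_locations, random.Random(0))
```

The systematic utilities here are -0.1, 0.0, and -0.1, so the second location receives the largest probability and the two identical locations receive equal probabilities.
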
Employment Location Choice
Predicts grid cell location choice
Applies to new and moving jobs
Multinomial Logit specification
1995-2000 employment change
Variables used:
• Land value in area
• Total value of land and improvements
• Trip-weighted (destination) utility for HBW by SOV
• Travel time to Seattle CBD
• Employment by sector in area
• Industrial; commercial sqft
• Near arterial road
• Near highway
• Housing density in area
• Percent low income in area; mid-income
• Building age
$U_i = u_i + \varepsilon_i$
$u_i = \beta_i \cdot x_i$
$P_i = \dfrac{e^{\beta_i \cdot x_i}}{\sum_{\forall i'} e^{\beta_{i'} \cdot x_{i'}}}$
Sample specification from Puget Sound

Real Estate Development Model
Predicts grid cell or parcel development events
Multinomial Logit specification
One equation per starting land use type
Used 1995-2000 development events
Variables used:
• Value of land and improvements
• Land value per acre in area
• Employment by sector in area
• Housing units in area
• Proximity to existing development
• Development composition in area
• Recent development events in area
• Travel time to Seattle CBD
• Trip-weighted travel utility
• Highway adjacency and distance from
• Percent: floodplain; water; wetland; stream buffer; steep slope
$U_i = u_i + \varepsilon_i$
$u_i = \beta_i \cdot x_i$
$P_i = \dfrac{e^{\beta_i \cdot x_i}}{\sum_{\forall i'} e^{\beta_{i'} \cdot x_{i'}}}$
Sample specification from PSRC

Model Assessment Methods
• Assessment of Individual Models
– Goodness of fit measures for models
– Confidence intervals for variables
• Assessment of Model System
– Longitudinal Validation

Validation of Workplace Choice Model
Individual-level New Logit model: RMSE 1440
Previous aggregate gravity model: RMSE 2558

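For reference, the root mean squared error used in this comparison can be sketched as below. The function name and the toy values are assumptions for illustration; the slide does not state which quantities (for example, zone-level worker counts) were compared.

```python
import math


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy illustration only: errors of 3 and 4 give sqrt((9 + 16) / 2).
example_rmse = rmse([3.0, 4.0], [0.0, 0.0])
```

A lower RMSE means the predictions sit closer to the observations on average, with large errors penalized more heavily than small ones.
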
Historical Validation from 1980 – 1994: Correlation of Simulated vs Observed 1994
Eugene-Springfield, Oregon

                        Cell     Zone     1-Cell Radius
Employment              0.805    0.865    0.917
Population              0.811    0.929    0.919
Nonresidential Sq ft    0.799    0.916    0.927
Housing Units           0.828    0.927    0.918
Land Value              0.830    0.925    0.908

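A small sketch of how such correlations could be computed at the cell level and again after aggregating cells to zones. The data are invented for illustration and are not the Eugene-Springfield results; the helper names `pearson` and `aggregate_to_zones` are hypothetical.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def aggregate_to_zones(values, zone_ids):
    """Sum cell values by zone id, returned in sorted zone order."""
    totals = defaultdict(float)
    for value, zone in zip(values, zone_ids):
        totals[zone] += value
    return [totals[zone] for zone in sorted(totals)]


# Invented cell-level data: six cells in three zones.
observed = [10, 20, 30, 40, 50, 60]
simulated = [14, 16, 35, 35, 48, 62]
zones = [1, 1, 2, 2, 3, 3]

cell_r = pearson(simulated, observed)
zone_r = pearson(aggregate_to_zones(simulated, zones),
                 aggregate_to_zones(observed, zones))
```

In this toy data the cell-level errors cancel within each zone, so the zone-level correlation is higher than the cell-level one, the same direction as the Cell and Zone columns in the table.
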
Historical Validation
Eugene-Springfield 1980-1994

Comparison of Predicted to Observed Changes from 1980-1994

Experience from UrbanSim Project
Robust system developed over first several years of project – but it was still too hard to do important tasks:
• Create input data (data integration and cleaning)
• Implement new models
• Modify specifications of existing models
• Add variables
• Change the spatial units of analysis
• Estimate models
• Diagnose models
• Visualize results
• Interface with travel models
• Generate indicators
• Manage runs and access and view results
Decided in January 2005 to design Open Platform for Urban Simulation

Design Requirements for Opus
• Open Source
• Very Highly Modular, Below Model Level
• Extensible by User-Contributed Packages
• Scripting Capacity
• High Performance for Production Use
• Integrate Model Estimation and Application
• Integrate Spatial Analysis and Visualization
• Represent Multiple Levels of Geography
• Allow Integration of Heterogeneous Components
– Computing languages
– Model scopes (e.g. land use, traffic assignment)
– Modeling approaches (discrete choice, ABM, rules)
• To our knowledge, no previous platform does all this.

Modular Choice Models
Selection of Choosers
Creation of Choice Sets
• Filters: feasible choices
• Samplers: Random, stratified, weighted
Variables
• Modular variable computation with dependencies implemented
Utilities
• Linear implemented, non-linear planned
Probabilities
• Multinomial logit implemented, others planned
Estimation
• MNL implemented, others planned
Choice selection
• Random, Lottery, Constrained

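The steps on this slide can be sketched as one small pipeline: filter feasible alternatives, sample a choice set, compute linear utilities and logit probabilities, then select a choice subject to capacity. This is an illustrative sketch only, not the Opus API; `run_choice_model`, the `x` attribute, and the `capacity` field are hypothetical names, and a single scalar coefficient stands in for a full specification.

```python
import math
import random


def run_choice_model(choosers, alternatives, beta, sample_size, rng):
    """Place each chooser in one alternative without exceeding capacities."""
    remaining = {alt_id: alt["capacity"] for alt_id, alt in alternatives.items()}
    placements = {}
    for chooser in choosers:
        # Filter: keep only alternatives that still have room.
        feasible = [alt_id for alt_id in sorted(remaining) if remaining[alt_id] > 0]
        if not feasible:
            break  # nothing left to choose from
        # Sampler: simple random sample of the feasible set.
        choice_set = rng.sample(feasible, min(sample_size, len(feasible)))
        # Utilities (linear) and unnormalized logit weights.
        utilities = [beta * alternatives[alt_id]["x"] for alt_id in choice_set]
        top = max(utilities)
        weights = [math.exp(u - top) for u in utilities]
        # Choice selection: random draw, constrained by remaining capacity.
        chosen = rng.choices(choice_set, weights=weights, k=1)[0]
        placements[chooser] = chosen
        remaining[chosen] -= 1
    return placements


# Hypothetical alternatives: attribute x and capacity per location.
example_alternatives = {
    "A": {"x": 1.0, "capacity": 2},
    "B": {"x": 0.5, "capacity": 1},
    "C": {"x": 2.0, "capacity": 0},  # full, so always filtered out
    "D": {"x": 0.2, "capacity": 3},
}
example_placements = run_choice_model(
    ["hh1", "hh2", "hh3", "hh4", "hh5"],
    example_alternatives, beta=1.0, sample_size=2, rng=random.Random(42),
)
```

Keeping the filter, sampler, utility, and selection steps as separate stages is what makes it possible to swap one of them (for example, a stratified sampler) without touching the others.
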
Integrated Estimation & Application
Too many problems arise from loosely coupled model estimation and application:
• Redundant specifications lead to errors
• Data setup is tedious and inefficient
• Getting estimation results into usable form is hard
• Experimentation and iteration are very costly and error prone
The solution:
• One repository for the model specifications
• Integrate model estimation into modular system
– Shares application code, adds estimation step

Integrated Visualization
Loosely coupled GIS is too inefficient
• Can export data for making pretty maps
• But too time consuming for exploratory work
• And what about dynamic maps – animations?
Need Integrated Spatial Analysis
• Solution: Python Numeric packages for image processing – fast spatial queries (Numpy, Ndimage)
Need Tightly Coupled Visualization
• Must be able to display data in memory on map
• E.g. Python with Mapnik

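To illustrate the kind of spatial query meant here, the sketch below computes, for every grid cell, the total of a value (say, housing units) within a square window of a given radius. It is written in plain Python for readability and is an assumption about the type of query, not the Opus implementation; with array libraries the same moving-window sum is typically done with a convolution-style filter such as those in `scipy.ndimage`.

```python
def neighborhood_sum(grid, radius):
    """Sum grid values in a (2*radius+1)-cell square window around each cell.

    Cells outside the grid are treated as contributing nothing.
    """
    rows, cols = len(grid), len(grid[0])
    result = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    total += grid[rr][cc]
            result[r][c] = total
    return result


# Hypothetical housing units per grid cell.
example_units = [
    [1, 0, 2],
    [0, 3, 0],
    [4, 0, 5],
]
example_within_one_cell = neighborhood_sum(example_units, radius=1)
```

A variable such as "housing density within walking distance" is this kind of neighborhood aggregate, which is why fast image-processing filters are a natural fit for gridcell data.
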
Integration of Heterogeneous Components: A Tiered Opus Architecture
• Opus Core: Python
• Opus Packages: Python
• External Libraries: C/C++

Opus External Libraries (C/C++ with Python Wrappers)
Statistical and Numeric
• LAPACK
• BLAS
• Numpy
• Scipy
• R
• Biogeme
Data Management and GIS
• MySQL
• Postgres
• SQLite
• PostGIS
• QGIS
• OpenEV
• PROJ4
• GDAL
Travel Models
• MATSIM
• Metropolis
• MALTA
• AMOS
• VISUM
• Emme/3
• Transcad
Packages in Bold have already been interfaced to Opus, remainder in progress

Data Integration in UrbanSim
UrbanSim Data Integration Process
New Model System Based onParcels and Buildings
Input Data: Land Use Plan
Input Data: UGB and Environmental
Input Data: Household Survey
Input Data: Employment
Data Integration Challenges
• Messy Data
– Many outliers, errors, and missing data
– Inconsistent coding schemes among data sources
• Difficult to integrate with other data sources
– Building-level data
– Business establishment data
– Market information (vacancies, prices, rents)
• Volume of data too massive to manually correct
– 2+ million parcels in Bay Area
• Problems hard to diagnose
– Which data is wrong? (Which attributes/sources are incorrect? May have systematic patterns of omissions, e.g. tax-exempt properties)
– Misgeocoding: some businesses are geocoded to the wrong place. Complicates the diagnosis.

The Magnitude of the Problem
This map shows only buildings with missing values for “Building Type ID”, a description variable.
Building Type ID = Null: 195,501 out of ~1,200,000
King, Kitsap, Pierce & Snohomish Co.
[Map labels: Seattle, Tacoma]

Data Imputation Tool?
Data Imputation Tool
Ways Forward on Data Integration
Option 1:
• Machine Learning/Data Mining
• Model patterns in the observed data
• Use the models to detect outliers, impute data
• Preliminary work on this now implemented using WEKA library
Option 2:
• In some cases, the missingness level is very high
• Developing countries (e.g. Ghana and South Africa)
• Potential to synthesize much of the data, subject to constraints, using procedural modeling
Option 3:
• Potential to hybridize statistical/machine learning and procedural modeling, to synthesize from disparate sources?

Data Imputation
K-Nearest Neighbors for Continuous Attributes
Attributes: Stories, Bldg SF, Improvement Value, etc.
KNN Basics:
• Finds k closest neighbors in n-dimensional space.
• Uses the k neighbors' target values to make a prediction.

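A minimal sketch of KNN imputation for a continuous attribute, following the two steps on the slide. It assumes min-max scaling of the features and a plain average of the neighbors' target values; both choices, the function name `knn_impute`, and the building records are illustrative assumptions, not the WEKA-based tool described in the talk.

```python
import math


def knn_impute(known, query, k=3):
    """Predict a missing target as the mean target of the k nearest records.

    known: list of (features, target) pairs with complete data.
    query: feature tuple of the record whose target is missing.
    """
    n_features = len(query)
    lows = [min(f[j] for f, _ in known) for j in range(n_features)]
    highs = [max(f[j] for f, _ in known) for j in range(n_features)]
    # Scale each feature by its range so no single unit dominates the distance.
    spans = [(hi - lo) or 1.0 for lo, hi in zip(lows, highs)]

    def scaled_distance(a, b):
        return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, spans)))

    nearest = sorted(known, key=lambda rec: scaled_distance(rec[0], query))[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Hypothetical buildings: (stories, building sq ft) -> improvement value.
example_buildings = [
    ((1, 1200), 150000),
    ((2, 2000), 260000),
    ((2, 2200), 280000),
    ((3, 3000), 400000),
    ((10, 50000), 9000000),
]
example_value = knn_impute(example_buildings, (2, 2100), k=2)
```

Here the two nearest records are the two-story buildings of 2,000 and 2,200 sq ft, so the imputed improvement value is their average.
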
Data Imputation
Support Vector Machines for Categorical Attributes
Attributes: Building Use Code, Land Use Code, etc.
• SVM maps training instances into higher dimensional space.
• Creates hyperplanes that have maximum distances from instances as category boundaries.

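A simplified sketch of the max-margin idea for a two-category attribute. It trains a linear SVM by batch subgradient descent on the regularized hinge loss, without the mapping to a higher-dimensional space (no kernel) that the slide mentions, and it handles only two classes. The labels (-1 for residential, +1 for commercial), the hyperparameters, and the records are hypothetical.

```python
def train_linear_svm(features, labels, lam=0.01, lr=0.05, epochs=500):
    """Fit w, b minimizing lam/2*|w|^2 + mean hinge loss on standardized features."""
    n, dims = len(features), len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(dims)]
    stds = [(sum((row[j] - means[j]) ** 2 for row in features) / n) ** 0.5 or 1.0
            for j in range(dims)]
    z = [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in features]
    w, b = [0.0] * dims, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for zi, yi in zip(z, labels):
            margin = yi * (sum(wj * zj for wj, zj in zip(w, zi)) + b)
            if margin < 1:  # inside the margin: hinge loss is active
                for j in range(dims):
                    grad_w[j] -= yi * zi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b, means, stds


def predict(model, row):
    """Return +1 or -1 depending on which side of the hyperplane the row falls."""
    w, b, means, stds = model
    score = sum(wj * (x - m) / s for wj, x, m, s in zip(w, row, means, stds)) + b
    return 1 if score >= 0 else -1


# Hypothetical records: (stories, floor area in thousand sq ft) -> use code.
example_features = [(1, 1.5), (2, 2.0), (1, 1.2), (5, 20.0), (8, 40.0), (6, 25.0)]
example_labels = [-1, -1, -1, 1, 1, 1]
example_model = train_linear_svm(example_features, example_labels)
```

A building with a missing use code would then be assigned the category returned by `predict(example_model, (stories, area))`. Attributes with more than two categories would need a multi-class scheme such as one-vs-rest, which this sketch does not cover.
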
Machine Learning/Data Mining
So far, only applied to single tables, single output
Need to develop analysis for:
• multivariate outcomes,
• across tables,
• some of which have poorly-defined (spatial) cross references
• and mixtures of continuous, categorical and ordered outcomes

Option 2: Procedural Modeling
[Figure: procedural modeling from Input to Output. Labels: Town center, Population, Parcels, Terrain, Parks, Jobs, Buildings.]

Results: Completion and Validation
[Figure panels: Real City, Synthetic City]