17.07.2015 Views

What is GBIF ? - Extranet


Economies of Scale
Advancements in information management technologies now allow us to collect data for one purpose and utilize it many times, often for purposes other than those for which it was originally collected.
Global Biodiversity Information Facility


What is GBIF?
GBIF is an international scientific co-operative project based on a multilateral agreement (MoU) between countries, economies and international organisations, dedicated to:
• establishing an interoperable, distributed network of databases containing scientific biodiversity information
• with initial focus on species- and specimen-level data,
• with links to molecular, genetic and ecosystems levels


Why was GBIF established?
• To make primary scientific data about biodiversity openly and freely accessible to everyone, no matter where in the world they live.
• Calls from governments, industry and the public for scientific biodiversity information are increasing steadily.
• Basic biodiversity information is needed for environmental decision-making, scientific inquiry, and economic development.


GBIF Participants: 47 Countries
• Argentina
• Australia
• Austria
• Belgium
• Benin
• Bulgaria
• Cameroon
• Canada
• Colombia
• Costa Rica
• Czech Republic
• Denmark
• Equatorial Guinea
• Estonia
• Finland
• France
• Germany
• Ghana
• Guinea
• Iceland
• India
• Indonesia
• Japan
• Korea
• Madagascar
• Mexico
• Morocco
• Netherlands
• New Zealand
• Nicaragua
• Norway
• Pakistan
• Papua New Guinea
• Peru
• Philippines
• Poland
• Portugal
• Slovakia
• Slovenia
• South Africa
• Spain
• Sweden
• Switzerland
• Taiwan
• Tanzania
• UK
• USA


GBIF Associate Participants: 31 International Organizations
• ARCBC
• ASEANET
• BioNET
• BIOSIS
• BGCI
• CABI Bioscience
• CBOL
• EASIANET
• European Commission
• ETI
• Finding Species
• FreshwaterLife
• ICIPE
• IABIN
• ICZN
• ITIS
• IUCN
• NatureServe
• Nordic Gene Bank
• NSCA
• OBIS
• PBIF
• SAFRINET
• SBSV
• Species 2000
• TDWG
• UNESCO - MAB
• UNEP – WCMC
• WDCBE
• WFCC
• Wildscreen Trust


Everything GBIF does is in partnership with others
• IUCN/Species Information Service
• CBD/Global Taxonomy Initiative
• Global Strategy for Plant Conservation


What do we mean by primary biodiversity data?
• Label data on ~1.5 - 3.0 billion specimens in natural history collections, herbaria, botanical gardens, etc.
• Associated notes, recordings, publications, etc.
• Observational data (e.g. bird banding data)
• These data have been amassed over ~300 years; most not digital
• Big legacy data problem


Biodiversity Knowledge
“Taken collectively, the plant and animal specimens in the world’s museum collections combined with recent observational and monitoring data provide our most complete picture of the biological diversity of the planet.”


Emergent Properties of Large Data Sets
• Combine institutional and national information resources into a large, hemispheric whole
• Take advantage of 3,000,000,000 specimens in world museums
• New relevance to systematics, scientific collections, and biodiversity studies
• Integrate into a truly useful scientific resource for basic and applied studies


Standards for data exchange are the key


and protocols
• Simple web services
• XML messaging between computer applications
• This is open data sharing -- not data exchange with trade partner agreements
• Enables search & retrieval of structured data
• Enables single point of access (portal/search) to distributed information resources
• Created by the TDWG/CODATA subgroup on biological collection data
• Unified protocol in 2005: Merger of DiGIR and BioCASe to TAPIR, “TDWG Access Protocol for Information Retrieval”


Darwin Core and ABCD data formats
• Two XML schemata for data exchange available and to choose from
• Darwin Core is a minimal set
• 48 elements in flat structure
• Can be extended for instance, curatorial, bacteriological, observational, crops, ...
• ABCD (Access to Biological Collection Data) is a superset
• 1200 elements in hierarchical structure
• Can describe entire collection


Primary species occurrence core data includes but is not limited to the following essential details:
• Name of the taxon to which the organism has been assigned
• Location where the specimen was collected or the observation made
• Date on which the specimen was collected or the observation made
• Where the specimen or record is held and how to access more information
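The four essential details listed above can be sketched as a minimal record type with a completeness check. This is an illustrative Python sketch only: the class name, the field names and the helper are hypothetical and are not Darwin Core element names.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class OccurrenceRecord:
    """Hypothetical minimal occurrence record; field names are illustrative."""
    taxon_name: str              # taxon to which the organism has been assigned
    locality: str                # where it was collected or observed
    event_date: Optional[date]   # when it was collected or observed
    holding_institution: str     # where the specimen or record is held
    more_info: str               # how to access more information


def missing_essentials(record: OccurrenceRecord) -> List[str]:
    """Return the names of essential fields that are empty or absent."""
    missing = []
    for name in ("taxon_name", "locality", "holding_institution", "more_info"):
        if not getattr(record, name).strip():
            missing.append(name)
    if record.event_date is None:
        missing.append("event_date")
    return missing
```

A record that lacks a locality and a date would be reported as incomplete on exactly those two fields, which is the kind of gap a provider might want to see before sharing.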


as Darwin Core


Network Structure
[Diagram: GBIF Portal; Participant Portals or Nodes; Data-rich local sources]
Distributed, local or regional, specialised databases


GBIF Information Architecture
[Diagram labels: User Requests; Query; Biodiversity Data Access Portal; Biodiversity Data Index; Taxonomic Name Service (ECAT); Catalogue of Life; GBIF Data Nodes; Specimen Data; Observation Data; Checklists; Metadata; Links to other Data]


GBIF's Fundamental Principles
• Equitable sharing of data
• Data providers retain control
• Protection of intellectual property rights
• Distributed network architecture
• Common standards and protocols
• Partnerships are paramount
• Avoidance of duplication of effort
• Promotion of technical developments to deal with complexity of biodiversity data


GBIF Data Sharing Agreement
1. Biodiversity data accessible via the GBIF network are openly and universally available to all users within the framework of the GBIF Data Use Agreement and with the terms and conditions that the data provider has identified in its metadata.
2. GBIF does not assert any intellectual property rights in the data that is made available through its network.
3. The data provider warrants that they have made the necessary agreements with the original owners of the data that it can make the data available through GBIF network.
4. The data provider makes reasonable efforts to ensure that the data they serve are accurate.
5. Responsibility regarding the restriction of access to sensitive data resides with the data provider.
6. The data provider includes stable and unique identifier in their data so that the owner of the data is known and for other necessary purposes.
7. GBIF Secretariat may cache a copy and serve full or partial data further to other users together with the terms and conditions for use set by the data provider. Queries of such data through the GBIF Secretariat are reported to the data provider.
8. Data providers are endorsed by a GBIF Participant, if applicable, before their metadata is made available by the GBIF Secretariat.
9. GBIF Secretariat is not responsible for data content or the use of the data.
10. GBIF Secretariat is not liable or responsible, nor are its employees or contractors, for the data contents; or for any loss, damage, claim, cost or expense however it may arise, from an inability to use the GBIF network.


GBIF Data Use Agreement
1. The quality and completeness of data cannot be guaranteed. Users employ these data at their own risk.
2. Users shall respect restrictions of access to sensitive data.
3. In order to make attribution of use for owners of the data possible, the identifier of ownership of data must be retained with every data record.
4. Users must publicly acknowledge, in conjunction with the use of the data, the data providers whose biodiversity data they have used. Data providers may require additional attribution of specific collections within their institution.
5. Users must comply with additional terms and conditions of use set by the data provider. Where these exist they will be available through the metadata associated with the data.


Growth rate of GBIF data sharing
[Chart: Growth in Data Sharing, Oct 2003 - Aug 2005. Series: Providers, Records. Axes: Data Providers (0 to 140); Data Records in millions (0 to 90); months from Oct-03 to Aug-05.]


Currently … as of 30 Sep 2005


Name and concept data standards
• Taxonomic Concept Schema (Napier Schema) includes
• LinneanCore for name data, incl. publication
• Circumscriptions of specimens
• “The best quality representations of a concept would explicitly define the concept, by recording the circumscription of the concept by other concepts, character descriptions or biological specimens.”
• Name provider package will be available from GBIF in Q3/2005
• To use TAPIR protocol and UBIF metadata
• Species 2000 SPICE protocol (with the Common Data Model) remains the most widely used standard for exchange of name data


Growth in names data…
[Chart: Actual and Projected Growth in Names Data Content. Axes: Number of Known Species (0 to 2,000,000); Percent of Total (0% to 120%); years 2000 to 2011.]


GBIF's role in biodiversity information networks


DiGIR-BioCASe Protocol and Nested Networks
User requests:
• Get DiGIR-style records, each with a set of Darwin Core descriptors and a complete ABCD Unit
• Get complete ABCD documents from each BioCASe provider
• Get Darwin Core records where darwin:ScientificName equals Puma concolor from any provider.
• Get standard plant genetic resource Passport data for all crop types.
• Get full set of Soy Bean crop descriptors.
Providers of Taxon Occurrence records and the schemas they serve:
• BioCASe Provider: Darwin Core, ABCD
• BioCASe Provider: Darwin Core, ABCD
• MaNIS Provider: Darwin Core, Curatorial
• OBIS Provider: Darwin Core, Marine
• IPGRI Banana Provider: Darwin Core, IPGRI Passport, Banana Descriptor
• IPGRI Soy Bean Provider: Darwin Core, IPGRI Passport, Soy Bean Descriptor
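The request "Get Darwin Core records where darwin:ScientificName equals Puma concolor from any provider" can be illustrated with a small in-memory sketch. This is not the DiGIR, BioCASe or TAPIR wire protocol: the function name, the plain dictionaries standing in for remote providers, and the sample records are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, str]


def query_providers(providers: Dict[str, Iterable[Record]],
                    field: str, value: str) -> Iterator[Record]:
    """Yield matching records from every provider, tagged with their source."""
    for provider_name, records in providers.items():
        for record in records:
            if record.get(field) == value:
                # Keep the provider name with the record so attribution survives.
                yield {**record, "provider": provider_name}


# Made-up sample data standing in for remote providers.
providers = {
    "MaNIS Provider": [{"ScientificName": "Puma concolor", "Country": "Mexico"}],
    "OBIS Provider": [{"ScientificName": "Orcinus orca", "Country": "Norway"}],
    "BioCASe Provider": [{"ScientificName": "Puma concolor", "Country": "Peru"}],
}

matches: List[Record] = list(
    query_providers(providers, "ScientificName", "Puma concolor")
)
```

The point of the sketch is that a shared minimal schema lets one query run unchanged against every provider, whatever richer schema each provider also serves.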


Why share data through GBIF?
• The value of data is in its use
• Data that potential users are not aware of or cannot access is of little or no value
• Still, a significant proportion of biodiversity data is under-utilised because potential users are not aware of its existence or cannot access it
• Increased awareness and utilisation of existing species-level biodiversity data highlights the importance of natural history collections and observational data
• Effective sharing of data/information/knowledge increases the value of taxonomic research for society
• Will in the longer term increase funding opportunities
• Synergistic effects in combining data: 1+1>2
• Exposing information leads to improved quality
• Feedback and data cleansing
• GBIF is fundamental enabling research infrastructure
• You must deposit your data in GenBank in order to publish...
• What is the scientific merit of sharing data?


Issues and Concerns of Data Providers
• Data Quality Issues


Quality
Quality, as applied to data, has various definitions, but in the geographic world one definition is now largely accepted – that of “fitness for use” (Chrisman 1983).


Fitness for use
In a database, the data have no actual quality or value; they only have potential value. That value is realized only when someone uses the data to do something useful (English 1999).
The quality of data cannot be assessed independently of the users of that data (Strong et al. 1997).


Fitness for use
Data are of high quality if they are fit for their intended use in operations, decision-making, and planning. (Juran 1964)


What do we mean by “fitness for use”?
Fitness for use
– Does species ‘x’ occur in Tasmania?
– Does species ‘x’ occur in National Park ‘y’?


Data cleaning tools
• Error is inherent in all data
• Rather than attempting to hide errors, it is better to expose them broadly to facilitate quality improvement
• For these reasons, the prototype portal has a feedback mechanism to data providers


Data Cleaning
A process used to determine if data are inaccurate, incomplete, or unreasonable and then improving the quality through correction of detected errors and omissions.


Methods for geocode validation
• Internal Database Checks
• External Database Checks
• Outliers in Geographic Space - GIS
• Outliers in Environmental Space - Models
• Statistical outliers


Internal/External Database Checks
• Logical inconsistencies within the database
• Checking one field against another
• Text location vs geocode or District/State
• Checking one database against another
• Gazetteers
• DEM
• Collectors
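A check of text location against geocode can be sketched as a bounding-box test. This is a minimal Python illustration under stated assumptions: the function name and lookup table are hypothetical, and the Tasmania bounds are rough values chosen for the example, not authoritative gazetteer data.

```python
from typing import Dict, Optional, Tuple

# (min_lat, max_lat, min_lon, max_lon) in decimal degrees.
# Rough illustrative values only; a real check would use a gazetteer.
REGION_BOUNDS: Dict[str, Tuple[float, float, float, float]] = {
    "Tasmania": (-44.0, -39.0, 143.5, 149.0),
}


def geocode_matches_region(region: str, lat: float, lon: float) -> Optional[bool]:
    """Return True or False if the check can be made, None if the region is unknown."""
    bounds = REGION_BOUNDS.get(region)
    if bounds is None:
        return None
    min_lat, max_lat, min_lon, max_lon = bounds
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon
```

A record labelled "Tasmania" whose latitude has lost its minus sign fails this test, which is a typical error such field-against-field checks are meant to expose.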


Geographic outliers - GIS
• Country, State, named district, etc.
[Figure: Gazetteer of Brazilian localities]


Environmental Outliers
• Cumulative Frequency Curves


Bioclimatic Envelope – Diva GIS
Results from Diva-GIS showing the use of the Bioclimatic Envelope from BIOCLIM to identify outliers in climate space. In this case the percentile cut-off is set at 95. Red points on the envelope correspond with red points on the map; green points in the envelope correspond with yellow points on the map.
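The idea of a percentile cut-off in climate space can be sketched as follows. This is a simplified illustration, not the algorithm used by Diva-GIS or BIOCLIM: it assumes that a cut-off of 95 means keeping the central 95 percent of values on each climate variable (2.5th to 97.5th percentile), and all function and variable names are hypothetical.

```python
from typing import Dict, List, Sequence


def percentile(sorted_values: Sequence[float], p: float) -> float:
    """Linear-interpolation percentile of pre-sorted values, with p in [0, 100]."""
    if not sorted_values:
        raise ValueError("no values")
    rank = (len(sorted_values) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def envelope_outliers(records: List[Dict[str, float]],
                      variables: Sequence[str],
                      cutoff: float = 95.0) -> List[int]:
    """Return indices of records outside the central `cutoff` percent on any variable."""
    tail = (100.0 - cutoff) / 2.0
    flagged = set()
    for var in variables:
        values = sorted(r[var] for r in records)
        low = percentile(values, tail)
        high = percentile(values, 100.0 - tail)
        for i, r in enumerate(records):
            if r[var] < low or r[var] > high:
                flagged.add(i)
    return sorted(flagged)
```

Note that with this definition the most extreme distinct values on each variable always fall outside the envelope, so flagged records are candidates for review rather than confirmed errors.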


Issues and Concerns of Data Providers
• Data Quality Issues
• Dealing with Sensitive Data


Approaches to Data Generalization
• Geo-spatial
• Rounding to nearest degree or minute
• Using centroid of bounding box
• Randomized off-set
• Taxonomic
• Releasing only generalized data
• Restricting access (User authorization)
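The three geo-spatial approaches listed above can be sketched in a few lines of Python. The function names and the choice of decimal degrees are assumptions made for illustration; the sketch ignores bounding boxes that cross the 180° meridian and does not wrap longitudes.

```python
import random
from typing import Optional, Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def round_to_degree(coord: Coordinate) -> Coordinate:
    """Generalize a coordinate by rounding to the nearest whole degree."""
    lat, lon = coord
    return (float(round(lat)), float(round(lon)))


def round_to_minute(coord: Coordinate) -> Coordinate:
    """Generalize a coordinate by rounding to the nearest arc minute."""
    lat, lon = coord
    return (round(lat * 60) / 60, round(lon * 60) / 60)


def bounding_box_centroid(min_lat: float, max_lat: float,
                          min_lon: float, max_lon: float) -> Coordinate:
    """Replace a locality with the centre of its bounding box."""
    return ((min_lat + max_lat) / 2, (min_lon + max_lon) / 2)


def random_offset(coord: Coordinate, max_offset_deg: float,
                  rng: Optional[random.Random] = None) -> Coordinate:
    """Shift a coordinate by a random amount of at most max_offset_deg per axis."""
    rng = rng if rng is not None else random.Random()
    lat, lon = coord
    new_lat = lat + rng.uniform(-max_offset_deg, max_offset_deg)
    new_lon = lon + rng.uniform(-max_offset_deg, max_offset_deg)
    # Keep latitude in its valid range; longitude is left unwrapped here.
    new_lat = max(-90.0, min(90.0, new_lat))
    return (new_lat, new_lon)
```

Rounding and centroids are deterministic, so the same sensitive record always generalizes to the same point, whereas a random off-set has to be stored if the released value must stay stable between releases.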


Issues and Concerns of Data Providers
• Data Quality Issues
• Dealing with Sensitive Data
• Getting Proper Attribution


Develop globally unique identifiers
• Data sharing would be greatly facilitated if there were Globally Unique Identifiers (GUIDs) for:
• Collections
• Specimens
• Taxonomic Names
• Taxon Concepts
• Collectors and localities
• etc.


Develop globally unique identifiers
• GBIF’s proposed approach:
• Hold series of workshops to discuss GUID concept and define areas of highest need
• Identify appropriate type of GUID for each area (e.g. LSID, DOI, etc.)
• Work with community to set up GUID providers
• Incorporate GUIDs into portal and other databases
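As an illustration of one GUID type named above, an LSID is a URN of the general form urn:lsid:authority:namespace:object, with an optional trailing revision. The following Python sketch splits such a string into its parts. The class and function names are hypothetical, the example identifier is made up, and the sketch checks only the overall shape, not the full LSID specification.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> Lsid:
    """Split an LSID string into its components, raising ValueError if malformed."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    if not (authority and namespace and object_id):
        raise ValueError(f"empty LSID component: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(authority, namespace, object_id, revision)


# Made-up identifier for illustration only.
example = parse_lsid("urn:lsid:example.org:specimens:12345")
```

Because the authority is part of the identifier, a record carrying such a GUID keeps a pointer to its issuer wherever the record is copied.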


What makes GBIF work
• Standardised schemata for data sharing
• Network of providers
• Participant nodes promote and coordinate activities of data providers
• Collaborative agreements
• Control and ownership of data remains with providers
• Procedures for interoperability
• Web services, in particular
• Global registry for advertisement of shared data
• Integration at GBIF Data Portal
• But anyone can build a thematic or national portal
• Vision and leadership
• GBIF mandate is unique
• GBIF is a multi-purpose, open-ended cyber-infrastructure that enables biologists to serve society in new ways.
• “GBIF somos todos.” (“We are all GBIF.”)


“The value of data lies in their use.”
Bits of Power – Issues in Global Access to Scientific Data, National Academy Press 1997


“Interoperability must be perceived as the sharing of information.”
Eliminating Legal and Policy Barriers to Interoperable Government Systems - Electronic Commerce, Law, and Information Policy Strategies Report June 1999


“The most profound barriers to interoperability are the soft “human technologies” implied in fundamental policy and organizational design.”
Eliminating Legal and Policy Barriers to Interoperable Government Systems - Electronic Commerce, Law, and Information Policy Strategies Report June 1999


“In summary, more exposure typically results in more visibility, recognition, invitations, citations and projects.”
The need to make scientific data publicly available. R. Froese et al. ACP-EU Fisheries Research Report (14) 267 - 271
