| == Planning Meeting #1 11/29/11 ==
| | [[Internal:Scrum:Planning 20111129 | Planning Meeting #1 11/29/11]] |
| | |
| {{:projects:idigbio:meetings:diag.jpg?200|}}
| |
| | |
| === Problems ===
| |
| | |
| ==== Problems with Storage ====
| |
| | |
| * Full text search at scale - use Riak?
| |
| * File system - cost, scale, access via web -> use object store, swift
| |
| * Efficient text store - divide data into text and objects
| |
| * Too big/unresilient for 1 location
| |
| * Federation & replication is hard - Swift sort of has replication, Riak does but $$
| |
| * "Backup" and "Archive"
| |
| * Mapping many:many between imgs + specimens
| |
| | |
| ==== Problems with Local Data Processing ====
| |
| | |
| * Iteration / MapReduce performance (Riak may support natively)
| |
| ** API programming ease
| |
| ** File system support
| |
| * Access control / Metering / Monitoring & Policy
| |
| * Appliance vs service vs vm
| |
| * Port existing tools to run on our system
| |
| * Download results vs update iDigBio
| |
| * Image processing
| |
| | |
| ==== Problems with Data Exposure ====
| |
| | |
| * Large requests eg results for "US" - metering, rate limiting
| |
| * Formats - JSON, XML, CSV - and hierarchical data
| |
| * Programmatic access efficiency / latency, for r in set do
| |
| * API bindings for used languages
| |
| * Usage tracking
| |
| | |
| ==== Problems with Portal ====
| |
| | |
| * Visualization depends on - geolocation - base mapping layers
| |
| * Full text/faceted search performance
| |
| * Taxon matching needs high quality name resolution service
| |
| * Comparison to existing portals
| |
| * Web design quality
| |
| * Typical software feature requests and bugs from users - bugs -> internal redmine (poor auth integration)
| |
| * Feedback, usage tracking
| |
| | |
| ==== Problems with Peers and Partners ====
| |
| | |
| * How much data do peers get -
| |
| * Sharding and reassembly between peers - force full copy of metadata, shard objects - best to have one place with all images but allow peers to mirror sets
| |
| * Replication protocols - OAI-PMH
| |
| ** multimaster updates
| |
| ** data provenance and residency tracking
| |
| * Peer training and technology skills to run our stack - or packaging for simplicity
| |
| * Usage tracking of remote data access
| |
| * Peer storage of object versions
| |
| | |
| ==== Problems with Ingestion ====
| |
| | |
| (bad data)
| |
| * Field mapping - standardize on Darwin, Audubon, etc
| |
| * Taxa Name -> LSID?....
| |
| * Georeferencing, provided, data importation, quality check
| |
| * Outlier detection/correction
| |
| * Staging/preview area assist with above
| |
| * Whole data set versus updates -> frequency of updates, DWC archive, TAPIR
| |
| * Human-in-loop vs Bulk
| |
| (good data)
| |
| * Specimen / Occurrence / Image ID [Local] -> GUID/URI (assign LSID range to each provider) [Global]
| |
| * Provenance Tracking - Collection, TCN, Uploader, Residency
| |
| * Versioning - overwrite vs append
| |
| * Required field set - GBIF minimum, not images
| |
| * Accepted protocols - TAPIR, DWC Arc, OAI-PMH, native to app, CSV, XLS, SQL
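
A rough sketch of the local ID -> global ID item above. The notes propose assigning an LSID range to each provider; this sketch swaps in a different technique, a deterministic name-based UUID, only to show the shape of the mapping. The provider code `flmnh`, the local ID format, and the function name `global_id` are all hypothetical.

```python
import uuid

def global_id(provider, local_id):
    """Derive a stable global identifier from a provider code and a local ID.

    Assumption: a name-based UUID (version 5) stands in for the LSID scheme
    mentioned in the notes. The same (provider, local_id) pair always yields
    the same identifier, so re-ingesting a data set does not mint new IDs.
    """
    name = "{}/{}".format(provider, local_id)
    return "urn:uuid:{}".format(uuid.uuid5(uuid.NAMESPACE_URL, name))

# Hypothetical provider code and catalog numbers.
first = global_id("flmnh", "herb-000123")
again = global_id("flmnh", "herb-000123")
other = global_id("flmnh", "herb-000124")
print(first)
```

Because the identifier is derived rather than assigned, it needs no central counter, but it also cannot survive a provider renaming its local IDs; an assigned range would handle that case better.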
| |
| | |
| | |
| === Tasks ===
| |
| | |
| Dec 11
| |
| * M- Fix c11node22
| |
| * A- Swift -> use 5 (6) nodes on c11
| |
| * A- Riak -> install on 5 (6) nodes on c11
| |
| | |
| Dec 17
| |
| * M+K+J- Sample dataset -> Ask Kate
| |
| * A+M- Select iDigBioCore from DWC + extensions
| |
| * M- Push sample data in + convert to GeoJSON
| |
| * A- Experiment/design some Riak queries - check performance, pick indexes
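
A minimal sketch of the "convert to GeoJSON" task above, assuming the sample data arrives as flat records carrying the Darwin Core terms `decimalLatitude` and `decimalLongitude`. The sample record and the function names are made up for illustration; the real field set depends on what ends up in iDigBioCore.

```python
import json

COORD_FIELDS = ("decimalLatitude", "decimalLongitude")

def record_to_feature(record):
    """Convert one flat specimen record (a dict) to a GeoJSON Feature."""
    lat = float(record["decimalLatitude"])
    lon = float(record["decimalLongitude"])
    # Everything except the coordinates is carried along as a property.
    properties = {k: v for k, v in record.items() if k not in COORD_FIELDS}
    return {
        "type": "Feature",
        # GeoJSON orders coordinates as [longitude, latitude].
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }

def records_to_feature_collection(records):
    """Wrap the converted records in a GeoJSON FeatureCollection."""
    return {
        "type": "FeatureCollection",
        "features": [record_to_feature(r) for r in records],
    }

# Hypothetical sample record, not taken from any real data set.
sample = [{
    "catalogNumber": "X-1",
    "scientificName": "Quercus alba",
    "decimalLatitude": "29.65",
    "decimalLongitude": "-82.32",
}]
collection = records_to_feature_collection(sample)
print(json.dumps(collection, indent=2))
```

Records without usable coordinates would raise here; a real import would probably route them to the staging/preview area instead.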
| |
| | |
| Dec 24
| |
| * A+M- Pick facets for search
| |
| * A- Faceted web search, output list
| |
| * A- GeoJSON + Polymaps
| |
| | |
| === Future Sprints ===
| |
| | |
| * Does it scale?
| |
| * Get more (good) data
| |
| * Get more (bad) data
| |
| * 3rd party API (low level)
| |
| * API scale & access
| |
| * Peering
| |