Difference between revisions of "Internal:Scrum"

From iDigBio
(Replaced content with " Planning Meeting #1 11/29/11")
 
== Planning Meeting #1 11/29/11 ==
[[Internal:Scrum:Planning 20111129 | Planning Meeting #1 11/29/11]]

{{:projects:idigbio:meetings:diag.jpg?200|}}

=== Problems ===

==== Problems with Storage ====

* Full text search at scale - use Riak?
* File system - cost, scale, web access -> use an object store (Swift)
* Efficient text store - divide data into text and objects
* Too big / not resilient enough for one location
* Federation & replication are hard - Swift partly supports replication; Riak does, but at a cost ($$)
* "Backup" and "Archive"
* Many:many mapping between images and specimens

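Neither Riak nor Swift offers relational joins, so one option for the many:many image/specimen relation is to maintain the link in both directions. The minimal in-memory sketch below shows that idea; the class name and ID strings are hypothetical, and a real deployment would persist both indexes in the store rather than in Python dicts.

```python
from collections import defaultdict


class ImageSpecimenIndex:
    """Hypothetical bidirectional many:many index between image IDs and specimen IDs."""

    def __init__(self) -> None:
        self._specimens_by_image = defaultdict(set)
        self._images_by_specimen = defaultdict(set)

    def link(self, image_id: str, specimen_id: str) -> None:
        # Record the association in both directions so either lookup is a single dict access.
        self._specimens_by_image[image_id].add(specimen_id)
        self._images_by_specimen[specimen_id].add(image_id)

    def unlink(self, image_id: str, specimen_id: str) -> None:
        # Remove the association from both sides; unknown pairs are ignored.
        self._specimens_by_image[image_id].discard(specimen_id)
        self._images_by_specimen[specimen_id].discard(image_id)

    def specimens_for(self, image_id: str) -> set:
        # .get avoids creating empty entries for unknown IDs; return a copy.
        return set(self._specimens_by_image.get(image_id, set()))

    def images_for(self, specimen_id: str) -> set:
        return set(self._images_by_specimen.get(specimen_id, set()))
```

One image can show several specimens and one specimen can have several images; both cases are just extra `link` calls.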
==== Problems with Local Data Processing ====

* Iteration / MapReduce performance (Riak may support this natively)
** API programming ease
** File system support
* Access control / metering / monitoring & policy
* Appliance vs. service vs. VM
* Port existing tools to run on our system
* Download results vs. update iDigBio
* Image processing

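The MapReduce item concerns batch iteration over the whole record set. Riak's own MapReduce runs JavaScript or Erlang phases inside the cluster; the Python below is not Riak's API, only a single-process sketch of the map / group-by-key / reduce pattern, using a hypothetical per-country count over Darwin Core records.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable, Iterator


def map_reduce(
    records: Iterable[dict],
    mapper: Callable[[dict], Iterable[tuple]],
    reducer: Callable[[Any, list], Any],
) -> dict:
    """Single-process map -> group-by-key -> reduce over a record stream."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: each record may emit zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle phase: collect all values emitted under the same key.
            groups[key].append(value)
    # Reduce phase: collapse each key's value list into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


def country_mapper(record: dict) -> Iterator[tuple]:
    # Hypothetical mapper: one count per record, keyed by the Darwin Core "country" term.
    country = record.get("country")
    if country:
        yield country, 1


def sum_reducer(key: Any, values: list) -> int:
    return sum(values)
```

Performance questions (the point of the bullet) come down to where the map phase runs: next to the data in the cluster, or after pulling every record to a client as here.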
==== Problems with Data Exposure ====

* Large requests, e.g. all results for "US" - metering, rate limiting
* Formats - JSON, XML, CSV - and hierarchical data
* Programmatic access efficiency / latency when clients loop over a set record by record (for r in set do)
* API bindings for commonly used languages
* Usage tracking

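Metering and rate limiting of large requests is often done per API key with a token bucket. The sketch below is one common technique, not a decided iDigBio design; the rate, capacity and per-request cost are hypothetical, and `cost` could be scaled by result size so a request for all "US" records consumes more budget than a single-record lookup.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket limiter: bursts up to `capacity`, refills at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so an idle client can burst
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Usage: keep one bucket per API key and call `allow(cost)` before serving a request; a denied request would typically get HTTP 429 Too Many Requests. The injectable `clock` makes the limiter testable without sleeping.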
==== Problems with Portal ====

* Visualization depends on geolocation and base mapping layers
* Full text/faceted search performance
* Taxon matching needs a high-quality name resolution service
* Comparison to existing portals
* Web design quality
* Typical software feature requests and bugs from users - bugs go to the internal Redmine (poor auth integration)
* Feedback, usage tracking

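Faceted search returns, alongside the hit list, a count for each value of each facet field within the current result set, and selecting a value narrows the set. In production a search engine would compute these counts over an index; the plain-Python sketch below only illustrates the semantics, with hypothetical Darwin Core field names as facets.

```python
from collections import Counter
from typing import Iterable


def apply_filters(records: Iterable[dict], selected: dict) -> list:
    """Keep records matching every selected facet value (AND across facets)."""
    return [r for r in records if all(r.get(f) == v for f, v in selected.items())]


def facet_counts(records: Iterable[dict], facets: list) -> dict:
    """Count how many records carry each value of each facet field."""
    counts = {facet: Counter() for facet in facets}
    for record in records:
        for facet in facets:
            value = record.get(facet)
            if value is not None:  # records missing the field add no count
                counts[facet][value] += 1
    return counts
```

Counts are taken after filtering, so each facet shows how many hits remain for each value under the current selection.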
==== Problems with Peers and Partners ====

* How much data do peers get?
* Sharding and reassembly between peers - force a full copy of metadata but shard objects; best to have one place with all images but allow peers to mirror sets
* Replication protocols - OAI-PMH
** Multi-master updates
** Data provenance and residency tracking
* Peer training and technology skills to run our stack, or packaging for simplicity
* Usage tracking of remote data access
* Peer storage of object versions

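OAI-PMH harvesting is incremental: a harvester issues a ListRecords request (optionally with a `from` date) and follows `resumptionToken` values until an empty token ends the list. Because it is a pull protocol, it covers one-way replication rather than the multi-master case. Below is a minimal sketch of the paging logic only, assuming a hypothetical peer endpoint URL and the `oai_dc` metadata format; HTTP fetching, retries and record parsing are left out.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None) -> str:
    """Build the first ListRecords request; 'from' enables incremental harvests."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


def next_page_url(base_url: str, response_xml: str) -> Optional[str]:
    """Follow the resumptionToken; return None when the list is complete."""
    root = ET.fromstring(response_xml)
    token = root.find("oai:ListRecords/oai:resumptionToken", OAI_NS)
    if token is None or not (token.text or "").strip():
        return None  # an absent or empty token marks the last page
    # With a resumptionToken, the only other argument allowed is the verb.
    params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
    return f"{base_url}?{urlencode(params)}"
```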
==== Problems with Ingestion ====

(bad data)
* Field mapping - standardize on Darwin Core, Audubon Core, etc.
* Taxon name -> LSID?...
* Georeferencing, provided, data importation, quality check
* Outlier detection/correction
* Staging/preview area to assist with the above
* Whole data set versus updates -> frequency of updates, DwC Archive, TAPIR
* Human-in-the-loop vs. bulk

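Outlier detection can start with simple rule-based checks on the Darwin Core `decimalLatitude` / `decimalLongitude` terms before any statistical method. The checks below (coordinate ranges, the common 0,0 placeholder, a likely latitude/longitude swap) are illustrative assumptions, not an agreed iDigBio rule set; flagged records could be held in the staging/preview area for human review.

```python
def coordinate_issues(record: dict) -> list:
    """Flag common georeference problems in a Darwin Core record (illustrative rules)."""
    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")
    if lat is None or lon is None:
        return ["missing coordinates"]
    try:
        lat, lon = float(lat), float(lon)
    except (TypeError, ValueError):
        return ["non-numeric coordinates"]
    issues = []
    if not -90.0 <= lat <= 90.0:
        issues.append("latitude out of range")
    if not -180.0 <= lon <= 180.0:
        issues.append("longitude out of range")
    if lat == 0.0 and lon == 0.0:
        # 0,0 is a frequent placeholder for "unknown" rather than a real locality.
        issues.append("0,0 placeholder")
    if abs(lat) > 90.0 and abs(lon) <= 90.0:
        # Would be valid if the two values were exchanged.
        issues.append("latitude/longitude possibly swapped")
    return issues
```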
(good data)
* Specimen / Occurrence / Image ID [Local] -> GUID/URI (assign LSID range to each provider) [Global]
* Provenance tracking - collection, TCN, uploader, residency
* Versioning - overwrite vs. append
* Required field set - GBIF minimum, not images
* Accepted protocols - TAPIR, DwC Archive, OAI-PMH, native to app, CSV, XLS, SQL

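The notes propose assigning each provider an LSID range. A different technique, shown here only as an illustration, derives a stable global identifier from the provider and its local ID with a name-based UUID (version 5), so re-ingesting the same record yields the same GUID without coordinating ranges. The provider URIs and local IDs used are hypothetical.

```python
import uuid


def mint_guid(provider_uri: str, local_id: str) -> str:
    """Derive a deterministic global ID (urn:uuid:...) from a provider URI and a local record ID."""
    # First derive a per-provider namespace, so equal local IDs from different providers differ.
    provider_ns = uuid.uuid5(uuid.NAMESPACE_URL, provider_uri)
    # Name-based UUIDv5 hashes (namespace, name) with SHA-1: same input -> same UUID.
    # Surrounding whitespace in the local ID is stripped before hashing.
    return uuid.uuid5(provider_ns, local_id.strip()).urn
```

The trade-off against LSID ranges: no central allocation is needed, but the GUID is only as stable as the provider URI and local ID it is derived from.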
=== Tasks ===

Dec 11
* M- Fix c11node22
* A- Swift -> use 5 (6) nodes on c11
* A- Riak -> install on 5 (6) nodes on c11

Dec 17
* M+K+J- Sample dataset -> Ask Kate
* A+M- Select iDigBioCore from DwC + extensions
* M- Push sample data in + convert to GeoJSON
* A- Experiment with/design some Riak queries - check performance, pick indexes

Dec 24
* A+M- Pick facets for search
* A- Faceted web search, output list
* A- GeoJSON + Polymaps

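Converting the sample data to GeoJSON for Polymaps mostly means turning each record's coordinates into a Point Feature. A minimal sketch, assuming the sample records use the Darwin Core `decimalLatitude` / `decimalLongitude` terms (the iDigBioCore field selection was still open); note that GeoJSON positions put longitude first.

```python
import json
from typing import Iterable, Optional

COORD_TERMS = ("decimalLongitude", "decimalLatitude")


def to_feature(record: dict) -> Optional[dict]:
    """Convert one Darwin Core occurrence record to a GeoJSON Point Feature."""
    try:
        lon = float(record["decimalLongitude"])
        lat = float(record["decimalLatitude"])
    except (KeyError, TypeError, ValueError):
        return None  # records without usable coordinates are skipped
    # Every other field is carried along as a feature property.
    properties = {k: v for k, v in record.items() if k not in COORD_TERMS}
    return {
        "type": "Feature",
        # GeoJSON positions are [longitude, latitude], in that order.
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


def to_feature_collection(records: Iterable[dict]) -> str:
    """Serialize all convertible records as one GeoJSON FeatureCollection string."""
    features = [f for f in map(to_feature, records) if f is not None]
    return json.dumps({"type": "FeatureCollection", "features": features})
```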
=== Future Sprints ===

* Does it scale?
* Get more (good) data
* Get more (bad) data
* 3rd-party API (low level)
* API scale & access
* Peering

Latest revision as of 17:13, 13 December 2011

Planning Meeting #1 11/29/11