Broken/Scrum: Difference between revisions



Revision as of 17:13, 13 December 2011

3,163 bytes removed, 13 December 2011

Replaced content with " Planning Meeting #1 11/29/11"

Mcollins

@@ Line 1: / Line 1: @@
-== Planning Meeting #1 11/29/11 ==
+[[Internal:Scrum:Planning 20111129 | Planning Meeting #1 11/29/11]]
-{{:projects:idigbio:meetings:diag.jpg?200|}}
-=== Problems ===
-==== Problems with Storage ====
-* Full text search at scale - use Riak?
-* File system - cost, scale, access via web -> use object store, swift
-* Efficient text store - divide data into text and objects
-* Too big / not resilient enough for 1 location
-* Federation & replication is hard - Swift sort of has repl, Riak does but $$
-* "Backup" and "Archive"
-* Mapping many:many between imgs + specimens
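The many:many mapping between images and specimens can be pictured as a link table indexed in both directions. The sketch below is a minimal in-memory illustration, assuming opaque string IDs; the class name and the IDs are hypothetical, and it is not the storage design chosen at the meeting. In a key-value store the same idea could be mirrored as two key spaces, but that layout is likewise an assumption.

```python
from collections import defaultdict


class ImageSpecimenIndex:
    """Hypothetical many:many link store between image IDs and specimen IDs.

    Two dicts of sets are kept in sync so that lookups in either direction
    avoid scanning every link.
    """

    def __init__(self):
        self.images_by_specimen = defaultdict(set)
        self.specimens_by_image = defaultdict(set)

    def link(self, specimen_id, image_id):
        # Record the association in both directions.
        self.images_by_specimen[specimen_id].add(image_id)
        self.specimens_by_image[image_id].add(specimen_id)

    def unlink(self, specimen_id, image_id):
        # discard() is a no-op if the link was never recorded.
        self.images_by_specimen[specimen_id].discard(image_id)
        self.specimens_by_image[image_id].discard(specimen_id)

    def images_for(self, specimen_id):
        return sorted(self.images_by_specimen.get(specimen_id, ()))

    def specimens_for(self, image_id):
        return sorted(self.specimens_by_image.get(image_id, ()))


idx = ImageSpecimenIndex()
idx.link("spec-1", "img-A")
idx.link("spec-1", "img-B")
idx.link("spec-2", "img-B")  # one image showing two specimens
print(idx.images_for("spec-1"))    # ['img-A', 'img-B']
print(idx.specimens_for("img-B"))  # ['spec-1', 'spec-2']
```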
-==== Problems w/ Local Data Processing ====
-* Iteration / MapReduce performance (Riak may support natively)
-** API programming ease
-** File system support
-* Access control / Metering / Monitoring & Policy
-* Appliance vs service vs vm
-* Port existing tools to run on our system
-* Download results vs update iDigBio
-* Image processing
-==== Problems with Data Exposure ====
-* Large requests eg results for "US" - metering, rate limiting
-* Formats - JSON, XML, CSV - and hierarchical data
-* Programmatic access efficiency / latency, for r in set do
-* API bindings for used languages
-* Usage tracking
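Metering and rate limiting for large requests (e.g. all results for "US") are often handled with a token bucket. The sketch below assumes one bucket per API client and a `cost` argument that could be scaled with result size; both are assumptions for illustration, not decisions recorded in these notes.

```python
import time


class TokenBucket:
    """Token-bucket limiter: permits bursts of up to `capacity` units and a
    sustained rate of `refill_per_sec` units per second."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self.capacity = float(capacity)
        self.refill_per_sec = float(refill_per_sec)
        self.clock = clock
        self.tokens = self.capacity
        self.last = clock()

    def allow(self, cost=1.0):
        """Spend `cost` tokens if available; return whether the request may proceed."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Simulated clock so the example is deterministic.
now = [0.0]
bucket = TokenBucket(capacity=2, refill_per_sec=1, clock=lambda: now[0])
print(bucket.allow(), bucket.allow(), bucket.allow())  # True True False
now[0] = 1.0
print(bucket.allow())  # True (one token refilled)
```

A request whose `cost` exceeds `capacity` is never admitted, which is one way to force very large exports onto a separate bulk-download path.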
-==== Problems with Portal ====
-* Visualization depends on - geolocation - base mapping layers
-* Full text/faceted search performance
-* Taxon matching needs high quality name resolution service
-* Comparison to existing portals
-* Web design quality
-* Typical software feature requests and bugs from users - bugs -> internal redmine (poor auth integration)
-* Feedback, usage tracking
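Faceted search returns, alongside the hits, a count of each value of each facet field, and narrows results when a facet value is selected. The naive scan below only shows what has to be computed; at portal scale these counts would come from a search index rather than a per-query scan, which is why performance is listed as a concern. The field names `family` and `country` are Darwin Core terms, but the records are invented.

```python
from collections import Counter


def facet_counts(records, fields):
    """Count how many records carry each value of each facet field.
    Records missing a field are not counted for that facet."""
    counts = {field: Counter() for field in fields}
    for rec in records:
        for field in fields:
            value = rec.get(field)
            if value is not None:
                counts[field][value] += 1
    return counts


def apply_filters(records, selected):
    """Keep only records matching every selected facet value."""
    return [r for r in records if all(r.get(f) == v for f, v in selected.items())]


records = [
    {"family": "Fagaceae", "country": "US"},
    {"family": "Fagaceae", "country": "MX"},
    {"family": "Pinaceae", "country": "US"},
]
print(facet_counts(records, ["family", "country"])["country"])
# Counter({'US': 2, 'MX': 1})
hits = apply_filters(records, {"country": "US"})
print(facet_counts(hits, ["family"])["family"])
```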
-==== Problems w/ Peers and Partners ====
-* How much data do peers get?
-* Sharding and reassembly between peers - force full copy of metadata, shard objects - best to have one place with all images but allow peers to mirror sets
-* Replication protocols - OAI-PMH
-** multimaster updates
-** data provenance and residency tracking
-* Peer training and technology skills to run our stack - or packaging for simplicity
-* Usage tracking of remote data access
-* Peer storage of object versions
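"Force full copy of metadata, shard objects" implies each peer can decide locally which objects it should hold. One technique that fits is rendezvous (highest-random-weight) hashing, sketched below; it is a swapped-in illustration, not a protocol chosen in these notes, and the peer names and the replica count of 2 are hypothetical.

```python
import hashlib


def _weight(peer, object_id):
    # Deterministic pseudo-random weight for a (peer, object) pair.
    digest = hashlib.sha256(f"{peer}:{object_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def peers_for_object(object_id, peers, replicas=2):
    """Rendezvous (highest-random-weight) hashing: rank peers by weight for
    this object and keep the top `replicas`. Any peer holding the full
    metadata and the peer list can recompute the placement independently."""
    ranked = sorted(peers, key=lambda p: _weight(p, object_id), reverse=True)
    return ranked[:replicas]


peers = ["peer-a", "peer-b", "peer-c", "peer-d"]
print(peers_for_object("img-0001", peers))
```

The useful property for peering is that removing one peer only relocates the objects it held; every other object keeps its placement, which limits reshuffling when a partner leaves.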
-==== Problems with Ingestion ====
-(bad data)
-* Field mapping - standardize on Darwin Core, Audubon Core, etc.
-* Taxa Name -> LSID?....
-* Georeferencing, provided, data importation, quality check
-* Outlier detection/correction
-* Staging/preview area assist with above
-* Whole data set versus updates -> frequency of updates, DWC archive, TAPIR
-* Human-in-loop vs Bulk
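A quality check on georeferences could attach flags rather than reject records outright, so that flagged rows land in the staging/preview area for human review. The checks below (range, 0/0 "null island", likely swapped latitude/longitude) are common examples chosen for illustration, not the project's agreed rules.

```python
def coordinate_flags(lat, lon):
    """Return quality flags for a decimal-degree coordinate pair.
    An empty list means no problem was detected by these simple checks."""
    if lat is None or lon is None:
        return ["missing"]
    flags = []
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        flags.append("out_of_range")
        # An invalid latitude that would be a valid longitude, paired with a
        # longitude that would be a valid latitude, suggests swapped columns.
        if abs(lat) <= 180 and abs(lon) <= 90:
            flags.append("possibly_swapped")
    if lat == 0 and lon == 0:
        flags.append("zero_zero")
    return flags


print(coordinate_flags(29.65, -82.33))  # []
print(coordinate_flags(120.5, 30.2))    # ['out_of_range', 'possibly_swapped']
```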
-(good data)
-* Specimen / Occurrence / Image ID [Local] -> GUID/URI (assign LSID range to each provider) [Global]
-* Provenance Tracking - Collection, TCN, Uploader, Residency
-* Versioning - overwrite vs append
-* Required field set - GBIF minimum, not images
-* Accepted protocols - TAPIR, DWC Arc, OAI-PMH, native to app, CSV, XLS, SQL
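The idea of mapping local IDs to global IDs by assigning each provider an LSID range can be sketched as a block allocator. The class name, the `example.org` authority, the `specimen` namespace, and the block size are all hypothetical placeholders; only the `urn:lsid:authority:namespace:object` shape follows the LSID URN format. No resolver is implied.

```python
class GuidMinter:
    """Hypothetical GUID minter: each provider is given a contiguous block of
    integers, and local IDs are mapped to LSID-style URNs drawn from it."""

    def __init__(self, authority="example.org", namespace="specimen", block_size=1_000_000):
        self.authority = authority
        self.namespace = namespace
        self.block_size = block_size
        self.ranges = {}    # provider -> [next_free, end_exclusive]
        self.assigned = {}  # (provider, local_id) -> guid

    def register(self, provider):
        # Give a new provider the next unused block of identifiers.
        if provider not in self.ranges:
            start = len(self.ranges) * self.block_size
            self.ranges[provider] = [start, start + self.block_size]

    def guid_for(self, provider, local_id):
        key = (provider, local_id)
        if key in self.assigned:
            # Re-ingesting the same record keeps its existing GUID.
            return self.assigned[key]
        self.register(provider)
        rng = self.ranges[provider]
        if rng[0] >= rng[1]:
            raise RuntimeError(f"identifier block exhausted for {provider}")
        guid = f"urn:lsid:{self.authority}:{self.namespace}:{rng[0]}"
        rng[0] += 1
        self.assigned[key] = guid
        return guid


m = GuidMinter()
print(m.guid_for("UF", "UF-123"))  # urn:lsid:example.org:specimen:0
print(m.guid_for("FSU", "F-1"))    # urn:lsid:example.org:specimen:1000000
```

Keeping the (provider, local ID) mapping means an update to an existing record reuses its GUID, which matters for whichever of the overwrite or append versioning strategies is picked.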
-=== Tasks ===
-Dec 11
-* M- Fix c11node22
-* A- Swift -> use 5 (6) nodes on c11
-* A- Riak -> install on 5 (6) nodes on c11
-Dec 17
-* M+K+J- Sample dataset -> Ask Kate
-* A+M- Select iDigBioCore from DWC + extensions
-* M- Push sample data in + convert to GeoJSON
-* A- Experiment/design some Riak queries - check performance, pick indexes
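Converting sample data to GeoJSON mostly means turning each record's coordinates into a Point feature. The sketch assumes records arrive as dicts keyed by the Darwin Core terms `decimalLatitude` and `decimalLongitude`, possibly as strings from a CSV; the sample records are invented. Note that GeoJSON (RFC 7946) orders positions as longitude, then latitude.

```python
import json


def dwc_to_geojson(records):
    """Build a GeoJSON FeatureCollection from Darwin Core-style dicts.
    Records lacking either coordinate are skipped; other fields become properties."""
    features = []
    for rec in records:
        lat = rec.get("decimalLatitude")
        lon = rec.get("decimalLongitude")
        if lat in (None, "") or lon in (None, ""):
            continue
        props = {k: v for k, v in rec.items()
                 if k not in ("decimalLatitude", "decimalLongitude")}
        features.append({
            "type": "Feature",
            # RFC 7946 positions are [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [float(lon), float(lat)]},
            "properties": props,
        })
    return {"type": "FeatureCollection", "features": features}


sample = [
    {"occurrenceID": "occ-1", "scientificName": "Quercus virginiana",
     "decimalLatitude": "29.65", "decimalLongitude": "-82.33"},
    {"occurrenceID": "occ-2", "scientificName": "Pinus palustris"},  # no coordinates
]
print(json.dumps(dwc_to_geojson(sample), indent=2))
```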
-Dec 24
-* A+M- Pick facets for search
-* A- Faceted web search, output list
-* A- GeoJSON + Polymaps
-=== Future Sprints ===
-* Does it scale?
-* Get more (good) data
-* Get more (bad) data
-* 3rd party API (low level)
-* API scale & access
-* Peering