Apache Spark

Exploring unique values in iDigBio using Apache Spark

Data exploration for large datasets is always challenging. Often you are left choosing among three options: subsetting the dataset (randomly or on some facet), making slow progress while waiting for results only to find that something needs to be fixed, or optimizing code for performance before you even know whether the result will be interesting. Getting access to a high-performance system capable of ad-hoc investigation has always been difficult, expensive, or both.
