iDigBio API Performance

Caution: The v1 API has been superseded by iDigBio API v2 / iDigBio Search API. API users should migrate to the most recent version of the API. This page remains for historical reference.


Performance Overview

Many API providers (Google, Flickr, Twitter, ...) throttle or rate-limit access to their APIs. The iDigBio API is not throttled or rate-limited at this time.

Most normal use cases (e.g. viewing data via tools driven by actual users) should not experience performance issues.

You may notice significant performance issues if you request high offset (start) numbers or large limit values.

If you find that you are using large "limit" and "offset" values against the API (or large "from" and "size" numbers with Elasticsearch), you may experience performance issues and should consider alternative mechanisms for meeting your data access goals. In particular, use the download features of the portal, or add query terms to your Elasticsearch queries to constrain the result set.
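To see how iterative paging ends up at large offsets, the following minimal sketch lists the (offset, limit) pairs a client would need in order to walk a result set, and checks whether the walk would pass a chosen offset cutoff. The function names and the cutoff parameter are illustrative assumptions, not part of the iDigBio API.

```python
from typing import Iterator, Tuple


def page_windows(total: int, limit: int) -> Iterator[Tuple[int, int]]:
    """Yield the (offset, limit) pairs needed to cover `total` records."""
    if total < 0 or limit <= 0:
        raise ValueError("total must be >= 0 and limit must be > 0")
    for offset in range(0, total, limit):
        # The last page may be shorter than `limit`.
        yield offset, min(limit, total - offset)


def exceeds_offset_budget(total: int, limit: int, max_offset: int) -> bool:
    """Return True if paging through `total` records needs an offset above `max_offset`.

    `max_offset` is a caller-chosen cutoff (an assumption of this sketch).
    """
    if total <= 0:
        return False
    if limit <= 0:
        raise ValueError("limit must be > 0")
    last_offset = ((total - 1) // limit) * limit
    return last_offset > max_offset
```

If exceeds_offset_budget returns True for the result set you intend to walk, the portal download or a more constrained query is probably a better fit than paging.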

Recommendations for Bulk Record Access

If you wish to download a large number of records or a complete recordset, use the Download system available in the search portal (https://www.idigbio.org/portal).

If the search portal does not meet your needs, consider using the Elasticsearch interface, which is documented in the iDigBio API pages.

If neither of these meets your needs, please contact iDigBio (https://www.idigbio.org/contact).

The following information about API performance has been left here for historical reference. Generally speaking, if you find yourself running into these kinds of performance issues because of iterative paging over large offsets, you should consider one of the alternatives mentioned above.

Understanding the performance limits of fetching lists of records from the iDigBio API

While fetching a single record does not require computation, fetching a list of record IDs (endpoints) requires an amount of computation that varies with the number of records in the response and the starting record offset, and this leads to certain performance limits. The graphs below illustrate these limits so that you can code in the most efficient manner for your needs. All graphs show the average response time of a request (blue lines; left axis) and the number of requests that terminate with a timeout exception (red lines; right axis).

Varying the number of parallel requests, we can observe that iDigBio can easily handle 60 concurrent users for requests for the first 1,000 record endpoints; this limit drops to 20 concurrent users if the requests are for larger responses containing 10,000 record endpoints.

Response time also varies with the offset of the request. With 10 concurrent users, the limit is at around the 10-million offset. The current timeout for requests is 50 seconds, so the number of requests terminating with a timeout increases as response times approach this limit. It is recommended that you code with retries and back-off mechanisms to handle timeouts.

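Figure: RecordRetrieval_1000batch_VaryingParallelism.png (record retrieval, 1,000-record batches, varying parallelism)
Figure: RecordRetrieval_10000batch_VaryingParallelism.png (record retrieval, 10,000-record batches, varying parallelism)
Figure: RecordRetrieval_1000batch_10Parallelism_VaryingOffset.png (record retrieval, 1,000-record batches, 10 parallel requests, varying offset)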
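The recommendation to code with retries and back-off can be sketched roughly as follows. This is a minimal illustration, not an iDigBio client: the function name, the default delays, and the jitter are assumptions, and the caller supplies the function that issues the request. The sketch retries only on the exception types passed in retry_on, so you would need to pass whichever timeout exception your HTTP library raises.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def fetch_with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fetch`, retrying with exponential back-off when it times out.

    All defaults are illustrative assumptions, not values recommended by iDigBio.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return fetch()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last timeout
            # Double the delay on each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add random jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

To use it, wrap whatever function performs the request, for example fetch_with_backoff(lambda: my_fetch(offset, limit)), where my_fetch is a hypothetical function of your own that raises one of the retry_on exceptions when the request times out.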