iDigBio Call for Community Appliances

Wed, 2012-04-11 10:53 -- kevinlove

Overview:

An important activity of iDigBio is to deliver IT infrastructure and services for a highly coordinated biocollections digitization community. Computer appliances (separate and discrete computing devices designed to provide a specific computing resource) will enable the biocollections digitization community to interact with the iDigBio storage cloud and specimen database. To facilitate this, iDigBio seeks to team with developers of tools with demonstrated broad applicability in the collections community to help guide the development, dissemination, and hosting of virtual appliances that integrate such tools.

Because the role of iDigBio is not to create new tools, but rather to integrate them and help reduce barriers to their implementation, the identification and selection of tools and services in iDigBio are important concerns. Given the limited personnel resources of iDigBio and the effort required to create and maintain sustainable appliances, collaboration and consultation with the broader community (including tool developers and users) is key to successful tool design, implementation, and dissemination. This document describes the process to be used by iDigBio for the integration, dissemination, and hosting of appliances. Examples and FAQs illustrate potential use-case scenarios for appliances that are expected to be valuable and applicable for the digitization community.

Background: underlying technology and deployment modes

Virtualization technologies, such as virtual machines (VMs) provided by commercial and open-source products including VMware, VirtualBox, KVM, and Xen, provide the technology foundation used by iDigBio to integrate software into ready-to-use appliances. Virtualization technologies allow software tools, their dependencies (e.g., OS distribution, libraries), and their configuration to be packaged in an easy-to-implement “virtual appliance”. Once created, a virtual appliance can be easily disseminated to users via file transfers (e.g., HTTP, BitTorrent, physical DVDs or USB drives) or hosted on iDigBio.

Most virtual appliances are designed so that they can run using virtual machine technologies already available on the majority of today’s desktop and server environments. In iDigBio, their deployment will typically be in end-user environments (where appliances run on a user's workstation or local server) or in a hosted server environment (where appliances run in cloud infrastructures such as the iDigBio cloud, particularly useful for geo-referencing and post-processing of digitized objects stored in the iDigBio data cloud).

As examples, a virtual appliance for label capture might integrate the Linux operating system with open-source image-processing and OCR tools; an ingestion appliance may provide the capability for automatic, reliable batch upload of images and metadata to iDigBio. Virtual appliances such as these would target infrastructure currently managed by the end-user, and anyone in the digitization community could then download these appliances through the iDigBio portal and deploy them using the local resources of the TCN institutions (using standard desktop computers running Windows, MacOS, or Linux).
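To make "reliable batch upload" concrete, the following is a minimal sketch of one common approach: retrying each file a bounded number of times and reporting what could not be sent. It does not describe an actual iDigBio interface. The function names, the file names, and the `upload` callable are all hypothetical, and the failures are simulated so the example runs without a network.

```python
import time

def upload_batch(files, upload, max_attempts=3, delay_seconds=0.0):
    """Try each file up to max_attempts times; return (uploaded, failed) lists."""
    uploaded, failed = [], []
    for path in files:
        for attempt in range(1, max_attempts + 1):
            try:
                upload(path)  # hypothetical transfer function supplied by the caller
                uploaded.append(path)
                break
            except OSError:
                if attempt == max_attempts:
                    failed.append(path)  # give up on this file, continue with the rest
                else:
                    time.sleep(delay_seconds)  # pause before the next attempt
    return uploaded, failed

# Simulated uploader: one file fails once, another fails every time.
attempts_seen = {}

def flaky_upload(path):
    attempts_seen[path] = attempts_seen.get(path, 0) + 1
    if path == "specimen_002.jpg" and attempts_seen[path] < 2:
        raise OSError("simulated temporary network failure")
    if path == "specimen_003.jpg":
        raise OSError("simulated permanent failure")

uploaded, failed = upload_batch(
    ["specimen_001.jpg", "specimen_002.jpg", "specimen_003.jpg"],
    flaky_upload,
)
print("uploaded:", uploaded)
print("failed:", failed)
```

A real ingestion appliance would likely also record progress on disk so that an interrupted batch can resume, which this sketch leaves out.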

As another example, a geo-referencing system integrated into a virtual appliance might be deployed on iDigBio cloud resources, and its geo-referencing capabilities could be accessed through a Web service interface. The underlying technology (virtualization) is the same, but in this example users would not download the appliance but could instead connect to its services through the Internet. The benefit of using the iDigBio cloud resource is that it would enable scalable access to a service common to many users by leveraging the ability of cloud infrastructures to deploy multiple virtual appliances and to balance loads among them.
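As an illustration of balancing load among several copies of the same appliance, here is a minimal round-robin sketch. It is only one possible strategy and is not a description of how the iDigBio cloud distributes requests. The class name and the endpoint addresses are hypothetical.

```python
from itertools import cycle

class RoundRobinBalancer:
    """Hand out appliance endpoints in rotation so requests spread evenly."""

    def __init__(self, endpoints):
        if not endpoints:
            raise ValueError("at least one appliance endpoint is required")
        self._endpoints = list(endpoints)
        self._rotation = cycle(self._endpoints)

    def next_endpoint(self):
        """Return the endpoint that should receive the next request."""
        return next(self._rotation)

# Hypothetical addresses of three identical geo-referencing appliances.
balancer = RoundRobinBalancer([
    "http://georef-1.example.org",
    "http://georef-2.example.org",
    "http://georef-3.example.org",
])

# Six incoming requests are assigned two per appliance.
assignments = [balancer.next_endpoint() for _ in range(6)]
for number, endpoint in enumerate(assignments, start=1):
    print(f"request {number} -> {endpoint}")
```

Round-robin ignores how busy each instance is; a production service might instead route on measured load.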

To foster the development and sharing of a collection of appliances, iDigBio will maintain best practices that document tools, methods and standards for the integration and publishing of appliances.

Process overview:

  1. Identification of candidate tools: 
    Recommendations for integration of tools in appliances follow an open process with users in the community. Appliances are thus expected to have broad applicability and demonstrated community buy-in. For these reasons, the mechanism for recommendations is through participation in iDigBio working groups, online forums, and workshops.
     
  2. Submission of proposal: 
    iDigBio will provide a mechanism for users to submit proposals and track proposal status. Three major categories of proposals will be considered; the type of information requested in the proposal varies by category and will be clearly conveyed in the open call for proposals, which will be available on the Web portal.
    1. End-user appliance

      This involves collaboration with iDigBio personnel in developing an iDigBio appliance downloadable from the portal. Key expectations for these appliances include: a) that there is an active user base for the tool in the community; b) that the tool is well-documented, with clear resource requirements; c) that tool development is active; d) that tool developers are willing to collaborate with iDigBio personnel in the integration and maintenance of the appliance; e) that the tool and its software dependencies do not have license terms that would preclude their open distribution through iDigBio; and f) that the resulting appliance adds value and/or simplifies user experience.
       

    2. iDigBio-hosted appliance/service

      This involves collaboration with iDigBio personnel in developing an iDigBio appliance that will be hosted on the iDigBio cloud. Key expectations for these appliances include: a) that there is an active user base for the tool in the community; b) that the tool is well-documented, with clear resource requirements; c) that tool development is active; d) that tool developers are willing to collaborate with iDigBio personnel in the integration and maintenance of the appliance; e) that the tool exposes its functionality through a well-defined and well-documented service interface; and f) that the resulting appliance adds value and/or simplifies user experience.
       

    3. Dissemination of existing appliances through iDigBio

      In this category, users provide an appliance that has already been developed, and iDigBio promotes the appliance by linking to it and/or hosting the appliance files on iDigBio resources. Key expectations for these appliances include: a) that the tool is relevant to the community; and b) that the tool and its software dependencies do not have license terms that would preclude their open distribution through iDigBio.

  3. Proposal review:
    An iDigBio subcommittee will review proposals and make recommendations on whether to accept a proposal; the subcommittee will also identify priorities in the allocation of resources for development.

FAQ:

1. What is a virtual appliance?

A virtual appliance is a virtual machine image that is packaged, configured, and customized with an operating system and software tools that allow it to perform a specific, well-defined function. To help illustrate the concept, consider a typical DVR (digital video recorder) cable box. Inside this box, there is a computer running an operating system (possibly Linux) and an application that is customized to record and play back TV programs. This complex system is packaged in a convenient and easy-to-use format, and users are generally not aware they are interacting with a computer. Analogously, virtual appliances can combine a complex set of tools with a selected operating system into an easy-to-install package that allows users to perform particular tasks (e.g., data ingestion, OCR, data cleanup).

2. How do I run a virtual appliance?

To run a virtual appliance, you need two things: virtual machine monitor (VMM) software installed on your computer (e.g., the free VMware Player or VirtualBox software) and a virtual appliance image on your disk (available for download from a website such as iDigBio or the VMware appliance marketplace).
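As one illustration, assuming VirtualBox is the installed VMM and the appliance was distributed as an OVA file, the two steps (import the image, then start the machine) can be scripted with VirtualBox's `VBoxManage` command-line tool. The sketch below only builds and prints the commands, so it runs without VirtualBox; the file name and machine name are hypothetical.

```python
import shlex

def vboxmanage_commands(appliance_file, vm_name):
    """Return the VBoxManage commands that import and then start an appliance."""
    return [
        # Register the appliance image as a new virtual machine with a chosen name.
        ["VBoxManage", "import", appliance_file, "--vsys", "0", "--vmname", vm_name],
        # Boot the imported machine in a desktop window.
        ["VBoxManage", "startvm", vm_name, "--type", "gui"],
    ]

# Hypothetical appliance file downloaded from the portal.
commands = vboxmanage_commands("label-capture.ova", "label-capture")
for command in commands:
    print(shlex.join(command))
    # To actually run a command: subprocess.run(command, check=True)
```

Users who prefer a graphical route can typically import the same file through the VMM's own import or open dialog.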

3. What is a virtual machine monitor?

A virtual machine monitor (VMM) is a software program that allows your computer to run multiple, isolated virtual machines, each of which has its own operating system and software. For instance, a VMM allows users to run a Linux-based virtual machine alongside a Windows or MacOS host. 

4. What virtual machine monitors are available for typical computer platforms?

There are numerous VMM technologies available as open-source or commercial software, including VMware Player/Workstation/Fusion (for Linux, Windows, MacOS), VirtualBox (also for Linux, Windows, MacOS), KVM (for Linux), and Xen (for Linux). Typically, VMware and VirtualBox are used on desktop computers, and KVM and Xen on server computers.

5. What is a virtual appliance image?

A virtual appliance image is a file (or set of files) that contains all the information needed for a VMM to start up a virtual machine. This typically includes information about the virtual machine's "virtual hardware" resources (number of processors, amount of memory) and the contents of the virtual disk that stores all information used by the virtual appliance. Virtual disks are often the largest part of an image, and can range from hundreds of megabytes to tens of gigabytes depending on the size of the software they integrate.
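The kind of information an image carries can be pictured as a small record. The sketch below is purely illustrative and does not follow any particular image format: the field names and example values are hypothetical. It also shows a simple check of whether a host has enough resources to run the appliance.

```python
from dataclasses import dataclass

@dataclass
class ApplianceImage:
    """Illustrative summary of what a virtual appliance image describes."""
    name: str
    cpus: int          # number of virtual processors
    memory_mb: int     # amount of virtual memory, in megabytes
    disk_gb: float     # size of the virtual disk, in gigabytes

def fits_on_host(image, host_cpus, host_memory_mb, host_free_disk_gb):
    """Return True if the host can supply every resource the image asks for."""
    return (
        image.cpus <= host_cpus
        and image.memory_mb <= host_memory_mb
        and image.disk_gb <= host_free_disk_gb
    )

# Hypothetical appliance and two hypothetical workstations.
ocr_appliance = ApplianceImage(name="ocr-tools", cpus=2, memory_mb=2048, disk_gb=8.0)
fits_large_host = fits_on_host(ocr_appliance, host_cpus=4, host_memory_mb=8192, host_free_disk_gb=100.0)
fits_small_host = fits_on_host(ocr_appliance, host_cpus=1, host_memory_mb=1024, host_free_disk_gb=100.0)
print(fits_large_host, fits_small_host)
```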

Submission Form Template – End-user appliances

Submission Form Template – iDigBio-hosted tools and services

Submission Form Template – Dissemination of user-provided appliances