AWS Genomics Event: Matt Wood; Chris Dagdigian on cloud biology automation

I’m in Seattle at the AWS Genomics Event, excited for a fun day of talking about genomics in the cloud.

Introduction to Research in the Cloud: Matt Wood, AWS

Matt Wood starts off the day with an introduction to Amazon Web Services and details about Amazon’s interest in genomics. The idea is to move from data to materials, and from compute to methods, so researchers can focus better on the science. Areas where Amazon interacts with science:

  • Reproducibility: 1000 Genomes is a great example. Easing reuse improves the impact of science. You can package the environment as machine images, which is awesome since you can give collaborators exactly what you ran. Sharing complex environments lets us work in new ways. CloudFormation allows you to define all of the items in a cluster in JSON. Tools like Puppet and Chef provision software and configuration. Taverna can model the actual science workflow. Amazon provides SimpleDB as a key/attribute store to help model and store metadata associated with experiments or data. Galaxy is fully invested in reproducibility and community involvement within their infrastructure.

  • Constraint removal: avoid constraints that limit innovation and research. Expand your problem space by introducing an easy approach to scaling.

  • Algorithm development: Infrastructure enables algorithms. Nice examples are (a) GPU instances and (b) Crossbow utilizing Hadoop.

  • Collaboration and sharing: data, uses of that data and multiple users spread over lots of locations. General idea: move the compute to the data. Amazon has free inbound transfer; if that’s too slow, there is also Import/Export via FedExed hard drives. You can do parallel uploads to S3.

  • Funding options: On-demand is the easiest approach, but most costly. Can use reserved capacity to reduce the hourly rates. The spot market lets you bid on capacity and save money; need to architect for interruption.

  • Compliance: shared responsibility. Amazon secures the infrastructure; users secure the instances and data. ISO 27001 and HIPAA compliant. Data mirrored across availability zones, but local data stays local. GovCloud: US only usage.
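
To make the CloudFormation point above a little more concrete: a cluster description is just a JSON document, so it can be generated and versioned like any other code. Here is a minimal sketch in Python, not something shown in the talk. The `WorkerNode` resource names, the `ami-00000000` image ID and the `m1.large` instance type are placeholders I made up for illustration; only the overall template layout (`AWSTemplateFormatVersion`, `Resources`, `AWS::EC2::Instance`) follows the CloudFormation format.

```python
import json


def build_template(ami_id, instance_type, count):
    """Return a minimal CloudFormation-style template with `count` EC2 instances."""
    resources = {}
    for i in range(count):
        # "WorkerNode" is a hypothetical logical name, chosen for this sketch.
        resources[f"WorkerNode{i}"] = {
            "Type": "AWS::EC2::Instance",
            "Properties": {"ImageId": ami_id, "InstanceType": instance_type},
        }
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Sketch of a small compute cluster",
        "Resources": resources,
    }


# Placeholder AMI ID and instance type; swap in real values before use.
template = build_template("ami-00000000", "m1.large", 3)
template_json = json.dumps(template, indent=2, sort_keys=True)
print(template_json)
```

The printed JSON is what you would hand to CloudFormation; sharing that file alongside a machine image is one way to give collaborators the same environment.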

Some exciting things are coming soon in genomics. We are getting closer to health and patient data, which is going to require security and data availability, scaling to large numbers of users with elastic pipelines. It is important to put patients in charge of their own data.

Practical Cloud & Workflow Orchestration: Chris Dagdigian, The BioTeam

Chris Dagdigian discusses working on the hardware geek side of science with AWS. Three topics: time, laziness and beauty. We are getting to the point where automated provisioning cuts the lag time between wanting to do science and having the hardware ready to do it. Research infrastructure is 100% scriptable and automatable; be lazy and automate what you do. The beautiful bits are what you can build on top of Amazon infrastructure.

Demo time:

  • CloudInit gives you a hook into freshly booted systems. Don’t need to maintain tons of AMIs; easy way to configure a new system with a YAML configuration file.

  • Amazon CloudFormation allows you to turn on/off a large number of instances. Create an elastic database cluster, webserver cluster and monitoring: all in a JSON input file. The example JSON template is a good place to get started.

  • Opscode Chef enables infrastructure as code. It is important that everything is idempotent so you can run it multiple times. Demo with knife, Chef’s command-line tool. It can run commands over ssh on each node in a cluster, and it also supports searches: you can find the nodes with properties of interest and run commands on just those.

  • MIT StarCluster builds a ready-to-use cluster compute farm on AWS. Especially useful for handling legacy use cases. There is a Slideshare example of running this.
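
The idempotency point from the Chef demo is worth a tiny illustration. This is a sketch of the general idea in plain Python, not Chef's API: an operation describes the desired end state, checks whether the system is already there, and only acts if it is not. The `ensure_line` helper, the `cluster.conf` file name and the `max_jobs=8` setting are all hypothetical.

```python
import os
import tempfile


def ensure_line(path, line):
    """Make sure `line` is present in the file at `path`.

    Returns True if the file was changed and False if it was already in
    the desired state, so repeated runs converge on the same content.
    """
    text = ""
    if os.path.exists(path):
        with open(path) as handle:
            text = handle.read()
    if line in text.splitlines():
        return False  # already converged, nothing to do
    with open(path, "a") as handle:
        # Start on a fresh line if the existing file lacks a trailing newline.
        if text and not text.endswith("\n"):
            handle.write("\n")
        handle.write(line + "\n")
    return True


with tempfile.TemporaryDirectory() as workdir:
    config = os.path.join(workdir, "cluster.conf")  # hypothetical config file
    first = ensure_line(config, "max_jobs=8")
    second = ensure_line(config, "max_jobs=8")
    with open(config) as handle:
        contents = handle.read()

print(first, second)  # True False
```

The second call is a no-op, which is the property that makes it safe to re-run the same configuration against every node in a cluster.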

