Bioinformatics Open Source Conference (BOSC) 2010: Day 2 afternoon

BOSC 2010 sadly wrapped up on Saturday afternoon after a great two days of talks, discussion and planning. Here are my notes from the afternoon sessions.

Simon Mercer — Microsoft Biology Foundation

Simon will be presenting information about the Microsoft Biology Foundation and their new 1.0 release. Microsoft External Research brokers relationships between academic communities and Microsoft researchers. This collaboration process involves the development of reusable software that is often made available. Examples include the Ontology Add-in for Word, NodeXL that visualizes networks, 3D molecular viewer for PDB, Trident scientific workflow workbench that provides an interactive and commandline environment for developing workflows.

The goal was to bring these and other collaborative tools within Microsoft together into a single framework: the Microsoft Biology Foundation. It is a set of reusable tools designed for the .NET platform. Looks like lots of useful stuff: standard representations, file parsing IO, algorithms and web services.

Clickframes — Clickframes: rapid, validated development for clinical informatics

William is from Children’s Hospital in Boston and Beacon 16 software. Clickframes provides a robust software modeling schema for MVC display, database access, user authentication: all of the nasty bits. Written in Java. Idea is to avoid large product requirement documents and take care of both modeling data and generating code for some of the nasty details. It is an XML based language that folks can write their actual specifications in. Specs turn into interactive web based previews. The XML also generates a flow diagram of the application. Tests are automatically generated in Selenium. Really saves a lot of the have-to-do development chores and helps you focus on the interesting parts.

Morris Swertz — molgenis: database at the push of a button

Molgenis provides models of the biology and tries to autogenerate the background bits. Models are specified in a domain specific language that produces code and magic. It’s Java based and has an XML language to specify what you want and are doing. Plugins can be used to add in Java code to handle specific tasks. Generates Java classes, tests, SQL and everything for web development on Tomcat. It has a nice interface to R, built on a REST interface, which allows you to retrieve data directly from the web form. Provides an RDF SPARQL query interface. Reuses models and tools from Galaxy under the covers for sharing.

Alexandros Kanterakis — MOLGENIS and MAGE-TAB for microarrays

Idea is to use MOLGENIS to build a database for microarray and GWAS analysis: want to combine genotypic and phenotypic information for eQTL analysis. Data is stored in MAGE-TAB which provides a tab oriented form of microarray information. MAGE was translated into the MOLGENIS XML data model. Used MOLGENIS to produce a web based system for managing the database. Lots of endorsements for using MAGE-ML to model complicated experiment metadata.

Sebastian Schultheiss — Persistence of bioinformatics web services

Looked at 927 web services to see how many are still available. 17% of the original published services are no longer active. This is problematic since your scripts are no longer reproducible and comparable. Over time the publishing policies have become stricter and things do seem to be improving. On average 45% of original services are available and still seem to work with test data. 58% of the services are developed by students who are graduating and moving on, and 24% of the folks admitted that they are not planning to maintain the service.

Lincoln Stein — Gbrowse2

GMOD provides the infrastructure and tools for model organism databases. Contains standard ontologies, schema, file formats, browsers and editors.

Gbrowse is the web-based genome browser part of GMOD. Image glyphs are configurable in the display, which allows users to provide organism specific things like pictures of worms, haplotype displays, time course RNA data.

Version 2.0 contains a lot of AJAX and javascript: dragging, zooming, support for SAM/BAM, BED, GFF, WIG, BigWig. Subtracks allow items to be organized into groups of tracks related to interesting top level items.

Behind the scenes, you can render tracks independently. JBrowse is the next generation Gbrowse.

Gary Bader — Cytoscape web

Web based component which provides a scaled down version of Cytoscape. Made up of Flash + Javascript and is client-side only. Full customization is possible, generally it looks like an awesome version of cytoscape functionality on the web. It is more suitable for medium sized networks (less than 2000 elements).

Being used for several different clients: GeneMania, iRefWeb, Pathguide. Website features online demos. Uses jQuery for interaction.

Nobuaki Kono — Pathway projector

A genome browser for pathway data in the style of google maps. Lots of google features: browsing, marking points, drawing graphs. This allows manual annotation with the Quikmaps javascript library. Info windows pop up while browsing with links to external resources.

James Morris — Evoker: a visualization tool for genotype intensity data

Genome wide association studies: associate SNPs or other data with specific phenotypes, build up p-values based on allele differences, hopefully identifying signals that are significantly different. Need good quality control in GWAS to avoid false positives from poor quality DNA, population structure or hidden confounding artifacts.
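To make "p-values based on allele differences" concrete, here is a minimal sketch of a basic allelic association test: a Pearson chi-square on a 2x2 table of allele counts in cases versus controls. The function name and the counts are made up for illustration, and this is not code from Evoker or from the talk.

```python
import math


def allelic_chi_square(case_a, case_b, control_a, control_b):
    """Pearson chi-square (1 degree of freedom) on a 2x2 allele count table.

    Assumes every row and column total is non-zero.
    """
    table = [[case_a, case_b], [control_a, control_b]]
    total = case_a + case_b + control_a + control_b
    row_sums = [case_a + case_b, control_a + control_b]
    col_sums = [case_a + control_a, case_b + control_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: allele A is more common in cases than in controls.
chi2, p_value = allelic_chi_square(60, 40, 40, 60)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

A real GWAS repeats a test like this across hundreds of thousands of SNPs, which is why the quality control issues above matter so much: a small systematic artifact can produce many spurious small p-values.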

Evoker provides the visualization components to assess these issues, integrating with large data stores. It’s written in Java with perl helper scripts. Fully interactive for zooming in and out and what not. Provides statistical plots to confirm good genotype calls and identify false positives.

Pavel Tomancak — Fiji is just ImageJ

Fiji provides visualization of biological images and is a distribution of ImageJ. Two reasons for the project: first is that it’s needed in the community and has had big uptake, second is that it’s built around biological projects and provides community aspects. Fiji is targeted at biologists, bioinformaticians, software developers and vision researchers. It’s batteries included to make it usable by biologists, and includes documentation and tutorials. Includes an API accessible from any JVM language. Code is developed under Git, with an emphasis on communication between developers and users. Developed an image library that allows researchers to write algorithms in a DSL and autogenerate Fiji code from them. An auto push updater was developed last summer during GSoC.

Iddo Friedberg — IPRStats for visualization of InterProScan results

Use case for IPRScan: deal with the diversity of microorganisms and their health effects. Microbes live in complex communities which is what metagenomics studies. DNA isolated directly from environmental samples and annotating the samples is a problem. One approach is to use InterProScan, and then IPRStats provides visualization of InterProScan results.


Bioinformatics Open Source Conference (BOSC) 2010: Day 2 morning talks

Day 2 of the bioinformatics open source conference (BOSC) kicked off bright and early on Saturday with a very nice discussion from Ross Gardler about building open development communities.

Ross Gardler — Community Development at the Apache Software Foundation

Ross is going to discuss how the community development system at Apache could be useful to open source communities in biology. Apache has 70 active projects and 30+ in development; there are 2500 regular contributors with commit access. The Apache foundation started in 1995 to fix up the UIUC server and became an official organization in 1999 to provide legal protection for members. The mission statement was broad and general: more about a way of doing things in an open manner than about specific projects. Foundation exists to get the legal nastiness and what not out of the way so folks can write code and documentation with minimal resistance. Apache provides indirect financial support. They don’t pay for code but many developers are paid by third-parties to do work on Apache projects.

Apache is a meritocracy, and everyone has a voice and vote. Contribution of value produces merit within projects. If you earn merit in multiple projects then you can earn membership at the foundation level. Consensus is made via debate and code, although occasionally a vote is required via the mailing list. The rule of lazy consensus is that trusted folks can just code away: once you have code you can evaluate it more easily and move forward if everyone agrees.

Growing the foundation from the original Apache server to the 70+ projects has been a challenge. Jakarta was developed as an umbrella project underneath Apache, which had some failures; the organization was modified to keep a flat structure without any umbrella projects. This allowed projects to be reviewed by the Apache folks who have lots of experience evaluating development communities. The Apache foundation doesn’t consider technical issues, but rather things like stagnating communities, undue commercial influence and other potential problems.

What are the characteristics of a good Apache project? Diversity: at least 3 committers unrelated to each other outside the project. Fully audited code for IP issues, which makes the work more palatable to companies who are contributing. Projects should be generic and reusable, so the component parts are available. Idea is that the components can be used outside of your field so you can build a wider community.

How do you scale the community? More projects brings in additional volunteers and doesn’t stress the overhead too much, but creates the potential for dilution of the Apache foundation values and brand. The flat structure gives power to new members since there are low barriers to entry, but this can result in the blind leading the blind. However, hierarchy is inefficient. Peer review is one of the answers to helping the community self-regulate.

Mentoring helps bring new folks into Apache. In the incubator, mentors guide new project teams and teach them the apache way. Google summer of code brings in some community members, and the Apache mentoring project goes beyond this to provide mentoring on a year round basis.

Summary of lessons: the foundation should handle the brand, infrastructure, and legal aspects of projects. This also allows for cross project community discussions. The project handles technical issues and contributors. Lazy consensus is used to avoid management by committee and keep the power in the hands of the people who do things. Need to think how to generalise your project components and get outside of your niche. Excellent things to think about for the biology community where we are used to trying to specialize.

Chris Fields — BioPerl

Chris will talk about current things happening in BioPerl, and then focus on some changes that are happening in the community: making things easier for new users, using modern Perl features and dealing with BioPerl being monolithic. BioPerl has been around since 1996 and has an impressive number of current and past contributors. Lincoln Stein’s next-gen tools, Bio-SamTools and Bio-BigFile, are separate CPAN distributions. Gbrowse talk later.

Summer of code happening for the 3rd year. The alignment subsystem is being cleaned up to include the capability to deal with large datasets via indexing and reduced in memory representations.

Moving forward, how can the current code be improved and modularized? To lower the barrier to entry, the BioPerl repository was migrated to GitHub. The monolithic nature of BioPerl makes things very hard to maintain and release. One idea is to make BioPerl a front end installer that adds specific individual packages based on interests and needs. Have an initial prototype using Moose for BioPerl objects, and for BioPerl on Perl 6.

Raoul Bonnal — BioRuby

Overview of Ruby itself: a nice language with object orientation, functional aspects and reflection. BioRuby works with both standard C Ruby and JRuby. The last BioRuby update presentation was at BOSC 2008, and there has been tons of development since, including 3 Hackathons and 1 Codefest. New features include support for BioSQL which allows interoperable storage of sequences, PhyloXML support from a GSoC project, Fastq parsing support, NCBI REST access, and TogoWS support.

BioRuby has frequent meetings via mail, skype and IRC. Very strict requirement for tests as they continue to move to an agile programming style. BioRuby has a plugin system with standard naming scheme: bioruby-plugin-NAME. Provide a script interface to download and install plugins.

Peter Rice — EMBOSS

EMBOSS received continued funding last year which allowed new development as opposed to bug fix and maintenance releases over the previous two years. EMBOSS aims at both developers and end users, and is targeted at the commandline. There are over 100 interfaces including Galaxy. New release supports BAM and other new next gen features. 3 open source books are coming out soon, which will lock down much of the library functionality.

Fastq and other parsing was improved by thinking about truncated failure cases and building up a standard set of problem cases. New EMBOSS accesses BioMart and ENSEMBL. Planned next are DAS, GMOD and BioSQL. Provides a standard definition format for defining databases; awesome way to avoid re-doing all of the specific process. Other planned features include improved Ontology support.

Tiago Antao — population genetics in Python

HapMap project develops a haplotype map of the human genome: 11 populations, 90-180 individuals in each. It contains SNPs, CNVs, genotypes, pedigree info. UCSC known genes are most useful for overlapping with data from HapMap. Python library accesses both HapMap and UCSC with Biopython, matplotlib, GenePop and Entrez data. Ensembl Variation API covers a similar area in Perl.

Structure is SQLite based: remote data is downloaded once and stored and indexed. Interface examples for retrieval and querying look straightforward. Very nice demonstration plots of data with matplotlib.
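The "download once, store and index" pattern can be sketched with Python's built-in sqlite3 module. This is only an illustration of the idea, not the library presented in the talk: the table layout, the function names and the `fake_fetch` stand-in for a remote download are all assumptions.

```python
import sqlite3


def make_cache(path=":memory:"):
    """Open (or create) the local cache with an index for region queries."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snps ("
        "rs_id TEXT PRIMARY KEY, chrom TEXT, pos INTEGER)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS snps_by_pos ON snps (chrom, pos)")
    return conn


def load_once(conn, chrom, fetch_remote):
    """Call fetch_remote(chrom) only if that chromosome is not cached yet."""
    cached = conn.execute(
        "SELECT COUNT(*) FROM snps WHERE chrom = ?", (chrom,)
    ).fetchone()[0]
    if cached == 0:
        conn.executemany("INSERT INTO snps VALUES (?, ?, ?)", fetch_remote(chrom))
        conn.commit()


def snps_in_region(conn, chrom, start, end):
    """Return (rs_id, pos) pairs inside a region, served from the local index."""
    rows = conn.execute(
        "SELECT rs_id, pos FROM snps "
        "WHERE chrom = ? AND pos BETWEEN ? AND ? ORDER BY pos",
        (chrom, start, end),
    )
    return rows.fetchall()


download_log = []


def fake_fetch(chrom):
    """Hypothetical stand-in for a remote HapMap download."""
    download_log.append(chrom)
    return [("rs1", chrom, 100), ("rs2", chrom, 250), ("rs3", chrom, 900)]


conn = make_cache()
load_once(conn, "chr1", fake_fetch)
load_once(conn, "chr1", fake_fetch)  # second call is served from the cache
print(snps_in_region(conn, "chr1", 50, 300))
```

Passing a file path instead of `:memory:` would keep the cache between sessions, which is the point of downloading only once.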

Laurent Gautier — Bioconductor and Python

Provides a way to natively access libraries implemented in R. Bioconductor is one really useful target for biologists: tons of open source packages in R. Laurent shows an awesome diagram of the biological data landscape: what Python handles well and what R/Bioconductor handles well. R is heavily statistical while Python is more focused on data processing.

Idea of rpy2 is to bridge the Python and R communities. Community wise, this lets interpreters develop that can share the usefulness of each separate community. Nice example of using edgeR from python to look at differential expression of RNA-seq data.

Eric Talevich — Bio.Phylo package in python

Eric developed a phylogenetics library for Biopython that makes it easy to explore tree data. There are a bunch of phylogenetics formats: Newick, Nexus, PhyloXML and NeXML.

Eric provides a demo of using PhyloXML to parse a Newick tree, visualize it in multiple ways: text tree, networkx style graphical trees. With PhyloXML you can specify attributes of a tree and annotate it, and then store all this in the XML format. Easy to promote standard Newick to the more representative PhyloXML.

Bioinformatics Open Source Conference (BOSC) 2010: Day 1 afternoon talks

These are my notes on the afternoon talks from BOSC 2010.

Steffen Moeller — Community-driven computational biology with Debian and Taverna

Steffen describes the Debian Med initiative to provide Debian/Ubuntu packages for biological programs. How can this be generalized to cloud instances? Taverna provides the ability to generalize tools as web services and avoid some of the burden of installing packages.

Final idea is shared public data which can be made available on cloud images that would work on Eucalyptus. Really good idea to have generalized data but not sure about technical aspects of providing images across providers.

Darin London — Dealing with the data deluge: what can the robotics community teach us?

Dealing with 50+ cell lines sequenced with multiple ChIP-seq antibodies. How to best manage this? Next gen data is very heterogeneous across time and types of data.

Can we think up any good ideas for dealing with this type of data by looking at things the robotic community has done? Behavior-based robots act via independent modules modeled after biological activity. Systems are fault tolerant since different modules can pick up when others fail to act. Can parallelize this since individual modules act autonomously instead of needing to be serialized.

One useful idea is to predict when problems might happen with running out of disk space or memory based on the system parameters.

Developed a pipeline to generate data for ENCODE. Three types of agents: runner agents, processing agents launched by runner agents, and human agents. The task list is developed in a Google spreadsheet. By adding tasks to the spreadsheet, you can control the agents. Available as a Perl module on CPAN.

Nyasha Chambwe — Goby framework

One issue with scaling and dealing with data is the proliferation of biological file formats. What are the desirable characteristics of file formats? Well specified, easy to parse, compression and streaming. Developed new file formats for next gen data, along with a framework to analyze them.

Goby uses protocol buffers to provide a flexible and efficient mechanism for serializing. The data is defined as a message in a proto file. File is chunked and each region can be gzipped for random access to each region.
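The chunking idea is what makes random access possible on compressed data: each chunk is compressed on its own, so a reader can jump to one chunk without decompressing everything before it. Below is a minimal sketch of that idea only. It stores plain text lines rather than protocol buffer messages, and the function names and in-memory index are assumptions, not Goby's actual format.

```python
import gzip


def write_chunks(records, chunk_size):
    """Compress records in independent chunks; return (blob, index)."""
    blob = bytearray()
    index = []  # one (byte offset, compressed length) entry per chunk
    for start in range(0, len(records), chunk_size):
        payload = "\n".join(records[start:start + chunk_size]).encode("utf-8")
        compressed = gzip.compress(payload)
        index.append((len(blob), len(compressed)))
        blob.extend(compressed)
    return bytes(blob), index


def read_chunk(blob, index, chunk_number):
    """Decompress a single chunk without touching the others."""
    offset, length = index[chunk_number]
    payload = gzip.decompress(blob[offset:offset + length])
    return payload.decode("utf-8").split("\n")


records = [f"read{i}" for i in range(10)]
blob, index = write_chunks(records, chunk_size=4)
print(read_chunk(blob, index, 1))
```

The trade-off is chunk size: small chunks give finer random access but compress less well, while large chunks do the opposite.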

Demonstrate a full pipeline for RNA-seq analysis using Goby file formats.

Dana Robinson — BioHDF

Goals are to create a data model to describe data, a store to allow for efficient retrieval, and a toolkit for development.

BioHDF is a database schema in HDF for storing biological data. A library and C API are coming, along with commandline tools similar to samtools. Data is stored in a hierarchical manner by reads and alignments. Information stored is: reads, alignments, annotations, clusters of aligned reads, reference sequences and indexes. Additional user specific data can be stored.

One exciting development that is being discussed on the samtools mailing list is switching the underlying representation of BAM to HDF and abstracting it out with a higher C API.

Jens Lichtenberg — Concurrent bioinformatics software for discovering genome-wide patterns

WordSeeker — a tool that does motif discovery: enumerate the word space using suffix and radix trees, score the motifs, cluster them based on word sizes, evaluate conservation analysis using phastCons scores from UCSC, look for biased distribution of motif locations.
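The first step, enumerating the word space, can be illustrated with a simple hash-based count of every word of length k. Note the swap: WordSeeker uses suffix and radix trees, while this sketch uses a plain Counter, which is easier to read but far less memory efficient on whole genomes. The function names and the toy sequence are made up.

```python
from collections import Counter


def count_words(sequence, k):
    """Count every overlapping word of length k in the sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def repeated_words(sequence, k, min_count=2):
    """Words seen at least min_count times, most frequent first."""
    counts = count_words(sequence, k)
    return [(word, n) for word, n in counts.most_common() if n >= min_count]


toy_sequence = "ACGTACGTTTACGT"
print(repeated_words(toy_sequence, 4))
```

Scoring, clustering and conservation analysis would then operate on these counts; those steps are not sketched here.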

Scalable approach is necessary to parallelize the enumeration of all words. Similarly for scoring need to do frequent lookups. Can be scaled via MPI for distributed memory processing or OpenMP for shared memory machines. Presented timing data for analysis on Arabidopsis genome.

Chris Hemmerich — Automated Annotation of NGS Transcriptome Data using ISGA and Ergatis

Ergatis is a workflow management tool for running pipelines. Integrative Services for Genomic Analysis (ISGA) is a biologist’s tool for running and customizing Ergatis pipelines. It provides a graphical interface for setting up a pipeline and customizing input parameters. A specific transcriptome pipeline example is presented.

Mark Wilkinson — SADI

Mark discusses his semantic web solution for pulling together web services to make it easy to ask complex questions. Idea is to support scientific method and discussion where we have opinions and debate: not necessarily 100% about what something means. General notion is to create OWL ontologies that help define expressed hypotheses.

Aravind Venkatesan — Bio-Ontologies in Galaxy

ONTO-Toolkit is a collection of tools to manage ontologies represented in the OBO file format. Wraps ONTO-PERL which provides a high level API for querying ontologies. Two use cases:

  • Investigate the similarities between two different molecular functions. Look upstream of both and see how many of their ancestor terms are shared. Most specific common term can be used to assess this.

  • Identify overlapping annotations for a given pair of distinct biological process terms. Look for overlap between two distinct biological processes.
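Both use cases come down to set operations on an ontology graph. Here is a minimal sketch on a toy ontology, assuming terms are stored as a child-to-parents mapping. The term names, the gene annotations and the function names are invented for illustration and do not reflect the ONTO-PERL API.

```python
def ancestors(term, parents):
    """All terms reachable by following parent links, excluding the term itself."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def most_specific_common(term_a, term_b, parents):
    """Shared terms that are not an ancestor of any other shared term."""
    common = (ancestors(term_a, parents) | {term_a}) & (
        ancestors(term_b, parents) | {term_b}
    )
    covered = set()
    for term in common:
        covered |= ancestors(term, parents)
    return common - covered


def shared_annotations(term_a, term_b, annotations):
    """Genes annotated to both terms."""
    return annotations.get(term_a, set()) & annotations.get(term_b, set())


# Toy ontology: child -> parents (hypothetical, heavily simplified).
parents = {
    "catalytic activity": ["molecular function"],
    "binding": ["molecular function"],
    "kinase activity": ["catalytic activity"],
    "protein kinase activity": ["kinase activity"],
    "lipid kinase activity": ["kinase activity"],
}
annotations = {
    "cell cycle": {"geneA", "geneB", "geneC"},
    "DNA repair": {"geneB", "geneC", "geneD"},
}

print(most_specific_common("protein kinase activity", "lipid kinase activity", parents))
print(sorted(shared_annotations("cell cycle", "DNA repair", annotations)))
```

Because an ontology is a directed acyclic graph rather than a tree, two terms can have more than one most specific common term, which is why the function returns a set.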

Christian Zmasek — Connecting TOPSAN to computational analysis

The Open Protein Structure Annotation Network (TOPSAN). Structures are available in PDB but very little annotation about them beyond the PDB titles. So TOPSAN provides a database for community annotation of proteins.

Most annotations are entered by humans, but can also provide structured data in a simple format, TOPSAN Protein Syntax (TPS). This is an RDF triple of protein, predicate (homologous, encodedby, citation, memberof) and the value.

Jianjiong Gao — Musite: Global Prediction of General and Kinase-Specific Phosphorylation Sites

Musite is an open source tool for protein phosphorylation prediction. Disordered regions typically have phosphorylation regions, so may also be useful for evaluating protein disorder.

Bioinformatics Open Source Conference (BOSC) 2010: Day 1 morning talks

The Bioinformatics Open Source Conference (BOSC 2010) is taking place in Boston on Friday, July 9th and Saturday, July 10th. It focuses around open source software for biology, and is a technical conference for folks in the data trenches. These are my notes from the talks and discussions on the first morning.

Guy Coates — Clouds: all fluff and no substance

Guy is from Sanger Institute and starts with an overview of the amazing infrastructure they have for churning out sequencing data. They’ve been experiencing exponential growth in both storage and compute. Moore’s law is too slow for the increase in computational needs.

So a natural area to explore is on-demand Cloud Computing. Where is cloud computing on the hype cycle?

3 use cases to explore:

  • Web presence
  • HPC workload
  • Data warehousing

As an example, consider Ensembl. The web presence has 10k unique visitors a day and 126k page views. The HPC workload is automated analysis of 51 vertebrate genomes.

The approach to improving website reactivity covered two areas. First was improving website code and caching to avoid large page loads. The second was adding a US mirror in California which takes about 1/3 of web traffic. This was a traditional mirror in a co-location facility. A US east mirror was built on Amazon web services. Tuning was necessary for cloud infrastructure, especially for the 1Tb Ensembl database.

How does the cloud web presence compare to traditional co-location? Having no physical hardware saved on start up time and management infrastructure. Virtual machines provide free hardware upgrades so don’t have to sweat the 3 year cycle of hardware obsolescence.

Is the cloud cost effective? Need to consider your comparison. How much would it cost to do locally? How many times do you need to run it? But also consider total cost of operation of a server, including power, cooling and admin. Comparing to a co-location facility: $120k for hardware + $51k per year is $91k per year for a 3 year lifecycle. Amazon is $77k/year, so is cost effective.
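The arithmetic behind the $91k figure is just the hardware cost spread over the lifecycle plus the yearly running cost. A small sketch using the numbers quoted above (the function and variable names are my own):

```python
def annual_cost(hardware, running_per_year, lifecycle_years):
    """Spread the up-front hardware cost evenly over its lifecycle."""
    return hardware / lifecycle_years + running_per_year


colocation = annual_cost(hardware=120_000, running_per_year=51_000, lifecycle_years=3)
amazon = 77_000  # yearly Amazon figure quoted in the talk

print(f"co-location: ${colocation:,.0f}/year")
print(f"Amazon:      ${amazon:,.0f}/year")
print(f"difference:  ${colocation - amazon:,.0f}/year")
```

The comparison is sensitive to the lifecycle assumption: stretching the same hardware over more years lowers the co-location figure, so it is worth plugging in your own numbers.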

Additional benefits of the cloud: the website and data are packaged together, so there is a ready to go Amazon image with everything, including data in an Amazon public dataset. Some open questions are how scale out on Amazon will compare relative to experience at Ensembl and co-location facilities.

Some lessons learned in generating the mirror: more time than expected to move code outside of Ensembl. Overall happy with Amazon and plan to consider Amazon’s far-east servers. Virtual servers can also be useful for other Sanger services. In terms of the hype cycle we are in the plateau of boring ol’ usefulness, which is good.

Second use case for Amazon is Ensembl’s compute pipeline for gene calling and annotation. Code base is object oriented perl running core algorithms in C with over 200 external binaries called out. Workflow is embarrassingly parallel and IO heavy. Takes ~500 CPU days for a moderate sized genome and need a high performance file system underneath. Wanted to explore how difficult it would be to move some of this to the cloud for coping with increases in data. Other important thing is democratizing the pipeline so others can use it with ready to go AMIs on Amazon.

Did not end up working well. Porting the queuing system to Amazon was difficult with LSF/SGE queuing systems due to LSF licensing and fiddling. Moving data to the cloud was very difficult; if you look at most cloud success stories they are not big data applications. Transfer speeds across the network are too slow and difficult to get a handle on what exactly the bottlenecks are. For physics they develop dedicated pipelines to deal with this problem, but biology collaborations are not conducive to this. Within the cloud there are no global filesystems since NFS is not so hot and EC2 inter-node networking is not great.

Why not S3/hadoop/map-reduce? A lot of code expects files on a filesystem, not S3. Many existing applications would need to be re-written. How do you manage both hadoop and standard apps? Barrier to entry is higher for biologists. So on HPC the cloud hype cycle is still in the trough of disillusionment currently.

The problem with current data archives is that they are centralized in a location where you can put/get the data, but not compute on it. Big question: is data in an archive like this really useful? Example use case: 100Tb of data in short read archive. Estimate 3 months to pull down the data and get it ready. Can the cloud help? Well, it sounds good to move the CPU to the data but how do you expose the data? How do you organize the data to make it compute efficient? In terms of funding, whose cloud do we use? Most resources are funded to provide data but not compute: implies a commercial solution like Amazon makes sense. This does solve networking problems, but would need to invest in high speed links at Amazon. In terms of hype cycle, computable archives are still in the peak of inflated expectations: don’t know how this will turn out in practice.

Ron Taylor — Overview of Hadoop/MapReduce/HBase

Ron is presenting a general overview of Hadoop and MapReduce to frame the afternoon talks. Hadoop is a java software framework designed to handle very large datasets and simplifies the development of large-scale fault-tolerant distributed apps on clusters of commodity machines. Data is replicated and single point of failure is only the head node.

MapReduce divides program execution into a map and reduce step, separated by data transfer between nodes. It’s a functional approach that aggregates/reduces data based on key/value pairs. Can fit a large number of tasks into this framework.
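The map, shuffle and reduce steps can be shown in a few lines of single-process Python. This is only a sketch of the programming model, not Hadoop's API: in Hadoop the grouping step is the data transfer between nodes, and the function names and toy alignments here are assumptions.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run mapper on every record, group values by key, then reduce each group."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)  # the "shuffle": gather values per key
    return {key: reducer(key, values) for key, values in grouped.items()}


def emit_chromosome(alignment):
    """Map step: emit (chromosome, 1) for each aligned read."""
    chrom, _position = alignment
    yield chrom, 1


def sum_counts(_chrom, counts):
    """Reduce step: total the values collected for one key."""
    return sum(counts)


alignments = [("chr1", 10), ("chr2", 5), ("chr1", 99)]
print(map_reduce(alignments, emit_chromosome, sum_counts))
```

A task fits the framework when the map step can treat each record independently and the reduce step only needs the values for one key at a time.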

Hbase adds random real-time read/write access to data stored in a distributed column-oriented db. Used as input and output for MapReduce jobs. Data is stored in tables ala a relational database. Data at each row and column is versioned. Flexible modification of columns allows modifying the data model on the fly.
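The data model described here (rows, columns added on the fly, and versioned cells) can be pictured with nested dictionaries. The class below is a toy in-memory illustration under that assumption; it is not the HBase client API, and the row keys and column names are invented.

```python
class VersionedTable:
    """Toy model: row -> column -> list of (timestamp, value), newest first."""

    def __init__(self):
        self.rows = {}

    def put(self, row, column, value, timestamp):
        # Columns need no prior declaration: a new name just creates a new cell.
        cell = self.rows.setdefault(row, {}).setdefault(column, [])
        cell.append((timestamp, value))
        cell.sort(key=lambda version: version[0], reverse=True)

    def get(self, row, column, max_versions=1):
        """Return up to max_versions of a cell, newest first."""
        return self.rows.get(row, {}).get(column, [])[:max_versions]


table = VersionedTable()
table.put("chr1:100", "variant:call", "A/G", timestamp=1)
table.put("chr1:100", "variant:call", "A/A", timestamp=2)
table.put("chr1:100", "annotation:gene", "geneX", timestamp=3)  # new column, no schema change

print(table.get("chr1:100", "variant:call"))
print(table.get("chr1:100", "variant:call", max_versions=2))
```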

Pig provides a high level data language that is designed for batch processing of data within Hadoop.

Hive is a data warehouse infrastructure built on top of Hadoop that models a relational database with rows/columns and an SQL-like query language.

Cascading is a java library that sits on top of Hadoop MapReduce that operates at a higher level.

Mahout builds scalable machine learning libraries on top of Hadoop.

Amazon EC2 provides Hadoop as a first class service. Some examples of Bioinformatics projects on EC2: Crossbow which we’ll hear about later.

Matt Hanna — The Genome Analysis Toolkit (GATK)

Matt will discuss the Broad’s toolkit for next gen sequencing analysis. Idea is that dataset size greatly increases the complexity of analysis; how can this be abstracted to deal with common problems associated with big datasets.

There are multiple ways to apply MapReduce to a next-gen sequencing pipeline. GATK provides a traversal infrastructure to simplify. Data is sharded into small chunks that can be processed independently. This data is then streamed based on groupings; for instance by gene loci. Can process either serially or in parallel.

At a high level an end user will write a walker that deals with the underlying libraries. Standard traversals are available to use, like traverse by loci.

An example is a simple bayesian genotyper that calls bases at each position in a reference genome based on reads: SNP calling. Walker needs to be written that specifies the data access pattern and commandline arguments. Second step is to write a reduce function that filters pileup on demand. These are then output based on the filters. Throwing processors at a job improves performance nearly linearly, but very few changes need to happen in the actual code. Works best on CPU, not IO, bound processes.
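The division of labour (the framework shards and traverses, the user supplies a map and a reduce) can be sketched in Python. GATK itself is Java and its real walker interface is different; everything below, including the function names and the simple majority-vote caller standing in for the Bayesian genotyper, is an assumption made for illustration.

```python
from collections import Counter
from functools import reduce


def shard(loci, shard_size):
    """Framework side: split the loci into independent chunks."""
    for start in range(0, len(loci), shard_size):
        yield loci[start:start + shard_size]


def call_base(locus):
    """User map step: pick the most common base in the pileup at one locus."""
    position, pileup = locus
    base, count = Counter(pileup).most_common(1)[0]
    return position, base, count / len(pileup)


def keep_confident(calls, call, min_fraction=0.8):
    """User reduce step: keep only calls supported by enough of the reads."""
    if call[2] >= min_fraction:
        calls.append(call)
    return calls


def traverse_by_locus(loci, shard_size=2):
    """Framework side: run map and reduce over each shard in turn."""
    results = []
    for chunk in shard(loci, shard_size):  # each shard could go to its own CPU
        results = reduce(keep_confident, map(call_base, chunk), results)
    return results


# Toy pileups: (position, bases observed in the reads covering it).
loci = [(1, "AAAA"), (2, "AACC"), (3, "GGGGGT")]
for position, base, fraction in traverse_by_locus(loci):
    print(position, base, round(fraction, 2))
```

Because the user functions only ever see one locus or one accumulated result, the traversal can be parallelised without touching them, which matches the point about few code changes being needed.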

Brian O’Connor — SeqWare Query Engine

SeqWare is designed to standardize the process of making biological queries on data. Made up of a REST interface and HBase backend. Analysis is done with SeqWare pipeline. Web Service is accessed with a REST XML client API, or can be presented as HTML forms.

The back end needed to support a rich level of annotation on objects, support large variant databases, be distributed across a cluster, support querying, and scale to crazy data growth.

HBase focuses on random access to data based on column oriented tables. Flexibility allows storage of arbitrary data types. Keys are made up of munged together string data. Data access is through both the HBase API and with MapReduce custom queries. Since multiple genomes are stored in the same table, this makes it easy to run MapReduce jobs. Performance compares favorably to Berkeley DB and is really good for retrieval. Katta (distributed Lucene) might help with queries.

Judy Qiu — Cloud Technologies Applications

Starts off talking about the data deluge and how it can be addressed with cloud technologies. Looking at public health data, PubChem and sequence alignments. Trying to evaluate the cloud to hide the complexity of dealing with infrastructure. Microsoft’s DryadLINQ is comparable to Hadoop for parallelization. Amazon costs are similar to local cluster costs of data processing in tests using CAP3 assembly.

Twister is an open source iterated MapReduce infrastructure for simplifying data mining analysis tasks.

Ben Langmead — Cloud scale genomics: examples and lessons

Crossbow is cloud enabled software for genotyping. It splits alignments using Bowtie, aggregates them into pileups, and then calls SNPs. Parallelized by reads, genomic bins help with reduction using Hadoop.

Myrna is a cloud approach for looking at differential expression using sequencing of transcriptional data. Reads are aligned in parallel, then aggregated into genomic bins. Bins are normalized based on total reads and count data are re-aggregated into a set of p-values for expression.
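Both tools rely on the same trick: alignments are produced independently per read, then grouped by genomic bin so each bin can be reduced on its own. Here is a minimal sketch of the binning and the total-read normalization, with invented function names and toy data; it is not code from Crossbow or Myrna, and the statistical testing step is left out.

```python
from collections import defaultdict


def bin_alignments(alignments, bin_size):
    """Group read ids by (chromosome, bin number)."""
    bins = defaultdict(list)
    for chrom, position, read_id in alignments:
        bins[(chrom, position // bin_size)].append(read_id)
    return dict(bins)


def normalized_counts(bins, total_reads):
    """Per-bin read count as a fraction of all reads in the sample."""
    return {key: len(reads) / total_reads for key, reads in bins.items()}


alignments = [
    ("chr1", 5, "r1"),
    ("chr1", 950, "r2"),
    ("chr1", 1200, "r3"),
    ("chr2", 10, "r4"),
]
bins = bin_alignments(alignments, bin_size=1000)
print(normalized_counts(bins, total_reads=len(alignments)))
```

In a Hadoop setting the (chromosome, bin) tuple would be the key emitted by the map step, so all reads for one bin arrive at the same reducer.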

Architecture of Crossbow: cloud driver script prepares a pipeline, wrapper runs bowtie, wrapper runs soapsnp, then postprocess. Hadoop ties the parts together. To run this on non-cloud architecture, you can write a Hadoop driver structure. The third mode is non-Hadoop which is implemented in Perl.

Enis Afgan

Enis describes his work deploying Galaxy on the cloud. My coverage of this is not going to be great since I heard this talk earlier at the Galaxy developer’s conference. See the previous summary.