I’m in Berlin at the 2013 Bioinformatics Open Source Conference. The conference focuses on tools and approaches to openly developed community software to support scientific research. These are my notes from the day 1 afternoon session focused on Open Science.
Refinery Platform – Integrating Visualization and Analysis of Large-Scale Biological Data
The Refinery Platform provides an approach to manage and visualize data pipelines. The motivating example is TCGA: 10,000 patients, with mRNA, miRNA, methylation, expression, CNVs, variants and clinical parameters. That is a lot of heterogeneous data, made more extensive after processing. Need an approach to manage long-running pipelines with numerous outputs. Want to integrate horizontally across all data types to gain biological insight. Want to integrate vertically across data levels to provide confirmation and troubleshooting. ISA-Tab provides the data model for metadata and provenance evaluation. The web interface provides faceted views of all data based on metadata, and visualizations to explore attribute relationships. The underlying workflow engine is Galaxy. The approach is to set up workflows in Galaxy, then make them available in Refinery at a higher level. Refinery uses the Galaxy API to generate custom workflows from a template for hundreds of samples.
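The template idea above can be sketched in a few lines of plain Python. This is only an illustration of expanding one workflow template into per-sample copies, not Refinery's or Galaxy's actual code: the template layout, the tool names and the `{sample}` placeholder are all assumptions made up for the example, and nothing here talks to a Galaxy server.

```python
import json

# Hypothetical workflow template. The field names, tool names and the
# "{sample}" placeholder are illustrative and not taken from Refinery or Galaxy.
TEMPLATE = {
    "name": "align-and-call {sample}",
    "steps": [
        {"tool": "aligner", "input": "{sample}.fastq", "output": "{sample}.bam"},
        {"tool": "variant_caller", "input": "{sample}.bam", "output": "{sample}.vcf"},
    ],
}


def fill(value, sample):
    """Recursively substitute the sample name into strings, lists and dicts."""
    if isinstance(value, str):
        return value.replace("{sample}", sample)
    if isinstance(value, list):
        return [fill(item, sample) for item in value]
    if isinstance(value, dict):
        return {key: fill(item, sample) for key, item in value.items()}
    return value


def expand_template(template, samples):
    """Build one filled-in workflow description per sample; the template is not modified."""
    return [fill(template, sample) for sample in samples]


workflows = expand_template(TEMPLATE, ["patient001", "patient002"])
print(json.dumps(workflows[0], indent=2))
```

In a real setup each expanded description would presumably be submitted through the Galaxy API rather than printed.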
Two approaches to visualization in Refinery. The first is file-based visualization: connect to IGV and display raw BAM data along with associated metadata. Galaxy also supports this well, so the hope is to build off of this. The second approach is database-driven visualization that uses an associated Django server to read/write from a simple API server. It can also use callbacks with a REST API built on TastyPie, so custom visualizations are quick and easy to develop.
DGE-Vis: Visualisation of RNA-seq data for Differential Gene Expression analysis
Genomic Visualization Everywhere with Dalliance
Thomas starts by motivating visualization: humans love to look at things and, in practice, scientists write papers around a story told by the figures. Unfortunately we focus on print/old-school visualizations: what more could we present if they weren’t so static? The Dalliance genome browser provides complete interactivity, with easy loading of custom files and multiple tracks. It is designed to fit into other applications easily, so you can embed it into your website. It is also meant to be usable in more lightweight contexts: blog posts, slides, journal publications. It’s a fully client-side implementation, but it does need CORS headers enabled on the remote websites that feed data in.
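To make the CORS requirement concrete, here is a minimal sketch of a data server that adds the header a browser-side client needs before it will read cross-origin responses. It uses only Python's standard library and is not part of Dalliance; the class and function names are made up for this example, and the wildcard `*` origin is an assumption that suits public data only.

```python
import http.server
import threading


class CORSRequestHandler(http.server.SimpleHTTPRequestHandler):
    """Serve files from the current directory and allow cross-origin reads."""

    def end_headers(self):
        # Without this header, browsers block scripts on other sites
        # from reading the response.
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()


def start_server(port=0):
    """Start the server in a background thread; port 0 picks a free port."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), CORSRequestHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


# Usage sketch:
#   server = start_server(8000)
#   ... point the browser at http://127.0.0.1:8000/your-track-file ...
#   server.shutdown()
```

A production web server would set the same header in its own configuration; the point is only that the change lives on the data host, not in the browser.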
Robust quality control of Next Generation Sequencing alignment data
Goal is to avoid common traps in next-generation sequencing data: avoid poor runs and platform/protocol-specific errors. Konstantin’s tool is QualiMap, which aims to be more user-friendly than FastQC, samtools, Picard and RNA-SeQC. It provides interactive plots inspired by FastQC’s displays, and also does count quality control, transcript coverage and 5′/3′ bias checks for RNA-seq analyses.
Visualizing bacterial sequencing data with GenomeView
GenomeView provides a genome browser for interactive, real-time exploration of NGS data. Allows editing and curation of data. Configurability and extensibility through plug-ins. Designed for bacterial genomes, so it focuses on consensus plus gaps and missing regions. Handles automated mapping between multiple organisms, showing annotations across them. Handles 60,000 contigs for partially sequenced genomes, allowing selection by query to trim down to a reasonable number.
Genomics applications in the Cloud with DNAnexus
DNAnexus has an open and comprehensive API to talk to the DNAnexus platform. Provides a genome browser, Circos and other visualization tools. They have a nice set of GitHub repositories including client code for interacting with the API and documentation. There is also a StackOverflow clone called DNAnexus Answers for question/answer and community interaction.
Open source project updates
BioRuby project updates – power of modularity in the community-based open source development model
Toshiaki provides updates on the latest developments in the BioRuby community. Important changes in openness during the project: the move to GitHub, and the BioGems system, which lowers the barrier to joining the BioRuby community. Users can publish standalone packages that integrate. Some highlights: bio-gadget, bio-svgenes, bio-synreport, bio-diversity.
Two other associated projects. The first is biointerchange, which provides RDF converters for GFF3, GVF, Newick and TSV; it was developed during the 2012 and 2013 BioHackathons. The second is basespace-ruby. See the Codefest 2013 report for more details on the project.
Biopython project update
Peter provides an update on the latest developments in the Biopython community. The project has been involved with GSoC for the last several years, through both NESCent and the OpenBio Foundation. This has been a great source of new contributors as well as code development. It’s an important way to develop and train new programmers interested in open source and biology. Biopython uses continuous integration with Buildbot and Travis. Tests run on multiple environments: Python versions, Linux, Windows, Mac OS X. The next release of Biopython supports Python 3.3 through the 2to3 converter. Long term, the plan is to write code that is compatible with both Python 2 and 3. Nice tip from discussion: the six library for Python 2/3 compatibility and a blog post on writing for 2 and 3. Peter describes thoughts on how to make Biopython more modular to encourage experimental contributions that could then make their way into officially supported releases later on: trying to balance the need for well-tested and documented code with trying to encourage new users.
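On the Python 2/3 point, here is a small sketch of what a single source file that runs unchanged on both versions can look like, as opposed to converting it with 2to3. It sticks to the standard library rather than six, and the function names are made up for this example; it is not Biopython code.

```python
import sys

# True on Python 3, False on Python 2; libraries such as six expose a similar flag.
PY3 = sys.version_info[0] >= 3


def as_text(data, encoding="ascii"):
    """Return text on either version, decoding byte strings when needed."""
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    # float() keeps this true division on Python 2, where int / int floors.
    return (seq.count("G") + seq.count("C")) / float(len(seq))


# print with a single parenthesized argument behaves the same on 2 and 3.
print(gc_fraction(as_text(b"GATTACA")))
```

The trade-off is a little extra care around bytes versus text and integer division in exchange for dropping the conversion step.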
InterMine – Collaborative Data Mining
The GenoCAD 2.2 Grammar Editor
Jean argues for the importance of domain-specific languages to make it easier to handle specific tasks. Change the language to fit your problem. The idea behind GenoCAD is to empower end-users to develop their own DSL. A formal grammar is a set of rules describing how to form valid sentences in a language. Start by defining categories mapping to biological parts, then follow with the rewriting rules. All of this happens in a graphical drag-and-drop interface. For parts, they can use BioBricks as inputs.
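As a rough illustration of categories plus rewriting rules, the sketch below defines a toy grammar and enumerates every design it allows. The category names, rules and part names are all hypothetical and chosen for the example; this is not GenoCAD's grammar format or code.

```python
from itertools import product

# Rewriting rules: each category on the left can be replaced by one of the
# sequences of categories on the right. (Hypothetical example rules.)
RULES = {
    "Cassette": [["Promoter", "Gene", "Terminator"]],
    "Gene": [["RBS", "CDS"]],
}

# Terminal categories map to concrete parts. (Hypothetical part names.)
PARTS = {
    "Promoter": ["pLac", "pTet"],
    "RBS": ["rbs1"],
    "CDS": ["gfp", "rfp"],
    "Terminator": ["term1"],
}


def expand(symbol):
    """Return every sequence of parts that can be derived from a category."""
    if symbol in PARTS:
        return [[part] for part in PARTS[symbol]]
    designs = []
    for right_hand_side in RULES[symbol]:
        # Combine every expansion of each category in the rule, in order.
        options = [expand(category) for category in right_hand_side]
        for combo in product(*options):
            designs.append([part for piece in combo for part in piece])
    return designs


designs = expand("Cassette")
for design in designs:
    print(" ".join(design))
```

Because the rules here are not recursive, the enumeration always terminates; a grammar with recursive rules would need a depth limit.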
Improvements and new features in the 7th major release of the Bio-Linux distro
Bio-Linux is in its 10th year and recently released version 7. Bio-Linux is a set of Debian packages and a full bioinformatics Linux distribution that you can download and live boot from a USB stick. Strong interactions with Debian Med and CloudBioLinux. Working on integrating Galaxy into Debian packages. Large emphasis on teaching and courses, with Bio-Linux used for learning command-line work.