Bioinformatics Open Source Conference 2013, day 1 afternoon: visualization and project updates

I’m in Berlin at the 2013 Bioinformatics Open Source Conference. The conference focuses on tools and approaches for openly developed community software that supports scientific research. These are my notes from the day 1 afternoon sessions on visualization and open source project updates.


Visualization

Refinery Platform – Integrating Visualization and Analysis of Large-Scale Biological Data

Nils Gehlenborg

The Refinery Platform provides an approach to manage and visualize the outputs of data pipelines. Motivating example is TCGA: 10,000 patients with mRNA, miRNA, methylation, expression, copy number variation (CNV), variant and clinical data. That is a lot of heterogeneous data, and processing makes it even larger. Need an approach to manage long-running pipelines with numerous outputs. Want to integrate horizontally across all data types to gain biological insight, and vertically across data levels to provide confirmation and troubleshooting. ISA-Tab provides the data model for metadata and provenance. The web interface provides faceted views of all data based on metadata, plus visualizations to explore relationships between attributes. The underlying workflow engine is Galaxy: workflows are set up in Galaxy, then made available in Refinery at a higher level. Refinery uses the Galaxy API to generate custom workflows from a template, so the same analysis can run across hundreds of samples.
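Faceted browsing of this kind comes down to counting metadata values across samples, then re-counting after each selection. Here is a minimal sketch of that idea, assuming the ISA-Tab metadata has already been parsed into plain dictionaries; the attribute names and the `facet_counts`/`filter_samples` helpers are hypothetical and are not Refinery's code.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical sample record: attribute name -> value (e.g. parsed from ISA-Tab).
Sample = Dict[str, str]


def facet_counts(samples: List[Sample], attributes: List[str]) -> Dict[str, Counter]:
    """Count how many samples carry each value of each metadata attribute."""
    facets: Dict[str, Counter] = {attr: Counter() for attr in attributes}
    for sample in samples:
        for attr in attributes:
            if attr in sample:
                facets[attr][sample[attr]] += 1
    return facets


def filter_samples(samples: List[Sample], selection: Dict[str, str]) -> List[Sample]:
    """Keep samples matching every selected attribute=value pair (a facet drill-down)."""
    return [s for s in samples if all(s.get(k) == v for k, v in selection.items())]
```

Selecting `{"assay": "mRNA"}` and then calling `facet_counts` on the filtered list gives the updated counts a faceted interface would show for the remaining attributes.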

Refinery takes two approaches to visualization. The first is file-based visualization: connect to IGV and display raw BAM data along with associated metadata. Galaxy also supports this well, so the hope is to build on that. The second approach is database-driven visualization, which uses an associated Django server that reads and writes data through a simple API. The REST interface, built on Tastypie, also supports callbacks, so it is quick and easy to develop custom visualizations.

DGE-Vis: Visualisation of RNA-seq data for Differential Gene Expression analysis

David Powell

DGE-Vis provides a visualization framework to identify differentially expressed genes from RNA-seq analyses. It handles lists of differentially expressed genes from a comparison between two conditions. To generalize to three comparisons, it draws a Venn diagram and allows selection of each region to inspect individually. Given the limitations of this, they then developed a new approach. David shows a live demo of comparisons between 4 conditions, which identifies changes across the conditions. A heatmap groups conditions based on similarities in differential expression. The heatmap is nicely linked to expression differences for each gene, and subselection shows you the list of genes. All three views are linked, so each changes in real time as the others are adjusted. Provides integrated pathway maps with colors linked to each experiment, allowing biologists to identify changed genes via pathways. Written with Haskell on the backend, R for analysis, and CoffeeScript and JavaScript using D3 for visualization.
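The Venn selection step can be sketched as set logic: each gene falls into exactly one region, keyed by the set of comparisons in which it is differentially expressed. This is a minimal Python illustration with hypothetical comparison and gene names, not DGE-Vis code. With n comparisons there are up to 2^n − 1 regions, which is one plausible reason the Venn view stops scaling beyond three.

```python
from typing import Dict, FrozenSet, Set


def venn_regions(gene_sets: Dict[str, Set[str]]) -> Dict[FrozenSet[str], Set[str]]:
    """Split genes into disjoint Venn regions.

    Each key is the exact set of comparisons a gene is differentially
    expressed in; each value is the set of genes in that region.
    """
    regions: Dict[FrozenSet[str], Set[str]] = {}
    all_genes = set().union(*gene_sets.values())
    for gene in all_genes:
        key = frozenset(name for name, genes in gene_sets.items() if gene in genes)
        regions.setdefault(key, set()).add(gene)
    return regions
```

For example, `regions[frozenset({"A_vs_B", "A_vs_C", "B_vs_C"})]` holds the genes changed in all three comparisons, which is what clicking the centre of the diagram would select.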

Genomic Visualization Everywhere with Dalliance

Thomas Down

Thomas starts by motivating visualization: humans love to look at things, and in practice scientists write papers around a story told by the figures. Unfortunately we focus on print-style, old-school visualizations: what more could we present if they weren’t so static? The Dalliance genome browser provides full interactivity, with easy loading of custom files and multiple tracks. It is designed to fit easily into other applications, so you can embed it in your own website. It is also meant to be usable in more lightweight contexts: blog posts, slides and journal publications. It’s a fully client-side implementation, but remote servers that feed data in need to send CORS (Cross-Origin Resource Sharing) headers that allow access.
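To make the CORS requirement concrete, here is a minimal sketch of a Python standard-library file server that adds an `Access-Control-Allow-Origin` header to every response and answers preflight `OPTIONS` requests. The allow-all origin (`*`), the `Range`-only allowed header and the `make_server` helper are assumptions for illustration. Note that `SimpleHTTPRequestHandler` does not implement HTTP Range requests, which indexed formats such as BAM and bigWig rely on, so this only demonstrates the header side, not a production track server.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class CORSRequestHandler(SimpleHTTPRequestHandler):
    """Serve static files with CORS headers so a browser viewer on another origin can read them."""

    # Assumption: allow every origin; restrict this to your viewer's origin in practice.
    allow_origin = "*"

    def end_headers(self) -> None:
        # Added to every response, including errors and preflight replies.
        self.send_header("Access-Control-Allow-Origin", self.allow_origin)
        super().end_headers()

    def do_OPTIONS(self) -> None:
        # Answer CORS preflight requests; assumption: only the Range header needs allowing.
        self.send_response(204)
        self.send_header("Access-Control-Allow-Methods", "GET, HEAD, OPTIONS")
        self.send_header("Access-Control-Allow-Headers", "Range")
        self.end_headers()


def make_server(directory: str, port: int = 8000) -> ThreadingHTTPServer:
    """Build (but do not start) a server for `directory` on localhost."""
    handler = partial(CORSRequestHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling `make_server("tracks").serve_forever()` would then serve the files in a hypothetical `tracks` directory to a browser-side viewer running on a different origin.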

Robust quality control of Next Generation Sequencing alignment data

Konstantin Okonechnikov

The goal is to avoid common traps in next-generation sequencing data: poor runs and platform- or protocol-specific errors. Konstantin’s tool, QualiMap, aims to be more user-friendly than FastQC, samtools, Picard and RNA-SeQC. It provides interactive plots inspired by FastQC’s displays, and also offers read count quality control, transcript coverage and 5'/3' bias tools for RNA-seq analyses.
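As an illustration of what a 5'/3' bias check measures, the sketch below compares mean coverage in a window at each end of a single transcript. It is a simplified stand-in, not QualiMap's exact definition: the 100-base window and the ratio form are assumptions, and a real tool would aggregate over many well-expressed transcripts.

```python
from statistics import mean
from typing import Sequence


def five_three_bias(coverage: Sequence[float], window: int = 100) -> float:
    """Ratio of mean coverage in the 5' window to the 3' window of one transcript.

    Assumes `coverage` is per-base depth ordered 5' -> 3' along the transcript
    (already reversed for minus-strand transcripts). Ratios well below 1 can
    indicate 3' bias, for example from RNA degradation combined with poly(A)
    selection.
    """
    if len(coverage) < 2 * window:
        raise ValueError("transcript shorter than two windows")
    five_prime = mean(coverage[:window])
    three_prime = mean(coverage[-window:])
    if three_prime == 0:
        # No 3' coverage: infinite bias if the 5' end is covered, undefined otherwise.
        return float("inf") if five_prime > 0 else float("nan")
    return five_prime / three_prime
```

Uniform coverage gives a ratio of 1.0; coverage that climbs towards the 3' end gives a ratio below 1.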

Visualizing bacterial sequencing data with GenomeView

Thomas Abeel

GenomeView is a genome browser for interactive, real-time exploration of NGS data. It allows editing and curation of data, and is configurable and extensible through plug-ins. Designed for bacterial genomes, it focuses on the consensus plus gaps and missing regions. It handles automated mapping between multiple organisms and shows annotations across them. It can handle 60,000 contigs for partially sequenced genomes, allowing selection by query to trim these down to a reasonable number.
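Trimming tens of thousands of contigs down to a browsable set is essentially a query over contig attributes. Here is a minimal sketch, assuming contigs are described only by name and length; the `Contig` type and `select_contigs` helper are hypothetical and unrelated to GenomeView's own (Java) plug-in API.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Contig:
    name: str
    length: int


def select_contigs(
    contigs: Iterable[Contig],
    min_length: int = 0,
    name_pattern: Optional[str] = None,
    limit: Optional[int] = None,
) -> List[Contig]:
    """Query contigs by minimum length and name regex, longest first, capped at `limit`."""
    regex = re.compile(name_pattern) if name_pattern else None
    hits = [
        c for c in contigs
        if c.length >= min_length and (regex is None or regex.search(c.name))
    ]
    hits.sort(key=lambda c: c.length, reverse=True)
    return hits if limit is None else hits[:limit]
```

Sorting longest-first before applying `limit` is a design choice: when a query still matches too many contigs, the largest ones are usually the most informative to display.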

Genomics applications in the Cloud with DNAnexus

Andrey Kislyuk

DNAnexus has an open and comprehensive API for interacting with the DNAnexus platform. It provides a genome browser, Circos and other visualization tools. There is a nice set of GitHub repositories, including client code for interacting with the API and documentation. DNAnexus Answers, a Stack Overflow clone, supports questions, answers and community interaction.

Open source project updates

BioRuby project updates – power of modularity in the community-based open source development model

Toshiaki Katayama

Toshiaki provides updates on the latest developments in the BioRuby community. There have been important changes in openness during the project: the move to GitHub, and the BioGems system, which lowers the barrier to joining the BioRuby community. Users can publish standalone packages that integrate with BioRuby. Some highlights: bio-gadget, bio-svgenes, bio-synreport, bio-diversity.

There are two other associated projects. The first, biointerchange, provides RDF converters for GFF3, GVF, Newick and TSV; it was developed during the 2012 and 2013 BioHackathons. The second is basespace-ruby. See the Codefest 2013 report for more details on the project.
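To show the general idea behind converting GFF3 to RDF, here is a toy Python sketch that turns one GFF3 feature line into N-Triples statements. The `example.org` namespaces and predicate names are hypothetical placeholders; biointerchange uses its own ontology terms and output, which this does not reproduce.

```python
from typing import List
from urllib.parse import quote, unquote

# Hypothetical namespaces for illustration only.
BASE = "http://example.org/feature/"
VOCAB = "http://example.org/gff3#"
XSD_INT = "<http://www.w3.org/2001/XMLSchema#integer>"


def _literal(value: str) -> str:
    """Quote a string as an N-Triples literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def gff3_line_to_ntriples(line: str) -> List[str]:
    """Convert one tab-separated GFF3 feature line into N-Triples statements."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("GFF3 feature lines have nine tab-separated columns")
    seqid, source, ftype, start, end, _score, strand, _phase, attributes = cols
    # Column 9 holds key=value pairs separated by ';' with percent-encoded values.
    attrs = dict(kv.split("=", 1) for kv in attributes.split(";") if "=" in kv)
    # Use the ID attribute as the subject when present, else a coordinate-based identifier.
    subject = f"<{BASE}{quote(attrs.get('ID', f'{seqid}_{start}_{end}'))}>"
    triples = [
        f"{subject} <{VOCAB}seqid> {_literal(seqid)} .",
        f"{subject} <{VOCAB}source> {_literal(source)} .",
        f"{subject} <{VOCAB}type> {_literal(ftype)} .",
        f'{subject} <{VOCAB}start> "{int(start)}"^^{XSD_INT} .',
        f'{subject} <{VOCAB}end> "{int(end)}"^^{XSD_INT} .',
        f"{subject} <{VOCAB}strand> {_literal(strand)} .",
    ]
    for key, value in attrs.items():
        if key != "ID":
            triples.append(f"{subject} <{VOCAB}{quote(key)}> {_literal(unquote(value))} .")
    return triples
```

Each attribute becomes its own triple, which is what makes the converted data queryable with SPARQL alongside other RDF sources.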

Biopython project update

Peter Cock

Peter provides the latest updates from the Biopython community. Biopython has been involved with Google Summer of Code (GSoC) for the last several years, through both NESCent and the Open Bioinformatics Foundation. This has been a great source of new contributors as well as code development, and it is an important way to develop and train new programmers interested in open source and biology. Biopython uses continuous integration with BuildBot and Travis. Tests run on multiple environments: several Python versions on Linux, Windows and Mac OS X. The next release of Biopython supports Python 3.3 through the 2to3 converter; long term, the plan is to write code that is compatible with both. Nice tips from the discussion: the six library for Python 2/3 compatibility, and a blog post on writing code for both 2 and 3. Peter describes thoughts on making Biopython more modular to encourage experimental contributions that could later make their way into officially supported releases, balancing the need for well-tested and documented code with trying to encourage new users.
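Writing one codebase for both Python 2 and 3, instead of running 2to3, mostly means avoiding constructs whose meaning differs between versions. A minimal sketch using only the standard library (no six): a `basestring` fallback for type checks and explicit float division. The `gc_content` helper is hypothetical, not Biopython's API. In real modules, `from __future__ import division, print_function` at the top is the usual companion to this pattern.

```python
# Runs unchanged under Python 2.7 and Python 3.

try:
    string_types = (basestring,)  # Python 2: str and unicode share basestring
except NameError:
    string_types = (str,)  # Python 3: str is the only text type


def gc_content(seq):
    """Fraction of G and C characters in a sequence string (hypothetical helper)."""
    if not isinstance(seq, string_types):
        raise TypeError("expected a string, got %r" % type(seq).__name__)
    if not seq:
        return 0.0
    gc = sum(1 for base in seq.upper() if base in "GC")
    # float() keeps this true division on Python 2, where int / int truncates.
    return float(gc) / len(seq)
```

The try/except probe is preferred over checking `sys.version_info` because it tests for the feature itself rather than for a version number.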

InterMine – Collaborative Data Mining

Alex Kalderimis

InterMine is a data integration system and query engine that supplies data analysis tools, graphical web-app components and a REST API. It provides a modular set of parts that you can use to build tools, in addition to the pre-packaged solution. The InterMOD consortium organizes the InterMine installations so they can better interact and share data. Recent work includes a rewrite of the InterMine JavaScript tools. External tools can also be used more cleanly: Alex shows a nice interaction of JBrowse with InterMine. They are working on rebuilding their web interface on top of this more modular approach.

The GenoCAD 2.2 Grammar Editor

Jean Peccoud

Jean argues for the importance of domain-specific languages (DSLs) to make specific tasks easier to handle: change the language to fit your problem. The idea behind GenoCAD is to empower end users to develop their own DSLs. A formal grammar is a set of rules describing how to form valid sentences in a language. You start by defining categories that map to biological parts, then follow with the rewriting rules. All of this happens in a graphical drag-and-drop interface. For parts, users can take BioBricks as inputs.
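The categories-plus-rewriting-rules idea can be illustrated with a tiny context-free grammar in Python. This is a toy sketch under stated assumptions: the category names and part names (`pLac`, `rbs1` and so on) are hypothetical placeholders rather than BioBricks identifiers, and GenoCAD itself works through its graphical editor, not code like this. Because the sketch enumerates every sentence, it only terminates for non-recursive grammars; recursive grammars would need a real parser.

```python
from itertools import product
from typing import Dict, List

# Uppercase symbols are categories (non-terminals); anything else is a concrete part.
Grammar = Dict[str, List[List[str]]]

GRAMMAR: Grammar = {
    "CASSETTE": [["PROMOTER", "RBS", "GENE", "TERMINATOR"]],
    "PROMOTER": [["pLac"], ["pTet"]],
    "RBS": [["rbs1"]],
    "GENE": [["gfp"], ["rfp"]],
    "TERMINATOR": [["term1"]],
}


def expand(grammar: Grammar, symbol: str) -> List[List[str]]:
    """All part sequences derivable from `symbol` by applying the rewriting rules."""
    if symbol not in grammar:  # a terminal: a concrete part
        return [[symbol]]
    sentences: List[List[str]] = []
    for rule in grammar[symbol]:
        # Expand each right-hand-side symbol, then combine every choice in order.
        for combo in product(*(expand(grammar, s) for s in rule)):
            sentences.append([part for piece in combo for part in piece])
    return sentences


def is_valid(grammar: Grammar, start: str, design: List[str]) -> bool:
    """Check whether a design is a sentence of the language generated from `start`."""
    return design in expand(grammar, start)
```

With two promoters and two genes, `expand(GRAMMAR, "CASSETTE")` yields four valid designs, and a design with the RBS after the gene is rejected, which is the kind of check a grammar-based editor can enforce.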

Improvements and new features in the 7th major release of the Bio-Linux distro

Tim Booth

Bio-Linux is in its 10th year and recently released version 7. Bio-Linux is both a set of Debian packages and a full bioinformatics Linux distribution that you can download and live boot from a USB stick. It has strong interactions with Debian Med and CloudBioLinux, and the team is working on integrating Galaxy into Debian packages. There is a large emphasis on teaching, with courses using Bio-Linux to learn command-line work.
