BOSC 2012 day 2 pm: Panel discussion on bioinformatics review and open source project updates

Talk notes from the 2012 Bioinformatics Open Source Conference.

Hervé Ménager: Mobyle Web Framework: New Features

Mobyle provides easy command-line tool integration: each program is described in XML, and that description is converted into a web-based interface. BMPS provides easy pipeline design and execution, and an executed workflow can be dynamically reused as a simple form. Mobyle also provides data versioning with integrated correction: for instance, you can visualize an alignment in Jalview, correct it, then save it as an updated data file. Now that Taverna and Galaxy workflows integrate, it would be great to be able to do the same with Mobyle.

Eric Talevich: Biopython Project Update

Eric talks about Biopython, discussing new and exciting features from the past year. GenomeDiagram provides beautiful graphics of sequences and features. Lots of new format parsers are included in Biopython: SeqXML, ABI chromatograms and relaxed PHYLIP format. Bio.Phylo has merged in PAML wrappers and new drawing functionality, and a paper on it is in late review. Biopython now has BGZF support, which helps with BAM and Tabix support. The project is working to support PyPy and Python 3. PDB bug fixes came from Lenna, who is now a GSoC student I’m mentoring on variant work, including with PyVCF.
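BGZF matters for BAM and Tabix because it allows random access into a compressed file: an index stores "virtual offsets" that combine the byte position of a compressed block in the file with a position inside that block's uncompressed data. Per the SAM/BAM specification, the block start occupies the upper 48 bits and the in-block offset the lower 16 bits. Below is a minimal, standalone Python sketch of that packing; Biopython's `Bio.bgzf` module provides its own helpers for this, and the functions here are an illustration rather than Biopython's code.

```python
from typing import Tuple

# Widths of the two fields packed into a BGZF virtual offset (SAM/BAM spec).
WITHIN_BLOCK_BITS = 16
BLOCK_START_BITS = 48


def make_virtual_offset(block_start: int, within_block: int) -> int:
    """Pack a compressed-block start offset and an in-block offset into one integer."""
    if not 0 <= block_start < 2 ** BLOCK_START_BITS:
        raise ValueError("block start must fit in 48 bits")
    if not 0 <= within_block < 2 ** WITHIN_BLOCK_BITS:
        raise ValueError("within-block offset must fit in 16 bits")
    # Upper 48 bits: where the compressed block starts in the file.
    # Lower 16 bits: where to start reading inside the decompressed block.
    return (block_start << WITHIN_BLOCK_BITS) | within_block


def split_virtual_offset(virtual_offset: int) -> Tuple[int, int]:
    """Unpack a virtual offset into (block_start, within_block)."""
    return virtual_offset >> WITHIN_BLOCK_BITS, virtual_offset & 0xFFFF
```

For example, `make_virtual_offset(100000, 5)` gives 6553600005, and `split_virtual_offset(6553600005)` returns `(100000, 5)`. Because the block start sits in the high bits, virtual offsets sort in file order, which is what lets an index point a reader straight at the right block.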

Hiroyuki Mishima: Biogem, Ruby UCSC API, and BioRuby

Hiroyuki gives a BioRuby update on the latest work. The community has been working on ways to make it easy to become a BioRuby member. The original ways to contribute were to be a committer or to get patches accepted. To get more people involved, BioRuby has moved to GitHub, which makes it easier to accept pull requests. They’ve also introduced Biogem, a plugin system so that anyone can contribute associated packages. This includes a nice website displaying packages along with popularity metrics, making it easy to identify useful packages. bio-ucsc-api provides an ActiveRecord API on top of UCSC tables. The future direction of Biogem will involve more quality control through peer review, including required documentation and style.

Jeremy Goecks: A Framework for Interactive Visual Analysis of NGS Data using Galaxy

Jeremy talks about the Galaxy visualization framework for highly interactive visual analysis of NGS datasets. The goal is to integrate visualizations and web tools. Jeremy then bravely launches into a live demo with Trackster. Trackster has dynamic filtering, so you can use sliders to filter the view based on coverage, FPKM or other metrics. Integration with Galaxy allows you to re-run tools with alternative parameters based on what you see in the visualization. You can create a cool tree of possible parameter settings for a Galaxy tool, easily varying selected parameters. This can then be dynamically re-run on a subset of the data, letting you run and visualize multiple parameter choices easily. This is an incredibly easy way to find the best settings based on some known regions.
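Conceptually, a tree of parameter settings is the set of all combinations of the values chosen for each varied parameter. Here is a minimal Python sketch of that expansion, assuming each parameter is given as a list of candidate values; the function name `parameter_tree` and the parameter names used below (`min_coverage`, `min_fpkm`) are hypothetical and are not Galaxy's or Trackster's API. In Galaxy, each resulting combination would correspond to one re-run of the tool on the chosen region.

```python
from itertools import product
from typing import Any, Dict, Iterator, List


def parameter_tree(grid: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one settings dict per combination of the varied parameters.

    Hypothetical helper: `grid` maps a parameter name to the values to try.
    """
    names = list(grid)
    # itertools.product walks every combination; the first parameter varies slowest.
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))
```

With `{"min_coverage": [5, 10], "min_fpkm": [0.5, 1.0, 2.0]}` this yields six settings, starting with `{"min_coverage": 5, "min_fpkm": 0.5}`. The number of runs grows multiplicatively with each varied parameter, which is one reason re-running on a small subset of the data is so useful.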

Spencer Bliven: Why Scientists Should Contribute to Wikipedia

A new initiative through PLoS Computational Biology called Topic Pages. Why don’t scientists contribute more to Wikipedia? Some identified concerns: perceived inaccuracies, little time for outreach like this, and no direct annotation or citation. If you contribute to Wikipedia through a Topic Page, you get a citation. Don’t use it to fill up your CV. Topic Pages are peer reviewed via open review, have a CC-BY license and are similar to a review article. There are already published Topic Pages and interest in contributing more.

Markus Gumbel: scabio – a framework for bioinformatics algorithms in Scala

scabio contains algorithms written in Scala for the bioinformatics domain. It was designed as a teaching tool for a lecture plus lab. Scala combines object-oriented and functional paradigms, and the Akka framework provides concurrent and distributed functionality. scabio contains lots of teaching code for dynamic programming, which makes it a great resource. It offers easy BioJava 3 integration and reuse of existing libraries. Code is available from GitHub.
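As an example of the kind of dynamic programming such teaching code covers, here is a Python sketch (not scabio's Scala code) of Needleman-Wunsch global alignment with a linear gap penalty. The scoring values (match +1, mismatch -1, gap -1) are arbitrary illustrative defaults, and the traceback returns just one of possibly several optimal alignments.

```python
from typing import List, Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> Tuple[int, str, str]:
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""

    def sub(x: str, y: str) -> int:
        # Substitution score for aligning two characters.
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps

    # Fill the table: each cell extends the best of three smaller subproblems.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # pair a[i-1] with b[j-1]
                score[i - 1][j] + gap,                            # gap in b
                score[i][j - 1] + gap,                            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Calling `needleman_wunsch("ACGT", "AGT")` returns `(2, "ACGT", "A-GT")`: three matches and one gap. The table-filling loop is the core teaching point, since every cell reuses already-solved prefixes instead of enumerating alignments.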

Jianwu Wang: bioKepler: A Comprehensive Bioinformatics Scientific Workflow Module for Distributed Analysis of Large-Scale Biological Data

bioKepler builds on top of Kepler, a scientific workflow system, and uses distributed frameworks for parallelization. Plans are to build bioActors for alignment, NGS mapping, gene prediction and more.

Limin Fu: Dao: a novel programming language for bioinformatics

Dao is a new programming language with support for concurrent programming. It is based on LLVM and can easily load C libraries using Clang. It provides native concurrent iterators: map, apply and find.

Scott Cain: GMOD in the Cloud

The Generic Model Organism Database (GMOD) project has a running cloud instance at https://cloud.gmod.org, with Chado, GBrowse and JBrowse plus sample data. AMI information is available from the GMOD wiki. Tripal is a Drupal-based web interface to Chado.

Ben Temperton: Bioinformatics Testing Consortium: Codebase peer-review to improve robustness of bioinformatic pipelines

Ben kicks off the panel discussion with a short lightning talk about the Bioinformatics Testing Consortium, which provides a way to do peer review of codebases. The idea came from dedicated unit testers, but there needs to be a "non-cost" way to do this that fits with current workflows: peer review. The idea is to register a project and have volunteer testers actually test it.

Panel discussion

BOSC wrapped up with a panel discussion centered around ideas to improve reviewing of the bioinformatics components of papers. The panel emerged from an open review discussion between Titus Brown and me about ideas for establishing a base set of criteria for reviewers of bioinformatics methods. The five panelists were Titus Brown, a professor at Michigan State; Iain Hrynaszkiewicz, an open-access publisher with BMC; Hilmar Lapp, an editor at PLoS Computational Biology; Scott Markel from Accelrys; and Ben Temperton from the Bioinformatics Testing Consortium.

I took these notes while chairing, so apologies if I missed any points. Please send corrections and updates.

The main focus was on improving the bioinformatics component of papers at the time of review. Titus’s opening slides presented ideas to help improve replicability of results, with the key question being: does the software do what it claims in the paper?

  • Existing communities to connect with
  • When do tests get put in place? Leaving them to the last minute at review time is going to be painful for people. There is a lot of hard work involved overall.

    • Difficult to set up a VM and make the work replicable
    • Barriers to entry
    • On the other hand, are you doing good science? What is the baseline?
    • How can you help people do this?
    • Learning to develop this way from the start with training courses like Software Carpentry.
    • Can continuous integration, such as travis-ci, play a role?
  • Defining what to do for reviewers
  • A tough issue is that editors must also review, so the job falls on both reviewers and editors. One idea is to get a pre-submission seal of approval before being able to send a paper in for review. This is where the Bioinformatics Testing Consortium could fit in.

  • Start-up idea: provide a service for testing software
    • Insight journals
    • Could you incentivize testing? Provide journal credit.
  • Tight relationship between reviews and grants: need to enforce a baseline of minimum criteria.

  • Provide incentives + credit

Another component of discussion was around openness of reviews:

  • BMC has an even split between open and non-open peer review
  • What is the policy on who owns copyright on a review?
  • From a testing side, it does need to be open to iterate
  • What effect can this have on your career, for instance when giving a bad review to a senior professor?

The final conclusion was to draw up a set of best practice guidelines for reviewers, publish this as a white paper, then move forward with website implementations that help make this process easier for scientists, editors and reviewers. If we as a community can set out what best practice is, and then make it as easy as possible, this should help spread adoption.
