Galaxy Community Conference 2012, notes from day 2

These are my notes from day 2 of the 2012 Galaxy Community Conference.

Ira Cooke: Proteomics tools for Galaxy

Goal is to develop pipelines and interactive visualizations for proteomics work. Awesome tools that provide raw data + pipeline run as part of a visualization built into Galaxy. Connects to raw spectrum data from higher level summary reports. On a higher level, trying to integrate Proteomic and Genomic approaches inside Galaxy. Available from two bitbucket repositories: protk and Protvis.

James Ireland: Using Galaxy for Molecular Assay Design

James works at 5AM Solutions, where they’ve been using Galaxy for a year. He’s working on molecular assay design: identifying oligos to detect or quantify molecular targets. The assays need to be short and avoid known locations of genomic variation. Developed a Galaxy workflow for assay design, including wrapping of primer3, prioritizing and filtering of designed primers, and examination of secondary structure.

Richard LeDuc: The National Center for Genome Analysis Support and Galaxy

NCGAS provides large memory clusters and bioinformatics consulting. You can access infrastructure if you have NSF funding. They provide a virtual machine hosting Galaxy on top of cluster infrastructure. The VM approach allows them to spin up Galaxy instances for particular labs. Underlying infrastructure is a Lustre filesystem. They also do custom work on libraries: helped improve Trinity resource usage.

Liram Vardi: Window2Galaxy - Enabling Linux-Windows Hybrid Workflows

Provides hybrid Galaxy workflows with steps done on Linux and Windows, transparent to the user. Works by creating an interaction between Linux and Windows VMs using Samba and a VM shared directory. A Window2Galaxy command in the Galaxy tool does all of the wrapping.

David van Enckevort: NBIC Galaxy to Strengthen the Bioinformatics Community in the Netherlands

NBIC BioAssist provides bioinformatics assistance to help with analysis of biological data. Galaxy is used for training, collaboration and sharing of developed tools and pipelines. Also used to deal with reproducible research workflows for publications. They provide an NBIC public instance and are moving to a cloud Galaxy VM plus a Galaxy module repository.

Ted Liefeld: GenomeSpace

GenomeSpace aims to make it easier to do integrative analysis with multiple datasets and tools. Facilitates connections between tools: Cytoscape, Galaxy, GenePattern, IGV, Genomica, UCSC. Provides an online filesystem for data, with import and export between it and the connected tools.

Greg von Kuster: Tool Shed and Changes to Galaxy Distributions

Galaxy Tool Shed improvements to integrate closer with Galaxy. Galaxy now provides ability to install tools directly from the user interface. Kicks into live demo mode: when importing workflows it will tell you missing tools that require installation from tool sheds. Tools handle custom datatypes. Allow removal of tools through user interface. Can install dependencies directly. Incredibly awesome automation and interaction improvements for managing tools. External dependencies linked with exact versions for full reproducibility.

Larry Helseth: Customizing Galaxy for a Hospital Environment

Larry describes a use case in a HIPAA environment: locked down internet and corporate browser standards. Bonuses are solid IT and resources. Exome sequence analysis work: annotation with SeattleSeq and Annovar. Everything requires validation before full clinical use.

Nate Coraor: Galaxy Object Store

Galaxy can access object stores like S3 and iRODS using plugin architecture. Extracted out access of data to not be directly on files, but rather through high level accessor methods. This lets you have complete flexibility for storage, managing where data is behind the scenes. This lets you push data to compute resources: so you could store on S3 and compute directly on Amazon.
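
To make the accessor idea concrete, here is a minimal sketch of what such an abstraction could look like. It is not Galaxy's actual object store code: the class names, method names and the `dataset_<id>.dat` naming are all assumptions for illustration, and the in-memory backend only stands in for a remote store like S3 or iRODS.

```python
from abc import ABC, abstractmethod
from pathlib import Path
import tempfile


class ObjectStore(ABC):
    """Hypothetical interface: callers ask for data by id, never by file path."""

    @abstractmethod
    def exists(self, dataset_id: str) -> bool: ...

    @abstractmethod
    def put(self, dataset_id: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, dataset_id: str) -> bytes: ...


class DiskObjectStore(ObjectStore):
    """Backend that keeps each dataset as a file under a root directory."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, dataset_id):
        # Assumed naming scheme, for illustration only.
        return self.root / f"dataset_{dataset_id}.dat"

    def exists(self, dataset_id):
        return self._path(dataset_id).is_file()

    def put(self, dataset_id, data):
        self._path(dataset_id).write_bytes(data)

    def get(self, dataset_id):
        return self._path(dataset_id).read_bytes()


class InMemoryObjectStore(ObjectStore):
    """Stand-in for a remote backend; makes no network calls."""

    def __init__(self):
        self._blobs = {}

    def exists(self, dataset_id):
        return dataset_id in self._blobs

    def put(self, dataset_id, data):
        self._blobs[dataset_id] = data

    def get(self, dataset_id):
        return self._blobs[dataset_id]


def line_count(store: ObjectStore, dataset_id: str) -> int:
    """Tool-side code only sees the interface, so the backend can be swapped."""
    return len(store.get(dataset_id).splitlines())
```

The point of the design is that `line_count` runs unchanged against either backend, which is what lets storage location become a deployment decision.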

Jaime Frey: Galaxy and Condor integration

Wrote Galaxy module to run tasks on Condor clusters. Checked into galaxy-central as of yesterday. Use Parrot virtual filesystem to manage disk I/O to analysis machines.

Brian Ondov: Krona

Krona displays hierarchical data as zoomable pie charts. Has a tool in tool shed and can interact with charts directly in Galaxy.
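
As a rough illustration of the kind of data behind such a chart, the sketch below rolls flat (lineage, count) rows up into a tree and computes each child's share of its parent, which is what would set the size of a wedge at each zoom level. This is not Krona's code or input format; the function names and the row layout are assumptions.

```python
def build_tree(rows):
    """Roll (lineage, count) rows up into a nested tree with summed counts."""
    root = {"name": "root", "count": 0, "children": {}}
    for lineage, count in rows:
        root["count"] += count
        node = root
        for name in lineage:
            # Create the child on first sight, then add this row's count to it.
            node = node["children"].setdefault(
                name, {"name": name, "count": 0, "children": {}}
            )
            node["count"] += count
    return root


def fractions(node):
    """Share of this node's count held by each direct child."""
    total = node["count"]
    if not total:
        return {}
    return {name: child["count"] / total for name, child in node["children"].items()}


rows = [
    (("Bacteria", "Firmicutes"), 30),
    (("Bacteria", "Proteobacteria"), 50),
    (("Archaea",), 20),
]
tree = build_tree(rows)
top_level = fractions(tree)
zoomed = fractions(tree["children"]["Bacteria"])
```

Zooming into a wedge then amounts to calling `fractions` on that child instead of the root.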

Clare Sloggett: Reusable, scalable workflows

Usage example: cuffdiff analysis for large number of inputs. How can you readily do this without a lot of clicking on different workflows? Current approach: write a script with the API, but not a great way to do this through the user interface currently. John steps up, ready to work on the problem.
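
A scripted version of this could look roughly like the sketch below: plan one run per pair of conditions, then hand each run to whatever actually submits it. The `submit` callable is a hypothetical placeholder, not a real Galaxy API call, and the `group_1`/`group_2` input names are made up for illustration.

```python
from itertools import combinations


def plan_runs(samples_by_condition):
    """Build one run description for every pair of conditions."""
    runs = []
    for cond_a, cond_b in combinations(sorted(samples_by_condition), 2):
        runs.append(
            {
                "name": f"{cond_a}_vs_{cond_b}",
                "inputs": {
                    "group_1": list(samples_by_condition[cond_a]),
                    "group_2": list(samples_by_condition[cond_b]),
                },
            }
        )
    return runs


def run_all(runs, submit):
    """Submit every planned run; `submit` is supplied by the caller."""
    return [submit(run) for run in runs]


samples = {
    "control": ["c1.bam", "c2.bam"],
    "treated_1h": ["t1a.bam", "t1b.bam"],
    "treated_4h": ["t4a.bam", "t4b.bam"],
}
runs = plan_runs(samples)

# A dry-run submitter that only records the run names it was given.
submitted = run_all(runs, submit=lambda run: run["name"])
```

Keeping the planning separate from the submission means the same plan could later be driven from a user interface instead of a script.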

John Chilton: Galaxy VM Launcher

Built a Galaxy workflow for clinical variant detection. One concern about CloudMan was storage costs: CloudMan depends heavily on EBS but you can save money by using the local instance store. Galaxy VM Launcher configures tools, genome indices and users, and uploads data, all from the command line. Awesome.

Pratik Jagtap: Galaxy-P

Galaxy-P works with Galaxy for proteomics. Proteomics work is super popular at this year’s GCC. Trying to tie together lots of discussions today: windows access from Galaxy, visualization, and push to cloud resources.

Geir Kjetil Sandve: The Genomic HyperBrowser: Prove Your Point

The Genomic HyperBrowser provides a custom Galaxy with 100 built-in statistical analyses for biological datasets. Provides top level questions, using the correct statistical test under the covers. Provides nice output with simplified and detailed answers along with the full set of tests used.

Björn Grüning: ChemicalToolBoX

Provides a Galaxy instance for Cheminformatics: drug design work. Tools allow drawing of structures, upload into Galaxy. Wrapped lots of tools for chemical properties, structure search, compound plotting and molecular modification.

Breakout: Automation Strategies for Data, Tools, & Config

During the Galaxy breakout sessions, I joined folks who’ve been working on strategies to automate post-Galaxy tool and data installation. The goal was to consolidate implementations that install reference data, update Galaxy location files, and eventually install tools and software. The overall goal is to make moving to a production Galaxy instance as easy as getting up and running using ‘sh run.sh.’

The work plan moving forward is:

  • Community members will look at building tools that include dependencies and sort out any issues that might arise with independent dependency installation scripts through Fabric.

  • Galaxy team is working on exposing tool installation and data installation scripts through the API to allow access through automated scripts. The current data installation code is in the Galaxy tree.

  • Community is going to work on consolidating preparation of pre-Galaxy base machines using the CloudBioLinux framework. The short term goal is to use CloudBioLinux flavors to generalize existing scripts. Longer term, we will explore moving to a framework like Chef that handles high level configuration details.

It was great to bring all these projects together and I’m looking forward to building well supported approaches to automating the full Galaxy installation process.
