BioHackathon 2010: Day 5 — the final day

BioHackathon 2010 wrapped up today after five excellent days of discussion, coding and community. Sometimes it feels like you hit your stride right as things are wrapping up, which just goes to show how much a great set of people can get done once they are organized and comfortable working together. Or it could be Parkinson’s law kicking in.

Hackathon summary

After a full day of work, everyone came together in the late afternoon to summarize the week’s accomplishments. Toshiaki kicked us off with an overview of the week, highlighting the community and giving a nice demo of RDF TogoDB populated with the international liquor selection assembled for evening discussion time.

Below are summaries for the various projects and groups. For more details check out:

OpenBio

If you’ve been reading these updates, you have a good idea of what I presented for the Biopython summary of our work: interfaces to query BioGateway and InterMine. Raoul followed this with a description of the BioRuby work. They were working with us to develop a similar API for accessing SPARQL endpoints at BioGateway and Bio2RDF. Thomas reported on the state of RDF support in Perl, recommending RDF::Trine and providing some working code on GitHub.
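For anyone who has not talked to a SPARQL endpoint before, here is a minimal sketch of the underlying request, using only the Python standard library. It is not the Biopython or BioRuby API described above. The endpoint URL and the query are placeholders I made up for illustration, and the code only builds the request and parses a result document, without sending anything over the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and query, for illustration only.
EXAMPLE_ENDPOINT = "http://example.org/sparql"
EXAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_sparql_request(endpoint, query):
    """Build (but do not send) an HTTP GET request for a SPARQL query.

    The SPARQL protocol passes the query text in a 'query' parameter;
    the Accept header asks for the JSON results format.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(json_text):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    data = json.loads(json_text)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


request = build_sparql_request(EXAMPLE_ENDPOINT, EXAMPLE_QUERY)
```

To actually run a query you would pass `request` to `urllib.request.urlopen` and hand the decoded response body to `parse_bindings`, assuming the endpoint supports JSON results.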

G-language

The G-language team presented an awesome JavaScript tool called cube, which allows selection of web text and calling out to external services. We also got a demo of a Japanese video game circa 1986 which inspired the cube interface.

Visualization

Andrea and Kei discussed their work with Cytoscape to handle Semantic Web formats like RDF. It can access triple stores and load RDF data directly from them, and it can query SPARQL endpoints. See RDFScape for more details.

Text mining

Alberto discussed the work on semantic text mining with Heiko and folks from Reflect to get results as triples. This was also done with Whatizit, which recognizes items in biological text and makes them available as RDF.

Semantic Data Exchange

Gos reported on the discussion among data providers about improving semantics so that results from multiple data sources can be readily combined. You can see the full notes on this meeting. There were a few different levels of interoperability considered:

  • File formats
  • Specifying locations (1-based versus 0-based; chromosome names)
  • Controlling the namespaces of columns in tabular data
  • Specification of genome versions and annotations
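The location item is the easiest to show in code. Below is a small sketch of converting between the two common conventions: 0-based half-open intervals (as in BED files) and 1-based inclusive intervals (as in GFF files). The chromosome name helper reflects just one possible convention, stripping a `chr` prefix; it is my own assumption and not something agreed at the meeting, and it does not handle cases like `chrM` versus `MT`.

```python
def zero_to_one_based(start, end):
    """Convert a 0-based half-open interval [start, end) to 1-based inclusive."""
    return start + 1, end


def one_to_zero_based(start, end):
    """Convert a 1-based inclusive interval [start, end] to 0-based half-open."""
    return start - 1, end


def normalize_chromosome(name):
    """Strip a leading 'chr' so that 'chr1' and '1' compare equal.

    This is an assumed convention for illustration only.
    """
    return name[3:] if name.lower().startswith("chr") else name


# The first ten bases of a chromosome in both conventions.
bed_style = (0, 10)
gff_style = zero_to_one_based(*bed_style)
```

Both conventions describe the same ten bases here; only the start coordinate changes, which is exactly why mixing them silently produces off-by-one errors.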

Taxonomy

Christian spoke about uses of RDF in taxonomy. Use cases include biodiversity informatics and metagenomics. One big question: how to deal with uncertainty in biological information?

Converting to RDF from other formats

Pierre spoke about transforming XML resources to RDF using XSLT, guided by an ontology. His first example transformed genotype data from NCBI to RDF with xsltproc. A second example converted large XML files from dbSNP to RDF. This is a two-step process: parse with Java to split the file into parts, then use XSLT to convert each part to RDF.
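The split-then-convert idea can be sketched roughly as follows. To be clear, this is not Pierre's code: it swaps both the Java splitter and the XSLT stylesheet for plain Python from the standard library, and the XML layout, element names and base URI are invented for illustration rather than taken from dbSNP.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical namespace and sample data; not the real dbSNP schema.
BASE = "http://example.org/snp/"
SAMPLE = (
    b'<snps>'
    b'<snp id="rs1"><allele>A</allele><gene>X1</gene></snp>'
    b'<snp id="rs2"><allele>G</allele></snp>'
    b'</snps>'
)


def iter_records(source, tag):
    """Step 1: stream the XML and yield one element per record."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem
            elem.clear()  # drop the record's contents once it has been used


def escape_literal(text):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def record_to_ntriples(elem):
    """Step 2: convert one record into N-Triples lines."""
    subject = "<%s%s>" % (BASE, elem.get("id"))
    return [
        '%s <%s%s> "%s" .' % (subject, BASE, child.tag, escape_literal(child.text or ""))
        for child in elem
    ]


triples = [
    line
    for record in iter_records(io.BytesIO(SAMPLE), "snp")
    for line in record_to_ntriples(record)
]
```

Processing one record at a time is the point of the first step: each piece is small enough to convert on its own, whatever tool does the conversion.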

SADI

Mark gave an overview of work on SADI at the hackathon. He started with a bit of evangelism to encourage people to change their thinking to help with adopting RDF across multiple providers. The SADI work this week added tools to access it from Perl and Java, and it was also integrated with Taverna. Now older WSDL frameworks can be converted into SADI services without any coding.

DDBJ-PDB-KEGG

Akira spoke about work this week integrating DDBJ, PDB and KEGG using RDF. The initial plan is to convert the data to RDF tables, and then link between tables. This was demonstrated with KEGG pathway to PDB queries. Once moved to a powerful server, huge RDF stores like KEGG can be queried rapidly.

UniProt

Jerven described the work on UniProt RDF this week. Using Pellet, they compared their current RDF output to the OWL description file. Consistency improved greatly this week, which makes the RDF from UniProt more usable downstream.

Bio2RDF

Francois presented the work on generalizing Bio2RDF to multiple providers, and several decisions made at the Hackathon on using RDF. One important decision was on polite URIs, which tell us how to name things.

Code

I’m planning to put together a fully buttoned-up post on the libraries for Blue Collar Bioinformatics, for anyone who has been following along and is interested in using them. The code is available on GitHub:
