Day 4 of BioHackathon 2010 is all about discussion and coding. Things settle down as the group projects get going. Then reality hits: you only have two days left to finish your work, and the pace becomes more hectic.
Last night Peter and I had the chance to meet up with Michiel, a fellow Biopython coder who lives nearby in Japan. We had an authentic Japanese dinner that included some meat; I was unable to identify the animal or organ it came from. More important than the fabulous food was the opportunity to reconnect with old friends and discuss some biology and programming. It’s amazing what you can accomplish just by talking things through.
After working up an improved coding interface to build queries for modMine yesterday, today I came back to the BioGateway interface from day 2 and both expanded and simplified how queries are built. A nice discussion with Erick, who put together BioGateway, helped me understand some additional items that could be put into the queries.
Here is our query from day 2, redone and simplified. The query looks for human proteins that are:
- Involved in generating or regulating insulin response.
- Implicated in causing diabetes.
We retrieve the name of the proteins, along with the associated gene, known interacting proteins, and a Gene Ontology (GO) description of the protein function.
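As a rough illustration of what building such a query programmatically can look like, here is a minimal sketch that assembles a SPARQL SELECT statement from a list of triple patterns. This is not the BioGateway interface described in this post: the `build_select` helper, the `ex:` prefix, and every predicate name below are hypothetical placeholders, and a real query would need the actual BioGateway graph URIs and predicates.

```python
def build_select(variables, patterns, prefixes=None, limit=None):
    """Assemble a SPARQL SELECT query string from triple patterns.

    Hypothetical helper for illustration; not part of BioGateway or Biopython.
    """
    lines = []
    # PREFIX declarations let the patterns use short names like ex:hasName.
    for name, uri in sorted((prefixes or {}).items()):
        lines.append("PREFIX {}: <{}>".format(name, uri))
    lines.append("SELECT DISTINCT " + " ".join("?" + v for v in variables))
    lines.append("WHERE {")
    for subject, predicate, obj in patterns:
        lines.append("  {} {} {} .".format(subject, predicate, obj))
    lines.append("}")
    if limit is not None:
        lines.append("LIMIT {}".format(int(limit)))
    return "\n".join(lines)


# Placeholder vocabulary: none of these are real BioGateway terms.
prefixes = {"ex": "http://example.org/placeholder#"}
patterns = [
    ("?protein", "ex:hasName", "?protein_name"),
    ("?protein", "ex:encodedBy", "?gene"),
    ("?protein", "ex:interactsWith", "?interactor"),
    ("?protein", "ex:hasFunction", "?go_term"),
]
query = build_select(
    ["protein_name", "gene", "interactor", "go_term"],
    patterns,
    prefixes=prefixes,
    limit=10,
)
print(query)
```

Keeping the patterns as plain tuples makes it easy to add or drop a retrieval item, such as interacting proteins, without rewriting the query text by hand.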
Here is a second query that looks at a different type of retrieval task. With a known protein, what papers should I look at to start understanding its function? The following query searches for your known protein name and returns references to the primary literature in PubMed.
To automate this further, the returned PubMed IDs can be used to retrieve the paper details using Biopython and the Entrez interface:
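A minimal sketch of that retrieval step, assuming the query results have already been reduced to a list of PubMed IDs. It uses Biopython's `Entrez.efetch` and `Medline.parse`; the function names `fetch_medline_records` and `format_citation` and the citation layout are my own choices for illustration, and NCBI expects you to supply a real contact email.

```python
import sys


def fetch_medline_records(pubmed_ids, email):
    """Fetch MEDLINE records for a list of PubMed IDs via NCBI Entrez."""
    # Imported here so format_citation below can be used without Biopython.
    from Bio import Entrez, Medline

    Entrez.email = email  # NCBI asks for a contact address with each request
    handle = Entrez.efetch(
        db="pubmed", id=",".join(pubmed_ids), rettype="medline", retmode="text"
    )
    try:
        return list(Medline.parse(handle))
    finally:
        handle.close()


def format_citation(record):
    """Build a one-line summary from a MEDLINE record (a dict-like object)."""
    authors = record.get("AU", [])
    first_author = authors[0] if authors else "Unknown author"
    if len(authors) > 1:
        first_author += " et al."
    return "{} ({}) {} {} [PMID {}]".format(
        first_author,
        record.get("DP", "n.d."),        # DP: date of publication
        record.get("TI", "[no title]"),  # TI: article title
        record.get("JT", ""),            # JT: full journal title
        record.get("PMID", "?"),
    )


# Usage: python fetch_papers.py you@example.org PMID [PMID ...]
if __name__ == "__main__" and len(sys.argv) > 2:
    for rec in fetch_medline_records(sys.argv[2:], email=sys.argv[1]):
        print(format_citation(rec))
```

Fetching all IDs in a single `efetch` call keeps the number of requests to NCBI low, which matters if the query returns many references.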