Links: R code, clustering gotchas, visualization example and RNAi screening

To kick off this roundup of interesting blog posts from the last few weeks, Pietro discusses how to improve technical blog writing: be concise, open each post with something interesting, and present your information as a story. He includes lots of nice examples from popular technical blogs.

  • R code examples are always interesting. Jeremy provides code to trim fastq sequencing reads using the Bioconductor ShortRead package. If you’re a self-taught R coder like me, Chris’ summary of the R type system will help you untangle vectors and lists in your mind. He also describes how to use the reshape package to build pivot tables in R, turning a table of two-category data into a matrix of values.

  • As a reminder that you should always be rethinking your data sources and analysis methods, Lars digs into an issue with clustering proteins using Markov clustering and discovers a case where unconnected nodes are clustered together due to having many shared edges. Following up, he proposes a fix and demonstrates the issue in the wild in both ortholog and protein complex datasets.

  • Visualization and analysis examples are always good sources of inspiration for future projects. FlowingData has a visualization challenge to take a graph of crossing lines and make it easier to read; my favorite from the comments separated the graphs and provided a reference line. Another nice source of charts and comparisons is Juice Analytics’ analysis of survey results.

  • Rajarshi’s presentation on high-throughput RNAi screens at the NIH is a great resource for techniques, approaches and high-level questions. It is well worth reviewing both for the thoughtful approach and for tips and tricks.

  • Nico summarizes visualization tools for large graphs with the goal of dealing with RDF graphs and ontologies. Several of these tools are also useful for phylogenetic taxonomy work.
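
For readers who have not seen the pivot table idea from the R examples above, here is a minimal sketch of the same reshaping in plain Python. It is not Chris’ code and does not use the reshape package; the pivot function, the field names and the sample records are all hypothetical, made up only to show how two-category data in a long table becomes a matrix of values.

```python
from collections import defaultdict


def pivot(records, row_key, col_key, value_key, fill=0):
    """Reshape long-format records into (row_labels, col_labels, matrix).

    Hypothetical helper for illustration. If the same (row, column) pair
    appears more than once, the last value wins; no aggregation is done.
    """
    cells = defaultdict(dict)
    for rec in records:
        cells[rec[row_key]][rec[col_key]] = rec[value_key]

    rows = sorted(cells)
    cols = sorted({col for by_col in cells.values() for col in by_col})
    # Missing (row, column) combinations are filled with the default value.
    matrix = [[cells[r].get(c, fill) for c in cols] for r in rows]
    return rows, cols, matrix


# Hypothetical long-format table: one record per (gene, sample) pair.
records = [
    {"gene": "geneA", "sample": "s1", "count": 10},
    {"gene": "geneA", "sample": "s2", "count": 4},
    {"gene": "geneB", "sample": "s1", "count": 7},
]

rows, cols, matrix = pivot(records, "gene", "sample", "count")
for label, values in zip(rows, matrix):
    print(label, values)
```

One category becomes the row labels, the other becomes the column labels, and combinations that never occur get a fill value. A real pivot would usually also need a rule for aggregating duplicate pairs, which this sketch leaves out.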


