The bigger your phylogenetic tree, the bigger your headache! And I’m not just talking about the huge amount of time it will take to calculate the alignments and the actual tree using PhyloBayes or PhyML. Interpretation will become near impossible. ‘Simplify’ is the magic word here, I think, but at some point you have to look at your phylogenetic tree as a whole…
I recently teamed up with a colleague because our individual research projects crossed paths. We divided the work and each picked a gene family to work on. The resulting phylogenetic trees are depicted below.
Sometimes you cannot beat good old-fashioned paper.
I’m using Dendroscope as my preferred tree viewer, but I don’t know if there is anything better out there for viewing large phylogenies. If somebody can recommend something, please drop me a comment!
September 16th, 2008
John
Right! Bioinformaticians are an industrious lot, writing all kinds of software to make life easier for other scientists… But why do some write their software to work on only one OS? I know Java is not everything, but it works! And most scripting languages (Perl, Python, etc.) can be installed and run on nearly every OS.
I found this article in my email and at first sight it looked nice and useful to me, but I’m not even gonna try to read the article and try the software… Because I don’t have a Mac…
Should I get a Mac?
My very first article is out!
van Dam, T.J.P., Snel, B. (2008). Protein Complex Evolution Does Not Involve Extensive Network Rewiring. PLoS Computational Biology, 4(7), e1000132. DOI: 10.1371/journal.pcbi.1000132
This is the author summary:
Protein complexes are a pivotal part of the functioning of cells in health and disease. Studying the evolution of these essential cellular features is of great intrinsic as well as practical interest. However, the study of the evolution of protein complexes by comparative analysis is fraught with difficulties. Hence current reports that reveal low overlap in the interactome between species are often reluctant to equate this low level of overlap to a low level of conservation. Here we exploit new public data sets, which display unparalleled coverage, to study the amount of co-complex membership conservation, and we present a novel measure for the absence of interactions. We thereby observe a hitherto unreported high level of conservation of 90% of the interactions when the presence of the genes coding for the protein pairs that participate in the same protein complex is also conserved. This allows for new insights into the evolution of protein complexes: the evolutionary dynamics of protein complexes are, by and large, not the result of network rewiring (i.e. acquisition or loss of co-complex memberships), but mainly due to genomic acquisition or loss of genes coding for subunits.
I thought I had them all, but I was wrong. When do two sequences really not share homology, and when do you just fail to detect it? I think this is still a big problem in bioinformatics… Bit scores or E-values? Which method? BLAST, PSI-BLAST, HMMER? Argggh!!!!
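Part of what makes the bit score versus E-value choice awkward is that an E-value depends on the size of the search space, while a bit score does not. A minimal sketch of that relationship, assuming the textbook Karlin-Altschul approximation E = m · n · 2^(-S) and ignoring the effective-length corrections that real BLAST applies (the function name and the example numbers are mine, purely for illustration):

```python
def evalue_from_bitscore(bit_score, query_len, db_len):
    """Approximate E-value for a hit with the given bit score.

    Assumes E = m * n * 2**(-S), where m is the query length and n is the
    total database length. Real BLAST uses *effective* lengths, so its
    reported E-values will differ somewhat from this approximation.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Same hit (bit score 50, query of 300 residues), two database sizes.
    for db_len in (1_000_000, 1_000_000_000):
        e = evalue_from_bitscore(50, 300, db_len)
        print(f"database of {db_len:>13,} residues: E ~ {e:.2e}")
```

So the same alignment can look convincing against a small database and marginal against a large one, which is one reason a fixed E-value cutoff does not carry over between searches.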
Last week I read an article by Fourment and Gillings, A comparison of common programming languages used in bioinformatics [pubmed][doi], in BMC Bioinformatics. As the title says, it compares programming languages often used in bioinformatics: Perl, Python, C, C++, C# (.NET) and Java. The authors stress that each language has its own advantages for different bioinformatics applications. Fine, I can agree with that, but…
The more I think about the article, the more I am vexed by it. Besides kicking in open doors, the results and methods leave much to be desired. None of the figures has error bars or standard deviations, which are necessary and would have been stupidly easy to add. All programs were written by the same person, who had varying experience in Perl, C++ and Java; the other languages were learned while writing the programs. I think this is a recipe for disaster. Every language has its peculiarities, which can be avoided or used to the fullest only when one has some decent experience with the language. A colleague of mine (who is the resident Python expert) classified the BLAST parser as ‘rather messy’ after one short glance.
I simply cannot get my head around the fact that Python parses a 9.6 gig BLAST output file in 38 minutes while Perl does the same thing in a little more than 7… 38 minutes! A 9.6 gig BLAST output file! I have tried these scripts myself on some BLAST output I had lying around (not nearly as large) and found huge differences in processing time between runs of the same script on the same BLAST file. Also… 9.6 gig! They mention the sequence used to search, but not the database they searched in… How do you end up with a relevant BLAST search output that large?
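To show how little it takes to report run-to-run variation, here is a minimal sketch that times the same job several times and prints the mean and standard deviation. It is not the benchmark from the article: `toy_parse` is a hypothetical stand-in for a real BLAST parser, and the number of repeats is an arbitrary choice.

```python
import statistics
import time


def time_runs(func, repeats=5):
    """Call func() `repeats` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (mean, sample standard deviation) of the durations."""
    return statistics.mean(durations), statistics.stdev(durations)


def toy_parse():
    # Hypothetical workload: split some made-up tab-separated hit lines.
    lines = ["query\tsubject\t98.5\t1e-30"] * 10000
    return sum(len(line.split("\t")) for line in lines)


if __name__ == "__main__":
    mean, sd = summarize(time_runs(toy_parse, repeats=5))
    print(f"{mean:.4f} s +/- {sd:.4f} s over 5 runs")
```

With numbers like these per language, the error bars are one plotting argument away, and a 38 versus 7 minute gap can be judged against the spread between runs.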
I think this article still needs a lot of work before the numbers they report will convince me. I am willing to agree that Python is better than Perl at some things and vice versa, but I have strong objections to how this study was performed. Although it is nice that someone presents actual numbers and figures on how different languages perform, I do not think it is good enough to be published in BMC Bioinformatics.