Inparablog

A comparative genomics and bioinformatics blog

Author Archives: John

If publishing in BMC Bioinformatics is that simple

Posted on February 13, 2008 by John
No comments

Last week I read an article by Fourment and Gillings, A comparison of common programming languages used in bioinformatics [pubmed][doi], in BMC Bioinformatics. As the title says, it compares programming languages often used in bioinformatics: Perl, Python, C, C++, C# (.NET) and Java. The authors stress that each language has advantages for different bioinformatics applications. Fine, I can agree with that, but…

The more I think about the article, the more I am vexed by it. Besides kicking in open doors, the methods and results leave a lot to be desired. None of the figures has error bars or standard deviations, which are necessary and would have been stupidly easy to add. All programs were written by the same person, who had varying experience in Perl, C++ and Java; the other languages were learned while writing the programs. I think this is a recipe for disaster. Every language has its peculiarities, which can be avoided or used to the fullest only when one has some decent experience with the language. A colleague of mine (who is the resident Python expert) classified the BLAST parser as ‘rather messy’ after one short glance.
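To show how little work error bars would take, here is a minimal sketch that times a piece of code several times and reports the mean and sample standard deviation. The workload is a made-up stand-in and the helper names (mean_sd, time_runs) are my own; in practice you would call the actual parser script instead.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Time::HiRes qw(time);
use List::Util qw(sum);

# Return the mean and sample standard deviation of a list of numbers.
sub mean_sd {
    my @x = @_;
    my $n = @x;
    die "need at least two measurements\n" if $n < 2;
    my $mean = sum(@x) / $n;
    my $var  = sum(map { ($_ - $mean) ** 2 } @x) / ($n - 1);
    return ($mean, sqrt($var));
}

# Run a code reference $runs times; return the wall-clock times in seconds.
sub time_runs {
    my ($code, $runs) = @_;
    my @times;
    for (1 .. $runs) {
        my $start = time();
        $code->();
        push @times, time() - $start;
    }
    return @times;
}

# Hypothetical stand-in workload (an assumption for illustration only).
# Replace with e.g. sub { system("perl", "parse_blast.pl", $file) }.
my $workload = sub { my $s = 0; $s += $_ for 1 .. 200_000; return $s };

my @times = time_runs($workload, 10);
my ($mean, $sd) = mean_sd(@times);
printf "mean %.4f s, sd %.4f s over %d runs\n", $mean, $sd, scalar @times;
```

Ten runs is an arbitrary choice. For a parser that reads a multi-gigabyte file, disk caching alone can make the first run much slower than the rest, which is exactly the kind of variation a standard deviation would reveal.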

I simply cannot get my head around the fact that Python parses a 9.6 gig BLAST output file in 38 minutes while Perl does the same thing in a little more than 7… 38 minutes! A 9.6 gig BLAST output file! I have tried these scripts myself on some BLAST output (not nearly as large) I had lying around and found huge differences in processing time between runs of the same script on the same BLAST file. Also… 9.6 gig! They mention the sequence used as the query, but not the database they searched against… How do you end up with a relevant BLAST search output that large?

I think this article still needs a lot of work before it convinces me of the numbers they report. I am willing to agree that Python is better than Perl at some things and vice versa, but I have strong objections to how this study was performed. Although it is nice that someone presents actual numbers and figures on how different languages perform, I do not think it is good enough to be published in BMC Bioinformatics.

Categories: Bioinformatics, Perl | Tags: Python, Review

E-values!

Posted on November 20, 2007 by John
No comments

This actually works!

#!/usr/bin/perl
use warnings;
use strict;
# Both strings are converted to numbers before multiplying, so this prints 0.3
print "1e-1" * "3", "\n";

This certainly does make my life easier with respect to HMMER and BLAST E-values!
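In practice this means E-values read from a report as strings can be compared and sorted numerically without any special parsing. A minimal sketch, using made-up hits and a hypothetical helper (evalue_to_number). The one catch I know of is that some older BLAST reports abbreviate 1e-100 as "e-100", which Perl would treat as 0 (with a warning), so the helper prepends the missing 1:

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Turn an E-value string into a number.
# Assumption: a report may abbreviate 1e-100 as "e-100"; prepend the missing 1.
sub evalue_to_number {
    my ($evalue) = @_;
    $evalue =~ s/^e/1e/i;
    return $evalue + 0;
}

# Hypothetical hits: [subject id, E-value exactly as it appears in the report]
my @hits = (
    [ 'protA', '3e-50' ],
    [ 'protB', '0.002' ],
    [ 'protC', 'e-100' ],
    [ 'protD', '1.5'   ],
);

my $cutoff = '1e-5';    # a string works as a cutoff too

# Keep hits below the cutoff, best (smallest) E-value first.
my @good = sort { evalue_to_number($a->[1]) <=> evalue_to_number($b->[1]) }
           grep { evalue_to_number($_->[1]) < $cutoff } @hits;

print join("\t", @$_), "\n" for @good;
```

With these example hits only protC and protA pass the cutoff, in that order.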

Categories: Perl | Tags: E-values

BBC 2007: Day 2

Posted on November 14, 2007 by John
No comments

Second day at the BBC in Leuven. Quite a few interesting talks that day! One talk was about using structure information to analyze how proteins bind to protein domains that are common in signal transduction. The second speaker actually told us something similar to what I did for my Master's. Though not the same, he did use some of the ideas we also had for comparing protein interaction networks. He gave a link to the pre-published text and I am printing it at this very moment!

The keynote speaker M. Madan Babu gave a brilliant presentation about the structure, evolution and dynamics of transcription regulation networks. He was the first speaker after the break, and his microphone stopped working. When that was fixed, the beamer broke down. He could still laugh about it though. When he could finally continue, he told us about regulatory motifs in yeast. When analyzing the network they found a limited set of motifs that were predominant in the network. These motifs were analyzed in an evolutionary context by looking at duplications of the transcription factors and their target genes. Only in rare instances were these motifs explained by duplications, which was counterintuitive. Also, the a priori assumption that transcriptional “hubs” should control relatively more duplicated genes turned out not to hold. They did find enrichment for some types of motifs in specific processes such as DNA replication and sporulation. Feed-forward loops, for example, are enriched in slow processes.

When looking at the chromosomal localization of target genes and transcription factors, they found a clear preference for target genes to be concentrated on one or at most two chromosomes. Even within chromosomes, target genes display regional preferences or avoidance. This mapping of preferences could help optimize the expression of exogenous genes regulated by endogenous transcription factors.

The following talks included one on the evolution of chromalveolates, which was very interesting, as well as a talk about MANTiS, an orthology database that is supposed to go online in December. Instead of Inparanoid or bi-directional best hits they use phylogenetic trees, which of course is much better. Rather than calling only general orthology, it can distinguish orthologs from paralogs and in-paralogs from out-paralogs. This depends on the quality of the trees used and on how the gene families have been determined, but it is good that they do this so others can use it.

Wrap up: After a bad start with the poster session on Monday, the BBC took off with some very interesting talks. I especially liked the keynote talk by Madan Babu. I’ve noticed that a lot of the research presented in the talks, and especially on the posters, involved making bioinformatics tools for biologists who will not use them. I am a bit pessimistic about this, I know, but as a molecular biologist myself I can only wonder.

Categories: Bioinformatics | Tags: BBC, Conference

BBC 2007: Day 1

Posted on November 12, 2007 by John
No comments

First day at the BBC in Leuven. My low expectations about the organization were confirmed, but the talks were good and that's what counts. First was one of the keynote speakers: Charles Lawrence. He works on RNA secondary structure prediction. He warned us about maximizing likelihoods, minimizing free energies etc., because the result might not represent the actual population of RNA structures in a sample. Even though the minimum free energy (MFE) structure is the optimal one, one must not forget entropy. This also holds for sequence alignments, so it is definitely something for me to think about. He recommended an article to read: Miyazawa et al. Prot. Eng. 1994.

The next couple of talks were about microarray data and (transcription) networks, of which my knowledge is limited. The bottom line of most of these talks was “This is how it is normally done, but our algorithm is better”. This is good I suppose, but the last session of the day was the most interesting.

The last session had many interesting talks, but the third one stood out: Victor Guryev of the Hubrecht lab at Utrecht showed us a high number of CNVs within lab rat strains, thereby showing high variation within a species. He could identify CNVs by finding regions where the coverage from WGS mapping was twofold or more higher than average. The regions of these CNVs seem to be conserved in human. Great stuff!!! I'll keep my eyes open for the publication.
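As I understood it, the basic idea can be sketched in a few lines. This is only a toy illustration of "coverage at least twofold above average", not the method from the talk; the window depths are made up and the function name (high_coverage_windows) is my own.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use List::Util qw(sum);

# Return the indices of windows whose read coverage is at least
# $fold times the average coverage over all windows.
sub high_coverage_windows {
    my ($coverage, $fold) = @_;
    my $mean = sum(@$coverage) / @$coverage;
    return grep { $coverage->[$_] >= $fold * $mean } 0 .. $#$coverage;
}

# Hypothetical per-window read depth along one chromosome (made-up numbers).
my @depth = (10, 11, 9, 10, 30, 31, 10, 9, 10, 10);

my @candidates = high_coverage_windows(\@depth, 2);
print "candidate windows: @candidates\n";
```

Note that the duplicated windows themselves pull the average up, so a median would probably be a more robust baseline on real data.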

Categories: Bioinformatics | Tags: BBC, Conference

Benelux Bioinformatics Conference 2007

Posted on November 12, 2007 by John
No comments

Today is the first day of the Benelux Bioinformatics Conference at Leuven. I have been here since Saturday so I'm sort of getting used to the French keyboard layout :) . Leuven is a beautiful little city with lots of old buildings and good food, so I can recommend visiting. Anyway, I have to be off to the conference. I'll try to make a post tonight about the first day.

Categories: Bioinformatics | Tags: BBC, Conference

What I do: building gene trees

Posted on June 27, 2007 by John
1 comment

Since October last year I’ve been working as a PhD student at the Theoretical Biology group of the Faculty of Science at Utrecht University. I actually work for the Physiological Chemistry group of Prof. dr. Bos at the Academic Medical Centre, but that’s another story… Below I will explain a bit about what I am doing with my current project. I will try to keep it as uncomplicated as possible.

My project involves studying the evolution of signaling pathways in eukaryotes and trying to understand specifically the emergence of new signaling pathways. A signaling pathway is a chain of events in the cell, carried out by proteins, that has evolved to ‘let the cell know’ what happens outside of the cell so it can react accordingly.

My project is based on the observation that the complex eukaryotes (like us) tend to have had many duplications of key proteins, and that the duplicates have gained their own functions and regulate different processes. My job, in short, is to find out approximately when, why and how this happened for some specific protein families.

Read more …

Categories: Biology | Tags: PhD
  • The author

    My name is John van Dam and I am a Post-Doc at St. Radboud University Medical Center (NL). My research involves bioinformatics and comparative genomics on cilia and signal transduction pathways.
  • Copyright notice

    Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.