Proseminars at our group

Beabsichtigen Sie einen Vortrag im Rahmen eines Problemseminars zu halten, so melden Sie dies bitte vorher unverbindlich an: Anmeldeformular für Problemseminare

Vortragszeit: 30 min + anschliessende Diskussion

If you plan to give a talk as a proseminar, please register first: Registration-Form Proseminars

Length of talk: 30 min + discussion afterwards

Registered proseminars for the current semester

[ edit ]
osama shahin, Wintersemester 2020/2021

Herr shahin
ich bin Student
und studiere Maste Bioinformatik.

[ edit ]
osama shahin, 03_20.01.2015, Wintersemester 2020/2021

Herr shahin
ich bin Student
und studiere Maste Bioinformatik.

Proseminars given in past semesters

[ edit ]
Nancy Retzlaff, 02_20.01.2015, Wintersemester 2014/2015

Comparison of directed and weighted co-occurrence networks of six languages
"To study commonalities and differences among different languages, we select 100 reports from the documents of the United Nations, each of which was written in Arabic, Chinese, English, French, Russian and Spanish languages, separately. Based on these corpora, we construct 6 weighted and directed word co-occurrence networks. Besides all the networks exhibit scale-free and small-world features, we find several new non-trivial results, including connections among English words are denser, and the expression of English language is more flexible and powerful; the connection way among Spanish words is more stringent and this indicates that the Spanish grammar is more rigorous; values of many statistical parameters of the French and Spanish networks are very approximate and this shows that these two languages share many commonalities; Arabic and Russian words have many varieties, which result in rich types of words and a sparse connection among words; connections among Chinese words obey a more uniform distribution, and one inclines to use the least number of Chinese words to express the same complex information as those in other five languages. This shows that the expression of Chinese language is quite concise. In addition, several topics worth further investigating by the complex network approach have been observed in this study."

[ edit ]
Enrico Macholdt, 02_20.01.2015, Wintersemester 2014/2015

Inferring the soybean (/Glycine max/) microRNA functional network based on target gene network
Motivation: The rapid accumulation of microRNAs (miRNAs) and experimental evidence for miRNA interactions has ushered in a new area of miRNA research that focuses on network more than individual miRNA interaction, which provides a systematic view of the whole microRNome. So it is a challenge to infer miRNA functional interactions on a system-wide level and further draw a miRNA functional network (miRFN). A few studies have focused on the well-studied human species; however, these methods can neither be extended to other non-model organisms nor take fully into account the information embedded in miRNA–target and target–target interactions. Thus, it is important to develop appropriate methods for inferring the miRNA network of non-model species, such as soybean (Glycine max), without such extensive miRNA-phenotype associated data as miRNA-disease associations in human.

Results: Here we propose a new method to measure the functional similarity of miRNAs considering both the site accessibility and the interactive context of target genes in functional gene networks. We further construct the miRFNs of soybean, which is the first study on soybean miRNAs on the network level and the core methods can be easily extended to other species. We found that miRFNs of soybean exhibit a scale-free, small world and modular architecture, with their degrees fit best to power-law and exponential distribution. We also showed that miRNA with high degree tends to interact with those of low degree, which reveals the disassortativity and modularity of miRFNs. Our efforts in this study will be useful to further reveal the soybean miRNA–miRNA and miRNA–gene interactive mechanism on a systematic level.

Availability and implementation: A web tool for information retrieval and analysis of soybean miRFNs and the relevant target functional gene networks can be accessed at SoymiRNet:

[ edit ]
Dan Haeberlein, 01_19.01.2015, Wintersemester 2014/2015

NetComm a network analysis tool based on communicability
Motivation: Set-based network similarity metrics are increasingly
used to productively analyze genome-wide data. Conventional
approaches, such as mean shortest path and clique-based metrics,
have been useful but are not well suited to all applications.
Computational scientists in other disciplines have developed communicability
as a complementary metric. Network communicability considers
all paths of all lengths between two network members. Given
the success of previous network analyses of protein–protein interactions,
we applied the concepts of network communicability to this
problem. Here we show that our communicability implementation has
advantages over traditional approaches. Overall, analyses suggest
network communicability has considerable utility in analysis of largescale
biological networks.

[ edit ]
Steffi Grote, 02_20.01.2015, Wintersemester 2014/2015

Using random walks to identify cancer-associated modules in expression data
Background: The etiology of cancer involves a complex series of genetic and
environmental conditions. To better represent and study the intricate genetics of cancer onset and progression, we construct a network of biological interactions to search for groups of genes that compose cancer-related modules. Three cancer expression datasets are investigated to prioritize genes and interactions associated with cancer outcomes. Using a graph-based approach to search for communities of phenotype-related genes in microarray data, we find modules of genes associated with cancer phenotypes in a weighted interaction network.
Results: We implement Walktrap, a random-walk-based community detection algorithm, to identify biological modules predisposing to tumor growth in 22 hepatocellular carcinoma samples (GSE14520), adenoma development in 32 colorectal cancer samples (GSE8671), and prognosis in 198 breast cancer patients
(GSE7390). For each study, we find the best scoring partitions under a maximum
cluster size of 200 nodes. Significant modules highlight groups of genes that are functionally related to cancer and show promise as therapeutic targets; these include interactions among transcription factors (SPIB, RPS6KA2 and RPS6KA6), cell-cycle regulatory genes (BRSK1, WEE1 and CDC25C), modulators of the cell-cycle and proliferation (CBLC and IRS2) and genes that regulate and participate in the mapkinase pathway (MAPK9, DUSP1, DUSP9, RIPK2). To assess the performance of Walktrap to find genomic modules (Walktrap-GM), we evaluate our results against other tools recently developed to discover disease modules in biological networks. Compared with other highly cited module-finding tools, jActiveModules and Matisse, Walktrap-GM shows strong performance in the discovery of modules enriched with known cancer genes.
Conclusions: These results demonstrate that the Walktrap-GM algorithm identifies
modules significantly enriched with cancer genes, their joint effects and promising candidate genes. The approach performs well when evaluated against similar tools and smaller overall module size allows for more specific functional annotation and facilitates the interpretation of these modules.

[ edit ]
Stefan Leger, 01_19.01.2015, Wintersemester 2014/2015

From Function to Interaction: A New Paradigm for Accurately Predicting Protein Complexes Based on P2P In. NW.
Identification of protein complexes is critical to understand complex formation and protein functions. Recent advances in
high-throughput experiments have provided large data sets of protein-protein interactions (PPIs). Many approaches, based on the
assumption that complexes are dense subgraphs of PPI networks (PINs in short), have been proposed to predict complexes using
graph clustering methods. In this paper, we introduce a novel from-function-to-interaction paradigm for protein complex detection. As
proteins perform biological functions by forming complexes, we first cluster proteins using biology process (BP) annotations from gene
ontology (GO). Then, we map the resulting protein clusters onto a PPI network (PIN in short), extract connected subgraphs consisting
of clustered proteins from the PPI network and expand each connected subgraph with protein nodes that have rich links to the proteins
in the subgraph. Such expanded subgraphs are taken as predicted complexes. We apply the proposed method (called CPredictor) to
two PPI data sets of S. cerevisiae for predicting protein complexes. Experimental results show that CPredictor outperforms the existing
methods. The outstanding precision of CPredictor proves that the from-function-to-interaction paradigm provides a new and effective
way to computational detection of protein complexes.

[ edit ]
Gabor Balogh, 01_19.01.2015, Wintersemester 2014/2015

Mapping Small-World Properties through Development in the Human Brain: Disruption in Schizophrenia
Evidence from imaging studies suggests that the human brain has a small-world network topology that might be disrupted
in certain brain disorders. However, current methodology is based on global graph theory measures, such as clustering, C,
characteristic path length, L, and small-worldness, S, that lack spatial specificity and are insufficient to identify regional brain
abnormalities. Here we propose novel ultra-fast methodology for mapping local properties of brain network topology such
as local C, L and S (lC, lL and lS) in the human brain at 3-mm isotropic resolution from ‘resting-state’ magnetic resonance
imaging data. Test-retest datasets from 40 healthy children/adolescents were used to demonstrate the overall good
reliability of the measures across sessions and computational parameters (intraclass correlation . 0.5 for lC and lL) and their
low variability across subjects (, 29%). Whereas regions with high local functional connectivity density (lFCD; local degree)
in posterior parietal and occipital cortices demonstrated high lC and short lL, subcortical regions (globus pallidus, thalamus,
hippocampus and amygdala), cerebellum (lobes and vermis), cingulum and temporal cortex also had high, lS,
demonstrating stronger small-world topology than other hubs. Children/adolescents had stronger lFCD, higher lC and
longer lL in most cortical regions and thalamus than 74 healthy adults, consistent with pruning of functional connectivity
during maturation. In contrast, lFCD, lC and lL were weaker in thalamus and midbrain, and lL was shorter in frontal cortical
regions and cerebellum for 69 schizophrenia patients than for 74 healthy controls, suggesting exaggerated pruning of
connectivity in schizophrenia. Follow up correlation analyses for seeds in thalamus and midbrain uncovered lower positive
connectivity of these regions in thalamus, putamen, cerebellum and frontal cortex (cingulum, orbitofrontal, inferior frontal)
and lower negative connectivity in auditory, visual, motor, premotor and somatosensory cortices for schizophrenia patients
than for controls, consistent with prior findings of thalamic disconnection in schizophrenia.

[ edit ]
Verena Ruloffs, 02_20.01.2015, Wintersemester 2014/2015

Towards Breaking the Histone Code – Bayesian Graphical Models for Histone Modifications
Background—Histones are proteins that wrap DNA around in small spherical structures called
nucleosomes. Histone modifications (HMs) refer to the post-translational modifications to the
histone tails. At a particular genomic locus, each of these HMs can either be present or absent, and
the combinatory patterns of the presence or absence of multiple HMs, or the ‘histone codes,’ are
believed to co-regulate important biological processes. We aim to use raw data on HM markers at
different genomic loci to (1) decode the complex biological network of HMs in a single region and
(2) demonstrate how the HM networks differ in different regulatory regions. We suggest that these
differences in network attributes form a significant link between histones and genomic functions.
Methods and Results—We develop a powerful graphical model under Bayesian paradigm.
Posterior inference is fully probabilistic, allowing us to compute the probabilities of distinct
dependence patterns of the HMs using graphs. Furthermore, our model-based framework allows
for easy but important extensions for inference on differential networks under various conditions,
such as the different annotations of the genomic locations (e.g., promoters versus insulators). We
applied these models to ChIP-Seq data based on CD4+ T lymphocytes. The results confirmed
many existing findings and provided a unified tool to generate various promising hypotheses.
Differential network analyses revealed new insights on co-regulation of HMs of transcriptional
activities in different genomic regions.
Conclusions—The use of Bayesian graphical models and borrowing strength across different
conditions provide high power to infer histone networks and their differences.

[ edit ]
Janine Rauch, 02_20.01.2015, Wintersemester 2014/2015

Hyper-Brain Networks Support Romantic Kissing in Humans
Coordinated social interaction is associated with, and presumably dependent on, oscillatory couplings within and between brains, which, in turn, consist of an interplay across different frequencies. Here, we introduce a method of network construction based on the cross-frequency coupling (CFC) and examine whether coordinated social interaction is associated with CFC within and between brains. Specifically, we compare the electroencephalograms (EEG) of 15 heterosexual couples during romantic kissing to kissing one’s own hand, and to kissing one another while performing silent arithmetic. Using graph-theory methods, we identify theta–alpha hyper-brain networks, with alpha serving a cleaving or pacemaker function.
Network strengths were higher and characteristic path lengths shorter when individuals were kissing each other than when they were kissing their own hand. In both partner-oriented kissing conditions, greater strength and shorter path length for 5-Hz oscillation nodes correlated reliably with greater partner-oriented kissing satisfaction. This correlation was especially strong for inter-brain connections in both partner-oriented kissing conditions but not during kissing one’s own hand.
Kissing quality assessed after the kissing with silent arithmetic correlated reliably with intra-brain strength of 10-Hz oscillation nodes during both romantic kissing and kissing with silent arithmetic. We conclude that hyper-brain networks based on CFC may capture neural mechanisms that support interpersonally coordinated voluntary action and bonding behavior.

[ edit ]
Christian Meister, 02_20.01.2015, Wintersemester 2014/2015

Spatiotemporal dynamics of surface water networks across a global biodiversity hotspot
The concept of habitat networks represents an important tool for landscape conservation and management at regional scales. Previous studies simulated degradation of temporally fixed networks but few quantified the change in network connectivity from disintegration of key features that undergo naturally occurring spatiotemporal dynamics. This is particularly of concern for aquatic systems, which typically show high natural spatiotemporal variability. Here we focused on the Swan Coastal Plain, a bioregion that encompasses a global biodiversity hotspot in Australia with over 1500 water bodies of high biodiversity. Using graph theory, we conducted a temporal analysis of water body connectivity over 13 years of variable climate. We derived large networks of surface water bodies using Landsat data (1999–2011). We generated an ensemble of 278 potential networks at three dispersal distances approximating the maximum dispersal distance of different water dependent organisms. We assessed network connectivity through several network topology metrics and quantified the resilience of the network topology during wet and dry phases. We identified

[ edit ]
Adrian Viehweger, 01_19.01.2015, Wintersemester 2014/2015

Alignment-free protein interaction network comparison.

Biological network comparison software largely relies on the concept of alignment where close matches between the nodes of two or more networks are sought. These node matches are based on sequence similarity and/or interaction patterns. However, because of the incomplete and error-prone datasets currently available, such methods have had limited success. Moreover, the results of network alignment are in general not amenable for distance-based evolutionary analysis of sets of networks. In this article, we describe Netdis, a topology-based distance measure between networks, which offers the possibility of network phylogeny reconstruction.


We first demonstrate that Netdis is able to correctly separate different random graph model types independent of network size and density. The biological applicability of the method is then shown by its ability to build the correct phylogenetic tree of species based solely on the topology of current protein interaction networks. Our results provide new evidence that the topology of protein interaction networks contains information about evolutionary processes, despite the lack of conservation of individual interactions. As Netdis is applicable to all networks because of its speed and simplicity, we apply it to a large collection of biological and non-biological networks where it clusters diverse networks by type.

[ edit ]
Rico Feist, 02_20.01.2015, Wintersemester 2014/2015

Numerical modelling and graph theory tools to study ecological connectivity in the Great Barrier Reef
The process of coral larval dispersal is important for coral reef ecosystems, but remains poorly understood and hard to gauge. Better knowledge of inter-reef connectivity patterns would be useful in enabling better management of coral reef waters however. By employing a spatially explicit numerical modelling approach, we simulate larval dispersal through the central section of the Great Barrier Reef (GBR), comprising over 1000 reefs, and identify spatial patterns in the inter-reef connectivity network using a community detection method from network science. This paper presents the modelling approach used and discusses the significance of the results. Inter-reef connectivity networks were estimated for 4 different coral species, and significant differences between them were found. We show how we can partition reefs into clusters, or “communities”, that are sparsely connected with each other, and therefore identify important barriers to larval dispersal. By fine-tuning a resolution parameter in the community detection method, we can find dispersal barriers of
varying strength. Finally, we show that the average connectivity length scale varies significantly across the different reef communities, and suggest that this may have repercussions for the optimal placement of marine protected areas (MPAs) to maximise connectivity with surrounding reefs.

[ edit ]
Lukas Hirsch, 02_20.01.2015, Wintersemester 2014/2015

RNA Graph Partitioning for the Discovery of RNA Modularity: A Novel Application of Graph Partition Algorithm to Biology
Graph representations have been widely used to analyze and design various economic, social, military, political, and
biological networks. In systems biology, networks of cells and organs are useful for understanding disease and medical
treatments and, in structural biology, structures of molecules can be described, including RNA structures. In our RNA-As-
Graphs (RAG) framework, we represent RNA structures as tree graphs by translating unpaired regions into vertices and
helices into edges. Here we explore the modularity of RNA structures by applying graph partitioning known in graph theory
to divide an RNA graph into subgraphs. To our knowledge, this is the first application of graph partitioning to biology, and
the results suggest a systematic approach for modular design in general. The graph partitioning algorithms utilize
mathematical properties of the Laplacian eigenvector ( m 2) corresponding to the second eigenvalues (l2) associated with
the topology matrix defining the graph: l2 describes the overall topology, and the sum of m 29s components is zero. The
three types of algorithms, termed median, sign, and gap cuts, divide a graph by determining nodes of cut by median, zero,
and largest gap of m 29s components, respectively. We apply these algorithms to 45 graphs corresponding to all solved RNA
structures up through 11 vertices (,220 nucleotides). While we observe that the median cut divides a graph into two
similar-sized subgraphs, the sign and gap cuts partition a graph into two topologically-distinct subgraphs. We find that the
gap cut produces the best biologically-relevant partitioning for RNA because it divides RNAs at less stable connections while
maintaining junctions intact. The iterative gap cuts suggest basic modules and assembly protocols to design large RNA
structures. Our graph substructuring thus suggests a systematic approach to explore the modularity of biological networks.
In our applications to RNA structures, subgraphs also suggest design strategies for novel RNA motifs.

[ edit ]
Stefan Börner, 01_19.01.2015, Wintersemester 2014/2015

Manifold de Bruijn Graphs
Genome assembly is usually abstracted as the problem of reconstructing a string from a set of its k-mers. This abstraction naturally leads to the classical de Bruijn graph approach—the key algorithmic technique in genome assembly. While each vertex in this approach is labeled by a string of the fixed length k, the recent genome assembly studies suggest that it would be useful to generalize the notion of the de Bruijn graph to the case when vertices are labeled by strings of variable lengths. Ideally, we would like to choose larger values of k in high-coverage regions to reduce repeat collapsing and smaller values of k in the low-coverage regions to avoid fragmentation of the de Bruijn graph. To address this challenge, the iterative de Bruijn graph assembly (IDBA) approach allows one to increase k at each iterations of the graph construction. We introduce the Manifold de Bruijn (M-Bruijn) graph (that generalizes the concept of the de Bruijn graph) and show that it can provide benefits similar to the IDBA approach in a single iteration that considers the entire range of possible k-mer sizes rather than varies k from one iteration to another.

[ edit ]
Daniel Gerighausen, 04_24.06.2013, Sommersemester 2013

From brain topography to brain topology: relevance of graph theory to functional neuroscience
Although several brain regions show significant specialization, higher functions such as cross-modal information integration, abstract reasoning and conscious awareness are viewed as emerging from interactions across distributed functional networks. Analytical approaches capable of capturing the properties of such networks can therefore enhance our ability to make inferences from functional MRI, electroencephalography and magnetoencephalography data. Graph theory is a branch of mathematics that focuses on the formal modelling of networks and offers a wide range of theoretical tools to quantify specific features of network architecture (topology) that can provide information complementing the anatomical localization of areas responding to given stimuli or tasks (topography).
Explicit modelling of the architecture of axonal connections and interactions among areas can furthermore reveal peculiar topological properties that are conserved across diverse biological networks, and highly sensitive to disease
states. The field is evolving rapidly, partly fuelled by computational developments that enable the study of connectivity at fine anatomical detail and the simultaneous interactions among multiple regions. Recent publications
in this area have shown that graph-based modelling can enhance our ability to draw causal inferences from functional MRI experiments, and support the early
detection of disconnection and the modelling of pathology spread in eurodegenerative disease, particularly Alzheimer’s disease. Furthermore, neurophysiological studies have shown that network topology has a profound
link to epileptogenesis and that connectivity indices derived from graph models aid in modelling the onset and spread of seizures. Graph-based analyses may
therefore significantly help understand the bases of a range of neurological conditions. This review is designed to provide an overview of graph-based analyses of brain connectivity and their relevance to disease aimed
principally at general neuroscientists and clinicians.

[ edit ]
Daniel Gerighausen, 06_24.06.2013, Sommersemester 2013

RNAlyzer—novel approach for quality analysis of RNA structural models
The continuously increasing amount of RNA sequence and experimentally determined 3D structure data drives the development of computational methods supporting exploration of these data. Contemporary functional analysis of RNA molecules, such as ribozymes or riboswitches, covers various issues, among which tertiary structure modeling becomes more and more important. A growing number of tools to model and predict RNA structure calls for an evaluation of these tools and the quality of outcomes their produce. Thus, the development of reliable methods designed to meet this need is relevant in the context of RNA tertiary structure analysis and can highly influence the quality and usefulness of RNA tertiary structure prediction in the nearest future. Here, we present RNAlyzer—a computational method for comparison of RNA 3D models with the reference structure and for discrimination between the correct and incorrect models. Our approach is based on the idea of local neighborhood, defined as a set of atoms included in the sphere centered around a user-defined atom. A unique feature of the RNAlyzer is the simultaneous visualization of the model-reference structure distance at different levels of detail, from the individual residues to the entire molecules.

[ edit ]
Assen Tarlov, 23_28.06.2013, Sommersemester 2013

Protein localization prediction using random walks on graphs


Understanding the localization of proteins in cells is vital to characterizing their functions and possible interactions. As a result, identifying the (sub)cellular compartment within which a protein is located becomes an important problem in protein classification. This classification issue thus involves predicting labels in a dataset with a limited number of labeled data points available. By utilizing a graph representation of protein data, random walk techniques have performed well in sequence classification and functional prediction; however, this method has not yet been applied to protein localization. Accordingly, we propose a novel classifier in the site prediction of proteins based on random walks on a graph.


We propose a graph theory model for predicting protein localization using data generated in yeast and gram-negative (Gneg) bacteria. We tested the performance of our classifier on the two datasets, optimizing the model training parameters by varying the laziness values and the number of steps taken during the random walk. Using 10-fold cross-validation, we achieved an accuracy of above 61% for yeast data and about 93% for gram-negative bacteria.


This study presents a new classifier derived from the random walk technique and applies this classifier to investigate the cellular localization of proteins. The prediction accuracy and additional validation demonstrate an improvement over previous methods, such as support vector machine (SVM)-based classifiers.

[ edit ]
Sandra Gerstl, 02_24.06.2013, Sommersemester 2013

Prediction of vitamin interacting residues in a vitamin binding protein using evolutionary information. BMC Bioinformati
Background: The vitamins are important cofactors in various enzymatic-reactions. In past, many inhibitors have been designed against vitamin binding pockets in order to inhibit vitamin-protein interactions. Thus, it is important to identify vitamin interacting residues in a protein. It is possible to detect vitamin-binding pockets on a protein, if its tertiary structure is known. Unfortunately tertiary structures of limited proteins are available. Therefore, it is important to develop in-silico models for predicting vitamin interacting residues in protein from its primary structure.

Results: In this study, first we compared protein-interacting residues of vitamins with other ligands using Two Sample Logo (TSL). It was observed that ATP, GTP, NAD, FAD and mannose preferred (G,R,K,S,H), (G,K,T,S,D,N), (T,G,Y), (G,Y,W) and (Y,D,W,N,E) residues respectively, whereas vitamins preferred (Y,F,S,W,T,G,H) residues for the interaction with proteins. Furthermore, compositional information of preferred and non-preferred residues along with patterns-specificity was also observed within different vitamin-classes. Vitamins A, B and B6 preferred (F,I,W,Y,L,V), (S,Y,G,T,H,W,N,E) and {S,T,G,H,Y,N} interacting residues respectively. It suggested that protein-binding patterns of vitamins are different from other ligands, and motivated us to develop separate predictor for vitamins and their sub-classes. The four different prediction modules, (i) vitamin interacting residues (VIRs), (ii) vitamin-A interacting residues (VAIRs), (iii) vitamin-B interacting residues (VBIRs) and (iv) pyridoxal-5-phosphate (vitamin B6) interacting residues (PLPIRs) have been developed. We applied various classifiers of SVM, BayesNet, NaiveBayes, ComplementNaiveBayes, NaiveBayesMultinomial, RandomForest and IBk etc., as machine learning techniques, using binary and Position-Specific Scoring Matrix (PSSM) features of protein sequences. Finally, we selected best performing SVM modules and obtained highest MCC of 0.53, 0.48, 0.61, 0.81 for VIRs, VAIRs, VBIRs, PLPIRs respectively, using PSSM-based evolutionary information. All the modules developed in this study have been trained and tested on non-redundant datasets and evaluated using five-fold cross-validation technique. The performances were also evaluated on the balanced and different independent datasets.

Conclusions: This study demonstrates that it is possible to predict VIRs, VAIRs, VBIRs and PLPIRs from evolutionary information of protein sequence. In order to provide service to the scientific community, we have developed web-server and standalone software VitaPred

Prediction of vitamin interacting residues in a vitamin binding protein using evolutionary information. BMC Bioinformatics, 2013
Autoren: Panwar, B.; Gupta, S.; Raghava, G.P.S

[ edit ]
Lieselotte Erber, 22_28.06.2013, Sommersemester 2013

Transient RNA structure features are evolutionarily conserved and can be computationally predicted
Functional RNA structures tend to be conserved during evolution. This finding is, for example, exploited by comparative methods for RNA secondary structure prediction that currently provide the stateofart in terms of prediction accuracy. We here provide strong evidence that homologous RNA genes not only fold into similar final RNA structures, but that their folding pathways also share common transient structural features that have been evolutionarily conserved. For this, we compile and investigate a non-redundant data set of 32 sequences with known transient and final RNA secondary structures and devise a dedicated computational analysis pipeline.

Transient RNA structure features are evolutionarily conserved and can be computationally predicted
(Jing Yun A. Zhu, Adi Steif, Jeff R. Proctor and Irmtraud M. Meyer. Received November 30, 2012; Revised March 25, 2013; Accepted April 5, 2013)

[ edit ]
Enrico Reich, 21_28.06.2013, Sommersemester 2013

An expert fuzzy cognitive map for reactive navigation of mobile robots
A control technique is described for reactive navigation of mobile robots. The problems of large number of rules, and inefficient definition of contributing factors, e.g., robot wheel slippage, are resolved. Causal inference mechanism of the fuzzy cognitive map (FCM) is hired for deriving the required control values from the FCM

[ edit ]
Sarah Berkemer, 20_28.06.2013, Sommersemester 2013

Predicting pseudoknotted structures across two RNA sequences
Motivation: Laboratory RNA structure determination is demanding and costly and thus, computational structure prediction is an import- ant task. Single sequence methods for RNA secondary structure pre- diction are limited by the accuracy of the underlying folding model, if a structure is supported by a family of evolutionarily related sequences,one can be more confident that the prediction is accurate. RNA pseudoknots are functional elements, which have highly conserved structures. However, few comparative structure prediction methods can handle pseudoknots due to the computational complexity.
Results: A comparative pseudoknot prediction method called DotKnot-PW is introduced based on structural comparison of sec- ondary structure elements and H-type pseudoknot candidates.
DotKnot-PW outperforms other methods from the literature on a hand-curated test set of RNA structures with experimental support.

[ edit ]
Sarah Berkemer, 19_28.06.2013, Sommersemester 2013

A new network class: Collatz Step Graphs
In this paper, we introduce a biologically inspired model to generate complex networks. In contrast to many other construction procedures for growing networks introduced so far, our method generates networks from one-dimensional symbol sequences that are related to the so called Collatz problem from number theory. The major purpose of the present paper is, first, to derive a symbol sequence from the Collatz problem, we call the step sequence, and investigate its structural properties. Second, we introduce a construction procedure for growing networks that is based on these step sequences.
Third, we investigate the structural properties of this new network class including their finite scaling and asymptotic behavior of their complexity, average shortest path lengths and clustering coefficients. Interestingly, in contrast to many other network models including the small-world network from Watts und Strogatz, we find that CS graphs become ‘smaller’
with an increasing size.

[ edit ]
Benjamin Standfuß, 17_27.06.2013, Sommersemester 2013

Reconstruction of regulatory networks through temporal enrichment profiling and its application to H1N1 influenza viral
Reconstruction of regulatory networks through temporal enrichment profiling and its application to H1N1 influenza viral infection

Reconstruction of regulatory networks through
temporal enrichment profiling and its application
to H1N1 influenza viral infection
Elena Zaslavsky1*, German Nudelman1, Susanna Marquez2,4, Uri Hershberg2,5, Boris M Hartmann1, Juilee Thakar2,
Stuart C Sealfon1, Steven H Kleinstein2,3*
From 10th International Conference on Artificial Immune Systems (ICARIS)
Cambridge, UK. 18-21 July 2011,

[ edit ]
Felix Kühnl, 01_24.06.2013, Sommersemester 2013

A graph-theoretic approach for inparalog detection
Understanding the history of a gene family that evolves through duplication, speciation, and loss is a fundamental problem in comparative genomics. Features such as function, position, and structural similarity between genes are intimately connected to this history; relationships between genes such as orthology (genes related through a speciation event) or paralogy (genes related through a duplication event) are usually correlated with these features. For example, recent work has shown that in human and mouse there is a strong connection between function and inparalogs, the paralogs that were created since the speciation event separating the human and mouse lineages. Methods exist for detecting inparalogs that either use information from only two species, or consider a set of species but rely on clustering methods. In this paper we present a graph-theoretic approach for finding lower bounds on the number of inparalogs for a given set of species; we pose an edge covering problem on the similarity graph and give an efficient 2/3-approximation as well as a faster heuristic. Since the physical position of inparalogs corresponding to recent speciations is not likely to have changed since the duplication, we also use our predictions to estimate the types of duplications that have occurred in some vertebrates and drosophila.

Tremblay-Savard and Swenson BMC Bioinformatics 2012, 13(Suppl 19):S16 -- A graph-theoretic approach for inparalog detection

[ edit ]
Lisa Duchstein, 18_27.06.2013, Sommersemester 2013

PRince: a web server for structural and physicochemical analysis of Protein-RNA interface
We have developed a web server, PRince, which analyzes the structural features and physicochemical properties of the protein–RNA interface. Users need to submit a PDB file containing the atomic coordinates of both the protein and the RNA molecules in complex form. They should also mention the chain identifiers
of interacting protein and RNA molecules. The size of the protein–RNA interface is estimated by measuring the solvent accessible surface area buried in contact. For a given protein–RNA complex, PRince calculates structural, physicochemical and hydration properties of the interacting surfaces. All these parameters generated by the server are presented in a tabular format. The interacting surfaces can also be visualized with software plug-in like Jmol. In addition,
the output files containing the list of the atomic coordinates of the interacting protein, RNA and interface water molecules can be downloaded. The
parameters generated by PRince are novel, and users can correlate them with the experimentally determined biophysical and biochemical parameters for better understanding the specificity of the protein–RNA recognition process. This server will be continuously upgraded to include more parameters. PRince is publicly accessible and free for use.

[ edit ]
Benjamin Standfuß, 14_26.06.2013, Sommersemester 2013

Computational identification of biologically functional non-hairpin GC-helices in human Argonaute mRNA
Background: Perfectly formed duplex elements in RNA occur within folding units, often as a part of hairpin motifs which can be reliably predicted by various RNA folding algorithms. Double helices with consecutive Watson-Crick
base-pairing may also be formed between distant RNA segments thereby facilitating long-range interactions of
long-chain RNA that may be biologically functional. Here we addressed the potential formation of RNA duplex
motifs by long-range RNA-RNA interactions of distantly located matching sequence elements of a single long-chain
Results: We generated a Python-based software tool that identifies consecutive RNA duplex elements at any given
length and nucleotide content formed by distant sequences. The software tool, dubbed RNAslider, is built on the
theoretical RNA structure prediction algorithm Mfold. Source code and sample data sets are available on demand.
We found that a small ratio of human genes including the Argonaute (Ago)-like gene family encode mRNAs
containing highly GC-rich non-hairpin duplex elements (GC-helix) of equal to or more than 8 base pairs in length
and we provide experimental evidence for their biological significance.
Conclusion: GC-helices are observed preferentially within the 50-region of mRNAs in an evolutionarily conserved
fashion indicating their potential biological role. This view is supported experimentally by post-transcriptional
regulation of gene expression of a fusion transcript containing 50-sequences of human mRNAAgo2 harbouring
GC-helices and down-stream coding sequences of Renilla luciferase.

Computational identification of biologically
functional non-hairpin GC-helices in human
Argonaute mRNA
Simon Dornseifer1,2 and Georg Sczakiel1*
aus BMC
Dornseifer and Sczakiel BMC Bioinformatics 2013, 14:122

[ edit ]
Martin Junghanns, 13_26.06.2013, Sommersemester 2013

Neo4j - Graphdatenbanksystem
Vortrag über Neo4j:
- Datenmodell
- Vorteile ggü. relationalen DBs
- Anfragen (API, Sprachen)
- Use Cases

[ edit ]
Tariq Yousef, 08_24.06.2013, Sommersemester 2013

A New Overlapping Clustering Algorithm Based on Graph Theory
Most of the clustering algorithms reported in the literature build disjoint
clusters; i.e., they do not allow objects to belong to more than one cluster.
Although this approach has been successful for unsupervised learning, there are
some applications, like information retrieval, social network analysis, Bioinformatics and news stream analysis, among others, where objects could belong to more than one cluster.
In this paper, we introduce a new clustering algorithm for building overlapping
clusterings. The proposed algorithm, called OCDC (Overlapping Clustering
based on Density and Compactness), introduces a new graph-covering strategy
and a new filtering strategy, which together allow obtaining a small set of overlapping

A New Overlapping Clustering Algorithm
Based on Graph Theory
Airel P´erez-Su´arez1,2, Jos´e Fco. Mart´ınez-Trinidad1,
Jes´us A. Carrasco-Ochoa1, and Jos´e E. Medina-Pagola2
1 Instituto Nacional de Astrof´ısica, ´Optica y Electr´onica (INAOE)
Luis E. Erro No. 1, Sta. Mar´ıa Tonantzintla, Puebla, CP:72840, Mexico
2 Advanced Technologies Application Center (CENATAV)
7ma A  21406 e/ 214 and 216, Playa, C.P. 12200, La Habana, Cuba

Batyrshin and M. Gonz´alez Mendoza (Eds.): MICAI 2012, Part I, LNAI 7629, pp. 61–72, 2013.
Springer-Verlag Berlin Heidelberg 2013

[ edit ]
Jie Xu, 07_24.06.2013, Sommersemester 2013

Trees and/or networks to display intraspecific DNA sequence variation?
Phylogenetic trees and networks are both used in the scientific literature to display DNA sequence variation at the intraspecific level. Should we rather use trees or networks? I argue that the process of inferring the most parsimonious genealogical relationships among a set of DNA sequences should be dissociated from the problem of displaying this information in a graph. A network graph is probably more appropriate than a strict consensus tree if many alternative, equally most parsimonious, genealogies are to be included. Within the maximum parsimony framework, current phylogenetic inference and network-building algorithms are both unable to guarantee the finding of all most parsimonious (MP) connections. In fact, each approach can find MP connections that the other does not. Although it should be possible to improve at least the maximum parsimony approach, current implementations of these algorithms are such that it is advisable to use both approaches to increase the probability of finding all possible MP connections among a set of DNA sequences.

Bandelt HJ, Dress AW (1992) Split decomposition: a new and useful
approach to phylogenetic analysis of distance data. Molecular
phylogenetics and evolution, 1, 242–252.
Bandelt HJ, Forster P, Sykes BC, Richards MB (1995) Mitochondrial
portraits of human populations using median networks. Genetics,
141, 743–753.
Bandelt HJ, Forster P, Ro¨hl A (1999) Median-joining networks for
inferring intraspecific phylogenies. MMolecular biology and evolution,
16, 37–48.

[ edit ]
Tariq Yousef, 15_26.06.2013, Sommersemester 2013

ssifyingMultigraphModels of Secondary RNA Structure Using Graph-Theoretic Descriptors
The prediction of secondary RNA folds from primary sequences continues to be an important area of research given the
significance of RNA molecules in biological processes such as gene regulation. To facilitate this effort, graph models of secondary
structure have been developed to quantify and thereby characterize the topological properties of the secondary folds. In this
work we utilize a multigraph representation of a secondary RNA structure to examine the ability of the existing graph-theoretic
descriptors to classify all possible topologies as either RNA-like or not RNA-like. We use more than one hundred descriptors and
several different machine learning approaches, including nearest neighbor algorithms, one-class classifiers, and several clustering
techniques.We predict thatmanymore topologies will be identified as those representing RNA secondary structures than currently
predicted in the RAG (RNA-As-Graphs) database. The results also suggest which descriptors and which algorithms are more
informative in classifying and exploring secondary RNA structures.

Research Article
ClassifyingMultigraphModels of Secondary RNA Structure Using
Graph-Theoretic Descriptors
Copyright © 2012

Debra Knisley,1, 2 Jeff Knisley,1, 2 Chelsea Ross,2 and Alissa Rockney2
1 Institute for Quantitative Biology, East Tennessee State University, Johnson City, TN 37614-0663, USA
2Department of Mathematics and Statistics, East Tennessee State University, Johnson City, TN 37614-0663, USA

[ edit ]
Sandra Gerstl, 16_27.06.2013, Sommersemester 2013

Discovery and analysis of consistent active subnetworks in cancers
Gene expression profiles can show significant changes when genetically diseased cells are compared with nondiseased cells. Biological networks are often used to identify active subnetworks (ASNs) of the diseases from the expression profiles to understand the reason behind the observed changes. Current methodologies for discovering ASNs mostly use undirected PPI networks and node centric approaches. This can limit their ability to find the meaningful ASNs when using integrated networks having comprehensive information than the traditional proteinprotein interaction networks. Using appropriate scoring functions to assess both genes and their interactions may allow the discovery of better ASNs. In this paper, we present CASNet, which aims to identify better ASNs using (i) integrated interaction networks (mixed graphs), (ii) directions of regulations of genes, and (iii) combined node and edge scores. We simplify and extend previous methodologies to incorporate edge evaluations and lessen their sensitivity to significance thresholds. We formulate our objective functions using mixed integer programming (MIP) and show that optimal
<br/>solutions may be obtained. We compare the ASNs obtained by CASNet and similar other approaches to show that CASNet can often discover more meaningful and stable regulatory ASNs. Our analysis of a breast cancer dataset finds that the positive feedback loops across 7 genes, AR, ESR1, MYC, E2F2, PGR, BCL2 and CCND1 are conserved across the basal/triple negative subtypes in multiple datasets that could potentially explain the aggressive nature of this cancer subtype. Furthermore, comparison of the basal subtype of breast cancer and the mesenchymal subtype of glioblastoma ASNs shows that an ASN in the vicinity of IL6 is conserved across the two subtypes. This result suggests that
<br/>subtypes of different cancers can show molecular similarities indicating that the therapeutic approaches in different types of cancers may be shared.

Gaire et a.: Discovery and analysis of consistent active subnetworks
<br/>in cancers. BMC Bioinformatics 2013, 14(Suppl 2):S7

[ edit ]
shichao Hu, 11_25.06.2013, Sommersemester 2013

Steiner tree methods for optimal sub-network
Analysis and interpretation of biological networks is one of the primary goals of systems
biology. In this context identification of sub-networks connecting sets of seed proteins or
seed genes plays a crucial role. Given that no natural node and edge weighting scheme is
available retrieval of a minimum size sub-graph leads to the classical Steiner tree problem,
which is known to be NP-complete. Many approximate solutions have been published and
theoretically analyzed in the computer science literature, but far less is known about their
practical performance in the bioinformatics field.

1 Alon U Introduction into Systems Biology Design Principles of Biological Circuits.
2 Kanehisa M Kegg for linking genomes to life and the environment. Nucleic Acids Res

[ edit ]
Yan Jin, 12_25.06.2013, Sommersemester 2013

Discovery and analysis of consistent active subnetworks
Gene expression profiles can show significant changes when genetically diseased cells are compared with nondiseased cells. Biological networks are often used to identify active subnetworks (ASNs) of the diseases from the expression profiles to understand the reason behind the observed changes. Current methodologies for discovering ASNs mostly use undirected PPI networks and node centric approaches. This can limit their ability to find the meaningful ASNs when using integrated networks having comprehensive information than the traditional proteinprotein interaction networks. Using appropriate scoring functions to assess both genes and their interactions may allow the discovery of better ASNs.
In this paper, we present CASNet, which aims to identify better ASNs using (i) integrated interaction networks (mixed graphs), (ii) directions of regulations of genes, and (iii) combined node and edge scores. We simplify and extend previous methodologies to incorporate edge evaluations and lessen their sensitivity to significance thresholds. We formulate our objective functions using mixed integer programming (MIP) and show that optimal solutions may be obtained.
We compare the ASNs obtained by CASNet and similar other approaches to show that CASNet can often discover more meaningful and stable regulatory ASNs. Our analysis of a breast cancer dataset finds that the positive feedback loops across 7 genes, AR, ESR1, MYC, E2F2, PGR, BCL2 and CCND1 are conserved across the basal/triple negative subtypes in multiple datasets that could potentially explain the aggressive nature of this cancer subtype. Furthermore, comparison of the basal subtype of breast cancer and the mesenchymal subtype of glioblastoma ASNs shows that an ASN in the vicinity of IL6 is conserved across the two subtypes. This result suggests that subtypes of different cancers can show molecular similarities indicating that the therapeutic approaches in different types of cancers may be shared.

1.Discovery and analysis of consistent active subnetworks
in cancers
Raj K Gaire1,2*, Lorey Smith3, Patrick Humbert3, James Bailey1, Peter J Stuckey1, Izhak Haviv4,5
From The Eleventh Asia Pacific Bioinformatics Conference (APBC 2013)
Vancouver, Canada. 21-24 January 2013
2. McDermott U, Downing JR, Stratton MR: Genomics and the Continuum of
Cancer Care. N Engl J Med 2011, 364(January 27):340-50.
3. Lai Y, Eckenrode SE, She J: A statistical framework for integrating two
microarray data sets in differencial expression analysis. BMC
Bioinformatics 2009, 10(Suppl 1):S23.

[ edit ]
Iman Gharib, 10_25.06.2013, Sommersemester 2013

A subgraph isomorphism algorithm and its application to biochemical data
Background: Graphs can represent biological networks at the molecular, protein, or species level. An important query
is to find all matches of a pattern graph to a target graph. Accomplishing this is inherently difficult (NP-complete) and
the efficiency of heuristic algorithms for the problem may depend upon the input graphs. The common aim of existing
algorithms is to eliminate unsuccessful mappings as early as and as inexpensively as possible.
Results: We propose a new subgraph isomorphism algorithm which applies a search strategy to significantly
reduce the search space without using any complex pruning rules or domain reduction procedures. We compare
our method with the most recent and efficient subgraph isomorphism algorithms (VFlib, LAD, and our C++
implementation of FocusSearch which was originally distributed in Modula2) on synthetic, molecules, and
interaction networks data. We show a significant reduction in the running time of our approach compared with
these other excellent methods and show that our algorithm scales well as memory demands increase.
Conclusions: Subgraph isomorphism algorithms are intensively used by biochemical tools. Our analysis gives a
comprehensive comparison of different software approaches to subgraph isomorphism highlighting their
weaknesses and strengths. This will help researchers make a rational choice among methods depending on their
application. We also distribute an open-source package including our system and our own C++ implementation of
FocusSearch together with all the used datasets ( In future work, our findings may
be extended to approximate subgraph isomorphism algorithms.

A subgraph isomorphism algorithm and its
application to biochemical data From Ninth Annual Meeting of the Italian Society of Bioinformatics (BITS)
Catania, Sicily. 2-4 May 2012

McKay B: Practical graph isomorphism. Congressus Numerantium 1981,

[ edit ]
Georg Mühlenberg, 07.02.2013, Wintersemester 2012/2013

Phylogeographic characterization of tick-borne encephalitis virus from patients, rodents and ticks in Slovenia.

Tick-borne encephalitis virus (TBEV) is the most important arboviral agent causing infections of the central nervous system in central Europe. Previous studies have shown that TBEV exhibits pronounced genetic variability, which is often correlated to the geographical origin of TBEV. Genetic variability of TBEV has previously been studied predominantly in rodents and ticks, while information about the variability in patients is scarce. In order to understand the molecular relationships of TBEV between natural hosts, vectors and humans, as well as correlation between phylogenetic and geographical clustering, sequences of TBEV E and NS5 protein genes, were obtained by direct sequencing of RT-PCR products from TBE-confirmed patients as well as from rodents and ticks collected from TBE-endemic regions in Slovenia. A total of 27 partial E protein gene sequences representing 15 human, 4 rodent and 8 tick samples and 30 partial NS5 protein gene sequences representing 17 human, 5 rodent and 8 tick samples were obtained. The complete genome sequence of TBEV strain Ljubljana I was simultaneously obtained. Phylogenetic analysis of the E and NS5 protein gene sequences revealed a high degree of TBEV variability in patients, ticks and rodents. Furthermore, an evident correlation between geographical and phylogenetic clustering was shown that was independent of the TBEV host. Moreover, we show the presence of a possible recombination event in the TBEV genome obtained from a patient sample, which was supported with multiple recombination event detection methods. This is the first study that simultaneously analyzed the genetic relationships of directly sequenced TBEV samples from patients, ticks and rodents and provides the largest set of patient-derived TBEV sequences up to date. In addition, we have confirmed the geographical clustering of TBEV sequences in Slovenia and have provided evidence of a possible recombination event in the TBEV genome, obtained from a patient.

Das Paper stammt von

Fajs L, Durmiši E, Knap N, Strle F, Avši&#269;-Županc T.

Institute of Microbiology and Immunology, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia

[ edit ]
Katharina Hoessel, 07.02.2013, Wintersemester 2012/2013

The GEM mapper: fast, accurate and versatile alignment by filtration
Because of ever-increasing throughput requirements of sequencing data, most existing short-read aligners have been designed to focus on speed at the expense of accuracy. The Genome Multitool (GEM) mapper can leverage string matching by filtration to search the alignment space more efficiently, simultaneously delivering precision (performing fully tunable exhaustive searches that return all existing matches, including gapped ones) and speed (being several times faster than comparable state-of-the-art tools).

Marco-Sola, S., Sammeth, M., Guigo, R., & Ribeca, P. (2012). The GEM mapper: fast, accurate and versatile alignment by filtration. Nat Meth, advance online publication. doi:10.1038/nmeth.2221

[ edit ]
Marina Chkolnikov, 07.02.2013, Wintersemester 2012/2013

What do molecules do when we are not looking? State sequence analysis for stochastic chemical systems.
Many biomolecular systems depend on orderly sequences of chemical transformations or reactions. Yet, the dynamics of single molecules or small-copy-number molecular systems are significantly stochastic. Here, we propose state sequence analysis--a new approach for predicting or visualizing the behaviour of stochastic molecular systems by computing maximum probability state sequences, based on initial conditions or boundary conditions. We demonstrate this approach by analysing the acquisition of drug-resistance mutations in the human immunodeficiency virus genome, which depends on rare events occurring on the time scale of years, and the stochastic opening and closing behaviour of a single sodium ion channel, which occurs on the time scale of milliseconds. In both cases, we find that our approach yields novel insights into the stochastic dynamical behaviour of these systems, including insights that are not correctly reproduced in standard time-discretization approaches to trajectory analysis.

[ edit ]
Georg Muehlenberg, 28.01.2013, Wintersemester 2012/2013

TBEV Paper
Das Paper wertet eine Studie aus der Slowakei aus.
Ein Abstract wird noch nachgereicht.

[ edit ]
Tariq Yousef, 07.02.2013, Wintersemester 2012/2013

clustering algorithms in bioinformatics
Clustering is a process which has a great importance in many fields including bioinformatics. One of the examples is finding groups of genes performing similar functions or being involved in the same biological processes.

My talk will focus at the following points :

- What is clustering?
- K-means algorithm
- Graph-based Clustering
- Hierarchical Clustering

[ edit ]
Lisa Duchstein, 07.02.2013, Wintersemester 2012/2013

Alternative Protein-Protein Interfaces Are Frequent Exceptions
The intricate molecular details of protein-protein interactions (PPIs) are crucial for function. Therefore, measuring the same interacting protein pair again, we expect the same result. This work measured the similarity in the molecular details of interaction for the same and for homologous protein pairs between different experiments. All scores analyzed suggested that different experiments often find exceptions in the interfaces of similar PPIs: up to 22% of all comparisons revealed some differences even for sequence-identical pairs of proteins. The corresponding number for pairs of close homologs reached 68%. Conversely, the interfaces differed entirely for 12–29% of all comparisons. All these estimates were calculated after redundancy reduction. The magnitude of interface differences ranged from subtle to the extreme, as illustrated by a few examples. An extreme case was a change of the interacting domains between two observations of the same biological interaction. One reason for different interfaces was the number of copies of an interaction in the same complex: the probability of observing alternative binding modes increases with the number of copies. Even after removing the special cases with alternative hetero-interfaces to the same homomer, a substantial variability remained. Our results strongly support the surprising notion that there are many alternative solutions to make the intricate molecular details of PPIs crucial for function.

[ edit ]
Lieselotte Erber, 07.02.2013, Wintersemester 2012/2013

High-throughput analysis of epistasis in genome-wide association studies with BiForce

Gene-gene interactions (epistasis) are thought to be important in shaping complex traits, but they have been under-explored in genome-wide association studies (GWAS) due to the computational challenge of enumerating billions of single nucleotide polymorphism (SNP) combinations. Fast screening tools are needed to make epistasis analysis routinely available in GWAS.

We present BiForce to support high-throughput analysis of epistasis in GWAS for either quantitative or binary disease (case-control) traits. BiForce achieves great computational efficiency by using memory efficient data structures, Boolean bitwise operations and multithreaded parallelization. It performs a full pair-wise genome scan to detect interactions involving SNPs with or without significant marginal effects using appropriate Bonferroni-corrected significance thresholds. We show that BiForce is more powerful and significantly faster than published tools for both binary and quantitative traits in a series of performance tests on simulated and real datasets. We demonstrate BiForce in analysing eight metabolic traits in a GWAS cohort (323 697 SNPs, 4500 individuals) and two disease traits in another (340 000 SNPs, 1750 cases and 1500 controls) on a 32-node computing cluster. BiForce completed analyses of the eight metabolic traits within 1 day, identified nine epistatic pairs of SNPs in five metabolic traits and 18 SNP pairs in two disease traits. BiForce can make the analysis of epistasis a routine exercise in GWAS and thus improve our understanding of the role of epistasis in the genetic regulation of complex traits.

[ edit ]
Sabina Kanton, 28.01.2013, Wintersemester 2012/2013

REvolver modeling sequence evolution under domain constraints
Simulating the change of protein sequences over time in a biologically realistic way is fundamental for a broad range of studies with a focus on evolution. It is, thus, problematic that typically simulators evolve individual sites of a sequence identically and independently. More realistic simulations are possible, however, they are often prohibited by limited knowledge concerning site-specific evolutionary constraints or functional dependencies between amino acids. As a consequence, a proteins functional and structural characteristics are rapidly lost in the course of simulated evolution. Here, we present REvolver, a program that simulates protein sequence alteration such that evolutionarily stable sequence characteristics, like functional domains, are maintained. For this purpose, REvolver recruits profile hidden Markov models (pHMMs) for parameterizing site-specific models of sequence evolution in an automated fashion. pHMMs derived from alignments of homologous proteins or protein domains capture information regarding which sequence sites remained conserved over time and where in a sequence insertions or deletions are more likely to occur. Thus, they describe constraints on the evolutionary process acting on these sequences. To demonstrate the performance of REvolver as well as its applicability in large-scale simulation studies, we evolved the entire human proteome up to 1.5 expected substitutions per site. Simultaneously, we analyzed the preservation of Pfam and SMART domains in the simulated sequences over time. REvolver preserved 92 percent of the Pfam domains originally present in the human sequences. This value drops to 15 percent when traditional models of amino acid sequence evolution are used. Thus, REvolver represents a significant advance toward a realistic simulation of protein sequence evolution on a proteome-wide scale. Further, REvolver facilitates the simulation of a protein family with a user-defined domain architecture at the root.

[ edit ]
Henrike Indrischek, 28.01.2013, Wintersemester 2012/2013

Prediction of Cell Penetrating Peptides by Support Vector Machines
Cell penetrating peptides (CPPs) are those peptides that can transverse cell membranes to enter cells. Once inside the cell, different CPPs can localize to different cellular components and perform different roles. Some generate pore-forming complexes resulting in the destruction of cells while others localize to various organelles. Use of machine learning methods to predict potential new CPPs will enable more rapid screening for applications such as drug delivery. We have investigated the influence of the composition of training datasets on the ability to classify peptides as cell penetrating using support vector machines (SVMs). We identified 111 known CPPs and 34 known non-penetrating peptides from the literature and commercial vendors and used several approaches to build training data sets for the classifiers. Features were calculated from the datasets using a set of basic biochemical properties combined with features from the literature determined to be relevant in the prediction of CPPs. Our results using different training datasets confirm the importance of a balanced training set with approximately equal number of positive and negative examples. The SVM based classifiers have greater classification accuracy than previously reported methods for the prediction of CPPs, and because they use primary biochemical properties of the peptides as features, these classifiers provide insight into the properties needed for cell-penetration. To confirm our SVM classifications, a subset of peptides classified as either penetrating or non-penetrating was selected for synthesis and experimental validation. Of the synthesized peptides predicted to be CPPs, 100% of these peptides were shown to be penetrating.

[ edit ]
Maggie Moosig, 16.01.2013, Wintersemester 2012/2013



[ edit ]
Toni Foerster, 28.01.2013, Wintersemester 2012/2013

Detection of differentially expressed segments in tiling array data
Motivation: Tiling arrays have been a mainstay of unbiased
genome-wide transcriptomics over the last decade. Currently
available approaches to identify expressed or differentially expressed
segments in tiling array data are limited in the recovery of the
underlying gene structures and require several parameters that are
intensity-related or partly dataset-specific.
Results: We have developed TileShuffle, a statistical approach
that identifies transcribed and differentially expressed segments
as significant differences from the background distribution while
considering sequence-specific affinity biases and cross-hybridization.
It avoids dataset-specific parameters in order to provide better
comparability of different tiling array datasets, based on different
technologies or array designs. TileShuffle detects highly and
differentially expressed segments in biological data with significantly
lower false discovery rates under equal sensitivities than commonly
used methods. Also, it is clearly superior in the recovery of exonintron
structures. It further provides window z-scores as a normalized
and robust measure for visual inspection.

[ edit ]
Loreen Knoebel, 16.01.2013, Wintersemester 2012/2013

Exploiting Gene Families for Phylogenomic Analysis of Myzstomid Transcriptome Data
- Einordnung der Myzostomida in den Tree of Life der Metazoa mit bisherigen Methoden eher schwierig
- neue bioinformatische Methoden und Hochdurchsatz-Sequenzierung koennen helfen
- Methoden = Perlskripte, frei erhaeltliche Software, Multiple Sequenzalignments, Gene Tree Parsimony, Maximum Likelihood Trees

[ edit ]
Franziska Hopfe, 16.01.2013, Wintersemester 2012/2013

Seven new dolphin mitochondrial genomes and a time-calibrated phylogeny of whales und Cytochrom b and Bayesian
Mit Hilfe 7 neu sequnezierter mitochondrialer Genome von Delphinen und einer Molekularen Uhr wurde die Phylogenie der Wale konstruiert. Daten wurden mit Bayesian analysiert und die Molekulare Uhr gab Aufschluss ueber die Radiation verschiedener Walfamilien. Man konnte nachweisen das Tursiops und Stenella nicht monophyletisch sind.
Mit Hilfe eines mitochondrialen Genoms konnte zum ersten mal Odontoceti als eine eigene Gruppe dargestellt werden (monophyletisch). Es wurden die Cytochrom b Daten von 66 Waltaxa und 24 Aussengruppen mit Bayesian analysiert. Paper stellt klar, wie wichtig das Einbeziehen von Aussengruppen ist.

[ edit ]
Ronny Richter, 16.07.2012, Sommersemester 2012

Modeling population connectivity by ocean currents, a graph-theoretic approach for marine conservation
Abstract The dispersal of individuals among
marine populations is of great importance to metapopulation
dynamics, population persistence, and
species expansion. Understanding this connectivity
between distant populations is key to their effective
conservation and management. For many marine
species, population connectivity is determined largely
by ocean currents transporting larvae and juveniles
between distant patches of suitable habitat. Recent
work has focused on the biophysics of marine larval
dispersal and its importance to population dynamics,
although few studies have evaluated the spatial and
temporal patterns of this potential dispersal. Here, we
show how an Eulerian advection–diffusion approach
can be used to model the dispersal of coral larvae
between reefs throughout the Tropical Pacific. We
illustrate how this connectivity can be analyzed using
graph theory—an effective approach for exploring
patterns in spatial connections, as well as for
determining the importance of each site and pathway
to local and regional connectivity. Results indicate
that the scale (average distance) of dispersal in the
Pacific is on the order of 50–150 km, consistent with
recent studies in the Caribbean (Cowen, et al. 2006).
Patterns in the dispersal graphs highlight pathways
for larval dispersal along major ocean currents and
through island chains. A series of critical island
‘stepping stones’ are discovered providing potential
pathways across the equatorial currents and connecting
distant island groups. Patterns in these dispersal
graphs highlight possible pathways for species
expansions, reveal connected upstream/downstream
populations, and suggest areas that might be prioritized
for marine conservation efforts.

Andrews JC, Gay S, Sammarco PW (1988) Influence of circulation
on self-seeding patterns at Helix Reef-Great
Barrier Reef. In: Proceedings of the 6th Int. Coral Reef
Symposium, Townsville, Australia vol 2, pp 469–474
Barber PH, Palumbi SR, Erdmann MV et al (2002) Sharp
genetic breaks among populations of Haptosquilla pulchella
(Stomatopoda) indicate limits to larval transport:
patterns, causes, and consequences. Mol Ecol 11:659–674
Benzie JAH (1999) Genetic structure of coral reef organisms:
ghosts of dispersal past. Am Zool 39:131–145
Benzie JAH, Williams ST (1997) Genetic structure of giant
clam (Tridacna maxima) populations in the west Pacific is
not consistent with dispersal by present-day ocean currents.
Evolution 51:768–783
Botsford LW, Hastings A, Gaines SD (2001) Dependence of
sustainability on the configuration of marine reserves and
larval dispersal distance. Ecol Lett 4:144–150
Botsford LW, Micheli F, Hastings A (2003) Principles for the
design of marine reserves. Ecol Appl 13:S25–S31
Calabrese JM, Fagan WF (2004) A comparison-shopper’s
guide to connectivity metrics. Front Ecol Environ 2:529–
Cantwell MD, Forman RTT (1993) Landscape graphs—ecological
modeling with graph-theory to detect
configurations common to diverse landscapes. Landsc
Ecol 8:239–255
Clark JS, Silman M, Kern R et al (1999) Seed dispersal near
and far: patterns across temperate and tropical forests.
Ecology 80:1475–1494
Connolly SR, Bellwood DR, Hughes TP (2003) Indo-Pacific
biodiversity of coral reefs: deviations from a mid-domain
model. Ecology 84:2178–2190
Connolly SR, Hughes TP, Bellwood DR et al (2005) Community
structure of corals and reef fishes at multiple
scales. Science 309:1363–1365
Cowen RK, Lwiza KMM, Sponaugle S et al (2000) Connectivity
of marine populations: open or closed? Science
Cowen RK, Paris CB, Olson D et al (2003) The role of long
distance dispersal versus local retention in replenshing
marine populations. Gulf Caribb Res 14:129–137
Cowen RK, Paris CB, Srinivasan A (2006) Scaling of connectivity
in marine populations. Science 311:522–527
de Queiroz A (2005) The resurrection of oceanic dispersal in
historical biogeography. Trends Ecol Evol 20:68–73
Dijkstra EW (1959) A note on two problems in connection
with graphs. Numer Math 1:269–271
Dunbar RB, Wellington GM, Colgan MW et al (1994) Eastern
Pacific sea-surface temperature since 1600-AD—the
Delta-O18 record of climate variability in Galapagos
corals. Paleoceanography 9:291–315
Dunne JA, Williams RJ, Martinez ND (2002) Food-web
structure and network theory: the role of connectance and
size. Proc Natl Acad Sci USA 99:12917–12922
Dyer RJ, Nason JD (2004) Population graphs: the graph theoretic
shape of genetic structure. Mol Ecol 13:1713–1727
Fahrig L, Merriam G (1985) Habitat patch connectivity and
population survival. Ecology 66:1762–1768
Freeman LC (1979) Centrality in social networks conceptual
clarification. Soc Netw 1:215–239
Gaines SD, Gaylord B, Largier JL (2003) Avoiding current
oversights in marine reserve design. Ecol Appl 13:S32–
Gaines SD, Lafferty KD (1995) Modeling the dynamics of
marine species: the importance of incorporating larval
34 Landscape Ecol (2008) 23:19–36
dispersal. In: McEdward LR (ed) Ecology of marine
invertebrate larvae. CRC Press, Boca Raton, pp 389–412
Gastner MT, Newman MEJ (2006) The spatial structure of
networks. Eur Phys J B 49:247–252
Gay SL, Andrews JC (1994) The effects of recruitment strategies
on coral larvae settlement distributions at Helix
Reef. In: Sammarco PW, Heron ML (eds) The bio-physics
of marine larval dispersal. American Geophysical Union,
Washington, DC, pp 73–88
Gaylord B, Gaines SD (2000) Temperature or transport? Range
limits in marine species mediated solely by flow. Am Nat
Gerber LR, Botsford LW, Hastings A et al (2003) Population
models for marine reserve design: a retrospective and
prospective synthesis. Ecol Appl 13:S47–S64
Gilg MR, Hilbish TJ (2003) The geography of marine larval
dispersal: coupling genetics with fine-scale physical
oceanography. Ecology 84:2989–2998
Glynn PW, Ault JS (2000) A biogeographic analysis and
review of the far eastern Pacific coral reef region. Coral
Reefs 19:1–23
Grantham BA, Eckert GL, Shanks AL (2003) Dispersal
potential of marine invertebrates in diverse habitats. Ecol
Appl 13:S108–S116
Guichard F, Levin SA, Hastings A et al (2004) Toward a
dynamic metacommunity approach to marine reserve
theory. Bioscience 54:1003–1011
Halpern BS, Warner RR (2002) Marine reserves have rapid and
lasting effects. Ecol Lett 5:361–366
Hare JA, Quinlan JA, Werner FE et al (

[ edit ]
Marcel Kansy, 13.07.2012, Sommersemester 2012

Detecting breakdown points in metabolic networks
Background: A complex network of biochemical reactions present in an organism generates various biological moieties necessary for its survival. It is seen that biological systems are robust to genetic and environmental changes at all levels of organization. Functions of various organisms are sustained against
mutational changes by using alternative pathways. It is also seen that if any one of the paths for production of the same metabolite is hampered, an alternate path tries to overcome this defect and helps in combating the damage.
Methodology: Certain physical, chemical or genetic change in any of the precursor substrate of a biochemical reaction may damage the production of the ultimate product. We employ a quantitative approach for simulating this phenomena of causing a physical change in the biochemical reactions by performing external perturbations to 12 metabolic pathways under carbohydrate metabolism in Saccharomyces cerevisae as well as 14 metabolic pathways under carbohydrate metabolism in Homo sapiens. Here, we investigate the relationship between structure and degree of compatibility of metabolites against external
perturbations, i.e., robustness. Robustness can also be further used to identify the extent to which a metabolic pathway can resist a mutation event. Biological networks with a certain connectivity distribution may be very resilient to a particular attack but not to another. The goal of this work is to determine the exact boundary of network breakdown due to both random and targeted attack, thereby analyzing its robustness. We also find that compared to various non-standard models, metabolic networks are exceptionally robust. Here, we report the use of a ‘Resilience-based’ score for enumerating the concept of ‘network-breakdown’. We also use this approach for analyzing metabolite essentiality providing insight into cellular robustness that can be further used for future drug development.
Results: We have investigated the behavior of metabolic pathways under carbohydrate metabolism in S. cerevisae and H. sapiens against random and targeted attack. Both random as well as targeted resilience were calculated by formulating a measure, that we termed as ‘Resilience score’.
Datasets of metabolites were collected for 12 metabolic pathways belonging to carbohydrate metabolism in S. cerevisae and 14 metabolic pathways belonging to carbohydrate metabolism in H. sapiens from Kyoto Encyclopedia for Genes and Genomes (KEGG).

[ edit ]
Christoph Krell, 29.06.2012, Sommersemester 2012

Matching Index of Uncertain Graph: Concept and Algorithm
In practical applications of graph theory, there is no doubt that some uncertain factors may
appear in graphs. This paper employs the uncertainty theory to deal with uncertain factors in uncertain
graph. Matching index and perfect matching index of uncertain graph are proposed. Some properties of
the matching index are discussed. Furthermore, we give an algorithm to calculate the matching index of
uncertain graph.

[ edit ]
Konrad Abicht, 29.06.2012, Sommersemester 2012

Link communities reveal multiscale complexity in networks
Networks have become a key approach to understanding systems
of interacting objects, unifying the study of diverse phenomena
including biological organisms and human society1–3. One crucial
step when studying the structure and dynamics of networks is to
identify communities4,5: groups of related nodes that correspond
to functional subunits such as protein complexes6,7 or social
spheres8–10. Communities in networks often overlap9,10 such that
nodes simultaneously belong to several groups. Meanwhile, many
networks are known to possess hierarchical organization, where
communities are recursively grouped into a hierarchical struc-
ture11–13. However, the fact that many real networks have com-
munities with pervasive overlap, where each and every node
belongs to more than one group, has the consequence that a global
hierarchy of nodes cannot capture the relationships between over-
lapping groups. Here we reinvent communities as groups of links
rather than nodes and show that this unorthodox approach suc-
cessfully reconciles the antagonistic organizing principles of over-
lapping communities and hierarchy. In contrast to the existing
literature, which has entirely focused on grouping nodes, link
communities naturally incorporate overlap while revealing hier-
archical organization. We find relevant link communities in many
networks, including major biological networks such as protein–
protein interaction6,7,14 and metabolic networks11,15,16, and show
that a large social network10,17,18 contains hierarchically organized
community structures spanning inner-city to regional scales while
maintaining pervasive overlap. Our results imply that link com-
munities are fundamental building blocks that reveal overlap and
hierarchical organization in networks to be two aspects of the
same phenomenon.

[1] Link communities reveal multiscale complexity in networks, Yong-Yeol Ahn, James P. Bagrow & Sune Lehmann

[ edit ]
Tony Mey, 13.07.2012, Sommersemester 2012

Network Centrality in the Human Functional Connectome
The network architecture of functional connectivity within the human
brain connectome is poorly understood at the voxel level. Here, using
resting state functional magnetic resonance imaging data from 1003
healthy adults, we investigate a broad array of network centrality
measures to provide novel insights into connectivity within the
whole-brain functional network (i.e., the functional connectome).We
first assemble and visualize the voxel-wise (4 mm) functional
connectome as a functional network. We then demonstrate that each
centrality measure captures different aspects of connectivity,
highlighting the importance of considering both global and local
connectivity properties of the functional connectome. Beyond
‘‘detecting functional hubs,’’ we treat centrality as measures of
functional connectivity within the brain connectome and demonstrate
their reliability and phenotypic correlates (i.e., age and sex).
Specifically, our analyses reveal age-related decreases in degree
centrality, but not eigenvector centrality, within precuneus and
posterior cingulate regions. This implies that while local or (direct)
connectivity decreases with age, connections with hub-like regions
within the brain remain stable with age at a global level. In sum, these
findings demonstrate the nonredundancy of various centrality
measures and raise questions regarding their underlying physiological
mechanisms that may be relevant to the study of neurodegenerative
and psychiatric disorders.

[ edit ]
Didier Cherix, 12.07.2012, Sommersemester 2012

Characterization of the anterior cingulate
Um die Veraenderungen im Gehirn bei Patienten mit einem erhoeten Risiko zur Schizophrenie wurden Bilder mittels fMRT aufgenommen. Von den Bildern werden Graphen hergestellt und mehrere Zentralitaetsmassen berechnet, um signifikante Unterschiede herauszufinden.

[ edit ]
Lisa Falkowski, 29.06.2012, Sommersemester 2012

New insights into RNA secondary structure in the alternative splicing of pre-mRNAs.
Alternative splicing is an important mechanism in generating proteomic diversity, and RNA secondary structure is an important element in splicing regulation. The use of high-throughput sequencing and other approaches has increased the number of known pre-mRNA secondary structures by several orders of magnitude, and we now have new insights into the role of RNA secondary structure in alternative splicing and the mechanisms involved (e.g., physical competition, long-range RNA pairing, the structural splicing code, and co-transcriptional splicing). Furthermore, an RNA pairing-based mechanism ensures the selection of only one of several available exons (e.g., Dscam splicing). Here we review several recent discoveries related to the role of RNA secondary structure in alternative splicing and the underlying mechanisms.

[ edit ]
Katharina Theuerkorn, 29.06.2012, Sommersemester 2012

Exploring hierarchical and overlapping modular structure in the yeast protein interaction network
Background: Developing effective strategies to reveal modular structures in protein interaction networks is crucial
for better understanding of molecular mechanisms of underlying biological processes. In this paper, we propose a
new density-based algorithm (ADHOC) for clustering vertices of a protein interaction network using a novel
subgraph density measurement.
Results: By statistically evaluating several independent criteria, we found that ADHOC could significantly improve
the outcome as compared with five previously reported density-dependent methods. We further applied ADHOC
to investigate the hierarchical and overlapping modular structure in the yeast PPI network. Our method could
effectively detect both protein modules and the overlaps between them, and thus greatly promote the precise
prediction of protein functions. Moreover, by further assaying the intermodule layer of the yeast PPI network, we
classified hubs into two types, module hubs and inter-module hubs. Each type presents distinct characteristics both
in network topology and biological functions, which could conduce to the better understanding of relationship
between network architecture and biological implications.
Conclusions: Our proposed algorithm based on the novel subgraph density measurement makes it possible to
more precisely detect hierarchical and overlapping modular structures in protein interaction networks. In addition,
our method also shows a strong robustness against the noise in network, which is quite critical for analyzing such
a high noise network.

[ edit ]
Sascha Ludwig, 29.06.2012, Sommersemester 2012

Inferring Boolean network structure via correlation
Motivation: Accurate, context-specific regulation of gene
expression is essential for all organisms. Accordingly, it is very
important to understand the complex relations within cellular gene
regulatory networks. A tool to describe and analyze the behavior
of such networks are Boolean models. The reconstruction of a
Boolean network from biological data requires identification of
dependencies within the network. This task becomes increasingly
computationally demanding with large amounts of data created by
recent high-throughput technologies. Thus, we developed a method
that is especially suited for network structure reconstruction from
large-scale data. In our approach, we took advantage of the fact that
a specific transcription factor often will consistently either activate
or inhibit a specific target gene, and this kind of regulatory behavior
can be modeled using monotone functions.
Results: To detect regulatory dependencies in a network, we
examined how the expression of different genes correlates to
successive network states. For this purpose, we used Pearson
correlation as an elementary correlation measure. Given a Boolean
network containing only monotone Boolean functions, we prove that
the correlation of successive states can identify the dependencies
in the network. This method not only finds dependencies in
randomly created artificial networks to very high percentage, but also
reconstructed large fractions of both a published Escherichia coli
regulatory network from simulated data and a yeast cell cycle
network from real microarray data.

Wunschtermin für Präsentation: Montag, 09.07.

[ edit ]
Jan Rüdiger, 29.06.2012, Sommersemester 2012

Protein Docking by the Interface Structure Similarity: How Much Structure Is Needed?
The increasing availability of co-crystallized protein-protein complexes provides an opportunity to use template-based modeling for protein-protein docking. Structure alignment techniques are useful in detection of remote target-template similarities. The size of the structure involved in the alignment is important for the success in modeling. This paper describes a systematic large-scale study to find the optimal definition/size of the interfaces for the structure alignment-based docking applications. The results showed that structural areas corresponding to the cutoff values ,12 A° across the interface inadequately represent structural details of the interfaces. With the increase of the cutoff beyond 12 A°, the success rate for the benchmark set of 99 protein complexes, did not increase significantly for higher accuracy models, and decreased for lower-accuracy models. The 12 A° cutoff was optimal in our interface alignment-based docking, and a likely best choice for the large-scale (e.g., on the scale of the entire genome) applications to protein interaction networks. The results provide guidelines for the docking approaches, including high-throughput applications to modeled structures.

[ edit ]
Fabian Externbrink, 29.06.2012, Sommersemester 2012

Efficient RNA pairwise structure comparison by SETTER method.
Motivation: Understanding the architecture and function of RNA molecules requires methods for comparing and analyzing their 3D structures. While a structural alignment of short RNAs is achievable in a reasonable amount of time, large structures represent much bigger challenge. However the growth of the number of large RNAs deposited in the PDB database calls for the development of fast and accurate methods for analyzing their structures, as well as for rapid similarity searches in databases.
Results: In this article a novel algorithm for an RNA structural comparison SETTER (SEcondary sTructure-based TERtiary Structure Similarity Algorithm) is introduced. SETTER utilizes a pairwise comparison method based on 3D similarity of the so-called generalized secondary structure units (GSSU). For each pair of structures, SETTER produces a distance score and an indication of its statistical significance. SETTER can be used both for the structural alignments of structures that are already known to be homologous, as well as for 3D structure similarity searches and functional annotation. The algorithm presented is both accurate and fast and does not impose limits on the size of aligned RNA structures.

[ edit ]
Tobias Mede, 29.06.2012, Sommersemester 2012

GraphClust: alignment-free structural clustering of local RNA secondary structures
Motivation: Clustering according to sequence–structure similarity
has now become a generally accepted scheme for ncRNA
annotation. Its application to complete genomic sequences as well
as whole transcriptomes is therefore desirable but hindered by
extremely high computational costs.
Results: We present a novel linear-time, alignment-free method
for comparing and clustering RNAs according to sequence and
structure. The approach scales to datasets of hundreds of thousands
of sequences. The quality of the retrieved clusters has been
benchmarked against known ncRNA datasets and is comparable
to state-of-the-art sequence–structure methods although achieving
speedups of several orders of magnitude. A selection of applications
aiming at the detection of novel structural ncRNAs are presented.
Exemplarily, we predicted local structural elements specific to
lincRNAs likely functionally associating involved transcripts to vital
processes of the human nervous system. In total, we predicted 349
local structural RNA elements.
Availability: The GraphClust pipeline is available on request.
Supplementary information: Supplementary data are available at
Bioinformatics online.

[ edit ]
Francine Klausnitzer, 29.06.2012, Sommersemester 2012

A new protein-ligand binding sites prediction method based on the integration of protein sequence conservation informati
Background: Prediction of protein-ligand binding sites is an important issue for protein function annotation and
structure-based drug design. Nowadays, although many computational methods for ligand-binding prediction have
been developed, there is still a demanding to improve the prediction accuracy and efficiency. In addition, most of
these methods are purely geometry-based, if the prediction methods improvement could be succeeded by
integrating physicochemical or sequence properties of protein-ligand binding, it may also be more helpful to
address the biological question in such studies.
Results: In our study, in order to investigate the contribution of sequence conservation in binding sites prediction
and to make up the insufficiencies in purely geometry based methods, a simple yet efficient protein-binding sites
prediction algorithm is presented, based on the geometry-based cavity identification integrated with sequence
conservation information. Our method was compared with the other three classical tools: PocketPicker, SURFNET,
and PASS, and evaluated on an existing comprehensive dataset of 210 non-redundant protein-ligand complexes.
The results demonstrate that our approach correctly predicted the binding sites in 59% and 75% of cases among
the TOP1 candidates and TOP3 candidates in the ranking list, respectively, which performs better than those of
SURFNET and PASS, and achieves generally a slight better performance with PocketPicker.
Conclusions: Our work has successfully indicated the importance of the sequence conservation information in
binding sites prediction as well as provided a more accurate way for binding sites identification.

[ edit ]
Lisa Falkowski, 03.02.2012, Wintersemester 2011/2012

Histone exchange and histone modifications during transcription and aging.
The organization of the eukaryotic genome into chromatin enables DNA to fit inside the nucleus while also regulating the access of proteins to the DNA to facilitate genomic functions such as transcription, replication and repair. The basic repeating unit of chromatin is the nucleosome, which includes 147bp of DNA wrapped 1.65 times around an octamer of core histone proteins comprising two molecules each of H2A, H2B, H3 and H4 [1]. Each nucleosome is a highly stable unit, being maintained by over 120 direct protein-DNA interactions and several hundred water mediated ones [1]. Accordingly, there is considerable interest in understanding how processive enzymes such as RNA polymerases manage to pass along the coding regions of our genes that are tightly packaged into arrays of nucleosomes. Here we present the current mechanistic understanding of this process and the evidence for profound changes in chromatin dynamics during aging. This article is part of a Special Issue entitled: Histone chaperones and Chromatin assembly.

[ edit ]
Stefanie Heidenreich, 03.02.2012, Wintersemester 2011/2012

Histone methylation makes its mark on longevity
How long organisms live is not entirely written in their
genes. Recent findings reveal that epigenetic factors that
regulate histone methylation, a type of chromatin modification,
can affect lifespan. The reversible nature of
chromatin modifications suggests that therapeutic targeting
of chromatin regulators could be used to extend
lifespan and healthspan. This review describes the epigenetic
regulation of lifespan in diverse model organisms,
focusing on the role and mode of action of
chromatin regulators that affect two epigenetic marks,
trimethylated lysine 4 of histone H3 (H3K4me3) and
trimethylated lysine 27 of histone H3 (H3K27me3), in

Greer, E.L. et al. (2010) Members of the H3K4 trimethylation complex
regulate lifespan in a germline-dependent manner in C. elegans.
Nature 466, 383–387

Kenyon, C.J. (2010) The genetics of ageing. Nature 464, 504–512

[ edit ]
Linda Arnold, 03.02.2012, Wintersemester 2011/2012

On the Connection between RNAi and Heterochromatin at Centromeres
RNA interference (RNAi) is a conserved silencing mechanism whereby double-strand RNA induces specific down-regulation
of homologous sequences. In the fission yeast Schizosaccharomyces pombe, centromeric heterochromatin assembly is an
RNAi-dependent process. Noncoding RNAs transcribed from pericentromeric repeat sequences are processed into short interfering
RNAs (siRNAs) that direct the Argonaute-containing RNA-induced transcriptional silencing (RITS) effector complex
to homologous nascent transcripts. RITS is required for H3K9 methylation by the histone methyltransferase (HMT) Clr4;
conversely, H3K9 methylation can attract RITS to chromatin via binding of the chromodomain protein Chp1. This codependency
has hampered dissection of the order of events and mechanisms of cross talk between the RNAi and chromatin modification
machineries. To tackle this problem, we have developed systems that reconstitute heterochromatin at a euchromatic
locus, using either hairpin triggers or DNA-tethered chromatin-modifying complexes. These systems reveal that RNAi is sufficient
to promote heterochromatin assembly in cis and that direct recruitment of the HMT Clr4 can bypass the role of RNAi
in heterochromatin assembly. We have also characterized a new pathway component, Stc1, that translates the RNAi signal
into chromatin marks. We discuss the implications of these findings for our understanding of the mechanism and function of
RNAi-directed heterochromatin assembly at centromeres.

[ edit ]
Henrike Indrischek, 30.01.2012, Wintersemester 2011/2012

Genomic characterization reveals a simple histone H4 acetylation code
The histone code hypothesis holds that covalent posttranslational modifications of histone tails are interpreted by the cell to yield a rich combinatorial transcriptional output. This hypothesis has been the subject of active debate in the literature. Here, we investigated the combinatorial complexity of the acetylation code at the four lysine residues of the histone H4 tail in budding yeast. We constructed yeast strains carrying all 15 possible combinations of mutations among lysines 5, 8, 12, and 16 to arginine in the histone H4 tail, mimicking positively charged, unacetylated lysine states, and characterized the resulting genome-wide changes in gene expression by using DNA microarrays. Only the lysine 16 mutation had specific transcriptional consequences independent of the mutational state of the other lysines (affecting approximately 100 genes). In contrast, for lysines 5, 8, and 12, expression changes were due to nonspecific, cumulative effects seen as increased transcription correlating with an increase in the total number of mutations (affecting approximately 1,200 genes). Thus, acetylation of histone H4 is interpreted by two mechanisms: a specific mechanism for lysine 16 and a nonspecific, cumulative mechanism for lysines 5, 8, and 12.

Steven Henikoff: Histone modifications: Combinatorial complexity or cumulative simplicity?

[ edit ]
Juliane Meißner, 03.02.2012, Wintersemester 2011/2012

Genome Digging: Insight into the Mitochondrial Genome of Homo
Background: A fraction of the Neanderthal mitochondrial genome sequence has a similarity with a 5,839-bp nuclear DNA
sequence of mitochondrial origin (numt) on the human chromosome 1. This fact has never been interpreted. Although this
phenomenon may be attributed to contamination and mosaic assembly of Neanderthal mtDNA from short sequencing
reads, we explain the mysterious similarity by integration of this numt (mtAncestor-1) into the nuclear genome of the
common ancestor of Neanderthals and modern humans not long before their reproductive split.
Principal Findings: Exploiting bioinformatics, we uncovered an additional numt (mtAncestor-2) with a high similarity to the
Neanderthal mtDNA and indicated that both numts represent almost identical replicas of the mtDNA sequences ancestral to
the mitochondrial genomes of Neanderthals and modern humans. In the proteins, encoded by mtDNA, the majority of
amino acids distinguishing chimpanzees from humans and Neanderthals were acquired by the ancestral hominins. The
overall rate of nonsynonymous evolution in Neanderthal mitochondrial protein-coding genes is not higher than in other
lineages. The model incorporating the ancestral hominin mtDNA sequences estimates the average divergence age of the
mtDNAs of Neanderthals and modern humans to be 450,000–485,000 years. The mtAncestor-1 and mtAncestor-2 sequences
were incorporated into the nuclear genome approximately 620,000 years and 2,885,000 years ago, respectively.
Conclusions: This study provides the first insight into the evolution of the mitochondrial DNA in hominins ancestral to
Neanderthals and humans. We hypothesize that mtAncestor-1 and mtAncestor-2 are likely to be molecular fossils of the
mtDNAs of Homo heidelbergensis and a stem Homo lineage. The dN/dS dynamics suggests that the effective population size
of extinct hominins was low. However, the hominin lineage ancestral to humans, Neanderthals and H. heidelbergensis, had a
larger effective population size and possessed genetic diversity comparable with those of chimpanzee and gorilla.

[ edit ]
Vera Lede, 03.02.2012, Wintersemester 2011/2012

Evolutionary Origins of Transcription Factor Binding Site Clusters
Empirical studies have revealed that regulatory DNA sequences such as enhancers or promoters often harbor multiple
binding sites for the same transcription factor. Such ‘‘homotypic site clustering’’ has been hypothesized as arising out of
functional requirements of the sequences. Here, we propose an alternative explanation of this phenomenon that multisite
enhancers are common because they are favored by evolutionary sampling of the genotype–phenotype landscape. To test
this hypothesis, we developed a new computational framework specialized for population genetic simulations of enhancer
evolution. It uses a thermodynamics-based model of enhancer function, integrating information from strong as well as
weak binding sites, to determine the strength of selection. Using this framework, we found that even when simpler
genotypes exist for a desired strength of regulation, relatively complex genotypes (enhancers with more sites) are more
readily reached by the simulated evolutionary process. We show that there are more ways to ‘‘build’’ a fit genotype with
many weak sites than with a few strong sites, and this is why evolution finds complex genotypes more often. Our claims
are consistent with an empirical analysis of binding site content in enhancers characterized in Drosophila melanogaster and
their orthologs in other Drosophila species. We also characterized a subtle but significant difference between genotypes
likely to be sampled by evolution and equally fit genotypes one would obtain by uniform sampling of the fitness landscape,
that is, an ‘‘evolutionary signature’’ in enhancer sequences. Finally, we investigated potential effects of other factors, such as
rugged fitness landscapes, short local duplications, and noise characteristics of enhancers, on the emergence of homotypic
site clustering.
Homotypic site clustering is an important contributor to the complexity and function of cis-regulatory sequences. This
work provides a simple null hypothesis for its origin, against which alternative adaptationist explanations may be
evaluated, and cautions against ‘‘evolutionary mirages’’ present in common features of genomic sequence. The quantitative
framework we develop here can be used more generally to understand how mechanisms of enhancer action influence their
composition and evolution.

[ edit ]
Christian Sonnendecker, 30.01.2012, Wintersemester 2011/2012

Computational approaches toward the design of pools for the in vitro selection of complex aptamers
It is well known that using random RNA/DNA sequences for SELEX experiments will generally yield low-complexity structures. Early experimental results suggest that having a structurally diverse library, which, for instance, includes high-order junctions, may prove useful in finding new functional motifs. Here, we develop two computational methods to generate sequences that exhibit higher structural complexity and can be used to increase the overall structural diversity of initial pools for in vitro selection experiments. Random Filtering selectively increases the number of five-way junctions in RNA/DNA pools, and Genetic Filtering designs RNA/DNA pools to a specified structure distribution, whether uniform or otherwise. We show that using our computationally designed DNA pool greatly improves access to highly complex sequence structures for SELEX experiments (without losing our ability to select for common one-way and two-way junction sequences).

Computational approaches toward the design of pools for the in vitro selection of complex aptamers.

Luo X, McKeague M, Pitre S, Dumontier M, Green J, Golshani A, Derosa MC, Dehne F.

RNA. 2010 Nov;16(11):2252-62. Epub 2010 Sep 24.


[ edit ]
Caroline Wilde, 30.01.2012, Wintersemester 2011/2012

Defining an epigenetic code
The nucleosome surface is decorated with an array of enzyme-catalysed modifications on histone tails. These modifications have well-defined roles in a variety of ongoing chromatin functions, often by acting as receptors for non-histone proteins, but their longer-term effects are less clear. Here, an attempt is made to define how histone modifications operate as part of a predictive and heritable epigenetic code that specifies patterns of gene expression through differentiation and development.

[ edit ]
Falko Altenkirch, 03.02.2012, Wintersemester 2011/2012

De novo assembly of human genomes with massively parallel short read sequencing
Next-generation massively parallel DNA sequencing technologies provide ultrahigh throughput at a substantially lower unit data cost; however, the data are very short read length sequences, making de novo assembly extremely challenging. Here, we describe a novel method for de novo assembly of large genomes from short read sequences. We successfully assembled both the Asian and African human genome sequences, achieving an N50 contig size of 7.4 and 5.9 kilobases (kb) and scaffold of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.

[ edit ]
Sabina Kanton, 03.02.2012, Wintersemester 2011/2012

An enhanced RNA alignment benchmark for sequence alignment programs

The performance of alignment programs is traditionally tested on sets of protein sequences, of which a reference alignment is known. Conclusions drawn from such protein benchmarks do not necessarily hold for the RNA alignment problem, as was demonstrated in the first RNA alignment benchmark published so far. For example, the twilight zone – the similarity range where alignment quality drops drastically – starts at 60 percent for RNAs in comparison to 20 percent for proteins. In this study we enhance the previous benchmark.


The RNA sequence sets in the benchmark database are taken from an increased number of RNA families to avoid unintended impact by using only a few families. The size of sets varies from 2 to 15 sequences to assess the influence of the number of sequences on program performance. Alignment quality is scored by two measures: one takes into account only nucleotide matches, the other measures structural conservation. The performance order of parameters – like nucleotide substitution matrices and gap-costs – as well as of programs is rated by rank tests.


Most sequence alignment programs perform equally well on RNA sequence sets with high sequence identity, that is with an average pairwise sequence identity (APSI) above 75 percent. Parameters for gap-open and gap-extension have a large influence on alignment quality lower than APSI 75 percent; optimal parameter combinations are shown for several programs. The use of different 4 × 4 substitution matrices improved program performance only in some cases. The performance of iterative programs drastically increases with increasing sequence numbers and/or decreasing sequence identity, which makes them clearly superior to programs using a purely non-iterative, progressive approach. The best sequence alignment programs produce alignments of high quality down to APSI higher than 55 percent; at lower APSI the use of sequence+structure alignment programs is recommended.

[ edit ]
Toni Förster, 30.01.2012, Wintersemester 2011/2012

Metabolic flux analysis
One of the ultimate goals of systems biology
research is to obtain a comprehensive understanding of the
control mechanisms of complex cellular metabolisms. Metabolic
Flux Analysis (MFA) is a important method for the
quantitative estimation of intracellular metabolic flows through
metabolic pathways and the elucidation of cellular physiology.
The primary challenge in the use of MFA is that many biological
networks are underdetermined systems; it is therefore difficult
to narrow down the solution space from the stoichiometric
constraints alone. In this tutorial, we present an overview of Flux
Balance Analysis (FBA) and 13C-Metabolic Flux Analysis (13CMFA),
both of which are frequently used to solve such underdetermined
systems, and we demonstrate FBA and 13C-MFA using the genome-scale model and the central carbon metabolism model, respectively. Furthermore, because such comprehensive study of intracellular fluxes is inherently complex, we subsequently introduce various pathway mapping and visualization tools to facilitate understanding of these data in the context of the pathways.

Yoshihiro Toya, Nobuaki Kono, Kazuharu Arakawa and Masaru Tomita (2011) Metabolic Flux Analysis and Visualization. Journal of proteome research 10: 3313-3323

[ edit ]
Fabian Externbrink, 03.02.2012, Wintersemester 2011/2012

MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform.
A multiple sequence alignment program, MAFFT, has been developed. The CPU time is drastically reduced as compared with existing methods. MAFFT includes two novel techniques. (i) Homo logous regions are rapidly identified by the fast Fourier transform (FFT), in which an amino acid sequence is converted to a sequence composed of volume and polarity values of each amino acid residue. (ii) We propose a simplified scoring system that performs well for reducing CPU time and increasing the accuracy of alignments even for sequences having large insertions or extensions as well as distantly related sequences of similar length. Two different heuristics, the progressive method (FFT-NS-2) and the iterative refinement method (FFT-NS-i), are implemented in MAFFT. The performances of FFT-NS-2 and FFT-NS-i were compared with other methods by computer simulations and benchmark tests; the CPU time of FFT-NS-2 is drastically reduced as compared with CLUSTALW with comparable accuracy. FFT-NS-i is over 100 times faster than T-COFFEE, when the number of input sequences exceeds 60, without sacrificing the accuracy.

[ edit ]
Ying-Chi Lin, 03.02.2012, Wintersemester 2011/2012

Maximally Efficient Modeling of DNA Sequence Motifs at All Levels of Complexity
Identification of transcription factor binding sites is necessary for deciphering gene regulatory networks.
Several new methods provide extensive data about the specificity of transcription factors but most methods for analyzing these data to obtain specificity models are limited in scope by, for example, assuming additive interactions or are inefficient in their exploration of more complex models. This article describes an approach—encoding of DNA sequences as the vertices of a regular simplex—that allows simultaneous direct comparison of simple and complex models, with higher-order parameters fit to the residuals of lower-order models. In addition to providing an efficient assessment of all model parameters, this approach can yield valuable insight into the mechanism of binding by highlighting features that are critical to accurate models.

Gary D. Stormo (2011) Maximally Efficient Modeling of DNA Sequence Motifs at All Levels of Complexity. Genetics 187(4): 1219-1224.

[ edit ]
Jan Engelhardt, 04.07.2011, Sommersemester 2011

The Role of RNA Sequence and Structure in RNA-Protein Interactions.
We investigate the sequence and structural properties of RNA-protein interaction sites in 211 RNA-protein chain pairs, the largest set of RNA-protein complexes analyzed to date. Statistical analysis confirms and extends earlier analyses made on smaller data sets. There are 24.6% of hydrogen bonds between RNA and protein that are nucleobase specific, indicating the importance of both nucleobase-specific and -nonspecific interactions. While there is no significant difference between RNA base frequencies in protein-binding and non-binding regions, distinct preferences for RNA bases, RNA structural states, protein residues, and protein secondary structure emerge when nucleobase-specific and -nonspecific interactions are considered separately. Guanine nucleobase and unpaired RNA structural states are significantly preferred in nucleobase-specific interactions; however, nonspecific interactions disfavor guanine, while still favoring unpaired RNA structural states. The opposite preferences of nucleobase-specific and -nonspecific interactions for guanine may explain discrepancies between earlier studies with regard to base preferences in RNA-protein interaction regions. Preferences for amino acid residues differ significantly between nucleobase-specific and -nonspecific interactions, with nonspecific interactions showing the expected bias towards positively charged residues. Irregular protein structures are strongly favored in interactions with the protein backbone, whereas there is little preference for specific protein secondary structure in either nucleobase-specific interaction or -nonspecific interaction. Overall, this study shows strong preferences for both RNA bases and RNA structural states in protein-RNA interactions, indicating their mutual importance in protein recognition.

PMID: 21514302

[ edit ]
Christoph Kaempf, 08.07.2011, Sommersemester 2011

A greedy, graph-based algorithm for the alignment of multiple homologous gene lists.

Many comparative genomics studies rely on the correct identification of homologous genomic regions using accurate alignment tools. In such case, the alphabet of the input sequences consists of complete genes, rather than nucleotides or amino acids. As optimal multiple sequence alignment is computationally impractical, a progressive alignment strategy is often employed. However, such an approach is susceptible to the propagation of alignment errors in early pairwise alignment steps, especially when dealing with strongly diverged genomic regions. In this article, we present a novel accurate and efficient greedy, graph-based algorithm for the alignment of multiple homologous genomic segments, represented as ordered gene lists.

Based on provable properties of the graph structure, several heuristics are developed to resolve local alignment conflicts that occur due to gene duplication and/or rearrangement events on the different genomic segments. The performance of the algorithm is assessed by comparing the alignment results of homologous genomic segments in Arabidopsis thaliana to those obtained by using both a progressive alignment method and an earlier graph-based implementation. Especially for datasets that contain strongly diverged segments, the proposed method achieves a substantially higher alignment accuracy, and proves to be sufficiently fast for large datasets including a few dozens of eukaryotic genomes.

[ edit ]
Markus Mueller, 08.07.2011, Sommersemester 2011

Sequence assembly
Despite the rapidly increasing number of sequenced and re-sequenced genomes, many issues regarding the computational
assembly of large-scale sequencing data have remain unresolved. Computational assembly is crucial in large genome
projects as well for the evolving high-throughput technologies and plays an important role in processing the information
generated by these methods. Here, we provide a comprehensive overview of the current publicly available sequence assembly
programs. We describe the basic principles of computational assembly along with the main concerns, such as
repetitive sequences in genomic DNA, highly expressed genes and alternative transcripts in EST sequences.

[1] K. Liolios, K. Mavromatis, N. Tavernarakis, N. Kyrpides, The genomes on line database (gold) in 2007: status of
genomic and metagenomic projects and their associated metadata., Nucleic Acids Res 36 (Database Issue) (2008)
[2] F. Sanger, G. Air, B. Barrell, N. Brown, A. Coulson, C. Fiddes, C. Hutchison, P. Slocombe, M. Smith, Nucliotide
sequence of bacteriophage phi X174 DNA., Nature 265 (5596) (1977) 687–95.
[3] F. Sanger, A. Coulson, T. Friedmann, G. Air, B. Barrell, N. Brown, J. Fiddes, C. r. Hutchison, P. Slocombe,
M. Smith, The nucleotide sequence of bacteriophage phiX174., J Mol Biol 125 (2) (1978) 225–46.
[4] F. Sanger, A. Coulson, G. Hong, D. Hill, G. Petersen, Nucleotide sequence of bacteriophage lambda DNA., J Mol
Biol 162 (4) (1982) 729–73.
[5] W. Fiers, R. Contreras, G. Haegemann, R. Rogiers, A. Van de Voorde, H. Van Heuverswyn, J. Van Herreweghe,
G. Volckaert, M. Ysebaert, Complete nucleotide sequence of SV40 DNA., Nature 273 (5658) (1978) 113–20.
[6] S. Anderson, A. Bankier, B. Barrell, M. de Bruijn, A. Coulson, J. Drouin, I. Eperon, D. Nierlich, B. Roe, F. Sanger,
et al., Sequence and organization of the human mitochondrial genome., Nature 290 (5806) (1981) 457–65.
[7] S. Anderson, Shotgun DNA sequencing using cloned DNase I-generated fragments., Nucleic Acids Res 9 (13)
(1981) 3015–27.
[8] P. Deininger, Random subcloning of sonicated DNA: application to shotgun DNA sequence analysis., Anal
Biochem 129 (1) (1983) 216–23.
[9] A. Edwards, H. Voss, P. Rice, A. Civitello, J. Stegemann, C. Schwager, J. Zimmermann, H. Erfle, C. Caskey,
W. Ansorge, Automated DNA sequencing of the human HPRT locus., Genomics 6 (4) (1990) 593–608.
[10] R. Wooster, Identification of the breast cancer susceptibility gene BRCA2., Nature 378 (1995) 789–92.
[11] R. Fleischmann, M. Adams, O. White, R. Clayton, E. Kirkness, A. Kerlavage, C. Bult, J. Tomb, B. Dougherty,
J. Merrick, et al.;, Whole-genome random sequencing and assembly of Haemophilus influenzae Rd., Science
269 (5223) (1995) 496–512.
[12] M. Adams, J. Kelley, J. Gocayne,M. Dubnick,M. Polymeropoulos,H. Xiao, C.Merril, A.Wu, B. Olde, R.Moreno,
et al, Complementary DNA sequencing: expressed sequence tags and human genome project., Science. 252 (5013)
(1991) 1651–6.
[13] A. Christoffels, A. van Gelder, G. Greyling, R. Miller, T. Hide, W. Hide, STACK: Sequence Tag Alignment and
Consensus Knowledgebase., Nucleic Acids Res 29 (1) (2001) 234–8.
[14] M. Boguski, The turning point in genome research., Trends Biochem Sci. 20 (8) (1995) 295–6.
[15] M. Marra, L. Hillier, R. Waterston, Expressed sequence tags–ESTablishing bridges between genomes., Trends
Genet. 14 (1) (1998) 4–7.
[16] M. Adams, M. Dubnick, A. Kerlavage, R. Moreno, J. Kelley, T. Utterback, J. Nagle, C. Fields, J. Venter, Sequence
identification of 2,375 human brain genes., Nature. 355 (6361) (1992) 632–4.
[17] M. Adams, A. Kerlavage, C. Fields, J. Venter, 3,400 new expressed sequence tags identify diversity of transcripts
in human brain., Nat Genet. 4 (3) (1993) 256–67.
[18] T. Nakamura, G. Morin, K. Chapman, S. Weinrich, W. Andrews, J. Lingner, C. Harley, T. Cech, Telomerase
catalytic subunit homologs from fission yeast and human., Science. 277 (5328) (1997) 955–9.
[19] R. Medzhitov, P. Preston Hurlburt, C. J. Janeway, A human homologue of the Drosophila Toll protein signals
activation of adaptive immunity., Nature. 388 (6640) (1997) 394–7.
[20] F. Liang, I. Holt, G. Pertea, S. Karamycheva, S. Salzberg, J. Quackenbush, Gene index analysis of the human
genome estimates approximately 120,000 genes., Nat Genet. 25 (2) (2000) 239–40.
[21] T. Hudson, L. Stein, S. Gerety, J. Ma, A. Castle, J. Silva, D. Slonim, R. Baptista, L. Kruglyak, S. Xu, et al.;, An
STS-based map of the human genome., Science. 270 (5244) (1995) 1945–54.
[22] G. Schuler, M. Boguski, E. Stewart, L. Stein, G. Gyapay, K. Rice, R. White, P. Rodriguez Tome, A. Aggarwal,
E. Bajorek, et al., A gene map of the human genome., Science. 274 (5287) (1996) 540–6.
[23] P. Deloukas, G. Schuler, G. Gyapay, E. Beasley, C. Soderlund, P. Rodriguez Tome, L. Hui, T.Matise, K.McKusick,
Beckmann, et al., A physical map of 30,000 human genes., Science. 282 (5389) (1998) 744–6.
[24] R. Waterston, C. Martin, M. Craxton, C. Huynh, A. Coulson, L. Hillier, R. Durbin, P. Green, R. Shownkeen,
N. Halloran, et al.;, A survey of expressed genes in Caenorhabditis elegans., Nat Genet. 1 (2) (1992) 114–23.
[25] W. McCombie, M. Adams, J. Kelley, M. FitzGerald, T. Utterback, M. Khan, M. Dubnick, A. Kerlavage, J. Venter,
C. Fields, Caenorhabditis elegans expressed sequence tags identify gene families and potential disease gene
homologues., Nat Genet. 1 (2) (1992) 124–31.

[ edit ]
Stefan Schaffer, 08.07.2011, Sommersemester 2011

Colonization Process of the Brazilian Common Vesper Mouse, Calomys expulsus (Cricetidae, Sigmodontinae): A Biogeographic
Riverine barriers have been associated to genetic diversification and speciation of several taxa. The Rio Sa˜o Francisco is one
of the largest rivers in South America, representing the third largest river basin in Brazil and operating as a geographic
barrier to gene flow of different taxa. To evaluate the influence of the Rio Sa˜o Francisco in the speciation of small rodents,
we investigated the genetic structure of Calomys expulsus with phylogenetic and network analyses of cytochrome b DNA. Our
results suggested that C. expulsus can be divided into 3 subpopulations, 2 on the left and another one on the right bank of
this river. The time of divergence of these subpopulations, using a Bayesian framework, suggested colonization from the
south to the north/northeast. Spatial analysis using a clustering method and the Monmonier’s algorithm suggested that the
Rio Sa˜o Francisco is a biogeographic barrier to gene flow and indicated that this river may play a role in the incipient
speciation process of these subpopulations.

[ edit ]
Annegret Grimm, 04.07.2011, Sommersemester 2011

Spatiotemporal dynamics of prairie wetland networks: power-law scaling and implications for conservation planning
Abstract. Although habitat networks show promise for conservation planning at regional scales, their spatiotemporal dynamics have not been well studied, especially in climatesensitive landscapes. Here I use satellite remote sensing to compile wetland habitat networks from the Prairie Pothole Region (PPR) of North America. An ensemble of networks assembled across a hydrologic gradient from deluge to drought and a range of representative dispersal distances exhibits power-law scaling of important topological parameters. Prairie wetland networks are ‘‘meso-worlds’’ with mean topological distance increasing faster with network size than small-world networks, but slower than a regular lattice (or ‘‘large world’’). This scaling implies rapid dispersal through wetland networks without some of the risks associated with ‘‘small worlds’’ (e.g., extremely rapid propagation of disease or disturbance). Retrospective analysis of wetland networks establishes a climatic envelope for landscape connectivity in the PPR, where I show that a changing climate might severely impact metapopulation viability and restrict long-distance dispersal and range shifts. More generally, this study demonstrates an efficient approach to conservation planning at a level of abstraction addressing key drivers of the global biodiversity crisis: habitat fragmentation and climatic change.

CHRISTOPHER K. WRIGHT (2010): Spatiotemporal dynamics of prairie wetland networks: power-law scaling and implications for conservation planning. Ecology, 91(7): 1924–1930.
Dean L. Urban,* Emily S. Minor, Eric A. Treml and Robert S. Schick (2009): Graph models of habitat mosaics. Ecology Letters 12: 260–273.

[ edit ]
Belinda Kahnt, 04.07.2011, Sommersemester 2011

The social network structure of a wild meerkat population: 2. intragroup interactions
-study of network structure of three interaction forms: grooming, dominance interactions, foraging competition in 8 meerkat pop.
-investigation of:
A) variation of network structure between groups
B) relationship between networks for different interaction forms
C) influence of group attributes (size, sex ratio), individual attributes (tenure of dominants) and ecological factors (ectoparasite load) on network structure
- results: measures of network structure vary between groups and between interaction forms within a group
- ecological factors, group and individual attributes change network structure

[ edit ]
Alice De Mauro, 08.07.2011, Sommersemester 2011

Extending pathways and processes using molecular interaction networks to analyse cancer genome data

Cellular processes and pathways, whose deregulation may contribute to the development of cancers, are often represented as cascades of proteins transmitting a signal from the cell surface to the nucleus. However, recent functional genomic experiments have identified thousands of interactions for the signalling canonical proteins, challenging the traditional view of pathways as independent functional entities. Combining information from pathway databases and interaction networks obtained from functional genomic experiments is therefore a promising strategy to obtain more robust pathway and process representations, facilitating the study of cancer-related pathways.

We present a methodology for extending pre-defined protein sets representing cellular pathways and processes by mapping them onto a protein-protein interaction network, and extending them to include densely interconnected interaction partners. The added proteins display distinctive network topological features and molecular function annotations, and can be proposed as putative new components, and/or as regulators of the communication between the different cellular processes. Finally, these extended pathways and processes are used to analyse their enrichment in pancreatic mutated genes. Significant associations between mutated genes and certain processes are identified, enabling an analysis of the influence of previously non-annotated cancer mutated genes.

The proposed method for extending cellular pathways helps to explain the functions of cancer mutated genes by exploiting the synergies of canonical knowledge and large-scale interaction data.

[ edit ]
michael siebauer, 17.07.2009, Sommersemester 2009

Protein Faltungs Simulationen - Review

Protein folding simulations: from coarse-grained model to all-atom model.
PMID: 19472192

[ edit ]
Tobias Mede, 17.07.2009, Sommersemester 2009

Domain-oriented edge-based alignment of protein interaction networks
Motivation: Recent advances in high-throughput experimental
techniques have yielded a large amount of data on protein–protein
interactions (PPIs). Since these interactions can be organized into
networks, and since separate PPI networks can be constructed for
different species, a natural research direction is the comparative
analysis of such networks across species in order to detect
conserved functional modules. This is the task of network alignment.
Results: Most conventional network alignment algorithms adopt a
node-then-edge-alignment paradigm: they first identify homologous
proteins across networks and then consider interactions among
them to construct network alignments. In this study, we propose
an alternative direct-edge-alignment paradigm. Specifically, instead
of explicit identification of homologous proteins, we directly infer
plausibly alignable PPIs across species by comparing conservation
of their constituent domain interactions. We apply our approach to
detect conserved protein complexes in yeast–fly and yeast–worm
PPI networks, and show that our approach outperforms two recent
approaches in most alignment performance metrics.

[ edit ]
Arli Parikesit, 13.07.2009, Sommersemester 2009

Functional protein divergence in the evolution of Homo sapiens
Background: Protein-coding regions in a genome evolve by sequence divergence and gene gain and loss, altering the gene content of the organism. However, it is not well understood how this has given rise to the enormous diversity of metazoa present today.
Results: To obtain a global view of human genomic evolution, we quantify the divergence of proteins by functional category at different evolutionary distances from human.
Conclusion: This analysis highlights some general systems-level characteristics of human evolution: regulatory processes, such as signal transducers, transcription factors and receptors, have a high degree of plasticity, while core processes, such as metabolism, transport and protein synthesis, are largely conserved. Additionally, this study reveals a dynamic picture of selective forces at short, medium and long evolutionary timescales. Certain functional categories, such as [...]

[ edit ]
Thomas Efer, Sommersemester 2009

Nature-Article: "A simple rule for the evolution of cooperation on graphs"
"A fundamental aspect of all biological systems is cooperation. Cooperative interactions are required for many levels of biological organization ranging from single cells to groups of animals. Human society is based to a large extent on mechanisms that promote cooperation. It is well known that in unstructured populations, natural selection favors defectors over cooperators. There is much current interest, however, for studying evolutionary games in structured populations and on graphs. These efforts recognize the fact that who-meets-whom is not random, but determined by spatial relationships or social networks. Here we describe a surprisingly simple rule, which is a good approximation for all graphs that we have analyzed, including cycles, spatial lattices, random regular graphs, random graphs and scale-free networks: natural selection favors cooperation, if the benefit of the altruistic act, b, divided by the cost, c, exceeds the average number of neighbors, k. Therefore, cooperation can evolve as a consequence of [...]

[OHLE06] Hisashi Ohtsuki, Christoph Hauert, Erez Lieberman and Martin A. Nowak: A simple rule for the evolution of cooperation on graphs. Nature. 2006 May 25; 441(7092): 502–505. doi: 10.1038/nature04605

[ edit ]
Daniel Himmelbach, 17.07.2009, Sommersemester 2009

A practical method for exact computation of subtree prune and regraft distance
Motivation: Subtree prune and regraft (SPR) is one kind of tree rearrangements that has seen applications in solving several computational biology problems. The minimum number of rooted SPR (rSPR) operations needed to transform one rooted binary tree to another is called the rSPR distance between the two trees.
Computing the rSPR distance has been actively studied in recent years. Currently, there is a lack of practical software tools for computing the rSPR distance for relatively large trees with large rSPR distance.
Results: In this article, we present a simple and practical method that computes the exact rSPR distance with integer linear programming.
By applying this new method on several simulated and real biological datasets, we show that our new method outperforms existing software tools in term of accuracy and efficiency. Our experimental results indicate that our method can compute the exact rSPR distance for many large trees with large rSPR distance.

Baroni,M. et al. (2005) Bounding the number of hybridisation events for a consistent evolutionary history. J. Math. Biol., 51, 171-182.
Bordewich,M. and Semple,C. (2004) On the computational complexity of the rooted subtree prune and regraft distance. Ann. Combinatorics, 8, 409-423.
Hein,J. et al. (1996) On the complexity of comparing evolutionary trees. Discrete Appl. Math., 71, 153-169.
Rodrigues,E.M. et al. (2001) Some approximation res [...]

[ edit ]
Franziska Kutzera, Sommersemester 2009

Evolutionary construction of Multiple Graph Alignments for the Structural Analysis of Biomolecules
The concept of multiple graph alignment has recently been
introduced as a novel method for the structural analysis of
biomolecules. Using approximate graph matching techniques, this
method enables the robust identification of approximately conserved
patterns in biologically related structures. In particular, multiple graph
alignment enables the characterization of functional protein families
independent of sequence or fold homology. This paper first recalls the
concept of multiple graph alignment and then addresses the problem
of computing optimal alignments from an algorithmic point of view.
In this regard, a method from the field of evolutionary algorithms is
proposed and empirically compared to a hitherto existing heuristic
approach. Empirically, it is shown that the former yields significantly
better results than the latter, albeit at the cost of an increased runtime.

Bartz-Beielstein, T. (2006). Experimental research in evolutionary computation: The new experimentalism. Springer.
Bateman, A., Coin, L., Durbin, R., Finn, R. D., Hollich, V., Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E. L. L., Studholme, D. J., Yeats, C., and Eddy, S. R. (2004). The Pfam protein families database. Nucleic Acids Research, 32, 138-141.
Berg, J. and L¨assig, M. (2004). Local graph alignment and motif search in biological networks. Proceeding

[ edit ]
Thomas Hofmann, Sommersemester 2009

Perrodou, et al: A new protein linear motif benchmark for multiple sequence alignment software
Linear motifs (LMs) are abundant short regulatory sites used for modulating the functions of many eukaryotic proteins. They play important roles in post-translational modification, cell compartment targeting, docking sites for regulatory complex assembly and protein processing and cleavage. Methods for LM detection are now being developed that are strongly dependent on scores for motif conservation in homologous proteins. However, most LMs are found in natively disordered polypeptide segments that evolve rapidly, unhindered by structural constraints on the sequence. These regions of modular proteins are difficult to align using classical multiple sequence alignment programs that are specifically optimised to align the globular domains. As a consequence, poor motif alignment quality is hindering efforts to detect new LMs.

We have developed a new benchmark, as part of the BAliBASE suite, designed to assess the ability of standard multiple alignment methods to detect and align LMs. The reference alignments are organised into different test sets representing real alignment problems and contain examples of experimentally verified functional motifs, extracted from the Eukaryotic Linear Motif (ELM) database. The benchmark has been used to evaluate and compare a number of multiple alignment programs. With distantly related proteins, the worst alignment program correctly aligns 48% of LMs compared to 73% for the best program. However, the performance of all the programs is adversely affected by the introduction of other sequences containing false positive motifs. The ranking of the alignment programs based on LM alignment quality is similar to that observed when considering full-length protein alignments, however little correlation was observed between LM and overall alignment quality for individual alignment test cases.

We have shown that none of the programs currently available is capable of reliabl


[ edit ]
Michael Siebauer, 02.02.2009, Wintersemester 2008/2009

Modifikation des Sankoff Algorithmus zur Homologiesuche
Vorgestellt wird eine Modifikation des Sankoff Algorithmus, die eine schnelle und speichersparende Berechnung eines semiglobalen Sequenz-/StrukturAligments ermöglicht.

Modifikation des Sankoff Algorithmus zur Homologiesuche – Bachelor Arbeit
Variantion on RNA Folding and Alignment – Lessons from Benasque

Inferring Noncoding RNA families and classes by means of Genome-Scale Structure-Based Clustering

Alignment of RNA base pairing probability matrices

Prediction of locally stable RNA secondary structures for genome-wide surveys

Secondary Structure Predicition for Aligned RNA Sequences

[ edit ]
Christoph Kämpf, Wintersemester 2008/2009

Combining statistical alignment and phylogenetic footprinting to detect regulatory elements
siehe gleichnamiges Paper.

siehe gleichnamiges Paper.

[ edit ]
Jan Engelhardt, 02.02.2009, Wintersemester 2008/2009

Something about smyRNAs and slRNAs
Interaction of smyRNAs and slRNAs

wird nachgereicht

[ edit ]
Maria Herberg, 02.02.2009, Wintersemester 2008/2009

Modelling Protein Interaction Networks - Age-Dependent Evolution of the Yeast Protein Interaction
Proteins interact in complex protein–protein interaction (PPI) networks whose topological properties—such as scale-free topology, hierarchical modularity, and dissortativity—have suggested models of network evolution. Currently preferred models invoke preferential attachment or gene duplication and divergence to produce networks whose topology matches that observed for real PPIs, thus supporting these as likely models for network evolution. Here, we show that the interaction density and homodimeric frequency are highly protein age–dependent in real PPI networks in a manner which does not agree with these canonical models. In light of these results, we propose an alternative stochastic model, which adds each protein sequentially to a growing network in a manner analogous to protein crystal growth (CG) in solution. The CG model produces PPI networks consistent in both topology and age distributions with real PPI networks and is well supported by the spatial arrangement of protein complexes of known 3-D structure, suggesting a plausible physical mechanism for network evolution. Kim WK, Marcotte EM (2008)

Kim WK, Marcotte EM (2008) Age-Dependent Evolution of the Yeast Protein Interaction Network Suggests a Limited Role of Gene Duplication and Divergence. PLoS Comput Biol 4(11);
Barabasi AL, Oltvai ZN (2004) Network biology: understanding the cell’s
functional organization. Nat Rev Genet 5;
Kim WK, Henschel A, Winter C, Schroeder M (2006) The many faces of protein–protein interactions: A compendium of interface geometry. PLoS Comput Biol 2(9);
Newman ME, Girvan M (2004) Finding an

[ edit ]
Daniel Exner, 11.07.2008, Sommersemester 2008

Intelligente RNA-Komprimierung: Metric fuer Sekundaerstruktur Komplexitaet?
Kombinierte Sequenz und Struktur RNA Informationen als reinen Text zu betrachten und mit Standard Verfahren zu komprimieren wird als einem differenzierten Ansatz unterlegen gezeigt.
Ausserdem bietet die Algorithmus implizite Informationen zur Komplexitaet der Sekundaerstruktur.


[ edit ]
Mandy Fuchs, 11.07.2008, Sommersemester 2008

Computational prediction of host-pathogen protein-protein interactions
Infectious diseases such as malaria result in millions of deaths each year. An important aspect of any host-pathogen system is the mechanism by which a pathogen can infect its host. One method of infection is via protein-protein interactions (PPIs) where pathogen proteins target host proteins.
They present a method that integrates known intra-species PPIs with protein-domain profiles to predict PPIs between host and pathogen proteins.

Computational prediction of host-pathogen protein-protein interactions, Matthew D. Dyer, T. M. Murali and Bruno W. Sobral, Bioinformatics 2007

[ edit ]
Sebastian Bartschat, 01.02.2008, Wintersemester 2007/2008

comparative structure prediction of RNA molecules - using a non Sankoff approach
im wesentlichen dreht sich der vortrag um das tool RNAspa beziehungsweise um den algorithmus der dahinter steckt.
des weiteren wird er dann mit dem algotihmus hinter RNAcast verglichen.



[ edit ]
Mandy Fuchs, 09.07.2007, Sommersemester 2007

Structural Alignment of Two RNA Sequences with Lagrangian Relaxation
RNA is generally a single-stranded molecule where the bases form hydrogen bonds within the same molecule leading to structure formation. In comparing different homologous RNA molecules it is usually not sufficient to consider only the primary sequence, but it is important to consider both the sequence and the structure of the molecules. Traditional alignment algorithms can only account for the sequence of bases, but not for the base pairings. Considering the structure leads to significant computational problems because of the dependencies introduced by the base pairings and the presence of pseudoknots. In this paper we address the problem of optimally aligning two given RNA sequences either with or without known structure (allowing for pseudoknots). We phrase the problem as an integer linear program and then solve it using Lagrangian relaxation. In our computational experiments we could align large problem instances—18S and 23S ribosomal RNA with up to 1500 bases within minutes while preserving pseudoknots.

[ edit ]
Jakob Muehmel, Sommersemester 2007

Multiple alignment by sequence annealing

[ edit ]
Andrej Aderhold, Sommersemester 2007

Inference of miRNA targets using evolutionary conservation and pathway analysis
BACKGROUND: MicroRNAs have emerged as important regulatory genes in a variety of cellular processes and, in recent years, hundreds of such genes have been discovered in animals. In contrast, functional annotations are available only for a very small fraction of these miRNAs, and even in these cases only partially. RESULTS: We developed a general Bayesian method for the inference of miRNA target sites, in which, for each miRNA, we explicitly model the evolution of orthologous target sites in a set of related species. Using this method we predict target sites for all known miRNAs in flies, worms, fish, and mammals. By comparing our predictions in fly with a reference set of experimentally tested miRNA-mRNA interactions we show that our general method performs at least as well as the most accurate methods available to date, including ones specifically tailored for target prediction in fly. An important novel feature of our model is that it explicitly infers the phylogenetic distribution of functional target sites, independently for each miRNA. This allows us to infer species-specific and clade-specific miRNA targeting. We also show that, in long human 3

[ edit ]
Sebastian Bartschat, Sommersemester 2007

Discovering structural motifs using a structural alphabet: Application to magnesium binding sites

[ edit ]
Lydia Steiner, Sommersemester 2007

Ontology development for biological systems: immunology title=gene%20ontology%20owl&andorexacttitle=or&titleabstract=gene%20ontology%20owl&andorexacttitleabs=or& ;fulltext=gene%20ontology%20owl&andorexactfulltext=or&searchid=1&FIRSTINDEX=0&sortspec=date&resource type=HWCIT

[ edit ]
Marcus Lechner, Sommersemester 2007

Understanding and using the meaning of statements in a bio-ontology

[ edit ]
Christian Arnold, Sommersemester 2007

BranchClust: a phylogenetic algorithm for selecting gene families

Automated methods for assembling families of orthologous genes include those based on sequence similarity scores and those based on phylogenetic approaches. The first are easy to automate but usually they do not distinguish between paralogs and orthologs or have restriction on the number of taxa. Phylogenetic methods often are based on reconciliation of a gene tree with a known rooted species tree; a limitation of this approach, especially in case of prokaryotes, is that the species tree is often unknown, and that from the analyses of single gene families the branching order between related organisms frequently is unresolved.

Here we describe an algorithm for the automated selection of orthologous genes that recognizes orthologous genes from different species in a phylogenetic tree for any number of taxa. The algorithm is capable of distinguishing complete (containing all taxa) and incomplete (not containing all taxa) families and recognizes in- and outparalogs. The BranchClust algorithm is implemented in Perl with the use of the BioPerl module for parsing trees and is freely available at

BranchClust outperforms the Reciprocal Best Blast hit method in selecting more sets of putatively orthologous genes. In the test cases examined, the correctness of the selected families and of the identified in- and outparalogs was confirmed by inspection of the pertinent phylogenetic trees.

[ edit ]
Christoph Theunert, Sommersemester 2007

miRNA - targetScan

[ edit ]
Christian Otto, Sommersemester 2007

Comparing sequences without using alignments: application to HIV/SIV subtyping

[ edit ]
Julian Joeris, Sommersemester 2007

ILP - Integer Linear Programming

[ edit ]
Michael Siebauer, Sommersemester 2007

Identifying bacterial genes and endosymbiont DNA with Glimmer
Die Autoren haben ein Modul (bzw. Ansatz gefunden) um bakterielle Gene aus dem Organismus Genom zu filtern. Es gibt wohl intrazellulär lebende Bakterien deren Genom beim Sequenzieren mit dem Wirtsgenom vermischt wird. Das OpenSource Programm "Glimmer" kann mittels trainierter Hidden-Markov-Modelle diese beiden Genome wieder trennen.

Bioinformatics 2007 23(6):673-679