| Abstract: |
We will present an algorithm specifically designed for clustering short time series gene expression data. Time series expression experiments are
used to study a wide range of biological systems. More than two thirds of all time series expression datasets are short (7 or fewer time points).
These datasets present unique challenges. Due to the large number of genes profiled (often tens of thousands) and the small number of time
points, many patterns are expected to arise at random. Most clustering algorithms are unable to distinguish between real and random patterns. Our
algorithm works by assigning genes to a pre-defined set of expression profiles that capture the potential distinct patterns that can be expected from the experiment. We will discuss how to obtain such a set of profiles and how to determine the significance of each of these profiles. Significant profiles are combined to form clusters. We tested our method
on both simulated and real biological data. Using immune response data, we show that our algorithm can correctly detect the temporal profile of relevant functional categories. Using a Gene Ontology (GO) analysis, we show that our algorithm outperforms both general clustering algorithms and algorithms designed specifically for clustering time series gene expression data. The algorithm has been implemented in the software Short Time-series Expression Miner (STEM), which is currently being used by biologists; we will present a demonstration of the software.
Joint work with Ziv Bar-Joseph |
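
The profile-based idea in the abstract can be illustrated with a small sketch. This is not the STEM implementation. The abstract does not say how profiles are generated, how genes are matched to them, or how significance is computed, so the choices below are assumptions made only for illustration: profiles start at 0 and change by at most one unit between time points, genes are assigned to the profile with the highest Pearson correlation, and significance is estimated by shuffling the time points of each gene. All function names are hypothetical.

```python
import itertools
import math
import random


def candidate_profiles(n_points, max_step=1):
    """Enumerate profiles that start at 0 and change by an integer in
    [-max_step, max_step] between consecutive time points (an assumption)."""
    steps = range(-max_step, max_step + 1)
    profiles = []
    for deltas in itertools.product(steps, repeat=n_points - 1):
        profile = [0]
        for d in deltas:
            profile.append(profile[-1] + d)
        profiles.append(tuple(profile))
    return profiles


def correlation(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def assign(gene, profiles):
    """Index of the profile most correlated with the gene (first one on ties)."""
    return max(range(len(profiles)),
               key=lambda i: correlation(gene, profiles[i]))


def profile_counts(genes, profiles):
    """Number of genes assigned to each profile."""
    counts = [0] * len(profiles)
    for gene in genes:
        counts[assign(gene, profiles)] += 1
    return counts


def permutation_pvalues(genes, profiles, n_perm=200, seed=0):
    """Empirical p-value per profile: the fraction of random time-point
    shufflings that assign at least as many genes to it as observed."""
    rng = random.Random(seed)
    observed = profile_counts(genes, profiles)
    exceed = [0] * len(profiles)
    for _ in range(n_perm):
        shuffled_genes = []
        for gene in genes:
            shuffled = list(gene)
            rng.shuffle(shuffled)  # break the temporal order of this gene
            shuffled_genes.append(shuffled)
        null_counts = profile_counts(shuffled_genes, profiles)
        for i, (null, obs) in enumerate(zip(null_counts, observed)):
            if null >= obs:
                exceed[i] += 1
    # Add-one correction keeps the estimate strictly above zero.
    return [(e + 1) / (n_perm + 1) for e in exceed]
```

Calling `candidate_profiles(3)` and then `permutation_pvalues(genes, profiles)` on a list of three-point expression vectors returns one p-value per profile. Profiles with small p-values would be the candidates for the clustering step mentioned in the abstract, which this sketch does not attempt.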