Assessing mechanisms of tumour evolution using alignment-free methods




Aideen Roddy1, Anna Jurek2, Alex Stupnikov3, Paul O'Reilly3, Phillip Dunne3, David Gonzales de Castro3, Kevin Prise3, Manuel Salto-Tellez3, Darragh McArt3
1Queen's University Belfast; 2School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast; 3Centre for Cancer Research and Cell Biology, Queen's University Belfast



Next-generation sequencing allows in-depth, high-throughput sequencing of genetic material. Currently, sequencing data are aligned prior to downstream analysis. However, alignment requires a choice among alternative pipelines, and this can over-simplify the complex nature of the cancer landscape. We aim to highlight the potential applications of alignment-free clustering, using glioma as an initial application base. Glioma is the most common malignant brain tumour in adults and, unfortunately, frequently recurs. As a result, there is an urgent need to explore this cancer type and target actionable therapeutics.


Initially, we implemented the alignment-free approach proposed by Sims et al. as a benchmark, before developing alternative approaches such as term frequency-inverse document frequency (tf-idf) weighting and enhanced distance metrics. We developed this framework for application to Exome-seq data, using multi-regional and recurrent samples from a glioma cohort. Applying the algorithm involves four steps: pre-processing; building a feature frequency profile (FFP); applying a distance metric; and visualising the data using self-organising maps (SOMs).
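As an illustration of the FFP-building step, the following is a minimal sketch rather than the implementation used in this work. It assumes that features are overlapping k-mers counted across reads and normalised to relative frequencies; the function names, the choice of k and the handling of ambiguous bases are hypothetical.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def kmer_counts(reads, k):
    """Count overlapping k-mers across reads.

    k-mers containing anything other than A, C, G or T (for example N)
    are skipped; this filtering rule is an assumption of the sketch.
    """
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= VALID_BASES:
                counts[kmer] += 1
    return counts


def feature_frequency_profile(reads, k):
    """Return the relative frequency of each k-mer (frequencies sum to 1)."""
    counts = kmer_counts(reads, k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


# Toy example with two short reads and k = 2
ffp = feature_frequency_profile(["ACGT", "ACGN"], k=2)
```

In practice k would be far larger than in this toy example, and the profile would be built from a subsample of reads rather than a whole file.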


Given the scale of the Exome-seq files (20-50 GB), we created subsamples of the sequence information for benchmarking, to measure sensitivity and to build unrooted neighbour-joining trees. Initially, an FFP was created for a subsample of each file using tf-idf, which involves normalising the FFP and then applying an importance factor to each feature according to how often it appears across all of the files in the cohort. Both Jensen-Shannon divergence and SOMs were applied to the result.
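The tf-idf weighting and the Jensen-Shannon divergence can be sketched as follows. This is a hedged illustration, not the implementation used here: each profile is assumed to be a dictionary mapping a k-mer to its normalised frequency, the smoothed idf formula is one common variant chosen for the example, and the weighted profiles are renormalised so that they can be treated as probability distributions. All function names are hypothetical.

```python
import math


def tfidf_profiles(profiles):
    """Weight each normalised profile by a smoothed inverse document frequency.

    idf = ln((1 + N) / (1 + df)) + 1, where N is the number of files and df
    is the number of files containing the feature (an assumed variant).
    """
    n = len(profiles)
    df = {}
    for profile in profiles:
        for kmer in profile:
            df[kmer] = df.get(kmer, 0) + 1
    weighted = []
    for profile in profiles:
        w = {
            kmer: tf * (math.log((1 + n) / (1 + df[kmer])) + 1.0)
            for kmer, tf in profile.items()
        }
        total = sum(w.values())
        # Renormalise so each weighted profile sums to 1 (assumption).
        weighted.append({kmer: v / total for kmer, v in w.items()} if total else {})
    return weighted


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence in bits between two sparse distributions."""
    jsd = 0.0
    for kmer in set(p) | set(q):
        pi = p.get(kmer, 0.0)
        qi = q.get(kmer, 0.0)
        mi = 0.5 * (pi + qi)
        if pi > 0:
            jsd += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            jsd += 0.5 * qi * math.log2(qi / mi)
    return jsd


# Toy cohort of two profiles; "AC" occurs in only one file
cohort = [{"AA": 0.5, "AC": 0.5}, {"AA": 1.0}]
weighted = tfidf_profiles(cohort)
distance = jensen_shannon_divergence(weighted[0], weighted[1])
```

With base-2 logarithms the divergence lies between 0 (identical profiles) and 1 (no shared features), so the pairwise values can populate a distance matrix of the kind a neighbour-joining tree is built from.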


Our initial results using Exome-seq data, along with the abundance of successful applications of alignment-free analysis in sequencing, have revealed the power of this approach, which shows promise for the future of tumour sequencing analysis in cancer research. Furthermore, we aim to continue exploring this methodology, with the potential to eventually combine multiple sequencing modalities in order to obtain a more accurate interpretation for a true patient passport.