BME/Genomics Seminar: Supervised and Unsupervised Deep Gene Finding and Genome Foundation Models

Presenter: Mario Stanke, Professor of Bioinformatics, University of Greifswald
Description: This talk will explore recent machine learning approaches for eukaryotic genome annotation. Our supervised ab initio deep gene finder, Tiberius, correctly predicts more than four times as many human protein-coding gene structures as its father, Augustus, and in some clades, it approaches the accuracy of evidence-based pipelines such as BRAKER. Genome foundation models can automatically learn annotation-relevant embeddings from unannotated training genomes. I will also present Vipsania, the unsupervised wife of Tiberius. Vipsania is a genome foundation model that learns hidden Markov models to find gene structures from naked genomes using a BERT-style masked language model objective. Finally, I will report on ongoing efforts to use phylogenetic teaching signals from whole-genome vertebrate alignments to train a genome foundation model comparatively.
Keywords: hidden Markov model layer, linear recurrent unit, continuous-time Markov chains on trees
Bio: Mario Stanke studied mathematics and computer science at the University of Göttingen and UC Berkeley, and received his Dr. rer. nat. from the University of Göttingen. He completed a postdoctoral fellowship in the Haussler lab at UC Santa Cruz in 2006–2007. He has been a Professor of Bioinformatics at the Institute of Mathematics and Computer Science at the University of Greifswald since 2010.
Hosted by: Genomics Institute
Location: E2-599 (limited space)
Zoom: https://ucsc.zoom.us/j/95380317295?pwd=0HbwSYKRQqyCtBcPXGfoB0tPOsA16V.1