
dc.contributor.author FETNI, Atika
dc.date.accessioned 2024-10-15T10:56:53Z
dc.date.available 2024-10-15T10:56:53Z
dc.date.issued 2024-06-08
dc.identifier.uri http://localhost:8080/jspui/handle/123456789/12105
dc.description.abstract Understanding the language of non-coding DNA is a major topic in genomic research. The gene regulatory code is extremely complex because of polysemy and distant semantic relationships, which earlier informatics approaches often fail to capture. To address this difficulty, we used DNABERT, a pre-trained bidirectional encoder representation that captures a global and transferable understanding of genomic DNA sequences from both upstream and downstream nucleotide contexts. We compared DNABERT with the most widely used tools for predicting genome-wide regulatory elements and found it easier to use, more accurate, and more efficient. We show that a single pre-trained transformer model can reach state-of-the-art performance in predicting promoters, splice sites, and transcription factor binding sites after simple fine-tuning on a modest amount of task-specific labeled data. Furthermore, DNABERT allows direct visualization of nucleotide-level importance and of semantic relationships within input sequences, which improves interpretability and enables more accurate identification of conserved sequence motifs and candidate functional genetic variants. en_US
dc.language.iso en en_US
dc.publisher University Larbi Tébessi – Tébessa en_US
dc.subject DNA, BERT, DNABERT, LRM, NLP en_US
dc.title BERT-based DNA pattern recognition en_US
dc.type Thesis en_US
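In the original DNABERT publication, a DNA sequence is not fed to the encoder base by base; it is first split into overlapping k-mers (k between 3 and 6), and those k-mers are the tokens the bidirectional encoder attends over. The following is a minimal sketch of that tokenization step, assuming stride-1 overlapping k-mers. The helper names `seq_to_kmers` and `kmers_to_seq` are hypothetical illustrations, not functions from the DNABERT code base or from this thesis.

```python
def seq_to_kmers(seq: str, k: int = 6) -> list[str]:
    """Split a DNA sequence into overlapping k-mers with stride 1.

    Hypothetical helper: mirrors the overlapping k-mer representation
    described for DNABERT, not its actual implementation.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = seq.upper()
    if len(seq) < k:
        # Too short to form even one k-mer.
        return []
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmers_to_seq(kmers: list[str]) -> str:
    """Rebuild the original sequence from stride-1 overlapping k-mers.

    Each k-mer after the first contributes exactly one new nucleotide
    (its last character), because consecutive k-mers overlap by k - 1.
    """
    if not kmers:
        return ""
    return kmers[0] + "".join(kmer[-1] for kmer in kmers[1:])


if __name__ == "__main__":
    tokens = seq_to_kmers("ATGCGTAC", k=3)
    print(" ".join(tokens))  # ATG TGC GCG CGT GTA TAC
    print(kmers_to_seq(tokens))  # ATGCGTAC
```

Because neighbouring tokens share k - 1 nucleotides, every nucleotide appears in up to k tokens, so the encoder sees each position together with its upstream and downstream context.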

