Académie d'Excellence "Complexité et diversité du vivant"
April 20, 2021
Online
It is our pleasure to announce the Academy 4 Research Webinars on Bio-Medical & Transdisciplinary Topics.
They take place on the third Tuesday of every month.
The second session took place online on April 20, 2021, at noon. A video recording of the webinar is available.
PROGRAM: "DATA STORAGE IN DNA: WHY & HOW?"
SPEAKER 1:
Dr Marc ANTONINI, PhD, DR, I3S Laboratory, Université Côte d'Azur & CNRS (https://mediacoding.i3s.unice.fr/index.php/fr/membres/chercheurs/70-marc-antonini)
Title : "Archiving cold digital images on synthetic DNA: Is DNA the future of data storage?"
Abstract : Storing digital data is becoming a challenge for humanity because storage devices have a relatively short life span. Furthermore, the exponential growth in the generation of digital data creates a constant need to build new resources to hold this volume of data. Recent studies suggest that the DNA molecule is a promising new candidate, able to hold 500 GB/mm³ (1,000 times more than hard disk drives). Any digital information can be synthesized into DNA in vitro and stored in tiny dedicated capsules that promise reliability for hundreds of years. The stored DNA sequence can be retrieved whenever needed using machines called sequencers. The whole process is very challenging, because DNA synthesis is costly and sequencing is prone to errors. However, studies have shown that respecting several rules in the encoding reduces the probability of sequencing errors. Consequently, encoding digital information is not trivial, and the input data needs to be efficiently compressed before encoding in order to reduce the high synthesis cost. In this presentation we will discuss the state of the art in DNA data storage for the efficient encoding of digital data into a quaternary code made up of the 4 DNA bases A (Adenine), T (Thymine), C (Cytosine) and G (Guanine). We will also present a promising new solution for encoding digital images into synthetic DNA, which we have developed over the past 3 years and which takes the needs of DNA data storage into account while optimizing the trade-off between compression quality and synthesis cost.
Keywords : Image coding, DNA storage, compression
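To make the idea of a quaternary code concrete, here is a minimal sketch of the simplest possible mapping: 2 bits per base. It is not the encoding presented in the talk. The mapping table and the function names (`bytes_to_dna`, `dna_to_bytes`, `longest_homopolymer`) are assumptions made for illustration only. The homopolymer check is an example of the kind of rule an encoder may need to respect; the abstract does not list the actual rules.

```python
# Hypothetical fixed mapping: 2-bit value 0..3 -> base (illustrative only).
BASES = "ACGT"


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as 4 bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_bytes(strand: str) -> bytes:
    """Invert bytes_to_dna; expects a length that is a multiple of 4."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def longest_homopolymer(strand: str) -> int:
    """Length of the longest run of one repeated base."""
    longest = run = 0
    previous = None
    for base in strand:
        run = run + 1 if base == previous else 1
        longest = max(longest, run)
        previous = base
    return longest


if __name__ == "__main__":
    strand = bytes_to_dna(b"DNA")
    print(strand, dna_to_bytes(strand), longest_homopolymer(strand))
```

With this naive mapping a zero byte becomes `AAAA`, a run of identical bases. This illustrates why a constrained code, rather than a direct bit-to-base mapping, is of interest.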
SPEAKER 2:
Dr Raja APPUSWAMY, PhD, MCU, EURECOM (https://www.eurecom.fr/en/people/appuswamy-raja)
Title : “Scaling edit similarity computations for DNA storage and beyond”
Abstract : In the European Commission-funded Future and Emerging Technologies initiative OligoArchive, we are working on transforming DNA, the biological building block of life, into a digital building block for long-term data archival. One of the key steps in retrieving digital data stored in DNA involves clustering billions of strings with respect to edit distance. The computationally intensive nature of edit distance computation has made this step a critical bottleneck in the DNA data retrieval pipeline. In this talk, we present project OneOligo, our scalable, hardware-accelerated solution for DNA read clustering. In doing so, we first provide an overview of the DNA data storage pipeline. Then, we present OneJoin, a string-similarity join algorithm that synergistically combines algorithmic advances in low-distortion embedding with the cross-architectural programming ability offered by DPC++, to scale up clustering across CPUs and GPUs.
Keywords: DNA storage, edit distance, clustering, alignment
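For readers unfamiliar with edit distance, the sketch below shows the textbook dynamic-programming computation and a naive greedy clustering built on top of it. It is not OneJoin, which according to the abstract relies on low-distortion embedding and DPC++. The function names, the threshold and the sample reads are assumptions chosen for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def greedy_cluster(reads, threshold):
    """Put each read in the first cluster whose representative is close enough.

    The representative is the first read of a cluster. Every read may be
    compared against every cluster, which is why this does not scale.
    """
    clusters = []
    for read in reads:
        for cluster in clusters:
            if edit_distance(read, cluster[0]) <= threshold:
                cluster.append(read)
                break
        else:
            clusters.append([read])
    return clusters


if __name__ == "__main__":
    # Hypothetical noisy reads of two original strands.
    reads = ["ACGTACGT", "ACGTACGA", "TTTTCCCC", "TTTTCCCG", "ACGTACG"]
    for cluster in greedy_cluster(reads, threshold=2):
        print(cluster)
```

Each comparison costs time proportional to the product of the two string lengths, and the number of comparisons grows with the number of reads times the number of clusters. At billions of reads this becomes the bottleneck the abstract describes.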
USEFUL LINKS:
- https://oligoarchive.github.io/
- https://cordis.europa.eu/project/id/863320
- https://news.cnrs.fr/articles/synthetic-dna-holds-great-promise-for-data-storage
- https://imtech.wp.imt.fr/en/2020/03/26/dna-as-the-data-storage-medium-oligoarchive/
ORGANIZERS:
Academy of Excellence 4 "Complexity & Diversity of the Living Systems"
Academy of Excellence 5 "Human Societies, Ideas and Environments"
Graduate School and Research HEALTHY - Health Science Ecosystems
Graduate School and Research LIFE - Life and Health Sciences
Institute NeuroMod - Cognitive Systems, Normality and Pathology of the Human Brain and Computational Neurosciences
Labex SIGNALIFE - Network for Innovation on Signal Transduction Pathways in Life Sciences