Module 6: De novo genome assembly: Long reads

Overview

This module covers de novo genome assembly from long-read sequencing data. We examine the advantages and challenges of long reads, the steps involved in assembling a genome without a reference, and the tools and techniques used to evaluate and validate the resulting assembly. Lectures are paired with practical labs so that each concept is applied hands-on.

Introduction

  • Definition of de novo genome assembly
  • Importance of genome assembly in genomics
  • Overview of sequencing technologies: short reads vs. long reads
  • Advantages and limitations of long-read sequencing for de novo assembly

Topics

Understanding Long-Read Sequencing Technologies

  • Introduction to Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT)
  • Comparison of long-read technologies with short-read technologies
  • Error profiles and implications for assembly

Pre-Assembly Data Processing

  • Quality control of long-read sequencing data
  • Error correction strategies for long reads
  • Tools for read trimming and filtering
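
As a rough illustration of what read filtering involves, the sketch below drops reads that are too short or whose mean quality is too low. It is a minimal pure-Python example, not a substitute for dedicated filtering tools. The function names and the default thresholds (`min_length=1000`, `min_quality=7.0`) are assumptions chosen for illustration, and the parser assumes simple four-line FASTQ records with Phred+33 quality encoding.

```python
import math


def mean_phred(quality_string, offset=33):
    """Mean read quality, averaged in error-probability space.

    Averaging the raw Phred scores would overstate quality, so each score
    is converted to an error probability before averaging.
    """
    if not quality_string:
        return 0.0
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    return -10 * math.log10(sum(probs) / len(probs))


def parse_fastq(lines):
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # skip the '+' separator line
        qual = next(it).rstrip("\n")
        yield header[1:], seq, qual


def filter_reads(records, min_length=1000, min_quality=7.0):
    """Keep reads meeting both thresholds (illustrative defaults only)."""
    for name, seq, qual in records:
        if len(seq) >= min_length and mean_phred(qual) >= min_quality:
            yield name, seq, qual
```

Appropriate thresholds depend on the sequencing platform, the read length distribution, and the available depth, so they are best chosen after inspecting the data.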

Running De Novo Assembly

  • Overview of de novo assembly algorithms for long reads
  • Step-by-step guide to running a de novo assembly
  • Introduction to popular assembly tools (Canu, Flye, Miniasm)
  • Best practices for parameter selection and computational considerations
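
One computational consideration that comes up before any assembly is how much sequencing depth the data provide. A common back-of-the-envelope estimate is total sequenced bases divided by the expected genome size. The sketch below shows that arithmetic. The function name and the example numbers are hypothetical, and the genome size is assumed to be known or estimated in advance.

```python
def estimate_coverage(read_lengths, genome_size):
    """Approximate depth of coverage: total bases / expected genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(read_lengths) / genome_size


# Hypothetical example: 300 reads of 10 kb against a 100 kb genome.
depth = estimate_coverage([10_000] * 300, genome_size=100_000)
print(f"Estimated coverage: {depth:.1f}x")
```

This ignores unmappable or contaminant reads, so it should be read as an upper-bound estimate rather than an exact depth.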

Post-Assembly Processing

  • Assembly polishing and error correction using additional data
  • Scaffolding and gap filling to improve contiguity

Analysis of Assembly Summary Statistics

  • Understanding key assembly metrics (N50, L50, Total Length)
  • Tools for assembly evaluation (QUAST, assembly-stats)
  • Interpreting assembly quality and completeness
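
To make the metrics above concrete: N50 is the length of the contig at which the cumulative length, taken over contigs sorted from longest to shortest, first reaches at least half of the total assembly length, and L50 is the number of contigs needed to get there. The following is a minimal sketch of that calculation. The function name and the toy contig lengths are illustrative assumptions, and evaluation tools report many additional statistics.

```python
def assembly_stats(contig_lengths):
    """Return total length, N50, L50, and contig count for an assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        # Stop at the first contig where the running sum reaches half the total.
        if running * 2 >= total:
            return {
                "total_length": total,
                "n50": length,
                "l50": count,
                "num_contigs": len(lengths),
            }
    # Empty input: no contigs to summarise.
    return {"total_length": 0, "n50": 0, "l50": 0, "num_contigs": 0}


# Toy example: total is 290, so half is 145; 80 + 70 = 150 reaches it.
stats = assembly_stats([80, 70, 50, 40, 30, 20])
print(stats)  # N50 = 70, L50 = 2
```

A higher N50 with a lower L50 indicates a more contiguous assembly, but neither metric says anything about correctness, which is why they are interpreted alongside completeness and accuracy checks.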

Labs

  • Lab 1: Quality Control and Read Correction
  • Lab 2: Running a De Novo Assembly with Canu
  • Lab 3: Assembly Polishing and Error Correction
  • Lab 4: Evaluating Assembly Quality with QUAST

Learning Outcomes

By the end of this module, students will be able to:

  • Describe the advantages and challenges of using long-read sequencing for de novo genome assembly.
  • Perform quality control and error correction on long-read sequencing data.
  • Execute a de novo genome assembly using long-read data and appropriate assembly software.
  • Analyze and interpret assembly summary statistics to assess the quality of a genome assembly.
  • Understand the importance of assembly polishing and the methods used to improve assembly accuracy.
  • Apply best practices for de novo assembly and troubleshoot common issues.
  • Critically evaluate the results of a genome assembly and plan subsequent steps for genome finishing.