Overview
This module covers de novo genome assembly with long-read sequencing technologies. We examine the advantages and challenges of long-read sequencing, the steps involved in assembling a genome from scratch, and the tools and techniques used to evaluate and validate the assembled genome. Theory is paired with practical labs so that each concept is put into practice.
Introduction
- Definition of de novo genome assembly
- Importance of genome assembly in genomics
- Overview of sequencing technologies: short reads vs. long reads
- Advantages and limitations of long-read sequencing for de novo assembly
Topics
Understanding Long-Read Sequencing Technologies
- Introduction to Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT)
- Comparison of long-read technologies with short-read technologies
- Error profiles and implications for assembly
Pre-Assembly Data Processing
- Quality control of long-read sequencing data
- Error correction strategies for long reads
- Tools for read trimming and filtering
Running De Novo Assembly
- Overview of de novo assembly algorithms for long reads
- Step-by-step guide to running a de novo assembly
- Introduction to popular assembly tools (Canu, Flye, Miniasm)
- Best practices for parameter selection and computational considerations
Post-Assembly Processing
- Assembly polishing and error correction using additional data
- Scaffolding and gap filling to improve contiguity
Analysis of Assembly Summary Statistics
- Understanding key assembly metrics (N50, L50, Total Length)
- Tools for assembly evaluation (QUAST, assembly-stats)
- Interpreting assembly quality and completeness
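As a concrete illustration of the metrics listed above, the following minimal Python sketch computes Total Length, N50, and L50 from a list of contig lengths. It assumes the common definitions: N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length, and L50 is the number of contigs in that set. The function name `assembly_stats` and the example lengths are hypothetical, chosen only for illustration; tools such as QUAST report these values directly.

```python
def assembly_stats(contig_lengths):
    """Return total length, N50, and L50 for a list of contig lengths.

    Illustrative helper (not part of any assembly tool).
    """
    # Sort contigs from longest to shortest.
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    if total == 0:
        return {"total_length": 0, "n50": 0, "l50": 0}

    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        # Stop at the first contig where the cumulative length
        # reaches at least half of the total assembly length.
        if running * 2 >= total:
            return {"total_length": total, "n50": length, "l50": count}


# Hypothetical contig lengths (in bp) for a toy assembly.
stats = assembly_stats([100, 80, 60, 40, 20])
print(stats)
```

In this toy example the total length is 300 bp; the two longest contigs (100 + 80 = 180 bp) already cover at least half of it, so N50 is 80 and L50 is 2. A higher N50 indicates a more contiguous assembly, but it says nothing about correctness, which is why it is read alongside completeness and accuracy measures.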
Labs
- Lab 1: Quality Control and Read Correction
- Lab 2: Running a De Novo Assembly with Canu
- Lab 3: Assembly Polishing and Error Correction
- Lab 4: Evaluating Assembly Quality with QUAST
Learning Outcomes
By the end of this module, students will be able to:
- Describe the advantages and challenges of using long-read sequencing for de novo genome assembly.
- Perform quality control and error correction on long-read sequencing data.
- Execute a de novo genome assembly using long-read data and appropriate assembly software.
- Analyze and interpret assembly summary statistics to assess the quality of a genome assembly.
- Explain why assembly polishing matters and describe the methods used to improve assembly accuracy.
- Apply best practices for de novo assembly and troubleshoot common issues.
- Critically evaluate the results of a genome assembly and plan subsequent steps for genome finishing.