Long-read sequencing reveals a 4.4 kb tandem repeat region in the mitogenome of Echinococcus granulosus (sensu stricto) genotype G1

Liina Kinkar, Pasi K. Korhonen, Huimin Cai, Charles G. Gauci, Marshall W. Lightowlers, Urmas Saarma, David J. Jenkins, Jiandong Li, Junhua Li, Neil D. Young, Robin B. Gasser

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Background: Echinococcus tapeworms cause a severe helminthic zoonosis called echinococcosis. The genus comprises various species and genotypes, of which E. granulosus (sensu stricto) represents a significant global public health and socioeconomic burden. Mitochondrial (mt) genomes have provided useful genetic markers to explore the nature and extent of genetic diversity within Echinococcus and have underpinned phylogenetic and population structure analyses of this genus. Our recent work indicated a sequence gap (> 1 kb) in the mt genomes of E. granulosus genotype G1, which could not be determined by PCR-based Sanger sequencing. The aim of the present study was to define the complete mt genome, irrespective of structural complexities, using a long-read sequencing method.

Methods: We extracted high molecular weight genomic DNA from protoscoleces from a single cyst of E. granulosus genotype G1 from a sheep from Australia using a conventional method and sequenced it using PacBio Sequel (long-read) technology, complemented by BGISEQ-500 short-read sequencing. Sequence data obtained were assembled using a recently developed workflow.

Results: We assembled a complete mt genome sequence of 17,675 bp, which is > 4 kb larger than the complete mt genomes known for E. granulosus genotype G1. This assembly includes a previously elusive tandem repeat region, which is 4417 bp long and consists of ten near-identical 441–445 bp repeat units, each harbouring a 184 bp non-coding region and adjacent regions. We also identified a short non-coding region of 183 bp, which includes an inverted repeat.

Conclusions: We report what we consider to be the first complete mt genome of E. granulosus genotype G1 and characterise all repeat regions in this genome. The numbers, sizes, sequences and functions of tandem repeat regions remain to be studied in different isolates of genotype G1 and in other genotypes and species. The discovery of such 'new' repeat elements in the mt genome of genotype G1 by PacBio sequencing raises a question about the completeness of some published genomes of taeniid cestodes assembled from conventional or short-read sequence datasets. This study shows that long-read sequencing readily overcomes the challenges of assembling repeat elements to achieve improved genomes.
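
The lengths quoted in the Results can be cross-checked with a few lines of arithmetic. The Python sketch below is illustrative only and is not part of the authors' workflow. It assumes that the ten repeat units are contiguous and non-overlapping and that the tandem repeat region consists of nothing but those units. All variable and function names are hypothetical.

```python
# Figures quoted in the abstract (bp = base pairs).
GENOME_LEN = 17_675          # complete mt genome assembly
REPEAT_REGION_LEN = 4_417    # tandem repeat region
N_UNITS = 10                 # number of near-identical repeat units
UNIT_MIN, UNIT_MAX = 441, 445  # reported range of unit lengths


def repeat_region_bounds(n_units, unit_min, unit_max):
    """Shortest and longest region that n_units contiguous units could span.

    Assumption (not stated in the source): units do not overlap and
    there is no spacer sequence between them.
    """
    return n_units * unit_min, n_units * unit_max


low, high = repeat_region_bounds(N_UNITS, UNIT_MIN, UNIT_MAX)

# Is the reported region length compatible with ten units of 441-445 bp?
consistent = low <= REPEAT_REGION_LEN <= high

# Average unit length implied by the reported totals.
mean_unit_len = REPEAT_REGION_LEN / N_UNITS

# Portion of the assembly lying outside the tandem repeat region.
non_repeat_len = GENOME_LEN - REPEAT_REGION_LEN

# Share of the assembly occupied by the tandem repeat region.
repeat_fraction = REPEAT_REGION_LEN / GENOME_LEN

print(f"possible region length: {low}-{high} bp")
print(f"reported length consistent: {consistent}")
print(f"mean unit length: {mean_unit_len:.1f} bp")
print(f"sequence outside repeat region: {non_repeat_len} bp")
print(f"repeat region share of genome: {repeat_fraction:.1%}")
```

Under these assumptions the reported 4417 bp falls inside the range that ten units of 441–445 bp can span, and the repeat region accounts for roughly a quarter of the assembly.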

Original language: English
Article number: 238
Journal: Parasites and Vectors
Volume: 12
Issue number: 1
DOI: 10.1186/s13071-019-3492-x
Publication status: Published - 16 May 2019

Cite this

Kinkar, Liina ; Korhonen, Pasi K. ; Cai, Huimin ; Gauci, Charles G. ; Lightowlers, Marshall W. ; Saarma, Urmas ; Jenkins, David J. ; Li, Jiandong ; Li, Junhua ; Young, Neil D. ; Gasser, Robin B. / Long-read sequencing reveals a 4.4 kb tandem repeat region in the mitogenome of Echinococcus granulosus (sensu stricto) genotype G1. In: Parasites and Vectors. 2019 ; Vol. 12, No. 1.
@article{920b8bab79544ea58811a746aefaf337,
title = "Long-read sequencing reveals a 4.4 kb tandem repeat region in the mitogenome of Echinococcus granulosus (sensu stricto) genotype G1",
keywords = "Complete mitochondrial (mt) genome, Echinococcus, Genotype G1, PacBio sequencing, Repetitive DNA",
author = "Liina Kinkar and Korhonen, {Pasi K.} and Huimin Cai and Gauci, {Charles G.} and Lightowlers, {Marshall W.} and Urmas Saarma and Jenkins, {David J.} and Jiandong Li and Junhua Li and Young, {Neil D.} and Gasser, {Robin B.}",
year = "2019",
month = "5",
day = "16",
doi = "10.1186/s13071-019-3492-x",
language = "English",
volume = "12",
journal = "Parasites and Vectors",
issn = "1756-3305",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR
T1 - Long-read sequencing reveals a 4.4 kb tandem repeat region in the mitogenome of Echinococcus granulosus (sensu stricto) genotype G1
AU - Kinkar, Liina
AU - Korhonen, Pasi K.
AU - Cai, Huimin
AU - Gauci, Charles G.
AU - Lightowlers, Marshall W.
AU - Saarma, Urmas
AU - Jenkins, David J.
AU - Li, Jiandong
AU - Li, Junhua
AU - Young, Neil D.
AU - Gasser, Robin B.
PY - 2019/5/16
Y1 - 2019/5/16
KW - Complete mitochondrial (mt) genome
KW - Echinococcus
KW - Genotype G1
KW - PacBio sequencing
KW - Repetitive DNA
UR - http://www.scopus.com/inward/record.url?scp=85065872549&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85065872549&partnerID=8YFLogxK
U2 - 10.1186/s13071-019-3492-x
DO - 10.1186/s13071-019-3492-x
M3 - Article
VL - 12
JO - Parasites and Vectors
JF - Parasites and Vectors
SN - 1756-3305
IS - 1
M1 - 238
ER -