genomic-medicine-sweden/nallo: Output
Aligned reads
Minimap2 is used to map the reads to a reference genome. The aligned reads are sorted, (merged) and indexed using samtools.
Path | Description |
---|---|
aligned_reads/minimap2/{sample}/*.bam |
Alignment file in bam format |
aligned_reads/minimap2/{sample}/*.bai |
Index of the corresponding bam file |
If the pipeline is run with phasing, the aligned reads will be happlotagged using the active phasing tool.
Path | Description |
---|---|
{outputdir}/aligned_reads/{sample}/{sample}_haplotagged.bam |
BAM file with haplotags |
{outputdir}/aligned_reads/{sample}/{sample}_haplotagged.bam.bai |
Index of the BAM file |
Note
Alignments will only be output without haplotags if phasing is off.
Assembly
Hifiasm is used to assemble genomes. The assembled haplotypes are then comverted to fasta files using gfastats. A deconstructed version of dipcall is to map the assembled haplotypes back to the reference genome.
Path | Description |
---|---|
assembly_haplotypes/gfastats/{sample}/*hap1.p_ctg.fasta.gz |
Assembled haplotype 1 |
assembly_haplotypes/gfastats/{sample}/*hap2.p_ctg.fasta.gz |
Assembled haplotype 2 |
assembly_haplotypes/gfastats/{sample}/*.assembly_summary |
Summary statistics |
assembly_variant_calling/dipcall/{sample}/*hap1.bam |
Assembled haplotype 1 mapped to the reference genome |
assembly_variant_calling/dipcall/{sample}/*hap1.bai |
Index of the corresponding BAM file for haplotype 1 |
assembly_variant_calling/dipcall/{sample}/*hap2.bam |
Assembled haplotype 2 mapped to the reference genome |
assembly_variant_calling/dipcall/{sample}/*hap2.bai |
Index of the corresponding BAM file for haplotype 2 |
Methylation pileups
Modkit is used to create methylation pileups, producing bedMethyl files for both haplotagged and ungrouped reads. Additionaly, methylation information can be viewed in the BAM files, for example in IGV.
Path | Description |
---|---|
methylation/modkit/pileup/phased/{sample}/*.modkit_pileup_phased_*.bed.gz |
bedMethyl file with summary counts from haplotagged reads |
methylation/modkit/pileup/phased/{sample}/*.modkit_pileup_phased_ungrouped.bed.gz |
bedMethyl file for ungrouped reads |
methylation/modkit/pileup/unphased/{sample}/*.modkit_pileup.bed.gz |
bedMethyl file with summary counts from all reads |
methylation/modkit/pileup/unphased/{sample}/*.bed.gz.tbi |
Index of the corresponding bedMethyl file |
MultiQC
MultiQC generates an HTML report summarizing all samples' QC results and pipeline statistics.
Path | Description |
---|---|
multiqc/multiqc_report.html |
HTML report summarizing QC results |
multiqc/multiqc_data/ |
Directory containing parsed statistics |
multiqc/multiqc_plots/ |
Directory containing static report images |
Pipeline Information
Nextflow generates reports for troubleshooting, performance, and traceability.
Path | Description |
---|---|
pipeline_info/execution_report.html |
Execution report |
pipeline_info/execution_timeline.html |
Timeline report |
pipeline_info/execution_trace.txt |
Execution trace |
pipeline_info/pipeline_dag.dot |
Pipeline DAG in DOT format |
pipeline_info/pipeline_report.html |
Pipeline report |
pipeline_info/software_versions.yml |
Software versions used in the run |
Phasing
LongPhase, WhatsHap, or HiPhase are used for phasing.
Path | Description |
---|---|
{outputdir}/aligned_reads/{sample}/{sample}_haplotagged.bam |
BAM file with haplotags |
{outputdir}/aligned_reads/{sample}/{sample}_haplotagged.bam.bai |
Index of the BAM file |
{outputdir}/phased_variants/{sample}/*.vcf.gz |
VCF file with phased variants |
{outputdir}/phased_variants/{sample}/*.vcf.gz.tbi |
Index of the VCF file |
{outputdir}/qc/phasing_stats/{sample}/*.blocks.tsv |
Phase block file |
{outputdir}/qc/phasing_stats/{sample}/*.stats.tsv |
Phasing statistics file |
QC
FastQC, cramino, mosdepth, and somalier are used for read quality control.
FastQC
FastQC provides general quality metrics for sequenced reads, including information on quality score distribution, per-base sequence content (%A/T/G/C), adapter contamination, and overrepresented sequences. For more details, refer to the FastQC help pages.
Path | Description |
---|---|
{outputdir}/qc/fastqc/{sample}/*_fastqc.html |
FastQC report containing quality metrics |
{outputdir}/qc/fastqc/{sample}/*_fastqc.zip |
Zip archive with the FastQC report, data files, and plot images |
Mosdepth
Mosdepth is used to report quality control metrics such as coverage and GC content from alignment files.
Path | Description |
---|---|
{outputdir}/qc/mosdepth/{sample}/*.mosdepth.global.dist.txt |
Cumulative distribution of bases covered for at least a given coverage value, across chromosomes and the whole genome |
{outputdir}/qc/mosdepth/{sample}/*.mosdepth.region.dist.txt |
Cumulative distribution of bases covered for at least a given coverage value, across regions (if a BED file is used) |
{outputdir}/qc/mosdepth/{sample}/*.mosdepth.summary.txt |
Mosdepth summary file |
{outputdir}/qc/mosdepth/{sample}/*.regions.bed.gz |
Depth per region (if a BED file is used) |
{outputdir}/qc/mosdepth/{sample}/*.regions.bed.gz.csi |
Index of the regions.bed.gz file |
Cramino
cramino is used to analyze both phased and unphased reads.
Path | Description |
---|---|
{outputdir}/qc/cramino/phased/{sample}/*.arrow |
Read length and quality in Apache Arrow format |
{outputdir}/qc/cramino/phased/{sample}/*.txt |
Summary information in text format |
{outputdir}/qc/cramino/unphased/{sample}/*.arrow |
Read length and quality in Apache Arrow format |
{outputdir}/qc/cramino/unphased/{sample}/*.txt |
Summary information in text format |
Somalier
somalier checks relatedness and sex.
Path | Description |
---|---|
{outputdir}/predigree/{project}.ped |
PED file updated with somalier-inferred sex |
{outputdir}/qc/somalier/relate/{project}/{project}.html |
HTML report |
{outputdir}/qc/somalier/relate/{project}/{project}.pairs.tsv |
Information about sample pairs |
{outputdir}/qc/somalier/relate/{project}/{project}.samples.tsv |
Information about individual samples |
Variants
CNVs
HiFiCNV is used to call CNVs, producing copy number, depth, and MAF tracks for IGV.
Path | Description |
---|---|
cnv_calling/hificnv/{sample}/*.copynum.bedgraph |
Copy number in bedgraph format |
cnv_calling/hificnv/{sample}/*.depth.bw |
Depth track in BigWig format |
cnv_calling/hificnv/{sample}/*.maf.bw |
Minor allele frequencies in BigWig format |
cnv_calling/hificnv/{sample}/*.vcf.gz |
VCF file containing CNV variants |
cnv_calling/hificnv/{sample}/*.vcf.gz.tbi |
Index of the corresponding VCF file |
Paralogous genes
Paraphase is used to call paralogous genes.
Path | Description |
---|---|
paraphase/{sample}/*.bam |
BAM file with haplotypes grouped by HP |
paraphase/{sample}/*.bai |
Index of the BAM file |
paraphase/{sample}/*.json |
Summary of haplotypes and variant calls |
paraphase/{sample}_paraphase_vcfs/{sample}_{gene}_vcf |
VCF file per gene |
paraphase/{sample}_paraphase_vcfs/{sample}_{gene}_vcf.tbi |
Index of the VCF file |
Repeats
TRGT is used to call repeats:
Path | Description |
---|---|
{outputdir}/repeat_calling/trgt/multi_sample/{project}/*.vcf.gz |
Merged VCF file for all samples |
{outputdir}/repeat_calling/trgt/multi_sample/{project}/*.vcf.gz.tbi |
Index of the VCF file |
{outputdir}/repeat_calling/trgt/single_sample/{sample}/*.vcf.gz |
VCF file with called repeats for a sample |
{outputdir}/repeat_calling/trgt/single_sample/{sample}/*.vcf.gz.tbi |
Index of the VCF file |
{outputdir}/repeat_calling/trgt/single_sample/{sample}/*.bam |
BAM file with sorted spanning reads |
{outputdir}/repeat_calling/trgt/single_sample/{sample}/*.bai |
Index of the BAM file |
Stranger is used to annotate them:
Path | Description |
---|---|
{outputdir}/repeat_annotation/stranger/{sample}/*.vcf.gz |
Annotated VCF file |
{outputdir}/repeat_annotation/stranger/{sample}/*.vcf.gz.tbi |
Index of the annotated VCF file |
SNVs
DeepVariant is used to call variants, while bcftools and GLnexus are used for merging variants.
Note
Variants are only output without annotation and ranking if these subworkflows are turned off.
Path | Description |
---|---|
snvs/single_sample/{sample}/{sample}_snv.vcf.gz |
VCF file containing called variants with alternative genotypes for a sample |
snvs/single_sample/{sample}/{sample}_snv.vcf.gz.tbi |
Index of the corresponding VCF file |
snvs/multi_sample/{project}/{project}_snv.vcf.gz |
VCF file containing called variants for all samples |
snvs/multi_sample/{project}/{project}_snv.vcf.gz.tbi |
Index of the corresponding VCF file |
snvs/stats/single_sample/*.stats.txt |
Variant statistics |
echtvar and VEP are used for annotating SNVs, while CADD is used to annotate INDELs with CADD scores.
Note
Variants are only output without ranking if that subworkflows are turned off.
Path | Description |
---|---|
databases/echtvar/encode/{project}/*.zip |
Database with allele frequency (AF) and allele count (AC) for all samples |
snvs/single_sample/{sample}/{sample}_snv_annotated.vcf.gz |
VCF file containing annotated variants with alternative genotypes for a sample |
snvs/single_sample/{sample}/{sample}_snv_annotated.vcf.gz.tbi |
Index of the annotated VCF file |
snvs/multi_sample/{project}/{project}_snv_annotated.vcf.gz |
VCF file containing annotated variants for all samples |
snvs/multi_sample/{project}/{project}_snv_annotated.vcf.gz.tbi |
Index of the annotated VCF file |
GENMOD is used to rank the annotated SNVs and INDELs.
Path | Description |
---|---|
snvs/single_sample/{sample}/{sample}_snv_annotated_ranked.vcf.gz |
VCF file with annotated and ranked variants for a sample |
snvs/single_sample/{sample}/{sample}_snv_annotated_ranked.vcf.gz.tbi |
Index of the ranked VCF file |
snvs/multi_sample/{project}/{project}_snv_annotated_ranked.vcf.gz |
VCF file with annotated and ranked variants for all samples |
snvs/multi_sample/{project}/{project}_snv_annotated_ranked.vcf.gz.tbi |
Index of the ranked VCF file |
SVs
Severus or Sniffles is used to call structural variants, and SVDB is used to merge variants within and between samples.
Note
Variants are only output without annotation if that subworkflow is turned off.
Path | Description |
---|---|
svs/multi_sample/{project}/{project}_svs.vcf.gz |
VCF file with merged structural variants for all samples |
svs/multi_sample/{project}/{project}_svs.vcf.gz.tbi |
Index of the merged VCF file |
svs/single_sample/{sample}/*.vcf.gz |
VCF file with merged structural variants for a single sample |
svs/single_sample/{sample}/*.vcf.gz.tbi |
Index of the VCF file |
SVDB and VEP are used to annotate structural variants.
Path | Description |
---|---|
svs/multi_sample/{project}/{project}_svs_annotated.vcf.gz |
VCF file with annotated merged structural variants for all samples |
svs/multi_sample/{project}/{project}_svs_annotated.vcf.gz.tbi |
Index of the annotated VCF file |
svs/single_sample/{sample}/*.vcf_annotated.gz |
VCF file with annotated structural variants for a single sample |
svs/single_sample/{sample}/*.vcf_annotated.gz.tbi |
Index of the annotated VCF file |