# Data Upload

With the SEQ Platform, you can directly upload your files or use the cloud browser to browse and select the files pre-uploaded to your account.

# Direct Data Upload

Click on the "Upload" button at your homepage and select "Somatic" option. Here, you can select FASTQ or VCF options.

# FASTQ Upload

# Run Selection

You can upload your samples to a new run by selecting “Create New Run” under “Run Name” and giving it a name on the “Name For New Run” field. You can also upload your samples to an existing run by selecting the previous run from the dropdown menu.

# Diagnostics

Next, you can add the diagnostic information including cancer type and the biomarker status.

# Cancer Type

(Mandatory) Cancer Type (e.g. Breast Cancer)
(Optional) Advanced Solid Tumor: Select this option if the sample or case involves a late-stage or metastatic solid tumor, regardless of the specific cancer type.

# Biomarkers

(Optional) TMB status: tumor mutation burden status (high | intermediate | low).The specified TMB status will be used in associating relevant clinical trials and drugs.
(Optional) MSI status: Microsatellite instability status ( high | low | stable ). The specified MSI status is displayed in the report and used in associating relevant clinical trials and drugs.
(Optional) HRD status: Homologous Recombination Deficiency status (high | low). The HRD status will be used in associating relevant clinical trials and drugs.
(Optional) TILs status: Tumor Infiltrating Lymphocytes status (high | low). The TILs status will be used in associating relevant clinical trials and drugs.
(Optional) PD-L1 status: PD-L1 (CD274) expression status (positive | negative). The PD-L1 (CD274) status will be used in associating relevant clinical trials and drugs.
(Optional) HER2 status: HER2 (ERBB2) expression status (positive | negative). HER2 (ERBB2) status will be used in associating relevant clinical trials and drugs.

# Choose the Technology Type

Choose the next-generation sequencing machine associated with the samples. If you do not know which sequencing platform is used, you can select the "Unknown" option. Mixing different technologies in one run is not permitted.

# Choose the kit type

The SEQ platform has hundreds of different kits predefined in the system. A new kit can be defined with a set of target coordinates and a list of targeted genes. The Addition of a new kit typically takes one business day. For kit requests, please contact us through support@genomize.com.

Every kit is associated with a standardized analysis version in SEQ. Probe-based kits, primer-based kits, Illumina & MGI technology, ION torrent technology, germline analysis, or somatic analyses all have a preset analysis version.

# Select the files to upload

Click the "Browse" button under the “File” to upload all the files you want to analyze. Make sure that you upload both read files for paired-end reads. See the table below for the supported input file types:

	File Types	Batch Sample Upload	RNA-Seq Upload
Illumina	.fastq.gz, .fq.gz	Supported	Supported
MGI	.fastq.gz, .fq.gz	Supported	Supported
Element Biosciences	.fastq.gz, .fq.gz	Supported	Supported
GeneMind	.fastq.gz, .fq.gz	Supported	Supported
Onso (PacBio)	.fastq.gz, .fq.gz	Supported	Supported
Salus	.fastq.gz, .fq.gz	Supported	Supported

If you chose Tumor/Normalanalysis pipeline, Normal Files selection will be active. Using these fields, you can select the matched normal file for your tumor samples. Filename formatting is explained below. Formatting rules apply to normal and tumor samples separately. Matching between the tumor and normal samples is not checked. System will assume correct matching tumor and normal files are selected by the user.

# Filename formatting for batch upload

# Illumina

You can upload multiple samples with two or more files. For batch uploads, we only support filenames following the “Illumina naming convention” e.g.

NA10831_ATCACG_L002_R1_001.fastq.gz
NA10831_ATCACG_L002_R2_001.fastq.gz

The filenames from Illumina platform are handled as below:

<name_field1>_<name_field2>_<lane_#>_<read_#>_<always_001>.fastq.gz

Name field 1, name field 2, lane #, and read # are used to match the corresponding files correctly. You can alter the name fields 1 and 2 without using space, punctuation, or underscore (_) characters. Following this naming convention, you can upload multiple samples (each with multiple fastq.gz files).

The --no-lane-splitting parameter must not be used during file conversion with bcl2fastq, DRAGEN BCL Convert, or similar tools. Lane identifiers are required to remain present in FASTQ file names for correct file matching. Please refer to the tools’ documentations for details.

Please check the number of samples and matched files on the confirmation screen.

# MGI

You can upload multiple samples with two or more files. For batch uploads, we only support filenames following the “MGI naming convention” e.g.

V12345678_L01_16_1.fastq.gz
V12345678_L01_16_2.fastq.gz

The filenames from MGI platform are handled as below:

<flowcell_id>_<lane_#>_<barcode>_<read_#>.fq.gz

Flowcell ID, lane #, barcode, and read # are used to match the corresponding files correctly. You can alter the flowcell ID field without using space, punctuation or underscore (_) characters. If you have used more than one barcode for the same sample, you need to rename the file as follows:

Original file names:
sample1234_L01_16_1.fq.gz
sample1234_L01_16_2.fq.gz
sample1234_L01_17_1.fq.gz
sample1234_L01_17_2.fq.gz

Altered file names:
sample1234_L01_16_1.fq.gz
sample1234_L01_16_2.fq.gz
sample1234_L02_16_1.fq.gz
sample1234_L02_16_2.fq.gz

In this example, barcode numbers of the last two files are changed to 16, and their lane numbers are increased by 1.

Please check the number of samples and matched files on the confirmation screen.

# Element Biosciences

You can upload multiple samples with two or more files. For batch uploads, we only support filenames following the “Element Biosciences naming convention” e.g.

SampleName_S1_L001_R1_001.fastq.gz
SampleName_S1_L001_R1_001.fastq.gz

The filenames from Element platforms are handled as below:

<name_field1>_<name_field2>_<lane_#>_<read_#>_<always_001>.fastq.gz

Please make sure the Lane identifiers are present in the file names as they are required for correct file matching. You can use --legacy-fastq parameter in Bases2Fastq to produce files with names adhering to the above format. Please refer to the Bases2Fastq documentation for details.

Please check the number of samples and matched files on the confirmation screen.

# GeneMind

You can upload multiple samples with two or more files. For batch uploads, we only support filenames following the “GeneMind naming convention” e.g.

SampleName_R1.fq.gz
SampleName_R2.fq.gz

The filenames from GeneMind platform are handled as below:

<name_field>_<read_#>.fq.gz

Name field and read # are used to match the files correctly. You can alter the name field without using space, punctuation and underscore (_) characters. Following this naming convention, you can upload multiple samples (each with multiple fq.gz files).

Please check the number of samples and matched files on the confirmation screen.

# Salus

You can upload multiple samples with two or more files. For batch uploads, we only support filenames following the “Salus naming convention” e.g.

ProjectName_SampleID_R1.fastq.gz
ProjectName_SampleID_R2.fastq.gz

The filenames from Salus platform are handled as below:

<project_name>_<sampleID>_<read_#>.fastq.gz

Project name, sampleID and read # are used to match the files correctly. You can alter the project name and sampleID fields without using space, punctuation and underscore (_) characters. Following this naming convention, you can upload multiple samples (each with multiple fastq.gz files).

Please check the number of samples and matched files on the confirmation screen.

# Onso (Pacific Biosciences)

You can upload multiple samples with two or more files. For batch uploads, we only support filenames following the “Obc2fastq v6.0 naming convention” e.g.

ABCD123_L01_R1_Sample_Information.fastq.gz
ABCD123_L01_R2_Sample_Information.fastq.gz

<Flowcell_id>_<LaneSpec>_<ReadSpec|IndexSpec>_<Sample_id>.fastq.gz

FlowcellID, SampleID and ReadSpec|IndexSpec are used to match the files correctly. You can alter the FlowcellID and sampleID fields without using space or punctuation. Following this naming convention, you can upload multiple samples (each with multiple fastq.gz files).

Please check the number of samples and matched files on the confirmation screen.

# Advanced options

# Variant Calling parameters

A set of parameters is used to assess the quality of every variant called in a sample. Two parameters, the primary coverage threshold and the minimum alternative fraction threshold, can cause the classification of the variant as “FAILED”. The “FAILED” variant calls will not be displayed.

The variant calls with an alternative allele count less than the primary coverage threshold will be classified as “FAILED” and not be displayed.

The variant calls with alternative allele frequency less than the allele fraction threshold will be classified as “FAILED” and not be displayed.

# Other parameters

When calculating coverage metrics for the gene coverage and the kit’s on-target coverage percentages, SEQ uses four different thresholds. 1X and 5X are the preset values. The other two values may be customized by the user per upload.

The default values of the advanced options are set under “Site Settings” in the Settings menu.

# Choose Analysis Version

SEQ has standard analysis versions pre-setup for every kit defined in the system. Data processing and variant calling are handled differently based on the sample type, sequencing platform, and selected analysis pipeline.

The SEQ platform has standard analysis versions pre-setup for every kit defined in the system. Calling variants are performed differently in different analysis versions.

# Analysis versions for Tumor/Normal Matched Samples (Capture based targeted panels, including WES)

Name	Explanation	Alignment/ variant calling	BAM processing	MSI Detection	Available Genome Versions	Available Platforms
Sentieon BWA-TNhaplotyper2- somatic	Optimized for Exome Samples.	Sentieon BWA / TNhaplotyper2	MarkDuplicate	MSIsensor (opens new window)	hg38	Illumina MGI Element GeneMind Onso Salus
Sentieon BWA-TNScope- somatic	Optimized for capture-based somatic kits.	Sentieon BWA / TNScope	MarkDuplicate	MSIsensor (opens new window)	hg38	Illumina MGI Element GeneMind Onso Salus

# Analysis versions for Tumor Only Samples (Capture based targeted panels, including WES)

Name	Explanation	Alignment/ variant calling	BAM processing	Fusion Calling	MSI Detection	Primer Trimming	Available Genome Versions	Available Platforms
BWA-Freebayes-PCR Dedup-Indel Realignment- somatic	Optimized for capture-based somatic kits.	BWA / Freebayes	Indel Realignment	RNA (STAR-Fusion)	msisensor2 (opens new window)	N/A	hg38	Illumina MGI Element GeneMind Onso Salus
Sentieon (Solid, UMI) BWA-TNscope- somatic	Optimized for somatic UMI kits with with solid samples.	Sentieon BWA / TNScope	MarkDuplicate UMI Processing*	RNA (STAR-Fusion)	msisensor2 (opens new window)	N/A	hg38	Illumina MGI Element GeneMind Onso Salus
Sentieon (ctDNA, UMI) BWA-TNscope- somatic	Optimized for somatic UMI kits with with ctDNA samples.	Sentieon BWA / TNScope	MarkDuplicate UMI Processing*	RNA (STAR-Fusion)	msisensor2 (opens new window)	N/A	hg38	Illumina MGI Element GeneMind Onso Salus
Roche (Solid, UMI)-BWA-VarDict-RNA Fusion Calling- somatic	Optimized for Roche somatic UMI kits with solid samples.	BWA / VarDict	UMI Processing*	RNA (STAR-Fusion)	msisensor2 (opens new window)	N/A	hg38	Illumina MGI Element GeneMind Onso Salus
Roche (cfDNA, UMI)-BWA-VarDict-RNA Fusion Calling- somatic	Optimized for Roche somatic UMI kits with cfDNA samples.	BWA / VarDict	UMI Processing*	RNA (STAR-Fusion)	msisensor2 (opens new window)	N/A	hg38	Illumina MGI Element GeneMind Onso Salus
Nanodigmbio (Solid, UMI)-BWA-VarDict-RNA Fusion Calling- somatic	Optimized for Nanodigmbio somatic UMI kits with solid samples.	BWA / VarDict	UMI Processing*	RNA (STAR-Fusion)	msisensor2 (opens new window)	N/A	hg38	Illumina MGI Element GeneMind Onso Salus
Nanodigmbio (cfDNA, UMI)-BWA-VarDict-RNA Fusion Calling- somatic	Optimized for Nanodigmbio somatic UMI kits with cfDNA samples.	BWA / VarDict	UMI Processing*	RNA (STAR-Fusion)	msisensor2 (opens new window)	N/A	hg38	Illumina MGI Element GeneMind Onso Salus

*the UMI barcode sequence is also required. For assistance, please contact support.

# Analysis versions for Tumor Only Samples (Amplicon based panels)

Name	Explanation	Alignment/ variant calling	BAM processing	Fusion Calling	MSI Detection	Primer Trimming	Available Genome Versions	Available Platforms
BWA-Freebayes-BamKeser-Indel Realignment- somatic	Optimized for amplicon based somatic kits.	BWA / Freebayes	Indel Realignment	RNA (STAR-Fusion)	-	BamKeser*	hg38	Illumina MGI Element GeneMind Onso Salus
BWA-Freebayes-BamKeser-Indel Realignment-Long Indel Finder- somatic	Optimized for amplicon-based somatic kits. Performs an additional step for long indel alterations.	BWA / Freebayes	Indel Realignment	RNA (STAR-Fusion)	-	BamKeser*	hg38	Illumina MGI Element GeneMind Onso Salus

*BamKeser is our in-house designed and precisely working primer trimming tool.

# Submit your data

As the last step, you can then click the "Upload" button to see the number of analyses and the list of files matched for each analysis. Please be sure that both of these pieces of information are correct and hit "Approve" to start the upload process or "Cancel" to make changes.

When you start the upload, you will see the progress for each file. Transferred samples will immediately begin processing without waiting for the entire batch to finish uploading.

SEQ Platform's upload process is secure and performs a checksum to ensure the files are transferred correctly. Please do not close the browser tab or shut down your computer. Also, please ensure that your computer will not go into sleep/hibernation mode during the upload. Otherwise, the upload process will be aborted. Our upload process is resistant to intermittent loss of internet connection.

When the upload process is completed, you will be redirected to the corresponding Run's page, and your samples will be queued for analysis. Refresh the corresponding Run page to see the last status of the analysis.

# IMPORTANT NOTE: POST-PROCESSING

After variant calling, the Genomize SEQ platform processes the resulting VCF file to form a Genomize standard VCF file, which can be downloaded through the platform. The Genomize standard VCF line will have GSTD=1 in the info field. Standardization of the VCF file includes the following important steps:

Minimal variant representation: Some callers produce redundant bases at the left-hand or right-hand side of either alternative or reference allele. This redundancy has to be removed to obtain the correct annotation of variants in the subsequent steps.

# RNA Fusion Detection

The RNA Fusion Detection pipeline identifies candidate gene fusion events from RNA sequencing (RNA-Seq) data through sequential preprocessing, alignment, and fusion discovery steps summarized in the table below.

Step	Explanation	Tools
Adapter/Quality Trimming	Removes sequencing adapters, low-quality bases, and short reads to produce high-quality trimmed FASTQ files.	fastp (opens new window)
Alignment and Chimeric Read Detection	Aligns non-rRNA reads to the human genome and identifies chimeric junctions suggestive of fusion transcripts.	STAR (opens new window)
Fusion Transcript Detection	Analyzes chimeric junction data against the CTAT genome library* to detect and report candidate gene fusion events.	STAR-Fusion (opens new window)

*The CTAT genome library is a publicly available reference genome library for RNA-Seq analysis (for the details, please refer to CTAT (opens new window)).

# TMB (Tumor Mutation Burden) Calculation

Tumor Mutational Burden is defined as non-synonymous SNP mutations per megabase. If the covered region length is less than 1.1 megabases, TMB cannot be calculated and will be seen as 'N/A'. For tumor only and ctDNA samples, TMB is calculated by the total number of non-synonymous and unique/novel SNP variants found in the sample, divided by the covered region on CDS. For tumor/matched-normalsamples, the variants detected in the normal sample are used to eliminate germline mutations.

Please see the table below for detailed parameters used for TMB calculation.

Parameters	Tumor Only	T/N match	ctDNA
Minimum Allowed Covered Length	1.1mb	1.1mb	1.1mb
Minimum Depth	50	50	1000
Minimum Allele Fraction	0.05	0.05	0.002
Maximum Allele Fraction	0.90	-	0.90
Population Database Allele Count	50	-	50
dbSNP filter	Yes	-	Yes
Chromosome Filter	MT	MT	MT
Filter Non-coding Regions	Yes	Yes	Yes
Filter MNVs	Yes	Yes	Yes
Filter No PASS Variants	Yes	Yes	Yes
Use SNV and Indels	Yes	Yes	Yes

# MSI (Microsatellite Instability) Calculation

For MSI detection, MSIsensor (Niu et al., 2014 (opens new window)) is used on tumor–normal matched samples, while msisensor2 (opens new window) is applied to tumor-only samples.

# VCF Upload

# Run Selection

# Diagnostics

Next, you can add the diagnostic information including cancer type and the biomarker status.

# Cancer Type

(Mandatory) Cancer Type (e.g. Breast Cancer)
(Optional) Advanced Solid Tumor: Select this option if the sample or case involves a late-stage or metastatic solid tumor, regardless of the specific cancer type.

# Biomarkers

(Optional) TMB status: tumor mutation burden status (high | intermediate | low).The specified TMB status will be used in associating relevant clinical trials and drugs.
(Optional) MSI status: Microsatellite instability status ( high | low | stable ). The specified MSI status is displayed in the report and used in associating relevant clinical trials and drugs.
(Optional) HRD status: Homologous Recombination Deficiency status (high | low). The HRD status will be used in associating relevant clinical trials and drugs.
(Optional) TILs status: Tumor Infiltrating Lymphocytes status (high | low). The TILs status will be used in associating relevant clinical trials and drugs.
(Optional) PD-L1 status: PD-L1 (CD274) expression status (positive | negative). The PD-L1 (CD274) status will be used in associating relevant clinical trials and drugs.
(Optional) HER2 status: HER2 (ERBB2) expression status (positive | negative). HER2 (ERBB2) status will be used in associating relevant clinical trials and drugs.

# Select the VCF Files to be Uploaded

You can upload VCF files for SNVs from DRAGEN or other similar tools (See Supported Variant Callers) for the same sample in any combination you choose. VCF files should have vcf.gzextension. Files are matched using the sample info field, not the file names. Multisample VCFs are not supported.

# Submit your data

When you start the upload, you will see the progress for each file. Transferred samples will immediately begin processing without waiting for the entire batch to finish uploading.

# Supported Variant Callers

Upload Type	Variant Types	Supported Callers	File Format (Extension)	Multisample Support
Small Variant	SNV INDEL	DeepVariant Dragen Freebayes GATK - Haplotype Caller Ion Torrent Variant Caller Isaac Variant Caller Mutect2 Pivat Sentieon DNAscope, TNScope VarDict Clair3 Qiagen CLC	VCF (.vcf.gz)	No

# Unsupported Variant Callers and VCF Version Compatibility

Please note that using unlisted or unsupported variant callers may result in inaccurate VCF metrics. Unpredicted callers are categorized as "other," which may limit the capture of certain metrics. In some cases, custom integration may be required to support these callers fully. Only VCF version 4.1 or newer is supported for Copy Number, Structural and Short Tandem VCF files. If issues persist or critical data appears missing, please contact support for assistance.

# Mitochondrial Variants

The Revised Cambridge Reference Sequence (rCRS, NC_012920.1) is used as the reference for the mitochondrial genome regardless of the genome version used in VCF generation, which is the recommended sequence for clinical use (McCormick et al., 2020 (opens new window)). If the VCF file contains variants called using the older Yoruban (YRI) mitochondrial reference genome, errors may result due to incompatibility with our annotation sources. Unsupported chrM variants should also be removed before upload to prevent genome compatibility issues. For assistance, or if issues arise, please contact support.

# Filtering Parameters Applied to Small Variants

No Call Filter: Small variants with a 'no call' status are excluded, ensuring that only fully determined genotypes are included in the analysis.
Chromosome Filter: Small variants that are not on chromosomes 1-22, X,M are filtered out.

# Cloud Browser

To use the cloud browser, select or create your run, select the sequencing platform and the kit by following the directions above. After the kit selection, you will see the option to select either your “COMPUTER” or the “CLOUD BROWSER” as the data source.

When you select the "CLOUD BROWSER" option, click the PLUS (➕) button to open the cloud browser interface. Using the cloud browser, you can choose the files with which you want to start the analysis and click “DONE”. The rest of the process is the same as described above. Please note that there is a 3-minute duration between each cloud upload process, and files will be removed from your cloud account upon starting the analysis.

← IGV - Raw Data Visualization Somatic Sample Dashboard →