# Data Upload

With the SEQ Platform, you can directly upload your files or use the cloud browser to browse and select the files pre-uploaded to your account.

# Direct Data Upload

Click the "Upload" button on your homepage and select the "Somatic" option. Here, you can choose either the FASTQ or the VCF option.

Somatic upload page

# FASTQ Upload

# Run Selection

You can upload your samples to a new run by selecting “Create New Run” under “Run Name” and entering a name in the “Name For New Run” field. You can also upload your samples to an existing run by selecting that run from the dropdown menu.

Somatic run selection page

# Diagnostics

Next, you can add the diagnostic information, including the cancer type and biomarker status.

Somatic cancer type selection page

# Cancer Type

  • (Mandatory) Cancer Type (e.g. Breast Cancer)
  • (Optional) Advanced Solid Tumor: Select this option if the sample or case involves a late-stage or metastatic solid tumor, regardless of the specific cancer type.

# Biomarkers

  • (Optional) TMB status: tumor mutation burden status (high | intermediate | low). The specified TMB status will be used to identify relevant clinical trials and drugs.
  • (Optional) MSI status: microsatellite instability status (high | low | stable). The specified MSI status is displayed in the report and used to identify relevant clinical trials and drugs.
  • (Optional) HRD status: homologous recombination deficiency status (high | low). The HRD status will be used to identify relevant clinical trials and drugs.
  • (Optional) TILs status: tumor-infiltrating lymphocytes status (high | low). The TILs status will be used to identify relevant clinical trials and drugs.
  • (Optional) PD-L1 status: PD-L1 (CD274) expression status (positive | negative). The PD-L1 (CD274) status will be used to identify relevant clinical trials and drugs.
  • (Optional) HER2 status: HER2 (ERBB2) expression status (positive | negative). The HER2 (ERBB2) status will be used to identify relevant clinical trials and drugs.

# Choose the Technology Type

Choose the next-generation sequencing platform used to generate the samples. If you do not know which platform was used, select the "Unknown" option. Different technologies cannot be mixed in one run.

# Choose the kit type

The SEQ platform has hundreds of kits predefined in the system. A new kit can be defined with a set of target coordinates and a list of targeted genes. Adding a new kit typically takes one business day. For kit requests, please contact us at support@genomize.com.

Every kit is associated with a standardized analysis version in SEQ. Probe-based and primer-based kits, Illumina/MGI and Ion Torrent technologies, and germline and somatic analyses each have a preset analysis version.

# Select the files to upload

Click the "Browse" button under "File" to select all the files you want to analyze. For paired-end reads, make sure to upload both read files. The supported input file types are listed in the table below:

| Platform | File Types | Batch Sample Upload | RNA-Seq Upload |
|---|---|---|---|
| Illumina | .fastq.gz, .fq.gz | Supported | Supported |
| MGI | .fastq.gz, .fq.gz | Supported | Supported |
| Element Biosciences | .fastq.gz, .fq.gz | Supported | Supported |
| GeneMind | .fastq.gz, .fq.gz | Supported | Supported |
| Onso (PacBio) | .fastq.gz, .fq.gz | Supported | Supported |
| Salus | .fastq.gz, .fq.gz | Supported | Supported |

If you choose the Tumor/Normal analysis pipeline, the Normal Files selection becomes active. Use these fields to select the matched normal files for your tumor samples. Filename formatting is explained below; the formatting rules apply to normal and tumor samples separately. Tumor/normal matching is not checked: the system assumes that the user has selected correctly matched tumor and normal files.

# Filename formatting for batch upload

# Illumina

You can upload multiple samples, each with two or more files. For batch uploads, we only support filenames that follow the “Illumina naming convention”, e.g.:

NA10831_ATCACG_L002_R1_001.fastq.gz
NA10831_ATCACG_L002_R2_001.fastq.gz

Filenames from the Illumina platform are handled as follows:

<name_field1>_<name_field2>_<lane_#>_<read_#>_<always_001>.fastq.gz

Name field 1, name field 2, lane #, and read # are used to match the corresponding files. You can change name fields 1 and 2 as long as they contain no spaces, punctuation, or underscore (_) characters. Following this naming convention, you can upload multiple samples, each with multiple fastq.gz files.
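To check a batch of Illumina filenames locally before uploading, the matching rule above can be sketched as follows. This is an illustrative Python sketch, not SEQ's implementation: the regular expression, the function name `group_illumina_files`, and the assumptions that name fields are strictly alphanumeric and that both `.fastq.gz` and `.fq.gz` are accepted (per the file type table) are ours.

```python
import re
from collections import defaultdict

# <name_field1>_<name_field2>_<lane_#>_<read_#>_001.fastq.gz
# Assumption: name fields are letters/digits only (no spaces, punctuation, underscores).
ILLUMINA_RE = re.compile(
    r"^(?P<name1>[A-Za-z0-9]+)_(?P<name2>[A-Za-z0-9]+)_"
    r"(?P<lane>L\d{3})_(?P<read>R[12])_001\.(?:fastq|fq)\.gz$"
)


def group_illumina_files(filenames):
    """Group files by sample (name fields 1 and 2), then index them by (lane, read)."""
    samples = defaultdict(dict)
    for fname in filenames:
        m = ILLUMINA_RE.match(fname)
        if m is None:
            # A missing lane identifier (e.g. after --no-lane-splitting) ends up here.
            raise ValueError(f"does not follow the Illumina naming convention: {fname}")
        samples[(m["name1"], m["name2"])][(m["lane"], m["read"])] = fname
    return dict(samples)


groups = group_illumina_files([
    "NA10831_ATCACG_L002_R1_001.fastq.gz",
    "NA10831_ATCACG_L002_R2_001.fastq.gz",
])
print(groups)
```

The number of keys in the result should match the number of samples shown on the confirmation screen.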

Do not use the --no-lane-splitting parameter when converting files with bcl2fastq, DRAGEN BCL Convert, or similar tools. Lane identifiers must remain in the FASTQ file names for the files to be matched correctly. Please refer to each tool's documentation for details.

Please check the number of samples and matched files on the confirmation screen.

# MGI

You can upload multiple samples, each with two or more files. For batch uploads, we only support filenames that follow the “MGI naming convention”, e.g.:

V12345678_L01_16_1.fastq.gz
V12345678_L01_16_2.fastq.gz

Filenames from the MGI platform are handled as follows:

<flowcell_id>_<lane_#>_<barcode>_<read_#>.fq.gz (or .fastq.gz, as in the example above)

Flowcell ID, lane #, barcode, and read # are used to match the corresponding files. You can change the flowcell ID field as long as it contains no spaces, punctuation, or underscore (_) characters. If you have used more than one barcode for the same sample, rename the files as follows:

Original file names:
sample1234_L01_16_1.fq.gz
sample1234_L01_16_2.fq.gz
sample1234_L01_17_1.fq.gz
sample1234_L01_17_2.fq.gz

Altered file names:
sample1234_L01_16_1.fq.gz
sample1234_L01_16_2.fq.gz
sample1234_L02_16_1.fq.gz
sample1234_L02_16_2.fq.gz

In this example, the barcode of the last two files is changed from 17 to 16, and their lane number is increased by 1 (from L01 to L02) so that every file name remains unique.
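The renaming shown above can be automated. The sketch below is a hypothetical helper (`merge_barcodes` is our name, not part of SEQ) that assumes all files passed in belong to one sample and generalizes the two-barcode example: the first barcode keeps its lane, the second gets lane + 1, and so on. It only computes the new names and does not rename anything on disk.

```python
import re

# <flowcell_id>_<lane_#>_<barcode>_<read_#> with a .fq.gz or .fastq.gz extension
MGI_RE = re.compile(
    r"^(?P<flowcell>[A-Za-z0-9]+)_L(?P<lane>\d{2})_(?P<barcode>\d+)_(?P<read>[12])"
    r"\.(?P<ext>fq\.gz|fastq\.gz)$"
)


def merge_barcodes(filenames):
    """Map each original name to a new name that uses the sample's first barcode.

    Files from the n-th barcode (ranked numerically, starting at 0) have their
    lane number increased by n so that the new names stay unique.
    """
    matches = []
    for fname in filenames:
        m = MGI_RE.match(fname)
        if m is None:
            raise ValueError(f"does not follow the MGI naming convention: {fname}")
        matches.append(m)
    if not matches:
        return {}

    barcodes = sorted({m["barcode"] for m in matches}, key=int)
    target = barcodes[0]
    renames = {}
    for m in matches:
        lane = int(m["lane"]) + barcodes.index(m["barcode"])
        renames[m.string] = f"{m['flowcell']}_L{lane:02d}_{target}_{m['read']}.{m['ext']}"

    # Refuse to produce colliding names (e.g. when the sample already spans several lanes).
    if len(set(renames.values())) != len(renames):
        raise ValueError("renaming would produce duplicate file names")
    return renames


for old, new in merge_barcodes([
    "sample1234_L01_16_1.fq.gz", "sample1234_L01_16_2.fq.gz",
    "sample1234_L01_17_1.fq.gz", "sample1234_L01_17_2.fq.gz",
]).items():
    print(old, "->", new)
```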

Please check the number of samples and matched files on the confirmation screen.

# Element Biosciences

You can upload multiple samples, each with two or more files. For batch uploads, we only support filenames that follow the “Element Biosciences naming convention”, e.g.:

SampleName_S1_L001_R1_001.fastq.gz
SampleName_S1_L001_R2_001.fastq.gz

Filenames from Element Biosciences platforms are handled as follows:

<name_field1>_<name_field2>_<lane_#>_<read_#>_<always_001>.fastq.gz

Name field 1, name field 2, lane #, and read # are used to match the corresponding files. You can change name fields 1 and 2 as long as they contain no spaces, punctuation, or underscore (_) characters. Following this naming convention, you can upload multiple samples, each with multiple fastq.gz files.

Make sure that lane identifiers are present in the file names, as they are required for correct file matching. You can use the --legacy-fastq parameter in Bases2Fastq to produce file names that follow the format above. Please refer to the Bases2Fastq documentation for details.
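Because each paired-end sample needs both read files, it can help to confirm that every R1 file has its R2 mate (and vice versa) for the same sample and lane before uploading. The sketch below assumes the Bases2Fastq `--legacy-fastq` naming shown above and alphanumeric name fields; the function name `unpaired_files` is hypothetical, and single-end data would be flagged entirely.

```python
import re

# <name_field1>_<name_field2>_<lane_#>_<read_#>_001.fastq.gz
# e.g. SampleName_S1_L001_R1_001.fastq.gz
ELEMENT_RE = re.compile(
    r"^(?P<name1>[A-Za-z0-9]+)_(?P<name2>[A-Za-z0-9]+)_"
    r"(?P<lane>L\d{3})_(?P<read>R[12])_001\.fastq\.gz$"
)


def unpaired_files(filenames):
    """Return, sorted, the files whose mate (R1 <-> R2, same sample and lane) is missing."""
    keys = {}
    for fname in filenames:
        m = ELEMENT_RE.match(fname)
        if m is None:
            raise ValueError(f"does not follow the expected naming convention: {fname}")
        keys[(m["name1"], m["name2"], m["lane"], m["read"])] = fname

    missing = []
    for (name1, name2, lane, read), fname in keys.items():
        mate = "R2" if read == "R1" else "R1"
        if (name1, name2, lane, mate) not in keys:
            missing.append(fname)
    return sorted(missing)


print(unpaired_files(["SampleName_S1_L001_R1_001.fastq.gz", "SampleName_S1_L001_R2_001.fastq.gz"]))
```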

Please check the number of samples and matched files on the confirmation screen.

# GeneMind

You can upload multiple samples, each with two or more files. For batch uploads, we only support filenames that follow the “GeneMind naming convention”, e.g.:

SampleName_R1.fq.gz
SampleName_R2.fq.gz

Filenames from the GeneMind platform are handled as follows:

<name_field>_<read_#>.fq.gz

Name field and read # are used to match the files. You can change the name field as long as it contains no spaces, punctuation, or underscore (_) characters. Following this naming convention, you can upload multiple samples, each with multiple fq.gz files.
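The GeneMind convention has no lane field, so each sample maps to exactly one R1 and one R2 file. A minimal Python sketch of that pairing, assuming `R1`/`R2` read labels, alphanumeric sample names, and the `.fq.gz` extension from the template above (`pair_genemind_files` is a hypothetical name):

```python
import re

# <name_field>_<read_#>.fq.gz, e.g. SampleName_R1.fq.gz
GENEMIND_RE = re.compile(r"^(?P<name>[A-Za-z0-9]+)_(?P<read>R[12])\.fq\.gz$")


def pair_genemind_files(filenames):
    """Return {sample_name: (R1 file, R2 file)} for a batch of GeneMind FASTQ files."""
    reads = {}
    for fname in filenames:
        m = GENEMIND_RE.match(fname)
        if m is None:
            raise ValueError(f"does not follow the GeneMind naming convention: {fname}")
        reads.setdefault(m["name"], {})[m["read"]] = fname

    pairs = {}
    for name, by_read in reads.items():
        if set(by_read) != {"R1", "R2"}:
            raise ValueError(f"sample {name} is missing a read file")
        pairs[name] = (by_read["R1"], by_read["R2"])
    return pairs


print(pair_genemind_files(["SampleName_R1.fq.gz", "SampleName_R2.fq.gz"]))
```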

Please check the number of samples and matched files on the confirmation screen.

# Salus

You can upload multiple samples, each with two or more files. For batch uploads, we only support filenames that follow the “Salus naming convention”, e.g.:

ProjectName_SampleID_R1.fastq.gz
ProjectName_SampleID_R2.fastq.gz

Filenames from the Salus platform are handled as follows:

<project_name>_<sampleID>_<read_#>.fastq.gz

Project name, sample ID, and read # are used to match the files. You can change the project name and sample ID fields as long as they contain no spaces, punctuation, or underscore (_) characters. Following this naming convention, you can upload multiple samples, each with multiple fastq.gz files.
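Underscores separate the fields, which is why they are not allowed inside the project name or sample ID: an extra underscore makes the split ambiguous. The sketch below parses a Salus filename into its three fields and rejects names that do not split cleanly. `SalusFastq` and `parse_salus` are illustrative names, and the `R1`/`R2` read labels and alphanumeric fields are assumptions.

```python
import re
from typing import NamedTuple


class SalusFastq(NamedTuple):
    project_name: str
    sample_id: str
    read: str


# <project_name>_<sampleID>_<read_#>.fastq.gz, fields assumed to be letters/digits only
SALUS_RE = re.compile(r"^([A-Za-z0-9]+)_([A-Za-z0-9]+)_(R[12])\.fastq\.gz$")


def parse_salus(fname: str) -> SalusFastq:
    """Split a Salus FASTQ filename into project name, sample ID, and read number."""
    m = SALUS_RE.match(fname)
    if m is None:
        raise ValueError(f"does not follow the Salus naming convention: {fname}")
    return SalusFastq(*m.groups())


print(parse_salus("ProjectName_SampleID_R1.fastq.gz"))
```

Files with the same `(project_name, sample_id)` pair belong to the same sample.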

Please check the number of samples and matched files on the confirmation screen.

# Onso (Pacific Biosciences)

You can upload multiple samples, each with two or more files. For batch uploads, we only support filenames that follow the “Obc2fastq v6.0 naming convention”, e.g.:

ABCD123_L01_R1_Sample_Information.fastq.gz
ABCD123_L01_R2_Sample_Information.fastq.gz

Filenames from the Onso platform are handled as follows:

<Flowcell_id>_<LaneSpec>_<ReadSpec|IndexSpec>_<Sample_id>.fastq.gz

Flowcell ID, sample ID, and ReadSpec|IndexSpec are used to match the files. You can change the flowcell ID and sample ID fields as long as they contain no spaces or punctuation. Following this naming convention, you can upload multiple samples, each with multiple fastq.gz files.
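The example above suggests that, unlike the other conventions, the Onso sample ID may itself contain underscores (`Sample_Information`), so a parser has to split off the first three fields and treat the remainder as the sample ID. A sketch under that reading, assuming lane specs such as `L01` and read/index specs such as `R1`, `R2`, or `I1` (`onso_match_key` is a hypothetical name):

```python
import re

# <Flowcell_id>_<LaneSpec>_<ReadSpec|IndexSpec>_<Sample_id>.fastq.gz
# Everything after the third underscore is taken as the sample ID.
ONSO_RE = re.compile(
    r"^(?P<flowcell>[A-Za-z0-9]+)_(?P<lane>L\d+)_(?P<spec>[RI]\d)_(?P<sample>.+)\.fastq\.gz$"
)


def onso_match_key(fname):
    """Return ((flowcell_id, sample_id), read_or_index_spec) for one Onso FASTQ file.

    Files sharing the first element are treated as belonging to the same sample.
    """
    m = ONSO_RE.match(fname)
    if m is None:
        raise ValueError(f"does not follow the Onso naming convention: {fname}")
    return (m["flowcell"], m["sample"]), m["spec"]


print(onso_match_key("ABCD123_L01_R1_Sample_Information.fastq.gz"))
```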

Please check the number of samples and matched files on the confirmation screen.

# Advanced options

# Variant Calling parameters

A set of parameters is used to assess the quality of every variant called in a sample. Two of them, the primary coverage threshold and the minimum alternative allele fraction threshold, can cause a variant to be classified as “FAILED”. Variant calls classified as “FAILED” are not displayed.

Variant calls whose alternative allele count is below the primary coverage threshold are classified as “FAILED”.

Variant calls whose alternative allele fraction is below the minimum alternative allele fraction threshold are classified as “FAILED”.
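The two rules above amount to a simple per-variant check. In this sketch the alternative allele fraction is computed as alternative allele count divided by total depth, and a call with zero depth is treated as failed; both choices, the function name `is_failed`, and the threshold values in the example are assumptions (the actual defaults are set under “Site Settings”).

```python
def is_failed(alt_count: int, depth: int,
              primary_coverage_threshold: int, min_alt_fraction: float) -> bool:
    """Return True if the call would be classified as FAILED (and hidden)."""
    if depth <= 0:
        return True  # assumption: a call without coverage cannot pass
    alt_fraction = alt_count / depth
    # Rule 1: too few reads supporting the alternative allele.
    # Rule 2: alternative allele fraction below the minimum.
    return alt_count < primary_coverage_threshold or alt_fraction < min_alt_fraction


# Hypothetical thresholds: at least 5 supporting reads and a 5% allele fraction.
print(is_failed(8, 400, primary_coverage_threshold=5, min_alt_fraction=0.05))
```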

# Other parameters

When calculating coverage metrics for gene coverage and the kit’s on-target coverage percentages, SEQ uses four thresholds. Two of them, 1X and 5X, are preset; the other two can be customized by the user for each upload.

Advanced options

The default values of the advanced options are set under “Site Settings” in the Settings menu.
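A common way to compute a coverage percentage for a given threshold, assumed here, is the share of target bases whose depth is at least that threshold. A minimal sketch, assuming per-base depths over the target regions are already available; the two custom thresholds (20X and 100X) are hypothetical example values, and `coverage_percentages` is not a SEQ function.

```python
def coverage_percentages(per_base_depths, thresholds=(1, 5, 20, 100)):
    """Percentage of target bases with depth >= each threshold.

    1X and 5X are SEQ's preset thresholds; 20 and 100 stand in for the two
    user-customizable values (hypothetical here).
    """
    total = len(per_base_depths)
    if total == 0:
        raise ValueError("no target bases")
    return {t: 100.0 * sum(d >= t for d in per_base_depths) / total for t in thresholds}


print(coverage_percentages([0, 3, 10, 50, 150]))
```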

# Choose Analysis Version

SEQ has standard analysis versions preconfigured for every kit defined in the system. Data processing and variant calling are performed differently in each analysis version, depending on the sample type, sequencing platform, and selected analysis pipeline.

# Analysis versions for Tumor/Normal Matched Samples (Capture based targeted panels, including WES)

| Name | Explanation | Alignment / Variant Calling | BAM Processing | MSI Detection | Available Genome Versions | Available Platforms |
|---|---|---|---|---|---|---|
| Sentieon BWA-TNhaplotyper2- somatic | Optimized for exome samples. | Sentieon BWA / TNhaplotyper2 | MarkDuplicate | MSIsensor | hg38 | Illumina, MGI, Element, GeneMind, Onso, Salus |
| Sentieon BWA-TNScope- somatic | Optimized for capture-based somatic kits. | Sentieon BWA / TNScope | MarkDuplicate | MSIsensor | hg38 | Illumina, MGI, Element, GeneMind, Onso, Salus |

# Analysis versions for Tumor Only Samples (Capture based targeted panels, including WES)

| Name | Explanation | Alignment / Variant Calling | BAM Processing | Fusion Calling | MSI Detection | Primer Trimming | Available Genome Versions | Available Platforms |
|---|---|---|---|---|---|---|---|---|
| BWA-Freebayes-PCR Dedup-Indel Realignment- somatic | Optimized for capture-based somatic kits. | BWA / Freebayes | Indel Realignment | RNA (STAR-Fusion) | msisensor2 | N/A | hg38 | Illumina, MGI, Element, GeneMind, Onso, Salus |
| Sentieon (Solid, UMI) BWA-TNscope- somatic | Optimized for somatic UMI kits with solid samples. | Sentieon BWA / TNScope | MarkDuplicate, UMI Processing* | RNA (STAR-Fusion) | msisensor2 | N/A | hg38 | Illumina, MGI, Element, GeneMind, Onso, Salus |
| Sentieon (ctDNA, UMI) BWA-TNscope- somatic | Optimized for somatic UMI kits with ctDNA samples. | Sentieon BWA / TNScope | MarkDuplicate, UMI Processing* | RNA (STAR-Fusion) | msisensor2 | N/A | hg38 | Illumina, MGI, Element, GeneMind, Onso, Salus |
| Roche (Solid, UMI)-BWA-VarDict-RNA Fusion Calling- somatic | Optimized for Roche somatic UMI kits with solid samples. | BWA / VarDict | UMI Processing* | RNA (STAR-Fusion) | msisensor2 | N/A | hg38 | Illumina, MGI, Element, GeneMind, Onso, Salus |
| Roche (cfDNA, UMI)-BWA-VarDict-RNA Fusion Calling- somatic | Optimized for Roche somatic UMI kits with cfDNA samples. | BWA / VarDict | UMI Processing* | RNA (STAR-Fusion) | msisensor2 | N/A | hg38 | Illumina, MGI, Element, GeneMind, Onso, Salus |
| Nanodigmbio (Solid, UMI)-BWA-VarDict-RNA Fusion Calling- somatic | Optimized for Nanodigmbio somatic UMI kits with solid samples. | BWA / VarDict | UMI Processing* | RNA (STAR-Fusion) | msisensor2 | N/A | hg38 | Illumina, MGI, Element, GeneMind, Onso, Salus |
| Nanodigmbio (cfDNA, UMI)-BWA-VarDict-RNA Fusion Calling- somatic | Optimized for Nanodigmbio somatic UMI kits with cfDNA samples. | BWA / VarDict | UMI Processing* | RNA (STAR-Fusion) | msisensor2 | N/A | hg38 | Illumina, MGI, Element, GeneMind, Onso, Salus |

*The UMI barcode sequence is also required. For assistance, please contact support.

# Analysis versions for Tumor Only Samples (Amplicon based panels)

| Name | Explanation | Alignment / Variant Calling | BAM Processing | Fusion Calling | MSI Detection | Primer Trimming | Available Genome Versions | Available Platforms |
|---|---|---|---|---|---|---|---|---|
| BWA-Freebayes-BamKeser-Indel Realignment- somatic | Optimized for amplicon-based somatic kits. | BWA / Freebayes | Indel Realignment | RNA (STAR-Fusion) | - | BamKeser* | hg38 | Illumina, MGI, Element, GeneMind, Onso, Salus |
| BWA-Freebayes-BamKeser-Indel Realignment-Long Indel Finder- somatic | Optimized for amplicon-based somatic kits. Performs an additional step for long indel alterations. | BWA / Freebayes | Indel Realignment | RNA (STAR-Fusion) | - | BamKeser* | hg38 | Illumina, MGI, Element, GeneMind, Onso, Salus |

*BamKeser is our in-house primer trimming tool.

# Submit your data

As the last step, click the "Upload" button to see the number of analyses and the list of files matched to each analysis. Make sure that both are correct, then click "Approve" to start the upload or "Cancel" to make changes.

When you start the upload, you will see the progress for each file. Transferred samples will immediately begin processing without waiting for the entire batch to finish uploading.

SEQ Platform's upload process is secure and uses a checksum to verify that files are transferred correctly. The upload tolerates intermittent loss of internet connection, but it will be aborted if you close the browser tab, shut down your computer, or let the computer go into sleep/hibernation mode during the upload.

When the upload is complete, you will be redirected to the corresponding Run page, and your samples will be queued for analysis. Refresh the Run page to see the latest analysis status.

# IMPORTANT NOTE: POST-PROCESSING

After variant calling, the Genomize SEQ platform processes the resulting VCF file into a Genomize standard VCF file, which can be downloaded from the platform. Each line of the Genomize standard VCF has GSTD=1 in the INFO field. Standardization of the VCF file includes the following important steps:

  • Minimal variant representation: Some callers report redundant bases (bases shared by both alleles) on the left-hand or right-hand side of the reference and alternative alleles. These redundant bases must be removed for the variants to be annotated correctly in subsequent steps.
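The trimming step can be sketched as follows: bases shared at the right-hand end of REF and ALT are removed first, then shared bases at the left-hand end (advancing POS for each one), always keeping at least one base in each allele as VCF requires. This is a sketch of the common minimal-representation approach for biallelic records, not Genomize's code; it does not left-align indels within repeats, which would require the reference sequence.

```python
def minimal_representation(pos: int, ref: str, alt: str):
    """Trim bases shared by REF and ALT, keeping at least one base in each allele."""
    # Remove the common suffix first.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then remove the common prefix, shifting the position right for each base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


print(minimal_representation(100, "AGCT", "AGTT"))
```

For instance, a deletion reported as REF=CTT, ALT=CT at position 100 becomes REF=CT, ALT=C at the same position, and the substitution above reduces to a single-base change at position 102.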

# RNA Fusion Detection

The RNA Fusion Detection pipeline identifies candidate gene fusion events from RNA sequencing (RNA-Seq) data through sequential preprocessing, alignment, and fusion discovery steps summarized in the table below.

| Step | Explanation | Tools |
|---|---|---|
| Adapter/Quality Trimming | Removes sequencing adapters, low-quality bases, and short reads to produce high-quality trimmed FASTQ files. | fastp |
| Alignment and Chimeric Read Detection | Aligns non-rRNA reads to the human genome and identifies chimeric junctions suggestive of fusion transcripts. | STAR |
| Fusion Transcript Detection | Analyzes chimeric junction data against the CTAT genome library* to detect and report candidate gene fusion events. | STAR-Fusion |

*The CTAT genome library is a publicly available reference genome library for RNA-Seq analysis (for details, please refer to CTAT).

# TMB (Tumor Mutation Burden) Calculation

Tumor Mutational Burden (TMB) is defined as the number of non-synonymous SNP mutations per megabase. If the covered region is shorter than 1.1 megabases, TMB cannot be calculated and is reported as 'N/A'. For tumor-only and ctDNA samples, TMB is calculated as the total number of non-synonymous, unique/novel SNP variants found in the sample divided by the length of the covered CDS region. For tumor/matched-normal samples, the variants detected in the normal sample are used to eliminate germline mutations.

Please see the table below for detailed parameters used for TMB calculation.

| Parameter | Tumor Only | T/N Match | ctDNA |
|---|---|---|---|
| Minimum Allowed Covered Length | 1.1 Mb | 1.1 Mb | 1.1 Mb |
| Minimum Depth | 50 | 50 | 1000 |
| Minimum Allele Fraction | 0.05 | 0.05 | 0.002 |
| Maximum Allele Fraction | 0.90 | - | 0.90 |
| Population Database Allele Count | 50 | - | 50 |
| dbSNP Filter | Yes | - | Yes |
| Chromosome Filter | MT | MT | MT |
| Filter Non-coding Regions | Yes | Yes | Yes |
| Filter MNVs | Yes | Yes | Yes |
| Filter Non-PASS Variants | Yes | Yes | Yes |
| Use SNVs and Indels | Yes | Yes | Yes |
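To make the tumor-only column concrete, the sketch below applies its filters and divides the surviving variant count by the covered CDS length in megabases. It is an illustration under several assumptions, not SEQ's code: the `Variant` fields are hypothetical annotations you would derive from an annotated VCF; allele fraction bounds are treated as inclusive; a variant is dropped when its population database allele count exceeds 50; and SNVs and indels are both counted, following the table.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

MITO_NAMES = {"M", "MT", "chrM", "chrMT"}


@dataclass
class Variant:
    chrom: str
    depth: int
    allele_fraction: float
    population_allele_count: int
    in_dbsnp: bool
    is_coding: bool
    is_nonsynonymous: bool
    is_mnv: bool
    is_pass: bool  # FILTER column is PASS


def tmb_tumor_only(variants: Iterable[Variant], covered_cds_bp: int,
                   min_depth: int = 50, min_af: float = 0.05, max_af: float = 0.90,
                   max_population_count: int = 50,
                   min_covered_mb: float = 1.1) -> Optional[float]:
    """Mutations per megabase with the tumor-only parameters, or None for 'N/A'."""
    covered_mb = covered_cds_bp / 1_000_000
    if covered_mb < min_covered_mb:
        return None  # region too small: TMB is reported as 'N/A'
    count = 0
    for v in variants:
        if v.chrom in MITO_NAMES:
            continue  # chromosome filter (MT)
        if not (v.is_pass and v.is_coding and v.is_nonsynonymous) or v.is_mnv:
            continue  # non-PASS, non-coding, synonymous, and MNV calls are dropped
        if v.depth < min_depth or not (min_af <= v.allele_fraction <= max_af):
            continue  # depth and allele-fraction window
        if v.in_dbsnp or v.population_allele_count > max_population_count:
            continue  # keep only unique/novel variants
        count += 1
    return count / covered_mb


good = Variant("chr1", 100, 0.2, 0, False, True, True, False, True)
print(tmb_tumor_only([good] * 10, covered_cds_bp=2_000_000))  # 10 variants over 2 Mb
```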

# MSI (Microsatellite Instability) Calculation

For MSI detection, MSIsensor (Niu et al., 2014) is used on tumor/normal matched samples, while msisensor2 is applied to tumor-only samples.

# VCF Upload

# Run Selection

You can upload your samples to a new run by selecting “Create New Run” under “Run Name” and entering a name in the “Name For New Run” field. You can also upload your samples to an existing run by selecting that run from the dropdown menu.

Somatic run selection page

# Diagnostics

Next, you can add the diagnostic information, including the cancer type and biomarker status.

Somatic cancer type selection page

# Cancer Type

  • (Mandatory) Cancer Type (e.g. Breast Cancer)
  • (Optional) Advanced Solid Tumor: Select this option if the sample or case involves a late-stage or metastatic solid tumor, regardless of the specific cancer type.

# Biomarkers

  • (Optional) TMB status: tumor mutation burden status (high | intermediate | low). The specified TMB status will be used to identify relevant clinical trials and drugs.
  • (Optional) MSI status: microsatellite instability status (high | low | stable). The specified MSI status is displayed in the report and used to identify relevant clinical trials and drugs.
  • (Optional) HRD status: homologous recombination deficiency status (high | low). The HRD status will be used to identify relevant clinical trials and drugs.
  • (Optional) TILs status: tumor-infiltrating lymphocytes status (high | low). The TILs status will be used to identify relevant clinical trials and drugs.
  • (Optional) PD-L1 status: PD-L1 (CD274) expression status (positive | negative). The PD-L1 (CD274) status will be used to identify relevant clinical trials and drugs.
  • (Optional) HER2 status: HER2 (ERBB2) expression status (positive | negative). The HER2 (ERBB2) status will be used to identify relevant clinical trials and drugs.

# Select the VCF Files to be Uploaded

You can upload SNV VCF files generated by DRAGEN or other supported variant callers (see Supported Variant Callers), in any combination, for the same sample. VCF files must have the .vcf.gz extension. Files are matched using the sample info field in the VCF, not the file names. Multisample VCFs are not supported.
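Since files are matched by the sample recorded inside the VCF rather than by file name, it can be useful to read that sample name before uploading. The sketch assumes that the “sample info field” corresponds to the sample column(s) of the `#CHROM` header line, and it rejects multisample files. `vcf_sample_name` is a hypothetical helper; bgzip-compressed `.vcf.gz` files can be read with Python's `gzip` module.

```python
import gzip


def vcf_sample_name(path: str) -> str:
    """Return the single sample name from a .vcf.gz header; reject multisample VCFs."""
    if not path.endswith(".vcf.gz"):
        raise ValueError(f"expected a .vcf.gz file: {path}")
    with gzip.open(path, "rt") as fh:
        for line in fh:
            if line.startswith("#CHROM"):
                columns = line.rstrip("\n").split("\t")
                # Sample columns follow the 8 fixed columns and FORMAT.
                samples = columns[9:]
                if len(samples) != 1:
                    raise ValueError(f"expected exactly one sample, found {len(samples)}")
                return samples[0]
            if not line.startswith("#"):
                break  # reached data lines without a #CHROM header
    raise ValueError("no #CHROM header line found")
```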

# Submit your data

As the last step, click the "Upload" button to see the number of analyses and the list of files matched to each analysis. Make sure that both are correct, then click "Approve" to start the upload or "Cancel" to make changes.

When you start the upload, you will see the progress for each file. Transferred samples will immediately begin processing without waiting for the entire batch to finish uploading.

SEQ Platform's upload process is secure and uses a checksum to verify that files are transferred correctly. The upload tolerates intermittent loss of internet connection, but it will be aborted if you close the browser tab, shut down your computer, or let the computer go into sleep/hibernation mode during the upload.

When the upload is complete, you will be redirected to the corresponding Run page, and your samples will be queued for analysis. Refresh the Run page to see the latest analysis status.

# Supported Variant Callers

| Upload Type | Variant Types | Supported Callers | File Format (Extension) | Multisample Support |
|---|---|---|---|---|
| Small Variant | SNV, INDEL | DeepVariant, Dragen, Freebayes, GATK - Haplotype Caller, Ion Torrent Variant Caller, Isaac Variant Caller, Mutect2, Pivat, Sentieon DNAscope and TNScope, VarDict, Clair3, Qiagen CLC | VCF (.vcf.gz) | No |

# Unsupported Variant Callers and VCF Version Compatibility

Please note that using unlisted or unsupported variant callers may result in inaccurate VCF metrics. Unrecognized callers are categorized as "other", which may prevent certain metrics from being captured. In some cases, custom integration may be required to fully support these callers. Copy Number, Structural, and Short Tandem VCF files are supported only for VCF version 4.1 or newer. If issues persist or critical data appears to be missing, please contact support for assistance.
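The VCF version is declared on the mandatory first header line, `##fileformat=VCFvX.Y`. A minimal check against the 4.1 minimum, assuming that line is passed in as a string (`vcf_version` and `is_supported_version` are hypothetical names):

```python
import re


def vcf_version(first_line: str) -> tuple:
    """Parse '##fileformat=VCFv4.2' into (4, 2)."""
    m = re.match(r"^##fileformat=VCFv(\d+)\.(\d+)\s*$", first_line)
    if m is None:
        raise ValueError("missing or malformed ##fileformat header")
    return int(m.group(1)), int(m.group(2))


def is_supported_version(first_line: str, minimum: tuple = (4, 1)) -> bool:
    """True if the declared VCF version is at least `minimum`."""
    return vcf_version(first_line) >= minimum


print(is_supported_version("##fileformat=VCFv4.2\n"))
```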

# Mitochondrial Variants

SEQ uses the Revised Cambridge Reference Sequence (rCRS, NC_012920.1), the recommended sequence for clinical use (McCormick et al., 2020), as the mitochondrial reference, regardless of the genome version used to generate the VCF. If the VCF file contains variants called against the older Yoruban (YRI) mitochondrial reference genome, errors may occur because of incompatibility with our annotation sources. Remove unsupported chrM variants before upload to prevent genome compatibility issues. For assistance, or if issues arise, please contact support.
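One quick way to spot the older mitochondrial reference is the chrM contig length declared in the VCF header: rCRS is 16,569 bp, whereas the Yoruban-derived chrM used by UCSC hg19 is 16,571 bp. The sketch below is a heuristic that assumes the header declares the contig with `ID` before `length`; it is not how SEQ checks compatibility, and a missing or incorrect header line will defeat it. `mito_reference` is a hypothetical name.

```python
import re

RCRS_LENGTH = 16569  # rCRS, NC_012920.1
YRI_LENGTH = 16571   # older Yoruban chrM (UCSC hg19)
MITO_IDS = {"chrM", "MT", "M", "chrMT"}

CONTIG_RE = re.compile(r"^##contig=<ID=(?P<id>[^,>]+),.*?length=(?P<length>\d+)")


def mito_reference(header_lines):
    """Return 'rCRS', 'YRI', 'unknown', or None if no mitochondrial contig is declared."""
    for line in header_lines:
        m = CONTIG_RE.match(line)
        if m and m["id"] in MITO_IDS:
            length = int(m["length"])
            if length == RCRS_LENGTH:
                return "rCRS"
            if length == YRI_LENGTH:
                return "YRI"
            return "unknown"
    return None


print(mito_reference(["##contig=<ID=chrM,length=16571,assembly=hg19>"]))
```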

# Filtering Parameters Applied to Small Variants

  1. No Call Filter: Small variants with a 'no call' status are excluded, ensuring that only fully determined genotypes are included in the analysis.

  2. Chromosome Filter: Small variants that are not on chromosomes 1-22, X, or M are filtered out.
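The two filters can be expressed as a per-record check. In this sketch `keep_small_variant` is a hypothetical name; it assumes chromosome names may carry a `chr` prefix and that `MT` is equivalent to `M`, and it treats any missing allele in the GT field (such as `./.` or `./1`) as a no-call, matching the requirement that genotypes be fully determined.

```python
# Chromosomes retained by the chromosome filter: 1-22, X, and M.
ALLOWED_CHROMOSOMES = {str(n) for n in range(1, 23)} | {"X", "M"}


def keep_small_variant(chrom: str, genotype: str) -> bool:
    """Apply the chromosome and no-call filters to one small-variant record."""
    # Normalize 'chr1' -> '1' and 'MT' -> 'M' (assumed naming variants).
    name = chrom[3:] if chrom.startswith("chr") else chrom
    if name == "MT":
        name = "M"
    if name not in ALLOWED_CHROMOSOMES:
        return False
    # No-call filter: drop the record if any allele in GT is missing ('.').
    alleles = genotype.replace("|", "/").split("/")
    return "." not in alleles


print(keep_small_variant("chr1", "0/1"), keep_small_variant("chr1", "./."))
```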

# Cloud Browser

To use the cloud browser, select or create your run and select the sequencing platform and kit as described above. After selecting the kit, you will see the option to select either “COMPUTER” or “CLOUD BROWSER” as the data source.

Data source selection

When you select the "CLOUD BROWSER" option, click the PLUS (➕) button to open the cloud browser interface. In the cloud browser, select the files you want to analyze and click “DONE”. The rest of the process is the same as described above. Please note that a 3-minute interval is required between cloud upload processes, and files are removed from your cloud account once the analysis starts.