Run Gene Quality Control (QC) function

Usage

run_RefGeneQC(
  ASE_df,
  XCI_ref,
  SNP_DETECTION_DP = 30,
  SNP_DETECTION_MAF = 0.1,
  SAMPLE_NUM_THR = 3,
  HE_allele_cell_number_THR = 50,
  QC_total_allele_THR = 10
)

Arguments

ASE_df

A dataframe (tibble) containing single-cell allele-specific expression (scASE) data for all samples. This dataframe should have the following columns:

  • SNP_ID: SNP identifier

  • POS: Genomic position of the SNP (GRCh38)

  • REF: Reference allele of the SNP (A, T, G, or C)

  • ALT: Alternative allele of the SNP (A, T, G, or C)

  • cell_barcode: Cell barcode

  • REFcount: Allelic expression of the reference allele

  • ALTcount: Allelic expression of the alternative allele

  • OTHcount: Allelic expression of any other allele (neither REF nor ALT)

  • Sample_ID: Sample ID

  • Gene: Gene annotated to the SNP
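As an illustration of the expected layout, the sketch below builds a toy input table with the ten columns listed above and checks that none is missing. All values, the gene names, and the column types are assumptions made for this example; the documentation does not specify types. A base R data frame is used to keep the example dependency-free, and it can be converted with `tibble::as_tibble()` if a tibble is preferred.

```r
# Toy scASE table with the columns listed above; all values are made up.
ASE_df <- data.frame(
  SNP_ID       = c("snp1", "snp1", "snp2"),
  POS          = c(1000L, 1000L, 2000L),      # hypothetical GRCh38 positions
  REF          = c("A", "A", "G"),
  ALT          = c("G", "G", "T"),
  cell_barcode = c("AAAC-1", "AAAG-1", "AAAC-1"),
  REFcount     = c(3L, 0L, 1L),
  ALTcount     = c(0L, 2L, 1L),
  OTHcount     = c(0L, 0L, 0L),
  Sample_ID    = c("S1", "S1", "S1"),
  Gene         = c("GENE_A", "GENE_A", "GENE_B"),  # hypothetical gene names
  stringsAsFactors = FALSE
)

# Check that every documented column is present before running the QC.
required_cols <- c("SNP_ID", "POS", "REF", "ALT", "cell_barcode",
                   "REFcount", "ALTcount", "OTHcount", "Sample_ID", "Gene")
missing_cols <- setdiff(required_cols, names(ASE_df))
if (length(missing_cols) > 0) {
  stop("ASE_df is missing columns: ", paste(missing_cols, collapse = ", "))
}
```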

XCI_ref

A dataframe (tibble) containing X chromosome inactivation status. This dataframe should have the following two columns:

  • Gene: Gene name

  • XCI_status: XCI status (escape, variable, or inactive)
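A minimal sketch of an annotation table in this shape is shown below, with a check that every status is one of the three documented values. The gene names are hypothetical placeholders, and the check is an illustration rather than something the package is known to perform.

```r
# Toy XCI annotation; gene names are hypothetical placeholders.
XCI_ref <- data.frame(
  Gene       = c("GENE_A", "GENE_B", "GENE_C"),
  XCI_status = c("inactive", "escape", "variable"),
  stringsAsFactors = FALSE
)

# Flag rows whose status is not one of the three documented values.
allowed_status <- c("escape", "variable", "inactive")
bad_rows <- XCI_ref[!XCI_ref$XCI_status %in% allowed_status, ]
if (nrow(bad_rows) > 0) {
  warning("Unexpected XCI_status values: ",
          paste(unique(bad_rows$XCI_status), collapse = ", "))
}
```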

SNP_DETECTION_DP

Threshold for the total allele count (depth) of the SNP in the scASE data. SNP–Sample pairs with a total allele count of at least "SNP_DETECTION_DP" are used for the analysis. Default: 30.

SNP_DETECTION_MAF

Threshold for the minor allele ratio in the scASE data. SNP–Sample pairs with an allele ratio between "SNP_DETECTION_MAF" and "1 - SNP_DETECTION_MAF" are used for the analysis. Default: 0.1.
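To make the two detection thresholds concrete, the sketch below applies them to toy counts. It is not the package's internal code. It assumes that the total allele count of a SNP–Sample pair is the sum of REFcount and ALTcount over all cells, that the allele ratio is ALTcount divided by that total, and that both bounds are inclusive; none of these details is stated in this documentation.

```r
# Toy per-cell counts for three SNP-Sample pairs (values are made up).
toy <- data.frame(
  SNP_ID    = c("snp1", "snp1", "snp2", "snp2", "snp3"),
  Sample_ID = c("S1", "S1", "S1", "S1", "S1"),
  REFcount  = c(20, 5, 38, 1, 4),
  ALTcount  = c(10, 5, 1, 0, 4)
)

SNP_DETECTION_DP  <- 30   # documented default
SNP_DETECTION_MAF <- 0.1  # documented default

# Sum allele counts over cells within each SNP-Sample pair.
pairs <- aggregate(cbind(REFcount, ALTcount) ~ SNP_ID + Sample_ID,
                   data = toy, FUN = sum)
pairs$total     <- pairs$REFcount + pairs$ALTcount  # assumed depth definition
pairs$alt_ratio <- pairs$ALTcount / pairs$total     # assumed ratio definition

# Keep pairs that pass both the depth and the allele-ratio thresholds.
keep <- pairs$total >= SNP_DETECTION_DP &
  pairs$alt_ratio >= SNP_DETECTION_MAF &
  pairs$alt_ratio <= 1 - SNP_DETECTION_MAF
detected <- pairs[keep, ]
```

In this toy data only snp1 is retained: snp2 has enough depth but an allele ratio below 0.1, and snp3 falls short of the depth threshold.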

SAMPLE_NUM_THR

Threshold for the sample size used in the calculation of the ratio of expression from Xi. Genes evaluated in at least SAMPLE_NUM_THR samples are used for the calculation of the ratio of expression from Xi. Default: 3.

HE_allele_cell_number_THR

Threshold for the number of cells expressing reference SNPs. Candidate reference SNPs expressed in at least "HE_allele_cell_number_THR" cells are used for the analysis. Default: 50.

QC_total_allele_THR

Threshold for the total allele count (depth) of the SNP used for calculating the ratio of expression from Xi. Note that this count is calculated with cells successfully assigned to the group based on the inactivated X chromosome. This filter is applied in the final step of scLinax and differs from "SNP_DETECTION_DP". Default: 10.

Value

A dataframe (tibble) with the following columns:

  • Gene: Gene name

  • Mean_AR_target, SD_AR_target: Mean and standard deviation of the ratio of expression from Xi (AR) across other candidate reference genes when the SNPs on the gene were used as references

  • Mean_AR_reference, SD_AR_reference: Mean and standard deviation of the ratio of expression from Xi for the gene when SNPs on other candidate reference genes were used as references

  • Mean_Total_allele_target, SD_Total_allele_target: Mean and standard deviation of the total allele count across the data points used to calculate the target ARs defined above

  • Mean_Total_allele_reference, SD_Total_allele_reference: Mean and standard deviation of the total allele count across the data points used to calculate the reference ARs defined above

  • Sample_N_target: Number of samples used to calculate the target ARs defined above

  • Sample_N_reference: Number of samples used to calculate the reference ARs defined above

  • Count_target: Number of data points used to calculate the target ARs defined above

  • Count_reference: Number of data points used to calculate the reference ARs defined above