Run scLinaX analysis for all samples

Usage

run_scLinaX(
  ASE_df,
  XCI_ref,
  QCREF,
  Inactive_Gene_ratio_THR = 0.05,
  SNP_DETECTION_DP = 30,
  SNP_DETECTION_MAF = 0.1,
  QC_total_allele_THR = 10,
  HE_allele_cell_number_THR = 50,
  REMOVE_ESCAPE = TRUE,
  PVAL_THR = 0.01,
  RHO_THR = 0.5
)

Arguments

ASE_df

A dataframe (tibble) containing single-cell allele-specific expression (scASE) data for all samples. This dataframe should have the following columns:

  • SNP_ID: SNP identifier

  • POS: Genomic position of the SNP (GRCh38)

  • REF: Reference allele of the SNP (A, T, G, or C)

  • ALT: Alternative allele of the SNP (A, T, G, or C)

  • cell_barcode: Cell barcode

  • REFcount: Allelic expression of the reference allele

  • ALTcount: Allelic expression of the alternative allele

  • OTHcount: Allelic expression of other alleles (neither REF nor ALT)

  • Sample_ID: Sample ID

  • Gene: Gene annotated to the SNP
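As a rough illustration of the expected layout, the sketch below builds a toy ASE_df in base R and checks that the required columns are present. All values (SNP IDs, positions, barcodes, gene names) are hypothetical placeholders, and the check itself is not part of scLinaX; a base data.frame is used only to keep the example dependency-free.

```r
# Toy scASE table; every value here is an illustrative placeholder
ASE_df <- data.frame(
  SNP_ID       = c("snp1", "snp1", "snp2"),
  POS          = c(1000L, 1000L, 2000L),
  REF          = c("A", "A", "G"),
  ALT          = c("T", "T", "C"),
  cell_barcode = c("AAAC-1", "AAAG-1", "AAAC-1"),
  REFcount     = c(3L, 0L, 1L),
  ALTcount     = c(0L, 2L, 1L),
  OTHcount     = c(0L, 0L, 0L),
  Sample_ID    = "S1",
  Gene         = c("GENE_A", "GENE_A", "GENE_B"),
  stringsAsFactors = FALSE
)

# Columns listed in the ASE_df argument description
required_cols <- c("SNP_ID", "POS", "REF", "ALT", "cell_barcode",
                   "REFcount", "ALTcount", "OTHcount", "Sample_ID", "Gene")
missing_cols <- setdiff(required_cols, colnames(ASE_df))

# REF and ALT are expected to be single nucleotides
bases <- c("A", "T", "G", "C")
alleles_ok <- all(ASE_df$REF %in% bases) && all(ASE_df$ALT %in% bases)
```

An empty missing_cols and alleles_ok equal to TRUE suggest the table at least has the documented shape.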

XCI_ref

A dataframe (tibble) containing X chromosome inactivation status. This dataframe should have the following two columns:

  • Gene: Gene name

  • XCI_status: XCI status (escape, variable, or inactive)
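A minimal sketch of an XCI_ref table, assuming placeholder gene names (GENE_A, GENE_B, GENE_C) rather than real annotations. The validation step is a suggestion for catching unexpected status labels before running the analysis, not something the package is known to require.

```r
# Toy XCI reference; gene names and statuses are placeholders
XCI_ref <- data.frame(
  Gene       = c("GENE_A", "GENE_B", "GENE_C"),
  XCI_status = c("inactive", "escape", "variable"),
  stringsAsFactors = FALSE
)

# Status labels named in the argument description
allowed_status <- c("escape", "variable", "inactive")

# Rows whose status is not one of the documented labels
bad_rows <- XCI_ref[!XCI_ref$XCI_status %in% allowed_status, ]
```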

QCREF

A dataframe generated for reference gene QC (please see the run_RefGeneQC function). QCREF should have the following three columns:

  • Gene

  • Mean_AR_target

  • Mean_AR_reference

Inactive_Gene_ratio_THR

Threshold for the ratio of inactive genes. Default: 0.05.

SNP_DETECTION_DP

Threshold for the total allele count (depth) of the SNP in the scASE data. SNP–Sample pairs with a total allele count of at least "SNP_DETECTION_DP" are used for the analysis. Default: 30.

SNP_DETECTION_MAF

Threshold for the minor allele ratio of the SNP in the scASE data. SNP–Sample pairs with a minor allele ratio between "SNP_DETECTION_MAF" and "1 - SNP_DETECTION_MAF" are used for the analysis. Default: 0.1.
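The two SNP detection thresholds can be pictured with the following base R sketch. It is an illustration of the filter as described, not the package's internal code: it assumes the total allele count is REFcount + ALTcount summed over cells, and that the allele ratio is ALT / (REF + ALT). The counts are made up.

```r
# Hypothetical per-cell counts for three SNPs in one sample
ase <- data.frame(
  SNP_ID    = c("snp1", "snp1", "snp2", "snp2", "snp3", "snp3"),
  Sample_ID = "S1",
  REFcount  = c(20, 15, 30, 28, 5, 4),
  ALTcount  = c(10, 12, 1, 0, 3, 2),
  stringsAsFactors = FALSE
)

SNP_DETECTION_DP  <- 30
SNP_DETECTION_MAF <- 0.1

# Sum allele counts over cells for each SNP-Sample pair
pb <- aggregate(cbind(REFcount, ALTcount) ~ SNP_ID + Sample_ID,
                data = ase, FUN = sum)
pb$total     <- pb$REFcount + pb$ALTcount   # assumed definition of depth
pb$alt_ratio <- pb$ALTcount / pb$total      # assumed definition of the ratio

# Keep pairs that pass both the depth and the allele-ratio window
keep <- pb$total >= SNP_DETECTION_DP &
  pb$alt_ratio >= SNP_DETECTION_MAF &
  pb$alt_ratio <= 1 - SNP_DETECTION_MAF
pb_kept <- pb[keep, ]
```

In this toy input, snp2 has enough depth but is almost entirely reference, and snp3 is balanced but too shallow, so only snp1 passes.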

QC_total_allele_THR

Threshold for the total allele count (depth) of the SNP used for calculating the ratio of expression from Xi. Note that this count is calculated from cells successfully assigned to a group based on which X chromosome is inactivated. This filter is applied in the final step of scLinaX and differs from "SNP_DETECTION_DP". Default: 10.

HE_allele_cell_number_THR

Threshold for the number of cells expressing reference SNPs. Candidate reference SNPs expressed in at least "HE_allele_cell_number_THR" cells are used for the analysis. Default: 50.
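A small sketch of this cell-number filter, under the assumption that a cell "expresses" a SNP when it has at least one REF or ALT read. The data and the counting logic are illustrative and may differ from what scLinaX does internally.

```r
# Hypothetical data: snpA is seen in 60 cells, snpB in 20 cells
cells <- data.frame(
  SNP_ID       = c(rep("snpA", 60), rep("snpB", 20)),
  cell_barcode = c(paste0("cell", 1:60), paste0("cell", 1:20)),
  REFcount     = 1,
  ALTcount     = 0,
  stringsAsFactors = FALSE
)

HE_allele_cell_number_THR <- 50

# Cells with at least one read at the SNP (assumed meaning of "expressing")
expressed <- cells[cells$REFcount + cells$ALTcount > 0, ]

# Number of distinct expressing cells per SNP
n_cells <- tapply(expressed$cell_barcode, expressed$SNP_ID,
                  function(b) length(unique(b)))

# Candidate reference SNPs that reach the threshold
candidates <- names(n_cells)[n_cells >= HE_allele_cell_number_THR]
```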

REMOVE_ESCAPE

Boolean:

  • TRUE (default): Remove ASE profiles of SNPs on escapee genes when calculating correlations between pseudobulk ASE profiles.

  • FALSE: Include ASE profiles of SNPs on escapee genes when calculating correlations between pseudobulk ASE profiles.

PVAL_THR

Threshold for P-values in the correlation analysis of pseudobulk profiles generated for reference SNPs. Default: 0.01.

RHO_THR

Threshold for absolute correlation coefficients in the correlation analysis of pseudobulk profiles generated for reference SNPs. Default: 0.5.
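How PVAL_THR and RHO_THR might act together can be sketched with base R's cor.test. The two profiles below are invented allele ratios, and the strict comparisons (< and >) are an assumption; the package may use different inequalities or additional criteria.

```r
PVAL_THR <- 0.01
RHO_THR  <- 0.5

# Hypothetical pseudobulk allele ratios for two reference SNPs across 10 SNPs
profile_1 <- c(0.05, 0.12, 0.20, 0.33, 0.41, 0.58, 0.66, 0.79, 0.88, 0.95)
profile_2 <- 1 - profile_1  # perfectly anti-correlated by construction

# Spearman rank correlation between the two profiles
ct  <- cor.test(profile_1, profile_2, method = "spearman")
rho <- unname(ct$estimate)

# A pair passes when the P-value is small and |rho| is large
passes <- ct$p.value < PVAL_THR && abs(rho) > RHO_THR
```

Because the threshold is on the absolute coefficient, strongly negative correlations pass as well as strongly positive ones.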

Value

A list of objects ($result, $raw_exp_result, $Fail_list) representing the results of scLinaX. Additional objects representing intermediate results of scLinaX are also included in the list ($clustering_result, $Max_Num_Table_result, $df_snp_summary, $phasing_result). While users may not typically need to use these intermediate objects directly, they are provided for reference.

  • $result: A per-sample raw result of scLinaX. This data has redundancy and should be summarized using the summarize_scLinaX function before analysis.

    The dataframe ($result) has the following columns:

    • Sample_ID: Sample ID

    • SNP_ID: SNP identifier

    • CHR: PAR or nonPAR

    • POS: Genomic position of the SNP (GRCh38)

    • REF: Reference allele of the SNP (A, T, G, or C)

    • ALT: Alternative allele of the SNP (A, T, G, or C)

    • Gene: Gene name

    • XCI_status: XCI status (escape, variable, inactive, or unknown)

    • Gene_class: XCI status combined with CHR information (PAR1, nonPAR_escape, nonPAR_variable, nonPAR_inactive, nonPAR_unknown, or PAR2)

    • Used_as_refGene: Whether the Gene was used as a reference gene (Yes or No)

    • Used_as_refSNP: Whether the SNP_ID was used as a reference SNP (Yes or No)

    • Total_A_allele, Total_B_allele: Total allele count of allele A and B

    • Total_allele: Total allele count of the SNP

    • Expressing_cells: Number of cells expressing the SNP

    • minor_allele_ratio: Ratio of the expression from the allele (A, B) with lower expression

    • Reference_Gene, Reference_SNP: A list of genes used as reference genes and SNPs

    • Num_Reference_Gene, Num_Reference_SNP: Number of reference genes and SNPs

    • Reference_Cell_Count: Number of cells showing mono-allelic expression of reference SNPs

    • Num_A_cells, Num_B_cells: Number of cells showing mono-allelic expression of A and B alleles of reference SNPs

    • Num_Fail_cells: Number of cells showing bi-allelic expression of reference SNPs (should be removed from the analysis)

  • $raw_exp_result: A per-sample raw result of scLinaX at the single-cell level (rows carry a cell_barcode).

    The dataframe ($raw_exp_result) has the following columns:

    • cell_barcode: Cell barcode

    • Sample_ID: Sample ID

    • SNP_ID: SNP identifier

    • CHR: PAR or nonPAR

    • POS: Genomic position of the SNP (GRCh38)

    • REF: Reference allele of the SNP (A, T, G, or C)

    • ALT: Alternative allele of the SNP (A, T, G, or C)

    • REFcount: Allelic expression of the reference allele

    • ALTcount: Allelic expression of the alternative allele

    • OTHcount: Allelic expression of the other allele

    • Gene: Gene name

    • XCI_status: XCI status (escape, variable, inactive, or unknown)

    • Gene_class: XCI status combined with CHR information (PAR1, nonPAR_escape, nonPAR_variable, nonPAR_inactive, nonPAR_unknown, or PAR2)

    • Used_as_refGene: Whether the Gene was used as a reference gene (Yes or No)

    • Used_as_refSNP: Whether the SNP_ID was used as a reference SNP (Yes or No)

    • Xa: Which allele is on the active X chromosome (Allele_A or Allele_B)

    • Reference_Gene, Reference_SNP: A list of genes used as reference genes and SNPs

  • $Fail_list: List of samples for which scLinaX analysis failed.

  • $clustering_result: Result of the grouping of reference SNPs. Cluster names combine the Sample_ID, the word "cluster", and the cluster ID.

  • $Max_Num_Table_result: A dataframe describing the number of cells.

  • $df_snp_summary: The dataframe from which the Spearman correlations between pseudobulk ASE profiles are calculated.

  • $phasing_result: Result of the Spearman correlation analysis for pseudobulk ASE profiles.
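To make the minor_allele_ratio column of $result concrete, the sketch below recomputes it from Total_A_allele and Total_B_allele on a mock table. It assumes that Total_allele equals the sum of the two allele counts and that the ratio is the smaller count divided by that total; the numbers are invented and the table is not real scLinaX output.

```r
# Mock rows shaped like a few columns of $result (values are illustrative)
res <- data.frame(
  SNP_ID         = c("snp1", "snp2"),
  Total_A_allele = c(90, 40),
  Total_B_allele = c(10, 60),
  stringsAsFactors = FALSE
)

# Assumed relationship between the count columns
res$Total_allele <- res$Total_A_allele + res$Total_B_allele

# Share of expression from whichever allele (A or B) is lower
res$minor_allele_ratio <-
  pmin(res$Total_A_allele, res$Total_B_allele) / res$Total_allele
```

Under these assumptions the ratio ranges from 0 (fully mono-allelic) to 0.5 (perfectly balanced expression of both alleles).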