Tools for genetics and genomics: Gene expression profiling

Literature review current through: Jan 2024.
This topic last updated: Oct 03, 2023.

INTRODUCTION — The genetic basis for disease is determined by the specific variants within genes and noncoding deoxyribonucleic acid (DNA). Most genetic sequences are inherited, although some arise de novo. Gene expression is triggered through interactions with environmental signals, often in a cell-type or timing-specific manner, ultimately leading to synthesis of specific proteins.

Since messenger ribonucleic acid (mRNA) represents the functional bridge between DNA and protein, alterations in mRNA may serve as markers for the activation (expression) or inhibition (repression) of a particular gene. Analyses of gene expression, referred to as gene expression profiling, can be clinically useful for disease classification, diagnosis, prognosis, and tailoring treatment to underlying genetic or genomic determinants of pharmacologic response. It is also used as a research tool to identify the role of different genes in normal functioning or pathophysiology of disease.

This topic focuses on the role of mRNA in the cell, platforms for profiling mRNA expression, the challenges in interpreting the data from these analyses, and the emerging clinical applications of gene expression measurements.

Other molecular tools for evaluating genetic disorders are presented in separate topic reviews:

Cytogenetics – (See "Tools for genetics and genomics: Cytogenetics and molecular genetics".)

Polymerase chain reaction (PCR) – (See "Polymerase chain reaction (PCR)".)

Next-generation sequencing – (See "Next-generation DNA sequencing (NGS): Principles and clinical applications".)

DEFINITIONS

mRNA — According to the central dogma of biology, ribonucleic acid (RNA) is transcribed from a deoxyribonucleic acid (DNA) template; messenger RNA (mRNA) is then translated into protein (figure 1). Transcription and translation underlie gene expression. (See "Basic genetics concepts: DNA regulation and gene expression", section on 'Gene expression'.)

mRNA accounts for approximately 1 percent of the total RNA in a cell [1]. It is transcribed from approximately 20,000 to 25,000 protein-coding genes in the human genome [2].

After mRNA is transcribed from DNA, it typically undergoes further modifications including the addition of a methyl-guanosine cap (5' cap), the addition of a series of adenines to the 3' end of RNA (poly-A tail), and the splicing out of introns (figure 2) [3]. mRNA is then transported from the nucleus to the cytoplasm where it is translated into protein. mRNA serves as a transient intermediate between DNA and protein and is degraded in minutes to hours [3].

At any point in time, approximately 3 to 5 percent of genes are active in a particular cell, even though all cells have the same information contained in their DNA. Most of the genome is selectively repressed, a property that is governed by the regulation of gene expression, mostly at the level of mRNA production from DNA (transcription).

In response to a cellular perturbation, changes in gene expression take place that result in the expression of hundreds of gene products and the suppression of others. This molecular heterogeneity can affect when and how a disease presents clinically in an individual with genetic predisposition to a condition, and it can determine how a given disease will respond to specific treatments in different individuals.

Noncoding RNAs — mRNA is the RNA that codes for and is translated into protein. (See 'mRNA' above.)

In addition to mRNA, there are several other classes of RNA that are not translated into protein and serve other functions in the cell. These noncoding RNAs can be measured using variations of the same technologies used to measure mRNA, and they may also have relevance to human disease.

Classes of noncoding RNAs include the following [3-8]:

Transfer RNAs (tRNAs)

Ribosomal RNAs (rRNAs)

Small nuclear RNAs (snRNAs)

Small nucleolar RNAs (snoRNAs)

MicroRNAs (miRNAs)

Piwi-interacting RNAs (piRNAs)

Long noncoding RNAs (lncRNAs), which include the subclass large intergenic noncoding RNAs (lincRNAs)

cDNA — Complementary DNA (cDNA) is a DNA copy of an mRNA molecule.

Because mRNA only contains the coding regions of the DNA and not the introns, cDNA is a continuous piece of coding DNA, whereas genomic DNA that exists in vivo contains exons with interspersed introns.

cDNA is synthesized in a laboratory through an in vitro reverse transcription reaction. mRNA is susceptible to degradation by ribonucleases (RNases), and reverse transcription to cDNA is necessary to create a more stable analyte for genome-wide gene expression profiling. (See 'Technical considerations' below.)

Array — An array is an orderly arrangement of items.

In the context of genome-wide expression profiling, array is shorthand for microarray. Microarrays are one method for measuring gene expression on a genome-wide level. Microarrays are the physical substrate that is arrayed with microscopic quantities of nucleic acid probes. These nucleic acid probes each have known sequences and are laid out in known locations on the physical substrate. The nucleic acid probes bind complementary cDNA from a processed sample.

Gene expression — Gene expression is the process by which the information encoded in DNA is turned into mRNA. The expression of genes allows cells to adapt their phenotype by turning on (activating) or turning off (repressing) specific functions that the DNA encodes. (See "Basic genetics concepts: DNA regulation and gene expression", section on 'Gene expression'.)

Gene expression is measured by assaying mRNA.

Genome-wide expression profiling versus genome-wide genotyping and GWAS

Genome-wide expression profiling is the process of measuring the expression of all genes in the genome. (See 'Profiling genome-wide gene expression' below.)

Genome-wide genotyping is the process of measuring the allelic status (ie, the genotype) at hundreds of thousands (or millions) of single nucleotide polymorphisms (SNPs) across the genome.

Genome-wide association studies (GWAS) are genetic mapping studies that assess for evidence of association between genetic variants and heritable traits across the entire genome. They individually test for differences in genotype frequencies between trait-expressing individuals (cases) and individuals who do not express the trait (controls) across hundreds of thousands of common SNPs.

RNA sequencing platforms can provide genotype information for polymorphisms in mRNA, but this information cannot reliably be used to perform GWAS. (See 'Transcriptome sequencing (RNA-seq)' below.)

Transcriptome — The transcriptome is the full spectrum of mRNA expressed in a cell or tissue.

METHODS

Technical considerations

RNA degradation – A challenge in measuring RNA relates to the susceptibility of RNAs to degradation by ribonucleases (RNases). RNases are present in saliva, sweat, and other body secretions, and it can be very difficult for laboratory personnel to eliminate or control RNase contamination of laboratory assays [9]. Creation of the corresponding cDNA helps to mitigate this problem. (See 'cDNA' above.)

Choice of detection method – Methods of RNA detection take advantage of the single-stranded structure of RNA and its complementarity to the DNA from which it was transcribed. The method of RNA detection used requires consideration of the scientific question of interest, samples to be studied, experimental design, technical expertise, and availability of technologies.

Profiling small numbers of genes — While microarray and sequencing technology may capture gene expression on a whole genome level, additional methods may be used to measure the expression of a small number of genes; the table compares these methods (table 1).

Some of these methods have fallen out of favor with advancing technology but others, such as polymerase chain reaction (PCR), are commonly used:

Real-time (quantitative) reverse-transcription polymerase chain reaction (RT-PCR) (see 'Real-time RT-PCR' below and "Polymerase chain reaction (PCR)", section on 'PCR process')

RNA in situ hybridization (see 'RNA in situ hybridization' below)

Custom microarrays (see 'Custom microarrays' below)

Other methods are no longer commonly used:

Northern blot (see 'Northern blot (mainly of historical interest)' below)

Ribonuclease protection assay (see 'Ribonuclease protection assay (mainly of historical interest)' below)

These methods were originally developed using samples prepared from bulk tissues or samples composed of multiple cell types. Expression profiling at the single-cell level makes it possible to analyze differences in expression profiles of different cells. (See 'Single-cell RNA sequencing' below.)

Real-time RT-PCR — Real-time reverse transcription polymerase chain reaction (RT-PCR) is a flexible and relatively inexpensive approach that can be used to assay small or large numbers of genes from a single sample [10]. (See "Polymerase chain reaction (PCR)", section on 'Terminology'.)

PCR is commonly used in clinical laboratories. (See "Polymerase chain reaction (PCR)", section on 'Clinical applications'.)

Examples include:

Infectious diseases – Testing for viruses such as SARS-CoV-2, influenza, and hepatitis C.

Cancer – Predictive markers such as for prostate cancer. (See "Active surveillance for males with clinically localized prostate cancer", section on 'Tissue-based genomic prognostic markers'.)

After isolating RNA from a sample, complementary DNAs (cDNAs) are synthesized by reverse transcription with an RNA-dependent DNA polymerase. This cDNA mixture is then combined with a DNA-dependent DNA polymerase and fluorescently-labeled oligonucleotide primers [11]. These primers are short sequences of nucleotides complementary to a portion of the cDNA and allow amplification. Fluorescence increases as the cDNA of interest is amplified with PCR (figure 3). The fluorescence intensity is monitored and the total number of PCR cycles is counted [1]. The point at which the PCR cycler can distinguish fluorescence related to gene amplification from background is the cycle threshold; this number can be used to estimate the relative starting quantity of the RNA of interest [10]. Careful primer selection is required to prevent amplification of related genes [1].
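
As a worked illustration of how cycle thresholds can be turned into a relative starting quantity, the sketch below uses the commonly used comparative Ct (2^-ΔΔCt) calculation. This specific calculation is not described in the text above; it is shown under the assumptions that the amplified product roughly doubles with each cycle and that a stably expressed reference gene is available. The function name and all Ct values are hypothetical.

```python
def relative_expression(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of a target gene in a test sample relative to a control sample.

    Assumes the product doubles every cycle, so a lower cycle threshold (Ct)
    means more starting template.
    """
    # Normalize the target gene to the reference gene within each sample
    delta_test = ct_target_test - ct_ref_test
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl
    # Compare the normalized values between samples
    delta_delta = delta_test - delta_ctrl
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target crosses threshold 3 cycles earlier in the test sample
fold = relative_expression(22.0, 18.0, 25.0, 18.0)
print(fold)  # 8.0
```

In this example, a three-cycle difference corresponds to an eightfold difference in estimated starting quantity, which is why small shifts in cycle threshold can reflect large differences in expression.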

RNA in situ hybridization — In situ hybridization (ISH) uses a nucleic acid probe to detect nucleic acids in a tissue section. ISH can localize the RNA of interest at the anatomic or cellular level. The tissue section is fixed to preserve tissue morphology and nucleic acid integrity [12,13]. The sample is then treated with proteases to eliminate proteins bound to the RNA of interest [12,13]. A labeled probe is hybridized to the sample and detected using autoradiography or chemiluminescence [13]. In situ hybridization using a fluorescently labeled probe is also called fluorescence in situ hybridization (FISH) [14].

The use of FISH to detect variations in DNA is discussed separately. (See "Tools for genetics and genomics: Cytogenetics and molecular genetics", section on 'Fluorescence in situ hybridization'.)

Custom microarrays — Microarrays can be used for whole genome measurements; custom microarrays can be constructed to analyze expression of a subset of the genome consisting of several hundred to several thousand genes. (See 'Microarrays (oligonucleotide arrays)' below.)

Generally, custom microarrays are used to measure specific sets of RNA that are not as easily assessed using whole transcriptome methods. For example, researchers may choose to use a custom microarray to measure tissue-specific noncoding RNAs from a particular tissue of interest [15-17].

Examples of microarrays in clinical use are discussed below. (See 'Clinical use' below.)

Custom microarray methods typically use the same technology as those used for whole genome analyses, in which RNA isolated from one sample is processed on one microarray. (See 'Microarrays (oligonucleotide arrays)' below.)

Digital expression profiling — Digital expression profiling is a newer adaptation of expression profiling that provides an absolute count of the number of transcripts per gene, rather than abundance relative to other genes. Assays for certain cancer types (breast, pancreatic) are available or under study.

Northern blot (mainly of historical interest) — Northern blots allow the determination of both the presence of an RNA molecule and its size [18]. Northern blotting is not used clinically, and its use in research has declined due to availability of less labor-intensive methods.

In Northern blotting, RNA molecules are first separated using gel electrophoresis and then transferred and cross-linked to a nylon membrane. The RNA of interest is detected by incubating the membrane with a labeled single-stranded DNA probe that is complementary to this RNA. Probes bound to the RNA of interest can then be detected using chemiluminescence or autoradiography.

Ribonuclease protection assay (mainly of historical interest) — Whereas the Northern blot uses complementary DNA probes, the ribonuclease protection assay (RPA) uses complementary RNA riboprobes (single-stranded radiolabeled antisense RNA probes) that are complementary to the RNA of interest [1]. RPA is not used clinically, and its use in research has declined due to availability of less labor-intensive methods.

In RPA, the riboprobe is incubated with sample RNA to form double-stranded RNA complexes, as well as ribonucleases to degrade excess unbound single-stranded RNA from both the sample and probe. The remaining double-stranded RNA complexes are size-separated by electrophoresis and detected by autoradiography.

Profiling genome-wide gene expression — Platforms for profiling gene expression take advantage of increased knowledge of the sequence of the human genome and require smaller quantities of starting RNA. The table (table 1) summarizes available methods, which include:

Microarrays (oligonucleotide arrays) – (See 'Microarrays (oligonucleotide arrays)' below.)

RNA (transcriptome) sequencing – (See 'Transcriptome sequencing (RNA-seq)' below.)

In situ and spatial transcriptomic profiling – (See 'In situ and spatial transcriptomics' below.)

Microarrays and sequencing can both assay large numbers of genes with relatively high throughput. As the cost of sequencing has declined, transcriptome sequencing has surpassed microarrays as the platform most often used for gene expression profiling of clinical specimens.

Microarrays (oligonucleotide arrays) — In this method short probes are synthesized directly onto a slide, allowing for inclusion of a sufficient number of probes to assay RNA expression at a genome-wide level [19,20]. Depending on the commercial manufacturer, probes vary from approximately 20 to 60 base pairs in length. Several types of arrays are commercially available.

Sample preparation begins with the isolation of RNA from the tissue of interest, resulting in an extraction that contains all of the genes transcribed in the tissue at the time the RNA is isolated. The RNA is then reverse transcribed into cDNA and amplified using PCR. Finally, a biotin label is incorporated through an in vitro transcription process, which converts cDNA into labeled cRNA.

A single sample's labeled cRNA is applied to each array. Hybridization occurs between the labeled cRNA from the sample and complementary probes on the array. This is followed by binding to an avidin-conjugated fluorophore and a washing step that removes any unbound material. The fluorophore is excited by a laser scanner coupled to a computer that captures the fluorescence signals from probes hybridized to the cRNA of interest, thus enabling the detection of the expression of thousands of genes simultaneously.

In general, the greater the amount of mRNA from a particular gene (the higher that gene's expression), the more fluorescently-labeled material corresponding to that gene will bind to complementary probes on the array. Background fluorescence or nonspecific binding may limit detection of transcripts with very low expression levels. Probe-based detection for gene expression limits analysis to genes that are known.

Microarray-based assays were the first widely used platforms for whole-transcriptome profiling research, but they have been supplanted by more flexible and robust sequence-based platforms such as RNA sequencing. (See 'Transcriptome sequencing (RNA-seq)' below.)

However, microarray-based assays composed of select sets of clinically informative genes remain in clinical use. (See 'Custom microarrays' above.)

Transcriptome sequencing (RNA-seq) — An alternative for measuring gene expression is the direct sequencing and quantification of RNA molecules. Other names for transcriptome sequencing include:

RNA sequencing (RNA-seq)

Massively parallel RNA sequencing

Bulk RNA sequencing

Next-generation RNA sequencing (NGS)

Unlike microarray-based platforms (for which probes must be designed to hybridize to prespecified targets), sequencing-based profiling does not require prior knowledge of the sequences to be assayed. Hence, transcriptome sequencing allows improved detection of low-abundance transcripts, as well as detection of novel transcripts and polymorphisms within a transcript's sequence.

Several commercial platforms are available. The details of each system vary. In general, the sample is prepared so that many sequencing reactions can occur simultaneously and can yield millions of RNA sequence reads obtained by laser scanning [21]. Advances in sample processing techniques also allow for preservation of the identity of the sense and antisense strands [22].

Next-generation DNA sequencing is discussed in more detail separately. (See "Next-generation DNA sequencing (NGS): Principles and clinical applications".)

Single-cell RNA sequencing — Bulk RNA sequencing is performed on the pool of RNA extracted from a collection of cells. The resulting expression profile reflects the average transcript abundances across all cells. Thus, when analyzing heterogeneous samples with an admixture of various cell types, it is not clear whether gene expression differences observed between healthy tissue and diseased tissue are due to changes in the relative abundance of cell types with differing patterns of gene expression or to actual changes in the gene expression levels in a specific cell type.

In contrast, single-cell RNA sequencing (scRNA-seq) produces the transcriptome from a single cell. This approach overcomes the admixture-related limitations of bulk sequencing and allows for more accurate estimation of within-cell gene expression dynamics. scRNA-seq provides unique insights into how individual cells and cell types contribute to human health and disease beyond that which is possible with bulk sequencing. scRNA-seq is used almost exclusively on a genome-wide basis.
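
The admixture limitation of bulk sequencing can be illustrated with a small numeric sketch. It models the bulk signal for one gene as the abundance-weighted average of per-cell-type expression, which is a simplification; the cell types, fractions, and expression values are hypothetical.

```python
def bulk_expression(cell_fractions, per_cell_expression):
    """Bulk signal modeled as the abundance-weighted average across cell types."""
    return sum(cell_fractions[c] * per_cell_expression[c] for c in cell_fractions)


# Hypothetical per-cell expression of one gene; identical in healthy and diseased tissue
expression = {"type_a": 10.0, "type_b": 2.0}

# Only the mix of cell types differs between the two tissues
healthy = bulk_expression({"type_a": 0.25, "type_b": 0.75}, expression)
diseased = bulk_expression({"type_a": 0.75, "type_b": 0.25}, expression)

print(healthy, diseased)  # 4.0 8.0
```

Here the bulk measurement doubles even though no cell type changed its expression of the gene, which is the ambiguity that single-cell profiling is intended to resolve.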

Multiple protocols for scRNA-seq are available. In general, these systems start with the isolation of single cells. Separation methods include [23-25]:

Serial or microwell dilution

Fluorescence-activated cell sorting (FACS)

Automated microfluidics-based technologies  

This is often followed by a confirmatory procedure such as microscopy to ensure that single cells were indeed isolated. This helps to prevent spurious conclusions based on the evaluation of chambers that are either empty or contain multiple cells.

After separation, cells are lysed, the RNA fraction is converted to cDNA by reverse transcription, and the cDNA is amplified and sequenced [23,25].

Transcript barcoding tags the cell of origin with a unique nucleic acid sequence for identification. Barcoding, microfluidics, and other microwell-plate-based technologies allow for parallel sequencing of large numbers of individual cells [26-28].
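
The role of barcodes in parallel sequencing can be sketched as a grouping step: reads from many cells are sequenced together and then assigned back to their cell of origin. The sketch below assumes, purely for illustration, that each read begins with a fixed-length cell barcode; real protocols differ in barcode length and position, and the sequences shown are hypothetical.

```python
from collections import defaultdict

BARCODE_LENGTH = 4  # hypothetical; real protocols use longer barcodes


def group_reads_by_cell(reads, barcode_length=BARCODE_LENGTH):
    """Split each read into (barcode, transcript sequence) and group by barcode."""
    cells = defaultdict(list)
    for read in reads:
        barcode = read[:barcode_length]
        sequence = read[barcode_length:]
        cells[barcode].append(sequence)
    return dict(cells)


# Hypothetical pooled reads from two cells
reads = ["AAAAGATTACA", "CCCCTTGACCA", "AAAACGGTTAA"]
by_cell = group_reads_by_cell(reads)
print(by_cell)
```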

scRNA-seq measures transcript abundance from the cell's cytoplasm and nucleus; it is highly dependent on tissue quality and is best used on fresh samples or those preserved by flash freezing. Because RNA in the nucleus is more stable, single-nuclear RNA-seq (snRNA-seq) has been developed to use cell nuclei (rather than whole cells); this method may have greater applicability for clinical use.

In situ and spatial transcriptomics — Spatial transcriptomics refers to expression profiling in different regions of a tissue, as a means of studying expression in local microenvironments [29]. Laser capture microdissection can be used to isolate specific regions within a tissue or tumor to investigate how changes differ from region to region. In situ hybridization of expression probes allows visualization of tissue histology alongside gene expression. These methods can be combined with scRNA-seq. (See 'Single-cell RNA sequencing' above.)

These methods are used for research into disease mechanisms, especially in cancer biology [30,31].

DATA ANALYSIS AND INTERPRETATION — Typically, investigators involved with transcriptome profiling experiments are interested in comparing gene expression across different conditions [32]. While there are many approaches to the data analysis in order to accomplish this goal, there are generally several analytical steps that must first be taken (figure 4).

There are three general considerations:

Preprocessing of raw data (see 'Preprocessing of raw data' below)

Data storage and analysis (see 'Data storage' below and 'Data analysis' below)

Biologic interpretation (see 'Interpretation' below)

Preprocessing of raw data — Many statistical tests used to analyze genomic data require that the data exhibit Gaussian/normal or Poisson distributions with few outlying data points. Samples (or specific genes) that violate these properties often have undue influence on downstream analysis, frequently resulting in skewed, spurious findings that cannot be replicated. Preprocessing refers to the process of transforming raw data to a format suitable for statistical analysis.

Microarray-based and sequence-based approaches differ in their quantification of transcript abundance, but the general principles of data preprocessing are the same, consisting broadly of two approaches: normalization and quality assessment of the raw data.

Normalization transforms the raw data to reduce the impact of stochastic (random) variation and technical variation due to the platform, operator, or sample batch. This allows the software to generate per gene and per sample distributions that are more suitable for statistical inference testing. Quality assessment eliminates low-quality microarrays, poorly aligning sequencing reads, or outlier measurements, which, if not addressed, can lead to biased results. Sequence quality can impact normalization methods and, similarly, normalization can impact quality assessment; thus, quality assessment is typically performed both before and after normalization.

Quantification and normalization of expression — For microarray data analysis, each microarray can be considered a separate experiment that contains slightly different amounts of starting RNA and different labeling efficiencies [32]. Data normalization adjusts the fluorescence intensities representing the amount of RNA bound to each probe so that these intensities are comparable across different arrays. The resulting values are intensities that are normalized to one of several reference distributions, depending on the normalization method used.

There are several methods for normalizing microarray data, including:

Scaling – Adjusts intensities by a constant factor so that the average expression level across microarrays is similar.

Quantile normalization – Adjusts the distribution of intensities across microarrays. As illustrated in the figure (figure 5), this is accomplished by ranking the probe intensities from highest to lowest for each array. A numerical value is assigned to represent this intensity on an individual array based on the behavior across all arrays and the rank of that probe on the individual array.

LOWESS – Locally weighted scatterplot smoothing (LOWESS) adjusts the brightness or darkness of different fluorescent labels for two-color array experiments.
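
Quantile normalization as described above can be sketched in a few lines. This is a minimal illustration, assuming that the value assigned at each rank is the mean intensity at that rank across arrays and that tied intensities need no special handling; the function name and intensities are hypothetical.

```python
def quantile_normalize(arrays):
    """Give every array the same intensity distribution.

    arrays: list of equal-length lists of probe intensities, one list per array.
    """
    n_probes = len(arrays[0])
    # Rank each array's intensities from highest to lowest
    sorted_arrays = [sorted(a, reverse=True) for a in arrays]
    # Reference value for each rank: mean intensity at that rank across arrays
    rank_means = [
        sum(s[rank] for s in sorted_arrays) / len(arrays) for rank in range(n_probes)
    ]
    normalized = []
    for a in arrays:
        # order[rank] is the index of the probe holding that rank on this array
        order = sorted(range(n_probes), key=lambda i: a[i], reverse=True)
        result = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            result[probe_index] = rank_means[rank]
        normalized.append(result)
    return normalized


# Hypothetical intensities for three probes on two arrays
normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
print(normalized)  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization, both arrays contain the same set of values (5.5, 3.5, 1.5), while each probe keeps its original rank within its own array.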

For transcriptome sequencing, each sample generates millions of sequencing reads that are used to estimate expression levels of each gene or isoform. To estimate gene expression, these reads must first be assigned to their respective transcript to allow for accurate abundance estimation. This is done by aligning high-quality sequencing reads to a reference genome using one of many available sequence aligners [33].

Once aligned, the reads mapped to each transcript are counted to estimate each transcript's abundance, under the assumption that transcripts at higher abundance will yield a greater number of sequence reads compared with transcripts at low abundance. However, because sequencing can begin at any position along the length of the transcript, there are more opportunities for sequencing in longer transcripts than in shorter ones. Without accounting for transcript length, RNA sequencing (RNA-seq) gene expression estimates would be biased towards longer genes. To address this, several methods are available, including:

Gene/transcript length normalization – For the most common of this family of methods, reads in a sample are first normalized for sequencing depth. The depth-normalized reads are then divided by the length of the corresponding gene or isoform in kilobases. This yields reads per kilobase per million reads (RPKM) [33]. Similar methods generate measures expressed as transcripts per million (TPM). While suitable for within-sample comparisons (comparing the relative abundance of two genes in the same sample), this method is not optimal for between-sample comparisons (differential gene expression analysis) and has largely fallen out of favor for most bulk-sequencing analyses [34].

Trimmed mean of M-values (TMM) – This method uses the weighted average of log expression ratios for each gene calculated for all samples against one reference sample (M-values). Genes with outlier values are thrown out, and a weighted average for all M-values is set for each sample [34,35]. This approach provides superior between-sample normalization compared with TPM methods and is more reliable for downstream statistical inference.

DESeq – The DESeq software for transcriptomic analysis uses a normalization procedure that calculates a per-sample scaling factor based on the median of the ratios of each gene's read count over its geometric mean across all samples [36]. Like TMM, this approach is more reliable than TPM methods for differential gene expression analysis.

Variance modeling at the observational level (voom) – In the voom method, log counts are first normalized for sequencing depth [37]. Then a precision weight incorporating the mean-variance trend for each normalized observation is generated and both the normalized counts and precision weights are entered into the analysis pipeline. This method is particularly useful for small sample sizes or datasets where the between-sample sequencing depth is highly variable.
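
Two of the calculations described above can be sketched directly: length-based normalization (RPKM and TPM) and a median-of-ratios scaling factor of the kind DESeq uses. This is a simplified illustration, not the implementation of any package; skipping genes with a zero count when computing geometric means is an assumption of the sketch, and the function names and counts are hypothetical.

```python
import math
from statistics import median


def rpkm(counts, lengths_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    per_million = sum(counts) / 1e6  # sequencing-depth scaling
    return [c / per_million / (length / 1e3) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to one million."""
    rates = [c / (length / 1e3) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def median_of_ratios_size_factors(samples):
    """Per-sample scaling factors; samples is a list of count lists (same gene order)."""
    n_genes = len(samples[0])
    # Genes with a zero count in any sample are skipped (assumption of this sketch)
    usable = [g for g in range(n_genes) if all(s[g] > 0 for s in samples)]
    # Geometric mean of each usable gene across all samples
    geo_mean = {
        g: math.exp(sum(math.log(s[g]) for s in samples) / len(samples)) for g in usable
    }
    # Median, within each sample, of count divided by the gene's geometric mean
    return [median(s[g] / geo_mean[g] for g in usable) for s in samples]


# Hypothetical data: three genes, two samples; sample_b is sequenced twice as deeply
lengths = [1000, 2000, 1000]
sample_a = [100, 200, 700]
sample_b = [200, 400, 1400]

print(rpkm(sample_a, lengths))
print(tpm(sample_a, lengths))
factors = median_of_ratios_size_factors([sample_a, sample_b])
print(factors[1] / factors[0])  # close to 2, reflecting the depth difference
```

Dividing each sample's counts by its scaling factor would bring the two samples onto a comparable scale before testing for differential expression.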

Single-cell sequencing uses similar processes for expression quantification as bulk sequencing, with the caveat that the normalization procedure must account for the high proportion of zero read counts. This so-called "zero-inflation" is a result of two factors:

Not all cells express the same genes.

Relatively low-abundance transcripts frequently are not captured or sequenced in a given cell.

While many single-cell RNA sequencing (scRNA-seq) normalization methods use scaling factors as described for bulk sequencing and microarrays, additional methods have been developed to manage zero-inflation and the other biases inherent in single-cell sequencing data [38]. This is an active area of method development and study. (See 'Single-cell RNA sequencing' above.)

Quality assessment — Quality assessment occurs both before and after data normalization.

Pre-normalization quality assessment evaluates the quality of the raw data before preprocessing. For microarrays, the array itself is inspected to ensure there are no bubbles, scratches, or other artifacts on the array. Some commercial arrays also contain controls inserted during sample processing ("spike-in" controls) to ensure that all steps leading to the hybridization were successful. For transcriptome sequencing, each base call and individual sequencing read is considered a separate experiment that must be quality controlled [33]. This is performed with tools such as FastQC or NGSQC. Sequencing reads may then be "trimmed" to remove lower-quality bases at the ends of each sequencing read, along with the leading or trailing adapter (primer target) sequences that were added to conduct the sequencing reactions [33,39].
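
The trimming step can be illustrated with a minimal sketch that removes low-quality bases from both ends of a single read. It assumes the common FASTQ convention in which each quality character encodes a Phred score as its ASCII code minus 33; the quality cutoff of 20 and the function name are hypothetical choices, and adapter removal is not shown.

```python
def trim_read(sequence, quality_string, min_quality=20, phred_offset=33):
    """Trim bases from both ends of a read until a base meets the quality cutoff."""
    # Decode one Phred score per base from the quality characters
    scores = [ord(ch) - phred_offset for ch in quality_string]
    start, end = 0, len(scores)
    # Advance past low-quality bases at the leading end
    while start < end and scores[start] < min_quality:
        start += 1
    # Back up past low-quality bases at the trailing end
    while end > start and scores[end - 1] < min_quality:
        end -= 1
    return sequence[start:end], quality_string[start:end]


# Hypothetical read: '#' encodes a score of 2 and 'I' encodes a score of 40
trimmed_sequence, trimmed_quality = trim_read("ACGTACGT", "#IIIII##")
print(trimmed_sequence, trimmed_quality)  # CGTAC IIIII
```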

Post-normalization quality assessment of either microarray or transcriptome sequencing data evaluates the normalized data from a sample relative to others in the experiment to identify outlier samples or differences in batches of microarrays or sequencing. Samples identified as significantly different from others can be adjusted statistically (ie, batch-level normalization) or excluded from the analysis.

Data transformation — Many common statistical procedures assume a normal and continuous distribution of data. Gene expression levels from microarrays or transcriptome sequencing can be mathematically transformed, often using a logarithmic scale, so that they more closely approximate a normal distribution. Transcriptome sequencing data, which consist of read counts rather than continuous numerical values, can be filtered to include only higher read counts, which may approximate continuous data. Alternatively, sequencing data can be modeled using a distribution more suitable for count data, such as the negative binomial distribution. Preprocessing can also include filtering out low-quality probe sets or genes with low variability across all samples in the experiment.
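
A logarithmic transformation of this kind can be sketched as follows. Adding a pseudocount of 1 before taking the logarithm is an assumption of this sketch (it keeps zero counts defined); the function name and values are hypothetical.

```python
import math


def log2_transform(values, pseudocount=1.0):
    """Log2-transform expression values after adding a pseudocount."""
    return [math.log2(v + pseudocount) for v in values]


# Hypothetical read counts spanning three orders of magnitude
transformed = log2_transform([0, 1, 3, 1023])
print(transformed)  # [0.0, 1.0, 2.0, 10.0]
```

The transformation compresses the range of the data, so a single highly expressed gene has less influence on downstream statistics than it would on the raw scale.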

Data storage — Microarray and transcriptome sequencing experiments require computational tools to store raw data, analyze gene expression, and ensure uniformity across different laboratories. Most scientific journals specify that raw data be made publicly available as a requirement for publication [40].

Scanning an oligonucleotide array or sequencing flow cell with a laser scanner generates fluorescence intensities that are stored in an image file.

A typical microarray raw data file, called a CEL file, is 0.1 to 1 gigabytes (GB) per array [21].

A typical sequencing text-based raw data file, called a FASTQ file, is approximately 1 to 5 GB per sample.

Beyond the storage of raw data, a full experimental dataset includes the final preprocessed expression data, a metadata file that describes the necessary technical information for each sample, and a file that includes the clinical or experimental variables (the phenotypes and covariates) associated with each sample. The metadata describes important experimental details for each sample such as the date of sample collection, measure of RNA quality, identity of the technician running the assay, and the well position of the sample on a 96-well plate.

Thus, these experiments generate large storage requirements (often terabytes) and require software to access, integrate, and analyze complex genomic and clinical datasets. As the size and complexity of these data steadily increase, commercial cloud storage and computing infrastructures are increasingly used, many of which have adopted sophisticated security measures to ensure compliance with privacy standards.

Data analysis — There are several possible levels of data analysis, ranging from simple statistical tests that can be performed with commercial software packages to advanced analyses and the development of novel algorithms.

In the clinical setting, the analysis is typically limited to reporting of technical quality and providing a quantitative readout of results. For single gene assays, the readout is either binary (the gene is expressed or not) or continuous (eg, estimated viral load). For multigene assays that estimate prognosis based on the collective expression pattern of a set number of genes, analysis consists of applying a trained statistical model that produces a prognostic score. Examples are listed separately. (See "Molecular prognostic tests for prostate cancer", section on 'Tests based on molecular characteristics' and "Deciding when to use adjuvant chemotherapy for hormone receptor-positive, HER2-negative breast cancer".)

In research settings, advanced analyses and novel algorithms are implemented with a variety of programming languages, such as Perl and Python, and computational software, such as R and Matlab [41,42]. The flexibility to write, modify, and share algorithms using these tools makes them particularly well suited for gene expression data analysis.

Differential expression – One of the most common analyses performed on gene expression data is to determine which genes are altered in one condition as compared with another. This can be accomplished by performing a t-test, analysis of variance (ANOVA), or linear model for continuous data, or by fitting negative binomial models for count data. (See "Glossary of common biostatistical and epidemiological terms".)
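A single-gene comparison between two conditions might be sketched as follows. The expression values are hypothetical log-scale measurements. The sketch computes Welch's t statistic, but because the Python standard library does not provide the t distribution, the p-value is obtained by a label-permutation test rather than from the t distribution; this is a substitution made for illustration.

```python
import random
import statistics

def welch_t(a, b):
    """Welch's t statistic for two groups of expression values."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se

def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided p-value estimated by shuffling group labels n_perm times."""
    rng = random.Random(seed)
    observed = abs(welch_t(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if abs(welch_t(perm_a, perm_b)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero
    return (hits + 1) / (n_perm + 1)

# Hypothetical log2 expression of one gene in five disease and five control samples
disease = [8.1, 7.9, 8.4, 8.0, 8.3]
control = [6.9, 7.2, 7.0, 7.1, 6.8]

t_stat = welch_t(disease, control)
p_value = permutation_p_value(disease, control)
```

In a genome-wide study the same test is repeated for every gene, which leads directly to the multiple comparison problem.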

Multiple comparison problem – Statistical analyses of several thousand genes pose unique problems in the interpretation of the results due to the large number of tests performed. This is because every statistical test has a small possibility of leading to the conclusion that an association is present when no such association actually exists, and when thousands of genes are tested with a microarray, an unacceptably high number of false-positive associations may be produced. An overview of statistical principles relevant to the multiple comparison problem is presented separately. (See "Hypothesis testing in clinical research: Proof, p-values, and confidence intervals".)
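The scale of the problem, and one widely used correction, can be sketched as follows. The text above does not name a specific correction method; the Benjamini-Hochberg false discovery rate procedure shown here is one common choice and is used as an assumption. The number of genes and the p-values are hypothetical.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of false positives if no gene is truly associated."""
    return n_tests * alpha

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values never increase as the raw p-value decreases
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical: 20,000 genes tested at a nominal threshold of 0.05
n_expected = expected_false_positives(20000, 0.05)

# Hypothetical raw p-values for four genes
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Genes are then declared significant when the adjusted value falls below the chosen false discovery rate, rather than when the raw p-value falls below 0.05.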

Single-cell analysis – Single-cell RNA sequencing (scRNAseq) can identify specific cells or cell types and their functions by interrogating cell-specific molecular signatures [23]. Cell type identification is often done using clustering methods that exploit latent-class modeling to identify cells with similar gene expression patterns. Modified versions of tools for bulk sequencing enable differential expression and network analyses to characterize cell-specific differences in gene expression and function. Methods for scRNAseq analysis are still in their infancy and remain an active field of study. (See 'Single-cell RNA sequencing' above.)

Class prediction – In this type of analysis, samples from two conditions are split into a training set and a test set. A list of genes that distinguishes the two conditions is derived from the training set of samples, and the accuracy of this gene expression signature is assessed on the test set of samples.
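The training and test workflow for class prediction can be sketched as follows. The choice of a nearest-centroid classifier, the gene-selection rule (largest difference in class means), and all expression values are assumptions made for illustration; many other classifiers are used in practice.

```python
import statistics

def select_signature(train, labels, n_genes):
    """Pick the gene indices whose mean expression differs most between the two classes."""
    c0, c1 = sorted(set(labels))
    diffs = []
    for g in range(len(train[0])):
        m0 = statistics.mean(s[g] for s, y in zip(train, labels) if y == c0)
        m1 = statistics.mean(s[g] for s, y in zip(train, labels) if y == c1)
        diffs.append((abs(m0 - m1), g))
    return [g for _, g in sorted(diffs, reverse=True)[:n_genes]]

def centroids(train, labels, genes):
    """Mean expression of the signature genes within each class."""
    out = {}
    for c in set(labels):
        rows = [s for s, y in zip(train, labels) if y == c]
        out[c] = [statistics.mean(r[g] for r in rows) for g in genes]
    return out

def predict(sample, cents, genes):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((sample[g] - v) ** 2 for g, v in zip(genes, cents[c]))
    return min(cents, key=dist)

# Hypothetical log expression of four genes (columns) per sample (rows)
train = [
    [9.0, 5.1, 2.0, 7.0], [8.5, 4.9, 2.5, 7.2], [8.8, 5.0, 2.2, 6.9],  # disease
    [5.0, 5.0, 6.0, 7.1], [5.5, 5.2, 6.3, 7.0], [5.2, 4.8, 5.8, 6.8],  # control
]
train_labels = ["disease"] * 3 + ["control"] * 3
test = [[8.7, 5.0, 2.1, 7.0], [5.1, 5.1, 6.1, 7.1]]
test_labels = ["disease", "control"]

# The signature is derived from the training set only, then assessed on the test set
signature = select_signature(train, train_labels, n_genes=2)
cents = centroids(train, train_labels, signature)
predictions = [predict(s, cents, signature) for s in test]
accuracy = sum(p == y for p, y in zip(predictions, test_labels)) / len(test_labels)
```

Keeping gene selection inside the training set matters: choosing genes with the test samples included would make the measured accuracy optimistic.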

In addition to these general types of analyses of gene expression data, transcriptome sequencing also allows for more advanced analyses, such as discovery of novel transcripts or isoforms, detection of alternative splicing, and de novo reconstruction of the transcriptome [39].

Class discovery – By evaluating genes across all samples regardless of their clinical phenotype, it can be determined which samples are most closely correlated with each other based on gene expression alone. Samples that share similar patterns of gene expression may represent previously unrecognized subtypes of the disease.
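Grouping samples by the similarity of their expression profiles, without using clinical labels, can be sketched as follows. Average-linkage agglomerative clustering with a distance of 1 minus the Pearson correlation is one common choice and is an assumption here, as are the sample names and expression values.

```python
import statistics

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def cluster_samples(profiles, n_clusters):
    """Average-linkage agglomerative clustering with distance = 1 - Pearson r."""
    names = list(profiles)
    dist = {
        (a, b): 1 - pearson(profiles[a], profiles[b])
        for a in names for b in names if a != b
    }
    clusters = [[n] for n in names]

    def linkage(c1, c2):
        return statistics.mean(dist[(a, b)] for a in c1 for b in c2)

    # Repeatedly merge the two closest clusters until n_clusters remain
    while len(clusters) > n_clusters:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical expression of five genes in four samples with unknown subtype
profiles = {
    "S1": [8, 7, 2, 1, 5],
    "S2": [9, 8, 3, 2, 5],
    "S3": [2, 1, 8, 9, 5],
    "S4": [3, 2, 7, 8, 6],
}
groups = cluster_samples(profiles, n_clusters=2)
```

The number of clusters is supplied by the user in this sketch; in practice, choosing it is itself an analytic decision.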

Network analysis – The number of genes assayed by microarrays and transcriptome sequencing allows the entire dataset to be harnessed to make new predictions about how genes might interact. These approaches often operate on the premise that highly correlated genes in a network of gene-gene interactions are involved in the same or overlapping biologic pathways. One approach, weighted gene co-expression network analysis (WGCNA), works by clustering highly correlated genes, defined as "modules," within a gene network [43]. Genome-wide gene expression data can also be integrated with other data types, such as DNA methylation, proteomics, and metabolomics [39].
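The premise that highly correlated genes belong together can be illustrated with a deliberately simplified sketch. This is not the WGCNA algorithm: it uses a hard correlation threshold and takes connected components of the resulting graph as "modules." The threshold of 0.9, the gene names, and the expression values are hypothetical.

```python
import statistics

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def coexpression_modules(expr, threshold=0.9):
    """Link genes with |r| >= threshold and return connected components as modules."""
    genes = list(expr)
    neighbors = {g: set() for g in genes}
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            if abs(pearson(expr[a], expr[b])) >= threshold:
                neighbors[a].add(b)
                neighbors[b].add(a)
    modules, seen = [], set()
    for g in genes:
        if g in seen:
            continue
        # Depth-first search collects every gene reachable from g
        stack, module = [g], set()
        while stack:
            node = stack.pop()
            if node in module:
                continue
            module.add(node)
            stack.extend(neighbors[node] - module)
        seen |= module
        modules.append(sorted(module))
    return modules

# Hypothetical expression of five genes across five samples
expr = {
    "G1": [1, 2, 3, 4, 5],
    "G2": [2, 4, 6, 8, 10.5],
    "G3": [5, 3, 4, 1, 2],
    "G4": [10, 6.5, 8, 2, 4.2],
    "G5": [3, 3.1, 2.9, 3.2, 3.0],
}
modules = coexpression_modules(expr)
```

Genes that fall in the same module become candidates for shared pathway membership, which then requires biologic validation.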

Interpretation — The final step in gene expression data analysis is to interpret the results in a biologically meaningful context.

Making biologic sense of a whole transcriptome profiling-derived gene list is one of the more challenging aspects of the analysis. There are many strategies for accomplishing this goal.

Additional studies are often required to validate biologic predictions that are made from the microarray or sequencing data.

Comparison with other datasets – Several tools exist for comparing gene expression datasets, including large databases containing gene expression data to look for a shared gene expression signature [44,45], alternative gene probes [46], and analytic tools that incorporate phenotypic associations [47-49].

Enrichment ranking – Gene set enrichment analysis (GSEA) is a method in which genes are ranked by their association with a phenotype, and the ranking is used to identify biologically relevant pathways [47,48]. Other techniques provide other mechanisms to enrich pathways or functional categories [50] or to visualize previously published interactions between genes of interest [51]. Data visualization using heat maps, which organize samples by columns and genes by rows according to similarity in gene expression, is also useful for determining which groups of genes or samples share similar patterns of expression. Gene set variation analysis (GSVA) uses a similar approach to identify pathway enrichment in a gene-expression dataset [52].
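The core idea of enrichment ranking can be sketched with an unweighted running-sum statistic over a ranked gene list. This is a simplification: the published GSEA method additionally weights each step by the strength of the gene's association and assesses significance by permutation, both of which are omitted here. The gene names and gene sets are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (Kolmogorov-Smirnov-like)."""
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up for genes in the set, step down for genes outside it
        running += 1 / n_hits if is_hit else -1 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical genes ranked from most to least associated with the phenotype
ranked = ["G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8", "G9", "G10"]

es_top = enrichment_score(ranked, {"G1", "G2", "G4"})   # set concentrated at the top
es_bottom = enrichment_score(ranked, {"G9", "G10"})     # set concentrated at the bottom
```

A strongly positive score indicates that the gene set clusters near the top of the ranking, and a strongly negative score indicates clustering near the bottom.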

CLINICAL AND RESEARCH APPLICATIONS — Gene expression profiling is primarily used for research purposes, but a few expression profiling assays have been incorporated into clinical practice to assist with diagnosis and disease classification.

Clinical use — Gene expression profiling within clinical specimens has the potential to be used for improving diagnosis, clarifying prognosis, and helping to optimize treatment. As the platforms for measuring gene expression continue to evolve, personalized approaches to the diagnosis and treatment of complex human disease may increasingly find their place in routine clinical practice. (See "Personalized medicine".)

Breast cancer – The most advanced application of gene expression profiling is in predicting disease outcomes, which may help in treatment decisions for patients with certain cancers. The risk of certain therapies might be outweighed by the potential benefit for patients at high risk for relapse; in such cases, cancer therapy may be favored, whereas for those with a lower risk of relapse, it may be possible to avoid certain therapies. Gene expression profiling has been helpful in targeting appropriate therapy for patients with breast cancer and other cancers.

A breast cancer classification assay such as Oncotype DX can be performed on biopsy specimens to guide management decisions. (See "Prognostic and predictive factors in metastatic breast cancer", section on 'Predictive factors' and "Deciding when to use adjuvant chemotherapy for hormone receptor-positive, HER2-negative breast cancer".)

Heart transplant rejection – Routine endomyocardial biopsy surveillance is used for cardiac transplant patients to detect acute cellular rejection. Gene expression profiling of peripheral blood mononuclear cells can be used to determine the need for additional biopsies. In a 2006 study involving 107 heart transplant patients, a profile of the expression of 11 genes in peripheral blood mononuclear cells correlated with rejection [53]. (See "Heart transplantation in adults: Diagnosis of allograft rejection", section on 'Gene expression profiling'.)

Lung cancer – Individuals with an abnormal chest computed tomography (CT) scan and positive smoking history often require invasive diagnostic procedures beyond bronchoscopy to diagnose lung cancer. A gene expression profiling assay on bronchoscopy specimens is intended for patients with an intermediate probability of malignancy and a nondiagnostic bronchoscopy result [54]. (See "Diagnostic evaluation of the incidental pulmonary nodule".)

Interstitial lung disease classification – There are several forms of interstitial lung disease (ILD). Because treatments differ, it is important to differentiate usual interstitial pneumonia (UIP) from other forms of ILD. A gene expression profiling test performed on bronchoscopy specimens is able to distinguish UIP with a specificity of 92 percent and sensitivity of 60 percent compared with histopathology [55]. (See "Idiopathic interstitial pneumonias: Classification and pathology".)

Thyroid cancer – Thyroid nodules are frequently evaluated using fine-needle aspirates, but this approach sometimes yields indeterminate results, in which case thyroid surgery is required to obtain a tissue sample for definitive diagnosis. An expression profiling test (Afirma) had a positive predictive value of 47 percent and a negative predictive value of 96 percent in patients with Bethesda III and IV thyroid nodules [56]. These findings suggest that patients with indeterminate results from a fine-needle aspirate of a thyroid nodule can be monitored less invasively, potentially avoiding unnecessary surgery. (See "Diagnostic approach to and treatment of thyroid nodules", section on 'Indeterminate cytology (Bethesda III and IV)' and "Evaluation and management of thyroid nodules with indeterminate cytology in adults".)
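Sensitivity and specificity are properties of a test, whereas predictive values also depend on how common the condition is in the tested population. The following sketch applies Bayes' rule to the sensitivity (60 percent) and specificity (92 percent) quoted above for the UIP classifier. The prevalence values of 50 and 20 percent are hypothetical and are not taken from the cited studies.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (positive predictive value, negative predictive value)."""
    tp = sensitivity * prevalence               # true positives per person tested
    fn = (1 - sensitivity) * prevalence         # false negatives
    fp = (1 - specificity) * (1 - prevalence)   # false positives
    tn = specificity * (1 - prevalence)         # true negatives
    return tp / (tp + fp), tn / (tn + fn)

# Same test characteristics, two hypothetical prevalence settings
ppv_50, npv_50 = predictive_values(0.60, 0.92, prevalence=0.50)
ppv_20, npv_20 = predictive_values(0.60, 0.92, prevalence=0.20)
```

With the same test characteristics, a lower prevalence lowers the positive predictive value and raises the negative predictive value, which is why reported predictive values apply only to populations similar to the one studied.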

Research applications — Samples from human tissues and model systems can be studied in two main types of research:

Discovery-based (data-driven) analyses – In discovery-based analysis, researchers aim to identify novel biological insights into a disease or other biological state of interest. (See 'Discovery' below.)

Hypothesis-driven analyses – In hypothesis-driven analysis, the scientist has a prespecified question and uses gene expression profiling data to answer this question. (See 'Hypothesis testing' below.)

Discovery — Analytical tools discussed above may be used to identify novel biological insights. (See 'Methods' above.)

Differential gene expression and pathway analysis via enrichment ranking are used to identify previously unknown genes, and network analysis can be used to discover previously unappreciated biological pathways. Differential expression and pathway analysis tools broadly consider differences in genes or groups of genes between two groups. These tools may be used to identify differences between disease and control samples or samples that have or have not been treated with a medication of interest.

Class discovery, in which novel groupings of samples instead of genes are considered, is an alternative widely used discovery-based analytical approach. For example, class discovery may be used to identify subtypes of diseases with similar gene expression patterns in a tissue of interest. Examples of this discovery-based analytical approach include:

Bladder cancer – Maintenance treatment of locally advanced or metastatic bladder (urothelial) cancer with the immune checkpoint inhibitor avelumab following standard of care chemotherapy was found to prolong overall survival as compared with chemotherapy alone in a randomized trial [57]. Investigators next aimed to identify mechanisms and biomarkers of long-term avelumab treatment response using whole transcriptome profiling of tumor samples, done in combination with immunohistochemistry and whole-exome sequencing [58]. Using differential gene expression and pathway analyses that included network- and enrichment-based approaches, investigators found that avelumab survival benefit was associated with genes and pathways indicative of an enhanced innate and adaptive immune response, while worse prognosis was associated with genes and pathways indicative of tumor growth and angiogenesis. They found that a subset of these genes, when combined with a subset of identified tumor mutations, could predict better overall survival in response to avelumab.

Asthma endotyping – Asthma is a heterogeneous disease that responds to therapies such as glucocorticoids or biologic agents in a subset of individuals. In a 2007 study, airway samples were collected via research bronchoscopy from individuals with steroid-naïve asthma and healthy controls, and gene expression profiling was analyzed by microarray [59]. Class discovery analyses indicated that approximately 50 percent of the individuals with asthma had relatively high expression of the genes most altered between asthma and control conditions. In the other 50 percent, expression of these genes was similar to controls. Individuals with high expression of these genes were then found to have evidence of an enhanced type 2 inflammatory state, including a better response of lung function to glucocorticoids, than those individuals with low expression of these genes. Thus, the gene expression divided individuals with asthma into two endotypes, or groups of individuals with differing underlying biology, that differed by type 2 inflammation and response to the associated treatments. (See "Severe asthma phenotypes", section on 'Phenotyping based on biomarkers of inflammation'.)

miRNAs in cancer – MicroRNAs (miRNAs) have also been found to be dysregulated in a number of solid tumors and hematologic malignancies [60-72]. (See 'Noncoding RNAs' above.)

A 2012 systematic review of available studies examining associations between miRNA profiling and cancer prognosis found associations between certain miRNAs and poor outcomes, including decreased overall survival, but noted several potential sources of bias [73]. Further work will be needed to validate the use of these findings in clinical applications.

Hypothesis testing — Analytical tools such as differential gene expression and pathway analysis are also used to analyze the data when an investigator is considering a specific hypothesis. In this case, however, the investigator may be asking if a particular gene, or set of genes representing a biological pathway, is altered in one condition compared with another or related to an outcome of interest. By testing only one or a small number of genes in relation to a hypothesis, as opposed to all genes across the transcriptome, the multiple comparisons problem becomes less of an issue. (See 'Data analysis' above.)

The hypothesis-driven approach is often employed to determine if a gene or set of genes can be used as a biomarker to represent a particular biological subgroup of individuals or predict a treatment response.

Glucocorticoid-unresponsive inflammation in COPD – Inhaled glucocorticoids are a mainstay of treatment in chronic obstructive pulmonary disease (COPD). However, COPD often does not respond to this treatment. One possible explanation may be that interleukin (IL)-17 driven inflammation in the airways is glucocorticoid-unresponsive and may contribute to disease severity. In a 2019 study, our group generated an airway tissue IL-17 gene signature score, a composite of 11 IL-17-associated genes derived using a cell culture model stimulated with IL-17, to identify a COPD subgroup with enhanced IL-17-driven airway inflammation [74]. This signature was hypothesized to be a biomarker of IL-17-associated immune activity and was used to probe gene expression data in studies of psoriatic skin lesions before and after treatment with anti-IL-17 biologics. In these studies, the IL-17 gene signature decreased in association with clinically recognized treatment response to the biologics, suggesting that the signature did indeed mark IL-17 activity. The signature was then studied in a gene expression dataset of airway samples obtained from a randomized trial of inhaled glucocorticoids versus placebo for COPD. The signature was associated with lack of improvement in lung function with glucocorticoids. Thus, overall, the IL-17 gene signature was associated with anti-IL-17 treatment response in psoriasis and lack of response to glucocorticoids in COPD.
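A composite gene signature score of the general kind described above can be sketched as the mean of per-gene z-scores across the signature genes. This is a generic approach shown for illustration; the scoring method actually used in the cited study is not described here and may differ. The gene names, the two-gene signature, and the expression values are hypothetical.

```python
import statistics

def signature_scores(expr, signature_genes):
    """Mean per-gene z-score across the signature genes, one score per sample."""
    n_samples = len(next(iter(expr.values())))
    z = {}
    for g in signature_genes:
        values = expr[g]
        mu, sd = statistics.mean(values), statistics.stdev(values)
        # Standardize so that each gene contributes on the same scale
        z[g] = [(v - mu) / sd for v in values]
    return [
        statistics.mean(z[g][s] for g in signature_genes)
        for s in range(n_samples)
    ]

# Hypothetical log expression of three genes across four samples
expr = {
    "GENE_1": [5, 6, 9, 10],
    "GENE_2": [3, 3.5, 6, 6.5],
    "GENE_3": [7, 7, 7.5, 7.2],   # not part of the signature
}
scores = signature_scores(expr, ["GENE_1", "GENE_2"])
```

The resulting per-sample score can then be tested against an outcome of interest, such as change in lung function, as a single prespecified comparison.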

Barriers — Despite the promise of gene expression profiling for clinical and research applications, several barriers interfere with widespread use of these technologies.

Cost – The cost of transcriptome and single-cell sequencing is high. However, as cost declines, use is expected to increase.

Sample requirement – A sufficient quantity of high-quality RNA is required to run a valid assay.

RNA quantity – Needle biopsy specimens contain a relatively small number of cells, and some cells may be needed for initial pathology or cytology testing, leaving insufficient quantities for further analysis.

RNA quality – Clinical specimens that are archived, paraffin embedded, and formalin fixed may have nucleic acid degradation, making transcriptome sequencing impossible.

Investigator or laboratory expertise – Expertise is required for experimental design and data analysis. Studies of microarray biomarker development have suggested that analytic expertise may be a key factor in obtaining reproducible and meaningful results [75].

SUMMARY

Definitions – Genome-wide expression profiling is a technique that uses data generated from microarrays or transcriptome sequencing. These technologies measure the full spectrum of RNA in a cell, including coding mRNA and noncoding RNAs. (See 'Definitions' above.)

Methods – A variety of methods are used to evaluate gene expression (table 1). These methods all rely on the abundance of messenger ribonucleic acid (mRNA). They differ in their requirements for the amount of starting material, their sensitivity to detect the RNA of interest, technical requirements for running the experiment, and computational requirements for data storage, analysis, and interpretation. Small numbers of genes (100s to 1000s) can be profiled using reverse transcription polymerase chain reaction (RT-PCR), RNA in situ hybridization, and custom microarrays. Genome-wide gene expression can be profiled using microarrays or mRNA sequencing. This can be done on admixtures of cells from tissues or on single cells. Research applications may also incorporate in situ and spatial transcriptomics within a tissue. (See 'Methods' above.)

Analysis and interpretation – There are numerous challenges and pitfalls in the analysis and interpretation of the large volume of data generated, including considerations related to data processing, storage, analysis, and interpretation. All genome-wide gene expression studies should statistically account for the multiple-comparison problem inherent in these technologies. (See 'Data analysis and interpretation' above.)

Clinical use – Gene expression profiling is emerging as a potential approach for the diagnosis and prognosis of complex human disease. It is used in assays to help diagnose or prognosticate several conditions including breast cancer, heart transplant rejection, lung cancer, interstitial lung disease, and thyroid cancer. However, a number of important barriers remain to expanded use of gene expression profiling in clinical settings, including validation of biomarkers in prospective multicenter studies to demonstrate their reproducibility, accuracy, and cost-effectiveness across multiple sites and operators. (See 'Clinical use' above and 'Barriers' above.)

Research – Gene expression profiling methods can help to identify previously unknown genes and to discover previously unappreciated biological pathways. Examples include bladder cancer, asthma, and microRNAs (miRNAs). They can also be used to test hypotheses such as the role of interleukin (IL)-17-related genes in response to glucocorticoid treatment. (See 'Research applications' above.)

ACKNOWLEDGMENT — The UpToDate editorial staff acknowledges Avrum Spira, MD, MSc, who contributed to an earlier version of this topic review.

  1. Dvorák Z, Pascussi JM, Modrianský M. Approaches to messenger RNA detection - comparison of methods. Biomed Pap Med Fac Univ Palacky Olomouc Czech Repub 2003; 147:131.
  2. International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature 2004; 431:931.
  3. Brown TA. Genomes 3, 3rd ed, Garland Science, 2007.
  4. Lee RC, Feinbaum RL, Ambros V. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 1993; 75:843.
  5. Girard A, Sachidanandam R, Hannon GJ, Carmell MA. A germline-specific class of small RNAs binds mammalian Piwi proteins. Nature 2006; 442:199.
  6. Aravin A, Gaidatzis D, Pfeffer S, et al. A novel class of small RNAs bind to MILI protein in mouse testes. Nature 2006; 442:203.
  7. Khalil AM, Guttman M, Huarte M, et al. Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc Natl Acad Sci U S A 2009; 106:11667.
  8. Hon CC, Ramilowski JA, Harshbarger J, et al. An atlas of human long non-coding RNAs with accurate 5' ends. Nature 2017; 543:199.
  9. Blumberg DD. Creating a ribonuclease-free environment. Methods Enzymol 1987; 152:20.
  10. Nolan T, Hands RE, Bustin SA. Quantification of mRNA using real-time RT-PCR. Nat Protoc 2006; 1:1559.
  11. VanGuilder HD, Vrana KE, Freeman WM. Twenty-five years of quantitative PCR for gene expression analysis. Biotechniques 2008; 44:619.
  12. Gall JG, Pardue ML. Formation and detection of RNA-DNA hybrid molecules in cytological preparations. Proc Natl Acad Sci U S A 1969; 63:378.
  13. Jin L, Lloyd RV. In situ hybridization: methods and applications. J Clin Lab Anal 1997; 11:2.
  14. Young AP, Jackson DJ, Wyeth RC. A technical review and guide to RNA fluorescence in situ hybridization. PeerJ 2020; 8:e8806.
  15. Ali A, Jamieson NB, Khan IN, et al. Prognostic implications of microRNA-21 overexpression in pancreatic ductal adenocarcinoma: an international multicenter study of 686 patients. Am J Cancer Res 2022; 12:5668.
  16. Kappelhoff R, Puente XS, Wilson CH, et al. Overview of transcriptomic analysis of all human proteases, non-proteolytic homologs and inhibitors: Organ, tissue and ovarian cancer cell line expression profiling of the human protease degradome by the CLIP-CHIP™ DNA microarray. Biochim Biophys Acta Mol Cell Res 2017; 1864:2210.
  17. Hammond SM. RNAi, microRNAs, and human disease. Cancer Chemother Pharmacol 2006; 58 Suppl 1:s63.
  18. Alwine JC, Kemp DJ, Stark GR. Method for detection of specific RNAs in agarose gels by transfer to diazobenzyloxymethyl-paper and hybridization with DNA probes. Proc Natl Acad Sci U S A 1977; 74:5350.
  19. Pease AC, Solas D, Sullivan EJ, et al. Light-generated oligonucleotide arrays for rapid DNA sequence analysis. Proc Natl Acad Sci U S A 1994; 91:5022.
  20. Nuwaysir EF, Huang W, Albert TJ, et al. Gene expression analysis using oligonucleotide arrays produced by maskless photolithography. Genome Res 2002; 12:1749.
  21. Wilhelm BT, Landry JR. RNA-Seq-quantitative measurement of expression through massively parallel RNA-sequencing. Methods 2009; 48:249.
  22. Ozsolak F, Milos PM. RNA sequencing: advances, challenges and opportunities. Nat Rev Genet 2011; 12:87.
  23. Stegle O, Teichmann SA, Marioni JC. Computational and analytical challenges in single-cell transcriptomics. Nat Rev Genet 2015; 16:133.
  24. Hwang B, Lee JH, Bang D. Single-cell RNA sequencing technologies and bioinformatics pipelines. Exp Mol Med 2018; 50:1.
  25. Gawad C, Koh W, Quake SR. Single-cell genome sequencing: current state of the science. Nat Rev Genet 2016; 17:175.
  26. Islam S, Kjällquist U, Moliner A, et al. Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Res 2011; 21:1160.
  27. Hashimshony T, Wagner F, Sher N, Yanai I. CEL-Seq: single-cell RNA-Seq by multiplexed linear amplification. Cell Rep 2012; 2:666.
  28. Macosko EZ, Basu A, Satija R, et al. Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets. Cell 2015; 161:1202.
  29. Zhang L, Chen D, Song D, et al. Clinical and translational values of spatial transcriptomics. Signal Transduct Target Ther 2022; 7:111.
  30. He B, Bergenstråhle L, Stenbeck L, et al. Integrating spatial gene expression and breast tumour morphology via deep learning. Nat Biomed Eng 2020; 4:827.
  31. Moncada R, Barkley D, Wagner F, et al. Integrating microarray-based spatial transcriptomics and single-cell RNA-seq reveals tissue architecture in pancreatic ductal adenocarcinomas. Nat Biotechnol 2020; 38:333.
  32. Quackenbush J. Microarray data normalization and transformation. Nat Genet 2002; 32 Suppl:496.
  33. Yang IS, Kim S. Analysis of Whole Transcriptome Sequencing Data: Workflow and Software. Genomics Inform 2015; 13:119.
  34. Robinson MD, Oshlack A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol 2010; 11:R25.
  35. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 2010; 26:139.
  36. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 2014; 15:550.
  37. Law CW, Chen Y, Shi W, Smyth GK. voom: Precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol 2014; 15:R29.
  38. Vallejos CA, Risso D, Scialdone A, et al. Normalizing single-cell RNA sequencing data: challenges and opportunities. Nat Methods 2017; 14:565.
  39. Conesa A, Madrigal P, Tarazona S, et al. A survey of best practices for RNA-seq data analysis. Genome Biol 2016; 17:13.
  40. Brazma A, Hingamp P, Quackenbush J, et al. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet 2001; 29:365.
  41. R Development Core Team. R: A Language and Environment for Statistical Computing 2009. R Foundation for Statistical Computing. Available at: www.R-project.org (Accessed on December 14, 2009).
  42. The Mathworks I. The MathWorks - MATLAB and Simulink for Technical Computing 2009. Available at: www.mathworks.com (Accessed on December 14, 2009).
  43. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 2008; 9:559.
  44. Barrett T, Troup DB, Wilhite SE, et al. NCBI GEO: archive for high-throughput functional genomic data. Nucleic Acids Res 2009; 37:D885.
  45. Gower AC, Spira A, Lenburg ME. Discovering biological connections between experimental conditions based on common patterns of differential gene expression. BMC Bioinformatics 2011; 12:381.
  46. Dai M, Wang P, Boyd AD, et al. Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Res 2005; 33:e175.
  47. Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 2005; 102:15545.
  48. Mootha VK, Lindgren CM, Eriksson KF, et al. PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet 2003; 34:267.
  49. Lamb J, Crawford ED, Peck D, et al. The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science 2006; 313:1929.
  50. Dennis G Jr, Sherman BT, Hosack DA, et al. DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol 2003; 4:P3.
  51. Ingenuity Systems. Ingenuity Pathway Analysis Software 2009. Available at: www.ingenuity.com (Accessed on December 14, 2009).
  52. Hänzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics 2013; 14:7.
  53. Deng MC, Eisen HJ, Mehra MR, et al. Noninvasive discrimination of rejection in cardiac allograft recipients using gene expression profiling. Am J Transplant 2006; 6:150.
  54. Raval AA, Benn BS, Benzaquen S, et al. Reclassification of risk of malignancy with Percepta Genomic Sequencing Classifier following nondiagnostic bronchoscopy. Respir Med 2022; 204:106990.
  55. Richeldi L, Scholand MB, Lynch DA, et al. Utility of a Molecular Classifier as a Complement to High-Resolution Computed Tomography to Identify Usual Interstitial Pneumonia. Am J Respir Crit Care Med 2021; 203:211.
  56. Patel KN, Angell TE, Babiarz J, et al. Performance of a Genomic Sequencing Classifier for the Preoperative Diagnosis of Cytologically Indeterminate Thyroid Nodules. JAMA Surg 2018; 153:817.
  57. Powles T, Park SH, Voog E, et al. Avelumab Maintenance Therapy for Advanced or Metastatic Urothelial Carcinoma. N Engl J Med 2020; 383:1218.
  58. Powles T, Sridhar SS, Loriot Y, et al. Avelumab maintenance in advanced urothelial carcinoma: biomarker analysis of the phase 3 JAVELIN Bladder 100 trial. Nat Med 2021; 27:2200.
  59. Woodruff PG, Boushey HA, Dolganov GM, et al. Genome-wide profiling identifies epithelial cell genes associated with asthma and with treatment response to corticosteroids. Proc Natl Acad Sci U S A 2007; 104:15858.
  60. Calin GA, Sevignani C, Dumitru CD, et al. Human microRNA genes are frequently located at fragile sites and genomic regions involved in cancers. Proc Natl Acad Sci U S A 2004; 101:2999.
  61. Lovat F, Valeri N, Croce CM. MicroRNAs in the pathogenesis of cancer. Semin Oncol 2011; 38:724.
  62. Esquela-Kerscher A, Slack FJ. Oncomirs - microRNAs with a role in cancer. Nat Rev Cancer 2006; 6:259.
  63. Calin GA, Dumitru CD, Shimizu M, et al. Frequent deletions and down-regulation of micro- RNA genes miR15 and miR16 at 13q14 in chronic lymphocytic leukemia. Proc Natl Acad Sci U S A 2002; 99:15524.
  64. Calin GA, Ferracin M, Cimmino A, et al. A MicroRNA signature associated with prognosis and progression in chronic lymphocytic leukemia. N Engl J Med 2005; 353:1793.
  65. Garzon R, Volinia S, Liu CG, et al. MicroRNA signatures associated with cytogenetics and prognosis in acute myeloid leukemia. Blood 2008; 111:3183.
  66. Yanaihara N, Caplen N, Bowman E, et al. Unique microRNA molecular profiles in lung cancer diagnosis and prognosis. Cancer Cell 2006; 9:189.
  67. Yu SL, Chen HY, Chang GC, et al. MicroRNA signature predicts survival and relapse in lung cancer. Cancer Cell 2008; 13:48.
  68. Raponi M, Dossey L, Jatkoe T, et al. MicroRNA classifiers for predicting prognosis of squamous cell lung cancer. Cancer Res 2009; 69:5776.
  69. Fanini F, Vannini I, Amadori D, Fabbri M. Clinical implications of microRNAs in lung cancer. Semin Oncol 2011; 38:776.
  70. Boeri M, Pastorino U, Sozzi G. Role of microRNAs in lung cancer: microRNA signatures in cancer prognosis. Cancer J 2012; 18:268.
  71. Castañeda CA, Agullo-Ortuño MT, Fresno Vara JA, et al. Implication of miRNA in the diagnosis and treatment of breast cancer. Expert Rev Anticancer Ther 2011; 11:1265.
  72. Sandhu S, Garzon R. Potential applications of microRNAs in cancer diagnosis, prognosis, and treatment. Semin Oncol 2011; 38:781.
  73. Nair VS, Maeda LS, Ioannidis JP. Clinical outcome prediction by microRNAs in human cancer: a systematic review. J Natl Cancer Inst 2012; 104:528.
  74. Christenson SA, van den Berge M, Faiz A, et al. An airway epithelial IL-17A response signature identifies a steroid-unresponsive COPD patient subgroup. J Clin Invest 2019; 129:169.
  75. Shi L, Campbell G, Jones WD, et al. The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models. Nat Biotechnol 2010; 28:827.
Topic 14602 Version 34.0
