Our data sources
The data in FHIR-Aggregator is a snapshot of the following sources as of 3/31/2025.
Cellosaurus
Cellosaurus is a knowledge resource on cell lines. It attempts to describe all cell lines used in biomedical research, including immortalized cell lines, naturally immortal cell lines (for example, stem cell lines), finite-life cell lines when those are distributed and used widely, vertebrate cell lines with an emphasis on human, mouse, and rat cell lines, invertebrate (insect and tick) cell lines, and plant cell lines.
Genotype-Tissue Expression Portal (GTEx)
The Adult Genotype-Tissue Expression (GTEx) project is a comprehensive public resource for the study of tissue-specific gene expression and regulation. Samples were collected from 54 non-diseased tissue sites across nearly 1000 individuals, primarily for molecular assays including WGS, WES, and RNA-Seq.
Human Tumor Atlas Network (HTAN)
The Human Tumor Atlas Network (HTAN) is a National Cancer Institute (NCI)-funded Cancer Moonshot℠ initiative through which a collaborative network of Research Centers and a central Data Coordinating Center are constructing 3-dimensional atlases of the cellular, morphological, and molecular features of human cancers as they evolve from precancerous lesions to advanced disease. Across a diverse set of cancer types, these atlases aim to define critical processes and events throughout the life cycle of human cancers, such as the transition of pre-malignant lesions to malignant tumors, the progression of malignant tumors to metastatic cancer, tumor response to therapeutics, and the development of therapeutic resistance. The cancer types under investigation include tumors that affect minority and underserved populations, tumors with a hereditary component, and highly aggressive pediatric cancers.
1000 Genomes
The 1000 Genomes Project created a catalogue of common human genetic variation, using openly consented samples from people who declared themselves to be healthy. The reference data resources generated by the project remain heavily used by the biomedical science community.
International Cancer Genome Consortium (ICGC)
The International Cancer Genome Consortium (ICGC) is a scientific organization that coordinates large-scale cancer genome studies. Its current data includes information on more than 50 globally significant cancer types.
Cancer Data Aggregator
The Cancer Data Aggregator (CDA) is a service of the National Cancer Institute's (NCI) Cancer Research Data Commons. It pulls metadata for thousands of studies hosted at multiple data repositories across NCI and makes it available for search from a single tool, so researchers can more easily find and reuse existing cancer research data. All information in the CDA has been aggregated into a single internal model and harmonized to existing ontologies to allow easy search and cross-reference. The CDA includes data from:
- Genomic Data Commons: The Genomic Data Commons (GDC) is a cancer knowledge network that supports hosting, standardization, and analysis of genomic, clinical, and biospecimen data from cancer research programs. The GDC harmonizes raw sequencing data, identifies and applies state-of-the-art bioinformatics methods for generating mutation calls, structural variants, and other high-level data, and provides scalable downloads and web-based analysis tools. Because of the personal nature of genomic data, some genomic data in the GDC may be controlled access, requiring eRA Commons authentication and dbGaP authorization.
- Proteomic Data Commons: The Proteomic Data Commons (PDC) was developed to advance understanding of how proteins help to shape the risk, diagnosis, development, progression, and treatment of cancer. In-depth analysis of proteomic data allows the study of both how and why cancer develops and informs ways of tailoring treatment for individual patients using precision medicine. All proteomic data in the PDC are open access and, with appropriate attribution, can be included in publications.
- Imaging Data Commons: NCI Imaging Data Commons (IDC) is a cloud-based repository of publicly available cancer imaging data co-located with analysis and exploration tools and resources. IDC is a node within the broader NCI Cancer Research Data Commons (CRDC) infrastructure, which provides secure access to a large, comprehensive, and expanding collection of cancer research data.
- Cancer Data Services: CDS hosts a variety of data types from NCI projects such as the Human Tumor Atlas Network (HTAN), Division of Cancer Control and Population Sciences (DCCPS), and Childhood Cancer Data Initiative (CCDI), as well as data from independent research projects. The CDS is home to both open and controlled access data.
- Integrated Canine Data Commons: The Integrated Canine Data Commons (ICDC) is a cloud-based repository of spontaneously arising canine cancer data. ICDC was established to further research on human cancers by enabling comparative analysis with canine cancer. The data in the ICDC is sourced from multiple programs and projects, all focused on canine subjects. The data is harmonized into an integrated data model and then made available to the research community.
- ISB Cancer Gateway in the Cloud: The ISB Cancer Gateway in the Cloud (ISB-CGC) is one of three National Cancer Institute (NCI) Cloud Resources tasked with enabling researchers to combine cancer data and cloud computation. The ISB-CGC cloud resource hosts data from a variety of sources, such as HTAN, as well as TCGA, CPTAC, and TARGET data from the GDC and PDC, in Google BigQuery columnar data tables. This includes file, case, clinical, and open access derived data that can be accessed both programmatically and through interactive web applications, eliminating the need to download and store large data sets.