FHIR-Aggregator: A Catalog of Research Data
The FHIR Aggregator is a centralized repository of diverse healthcare data, organized using the FHIR (Fast Healthcare Interoperability Resources) standard. It gives researchers access to a wide range of information, including:
- Clinical data: Patient demographics, conditions, medications, observations, and procedures.
- Research studies: Information about research projects, participants, and study protocols.
- Omics data associated with specimens.
Specify the endpoint
Start by selecting the FHIR server's URL: https://google-fhir.fhir-aggregator.org
The line of code below tells the notebook, "Remember this address, https://google-fhir.fhir-aggregator.org, and label it FHIR_BASE. We'll use it later to talk to a server that stores healthcare data."
Setting this environment variable with IPython's %env magic stores the FHIR Aggregator URL for later use within the notebook, so you won't need to repeat it every time it's needed.
From there, you can search the data on the server using FHIR queries.
%env FHIR_BASE=https://google-fhir.fhir-aggregator.org
env: FHIR_BASE=https://google-fhir.fhir-aggregator.org
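%env sets the variable in the notebook process's environment, so Python code can read it through os.environ and shell commands started with ! inherit it. The sketch below shows one way to read it defensively; get_fhir_base is a hypothetical helper name, and the setdefault line is only there so the snippet also runs as a plain Python script where %env is not available.

```python
import os


def get_fhir_base(var_name: str = "FHIR_BASE") -> str:
    """Return the FHIR server base URL stored in an environment variable.

    get_fhir_base is a hypothetical helper, not part of any library.
    """
    base = os.environ.get(var_name)
    if not base:
        raise RuntimeError(
            f"{var_name} is not set; run %env {var_name}=<server URL> first"
        )
    # Drop any trailing slash so paths like "/ResearchStudy" join cleanly
    return base.rstrip("/")


# Outside a notebook, set the variable directly instead of using %env
os.environ.setdefault("FHIR_BASE", "https://google-fhir.fhir-aggregator.org")
print(get_fhir_base())
```

Failing with an explicit message is a design choice: a missing FHIR_BASE otherwise surfaces later as a confusing request to a malformed URL.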
Example FHIR query
Now that you have the endpoint, that is all you need if you are comfortable with FHIR.
For example, the curl command below returns the official identifiers of all ResearchStudy resources. Its parts are:
- $FHIR_BASE is the environment variable we set earlier, which holds the FHIR server's base URL. It's expanded to the actual URL during execution.
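- /ResearchStudy is the FHIR resource type to search.
- ?_elements=identifier is a FHIR search parameter that limits the returned data to the 'identifier' element of each ResearchStudy resource.
- &identifier.use=official narrows the results to identifiers whose use is "official".
- The server responds with a FHIR Bundle; jq -rc prints each entry's identifier value(s) together with its fullUrl on one line, and sort orders the output.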
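The query URL described above can also be assembled from its parts, which avoids hand-escaping when parameter values contain spaces or other special characters. A minimal standard-library sketch; build_search_url is a hypothetical helper name, and urlencode percent-encodes keys and values (spaces become "+").

```python
from urllib.parse import urlencode


def build_search_url(base: str, resource_type: str, params: dict) -> str:
    """Compose a FHIR search URL of the form [base]/[type]?[parameters].

    build_search_url is a hypothetical helper for illustration.
    """
    # e.g. {"_elements": "identifier"} -> "_elements=identifier"
    query = urlencode(params)
    url = f"{base.rstrip('/')}/{resource_type}"
    return f"{url}?{query}" if query else url


url = build_search_url(
    "https://google-fhir.fhir-aggregator.org",
    "ResearchStudy",
    {"_elements": "identifier", "identifier.use": "official"},
)
print(url)
# prints https://google-fhir.fhir-aggregator.org/ResearchStudy?_elements=identifier&identifier.use=official
```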
# This cell requires jq, a command-line JSON processor. If jq --version fails, install it first,
# e.g. !apt-get install -yq jq > /dev/null
!jq --version
! curl -s $FHIR_BASE'/ResearchStudy?_elements=identifier&identifier.use=official' | jq -rc '.entry[] | [ (.resource.identifier[] | .value), .fullUrl]' | sort
/usr/bin/sh: 1: jq: not found
/usr/bin/sh: 1: jq: not found
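If jq is not installed (as in the output above), the same extraction can be done in Python. This sketch roughly mirrors the jq filter .entry[] | [ (.resource.identifier[] | .value), .fullUrl] followed by sort: for each Bundle entry it emits a compact JSON array of the identifier values plus the entry's fullUrl, then sorts the lines. The function name summarize_bundle and the sample Bundle are hypothetical; in practice the Bundle would be the parsed server response. Unlike jq, it skips entries without identifiers instead of raising an error, and it sorts by code point rather than by the shell's locale.

```python
import json


def summarize_bundle(bundle: dict) -> list:
    """Approximate: jq -rc '.entry[] | [ (.resource.identifier[] | .value), .fullUrl]' | sort

    summarize_bundle is a hypothetical helper for illustration.
    """
    lines = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        values = [ident.get("value") for ident in resource.get("identifier", [])]
        row = values + [entry.get("fullUrl")]
        # separators=(",", ":") matches jq's compact (-c) output style
        lines.append(json.dumps(row, separators=(",", ":")))
    return sorted(lines)


# Hypothetical Bundle shaped like a FHIR searchset response (not real server data)
sample_bundle = {
    "resourceType": "Bundle",
    "entry": [
        {"fullUrl": "https://example.org/ResearchStudy/2",
         "resource": {"identifier": [{"use": "official", "value": "study_b"}]}},
        {"fullUrl": "https://example.org/ResearchStudy/1",
         "resource": {"identifier": [{"use": "official", "value": "study_a"}]}},
    ],
}

for line in summarize_bundle(sample_bundle):
    print(line)
```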
The same query can also be run from Python with requests, loading the results into a pandas DataFrame:
import os

import pandas as pd
import requests

# FHIR_BASE was set earlier with %env, which stores it in os.environ
fhir_base_url = os.environ["FHIR_BASE"]
# Define the API endpoint: official identifiers of all ResearchStudy resources
endpoint = f"{fhir_base_url}/ResearchStudy?_elements=identifier&identifier.use=official"
# Make the request
response = requests.get(endpoint, timeout=60)
# Check for a successful response
if response.status_code == 200:
    # Parse the JSON response (a FHIR Bundle)
    data = response.json()
    # Extract identifiers
    identifiers = []
    for entry in data.get('entry', []):
        resource = entry.get('resource', {})
        for identifier in resource.get('identifier', []):
            # Record the ResearchStudy's fullUrl so the study can be retrieved later
            identifier['url'] = entry.get('fullUrl')
            identifiers.append(identifier)
    print(f"Found {len(identifiers)} ResearchStudy identifiers. Use the 'url' field to retrieve the data.")
    # Create a pandas DataFrame
    df = pd.DataFrame(identifiers)
    display(df)  # display() is available in Jupyter/IPython
else:
    print(f"Error: Request failed with status code {response.status_code}")
Found 100 ResearchStudy identifiers. Use the 'url' field to retrieve the data.
| | system | use | value | url |
|---|---|---|---|---|
| 0 | https://cda.readthedocs.io/associated_project | official | upenn_gbm | https://google-fhir.fhir-aggregator.org/Resear... |
| 1 | https://cda.readthedocs.io/associated_project | official | victre | https://google-fhir.fhir-aggregator.org/Resear... |
| 2 | https://cda.readthedocs.io/system | official | CDA | https://google-fhir.fhir-aggregator.org/Resear... |
| 3 | https://cda.readthedocs.io/associated_project | official | vestibular_schwannoma_seg | https://google-fhir.fhir-aggregator.org/Resear... |
| 4 | https://cda.readthedocs.io/associated_project | official | tcga_uvm | https://google-fhir.fhir-aggregator.org/Resear... |
| ... | ... | ... | ... | ... |
| 95 | https://cda.readthedocs.io/associated_project | official | nsclc_radiomics_interobserver1 | https://google-fhir.fhir-aggregator.org/Resear... |
| 96 | https://cda.readthedocs.io/associated_project | official | nsclc_radiomics | https://google-fhir.fhir-aggregator.org/Resear... |
| 97 | https://cda.readthedocs.io/associated_project | official | nlst | https://google-fhir.fhir-aggregator.org/Resear... |
| 98 | https://cda.readthedocs.io/associated_project | official | nsclc_radiomics_genomics | https://google-fhir.fhir-aggregator.org/Resear... |
| 99 | https://cda.readthedocs.io/associated_project | official | mouse_mammary | https://google-fhir.fhir-aggregator.org/Resear... |
100 rows × 4 columns
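The code above reads only the first Bundle the server returns. FHIR servers commonly page search results: each Bundle carries a link list, and the link whose relation is "next" points to the following page, so any later pages would not be included in the count printed above. A minimal sketch, assuming the server follows the standard FHIR Bundle.link paging convention; fetch_json, next_link, and iter_entries are hypothetical helper names, and max_pages is an arbitrary safety cap.

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """GET a URL and decode its JSON body (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def next_link(bundle: dict) -> Optional[str]:
    """Return the URL of the Bundle link whose relation is "next", or None."""
    for link in bundle.get("link", []):
        if link.get("relation") == "next":
            return link.get("url")
    return None


def iter_entries(
    start_url: str,
    fetch: Callable[[str], dict] = fetch_json,
    max_pages: int = 1000,
) -> Iterator[dict]:
    """Yield entries from every page of a FHIR search, following "next" links.

    max_pages stops the loop even if a server keeps returning "next" links.
    """
    url: Optional[str] = start_url
    for _ in range(max_pages):
        if not url:
            break
        bundle = fetch(url)
        yield from bundle.get("entry", [])
        url = next_link(bundle)
```

For example, list(iter_entries(os.environ["FHIR_BASE"] + "/ResearchStudy?_elements=identifier&identifier.use=official")) collects the entries from all pages, and fetch_json(df.loc[0, "url"]) retrieves a single ResearchStudy by the fullUrl stored in the DataFrame's url column. The fetch function is a parameter so it can be swapped for one built on requests (or a test stub) if the server rejects plain urllib requests.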
Explore the notebooks in the sidebar to learn about our command-line tool, fhir-query, and our Vocabulary dataframe.