FHIR-Aggregator: A Catalog of Research Data

The FHIR Aggregator acts as a centralized repository for diverse healthcare data, organized using the FHIR (Fast Healthcare Interoperability Resources) standard. It provides researchers access to a wide range of information, including:

Clinical data: Patient demographics, conditions, medications, observations, and procedures.
Research studies: Information about research projects, participants, and study protocols.
OMICS data associated with Specimens

Specify the endpoint

First, select the FHIR server's URL: https://google-fhir.fhir-aggregator.org
- The %env line in the next cell stores this address in an environment variable named FHIR_BASE, so later cells can use it to talk to the server that holds the healthcare data.
- Storing the URL once means you do not have to repeat it each time it is needed.
With the endpoint set, you can search the data on the server using FHIR queries.

In [1]:

%env FHIR_BASE=https://google-fhir.fhir-aggregator.org

env: FHIR_BASE=https://google-fhir.fhir-aggregator.org
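
Outside a notebook, where the %env magic is not available, the same setting can be read with the Python standard library. The sketch below assumes the variable name FHIR_BASE from above; the helper name get_fhir_base and its fall-back-to-default behaviour are illustrative choices, not part of the FHIR Aggregator tooling.

```python
import os

# Default taken from the text above; export FHIR_BASE to override it.
DEFAULT_FHIR_BASE = "https://google-fhir.fhir-aggregator.org"


def get_fhir_base() -> str:
    """Return the FHIR base URL from the environment, without a trailing slash.

    Hypothetical helper: falls back to DEFAULT_FHIR_BASE when FHIR_BASE
    is unset or empty.
    """
    base = os.environ.get("FHIR_BASE") or DEFAULT_FHIR_BASE
    return base.rstrip("/")


print(get_fhir_base())
```

Stripping the trailing slash keeps later string concatenation such as base + "/ResearchStudy" from producing a double slash.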

Example FHIR query

If you are comfortable with FHIR, the endpoint is all you need. For example, the query in the next cell requests the official identifier of each ResearchStudy resource. It has four parts:

$FHIR_BASE is the environment variable we set earlier, which holds the FHIR server's base URL. It is expanded to the actual URL when the command runs.
/ResearchStudy is the FHIR resource type we are interested in.
?_elements=identifier is a FHIR search parameter that limits the returned data to the identifier element of each ResearchStudy resource.
&identifier.use=official is intended to restrict the results to identifiers whose use is official.
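
The same query URL can also be assembled programmatically. This is a minimal sketch using only the standard library; the function name build_search_url is hypothetical, and the parameters are the ones described above.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, resource_type: str, params: dict) -> str:
    """Join the base URL, the resource type, and URL-encoded search parameters."""
    query = urlencode(params)
    url = f"{base_url.rstrip('/')}/{resource_type}"
    return f"{url}?{query}" if query else url


url = build_search_url(
    "https://google-fhir.fhir-aggregator.org",
    "ResearchStudy",
    {"_elements": "identifier", "identifier.use": "official"},
)
print(url)
```

Keeping the parameters in a dict makes it easy to add or drop a search parameter without editing a long string by hand.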

In [2]:

# Install the jq json formatter tool
# e.g. !apt-get install -yq jq > /dev/null
!jq --version

! curl -s $FHIR_BASE'/ResearchStudy?_elements=identifier&identifier.use=official' | jq -rc '.entry[] | [ (.resource.identifier[] | .value), .fullUrl]' | sort

/usr/bin/sh: 1: jq: not found

/usr/bin/sh: 1: jq: not found
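
When jq is not installed, as in the output above, the same projection can be done in Python once the response has been parsed into a dict. The sketch below mirrors the jq filter (identifier values followed by fullUrl, one row per entry) and approximates the trailing sort by ordering rows on their compact JSON text. The function name identifier_rows and the sample bundle are hypothetical and only illustrate the shape of a FHIR search Bundle.

```python
import json


def identifier_rows(bundle: dict) -> list:
    """Return one row per Bundle entry: all identifier values, then fullUrl."""
    rows = []
    for entry in bundle.get("entry", []):
        identifiers = entry.get("resource", {}).get("identifier", [])
        values = [ident.get("value") for ident in identifiers]
        rows.append(values + [entry.get("fullUrl")])
    # Order rows by their compact JSON text, roughly like piping jq -c into sort.
    return sorted(rows, key=lambda row: json.dumps(row, separators=(",", ":")))


# Hypothetical sample data, not taken from the server.
sample_bundle = {
    "resourceType": "Bundle",
    "entry": [
        {
            "fullUrl": "https://example.org/ResearchStudy/2",
            "resource": {
                "resourceType": "ResearchStudy",
                "identifier": [{"use": "official", "value": "study_b"}],
            },
        },
        {
            "fullUrl": "https://example.org/ResearchStudy/1",
            "resource": {
                "resourceType": "ResearchStudy",
                "identifier": [{"use": "official", "value": "study_a"}],
            },
        },
    ],
}

rows = identifier_rows(sample_bundle)
for row in rows:
    print(json.dumps(row, separators=(",", ":")))
```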

Next, let's query the FHIR server from Python and load the results into a pandas DataFrame.

In [3]:

import requests
import pandas as pd
import json

# Assuming FHIR_BASE is already set as an environment variable
fhir_base_url = %env FHIR_BASE

# Define the API endpoint
endpoint = f"{fhir_base_url}/ResearchStudy?_elements=identifier&identifier.use=official"

# Make the request
response = requests.get(endpoint)

# Check for successful response
if response.status_code == 200:
    # Parse the JSON response
    data = response.json()

    # Extract identifiers
    identifiers = []
    for entry in data.get('entry', []):
        resource = entry.get('resource', {})
        for identifier in resource.get('identifier', []):
            # Add the ResearchStudy's URL to each identifier row
            identifier['url'] = entry.get('fullUrl')
            identifiers.append(identifier)

    # Create a Pandas DataFrame
    print(f"Found {len(identifiers)} ResearchStudy identifiers. Use the 'url' field to retrieve the data.")
    df = pd.DataFrame(identifiers)
    display(df)  # Display the DataFrame
else:
    print(f"Error: Request failed with status code {response.status_code}")

Found 100 ResearchStudy identifiers. Use the 'url' field to retrieve the data.

	system	use	value	url
0	https://cda.readthedocs.io/associated_project	official	upenn_gbm	https://google-fhir.fhir-aggregator.org/Resear...
1	https://cda.readthedocs.io/associated_project	official	victre	https://google-fhir.fhir-aggregator.org/Resear...
2	https://cda.readthedocs.io/system	official	CDA	https://google-fhir.fhir-aggregator.org/Resear...
3	https://cda.readthedocs.io/associated_project	official	vestibular_schwannoma_seg	https://google-fhir.fhir-aggregator.org/Resear...
4	https://cda.readthedocs.io/associated_project	official	tcga_uvm	https://google-fhir.fhir-aggregator.org/Resear...
...	...	...	...	...
95	https://cda.readthedocs.io/associated_project	official	nsclc_radiomics_interobserver1	https://google-fhir.fhir-aggregator.org/Resear...
96	https://cda.readthedocs.io/associated_project	official	nsclc_radiomics	https://google-fhir.fhir-aggregator.org/Resear...
97	https://cda.readthedocs.io/associated_project	official	nlst	https://google-fhir.fhir-aggregator.org/Resear...
98	https://cda.readthedocs.io/associated_project	official	nsclc_radiomics_genomics	https://google-fhir.fhir-aggregator.org/Resear...
99	https://cda.readthedocs.io/associated_project	official	mouse_mammary	https://google-fhir.fhir-aggregator.org/Resear...

100 rows × 4 columns
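
The result above contains exactly 100 rows, which may be a single page of a larger result set rather than the full catalog; that is an assumption, not something the output confirms. FHIR search results are returned as Bundles, and when more pages exist a Bundle carries a link whose relation is next. The sketch below follows those links. The names next_link and iter_entries, the max_pages safety limit, and the injectable fetch callable are all illustrative.

```python
from typing import Callable, Iterator, Optional


def next_link(bundle: dict) -> Optional[str]:
    """Return the URL of the Bundle's 'next' link, or None if there is none."""
    for link in bundle.get("link", []):
        if link.get("relation") == "next":
            return link.get("url")
    return None


def iter_entries(
    start_url: str,
    fetch: Callable[[str], dict],
    max_pages: int = 50,
) -> Iterator[dict]:
    """Yield entries from every page, following 'next' links up to max_pages."""
    url: Optional[str] = start_url
    for _ in range(max_pages):
        if url is None:
            break
        bundle = fetch(url)
        yield from bundle.get("entry", [])
        url = next_link(bundle)


# Hypothetical two-page result, used in place of real HTTP calls.
pages = {
    "page1": {
        "entry": [{"fullUrl": "a"}],
        "link": [{"relation": "next", "url": "page2"}],
    },
    "page2": {
        "entry": [{"fullUrl": "b"}],
        "link": [{"relation": "self", "url": "page2"}],
    },
}

entries = list(iter_entries("page1", pages.__getitem__))
print([entry["fullUrl"] for entry in entries])
```

In a notebook, fetch could be a small wrapper such as lambda u: requests.get(u, timeout=30).json(), with start_url set to the endpoint used earlier.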

Explore the notebooks in the sidebar to learn about our command-line tool, fhir-query, and our Vocabulary dataframe.
