Using fhir-query¶
fq (fhir-query): Your FHIR Querying Assistant¶
The fq utility, short for "fhir-query," is a command-line tool that simplifies interacting with FHIR servers. It gives researchers a convenient way to:
- Retrieve the vocabulary of a FHIR server: With the vocabulary command, fq fetches and summarizes the key data elements (CodeableConcepts and Extensions) used within the FHIR data. This creates a central vocabulary dataframe that helps researchers identify important data elements and their usage within the server.
- Execute queries to retrieve FHIR resources: Researchers can then use fq to execute FHIR queries using a readable syntax, retrieving and filtering data from the FHIR server based on search parameters and criteria.
!pip install fhir-aggregator-client==0.1.8 --no-cache-dir --quiet
Verify installation¶
!fq
Usage: fq [OPTIONS] COMMAND [ARGS]...

  FHIR-Aggregator utilities.

Options:
  --help  Show this message and exit.

Commands:
  ls          List all the installed GraphDefinitions.
  run         Run GraphDefinition queries.
  results     Work with the results of a GraphDefinition query.
  vocabulary  FHIR-Aggregator's key Resources and CodeSystems.
Overview¶
This notebook uses FHIR GraphDefinition resources to declare the relationships between resources and to traverse the resulting graph on a FHIR server.
The fhir-aggregator-client tool runs an R5 GraphDefinition against a FHIR server and writes the retrieved resources to a local SQLite database for persistence. The stored data can then be transformed into analyst-friendly dataframes for analysis with tools like Python's pandas library.
Key Features¶
- GraphDefinition-Driven Traversals: Use GraphDefinition objects to define explicit relationships between resources and automate traversal logic.
- Local SQLite Storage: Persist the retrieved FHIR data in a local SQLite database for querying and offline analysis.
- Analyst-Friendly Dataframes: Convert stored FHIR resources into pandas dataframes for ease of use in analytical workflows.
- Reusable Graph Definitions: Maintain a library of GraphDefinition YAML files that can be reused across different workflows and projects. Researchers and Data submitters can publish GraphDefinition files to help others navigate their data.
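To make the traversal idea concrete, here is a minimal sketch of how a single link's `params` template could be turned into a search path. The field names (`sourceId`, `targetId`, `params`) and the `{ref}` placeholder mirror what fq prints in its log output later in this notebook; the `expand_link` helper and the example reference are illustrative assumptions, not fq's actual implementation.

```python
# Hypothetical sketch: expand a GraphDefinition link into a FHIR search path.
# Field names match fq's log output; the helper itself is an assumption.

def expand_link(link: dict, ref: str) -> str:
    """Return '<targetId>?<params>' with the {ref} placeholder filled in."""
    return f"{link['targetId']}?{link['params'].replace('{ref}', ref)}"


# One link as it appears in the condition-graph log: Patient -> Specimen.
link = {"sourceId": "Patient", "targetId": "Specimen", "params": "subject={ref}"}

# "Patient/example-id" is a made-up reference used only for illustration.
print(expand_link(link, "Patient/example-id"))
```

Keeping the relationship in data (the link) rather than in code is what lets the same traversal logic be reused across different GraphDefinition files.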
List installed GraphDefinition files¶
!fq ls
| id | description |
|---|---|
| research-study-link-iterate | (dbGAP) Retrieve ResearchStudy and children. Uses HAPI's deep linking. fhir-query '/ResearchStudy?_id=phs001232' |
| patient-survival-graph | (FHIR-Aggregator) Retrieve Patient and Observations [NCIT_C156418,NCIT_C156419]. fhir-query '/ResearchStudy?identifier=TCGA-BRCA' |
| research-study-part-of | (FHIR-Aggregator) Retrieve a ResearchStudy and children. Uses part-of-study extension. fhir-query '/ResearchStudy?identifier=TCGA-BRCA' |
| condition-graph | (FHIR-Aggregator) Condition to ResearchStudy and children. fhir-query '/Condition?code:text=cholangiocarcinoma' |
Run a GraphDefinition¶
%env FHIR_BASE=https://google-fhir.fhir-aggregator.org
env: FHIR_BASE=https://google-fhir.fhir-aggregator.org
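When no server is given on the command line, fq appears to take the server address from the FHIR_BASE environment variable and append the query path to it, as the "Fetching" line in the run output shows. The sketch below reproduces that URL composition in plain Python; `build_url` is a hypothetical helper written for illustration and is not part of fq.

```python
import os


def build_url(base: str, path: str) -> str:
    """Join a server base URL and a query path with exactly one slash."""
    return base.rstrip("/") + "/" + path.lstrip("/")


# Fall back to the server used in this notebook if FHIR_BASE is not set.
base = os.environ.get("FHIR_BASE", "https://google-fhir.fhir-aggregator.org")
print(build_url(base, "/Condition?code:text=cholangiocarcinoma"))
```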
!fq run condition-graph '/Condition?code:text=cholangiocarcinoma'
condition-graph is valid FHIR R5 GraphDefinition
ℹ Fetching https://google-fhir.fhir-aggregator.org/Condition?code:text=cholangiocarcinoma
ℹ Processing Condition with 739 resources
ℹ Processing 1 links for Condition in parallel.
ℹ Processing link: Patient/_id={ref} with 704 Condition(s)
✔ Processed link: Patient/_id={ref}
ℹ Processing Patient with 739 resources
ℹ Processing 10 links for Patient in parallel.
ℹ Processing link: ResearchSubject/individual={ref}&_include=ResearchSubject:study with 704 Patient(s)
ℹ Processing link: Group/member={ref} with 704 Patient(s)
ℹ Processing link: Specimen/subject={ref} with 704 Patient(s)
ℹ Processing link: Observation/subject={ref} with 704 Patient(s)
ℹ Processing link: Procedure/subject={ref} with 704 Patient(s)
ℹ Processing link: DocumentReference/subject={ref}&_count=1000&_total=accurate with 704 Patient(s)
✖ Could not find any resources for Patient->Group link: {'params': 'member={ref}&_count=1000&_total=accurate', 'sourceId': 'Patient', 'targetId': 'Group', 'path': 'Patient.id'}
ℹ Processing link: ImagingStudy/subject={ref}&_count=1000&_total=accurate with 704 Patient(s)
ℹ Processing link: MedicationAdministration/subject={ref}&_count=1000&_total=accurate with 704 Patient(s)
ℹ Processing link: Encounter/subject={ref}&_count=1000&_total=accurate with 704 Patient(s)
✔ Processed link: ResearchSubject/individual={ref}&_include=ResearchSubject:study
✔ Processed link: Group/member={ref}
✔ Processed link: ImagingStudy/subject={ref}&_count=1000&_total=accurate
✔ Processed link: MedicationAdministration/subject={ref}&_count=1000&_total=accurate
✔ Processed link: Procedure/subject={ref}
✔ Processed link: Observation/subject={ref}
✔ Processed link: Encounter/subject={ref}&_count=1000&_total=accurate
✔ Processed link: DocumentReference/subject={ref}&_count=1000&_total=accurate
✔ Processed link: Specimen/subject={ref}
ℹ Processing ResearchSubject with 704 resources
ℹ Processing Specimen with 756 resources
ℹ Processing ResearchStudy with 3146 resources
ℹ Processing 1 links for ResearchSubject in parallel.
ℹ Processing 1 links for Specimen in parallel.
ℹ Processing 1 links for ResearchStudy in parallel.
✔ Processed link: ResearchStudy/
ℹ Processing link: ServiceRequest/specimen={ref} with 3146 Specimen(s)
ℹ Processing link: DocumentReference/subject={ref}&_count=1000&_total=accurate with 35 ResearchStudy(s)
✔ Processed link: DocumentReference/subject={ref}&_count=1000&_total=accurate
✔ Processed link: ServiceRequest/specimen={ref}
Aggregated Results: {'Condition': 739, 'DocumentReference': 1021, 'Group': 37, 'ImagingStudy': 147, 'MedicationAdministration': 15, 'Patient': 704, 'ResearchStudy': 35, 'ResearchSubject': 756, 'ServiceRequest': 2233, 'Specimen': 3146}
database available at: /home/docs/.fhir-aggregator/fhir-graph.sqlite
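The SQLite file can also be inspected directly. Since the database schema is not described here, the sketch below makes no assumptions about table or column names: it asks SQLite itself (via `sqlite_master`) which tables exist and counts their rows. The `table_row_counts` helper is hypothetical, and the path assumes the default location that fq reports above, relative to the current user's home directory.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def table_row_counts(db_path) -> dict:
    """Return {table_name: row_count} for every table in a SQLite file."""
    with closing(sqlite3.connect(str(db_path))) as conn:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )]
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }


# Assumed default location; only inspect the file if it already exists,
# because sqlite3.connect would otherwise create an empty database.
db_path = Path.home() / ".fhir-aggregator" / "fhir-graph.sqlite"
if db_path.exists():
    print(table_row_counts(db_path))
```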
Analyse Results¶
The graph represents relationships between different FHIR resources. Examples of FHIR resources include Patient, Condition, Observation, and Procedure.
Each node is labeled <resource_type>/<count>, where <count> is the number of records of that type retrieved.
The edges in the graph are weighted. The thicker the line, the more connections there are between nodes.
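As a small illustration of the node labeling scheme, the sketch below builds `<resource_type>/<count>` labels from the "Aggregated Results" counts printed by the run above. The `node_labels` helper is hypothetical; it only shows the label format and does not reproduce how fq builds the visualization or computes edge weights.

```python
# Counts copied from the "Aggregated Results" line of the condition-graph run.
aggregated = {
    "Condition": 739, "DocumentReference": 1021, "Group": 37,
    "ImagingStudy": 147, "MedicationAdministration": 15, "Patient": 704,
    "ResearchStudy": 35, "ResearchSubject": 756, "ServiceRequest": 2233,
    "Specimen": 3146,
}


def node_labels(counts: dict) -> list:
    """Format each resource type as '<resource_type>/<count>'."""
    return [f"{rtype}/{n}" for rtype, n in sorted(counts.items())]


for label in node_labels(aggregated):
    print(label)
```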
# Create a graph of the results
!fq results visualize
Wrote: fhir-graph.html
# Read the locally stored HTML file containing the graph visualization and display it in the Jupyter notebook.
from IPython.display import HTML, display
with open('fhir-graph.html', 'r') as file:
    html_content = file.read()
# Set the display height (in pixels)
display(HTML("<div style='height: 800px;'>{}</div>".format(html_content)))
Create a dataframe of results¶
!fq results dataframe
Saved fhir-graph.tsv
import pandas as pd
df = pd.read_csv('fhir-graph.tsv', sep='\t')
df
| | specimen_identifier | specimen_id | specimen_type | specimen_part_of_study | patient_identifier | patient_subject_id | patient_subject_alias | patient_id | patient_part_of_study | patient_us_core_birthsex | ... | servicerequest_identifier | servicerequest_id | servicerequest_intent | servicerequest_status | servicerequest_category | servicerequest_part_of_study | patient_deceasedBoolean | patient_patient_extensions_patient_age | patient_name-list | patient_cell_line |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | FM-AD.AD4002.AD4002_sample | 8aa76feb-ad0d-5e45-80f0-2cd5cc791792 | Tumor | ResearchStudy/b86ee080-2f2f-54c6-b6a8-c1674bb9... | AD4002 | FM.AD4002 | 68247.0 | d2c5d64a-0fb0-525c-a485-64e7a4868eaf | ResearchStudy/5a73fd32-241a-5771-ba32-0d0ba78e... | F | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | FM-AD.AD4002.AD4002_slide | c50c876e-00b1-5748-a634-53eeace773e9 | Tumor | ResearchStudy/b86ee080-2f2f-54c6-b6a8-c1674bb9... | AD4002 | FM.AD4002 | 68247.0 | d2c5d64a-0fb0-525c-a485-64e7a4868eaf | ResearchStudy/5a73fd32-241a-5771-ba32-0d0ba78e... | F | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | FM-AD.AD4002.AD4002_aliquot | 2cf5965a-0e05-5b32-bae3-efe5d06faddd | Tumor | ResearchStudy/b86ee080-2f2f-54c6-b6a8-c1674bb9... | AD4002 | FM.AD4002 | 68247.0 | d2c5d64a-0fb0-525c-a485-64e7a4868eaf | ResearchStudy/5a73fd32-241a-5771-ba32-0d0ba78e... | F | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | FM-AD.AD4002.93a05931-a3a1-534b-b63a-fc73c5b09232 | 37ed0c2e-4884-538d-bb82-db2a8d5a3538 | Tumor | ResearchStudy/b86ee080-2f2f-54c6-b6a8-c1674bb9... | AD4002 | FM.AD4002 | 68247.0 | d2c5d64a-0fb0-525c-a485-64e7a4868eaf | ResearchStudy/5a73fd32-241a-5771-ba32-0d0ba78e... | F | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | FM-AD.AD4002.897330f0-52b3-53b0-ab1d-74988b88e658 | 3b8e86cd-1759-5929-a40a-85631276e594 | Tumor | ResearchStudy/b86ee080-2f2f-54c6-b6a8-c1674bb9... | AD4002 | FM.AD4002 | 68247.0 | d2c5d64a-0fb0-525c-a485-64e7a4868eaf | ResearchStudy/5a73fd32-241a-5771-ba32-0d0ba78e... | F | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3141 | efc5351c-aecb-4181-9f27-5733d80372b4 | 20ca4a8b-a436-57ae-8715-18184f90ece5 | Metastatic | ResearchStudy/febbaeba-c56e-5d4f-a0ba-96965793... | AD2769 | NaN | NaN | bc498d7b-b6dd-53c0-bc00-e12d4bf2249f | ResearchStudy/febbaeba-c56e-5d4f-a0ba-96965793... | M | ... | NaN | 78c2d4fb-2bdd-5759-9008-acdf058585d1 | order | completed | Laboratory procedure | ResearchStudy/febbaeba-c56e-5d4f-a0ba-96965793... | NaN | NaN | NaN | NaN |
| 3142 | 629e3379-1ac4-4c80-a373-929709882f70 | 922be56c-cf44-55fc-9229-ca181d95e1a2 | Metastatic | ResearchStudy/febbaeba-c56e-5d4f-a0ba-96965793... | AD2769 | NaN | NaN | bc498d7b-b6dd-53c0-bc00-e12d4bf2249f | ResearchStudy/febbaeba-c56e-5d4f-a0ba-96965793... | M | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3143 | cb5869b1-1953-5e91-8187-a87c92f4ee24 | a46af6d6-97c4-5b57-a2b3-27c0c5fc4938 | Metastatic | ResearchStudy/febbaeba-c56e-5d4f-a0ba-96965793... | AD2769 | NaN | NaN | bc498d7b-b6dd-53c0-bc00-e12d4bf2249f | ResearchStudy/febbaeba-c56e-5d4f-a0ba-96965793... | M | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3144 | 6fe69ae0-4b4e-5556-993a-d9d65d035ce0 | 7ad10db2-dc0d-5b05-9988-f46086b94b6d | Metastatic | ResearchStudy/febbaeba-c56e-5d4f-a0ba-96965793... | AD2769 | NaN | NaN | bc498d7b-b6dd-53c0-bc00-e12d4bf2249f | ResearchStudy/febbaeba-c56e-5d4f-a0ba-96965793... | M | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3145 | 6bda56a8-0a45-5c06-bfc5-443a60d97d58 | 3a91ad18-82fa-5448-ab16-7736b3bb87de | Metastatic | ResearchStudy/febbaeba-c56e-5d4f-a0ba-96965793... | AD2769 | NaN | NaN | bc498d7b-b6dd-53c0-bc00-e12d4bf2249f | ResearchStudy/febbaeba-c56e-5d4f-a0ba-96965793... | M | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3146 rows × 29 columns
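Once the results are in a dataframe, ordinary pandas operations apply. The sketch below shows three common first steps: dropping columns that are entirely empty, counting distinct specimens per patient, and tallying specimen types. To stay self-contained it uses a tiny stand-in dataframe whose values are invented for illustration; only the column names are taken from the dataframe above.

```python
import pandas as pd

# Tiny stand-in for fhir-graph.tsv. Values are made up for illustration;
# only the column names come from the dataframe shown above.
df = pd.DataFrame({
    "specimen_id": ["s1", "s2", "s3"],
    "specimen_type": ["Tumor", "Tumor", "Metastatic"],
    "patient_id": ["p1", "p1", "p2"],
    "patient_cell_line": [None, None, None],
})

# Drop columns that contain no values at all.
df = df.dropna(axis=1, how="all")

# Count distinct specimens per patient and rows per specimen type.
specimens_per_patient = df.groupby("patient_id")["specimen_id"].nunique()
type_counts = df["specimen_type"].value_counts()

print(specimens_per_patient)
print(type_counts)
```

With the real file, replace the stand-in with the `pd.read_csv('fhir-graph.tsv', sep='\t')` call used earlier.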
Other servers¶
You can use the fq tool with other FHIR servers. For example, this query retrieves a study from dbGaP.
# Delete the previous results to start with a fresh database (-f avoids an error if the file does not exist)
!rm -f ~/.fhir-aggregator/fhir-graph.sqlite
!fq run --fhir-base-url https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot/x1 research-study-link-iterate '/ResearchStudy?_id=phs001232'
research-study-link-iterate is valid FHIR R5 GraphDefinition
ℹ Fetching https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot/x1/ResearchStudy?_id=phs001232
ℹ Processing ResearchStudy with 1 resources
ℹ Processing 1 links for ResearchStudy in parallel.
ℹ Processing link: Patient/_has:ResearchSubject:individual:study={ref}&_revinclude=Group:member&_revinclude=ResearchSubject:individual&_revinclude=Specimen:subject&_revinclude=Observation:subject&_revinclude=DocumentReference:subject&_count=1000&_total=accurate with 1 ResearchStudy(s)
✔ Processed link: Patient/_has:ResearchSubject:individual:study={ref}&_revinclude=Group:member&_revinclude=ResearchSubject:individual&_revinclude=Specimen:subject&_revinclude=Observation:subject&_revinclude=DocumentReference:subject&_count=1000&_total=accurate
ℹ Processing Patient with 1 resources
ℹ Processing Specimen with 5837 resources
ℹ Processing 1 links for Patient in parallel.
ℹ Processing 3 links for Specimen in parallel.
✔ Processed link: Specimen/
ℹ Processing link: DocumentReference/subject={ref}&_count=1000&_total=accurate with 6413 Specimen(s)
✖ Could not find any resources for Specimen->DocumentReference link: {'params': 'relatesto={ref}&_count=1000&_total=accurate', 'path': 'Specimen.id', 'sourceId': 'Specimen', 'targetId': 'DocumentReference'}
ℹ Processing link: Observation/subject={ref}&_count=1000&_total=accurate with 6413 Specimen(s)
✔ Processed link: Observation/subject={ref}&_count=1000&_total=accurate
✔ Processed link: DocumentReference/subject={ref}&_count=1000&_total=accurate
Aggregated Results: {'DocumentReference': 7448, 'Group': 1614, 'Observation': 25690, 'Patient': 5837, 'ResearchStudy': 1, 'ResearchSubject': 5837, 'Specimen': 6413}
database available at: /home/docs/.fhir-aggregator/fhir-graph.sqlite
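The dbGaP link relies on two standard FHIR search features: `_has` (reverse chaining) selects Patients that are referenced by a ResearchSubject belonging to the study, and each `_revinclude` asks the server to also return resources that point back at those Patients. The sketch below parses the parameter string from the log above to list which resource types are pulled in this way; `revincluded_types` is a hypothetical helper for illustration, not part of fq.

```python
from urllib.parse import parse_qsl

# Link parameters copied from the research-study-link-iterate log above.
params = (
    "_has:ResearchSubject:individual:study={ref}"
    "&_revinclude=Group:member"
    "&_revinclude=ResearchSubject:individual"
    "&_revinclude=Specimen:subject"
    "&_revinclude=Observation:subject"
    "&_revinclude=DocumentReference:subject"
    "&_count=1000&_total=accurate"
)


def revincluded_types(query: str) -> list:
    """Return the resource types named in _revinclude parameters, in order."""
    return [
        value.split(":")[0]
        for key, value in parse_qsl(query)
        if key == "_revinclude"
    ]


print(revincluded_types(params))
```

This is why a single link from ResearchStudy yields Group, ResearchSubject, Specimen, Observation, and DocumentReference resources in the aggregated results.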
# use the same commands to analyse results
!fq results visualize
Wrote: fhir-graph.html
# display the graph of the results in the notebook
from IPython.display import HTML, display
with open('fhir-graph.html', 'r') as file:
    html_content = file.read()
# Set the display height (in pixels)
display(HTML("<div style='height: 800px;'>{}</div>".format(html_content)))
# create a dataframe of results
!fq results dataframe
Saved fhir-graph.tsv
import pandas as pd
df = pd.read_csv('fhir-graph.tsv', sep='\t')
df
| | specimen_identifier | specimen_phs001232-SampleIdentifier | specimen_id | specimen_type | patient_identifier | patient_id | patient_active | patient_gender | patient_meta |
|---|---|---|---|---|---|---|---|---|---|
| 0 | SAME123424 | f9964f35-00a8-4f0c-8ba2-6f848ea0fb0f | dgs-5063489 | Fibroblasts | NaN | 1826993 | True | male | phs001232 |
| 1 | SAME123417 | 6ba8fe4f-693d-40c9-8c69-981cc7b6ba87 | dgs-2211200 | Blood | NaN | 1826993 | True | male | phs001232 |
| 2 | SAME1839057 | 084ecc87-1588-4bb3-94d7-33942e3f0561 | dgs-4274478 | Blood | NaN | 1827029 | True | male | phs001232 |
| 3 | SAME1839246 | 08fcbd02-5729-4c9a-9a63-4ce48c400fe4 | dgs-2211148 | Blood | NaN | 1827029 | True | male | phs001232 |
| 4 | SAME124161 | db37d721-4b6a-4de7-8e37-38d3a00f9aca | dgs-2211247 | Blood | NaN | 1827062 | True | male | phs001232 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6408 | SAME122976 | a5679b33-ff69-48e7-9f82-6bbd0a9b4049 | dgs-5063129 | Blood | NaN | 4296843 | True | male | phs001232 |
| 6409 | SAME1839098 | ced2fb56-6535-4e52-b755-dd16263925f1 | dgs-5063313 | Blood | NaN | 4296875 | True | female | phs001232 |
| 6410 | SAME1839147 | e5ede4ac-e8a3-447d-a654-0c6319af9c5c | dgs-5063407 | Blood | NaN | 4296515 | True | male | phs001232 |
| 6411 | SAME123105 | f8240565-ff65-4742-a299-99c01bab31ab | dgs-5063482 | Blood | NaN | 4297305 | True | male | phs001232 |
| 6412 | SAME123917 | 98af2a6c-7f17-4612-832d-05d1d104e1ea | dgs-5063082 | Blood | NaN | 4296730 | True | female | phs001232 |
6413 rows × 9 columns