1. Cohort & dataset builder

If you need a concept but don’t know how to query it, you can use the cohort builder to create sample criteria and then preview the resulting Jupyter code (click ‘Preview code’) to see the variable names and ranges.

2. Write an optimized query

Avoid using “SELECT *” (see below). It selects every column of a table, which uses a lot of computing power, takes a long time to load, and is expensive.

Instead, use the OMOP tables to view the details of each variable or concept and select only the columns you need.

If you need variables from different concept tables, you need to join them together.

Finally, add filters whenever you can to reduce the amount of data you use/load.

See examples below.
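As a rough sketch (not copied from the workbench itself), an optimized query in a Python notebook cell could look like the following, assuming the standard WORKSPACE_CDR environment variable and pandas.read_gbq; the concept ID and the date filter are placeholders to replace with your own criteria:

import os
import pandas as pd

# The dataset name comes from the workspace environment (see the note further down).
dataset = os.getenv('WORKSPACE_CDR')

# Select only the columns you need, join only the tables you need, and filter early.
# 1234567 is a placeholder; take the real concept IDs from the cohort builder's code preview.
query = f"""
SELECT
    co.person_id,
    co.condition_concept_id,
    co.condition_start_date,
    p.year_of_birth
FROM `{dataset}.condition_occurrence` AS co
JOIN `{dataset}.person` AS p
    ON p.person_id = co.person_id
WHERE co.condition_concept_id IN (1234567)
  AND co.condition_start_date >= '2010-01-01'
"""

df = pd.read_gbq(query, dialect='standard')
df.head()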

3. Save data from super long query

Data are deleted after about a week if you don’t save them. Save your data to the workspace bucket and load them back from the bucket later.

You can use snippets to do that; see the screenshot below.

If you use snippets, you always have to run the corresponding “Setup” snippet first. Then select the snippet “List buckets”, followed by “copy file from workspace bucket”.
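If you want to write the copy step yourself instead of using the snippets, a minimal sketch (assuming df is the dataframe from the query above and the standard WORKSPACE_BUCKET environment variable) could look like this:

import os
import subprocess

# Write the dataframe to a CSV on the notebook's local disk first.
df.to_csv('my_cohort.csv', index=False)

# Then copy it into the workspace bucket so it is not lost when the disk is cleared.
bucket = os.getenv('WORKSPACE_BUCKET')
subprocess.run(['gsutil', 'cp', 'my_cohort.csv', f'{bucket}/data/my_cohort.csv'], check=True)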

5. Restart & run all

Before saving, restart the kernel and run all cells (Kernel → Restart & Run All) to make sure the notebook runs cleanly from top to bottom.

6. Save your most recent notebook version

The snippet for saving your notebook version is only available in Python, not in R.

I didn’t quite follow this part.
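I have not checked the exact snippet, but the idea should be the same as saving data: copy the .ipynb file into the workspace bucket, roughly like this (the notebook file name and target folder are placeholders):

import os
import subprocess
from datetime import datetime

# Copy the current notebook into the workspace bucket as a timestamped backup of this version.
# 'my_analysis.ipynb' is a placeholder; use your own notebook's file name.
bucket = os.getenv('WORKSPACE_BUCKET')
stamp = datetime.now().strftime('%Y%m%d_%H%M%S')
subprocess.run(['gsutil', 'cp', 'my_analysis.ipynb',
                f'{bucket}/notebooks/my_analysis_{stamp}.ipynb'], check=True)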

Notes for my study:

To account for dataset changes/updates, run this first:

import os

# WORKSPACE_CDR holds the name of the current Curated Data Repository (CDR) release,
# so queries automatically point at the latest dataset version.
dataset_name = os.getenv('WORKSPACE_CDR')
dataset_name

To get the date difference between the OSA and HTN diagnoses, use condition_start_date.

  1. Create different tables for each set of diagnostic criteria and join the separate tables together.

Example below:
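As a rough sketch of that approach (same assumptions as above: WORKSPACE_CDR and pandas.read_gbq; the OSA and HTN concept IDs below are placeholders):

import os
import pandas as pd

dataset = os.getenv('WORKSPACE_CDR')

# One subquery per set of diagnostic criteria, joined on person_id,
# then the date difference in days between the first OSA and first HTN diagnosis.
# 1111111 and 2222222 are placeholder concept IDs for OSA and HTN.
query = f"""
WITH osa AS (
    SELECT person_id, MIN(condition_start_date) AS osa_date
    FROM `{dataset}.condition_occurrence`
    WHERE condition_concept_id IN (1111111)
    GROUP BY person_id
),
htn AS (
    SELECT person_id, MIN(condition_start_date) AS htn_date
    FROM `{dataset}.condition_occurrence`
    WHERE condition_concept_id IN (2222222)
    GROUP BY person_id
)
SELECT
    osa.person_id,
    osa.osa_date,
    htn.htn_date,
    DATE_DIFF(htn.htn_date, osa.osa_date, DAY) AS days_between
FROM osa
JOIN htn ON htn.person_id = osa.person_id
"""

df = pd.read_gbq(query, dialect='standard')
df.head()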
