1. Cohort & dataset builder

If you need a concept but don’t know how to query it, you can use the cohort builder to create sample criteria and then preview the resulting Jupyter code (click ‘Preview code’) to see the variable names and ranges.

2. Write an optimized query

Avoid using “SELECT *” (see below). It selects every column of a table, which uses a lot of computing power, takes a long time to load, and is expensive.

Instead, use the OMOP tables to view the details of each variable or concept and select only the columns you need.

If you need variables from different concept tables, you need to join them together.

Finally, add filters whenever you can to reduce the amount of data you use/load.

See examples below.
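As a rough sketch (not copied from the workbench itself), an optimized query in a Python notebook cell could look like the following, assuming the standard WORKSPACE_CDR environment variable and pandas.read_gbq; the concept ID and the date filter are placeholders to replace with your own criteria:

import os
import pandas as pd

# The dataset name comes from the workspace environment (see the note further down).
dataset = os.getenv('WORKSPACE_CDR')

# Select only the columns you need, join only the tables you need, and filter early.
# 1234567 is a placeholder; take the real concept IDs from the cohort builder's code preview.
query = f"""
SELECT
    co.person_id,
    co.condition_concept_id,
    co.condition_start_date,
    p.year_of_birth
FROM `{dataset}.condition_occurrence` AS co
JOIN `{dataset}.person` AS p
    ON p.person_id = co.person_id
WHERE co.condition_concept_id IN (1234567)
  AND co.condition_start_date >= '2010-01-01'
"""

df = pd.read_gbq(query, dialect='standard')
df.head()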

3. Save data from super long query

Data are deleted after about a week if you don’t save them. Save your data to the workspace bucket and load them back from the bucket later.

You can use snippets to do that; see the screenshot below.

If you use snippets, you always have to run the corresponding “Setup” snippet first. Then select the snippet “List buckets”, followed by “copy file from workspace bucket”.
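If you want to write the copy step yourself instead of using the snippets, a minimal sketch (assuming df is the dataframe from the query above and the standard WORKSPACE_BUCKET environment variable) could look like this:

import os
import subprocess

# Write the dataframe to a CSV on the notebook's local disk first.
df.to_csv('my_cohort.csv', index=False)

# Then copy it into the workspace bucket so it is not lost when the disk is cleared.
bucket = os.getenv('WORKSPACE_BUCKET')
subprocess.run(['gsutil', 'cp', 'my_cohort.csv', f'{bucket}/data/my_cohort.csv'], check=True)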

5. Restart & run all

Before saving, restart the kernel and run all cells (Kernel → Restart & Run All) to make sure the notebook runs cleanly from top to bottom.

6. Save your most recent notebook version

The snippet for saving your notebook version is only available in Python, not in R.

I didn’t quite follow this part.
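I have not checked the exact snippet, but the idea should be the same as saving data: copy the .ipynb file into the workspace bucket, roughly like this (the notebook file name and target folder are placeholders):

import os
import subprocess
from datetime import datetime

# Copy the current notebook into the workspace bucket as a timestamped backup of this version.
# 'my_analysis.ipynb' is a placeholder; use your own notebook's file name.
bucket = os.getenv('WORKSPACE_BUCKET')
stamp = datetime.now().strftime('%Y%m%d_%H%M%S')
subprocess.run(['gsutil', 'cp', 'my_analysis.ipynb',
                f'{bucket}/notebooks/my_analysis_{stamp}.ipynb'], check=True)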

Notes for my study:

To account for dataset changes/updates, run this first:

import os

# WORKSPACE_CDR holds the name of the current Curated Data Repository (CDR) release,
# so queries automatically point at the latest dataset version.
dataset_name = os.getenv('WORKSPACE_CDR')
dataset_name

To get the date difference between the OSA and HTN diagnoses, use condition_start_date.

  1. Create different tables for each set of diagnostic criteria and join the separate tables together.

Example below:
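As a rough sketch of that approach (same assumptions as above: WORKSPACE_CDR and pandas.read_gbq; the OSA and HTN concept IDs below are placeholders):

import os
import pandas as pd

dataset = os.getenv('WORKSPACE_CDR')

# One subquery per set of diagnostic criteria, joined on person_id,
# then the date difference in days between the first OSA and first HTN diagnosis.
# 1111111 and 2222222 are placeholder concept IDs for OSA and HTN.
query = f"""
WITH osa AS (
    SELECT person_id, MIN(condition_start_date) AS osa_date
    FROM `{dataset}.condition_occurrence`
    WHERE condition_concept_id IN (1111111)
    GROUP BY person_id
),
htn AS (
    SELECT person_id, MIN(condition_start_date) AS htn_date
    FROM `{dataset}.condition_occurrence`
    WHERE condition_concept_id IN (2222222)
    GROUP BY person_id
)
SELECT
    osa.person_id,
    osa.osa_date,
    htn.htn_date,
    DATE_DIFF(htn.htn_date, osa.osa_date, DAY) AS days_between
FROM osa
JOIN htn ON htn.person_id = osa.person_id
"""

df = pd.read_gbq(query, dialect='standard')
df.head()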
