1. Cohort & dataset builder
If you need a concept but don’t know how to define it, use the cohort builder to create sample criteria, then preview the resulting Jupyter code (click ‘Preview code’) to see the variable names and value ranges.
2. Write an optimized query
Avoid using “SELECT *” (see below). It selects every column, which consumes a lot of computing power and takes a long time to load; in other words, it is wasteful and expensive.
...
Finally, add filters (e.g. WHERE clauses) whenever you can to reduce the amount of data you load.
See examples below.
...
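A minimal sketch of what an optimized query might look like, assuming you query the CDR from Python in the Workbench (the table and column names here follow the OMOP CDM, but the date filter and column choice are illustrative, not from my study): select only the columns you need and filter rows, instead of SELECT *.

```python
import os

# WORKSPACE_CDR holds the current CDR dataset name inside the Workbench;
# the fallback value is a placeholder for running this sketch elsewhere.
cdr = os.getenv('WORKSPACE_CDR', 'my_cdr_dataset')

# Name only the needed columns and filter early to reduce data scanned.
query = f"""
SELECT person_id, condition_start_date
FROM `{cdr}.condition_occurrence`
WHERE condition_start_date >= '2020-01-01'
"""
print(query)
```

Inside the Workbench you would then pass this string to your usual query call (e.g. a pandas BigQuery reader) instead of a SELECT * query.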
3. Save data from a long-running query
Data get deleted after about a week if you don’t save them. Save your data to your workspace bucket and load them back from the bucket later.
...
If you use snippets, you always have to run the corresponding “Setup” snippet first. Then select the snippet “List buckets”, followed by “Copy file from workspace bucket”.
...
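A sketch of the save-to-bucket step, assuming the results have already been written to a local file (the filename and bucket subfolder here are illustrative, not the snippet’s exact wording):

```python
import os

# WORKSPACE_BUCKET holds the workspace bucket URI inside the Workbench;
# the fallback is a placeholder for running this sketch elsewhere.
bucket = os.getenv('WORKSPACE_BUCKET', 'gs://my-workspace-bucket')

# In a Workbench notebook this would be run as a shell command, e.g.:
#   !gsutil cp my_results.csv {bucket}/data/
copy_cmd = f"gsutil cp my_results.csv {bucket}/data/"
print(copy_cmd)
```

The reverse direction (the “Copy file from workspace bucket” snippet) is the same `gsutil cp` with source and destination swapped.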
5. Restart & run all
6. Save your most recent notebook version
The snippet for saving your notebook version is only available in Python, not in R.
I didn’t quite follow this part.
Notes for my study:
To account for dataset changes/updates, run this first:
import os
# WORKSPACE_CDR is updated when a new CDR version is released
dataset_name = os.getenv('WORKSPACE_CDR')
dataset_name
To get the date difference (OSA x HTN): use condition_start_date.
...
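A minimal sketch of the date-difference step, using made-up condition start dates (the values are illustrative; in practice they would come from the condition_start_date column for each diagnosis):

```python
from datetime import date

# Hypothetical condition start dates for one person (illustrative values)
osa_start = date(2019, 3, 15)  # obstructive sleep apnea diagnosis
htn_start = date(2021, 7, 1)   # hypertension diagnosis

# Difference in days between the two condition start dates
diff_days = (htn_start - osa_start).days
print(diff_days)
```

A positive value means the HTN diagnosis came after the OSA diagnosis; a negative value means the reverse.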