1. Cohort & dataset builder

If you need a concept but don’t know how to query for it, use the cohort builder to create sample criteria, then preview the resulting Jupyter code (click ‘preview code’) to see the variable names and ranges.

2. Write an optimized query

Avoid using “SELECT *” (see below). It selects every column, so the query uses a lot of computing power and takes a long time to load, which makes it both slow and expensive.

...

Finally, add filters whenever you can to reduce the amount of data you use/load.

See examples below.
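A minimal sketch of the advice above: name only the columns you need and filter in the WHERE clause. The table and column names follow the OMOP `condition_occurrence` table; the concept IDs (111, 222), the start date, the fallback dataset name, and the helper `build_condition_query` are hypothetical placeholders, not values from these notes.

```python
import os

# WORKSPACE_CDR is set inside the workbench; the fallback below is a
# made-up placeholder so the sketch also runs outside it.
dataset = os.getenv("WORKSPACE_CDR", "my-project.my_dataset")


def build_condition_query(dataset, concept_ids, start_date):
    """Build a query that names its columns and filters early.

    Hypothetical helper for illustration only.
    """
    # Force every ID to an integer before placing it in the SQL text.
    id_list = ", ".join(str(int(c)) for c in concept_ids)
    return f"""
SELECT person_id, condition_concept_id, condition_start_date
FROM `{dataset}.condition_occurrence`
WHERE condition_concept_id IN ({id_list})
  AND condition_start_date >= '{start_date}'
"""


# 111 and 222 are placeholder concept IDs, not real ones.
query = build_condition_query(dataset, [111, 222], "2015-01-01")
print(query)

# Assumption: in the notebook the query would then be run with something
# like pandas.read_gbq(query, dialect="standard").
```

Only three columns are requested and both filters are applied in the query itself, so less data is scanned and loaded than with “SELECT *”.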

...

3. Save data from a long-running query

Data are deleted after about a week if you don’t save them. Save your data to a bucket and load them from that bucket later.
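As a rough sketch of saving to the bucket and fetching the file back later, the commands can be built in Python and passed to `gsutil cp`. The environment variable name `WORKSPACE_BUCKET`, the `data` folder, the file name `osa_htn.csv`, and the helper functions are assumptions for illustration; check them against the workbench snippets before relying on them.

```python
import os
import subprocess

# Assumption: the workbench exposes the bucket path in WORKSPACE_BUCKET.
# The fallback is a made-up placeholder so the sketch runs elsewhere.
bucket = os.getenv("WORKSPACE_BUCKET", "gs://my-workspace-bucket")


def upload_command(local_file, bucket, folder="data"):
    """Command that copies a local file into the bucket folder."""
    return ["gsutil", "cp", f"./{local_file}", f"{bucket}/{folder}/"]


def download_command(file_name, bucket, folder="data"):
    """Command that copies a file from the bucket to the current directory."""
    return ["gsutil", "cp", f"{bucket}/{folder}/{file_name}", "."]


def run(command):
    """Execute a command; only meaningful where gsutil is installed."""
    return subprocess.run(command, capture_output=True, text=True)


# Typical use in a notebook (df is the query result as a DataFrame):
#   df.to_csv("osa_htn.csv", index=False)
#   run(upload_command("osa_htn.csv", bucket))
# and in a later session:
#   run(download_command("osa_htn.csv", bucket))
print(upload_command("osa_htn.csv", bucket))
```

Building the command separately from running it makes it easy to print and check the paths before anything is copied.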

...

If you use snippets, always run the corresponding “Setup” snippet first. Then select the snippet “List buckets”, followed by “copy file from workspace bucket”.

...

5. Restart & run all

6. Save your most recent notebook version

The snippet for saving your notebook version is only available in Python, not in R.

I didn’t quite follow this part.

Notes for my study:

To account for dataset changes/updates, run this first:

import os
dataset_name = os.getenv('WORKSPACE_CDR')
dataset_name

To get the date difference between OSA and HTN: use condition_start_date
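One possible way to compute that difference, assuming the intended column is the OMOP `condition_start_date` and that each person's first OSA and first HTN records are the ones to compare. The rows, the "OSA"/"HTN" labels, and both helper functions are invented for illustration; in a real notebook the rows would come from the `condition_occurrence` query.

```python
from datetime import date

# Made-up rows: (person_id, condition label, condition_start_date).
rows = [
    (1, "OSA", date(2015, 3, 1)),
    (1, "HTN", date(2016, 3, 1)),
    (1, "HTN", date(2018, 1, 10)),
    (2, "OSA", date(2019, 6, 15)),
    (2, "HTN", date(2019, 6, 1)),
]


def first_dates(rows):
    """Earliest start date per (person_id, label)."""
    earliest = {}
    for person_id, label, start in rows:
        key = (person_id, label)
        if key not in earliest or start < earliest[key]:
            earliest[key] = start
    return earliest


def days_from_osa_to_htn(rows):
    """Days from first OSA to first HTN, for people who have both.

    A negative value means HTN was recorded before OSA.
    """
    earliest = first_dates(rows)
    result = {}
    for (person_id, label), osa_date in earliest.items():
        if label != "OSA":
            continue
        htn_date = earliest.get((person_id, "HTN"))
        if htn_date is not None:
            result[person_id] = (htn_date - osa_date).days
    return result


diffs = days_from_osa_to_htn(rows)
print(diffs)
```

Taking the earliest record per condition is a design choice; a study that needs the most recent or the closest pair of dates would change `first_dates` accordingly.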

...