Subtables: diagnosis[condition]_first
Since the diagnosis table includes repeated diagnoses of the same condition, for some analyses we want to have a table with only the first instance of a diagnosis for any one condition. For example, to look at the time difference between OSA and HTN, we want the date of the first diagnosis of OSA and of HTN. It's easier to have tables with just the diagnosis of interest, so we create "diagnosishypertension_first" and "diagnosisosa_first."
This process turns out to be slow in SQL. Based off http://www.mysqltutorial.org/mysql-min/, here are times for the 2017-02-23 UCRex dataset, using a query like this:
CREATE TABLE diagnosishypertension_first SELECT STUDY_ID, DIAGNOSIS_DATE AS DIAGNOSIS_DATE_FIRST, ENCOUNTER_ID FROM diagnosishypertension a WHERE DIAGNOSIS_DATE = (SELECT MIN(DIAGNOSIS_DATE) FROM diagnosishypertension b WHERE b.STUDY_ID=a.STUDY_ID)
The solution is to read into MATLAB. However, we run out of java heap memory, so we need to 1) increase the meomery, and 2) set the db preferences to read in batches.Â
Now we get a pretty good reduction - 48+hours to 1 minute!