Data Download

Imaging data can be downloaded either as a .zip file containing an entire imaging series or as a DICOM .dcm file containing a single image from an imaging series.

Imaging series

Selecting the imaging series

The SeriesInstanceUID is needed to download an imaging series. The example below selects one series from the TCGA-THCA collection.

julia> patient_studies = tcia_studies(collection = "TCGA-THCA")
7×9 DataFrame. Omitted printing of 5 columns
│ Row │ Collection │ PatientID    │ PatientName  │ PatientSex │
│     │ String     │ String       │ String       │ String     │
├─────┼────────────┼──────────────┼──────────────┼────────────┤
│ 1   │ TCGA-THCA  │ TCGA-DE-A4MD │ TCGA-DE-A4MD │ M          │
│ 2   │ TCGA-THCA  │ TCGA-DE-A4MA │ TCGA-DE-A4MA │ F          │
│ 3   │ TCGA-THCA  │ TCGA-DE-A4MA │ TCGA-DE-A4MA │ F          │
│ 4   │ TCGA-THCA  │ TCGA-DE-A4MC │ TCGA-DE-A4MC │ F          │
│ 5   │ TCGA-THCA  │ TCGA-DE-A4MB │ TCGA-DE-A4MB │ F          │
│ 6   │ TCGA-THCA  │ TCGA-E3-A3DZ │ TCGA-E3-A3DZ │ F          │
│ 7   │ TCGA-THCA  │ TCGA-E3-A3E5 │ TCGA-E3-A3E5 │ M          │

julia> chosen_study = patient_studies.StudyInstanceUID[1]
"1.3.6.1.4.1.14519.5.2.1.8421.4019.291746741815681058731047886323"

julia> imaging_series = tcia_series(study = chosen_study)
2×16 DataFrame. Omitted printing of 15 columns
│ Row │ PatientID    │
│     │ String       │
├─────┼──────────────┤
│ 1   │ TCGA-DE-A4MD │
│ 2   │ TCGA-DE-A4MD │

julia> chosen_series = imaging_series.SeriesInstanceUID[1]
"1.3.6.1.4.1.14519.5.2.1.8421.4019.267009254990923767283017660950"

Downloading the imaging series

Once the SeriesInstanceUID is known, the imaging data can be downloaded as a zip file by:

julia> zip_file = "output_file.zip"; # Can also be a path

julia> tcia_images(series = chosen_series, file = zip_file)
"output_file.zip"

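The downloaded zip file still has to be extracted before the images can be used. A minimal sketch of doing this by hand is shown below. It assumes that the unzip utility is on the PATH; the helper `unzip_cmd` and the folder name `output_files` are hypothetical names chosen for illustration and are not part of the package.

```julia
# Hypothetical helper: build the extraction command.
# `-o` overwrites existing files, `-d` sets the output folder.
unzip_cmd(zip_file, dest) = `unzip -o $zip_file -d $dest`

zip_file = "output_file.zip"   # file written by tcia_images above
dest = "output_files"          # hypothetical output folder

# Only attempt extraction if the download actually produced a file.
if isfile(zip_file)
    mkpath(dest)
    run(unzip_cmd(zip_file, dest))
end
```
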
Convenience wrapper

The above steps only download a zip file, which then has to be extracted. This can be cumbersome when downloading multiple series, so the download_series() function is provided for convenience.

Note

download_series() assumes that the unzip utility is installed on the system. This can be verified by typing unzip in a terminal or ;unzip in the Julia REPL.
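
The same check can also be scripted. The sketch below uses `Sys.which` from Julia's standard library, which returns the full path of an executable or `nothing` if it is not found on the PATH. The variable name `unzip_path` is only illustrative.

```julia
# Look up the unzip executable on the PATH.
unzip_path = Sys.which("unzip")

if unzip_path === nothing
    @warn "unzip was not found on the PATH; download_series() may fail"
else
    println("unzip found at ", unzip_path)
end
```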

Downloading a single series

The following will download the chosen_series (selected above) and extract the images into the current directory ./.

julia> download_series(chosen_series, "./")

Downloading multiple series

The wrapper function can download multiple series from a DataFrame by

julia> series = tcia_series(collection = "AAPM-RT-MAC", patient = "RTMAC-LIVE-001")
julia> download_series(series, "./testdf")

or from an array of dictionaries by

julia> seriesjs = tcia_series(collection = "AAPM-RT-MAC", patient = "RTMAC-LIVE-001", format="json") 
julia> download_series(seriesjs, "./testjs")
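
If the zip files should be kept without extraction, a loop over tcia_images() is one possible alternative. This is only a sketch: it assumes that `series` is the DataFrame returned by tcia_series() above and that it has a SeriesInstanceUID column, as in the earlier example. The helper `series_zip_path` and the folder `./zips` are hypothetical.

```julia
# Hypothetical helper: one zip file per series, named after its SeriesInstanceUID.
series_zip_path(dir, uid) = joinpath(dir, uid * ".zip")

# Runs only if `series` (the DataFrame from tcia_series above) exists.
if @isdefined(series)
    mkpath("./zips")
    for uid in series.SeriesInstanceUID
        tcia_images(series = uid, file = series_zip_path("./zips", uid))
    end
end
```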

Single image

Selecting the single image

To download a single image, both its SeriesInstanceUID and SOPInstanceUID must be known. Continuing from the previous example, if we only wanted to download the first image in chosen_series, then:

julia> series_sops = tcia_sop(series = chosen_series)
2×1 DataFrame
│ Row │ SOPInstanceUID                                                   │
│     │ String                                                           │
├─────┼──────────────────────────────────────────────────────────────────┤
│ 1   │ 1.3.6.1.4.1.14519.5.2.1.8421.4019.244350881260053174818877266843 │
│ 2   │ 1.3.6.1.4.1.14519.5.2.1.8421.4019.276950195196892303662134388840 │

julia> chosen_sop = series_sops.SOPInstanceUID[1]
"1.3.6.1.4.1.14519.5.2.1.8421.4019.244350881260053174818877266843"

Downloading the single image

Once the SeriesInstanceUID and SOPInstanceUID are known, the DICOM file can be downloaded by:

julia> dicom_file = "output_file.dcm";

julia> tcia_single_image(series = chosen_series, sop = chosen_sop, file = dicom_file)
"output_file.dcm"