Data Download
Imaging data can be downloaded either as a .zip file containing an entire imaging series, or as a DICOM .dcm file containing a single image within an imaging series.
Imaging series
Selecting the imaging series
The SeriesInstanceUID is needed to download an imaging series. The example below selects one series from the TCGA-THCA collection.
julia> patient_studies = tcia_studies(collection = "TCGA-THCA")
7×9 DataFrame. Omitted printing of 5 columns
│ Row │ Collection │ PatientID │ PatientName │ PatientSex │
│ │ String │ String │ String │ String │
├─────┼────────────┼──────────────┼──────────────┼────────────┤
│ 1 │ TCGA-THCA │ TCGA-DE-A4MD │ TCGA-DE-A4MD │ M │
│ 2 │ TCGA-THCA │ TCGA-DE-A4MA │ TCGA-DE-A4MA │ F │
│ 3 │ TCGA-THCA │ TCGA-DE-A4MA │ TCGA-DE-A4MA │ F │
│ 4 │ TCGA-THCA │ TCGA-DE-A4MC │ TCGA-DE-A4MC │ F │
│ 5 │ TCGA-THCA │ TCGA-DE-A4MB │ TCGA-DE-A4MB │ F │
│ 6 │ TCGA-THCA │ TCGA-E3-A3DZ │ TCGA-E3-A3DZ │ F │
│ 7 │ TCGA-THCA │ TCGA-E3-A3E5 │ TCGA-E3-A3E5 │ M │
julia> chosen_study = patient_studies.StudyInstanceUID[1]
"1.3.6.1.4.1.14519.5.2.1.8421.4019.291746741815681058731047886323"
julia> imaging_series = tcia_series(study = chosen_study)
2×16 DataFrame. Omitted printing of 15 columns
│ Row │ PatientID │
│ │ String │
├─────┼──────────────┤
│ 1 │ TCGA-DE-A4MD │
│ 2 │ TCGA-DE-A4MD │
julia> chosen_series = imaging_series.SeriesInstanceUID[1]
"1.3.6.1.4.1.14519.5.2.1.8421.4019.267009254990923767283017660950"
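Indexing with [1] simply takes the first row. If a UID is copied from elsewhere, a quick syntax check can catch a truncated or mistyped value before a download is attempted. The sketch below uses a hypothetical helper, is_valid_uid (not part of the package), and assumes the DICOM UID encoding rules from PS3.5 section 9.1: at most 64 characters, dot-separated numeric components, and no leading zero in a multi-digit component.

```julia
# Hypothetical helper (not part of the package): checks a string against the
# DICOM UID encoding rules (PS3.5, section 9.1).
function is_valid_uid(uid::AbstractString)
    # A UID is non-empty and at most 64 characters long
    (isempty(uid) || length(uid) > 64) && return false
    for component in split(uid, '.')
        # Each component must consist of one or more decimal digits...
        isempty(component) && return false
        all(isdigit, component) || return false
        # ...and must not start with '0' unless it is exactly "0"
        (length(component) > 1 && first(component) == '0') && return false
    end
    return true
end

is_valid_uid("1.3.6.1.4.1.14519.5.2.1.8421.4019.267009254990923767283017660950")  # true
```

This only checks syntax; a well-formed UID may still not exist on the server.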
Downloading the imaging series
Once the SeriesInstanceUID is known, the imaging data can be downloaded as a zip file by:
julia> zip_file = "output_file.zip"; # Can also be a path
julia> tcia_images(series = chosen_series, file = zip_file)
"output_file.zip"
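The archive can also be extracted by hand. The following is a minimal sketch, assuming the external unzip utility is installed; unzip_cmd and extract_series_zip are hypothetical helpers, not part of the package. The -o flag overwrites existing files without prompting and -d sets the destination directory.

```julia
# Hypothetical helpers (not part of the package): extract a downloaded series
# archive with the external `unzip` utility.

# Build the command without running it:
#   -o  overwrite existing files without prompting
#   -d  extract into the given directory
unzip_cmd(zip_file::AbstractString, dest::AbstractString) =
    `unzip -o $zip_file -d $dest`

function extract_series_zip(zip_file::AbstractString, dest::AbstractString)
    mkpath(dest)                        # make sure the target directory exists
    run(unzip_cmd(zip_file, dest))      # throws if unzip exits with an error
    return readdir(dest; join = true)   # paths of the entries now in dest
end

# Usage (requires a downloaded archive):
# extract_series_zip("output_file.zip", "./series")
```

Keeping the command construction separate from run makes it easy to inspect what will be executed before touching the filesystem.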
Convenience wrapper
The steps above only download a zip file, which then has to be extracted. This can be cumbersome when downloading multiple series, so the download_series() function is provided for convenience.
download_series() assumes that the unzip utility is installed on the system. This can be verified by typing unzip in a terminal, or ;unzip in the Julia REPL (the ; prefix switches the REPL to shell mode).
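The same check can be done from Julia code, for example at the top of a script, before calling download_series(). This is a small sketch using Base's Sys.which, which returns the full path of an executable found on the PATH, or nothing; has_unzip is a hypothetical helper name.

```julia
# Hypothetical helper (not part of the package): report whether the `unzip`
# executable can be found on the PATH, which download_series() relies on.
has_unzip() = Sys.which("unzip") !== nothing

if !has_unzip()
    @warn "unzip was not found on the PATH; downloaded archives cannot be extracted"
end
```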
Downloading a single series
The following downloads chosen_series (selected above) and extracts its images into the current directory (./).
julia> download_series(chosen_series, "./")
Downloading multiple series
download_series() can also download multiple series from a DataFrame:
julia> series = tcia_series(collection = "AAPM-RT-MAC", patient = "RTMAC-LIVE-001")
julia> download_series(series, "./testdf")
or from an array of dictionaries, as returned when format="json" is passed:
julia> seriesjs = tcia_series(collection = "AAPM-RT-MAC", patient = "RTMAC-LIVE-001", format="json")
julia> download_series(seriesjs, "./testjs")
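To work with individual entries of the JSON result, for example to pick out particular series, the UIDs can be extracted from the dictionaries. This is a sketch under assumptions: it assumes each dictionary has a "SeriesInstanceUID" key mirroring the DataFrame column (the JSON keys are not shown above), series_uids is a hypothetical helper, and the sample input uses placeholder UIDs, not real data.

```julia
# Hypothetical helper (not part of the package). Assumes each dictionary in the
# JSON result has a "SeriesInstanceUID" key, mirroring the DataFrame column.
# No element type is required, so a Vector{Any} from a JSON parser also works.
series_uids(series) = [String(s["SeriesInstanceUID"]) for s in series]

# Placeholder input standing in for `seriesjs`; these UIDs are not real.
example = [
    Dict("SeriesInstanceUID" => "1.2.3.1", "Modality" => "CT"),
    Dict("SeriesInstanceUID" => "1.2.3.2", "Modality" => "RTSTRUCT"),
]

series_uids(example)  # ["1.2.3.1", "1.2.3.2"]

# Keep only entries whose (assumed) "Modality" key is "CT"
ct_only = filter(s -> get(s, "Modality", "") == "CT", example)
```

Since download_series() accepts a single SeriesInstanceUID string, as in the single-series example above, each extracted UID could then be passed to it one at a time.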
Single image
Selecting the single image
To download a single image, both its SeriesInstanceUID and SOPInstanceUID must be known. Continuing the previous example, the first image in chosen_series can be selected as follows:
julia> series_sops = tcia_sop(series = chosen_series)
2×1 DataFrame
│ Row │ SOPInstanceUID │
│ │ String │
├─────┼──────────────────────────────────────────────────────────────────┤
│ 1 │ 1.3.6.1.4.1.14519.5.2.1.8421.4019.244350881260053174818877266843 │
│ 2 │ 1.3.6.1.4.1.14519.5.2.1.8421.4019.276950195196892303662134388840 │
julia> chosen_sop = series_sops.SOPInstanceUID[1]
"1.3.6.1.4.1.14519.5.2.1.8421.4019.244350881260053174818877266843"
Downloading the single image
Once the SeriesInstanceUID and SOPInstanceUID are known, the DICOM file can be downloaded with:
julia> dicom_file = "output_file.dcm";
julia> tcia_single_image(series = chosen_series, sop = chosen_sop, file = dicom_file)
"output_file.dcm"
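A quick way to sanity-check the downloaded file is to look for the DICOM Part 10 file signature: a 128-byte preamble followed by the four ASCII bytes "DICM" (PS3.10). The helper below, looks_like_dicom, is a hypothetical sketch, not part of the package. A false result is not conclusive, since some DICOM files are written without the preamble, but it will catch, for example, an error page saved under a .dcm name.

```julia
# Hypothetical helper (not part of the package): checks for the DICOM Part 10
# file signature, i.e. a 128-byte preamble followed by the ASCII bytes "DICM".
function looks_like_dicom(path::AbstractString)
    open(path, "r") do io
        header = read(io, 132)   # preamble + 4-byte prefix (fewer bytes if the file is short)
        length(header) == 132 && String(header[129:132]) == "DICM"
    end
end

# Usage (requires a downloaded file):
# looks_like_dicom("output_file.dcm")
```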