Data Download

Imaging data can be downloaded either as a .zip file containing an imaging series or as a DICOM .dcm file containing a single acquisition within an imaging series.

Imaging series

Selecting the imaging series

The SeriesInstanceUID is needed to download an imaging series. The example below selects one series from the TCGA-THCA collection.

julia> patient_studies = tcia_studies(collection = "TCGA-THCA")
┌ Warning: thread = 1 warning: parsed expected 15 columns, but didn't reach end of line around data row: 7. Parsing extra columns and widening final columnset
└ @ CSV ~/.julia/packages/CSV/aoJqo/src/file.jl:587
7×16 DataFrame
 Row │ StudyInstanceUID                   StudyDate              StudyDescript ⋯
     │ String                             String31               String        ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ 1.3.6.1.4.1.14519.5.2.1.8421.401…  2004-09-26 00:00:00.0  Outside Read  ⋯
   2 │ 1.3.6.1.4.1.14519.5.2.1.8421.401…  2004-05-27 00:00:00.0  Outside Read
   3 │ 1.3.6.1.4.1.14519.5.2.1.8421.401…  2004-06-13 00:00:00.0  PET/CT Tumor
   4 │ 1.3.6.1.4.1.14519.5.2.1.8421.401…  2004-07-21 00:00:00.0  CT NECK SOFT
   5 │ 1.3.6.1.4.1.14519.5.2.1.8421.401…  2004-07-05 00:00:00.0  Outside Read  ⋯
   6 │ 1.3.6.1.4.1.14519.5.2.1.3023.401…  1996-04-14 00:00:00.0  CT NECK W/ CO
   7 │ 1.3.6.1.4.1.14519.5.2.1.3023.401…  1996-03-09 00:00:00.0  CT NECK W/ CO
                                                              14 columns omitted
julia> chosen_study = patient_studies.StudyInstanceUID[1]
"1.3.6.1.4.1.14519.5.2.1.8421.4019.291746741815681058731047886323"
julia> imaging_series = tcia_series(study = chosen_study)
2×20 DataFrame
 Row │ SeriesInstanceUID                  StudyInstanceUID                   M ⋯
     │ String                             String                             S ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ 1.3.6.1.4.1.14519.5.2.1.8421.401…  1.3.6.1.4.1.14519.5.2.1.8421.401…  C ⋯
   2 │ 1.3.6.1.4.1.14519.5.2.1.8421.401…  1.3.6.1.4.1.14519.5.2.1.8421.401…  C
                                                              18 columns omitted
julia> chosen_series = imaging_series.SeriesInstanceUID[1]
"1.3.6.1.4.1.14519.5.2.1.8421.4019.267009254990923767283017660950"

Downloading the imaging series

Once the SeriesInstanceUID is known, the imaging data can be downloaded as a zip file by:

julia> zip_file = "output_file.zip"; # Can also be a path
julia> tcia_images(series = chosen_series, file = zip_file)
"output_file.zip"
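The downloaded archive still has to be unpacked before the images can be used. As a minimal sketch of doing this by hand, the snippet below shells out to the unzip command-line utility. The helper names unzip_cmd and extract_zip are hypothetical (they are not part of the package), and the sketch assumes unzip is available on the PATH.

```julia
# Hypothetical helper: build the external command that extracts `zip_file`
# into `dest`. Assumes the `unzip` command-line utility is on the PATH.
unzip_cmd(zip_file::AbstractString, dest::AbstractString) =
    `unzip -o $zip_file -d $dest`

# Hypothetical helper: extract the archive, creating the destination
# directory first if it does not exist yet.
function extract_zip(zip_file::AbstractString, dest::AbstractString)
    mkpath(dest)
    run(unzip_cmd(zip_file, dest))
    return dest
end

# Usage, once the download above has produced "output_file.zip":
# extract_zip("output_file.zip", "./extracted")
```

Building the command separately from running it makes it easy to inspect what will be executed before touching the file system.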

Convenience wrapper

The steps above only download a zip file, which then has to be extracted. This can be cumbersome when downloading multiple series, so the download_series() function is provided for convenience.

Note

The download_series() function assumes that the unzip utility is installed on the system. This can be verified by typing unzip in a terminal or ;unzip in the Julia REPL.
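The same check can also be scripted. The sketch below uses Sys.which from the Julia standard library to look for an unzip executable on the PATH; the helper name has_unzip is hypothetical and is not provided by the package.

```julia
# Hypothetical helper: true when an executable named `unzip` is found on the PATH.
# Sys.which returns the full path to the program, or `nothing` if it is missing.
has_unzip() = Sys.which("unzip") !== nothing

if !has_unzip()
    @warn "The unzip utility was not found on the PATH; install it before calling download_series()."
end
```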

Downloading a single series

The following downloads the chosen_series (selected above) and extracts its images into the current directory ./.

julia> download_series(chosen_series, "./")

Downloading multiple series

The wrapper function can download multiple series from a DataFrame by

julia> series = tcia_series(collection = "AAPM-RT-MAC", patient = "RTMAC-LIVE-001")
julia> download_series(series, "./testdf")

or from an array of dictionaries by

julia> seriesjs = tcia_series(collection = "AAPM-RT-MAC", patient = "RTMAC-LIVE-001", format = "json")
julia> download_series(seriesjs, "./testjs")
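To see which series an array of dictionaries refers to before downloading, the identifiers can be collected with a comprehension. This is only a sketch: the helper series_uids is hypothetical, the stand-in data uses placeholder identifiers rather than real TCIA UIDs, and it assumes each dictionary carries a "SeriesInstanceUID" key that mirrors the DataFrame column of the same name.

```julia
# Hypothetical helper: collect the series identifiers from an array of dictionaries.
# Assumes each dictionary has a "SeriesInstanceUID" key (an assumption about the
# JSON layout, mirroring the DataFrame column name).
series_uids(series) = [String(s["SeriesInstanceUID"]) for s in series]

# Stand-in data with placeholder identifiers (not real TCIA UIDs).
example = [
    Dict("SeriesInstanceUID" => "1.2.3", "Modality" => "CT"),
    Dict("SeriesInstanceUID" => "1.2.4", "Modality" => "MR"),
]

uids = series_uids(example)
```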

Single image

Selecting the single image

To download a single image, both its SeriesInstanceUID and SOPInstanceUID must be known. Continuing from the previous example, if we only wanted to download the first image in chosen_series, then:

julia> series_sops = tcia_sop(series = chosen_series)
2×1 DataFrame
 Row │ SOPInstanceUID
     │ String
─────┼───────────────────────────────────
   1 │ 1.3.6.1.4.1.14519.5.2.1.8421.401…
   2 │ 1.3.6.1.4.1.14519.5.2.1.8421.401…
julia> chosen_sop = series_sops.SOPInstanceUID[1]
"1.3.6.1.4.1.14519.5.2.1.8421.4019.244350881260053174818877266843"

Downloading the single image

Once the SeriesInstanceUID and SOPInstanceUID are known, the DICOM file can be downloaded by:

julia> dicom_file = "output_file.dcm";
julia> tcia_single_image(series = chosen_series, sop = chosen_sop, file = dicom_file)
"output_file.dcm"
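A quick sanity check on the downloaded file is to look for the DICOM file signature: a 128-byte preamble followed by the four characters DICM. The sketch below does only that, using the Julia standard library; the helper name looks_like_dicom is hypothetical, and passing the check does not guarantee that the rest of the file is valid.

```julia
# Hypothetical helper: true when `path` starts with a 128-byte preamble
# followed by the ASCII bytes "DICM" (the DICOM file signature).
function looks_like_dicom(path::AbstractString)
    open(path, "r") do io
        header = read(io, 132)  # returns fewer bytes if the file is shorter
        length(header) == 132 && header[129:132] == codeunits("DICM")
    end
end

# Usage, once the download above has produced "output_file.dcm":
# looks_like_dicom("output_file.dcm")
```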