Data Download
Imaging data can be downloaded either as a .zip file containing an imaging series or as a DICOM .dcm file containing a single acquisition within an imaging series.
Imaging series
Selecting the imaging series
A SeriesInstanceUID is needed to download an imaging series. The example below selects one series from the TCGA-THCA collection.
julia> patient_studies = tcia_studies(collection = "TCGA-THCA")
┌ Warning: thread = 1 warning: parsed expected 15 columns, but didn't reach end of line around data row: 7. Parsing extra columns and widening final columnset
└ @ CSV ~/.julia/packages/CSV/aoJqo/src/file.jl:587
┌ Warning: thread = 1 warning: parsed expected 15 columns, but didn't reach end of line around data row: 7. Parsing extra columns and widening final columnset
└ @ CSV ~/.julia/packages/CSV/aoJqo/src/file.jl:587
┌ Warning: thread = 1 warning: parsed expected 15 columns, but didn't reach end of line around data row: 7. Parsing extra columns and widening final columnset
└ @ CSV ~/.julia/packages/CSV/aoJqo/src/file.jl:587
7×16 DataFrame
 Row │ StudyInstanceUID                   StudyDate              StudyDescript ⋯
     │ String                             String31               String        ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ 1.3.6.1.4.1.14519.5.2.1.8421.401…  2004-09-26 00:00:00.0  Outside Read  ⋯
   2 │ 1.3.6.1.4.1.14519.5.2.1.8421.401…  2004-05-27 00:00:00.0  Outside Read
   3 │ 1.3.6.1.4.1.14519.5.2.1.8421.401…  2004-06-13 00:00:00.0  PET/CT Tumor
   4 │ 1.3.6.1.4.1.14519.5.2.1.8421.401…  2004-07-21 00:00:00.0  CT NECK SOFT
   5 │ 1.3.6.1.4.1.14519.5.2.1.8421.401…  2004-07-05 00:00:00.0  Outside Read  ⋯
   6 │ 1.3.6.1.4.1.14519.5.2.1.3023.401…  1996-04-14 00:00:00.0  CT NECK W/ CO
   7 │ 1.3.6.1.4.1.14519.5.2.1.3023.401…  1996-03-09 00:00:00.0  CT NECK W/ CO
                                                              14 columns omitted
julia> chosen_study = patient_studies.StudyInstanceUID[1]
"1.3.6.1.4.1.14519.5.2.1.8421.4019.291746741815681058731047886323"
julia> imaging_series = tcia_series(study = chosen_study)
2×20 DataFrame
 Row │ SeriesInstanceUID                  StudyInstanceUID                   M ⋯
     │ String                             String                             S ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ 1.3.6.1.4.1.14519.5.2.1.8421.401…  1.3.6.1.4.1.14519.5.2.1.8421.401…  C ⋯
   2 │ 1.3.6.1.4.1.14519.5.2.1.8421.401…  1.3.6.1.4.1.14519.5.2.1.8421.401…  C
                                                              18 columns omitted
julia> chosen_series = imaging_series.SeriesInstanceUID[1]
"1.3.6.1.4.1.14519.5.2.1.8421.4019.267009254990923767283017660950"
Downloading the imaging series
Once the SeriesInstanceUID is known, the imaging data can be downloaded as a zip file by:
julia> zip_file = "output_file.zip"; # Can also be a path
julia> tcia_images(series = chosen_series, file = zip_file)
"output_file.zip"
Convenience wrapper
The steps above only download a zip file, which then has to be extracted. This can be cumbersome when downloading multiple series, so the download_series() function is provided for convenience.
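For comparison, the manual extraction step could be sketched as below. This is a minimal sketch, not the package's implementation: extract_series_zip is a hypothetical helper name, it assumes the system unzip utility is available on the PATH, and it uses readdir with the join keyword, which requires Julia 1.4 or later.

```julia
# Hypothetical helper (not part of the package): extract a downloaded
# series archive into `dest` using the system `unzip` utility.
function extract_series_zip(zip_file::AbstractString, dest::AbstractString)
    isfile(zip_file) || throw(ArgumentError("zip file not found: $zip_file"))
    mkpath(dest)                           # create the target directory if needed
    run(`unzip -o -q $zip_file -d $dest`)  # -o: overwrite existing files, -q: quiet
    return readdir(dest; join = true)      # full paths of the entries in `dest`
end

# Example call, assuming "output_file.zip" was downloaded as shown earlier:
# extract_series_zip("output_file.zip", "./output_series")
```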
The download_series() function assumes that the unzip utility is installed on the system. This can be verified by typing unzip in a terminal or ;unzip in the Julia REPL.
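The same check can also be scripted. The sketch below uses Sys.which from Julia's standard library, which returns the full path of an executable found on the PATH, or nothing if it is not found. The warning text is illustrative only.

```julia
# Look up `unzip` on the PATH from within Julia.
unzip_path = Sys.which("unzip")   # full path as a String, or `nothing`

if unzip_path === nothing
    @warn "unzip was not found on the PATH; download_series() may fail"
else
    println("unzip found at ", unzip_path)
end
```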
Downloading a single series
The following downloads the chosen_series (selected above) and extracts the images into the current directory ./.
julia> download_series(chosen_series, "./")
Downloading multiple series
The wrapper function can download multiple series from a DataFrame by
julia> series = tcia_series(collection = "AAPM-RT-MAC", patient = "RTMAC-LIVE-001")
julia> download_series(series, "./testdf")
or from an array of dictionaries by
julia> seriesjs = tcia_series(collection = "AAPM-RT-MAC", patient = "RTMAC-LIVE-001", format="json")
julia> download_series(seriesjs, "./testjs")
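If only the zip archives are wanted, without extraction, a loop over the series UIDs is another option. The sketch below is an illustration and does not describe how download_series() works internally. download_each is a hypothetical helper, and the download itself is passed in as a function so that the loop does not depend on any particular API.

```julia
# Hypothetical helper: call `fetch(uid, file)` once per series UID,
# writing each series to its own "<uid>.zip" inside `dir`.
function download_each(fetch, uids, dir::AbstractString)
    mkpath(dir)                                    # make sure the directory exists
    files = String[]
    for uid in uids
        file = joinpath(dir, string(uid, ".zip"))  # one archive per series
        fetch(uid, file)
        push!(files, file)
    end
    return files                                   # paths of the requested archives
end

# Possible use with the `series` DataFrame from above (requires network access):
# download_each(series.SeriesInstanceUID, "./testzips") do uid, file
#     tcia_images(series = uid, file = file)
# end
```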
Single image
Selecting the single image
To download a single image, both its SeriesInstanceUID and SOPInstanceUID must be known. Continuing from the previous example, to download only the first image in chosen_series:
julia> series_sops = tcia_sop(series = chosen_series)
2×1 DataFrame
 Row │ SOPInstanceUID
     │ String
─────┼───────────────────────────────────
   1 │ 1.3.6.1.4.1.14519.5.2.1.8421.401…
   2 │ 1.3.6.1.4.1.14519.5.2.1.8421.401…
julia> chosen_sop = series_sops.SOPInstanceUID[1]
"1.3.6.1.4.1.14519.5.2.1.8421.4019.244350881260053174818877266843"
Downloading the single image
Once the SeriesInstanceUID and SOPInstanceUID are known, the DICOM file can be downloaded by:
julia> dicom_file = "output_file.dcm";
julia> tcia_single_image(series = chosen_series, sop = chosen_sop, file = dicom_file)
"output_file.dcm"
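A quick sanity check on the downloaded file is to look for the DICOM file signature. Files in the DICOM Part 10 format start with a 128-byte preamble followed by the four ASCII characters "DICM". The sketch below tests only for that signature and does not parse the image. looks_like_dicom is a hypothetical helper name, and whether every file served by the archive carries this header is an assumption.

```julia
# Return true if `path` starts like a DICOM Part 10 file:
# a 128-byte preamble followed by the ASCII bytes "DICM".
function looks_like_dicom(path::AbstractString)
    open(path, "r") do io
        header = read(io, 132)   # reads at most 132 bytes
        return length(header) == 132 && header[129:132] == b"DICM"
    end
end

# Example call, assuming "output_file.dcm" was downloaded as shown above:
# looks_like_dicom("output_file.dcm")
```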