Copyright © 2025   The Author(s). Published by Wolters Kluwer Health, Inc. on behalf of the American Society of Nephrology 

 
Journal of the American Society of Nephrology Publish Ahead of Print 

DOI: 10.1681/ASN.0000000904 

 
Automatically Measuring Kidney, Liver, and Cyst Volumes in Autosomal Dominant Polycystic Kidney Disease

 
Qing Xiong¹; Xinzi He²; Elisa Scalco³; Siria Pasini⁴; Chenglin Zhu¹; Mina C. Moghadam²; Usama Sattar¹; Vahid Davoudi¹; Vahid Bazojoo¹; Hreedi Dev¹; Mengjun Shen¹; Zhongxiu Hu¹; Sophie Shih¹; Serena J. Prince¹; Jon D. Blumenfeld⁵,⁶; Robert J. Min¹; James M. Chevalier⁵,⁶; Daniil Shimonov⁵,⁶; Rebecca J. Lepping⁷; Alan S.L. Yu⁸,⁹; Mert R. Sabuncu¹; Anna Caroli⁴; Martin R. Prince¹
 

¹Department of Radiology, Weill Cornell Medicine, New York City, New York, U.S.A.
²Biomedical Engineering, Cornell University, Ithaca, New York, U.S.A.
³Institute of Biomedical Technologies, Italian National Research Council (ITB-CNR), Segrate (MI), Italy
⁴Bioengineering Department, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Ranica (BG), Italy
⁵Department of Medicine, Weill Cornell Medicine, New York City, New York, U.S.A.
⁶Rogosin Institute, New York, New York, U.S.A.
⁷Department of Neurology, University of Kansas Medical Center, Kansas City, Kansas, U.S.A.
⁸Department of Internal Medicine, Division of Nephrology and Hypertension, University of Kansas Medical Center, Kansas City, Kansas, U.S.A.
⁹Jared Grantham Kidney Institute, University of Kansas Medical Center, Kansas City, Kansas, U.S.A.
 

Address correspondence to Prof. Martin R. Prince, 416 East 55th Street, New York, NY 10022; map2008@med.cornell.edu

 
M.R.S., A.C., and M.R.P. contributed equally to this work. 

 
This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC-BY-NC-ND), which permits downloading and sharing the work provided it is properly cited. The work cannot be changed in any way or used commercially without permission from the journal.


Abstract  

Background Kidney, liver and cyst volumes are important for diagnosis, classification and 

management of autosomal dominant polycystic kidney disease (ADPKD) but challenging to 

measure accurately and reproducibly. Here, we develop a web-based deep learning platform to

automatically and robustly measure kidney, liver and cyst volumes in ADPKD.

Methods MRI and CT scans from ADPKD patients (n=611) and participants without ADPKD 

(n=109) were used to train a 3D hybrid model combining U-Net and transformer elements for 

segmenting kidneys, liver and cysts. The model is implemented as a web-based calculator at 

www.traceorg.com, providing segmentation labels, volumes and Mayo Clinic Image 

Classification (MIC). Automatic browser anonymization of DICOM images ensures privacy. 

Internal validation was conducted on 70 MRIs for kidney and liver segmentations and 46 MRIs for

cyst segmentations, and performance was compared to 5 open access segmentation models

(TotalSegmentator, MR Annotator, Kim, Woznicki and Gregory-Kline). External validation was 

performed on one single-center dataset (n=58), one multicenter dataset (n=73), CRISP2 (n=30) 

and PKD-RRC (n=115) MRIs with T2-weighted and T1-weighted images.  

Results After training on 720 participants (mean age=48±15, eGFR=74±32 ml/min/1.73m² and

htTKV=826±772 ml/m), TraceOrg internal validation performance achieved high mean Dice

scores of 0.97 (kidneys), 0.97 (liver), 0.93 (kidney cysts) and 0.82 (liver cysts) outperforming 

existing models for ADPKD. External validation showed strong performance with Dice scores of 

0.92-0.94 (kidney), 0.87-0.96 (liver), 0.85 (kidney cysts) and 0.76-0.90 (liver cysts) for the 

single-center and 0.95 (kidney), 0.81 (kidney cysts) for the multicenter dataset. Compared to 

CRISP volumes measured by stereology, mean absolute percent difference was 5.3% (kidneys, 

n=30), 11% (kidney cysts, n=30) and 5.5% (liver, n=22). Compared to PKD-RRC (n=115), mean 

absolute percent difference in TKV was 4.9%. 

Conclusions TraceOrg is a publicly available web-based tool that automatically measures kidney, 

liver and cyst volumes from abdominal MRI in ADPKD with high accuracy compared to manual 

segmentations. 

 
Supplemental Digital Content --- http://links.lww.com/JSN/F518  


Introduction 

Autosomal dominant polycystic kidney disease (ADPKD) is the most common inherited kidney disease affecting an estimated 1:1000 live births¹. Patients with ADPKD develop cysts in the kidneys and liver, causing these organs to enlarge²,³. Total kidney volume (TKV), the sum of right and left kidney volumes, is a validated biomarker that is routinely used for tracking ADPKD progression and predicting the onset of end stage kidney disease⁴⁻⁷. TKV was used as a primary endpoint in clinical trials of tolvaptan for evaluating therapeutic efficacy. Tolvaptan is currently the only Food and Drug Administration (FDA)-approved pharmacologic treatment for ADPKD, specifically for patients at risk of rapid disease progression⁸⁻¹⁰. The Mayo Clinic Image Classification (MIC), using age and height-adjusted total kidney volume (Ht-TKV), is a standard predictor of disease severity in ADPKD and is increasingly being used in the US to determine eligibility for tolvaptan therapy¹¹. Consequently, accurate and reproducible measurement of TKV is essential for clinical management and investigational studies.

Traditional methods to estimate kidney volume (e.g., the modified ellipsoid formula using length, width and depth) have high inter-reader variability (7–15%)¹²,¹³. Manual contouring of the kidneys (3–6.7% variability) is more accurate than the ellipsoid formula and has the greater precision needed for longitudinal follow-up¹⁴, but is tedious, time consuming and subject to operator variability. Deep learning models address these limitations by automating kidney segmentation on multiple imaging sequences, further improving reproducibility and reducing reader burden¹⁴⁻¹⁸. Recent advances have brought deep learning performance close to that of expert manual contouring, significantly reducing the time required for accurate segmentation¹⁹.

Nevertheless, TKV has important limitations in the clinical assessment of ADPKD. For example, TKV provides only limited information in the early stages of the disease, when cyst volume is less than TKV measurement noise⁷,²⁰. Measuring cyst volumes has the potential to improve the accuracy of ADPKD prognosis²¹,²². Furthermore, there are extrarenal manifestations of ADPKD, including liver cyst growth, which are associated with significant morbidity and impairment of quality of life²³⁻²⁵. Therefore, comprehensive assessment of the overall ADPKD phenotype requires readily accessible imaging biomarkers that can be measured efficiently and accurately and that scale to large populations of ADPKD patients in clinical and investigational settings.

This study introduces a 3D deep learning model that segments kidneys, liver, and cysts on multiple MRI pulse sequences. The tool is accessible via an online calculator (traceorg.com), supporting widespread use.

  

Methods 

Training and Validation Datasets 

Training and Internal Validation Dataset 

The TraceOrg training/internal validation dataset consisted of all available abdominal CT scans and MR series with ground truth labels, including T2-weighted, T1-weighted, Steady State Free Precession (SSFP), diffusion-weighted (DWI) and contrast-enhanced images (Figure 1). From this dataset, all images from 70 participants were held out from training for internal validation of kidney and liver segmentations. For cyst segmentation, training data were derived from contrast-enhanced CT and MRI series including T2-weighted, T1-weighted (primarily for hemorrhagic/proteinaceous cysts) and SSFP, with all images from 46 participants held out for internal validation.

External Validation Datasets 

Two datasets from Italy were used for external validation. The first dataset (n=58) was acquired at a single center (Bergamo) and included MRI data acquired in the context of two prospective, longitudinal, multi-center, completed clinical studies²⁶,²⁷. All MRIs in this dataset were acquired between June 2014 and November 2020 and included coronal T2-weighted fat saturated, out-of-phase coronal T1-weighted spoiled gradient echo and SSFP sequences. For the assessment of liver cyst segmentation performance, we excluded 7 participants with no liver cysts.

The second external dataset (n=73) was a multicenter cohort from a completed clinical trial involving 6 Italian medical centers with T2-weighted fat saturated scans acquired between May 2006 and May 2008, incorporating a variety of image resolutions, scanner types, and acquisition protocols²⁸. For both single and multicenter cohorts, only baseline MRI scans were utilized for external validation so there was only one protocol MRI per participant.

CRISP 2 Dataset 

Further external validation was performed comparing TraceOrg organ and cyst volume 

measurements to the stereology volume measurements in the Consortium for Radiologic Imaging 

Studies of Polycystic Kidney Disease (CRISP) dataset²⁹,³⁰. For this dataset, ADPKD participants

were scanned 6 times over 8 years with organ volume measurements on the first 4 scans, CRISP 

1, performed by gadolinium-enhanced T1-weighted MRI and on non-contrast MRI for the later 

CRISP 2 scans 5 and 6, similar to our current approach. Thus, we compared the performance of 

TraceOrg volume measurements to those derived by stereology from the non-contrast MRI scans 

in CRISP 2, which were acquired between 2006 and 2008. In most CRISP 2 participants, 

multiple breath hold coronal T2-weighted acquisitions were utilized to cover the entire kidneys 

and liver, creating a challenge to compose the acquisition without corrupting organ volumes. To 


avoid this confounding issue, we selected only cases where the entirety of both kidneys was 

acquired in a single acquisition. For each CRISP 2 participant meeting this criterion (n=30), the 

absolute percent difference in kidney, and kidney cyst volume between the CRISP stereology 

measurement and TraceOrg measurement was calculated. We performed a similar analysis on 

liver volumes (n=22). For volumetric accuracy assessment, we performed three analyses because

CRISP 2 volumes were derived from T1-weighted imaging: (1) T1-only analysis using those 

sequences, (2) T2-only analysis, and (3) all-sequence analysis. For the all-sequence analysis, we 

averaged all available sequences for each participant (excluding outlier values >10% different 

from the median) to produce a single participant-level volume for comparison with the 

stereology reference volumes. 
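The participant-level averaging and the percent-difference comparison described above can be sketched in Python. This is a minimal illustration rather than the authors' code: the function names are hypothetical, the outlier rule is assumed to mean a relative deviation of more than 10% from the participant's median, and the percent difference is assumed to be taken relative to the stereology reference volume.

```python
from statistics import mean, median


def participant_volume(sequence_volumes, tolerance=0.10):
    """Average per-sequence volumes after dropping outliers.

    Assumption: a value is an outlier when it differs from the median of the
    participant's sequences by more than `tolerance` (10%) of that median.
    """
    med = median(sequence_volumes)
    kept = [v for v in sequence_volumes if abs(v - med) <= tolerance * med]
    # If every value was excluded (e.g. two very different sequences),
    # fall back to the median rather than averaging an empty list.
    return mean(kept) if kept else med


def absolute_percent_difference(predicted, reference):
    """Absolute difference as a percentage of the reference volume."""
    return abs(predicted - reference) / reference * 100.0


def mean_absolute_percent_difference(predicted, reference):
    """Mean of the per-participant absolute percent differences."""
    return mean(
        absolute_percent_difference(p, r) for p, r in zip(predicted, reference)
    )
```

For example, hypothetical sequence volumes of 1000, 1020, 980 and 1300 mL would drop the 1300 mL value and average the remaining three.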

PKD-RRC Dataset 

External validation with more recently acquired scans from the Polycystic Kidney Disease-

Research Resource Consortium (PKD-RRC) was performed with images acquired at the 

University of Kansas Medical Center, University of Chicago Medical Center and the Children’s 

Mercy Hospital Kansas City from 2016 to 2025 (n=115).  For this dataset, participants are still 

being recruited and scanned biennially by non-contrast MRI including coronal T2, T2 fat 

saturated, T1 and SSFP sequences.  For some participants, additional axial and sagittal T2 fat 

saturated and non-contrast MRA sequences were acquired. Because PKD-RRC reference 

volumes were obtained from T2-weighted fat saturated imaging, we separately performed a T2-

only analysis using those sequences, a T1-only analysis and an all-sequence average analysis 

analogous to the methodology described for CRISP 2. 

Ethics Approval 

This study adheres to the Declaration of Helsinki. Permission for data reuse was obtained from 

the local ethics committee Lombardy 6 (Reg. N.2024-3.11/486) for both single-center and 

multicenter Italian datasets, as well as for the CRISP data. Internal validation was conducted 

under Weill Cornell IRB approval #1610017623.  Ethics approval for PKD-RRC imaging and 

analysis was provided by the University of Kansas Medical Center with reliance by the 

University of Chicago Medical Center and the Children’s Mercy Hospital Kansas City, approval 

#STUDY00146013. 

Data Annotation Protocol 

Training Dataset Annotation 

Labeling utilized an iterative model-assisted workflow to streamline the annotation process and 

enhance annotation accuracy. A previously described 3D nnUnet model¹⁹ was utilized to

generate initial segmentation predictions for target structures, including kidneys, liver, and their 

related cysts. These predictions served as a foundation for manual refinement by trained research 


assistants using ITK-SNAP (version 3.8.0, www.itksnap.org). Manual refinement tasks were

randomly assigned among the annotators, who labeled only those MRI sequences (T1-weighted, 

T2-weighted, and SSFP) and CT scans containing kidneys and liver completely within the field 

of view. Labeling was supervised by a board-certified radiologist with >30 years of experience 

(MRP), who checked annotation accuracy for every case. This two-tiered process ensured a high 

standard of annotation while maintaining efficiency. The segmentation labels were stored in a 

standardized format, Neuroimaging Informatics Technology Initiative (NIfTI), to ensure

compatibility with downstream deep learning pipelines. 

Internal Validation Dataset Annotation 

We used the same semi-automatic approach as for the training data annotation, in which

TraceOrg model segmentations were manually refined by expert reviewers to serve as the

ground truth labels for evaluating model performance. Annotations for kidney and liver were

refined by 4 physician experts (VB, VD, US, MS) and 2 novices (SS, SP) for all available 

complete sequences. Annotations for kidney and liver cysts were manually refined by 2 

physician experts (VB, VD).  

External Validation Dataset Annotation 

External validations on kidney, liver, and kidney cysts were performed using all available images 

with existing ground truth data, thereby ensuring the ground truth was created blinded to model 

outputs and there was no selection bias. For the single-center dataset, manual segmentation was 

performed on the MR images where delineation was most suitable for each structure. 

Specifically, kidney and liver annotations were performed manually on thin-slice T1-weighted 

images, while kidney cyst delineation was performed on T2-weighted images. Binary masks were

then transformed onto the other sequences to create the ground truth for comprehensive 

assessment. For liver cysts, we used the semi-automatic approach; TraceOrg model labels were 

refined by experts from the external center to create the ground truth. For the multicenter dataset,

kidney ground truth segmentation was performed directly on T2-weighted sequences with varying slice

thicknesses across the different centers. Ground truth for kidney cysts was obtained

by a previously published semi-automatic method²⁸. For the multicenter dataset, no ground truth

was available for liver and liver cysts since the liver was incompletely imaged.

CRISP 2 Dataset Annotation 

The CRISP dataset employed stereology for precise volume measurements for kidneys on T1-weighted

images, and cysts on T2-weighted images²⁹. Trained raters placed point grids over

orthogonal views of the organs and cysts and manually counted points falling within target 

structures. Volume calculation incorporated point counts, grid size, and slice thickness. 

  

PKD-RRC Dataset Annotation 

For all scans, the right and left kidney volumes were measured at the University of Kansas Medical

Center by manual planimetry, labeling right and left kidneys on every slice of the coronal T2 fat

saturated scans.  In participants who had more than one coronal T2 fat saturated scan (usually 

because the first scan was corrupted by respiratory motion), the measurement was made on the 

best quality scan determined by visual inspection of artifacts. To improve inter-rater reliability, 

manual planimetry was automatically translated into stereology estimates using a 100 mm²

square grid overlaid on the images, with points inside the tracing boundary being automatically

labeled. Total volume via stereology (in mL) was estimated as (#points × 100 mm² × slice thickness in mm)/1000.

Reliability of stereology volume estimates between raters was assessed with the intraclass

correlation coefficient (ICC). Training of new raters was continued until ICC exceeded 0.9, indicating excellent

agreement³¹.
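The stereology volume formula above can be written as a short function. This is a minimal sketch; the function and parameter names are hypothetical, and it assumes slice thickness is given in millimetres so that dividing cubic millimetres by 1000 yields millilitres.

```python
def stereology_volume_ml(point_count, slice_thickness_mm, grid_area_mm2=100.0):
    """Estimate organ volume in mL from a stereology point count.

    Each counted point represents one grid cell (100 mm^2 in the text) on a
    slice of the given thickness; 1000 mm^3 equal 1 mL.
    """
    return point_count * grid_area_mm2 * slice_thickness_mm / 1000.0
```

As a worked example with made-up numbers, 1250 points counted on 4 mm slices give 1250 × 100 × 4 / 1000 = 500 mL.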

TraceOrg Model 

The TraceOrg model builds upon our previously published work¹⁹ but extends its capabilities

significantly to improve organ segmentations and include cyst segmentations. It is a hybrid 3D

U-Net transformer architecture designed for high-accuracy segmentation of abdominal organs

and cysts¹⁹. The model integrates 3D convolutional neural networks with transformer-based

components to capture both local and global contexts, improving segmentation performance 

across diverse imaging modalities and patient populations (Figure 2 and Supplemental Figure 1). 

Specifically, the U-Net architecture is used for detailed feature extraction, while the transformer 

component enhances the model's ability to understand complex spatial relationships. Model 

training details are described in Supplemental Material.  

Comparison to Existing Publicly Available Open-Source Models 

The PubMed database was searched for all manuscripts reporting on models for kidney or cyst 

segmentations in ADPKD using the terms “ADPKD” and “deep learning.” TraceOrg 

performance was compared to all publicly available models identified. Three ADPKD-specific 

models were found: the Kim⁷ model, the Woznicki³² model and the Gregory-Kline model²¹. For the Kim⁷

and Woznicki³² models, only T2-weighted images were used, as those models were originally

trained exclusively on T2 images. Kim model outputs included exophytic cysts, which were

reassigned to the nearest kidney label using a majority-vote approach; in cases of ties or absent

neighboring labels, the nearest label based on minimum distance was assigned. The Gregory-Kline

model²¹ was evaluated only on coronal T2 images, as this was the sole modality supported.

In addition, TraceOrg performance was compared to two whole body segmentation models, 

TotalSegmentator³³ and MRAnnotator³⁴, which are publicly available for use on MR images

although not specifically optimized for ADPKD. For the internal test set and the external test sets 

in which ground truth segmentations were available, the performance of each publicly available 

model was compared to TraceOrg by calculating Dice similarity coefficient and Hausdorff 

Distance. 
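The reassignment of exophytic cyst labels described above (majority vote among neighboring kidney labels, falling back to the nearest kidney label on ties or when no kidney neighbor exists) can be illustrated as follows. This is one possible reading, not the procedure actually used: the label codes are hypothetical, the vote is taken per voxel over the immediate neighborhood, and a 2D grid stands in for the 3D label volume to keep the example short.

```python
from collections import Counter

RIGHT, LEFT, EXOPHYTIC = 1, 2, 3  # hypothetical label codes; 0 is background


def reassign_exophytic(labels):
    """Return a copy of a 2D label grid with EXOPHYTIC voxels relabelled."""
    rows, cols = len(labels), len(labels[0])
    kidney_voxels = [
        (r, c, labels[r][c])
        for r in range(rows)
        for c in range(cols)
        if labels[r][c] in (RIGHT, LEFT)
    ]
    out = [row[:] for row in labels]
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != EXOPHYTIC:
                continue
            # Vote among kidney labels in the surrounding 3x3 neighbourhood,
            # always reading the original grid so the result is order-independent.
            votes = Counter(
                labels[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if labels[rr][cc] in (RIGHT, LEFT)
            )
            ranked = votes.most_common(2)
            if len(ranked) == 1 or (len(ranked) == 2 and ranked[0][1] > ranked[1][1]):
                out[r][c] = ranked[0][0]  # clear majority
            elif kidney_voxels:
                # Tie or no kidney neighbour: take the label of the closest
                # kidney voxel (squared Euclidean distance; equal distances
                # resolve to the smaller label code).
                _, out[r][c] = min(
                    ((kr - r) ** 2 + (kc - c) ** 2, lab)
                    for kr, kc, lab in kidney_voxels
                )
    return out
```

A 3D version would extend the neighborhood and the distance to the third axis and should account for anisotropic voxel spacing.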

  

Web Interface Implementation 

TraceOrg is implemented as a web-based calculator (www.traceorg.com) that offers two 

deployment options. For imaging expert users, including radiologists and nephrologists, who

prefer local solutions, the code and model checkpoints can be downloaded and deployed directly on

their own systems, enabling offline use without relying on internet connectivity.

Alternatively, users can leverage the web-based interface, which requires no

additional hardware or programming knowledge. 

The web interface allows users to upload DICOM images for segmentation and automatically 

receive kidney, liver, and cyst volumes as well as the MIC (Figure 3). Patient privacy is 

protected through in-browser data anonymization occurring on the user's computer prior to image

uploading, ensuring compliance with privacy standards including HIPAA and related privacy 

regulations. A report is generated providing users with snapshots of the organ and cyst 

segmentations to allow quick verification of their accuracy as well as label volumes and MIC. 

Segmentation labels are provided in a compressed and anonymized (NIfTI) format and any 

model errors can be refined manually, thereby achieving performance comparable to manual 

contouring but in a fraction of the time. 
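The text does not spell out how the calculator derives the MIC from age and Ht-TKV. As a hedged illustration only, the sketch below uses the commonly cited formulation of the Mayo classification for typical (class 1) ADPKD, which is an assumption here: an estimated yearly growth rate from a theoretical starting Ht-TKV of 150 mL/m, with class boundaries at 1.5%, 3%, 4.5% and 6% per year. Function names are hypothetical, and this is not clinical software.

```python
def height_adjusted_tkv(tkv_ml, height_m):
    """Ht-TKV in mL/m: total kidney volume divided by height in metres."""
    return tkv_ml / height_m


def mayo_imaging_class(ht_tkv_ml_per_m, age_years, baseline_ml_per_m=150.0):
    """Assign class 1A-1E from the estimated yearly Ht-TKV growth rate.

    Assumption: growth = (Ht-TKV / 150) ** (1 / age) - 1, with cut-offs at
    1.5, 3.0, 4.5 and 6.0 percent per year. Applies to typical ADPKD only.
    """
    growth_pct = ((ht_tkv_ml_per_m / baseline_ml_per_m) ** (1.0 / age_years) - 1.0) * 100.0
    for upper_bound, name in ((1.5, "1A"), (3.0, "1B"), (4.5, "1C"), (6.0, "1D")):
        if growth_pct < upper_bound:
            return name
    return "1E"
```

With made-up inputs, a 40-year-old with a TKV of 1020 mL and a height of 1.70 m has an Ht-TKV of 600 mL/m and an estimated growth rate of about 3.5% per year, which falls in class 1C under these assumptions.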

Statistical Analysis 

For the evaluation of segmentation accuracy, the Dice similarity coefficient, average symmetric surface distance (ASSD), Hausdorff

Distance and Jaccard index were calculated for each segmented structure on both internal and 

external validation datasets. These metrics were used to assess the overlap and boundary 

precision of the model's predictions compared to expert manual annotations. TraceOrg model 

performance on the internal and external validation datasets was further compared to five open 

source models which are publicly available: TotalSegmentator³³, MRAnnotator³⁴, Kim⁷,

Woznicki³² and the kidney cyst segmentation model proposed by Gregory and Kline et al.²⁰

using a paired t-test, with a significance level set at p < 0.05 on calculation of Dice, Hausdorff

Distance and ASSD. Liver cyst segmentation comparison was excluded, as there was no baseline 

tool available to support this task. For CRISP 2 and PKD-RRC datasets, volumetric accuracy 

was assessed by calculating the mean absolute percent differences between TraceOrg predicted 

volumes and CRISP2 stereology/PKD-RRC manual contouring derived reference volumes. This 

assessment was performed separately for T1-only, T2-only, and all-sequence average volumes as 

described above. 
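The four segmentation metrics named above can be sketched with the Python standard library, treating each mask as a set of voxel index tuples. This is an illustrative sketch, not the evaluation code used in the study: distances are in voxel units (scaling by voxel spacing would be needed to obtain millimetres), the Hausdorff distance and ASSD are assumed to be computed between surface voxels, and both masks are assumed to be non-empty.

```python
from math import dist
from statistics import mean


def dice(a, b):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a, b):
    """Jaccard index: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)


def surface(mask):
    """Voxels with at least one face-adjacent neighbour outside the mask."""
    def neighbours(p):
        for axis in range(len(p)):
            for step in (-1, 1):
                yield p[:axis] + (p[axis] + step,) + p[axis + 1:]
    return {p for p in mask if any(n not in mask for n in neighbours(p))}


def _surface_distances(a, b):
    """Distance from every surface voxel of `a` to the nearest surface voxel of `b`."""
    surface_a, surface_b = surface(a), surface(b)
    return [min(dist(p, q) for q in surface_b) for p in surface_a]


def hausdorff(a, b):
    """Symmetric Hausdorff distance between the two surfaces."""
    return max(max(_surface_distances(a, b)), max(_surface_distances(b, a)))


def assd(a, b):
    """Average symmetric surface distance."""
    return mean(_surface_distances(a, b) + _surface_distances(b, a))
```

The brute-force nearest-neighbor search is quadratic in the number of surface voxels, so it is suitable for small examples only; a distance transform or spatial index would be the usual choice for full-size volumes.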

 

Results 

The study included diverse cohorts across training dataset (n=720 for kidneys/liver, n=395 for 

cysts), internal validation (n=70 for kidneys/liver, n=46 for cysts), external validation datasets 

(single-center n=58, multi-center n=73), CRISP 2 dataset (n=30) and PKD-RRC dataset (n=115), 

see Figure 1. Demographic data, body habitus data, estimated glomerular filtration rate (eGFR – 

for training data, internal validation data, single-center and PKD-RRC datasets), measured 

glomerular filtration rate (for CRISP and the multi-center datasets), TKV and total cyst volume 

are summarized in Table 1. Mean age for the training cohort was 48 years but for testing cohorts 

ranged from 22 to 53 years indicating validations were being performed over a broad age range. 

Mean body mass index (BMI) ranged from 24 to 27 kg/m² and mean ht-TKV ranged from 432 to 

1176 mL/m, demonstrating the model's validation across a broad range of disease severity. 

TraceOrg Model Performance 

Internal Validation 

TraceOrg demonstrated excellent performance on the internal validation dataset when compared 

to expert manual segmentations. For kidney segmentation (Table 2a), the model achieved an

average Dice similarity coefficient of 0.97, ASSD of 0.91 mm, Hausdorff distance of 28 mm, 

and Jaccard index of 0.95 with good agreement across six independent observers (Supplemental 

Table 1). Similarly, for liver segmentation (Table 2a), the model achieved a Dice of 0.97, ASSD 

of 0.81 mm, Hausdorff distance of 26 mm, and Jaccard index of 0.95. For both liver and kidney 

segmentations the performances were similar across 5 different MRI pulse sequences. 

For kidney cysts (Table 2b), TraceOrg achieved a Dice coefficient of 0.93, ASSD of 0.48 mm, 

Hausdorff distance of 17 mm, and Jaccard index of 0.88 when averaged across two observers. 

Liver cyst segmentation showed a Dice of 0.86, ASSD of 2.6 mm, Hausdorff distance of 37 mm, 

and Jaccard index of 0.79. 

Notably, the model's performance metrics were comparable to inter-observer variability for all 

structures, indicating that TraceOrg achieves expert level accuracy (Supplemental Table 1, Table 

2b). 

External Validation 

Single-center Dataset 

On the external single-center dataset, TraceOrg maintained robust performance across different 

MR sequences (Table 3, Figure 4). The model maintained an average Dice similarity coefficient 

of 0.94 for kidney labels on T1-weighted images. Projecting the T1-weighted mask onto the T2-

weighted and SSFP images had a Dice of 0.93 and 0.92. For the liver labels, the T1-weighted 


Dice was 0.96 and SSFP was 0.92. Projecting the T1-weighted mask onto the T2-fat saturated 

images gave a lower Dice of 0.87 reflecting a greater challenge of accurately labeling the dark 

liver against a dark fat suppressed background.  

For kidney cysts the Dice was 0.85. For liver cysts the Dice was 0.79. However, for participants 

with polycystic liver disease (> 20 cysts), the Dice was 0.90 in participants with a larger cyst

burden (>50 ml)³⁰ and 0.76 in participants with a smaller liver cyst burden. Participants with liver

cyst volumes <50ml often had many cysts <1cm diameter, which are more challenging to 

segment and more readily confused with hepatic vessels or bile ducts.  

Multicenter Dataset 

On the multi-center dataset, TraceOrg maintained high performance with a Dice of 0.95 for 

kidneys and 0.81 for kidney cysts on T2-weighted images (Table 3, Figure 5).  

CRISP 2 Validation 

When validated against the CRISP 2 dataset (n=30), which used stereology as the reference 

standard, TraceOrg demonstrated good volumetric accuracy (Table 4). The mean absolute 

percent difference was 5.3% for TraceOrg kidney volumes on T1-weighted images, 8.7% for 

kidney volumes on T2 images and 6.0% when comparing the average of all MRI sequences to 

the stereology ground truth. The absolute percent difference was 11% for kidney cyst volumes. 

The liver was entirely within the field of view in 22 of those 30 participants for T1-weighted 

(n=5) and T2-weighted (n=17) sequences. The mean absolute percent difference for liver volume 

was 5.5%. Example comparisons are shown in Figure 5 highlighting both cases with high 

agreement and edge cases where discrepancies may stem from limitations of the stereology 

reference method. 

PKD-RRC Validation 

Validation against the more recent PKD-RRC dataset from the University of Kansas Medical Center

(n=115; Table 4) also showed excellent agreement between the TraceOrg volume measurements

and the ground truth University of Kansas planimetry volume measurements. For the coronal T2 

fat saturated images on which the ground truth measurements were performed, the absolute 

percent difference between TraceOrg and PKD-RRC was 6.1%.  A larger error of 11% was 

observed when comparing TraceOrg measurements from coronal T1 images to ground truth 

measurements from coronal T2 fat saturated images which reflects the tendency of T1 to 

underestimate ADPKD kidney volumes and T2 to overestimate ADPKD kidney volumes.  

Interestingly, the best agreement was between the average TraceOrg measurement for all

sequences and the ground truth measurements, with a mean absolute percent difference of 4.9%,

which may reflect the benefit of eliminating outlier values when calculating the mean of all 

TraceOrg measurements.  


Comparison to Existing Segmentation Models 

TraceOrg performance compared with publicly available segmentation models for kidney and liver

segmentations is shown in Table 5a and Figure 6. On the external test set, TraceOrg achieved Dice

similarity coefficients of 0.93 (left kidney) and 0.94 (right kidney) compared to

TotalSegmentator³³ (0.19 left, 0.28 right), MRAnnotator³⁴ (0.13 left, 0.17 right), Kim⁷ (0.30 left,

0.53 right) and Woznicki³² (0.93 left and 0.93 right). For the Kim⁷ model trained on ADPKD

images, most of the errors were right-left kidney swaps, so Dice improved to 0.81 for TKV, but

this was still well below TraceOrg performance (0.94). Woznicki performed comparably to

TraceOrg on the T2-weighted external validation images but not as well as TraceOrg on the

internal validation, and Woznicki did not work on other sequences or cysts. For liver

segmentation, TraceOrg achieved a Dice of 0.92 on the external test set compared to

TotalSegmentator³³ (0.57), MRAnnotator³⁴ (0.72) and Woznicki³² (0.87). The Kim⁷ model did

not segment liver. For kidney cyst segmentation, TraceOrg achieved an average Dice of 0.82,

substantially outperforming TotalSegmentator (0.09) and achieving similar performance to the

Gregory-Kline model²¹ (0.76). Note, however, that the Gregory-Kline model could only be run

on coronal T2 images and required a secondary channel kidney mask. No open-source models 

for liver cyst segmentation were found. Bland-Altman analysis (Supplemental Figure 2) 

confirmed superior agreement between TraceOrg measurement and ground truth. 

 
Discussion 

In this study, we developed an internet-based tool, TraceOrg, for automatically measuring kidney, 

liver and cyst volumes on MRI datasets from ADPKD patients. A major finding of this study is 

that TraceOrg was accurate and performed robustly across a wide range of clinical scenarios.  

For the Dice similarity coefficient, which measures the fraction of voxels that overlap

between the model output and ground truth, TraceOrg performance was equal or superior to

open-source models on T2 and superior to all existing models for T1 and SSFP MR images. For

Hausdorff Distance, the largest distance between an erroneous model voxel label and the ground

truth, the model also outperformed existing models for kidney and liver, and performed similarly

to the Gregory-Kline model²¹ for kidney cyst segmentations. No existing models were available

for comparing liver cyst labeling performance. Model generalizability was confirmed through 

multiple external validations using both single center and multicenter datasets, as well as older 

ADPKD images from CRISP 2 and more recent images from PKD-RRC. 

TraceOrg is based upon deep learning technology and convolutional neural networks with many

interconnections somewhat resembling the interconnections of the human brain and visual cortex.

The MR images pass from one computer network layer to another gradually forcing greyscale 

MR images to transform into a simpler image with just 4 shades representing kidney, liver, cyst 


or background. We used a convolutional neural network known as U-Net because it performs

well at these image segmentation tasks. It has a first set of layers, known as the “encoder”, which

gradually transforms the high-resolution MR images into lower resolution images with more 

dimensions. After encoding, a transformer network allows the model to handle larger anatomic 

structures and provides more interconnections to absorb the information from our relatively large 

training dataset. Finally, a decoder restores the original resolution. Many skip connections 

between the encoder and decoder elements ensure preservation of the high-resolution features. 

This network is trained by repeatedly inputting MR images and comparing the model output to 

ground truth annotations of experts. With each pass through the training data, known as an epoch, the error 

between the model output and the ground truth is used to refine millions of interconnection 

parameters, known as weights and saved as checkpoints, to gradually improve the model performance. We 

routinely train for 1000 epochs, gradually reducing the learning rate with each epoch to 

ensure convergence and so that large adjustments to the model occur only 

toward the beginning of training. 
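
A learning rate that shrinks with each epoch can be sketched as follows. The text does not state which schedule was used, so the polynomial decay below is only one common choice; the initial rate of 0.01 and the exponent of 0.9 are assumptions made for illustration.

```python
def poly_lr(epoch, max_epochs=1000, initial_lr=0.01, exponent=0.9):
    """Polynomial decay: large steps early in training, small steps late.

    initial_lr and exponent are illustrative assumptions, not reported values.
    """
    if not 0 <= epoch < max_epochs:
        raise ValueError("epoch must be in [0, max_epochs)")
    return initial_lr * (1 - epoch / max_epochs) ** exponent


# One learning rate per epoch for a 1000-epoch training run.
schedule = [poly_lr(epoch) for epoch in range(1000)]
```

With this schedule the rate starts at its initial value and falls monotonically toward zero, so the largest weight updates happen near the start of training.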

Factors to consider when evaluating the strengths and weaknesses of a deep learning model 

include the number of patients and scans used for model training, the variety of training data, the 

target organs it can segment, model architecture, availability of external validation, Dice 

similarity coefficient for the external validation and public availability. These factors are listed in 

Table 6 for the deep learning models published for segmenting ADPKD kidneys. The high 

performance of TraceOrg reflects meeting all of these criteria. TraceOrg has the largest number 

of participants and the largest number of scans in the training data. TraceOrg was trained on a variety 

of modalities, both CT and MRI; the MRI data included T2-weighted, T1-weighted, SSFP and DWI 

images as well as images with contrast enhancement. TraceOrg is a 3D model with an extra 

dimension of analysis compared to 2D models and segments multiple structures (kidneys, liver 

and cysts) because the training data included all of this information.  

External validation of TraceOrg was performed on datasets spanning a wide range of imaging 

protocols and extent of disease. The Dice similarity coefficient was high on these external 

validations and the agreement with volume measurements for CRISP 2 and PKD-RRC 

participants was also good, indicating TraceOrg can be expected to perform well outside of our 

local institution. Finally, TraceOrg is available to download and run locally and also available as 

an internet calculator.   

TraceOrg effectively segments liver cysts, performing better in PKD patients with liver cyst 

volumes exceeding 50 mL (Table 3). Segmenting smaller liver cyst volumes on MRI in 

ADPKD²² is challenging because cross-sections of blood vessels and bile ducts can resemble 

cysts as they are also bright on T2 and dark on T1. When there are few cysts, fewer than 20, the 

Dice metric is not reliable, oscillating over a wide range with small label changes involving just 

a few voxels. For this reason, we focused on examining cases with 20 or more cysts meeting the 


criteria for polycystic liver disease. Since ground truth segmentations for liver cysts were not 

available before inferencing on any dataset, they were created from model-assisted segmentations, 

which are less reliable than the other ground truth data created prior to model inferencing. 

Additionally, liver segmentation was tested on a limited number of CRISP cases because the 

liver was frequently incompletely imaged in CRISP MRI data.  This may have biased our CRISP 

comparison towards ADPKD patients earlier in the course of their disease with fewer liver cysts. 

However, Bland-Altman analysis showed that model performance was similar across a wide 

range of TKV.   
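
The instability of Dice for small labeled volumes noted in this paragraph can be shown with simple arithmetic. The voxel counts below are hypothetical and chosen only for illustration: the same five missed voxels are applied to a very small and a very large ground truth label.

```python
def dice_from_counts(overlap, model_voxels, truth_voxels):
    """Dice = 2 * overlap / (model voxels + ground truth voxels)."""
    return 2 * overlap / (model_voxels + truth_voxels)


def dice_after_missing(truth_voxels, missed):
    """Dice when the model labels every ground truth voxel except `missed` of them."""
    overlap = truth_voxels - missed
    return dice_from_counts(overlap, overlap, truth_voxels)


# Hypothetical voxel counts, not study data.
small = dice_after_missing(truth_voxels=10, missed=5)       # 2*5 / (5 + 10)
large = dice_after_missing(truth_voxels=10_000, missed=5)   # 2*9995 / (9995 + 10000)
```

Here the small label drops to a Dice of about 0.67 while the large label stays above 0.999, even though the absolute error is identical.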

Another limitation of the study is that while CT images were included in the training set, model 

performance on CT was not specifically validated in this study. Therefore, the model is presently intended only 

for MR images, but a CT validation is planned for the future. The impact of segmentation 

accuracy on clinical decision-making was not directly assessed in this study and will be the focus 

of future research. Future validation for longitudinal analyses assessing the accuracy of TKV 

growth rate measurements is also planned. Another limitation is that the dataset, while diverse, 

may not fully capture all possible variations in imaging protocols and patient demographics. 

Further validation on additional external datasets from different geographical regions and 

continual retraining as more images become available will help to strengthen the 

generalizability of the model. Model performance is also inherently dependent on the manual 

annotation protocol; differences in annotation protocols between training and external validation 

datasets could affect observed performance. Another practical limitation is that inferencing was 

not instantaneous due to limitations on our computer resources, so there can be a delay of several 

hours between uploading images and obtaining results depending upon internet speed, case 

backlog, number of pulse sequences and other parameters. As the employed computer 

infrastructure improves, these delays are expected to diminish. 

In conclusion, TraceOrg provides a practical, robust, accurate and scalable solution for the 

segmentation of abdominal organs and cysts in ADPKD. The model is easy to use, generalizable 

and outperforms existing publicly available models, making it suitable for both clinical and 

multicenter research use. Future work will expand the model to include additional anatomical 

structures, imaging modalities, and longitudinal analysis of TKV growth rates to determine the 

clinical impact of the segmentation results.  

 

Acknowledgments 

The data from the CRISP study, conducted by the study investigators and supported by the National 

Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), were supplied by the NIDDK 

Central Repository (NIDDK-CR). This manuscript was not prepared under the auspices of the 

CRISP study and does not necessarily reflect the opinions or views of the CRISP study, NIDDK-

CR, or NIDDK. A version of this work was previously presented in abstract form at the 4th 

Annual Virtual PKD RRC Symposium, March 2025 and portions of this work were presented 

during a virtual roundtable discussion hosted by PKD RRC on May 5, 2025. 

Because Dr. Alan S.L. Yu is a Deputy Editor of JASN, he was not involved in the peer-review 

process for this manuscript. Another editor oversaw the peer-review and decision-making 

process for this manuscript. 

 

Supplemental Material Table of Contents 

Model Training  

Glossary of Terms and Abbreviations  

Supplemental Table 1. Interobserver variability of TraceOrg internal validations.  

Supplemental Figure 1. Model Architecture  

Supplemental Figure 2. Bland Altman Plots of TraceOrg comparisons to publicly available 

models  

  

References: 

1. Chapman AB, Devuyst O, Eckardt KU, et al. Autosomal-dominant polycystic 
kidney disease (ADPKD): executive summary from a Kidney Disease: Improving Global 
Outcomes (KDIGO) Controversies Conference. Kidney Int. Jul 2015;88(1):17-27. 
doi:10.1038/ki.2015.59 
2. Zhang ZY, Wang ZM, Huang Y. Polycystic liver disease: Classification, diagnosis, 
treatment process, and clinical management. World J Hepatol. Mar 27 2020;12(3):72-83. 
doi:10.4254/wjh.v12.i3.72 
3. Norcia LF, Watanabe EM, Hamamoto Filho PT, et al. Polycystic Liver Disease: 
Pathophysiology, Diagnosis and Treatment. Hepat Med. 2022;14:135-161. 
doi:10.2147/HMER.S377530 
4. Fick-Brosnahan GM, Belz MM, McFann KK, Johnson AM, Schrier RW. 
Relationship between renal volume growth and renal function in autosomal dominant 
polycystic kidney disease: a longitudinal study. Am J Kidney Dis. Jun 2002;39(6):1127-
1134. doi:10.1053/ajkd.2002.33379 
5. Sedman A, Bell P, Manco-Johnson M, et al. Autosomal dominant polycystic 
kidney disease in childhood: a longitudinal study. Kidney Int. Apr 1987;31(4):1000-1005. 
doi:10.1038/ki.1987.98 
6. Kistler AD, Poster D, Krauer F, et al. Increases in kidney volume in autosomal 
dominant polycystic kidney disease can be detected within 6 months. Kidney Int. Jan 
2009;75(2):235-241. doi:10.1038/ki.2008.558 
7. Kim Y, Tao C, Kim H, Oh GY, Ko J, Bae KT. A Deep Learning Approach for 
Automated Segmentation of Kidneys and Exophytic Cysts in Individuals with Autosomal 
Dominant Polycystic Kidney Disease. J Am Soc Nephrol. Aug 2022;33(8):1581-1589. 
doi:10.1681/ASN.2021111400 
8. Torres VE, Chapman AB, Devuyst O, et al. Tolvaptan in patients with autosomal 
dominant polycystic kidney disease. N Engl J Med. Dec 20 2012;367(25):2407-2418. 
doi:10.1056/NEJMoa1205511 
9. Torres VE, Ahn C, Barten TRM, et al. KDIGO 2025 clinical practice guideline for 
the evaluation, management, and treatment of autosomal dominant polycystic kidney 
disease (ADPKD): executive summary. Kidney Int. Feb 2025;107(2):234-254. 
doi:10.1016/j.kint.2024.07.010 
10. Irazabal MV, Rangel LJ, Bergstralh EJ, et al. Imaging classification of autosomal 
dominant polycystic kidney disease: a simple model for selecting patients for clinical 
trials. J Am Soc Nephrol. Jan 2015;26(1):160-172. doi:10.1681/ASN.2013101138 
11. Cigna. Cigna National Formulary Coverage - Policy:Tolvaptan Products – 
Jynarque Prior Authorization Policy. 2024. 
https://static.cigna.com/assets/chcp/pdf/coveragePolicies/cnf/cnf_626_coveragepositioncriteria_tolvaptan_products_jynarque_pa.pdf 
12. Demoulin N, Nicola V, Michoux N, et al. Limited Performance of Estimated Total 
Kidney Volume for Follow-up of ADPKD. Kidney Int Rep. Nov 2021;6(11):2821-2829. 
doi:10.1016/j.ekir.2021.08.013 
13. Sharma K, Caroli A, Quach LV, et al. Kidney volume measurement methods for 
clinical studies on autosomal dominant polycystic kidney disease. PLoS One. 
2017;12(5):e0178488. doi:10.1371/journal.pone.0178488 


14. Dev H, Zhu C, Sharbatdaran A, et al. Effect of Averaging Measurements From 
Multiple MRI Pulse Sequences on Kidney Volume Reproducibility in Autosomal 
Dominant Polycystic Kidney Disease. J Magn Reson Imaging. Oct 2023;58(4):1153-
1160. doi:10.1002/jmri.28593 
15. Goel A, Shih G, Riyahi S, et al. Deployed Deep Learning Kidney Segmentation 
for Polycystic Kidney Disease MRI. Radiol Artif Intell. Mar 2022;4(2):e210205. 
doi:10.1148/ryai.210205 
16. Raj A, Tollens F, Hansen L, et al. Deep Learning-Based Total Kidney Volume 
Segmentation in Autosomal Dominant Polycystic Kidney Disease Using Attention, 
Cosine Loss, and Sharpness Aware Minimization. Diagnostics (Basel). May 7 
2022;12(5). doi:10.3390/diagnostics12051159 
17. Kline TL, Korfiatis P, Edwards ME, et al. Performance of an Artificial Multi-
observer Deep Neural Network for Fully Automated Segmentation of Polycystic Kidneys. 
J Digit Imaging. Aug 2017;30(4):442-448. doi:10.1007/s10278-017-9978-1 
18. Taylor J, Thomas R, Metherall P, Ong A, Simms R. MO012: Development of an 
Accurate Automated Segmentation Algorithm to Measure Total Kidney Volume in 
ADPKD Suitable for Clinical Application (The Cystvas Study). Nephrology Dialysis 
Transplantation. 2022;37(Supplement_3). doi:10.1093/ndt/gfac061.007 
19. He X, Hu Z, Dev H, et al. Test Retest Reproducibility of Organ Volume 
Measurements in ADPKD Using 3D Multimodality Deep Learning. Acad Radiol. Mar 
2024;31(3):889-899. doi:10.1016/j.acra.2023.09.009 
20. Kline TL, Edwards ME, Fetzer J, et al. Automatic semantic segmentation of 
kidney cysts in MR images of patients affected by autosomal-dominant polycystic 
kidney disease. Abdom Radiol (NY). Mar 2021;46(3):1053-1061. doi:10.1007/s00261-
020-02748-4 
21. Gregory AV, Chebib FT, Poudyal B, et al. Utility of new image-derived 
biomarkers for autosomal dominant polycystic kidney disease prognosis using 
automated instance cyst segmentation. Kidney Int. Aug 2023;104(2):334-342. 
doi:10.1016/j.kint.2023.01.010 
22. Chookhachizadeh Moghadam M, Aspal M, He X, et al. Deep learning-based liver 
cyst segmentation in MRI for autosomal dominant polycystic kidney disease. Radiology 
Advances. 2024;1(2). doi:10.1093/radadv/umae014 
23. van Gastel MDA, Edwards ME, Torres VE, Erickson BJ, Gansevoort RT, Kline TL. 
Automatic Measurement of Kidney and Liver Volumes from MR Images of Patients 
Affected by Autosomal Dominant Polycystic Kidney Disease. J Am Soc Nephrol. Aug 
2019;30(8):1514-1522. doi:10.1681/ASN.2018090902 
24. Sharbatdaran A, Romano D, Teichman K, et al. Deep Learning Automation of 
Kidney, Liver, and Spleen Segmentation for Organ Volume Measurements in Autosomal 
Dominant Polycystic Kidney Disease. Tomography. Jul 13 2022;8(4):1804-1819. 
doi:10.3390/tomography8040152 
25. Zhu C, He X, Blumenfeld JD, et al. A Primer for Utilizing Deep Learning and 
Abdominal MRI Imaging Features to Monitor Autosomal Dominant Polycystic Kidney 
Disease Progression. Biomedicines. May 20 
2024;12(5). doi:10.3390/biomedicines12051133 


26. Winterbottom J, Simms RJ, Caroli A, et al. Flank pain has a significant adverse 
impact on quality of life in ADPKD: the CYSTic-QoL study. Clin Kidney J. Nov 
2022;15(11):2063-2071. doi:10.1093/ckj/sfac144 
27. Trillini M, Caroli A, Perico N, et al. Effects of Octreotide-Long-Acting Release 
Added-on Tolvaptan in Patients with Autosomal Dominant Polycystic Kidney Disease: 
Pilot, Randomized, Placebo-Controlled, Cross-Over Trial. Clin J Am Soc Nephrol. Feb 1 
2023;18(2):223-233. doi:10.2215/CJN.0000000000000049 
28. Caroli A, Perico N, Perna A, et al. Effect of longacting somatostatin analogue on 
kidney and cyst growth in autosomal dominant polycystic kidney disease (ALADIN): a 
randomised, placebo-controlled, multicentre trial. Lancet. Nov 2 2013;382(9903):1485-
1495. doi:10.1016/S0140-6736(13)61407-5 
29. Chapman AB, Guay-Woodford LM, Grantham JJ, et al. Renal structure in early 
autosomal-dominant polycystic kidney disease (ADPKD): The Consortium for 
Radiologic Imaging Studies of Polycystic Kidney Disease (CRISP) cohort. Kidney Int. 
Sep 2003;64(3):1035-1045. doi:10.1046/j.1523-1755.2003.00185.x 
30. Bae KT, Tao C, Feldman R, et al. Volume Progression and Imaging 
Classification of Polycystic Liver in Early Autosomal Dominant Polycystic Kidney 
Disease. Clin J Am Soc Nephrol. Mar 2022;17(3):374-384. doi:10.2215/CJN.08660621 
31. Lepping RJ, Karcher RT, Keselman P, et al. Inter-rater reliability and translational 
implications of MR-based polycystic kidney volume measurements by stereology at early 
and late stage disease. Presented at: International Society of Magnetic Resonance in 
Medicine (ISMRM) Annual Meeting; April 2017; Honolulu, HI. 
32. Woznicki P, Siedek F, van Gastel MDA, et al. Automated Kidney and Liver 
Segmentation in MR Images in Patients with Autosomal Dominant Polycystic Kidney 
Disease: A Multicenter Study. Kidney360. Dec 29 2022;3(12):2048-2058. 
doi:10.34067/KID.0003192022 
33. Akinci D'Antonoli T, Berger LK, Indrakanti AK, et al. TotalSegmentator MRI: 
Robust Sequence-independent Segmentation of Multiple Anatomic Structures in MRI. 
Radiology. Feb 2025;314(2):e241613. doi:10.1148/radiol.241613 
34. Zhou A, Liu Z, Tieu A, et al. MRAnnotator: multi-anatomy and many-sequence 
MRI segmentation of 44 structures. Radiology Advances. 
2024;2(1). doi:10.1093/radadv/umae035 
35. Wasserthal J, Breit HC, Meyer MT, et al. TotalSegmentator: Robust 
Segmentation of 104 Anatomic Structures in CT Images. Radiol Artif Intell. Sep 
2023;5(5):e230024. doi:10.1148/ryai.230024 
36. Sharma K, Rupprecht C, Caroli A, et al. Automatic Segmentation of Kidneys 
using Deep Learning for Total Kidney Volume Quantification in Autosomal Dominant 
Polycystic Kidney Disease. Sci Rep. May 17 2017;7(1):2049. doi:10.1038/s41598-017-
01779-0 
37. Bevilacqua V, Brunetti A, Cascarano GD, et al. A comparison between two 
semantic deep learning frameworks for the autosomal dominant polycystic kidney 
disease segmentation based on magnetic resonance images. BMC Med Inform Decis 
Mak. Dec 12 2019;19(Suppl 9):244. doi:10.1186/s12911-019-0988-4 


38. Shin TY, Kim H, Lee JH, et al. Expert-level segmentation using deep learning for 
volumetry of polycystic kidney and liver. Investig Clin Urol. Nov 2020;61(6):555-564. 
doi:10.4111/icu.20200086 
39. Jagtap JM, Gregory AV, Homes HL, et al. Automated measurement of total 
kidney volume from 3D ultrasound images of patients affected by polycystic kidney 
disease and comparison to MR measurements. Abdom Radiol (NY). Jul 
2022;47(7):2408-2419. doi:10.1007/s00261-022-03521-5 
40. Li D, Xiao C, Liu Y, et al. Deep Segmentation Networks for Segmenting Kidneys 
and Detecting Kidney Stones in Unenhanced Abdominal CT Images. Diagnostics 
(Basel). Jul 23 2022;12(8). doi:10.3390/diagnostics12081788 
41. Cui H, Ma Y, Yang M, et al. Automatic Segmentation of Kidney Volume Using 
Multi-Module Hybrid Based U-Shape in Polycystic Kidney Disease. IEEE Access. 
2023;11:58113-58124. doi:10.1109/ACCESS.2023.3284029 
42. Taylor J, Thomas R, Metherall P, et al. An Artificial Intelligence Generated 
Automated Algorithm to Measure Total Kidney Volume in ADPKD. Kidney Int Rep. Feb 
2024;9(2):249-256. doi:10.1016/j.ekir.2023.10.029 
43. Hsu JL, Singaravelan A, Lai CY, et al. Applying a Deep Learning Model for Total 
Kidney Volume Measurement in Autosomal Dominant Polycystic Kidney Disease. 
Bioengineering (Basel). Sep 26 2024;11(10). doi:10.3390/bioengineering11100963 
44. Krishnan C, Schmidt E, Onuoha E, Mrug M, Cardenas CE, Kim H. nnUNet for 
Automatic Kidney and Cyst Segmentation in Autosomal Dominant Polycystic Kidney 
Disease. Curr Med Imaging. 2024;20:e15734056272767. 
doi:10.2174/0115734056272767231130110017 
45. Sore R, Cathier P, Vlachomitrou AS, et al. Deep learning-based segmentation of 
kidneys and renal cysts on T2-weighted MRI from patients with autosomal dominant 
polycystic kidney disease. Eur Radiol Exp. Oct 30 2024;8(1):122. doi:10.1186/s41747-
024-00520-7 
46. Sheng TW, Onthoni DD, Gupta P, Lee TH, Sahoo PK. Segmentation of ADPKD 
Computed Tomography Images with Deep Learning Approach for Predicting Total 
Kidney Volume. Biomedicines. Jan 22 2025;13(2). doi:10.3390/biomedicines13020263 

 

Table 1.  Demographic, physical and imaging characteristics of the participants included in the training, internal and external 

validation datasets. For participants with more than one MRI examination, all measurements are based on the values at the time of the 

baseline scan. For normally distributed data mean ± standard deviation is reported; otherwise, median and (interquartile range) are 

reported. 

| Parameter | Training Data**: Kidneys & Liver | Training Data**: Cysts | Internal Validation: Kidneys & Liver | Internal Validation: Cysts | External Validation: Single-center | External Validation: Multi-center | External Validation: CRISP 2 | External Validation: PKD-RRC |
|---|---|---|---|---|---|---|---|---|
| Number | 720 | 395 | 70 | 46 | 58 | 73 | 30 | 115 |
| Age (years) | 48 ± 15 | 55 ± 16 | 53 ± 14 | 39 ± 6.7 | 44 ± 11 | 37 ± 8.1 | 38 ± 11 | 22 ± 7 |
| Gender (F:M) | 374:346 | 158:117 | 38:32 | 20:0 | 29:29 | 39:34 | 22:8 | 68:47 |
| Height (m) | 1.71 ± 0.1 | 1.69 ± 0.11 | 1.71 ± 0.12 | 1.65 ± 0.08 | 1.73 ± 0.09 | 1.70 ± 0.1 | 1.68 ± 0.09 | 1.68 ± 0.15 |
| Weight (kg) | 77 ± 18 | 77 ± 21 | 75 ± 17 | 70 ± 12 | 71 ± 15 | 73 ± 14 | 72 ± 17 | 70 ± 23 |
| Body mass index (kg/m²) | 26 ± 5.3 | 27 ± 6.1 | 26 ± 4.2 | 26 ± 3.9 | 24 ± 3.8 | 25 ± 3.7 | 26 ± 6.3 | 24 ± 5.8 |
| GFR* (ml/min/1.73m²) | 74 ± 32 | 70 ± 31 | 60 ± 28 | | 87 ± 23 | 83 ± 27* | 76 ± 38* | 119 ± 26 |
| ht-TKV* (mL/m) | 826 ± 772 | 910 ± 904 | 1176 ± 900 | 703 ± 397 | 894 ± 524 | 1097 ± 674 | 703 ± 409 | 432 ± 297 |
| ht-kidney cyst volume* (mL/m) | N/A | 81 (13:512) | N/A | 369 ± 361 | 624 ± 436 | 733 ± 552 | 476 ± 386 | N/A |

* Glomerular Filtration Rate (GFR) is measured by iothalamate clearance for CRISP 2 dataset and by iohexol clearance for multi-

center dataset and estimated by Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) function for the other datasets. ht-

TKV stands for height adjusted total kidney volume; ht-kidney cyst volume stands for height adjusted kidney cyst volume. 

** Inside training data for kidneys and liver, 2 participants’ demographic data, 2 participants’ heights, 1 participant’s weight, and 7 

participants’ total kidney volumes (TKV) were not available. Inside training data for cysts, 120 participants were anonymized without 

any demographic data and 1 participant’s height was not available. 
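
The height-adjusted volumes in Table 1 follow from a segmentation mask by counting labeled voxels, multiplying by the voxel volume, and dividing by height. The following is a minimal sketch, assuming voxel spacing is given in millimeters; the function names, voxel count, spacing, and height are hypothetical and are not study data.

```python
def volume_ml(voxel_count, spacing_mm):
    """Volume of a label: voxel count times voxel volume, in mL (1 mL = 1000 mm^3)."""
    dx, dy, dz = spacing_mm
    return voxel_count * dx * dy * dz / 1000.0


def height_adjusted(volume_ml_value, height_m):
    """Height-adjusted volume in mL/m, for example ht-TKV."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return volume_ml_value / height_m


# Hypothetical example: 150,000 kidney voxels at 1.5 x 1.5 x 4.0 mm spacing.
tkv_ml = volume_ml(150_000, (1.5, 1.5, 4.0))   # 1350.0 mL
ht_tkv = height_adjusted(tkv_ml, 1.71)         # about 789 mL/m
```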


Table 2a. TraceOrg internal validation for kidney and liver segmentations for individual pulse sequences. Values are averaged over all 

sequences. The evaluation metrics used were the Dice similarity coefficient, Average Symmetric Surface Distance (ASSD), Hausdorff 

Distance, and Jaccard Index.  

Model performance for each pulse sequence and for the average of all sequences

| Structure | Metric | Axial T2 | Axial T1 | Axial SSFP | Coronal T2 | Coronal SSFP | Average of all sequences |
|---|---|---|---|---|---|---|---|
| Kidney | Dice | 0.98 ± 0.02 | 0.97 ± 0.02 | 0.98 ± 0.02 | 0.97 ± 0.02 | 0.97 ± 0.02 | 0.97 ± 0.02 |
| Kidney | ASSD [mm] | 0.77 ± 0.67 | 1.1 ± 2.0 | 0.79 ± 0.68 | 0.97 ± 1.2 | 0.85 ± 0.86 | 0.91 ± 1.2 |
| Kidney | Hausdorff [mm] | 28 ± 18 | 28 ± 21 | 26 ± 16 | 28 ± 22 | 27 ± 19 | 28 ± 20 |
| Kidney | Jaccard | 0.95 ± 0.04 | 0.95 ± 0.04 | 0.95 ± 0.04 | 0.94 ± 0.04 | 0.95 ± 0.04 | 0.95 ± 0.04 |
| Liver | Dice | 0.98 ± 0.01 | 0.98 ± 0.01 | 0.98 ± 0.01 | 0.97 ± 0.02 | 0.96 ± 0.02 | 0.97 ± 0.02 |
| Liver | ASSD [mm] | 0.55 ± 0.34 | 0.81 ± 0.51 | 0.78 ± 0.39 | 0.78 ± 0.42 | 1.2 ± 0.73 | 0.81 ± 0.53 |
| Liver | Hausdorff [mm] | 22 ± 10 | 24 ± 10 | 26 ± 14 | 25 ± 22 | 34 ± 37 | 26 ± 21 |
| Liver | Jaccard | 0.96 ± 0.02 | 0.96 ± 0.02 | 0.95 ± 0.02 | 0.95 ± 0.03 | 0.93 ± 0.03 | 0.95 ± 0.03 |
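
The Dice similarity coefficient and the Jaccard Index reported in these tables are two views of the same overlap. For a single case with model label $A$ and ground truth label $B$, the standard definitions give the relation below; it holds per case, so the means reported in the tables need not satisfy it exactly.

```latex
\[
D = \frac{2\,|A \cap B|}{|A| + |B|},
\qquad
J = \frac{|A \cap B|}{|A \cup B|} = \frac{D}{2 - D}.
\]
% Worked example for one case: D = 0.98 gives J = 0.98 / 1.02, approximately 0.96.
```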

 

Table 2b. TraceOrg internal validation for kidney cyst and liver cyst segmentations. Model segmentations on T2-weighted MRI pulse 

sequences are compared to manual corrections performed independently by 2 observers calculating the Dice similarity coefficient, 

Average Symmetric Surface Distance (ASSD), Hausdorff Distance, and Jaccard Index.  Also shown is the Inter-observers mean 

pairwise average showing that the differences between each observer and the model are similar to the difference between observers. 

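| Structure | Metric | Model performance compared to Observer 1 | Model performance compared to Observer 2 | Model performance compared to average of all observers | Inter-observers mean pairwise average |
|---|---|---|---|---|---|
| Kidney Cysts | Dice | 0.94 ± 0.05 | 0.93 ± 0.06 | 0.93 ± 0.06 | 0.94 ± 0.07 |
| Kidney Cysts | ASSD [mm] | 0.44 ± 0.35 | 0.52 ± 0.39 | 0.48 ± 0.37 | 0.50 ± 0.48 |
| Kidney Cysts | Hausdorff [mm] | 16 ± 7.4 | 19 ± 13 | 17 ± 11 | 18 ± 15 |
| Kidney Cysts | Jaccard | 0.88 ± 0.09 | 0.87 ± 0.10 | 0.88 ± 0.10 | 0.89 ± 0.11 |
| Liver Cysts* | Dice | 0.86 ± 0.24 | 0.79 ± 0.27 | 0.82 ± 0.26 | 0.86 ± 0.19 |
| Liver Cysts* | ASSD [mm] | 2.4 ± 7.4 | 3.0 ± 6.2 | 2.7 ± 6.8 | 2.6 ± 6.0 |
| Liver Cysts* | Hausdorff [mm] | 31 ± 32 | 43 ± 38 | 37 ± 35 | 37 ± 37 |
| Liver Cysts* | Jaccard | 0.80 ± 0.26 | 0.72 ± 0.30 | 0.76 ± 0.28 | 0.79 ± 0.24 |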

 

Table 3. TraceOrg validation on the external single-center and multicenter datasets. Dice similarity coefficient, Average Symmetric 

Surface Distance (ASSD), Hausdorff Distance, and Jaccard Index were calculated. 

Single Center (n=58)

| Metric | Kidney: Coronal T2-w | Kidney: Coronal T1-w | Kidney: Steady State Free Precession | Liver: Coronal T2-w | Liver: Coronal T1-w | Liver: Steady State Free Precession |
|---|---|---|---|---|---|---|
| Dice | 0.93 ± 0.03 | 0.94 ± 0.02 | 0.92 ± 0.04 | 0.87 ± 0.13 | 0.96 ± 0.01 | 0.92 ± 0.04 |
| ASSD [mm] | 1.7 ± 0.67 | 1.3 ± 0.35 | 1.7 ± 0.80 | 3.8 ± 1.6 | 1.2 ± 0.21 | 2.1 ± 1.2 |
| Hausdorff [mm] | 21 ± 10 | 19 ± 6.6 | 20 ± 7 | 31 ± 11 | 23 ± 8.2 | 25 ± 7.6 |
| Jaccard | 0.88 ± 0.05 | 0.89 ± 0.03 | 0.86 ± 0.06 | 0.78 ± 0.13 | 0.92 ± 0.01 | 0.86 ± 0.07 |

Single Center (n=58) and Multi Center (n=73)

| Metric | Single Center, Kidney Cysts: Coronal T2-w (n=58) | Single Center, Liver Cysts*: Coronal T2-w, Vcyst > 50 mL, Ncyst > 20 (n=22) | Single Center, Liver Cysts*: Coronal T2-w, Vcyst < 50 mL, Ncyst > 20 (n=20) | Single Center, Liver Cysts*: Coronal T2-w, all cases (n=51) | Multi Center, Kidney: Coronal T2-w (n=73) | Multi Center, Kidney Cysts: Coronal T2-w (n=73) |
|---|---|---|---|---|---|---|
| Dice | 0.85 ± 0.06 | 0.90 ± 0.04 | 0.76 ± 0.19 | 0.79 ± 0.17 | 0.95 ± 0.02 | 0.81 ± 0.08 |
| ASSD [mm] | 1.3 ± 0.5 | 0.75 ± 0.31 | 2.2 ± 1.2 | 3.0 ± 6.0 | 0.95 ± 0.45 | 1.5 ± 0.48 |
| Hausdorff [mm] | 26 ± 13 | 35 ± 18 | 49 ± 30 | 48 ± 31 | 20 ± 11 | 23 ± 9 |
| Jaccard | 0.74 ± 0.09 | 0.83 ± 0.07 | 0.62 ± 0.17 | 0.68 ± 0.19 | 0.91 ± 0.03 | 0.68 ± 0.10 |

* For liver cysts performance, Vcyst stands for total liver cysts volume. Ncyst stands for number of cysts. Validation focused on 

polycystic liver disease cases (Ncyst > 20)². Results for all cases including cases with Ncyst < 20 are reported for reference. 


Table 4. TraceOrg external validation on CRISP 2 (n=30) and PKD-RRC (University of Kansas, n=115).  

| Dataset | Total Kidney Volume: Coronal T1 | Total Kidney Volume: Coronal T2-fatsat | Total Kidney Volume: Mean of All Sequences* | Kidney Cysts: Axial/Coronal T2-w | Liver: Coronal T1-w/T2-w |
|---|---|---|---|---|---|
| CRISP (Kidney = 30, Liver = 22*) | | | | | |
| TraceOrg Volume | 1260 ± 695 | 1374 ± 781 | 1324 ± 739 | 815 ± 674 | 1539 ± 175 |
| CRISP 2 Stereology Volume | 1272 ± 736 | see Coronal T1 | see Coronal T1 | 806 ± 671 | 1530 ± 227 |
| Mean absolute % difference | 5.3 ± 3.6 | 8.7** ± 4.9 | 6.0*** ± 4.4 | 11 ± 11 | 5.5 ± 5.1 |
| PKD-RRC (115 participants with 225 scans at age >18 years and 56 scans at age <18 years) | | | | | |
| TraceOrg Volume | 891 ± 672 | 936 ± 771 | 911 ± 747 | | |
| PKD-RRC Volume | see T2 fatsat | 929 ± 769 | 925 ± 767 | | |
| Mean absolute % difference | 11 ± 11 | 6.1 ± 7.7 | 4.9 ± 4.9 | | |

 
*Among the 30 participants, 22 had complete sequences for liver validation. 

** This mean absolute percent difference is calculated between the TraceOrg measurements from Coronal T2 fat saturated images and 

the CRISP2 stereology volume measurements made from Coronal T1 images. 

*** CRISP mean of all sequences included more bad sequences which is why it is worse. 
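
The mean absolute percent difference rows in Table 4 can be reproduced from paired volumes as sketched below. This is an illustration only: it assumes the reference measurement is the denominator, which the table does not state, and the function name and example volumes are hypothetical and not study data.

```python
from statistics import mean, stdev


def abs_percent_differences(model, reference):
    """Per-case |model - reference| / reference * 100 (reference as denominator is assumed)."""
    if len(model) != len(reference):
        raise ValueError("paired measurements must have equal length")
    return [abs(m - r) / r * 100.0 for m, r in zip(model, reference)]


# Hypothetical paired volumes in mL, not study data.
model_ml = [1050.0, 950.0, 2000.0]
reference_ml = [1000.0, 1000.0, 2000.0]
diffs = abs_percent_differences(model_ml, reference_ml)
mean_diff = mean(diffs)   # reported as mean ± standard deviation in the table
sd_diff = stdev(diffs)
```

Taking the absolute value before averaging prevents overestimates and underestimates from cancelling, which is why this value can be large even when the mean volumes of the two methods are nearly equal.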

 

Table 5a. Performance comparison between TraceOrg and existing models for the segmentation of left kidney, right kidney, and liver 

with mean Dice and Hausdorff distance calculated as the mean of comparisons with ground truth manual tracing on the internal and 

external validation datasets for kidney and liver. 

 
| Model | Left Kidney: Dice | Left Kidney: Hausdorff [mm] | Right Kidney: Dice | Right Kidney: Hausdorff [mm] | Total Kidney: Dice | Total Kidney: Hausdorff [mm] | Liver: Dice | Liver: Hausdorff [mm] |
|---|---|---|---|---|---|---|---|---|
| Internal Validation | | | | | | | | |
| TraceOrg | 0.97 ± 0.02 | 24 ± 15 | 0.94 ± 0.15 | 24 ± 19 | 0.97 ± 0.02 | 28 ± 20 | 0.97 ± 0.02 | 26 ± 21 |
| TotalSegmentator³³ | 0.21 ± 0.20 | 68 ± 45 | 0.25 ± 0.23 | 64 ± 47 | 0.22 ± 0.20 | 88 ± 58 | 0.69 ± 0.26 | 65 ± 42 |
| MRAnnotator³⁴ | 0.18 ± 0.24 | 120 ± 68 | 0.20 ± 0.26 | 100 ± 60 | 0.19 ± 0.23 | 129 ± 60 | 0.69 ± 0.30 | 157 ± 102 |
| Kim⁷* | 0.15 ± 0.17 | 267 ± 54 | 0.31 ± 0.18 | 286 ± 38 | 0.49 ± 0.23 | 218 ± 51 | N/A | N/A |
| Woznicki³²* | 0.92 ± 0.13 | 34 ± 43 | 0.92 ± 0.09 | 41 ± 48 | 0.91 ± 0.14 | 54 ± 55 | 0.96 ± 0.03 | 45 ± 69 |
| External Validation | | | | | | | | |
| TraceOrg | 0.93 ± 0.04 | 19 ± 20 | 0.94 ± 0.04 | 19 ± 16 | 0.94 ± 0.03 | 20 ± 9 | 0.92 ± 0.08 | 26 ± 10 |
| TotalSegmentator | 0.19 ± 0.20 | 52 ± 32 | 0.28 ± 0.26 | 41 ± 27 | 0.23 ± 0.22 | 64 ± 38 | 0.57 ± 0.30 | 47 ± 29 |
| MRAnnotator³⁴ | 0.13 ± 0.18 | 117 ± 55 | 0.17 ± 0.23 | 99 ± 59 | 0.14 ± 0.18 | 130 ± 55 | 0.72 ± 0.29 | 91 ± 66 |
| Kim⁷* | 0.30 ± 0.22 | 187 ± 53 | 0.53 ± 0.15 | 199 ± 39 | 0.81 ± 0.15 | 134 ± 37 | N/A | N/A |
| Woznicki³²* | 0.93 ± 0.08 | 18 ± 16 | 0.93 ± 0.05 | 18 ± 14 | 0.94 ± 0.06 | 21 ± 14 | 0.87 ± 0.11 | 42 ± 40 |

* Kim and Woznicki models were trained exclusively on T2-w images and did not work on other sequences so this comparative 

analysis was restricted to T2-w images to show the most favorable performance for those models. 


Table 5b. Performance comparison between TraceOrg and existing models for kidney cyst segmentation on the internal and external 

validation datasets for cysts. Mean Dice, Hausdorff distance, and average symmetric surface distance (ASSD) are reported. 

 
| Model | Kidney Cysts: Dice | Kidney Cysts: Hausdorff [mm] | Kidney Cysts: ASSD [mm] |
|---|---|---|---|
| Internal Validation | | | |
| TraceOrg | 0.94 ± 0.05 | 17 ± 11 | 0.48 ± 0.37 |
| TotalSegmentator³⁵ | 0.07 ± 0.07 | 78 ± 18 | 21 ± 9 |
| Gregory-Kline²⁰ | 0.74 ± 0.14 | 23 ± 20 | 1.97 ± 1.17 |
| External Validation | | | |
| TraceOrg | 0.82 ± 0.07 | 24 ± 11 | 1.39 ± 0.49 |
| TotalSegmentator³⁵ | 0.09 ± 0.05 | 69 ± 15 | 20 ± 5 |
| Gregory-Kline²⁰ | 0.76 ± 0.10 | 23 ± 9 | 1.44 ± 0.34 |

 
*Gregory-Kline model could only run coronal T2-weighted images and a kidney mask was a required input. 


Table 6. Summary of Deep Learning models for ADPKD kidney segmentations.  Note that TraceOrg has the largest number of 

patients and scans used for training data, the most modalities used for training, works on the most targets (kidney liver cysts), uses a 

combination 3D U-Net transformer architecture, provides external validation showing high Dice and is publicly available. 

| Model (1st Author) | Year | Training Data: Pts | Training Data: Scans | Target | Modality | Model Architecture | Validation Type(s)* | Dice Similarity Coeff: Kidney | Dice Similarity Coeff: Cyst | Public Availability of Model |
|---|---|---|---|---|---|---|---|---|---|---|
| Kline¹⁷ | 2017 | | 2000 | Kidney | MRI – T2 | U-Net | Internal | 0.96 | | No |
| Sharma³⁶ | 2017 | 125 | 165 | Kidney | CT | CNN | Internal | 0.86 | | No |
| Bevilacqua³⁷ | 2019 | 18 | 526 | Kidney | MRI – T2 | CNN | Internal | 0.85-0.87 | | No |
| Van Gastel²³ | 2019 | 440 | | Kidney, Liver | MRI – T2 | CNN | Internal | 0.96 | | No |
| Shin³⁸ | 2020 | 175 | | Kidney | CT | V-Net | Internal | 0.96 | | ? |
| Gregory-Kline²⁰ | 2021 | 40 | 40 | Kidney cysts | MRI – T2 | U-Net | Internal | | 0.85 | Yes |
| Goel¹⁵ | 2022 | 129 | 213 | Kidney | MRI – T2 | U-Net | External | 0.98 | | Yes (TraceOrg precursor) |
| Jagtap³⁹ | 2022 | 22 | 132 | Kidney | Ultrasound | 2D U-Net | Internal | 0.8 | | No |
| Kim⁷ | 2022 | 157 | 157 | Kidney + exophytic cysts | MRI – Coronal T2 | 3D U-Net | Internal | 0.96 | 0.95 (exophytic) | Yes |
| Li⁴⁰ | 2022 | | 260 | Kidney + stones | CT | Two-Stage** | External | 0.82-0.97 | | No |
| Raj¹⁶ | 2022 | 100 | | Kidney, Cysts | MRI – T1 | U-Net | Internal | 0.92 | | No |
| Sharbatdaran²⁴ | 2022 | 151 | | Kidney, Liver, Spleen | MRI – T2 | 2D U-Net | External | 0.96-0.98 | | On Request |
| Woznicki³² | 2022 | 327 | 992 | Kidney, Liver | MRI – T2 | nnU-Net | External | 0.92-0.97 | | Yes |
| Cui⁴¹ | 2023 | 355 | | Kidney | MRI – T1 | HU-Net | Internal | 0.92 | | No |
| Taylor⁴² | 2024 | | 227 | Kidney | MRI | nnU-Net | External | 0.96 | | No |
| He¹⁹ | 2024 | 413 | | Kidney, Liver, Spleen | MRI, CT | nnU-Net | External, Test-retest | 0.98; test-retest absolute % difference = 1.3 | | Yes (TraceOrg precursor) |
| Hsu⁴³ | 2024 | 40 | | Kidney | MRI – T2 | 3D U-Net | Internal | 0.82-0.89 | | On Request (Data) |
| Krishnan⁴⁴ | 2024 | | 604 | Kidney, Cysts | MRI | U-Net | Internal | 0.92-0.95 | 0.83-0.89 | No |
| Sore⁴⁵ | 2024 | 160 | | Kidney, Cysts | MRI – T2 | U-Net | Internal | 0.93-0.94 | 0.86-0.87 | On Request (Data) |
| Sheng⁴⁶ | 2025 | 97 | 160 | Kidney | CT | DeepLabV3+ | Internal | 0.96 | | No |
| TraceOrg | 2025 | 720 | 5052 | Kidney, liver, cysts | CT; MRI T1, SSFP, T2 | 3D U-Net and Transformer | External | 0.92-0.95 | 0.82 | Yes |

*External validation implies internal validation was also performed. 

** Two-stage dependent segmentation framework using interchangeable models (e.g., 3D U-Net, Res U-Net, DeepLabV3+, UNETR) 
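
For readers less familiar with the Dice similarity coefficient reported in Table 6, a minimal Python sketch follows. It assumes two binary masks flattened to equal-length sequences of 0/1 voxel labels. The function name, the convention for two empty masks, and the toy data are illustrative assumptions and are not taken from any of the cited models.

```python
def dice_coefficient(pred, truth):
    """Dice = 2 * |A and B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of voxels")
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * overlap / total

# Toy example (hypothetical data): 4 reference voxels, 6 predicted, 4 shared.
truth = [0, 1, 1, 1, 1, 0, 0, 0]
pred = [0, 1, 1, 1, 1, 1, 1, 0]
toy_dice = dice_coefficient(pred, truth)  # 2 * 4 / (6 + 4)
```

A Dice of 1.0 means the two masks coincide exactly, and 0.0 means they share no voxels.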


 
Figure Legends  

Figure 1. Participant flow chart showing the number of CT scans with (C+) and without (C-) contrast enhancement and the number of each MRI sequence included in the training, internal validation, and external validation data sets, as well as from CRISP and PKD-RRC. Publicly available models compared with TraceOrg for segmentation performance (lower right) are color coded to indicate the MR sequences each comparator model was able to run.

 

 
Figure 2. TraceOrg deep learning model architecture. A stack of CT or MRI input images, too large to fit in computer memory at full resolution, enters the model along two pathways. In the first (upper pathway), the images are downsampled to a lower resolution that fits within memory so that the model can learn the global context. In the second (lower pathway), the images are cropped into small 3D patches that fit within memory. After processing by the global and patch encoders, respectively, the processed images are combined in the transformer and then decoded to restore the full resolution. After output, an organ completion quality assurance check verifies that the entire organ is within the field of view; the FAIL in this case results from the kidney lower poles being off the bottom edge of the images.
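
The two input pathways and the organ completion check described in Figure 2 can be illustrated with a short Python sketch. This is not the TraceOrg implementation. It assumes the volume is a nested list indexed as [z][y][x], uses simple striding in place of true resampling, and treats any labeled voxel on the volume boundary as evidence that the organ is cut off. All function names and the toy data are hypothetical.

```python
def downsample(volume, factor):
    """Global pathway: keep every `factor`-th voxel along z, y, and x."""
    return [[row[::factor] for row in plane[::factor]] for plane in volume[::factor]]

def crop_patch(volume, start, size):
    """Patch pathway: cut a full-resolution 3D block beginning at `start`."""
    z, y, x = start
    dz, dy, dx = size
    return [[row[x:x + dx] for row in plane[y:y + dy]] for plane in volume[z:z + dz]]

def touches_edge(mask):
    """QA heuristic (assumed): True if any labeled voxel lies on the volume boundary."""
    nz, ny, nx = len(mask), len(mask[0]), len(mask[0][0])
    for z, plane in enumerate(mask):
        for y, row in enumerate(plane):
            for x, value in enumerate(row):
                on_edge = z in (0, nz - 1) or y in (0, ny - 1) or x in (0, nx - 1)
                if value and on_edge:
                    return True
    return False

# Toy 4x4x4 volume whose voxel value encodes its own (z, y, x) position.
volume = [[[16 * z + 4 * y + x for x in range(4)] for y in range(4)] for z in range(4)]
global_view = downsample(volume, 2)               # 2x2x2 low-resolution copy
patch = crop_patch(volume, (1, 1, 1), (2, 2, 2))  # 2x2x2 full-resolution block

# Toy masks: one organ fully inside the volume, one cut off at the last slice.
inside = [[[1 if 1 <= z <= 2 and 1 <= y <= 2 and 1 <= x <= 2 else 0
            for x in range(4)] for y in range(4)] for z in range(4)]
cut_off = [[[1 if z == 3 and 1 <= y <= 2 and 1 <= x <= 2 else 0
             for x in range(4)] for y in range(4)] for z in range(4)]
qa_inside = "FAIL" if touches_edge(inside) else "PASS"
qa_cut_off = "FAIL" if touches_edge(cut_off) else "PASS"
```

The global view trades resolution for coverage of the whole stack, while each patch keeps full resolution over a small region, which is why both are needed before the features are combined.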

 

 
Figure 3. TraceOrg calculator website interface. The top two panels show the landing page (left) and the upload page (right). The bottom two panels show an example of TraceOrg output: tiled segmentation snapshots of each series taken 25%, 50%, and 75% of the way through the stack of images (left), and the volume report for organ segmentations (top right), cyst segmentations (middle right), and TKV on the Mayo Clinic Image Classification plot (bottom right).

 

 
Figure 4. Examples of external validation results showing TraceOrg model performance compared with human-labeled ground truth. The top right panel shows the different kidney annotation protocols used in the training dataset and the external testing datasets.

  

 
Figure 5. Examples of comparisons between volumes resulting from TraceOrg segmentations (left kidney in green, right kidney in red, liver in yellow, and kidney cysts in pink) and volumes computed by stereology in the CRISP 2 Study²⁹. The top row shows three CRISP 2 participants for whom the TraceOrg measurements agree well with the CRISP stereology volume measurements. The bottom row shows three examples with excellent TraceOrg annotations but some disagreement, raising the possibility that the discrepancies may be due to limitations of the stereology reference.
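
A comparison of this kind can be sketched in Python as follows. The sketch assumes that an organ volume is obtained by multiplying the number of labeled voxels by the voxel size, and that agreement is expressed as a signed percent difference relative to the stereology value. The function names, voxel spacing, and numbers are hypothetical and are not CRISP or TraceOrg data.

```python
def mask_volume_ml(voxel_count, spacing_mm):
    """Volume in mL from a voxel count and (dx, dy, dz) spacing in mm."""
    dx, dy, dz = spacing_mm
    return voxel_count * dx * dy * dz / 1000.0  # 1 mL = 1000 cubic mm

def percent_difference(measured, reference):
    """Signed percent difference of `measured` relative to `reference`."""
    return 100.0 * (measured - reference) / reference

# Hypothetical kidney: 250,000 labeled voxels at 1.5 x 1.5 x 4.0 mm spacing.
segmentation_ml = mask_volume_ml(250_000, (1.5, 1.5, 4.0))
stereology_ml = 2200.0  # hypothetical reference volume
difference_pct = percent_difference(segmentation_ml, stereology_ml)
```

A positive value indicates that the segmentation-derived volume is larger than the reference, and a negative value that it is smaller.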

  

 
Figure 6. The TraceOrg model (top left) shows excellent segmentation performance for liver (yellow), left kidney (green), and right kidney (red), as well as cysts (pink, bottom left), compared with the poorer performance of TotalSegmentator³⁵ (top middle and bottom middle), MRAnnotator³⁴ (top right), the Kim model⁷ (second row, middle image), and the Gregory-Kline model²⁰ (bottom right). For the Kim model, the exophytic cysts (blue in the second row right image, which is the model output) are remapped as kidneys (second row middle image) to calculate TKV. In this case, TraceOrg demonstrated Dice = 0.99 (kidney), 0.99 (liver), and 0.87 (kidney cyst), surpassing all other models, although the Woznicki model (middle left) performed nearly as well for liver and kidneys.
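
The remapping of exophytic cysts to kidney before calculating TKV can be illustrated with a short Python sketch. The integer label values and the flattened toy segmentation are assumptions made for illustration; the actual label encoding of the Kim model output may differ.

```python
# Hypothetical label values, not the encoding of any specific model.
BACKGROUND, KIDNEY, EXOPHYTIC_CYST = 0, 1, 2

def remap_labels(labels, mapping):
    """Return a copy of a flat label sequence with values replaced per `mapping`."""
    return [mapping.get(value, value) for value in labels]

def count_label(labels, target):
    """Number of voxels carrying the `target` label."""
    return sum(1 for value in labels if value == target)

model_output = [0, 1, 1, 2, 2, 2, 0, 1]  # toy flattened segmentation
tkv_labels = remap_labels(model_output, {EXOPHYTIC_CYST: KIDNEY})
kidney_voxels_before = count_label(model_output, KIDNEY)
kidney_voxels_after = count_label(tkv_labels, KIDNEY)
```

After remapping, the kidney voxel count includes the exophytic cysts, so a TKV computed from it is comparable with models that label those cysts as part of the kidney.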

 

