Copyright © 2025 The Author(s). Published by Wolters Kluwer Health, Inc. on behalf of the American Society of Nephrology Journal of the American Society of Nephrology Publish Ahead of Print DOI: 10.1681/ASN.0000000904 Automatically Measuring Kidney, Liver, and Cyst Volumes in Autosomal Dominant Polycystic Kidney Disease Qing Xiong 1 ; Xinzi He 2 ; Elisa Scalco 3 ; Siria Pasini 4 ; Chenglin Zhu 1 ; Mina C. Moghadam 2 ; Usama Sattar 1 ; Vahid Davoudi 1 ; Vahid Bazojoo 1 ; Hreedi Dev 1 ; Mengjun Shen 1 ; Zhongxiu Hu 1 ; Sophie Shih 1 ; Serena J. Prince 1 ; Jon D. Blumenfeld 5,6 ; Robert J. Min 1 ; James M. Chevalier 5,6 ; Daniil Shimonov 5,6 ; Rebecca J. Lepping 7 ; Alan S.L. Yu 8,9 ; Mert R. Sabuncu 1 ; Anna Caroli 4 ; Martin R. Prince 1 1 Department of Radiology, Weill Cornell Medicine, New York City, New York, U.S.A.; 2 Biomedical Engineering, Cornell University, Ithaca, New York, U.S.A.; 3 Institute of Biomedical Technologies, Italian National Research Council (ITB-CNR), Segrate (MI), Italy; 4 Bioengineering Department, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Ranica (BG), Italy; 5 Department of Medicine, Weill Cornell Medicine, New York City, New York, U.S.A., 6 Rogosin Institute, New York, New York, USA; 7 Department of Neurology, University of Kansas Medical Center, Kansas City, Kansas, U.S.A.; 8 Department of Internal Medicine, Division of Nephrology and Hypertension, University of Kansas Medical Center, Kansas City, Kansas, U.S.A.; 9 Jared Grantham Kidney Institute, University of Kansas Medical Center, Kansas City, Kansas, U.S.A. Address Correspondence to Prof. Martin R. Prince, 416 East 55th Street, New York, NY 10022, map2008@med.cornell.edu M.R.S., A.C., and M.R.P. contributed equally to this work. This is an open access article distributed under the terms of the Creative Commons Attribution- NonCommercial-NoDerivatives License 4.0 (CC-BY-NC-ND), which permits downloading and sharing the work provided it is properly cited. 
The work cannot be changed in any way or used commercially without permission from the journal.

Abstract

Background
Kidney, liver and cyst volumes are important for diagnosis, classification and management of autosomal dominant polycystic kidney disease (ADPKD) but are challenging to measure accurately and reproducibly. Here, we develop a web-based deep learning platform to automatically and robustly measure kidney, liver and cyst volumes in ADPKD.

Methods
MRI and CT scans from ADPKD patients (n=611) and participants without ADPKD (n=109) were used to train a 3D hybrid model combining U-Net and transformer elements for segmenting kidneys, liver and cysts. The model is implemented as a web-based calculator at www.traceorg.com, providing segmentation labels, volumes and Mayo Clinic Image Classification (MIC). Automatic browser anonymization of DICOM images ensures privacy. Internal validation was conducted on 70 MRIs for kidney and liver segmentations and 46 MRIs for cyst segmentations, and performance was compared to 5 open access segmentation models (TotalSegmentator, MR Annotator, Kim, Woznicki and Gregory-Kline). External validation was performed on one single-center dataset (n=58), one multicenter dataset (n=73), CRISP2 (n=30) and PKD-RRC (n=115) MRIs with T2-weighted and T1-weighted images.

Results
After training on 720 participants (mean age=48±15 years, eGFR=74±32 ml/min/1.73 m² and htTKV=826±772 ml/m), TraceOrg internal validation achieved high mean Dice scores of 0.97 (kidneys), 0.97 (liver), 0.93 (kidney cysts) and 0.82 (liver cysts), outperforming existing models for ADPKD.
External validation showed strong performance with Dice scores of 0.92-0.94 (kidney), 0.87-0.96 (liver), 0.85 (kidney cysts) and 0.76-0.90 (liver cysts) for the single-center dataset and 0.95 (kidney) and 0.81 (kidney cysts) for the multicenter dataset. Compared to CRISP volumes measured by stereology, mean absolute percent difference was 5.3% (kidneys, n=30), 11% (kidney cysts, n=30) and 5.5% (liver, n=22). Compared to PKD-RRC (n=115), mean absolute percent difference in TKV was 4.9%.

Conclusions
TraceOrg is a publicly available web-based tool that automatically measures kidney, liver and cyst volumes from abdominal MRI in ADPKD with high accuracy compared to manual segmentations.

Supplemental Digital Content: http://links.lww.com/JSN/F518

Introduction
Autosomal dominant polycystic kidney disease (ADPKD) is the most common inherited kidney disease, affecting an estimated 1:1000 live births 1 . Patients with ADPKD develop cysts in the kidneys and liver, causing these organs to enlarge 2,3 . Total kidney volume (TKV), the sum of right and left kidney volumes, is a validated biomarker that is routinely used for tracking ADPKD progression and predicting the onset of end stage kidney disease 4-7 . TKV was used as a primary endpoint in clinical trials of tolvaptan for evaluating therapeutic efficacy. Tolvaptan is currently the only Food and Drug Administration (FDA)-approved pharmacologic treatment for ADPKD, specifically for patients at risk of rapid disease progression 8-10 . The Mayo Clinic Image Classification (MIC), using age and height-adjusted total kidney volume (Ht-TKV), is a standard predictor of disease severity in ADPKD and is increasingly being used in the US to determine eligibility for tolvaptan therapy 11 .
Consequently, accurate and reproducible measurement of TKV is essential for clinical management and investigational studies. Traditional methods of estimating kidney volume (e.g., the modified ellipsoid formula using length, width and depth) have high inter-reader variability (7–15%) 12,13 . Manual contouring of the kidneys (3–6.7% variability) is more accurate than the ellipsoid formula and has the greater precision needed for longitudinal follow-up 14 , but it is tedious, time consuming and subject to operator variability. Deep learning models address these limitations by automating kidney segmentation on multiple imaging sequences, further improving reproducibility and reducing reader burden 14-18 . Recent advances have brought deep learning performance close to that of expert manual contouring, significantly reducing the time required for accurate segmentation 19 .

Nevertheless, there are important limitations of TKV in the clinical assessment of ADPKD. For example, TKV provides only limited information in the early stages of the disease when cyst volume is less than TKV measurement noise 7,20 . Measuring cyst volumes has the potential to improve the accuracy of ADPKD prognosis 21,22 . Furthermore, there are extrarenal manifestations of ADPKD, including liver cyst growth, which are associated with significant morbidity and impairment of quality of life 23-25 . Therefore, comprehensive assessment of the overall ADPKD phenotype requires readily accessible imaging biomarkers that can be measured efficiently and accurately and scaled to large populations of ADPKD patients in clinical and investigational settings.

This study introduces a 3D deep learning model that segments kidneys, liver, and cysts on multiple MRI pulse sequences. The tool is accessible via an online calculator (traceorg.com) supporting widespread use.

Methods

Training and Validation Datasets

Training and Internal Validation Dataset
The TraceOrg training/internal validation dataset consisted of all available abdominal CT scans and MR series with ground truth labels available, including T2-weighted, T1-weighted, Steady State Free Precession (SSFP), diffusion-weighted imaging (DWI) and contrast enhanced images (Figure 1). From this dataset, all images from 70 participants were held out from training to use for internal validation on kidney and liver segmentations. For cyst segmentation, training data were derived from contrast-enhanced CT and MRI series including T2-weighted, T1-weighted (primarily for hemorrhagic/proteinaceous cysts), and SSFP, holding out all images from 46 participants for internal validation.

External Validation Datasets
Two datasets from Italy were used for external validation. The first dataset (n=58) was acquired from a single center (Bergamo) and included MRI data acquired in the context of two prospective, longitudinal, multi-center, completed clinical studies 26,27 . All MRIs in this dataset were acquired between June 2014 and November 2020, including coronal T2-weighted fat saturated, out-of-phase coronal T1-weighted spoiled gradient echo and SSFP sequences. For liver cyst segmentation performance, we excluded 7 participants with no liver cysts. The second external dataset (n=73) was a multicenter cohort from a completed clinical trial involving 6 Italian medical centers with T2-weighted fat saturated scans acquired between May 2006 and May 2008, incorporating a variety of image resolutions, scanner types, and acquisition protocols 28 . For both single and multicenter cohorts, only baseline MRI scans were utilized for external validation so there was only one protocol MRI per participant.

CRISP 2 Dataset
Further external validation was performed comparing TraceOrg organ and cyst volume measurements to the stereology volume measurements in the Consortium for Radiologic Imaging Studies of Polycystic Kidney Disease (CRISP) dataset 29,30 . For this dataset, ADPKD participants were scanned 6 times over 8 years with organ volume measurements on the first 4 scans, CRISP 1, performed by gadolinium-enhanced T1-weighted MRI and on non-contrast MRI for the later CRISP 2 scans 5 and 6, similar to our current approach. Thus, we compared the performance of TraceOrg volume measurements to those derived by stereology from the non-contrast MRI scans in CRISP 2, which were acquired between 2006 and 2008. In most CRISP 2 participants, multiple breath hold coronal T2-weighted acquisitions were utilized to cover the entire kidneys and liver, making it challenging to combine the acquisitions without corrupting organ volumes. To avoid this confounding issue, we selected only cases where the entirety of both kidneys was acquired in a single acquisition. For each CRISP 2 participant meeting this criterion (n=30), the absolute percent difference in kidney and kidney cyst volume between the CRISP stereology measurement and TraceOrg measurement was calculated. We performed a similar analysis on liver volumes (n=22). For volumetric accuracy assessment, we performed three analyses because CRISP 2 volumes were derived from T1-weighted imaging: (1) T1-only analysis using those sequences, (2) T2-only analysis, and (3) all-sequence analysis. For the all-sequence analysis, we averaged all available sequences for each participant (excluding outlier values >10% different from the median) to produce a single participant-level volume for comparison with the stereology reference volumes.
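
The all-sequence averaging and the absolute percent difference described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, and it assumes that ">10% different from the median" is measured relative to the median itself and that the median is used as a fallback if every value is excluded.

```python
from statistics import mean, median


def participant_volume(sequence_volumes_ml, tolerance=0.10):
    """Average the per-sequence volumes of one participant after dropping
    values that differ from the median by more than `tolerance` (assumed
    to be a fraction of the median)."""
    med = median(sequence_volumes_ml)
    kept = [v for v in sequence_volumes_ml if abs(v - med) <= tolerance * med]
    # Assumption: fall back to the median if every value was excluded.
    return mean(kept) if kept else med


def absolute_percent_difference(model_ml, reference_ml):
    """Absolute difference between model and reference volume, expressed
    as a percentage of the reference volume."""
    return 100.0 * abs(model_ml - reference_ml) / reference_ml


def mean_absolute_percent_difference(pairs):
    """Mean of the absolute percent differences over (model, reference) pairs."""
    return mean(absolute_percent_difference(m, r) for m, r in pairs)
```

For example, with hypothetical per-sequence kidney volumes of 1000, 1020, 980 and 1300 mL, the median is 1010 mL, the 1300 mL value is excluded, and the participant-level volume is 1000 mL.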

PKD-RRC Dataset
External validation with more recently acquired scans from the Polycystic Kidney Disease-Research Resource Consortium (PKD-RRC) was performed with images acquired at the University of Kansas Medical Center, University of Chicago Medical Center and the Children’s Mercy Hospital Kansas City from 2016 to 2025 (n=115). For this dataset, participants are still being recruited and scanned biennially by non-contrast MRI including coronal T2, T2 fat saturated, T1 and SSFP sequences. For some participants, additional axial and sagittal T2 fat saturated and non-contrast MRA sequences were acquired. Because PKD-RRC reference volumes were obtained from T2-weighted fat saturated imaging, we separately performed a T2-only analysis using those sequences, a T1-only analysis and an all-sequence average analysis analogous to the methodology described for CRISP 2.

Ethics Approval
This study adheres to the Declaration of Helsinki. Permission for data reuse was obtained from the local ethics committee Lombardy 6 (Reg. N.2024-3.11/486) for both single-center and multicenter Italian datasets, as well as for the CRISP data. Internal validation was conducted under Weill Cornell IRB approval #1610017623. Ethics approval for PKD-RRC imaging and analysis was provided by the University of Kansas Medical Center with reliance by the University of Chicago Medical Center and the Children’s Mercy Hospital Kansas City, approval #STUDY00146013.

Data Annotation Protocol

Training Dataset Annotation
Labeling utilized an iterative model-assisted workflow to streamline the annotation process and enhance annotation accuracy. A previously described 3D nnU-Net model 19 was utilized to generate initial segmentation predictions for target structures, including kidneys, liver, and their related cysts. These predictions served as a foundation for manual refinement by trained research assistants using ITK-SNAP (www.itksnap.org, version 3.8.0). Manual refinement tasks were randomly assigned among the annotators, who labeled only those MRI sequences (T1-weighted, T2-weighted, and SSFP) and CT scans containing kidneys and liver completely within the field of view. Labeling was supervised by a board-certified radiologist with >30 years of experience (MRP), who checked annotation accuracy for every case. This two-tiered process ensured a high standard of annotation while maintaining efficiency. The segmentation labels were stored in a standardized format, Neuroimaging Informatics Technology Initiative (NIfTI), to ensure compatibility with downstream deep learning pipelines.

Internal Validation Dataset Annotation
We used the same semi-automatic approach as for the training data annotation: TraceOrg model segmentations were manually refined by expert reviewers, and the refined segmentations served as the ground truth labels for evaluating model performance. Annotations for kidney and liver were refined by 4 physician experts (VB, VD, US, MS) and 2 novices (SS, SP) for all available complete sequences. Annotations for kidney and liver cysts were manually refined by 2 physician experts (VB, VD).

External Validation Dataset Annotation
External validations on kidney, liver, and kidney cysts were performed using all available images with existing ground truth data, thereby ensuring the ground truth was created blinded to model outputs and there was no selection bias. For the single-center dataset, manual segmentation was performed on the MR images where delineation was most suitable for each structure. Specifically, kidney and liver annotations were performed manually on thin-slice T1-weighted images, while kidney cyst delineation was performed on T2-weighted images. Binary masks were then transformed onto the other sequences to create the ground truth for comprehensive assessment.
For liver cysts, we used the semi-automatic approach; TraceOrg model labels were refined by experts from the external center to create the ground truth. For the multicenter dataset, ground truth segmentation of the kidneys was performed directly on T2-weighted sequences with varying slice thicknesses across the different centers. Ground truth for kidney cysts was obtained by a previously published semi-automatic method 28 . For the multicenter dataset, no ground truth was available for liver and liver cysts since the liver was incompletely imaged.

CRISP 2 Dataset Annotation
The CRISP dataset employed stereology for precise volume measurements of kidneys on T1-weighted images and of cysts on T2-weighted images 29 . Trained raters placed point grids over orthogonal views of the organs and cysts and manually counted points falling within target structures. Volume calculation incorporated point counts, grid size, and slice thickness.

PKD-RRC Dataset Annotation
For all scans, the right and left kidney volumes were measured at the University of Kansas Medical Center by manual planimetry, labeling right and left kidneys on every slice of the coronal T2 fat saturated scans. In participants who had more than one coronal T2 fat saturated scan (usually because the first scan was corrupted by respiratory motion), the measurement was made on the best quality scan determined by visual inspection of artifacts. To improve inter-rater reliability, manual planimetry was automatically translated into stereology estimates using a 100 mm² square grid overlaid on the images, with points inside the tracing boundary being automatically labeled. Total volume via stereology was estimated as (number of points × 100 × slice thickness)/1000, which gives the volume in mL when the slice thickness is in mm. Reliability of stereology volume estimates between raters was assessed with intraclass correlation.
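
As a worked illustration of the stereology estimate above, the following Python sketch applies the stated formula. The function and parameter names are hypothetical, and the point counts in the example are invented for illustration; slice thickness is assumed to be in mm, so the result is in mL.

```python
def stereology_volume_ml(points_per_slice, slice_thickness_mm, grid_area_mm2=100.0):
    """Estimate volume as (number of points * grid area * slice thickness) / 1000.

    Each counted grid point stands for `grid_area_mm2` of cross-sectional
    area; multiplying by the slice thickness (mm) gives mm^3, and dividing
    by 1000 converts mm^3 to mL.
    """
    total_points = sum(points_per_slice)
    volume_mm3 = total_points * grid_area_mm2 * slice_thickness_mm
    return volume_mm3 / 1000.0
```

For instance, 141 points counted across five 4 mm slices correspond to 141 × 100 × 4 / 1000 = 56.4 mL.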
Training of new raters was continued until ICC exceeded 0.9 indicating excellent agreement 31 . TraceOrg Model The TraceOrg model builds upon our previously published work 19 but extends its capabilities significantly to improve organ segmentations and include cyst segmentations. It is a hybrid 3D U-Net transformer architecture designed for high-accuracy segmentation of abdominal organs and cysts 19 . The model integrates 3D convolutional neural networks with transformer-based components to capture both local and global contexts, improving segmentation performance across diverse imaging modalities and patient populations (Figure 2 and Supplemental Figure 1). Specifically, the U-Net architecture is used for detailed feature extraction, while the transformer component enhances the model's ability to understand complex spatial relationships. Model training details are described in Supplemental Material. Comparison to Existing Publicly Available Open-Source Models The PubMed database was searched for all manuscripts reporting on models for kidney or cyst segmentations in ADPKD using the terms “ADPKD” and “deep learning.” TraceOrg performance was compared to all publicly available models identified. Three ADPKD-specific models were found: Kim 7 model, Woznicki 32 model and Gregory-Kline model 21 . For the Kim 7 and Woznicki 32 models, only T2-weighted images were used, as those models were originally trained exclusively on T2 images. Kim model outputs included exophytic cysts, which were reassigned to the nearest kidney label using a majority-vote approach; in cases of ties or absent neighboring labels, the nearest label based on minimum distance was assigned. The Gregory- Kline model 21 was evaluated only on coronal T2 images, as this was the sole modality supported. 
In addition, TraceOrg performance was compared to two whole body segmentation models, TotalSegmentator 33 and MRAnnotator 34 , which are publicly available for use on MR images although not specifically optimized for ADPKD. For the internal test set and the external test sets in which ground truth segmentations were available, the performance of each publicly available model was compared to TraceOrg by calculating Dice similarity coefficient and Hausdorff Distance.

Web Interface Implementation
TraceOrg is implemented as a web-based calculator (www.traceorg.com) and offers two deployment options. For imaging expert users, including radiologists and nephrologists, who prefer localized solutions, the model can be deployed directly on their local systems by downloading the code with checkpoints. This enables offline use without relying on internet connectivity. Alternatively, users can leverage the web-based interface, which requires no additional hardware or programming knowledge. The web interface allows users to upload DICOM images for segmentation and automatically receive kidney, liver, and cyst volumes as well as the MIC (Figure 3). Patient privacy is protected through in-browser data anonymization occurring on the user's computer prior to image uploading, ensuring compliance with HIPAA and related privacy regulations. A report is generated providing users with snapshots of the organ and cyst segmentations to allow quick verification of their accuracy, as well as label volumes and MIC. Segmentation labels are provided in a compressed and anonymized (NIfTI) format, and any model errors can be refined manually, thereby achieving performance comparable to manual contouring but in a fraction of the time.
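
The overlap and distance metrics used in these Methods to compare models (Dice similarity coefficient, Jaccard index and Hausdorff distance) can be sketched in plain Python. This is an illustrative sketch rather than the study's evaluation code: segmentations are represented as sets of voxel index tuples, the function names are hypothetical, distances are in voxel units (a real pipeline would scale by voxel spacing to obtain mm), and the Hausdorff distance is computed by brute force over all voxels, which is only practical for small examples.

```python
from math import dist


def dice(a, b):
    """Dice similarity coefficient: 2|A and B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # convention assumed here for two empty masks
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a, b):
    """Jaccard index: |A and B| / |A or B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty voxel sets."""
    def directed(src, dst):
        # Largest distance from any voxel in src to its nearest voxel in dst.
        return max(min(dist(p, q) for q in dst) for p in src)
    return max(directed(a, b), directed(b, a))


# Toy example: a 2x2 patch and the same patch shifted by one voxel.
truth = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
pred = {(0, 0, 1), (0, 1, 1), (0, 0, 2), (0, 1, 2)}
```

In this toy example the two masks share 2 of their 4 voxels each, so the Dice coefficient is 0.5 and the Jaccard index is 2/6, while the Hausdorff distance is 1 voxel.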

Statistical Analysis
For the evaluation of segmentation accuracy, the Dice similarity coefficient, average symmetric surface distance (ASSD), Hausdorff Distance and Jaccard index were calculated for each segmented structure on both internal and external validation datasets. These metrics were used to assess the overlap and boundary precision of the model's predictions compared to expert manual annotations. TraceOrg model performance on the internal and external validation datasets was further compared to five publicly available open source models: TotalSegmentator 33 , MRAnnotator 34 , Kim 7 , Woznicki 32 and the kidney cyst segmentation model proposed by Gregory and Kline et al. 20 . Dice, Hausdorff Distance and ASSD were compared using a paired t-test, with the significance level set at p < 0.05. Liver cyst segmentation comparison was excluded, as there was no baseline tool available to support this task. For CRISP 2 and PKD-RRC datasets, volumetric accuracy was assessed by calculating the mean absolute percent differences between TraceOrg predicted volumes and CRISP2 stereology/PKD-RRC manual contouring derived reference volumes. This assessment was performed separately for T1-only, T2-only, and all-sequence average volumes as described above.

Results
The study included diverse cohorts across the training dataset (n=720 for kidneys/liver, n=395 for cysts), internal validation dataset (n=70 for kidneys/liver, n=46 for cysts), external validation datasets (single-center n=58, multi-center n=73), CRISP 2 dataset (n=30) and PKD-RRC dataset (n=115), see Figure 1. Demographic data, body habitus data, estimated glomerular filtration rate (eGFR; for training data, internal validation data, single-center and PKD-RRC datasets), measured glomerular filtration rate (for CRISP and the multi-center datasets), TKV and total cyst volume are summarized in Table 1.
Mean age for the training cohort was 48 years, but for testing cohorts it ranged from 22 to 53 years, indicating that validations were performed over a broad age range. Mean body mass index (BMI) ranged from 24 to 27 kg/m² and mean ht-TKV ranged from 432 to 1176 mL/m, demonstrating the model's validation across a broad range of disease severity.

TraceOrg Model Performance

Internal Validation
TraceOrg demonstrated excellent performance on the internal validation dataset when compared to expert manual segmentations. For kidney segmentation (Table 2a), the model achieved an average Dice similarity coefficient of 0.97, ASSD of 0.91 mm, Hausdorff distance of 28 mm, and Jaccard index of 0.95 with good agreement across six independent observers (Supplemental Table 1). Similarly, for liver segmentation (Table 2a), the model achieved a Dice of 0.97, ASSD of 0.81 mm, Hausdorff distance of 26 mm, and Jaccard index of 0.95. For both liver and kidney segmentations the performances were similar across 5 different MRI pulse sequences. For kidney cysts (Table 2b), TraceOrg achieved a Dice coefficient of 0.93, ASSD of 0.48 mm, Hausdorff distance of 17 mm, and Jaccard index of 0.88 when averaged across two observers. Liver cyst segmentation showed a Dice of 0.86, ASSD of 2.6 mm, Hausdorff distance of 37 mm, and Jaccard index of 0.79. Notably, the model's performance metrics were comparable to inter-observer variability for all structures, indicating that TraceOrg achieves expert level accuracy (Supplemental Table 1, Table 2b).

External Validation

Single-center Dataset
On the external single-center dataset, TraceOrg maintained robust performance across different MR sequences (Table 3, Figure 4). The model maintained an average Dice similarity coefficient of 0.94 for kidney labels on T1-weighted images. Projecting the T1-weighted mask onto the T2-weighted and SSFP images gave Dice scores of 0.93 and 0.92, respectively. For the liver labels, the T1-weighted Dice was 0.96 and SSFP was 0.92. Projecting the T1-weighted mask onto the T2-fat saturated images gave a lower Dice of 0.87, reflecting the greater challenge of accurately labeling the dark liver against a dark fat suppressed background. For kidney cysts the Dice was 0.85. For liver cysts the Dice was 0.79. However, among participants with polycystic liver disease (>20 cysts), the Dice was 0.90 in those with a larger cyst burden (>50 ml) 30 and 0.76 in those with a smaller liver cyst burden. Participants with liver cyst volumes <50 ml often had many cysts <1 cm in diameter, which are more challenging to segment and more readily confused with hepatic vessels or bile ducts.

Multicenter Dataset
On the multi-center dataset, TraceOrg maintained high performance with a Dice of 0.95 for kidneys and 0.81 for kidney cysts on T2-weighted images (Table 3, Figure 5).

CRISP 2 Validation
When validated against the CRISP 2 dataset (n=30), which used stereology as the reference standard, TraceOrg demonstrated good volumetric accuracy (Table 4). The mean absolute percent difference was 5.3% for TraceOrg kidney volumes on T1-weighted images, 8.7% for kidney volumes on T2 images and 6.0% when comparing the average of all MRI sequences to the stereology ground truth. The absolute percent difference was 11% for kidney cyst volumes. The liver was entirely within the field of view in 22 of those 30 participants for T1-weighted (n=5) and T2-weighted (n=17) sequences. The mean absolute percent difference for liver volume was 5.5%. Example comparisons are shown in Figure 5, highlighting both cases with high agreement and edge cases where discrepancies may stem from limitations of the stereology reference method.

PKD-RRC Validation
Validation against the more recent PKD-RRC dataset from University of Kansas Medical Center (n=115; Table 4) also showed excellent agreement between the TraceOrg volume measurements and the ground truth University of Kansas planimetry volume measurements. For the coronal T2 fat saturated images on which the ground truth measurements were performed, the absolute percent difference between TraceOrg and PKD-RRC was 6.1%. A larger error of 11% was observed when comparing TraceOrg measurements from coronal T1 images to ground truth measurements from coronal T2 fat saturated images, which reflects the tendency of T1 to underestimate ADPKD kidney volumes and T2 to overestimate ADPKD kidney volumes. Interestingly, the best agreement was between the average TraceOrg measurement for all sequences and the ground truth measurements, with a mean absolute percent difference of 4.9%, which may reflect the benefit of eliminating outlier values when calculating the mean of all TraceOrg measurements.

Comparison to Existing Segmentation Models
TraceOrg performance compared with publicly available segmentation models for kidney and liver segmentations is shown in Table 5a and Figure 6. On the external test set, TraceOrg achieved Dice similarity coefficients of 0.93 (left kidney) and 0.94 (right kidney) compared to TotalSegmentator 33 (0.19 left, 0.28 right), MRAnnotator 34 (0.13 left, 0.17 right), Kim 7 (0.30 left, 0.53 right) and Woznicki 32 (0.93 left and 0.93 right). For the Kim 7 model trained on ADPKD images, most of the errors were right-left kidney swaps, so Dice improved to 0.81 for TKV, but this was still well below TraceOrg performance (0.94). Woznicki performed comparably to TraceOrg on the T2-weighted external validation images but not as well as TraceOrg on the internal validation, and Woznicki did not work on other sequences or cysts.
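
The effect of right-left kidney swaps on the Dice score can be illustrated with a small Python sketch. It is a toy example with hypothetical names and invented voxel sets, not the study's evaluation code: a prediction that outlines both kidneys perfectly but swaps their labels scores 0 per side, whereas the Dice of the combined (total kidney) label is unaffected.

```python
def dice(a, b):
    """Dice similarity coefficient between two voxel sets."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def per_side_dice(truth, pred):
    """Dice computed separately for the left and right kidney labels."""
    return {side: dice(truth[side], pred[side]) for side in ("left", "right")}


def total_kidney_dice(truth, pred):
    """Dice after merging left and right labels into one kidney label."""
    return dice(truth["left"] | truth["right"], pred["left"] | pred["right"])


# Toy masks: two voxels per kidney, far apart along the first axis.
truth = {"left": {(0, 0, 0), (0, 0, 1)}, "right": {(5, 0, 0), (5, 0, 1)}}
# Perfect outlines, but with the left and right labels swapped.
swapped = {"left": truth["right"], "right": truth["left"]}
```

Here `per_side_dice(truth, swapped)` is 0 for both sides while `total_kidney_dice(truth, swapped)` is 1, which is why merging labels for TKV raises the Dice of a model whose main error is label swapping.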
For liver segmentation, TraceOrg achieved a Dice of 0.92 on the external test set compared to TotalSegmentator 33 (0.57), MRAnnotator 34 (0.72) and Woznicki 32 (0.87). The Kim 7 model did not segment liver. For kidney cyst segmentation, TraceOrg achieved an average Dice of 0.82, substantially outperforming TotalSegmentator (0.09) and achieving similar performance to the Gregory-Kline model 21 (0.76). Note, however, that the Gregory-Kline model could only be run on coronal T2 images and required a secondary channel kidney mask. No open-source models for liver cyst segmentation were found. Bland-Altman analysis (Supplemental Figure 2) confirmed superior agreement between TraceOrg measurement and ground truth. Discussion In this study, we developed an internet-based tool, TraceOrg, for automatically measuring kidney, liver and cyst volumes on MRI datasets from ADPKD patients. A major finding of this study is that TraceOrg was accurate and performed robustly across a wide range of clinical scenarios. For Dice similarity coefficient which measures the percentage of voxels that are identical between the model output and ground truth, TraceOrg performance was equal or superior to open-source models on T2 and superior to all existing models for T1 and SSFP MR images. For Hausdorff Distance, the largest distance between an erroneous model voxel label and the ground truth, the model also outperformed existing models for kidney and liver, and performed similarly to the Gregory-Kline model 21 for kidney cyst segmentations. No existing models were available for comparing liver cyst labeling performance. Model generalizability was confirmed through multiple external validations using both single center and multicenter datasets, as well as older ADPKD images from CRISP 2 and more recent images from PKD-RRC. 
TraceOrg is based upon deep learning technology and convolutional neural networks with many interconnections, somewhat resembling the interconnections of the human brain and visual cortex. The MR images pass from one network layer to another, gradually transforming greyscale MR images into a simpler image with just 4 shades representing kidney, liver, cyst or background. We used a convolutional neural network known as U-Net because it performs well at these image segmentation tasks. It has a first set of layers, known as the “encoder”, which gradually transforms the high-resolution MR images into lower resolution images with more dimensions. After encoding, a transformer network allows the model to handle larger anatomic structures and provides more interconnections to absorb the information from our relatively large training dataset. Finally, a decoder restores the original resolution. Many skip connections between the encoder and decoder elements ensure preservation of the high-resolution features. This network is trained by repeatedly inputting MR images and comparing the model output to ground truth annotations of experts. With each iteration of training, known as an epoch, the error between the model output and the ground truth is used to refine millions of interconnection parameters, which are saved as checkpoints, to gradually improve the model performance. We routinely train for 1000 epochs, gradually reducing the learning rate with each epoch to ensure convergence and to make sure that big adjustments to the model occur only toward the beginning of training.
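
The idea of reducing the learning rate with each epoch can be sketched as follows. This is only an illustration: the text does not state which schedule TraceOrg uses, so the polynomial decay form, the initial learning rate of 0.01 and the exponent of 0.9 are assumptions chosen for the example, and `poly_lr` is a hypothetical name.

```python
def poly_lr(epoch, max_epochs=1000, initial_lr=1e-2, exponent=0.9):
    """Polynomial learning-rate decay (an assumed schedule, for illustration).

    The rate starts at `initial_lr` and shrinks smoothly toward zero as
    `epoch` approaches `max_epochs`, so large parameter updates happen
    only early in training.
    """
    return initial_lr * (1 - epoch / max_epochs) ** exponent


# Learning rate at the start, midpoint and final epoch of a 1000-epoch run.
schedule = [poly_lr(e) for e in (0, 500, 999)]
```

With these assumed settings the rate falls monotonically over the 1000 epochs, from 0.01 at epoch 0 to a small positive value at the final epoch.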
Factors to consider when evaluating the strengths and weaknesses of a deep learning model include the number of patients and scans used for model training, the variety of training data, the target organs it can segment, model architecture, availability of external validation, Dice similarity coefficient for the external validation and public availability. These factors are listed in Table 6 for the deep learning models published for segmenting ADPKD kidneys. The high performance of TraceOrg reflects meeting all of these criteria. TraceOrg has the largest number of participants and the largest number of scans in the training data. TraceOrg was trained on a variety of modalities, both CT and MRI, with MRI including T2-weighted, T1-weighted, SSFP and DWI images as well as contrast-enhanced images. TraceOrg is a 3D model with an extra dimension of analysis compared to 2D models, and it segments multiple structures (kidneys, liver and cysts) because the training data included all of this information. External validation of TraceOrg was performed on datasets spanning a wide range of imaging protocols and extent of disease. The Dice similarity coefficient was high on these external validations, and the agreement with volume measurements for CRISP2 and PKD-RRC participants was also good, indicating that TraceOrg can be expected to perform well outside of our local institution. Finally, TraceOrg is available to download and run locally and is also available as an internet calculator.

TraceOrg effectively segments liver cysts, performing better in PKD patients with liver cyst volumes exceeding 50 mL (Table 3). Segmenting smaller liver cyst volumes on MRI in ADPKD 22 is challenging because cross-sections of blood vessels and bile ducts can resemble cysts, as they are also bright on T2 and dark on T1. When there are few cysts, fewer than 20, the Dice metric is not reliable, oscillating over a wide range with small label changes involving just a few voxels.
For this reason, we focused on examining cases with 20 or more cysts meeting the criteria for polycystic liver disease. Since ground truth segmentations for liver cysts were not available before inferencing on any dataset, they were created from model-assisted segmentations, which are less reliable than the other ground truth data created prior to model inferencing. Additionally, liver segmentation was tested on a limited number of CRISP cases because the liver was frequently incompletely imaged in CRISP MRI data. This may have biased our CRISP comparison towards ADPKD patients earlier in the course of their disease with fewer liver cysts. However, Bland-Altman analysis showed that model performance was similar across a wide range of TKV. Another limitation of the study is that while CT images were included in the training set, model performance on CT was not specifically validated in this study. The model is therefore presently intended only for MR images, and a CT validation is planned for the future. The impact of segmentation accuracy on clinical decision-making was not directly assessed in this study and will be the focus of future research. Future validation for longitudinal analyses assessing the accuracy of TKV growth rate measurements is also planned. Another limitation is that the dataset, while diverse, may not fully capture all possible variations in imaging protocols and patient demographics. Further validation on additional external datasets from different geographical regions and continual retraining as more images become available will help to strengthen the generalizability of the model. Model performance is also inherently dependent on the manual annotation protocol; differences in annotation protocols between training and external validation datasets could affect observed performance.
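The instability of the Dice metric for small structures, noted above for livers with few cysts, can be illustrated with a toy calculation. The sketch below is not taken from the TraceOrg code: the cube-shaped "cysts", the one-voxel shift and the helper names `dice` and `shifted_cube_dice` are hypothetical choices made only to show the effect.

```python
import numpy as np


def dice(a, b):
    """Dice similarity coefficient between two non-empty binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())


def shifted_cube_dice(side, shift=1):
    """Dice between a cube of the given side length and a copy shifted along one axis."""
    size = side + shift + 2  # volume just large enough to hold both cubes
    truth = np.zeros((size, size, size), dtype=bool)
    truth[1:1 + side, 1:1 + side, 1:1 + side] = True
    pred = np.zeros_like(truth)
    pred[1 + shift:1 + shift + side, 1:1 + side, 1:1 + side] = True
    return dice(pred, truth)


# The same one-voxel labeling error applied to a tiny and a large structure.
print(shifted_cube_dice(2))   # 0.5  (8-voxel structure)
print(shifted_cube_dice(20))  # 0.95 (8000-voxel structure)
```

An identical one-voxel boundary error halves the Dice score of the 8-voxel structure but barely changes that of the 8000-voxel structure, which is consistent with the wide oscillation of Dice reported for small total cyst volumes.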
Another practical limitation is that inferencing was not instantaneous due to limitations on our computer resources, so there can be a delay of several hours between uploading images and obtaining results depending upon internet speed, case backlog, number of pulse sequences and other parameters. As the employed computer infrastructure improves, these delays are expected to diminish.

In conclusion, TraceOrg provides a practical, robust, accurate and scalable solution for the segmentation of abdominal organs and cysts in ADPKD. The model is easy to use, generalizable and outperforms existing publicly available models, making it suitable for both clinical and multicenter research use. Future work will expand the model to include additional anatomical structures, imaging modalities, and longitudinal analysis of TKV growth rates to determine the clinical impact of the segmentation results.

Acknowledgments

The data from CRISP study conducted by the study investigators and supported by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) were supplied by NIDDK Central Repository (NIDDK-CR). This manuscript was not prepared under the auspices of the CRISP study and does not necessarily reflect the opinions or views of the CRISP study, NIDDK-CR, or NIDDK. A version of this work was previously presented in abstract form at the 4th Annual Virtual PKD RRC Symposium, March 2025 and portions of this work were presented during a virtual roundtable discussion hosted by PKD RRC on May 5, 2025. Because Dr. Alan S.L. Yu is a Deputy Editor of JASN, he was not involved in the peer-review process for this manuscript. Another editor oversaw the peer-review and decision-making process for this manuscript.

Supplemental Material Table of Contents

Model Training
Glossary of Terms and Abbreviations
Supplemental Table 1. Interobserver variability of TraceOrg internal validations.
Supplemental Figure 1. Model Architecture
Supplemental Figure 2. Bland Altman Plots of TraceOrg comparisons to publicly available models

References:

1. Chapman AB, Devuyst O, Eckardt KU, et al. Autosomal-dominant polycystic kidney disease (ADPKD): executive summary from a Kidney Disease: Improving Global Outcomes (KDIGO) Controversies Conference. Kidney Int. Jul 2015;88(1):17-27. doi:10.1038/ki.2015.59
2. Zhang ZY, Wang ZM, Huang Y. Polycystic liver disease: Classification, diagnosis, treatment process, and clinical management. World J Hepatol. Mar 27 2020;12(3):72-83. doi:10.4254/wjh.v12.i3.72
3. Norcia LF, Watanabe EM, Hamamoto Filho PT, et al. Polycystic Liver Disease: Pathophysiology, Diagnosis and Treatment. Hepat Med. 2022;14:135-161. doi:10.2147/HMER.S377530
4. Fick-Brosnahan GM, Belz MM, McFann KK, Johnson AM, Schrier RW. Relationship between renal volume growth and renal function in autosomal dominant polycystic kidney disease: a longitudinal study. Am J Kidney Dis. Jun 2002;39(6):1127-1134. doi:10.1053/ajkd.2002.33379
5. Sedman A, Bell P, Manco-Johnson M, et al. Autosomal dominant polycystic kidney disease in childhood: a longitudinal study. Kidney Int. Apr 1987;31(4):1000-1005. doi:10.1038/ki.1987.98
6. Kistler AD, Poster D, Krauer F, et al. Increases in kidney volume in autosomal dominant polycystic kidney disease can be detected within 6 months. Kidney Int. Jan 2009;75(2):235-241. doi:10.1038/ki.2008.558
7. Kim Y, Tao C, Kim H, Oh GY, Ko J, Bae KT. A Deep Learning Approach for Automated Segmentation of Kidneys and Exophytic Cysts in Individuals with Autosomal Dominant Polycystic Kidney Disease. J Am Soc Nephrol. Aug 2022;33(8):1581-1589. doi:10.1681/ASN.2021111400
8. Torres VE, Chapman AB, Devuyst O, et al. Tolvaptan in patients with autosomal dominant polycystic kidney disease. N Engl J Med. Dec 20 2012;367(25):2407-2418. doi:10.1056/NEJMoa1205511
9. Torres VE, Ahn C, Barten TRM, et al. KDIGO 2025 clinical practice guideline for the evaluation, management, and treatment of autosomal dominant polycystic kidney disease (ADPKD): executive summary. Kidney Int. Feb 2025;107(2):234-254. doi:10.1016/j.kint.2024.07.010
10. Irazabal MV, Rangel LJ, Bergstralh EJ, et al. Imaging classification of autosomal dominant polycystic kidney disease: a simple model for selecting patients for clinical trials. J Am Soc Nephrol. Jan 2015;26(1):160-172. doi:10.1681/ASN.2013101138
11. Cigna. Cigna National Formulary Coverage - Policy: Tolvaptan Products - Jynarque Prior Authorization Policy. 2024. https://static.cigna.com/assets/chcp/pdf/coveragePolicies/cnf/cnf_626_coveragepositioncriteria_tolvaptan_products_jynarque_pa.pdf
12. Demoulin N, Nicola V, Michoux N, et al. Limited Performance of Estimated Total Kidney Volume for Follow-up of ADPKD. Kidney Int Rep. Nov 2021;6(11):2821-2829. doi:10.1016/j.ekir.2021.08.013
13. Sharma K, Caroli A, Quach LV, et al. Kidney volume measurement methods for clinical studies on autosomal dominant polycystic kidney disease. PLoS One. 2017;12(5):e0178488. doi:10.1371/journal.pone.0178488
14. Dev H, Zhu C, Sharbatdaran A, et al. Effect of Averaging Measurements From Multiple MRI Pulse Sequences on Kidney Volume Reproducibility in Autosomal Dominant Polycystic Kidney Disease. J Magn Reson Imaging. Oct 2023;58(4):1153-1160. doi:10.1002/jmri.28593
15. Goel A, Shih G, Riyahi S, et al. Deployed Deep Learning Kidney Segmentation for Polycystic Kidney Disease MRI. Radiol Artif Intell. Mar 2022;4(2):e210205. doi:10.1148/ryai.210205
16. Raj A, Tollens F, Hansen L, et al. Deep Learning-Based Total Kidney Volume Segmentation in Autosomal Dominant Polycystic Kidney Disease Using Attention, Cosine Loss, and Sharpness Aware Minimization. Diagnostics (Basel). May 7 2022;12(5). doi:10.3390/diagnostics12051159
17. Kline TL, Korfiatis P, Edwards ME, et al. Performance of an Artificial Multi-observer Deep Neural Network for Fully Automated Segmentation of Polycystic Kidneys. J Digit Imaging. Aug 2017;30(4):442-448. doi:10.1007/s10278-017-9978-1
18. Taylor J, Thomas R, Metherall P, Ong A, Simms R. MO012: Development of an Accurate Automated Segmentation Algorithm to Measure Total Kidney Volume in ADPKD Suitable for Clinical Application (The Cystvas Study). Nephrology Dialysis Transplantation. 2022;37(Supplement_3). doi:10.1093/ndt/gfac061.007
19. He X, Hu Z, Dev H, et al. Test Retest Reproducibility of Organ Volume Measurements in ADPKD Using 3D Multimodality Deep Learning. Acad Radiol. Mar 2024;31(3):889-899. doi:10.1016/j.acra.2023.09.009
20. Kline TL, Edwards ME, Fetzer J, et al. Automatic semantic segmentation of kidney cysts in MR images of patients affected by autosomal-dominant polycystic kidney disease. Abdom Radiol (NY). Mar 2021;46(3):1053-1061. doi:10.1007/s00261-020-02748-4
21. Gregory AV, Chebib FT, Poudyal B, et al. Utility of new image-derived biomarkers for autosomal dominant polycystic kidney disease prognosis using automated instance cyst segmentation. Kidney Int. Aug 2023;104(2):334-342. doi:10.1016/j.kint.2023.01.010
22. Chookhachizadeh Moghadam M, Aspal M, He X, et al. Deep learning-based liver cyst segmentation in MRI for autosomal dominant polycystic kidney disease. Radiology Advances. 2024;1(2). doi:10.1093/radadv/umae014
23. van Gastel MDA, Edwards ME, Torres VE, Erickson BJ, Gansevoort RT, Kline TL. Automatic Measurement of Kidney and Liver Volumes from MR Images of Patients Affected by Autosomal Dominant Polycystic Kidney Disease. J Am Soc Nephrol. Aug 2019;30(8):1514-1522. doi:10.1681/ASN.2018090902
24. Sharbatdaran A, Romano D, Teichman K, et al. Deep Learning Automation of Kidney, Liver, and Spleen Segmentation for Organ Volume Measurements in Autosomal Dominant Polycystic Kidney Disease. Tomography. Jul 13 2022;8(4):1804-1819. doi:10.3390/tomography8040152
25. Zhu C, He X, Blumenfeld JD, et al. A Primer for Utilizing Deep Learning and Abdominal MRI Imaging Features to Monitor Autosomal Dominant Polycystic Kidney Disease Progression. Biomedicines. May 20 2024;12(5). doi:10.3390/biomedicines12051133
26. Winterbottom J, Simms RJ, Caroli A, et al. Flank pain has a significant adverse impact on quality of life in ADPKD: the CYSTic-QoL study. Clin Kidney J. Nov 2022;15(11):2063-2071. doi:10.1093/ckj/sfac144
27. Trillini M, Caroli A, Perico N, et al. Effects of Octreotide-Long-Acting Release Added-on Tolvaptan in Patients with Autosomal Dominant Polycystic Kidney Disease: Pilot, Randomized, Placebo-Controlled, Cross-Over Trial. Clin J Am Soc Nephrol. Feb 1 2023;18(2):223-233. doi:10.2215/CJN.0000000000000049
28. Caroli A, Perico N, Perna A, et al. Effect of longacting somatostatin analogue on kidney and cyst growth in autosomal dominant polycystic kidney disease (ALADIN): a randomised, placebo-controlled, multicentre trial. Lancet. Nov 2 2013;382(9903):1485-1495. doi:10.1016/S0140-6736(13)61407-5
29. Chapman AB, Guay-Woodford LM, Grantham JJ, et al. Renal structure in early autosomal-dominant polycystic kidney disease (ADPKD): The Consortium for Radiologic Imaging Studies of Polycystic Kidney Disease (CRISP) cohort. Kidney Int. Sep 2003;64(3):1035-1045. doi:10.1046/j.1523-1755.2003.00185.x
30. Bae KT, Tao C, Feldman R, et al. Volume Progression and Imaging Classification of Polycystic Liver in Early Autosomal Dominant Polycystic Kidney Disease. Clin J Am Soc Nephrol. Mar 2022;17(3):374-384. doi:10.2215/CJN.08660621
31. Lepping RJ, Karcher RT, Keselman P, Wallace D, Yu A, Martin LE, Brooks WM. Inter-rater reliability and translational implications of MR-based polycystic kidney volume measurements by stereology at early and late stage disease. Presented at: International Society of Magnetic Resonance in Medicine (ISMRM) Annual Meeting; April 2017; Honolulu, HI.
32. Woznicki P, Siedek F, van Gastel MDA, et al. Automated Kidney and Liver Segmentation in MR Images in Patients with Autosomal Dominant Polycystic Kidney Disease: A Multicenter Study. Kidney360. Dec 29 2022;3(12):2048-2058. doi:10.34067/KID.0003192022
33. Akinci D'Antonoli T, Berger LK, Indrakanti AK, et al. TotalSegmentator MRI: Robust Sequence-independent Segmentation of Multiple Anatomic Structures in MRI. Radiology. Feb 2025;314(2):e241613. doi:10.1148/radiol.241613
34. Zhou A, Liu Z, Tieu A, et al. MRAnnotator: multi-anatomy and many-sequence MRI segmentation of 44 structures. Radiology Advances. 2024;2(1). doi:10.1093/radadv/umae035
35. Wasserthal J, Breit HC, Meyer MT, et al. TotalSegmentator: Robust Segmentation of 104 Anatomic Structures in CT Images. Radiol Artif Intell. Sep 2023;5(5):e230024. doi:10.1148/ryai.230024
36. Sharma K, Rupprecht C, Caroli A, et al. Automatic Segmentation of Kidneys using Deep Learning for Total Kidney Volume Quantification in Autosomal Dominant Polycystic Kidney Disease. Sci Rep. May 17 2017;7(1):2049. doi:10.1038/s41598-017-01779-0
37. Bevilacqua V, Brunetti A, Cascarano GD, et al. A comparison between two semantic deep learning frameworks for the autosomal dominant polycystic kidney disease segmentation based on magnetic resonance images. BMC Med Inform Decis Mak. Dec 12 2019;19(Suppl 9):244. doi:10.1186/s12911-019-0988-4
38. Shin TY, Kim H, Lee JH, et al. Expert-level segmentation using deep learning for volumetry of polycystic kidney and liver. Investig Clin Urol. Nov 2020;61(6):555-564. doi:10.4111/icu.20200086
39. Jagtap JM, Gregory AV, Homes HL, et al. Automated measurement of total kidney volume from 3D ultrasound images of patients affected by polycystic kidney disease and comparison to MR measurements. Abdom Radiol (NY). Jul 2022;47(7):2408-2419. doi:10.1007/s00261-022-03521-5
40. Li D, Xiao C, Liu Y, et al. Deep Segmentation Networks for Segmenting Kidneys and Detecting Kidney Stones in Unenhanced Abdominal CT Images. Diagnostics (Basel). Jul 23 2022;12(8). doi:10.3390/diagnostics12081788
41. Cui H, Ma Y, Yang M, Lu Y, Zhang M, Fu L, Fu C, Su B, He C, Xue C, Mei C, Song S. Automatic Segmentation of Kidney Volume Using Multi-Module Hybrid Based U-Shape in Polycystic Kidney Disease. IEEE Access. 2023;11:58113-58124. doi:10.1109/ACCESS.2023.3284029
42. Taylor J, Thomas R, Metherall P, et al. An Artificial Intelligence Generated Automated Algorithm to Measure Total Kidney Volume in ADPKD. Kidney Int Rep. Feb 2024;9(2):249-256. doi:10.1016/j.ekir.2023.10.029
43. Hsu JL, Singaravelan A, Lai CY, et al. Applying a Deep Learning Model for Total Kidney Volume Measurement in Autosomal Dominant Polycystic Kidney Disease. Bioengineering (Basel). Sep 26 2024;11(10). doi:10.3390/bioengineering11100963
44. Krishnan C, Schmidt E, Onuoha E, Mrug M, Cardenas CE, Kim H. nnUNet for Automatic Kidney and Cyst Segmentation in Autosomal Dominant Polycystic Kidney Disease. Curr Med Imaging. 2024;20:e15734056272767. doi:10.2174/0115734056272767231130110017
45. Sore R, Cathier P, Vlachomitrou AS, et al. Deep learning-based segmentation of kidneys and renal cysts on T2-weighted MRI from patients with autosomal dominant polycystic kidney disease. Eur Radiol Exp. Oct 30 2024;8(1):122. doi:10.1186/s41747-024-00520-7
46. Sheng TW, Onthoni DD, Gupta P, Lee TH, Sahoo PK. Segmentation of ADPKD Computed Tomography Images with Deep Learning Approach for Predicting Total Kidney Volume. Biomedicines. Jan 22 2025;13(2). doi:10.3390/biomedicines13020263

Table 1. Demographic, physical and imaging characteristics of the participants included in the training, internal and external validation datasets. For participants with more than one MRI examination, all measurements are based on the values at the time of the baseline scan. For normally distributed data mean ± standard deviation is reported; otherwise, median and (interquartile range) are reported.
Parameter | Training Data**: Kidneys & Liver | Training Data**: Cysts | Internal Validation: Kidneys & Liver | Internal Validation: Cysts | External Validation: Single-center | External Validation: Multi-center | External Validation: CRISP 2 | External Validation: PKD-RRC
Number | 720 | 395 | 70 | 46 | 58 | 73 | 30 | 115
Age (years) | 48 ± 15 | 55 ± 16 | 53 ± 14 | 39 ± 6.7 | 44 ± 11 | 37 ± 8.1 | 38 ± 11 | 22 ± 7
Gender (F:M) | 374:346 | 158:117 | 38:32 | 20:0 | 29:29 | 39:34 | 22:8 | 68:47
Height (m) | 1.71 ± 0.1 | 1.69 ± 0.11 | 1.71 ± 0.12 | 1.65 ± 0.08 | 1.73 ± 0.09 | 1.70 ± 0.1 | 1.68 ± 0.09 | 1.68 ± 0.15
Weight (kg) | 77 ± 18 | 77 ± 21 | 75 ± 17 | 70 ± 12 | 71 ± 15 | 73 ± 14 | 72 ± 17 | 70 ± 23
Body mass index (kg/m²) | 26 ± 5.3 | 27 ± 6.1 | 26 ± 4.2 | 26 ± 3.9 | 24 ± 3.8 | 25 ± 3.7 | 26 ± 6.3 | 24 ± 5.8
GFR* (ml/min/1.73m²) | 74 ± 32 | 70 ± 31 | 60 ± 28 | 87 ± 23 | | 83 ± 27* | 76 ± 38* | 119 ± 26
ht-TKV* (mL/m) | 826 ± 772 | 910 ± 904 | 1176 ± 900 | 703 ± 397 | 894 ± 524 | 1097 ± 674 | 703 ± 409 | 432 ± 297
ht-kidney cyst volume* (mL/m) | N/A | 81 (13:512) | N/A | 369 ± 361 | 624 ± 436 | 733 ± 552 | 476 ± 386 | N/A

* Glomerular Filtration Rate (GFR) is measured by iothalamate clearance for the CRISP 2 dataset and by iohexol clearance for the multi-center dataset and estimated by the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) function for the other datasets. ht-TKV stands for height adjusted total kidney volume; ht-kidney cyst volume stands for height adjusted kidney cyst volume.
** Inside training data for kidneys and liver, 2 participants’ demographic data, 2 participants’ heights, 1 participant’s weight, and 7 participants’ total kidney volumes (TKV) were not available. Inside training data for cysts, 120 participants were anonymized without any demographic data and 1 participant’s height was not available.

Table 2a. TraceOrg internal validation for kidney and liver segmentations for individual pulse sequences. Values are averaged over all sequences.
The evaluation metrics used were the Dice similarity coefficient, Average Symmetric Surface Distance (ASSD), Hausdorff Distance, and Jaccard Index.

Organ | Metric | Axial T2 | Axial T1 | Axial SSFP | Coronal T2 | Coronal SSFP | Average of all sequences
Kidney | Dice | 0.98 ± 0.02 | 0.97 ± 0.02 | 0.98 ± 0.02 | 0.97 ± 0.02 | 0.97 ± 0.02 | 0.97 ± 0.02
Kidney | ASSD [mm] | 0.77 ± 0.67 | 1.1 ± 2.0 | 0.79 ± 0.68 | 0.97 ± 1.2 | 0.85 ± 0.86 | 0.91 ± 1.2
Kidney | Hausdorff [mm] | 28 ± 18 | 28 ± 21 | 26 ± 16 | 28 ± 22 | 27 ± 19 | 28 ± 20
Kidney | Jaccard | 0.95 ± 0.04 | 0.95 ± 0.04 | 0.95 ± 0.04 | 0.94 ± 0.04 | 0.95 ± 0.04 | 0.95 ± 0.04
Liver | Dice | 0.98 ± 0.01 | 0.98 ± 0.01 | 0.98 ± 0.01 | 0.97 ± 0.02 | 0.96 ± 0.02 | 0.97 ± 0.02
Liver | ASSD [mm] | 0.55 ± 0.34 | 0.81 ± 0.51 | 0.78 ± 0.39 | 0.78 ± 0.42 | 1.2 ± 0.73 | 0.81 ± 0.53
Liver | Hausdorff [mm] | 22 ± 10 | 24 ± 10 | 26 ± 14 | 25 ± 22 | 34 ± 37 | 26 ± 21
Liver | Jaccard | 0.96 ± 0.02 | 0.96 ± 0.02 | 0.95 ± 0.02 | 0.95 ± 0.03 | 0.93 ± 0.03 | 0.95 ± 0.03

Table 2b. TraceOrg internal validation for kidney cyst and liver cyst segmentations. Model segmentations on T2-weighted MRI pulse sequences are compared to manual corrections performed independently by 2 observers calculating the Dice similarity coefficient, Average Symmetric Surface Distance (ASSD), Hausdorff Distance, and Jaccard Index. Also shown is the Inter-observers mean pairwise average showing that the differences between each observer and the model are similar to the difference between observers.
Structure | Metric | Model compared to Observer 1 | Model compared to Observer 2 | Model compared to Average of all observers | Inter-observers Mean Pairwise average
Kidney Cysts | Dice | 0.94 ± 0.05 | 0.93 ± 0.06 | 0.93 ± 0.06 | 0.94 ± 0.07
Kidney Cysts | ASSD [mm] | 0.44 ± 0.35 | 0.52 ± 0.39 | 0.48 ± 0.37 | 0.50 ± 0.48
Kidney Cysts | Hausdorff [mm] | 16 ± 7.4 | 19 ± 13 | 17 ± 11 | 18 ± 15
Kidney Cysts | Jaccard | 0.88 ± 0.09 | 0.87 ± 0.10 | 0.88 ± 0.10 | 0.89 ± 0.11
Liver Cysts* | Dice | 0.86 ± 0.24 | 0.79 ± 0.27 | 0.82 ± 0.26 | 0.86 ± 0.19
Liver Cysts* | ASSD [mm] | 2.4 ± 7.4 | 3.0 ± 6.2 | 2.7 ± 6.8 | 2.6 ± 6.0
Liver Cysts* | Hausdorff [mm] | 31 ± 32 | 43 ± 38 | 37 ± 35 | 37 ± 37
Liver Cysts* | Jaccard | 0.80 ± 0.26 | 0.72 ± 0.30 | 0.76 ± 0.28 | 0.79 ± 0.24

Table 3. TraceOrg validation on the external single-center and multicenter datasets. Dice similarity coefficient, Average Symmetric Surface Distance (ASSD), Hausdorff Distance, and Jaccard Index were calculated.

Single Center (n=58), kidney and liver:
Metric | Kidney: Coronal T2-w | Kidney: Coronal T1-w | Kidney: Steady State Free Precession | Liver: Coronal T2-w | Liver: Coronal T1-w | Liver: Steady State Free Precession
Dice | 0.93 ± 0.03 | 0.94 ± 0.02 | 0.92 ± 0.04 | 0.87 ± 0.13 | 0.96 ± 0.01 | 0.92 ± 0.04
ASSD [mm] | 1.7 ± 0.67 | 1.3 ± 0.35 | 1.7 ± 0.80 | 3.8 ± 1.6 | 1.2 ± 0.21 | 2.1 ± 1.2
Hausdorff [mm] | 21 ± 10 | 19 ± 6.6 | 20 ± 7 | 31 ± 11 | 23 ± 8.2 | 25 ± 7.6
Jaccard | 0.88 ± 0.05 | 0.89 ± 0.03 | 0.86 ± 0.06 | 0.78 ± 0.13 | 0.92 ± 0.01 | 0.86 ± 0.07

Single Center (n=58) cysts and Multi Center (n=73):
Metric | Single Center, Kidney Cysts: Coronal T2-w (n=58) | Single Center, Liver Cysts*: Coronal T2-w, Vcyst > 50 mL, Ncyst > 20 (n=22) | Single Center, Liver Cysts*: Coronal T2-w, Vcyst < 50 mL, Ncyst > 20 (n=20) | Single Center, Liver Cysts*: Coronal T2-w, all cases (n=51) | Multi Center, Kidney: Coronal T2-w (n=73) | Multi Center, Kidney Cysts: Coronal T2-w (n=73)
Dice | 0.85 ± 0.06 | 0.90 ± 0.04 | 0.76 ± 0.19 | 0.79 ± 0.17 | 0.95 ± 0.02 | 0.81 ± 0.08
ASSD [mm] | 1.3 ± 0.5 | 0.75 ± 0.31 | 2.2 ± 1.2 | 3.0 ± 6.0 | 0.95 ± 0.45 | 1.5 ± 0.48
Hausdorff [mm] | 26 ± 13 | 35 ± 18 | 49 ± 30 | 48 ± 31 | 20 ± 11 | 23 ± 9
Jaccard | 0.74 ± 0.09 | 0.83 ± 0.07 | 0.62 ± 0.17 | 0.68 ± 0.19 | 0.91 ± 0.03 | 0.68 ± 0.10

* For liver cysts performance, Vcyst stands for total liver cysts volume.
Ncyst stands for number of cysts. Validation focused on polycystic liver disease cases with Ncyst > 20 (reference 2). Results for all cases including cases with Ncyst < 20 are reported for reference.

Table 4. TraceOrg external validation on CRISP 2 (n=30) and PKD-RRC (University of Kansas, n=115).

Dataset | Measure | Total Kidney Volume: Coronal T1 | Total Kidney Volume: Coronal T2-fatsat | Total Kidney Volume: Mean of All Sequences* | Kidney Cysts: Axial/Coronal T2-w | Liver: Coronal T1-w/T2-w
CRISP (Kidney = 30, Liver = 22*) | TraceOrg Volume | 1260 ± 695 | 1374 ± 781 | 1324 ± 739 | 815 ± 674 | 1539 ± 175
CRISP (Kidney = 30, Liver = 22*) | CRISP 2 Stereology Volume | 1272 ± 736 | see Coronal T1 | see Coronal T1 | 806 ± 671 | 1530 ± 227
CRISP (Kidney = 30, Liver = 22*) | Mean absolute % difference | 5.3 ± 3.6 | 8.7** ± 4.9 | 6.0*** ± 4.4 | 11 ± 11 | 5.5 ± 5.1
PKD-RRC (115 participants with 225 scans at age >18 years and 56 scans at age <18 years) | TraceOrg Volume | 891 ± 672 | 936 ± 771 | 911 ± 747 | |
PKD-RRC (115 participants with 225 scans at age >18 years and 56 scans at age <18 years) | PKD-RRC Volume | see T2 fatsat | 929 ± 769 | 925 ± 767 | |
PKD-RRC (115 participants with 225 scans at age >18 years and 56 scans at age <18 years) | Mean absolute % difference | 11 ± 11 | 6.1 ± 7.7 | 4.9 ± 4.9 | |

* Among the 30 participants, 22 had complete sequences for liver validation.
** This mean absolute percent difference is calculated between the TraceOrg measurements from Coronal T2 fat saturated images and the CRISP2 stereology volume measurements made from Coronal T1 images.
*** The CRISP mean of all sequences included more poor-quality sequences, which is why the difference is larger.

Table 5a. Performance comparison between TraceOrg and existing models for the segmentation of left kidney, right kidney, and liver with mean Dice and Hausdorff distance calculated as the mean of comparisons with ground truth manual tracing on the internal and external validation datasets for kidney and liver.
Model | Left Kidney: Dice | Left Kidney: Hausdorff [mm] | Right Kidney: Dice | Right Kidney: Hausdorff [mm] | Total Kidney: Dice | Total Kidney: Hausdorff [mm] | Liver: Dice | Liver: Hausdorff [mm]
Internal Validation
TraceOrg | 0.97 ± 0.02 | 24 ± 15 | 0.94 ± 0.15 | 24 ± 19 | 0.97 ± 0.02 | 28 ± 20 | 0.97 ± 0.02 | 26 ± 21
TotalSegmentator 33 | 0.21 ± 0.20 | 68 ± 45 | 0.25 ± 0.23 | 64 ± 47 | 0.22 ± 0.20 | 88 ± 58 | 0.69 ± 0.26 | 65 ± 42
MRAnnotator 34 | 0.18 ± 0.24 | 120 ± 68 | 0.20 ± 0.26 | 100 ± 60 | 0.19 ± 0.23 | 129 ± 60 | 0.69 ± 0.30 | 157 ± 102
Kim 7 * | 0.15 ± 0.17 | 267 ± 54 | 0.31 ± 0.18 | 286 ± 38 | 0.49 ± 0.23 | 218 ± 51 | N/A | N/A
Woznicki 32 * | 0.92 ± 0.13 | 34 ± 43 | 0.92 ± 0.09 | 41 ± 48 | 0.91 ± 0.14 | 54 ± 55 | 0.96 ± 0.03 | 45 ± 69
External Validation
TraceOrg | 0.93 ± 0.04 | 19 ± 20 | 0.94 ± 0.04 | 19 ± 16 | 0.94 ± 0.03 | 20 ± 9 | 0.92 ± 0.08 | 26 ± 10
TotalSegmentator | 0.19 ± 0.20 | 52 ± 32 | 0.28 ± 0.26 | 41 ± 27 | 0.23 ± 0.22 | 64 ± 38 | 0.57 ± 0.30 | 47 ± 29
MRAnnotator 34 | 0.13 ± 0.18 | 117 ± 55 | 0.17 ± 0.23 | 99 ± 59 | 0.14 ± 0.18 | 130 ± 55 | 0.72 ± 0.29 | 91 ± 66
Kim 7 * | 0.30 ± 0.22 | 187 ± 53 | 0.53 ± 0.15 | 199 ± 39 | 0.81 ± 0.15 | 134 ± 37 | N/A | N/A
Woznicki 32 * | 0.93 ± 0.08 | 18 ± 16 | 0.93 ± 0.05 | 18 ± 14 | 0.94 ± 0.06 | 21 ± 14 | 0.87 ± 0.11 | 42 ± 40

* Kim and Woznicki models were trained exclusively on T2-w images and did not work on other sequences, so this comparative analysis was restricted to T2-w images to show the most favorable performance for those models.

Table 5b. Performance comparison between TraceOrg and existing models for kidney cyst segmentation on the internal and external validation datasets for cysts. Mean Dice, Hausdorff distance, and Average Symmetric Surface Distance (ASSD) are reported.
Kidney Cysts | Dice | Hausdorff [mm] | ASSD
Internal Validation
TraceOrg | 0.94 ± 0.05 | 17 ± 11 | 0.48 ± 0.37
TotalSegmentator 35 | 0.07 ± 0.07 | 78 ± 18 | 21 ± 9
Gregory-Kline 20 | 0.74 ± 0.14 | 23 ± 20 | 1.97 ± 1.17
External Validation
TraceOrg | 0.82 ± 0.07 | 24 ± 11 | 1.39 ± 0.49
TotalSegmentator 35 | 0.09 ± 0.05 | 69 ± 15 | 20 ± 5
Gregory-Kline 20 | 0.76 ± 0.10 | 23 ± 9 | 1.44 ± 0.34

* Gregory-Kline model could only run coronal T2-weighted images and a kidney mask was a required input.

Table 6. Summary of Deep Learning models for ADPKD kidney segmentations. Note that TraceOrg has the largest number of patients and scans used for training data, the most modalities used for training, works on the most targets (kidney, liver, cysts), uses a combined 3D U-Net and transformer architecture, provides external validation showing high Dice and is publicly available.

Model 1st Author | Year | Training Data (Pts / Scans) | Target | Modality | Model Architecture | Validation Type(s)* | Dice Similarity Coeff (Kidney / Cyst) | Public Availability of Model
Kline 17 | 2017 | 2000 | Kidney | MRI-T2 | U-Net | Internal | 0.96 | No
Sharma 36 | 2017 | 125 / 165 | Kidney | CT | CNN | Internal | 0.86 | No
Bevilacqua 37 | 2019 | 18 / 526 | Kidney | MRI-T2 | CNN | Internal | 0.85-0.87 | No
Van Gastel 23 | 2019 | 440 | Kidney, Liver | MRI-T2 | CNN | Internal | 0.96 | No
Shin 38 | 2020 | 175 | Kidney | CT | V-Net | Internal | 0.96 | ?
Gregory-Kline 20 | 2021 | 40 / 40 | Kidney cysts | MRI-T2 | U-Net | Internal | 0.85 | Yes
Goel 15 | 2022 | 129 / 213 | Kidney | MRI-T2 | U-Net | External | 0.98 | Yes (TraceOrg precursor)
Jagtap 39 | 2022 | 22 / 132 | Kidney | Ultrasound | 2D U-Net | Internal | 0.8 | No
Kim 7 | 2022 | 157 / 157 | Kidney + Exophytic Cysts | MRI-Cor T2 | 3D U-Net | Internal | 0.96 / 0.95 (Exophytic) | Yes
Li 40 | 2022 | 260 | K+Stones | CT | Two-Stage** | External | 0.82-0.97 | No
Raj 16 | 2022 | 100 | Kidney, Cysts | MRI-T1 | U-Net | Internal | 0.92 | No
Sharbatdaran 24 | 2022 | 151 | Kidney, Liver, Spleen | MRI-T2 | 2D U-Net | External | 0.96-0.98 | On Request
Woznicki 32 | 2022 | 327 / 992 | Kidney, Liver | MRI-T2 | nn-U-net | External | 0.92-0.97 | Yes
Cui 41 | 2023 | 355 | Kidney | MRI-T1 | HU-Net | Internal | 0.92 | No
Taylor 42 | 2024 | 227 | Kidney | MRI | nnU-Net | External | 0.96 | No
He 19 | 2024 | 413 | Kidney, Liver, Spleen | MRI, CT | nnU-Net | External, Test-retest | 0.98; absolute % difference = 1.3 | Yes (TraceOrg precursor)
Hsu 43 | 2024 | 40 | Kidney | MRI-T2 | 3D U-Net | Internal | 0.82-0.89 | On Request (Data)
Krishnan 44 | 2024 | 604 | Kidney, Cysts | MRI | U-Net | Internal | 0.92-0.95 / 0.83-0.89 | No
Sore 45 | 2024 | 160 | Kidney, Cysts | MRI-T2 | U-Net | Internal | 0.93-0.94 / 0.86-0.87 | On Request (Data)
Sheng 46 | 2025 | 97 / 160 | Kidney | CT | DeepLabV3+ | Internal | 0.96 | No
TraceOrg | 2025 | 720 / 5052 | Kidney, liver, cysts | CT, MRI (T1, SSFP, T2) | 3D U-Net and Transformer | External | 0.92-0.95 / 0.82 | Yes

* External validation implies internal validation was also performed.
** Two-stage dependent segmentation framework using interchangeable models (e.g., 3D U-Net, Res U-Net, DeepLabV3+, UNETR)

Figure Legends

Figure 1. Participant Flow Chart showing the number of CT scans with (C+) and without (C-) contrast enhancement and number of each MRI sequence included in the training, internal validation, external validation data sets as well as from CRISP and PKD-RRC.
Publicly available models compared to TraceOrg for segmentation performance (lower right) are color coded to indicate the MR sequences each comparator model was able to run.

Figure 2. TraceOrg deep learning model architecture. A stack of CT or MRI input images, too large for the memory to handle, enters the model in two ways. First (upper pathway), the images are downsampled to a lower resolution that fits within the computer memory to learn the global context. Second (lower pathway), the images are cropped down to small 3D patches that fit within computer memory. After processing by the global and patch encoders respectively, the processed images are combined in the transformer, then decoded to restore the full resolution. After output, an organ completion quality assurance check ensures the entire organ is within the field of view; the FAIL in this case results from the kidney lower poles being off the bottom edge of the images.

Figure 3. TraceOrg calculator website interface. The top two figures are the landing page (left) and the upload page (right). The bottom two figures are an example of the TraceOrg output with selected tile segmentation snapshots of each series, 25%, 50% and 75% of the way through the stack of images (left) and volume report for organ segmentations (top right), cyst segmentations (middle right) and TKV on the Mayo Clinic Image Classification plot (bottom right).

Figure 4. Examples of external validation results showing TraceOrg model performance compared to human labeled ground truth.
The top right panel shows the different kidney annotation protocols used in the training dataset and the external testing datasets.

Figure 5. Examples of comparisons between volumes resulting from TraceOrg segmentations (left kidney in green, right kidney in red, liver in yellow, and kidney cysts in pink) and volumes computed by stereology in the CRISP 2 Study 29. The top row shows three CRISP 2 participants for whom the TraceOrg measurements agree well with the CRISP stereology volume measurements. The bottom row shows three examples with excellent TraceOrg annotations but some disagreement with the stereology volumes, raising the possibility that the discrepancies may be due to limitations of the stereology reference.

Figure 6. The TraceOrg model (top left) shows excellent segmentation performance for liver (yellow), left kidney (green) and right kidney (red), as well as cysts (pink, bottom left), compared to the poorer performance of TotalSegmentator 35 (top middle and bottom middle), MRAnnotator 34 (top right), Kim 7 (second row, middle images) and the Gregory-Kline model 20 (bottom right). For the Kim model, the exophytic cysts (blue in the second row right image, the model output) are remapped as kidneys (second row middle image) to calculate TKV. In this case, TraceOrg achieved Dice = 0.99 (kidney), 0.99 (liver) and 0.87 (kidney cyst), surpassing all other models, although Woznicki (middle left) performed nearly as well for liver and kidneys.
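For readers less familiar with the quantities named in these legends, the following is a minimal sketch, not the TraceOrg implementation. It illustrates the Dice score reported in the table and in Figure 6, and one possible form of the organ completion check mentioned in Figure 2. The label value, the function names `dice_score` and `touches_stack_end`, the toy volumes, and the rule that an organ reaching the first or last slice may be cut off are all assumptions made for illustration.

```python
from typing import Sequence

# A label volume indexed as [slice][row][column]; 0 is background.
Volume = Sequence[Sequence[Sequence[int]]]

KIDNEY = 1  # hypothetical label value, chosen only for this example


def dice_score(pred: Volume, truth: Volume, label: int) -> float:
    """Dice = 2|A and B| / (|A| + |B|) over voxels carrying `label`."""
    overlap = pred_count = truth_count = 0
    for pred_slice, truth_slice in zip(pred, truth):
        for pred_row, truth_row in zip(pred_slice, truth_slice):
            for p, t in zip(pred_row, truth_row):
                in_pred, in_truth = p == label, t == label
                pred_count += in_pred
                truth_count += in_truth
                overlap += in_pred and in_truth
    total = pred_count + truth_count
    # Convention chosen here: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


def touches_stack_end(mask: Volume, label: int) -> bool:
    """True if `label` appears on the first or last slice of the stack.

    Assumed heuristic: an organ that reaches the edge of the image stack
    may extend beyond the field of view, so its volume may be incomplete.
    """
    if not mask:
        return False
    end_slices = (mask[0], mask[-1])
    return any(v == label for s in end_slices for row in s for v in row)


# Toy 3-slice, 2x2 volumes: the prediction misses one voxel on the last slice.
truth = [[[0, 0], [0, 0]], [[1, 1], [0, 0]], [[1, 0], [0, 0]]]
pred = [[[0, 0], [0, 0]], [[1, 1], [0, 0]], [[0, 0], [0, 0]]]

print(dice_score(pred, truth, KIDNEY))
print(touches_stack_end(truth, KIDNEY))
```

In this toy case the reference kidney reaches the last slice, so the assumed check would flag it as possibly incomplete, analogous to the FAIL example in Figure 2 where the kidney lower poles fall off the bottom edge of the images.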