Fig. 3 plots, for each 3D OAR separately, the extent of the corrections made to the automated delineations against the IOV of the manual delineations, both quantified by ASSD. It shows that the accuracy of the automated delineation tool was better than the manual delineation variability for most OARs (grey shaded area; RO1: 93% of OARs; RO2: 72%). Two sets of outliers can be observed in Fig. 3. The first set, in the top left corner, involves a few automated delineations that needed relatively large corrections (4 to 6 mm on average) for individual OARs, namely one left-sided submandibular gland (1), one left-sided parotid gland (2), one inferior PCM (3) and one supraglottic larynx (4). The second set, in the bottom right corner of Fig. 3, involves the spinal cord, which shows a large IOV between the manual delineations and large differences in the corrections made by RO2. Inspection of the contours revealed that the ROs used different cranial and caudal borders.
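The grey shaded area in Fig. 3 can be read as the region where the correction needed for an OAR (ASSD between automated and corrected contour) is smaller than the manual IOV for that OAR. The sketch below shows how such a percentage could be tallied. It assumes that aligned per-OAR ASSD values are already available and that a strict "less than" comparison defines the shaded region. The text does not state the exact criterion, and the function name is hypothetical.

```python
from typing import Sequence


def fraction_within_iov(correction_assd: Sequence[float],
                        manual_iov_assd: Sequence[float]) -> float:
    """Share of OARs whose correction ASSD lies below the manual-delineation IOV ASSD.

    Hypothetical helper: both sequences hold per-OAR values in mm and must be
    aligned (same OAR order). A strict '<' is an assumed definition of the
    shaded region, not one confirmed by the study.
    """
    if len(correction_assd) != len(manual_iov_assd) or not correction_assd:
        raise ValueError("need two non-empty, equally long per-OAR sequences")
    below = sum(c < v for c, v in zip(correction_assd, manual_iov_assd))
    return below / len(correction_assd)
```

Applied separately to each observer's corrections, this yields one percentage per RO, matching the form in which the RO1 and RO2 figures are reported.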
Table 3 reports intra- and inter-observer variability (IOV), as assessed by DSC and ASSD. IOV for the corrected delineations is significantly lower than IOV for the manual delineations for almost all OARs. The left cochlea did not show a significant reduction in IOV with DSC, and neither cochlea showed an improvement with ASSD. The oral cavity and brainstem showed no significant improvement in DSC, but their ASSD decreased significantly. For most OARs, IOV for the corrected delineations is smaller than the intra-observer variability.
Evaluation of the accuracy of CNN-based automated OAR delineation as perceived by each observer (Acc1, Acc2). The automated delineation is compared with the corrected delineation made by each observer (RO1, RO2) for each 3D OAR separately by computing their DSC and ASSD. Volumes of the manual, automated and corrected OAR delineations are also reported. All values are reported as mean ± STD over all patients (n = 15); manual and corrected volumes are given for both observers.
Abbreviations: OARs: organs at risk; DSC: Dice similarity coefficient; ASSD: average symmetric surface distance; STD: standard deviation; RO: radiation oncologist; PCM: pharyngeal constrictor muscles; PG: parotid gland; SG: submandibular gland; U: upper; S: supra.
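The two agreement measures used throughout can be made concrete. DSC is 2|A∩B| / (|A| + |B|) for two binary masks A and B. ASSD averages, over the surface points of both contours, the distance from each point to the nearest surface point of the other contour. The sketch below assumes binary 3D masks on a common voxel grid with known voxel spacing in mm, and it takes boundary voxels (6-connectivity) as the surface. ASSD is computed by brute force over all surface-point pairs, which is only practical for small masks. The study does not describe its own implementation or surface definition, so these choices are assumptions.

```python
import numpy as np


def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) of two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    # Convention (assumed): two empty masks count as perfect agreement.
    return float(2.0 * np.logical_and(a, b).sum() / denom) if denom else 1.0


def surface_points(mask: np.ndarray, spacing) -> np.ndarray:
    """Physical coordinates (mm) of boundary voxels of a 3D mask (6-connectivity)."""
    m = np.pad(mask.astype(bool), 1)  # background border so edge voxels count as boundary
    interior = m.copy()
    for axis in range(3):
        # A voxel is interior only if both neighbours along every axis are inside.
        interior &= np.roll(m, 1, axis) & np.roll(m, -1, axis)
    boundary = (m & ~interior)[1:-1, 1:-1, 1:-1]
    return np.argwhere(boundary) * np.asarray(spacing, dtype=float)


def assd(a: np.ndarray, b: np.ndarray, spacing=(1.0, 1.0, 1.0)) -> float:
    """Average symmetric surface distance in mm (brute force, small masks only)."""
    pa, pb = surface_points(a, spacing), surface_points(b, spacing)
    if len(pa) == 0 or len(pb) == 0:
        raise ValueError("both masks must be non-empty")
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)  # pairwise distances
    # Nearest-neighbour distances in both directions, averaged over all surface points.
    return float((d.min(axis=1).sum() + d.min(axis=0).sum()) / (len(pa) + len(pb)))
```

For clinical CT volumes, a distance-transform approach (for example SciPy's `ndimage.distance_transform_edt` with its `sampling` argument) would normally replace the pairwise distance matrix.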
The computation time needed by the CNN to automatically delineate all 16 OARs was 3 minutes per image. For both ROs, the time needed to review and correct the automated delineations was significantly shorter than the time needed for manual delineation (RO1: 17 vs 30 minutes; RO2: 27 vs 38 minutes; mean: 23 vs 34 minutes; p < 10⁻⁶). Manual delineation time per patient ranged from 22 to 44 minutes, correction time from 13 to 33 minutes, and time gain from 6 to 19 minutes (33% on average). With an average correction time of 23 minutes for 16 OARs, less than 1.5 minutes was spent on each OAR on average.
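As a quick check of the timing arithmetic, the sketch below divides the mean correction time by the number of OARs and computes the relative time gain from the group means. Spreading the time evenly over the OARs is a simplifying assumption, since per-OAR times are not reported. The gain computed from the means (about 32%) need not equal the reported 33% average, which may instead be an average of per-patient gains. The function names are hypothetical.

```python
def per_oar_minutes(total_minutes: float, n_oars: int) -> float:
    """Average time per OAR, assuming time is spread evenly over the OARs."""
    return total_minutes / n_oars


def relative_gain(manual_minutes: float, corrected_minutes: float) -> float:
    """Fraction of manual delineation time saved by correcting automated contours."""
    return (manual_minutes - corrected_minutes) / manual_minutes


# Mean times reported in the text (minutes; 16 OARs per patient).
N_OARS = 16
mean_manual, mean_corrected = 34.0, 23.0

print(f"{per_oar_minutes(mean_corrected, N_OARS):.2f} min per OAR")  # 1.44 min per OAR
print(f"{relative_gain(mean_manual, mean_corrected):.0%} gain from the means")  # 32% gain from the means
```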
The aim of this study was to evaluate the clinical benefits of an automated delineation tool in terms of delineation accuracy, efficiency and reduction of IOV. Automated delineation was performed by a CNN, which was trained on manual expert delineations of 16 OARs of HNC patients made in agreement with the international consensus guidelines of Brouwer et al. and Christianen et al.
The tool increased delineation efficiency, as shown by the reduction in the time needed to review and correct the automated delineations compared with the time needed for manual delineation. Although one of the ROs was faster at both manual delineation and correction of the automated delineations, both ROs delineated faster when starting from the automated delineations.
The accuracy of the CNN was examined by comparing the automated delineations with the corrected delineations. The corrections necessary for clinical acceptance were smaller than those reported for atlas-based models. Scatter due to dental fillings did not affect the accuracy, as illustrated in Fig. 2, likely because such artefacts were also present in some of the images in the training set. Fig. 3 shows that for most OARs, the differences between the automated and the corrected delineations were smaller than the inter-observer variability obtained with manual delineation, i.e. the delineation variability typically observed in clinical practice. There were four outliers in which the CNN had difficulty with adenopathies or primary tumours near OARs. These are in the top left corner of Fig. 3, namely one left-sided submandibular gland (1), one left-sided parotid gland (2), one inferior PCM (3) and one supraglottic larynx (4). For outliers (1) and (2), the cause was nearby enlarged lymph nodes, which were mistakenly included in the same volume as the OARs. For outlier (3), the patient had a low supraglottic larynx tumour originating in the left aryepiglottic fold. Outlier (4) was a patient with an oropharyngeal tumour invading the base of the tongue, which made delineation of the cranial part of the supraglottic larynx challenging.