III. METHODOLOGY

A. CycleGAN

In this research, we aim to translate CT images into ultrasound images. Conventionally, one might train a neural network that takes a CT image as input and outputs its corresponding ultrasound image, then compute the similarity between the synthesized and the actual ultrasound image to update the network. However, we face a challenge: our datasets, one containing abdominal CT volumes [16] and the other comprising abdominal ultrasound images [14], are unpaired. Given this situation, we employ CycleGAN, which is designed for image-to-image translation tasks where paired images are unavailable. The architecture of CycleGAN includes four key components: two generator networks and two discriminator networks. The generators translate images from one domain (e.g., CT) to the other (e.g., ultrasound) and vice versa. Each generator has a corresponding discriminator that aims to distinguish real images of the target domain from fake images created by the generator.

A distinctive feature of CycleGAN is the incorporation of a cycle consistency loss. The design rests on the intuition that, for instance, translating a sentence from English to French and then back to English should ideally return the original sentence; CycleGAN applies the same idea to image-to-image translation. Mathematically, if G : X → Y is a translator from domain X to domain Y, and F : Y → X is its counterpart, then G and F should be inverses of each other, with both mappings being bijections. To enforce this structure, the mappings G and F are trained concurrently with a cycle consistency loss [17], which ensures that F(G(x)) ≈ x and G(F(y)) ≈ y, promoting fidelity in the translation between the two domains. Directly applying CycleGAN to our task yields the pipeline shown in Fig. 2.
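As a minimal illustration of this cycle structure in PyTorch (the tiny stand-in generators, image sizes, and placeholder data below are assumptions for the sketch, not the authors' architecture):

```python
import torch
import torch.nn as nn

# Placeholder generators; the real G and F are deeper encoder-decoder networks.
def tiny_generator():
    return nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 1, 3, padding=1), nn.Tanh())

G = tiny_generator()   # G : X (CT) -> Y (ultrasound)
F = tiny_generator()   # F : Y (ultrasound) -> X (CT)
l1 = nn.L1Loss()

real_ct = torch.randn(4, 1, 64, 64)   # unpaired CT slices (placeholder data)
real_us = torch.randn(4, 1, 64, 64)   # unpaired ultrasound images (placeholder data)

# Forward and backward cycles: F(G(x)) should recover x, and G(F(y)) should recover y.
fake_us = G(real_ct)
fake_ct = F(real_us)
cycle_loss = l1(F(fake_us), real_ct) + l1(G(fake_ct), real_us)
```

During training, this cycle term is combined with the adversarial losses provided by the two discriminators, as described below.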

B. Proposed semantic segmentation enhanced S-CycleGAN

After training and testing an original CycleGAN model, we observed that while the overall style (color and texture) of the CT images was effectively transformed to match the ultrasound style, the anatomical details in the generated ultrasound images were hard to distinguish. This difficulty stems from the fact that in traditional image translation tasks, images from both domains are treated as samples from the joint distribution over all relevant sub-classes (such as different organs), and the translation is essentially a mapping between these distributions. Even with cyclical mappings, there is no guarantee that the marginal distributions of these sub-classes (or modes) are properly matched (e.g., ‘liver’ correctly translating to ‘liver’).

To maintain pixel-level semantic accuracy while converting image-level style (color and texture distribution), we incorporated two additional segmentation networks as semantic discriminators. The fake images produced by the generator are fed to these segmentation networks to produce a semantic mask, and a segmentation loss is then computed between this mask and the label of the real source image. Moreover, unlike other similar studies [18] that use the image alone as input, our network architecture takes both the image and its corresponding semantic map as inputs. This dual-input approach equips the generator with a more refined understanding of per-pixel semantic information. We therefore propose our S-CycleGAN (Fig. 4), which includes the following components:

4) Adversarial Loss:
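The adversarial objective is assumed here to follow the standard CycleGAN formulation [17]; for the mapping G : X → Y and its discriminator D_Y it can be written as

```latex
\mathcal{L}_{GAN}(G, D_Y, X, Y) =
  \mathbb{E}_{y \sim p_{\text{data}}(y)}\big[\log D_Y(y)\big]
+ \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log\big(1 - D_Y(G(x))\big)\big],
```

with an analogous term L_GAN(F, D_X, Y, X) for the reverse mapping F : Y → X and its discriminator D_X.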

5) Cycle Consistency Loss:

The cycle consistency loss ensures that translating an image to the other domain and back again yields the original image; this constraint is the key to training image-to-image translation models with unpaired image sets. The L1 norm here measures the absolute difference between the original and the reconstructed image.
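Written out, this corresponds to the cycle consistency loss of [17]:

```latex
\mathcal{L}_{cyc}(G, F) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\lVert F(G(x)) - x \rVert_1\big]
+ \mathbb{E}_{y \sim p_{\text{data}}(y)}\big[\lVert G(F(y)) - y \rVert_1\big].
```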

The propagation flow is provided in Algorithm 1.
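Algorithm 1 itself is not reproduced here; the sketch below illustrates, under stated assumptions, how one training iteration could combine the dual-input generators, the segmentation networks acting as semantic discriminators, and the adversarial and cycle consistency losses described above. All networks are tiny placeholders, and the class count and loss weights (LAMBDA_CYC, LAMBDA_SEG) are assumptions rather than the authors' settings.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 9                       # assumed number of semantic classes (organs + background)
LAMBDA_CYC, LAMBDA_SEG = 10.0, 1.0    # assumed loss weights

def conv_net(in_ch, out_ch, final=None):
    """Tiny stand-in for the real encoder-decoder / PatchGAN architectures."""
    layers = [nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
              nn.Conv2d(16, out_ch, 3, padding=1)]
    if final:
        layers.append(final)
    return nn.Sequential(*layers)

# Dual-input generators: image + one-hot semantic map -> translated image.
G = conv_net(1 + NUM_CLASSES, 1, nn.Tanh())    # CT -> ultrasound
F = conv_net(1 + NUM_CLASSES, 1, nn.Tanh())    # ultrasound -> CT
D_X, D_Y = conv_net(1, 1), conv_net(1, 1)      # image-level discriminators
S_X, S_Y = conv_net(1, NUM_CLASSES), conv_net(1, NUM_CLASSES)  # semantic discriminators

mse, l1, ce = nn.MSELoss(), nn.L1Loss(), nn.CrossEntropyLoss()
opt_g = torch.optim.Adam(list(G.parameters()) + list(F.parameters()), lr=2e-4)

def one_hot(label):
    return nn.functional.one_hot(label, NUM_CLASSES).permute(0, 3, 1, 2).float()

# One unpaired batch: a CT slice with its label and an ultrasound image with its label.
ct, ct_lab = torch.randn(2, 1, 64, 64), torch.randint(0, NUM_CLASSES, (2, 64, 64))
us, us_lab = torch.randn(2, 1, 64, 64), torch.randint(0, NUM_CLASSES, (2, 64, 64))

# Generator update (discriminator and segmentation-network updates are analogous
# and omitted; requires_grad toggling is also omitted for brevity).
fake_us = G(torch.cat([ct, one_hot(ct_lab)], dim=1))
fake_ct = F(torch.cat([us, one_hot(us_lab)], dim=1))

adv = mse(D_Y(fake_us), torch.ones_like(D_Y(fake_us))) + \
      mse(D_X(fake_ct), torch.ones_like(D_X(fake_ct)))
cyc = l1(F(torch.cat([fake_us, one_hot(ct_lab)], dim=1)), ct) + \
      l1(G(torch.cat([fake_ct, one_hot(us_lab)], dim=1)), us)
seg = ce(S_Y(fake_us), ct_lab) + ce(S_X(fake_ct), us_lab)   # semantic consistency term

g_loss = adv + LAMBDA_CYC * cyc + LAMBDA_SEG * seg
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```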

Authors:

(1) Yuhan Song, School of Information Science, Japan Advanced Institute of Science and Technology, Nomi, Ishikawa 923-1292, Japan ([email protected]);

(2) Nak Young Chong, School of Information Science, Japan Advanced Institute of Science and Technology, Nomi, Ishikawa 923-1292, Japan ([email protected]).


This paper is available on arxiv under ATTRIBUTION-NONCOMMERCIAL-NODERIVS 4.0 INTERNATIONAL license.