## TECHNICAL NOTE

## Diffusion Model using Quantum Neural Network with Channel Attention

Generative AI has advanced rapidly, and one of its most captivating applications is image generation through diffusion models. In this article, we explore the potential of quantum computing for diffusion models. By replacing classical neural networks with quantum neural networks, we may unlock the ability to generate even larger and more intricate images and videos.

We will start by reviewing the important concepts behind diffusion models, followed by an examination of how quantum neural networks can be integrated into this framework. Finally, we will delve into the concept of channel attention, which empowers quantum neural networks to perform the essential denoising function within diffusion models.

## Diffusion Models

Diffusion models are a popular class of generative AI models known for producing high-quality, varied images. During training, noise is added to existing images step by step until they become pure random noise, and the model learns to reverse this process. To generate a new image, the model starts from random noise and removes the noise step by step, guided by the user's request. For instance, if a user asks for a picture of a "dog," the denoising is guided by the concept of a dog until the resulting image looks like one.
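The noising half of this process can be sketched in a few lines. The example below assumes a standard Gaussian noising rule with a linear noise schedule, which this article does not specify; the function name `forward_noise`, the schedule values, and the toy 8x8 image are all hypothetical choices made for illustration.

```python
import numpy as np

def forward_noise(image, betas, rng):
    """Add Gaussian noise step by step and return every intermediate image.

    Assumed rule (not taken from the article):
        x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps,  eps ~ N(0, 1)
    """
    x = image.astype(float)
    trajectory = [x]
    for beta in betas:
        eps = rng.normal(size=x.shape)
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * eps
        trajectory.append(x)
    return trajectory

rng = np.random.default_rng(0)

# Toy 8x8 image: a bright square on a dark background.
image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0

# Hypothetical linear noise schedule with 50 steps.
betas = np.linspace(1e-4, 0.2, 50)
steps = forward_noise(image, betas, rng)

print(len(steps))       # number of stored images, including the clean one
print(steps[-1].shape)  # the final, heavily noised image keeps the 8x8 shape
```

A trained denoiser would walk this trajectory in the opposite direction, estimating a slightly cleaner image at every step.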

To understand how diffusion models work, it's important to grasp how computers interpret images. In computer vision, images are represented as two-dimensional grids (matrices) of numbers, where each number corresponds to a pixel on a screen. For grayscale images, the value in each cell of the matrix indicates the brightness of that pixel.

For example, a noisy image and a handwritten "zero" image might be represented by computers like this:

Thus, when we ask an AI to generate a picture of a handwritten "zero," the task essentially becomes transforming the random pixel values of the noise into specific pixel values that collectively resemble a handwritten zero.
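As a small illustration of this matrix view, the sketch below builds an 8x8 grid of random brightness values and a hand-drawn ring that loosely resembles a "zero". The brightness range of 0 to 15 and the exact shape of the ring are assumptions chosen for the example; they are not taken from the article's figures.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pure noise: every pixel gets an independent random brightness in 0..15.
noise = rng.integers(0, 16, size=(8, 8))

# A crude "zero": a bright rectangle with a dark hole in the middle.
zero = np.zeros((8, 8), dtype=int)
zero[1:7, 2:6] = 15  # bright block
zero[2:6, 3:5] = 0   # hollow out the center

print(noise)
print(zero)

# Flattened, either image is a vector of 64 pixel values.
print(zero.reshape(-1).shape)
```

Generating a "zero" then amounts to moving the values in `noise` toward a pattern like the one in `zero`.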

As we've seen, an 8x8-pixel image requires a vector of 64 pixel values. Because the number of pixels grows with the square of the image's side length, this becomes challenging for high-definition images such as those with 4K resolution or above. A 4K image (3840x2160) has roughly 8 million pixels, so a vector of that length would need to be processed at once during the denoising diffusion process. The challenge intensifies with video, where 60 to 120 images per second are necessary for modern video production.

This is where the use of quantum computers for diffusion models can be advantageous. If pixel values are encoded in the amplitudes of a quantum state, n qubits can hold 2^n values. Since 2^23 is about 8.4 million, we could potentially process the roughly 8 million elements needed to produce a 4K image using only 23 qubits.
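The qubit count can be checked with a short calculation. The sketch assumes amplitude encoding, where each pixel value is stored in one amplitude of the quantum state, and a single-channel (grayscale) image; the helper name `qubits_for_amplitude_encoding` is hypothetical.

```python
def qubits_for_amplitude_encoding(n_values: int) -> int:
    """Smallest n such that 2**n >= n_values (assumes amplitude encoding)."""
    if n_values < 1:
        raise ValueError("n_values must be positive")
    # (n_values - 1).bit_length() is an exact integer form of ceil(log2(n_values)).
    return max(1, (n_values - 1).bit_length())

print(qubits_for_amplitude_encoding(8 * 8))        # 8x8 image -> 6
print(qubits_for_amplitude_encoding(3840 * 2160))  # 4K image  -> 23
```

The count grows only logarithmically with the number of pixels: doubling the image size adds a single qubit.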

## Quantum Neural Network for Diffusion Model

Neural network models have been highly successful and are the most widely used machine learning approach for denoising diffusion models. Recently, quantum neural networks (QNNs), the quantum analog of classical neural networks, have emerged as a promising alternative (M. Schuld et al., Quantum Inf Process 13, 2567–2586 (2014)). QNNs are parameterized quantum circuits that can learn patterns from datasets, offering advantages such as the ability to manipulate qubit states directly and high robustness across various tasks. A generic QNN circuit designed to generate an 8x8-pixel handwritten "0" or "1" is shown below:

Using this generic model, the output of the diffusion models is shown below:

As illustrated above, the generic QNN model struggles with the seemingly simple task of generating a handwritten "0" or "1." This is because the noise injection process in diffusion models is inherently non-unitary, meaning it cannot be reversed by the standard unitary operations allowed in quantum computing. Due to this constraint, a generic QNN model is unable to effectively perform the denoising task required for diffusion models.
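One way to see the constraint numerically: a unitary operation preserves the overlap between any two states, so it can never map two different noisy inputs onto the same clean image, which is the kind of many-to-one mapping a denoiser needs. The sketch below is an illustrative argument rather than the authors' analysis; the 3-qubit dimension and the random unitary are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 8  # state dimension for 3 qubits

# Random unitary: the Q factor of a complex Gaussian matrix is unitary.
a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
u, _ = np.linalg.qr(a)

def random_state(rng, dim):
    """Random normalized complex vector standing in for a noisy input state."""
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

noisy_a = random_state(rng, dim)
noisy_b = random_state(rng, dim)

# Overlap |<a|b>| before and after applying the same unitary to both states.
overlap_before = abs(np.vdot(noisy_a, noisy_b))
overlap_after = abs(np.vdot(u @ noisy_a, u @ noisy_b))

print(np.isclose(overlap_before, overlap_after))  # unitaries preserve overlap
```

Because the overlap stays below 1 no matter which unitary is applied, the two outputs remain as distinguishable as the inputs were. Merging them into one clean image requires a non-unitary step.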

## Quantum Neural Network with Channel Attention for Diffusion Model

To overcome this limitation, Quemix has introduced the concept of channel attention for QNNs (see Blog008, "Introducing Channel Attention to Quantum Machine Learning"). Channel attention is an effective post-processing method that enables non-unitary operations to be performed by a classical computer after the QNN has completed its calculations. The key idea is to let the QNN handle the computationally demanding tasks and then use simple classical processing to fine-tune the results. This slight "nudge" is sufficient to effectively perform non-unitary operations, allowing QNNs with channel attention to handle both unitary and non-unitary operations, unlike generic QNNs.

Using the QNN with channel attention, the output of the diffusion models is shown below:

As demonstrated, QNNs enhanced with channel attention enable effective image denoising by allowing both unitary and non-unitary operations on the input data. This hybrid approach leverages the quantum computer's ability to efficiently process large amounts of data, while only incurring a small additional computational cost from classical processing at the end that allows the non-unitary operation.

## Channel Attention for QNNs

The proposed channel attention for QNNs is implemented by adding ancilla qubits to the calculation and measuring them. We call these additional ancilla qubits "attention qubits". The measurement results are used to create multiple output state channels, each assigned a different weight reflecting its relative importance. The weighted outputs are summed and then passed through a softmax function, ensuring the final values fall between 0 and 1 and sum to 1, thus representing probabilities. For details on the channel attention mechanism, refer to our publication https://journals.aps.org/pra/abstract/10.1103/PhysRevA.110.012447 or to our previous blog post on the topic https://www.quemix.com/en/notes008.
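The classical post-processing step described above can be sketched as follows. This is a minimal illustration of the description in this section, not the implementation from the publication. The function name `channel_attention`, the assumption that attention qubits are the most significant bits of the measured bitstring, and the example weights are all hypothetical. In practice the weights would be trained together with the circuit parameters.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def channel_attention(joint_probs, n_attention, weights):
    """Combine measurement probabilities into one output distribution.

    joint_probs: probabilities over all (attention, data) bitstrings, with the
                 attention qubits assumed to be the most significant bits.
    n_attention: number of attention (ancilla) qubits.
    weights:     one weight per channel, i.e. per attention-qubit outcome.
    """
    n_channels = 2 ** n_attention
    # Each row is one channel: the data-qubit outcomes for one attention outcome.
    channels = np.asarray(joint_probs).reshape(n_channels, -1)
    # Weighted sum over channels, then softmax to obtain probabilities.
    combined = np.asarray(weights) @ channels
    return softmax(combined)

# Stand-in for measured probabilities: 1 attention qubit and 2 data qubits.
rng = np.random.default_rng(0)
joint = rng.random(2 ** 3)
joint /= joint.sum()

weights = np.array([1.0, -0.5])  # hypothetical channel weights
out = channel_attention(joint, n_attention=1, weights=weights)
print(out, out.sum())
```

The weighted sum and softmax run entirely on a classical computer and are not norm-preserving, which is what lets the combined model go beyond purely unitary processing.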