Self-Attention Generative Adversarial Networks (SAGAN)
Source: CSDN Tech Community | Published: 2021-03-25


Introduction

In my effort to better understand the concept of self-attention, I tried dissecting one of its particular use cases in one of my current deep learning subtopic interests: Generative Adversarial Networks (GANs). As I delved deeply into the Self-Attention GAN (or “SAGAN”) research paper, while following similar implementations in PyTorch and TensorFlow in parallel, I noticed how exhausting it could get to power through the formality and the mathematically intense blocks to arrive at a clear intuition of the paper’s contents. Although I get that formal papers are written that way for precision of language, I do think there’s a need for bite-sized versions that define the prerequisite knowledge needed and also lay down the advantages and disadvantages candidly.


In this article, I am going to try to make a computationally efficient interpretation of the SAGAN without reducing too much of the accuracy for the “hacky” people out there who want to just get started (Wow, so witty).


So, here’s how I’m going to do it:


What do I need to know?
What is it? Who made it?
What does it solve? Advantages and Disadvantages?
Possible further studies?
Source/s

What do I need to know?

Basic Machine Learning and Deep Learning concepts (Dense Layers, Activation Functions, Optimizers, Backpropagation, Normalization, etc.)
Vanilla GAN
Other GANs: Deep Convolutional GAN (DCGAN), Wasserstein GAN (WGAN)
Convolutional Neural Networks: Intuition, Limitations and Relational Inductive Biases (Just think of this as assumptions)
Spectral Norms and the Power Iteration Method
Two Time-Scale Update Rule (TTUR)
Self-Attention

First and foremost, basic concepts are always necessary. Let’s just leave it at that, haha. Moving on, a working understanding of the game mechanics of classical GAN training would be quite handy. In practice, I think most versions of GANs now are trained with convolutional layers and a non-saturating or Wasserstein loss, so learning about DCGANs and WGANs is very useful. Also, understanding that CNNs make a locality assumption is key to the usefulness of self-attention in SAGANs (or in general). For the people who get restless without the proof (a.k.a. math nerds), it would be helpful to check out spectral norms and the power iteration method, an eigenvector approximation algorithm, beforehand. As for TTUR, honestly this is just having two separate learning rates for your generator and discriminator models. Feel free to check out the paper on Attention too, even though I’ll only be lightly going through it.
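Since the power iteration method comes up again later with spectral normalization, here is a minimal sketch of the idea (my own illustration, not code from any of the papers): it approximates the largest singular value of a weight matrix, which is exactly its spectral norm.

```python
import torch
import torch.nn.functional as F

def estimate_spectral_norm(W, n_steps=20):
    # Power iteration: repeatedly push a vector through W and W^T, re-normalizing
    # each time, so it converges toward the leading singular directions of W.
    u = torch.randn(W.size(0))
    for _ in range(n_steps):
        v = F.normalize(W.t() @ u, dim=0)   # right singular vector estimate
        u = F.normalize(W @ v, dim=0)       # left singular vector estimate
    return torch.dot(u, W @ v)              # sigma ~ u^T W v

W = torch.randn(64, 128)
print(estimate_spectral_norm(W), torch.linalg.matrix_norm(W, ord=2))  # should roughly agree
```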


What is it? Who made it?

Essentially, SAGAN is a convolutional GAN that uses a self-attention layer/block in the generator model, applies spectral normalization to both the generator and discriminator, and trains via the two time-scale update rule (TTUR) and the hinge version of the adversarial loss. Everything else is common GAN practice; some of these would be using a tanh function at the end of the generator model, using leaky ReLU in the discriminator, and just generally using Adam as your optimizer. This architecture was created by Han Zhang, Ian Goodfellow, Dimitris Metaxas and Augustus Odena.


If you looked through the prerequisites, this definition would be pretty straightforward.
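For reference, the hinge version of the adversarial loss (the figure below shows the version from the paper; written out in its standard form, with the class label y because the paper trains class-conditional models) is roughly:

$$
L_D = -\,\mathbb{E}_{(x,y)\sim p_{\text{data}}}\!\left[\min\big(0,\,-1 + D(x,y)\big)\right] - \mathbb{E}_{z\sim p_z,\;y\sim p_{\text{data}}}\!\left[\min\big(0,\,-1 - D(G(z),y)\big)\right]
$$

$$
L_G = -\,\mathbb{E}_{z\sim p_z,\;y\sim p_{\text{data}}}\!\left[D(G(z),y)\right]
$$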


[Figure: hinge version of the adversarial loss used in the paper]

What does it solve? Advantages and Disadvantages?

To start, an attention module is something you incorporate into your model so that the output can use all of your input’s information (global access) in a way that is not too computationally expensive. Self-attention is just the specific version wherein your query, key and value vectors all come from the same input. In the figure below, these are the f, g and h functions. Primarily used in NLP, it has found its way to CNNs and GANs because of the locality assumption that CNNs make. Since CNNs and previous convolution-based GANs use a small window to predict the next layer, the complex geometry of certain outputs (e.g. dogs, full-body photos, etc.) is harder to generate compared to pictures of oceans, skies and other backgrounds. I’ve also read that previous GANs had a harder time generating images in multi-class situations, but I need to read up more on that. Now, self-attention makes it possible to have global access to input information, giving the generator the ability to learn from all feature locations.


[Figure: the self-attention module. ⊗ just means matrix multiplication. The first part shows how the previous layer is converted into the query, key and value maps (f, g and h) using 1x1 convolutions.]
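To make the figure concrete, here is a minimal PyTorch sketch of a SAGAN-style self-attention block. This is my own simplification: it uses the commonly seen C/8 channel reduction for the query and key maps, keeps the value map at full width, and skips the extra 1x1 convolution the paper applies to the attention output.

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    # SAGAN-style self-attention block (sketch). f, g and h are the 1x1 convolutions
    # from the figure, producing the query, key and value maps.
    def __init__(self, in_channels):
        super().__init__()
        self.f = nn.Conv2d(in_channels, in_channels // 8, kernel_size=1)  # query
        self.g = nn.Conv2d(in_channels, in_channels // 8, kernel_size=1)  # key
        self.h = nn.Conv2d(in_channels, in_channels, kernel_size=1)       # value
        self.gamma = nn.Parameter(torch.zeros(1))  # learned scale, starts at zero

    def forward(self, x):
        b, c, height, width = x.shape
        n = height * width
        q = self.f(x).view(b, -1, n)   # B x C/8 x N
        k = self.g(x).view(b, -1, n)   # B x C/8 x N
        v = self.h(x).view(b, c, n)    # B x C   x N
        # N x N attention map: how much each spatial location attends to every other one.
        attn = torch.softmax(torch.bmm(q.transpose(1, 2), k), dim=-1)
        out = torch.bmm(v, attn.transpose(1, 2)).view(b, c, height, width)
        # Because gamma starts at zero, the block initially acts as an identity mapping
        # and gradually learns to mix in the attention features.
        return self.gamma * out + x

x = torch.randn(2, 64, 32, 32)
print(SelfAttention(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```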

Another thing about the SAGAN is that it uses spectral normalization on both the generator and the discriminator for better conditioning. What spectral normalization does is allow fewer discriminator updates per generator update by limiting the spectral norm of the weight matrices, which constrains the Lipschitz constant of the network function. That’s a mouthful, but you can just imagine it to be a more powerful normalization technique. Lastly, SAGANs use the two time-scale update rule to address slow-learning discriminators. Typically, the discriminator starts with a higher learning rate to avoid mode collapse.
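In PyTorch, both tricks are essentially one-liners. A rough sketch (the tiny generator and discriminator here are placeholders I made up for illustration; the 1e-4 / 4e-4 learning rates and the (0, 0.9) Adam betas are the settings reported in the paper):

```python
import torch
from torch import nn
from torch.nn.utils import spectral_norm

# Spectral normalization wraps a layer and runs one power-iteration step per forward
# pass to keep that layer's spectral norm (largest singular value) close to 1.
generator = nn.Sequential(
    spectral_norm(nn.ConvTranspose2d(128, 3, kernel_size=4, stride=2, padding=1)),
    nn.Tanh(),
)
discriminator = nn.Sequential(
    spectral_norm(nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1)),
    nn.LeakyReLU(0.1),
    spectral_norm(nn.Conv2d(64, 1, kernel_size=4)),  # patch-level logits
)

# TTUR: simply give the discriminator a larger learning rate than the generator.
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.0, 0.9))
d_opt = torch.optim.Adam(discriminator.parameters(), lr=4e-4, betas=(0.0, 0.9))
```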


Possible further studies?

As of the moment, I’m personally having a difficult time generating 256x256 images due to either computational expense or something I don’t fully understand about the capacity or nuances of the model. Has anyone tried progressively growing a SAGAN?


Thanks for reading! I hope you enjoyed! I would love to do more of these so feedback is very much welcome. :)


Source/s

Self-Attention GAN Paper


Spectral Normalization for GANs

