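GAN Study Notes

1. How GANs Work

Paper: Generative Adversarial Networks

A generative adversarial network (GAN) is a deep learning model and one of the most promising recent approaches to unsupervised learning on complex distributions. The model produces good output by having (at least) two modules in the framework, a generative model and a discriminative model, learn by competing against each other. The original GAN theory does not require G and D to be neural networks; any functions that can fit the corresponding generation and discrimination tasks will do. In practice, though, both are usually deep neural networks. A good GAN application also needs a sound training procedure, otherwise the flexibility of the networks can lead to poor output.

In June 2014, Ian J. Goodfellow et al. proposed in Generative Adversarial Networks a new framework for estimating generative models through an adversarial process. The framework trains two models simultaneously: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than from G. G is trained to maximize the probability that D makes a mistake. The framework corresponds to a minimax two-player game. It can be shown that, in the space of arbitrary functions G and D, a unique solution exists in which G recovers the training data distribution and D equals 0.5 everywhere. When G and D are defined by multilayer perceptrons, the whole system can be trained with backpropagation. No Markov chains or unrolled approximate inference networks are needed during training or sample generation. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples. (Adapted from Baidu Baike)

A GAN has two parts: generation and adversarial discrimination. Put simply, there is a generator network G and a discriminator network D, and training pits the two against each other. G takes random noise z and produces fake data G(z); D estimates the probability that its input is real. The goal is for G to produce data that passes for real. In the ideal state, D(G(z)) = 0.5.

This is expressed mathematically as: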
$$ \min_{G}\max_{D} V(D,G) = \mathbb{E}_{x \sim p_{data}(x)} [\log D(x)] + \mathbb{E}_{z \sim p_{z}(z)} [\log (1-D(G(z)))] $$
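In this formula, x is real data, z is noise, G(z) is the data G generates from z, and D(x) is the probability D assigns to the real sample x being real, so the closer D(x) is to 1 the better. D(G(z)) is the probability D assigns to a generated (fake) sample being real.

For the D network (discriminator):
If x comes from $P_{data}$, D(x) should be as large as possible, written $\log(D(x)) \uparrow$.
If x comes from $P_{generator}$, that is x = G(z), D(G(z)) should be as small as possible, written $\log[1-D(G(z))] \uparrow$.
D therefore maximizes V, hence $\max_{D}$.

For the G network (generator):
$D(G(z))$ should be as large as possible, written $\log[1-D(G(z))] \downarrow$.
G therefore minimizes V, hence $\min_{G}$.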
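As a quick numeric check of the objective above, the following sketch evaluates V(D, G) on a toy discrete sample space. The three-point distributions and the helper names `value` and `optimal_discriminator` are made up for illustration. The closed form D*(x) = p_data(x) / (p_data(x) + p_g(x)) for the best discriminator against a fixed generator is the one derived in the GAN paper.

```python
import math

def value(p_data, p_g, d):
    """V(D, G) = E_{x~p_data}[log D(x)] + E_{x~p_g}[log(1 - D(x))] on a finite space."""
    real_term = sum(p * math.log(d[x]) for x, p in p_data.items() if p > 0)
    fake_term = sum(p * math.log(1.0 - d[x]) for x, p in p_g.items() if p > 0)
    return real_term + fake_term

def optimal_discriminator(p_data, p_g):
    """Best response of D to a fixed generator: p_data(x) / (p_data(x) + p_g(x))."""
    return {x: p_data[x] / (p_data[x] + p_g[x]) for x in p_data}

# Hypothetical toy distributions over three outcomes.
p_data = {"a": 0.5, "b": 0.3, "c": 0.2}
p_bad = {"a": 0.1, "b": 0.2, "c": 0.7}   # a generator far from the data

d_bad = optimal_discriminator(p_data, p_bad)
d_match = optimal_discriminator(p_data, p_data)  # generator equals the data

v_bad = value(p_data, p_bad, d_bad)
v_match = value(p_data, p_data, d_match)

print(d_match)                  # every entry is 0.5
print(v_match, -math.log(4))    # the two numbers agree
print(v_bad)                    # larger than v_match
```

When the generator matches the data, the best discriminator can only answer 0.5 and V drops to -log 4, its smallest possible value against an optimal D. Any mismatch leaves D something to exploit, so V is larger.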
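Training alternates between two steps. First we train D: D wants V(D,G) to be as large as possible, so its parameters move along the gradient (gradient ascent). Then we train G: G wants V(D,G) to be as small as possible, so its parameters move against the gradient (gradient descent).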
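The alternating update rule can be illustrated on a toy problem. This is only a sketch under stated assumptions: it is not a GAN, and the scalar function V(d, g) = g*d - 0.5*d**2 + 0.5*g**2 is a hypothetical stand-in chosen because it is concave in d, convex in g, and has its saddle point at (0, 0).

```python
def grad_d(d, g):
    """Partial derivative of V(d, g) = g*d - 0.5*d**2 + 0.5*g**2 with respect to d."""
    return g - d

def grad_g(d, g):
    """Partial derivative of the same V with respect to g."""
    return d + g

def alternate(d, g, lr=0.1, steps=200):
    """Alternate one ascent step on d with one descent step on g."""
    for _ in range(steps):
        d = d + lr * grad_d(d, g)   # "D" step: add the gradient (ascent on V)
        g = g - lr * grad_g(d, g)   # "G" step: subtract the gradient (descent on V)
    return d, g

d_final, g_final = alternate(d=1.0, g=-2.0)
print(d_final, g_final)  # both very close to 0, the saddle point
```

On this well-behaved toy function the alternation settles at the saddle point. Real GAN objectives are not convex-concave in the network parameters, so the same scheme gives no such guarantee there.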
2. GAN Example

    import torch
    from torch import nn, optim
    import torchvision.transforms as tfs
    import numpy as np
    from PIL import Image
    import matplotlib.pyplot as plt

    # Resize the single training image to 32x32 and convert it to a [0, 1] tensor.
    transforms = tfs.Compose([
        tfs.Resize((32, 32)),
        tfs.ToTensor(),
    ])

    flat_img = 32 * 32 * 3   # length of the flattened RGB image
    noise_dim = 100

    # convert('RGB') guarantees three channels even for grayscale or RGBA files.
    img = Image.open('1.jpg').convert('RGB')
    real_img = transforms(img)

    torch.manual_seed(2)
    # Despite its name, fake_img is the fixed noise vector fed to the generator.
    fake_img = torch.rand(1, noise_dim)

    plt.imshow(np.transpose(real_img.numpy(), (1, 2, 0)))
    class Discriminator(nn.Module):
        """MLP that maps a flattened image to the probability that it is real."""

        def __init__(self):
            super().__init__()
            self.linear = nn.Sequential(
                nn.Linear(flat_img, 1024),
                nn.ReLU(),
                nn.Linear(1024, 2048),
                nn.ReLU(),
                nn.Linear(2048, 1),
                nn.Sigmoid()
            )

        def forward(self, img):
            img = img.view(1, -1)  # flatten to a batch of one
            out = self.linear(img)
            return out
    class Generator(nn.Module):
        """MLP that maps a noise vector to a flattened image."""

        def __init__(self):
            super().__init__()
            self.linear = nn.Sequential(
                nn.Linear(noise_dim, 1024),
                nn.LeakyReLU(),
                nn.Linear(1024, 2048),
                nn.LeakyReLU(),
                nn.Linear(2048, flat_img)
            )

        def forward(self, latent_space):
            latent_space = latent_space.view(1, -1)  # batch of one noise vector
            out = self.linear(latent_space)
            return out
    device = 'cuda:0' if torch.cuda.is_available() else 'cpu'

    discr = Discriminator().to(device)
    gen = Generator().to(device)

    opt_d = optim.SGD(discr.parameters(), lr=0.001, momentum=0.9)
    opt_g = optim.SGD(gen.parameters(), lr=0.001, momentum=0.9)

    criterion = nn.BCELoss()
    epochs = 200
    discr_e = 4  # discriminator updates per epoch
    gen_e = 4    # generator updates per epoch

    for epoch in range(epochs):
        # Train D: the real image is labelled 1, the generated image 0.
        for k in range(discr_e):
            out_d1 = discr(real_img.to(device))
            loss_d1 = criterion(out_d1, torch.ones((1, 1)).to(device))

            # detach() keeps this step from back-propagating into G.
            out_d2 = gen(fake_img.to(device)).detach()
            loss_d2 = criterion(discr(out_d2), torch.zeros((1, 1)).to(device))

            opt_d.zero_grad()
            loss_d = loss_d1 + loss_d2
            loss_d.backward()
            opt_d.step()

        # Train G: try to make D label the generated image 1.
        for i in range(gen_e):
            out_g = gen(fake_img.to(device))
            loss_g = criterion(discr(out_g), torch.ones((1, 1)).to(device))

            opt_g.zero_grad()
            loss_g.backward()
            opt_g.step()

        if (epoch + 1) % 10 == 0:
            print('Epoch[{}/{}],d_loss:{:.6f},g_loss:{:.6f}'.format(
                epoch + 1, epochs, loss_d.item(), loss_g.item()))

    # Score the final generated image with the trained discriminator.
    out = gen(fake_img.to(device)).detach()
    out_score = discr(out)
    loss = criterion(out_score, torch.ones((1, 1)).to(device))
    print("score:", out_score.item(), "loss:", loss.item())

    # Show the generated image next to the real one.
    out = out.reshape((3, 32, 32)).cpu()
    plt.subplot(1, 2, 1)
    plt.title('fake')
    plt.imshow(np.transpose(out.numpy(), (1, 2, 0)))
    plt.subplot(1, 2, 2)
    plt.title('real')
    plt.imshow(np.transpose(real_img.numpy(), (1, 2, 0)))
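To connect the BCELoss calls in the loop above with the value function V(D, G), the sketch below writes binary cross-entropy out by hand for a single prediction. The scores 0.9 and 0.2 are hypothetical, and `bce` is an illustrative helper rather than the PyTorch implementation.

```python
import math

def bce(p, target):
    """Binary cross-entropy for one prediction p in (0, 1) and a 0/1 target."""
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

d_real = 0.9   # hypothetical D(x) on a real sample
d_fake = 0.2   # hypothetical D(G(z)) on a generated sample

# Discriminator step: minimising this sum maximises log D(x) + log(1 - D(G(z))).
loss_d = bce(d_real, 1) + bce(d_fake, 0)
v = math.log(d_real) + math.log(1 - d_fake)

# Generator step with target 1: the loss is -log D(G(z)), not log(1 - D(G(z))).
loss_g = bce(d_fake, 1)

print(loss_d, -v)                   # the two numbers agree
print(loss_g, -math.log(d_fake))    # the two numbers agree
```

So the discriminator step matches the formula exactly, while the generator step uses the alternative objective that the GAN paper also suggests: maximise log D(G(z)) instead of minimising log(1 - D(G(z))). Both push D(G(z)) toward 1.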
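3. How DCGAN Works

Paper: https://arxiv.org/pdf/1511.06434.pdf

DCGAN rests on the same principle as GAN; what changes is the architecture:
Pooling layers are replaced with strided convolutions.
Batch norm is used in both the generator and the discriminator.
Fully connected layers are removed, allowing deeper architectures with fewer parameters.
The generator uses ReLU activations, except for the output layer, which uses Tanh.
The discriminator uses LeakyReLU activations.

The DCGAN paper also gives the training details:
Parameters are updated with Adam, betas=(0.5, 0.999);
the batch size is 128;
weights are initialized from a normal distribution with mean 0 and standard deviation 0.02;
the learning rate is 0.0002.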
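To make the optimizer settings above concrete, here is a hand-written Adam step for a single scalar parameter. It is a minimal sketch, not the library implementation: `adam_step` and the example gradient are illustrative, and eps = 1e-8 is assumed as the usual default since the text does not mention it.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.0002, beta1=0.5, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

theta, m, v = 1.0, 0.0, 0.0
theta, m, v = adam_step(theta, grad=3.0, m=m, v=v, t=1)
print(theta)  # about 1.0 - 0.0002: the first step has size lr whatever the gradient scale
```

With beta1 = 0.5 the newest gradient gets weight 0.5 in the running mean, against 0.1 under Adam's usual default of 0.9, so the momentum term follows the shifting adversarial objective more quickly. The DCGAN paper reports that this lower value helped stabilize training.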
4. DCGAN Example

This example generates anime avatars. The dataset is from https://www.kaggle.com/soumikrakshit/anime-faces
    import os
    import numpy as np
    import imageio
    from tqdm.auto import tqdm
    import torch, torchvision
    import torch.nn as nn
    from torch.utils.data import Dataset, DataLoader
    from torchvision import transforms, utils
    import matplotlib.pyplot as plt

    avatar_img_path = "E:/python/dataset/anime face/data"
    # ToTensor scales pixels to [0, 1]; Normalize then shifts each channel to [-1, 1].
    trans = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])

    noise_dim = 100
    batch_size = 16
    beta1 = 0.5

    '''
    # Optional custom dataset (disabled; ImageFolder is used instead)
    file_train = []
    for image_name in tqdm(os.listdir(avatar_img_path)):
        file_train.append(os.path.join(avatar_img_path, image_name))

    def default_loader(path):
        img = imageio.imread(path)
        img = img / 255
        img = trans(img)
        return img

    class trainset(Dataset):
        def __init__(self, loader=default_loader):
            # paths of all images
            self.images = file_train
            self.target = 0
            self.loader = loader

        def __getitem__(self, index):
            fn = self.images[index]
            img = self.loader(fn)
            target = self.target
            return img, target

        def __len__(self):
            return len(self.images)
    '''

    # ImageFolder expects the images inside a sub-directory of the given root.
    img_dataset = torchvision.datasets.ImageFolder("E:/python/dataset/anime face", transform=trans)
    img_dataloader = DataLoader(img_dataset, batch_size=batch_size, shuffle=True)
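The Normalize transform with mean 0.5 and standard deviation 0.5 is worth a short worked example, because it is what puts the training images on the same [-1, 1] scale as a Tanh generator output. The helpers below are illustrative scalar versions of the per-channel arithmetic, not torchvision code.

```python
def normalize(x, mean=0.5, std=0.5):
    """Per-channel operation applied by Normalize: (x - mean) / std."""
    return (x - mean) / std

def denormalize(y, mean=0.5, std=0.5):
    """Inverse mapping, useful before displaying a generated image."""
    return y * std + mean

for pixel in (0.0, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))
# 0.0 -> -1.0
# 0.5 -> 0.0
# 1.0 -> 1.0
```

Applying `denormalize` to a generator output brings it back to [0, 1] for plotting.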
    class Generator(nn.Module):
        """Maps a (z_dim, 1, 1) noise tensor to a 3-channel image in [-1, 1]."""

        def __init__(self, z_dim):
            super(Generator, self).__init__()
            self.z_dim = z_dim
            self.generator = nn.Sequential(
                nn.ConvTranspose2d(self.z_dim, 512, 4, 1, 0, bias=False),
                nn.BatchNorm2d(num_features=512),
                nn.ReLU(True),
                nn.ConvTranspose2d(512, 256, 4, 2, 1, bias=False),
                nn.BatchNorm2d(num_features=256),
                nn.ReLU(True),
                nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False),
                nn.BatchNorm2d(num_features=128),
                nn.ReLU(True),
                nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False),
                nn.BatchNorm2d(num_features=64),
                nn.ReLU(True),
                nn.ConvTranspose2d(64, 3, 4, 2, 1, bias=False),
                nn.Tanh()
            )
            self.weight_init()

        def weight_init(self):
            for m in self.generator.modules():
                if isinstance(m, nn.ConvTranspose2d):
                    nn.init.normal_(m.weight.data, 0, 0.02)
                elif isinstance(m, nn.BatchNorm2d):
                    # Centre the BatchNorm scale on 1; a mean of 0 would
                    # shrink every normalised activation to almost nothing.
                    nn.init.normal_(m.weight.data, 1.0, 0.02)
                    nn.init.constant_(m.bias.data, 0)

        def forward(self, x):
            out = self.generator(x)
            return out
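To see what image size the five ConvTranspose2d layers of the generator above produce, the following sketch applies the standard output-size formula for a transposed convolution, assuming dilation 1 and no output padding (which is what these layers use). The helper name `conv_transpose_out` is illustrative.

```python
def conv_transpose_out(size, kernel, stride, padding):
    """Output side length of a transposed convolution (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel

# (kernel, stride, padding) of the five ConvTranspose2d layers in the generator
layers = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]

size = 1  # the noise enters as a z_dim x 1 x 1 tensor
sizes = [size]
for kernel, stride, padding in layers:
    size = conv_transpose_out(size, kernel, stride, padding)
    sizes.append(size)

print(sizes)  # [1, 4, 8, 16, 32, 64]
```

The first layer expands the 1x1 noise to 4x4, and each (4, 2, 1) layer then doubles the side length, so the generator outputs 64x64 images.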
    class Discriminator(nn.Module):
        """Maps a 3-channel image to the probability that it is real."""

        def __init__(self):
            super(Discriminator, self).__init__()
            self.discriminator = nn.Sequential(
                nn.Conv2d(3, 64, 4, 2, 1, bias=False),
                nn.BatchNorm2d(num_features=64),
                nn.LeakyReLU(0.2),
                nn.Conv2d(64, 128, 4, 2, 1, bias=False),
                nn.BatchNorm2d(num_features=128),
                nn.LeakyReLU(0.2),
                nn.Conv2d(128, 256, 4, 2, 1, bias=False),
                nn.BatchNorm2d(num_features=256),
                nn.LeakyReLU(0.2),
                nn.Conv2d(256, 512, 4, 2, 1, bias=False),
                nn.BatchNorm2d(num_features=512),
                nn.LeakyReLU(0.2),
                nn.Conv2d(512, 1, 4, 2, 0, bias=False),
                nn.Sigmoid()
            )
            self.weight_init()

        def weight_init(self):
            for m in self.discriminator.modules():
                # The discriminator contains Conv2d layers, not ConvTranspose2d,
                # so this is the type that has to be matched here.
                if isinstance(m, nn.Conv2d):
                    nn.init.normal_(m.weight.data, 0, 0.02)
                elif isinstance(m, nn.BatchNorm2d):
                    # BatchNorm scale centred on 1, as in the generator.
                    nn.init.normal_(m.weight.data, 1.0, 0.02)
                    nn.init.constant_(m.bias.data, 0)

        def forward(self, x):
            out = self.discriminator(x)
            out = out.view(x.size(0), -1)  # one score per image: (batch, 1)
            return out
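The same kind of size arithmetic shows how the discriminator's strided convolutions stand in for pooling. The sketch assumes 64x64 input images; that size is inferred from the layer parameters rather than stated in the text, and `conv_out` is an illustrative helper.

```python
def conv_out(size, kernel, stride, padding):
    """Output side length of a convolution (dilation 1), using floor division."""
    return (size + 2 * padding - kernel) // stride + 1

# (kernel, stride, padding) of the five Conv2d layers in the discriminator
layers = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 0)]

size = 64  # assumed input side length
sizes = [size]
for kernel, stride, padding in layers:
    size = conv_out(size, kernel, stride, padding)
    sizes.append(size)

print(sizes)  # [64, 32, 16, 8, 4, 1]
```

Each (4, 2, 1) layer halves the side length, and the last layer collapses the remaining 4x4 map to a single score. With a different input size the final map would no longer be 1x1, and the flattened output would not match the (batch, 1) targets used with the BCE loss.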
    # is_available must be called; the bare function object is always truthy.
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    generator = Generator(noise_dim).to(device)
    discriminator = Discriminator().to(device)

    bce_loss = nn.BCELoss()

    # D uses a smaller learning rate than G in this example.
    optimizer_D = torch.optim.Adam(discriminator.parameters(), lr=0.00005, betas=(beta1, 0.999))
    optimizer_G = torch.optim.Adam(generator.parameters(), lr=0.0002, betas=(beta1, 0.999))
    epochs = 20
    # Fixed noise, reused at every checkpoint so the saved samples are comparable.
    fixed_z = torch.randn(batch_size, noise_dim, 1, 1, device=device)

    for epoch in range(epochs):
        for step, (image, _) in enumerate(img_dataloader):
            batch_size = image.size(0)  # the last batch may be smaller
            image = image.float().to(device)

            # ---- Train D: real images labelled 1, generated images 0 ----
            optimizer_D.zero_grad()
            d_out_real = discriminator(image)
            real_loss = bce_loss(d_out_real, torch.ones(batch_size, 1, device=device))
            real_scores = d_out_real

            noise = torch.randn(batch_size, noise_dim, 1, 1, device=device)
            fake_img = generator(noise)
            d_out_fake = discriminator(fake_img.detach())  # no gradient into G here
            fake_loss = bce_loss(d_out_fake, torch.zeros(batch_size, 1, device=device))
            fake_scores = d_out_fake

            d_loss = (real_loss + fake_loss) / 2
            d_loss.backward()
            optimizer_D.step()

            # ---- Train G: try to make D label generated images 1 ----
            optimizer_G.zero_grad()
            d_out_fake = discriminator(fake_img)
            g_loss = bce_loss(d_out_fake, torch.ones(batch_size, 1, device=device))
            g_loss.backward()
            optimizer_G.step()

            if step % 200 == 0:
                # Sample from the fixed noise in eval mode, without tracking gradients.
                generator.eval()
                with torch.no_grad():
                    fixed_image = generator(fixed_z)
                generator.train()
                print("[epoch: {}/{}], [iter: {}], [G loss: {:.3f}], [D loss: {:.3f}], [R Score: {:.3f}], [F Score: {:.3f}]".format(
                    epoch + 1, epochs, step, g_loss.item(), d_loss.item(),
                    real_scores.mean().item(), fake_scores.mean().item()))
                utils.save_image(fixed_image, str(epoch + 1) + "fake.jpg", normalize=True)
                utils.save_image(image, str(epoch + 1) + "real.jpg", normalize=True)
The results are as follows:

    [epoch: 1/20], [iter: 0], [G loss: 0.699], [D loss: 0.694], [R Score: 0.499], [F Score: 0.500]
    [epoch: 1/20], [iter: 200], [G loss: 0.803], [D loss: 0.715], [R Score: 0.512], [F Score: 0.529]
    [epoch: 1/20], [iter: 400], [G loss: 0.734], [D loss: 0.692], [R Score: 0.492], [F Score: 0.491]
    [epoch: 1/20], [iter: 600], [G loss: 0.730], [D loss: 0.693], [R Score: 0.496], [F Score: 0.496]
    [epoch: 1/20], [iter: 800], [G loss: 0.748], [D loss: 0.686], [R Score: 0.500], [F Score: 0.492]
    [epoch: 1/20], [iter: 1000], [G loss: 0.745], [D loss: 0.680], [R Score: 0.514], [F Score: 0.499]
    [epoch: 1/20], [iter: 1200], [G loss: 0.715], [D loss: 0.701], [R Score: 0.527], [F Score: 0.532]
    [epoch: 2/20], [iter: 0], [G loss: 0.762], [D loss: 0.679], [R Score: 0.524], [F Score: 0.508]
    [epoch: 2/20], [iter: 200], [G loss: 0.815], [D loss: 0.686], [R Score: 0.507], [F Score: 0.498]
    [epoch: 2/20], [iter: 400], [G loss: 0.836], [D loss: 0.665], [R Score: 0.509], [F Score: 0.479]
    [epoch: 2/20], [iter: 600], [G loss: 0.759], [D loss: 0.694], [R Score: 0.523], [F Score: 0.520]
    [epoch: 2/20], [iter: 800], [G loss: 0.973], [D loss: 0.646], [R Score: 0.551], [F Score: 0.499]
    [epoch: 2/20], [iter: 1000], [G loss: 0.926], [D loss: 0.671], [R Score: 0.531], [F Score: 0.495]
    [epoch: 2/20], [iter: 1200], [G loss: 1.100], [D loss: 0.582], [R Score: 0.497], [F Score: 0.362]
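The starting values in this log can be sanity-checked by hand. The sketch below assumes a discriminator that outputs exactly 0.5 for every input and uses the loss definitions from the training loop (D loss is the average of the real and fake BCE terms). Plugging the logged mean scores into the same formula is only a rough check, since the loss averages logs per sample rather than taking the log of the average.

```python
import math

d_out = 0.5  # an undecided discriminator

real_loss = -math.log(d_out)        # BCE against target 1
fake_loss = -math.log(1 - d_out)    # BCE against target 0
d_loss = (real_loss + fake_loss) / 2
g_loss = -math.log(d_out)           # generator BCE against target 1

print(round(d_loss, 3), round(g_loss, 3))  # 0.693 0.693

# Rough check against the first logged line (R Score 0.499, F Score 0.500).
approx_d_loss = (-math.log(0.499) - math.log(1 - 0.500)) / 2
print(round(approx_d_loss, 3))  # 0.694
```

Losses near ln 2 (about 0.693) therefore mean the discriminator cannot yet tell real from fake. The last logged line, with F Score at 0.362 and G loss at 1.100, shows the discriminator starting to pull ahead.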
Generated samples at epoch 7: [image]

batch_size and the other hyperparameters can be tuned as needed.
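5. How WGAN Works

Papers: Wasserstein GAN; Towards Principled Methods for Training Generative Adversarial Networks

It is well known that GAN training comes with many problems and challenges:
Training is difficult: the model architecture has to be designed carefully, and the training progress of G and D has to be kept in balance.
The losses of G and D do not indicate how training is going; there is no meaningful metric that correlates with the quality of the generated images.
Mode collapse: the generated images may look real but lack diversity.

Compared with the traditional GAN, WGAN makes the following changes:
Remove the sigmoid from the last layer of D.
Do not take the log in the losses of G and D.
After every update of D's parameters, clip their absolute values to at most a fixed constant c.
Do not use momentum-based optimizers (including momentum and Adam); RMSProp is recommended, and SGD also works.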
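The changes listed above can be sketched with plain Python lists standing in for tensors. This is an illustration under stated assumptions, not a training loop: the scores and weights are hypothetical, the function names are made up, and c = 0.01 is the default clipping constant from the WGAN paper's algorithm.

```python
def critic_loss(real_scores, fake_scores):
    """Loss minimised by D (the critic): mean D(G(z)) - mean D(x), with no log."""
    return sum(fake_scores) / len(fake_scores) - sum(real_scores) / len(real_scores)

def generator_loss(fake_scores):
    """Loss minimised by G: -mean D(G(z)), again with no log."""
    return -sum(fake_scores) / len(fake_scores)

def clip_weights(weights, c=0.01):
    """Clamp every weight into [-c, c] after a critic update."""
    return [max(-c, min(c, w)) for w in weights]

real_scores = [1.2, 0.8, 1.0]      # hypothetical raw outputs of D (no sigmoid)
fake_scores = [-0.5, 0.1, -0.2]
weights = [0.5, -0.003, -0.2, 0.01]

print(critic_loss(real_scores, fake_scores))   # about -1.2
print(generator_loss(fake_scores))             # about 0.2
print(clip_weights(weights))                   # [0.01, -0.003, -0.01, 0.01]
```

Because the sigmoid is gone, the scores are unbounded real numbers rather than probabilities, and the clipping step is what keeps the critic from growing without limit.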
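The generator's original loss is $\mathbb{E}_{z \sim p_z}[\log(1-D(G(z)))]$. As a result, if D is trained too well, the gradient reaching G vanishes and G learns nothing useful. If D is not trained well enough, its feedback is inaccurate and G again receives no useful gradient. This loss therefore makes GAN training very unstable, and the training progress of G and D has to be balanced carefully.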
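One simple way to see the vanishing-gradient half of this argument is to write D(G(z)) = sigmoid(a), where a is the discriminator's pre-sigmoid output on a fake sample, and differentiate the generator loss with respect to a. The sketch below compares the original loss log(1 - D(G(z))) with the alternative -log D(G(z)); the logit values are hypothetical and this covers only the sigmoid saturation effect, not the full analysis in the papers cited above.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def grad_saturating(a):
    """d/da of log(1 - sigmoid(a)), the original generator loss."""
    return -sigmoid(a)

def grad_non_saturating(a):
    """d/da of -log(sigmoid(a)), the alternative generator loss."""
    return -(1.0 - sigmoid(a))

# The more negative a is, the more confidently D rejects the fake sample.
for a in (0.0, -4.0, -8.0):
    print(a, round(sigmoid(a), 4), grad_saturating(a), grad_non_saturating(a))
```

As D becomes confident that the sample is fake (a far below 0), the gradient of the original loss shrinks toward 0, while the gradient of the alternative loss approaches -1.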
WGAN references: https://zhuanlan.zhihu.com/p/44169714 https://www.cnblogs.com/Allen-rg/p/10305125.html