Synthesis of Fake Face Photos and Videos Using Deep Learning Techniques

This study demonstrates that fake videos containing a person's face can be created from as little as a short real video or a few photographs of that person. Training deep learning models on these few photographs or the short video makes it possible to generate fake photos and videos. In a fake video, the person's face can be swapped with another person's face, or the face can be reenacted (animated). In reenactment, the facial movements from another person's video are applied to the source person's face. Techniques such as StyleGAN can even generate photographs of people who do not exist, using a set of photographs of real human faces. These techniques, commonly known as deepfake technology, are examined in this study together with the variants and architectures that are applied to faces. They can be used for children's education by reenacting existing photographs of scientists and famous figures from earlier eras and making them speak. The method can be used in puppetry. Performers can be replaced in scenes by computer-generated reenactments of their photographs. Portraits can be animated. This study differs from similar work in that it uses less training data than other studies and treats the types and architectures of fake video generation together. Training was carried out on a GPU, and the data consisted of the VoxCeleb dataset, a few short videos, and a few photographs. The methods used are generative networks such as generative adversarial networks and autoencoders.
The study showed that the model trains better when the face in the videos and photos is facing forward or turned slightly to the right or left, when facial movement is confined to a limited area, and when the face moves slowly, and that fake videos generated from such training data are more successful.

Keywords:

Face swapping, Face reenactment, Deepfake video, Deep learning

Fake face photo and video synthesis using deep learning techniques

This study demonstrates that fake videos containing a person's face can be created from as little as one short real video or a few photographs of that person. Fake photos and videos are produced by training deep learning models on these few photographs or the short video. In a fake video, the person's face can be swapped with another person's face, or face reenactment can be applied to it. In reenactment, the facial movements in another person's video are transferred to the face of a source person. Techniques such as StyleGAN can even produce photographs of non-existent people from a set of photographs of real human faces. These techniques, commonly known as deepfake technology, are discussed in this study together with the types and architectures applied to faces. They can be used for educational purposes, for example by animating existing photographs of scientists and celebrities from earlier eras for children. The method can also be used in puppetry. Actors can be replaced in scenes by reenacting their photographs on a computer. Portraits can be animated. This study differs from similar studies in that it uses less training data than other studies and considers the types and architectures of fake video creation together. Training was performed on a GPU, and the data consisted of the VoxCeleb dataset, several short videos, and several photos. The methods used are generative networks such as generative adversarial networks (GANs) and autoencoders. The study shows that training works better when the face in the videos and photos is turned forward or slightly to the right or left, when facial movement is confined to a limited area, and when the face moves slowly. Fake videos created from such training data are more successful.

Keywords:

Face swapping, Face reenactment, Deepfake video, Deep learning
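
The abstract's closing observation (near-frontal faces, movement confined to a limited area, slow motion) could be turned into a simple pre-filter for training frames. The sketch below is only an illustration under stated assumptions, not the authors' procedure: the `FrameInfo` record, the `select_training_frames` function, and all three thresholds are hypothetical, and the per-frame yaw angle and face-box centre are assumed to come from some external face detector or pose estimator.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class FrameInfo:
    """Hypothetical per-frame measurements supplied by an external face tracker."""
    index: int      # frame number in the source video
    yaw_deg: float  # head yaw in degrees; 0 means facing the camera
    center: tuple   # (x, y) face-box centre, normalised to [0, 1]


def select_training_frames(frames, max_yaw_deg=20.0, max_step=0.02, max_drift=0.15):
    """Return indices of frames that are near-frontal, slow-moving and confined.

    All thresholds are illustrative assumptions, not values from the study.
    """
    if not frames:
        return []
    anchor = frames[0].center  # reference point for the "limited area" check
    kept = []
    prev = None
    for f in frames:
        # Near-frontal: yaw within a small band around zero.
        frontal = abs(f.yaw_deg) <= max_yaw_deg
        # Confined: the face stays close to where it started.
        drift = hypot(f.center[0] - anchor[0], f.center[1] - anchor[1])
        confined = drift <= max_drift
        # Slow: small displacement relative to the previous frame.
        if prev is None:
            slow = True
        else:
            step = hypot(f.center[0] - prev.center[0], f.center[1] - prev.center[1])
            slow = step <= max_step
        if frontal and confined and slow:
            kept.append(f.index)
        prev = f
    return kept


demo_frames = [
    FrameInfo(0, 0.0, (0.50, 0.50)),
    FrameInfo(1, 5.0, (0.51, 0.50)),   # small step, frontal: kept
    FrameInfo(2, 45.0, (0.51, 0.50)),  # strong head turn: dropped
    FrameInfo(3, 10.0, (0.60, 0.50)),  # large jump from previous frame: dropped
    FrameInfo(4, -8.0, (0.61, 0.50)),  # small step, still within drift limit: kept
    FrameInfo(5, 0.0, (0.70, 0.50)),   # large jump and too far from start: dropped
]
print(select_training_frames(demo_frames))  # prints [0, 1, 4]
```

Using the first frame as the reference position is a deliberate simplification; a median position over the clip would be a reasonable alternative if the first frame is unrepresentative.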
