Among the many training methods used for image recognition and modification, two are particularly interesting: the autoencoder and the GAN (Generative Adversarial Network). Both methods involve two networks that together train one another to recognize and generate data (e.g. text or images).
The autoencoder model consists of an encoder that receives an image and tries to “summarize” it using much less information. Meanwhile, its “partner” – the decoder – tries to recreate the original based on the summary alone. It’s a little akin to playing charades. In the initial phase of this “guessing game,” the encoder loses a lot of key information and the product that the decoder delivers is nowhere near the original. However, after a number of repetitions, it turns out that the key information provided by the encoder is enough to reconstruct the image.
“Autoencoders are trained to achieve a result as close to the original data as possible, while limiting the size of the intermediate representation (called the hidden vector),” explained Filip Wolski, Research Scientist at OpenAI. “As a side effect of the process, the features that may differentiate images (e.g. age, sex, skin color, hair length, etc.) must be encoded in the hidden vector and thus become separated from the information necessary to recreate the original. These features can be easily read from the vector – as well as changed: we can obtain a picture of the same person with one feature changed,” he added.
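The encoder–decoder loop described above can be sketched in a few lines. This is a minimal, illustrative example, not a production model: a linear autoencoder in plain numpy, with made-up dimensions (8-dimensional inputs compressed into a 2-dimensional hidden vector) and synthetic data chosen so that a 2-dimensional "summary" really can capture it.

```python
import numpy as np

# Minimal linear autoencoder sketch (numpy only; dimensions are illustrative).
rng = np.random.default_rng(0)

# Synthetic data that lies on a 2-D subspace of an 8-D space, so a
# 2-D hidden vector is enough to "summarize" each sample.
basis = rng.normal(size=(2, 8))
X = rng.normal(size=(200, 2)) @ basis          # 200 samples, 8 features each

W_enc = rng.normal(scale=0.1, size=(8, 2))     # encoder weights
W_dec = rng.normal(scale=0.1, size=(2, 8))     # decoder weights
lr = 0.02

def loss(X, W_enc, W_dec):
    recon = X @ W_enc @ W_dec                  # decoder's reconstruction
    return np.mean((recon - X) ** 2)

initial = loss(X, W_enc, W_dec)
for _ in range(1000):
    H = X @ W_enc                              # hidden vectors (the "summary")
    R = H @ W_dec                              # attempted reconstruction
    G = 2 * (R - X) / X.size                   # gradient of MSE w.r.t. R
    grad_dec = H.T @ G                         # gradient for the decoder
    grad_enc = X.T @ (G @ W_dec.T)             # gradient for the encoder
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc
final = loss(X, W_enc, W_dec)
print(f"reconstruction MSE: {initial:.4f} -> {final:.4f}")
```

After enough "repetitions" of this guessing game, the reconstruction error drops: the 2-dimensional hidden vector retains the key information needed to rebuild the 8-dimensional input, which is exactly the compression effect Wolski describes.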
Training GANs, on the other hand, resembles a “true or false” challenge. One of the networks
in this model, the generator, creates a random set of data forming e.g. an image. The other
network, the discriminator, is supposed to guess if the image it “sees” is real or the product
of the generator. “We train the generator only for the purpose of misleading the discriminator,”
Wolski explained. The generator is "rewarded" each time it manages to deceive the discriminator. But the discriminator is also learning in the process, becoming better at telling real images apart from the generator's "noise." And so on, and so forth. "In time, the discriminator learns to recognize real images very well, and the generator learns to create very realistic looking pictures," Wolski added.
This is how we can create photorealistic pictures, and soon also videos, indistinguishable
from the images of real people. It is also the technology behind making faces look older
or younger. Aging a face in an image relies on the cooperation of two or more neural networks. From sample images, one network learns which features are typical of people in different age groups. This abstract representation of age can then be blended with a real image to make the person look older, while the other networks in the model ensure the face keeps the features that are characteristic of that person.