Beata Socha

How does gen AI work?

The release of ChatGPT and a plethora of other Large Language Models, along with image-generating bots such as Midjourney, whose output won a prize at the Colorado State Fair, are just some of the flashiest examples of AI accomplishments.


But, contrary to what some believe, the current capabilities of AI creative robots are by no means a revolution. We’ve been working on them for years, steadily improving, tweaking and adjusting. And training. Lots and lots of training.


Over the past few decades, artificial intelligence has been crossing more and more boundaries that once seemed unattainable: from “Deep Blue” beating Kasparov at chess in 1997 and “Watson” triumphing at Jeopardy! in 2011, to “AlphaGo” besting a world champion in 2017 at one of the hardest strategy games of all time, the ancient Chinese game of Go.


These three examples show how, until recently, AI was used predominantly to solve problems. Human problems, like finding a winning strategy in a human-devised game.


Gen AI — a gamechanger


But it’s time for a real gamechanger now. The new generation of AI is designed for an entirely different purpose.


Rather than answering questions like “How do I win?” or “How do I solve this?”, they are designed to provide an array of options and possibilities, to expand our imagination and lend us skills we may lack. The questions we ask AI today might be: “What would a fractal growing in a crystal ball look like?” (like the image featured at the top of the article, maybe?) or “What should I know about, say, driverless cars?” Today’s bots are far broader and more general than any previous iteration.


With the dawn of multi-layered neural networks, like the ones used in modern AI bots, we no longer need to be very specific about what we want them to do. These machines have been designed to go out of their way to understand us. Let's see how they work.


What is artificial intelligence and how does Gen AI work?


The foundations for artificial neural networks, commonly called Artificial Intelligence, were laid in the mid-20th century. Since then, the field has gone through periods of rapid advancement as well as periods of “AI winter,” when scientists hit a barrier they couldn’t overcome with the technology available at the time. Then a breakthrough would trigger another surge, and so on.


The theoretical basis is only the first of the necessary ingredients that made Artificial Intelligence possible. The second element was data, the fuel of neural networks.


The breakthrough in digitization and mass communication in the early 21st century produced vast amounts of data that could be used to power the emerging neural networks. The development of cloud computing was another factor.


Multi-layered deep neural networks — what does that even mean? 


What differentiates deep neural networks from regular machine learning is their multi-layered structure. A classic neural network has three layers: an input layer, a single processing layer inside, and an output layer. Deep networks contain numerous intermediate layers, hidden inside, which make the most complex operations possible and produce more valuable insights.
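To make the difference concrete, here is a minimal sketch, assuming PyTorch; the layer sizes are illustrative, not taken from any particular model. Both networks map the same input to the same output, but the deep one stacks several hidden layers in between.

```python
import torch
import torch.nn as nn

# "Classic" network: input -> one processing layer -> output
shallow_net = nn.Sequential(
    nn.Linear(784, 128),  # input layer feeding a single hidden layer
    nn.ReLU(),
    nn.Linear(128, 10),   # hidden layer -> output layer
)

# Deep network: same input and output, but many hidden layers in between
deep_net = nn.Sequential(
    nn.Linear(784, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 64),  nn.ReLU(),
    nn.Linear(64, 10),
)

x = torch.randn(32, 784)     # a batch of 32 flattened 28x28 images
print(shallow_net(x).shape)  # torch.Size([32, 10])
print(deep_net(x).shape)     # torch.Size([32, 10])
```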


Deep neural networks keep improving with every additional piece of information they receive, while classical methods reach a point of data saturation, where their effectiveness plateaus. Andrew Ng, one of the biggest authorities on artificial intelligence, used a simple analogy: deep learning is a rocket and data is its fuel, and “the more fuel it gets, the further it can go.”


Dirty data? No problem


What makes deep neural networks so good at learning is that they thrive on ambiguity. Unlike their predecessors, they don’t need features to be precisely defined: they identify those features themselves. That is also why they are more effective when working on dirty data. For traditional Machine Learning algorithms to work, by contrast, data first needs to be separated from the so-called “noise,” which requires a lot of effort.


Multi-layered networks can learn to distinguish features at various levels of abstraction. For example, if we train a deep neural network to classify images, we will find that the first layer has taught itself to recognize very basic elements like edges, the next layer recognizes collections of edges such as shapes, the third layer recognizes collections of shapes like eyes and noses, and a further layer learns even higher-order features like faces.
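A convolutional image classifier is the textbook illustration of this hierarchy. The sketch below (PyTorch assumed, layer sizes arbitrary) stacks three convolutional stages; in practice the early filters tend to end up as edge detectors, with later layers composing them into shapes and object parts.

```python
import torch
import torch.nn as nn

image_classifier = nn.Sequential(
    # stage 1: typically learns edge- and color-blob detectors
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    # stage 2: combinations of edges, i.e. simple shapes and textures
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    # stage 3: combinations of shapes, e.g. eyes, noses, wheels
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    # classifier head: higher-order features such as whole faces or objects
    nn.Flatten(),
    nn.Linear(64 * 8 * 8, 10),   # assumes 64x64 input images and 10 classes
)

x = torch.randn(1, 3, 64, 64)    # one RGB image, 64x64 pixels
print(image_classifier(x).shape) # torch.Size([1, 10])
```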


How do they learn?


Knowing how these bots learn to do all these amazing things can help us understand them and their capabilities better. Here, we will look at two particularly interesting methods of training AIs: the autoencoder and the GAN (Generative Adversarial Network). Both methods involve two networks that are trained together to recognize and generate data (e.g. text or images).


A charades game — training the autoencoder


The autoencoder model consists of an encoder that receives an image and tries to “summarize” it using much less information, and its “partner,” the decoder, which tries to recreate the original based on that summary alone.


It’s a little akin to a charades game. In the initial phase of this guessing game, the encoder loses a lot of key information and the product that the decoder delivers is nowhere near the original. However, after a number of repetitions, it turns out that the key information provided by the encoder is enough to reconstruct the image. 


Autoencoders try to achieve a result as close to the original data as possible. As a side effect, the features that differentiate images (e.g. age, sex, skin color, hair length) end up encoded in the hidden vector, separated out from the rest of the information needed to recreate the original. These features can then be easily read from the vector, and even changed: we can obtain a picture of the same person with a single feature altered.
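Here is what that game looks like in code: a minimal sketch assuming PyTorch, with made-up layer sizes and random tensors standing in for a real image dataset. The encoder squeezes each 784-pixel image into a 32-number summary, the decoder tries to rebuild the image from that summary alone, and the reconstruction error drives both of them to improve.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),
    nn.Linear(256, 32),                 # the 32-number "summary" (latent vector)
)
decoder = nn.Sequential(
    nn.Linear(32, 256), nn.ReLU(),
    nn.Linear(256, 784), nn.Sigmoid(),  # pixel values back in [0, 1]
)

optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3
)
loss_fn = nn.MSELoss()                  # how far is the guess from the original?

images = torch.rand(64, 784)            # stand-in for a real batch of images

for step in range(100):                 # repeated rounds of the "charades game"
    latent = encoder(images)            # lossy summary
    reconstruction = decoder(latent)    # the decoder's guess at the original
    loss = loss_fn(reconstruction, images)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```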


True or False — training GANs


Training GANs, on the other hand, resembles a “true or false” challenge. One of the networks in this model, the generator, creates a set of data, e.g. an image, out of random noise. The other network, the discriminator, has to guess whether the image it “sees” is real or a product of the generator.


We train the generator only to mislead the discriminator: it is “rewarded” each time it manages to deceive it. But the discriminator is also learning in the process and becomes better at telling real images apart from the generator’s “noise.”


So we train both a master lie detector and a master deceiver. In time, the discriminator learns to recognize which images are real, while the generator learns to create very realistic-looking pictures.
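The sketch below shows this tug-of-war in code, again assuming PyTorch, with tiny fully-connected networks and random tensors standing in for real images. Each round, the discriminator is trained to label real images as real and generated ones as fake, and the generator is trained to make the discriminator call its output real.

```python
import torch
import torch.nn as nn

generator = nn.Sequential(             # turns random noise into a fake "image"
    nn.Linear(64, 256), nn.ReLU(),
    nn.Linear(256, 784), nn.Tanh(),
)
discriminator = nn.Sequential(         # outputs the probability an image is real
    nn.Linear(784, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),
)

loss_fn = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

real_images = torch.rand(32, 784) * 2 - 1   # stand-in for a batch of real images
real_labels = torch.ones(32, 1)
fake_labels = torch.zeros(32, 1)

for step in range(100):
    # 1) Train the discriminator: reward it for telling real from fake.
    noise = torch.randn(32, 64)
    fake_images = generator(noise).detach()  # don't update the generator here
    d_loss = loss_fn(discriminator(real_images), real_labels) + \
             loss_fn(discriminator(fake_images), fake_labels)
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # 2) Train the generator: reward it when the discriminator is fooled.
    noise = torch.randn(32, 64)
    g_loss = loss_fn(discriminator(generator(noise)), real_labels)
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```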


This is how we can create photorealistic pictures and videos, indistinguishable from images of real people. It is also the technology behind making faces look older or younger. And, incidentally, these are the broad strokes of how gen AI works.

