A brief history of diffusion, the technology at the heart of modern image-generating AI • TechCrunch


Text-to-image AI has exploded this year as technical advances dramatically increase the fidelity of art that AI systems can create. While systems like Stable Diffusion and OpenAI’s DALL-E 2 are controversial, platforms including DeviantArt and Canva have adopted them to power creative tools, personalize brands, and recommend new products.

But the technology at the heart of these systems is capable of more than creating art. A technique called diffusion is being used by some daring research groups to produce music, synthesize DNA sequences, and even discover new drugs.

So what is diffusion, exactly, and why is it such a big leap over the previous state of the art? As the year winds down, it’s worth looking back at diffusion’s origins and how it progressed over time to become the influential force it is today. The history of diffusion isn’t over — improvements to the techniques arrive every month — but the past year or two has brought particularly impressive progress.

The birth of diffusion

You may remember the wave of deepfake apps from several years ago: apps that inserted people’s faces into target images and videos to create realistic-looking substitutions of the original subjects. Using AI, the apps “implant” a person’s face, or in some cases their whole body, into a scene, often convincingly enough to fool someone at first glance.

Many of these apps rely on an AI technology called generative adversarial networks, or GANs. A GAN consists of two parts: a generator, which creates synthetic examples (e.g. images) from random data, and a discriminator, which attempts to distinguish the synthetic examples from real examples in a training data set. (Typical GAN training data sets contain hundreds to millions of examples of the thing the GAN is expected to eventually generate.) The generator and discriminator improve in tandem until the discriminator can no longer tell the real examples from the synthesized ones with better than the 50% accuracy expected by chance.
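The generator-versus-discriminator dynamic described above can be sketched in a few lines of NumPy. This is a purely illustrative toy, not any production GAN: the generator and discriminator here are simple linear models, and real systems are deep networks trained by alternating gradient updates.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z, w):
    """Map random noise vectors z to synthetic 1-D 'examples'."""
    return z @ w  # a linear generator, purely illustrative

def discriminator(x, v):
    """Score each example: output near 1 = 'looks real', near 0 = 'looks fake'."""
    logits = x @ v
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid

# Toy training data: 'real' examples drawn from N(3, 1); input noise from N(0, 1).
real = rng.normal(3.0, 1.0, size=(64, 4))
z = rng.normal(size=(64, 4))

w = rng.normal(size=(4, 4))  # generator parameters
v = rng.normal(size=(4,))    # discriminator parameters

fake = generator(z, w)
scores_real = discriminator(real, v)
scores_fake = discriminator(fake, v)

# The adversarial objective: the discriminator is trained to minimize this loss,
# the generator to maximize it. Training alternates between the two until
# scores_real and scores_fake are indistinguishable (both hovering near 0.5).
loss = -(np.log(scores_real + 1e-9).mean() + np.log(1 - scores_fake + 1e-9).mean())
print(loss)
```

The key point is the tug-of-war encoded in the single loss: any improvement to one model makes the other model’s job harder, which is exactly what makes joint training unstable in practice.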

Harry Potter and Hogwarts sand sculptures created by Stable Diffusion. Image Credits: Stability AI

High-performing GANs can, for example, create realistic snapshots of fictional apartment buildings. StyleGAN, an Nvidia system developed a few years ago, can generate high-resolution headshots of fictional people by learning attributes like facial expression, freckles, and hair. Beyond image generation, GANs have been applied to spaces such as 3D modeling and vector graphics, and have shown an aptitude for generating video clips as well as samples of speech and music.

In practice, however, GANs suffered from several shortcomings owing to their architecture. Training the generator and discriminator simultaneously was inherently unstable; sometimes the generator would “collapse” and output many similar-seeming samples. GANs also required large amounts of data and compute to train and run, which made them difficult to scale.

Enter diffusion.

How diffusion works

Diffusion draws its inspiration from physics, where diffusion is the process by which something moves from a region of higher concentration to one of lower concentration, like a sugar cube dissolving in coffee. The sugar granules in the coffee are initially concentrated at the top of the liquid, but gradually disperse.

Diffusion systems borrow specifically from diffusion in non-equilibrium thermodynamics, where the process increases the entropy, or randomness, of the system over time. Consider a gas: it will eventually spread out to fill an entire space evenly through random motion. Similarly, data such as images can be transformed into a uniform distribution by successively adding random noise.

A diffusion system gradually destroys the structure of data by adding noise until nothing remains but noise.

In physics, diffusion is spontaneous and irreversible: sugar diffused in coffee cannot reassemble into cube form. But diffusion systems in machine learning aim to learn a sort of “reverse diffusion” process to restore the destroyed data, gaining the ability to recover the data from noise.
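The forward “destroy the data with noise” process has a standard closed form in the diffusion-model literature: at step t, the data is mixed with Gaussian noise according to a noise schedule. Here is a minimal NumPy sketch of that forward process under a simple linear schedule; the schedule values are illustrative choices, and the reverse process (a neural network trained to predict the added noise so it can be subtracted out step by step) is only described in the comments, not implemented.

```python
import numpy as np

rng = np.random.default_rng(0)

# A linear noise schedule: beta_t is how much noise is injected at step t,
# and alpha_bar_t is how much of the original signal survives up to step t.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)

def forward_diffuse(x0, t, noise):
    """Jump directly to step t of the forward process:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise

x0 = rng.normal(size=(8, 8))     # stand-in for an image
noise = rng.normal(size=x0.shape)

slightly_noised = forward_diffuse(x0, 10, noise)   # structure mostly intact
fully_noised = forward_diffuse(x0, T - 1, noise)   # essentially pure noise

# Early on the data dominates; by the final step almost no signal remains.
# A reverse-diffusion model learns to predict `noise` from (x_t, t), which
# lets it walk the chain backward and recover data from noise.
print(alphas_bar[10], alphas_bar[T - 1])
```

Because `alpha_bar` collapses toward zero as t grows, the last step is indistinguishable from random noise, which is precisely why a model that can invert the chain can generate new samples starting from nothing but noise.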

Stability AI’s OpenBioML. Image Credits: OpenBioML

Diffusion systems have been around for close to a decade. But a relatively recent innovation from OpenAI called CLIP (short for “Contrastive Language-Image Pre-Training”) made them far more practical in everyday applications. CLIP classifies data, such as images, to “score” each step of the diffusion process based on how likely the data is to be classified under a given text prompt (e.g. “a sketch of a dog in a field of flowers”).

At the start, the data scored by CLIP receives a very low score, because it is mostly noise. But as the diffusion system reconstructs data from the noise, it gradually comes closer to matching the prompt. A useful analogy is a block of uncarved marble: like a master sculptor telling a novice where to carve, CLIP guides the diffusion system toward an image that yields a high score.

OpenAI introduced CLIP alongside its image-generating system DALL-E. Since then, CLIP has found its way into DALL-E’s successor, DALL-E 2, as well as open source alternatives such as Stable Diffusion.

What can diffusion do?

So what can CLIP-guided diffusion models do? Well, as mentioned earlier, they’re quite good at generating art, from photorealistic renderings to sketches, drawings, and paintings in the style of practically any artist. In fact, there’s evidence suggesting that they can problematically regurgitate some of their training data.

But the models’ talents, controversial as they are, don’t end there.

Researchers have also experimented with using guided diffusion models to compose new music. Harmonai, an organization with financial backing from Stability AI, the London-based startup behind Stable Diffusion, has released a diffusion-based model that can output clips of music after training on hundreds of hours of songs. More recently, developers Seth Forsgren and Hayk Martiros created a hobby project called Riffusion that cleverly uses a diffusion model trained on spectrograms, visual representations of audio, to generate ditties.
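The trick behind the spectrogram approach is that audio can be turned into an image and back: slice the waveform into frames, take a Fourier transform of each frame, and the resulting grid of magnitudes is a picture an image-diffusion model can learn from. A minimal NumPy sketch of the forward audio-to-spectrogram step (the parameters and the test tone are illustrative; Riffusion’s actual pipeline also inverts spectrograms back to audio, which is not shown here):

```python
import numpy as np

sr = 8000                            # sample rate in Hz
t = np.arange(sr) / sr               # one second of audio
tone = np.sin(2 * np.pi * 440 * t)   # a pure 440 Hz sine wave

# Slice the waveform into overlapping frames and FFT each one: the grid of
# magnitudes is a spectrogram, i.e. an image a diffusion model can train on.
frame, hop = 512, 256
frames = np.stack([tone[i:i + frame] for i in range(0, len(tone) - frame, hop)])
spec = np.abs(np.fft.rfft(frames * np.hanning(frame), axis=1))

# Each row is a time step, each column a frequency bin; the 440 Hz tone shows
# up as a bright stripe. Riffusion runs this in reverse: generate the image
# with diffusion, then invert it back into audible sound.
peak_bin = spec.mean(axis=0).argmax()
peak_hz = peak_bin * sr / frame
print(spec.shape, peak_hz)
```

Recovering the tone’s frequency from the brightest column shows why the representation works: the spectrogram preserves the musically meaningful structure of the audio in a purely visual form.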

Beyond the music realm, several labs are attempting to apply diffusion tech to biomedicine, in the hopes of discovering novel treatments for disease. Startup Generate Biomedicines and a team at the University of Washington have trained diffusion-based models to produce designs for proteins with specific properties and functions, MIT Tech Review reported earlier this month.

The models work in different ways. Generate Biomedicines’ model adds noise by unraveling the amino acid chains that make up a protein, then joins random chains together to form a new protein, guided by constraints specified by the researchers. The University of Washington model, by contrast, starts with a scrambled structure and uses information about how the pieces of a protein should fit together, provided by a separate AI system trained to predict protein structure.

Image Credits: Pasieka/Science Photo Library/Getty Images

They’ve already achieved some success. The model designed by the University of Washington group was able to find a protein that attaches to parathyroid hormone, the hormone that controls calcium levels in the blood, better than existing drugs can.

Meanwhile, over at OpenBioML, a Stability AI-backed effort to bring machine learning-based approaches to biochemistry, researchers have developed a system called DNA-Diffusion to generate cell-type-specific regulatory DNA sequences, the segments of nucleic acid molecules that influence the expression of specific genes within an organism. DNA-Diffusion will, if all goes according to plan, generate regulatory DNA sequences from text instructions such as “a sequence that will activate a gene to its maximum expression level in cell type X” and “a sequence that activates a gene in liver and heart, but not in brain.”

What might the future hold for diffusion models? The sky may well be the limit. Already, researchers have applied them to generating videos, compressing images, and synthesizing speech. That’s not to say diffusion won’t eventually be replaced by a more efficient, more performant machine learning technique, just as GANs were by diffusion. But it’s the architecture du jour for a reason; diffusion is nothing if not versatile.


