Real-time image generation in one second: how fast is text-to-image AI advancing?

aitubo

Dec 3, 2023 • 5 min read

On November 28, Stability AI released SDXL Turbo, an open-source text-to-image model, on its official website. The model can generate images from text prompts in real time.
Compared with DALL·E 3, Midjourney and Stable Diffusion, SDXL Turbo generates images far faster and lets users refine them in real time. Generating a picture generally takes no more than 2 seconds.

According to Stability AI, SDXL Turbo builds on SDXL 1.0 and uses a new technique called Adversarial Diffusion Distillation (ADD). This cuts the number of image generation steps from 50 to 1 without sacrificing image quality.
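A rough way to see what the step reduction buys, under simplifying assumptions of our own (not Stability AI's): suppose every denoising step costs about the same time $t_{\text{step}}$, and the remaining work that does not depend on the step count (such as encoding the prompt and decoding the final image) costs $t_{\text{fixed}} \ge 0$. Then the ratio of 50-step to 1-step latency is:

```latex
\[
\frac{T_{50}}{T_{1}}
  = \frac{50\,t_{\text{step}} + t_{\text{fixed}}}{t_{\text{step}} + t_{\text{fixed}}}
  \le 50,
\qquad \text{with equality only when } t_{\text{fixed}} = 0 .
\]
```

In other words, going from 50 steps to 1 gives at most a 50-fold speedup, and the real gain approaches that bound only when the per-step cost dominates the total.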

A user on X shared that he had generated 256 images in 24 seconds with SDXL Turbo, and another user, "HylaruCoder", reported that generating an image took 0.3 seconds on a 4060 Ti.
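The two reports can be put on a common footing with some back-of-the-envelope arithmetic. This is a minimal sketch that assumes the 256 images came from one continuous 24-second run; the source does not say what hardware that first user had, and a batched run is not directly comparable to single-image latency, so the comparison is indicative only. The function names are our own.

```python
# Back-of-the-envelope check of the SDXL Turbo speed reports quoted above.
# Assumption: the 256 images were produced in one continuous 24-second run.
# The hardware for that run is unknown, so it is not a like-for-like
# comparison with the 4060 Ti figure.

def per_image_seconds(num_images: int, total_seconds: float) -> float:
    """Average wall-clock time per image."""
    return total_seconds / num_images

def images_per_second(num_images: int, total_seconds: float) -> float:
    """Average throughput in images per second."""
    return num_images / total_seconds

batch_latency = per_image_seconds(256, 24.0)  # 0.09375 s per image
batch_rate = images_per_second(256, 24.0)     # roughly 10.7 images per second
single_rate = 1 / 0.3                          # 4060 Ti report: roughly 3.3 images per second

print(f"256 images in 24 s: {batch_latency:.4f} s/image, {batch_rate:.2f} images/s")
print(f"0.3 s per image:    {single_rate:.2f} images/s")
```

Expressing both figures as images per second makes it clear that the first report implies roughly three times the throughput of the second, which likely reflects differences in hardware and batching rather than in the model.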

Did you expect text-to-image technology to advance this quickly?


A year ago, AI could only generate rough, outline-like pictures; now generating images in real time is no longer a problem. AI-created comic sequels have even been published and sold, marking the technology's entry into commercial use.

On November 22, "Heart of the Machine," a new "Black Jack" story created with the assistance of AI, was officially published and went on sale.


One year ago vs. now: a huge change in text-to-image generation

The GAN (Generative Adversarial Network) was long the standard approach to image generation. Since GANs, AI painting models have gone through several generations, with DALL·E, Imagen, and diffusion models emerging one after another.

Beyond the differences between the models themselves, how has text-to-image output changed over the progression from Disco Diffusion to Midjourney to SDXL Turbo?

The picture above was generated by Midjourney in August 2022, and the picture below was generated from the same prompt at the end of November 2023.

Image generated by Midjourney, August 2022

Image generated by Midjourney, November 2023


Both pictures use the same prompt: "Batman (on the left) and Dwight Schrute (on the right) are in a fistfight in a parking lot in Scranton, Pennsylvania. Dramatic lighting. Photo realistic. Monochrome. High detail."

The most obvious difference is in scene construction: the AI now depicts the specified setting clearly and separates the characters distinctly from the background.

AI has also made great progress in understanding prompts and in imagination. It can render the facial details of both characters and shows a certain aesthetic sense in composition and perspective.

The interpretation of the characters' movements has also changed significantly. For example, the pictures below were generated with the prompt "a dancing man."

Image generated by Midjourney, September 2022


Image generated by Midjourney, end of November 2023


Current text-to-image technology largely avoids visual flaws such as pixelation and blurring, producing sharper, more realistic images with more detailed contours. Even AI's much-ridiculed "inability to draw hands" has improved over the past year.

Compared with a year ago, generation speed has improved dramatically, and image quality has not suffered for it.

On Reddit, many users discussed how AI art has evolved.

One user said, "It's like people pointing out minor flaws in GPT-4 and concluding that AI will stay at its current level for half a century. They are proven wrong time and time again, so never underestimate artificial intelligence."


Another joked that the "now" version would be: "I cannot generate the image you requested because it does not comply with our content policy (copyright)."

Others argued that "one year ago" was an exaggeration and that the earlier image was closer to two years old, while agreeing that the progress is visible to the naked eye.


How hard is it for AI painting to reach a publishable level?

With text-to-image technology developing so rapidly, using AI to create comics has become an inevitable trend.

Since the announcement that AI was being used to create a sequel to "Black Jack," discussion of AI comic creation on social platforms has never stopped.

One user commented that it is strange for Japan, home to some of the most talented cartoonists in the world, to rely on AI to publish comics.

Some comic fans on X believe that "these characters are perfect because they were created by humans."


Others were skeptical: "AI can write 10,000 plots, but does a single one express Osamu Tezuka's thoughts?"

But generating comics with AI is not that simple, and the post-production work involved is far greater than one might imagine.

Some cartoonists noted that the "Black Jack" project is more like research than comic creation.

For the storyline, project members had to break down the original comics' world view, plots, dialogue, and character settings, feed this material to GPT-4, and have GPT-4 learn the style and ideas of Osamu Tezuka's earlier works in order to generate dialogue and storyboards.

The dialogue and story content were then handed to Stable Diffusion for image generation. Finally, the project team organized and refined the results into a work fit for publication.

According to NHK, to imitate Osamu Tezuka's style, the AI drew on more than 6,000 character images and learned from 65 of Tezuka's works.

Fed with this large body of data, the AI generated the basic plot and character settings, but the final fine adjustments and design were still done by human creators.
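The division of labor described above can be pictured as a three-stage pipeline in which humans make the final call. This is a purely illustrative sketch: every function and class name is hypothetical, and each stage only passes placeholder strings along where the real project used GPT-4, Stable Diffusion, and human editors. It is not the project team's actual tooling.

```python
# Hypothetical sketch of the AI-assisted comic workflow described above.
# No real model APIs are called: each stage is a stand-in that passes
# placeholder data to the next one.
from dataclasses import dataclass, field


@dataclass
class ComicDraft:
    source_notes: list[str]                               # deconstructed world view, plot, dialogue, characters
    script: list[str] = field(default_factory=list)      # dialogue/storyboard text (language-model stage)
    panels: list[str] = field(default_factory=list)      # image placeholders (text-to-image stage)
    approved: bool = False                                # set only by the human review stage


def write_script(draft: ComicDraft) -> ComicDraft:
    # Stand-in for prompting a language model with the deconstructed material.
    draft.script = [f"scene based on: {note}" for note in draft.source_notes]
    return draft


def render_panels(draft: ComicDraft) -> ComicDraft:
    # Stand-in for generating one image per script entry with a text-to-image model.
    draft.panels = [f"<image for '{line}'>" for line in draft.script]
    return draft


def human_edit(draft: ComicDraft) -> ComicDraft:
    # Stand-in for the team organizing and finalizing the work; here it only
    # approves drafts that have one panel per script entry.
    draft.approved = len(draft.script) > 0 and len(draft.panels) == len(draft.script)
    return draft


def run_pipeline(notes: list[str]) -> ComicDraft:
    draft = ComicDraft(source_notes=notes)
    for stage in (write_script, render_panels, human_edit):
        draft = stage(draft)
    return draft


result = run_pipeline(["world view", "character settings"])
print(result.approved, len(result.panels))
```

The point of the structure is that the generative stages only produce drafts; nothing is marked as finished until the human stage signs off, mirroring the article's note that the final adjustments were still made by people.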

Some cartoonists said that thanks to the models' improved resolution, details such as hands can now be generated through fine-tuning, a qualitative leap from the days when even the outlines and overall structure came out wrong.

What is surprising is not that AI can generate comics, but how fast it iterates: brushwork skills that cartoonists spend decades honing, AI has acquired within a single year.

Of course, AI-generated images still have shortcomings. No open-source software can yet keep characters consistent across images, large panoramic scenes are comparatively easy to generate while storyboarding remains difficult, and scene consistency is still an issue. Still, both the successful publication of the AI-assisted "Black Jack" sequel and the AI-assisted comic experiments by various creators have left a strong mark on the evolution of text-to-image technology.

To what extent can AI assist us at this stage? Perhaps each new work is part of the answer.