DALL-E & Midjourney: Is AI-Generated Imagery the Next Big Thing?

With artificial intelligence shaking up the world, new tools like OpenAI’s DALL-E 2 and Midjourney allow anybody to become an artist. Entering a text prompt such as “two bears walking down a futuristic street,” for example, outputs an image that illustrates the user’s input. These tools are having a disruptive effect on graphic art and are already sparking a debate among traditional artists about how they may affect the industry’s future. A logical near-term advancement may be systems that animate these generated images.
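As a concrete illustration, the sketch below shows roughly what such a request looks like in code, assuming access to OpenAI’s image API through its official Python SDK; the prompt and settings are illustrative, and an API key is expected in the environment.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Ask the model to illustrate a text prompt, as described above.
result = client.images.generate(
    prompt="two bears walking down a futuristic street",
    n=1,               # number of images to generate
    size="1024x1024",  # output resolution
)
print(result.data[0].url)  # link to the generated image
```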

AI animation could disrupt the entire animation industry

One of the earliest technological shifts in hand-drawn animation was rotoscoping. This process of tracing over live-action footage helped animators make character movements more realistic. In recent years, motion capture has become one of the preferred ways to animate 3D character models, shortening production time and making animations more lifelike.

DALL-E is what artificial intelligence researchers call a neural network, which is a mathematical system loosely modeled on the network of neurons in the brain. That is the same technology that recognizes the commands spoken into smartphones and identifies the presence of pedestrians as self-driving cars navigate city streets.

A neural network learns skills by analyzing large amounts of data. By pinpointing patterns in thousands of cat photos, for example, it can learn to recognize a cat. DALL-E looks for patterns as it analyzes millions of digital images as well as text captions that describe what each image depicts. This way, it learns to recognize the links between the pictures and the words. When someone describes an image for DALL-E, it generates a set of key features that this image might include. One feature might be the line at the edge of a trumpet, and another might be the curve at the top of a teddy bear’s ear.
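To make that concrete, here is a minimal, illustrative sketch in PyTorch of how a network can learn links between pictures and words: toy encoders with random weights map images and captions into one shared embedding space, and a contrastive objective rewards each image for matching its own caption. The layer sizes and the batch of four are stand-ins for the vastly larger models and datasets used in practice.

```python
import torch
import torch.nn.functional as F

# Toy encoders standing in for large image and text networks.
# Each maps its input into the same shared embedding space.
image_encoder = torch.nn.Linear(2048, 512)  # e.g., features from a vision backbone
text_encoder = torch.nn.Linear(300, 512)    # e.g., pooled caption embeddings

image_features = torch.randn(4, 2048)  # a batch of 4 images (fake features)
text_features = torch.randn(4, 300)    # their 4 matching captions (fake features)

# Project both modalities into the shared space and normalize.
img_emb = F.normalize(image_encoder(image_features), dim=-1)
txt_emb = F.normalize(text_encoder(text_features), dim=-1)

# Cosine-similarity matrix: entry (i, j) scores image i against caption j.
logits = img_emb @ txt_emb.T

# Contrastive objective: each image should score highest with its own
# caption (the diagonal), teaching the model the picture-word links.
labels = torch.arange(4)
loss = (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2
print(loss.item())
```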

Then, a second neural network, called a diffusion model, creates the image, generating the pixels needed to realize those features. The latest version of DALL-E, unveiled alongside a new research paper describing the system, generates high-resolution images that, in many cases, look like photographs. Though DALL-E often fails to understand what someone has described and sometimes mangles the image it produces, OpenAI continues to improve the technology. Researchers can often refine the skills of a neural network by feeding it even larger amounts of data.
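The loop below is a heavily simplified sketch of that reverse-diffusion process: it starts from pure noise and repeatedly subtracts a predicted noise estimate until an image remains. The denoiser here is a toy stand-in; a real system runs a large trained network, conditioned on the text-derived features, at every step.

```python
import torch

# Toy stand-in for the trained denoising network; a real diffusion model
# uses a large neural network conditioned on the text features.
def predict_noise(noisy_image, step):
    return 0.1 * noisy_image  # illustrative prediction only

steps = 50
image = torch.randn(1, 3, 64, 64)  # start from pure Gaussian noise

# Reverse diffusion: strip away a little predicted noise at each step,
# so an image gradually emerges from static.
for step in reversed(range(steps)):
    image = image - predict_noise(image, step)          # simplified update
    if step > 0:
        image = image + 0.01 * torch.randn_like(image)  # small stochastic term

print(image.shape)  # the final tensor is the generated image
```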

Using art history to preview AI art’s future

AI-generated art is analogous to the practice of artists like Andy Warhol, who had assistants churn out silk-screen images on his behalf. These screens were often based on pop-culture references and pre-existing photos he did not shoot himself. Warhol’s Factory assistants were to him what DALL-E is to its users. In that regard, automation in art previewed the rise of AI in art, with code now standing in for teams of assistants.

AI images may optimize production work

Regardless of industry skepticism, rotoscoping and motion capture were eventually widely adopted as acceptable tools of the animation trade. In both cases, however, a human hand was still directly involved in producing the visuals. With AI-generated images, the animation process may evolve to require human input only in the back-end algorithm and the front-end prompts that dictate a scene. Some imagine AI art generation tools as a natural evolution of software like Photoshop and Procreate. Others see image-generating AI as a force with the potential to displace thousands of people who support their families as professional illustrators. In the case of animation, this shift could cost many animators and special-effects artists working in the US their jobs.

Nevertheless, with the rapid growth of AI art systems, it is only a matter of time before they are powerful enough to produce a form of animated film, turning scriptwriters into art directors and ushering in a new phase of animation.

DALL-E can also edit photos. You can erase a teddy bear’s trumpet and ask for a guitar instead; a guitar appears between the furry arms. A team of seven researchers spent two years developing the technology, which OpenAI plans to eventually offer as a tool for people like graphic artists, providing new shortcuts and new ideas as they create and edit digital images. Computer programmers already use Copilot, a tool based on similar technology from OpenAI, to generate snippets of software code. But for many experts, DALL-E is also a cause for concern, as noted earlier.
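For readers curious how that editing workflow looks in practice, here is a minimal sketch, assuming access to the image-edit endpoint in OpenAI’s official Python SDK; the file names and prompt are illustrative, and the transparent region of the mask marks where the erased trumpet should become a guitar.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Inpainting sketch: transparent pixels in mask.png mark the erased
# region (the trumpet); the prompt asks the model to fill it in.
result = client.images.edit(
    image=open("teddy_bear.png", "rb"),  # illustrative file names
    mask=open("mask.png", "rb"),
    prompt="a teddy bear holding a guitar",
    n=1,
    size="1024x1024",
)
print(result.data[0].url)  # link to the edited image
```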

AI-powered image manipulation, including spoof imagery of real people known as “deepfakes,” has become a concern for AI researchers, lawmakers, and nonprofits that work on online harassment. Advances in machine learning could enable many valuable uses for AI-generated imagery, but also malicious ones, such as spreading lies or hate.
Experts believe researchers will continue to hone such systems. Ultimately, those systems could help companies improve search engines, digital assistants, and other standard technologies, and automate new tasks for graphic artists, programmers, and other professionals.