OpenAI’s DALL-E – Draws Anything at Your Command!

DALL-E images generated from the prompt:  “Do an impressionist painting of an Italian family growing up in an apartment with a bathtub in the kitchen in the South Bronx in the 1950s.”

Dear Commons Community,

OpenAI, one of the world’s most ambitious artificial intelligence labs, which has shocked the education world and beyond with its ChatGPT essay writer, also offers a program, DALL-E, that lets you create digital images simply by describing what you want to see.

They call it DALL-E in a nod to both “WALL-E,” the 2008 animated movie about an autonomous robot, and Salvador Dalí, the surrealist painter.

OpenAI, backed by billions of dollars in funding from Microsoft, has recently started sharing the technology with the general public.

I tried DALL-E with several different requests a few days ago.  As some of you know, in 2020 I published a novel, Our Bathtub Wasn’t In the Kitchen Anymore (under the pen name Gerade DeMichele), about growing up in the Bronx in the 1950s.  I had a very difficult time finding an image for the cover.  I finally settled on a combination of images that my wife Elaine assembled after days of searching our photo albums and the Internet.

In any case, I tried DALL-E a couple of days ago with the request:

“Do an impressionist painting of an Italian family growing up in an apartment with a bathtub in the kitchen in the South Bronx in the 1950s.”

The images above were generated by DALL-E in about 30 seconds.

I was flabbergasted by its results.

A team of seven researchers spent two years developing the technology, which OpenAI plans to offer as a tool for people like graphic artists, providing new shortcuts and new ideas as they create and edit digital images. Computer programmers already use Copilot, a tool based on similar technology from OpenAI, to generate snippets of software code.

But for many experts, DALL-E is worrisome. As this kind of technology continues to improve, they say, it could help spread disinformation across the internet, feeding the kind of online campaigns that may have helped sway the 2016 presidential election.

“You could use it for good things, but certainly you could use it for all sorts of other crazy, worrying applications, and that includes deep fakes,” like misleading photos and videos, said Subbarao Kambhampati, a professor of computer science at Arizona State University.

A half decade ago, the world’s leading A.I. labs built systems that could identify objects in digital images and even generate images on their own, including flowers, dogs, cars and faces. A few years later, they built systems that could do much the same with written language, summarizing articles, answering questions, generating tweets and even writing blog posts.

Now, researchers are combining those technologies to create new forms of A.I.  DALL-E is a notable step forward because it juggles both language and images and, in some cases, grasps the relationship between the two.

“We can now use multiple, intersecting streams of information to create better and better technology,” said Oren Etzioni, chief executive of the Allen Institute for Artificial Intelligence, an artificial intelligence lab in Seattle.

DALL-E is what artificial intelligence researchers call a neural network, which is a mathematical system loosely modeled on the network of neurons in the brain. That is the same technology that recognizes the commands spoken into smartphones and identifies the presence of pedestrians as self-driving cars navigate city streets.

A neural network learns skills by analyzing large amounts of data. By pinpointing patterns in thousands of avocado photos, for example, it can learn to recognize an avocado. DALL-E looks for patterns as it analyzes millions of digital images as well as text captions that describe what each image depicts. In this way, it learns to recognize the links between the images and the words.
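To make the "find patterns in examples" idea concrete, here is a toy sketch (not OpenAI's code, and nothing like DALL-E's scale): a single artificial neuron adjusts its weights by gradient descent until it can separate two invented classes of feature vectors.

```python
import math
import random

random.seed(0)

# Invented toy features: [greenness, roundness]; label 1 = "avocado", 0 = not
data = [([0.9, 0.8], 1), ([0.8, 0.9], 1), ([0.1, 0.2], 0), ([0.2, 0.1], 0)]

w = [0.0, 0.0]  # weights, one per feature
b = 0.0         # bias
lr = 1.0        # learning rate

def predict(x):
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid squashes score to a probability

# Repeated exposure to labeled examples gradually refines the weights
for _ in range(1000):
    for x, y in data:
        err = predict(x) - y
        w[0] -= lr * err * x[0]
        w[1] -= lr * err * x[1]
        b -= lr * err

print(round(predict([0.85, 0.85])))  # avocado-like input -> 1
print(round(predict([0.15, 0.15])))  # non-avocado input -> 0
```

The same principle, scaled up to billions of weights and millions of captioned images, is what lets DALL-E associate words with visual patterns.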

When someone describes an image for DALL-E, it generates a set of key features that this image might include. One feature might be the line at the edge of a trumpet. Another might be the curve at the top of a teddy bear’s ear.

Then, a second neural network, called a diffusion model, creates the image and generates the pixels needed to realize these features. The latest version of DALL-E, unveiled on Wednesday with a new research paper describing the system, generates high-resolution images that in many cases look like photos.
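The diffusion step can be caricatured in a few lines. In this heavily simplified sketch the "denoiser" simply knows the clean target (in the real system a trained neural network predicts the noise to remove); the point is only the shape of the process: start from pure noise and remove a little of it at each step.

```python
import random

random.seed(0)

target = [0.2, 0.9, 0.5, 0.7]              # the "clean image" (4 toy pixels)
image = [random.random() for _ in target]  # start from random noise

for step in range(50):
    # Each step strips away some noise, nudging pixels toward the clean image
    image = [p + 0.2 * (t - p) for p, t in zip(image, target)]

print([round(p, 2) for p in image])  # converges to the target values
```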

Though DALL-E often fails to understand what someone has described and sometimes mangles the image it produces, OpenAI continues to improve the technology. Researchers can often refine the skills of a neural network by feeding it even larger amounts of data.

To be sure, DALL-E isn’t perfect.  I tried several other requests using more conceptual keywords such as “adaptive learning” and “online education” and the results were not very good.

Regardless, it is another step forward in the A.I. world!

Tony
