As ZDNET reported on Monday, stock photography giant Getty Images has unveiled a generative artificial intelligence (AI) image service that it says is “safe” to use because it is trained on Getty’s licensed content library and, therefore, does not run the same risk of copyright infringement as other generative programs.
The launch follows Getty’s announcement of a generative AI capability in September. At the time, that capability was presented only as a demo, whereas the iStock site is open for business now.
Getty’s service, developed with AI chip giant Nvidia, was unveiled at the annual CES trade show in Las Vegas. The program arrives amid a legal firestorm over copyright infringement: a week earlier, the New York Times sued Microsoft and OpenAI over alleged copyright infringement, and scholars have documented how the image AI program Midjourney can be prompted to reproduce protected images from movies.
Getty emphasizes that its program provides indemnification to users. The content license agreement posted after signing up specifies that “iStock’s total maximum aggregate liability (meaning the total amount iStock is responsible for, whether under this agreement or any other agreement for the same content) is limited to $10,000 US dollars per item of content.” An “extended” indemnification of $250,000 per content item can be purchased as an additional capability.
I took the program, “Generative AI by iStock”, for a spin, using the introductory $14.99 bucket of 100 image generations, and found it a workable alternative to OpenAI’s DALL-E and Stability AI’s Clipdrop.
To get started, I created an account on istockphoto.com and entered details for a credit card, which was instantly billed $14.99. I was then faced with a blank prompt box. After entering a prompt, the results appeared four images at a time, with each batch of four counting as one of the initial 100 generations in the bucket.
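The pricing above invites a quick back-of-envelope calculation. The figures below come straight from the terms described in this story ($14.99 for 100 generations, four images per batch); the effective per-image cost is simply derived from them:

```python
# Back-of-envelope cost math for iStock's introductory bucket,
# based on the figures above: $14.99 for 100 generations,
# each generation returning a batch of four images.
bucket_price = 14.99       # USD for the introductory bucket
generations = 100          # prompt submissions included
images_per_generation = 4  # each batch of four counts as one generation

cost_per_generation = bucket_price / generations
cost_per_image = cost_per_generation / images_per_generation
total_images = generations * images_per_generation

print(f"${cost_per_generation:.4f} per generation")  # $0.1499
print(f"${cost_per_image:.4f} per image")            # $0.0375
print(f"{total_images} images if the full bucket is used")  # 400 images
```

In other words, a user who exhausts the bucket pays roughly 15 cents per prompt, or under 4 cents per individual image.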
I tried the same prompts on DALL-E and Clipdrop. The results from iStock were noticeably less interesting, both aesthetically and narratively, and overall rather obvious to the point of being bland. But the images generally accorded with the prompts provided.
For example, to create an imaginary scenario of apples inside some kind of experiment, I had previously submitted to DALL-E the prompt, “An apple inside of a bottle lying on its side, with apples on either side of the bottle.” That produced a vivid scene of a table full of interesting science-like instruments. The version by iStock is appropriate to the prompt, but far less interesting (see below).
Another wild prompt was used to dramatize an imaginary impossible computer: “An incredibly complex computer the size of a room with hundreds of gears, levers and dials and a digital interface”. In Clipdrop, that prompt produced an intriguing, detailed scene of a room with various machine parts, with detailed texture and a doorway that had an ominous air to it. In iStock, the result was simply what looked like a concentration of gears, with none of the implicit drama that made the Clipdrop image interesting.
A third example, also in Clipdrop, was meant to dramatize cloud computing as a mysterious realm. I offered the prompt, “Hundreds of tiny workers with cranes building castles in the sky, photographic.” In Clipdrop, that prompt led to a depiction of a construction site centered on a sort of Tower of Babel, an interesting improvisational touch by Clipdrop that went beyond the explicit prompt.
The iStock rendering, again, had all the elements mentioned but added up to a rather bland, very literal rendering, devoid of any atmosphere or mood.
Obviously, prompt engineering may yield more creative uses of iStock over time. Out of the box, however, its results are fairly dull. The program seems to mostly pick up on the simplest elements of the prompt and stick them in the frame.
There appears to be very little ability to parse complex ideas, such as “Inside of a raindrop as if you are a tiny, tiny person who is seeing all the little creatures that live and work and play in there,” which requires composing elements at multiple levels in ways that are not realistic.
In fact, when iStock does realize a fantastical situation, the results seem rather degraded compared to more realistic scenarios, as with the prompt, “A fleet of trucks driving up a waterfall outside a fairytale kingdom,” in the illustration at the top of this story.
There are, however, significant qualifications and limitations to the indemnification Getty provides. The content license agreement notes that the coverage stops where the user provides prompts that mention copyrighted material.
“iStock’s indemnification obligations do not apply to the extent you generate content that includes prompts or inputs that include the names, likeness of real people, trademark, trade dress, logos, works of art or architecture or other elements protected by third-party intellectual property rights that you do not have the right to use,” the agreement states.
I tried out several controversial image prompts that scholars Gary Marcus and Reid Southen have claimed can be used in Midjourney to reproduce copyrighted images. In each case, either iStock produced an image that did not appear to have any obvious aspects of copyrighted material, or the program would not generate an image and produced a warning that the prompt was blocked because it was not compliant.
For example, the phrase “protocol droid from classic sci-fi movie” was used by Marcus and Southen in Midjourney to reproduce images that are almost identical to images of the droid C-3PO from Star Wars. The same prompt with iStock produced several images that look like toy robots, but they have nothing to do with Star Wars.
In another instance, the phrase “man in robes with light sword, screencap” was used by Marcus and Southen to induce Midjourney to produce an almost exact replica of a shot of Obi-Wan Kenobi from Star Wars. In iStock, the same prompt produced not only a refusal to generate an image, but also a warning that the word “sword” was forbidden because it “may violate our AI policy.”
Some brands might slip through the filter, however. Typing “The ZDNET journalists as interstellar superheroes” produced images of costumed people with a heroic air about them.
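iStock has not published how its prompt moderation works, so the following is only a minimal sketch of the general technique its warnings suggest: a keyword blocklist that rejects any prompt containing a forbidden term. Every entry in the list below is a hypothetical example except “sword,” which this story observed being flagged:

```python
import re

# Minimal sketch of a keyword-blocklist prompt filter, the kind of
# mechanism iStock's warnings suggest. The service's actual rules are
# proprietary; all blocklist entries except "sword" are hypothetical.
BLOCKED_TERMS = {"sword", "weapon", "gun"}

def check_prompt(prompt: str):
    """Return (allowed, offending_term). Each blocked term is matched
    as a whole word, case-insensitively."""
    for term in BLOCKED_TERMS:
        if re.search(rf"\b{re.escape(term)}\b", prompt, re.IGNORECASE):
            return False, term
    return True, None

print(check_prompt("man in robes with light sword, screencap"))
# (False, 'sword')
print(check_prompt("protocol droid from classic sci-fi movie"))
# (True, None)
```

Real moderation pipelines are typically far more elaborate (classifiers rather than word lists), but even this crude version reproduces the behavior observed above: “sword” trips the filter, while brand-adjacent phrasings such as “protocol droid” pass untouched — which is also why prompts like the ZDNET-superheroes example can slip through.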