CVPR17 Spotlight: Automatic Understanding of Image and Video Advertisements

Good afternoon everyone, my name is Mingda Zhang and I am from the University of Pittsburgh. Today I am very excited to share our work on the automatic understanding of advertisements. Our community has made great progress in visually recognizing physical content, but for purposefully designed images, there is more to tell. Specifically, image advertisements employ persuasive visual rhetoric, and understanding such images requires us to infer what the content implies.

For example, in this image, an advanced recognition system correctly recognizes "airport" and "luggage", and even generates a reasonable caption. However, it still misses the point: this woman is coming back from vacation, she has souvenirs made from dead animals in her suitcase, and the ad is trying to tell us that buying animal souvenirs is cruel. Similarly, in this image, a vision system recognizes the zebra in this hybrid but misses the hippo, which is required to infer that the product is both fast like a zebra and large like a hippo. And this last example visually references cultural background.
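To make that gap concrete, here is a minimal sketch of the kind of output an off-the-shelf recognizer produces. It is an illustration only; the talk does not name a specific model, so this uses a standard pretrained torchvision classifier and a placeholder image path.

```python
import torch
from PIL import Image
from torchvision import models

# Minimal sketch: run an off-the-shelf ImageNet classifier on an ad image
# and print its top-5 object labels. "ad_image.jpg" is a placeholder path.
weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()

batch = preprocess(Image.open("ad_image.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    probs = model(batch).softmax(dim=1)

values, indices = probs.topk(5)
for p, idx in zip(values[0], indices[0]):
    # Prints literal content labels ("backpack", "zebra", ...),
    # not the message the ad designer intended.
    print(f"{weights.meta['categories'][idx]}: {p:.2f}")
```

Output like "suitcase: 0.91" is correct as far as it goes, but it says nothing about why the suitcase is in the picture.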
As we have just seen, existing computer vision systems are insufficient to capture the rhetoric and decode the meaning of ads. In fact, advertisements use diverse strategies to convey a message, and we have identified several of these, which correspond to insufficiencies of current vision systems. For example, ads might employ symbolism, where objects or regions in the image, like blood, refer to abstract concepts outside the image, like injury or death. Other symbols might stand for speed, danger, or strength. As these examples show, associations between objects and abstract concepts can be quite arbitrary, and the same concept can be illustrated with diverse visuals, thus posing a learning challenge.
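As a toy illustration of why symbolism resists simple solutions, here is a hand-built lookup from objects to abstract concepts. The mappings are hypothetical; the point is that real associations are too arbitrary and context-dependent for a fixed table, which is exactly what makes this a learning problem.

```python
# Hypothetical object-to-concept associations, following the examples
# in the talk (blood -> injury/death; other symbols for speed, danger,
# strength). A real system must learn these many-to-many links from data.
SYMBOL_MAP = {
    "blood": ["injury", "death"],
    "cheetah": ["speed"],
    "skull": ["danger", "death"],
    "lion": ["strength"],
}

def symbols_for(detected_objects):
    """Collect candidate abstract concepts for a list of detected labels."""
    concepts = set()
    for obj in detected_objects:
        concepts.update(SYMBOL_MAP.get(obj, []))
    return sorted(concepts)

print(symbols_for(["blood", "skull"]))  # ['danger', 'death', 'injury']
```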
Other challenges involve understanding physical processes, semantic contrast, atypical objects, and so on. Even with a perfect recognition system, more than half of all ads cannot be understood, but we have identified concrete research tasks that would help enable more complete ad understanding.
To address this interesting problem, we collected a large, richly annotated dataset containing over sixty-four thousand ads, each labeled by human annotators. We have shared the dataset with the community at this link. We also developed a companion video ads dataset, available at the same link.

Here is one sample annotation from our dataset. Our annotations cover the topic of the ad, the sentiment it provokes in viewers, the strategies it relies on, and questions and answers about what action the viewer is prompted to take. Each ad is annotated by three to five annotators.
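For concreteness, here is what such a record might look like as JSON. The field names and schema are assumptions shaped after the fields just described; the released dataset may organize them differently, so check the documentation at the link above.

```python
import json

# Hypothetical annotation record with the fields described in the talk:
# topic, sentiment, strategy, question/answer pairs, and annotator count.
record = {
    "image": "ads/psa/0001.jpg",
    "topic": ["animal rights"],
    "sentiment": ["disturbed"],
    "strategy": ["symbolism"],
    "qa": [
        {
            "question": "Why should the viewer avoid buying animal souvenirs?",
            "answer": "Because buying animal souvenirs is cruel.",
        }
    ],
    "num_annotators": 3,  # each ad is labeled by 3 to 5 annotators
}
print(json.dumps(record, indent=2))
```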
Here are some examples of car, clothing, and environment ads. We have both commercial ads and public service announcements, with diverse ads within each category. Here are some other examples of ads that make viewers feel alarmed, amused, or disturbed.

We observe some interesting trends in the sentiments that ads provoke. For example, "confidence" is common in "car" and "beauty" ads, but less common in "restaurant" ads. "Environment protection" ads leave viewers feeling less uncomfortable, or more "empathetic", than other public service announcements.
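Trends like these can be read off a simple topic-by-sentiment cross-tabulation. The sketch below uses made-up rows and assumed column names purely to show the computation, not real dataset statistics.

```python
import pandas as pd

# Toy data: one (topic, sentiment) label pair per row. Real annotations
# would be flattened into this shape before cross-tabulating.
df = pd.DataFrame({
    "topic":     ["car", "car", "beauty", "restaurant", "environment"],
    "sentiment": ["confident", "amused", "confident", "amused", "empathetic"],
})

# Row-normalized frequencies: how often each sentiment appears per topic.
trend = pd.crosstab(df["topic"], df["sentiment"], normalize="index")
print(trend)
```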
The core task in understanding ads is to decode the message behind the image, which we formulate as answering the question: "Why should the viewer take the action that the ad suggests?" We collected two sets of human annotations and reformatted them into a single question. Automatically answering this question is challenging, as it requires reading all the signs that the ad's designers encoded for their human audience. For example, according to the purple ad, the viewer should buy this candy because it is unique and better than all others; according to the blue ad, the viewer should stop smoking because it destroys his lungs.
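One simple way to frame a baseline for this question-answering task is to rank candidate answer statements by their similarity to the image in a shared embedding space. The embedding functions below are placeholders, since the talk does not specify the baseline's architecture; this is a sketch of the task setup, not the paper's method.

```python
import numpy as np

def embed_image(image_path):
    # Placeholder: e.g. CNN features for the ad image.
    return np.random.rand(512)

def embed_text(statement):
    # Placeholder: e.g. averaged word vectors for the statement.
    return np.random.rand(512)

def rank_answers(image_path, candidates):
    """Return candidate answers sorted by cosine similarity to the image."""
    img = embed_image(image_path)
    img = img / np.linalg.norm(img)
    scored = []
    for c in candidates:
        txt = embed_text(c)
        txt = txt / np.linalg.norm(txt)
        scored.append((float(img @ txt), c))
    return sorted(scored, reverse=True)

candidates = [
    "Because this candy is unique and better than all others.",
    "Because smoking destroys your lungs.",
]
print(rank_answers("ad_image.jpg", candidates)[0])  # top-ranked answer
```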
A simple baseline achieves eleven and a half percent accuracy on this challenging task. We also show some initial results on how capturing symbolism in images can improve these results. Our dataset and annotations are all available at this link. Thank you for your attention.
