What are Multimodal Projects?

Multimodality is the idea that we can communicate in many different ways. Writer/Designer highlights 5 modes of communication: linguistic, or communication through written word; aural, or communication through sound; visual, communication through pictures; spatial, communication through how things are arranged in space; and gestural, communication through bodily gestures. I’ve selected a few pieces of multimodal text I encountered in my day-to-day Internet surfing that clearly show the connections of the different modes of communication, and my goal is to make clear to you, the reader, the prevalence, sophistication, and interconnectedness of textual modes and how they form a web of multimodality.


The first multimodal text I’d like to analyze is a Buzzfeed article entitled “Behind the Stunning Art and Animation of ‘Kubo and the Two Strings.’” When I first clicked on the link on my computer, I was immediately hit with a visual mode of communication: a beautiful, illustrated gif of what I soon found out were characters from the stop-motion animated film and, in the center, a man I can only assume is the director. This interaction between the rest of the article and this non-captioned centerpiece shows the power of multimodality in communication, especially since that man could be anyone, yet I’m able to pick up that it’s the director.

The linguistic communication consists of the written article, including the background to the director, the description of the movie, and the interview with the director. It is also used in labeling the rest of the pictures throughout, again showing interplay between linguistic and visual. There are many snapshots of what I assume will be scenes in the movie (highlighting the way movies create a separate, identifiable genre in terms of multimodal communication), and the clear facial expressions signify gestural communication. Also, the spatial arrangement highlights the importance of the pictures: the text column is narrow, while the movie snapshots take up the whole width of the screen.

The final form of communication, aural, is found in the embedded movie trailer, though the trailer uses all 5 forms of multimodal communication. Specifically, it comes from the spoken word and the background music throughout the trailer, which is so prevalent in the movie industry that it’s impossible to imagine a trailer without music. Words show us the director and studio, the animation visually communicates the setting as Asian, the empty landscapes give spatial communication as to the place’s vastness, and the facial expressions, through the use of movie tropes, clearly denote the villains as the masked people.

My next text is a video of U-M’s Men’s Glee Club performing a recently commissioned piece, 7 Last Words of the Unarmed. I will go through the video chronologically. The visual is immediately present at 0:00: just seeing the Michigan M, that brand, it communicates who the Glee Club is affiliated with. Next, the aural is evident at 0:08 when the conductor begins speaking (although (a) I cheated because I used to be in the Glee Club and thus recognize his voice, and (b) there’s aural communication everywhere because it’s a video about a piece of music). At 0:23, the linguistic mode tells us who is speaking, in this case the composer of the piece.

Time 0:37 is interesting for its spatial arrangement. It’s not immediately obvious what it’s communicating by itself; however, with the visual of the bland light, the aural of the sad music and the sung text, and the facial expression of the far singer at 0:48, it’s clear that the spatial arrangement is meant to provoke negative feelings. Also, the sung text combines aural and linguistic, and the spatial mode would have lost its power if the light were, say, bright and warm, again showing how the modes interact in a conscious, sophisticated, and highly prevalent way.

The entire video contains clear examples of multimodal communication: 1:07 combines aural, gestural, and linguistic to communicate happiness, and 1:29 uses those to communicate fear; 2:04 is the clearest example I can think of in terms of gestural communication, the professor conducting; and around 3:00, it combines the visual and gestural mode of clasped hands, the spatial of a monolith of unity, and the linguistic and aural mode of the rap pleading for unity, which all communicate the message that they’re promoting unity. Without the rap the others wouldn’t be totally clear, and with just the rap it wouldn’t be as powerful. Thus, the multimodality is totally necessary to best communicate the message.


Finally, this assignment asked me to find examples from my daily life. In lieu of having a Tumblr myself (because I’m 100% sure if I got one I’d forget to eat, pass a class, or sleep, or some combination thereof), I get my Tumblr fix through a Facebook page called “Tumblr is gay but I won’t accept it.” You can (and should) go through it if you want; however, for convenience I’ve taken screenshots of a few pieces of text that I want to analyze, focusing specifically on humor because it is sinful to study Tumblr without studying humor, in this case how the humor is conveyed.

[Image: two screenshots of Tumblr posts]

First, I will examine these posts. I chose them because they contain no pictures, yet their humor comes from sensory representation. Strictly speaking, they use only linguistic communication; however, they cross into visual territory and into aural and visual territory, respectively, blurring the lines between the five stark categories of multimodality.


Next, I’ll examine these 2 cats. The first one is clearly linguistic and gestural; the cat is funny because of its facial gesturing. The next one is different. This is humorous (in a different way) because the text’s author took a picture and pieced together a human identity for the cat. Using common tropes of the name Margaret, and then character traits for that name (including that she’d wear pink and would be small, unintimidating, and gentle-looking, like the cat) the author made the humor by making the cat a human.

This example serves as the beginning point for a controversial critique I’d like to level: Writer/Designer missed a category when describing multimodality. I’d like to preface this by saying I’m not denying the usefulness of their framework, or even claiming that I have a better one; rather, I found something that I think complements it. I’ll call this category Cultural Reference, because as you saw, without taking into account the cultural tropes of the name Margaret, that example would not have made any sense. Part of the trope/humor exists in just the linguistic mode; however, the crucial cornerstone that held it all together was the combination of the trope of the picture with the trope of the name.

I have many more examples, which I’ll first analyze using the original framework, and then add Cultural Reference to show how, without knowing about the tropes, the humor, and thus communication, would not work.


In the first example, the first comment is enough to be funny. The picture is visual humor, and the comment adds linguistic communication to heighten its humor. However, the next comment adds humor because it references the Legend of Zelda games, in which the character on the ground (Link) smashes pots to get rupees, or in-game currency. The third comment is a play on words to add more humor to the picture; however, without knowing that it’s a reference to the song Familiar Faces, it doesn’t make any sense. The cornerstone to the humor is that it references a song that in and of itself holds humor as being used sarcastically, so the reference only adds to the picture’s humorous capital. One final piece of reference that makes this funnier and thus better communicates the image is that the vase the boy is holding (the boy is Villager, from Animal Crossing) is an attack in Super Smash Bros that is difficult to control but deadly when it hits, which couples nicely with the final comment. All of this description is to show that without Cultural Reference, multimodality does not capture all of the communication in this piece of text.


Another, clearer example is this picture, which explicitly mentions the “three-way crossover” of pop culture references. The visual mode is humorous because of Harry’s face. The other humor from the picture is linguistic, with the ‘HE DOESN’T EVEN GO HERE!!!1’ making it funnier that he was chosen. Of course, the real humor is that this series of pictures/words comes from three separate entities: the Harry Potter series, the Hunger Games series, and Mean Girls. Furthermore, the 1 instead of the exclamation point at the end is itself a pop culture reference, perhaps to the Ermagerd Girl or the common mocking of the Internet gaming troll (I wasn’t able to find it in my cursory search. Good thing this isn’t a research paper).


This next example is funny (I think), but it’s hard to point out why. The multimodality framework works to an extent, as linguistic and visual modes are clearly working together. However, this becomes truly funny because the shark is representing a trope of an early teenage girl not caring about something. I imagine her texting on her phone, not looking at the speaker who prompted her to say “Whatever,” and just not caring about things around her. There’s no easily identifiable source of this image; it’s just become built into our culture, making it part of the Cultural Reference category.


The next picture is funny in its own right. The gestural mode conveys dominance in the bird and fear in the boy, and this mismatch in what we perceive should be the power dynamic makes it funny. The linguistic mode, combined with the Cultural Reference of a Pokémon battle, adds another element to the picture (though in this case I think it’s perfectly fine without the caption).


The final picture is not funny without the Cultural Reference. There is still multimodality; however, this multimodality does not communicate humor until you add the Cultural Reference. The visual mode conveys that the man really likes Sun Drop, while the linguistic mode adds to this understanding by giving background to the story. Notice that without the picture the linguistic mode holds much less…something. Credibility, perhaps? However, the boxed response adds a Cultural Reference because it conjures the trope of the early-level math problems. This adds humor, and thus contributes to the multimodal framework.


All of the texts hold some similarities. For example, they all use linguistic mode and visual mode. I’m sure that with more analysis each would turn out to hold many more modes, if not all of them; at minimum, I came to the conclusion that every text has more than one mode. The most different are the Tumblr memes and the website. The website uses all modes, while the Tumblr memes never used them all; rather, they all used Cultural References to fully communicate the message, which the website did not.

What I’ve learned from this exercise (besides that I think something’s missing from the multimodality framework) is that multimodality is important because it’s literally everywhere. There is nothing that communicates in only one mode; even a basic essay communicates its purpose as an essay with both linguistic mode and spatial mode in the form of being double-spaced, the heading, etc. All text is multimodal, and thus if you pick apart a genre you can find its unique multimodal qualities and make your text more quintessentially that genre.

2 thoughts to “What are Multimodal Projects?”

  1. This was great, Ben! I absolutely loved your new category: Cultural Reference. I think context, audience, and cultural associations are a key part of analyzing any text. Although the modes are a fantastic way of communicating and expressing ideas, I think the receptor’s side needs greater consideration: if you are trying to communicate with someone who does not understand that there could be humorous cultural associations with the name “Margaret,” the post would not make sense at all. For example, I can assure you that my grandmother would not understand that there is anything humorous about the name “Margaret,” or how this grumpy cat could possibly reflect that name.

  2. I love how you tried to tackle such a complex subject as humor with this post. It is interesting especially how you put together the many elements of tumblr into an almost scientific view of how we laugh. It’s also interesting how you define Tumblr as a sort of culture and show how that permeates the use of humor in these posts.
