Tuesday, 12 May 2020

NVivo Cantando!

NVivo is a popular software package for qualitative data analysis. Stephanie Jesper takes a topical look at it ahead of our NVivo Digital Wednesdays session next month.

An empty NVivo all ready to be filled

This week would've been the week of the Eurovision Song Contest: one of my favourite weeks of the year. But a certain global pandemic got in the way. So instead I'm spending the week playing with NVivo. It's not the same. Still, I'm keen to make my NVivo play as interesting as possible in every way that I can... maybe I could liven it up with a little Eurovision-related qualitative data analysis?

The 2020 contest may have been cancelled, but what's another year I could play with? My number one Eurovision Song Contest is 1977 (there were some really wild dances that year), but it's probably better to choose a contest with a more famous winner. And I believe pretty much everybody knows the winner from 1974, so let's go with that...

NVivo is a qualitative data analysis tool. Most data analysis is quantitative: it's about counting numbers. And spreadsheets are really good at that. You can throw in a load of numerical data and get really quite sophisticated analysis at the touch of a button. But a lot of data we get is in the form of text; of words. And that sort of thing is a bit harder to automatically analyse in a meaningful way. NVivo is a tool to facilitate that analysis.


The first thing NVivo needs is some data. You can import all kinds of everything into NVivo: the spreadsheets you've collated, the bibliographic data you've amassed, the voice recordings you made when you were conducting interviews... all kinds of other materials you might want to analyse like emails, tweets, transcripts, video... or in our case song lyrics.

I've sourced the lyrics to all the 1974 Eurovision Song Contest entries (translated into English where necessary) and I've imported them into NVivo. Now what?

Frustratingly, it's not just as simple as saying "Hey, NVivo, my love: shine a light on these texts. I wanna know all the juicy details". NVivo isn't that clever. It's not an artificial intelligence tool. It's more like a glorified highlighter pen that can add up. You're going to have to do a lot of the hard work.

But that's no reason to go running scared from NVivo. Helpfully it's been built to look like a Microsoft Office application, so that makes it a bit easier than it could be to navigate. And down the left-hand side of NVivo's navigation pane are three important subsections: Files, Codes, and Cases. The first of these is relatively straightforward: we've just imported a load of files. But what are these codes and cases?

Cases and classifications

It's important to stress that NVivo's a pretty open environment and you can use these fields how you like, but there are some standard principles. We'll start with cases. Let's say you've done several interviews with different people. Each person might be considered a "case". You might've interviewed them twice so there'd be two files associated with them (or maybe even more), but they're the one case.

Files and cases also have associated "classifications". These are your metadata. File classifications may be things about the file itself: what type of file it is, when it was recorded, etc. Case classifications are the demographics of your case: maybe you interviewed some great operatic diva from the stage, some jazz heroes from the local club, and some rock'n'roll kids from satellite TV: here's where you'd put all the useful background information about them. In my case I'm putting in here the information about the songs: artist, country, score, placing, etc.:

I've linked my files to my cases, and added case classifications

These classifications are useful because they offer an extra layer of potential analysis with which to toy (do the songs sung in English perform better than the songs sung in other languages, for instance?).
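To make that idea a bit more concrete, here is a minimal Python sketch of the sort of comparison case classifications allow. It is not how NVivo works internally. The song names and scores are made-up placeholders rather than the real 1974 results, and `mean_score_by` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical case classifications: placeholder values, NOT the real 1974 results.
cases = [
    {"song": "Song A", "language": "English", "score": 20},
    {"song": "Song B", "language": "English", "score": 10},
    {"song": "Song C", "language": "French", "score": 12},
    {"song": "Song D", "language": "Italian", "score": 18},
]


def mean_score_by(cases, attribute):
    """Group cases by one classification attribute and average their scores."""
    groups = defaultdict(list)
    for case in cases:
        groups[case[attribute]].append(case["score"])
    return {value: mean(scores) for value, scores in groups.items()}


by_language = mean_score_by(cases, "language")
for language, average in by_language.items():
    print(language, average)
```

Swapping `"language"` for any other attribute (country, say) gives a different cut of the same cases, which is the appeal of recording the metadata up front.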

Codes and nodes

And then there's the codes. These are where most of the action happens in a tool like NVivo. And it's action that is very much on you. There are ways to automatically code in NVivo but you'll miss a lot if you do that. Or get a lot of stuff you don't need. NVivo isn't some magic fairytale tool. You're going to have to go through all your files and manually code them up. This involves making your mind up about what approach to take. Is there a pre-existing set of themes or categories you could apply, or are you just going to work from the bottom up, tagging things as you see them? Which method works best in your eyes?

Here I've tagged up the winning song from 1974: ABBA's "Waterloo":

Tagging up Waterloo: coding strips show where certain nodes are being applied

I'm working very much bottom-up: I've noticed certain themes and I've created a "node" for each one, e.g. "Love", "War", etc. I've even nested some nodes beneath others ("War", I've decided, is a subset of "Society"). Again, how you do this is up to you.
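One way to picture nested nodes is as a small tree, where anything coded at a child node can also be collected at its parent. The sketch below is only an illustration of that idea in Python: the `Node` class, its methods, and the coded snippet are all hypothetical and are not taken from NVivo.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A theme ("node") holding coded snippets and optional child nodes."""
    name: str
    children: list = field(default_factory=list)
    references: list = field(default_factory=list)  # (source, snippet) pairs

    def add_child(self, name):
        child = Node(name)
        self.children.append(child)
        return child

    def code(self, source, snippet):
        """Record that a snippet from a source file belongs to this theme."""
        self.references.append((source, snippet))

    def all_references(self):
        """Collect this node's references plus those of every descendant."""
        refs = list(self.references)
        for child in self.children:
            refs.extend(child.all_references())
        return refs


# Placeholder coding: "War" nested beneath "Society", as in the post.
society = Node("Society")
war = society.add_child("War")
war.code("Song A", "placeholder line about a battle")

print(len(society.references))        # coded directly at "Society"
print(len(society.all_references()))  # including everything nested beneath it
```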

Another decision I've had to make is whether I mark up the refrain: does a repeated chorus count as a repetition of imagery, or does it just skew my analysis? Also, does a "la la la" count as musical imagery worthy of coding? You'll be faced with a lot of questions like this. You might want to save several copies of your project as you go, in case you change your mind about anything.

...and in case NVivo crashes. Which it did for me as I was coding up. That's why I have a file called "esc74 (Recovered).nvp". "Why me?" I despaired. I didn't realise how much this crash would rock me. I was about to cry at the frustration of having to do all that coding again, only teardrops were thankfully spared when NVivo persuaded my file to rise like a phoenix. I let out a little "hallelujah" such was my euphoria.

Exploring the data

Coding took a while. And I didn't do a particularly good job of it. Still, once it was done I could start on the analysis. There's a whole arcade of tools to play with in NVivo...

A wordcloud from Eurovision 1974: 'love' is the biggest word. 'sing

A simple thing that didn't need any coding up was this wordcloud. The words "sing", "one", and "lala" fly on the wings of "love", with "Waterloo" also quite obvious in the mix.
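Underneath, a wordcloud is essentially a word-frequency count with the commonest filler words thrown away. A rough Python sketch of that counting step follows. The stop-word list and the two sample lines are assumptions made up for illustration, not NVivo's own list or real lyrics.

```python
import re
from collections import Counter

# A tiny, assumed stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "and", "of", "to", "i", "you", "my", "is", "in"}


def word_frequencies(texts, stop_words=STOP_WORDS):
    """Count how often each non-stop word appears across all the texts."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(word for word in words if word not in stop_words)
    return counts


# Placeholder lines, not actual song lyrics.
counts = word_frequencies(["Love, love and la la", "I sing of love"])
print(counts.most_common(3))
```

The bigger a word's count, the bigger it would be drawn in the cloud.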

But now we've coded up we can look at other things too. Here's the nodes for "love" and "war" plotted against the "language" case classification:

Love versus War: 'love' is the dominant theme in all languages except Serbo-Croatian
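A chart like this boils down to a cross-tabulation: count the coding references for each node, split by a case classification. Here is a minimal Python sketch of that count. The reference list is invented placeholder data, not my actual coding, and `crosstab` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical coding references as (node, language of the song) pairs.
# Placeholder data only, not the real coding of the 1974 lyrics.
references = [
    ("Love", "English"),
    ("Love", "English"),
    ("War", "English"),
    ("Love", "French"),
    ("War", "Language X"),
]


def crosstab(references):
    """Count references per node within each language."""
    table = defaultdict(lambda: defaultdict(int))
    for node, language in references:
        table[language][node] += 1
    return {language: dict(counts) for language, counts in table.items()}


table = crosstab(references)
for language, counts in table.items():
    print(language, counts)
```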

You probably have to play with the analysis tools a bit to get something really telling from the data, and think about things you want to explore in more detail. But you can get counts and cross-tabulations on all your codes and cases, and one of those combinations might be the revelation you're looking for. Personally, I'm rather fond of this particular visualisation:

What comes before and after the word 'Waterloo' in the song 'Waterloo'?
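The idea behind this kind of view is sometimes called "keyword in context": find every occurrence of a word and keep a few words either side. A small Python sketch of that idea is below. The function name, the `width` parameter and the sample sentence are all assumptions for illustration, and this is not NVivo's own implementation.

```python
import re


def keyword_in_context(text, keyword, width=3):
    """Return (words before, matched word, words after) for each occurrence."""
    words = re.findall(r"[\w']+", text)
    hits = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            before = " ".join(words[max(0, i - width):i])
            after = " ".join(words[i + 1:i + 1 + width])
            hits.append((before, word, after))
    return hits


# Placeholder sentence, not the actual lyric.
for before, word, after in keyword_in_context(
    "one two Waterloo three four", "waterloo", width=1
):
    print(f"{before} [{word}] {after}")
```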

I've only scratched at the surface of what's possible with the help of NVivo. If you're interested in finding out more, there's our Research data Skills Guide, but we're also doing an "Intro to NVivo" demo on Zoom as part of our Digital Wednesdays research theme this term. That takes place at 2pm on Wednesday 3rd June, and is open to all members of the University. In lieu of this year's Eurovision Song Contest, it may be the best gig taking place this year! Failing that, you could always shove this text into NVivo and see if you can code up all the winning songs I smuggled into it. There's 30 to find...
