I don’t like it when tools purport to explain anything, like when someone suggests GIS “IS” Geography. It’s not Geography; GIS is a tool of Geography, just as Excel, Word, PowerPoint, and various other ways to present, organize, and analyze data are tools. GIS is an incredibly smart tool, but it can give you some really dumb outputs if you let it. Heck, I’m waiting for some of my maps to make it into the “bad maps” Google Image search. Maps, unlike prose and speech, are similar to charts and graphs: they carry an inherent authority that written and spoken words lack. I can tell you that the United States lies above the equator, and you might even believe me, but if I showed you a map, you’d question me a lot less, even though I’m still just “telling you”.
And these are just facts. Think about something more subjective, such as the claim that New York City is more expensive than London, England. I can make that statement, and you may or may not believe me; some folks might even go the extra step, call it a hypothesis, and test it. But what if I showed you a map of, say, average monthly rent per square foot in 10 cities around the world, and New York City’s circle was larger than London’s? You’d probably believe it. But I just lied to you, or at least led you to the answer I wanted to give you. Who defines “expensive”? I chose the measurement, in this example rent. Perhaps the average rent was $500 per square foot in New York City and $499 per square foot in London. But, to make my point, I set the break between the two symbol classes at $500. Now New York City “looks” bigger on the map, but it’s a $1 difference. Maps can lie, and there are a few great books out there on the topic (I may even post a book review of one!).
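To make the trick concrete, here is a minimal Python sketch of a graduated-symbol map with a single class break, set next to a proportional-symbol alternative. The rent figures are the hypothetical ones above; the function names, the radii, and the square-root scaling for proportional symbols are my own illustrative choices, not any GIS package’s API.

```python
import math

# Hypothetical average monthly rent per square foot (USD), from the example above.
rents = {"New York City": 500, "London": 499}

def classed_radius(value, breaks, radii):
    """Graduated symbols: return the radius of the first class whose upper break exceeds value."""
    for upper, radius in zip(breaks, radii):
        if value < upper:
            return radius
    return radii[-1]

def proportional_radius(value, scale=1.0):
    """Proportional symbols: circle area grows linearly with value, so radius ~ sqrt(value)."""
    return scale * math.sqrt(value)

# The map maker's choice: one class below $500, one class at $500 and above.
breaks = [500, math.inf]
radii = [10, 30]  # arbitrary radii in points

for city, rent in rents.items():
    print(f"{city}: ${rent} -> classed radius {classed_radius(rent, breaks, radii)}, "
          f"proportional radius {proportional_radius(rent):.2f}")
```

With the break at $500, New York City gets a circle three times the radius of London’s; with proportional symbols the two radii differ by only about 0.1%. Same data, very different story, and the reader never sees the class break.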
While this post isn’t about lying maps and statistics, it is about managing expectations and challenging studies (somehow I have a reputation for being “contrarian [sic]”). Anytime I see a post entitled “The Geography of [fill in the blank]”, the hairs on my neck stand on end. Thus far in my own blog I’ve tried to avoid such arrogance. The Atlantic recently ran an article entitled “The Geography of Happiness According to 10 Million Tweets”. Naturally, I had a heart attack (not really). The article was published a while ago, but I felt I had been rather negative in the past, so I took a break from critiquing others’ work.
There are some positive aspects to this work. First is the promise of a new analytic tool. Researchers in Vermont have developed a bot (presumably) to sift through rather large datasets (10 million tweets) and score “happiness” by “geography”. The article (which deserves praise for its even-handedness) provides an example of how happiness scores were calculated. Essentially, the bot rates the words in each tweet for “happiness” and “sadness” and scores the tweet accordingly. I’m guessing that the scores for all the tweets in a given locality are then tallied and a final score assigned (presumably normalized across the dataset to make relative comparisons meaningful). With a properly scoped (translation: caveat-ed) study, this could have some useful applications.
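For the curious, here is a minimal Python sketch of the general technique as I understand it: dictionary-based word scoring, where each word carries a fixed happiness score, a tweet’s score is the average of its matched words, and places are compared by z-score. The lexicon values, function names, example tweets, and the z-score normalization are all my assumptions for illustration, not the Vermont team’s actual word list or code.

```python
from statistics import mean, pstdev

# Hypothetical lexicon: word -> happiness score on a 1 (sad) to 9 (happy) scale.
# These values are invented for illustration only.
LEXICON = {
    "happy": 8.3, "love": 8.4, "awesome": 7.8, "hot": 5.5,
    "cold": 4.0, "damn": 3.0, "shit": 2.5, "sad": 2.4, "hell": 2.2,
}

def tweet_score(text):
    """Average the lexicon scores of a tweet's words; None if no word is in the lexicon."""
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return mean(scores) if scores else None

def place_scores(tweets_by_place):
    """Average tweet scores per place, then express each place as a z-score across places."""
    raw = {}
    for place, tweets in tweets_by_place.items():
        scored = [s for s in map(tweet_score, tweets) if s is not None]
        if scored:
            raw[place] = mean(scored)
    if not raw:
        return {}
    values = list(raw.values())
    mu, sigma = mean(values), pstdev(values)
    # With one place (or identical means) there is no spread, so every z-score is 0.
    return {p: (v - mu) / sigma if sigma else 0.0 for p, v in raw.items()}

# Invented example tweets.
print(tweet_score("Hot damn, you're hot shit"))
print(place_scores({"Napa": ["I love this awesome place"],
                    "Beaumont": ["damn its hot as hell"]}))
```

Note what the scorer cannot see: with these made-up values, “Hot damn, you’re hot shit” averages 4.125, below the midpoint of 5, whether it was meant as a compliment or an insult.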
However, there are serious limitations. The article points out some of them. For one, the context of the words isn’t taken into account, and with Geography, it’s all about local context. I’m fairly sure I’ve heard people in Texas (I grew up there) say, “Hot damn, you’re hot shit”. Depending on the inflection, this could mean a) you’re pretty/handsome or b) you’re rather conceited. According to the methodology, however, it would register as an overwhelmingly negative sentiment. Comparing Napa, California and Beaumont, Texas, I would absolutely expect to hear people saying “shit” and “ass” all over Texas. That doesn’t necessarily indicate that folks are sad. As the article points out, perhaps “people might just talk about happiness differently in other parts of the country or within demographic groups.”
For evidence of this latter point (demographic groups), the Vermont study found that people of Norwegian ancestry are happier than African Americans. First, I have no idea how they would figure this out from tweets. If what I think happened is correct, then we may have stumbled onto a Modifiable Areal Unit Problem (MAUP alert!). I’m guessing that when these tweets were geolocated, they were associated with Census Blocks (I HOPE!) or some larger administrative units (zip codes, counties, and so on). With the available Census data (2011), the researchers could then guess the population group of the tweeter. This is only a guess; perhaps they really did contact a few thousand users, track their tweets for a while, and ask standard demographic questions in survey format. Of course, the problem with either approach is that people are affected by their environments, and we aren’t rooted in one place. Say a Norwegian in St. Paul who commutes daily to Minneapolis tweets, “this whole place is going to hell”. Is the sadder city St. Paul, or Minneapolis? Considering my previous post on the millions of Americans who commute, many of whom are stuck in traffic for some length of time (plenty of time to tweet about it!), one wonders how much a happy or sad tweet reflects the place rather than the tweeter’s immediate environment.
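The St. Paul commuter problem is easy to reproduce. Below is a minimal Python sketch with three invented tweets, each tagged with the tweeter’s home city and the city the tweet was sent from. Both fields, the scores, and the `city_means` helper are hypothetical; real tweets rarely carry a reliable home location. Attributing the same tweets by home versus by send location flips which city looks sadder.

```python
from statistics import mean

# Invented records: (tweeter's home city, city the tweet was sent from, tweet score 1-9).
TWEETS = [
    ("St. Paul", "Minneapolis", 2.2),    # commuter stuck in traffic: "this whole place is going to hell"
    ("St. Paul", "St. Paul", 6.0),
    ("Minneapolis", "Minneapolis", 6.5),
]

def city_means(tweets, by):
    """Average tweet scores per city, attributing each tweet to its 'home' or 'sent' city."""
    if by not in ("home", "sent"):
        raise ValueError("by must be 'home' or 'sent'")
    column = 0 if by == "home" else 1
    groups = {}
    for record in tweets:
        groups.setdefault(record[column], []).append(record[2])
    return {city: mean(scores) for city, scores in groups.items()}

for rule in ("home", "sent"):
    means = city_means(TWEETS, rule)
    saddest = min(means, key=means.get)
    print(f"by {rule}: {means} -> saddest: {saddest}")
```

Neither rule is “right”. By home, St. Paul comes out sadder; by send location, Minneapolis does. The answer depends on a choice the map reader never sees.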
And that’s just in English! As the article points out, Spanish wasn’t covered. That leaves out an entire demographic group heavily represented in the part of the country that’s apparently the “saddest”: the South (though California is also high on the list).
With all of this taken together (lack of context, unclear geographies, missing populations), one wonders what we’re left with. It’s impossible to say what’s making people happy or sad, and it’s not as if knowing that Beaumont, Texas is the “unhappiest” place (for English-speaking tweeters with internet access who like to curse a lot) offers any clues. Did the study account for seasonality: “damn its cold”, “i LUV the SPRING”, “SUMMER IS AWESOME HAPPY HAPPY HAPPY”, “autumn makes me want to shoot myself because i know summer is ending”?
If it didn’t account for seasonality, I think a strong case could be made for a future study of environmental influence on happiness/sadness (environmental determinism alert!). Were I to design that study, I would limit the returned tweets to those mentioning landmarks and common sights, such as “these broken windows make me want to rob somebody” or “clean facades in Potemkin villages make me happy!”. But in all seriousness, the scale of the study should be larger (that is, more local). With 10 million tweets, statistical significance can be maintained in a single urban area. Why not examine happiness and sadness in a single city, broken up by neighborhoods? Get the researchers out there to figure out whether each neighborhood is commercial, industrial, or residential. Do the residents live and work there, or do they commute? This sort of integrated study, pairing the quick-fix, fancy new tool of social media analysis with “old school” on-the-ground research, might actually produce a worthwhile paper.