You Are What You Tweet: An Exploration of Tweets as an Auxiliary Data Source

Last fall at MAPOR, Joe Murphy presented the findings of a fun study he did with our colleague, Justin Landwehr, and me. We asked survey respondents if we could look at their recent Tweets and combine them with their survey data. We took a subset of those respondents and masked their responses on six categorical variables. We then had three human coders and a machine algorithm try to predict the masked responses by reviewing the respondents’ Tweets and guessing how they would have responded on the survey. The coders looked for any clues in the Tweets, while the algorithm used a subset of Tweets and survey responses to find patterns in the way words were used. We found that both the humans and the machine were better than chance at predicting values of most of the variables.
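To give a flavor of how such an algorithm can work, here is a minimal sketch of one common approach: a bag-of-words classifier trained on the Tweets of respondents whose survey answers are known, then used to predict the masked answers of the rest. The library choice (scikit-learn) and the data are placeholders for illustration, not the exact algorithm from our study.

```python
# A minimal sketch of one common approach (illustrative only; not the exact
# algorithm from the study): learn word-use patterns from respondents whose
# survey answers are known, then predict the masked answers of the rest.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# One concatenated string of recent Tweets per respondent (placeholder data)
train_tweets = ["haha omg :) brunch with the girls today",
                "the budget hearing ran long again"]
train_labels = ["female", "male"]  # the known survey responses, e.g., sex

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_tweets, train_labels)

# Guess the masked survey response of a held-out respondent from their Tweets
print(model.predict(["haha can't wait for the weekend :)"]))
```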

We recently took this research a step further and compared the accuracy of these approaches to multiple imputation, with the help of our colleague Darryl Creel. Imputation is the approach traditionally used to account for missing data, and we wanted to see how the nontraditional approaches stack up. Furthermore, we wanted to check out these approaches because imputation cannot be used when survey questions are not asked at all, which commonly occurs because of space limitations, the desire to reduce respondent burden, or other factors. I will be presenting this research at the upcoming Joint Statistical Meetings (JSM) in early August. I’ll give a brief summary here, but if you’d like more details, please check out my presentation or email me for a copy of the paper.

Income was the only variable for which imputation was the most accurate approach, but the differences between imputation and the other approaches were not statistically significant. Imputation correctly predicted income 32% of the time, compared to 25% for human coders and 26% for the machine algorithm. Considering that there were four income categories and a person would have a 25% chance of randomly selecting the correct response, I am unimpressed with these success rates of 25%-32%.

Human coders outperformed imputation on the other demographic items (age and sex), but imputation was more accurate than the machine algorithm. For these variables, the human coders picked up on clues in respondents’ Tweets. I was one of the coders and found myself jumping to conclusions, but I did so with a pretty good rate of success. For instance, if a Tweeter said “haha” a lot or used smiley faces, I was more likely to guess the person was young and/or female. These are tendencies that I’ve observed personally, but I’ve read about them too.

As a coder I struggled to predict respondents’ health and depression statuses, and this was evident in the results. Imputation was better than humans at predicting these, but the machine algorithm was even more accurate. The machine was also best at predicting whom respondents voted for in the previous presidential election, with human coders in second place and imputation in last place. As a coder I found that predicting voting was fairly simple among the subset of respondents who Tweeted about politics. Many Tweeters avoided the subject altogether, but those who Tweeted about politics tended to make it obvious whom they supported.


So what does this all mean? We found that even with a small set of respondents, Tweets can be used to produce estimates with accuracy in the same range as, or better than,[1] imputation procedures. There is quite a bit of room for improvement in our methods that could make them even more accurate. For example, we could use a larger sample of Tweets to train the machine algorithm, and we could select human coders who are especially perceptive and detail-oriented. The finding that Tweets are as good as or better than imputation is important because imputation cannot be used when survey questions were not asked.

As interesting as these findings may be, they need to be taken with a grain of salt, especially because of our small sample size (n=29).[2] Relying on Twitter data is challenging because many respondents are not on Twitter, and those who are on Twitter are not representative of the general population and may not be willing to share their Tweets for these purposes. Another challenge is the variation in Tweet content. For example, as I mentioned earlier, some people Tweet their political views while others stay away from the topic on Twitter.

Despite these limitations, Twitter may represent an important resource for estimating values that are desired but not asked for in a survey. Many of our survey respondents are dropping clues about these values across the Internet, and now it’s time to decide if and how to use them. How many clues have you dropped about yourself online? Is your online identity revealing of your true characteristics?!?


[1] Even if approaches using Tweets may be more accurate than imputation, they require more time and money and in many cases may not be worth the tradeoff. As discussed later, these findings need to be taken with a grain of salt.

[2] We had more than 2,000 respondents, but our sample size for this portion of the study was greatly reduced after excluding respondents who don’t use Twitter, respondents who did not authorize our use of their Tweets, and respondents whose Tweets were not in English. Furthermore, half of the remaining respondents’ Tweets were used to train the machine algorithm.


Ashley will be presenting this research at the 2014 Joint Statistical Meetings in Boston, MA.

Session: 105

Date: 8/4/2014

Time: 8:30am

Location: CC-213

Watching the Fireworks Explosion on Google and Twitter

When I woke up this morning, I remembered I have the day off tomorrow.  Independence Day (July 4th) brings many images to mind in the United States, but one of the most common, and potentially dangerous, is “fireworks.”  The Nationwide Emergency Department Sample reports that between 2006 and 2010 fireworks-related injuries in the U.S. were most common in July (68.1%), followed by June (8.3%), January (6.6%), December (3.4%), and August (3.1%).  I was interested to see if others were thinking and/or talking about fireworks leading up to the holiday.  Perhaps this would suggest a “population at risk.”

Without a budget and still in my pajamas, I turned to a couple of go-to sources for this kind of very cursory look – Google Trends and Twitter.  Google Trends allows you to see the relative volume of search activity on different terms over time and by geography.  To me, it is a really rough proxy of what people are thinking about.  Of course, not all people use Google or even have easy access to it.  Just because they are thinking about something doesn’t mean they’ll be searching for information on it. Even when they are searching on it, there’s no guarantee they are spelling it like I do or even using the same terms. Even if they are searching on the same term with the same spelling, maybe they’re looking for something else.  Still, in about 5 seconds, I can get a glimpse of some interesting trends, and I still haven’t changed out of my pajamas.  If Google might be a rough proxy of what people are thinking about, Twitter may be an equally rough proxy of what people are talking about, with some of the same and some of its own caveats.  To get those results, I go to Crimson Hexagon’s ForSight tool.
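I used the point-and-click interfaces for both sources, but for anyone who wants to script the Google Trends lookup, the unofficial pytrends package is one way to do it. The sketch below is an illustration under that assumption; it is not how the charts in this post were produced:

```python
# A scripted version of the Google Trends lookups shown below, using the
# unofficial pytrends package (pip install pytrends). Illustrative only;
# the charts in this post came from the Google Trends web interface.
from pytrends.request import TrendReq

pytrends = TrendReq(hl="en-US", tz=360)

# Relative search volume for "fireworks" in the U.S. ...
pytrends.build_payload(["fireworks"], timeframe="2008-01-01 2014-07-03", geo="US")
us = pytrends.interest_over_time()

# ... and in the U.K., where we'd expect a November 5 (Guy Fawkes Night) spike
pytrends.build_payload(["fireworks"], timeframe="2008-01-01 2014-07-03", geo="GB")
uk = pytrends.interest_over_time()

print(us["fireworks"].idxmax(), uk["fireworks"].idxmax())  # peak dates
```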

Here’s the Google search volume for “fireworks” over the last several years:

“Fireworks” Google Search Volume


The big spikes are in July, as I expected.  What about those secondary bumps?  On November 5, the U.K. celebrates Guy Fawkes Night.  Repeating this by country confirms the association:

U.S. “Fireworks” Google Searches Spike on July 4


U.K. “Fireworks” Google Searches Spike on November 5


Here are the raw volume numbers from Twitter.  Keep in mind that some of the overall increase here is due to the increase in popularity of Twitter itself over time.

“Fireworks” Total Posts on Twitter


That second little bump is New Year’s Eve, another big fireworks night and also high on the emergency-room visit list.
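If you wanted to adjust for that platform growth, one rough correction is to look at the share of posts mentioning the term rather than the raw count.  The numbers below are placeholders, and this assumes you could obtain total post volume per period, which may not be readily available:

```python
# One rough correction for Twitter's own growth (a sketch with placeholder
# numbers, assuming total post volume per period were available): report the
# share of posts mentioning the term rather than the raw count.
fireworks_posts = {"2012-07": 900_000, "2013-07": 1_400_000}
total_posts = {"2012-07": 9_000_000_000, "2013-07": 15_000_000_000}

share = {month: fireworks_posts[month] / total_posts[month]
         for month in fireworks_posts}
print(share)  # growth in share is smaller than growth in raw volume
```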

And just what are people in the U.S. saying on Twitter about fireworks leading up to July 4?  ForSight’s “Clusters” gives some clues:

“Fireworks” Twitter Post Clusters, 6/27-7/2/2014


Digging into a few of those terms makes it clear what many are saying or sharing.  For example, the Boston fireworks show has been moved from the 4th to the 3rd, there are methods to keep your pets calm during the fireworks, and The Onion is still a go-to source for some holiday satire.

I’m tempted to dig further into these data, but it’s time to change into my day clothes and do some survey work.  Stay safe and have a Happy 4th!


RTP180 Series Explores Social Media in the Triangle

On the third Thursday of each month, the Research Triangle Park Foundation hosts panels of local speakers for a community event called RTP180. This month, the topic was Social Media, and speakers included representatives from local institutions and organizations, ranging from private startups to major universities to breweries.

The atmosphere at these events is always informal and fun, and the 5-minute presentations are just long enough to convey key points, but short enough to maintain the crowd’s interest. Social media has certainly been a hot-button topic in recent years, and it was interesting to see these organizations share how they utilize social media to convey unique and usable content to their audiences.  Common themes across the presentations included how best to brand yourself and your company, how to reach the greatest number of users, how to select content that is usable to your audience, and how to determine which platforms are best for accomplishing your specific goals.


RTP180 attendees network and mingle in advance of the event.

Matthew Royse of Forsyth Tech gave some very specific pointers, gleaned from social media research, indicating the ideal character length for tweets (100), Facebook posts (40), and domain names (8), and suggested that a balance of 60% curated content and 40% unique user content would work best to attract and maintain the attention of an audience. He also noted that timing of posts is critical in reaching your targets, given that 80% of the country is in the Central and Eastern time zones, and many users are online during weekends, a time that businesses often don’t consider to be prime for posting new content.

Amanda Peralta, a Social Media fellow at Duke University, also gave an informative presentation on determining the most suitable social media platforms for a particular organization’s goals. For instance, Facebook is the best match for those who host frequent events or need to provide customer service information, while Twitter is better for conveying current information on events or for those who have niche areas of expertise. Instagram is the best choice for reaching younger demographics when you have a lot of visual content, and especially in cases where users are already Instagramming photos relevant to your location or events.

Several other speakers reiterated some of these themes and discussed how social media has impacted their own business growth. It was clear that social media engagement can play a critical role in the dissemination of information, in branding and marketing, and in generating interest for organizations. Because it is still a new area that most are just becoming acclimated to, one speaker likened social media to the “Wild West,” meaning there are few solid rules, and still plenty of time and room for individuals to pave their own way.


Justin Miller from WedPics discusses how his social photo-sharing startup has achieved such success.

However, Chris Cohen from Bands to Fans noted that one must be careful of spreading oneself too thin across the many available social media platforms. While it may seem beneficial to be present on all forms of social media, he suggested that if you find one or two platforms that are suitable for you, and you consistently and frequently post relevant content, your audience will often spread that content to other platforms on your behalf, increasing your visibility while limiting your time and effort. This sentiment was echoed by Peralta, who also suggested to “limit yourself to what you can do well.”

There was such a positive response to this particular session that some suggested RTP180 host a social media “bootcamp” so this conversation can continue and grow in depth. The RTP180 events continue on a monthly basis throughout the year and cover various topics that could be relevant to survey researchers and those interested in new technologies. Upcoming events include topics on Big Data and Health; all are free and open to the public, though an RSVP is required and most events fill to capacity quickly. Free refreshments and a post-session meal are offered to all attendees. To find out more about RTP180 and their schedule of events, visit

Future challenges, take-down notices & social media research

I took part in a panel at the SRA’s annual social media in social research conference on May 16th and took the opportunity to reflect on the challenges facing social media research in the future. 


You might be able to explain social media using doughnuts, but what about how we research that behaviour and the data it produces? Three years into our New Social Media, New Social Science? peer-led network, we have over 600 researchers in our community and we’ve witnessed an explosion of interest in social media research in the social sciences.  Over the course of those three years, researchers from around the world have come together in person and online to share their experiences, frustrations and achievements. We’ve identified a number of challenges.

There is no doubt that there are now more people talking about social media research; it has become part of mainstream methodological debate, and researchers are developing new tools for exploring social media data and understanding the social media dimension of contemporary life. It’s hard to find any sector of life where the promise and potential of ‘big data’ haven’t been touted as the next big thing.

But we face a key methodological challenge. I’m struck by the fact that quite simply most social media data is ‘not quantitative data, rather qualitative data on a quantitative scale’ (Francesco D’Orazio) – and we have yet to fully address the fact that a high proportion of social media traffic consists of pictures, not text. The social science of images and visual data is not hugely well served by current approaches and tools, which focus on text and numerical data. There are some researchers leading the charge in this area (see this from Dr. Farida Vis, for example, on the challenges of analysing visual data from social media) but we have much to learn from colleagues working in the digital humanities sphere.


This brings us to the collaborative challenge. I’m confident that the most powerful insight from social media research will come from transdisciplinary efforts drawing on the varied insights and skills of, for example, statisticians, qualitative researchers, digital curators, information scientists, machine learning experts and human geographers. We have a window of opportunity to forge a new shape and rhythm for our research methods and epistemologies; I’m not convinced we’re yet fulfilling the transformative potential of this moment.

We also face profound ethical and legal challenges. In a week when internet search giants have been legally required by an EU court to respect individuals’ rights ‘to be forgotten’, we are talking about using social media data for research. We might feel that our social research is a benign endeavour contrasted with commercial harvesting of customer insight data, but we all face similar ethical and legal challenges: whose data? whose consent? whose ownership? All complex issues, as shown by our recent NatCen research on the views of social media users about researchers’ use of their data. We have only just begun to scrape the surface of this debate, and meanwhile data is being mined, harvested, analysed and reported in increasing volume. The critical moments which will shape and define the ethical and legal frameworks for the use of social media data will probably not come from social research but from the use of social media data in the commercial world or media realm; these industries’ practices may shape our future access to research data. Are we engaging enough with these sectors and issues?

And in a world where technology moves fast we face a capability challenge. How many of us are really au fait with the worlds we are researching on social media platforms? Which brings us to the connective, or contextual, challenge: how can we research what we don’t understand or use? We know from our members that many methods lecturers, research supervisors, research commissioners, and research ethics board members do not feel adequately equipped to make rounded, informed decisions about the quality, ethics or value of social media research projects and proposals.

Finally, there is a synthesis challenge: how, if at all, can new forms of research and findings map onto, elaborate or further inform conventional social research data?

Of course challenges are hard, knotty things to tackle but they also give us great opportunities to really push the boundaries of our practice as social scientists. Social media research needs social science as much as it does data science, it needs anthropology and ethnography as well as big data analytics, it needs to reflect, explore and understand the context and communities which anchor and shape social media data. I’m up for the challenge, are you?

A version of this blog post was posted previously on the NSMNSS blog as well as here.

AAPOR Preview: Contacting Sample Members by Facebook or Email: What Works?

With over 1 billion users sharing data about themselves, their friends, and their tastes, Facebook looks to be a goldmine for social researchers and data collectors.  However, determining how and when to best harness social media sites like Facebook effectively and efficiently has proven challenging.  One area of survey research where social media appears to be potentially beneficial is in locating sample members in a longitudinal study.  Just like you might search for an old roommate on Facebook, survey researchers can search for panel members using Facebook’s integrated search tool.  Unlike when searching for an old roommate, however, survey researchers usually don’t know what a sample member looks like or where they are living.  This can make locating a sample member particularly challenging if they have a common name (and with over 1 billion users, even an uncommon name is more common than you might think on Facebook).  One way to overcome this is to search for a sample member’s e-mail address.  An e-mail address is typically unique to a particular person, and is often collected as part of a survey.  This, however, may raise the question “Why not just send them an e-mail in the first place?”

As Joe Murphy mentioned previously, I’ll be participating in a panel at AAPOR looking at practical uses of social media in survey research.  In my presentation I’ll be providing results from an experiment that looks at the success rates of contacting sample members via Facebook vs. plain old e-mail.  Recent research has shown that people (especially younger people) are using e-mail less, and instead relying on other communication methods (e.g., texting, tweets, social media messaging).  This may lead one to think Facebook might be a more successful contacting method (at least with some parts of the population).  However, e-mail still remains the dominant communication method of the internet, and many more people have e-mail addresses than Facebook accounts.  So who will reply to our e-mail and who will reply to our Facebook messages?  Attend our panel on Friday and let’s find out.

AAPOR Preview: Practical Applications for Social Media in Survey Research

The social media phenomenon has garnered considerable attention among researchers over the last several years, including those in AAPOR.  The prospect of mining the vast online data that result from social media to answer research questions traditionally addressed with survey data is appealing, but this pursuit can overshadow more practical applications that alter the cost-timeliness-quality equation in the researcher’s favor.  This panel session at AAPOR 2014 will bring together researchers who have evaluated social media for its fitness for use in supplementing one or more parts of the survey process.  These include using social media and search engines to recruit respondents for self-administered surveys and pretesting, using sentiment analysis of social media data to improve questionnaire design, locating and contacting sample members using social media as compared with more traditional methods, and the role of social media in reporting results and forming opinions on the research results themselves.

The panel will include Michael J. Stern (NORC), who will discuss the use of social media and search engines to recruit respondents for self-administered surveys.  Christine Pierce (Nielsen) will talk about social media sentiment analysis techniques to improve questionnaire design. RTI’s Bryan Rhodes will discuss contacting sample members by Facebook and email, providing insights on what works and when. Casey Tesfaye of the American Institute of Physics will present an analysis of the interpretations, misinterpretations and trajectories of published findings, anchored in the “women physics” keyword search on Twitter.  And Craig A. Hill (RTI) will discuss the levels of the sociality hierarchy—broadcast, conversational, and community—and implications for researchers working with social media.  Finally, Josh Pasek from the University of Michigan will discuss and comment on each of the presentations.

The panel will be from 8-9:30 AM PT on Friday May 16 in Platinum 7&8. If you are attending AAPOR, we hope to see you there.  If not, we hope you can follow and join the #AAPOR discussion on Twitter!

NatCen Social Media Research: What Users Want

At the beginning of October 2013, there were reportedly 1.26 billion Facebook users worldwide. The number of Tweets sent per day is over 500 million. That’s a lot of communication happening every day! Importantly for researchers, it’s also being recorded, and because social media websites offer rich, naturally-occurring data, it’s no wonder researchers are increasingly turning to such websites to observe human behaviour, recruit participants, and interview online.

As technology constantly evolves, researchers must re-think their ethical practices. Existing guidelines could be adapted ad-hoc, but wouldn’t it be better to rethink the guidelines for this new paradigm? And what do social media users think about research that utilises social media? The work of the “New Social Media, New Social Science?” network in reviewing existing studies suggests that these questions have not yet been adequately answered.

In response, a group of NatCen researchers are soon to report data from a recent study on how social media users feel about their posts being used in research, and offer recommendations about how to approach ethical issues.

What do participants want?

A key ethical issue participants talked about was consent: participants wanted researchers to ask them before using their posts and information. Although it was acknowledged that “scraping” a large number of Tweets would pose practical problems for the researcher trying to gain consent, users would still like to be asked. Consent was seen as particularly important when the post contained sensitive or personal information (including photographs that pictured the user). An alternative view was that social media users shouldn’t expect researchers to gain consent because, by posting online, users automatically waive their right to ownership.

Participants’ views about consent were affected by other factors, including the platform being used. Twitter, for example, was seen as more public than Facebook so researchers wouldn’t necessarily need to ask for the user’s permission to incorporate a Tweet in a report.

Views about anonymity were less varied. Users felt anonymity should be afforded to all, especially if posts had been taken without consent. Users wanted to remain anonymous so that their posts wouldn’t be negatively judged, or because they were protecting identities they had developed in other contexts, such as at work.

Our participants were also concerned about the quality of information posted on social media. There was confusion about why researchers would want to use social media posts because participants felt that people didn’t always present a true reflection of themselves or their views. Participants noted, for example, how users post pictures of themselves drinking alcohol (which omits any mention of them having jobs or other, more serious things!), and that “people either have more bravado, and ‘acting up’ which doesn’t reflect their real world self”. They expressed concern over this partial ‘self’ that can be presented on social media.

What does it mean?

Later this year, NatCen will publish a full report of our findings, so stay tuned! If you can’t wait, here’s a preview:

  • Consider that users’ posts and profiles may not be a reflection of their offline personality but an online creation or redefinition;
  • Even if users are not utilizing privacy settings they still might expect you to ask permission to use their post(s);
  • Afford anonymity. Even if someone has let you know you can quote their username, you should learn how ‘traceable’ this is and let the user know (i.e. can you type their username into Google and be presented with a number of their social media profiles?). It’s our responsibility as researchers to ensure that the consent we get is informed consent.

Let us know at NatCen if you would like to receive an electronic copy of the report, or if you have any questions about the study.

Survey: What’s in a Word?

As those of us in the survey research field are aware, survey response rates in the United States and other countries have been in decline over the last couple decades.  The Pew Research Center sums up the concerning* state of affairs with a pretty eye-popping table showing response rates to their telephone surveys from 1997 (around 36%) to 2012 (around 9%).  Others have noted, and studied, the same phenomenon.

So what’s really going on here?  There are plenty of explanations, including over-surveying**, controlled access, and a disinterested public.  But what else has changed about sampled survey respondents or their views towards surveys in recent years that might contribute to such a drop?  As a survey methodologist, my first instinct is to carry out a survey to find the answer.  But conducting a survey to ask people why they won’t do a survey can be like going fishing in a swimming pool.

One place many people*** are talking these days is on social media.  In the past decade, the percentage of Americans using social media has increased from 0 to “most.”  I was curious to see how the terms survey and surveys were being portrayed in online and social media.  Do those who use (or are exposed to) these terms have the same things in mind as we “noble” researchers?  When we ask someone to take a survey, what thoughts might pop into his or her mind?  Social media is by no means the only place to look****, but there is a wealth of data out there and you can learn some interesting things pretty quickly.

Using Crimson Hexagon’s ForSight platform, I pulled social media posts that included the word survey or surveys from 2008 (the earliest data available) to today (January 8, 2014).  First I looked to see just how often the terms showed up by source.  Here’s what I found:


In sheer volume, Twitter seems to dominate the social media conversation about surveys, which is surprising given that only about 1 in 6 U.S. adults use it. Of course, just because the volume is so high doesn’t mean everyone is seeing these posts.  The surge in volume is quite dramatic late in 2012!  Maybe this had to do with the presidential election?  We’ll see… keep reading!  My next question was: what are they saying when it comes to surveys?  I took a closer look at the period before that huge spike in 2012, focusing just on those co-occurring terms that pop up most frequently with survey(s).  I also split it out by Twitter and non-Twitter to see what comes up.
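ForSight computes these co-occurring terms automatically, but for the curious, here is roughly what that computation looks like.  This is a sketch of the general idea on made-up posts, not Crimson Hexagon’s actual implementation:

```python
# A rough sketch of a co-occurring-terms count over a pile of posts
# (illustrative only; not Crimson Hexagon's actual implementation).
from collections import Counter
import re

STOPWORDS = {"the", "a", "an", "to", "of", "and", "in", "is", "for", "on"}

def co_occurring_terms(posts, target="survey", top_n=3):
    counts = Counter()
    for post in posts:
        words = re.findall(r"[a-z']+", post.lower())
        if any(w.startswith(target) for w in words):   # matches survey/surveys
            counts.update(w for w in set(words)
                          if not w.startswith(target) and w not in STOPWORDS)
    return counts.most_common(top_n)

posts = ["According to a new survey, each respondent...",
         "Do online surveys for $3 each!"]
print(co_occurring_terms(posts))  # e.g., [('each', 2), ...]
```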


We see according, each, and online for Twitter posts and according, found, and new for all other social media.  Hmm, what could this mean?  Drilling down, we can look at individual posts for each term.  I include one example for each here just to give a flavor of what the data show:

Twitter 5/08-7/12

  • According to one national survey on drug use, each day…
  • D-I-Y Alert! Nor your usual survey job $3 each – Research: This job….
  • We want you 2 do online survey and research for us. Easy…

Other online media 5/08-7/12

  • Nonresidential construction expected to lag in 2010 according to survey…
  • Surprise! Survey found young users protect their privacy online
  • New survey-Oil dips on demand worry, consumer view supports

Among these sample posts, we see survey results being disseminated from several kinds of surveys on both Twitter and other online media.  The Twitter posts, though, seem to have more to do with earning money online than other social media.  Next, I looked at August 2012 to today (January 8, 2014):


Among the other online media, there’s not much change here from the previous period.  People replaces found among the top co-occurring terms, but the focus is still on survey results.  For Twitter, we see a new top 3 terms co-occurring with survey(s): earned, far, and taking.  Here’s what some of the Tweets from the more recent period look like:

Twitter 8/12-1/14

  • Awesomest week ever! I earned $281.24 just doing surveys this week :)
  • Cool! I got paid $112.29 so far from like surveys! Can’t wait for more!
  • What the heck – I got a free pizza for taking a survey!

Now, I know that most of this is pure Twitter spam***** and not every Tweet is read or even seen by the same number of people, but I do think the increasing predominance of ploys to sign people up for paid surveys on networks like Twitter is a sign that the term survey is being corrupted in a way that, if it does not contribute to declining response rates, surely doesn’t help matters.  These ploys leave an impression, and if these are the messages some of our prospective respondents have in mind when we contact them with a survey request, we are facing an even steeper uphill battle than we might have thought.

So, this leads us back to the classic survey methods question: what should we do?  How do we differentiate the “good” surveys from the “bad” surveys among a public who likely finds the distinction less than salient and won’t bother to read a lead letter, let alone open a message that mentions the word survey? Should we come up with a new term?  Does study get across the task at hand for the respondent?  Would adding scientific before survey help keep our invitations out of trash cans, both physical and virtual?

What are your thoughts on the term survey? Leave a comment here, or discuss on your favorite listserv or social media platform.  If you do, I promise not to send you a survey about it!

*concerning = the degree to which lower response rates equate to lower accuracy, which isn’t always the case

**Personally, I sympathize with respondents when I get a survey request on my receipt every time I buy a coffee or sign up for a webinar.  “Enough already with the surveys!  I’ve got surveys to write!”

***not all people, and not all kinds of people, but still many!

****A few years ago, Sara Zuckerbraun and I looked at the portrayal of surveys in a few select print news media.

*****Late 2012 appears to have been a golden age for Twitter spam about paid surveys.

Social Media, Sociality, and Survey Research: Community-based Online Research

Earlier, I posted about broadcast communication and conversational interaction, levels one and two in the sociality hierarchy presented in our new book, Social Media, Sociality, and Survey Research. We use the sociality hierarchy to organize our thinking about how individuals use digital media to communicate with each other. Broadcast use of social media includes things like Tweets, status updates, check-ins, and YouTube videos. Conversational use of social media includes using Facebook and mobile apps for data collection; it also includes conducting traditional survey interviews via technology like Skype and Second Life. My final post on our book is about level three of the sociality hierarchy, community-based interactions. Community-based research uses social and interactive elements of social media, like group affinity and membership, interactivity, altruism, and gamification to engage and capture data from research participants.

Four chapters in our book present research that relies on the structure of online communities to collect data. In “Crowdsourcing: A Flexible Method for Innovation, Data Collection, and Analysis in Social Science Research,” Michael Keating, Bryan Rhodes, and Ashley Richards show how crowdsourcing techniques can be used to supplement social research. Crowdsourcing does not rely on probability-based sampling, but it does allow the researcher to invite diverse perspectives into the research process as well as offer quick, high-quality data collection. In “Collecting Diary Data on Twitter,” Richards, Sarah Cook, and I pilot test the use of Twitter to collect survey diary data, finding it to be an effective tool for getting immediate and ongoing access to research participants. In “Recruiting Participants with Chronic Conditions in Second Life,” Saira Haque and Jodi Swicegood connect with health and support networks in Second Life to recruit and interview patients with chronic medical conditions. Using existing social networks, community forums, and blogs, Haque and Swicegood were able to recruit respondents with chronic pain and diabetes, but were less successful identifying large numbers of respondents with cancer or HIV. In the final chapter, “Gamification of Market Research,” Jon Puleston describes survey design methods that gamify questionnaires for respondents. Gamification makes surveys more interactive, interesting, and engaging. Gamification must be used with care, Puleston warns, because it does have an impact on the data by expanding the breadth and detail of answers respondents give. More research is needed to determine whether this threatens the reliability and validity of survey response.

The community level of the sociality hierarchy is our broadest category and is likely the type of social media communication that will expand as technology continues to evolve and social media becomes more pervasive. As we discuss in the book, there are clear statistical challenges associated with attempting to understand population parameters with methods like crowdsourcing, which collects data from extremely motivated and technologically agile participants, and Twitter surveys, which access only about a fifth of the U.S. population (or for that matter, surveys of Second Life users, an even smaller community). However, community-based data collection adds a social element to online research, much like ethnography, participant observation, and focus groups, that may improve researchers’ understanding of respondents. Research enabled by online communities may represent the future of digital social research.

Social Media, Sociality, and Survey Research: Conversations via Social Media

This week, I’m writing about the sociality hierarchy, a framework we use in our new book, Social Media, Sociality, and Survey Research, to organize our thinking about how individuals use digital media to communicate with each other. My last post was on harnessing broadcast (one-way) communication, like Tweets, status updates, check-ins, and YouTube videos, for social research. Today’s post is about social media and other digital platforms and methods that allow researchers to engage respondents in a conversation, a more traditional two-way interaction between researcher and subject.

In our book, the examples we’ve compiled about applying conversational methods to social media platforms show how traditional survey methods can be transferred to these new platforms. The book contains four chapters presenting data collection techniques for conversational data. In “The Facebook Platform and the Future of Social Research,” Adam Sage shows how a Facebook application can be developed to recruit respondents, collect survey data, link to social network data, and provide an incentive for participating in research.

In “Virtual Cognitive Interviewing Using Skype and Second Life,” Brian Head, Jodi Swicegood and I introduce a methodology for using Skype and virtual worlds to conduct face-to-face interviews via the internet with research participants. We find both platforms feasible for conducting cognitive interviews. The Skype and Second Life interviews allowed us to observe many errors, particularly related to comprehension and judgment. Particular advantages of these technologies include lower cost and access to a geographically dispersed population.

Ashley Richards and I take further advantage of Second Life in “Second Life as a Survey Lab: Exploring the Randomized Response Technique in a Virtual Setting.” In that chapter, we test comprehension and compliance with the RRT. The RRT depends on a random event (such as a coin toss) that determines which question the respondent must answer. The interviewer does not know the outcome of the event, so the respondent’s privacy is protected. By controlling the coin toss (using Second Life code to make it only look random) we were able to determine that significant numbers of respondents did not properly follow instructions, due both to lack of comprehension and deliberate misreporting.
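For readers unfamiliar with the technique, here is the arithmetic behind a generic unrelated-question RRT design, as a short simulation. The probabilities and design details below are illustrative assumptions, not the exact setup from our chapter:

```python
# Generic arithmetic of an unrelated-question RRT design (a sketch; the
# chapter's exact design may differ). With probability p the respondent
# answers the sensitive question; otherwise an innocuous question with a
# known "yes" rate q. The observed yes rate is lam = p*pi + (1-p)*q,
# so the sensitive prevalence is estimated as pi = (lam - (1-p)*q) / p.
import random

def simulate_rrt(pi=0.30, p=0.5, q=0.5, n=100_000, seed=1):
    random.seed(seed)
    yeses = 0
    for _ in range(n):
        if random.random() < p:        # coin toss: answer sensitive question
            yeses += random.random() < pi
        else:                          # coin toss: answer innocuous question
            yeses += random.random() < q
    lam = yeses / n
    return (lam - (1 - p) * q) / p     # estimate of pi

print(f"Estimated sensitive-trait prevalence: {simulate_rrt():.3f}")  # ~0.30
```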

In our final chapter about the conversational level of the sociality hierarchy, David Roe, Yuying Zhang, and Michael Keating describe the decision process required in building a mobile survey panel to facilitate researchers engaging respondents in a conversation via their smartphones. Key elements of the decision process include the choice to build or buy an off-the-shelf mobile survey app, to design a standalone app or to develop web surveys optimized for mobile browsers, how to recruit panelists, and how to maintain panel engagement.

In the book we take a broad view of two-way, conversational communication and consider it as any application of traditional survey interactions between an interviewer (or an instrument) and a respondent translated to the online and social media context. Our key guidance is to take advantage of the wealth of survey methods literature and apply (while testing!) traditional assumptions in social media and other online environments. Tomorrow I’ll post about research at the third level of sociality: community-based interaction via social media and other digital environments, where users and participants share content, work collaboratively, and build community.