Building Social Media Collections
Presentation to Management Council
Brian Dietz & Jason Ronallo
Preponderance of Social Media
2.2 Billion Active Social Media Users
30% global usage
In 2015 in the US
65% of adults use at least one social media platform
Percentage of Adult Internet Users, by Platform
- 72% Facebook
- 28% Instagram
- 23% Twitter
Percentage of Young Adult Internet Users (18-29), by Platform
- 82% Facebook
- 55% Instagram
- 32% Twitter
Researchers are Taking Note
In a meta-analysis of studies using data from Twitter, at least seventeen different disciplines were represented in 382 studies spread over six years.
Michael Zimmer and Nicholas John Proferes, “A Topology of Twitter Research: Disciplines, Methods, and Ethics,” Aslib Journal of Information Management 66, no. 3 (2014): 250–61.
Twitter Research Data Grants
- Foodborne Gastrointestinal Illness (US)
- Disaster Information Analysis (Japan)
- Cancer Early Detection Campaigns (Netherlands)
- Modelling Urban Flooding in Jakarta (Australia)
But Why Archive Social Media Data?
Discourse Relevant to Archival Collections
How could we not want to preserve a vast record of everyday life and thoughts from tens of millions of people, however mundane?
Dan Cohen, “Digital Ephemera and the Calculus of Importance.”
Perceived Value of Social Media Data Among SCRC Researchers
Serious discourse occurs on social media?
45% agreed
22% strongly agreed
(67% combined)
Value in using social media data in research?
34% agreed
37% strongly agreed
(71% combined)
What Does This Content Represent?
Official Records
Everyday Experience
Significant Events
Greater Representation in the Archival Record
- Increase diversity of voices in historic record
- Build more representative collections
Engagement With New Communities & Deeper Engagement With Existing Communities
My #HuntLibrary
- Crowdsourced storytelling
- Multiple Access Layers
- Battles, Voting, Moderation
- Award-winning
Archival Component
That's pretty legit! Appreciate the props #huntlibrary!
My #HuntLibrary User Study
75% listed contributing to the archive as a main motivator for participating.
“Even Better Than Winning an iPad!”
“New Voices and Fresh Perspectives”
2014-15 LSTA EZ Innovation Grant
- Administered by the NC State Library
- Collaboration between SCRC and DLI
- Guidance from Copyright & Digital Scholarship Center
- Significant contributions from student assistants
Project Goals
- Establish groundwork for a social media collecting program at NCSU Libraries
- Develop free, web-based documentary toolkit
- Develop open, easily deployable collecting environment
Not collecting all of Twitter and Instagram!
Historians of the English Civil War are deeply thankful that Humphrey Bartholomew had the presence of mind to save 50,000 pamphlets (once considered throwaway pieces of hack writing) from the seventeenth century and give them to a library at Oxford.
Dan Cohen, “Digital Ephemera and the Calculus of Importance.”
SCRC Collecting Strengths
Largely focused on NC State History
Identifying Content
- Targeted accounts
- Hashtags
- Keywords
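The three identification criteria above can be sketched as a simple matching step. This is an illustrative sketch only, not how Social Feed Manager or Lentil work internally: the `Criteria` class, the `matches` function, the post dictionary shape (`account`, `text`), and the account name `examplelibrary` are all hypothetical.

```python
import re
from dataclasses import dataclass, field

# Hypothetical container for collecting criteria; all values are stored lowercased.
@dataclass
class Criteria:
    accounts: set = field(default_factory=set)
    hashtags: set = field(default_factory=set)
    keywords: set = field(default_factory=set)

HASHTAG_RE = re.compile(r"#(\w+)")

def matches(post, criteria):
    """Return the reasons ("account", "hashtag", "keyword") a post qualifies."""
    reasons = []
    text = post.get("text", "")
    # Targeted accounts: compare the posting account case-insensitively.
    if post.get("account", "").lower() in criteria.accounts:
        reasons.append("account")
    # Hashtags: extract every #tag from the text and intersect with the list.
    tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
    if tags & criteria.hashtags:
        reasons.append("hashtag")
    # Keywords: plain case-insensitive substring match.
    lowered = text.lower()
    if any(keyword in lowered for keyword in criteria.keywords):
        reasons.append("keyword")
    return reasons

# Assumed example criteria; the account name is made up for illustration.
criteria = Criteria(
    accounts={"examplelibrary"},
    hashtags={"huntlibrary"},
    keywords={"krispy kreme challenge"},
)

post = {"account": "someone", "text": "Love the #HuntLibrary!"}
print(matches(post, criteria))
```

Recording why a post matched, rather than only whether it matched, keeps some provenance about the collecting decision alongside each harvested item.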
Account-based Twitter Harvests
Colleges and Departments
Student Organizations
And About 460 Other Accounts
Hashtag-based Instagram and Twitter Harvests
Krispy Kreme Challenge
This Data Tells Part of the University's History
Documentary Toolkit
To help other institutions kickstart their own collecting initiatives
- Environmental scan
- Research value
- Legal and ethical analysis
- Documentation
- Surveys
Contributions to the Profession
Open Source
Technical Requirements for Social Media Archiving Tools
- Social Feed Manager
- Lentil
- Linux:
sudo apt-get install git apache2 python-dev python-virtualenv postgresql libxml2-dev libxslt1-dev libpq-dev libapache2-mod-wsgi supervisor
Along with email, social media will probably provide the main source of information for researchers studying our current time. However, our institution just does not have the resources right now to collect and store the social media of other people or organizations.
NCSU Social Media Archives Toolkit survey of North Carolina Cultural Heritage Organizations
Social Media Combine
Virtualized social media harvesting environment
Server Virtualization
(not desktop virtualization)
Virtual Machines/Virtual Servers
Virtual Machine on Your Laptop
Repurposing Virtualization
Future Plans
- Continued collecting
- Access
- Best practices
- Outreach and campus partners
- Challenges of social media archiving
- Campus collaborators
Brian Dietz
Jason Ronallo
Associated content?
- Linked web pages
- Replies
- Videos and other media
- Retweeting account info
- Engagement metrics
Availability and access
- What is the "whole" dataset if it is constantly being revised?
- How do we redistribute unstable data?
- How can research results be reproduced?
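One possible response to the questions above is to treat each harvest as a dated, fingerprinted snapshot, so that a study can cite exactly which version of the data it used. The sketch below is only an illustration under stated assumptions: the `snapshot_manifest` function, the record shape (dictionaries with an `id` field), and the manifest fields are hypothetical and are not part of the project's tools.

```python
import hashlib
import json

def snapshot_manifest(records, harvested_at):
    """Describe one harvest: when it was taken, its size, and a content digest."""
    # Sort by id so the digest does not depend on harvest order.
    ordered = sorted(records, key=lambda record: record["id"])
    # Canonical JSON: sorted keys and fixed separators give stable bytes.
    payload = json.dumps(ordered, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {
        "harvested_at": harvested_at,
        "record_count": len(ordered),
        "sha256": hashlib.sha256(payload).hexdigest(),
    }

# Made-up records for illustration only.
first = [{"id": "2", "text": "second post"}, {"id": "1", "text": "first post"}]
later = [{"id": "1", "text": "first post"}]  # one post has since been deleted

manifest_a = snapshot_manifest(first, "2015-06-01")
manifest_b = snapshot_manifest(later, "2015-07-01")
print(manifest_a["sha256"] != manifest_b["sha256"])
```

A differing digest shows that the dataset was revised between harvests. It does not say which version is the "whole" dataset, so that remains a policy question.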