Social media data is rich, pervasive and omnipresent. With people laying out their everyday light-hearted views – and deepest darkest secrets – for all to see, the amount of data readily available to anyone inclined to look is staggering. This raises important questions around the principles and legitimacy of analysing individuals’ data. Here, Tara Beard-Knowland and Steven Ginnis examine what can – and should – be done to ensure social media data research is as conscientious as it is revealing.
Social media companies, it seems, have been under fire from all sides in the last couple of years. Even the inventor of the World Wide Web, Sir Tim Berners-Lee, described the current era of the internet as a ‘downward plunge to a dysfunctional future’, citing the Cambridge Analytica scandal, one of 2018's most notable events, as evidence.
In fact, 2018 was a doozy of a year for anyone using social media data, with two key stories dominating the news agenda. In case you've been living under a rock (or are completely ‘off the grid’, in modern parlance), in March 2018 it came to light that Cambridge Analytica, a British political consulting firm owned by a hedge fund billionaire, had used an app created by Aleksandr Kogan to capture Facebook data from both individuals and their Facebook friends via online surveys. What is more, it went on to supply this data to specific political groups – including Trump’s electoral team and the winning Brexit campaign.
The second, slightly more wholesome, event was the implementation of the General Data Protection Regulation (GDPR), an EU law regulating data use and privacy rules for individuals in the EU and European Economic Area (EEA).
As a result, there has been much public debate about the use of social media data.
Tick this box to admit you haven’t read the T&Cs
Social media is a rich and often colourful data source that allows us to understand how people interact with the world and, crucially, to do so in their own day-to-day language. At Ipsos MORI, we recognised social media data’s potential early on and have been working with it for many years, successfully uncovering useful and actionable insights for our clients. Analysed correctly, its applications are almost endless.
However, we are very conscious of the ethics of social media research. In mid-2018, we asked people if they were aware that current social media terms and conditions allow the sharing of data for research purposes. Just 45% of people asked were aware that this is happening at an overall level, and 47% knew of the practice at an individual level – up from 38% in 2015. The rub here is that 60% of people felt that individuals' data should not be used for research purposes and a third (32%) thought the same even at an overall level.
That said, we also know from our work on Data Science Ethics that the public do see the value of social media research when their data is treated responsibly and the research questions are appropriate. This reminds us that our research should be grounded in what is publicly acceptable, not what is technically possible.
GDPR addresses a number of aspects of the use of social media data, and each social media platform has terms and conditions governing the data's use. In fact, platforms such as Synthesio, an Ipsos company, aggregate publicly available information. Synthesio dynamically manages this to ensure that the ‘right to be forgotten’ is possible.
Pragmatism over idealism?
Looking beyond the legalities to the ethics of using social media data specifically for research projects, the balance must be between pragmatism and integrity. Our overarching principle is: 'Would I be OK with this being done with my data? With my parents' or my children's data?' We always encourage people to ask for a second opinion if they're not sure. This tends to be a good litmus test for most projects involving social media data – there's lots we can still do, but we know where we need to be careful.
Having passed this litmus test, we adhere to some key principles in how we report findings from social media data, bearing in mind that these are generally reported in a static format such as PDF or PowerPoint, where enforcing the right to be forgotten is much more complicated. Therefore, we avoid revealing things about 'participants' that would cause them harm in some way.
There is a wide variety of project objectives, meaning data that would be fine to reveal in some cases might not be suitable – or indeed helpful – to disclose in others. For example, if what is reported could cause someone to become uninsurable because we have revealed something about their health, then this shouldn’t be used. However, if the project is on behalf of a public body about health, then it may be legitimate to do so as long as it is aggregated and anonymised.
If a client wants to identify specific individuals, this should be treated with the utmost care. Ipsos MORI as a rule does not reveal details of any individual with fewer than 1,000 fans / followers and, from forums, never reveals user names at all. When reporting on specific quotes, we don't use quotes that are from – or could potentially be from – children. And, of course, we never report anything that could incriminate someone.
Finally, when it comes to the raw data itself, it should be stored securely and not transmitted to third parties without serious consideration. If it is shared with the client, the data should be as anonymised as possible, with clear written evidence of how the data will be used – and specifically not for potentially harmful purposes such as targeted marketing – who will be allowed access to it, how it will be stored and at what point it will be destroyed after use.
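For teams that want to apply these reporting rules consistently, they could be encoded as a simple pre-publication check. The sketch below is illustrative only and is not a description of any Ipsos MORI tooling: the `Post` fields, the `possibly_child` and `incriminating` flags (assumed to be set by a human reviewer) and the function names are all hypothetical. Only the 1,000-follower threshold and the forum rule come from the text above.

```python
from dataclasses import dataclass

MIN_FOLLOWERS = 1_000  # threshold quoted in the article


@dataclass
class Post:
    # All field names are hypothetical, chosen for illustration.
    author: str
    text: str
    source: str                   # e.g. "forum", "twitter"
    followers: int
    possibly_child: bool = False  # set by a human reviewer
    incriminating: bool = False   # set by a human reviewer


def may_name_author(post: Post) -> bool:
    """Forum user names are never revealed; elsewhere the author
    needs at least MIN_FOLLOWERS fans / followers."""
    if post.source == "forum":
        return False
    return post.followers >= MIN_FOLLOWERS


def may_quote(post: Post) -> bool:
    """No quotes that are, or could be, from children, and nothing
    that could incriminate someone."""
    return not (post.possibly_child or post.incriminating)


def report_line(post: Post):
    """Return a reportable verbatim, or None if it must be left out."""
    if not may_quote(post):
        return None
    label = post.author if may_name_author(post) else "anonymous"
    return f'"{post.text}" ({label})'
```

A check like this only covers the mechanical part of the rules. Judging whether a post might come from a child, or whether a disclosure could cause harm, still needs a person.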
Obviously, not IRL
People often present a different personality on social media from the one they have in real life, or reveal only part of their personality. If you consider your own posts, you might see that they display one aspect (curated or uncurated) of yourself. Other people, conversely, are no different on social media than in their day-to-day lives. In analysing and categorising posts, it is therefore important to remember that what a person posts online does not define them, and that what you are categorising is a piece of content, not a person. The key to gleaning real value from social media data is examining aggregated mentions, which paint a more holistic picture.
Comments will be moderated
The discipline of social intelligence is still emerging in terms of what, how and, in some cases, why we are doing it – and will continue to do so for years to come. In this context, we must create and adhere to some simple principles for the use of social media data in research – whether public or private – to ensure that we are protecting the interests of research participants. And that, in the long run, is the best outcome for everyone.
Ipsos MORI’s golden rule of ethical social media research:
Would I be happy for my social media data to be used in this way? For my parents' or children's social media data to be used?
Further guiding principles for analysis and reporting:
- If the topic of conversation is likely to be sensitive (e.g. health or politics), consider what additional steps could be taken to anonymise or protect individuals' data.
- Align the purpose of the project with how it will be reported and keep the data anonymised.
- If the topic is likely to attract comment from those under 16, pay extra attention to any verbatims used. To the best of our ability, we should not quote any children in reporting.
- Do not engage in anything that could incriminate an individual.
- Bear in mind that what a person posts does not define who they are – when categorising a post, you are not categorising a person.
- Be careful with the raw data, even if user names and other personally identifiable information are removed. Ask someone outside of the project for an opinion on the reasonableness of any requests (including by clients) for the raw data.
Social media has ushered in a whole new category of business ethics. How does your organisation deal with them? What are your personal opinions?