Social Media Research & the Underbelly of the Internet
[One written for the FACE company blog - hence the slightly different tone. Original story here.]
An underrated skill in social media research is simply knowing what to search for.
Really? You’re interested in a particular brand, so surely you search for that word or phrase, right?
For brands such as Three, Apple, or the AA? Go ahead, give it a go! Just don’t be surprised when you get back a lot of content about “Three ways to boost your Twitter profile”, apple crumble recipes, and AA-rated sovereign debt.
Searching the internet
We often describe setting up a social media research search for a brand as like doing a Google search. This is loosely accurate – like Google, a social media research tool ‘crawls’ news, blogs and forums for instances of your keywords. The research tool will also filter social media APIs (e.g. Twitter, Facebook) for instances of these words or phrases.
The problem is that the internet contains a lot more content than you think it does. Normally you never see it, and much of it isn’t even designed for human readers. Here are a few forms this takes:
Challenge 1: SEO spam
Being at the top of Google’s search results is a very valuable place to be if you want to get visitors to your website or online shop. This has made gaming Google results an industry in itself, called SEO: search engine optimisation.
SEO tries to second-guess Google’s search algorithms in order to push clients’ websites to the top of search listings. Broadly speaking, Google rates a site more highly the more inbound links it has – i.e. the more popular it appears to be. So gaming Google results entails generating a lot of false links and content, with methods including:
- Fake news sites reprinting press releases and “content farms”, e.g. Demand Media or Articlesbase
- Promo blogs with high numbers of links to the client’s site, with random or copy-pasted text to make Google believe they are legitimate blogs rather than spam
- Legions of Twitter bots (automated accounts with an algorithm rather than a person generating their content) posting links to websites
- Using bots or real people (incentivized by micropayments) to post high volumes of blog comments with links to the client’s site
This gives rise to a lot of misleading digital data, all of it designed to be “read” by Google’s algorithm rather than by human eyes. To a reader – or anyone tracking a brand – it is useless. The problem is that it has your brand name in it, so any generic brand-name social media search will bring back this “noise” too.

Challenge 2: unexpected content
Do a Google search these days and the internet seems an ordered and relevant place. Even when you search for an ambiguous word with several meanings – let’s say “Orange” – the results are sensible: Orange the mobile company and the Wikipedia page on the fruit.
This is because Google has spent years refining its algorithms to ensure they bring back the most relevant content possible. This doesn’t just mean putting the most popular links at the top of results. Instead Google uses everything it knows about you – your previous searches, your Google profile and Gmail, your stored cookies and more – to deliver personally tailored results.
Social media research tools, however, don’t work this way. The APIs and scrapers collecting content return all keyword mentions, relevant or not. In searches we’ve run, some of the most unexpected things we’ve found have been:
- Searching for banks will bring back posts on “carder forums” – the sites where credit card fraudsters sell the card details they have stolen from databases.
- Almost everything is a word in Indonesian. You thought you were searching a specific and unambiguous acronym? No, it also means something in Indonesian – and volumes can be enormous because Indonesians are one of the most active populations in the world on Twitter.
- Pharmaceutical searches are near-impossible. Dubious medication sellers will include hundreds or thousands of drugs as keywords on their pages, whether or not they’re selling those products.
Challenge 3: not all relevant content is indexed
So far we’ve described some ways that irrelevant content or “noise” can get into your social media search. There’s also the opposite problem, however – not being able to ‘see’ certain types of social media content, particularly forums:
Message boards are part of the Internet known as the ‘Invisible Web’ and pose many problems to traditional search engine spiders. The dynamic content is usually very deep and hard to search. In addition, many of these sites change their locations, servers, or URLs almost daily presenting special searching challenges
[Boardreader]

This makes it essential to use a social media research tool that allows you to check which forums are tracked, and customize the panel of sources as needed.
Impact on social media research
What this means for social media research is that if you’re using an off-the-shelf monitoring package, you’re probably getting a lot of junk in your results. Brands are often keen on easy usability – type your brand name into the search, and get a volume figure and sentiment stats out. But without tailor-made search syntax, those figures are almost certainly meaningless.
So how do you make your social media research search relevant?
1. Specific is better
Using broad search terms and then excluding keywords you’re not interested in doesn’t usually work very well. You’ll never be able to filter out all irrelevant content – language is too varied and dynamic. E.g. if you’re after the mobile brand, search for Orange AND mobile, not just “Orange”.
2. But you can filter irrelevant websites
Not that many big content farms exist – so we exclude everything from them by URL.
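As a rough illustration of excluding by URL, here is a minimal Python sketch. The domain names, the `url` field and the function names are all hypothetical; a real monitoring tool will have its own way of expressing source exclusions.

```python
from urllib.parse import urlparse

# Hypothetical blocklist: placeholder domains, not a vetted list of content farms.
BLOCKED_DOMAINS = {"contentfarm.example", "pressreleases.example"}


def domain_of(url):
    """Return the lower-cased host of a URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_blocked(url):
    """True if the host is a blocked domain or a subdomain of one."""
    host = domain_of(url)
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)


def drop_blocked(posts):
    """Keep only posts whose (assumed) 'url' field is not on the blocklist."""
    return [p for p in posts if not is_blocked(p["url"])]
```

Matching on the host rather than on the whole URL string means a blocked name that merely appears in a path or query string on some other site does not trigger the filter.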
3. Also filter particularly spammy keywords
e.g. “Viagra” for anything medical.
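A keyword exclusion of this kind might look like the sketch below. The term list is an assumption made for illustration, and the function name is hypothetical.

```python
import re

# Hypothetical spam terms for a medical search; extend per project.
SPAM_TERMS = ["viagra", "cheap pills"]

# Word boundaries stop a term from matching inside a longer word.
SPAM_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in SPAM_TERMS) + r")\b",
    re.IGNORECASE,
)


def is_spammy(text):
    """True if the text contains any of the excluded terms as whole words."""
    return SPAM_PATTERN.search(text) is not None
```

This works best for a short list of terms that almost never appear in genuine posts. As point 1 says, trying to exclude every irrelevant word this way is a losing battle.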
4. Boolean search syntax
This is the logic that enables you to search for content including A and B, A or B, or including A but not B. It’s essential not only for designing your social media search strings, but also for searching within the dataset.
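To make the logic concrete, here is a small Python sketch of AND / OR / NOT matching over the words in a post. It is only an illustration: the function and parameter names are invented, and real tools have their own query syntax and much smarter tokenising.

```python
import re


def words(text):
    """Lower-case the text and split it into a set of simple word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def matches(text, all_of=(), any_of=(), none_of=()):
    """Boolean match: every all_of term (AND), at least one any_of term (OR,
    skipped if empty), and no none_of term (NOT)."""
    found = words(text)
    has_all = all(t.lower() in found for t in all_of)
    has_any = not any_of or any(t.lower() in found for t in any_of)
    has_none = not any(t.lower() in found for t in none_of)
    return has_all and has_any and has_none


# Roughly: Orange AND (mobile OR phone) NOT juice
query = dict(all_of=["orange"], any_of=["mobile", "phone"], none_of=["juice"])
relevant = matches("My Orange mobile has no signal", **query)
irrelevant = matches("Fresh orange juice recipe", **query)
```

Here `relevant` comes out as `True` and `irrelevant` as `False`, which is the “Orange AND mobile” idea from point 1 with an OR group and an exclusion added.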
5. Test your search terms on Twitter
Enter your search phrases into Twitter Search to see whether you’ve got them right, or whether they’re bringing back unexpected or irrelevant content instead. Twitter Search also helps you understand the volume of content that will come back (is it multiple posts a second, or a couple per day?).
6. Get personal
If you’re specifically interested in what consumers are saying, searching with personal pronouns – e.g. “my iPhone” – will bring back a much more relevant dataset than “iPhone” on its own.
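If you are filtering a dataset yourself, the same idea can be sketched with a regular expression. Allowing up to two words between the pronoun and the product (so “my new iPhone” also counts) is our own assumption here, as are the function names.

```python
import re


def personal_pattern(product):
    """Build a pattern for 'my'/'our' followed, within two words, by the product."""
    return re.compile(
        r"\b(?:my|our)\s+(?:\w+\s+){0,2}" + re.escape(product) + r"\b",
        re.IGNORECASE,
    )


def is_personal(text, product):
    """True if the text talks about the product in a first-person, possessive way."""
    return personal_pattern(product).search(text) is not None
```

For example, `is_personal("I dropped my iPhone again", "iPhone")` matches, while a headline such as “Apple announces iPhone sales figures” does not.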
Which is to say, designing a relevant and accurate social media search can involve a surprising amount of time, thinking and ongoing refinement. Few brands have the time or expertise to do this in-house using an off-the-shelf monitoring tool. This is why our clients have come to us instead for our expertise in locating what matters in social media – the signal amongst the noise.