Where did she go? Collected blogging over the last 6 months

A few months since I’ve updated this blog. I’ve been meaning to write a post on ‘personal content strategies’, aka “How do you decide what to post where?” But the truth is I still haven’t figured that out. Here’s an aggregate of everything I’ve been up to in the last six months. [Photos to follow when I’ve fixed WordPress…]

1. March to May I was in New York with work, primarily working on a social TV project for Tumblr.

What we found: content lives longer on Tumblr. Like, really long! Not only is Tumblr the home of fandom where conversation sustains between episodes, the longest-living pieces of Tumblr content (animated GIF series) are practically immortal and circulate indefinitely as the hefty emotional punch they pack is re-lived, re-discovered and re-contextualised.

Press coverage in Ad Age, Econsultancy, Business Insider, Lost Remote, Media Bistro

2. I wrote a few things for the work blog

All more in a ‘content marketing’ vein rather than passion pieces, aka they’re solid and put forward the company’s POV on the value of social media research. These things do alright within the market research Twittersphere but don’t break through into Plannerland in the way that more cultural analyses of social media activity can.

3. More interesting blogging: I also wrote the end of my colleague Robert Parkin’s blog post about How To Detect Communities Using Social Network Analysis.

These ideas then developed into a Tumblr post where I talk about how “Brands need to think more about how they can give to their communities” and the importance of opening up research as a way to share value.

A nice reply from Kenyatta continued the discussion – he ran the Dr Who Tumblr for BBC America for a time, so what he doesn’t know about working with fan communities isn’t worth… etc.

“It’s the stuff we do in supporting tv, music, and sports fandoms at everybodyatonce: find out where the fans are (both passive and active, existing and potential), find out all the different ways that they’re connected, and use that information as a map for creating or strengthening the edges between different clusters of nodes. While we don’t make these networks explicitly visible in the ways that hautepop suggests, we find other ways to surface the fandom to each other. Holding up a mirror in the form of other fans is usually more empowering than showing them their own graph, but as this kind of information becomes more commonplace, that may change.”

4. Tumblr is where I’m mostly blogging these days. And funnily enough, I’m mostly blogging about Tumblr and its weird network effects. Key posts:

i) How do you find the connectors & influencers on Tumblr? Where someone asks a question, and I unpack all my tacit knowledge. The key point: “follower numbers matter a lot less than your position within Tumblr social networks – that is, it’s about community.”

ii) Anatomy of A Tumblr Trend: the semantic network map. Where I use Tumblr’s ‘related tags’ feature to map how Tumblr fashion subcultures are connected:

To put it bluntly, this is an astoundingly cool methodology I came up with. It finds a network where you might not expect one – not between who follows whom, or who retweets whom (as we’re familiar with from Twitter network analysis) but simply between ideas – subcultures, memes. This is why I call it a ‘semantic network map’ – a map of meanings. It gives us a whole set of new, quantitative tools to use on what otherwise looks like a very fuzzy, qualitative thing – culture.

It is of course mediated through the Tumblr algorithm & its ‘term frequency – inverse document frequency’ weighting. So this modifies how far we might want to say this method gives us access to “folksonomy” – folk taxonomy. It’s not an entirely “pure” view of how people think these concepts are connected, even though the algorithmic mediation (which prioritises links between tags that occur a lot together, but don’t appear a lot in the dataset overall) may very well be the best way to get sense out of the connections – avoiding the “all tags short-circuit back to Fashion” problem.
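To make that weighting concrete, here is a minimal sketch of a TF-IDF-style "related tags" score. It is not Tumblr's actual algorithm (which isn't public); the `related_tags` function, the scoring formula and the sample tags are all assumptions for illustration.

```python
import math
from collections import Counter

def related_tags(posts, seed, top_n=3):
    """Rank tags that co-occur with `seed`, down-weighting tags that are common everywhere."""
    n_posts = len(posts)
    # Number of posts each tag appears in across the whole dataset.
    doc_freq = Counter(tag for tags in posts for tag in set(tags))
    seed_posts = [set(tags) for tags in posts if seed in tags]
    # How often each other tag appears alongside the seed tag.
    co_counts = Counter(tag for tags in seed_posts for tag in tags if tag != seed)
    scores = {}
    for tag, co in co_counts.items():
        tf = co / len(seed_posts)                  # "occur a lot together"
        idf = math.log(n_posts / doc_freq[tag])    # "don't appear a lot overall"
        scores[tag] = tf * idf
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Hypothetical tagged posts, purely illustrative.
posts = [
    {"fashion", "street goth", "monochrome"},
    {"fashion", "street goth", "menswear"},
    {"fashion", "pastel goth"},
    {"fashion", "seapunk"},
    {"fashion", "street goth", "monochrome"},
    {"fashion", "normcore"},
]
ranking = related_tags(posts, "street goth")
```

In this toy data "fashion" appears on every post, so its score drops to zero and the more specific tags rise to the top, which is the short-circuit avoidance described above.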

Anyway. If you find this interesting, come talk to me at hautepop@gmail.com, I’d love to discuss how this can be taken further.

5. Speaking of Tumblr subcultures, I’ve also been image-blogging over on Pinterest and a new Tumblr, street-goth – which I started in order to play directly with these subcultures, to find out how they work from the inside. Also to work out what I wanted to wear this autumn. Here’s my intro blog post, and do follow along if you want a lot of moody black & white images of clothes.

Debating a rename to hautegoth, mostly because personal brand (and because what I’m blogging goes beyond the very now street goth menswear look) – but also out of a certain curiosity… Can I spark a recognisable Tumblr trend? Watch this space.

6. Nearly forgot! Five speaking events in the last six months, that’s fair going:

  • Debcon in Boston, on “perverse media studies” (a theoretical piece on the implications of ‘sticky’ and ‘viral’ properties)
  • moderated a panel at Theorizing The Web in NY called Ref(user): Moments of Resistance, about politics of social media interaction
  • London Social Media Cafe, on How Videos Go Viral (hint: it’s all about the network)
  • Marketing Week Live (25 June), viral video again
  • Connected World (10 July) on a panel with Tom Ewing & Paul Edwards about ‘cutting through the noise’


7. & finally I should have a press release coming out this month with Live Nation for a nice little statistical project demonstrating the relationship between social media activity and ticket sales. More TBC…

Reblogs and content sharing on Tumblr: a personal network analysis

First posted on hautepop.tumblr.com (of course!) on 16 December 2012

Tumblr is a weird social network.

Like Twitter its content is (very largely) public, and yet like Facebook it’s opaque to social analysis. Follower counts aren’t public or accessible. Ditto who’s following each blog (you can’t even really see all your own followers), and you don’t know who’s following anyone else either. Tumblr Analytics does exist, but not for ordinary users – instead you have to pay Tumblr partner Union Metrics $499 per month and what you get appears to be top-level and aggregate. (since corrected: it’s awesome, complete & exportable)

This leaves Tumblr a kind of “here be dragons” among social networks, which is unusual in an age so obsessed with them. That is, its social norms are not known; there isn’t any data about how its users behave and use the network (even for most Union Metrics subscribers there are no benchmarks). Rumours spread – Tumblr’s young, it’s photo-y and meme-driven, it’s full of weird tightly-interknit subcultures who reblog each other endlessly – but there is no data to support this, no way of discerning fact from – at best – intuition about one’s own little corner of the thing.

This is a research problem, and I just so happen to be a social media researcher working at a company (FACE) which builds its own social media research platform (Pulsar). *

But even that doesn’t make Tumblr analytics easy. Social data firehose provider Gnip (one of the two sellers of the Twitter firehose, alongside Datasift) do provide a Tumblr firehose to companies like mine – that is, an API stream of all the posts on Tumblr. Great! Except, as they actually note in their post on Taming the Social Media Firehose, there are complications:

1. The main way of querying the API is through keyword-based search, looking for words or phrases in the body of the post, or titles or tags. But 84% of posts are photos, and most don’t contain any or sufficient keywords to identify what’s in the image. Consequently only 20% of content can be identified with text-based filtering, leaving 4 in 5 posts “dark”. **

2. Do you want to understand patterns of sharing content on Tumblr? How far do posts travel, what’s the average reblog rate, and of course who are the key social hubs for content diffusion? But analysing reblogs is a more manual process:

“There is a list of all of the notes (likes, reblogs) associated with a post appended to that post wherever it shows up on Tumblr. Each post activity record in the firehose can contain reblog info. It will have a count, a link to the blog this entry was a reblog of and a link to the root entry. To build the blog note list that a user would see at the bottom of a liked or reblogged entry, you have to trace each entry in the stream (i.e. keep a history or know what you want to watch) or scrape the notes section of a page.”
[Gnip, Taming the Social Media Firehose]

Which is to say understanding reblogs is doable, but a hassle. It seems to work most easily with a priori identification of what you want to track. If you want to analyse the network in order to work out what’s most important – this would seem to be a little harder.
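As a rough illustration of the "trace each entry in the stream" approach Gnip describe, here is a sketch that groups rebloggers under the blog they reblogged from. The record fields (`blog`, `reblogged_from`, `root`) are hypothetical names of my own, not the real firehose schema.

```python
from collections import defaultdict

def build_reblog_tree(records, root_post):
    """Map each source blog to the blogs that reblogged from it, for one root post."""
    children = defaultdict(list)
    for rec in records:
        # Keep only activity belonging to the post we decided to watch in advance.
        if rec.get("root") == root_post and rec.get("reblogged_from"):
            children[rec["reblogged_from"]].append(rec["blog"])
    return dict(children)

# Hypothetical stream records, purely illustrative.
records = [
    {"blog": "blog_a", "reblogged_from": "hautepop", "root": "post-1"},
    {"blog": "blog_b", "reblogged_from": "blog_a", "root": "post-1"},
    {"blog": "blog_c", "reblogged_from": "hautepop", "root": "post-1"},
    {"blog": "blog_d", "reblogged_from": "someone", "root": "post-2"},
]
tree = build_reblog_tree(records, "post-1")
```

Note that this only works because `root_post` is known up front, which is exactly the a priori limitation described above.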


If analysing patterns of reblogging is a manual process, then, let’s do it manually. I’m lucky enough that my Tumblr provides a nice starter-size dataset. By being Tumblr Technology spotlighted I’ve acquired a fairly generous total of 38,700 followers. This audience scale (though likely largely inactive) means that some of my content has picked up enough reblogs to make mapping the pattern of them interesting.

The images below show who’s reblogged two of my posts: Adidas and Chanel. Both are playing with a Tumblr trend I’d seen of people putting brand logos on apparently non-related images – see more here. As I hoped, I played with the tropes of this meme well enough to get some reblogs – 62 for the Adidas post, and 64 for the Chanel one. Far from Tumblr viral, but also manually analysable in an evening…



5 topline findings:

1. Both posts were reblogged 3x as much as they were ‘liked’

The ratios are 71.3% reblogs for Adidas, and 76.2% for Chanel.
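For anyone checking the "3x" claim, a reblog share converts to a reblogs-per-like ratio like this (a small arithmetic sketch; the function name is mine):

```python
def reblogs_per_like(reblog_share):
    """Convert reblogs as a share of all notes into a reblogs-to-likes ratio."""
    return reblog_share / (1 - reblog_share)

adidas_ratio = reblogs_per_like(0.713)   # roughly 2.5 reblogs per like
chanel_ratio = reblogs_per_like(0.762)   # roughly 3.2 reblogs per like
```

So "3x" is a rounded figure: about 2.5x for Adidas and about 3.2x for Chanel.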

Hypothesis: photo posts may get a lot of reblogging – jokes & quotes somewhere in the middle – and I reckon text-based posts (and links) may perhaps get more likes than reblogs. Images are more diffuse in meaning yet may also be more emotionally evocative, lending themselves to reinterpretation and reuse through reblogging more readily than monovalent text.

2. Does content diffusion occur through hubs – that is, are there any influencers in my network who spur much of the reblogging?

Some variation here. For the Chanel post, there are no big hubs:

  • My original post gains 35 direct, first-hop reblogs out of the 64 total – that’s 55%.
  • 2 users, c-rystalcastles and my0wnstunts, get 3 reblogs for their posts
  • 5 users get 2 reblogs and 13 users get 1 further reblog
  • Meaning that 67% of the rebloggers gained no further reblogs themselves

For the Adidas post however:

  • Original post gets 24 out of 62 total reblogs
  • skullc0de (a 17-year-old German girl and streetwear brand fan) is a hub, driving 10 further reblogs
  • 3 users gain 4 reblogs each, 2 users get 3 reblogs, and 10 users get 1 further reblog each
  • Again, about 73% of rebloggers gained no further reblogs themselves
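I did the counting above by hand, but the same tallies can be sketched in a few lines. This assumes the reblog list has been typed up as (reblogger, source) pairs and that each blog reblogs the post once; the names and data below are made up.

```python
from collections import Counter

def hub_summary(edges, author):
    """Summarise who drives reblogs, from (reblogger, source) pairs."""
    rebloggers = [reblogger for reblogger, _ in edges]
    # Further reblogs earned by each reblogger (the author's direct reblogs are excluded).
    further = Counter(source for _, source in edges if source != author)
    direct = sum(1 for _, source in edges if source == author)
    no_further = sum(1 for r in rebloggers if further[r] == 0)
    return {
        "direct_share": direct / len(edges),
        "top_hubs": further.most_common(2),
        "no_further_share": no_further / len(rebloggers),
    }

# Hypothetical mini-network: "me" is the original author.
edges = [("a", "me"), ("b", "me"), ("c", "me"), ("d", "a"), ("e", "a"), ("f", "d")]
summary = hub_summary(edges, "me")
```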

Which suggests:

  1. The diffusion of these posts tapped into no hubs more influential than myself – as I’d expect, given my decent audience size plus some hypothesised ‘author bias’ that means you’re more likely to share an original post you see than a reblogged one. In Twitter terms, I wasn’t reblogged by a William Gibson or Warren Ellis amplification type who would spark many more reblogs than the original author directly achieved.
  2. If you’re Tumblring for attention (and in some way, most of us are), reblogging a post doesn’t get you that much attention. In this micro-dataset, 70% of reblogs don’t spur any further reblog-engagement.
  3. Rate of reblogging is very likely to correlate with audience size.

3. How far content travels: 6 hops in 80-something shares

By this, I mean that content is reblogged a maximum of 6 times beyond me for each post, e.g. A reblogs it from me, then B reblogs it from A, and C from B etc. This is the same for both posts, which is interesting.
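Counting hops is a breadth-first walk outwards from the author. A minimal sketch, again assuming (reblogger, source) pairs with invented names:

```python
from collections import defaultdict, deque

def hop_depths(edges, author):
    """Return how many hops each reblogger is from the author."""
    children = defaultdict(list)
    for reblogger, source in edges:
        children[source].append(reblogger)
    depths = {}
    queue = deque([(author, 0)])
    while queue:
        node, depth = queue.popleft()
        for child in children[node]:
            if child not in depths:          # guard against revisiting a blog
                depths[child] = depth + 1
                queue.append((child, depth + 1))
    return depths

# Hypothetical chain: a reblogs from me, d from a, f from d.
edges = [("a", "me"), ("b", "me"), ("c", "me"), ("d", "a"), ("e", "a"), ("f", "d")]
depths = hop_depths(edges, "me")
max_hops = max(depths.values())
```

Any reblogger who can't be reached from the author simply doesn't appear in the result.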

Each network also has a few instances of content being reblogged 4 or 5 times. This suggests a relatively ‘fluid’ motivation for reblogging – overall, approx. half (45% or 61%) of people aren’t reblogging direct from the content creator. This suggests that a relationship with the author isn’t a compulsory motivation for reblogging in this case – liking content, whether direct or via a friend, is just as big.

(However, this may be skewed by my audience size relative to followers.)

4. Only one user reblogged both posts

Hypothesis: I don’t have a consistently engaged audience for this kind of content.

I can see I do have a consistently engaged audience in terms of likes on text-based posts, but image-sharing seems more fickle – just what you snatch in the moment from the stream. I suspect that teenage image-bloggers are following a lot of people to build up their own audiences (as evidenced by ‘please reblog’ and ‘team followback’ type posts), and as such only see a small proportion of their followers’ total content.

I can also intuit this by the sequence of reblogs – those via a particular source are clustered closely together, suggesting there was just a brief burst of time when it showed up in their followers’ streams. That said, I wouldn’t want to underestimate how many hours some Tumblr users are putting in on the platform.

5. There are missing links

The network visualisations immediately look wrong – there is one author of the original post (me), so all the nodes should come from one central point (me). But as you can see, for the top post (Adidas) there are in fact two entirely separate network graphs, whereas for Chanel below there is a weird unconnected isolate floating off to the left.

In the list of reblogs on the original post, I can see that “elements234 reblogged this from uggzm1nt” without uggzm1nt having ever reblogged it from me previously. Weird! I suspect reblogs via users who subsequently delete their profiles are themselves deleted from the post engagement data. e.g. A user Alice who reblogged the post from me, then uggzm1nt subsequently reblogged Alice, then Alice deleted herself.

While this is the right thing to do from a user-data point of view, it means the social network analyst has to hypothesise ghost users at critical bridge points in order to make the network make sense. Hmm.
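Here is one way the ghost-user patch could be sketched: find every blog that is reblogged *from* but never appears as a reblogger itself, and hang it off a placeholder node. The single-ghost-per-orphan rule is my simplifying assumption; the real deleted chain could have been longer.

```python
def add_ghost_bridges(edges, author):
    """Attach orphaned sources to the author via hypothetical 'ghost' users."""
    rebloggers = {reblogger for reblogger, _ in edges}
    sources = {source for _, source in edges}
    # Blogs that others reblogged from, but with no visible reblog of their own.
    orphans = sorted(sources - rebloggers - {author})
    patched = list(edges)
    for i, orphan in enumerate(orphans, start=1):
        ghost = f"ghost_{i}"                # stands in for a deleted account
        patched.append((orphan, ghost))     # orphan reblogged from the ghost
        patched.append((ghost, author))     # ghost assumed to have reblogged from the author
    return patched, orphans

# (reblogger, source) pairs; blog_a is invented, the other two are the pair noted above.
edges = [("blog_a", "hautepop"), ("elements234", "uggzm1nt")]
patched, orphans = add_ghost_bridges(edges, "hautepop")
```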


So. Still no benchmarks – still not even an accurate picture of my overall Tumblr network, or any knowledge of how my normal types of posts get shared. (Generally, much less – say 10 likes & 5 shares.) But quite interesting – in summary, analysing these two posts shows:

  • Tumblr is harder to analyse than Twitter
  • Keyword analysis only gets you 20% of it – you need network & image analysis to really understand it
  • Though popular for me, these posts were shared by only a tiny proportion of my nominal audience (followers total) – about 0.4%. Though I’m probably not typical
  • People may reblog an image post at 3x the frequency they ‘like’ it. The currency is sharing, not hearts
  • Rebloggers aren’t ultra-loyal – they reblog for the sake of the content, not the author
  • Tumblr users delete themselves and leave holes in your network data, the rats!

Next steps are of course to analyse all my Tumblr posts, all YOUR Tumblr posts, the most reblogged post ever (11,184,542 notes!) or all the posts getting over 1000 reblogs in X period of time.

With any luck I might be doing some of this in 2013 at FACE with Francesco D’Orazio (and a dev team, and firehose access…). TBC?


* If you would like to talk to us about social media research more seriously, get in touch: Jessica@Facegroup.com.

** Gnip comment they can do character- and object-recognition in photo posts – nice! But I think it’s fair to reckon that this is difficult and limited, so doesn’t make the other 80% of posts easily transparent or knowable.

Mapping the Brand Graph: a study of @O2’s Twitter audience

Another post via the FACE company blog – see the story in full here: Mapping the Brand Graph: a study of the O2 audience on Twitter (FACE and O2 @ Warc #Datacentric 2011, London).

This has been some of the most interesting research I’ve done all year and certainly the most technically challenging, so I wanted to share it here too.

In short, FACE and O2 presented at the WARC Datacentric conference in December 2011. To quote Fran’s write-up:

The objective of the O2 Brand Graph pilot was to mine social media data in a way that would allow us to connect it to audience studies. What follows is an initial exploration of how we can use social media to augment a segmentation model with real-time data.

Whilst tracking social media by keywords allows us to get an understanding of how a specific topic is discussed online, tracking social media by users allows us to build a map of an audience, its hubs, its behaviours and its interests.

We called it the Brand Graph: the conjunction of the Social Graph (defined here as the network of people who are within 2 degrees of separation from the brand through social media channels) and the Interest Graph (the network of interests, topics, activities and behaviours associated with the nodes of the social graph).

What can you do with it?

  • Dynamically understand who your audience is and how it is changing, in real-time;
  • Dynamically understand what your audience is about, what makes an interesting topic and how broader cultural conversations affect it;
  • Segment your audience in clusters based on topics of interest, passions, life stages, professions, online behaviours etc.;
  • Plan and fine tune the content of your social media strategy;
  • Engage with your audience in the right way (channels, mechanics, times of the day, tone of voice etc.);
  • Assess the impact of your strategies in real-time.
  • Going forward, we see the brand graph becoming one of the key tools to build a seamless connection between your brand and its audience

So, how did we go about building the O2 Brand Graph?

Sample: We defined our sample as the entire audience of O2 on Twitter, i.e. 58,339+ Twitter users who were following @O2 (as of November 2011).

Methodologies: Statistical analysis, Semantic analysis, Network analysis, Netnography and Content analysis.
We then analysed the static data of 58,339 profiles on Twitter gathering insights around 10 key dimensions:

  • To get this information we had to map 58,339 users following @O2 and who was following each of the 58,339 users.
  • We ended up plotting a graph of 1 million nodes, 1 million primary connections and 574,278 horizontal connections within the graph.
  • Finally, we analysed 3,120,371 public tweets, 122,220 tweets/day (avg), generated by the @O2 followers over one month (November 2011).

[Source: Mapping the Brand Graph: a study of the O2 audience on Twitter (FACE and O2 @ Warc #Datacentric 2011, London).]
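To give a feel for the "2 degrees of separation" mapping, here is a toy sketch. It is not FACE's actual pipeline: the `followers_of` lookup, the sample accounts and my reading of "horizontal connections" (follow links between the brand's own followers) are all assumptions.

```python
def two_degree_audience(followers_of, brand):
    """Collect first- and second-degree followers of a brand account."""
    first = set(followers_of.get(brand, ()))
    second = set()
    for user in first:
        second.update(followers_of.get(user, ()))
    second -= first | {brand}               # keep only accounts not already counted
    # Assumed meaning of "horizontal": follow links among the brand's own followers.
    horizontal = sum(
        1 for user in first for f in followers_of.get(user, ()) if f in first
    )
    return first, second, horizontal

# Hypothetical follower lists: account -> accounts following it.
followers_of = {
    "@O2": ["u1", "u2", "u3"],
    "u1": ["u2", "x1"],
    "u2": ["x1", "x2"],
    "u3": ["u1"],
}
first, second, horizontal = two_degree_audience(followers_of, "@O2")
```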

Here’s the conference presentation:

Social Media Research & the Underbelly of the Internet

[One written for the FACE company blog – hence the slightly different tone. Original story here.]

An underrated skill in social media research is simply knowing what to search for.

Really? You’re interested in a particular brand, so surely you search for that word or phrase, right?

For brands such as Three, Apple, or the AA? Go ahead, give it a go! Just don’t be surprised when you get back a lot of content about “Three ways to boost your Twitter profile”, apple crumble recipes, and AA rated sovereign debt.

Searching the internet

We often describe setting up a social media research search for a brand as like doing a Google search. This is loosely accurate – like Google, a social media research tool ‘crawls’ news, blogs and forums for instances of your keywords. The research tool will also filter social media APIs (e.g. Twitter, Facebook) for instances of these words or phrases.

The problem is that the internet contains a lot more content than you think it does. Normally you never see it, and much of it isn’t even designed for human readers. Here are a few forms this takes:

Challenge 1: SEO spam

Being at the top of Google’s search results is a very valuable place to be if you want to get visitors to your website or online shop. This has made gaming Google results an industry in itself, called SEO: search engine optimisation.

SEO aims to guess Google’s search algorithms to manipulate clients’ websites to the top of search listings. Google basically rates sites more highly the more in-bound links they have – i.e. the more popular they appear to be. So gaming Google results entails generating a lot of false links and content, with methods including:

  • Fake news sites reprinting press releases and “content farms”, e.g. Demand Media or Articlesbase
  • Promo blogs with high numbers of links to the client’s site, with random or copy-pasted text to make Google believe they are legitimate blogs rather than spam
  • Legions of Twitter bots (automated accounts with an algorithm rather than a person generating their content) posting links to websites
  • Using bots or real people (incentivized by micropayments) to post high volumes of blog comments with links to the client’s site

This gives rise to a lot of misleading digital data, all of it only designed to be “read” by Google’s algorithm rather than human eyes. To a reader – or anyone tracking a brand – it is useless. The problem is that it’s got your brand name in it, so any generic brand-name social media search will bring back this “noise” too.

Challenge 2: unexpected content

Do a Google search these days and the internet seems an ordered and relevant place. Even when you search for an ambiguous word with several meanings – let’s say “Orange”, the results are sensible – Orange the mobile company and the Wikipedia page on the fruit.

This is because Google have spent years refining their algorithms to ensure they bring back the most relevant content possible. This doesn’t just mean putting the most popular links at the top of results. Instead Google uses everything it knows about you – your previous searches, your Google profile and Gmail, your stored cookies and more – to deliver personally tailored results.

Social media research tools, however, don’t work this way. The APIs and scrapers collecting content return all keyword mentions, relevant or not. In searches we’ve run, some of the most unexpected things we’ve found have been:

  1. Searching for banks will bring back posts on “carder forums” – the sites where credit card fraudsters sell the card details they have stolen from databases.
  2. Almost everything is a word in Indonesian. You thought you were searching a specific and unambiguous acronym? No, it also means something in Indonesian – and volumes can be enormous because Indonesians are one of the most active populations in the world on Twitter.
  3. Pharmaceutical searches are near-impossible. Dubious medication sellers will include hundreds or thousands of drugs as keywords on their pages, whether or not they’re selling those products.

Challenge 3: not all relevant content is indexed

So far we’ve described some ways that irrelevant content or “noise” can get into your social media search. There’s also the opposite problem, however – not being able to ‘see’ certain types of social media content, particularly forums:

Message boards are part of the Internet known as the ‘Invisible Web’ and pose many problems to traditional search engine spiders. The dynamic content is usually very deep and hard to search. In addition, many of these sites change their locations, servers, or URLs almost daily, presenting special searching challenges.

This makes it essential to use a social media research tool that allows you to check which forums are tracked, and customize the panel of sources as needed.

Impact on social media research

What this means for social media research is that if you’re using an off-the-shelf monitoring package, you’re probably getting a lot of junk in your results. Brands are often keen on easy usability – type your brand name into the search, and get a volume figure and sentiment stats out. But without tailormade search syntax, those figures are almost certainly meaningless.

So how do you make your social media research search relevant?

1. Specific is better

Using broad search terms and then excluding keywords you’re not interested in doesn’t usually work very well. You’ll never be able to filter out all irrelevant content – language is too varied and dynamic. E.g. if you’re after the mobile brand, search for Orange AND mobile, not just “Orange”.

2. But you can filter irrelevant websites

Not that many big content farms exist – so we exclude everything from them by URL.
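As a sketch of what excluding by URL can look like in practice (the blocked domains and the post format here are hypothetical, not our actual blocklist):

```python
from urllib.parse import urlparse

# Hypothetical blocklist; a real one would name actual content farms.
BLOCKED_DOMAINS = {"contentfarm.example", "pressreleases.example"}

def is_blocked(url, blocked=BLOCKED_DOMAINS):
    """True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked)

def drop_blocked(posts, blocked=BLOCKED_DOMAINS):
    """Keep only posts whose `url` is not on a blocked domain."""
    return [p for p in posts if not is_blocked(p["url"], blocked)]

posts = [
    {"url": "https://www.contentfarm.example/three-ways", "text": "Three ways to..."},
    {"url": "https://someblog.example/my-phone", "text": "My Three contract..."},
]
kept = drop_blocked(posts)
```

Matching on the parsed hostname rather than a substring of the URL avoids accidentally dropping pages that merely mention a blocked site in their path.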

3. Also filter particularly spammy keywords

e.g. “Viagra” for anything medical.

4. Boolean search syntax

This is the logic that enables you to search for content including A and B, A or B, or including A but not B. It’s essential not only for designing your social media search strings, but for also searching within the dataset.
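Research tools each have their own query syntax, but the underlying logic can be sketched in plain code. The `matches` helper and its parameter names below are mine, for illustration only:

```python
import re

def contains(text, term):
    """Case-insensitive whole-word (or whole-phrase) match."""
    return re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE) is not None

def matches(text, all_of=(), any_of=(), none_of=()):
    """AND over `all_of`, OR over `any_of`, NOT over `none_of`."""
    return (
        all(contains(text, t) for t in all_of)
        and (not any_of or any(contains(text, t) for t in any_of))
        and not any(contains(text, t) for t in none_of)
    )

# Orange AND (mobile OR phone) NOT juice
query = dict(all_of=["orange"], any_of=["mobile", "phone"], none_of=["juice"])
hit = matches("My Orange mobile contract is up", **query)
miss = matches("Fresh orange juice recipe", **query)
```

The same function works for searching within a dataset you've already collected, not just for designing the initial search string.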

5. Test your search terms on Twitter

Enter your search phrases into Twitter Search to see whether you’ve got them right, or if they’re bringing back unexpected or irrelevant content instead. Twitter search also helps you understand the volumes of content that’ll come back (is it multiple posts a second, or a couple per day?)

6. Get personal

If you’re specifically interested in what consumers are saying, searching with personal pronouns – e.g. “my iPhone” – will bring back a much more relevant dataset than “iPhone” on its own.
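A small sketch of the pronoun trick as a pattern match. Allowing one optional word between the pronoun and the product (so “my new iPhone” still counts) is my own assumption about what a sensible pattern looks like:

```python
import re

def personal_mention(text, product):
    """True if the text says 'my <product>' or 'our <product>', with at most one word in between."""
    pattern = r"\b(?:my|our)\s+(?:\w+\s+)?" + re.escape(product) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None

owner_post = personal_mention("Just dropped my iPhone again", "iPhone")
news_post = personal_mention("iPhone 5 rumoured for September", "iPhone")
```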

Which is to say, designing a relevant and accurate social media search can involve a surprising amount of time, thinking and ongoing refinement. Few brands have the time or expertise to do this in-house using an off-the-shelf monitoring tool. This is why our clients have come to us instead for our expertise in locating what matters in social media – the signal amongst the noise.