First posted on hautepop.tumblr.com (of course!) on 16 December 2012
Tumblr is a weird social network.
Like Twitter its content is (very largely) public, and yet like Facebook it’s opaque to social analysis. Follower counts aren’t public or accessible. Nor are follower lists: you can’t even really see all your own followers, let alone anyone else’s. Tumblr Analytics does exist, but not for ordinary users – instead you have to pay Tumblr partner Union Metrics $499 per month, and what you get appears to be top-level and aggregate. (since corrected: it’s awesome, complete & exportable)
This leaves Tumblr a kind of “here be dragons” among social networks, which is unusual in an age so obsessed with them. That is, its social norms are not known; there isn’t any data about how its users behave and use the network (even for most Union Metrics subscribers there are no benchmarks). Rumours spread – Tumblr’s young, it’s photo-y and meme-driven, it’s full of weird tightly-interknit subcultures who reblog each other endlessly – but there is no data to support this, no way of discerning fact from – at best – intuition about one’s own little corner of the thing.
This is a research problem, and I just so happen to be a social media researcher working at a company (FACE) which builds its own social media research platform (Pulsar). *
But even that doesn’t make Tumblr analytics easy. Social data firehose provider Gnip (one of the two sellers of the Twitter firehose, alongside Datasift) do provide a Tumblr firehose to companies like mine – that is, an API stream of all the posts on Tumblr. Great! Except, as they actually note in their post on Taming the Social Media Firehose, there are complications:
1. The main way of querying the API is through keyword-based search, looking for words or phrases in the body of the post, or titles or tags. But 84% of posts are photos, and most don’t contain any or sufficient keywords to identify what’s in the image. Consequently only 20% of content can be identified with text-based filtering, leaving 4 in 5 posts “dark”. **
2. Do you want to understand patterns of sharing content on Tumblr? How far do posts travel, what’s the average reblog rate, and of course who are the key social hubs for content diffusion? But analysing reblogs is a more manual process:
“There is a list of all of the notes (likes, reblogs) associated with a post appended to that post wherever it shows up on Tumblr. Each post activity record in the firehose can contain reblog info. It will have a count, a link to the blog this entry was a reblog of and a link to the root entry. To build the blog note list that a user would see at the bottom of a liked or reblogged entry, you have to trace each entry in the stream (i.e. keep a history or know what you want to watch) or scrape the notes section of a page.”
[Gnip, Taming the Social Media Firehose]
Which is to say understanding reblogs is doable, but a hassle. It seems to work most easily with a priori identification of what you want to track. If you want to analyse the network in order to work out what’s most important – this would seem to be a little harder.
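The "scrape the notes section" route can be sketched roughly as below. This is a minimal illustration, not Gnip's or Tumblr's actual API: it assumes the notes have already been copied out as plain text lines in the forms "X reblogged this from Y" and "X likes this", and the function name `parse_notes` and the sample usernames (other than the one note quoted later in this post) are hypothetical.

```python
import re

# Assumed note formats (hypothetical; check against the real notes markup):
#   "<user> reblogged this from <source>"
#   "<user> likes this"
REBLOG_RE = re.compile(r"^(\S+) reblogged this from (\S+)$")
LIKE_RE = re.compile(r"^(\S+) likes this$")


def parse_notes(lines):
    """Split note lines into reblog edges (source, reblogger) and a list of likers."""
    edges, likers = [], []
    for line in lines:
        line = line.strip()
        match = REBLOG_RE.match(line)
        if match:
            reblogger, source = match.groups()
            edges.append((source, reblogger))
            continue
        match = LIKE_RE.match(line)
        if match:
            likers.append(match.group(1))
    return edges, likers


# Illustrative input only; "example-user-a" and "example-user-b" are made up.
sample = [
    "elements234 reblogged this from uggzm1nt",
    "example-user-a reblogged this from hautepop",
    "example-user-b likes this",
]
edges, likers = parse_notes(sample)
```

Each edge points from the blog the post was taken from to the blog that reblogged it, which is the shape needed to draw the diffusion network by hand or in a graph tool.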
If analysing patterns of reblogging is a manual process, then, let’s do it manually. I’m lucky enough that my Tumblr provides a nice starter-size dataset. Since being featured in Tumblr’s Technology spotlight, I’ve acquired a fairly generous total of 38,700 followers. This audience scale (though likely largely inactive) means that some of my content has picked up enough reblogs to make mapping the pattern of them interesting.
The images below show who’s reblogged two of my posts: Adidas and Chanel. Both are playing with a Tumblr trend I’d seen of people putting brand logos on apparently non-related images – see more here. As I hoped, I played with the tropes of this meme well enough to get some reblogs – 62 for the Adidas post, and 64 for the Chanel one. Far from Tumblr viral, but also manually analysable in an evening…
5 topline findings:
1. Both posts were reblogged 3x as much as they were ‘liked’
The ratios are 71.3% reblogs for Adidas, and 76.2% for Chanel.
Hypothesis: photo posts may get a lot of reblogging – jokes & quotes somewhere in the middle – and I reckon text-based posts (and links) may perhaps get more likes than reblogs. Images are more diffuse in meaning yet may also be more emotionally evocative, lending themselves to reinterpretation and reuse through reblogging more readily than monovalent text.
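To check the "3x" claim against the percentages above, the share of notes that are reblogs can be converted into a reblogs-per-like ratio. This is a quick sketch that assumes every note is either a like or a reblog; the function name is my own.

```python
def reblogs_per_like(reblog_share):
    """Convert reblogs as a share of all notes into a reblog:like ratio.

    Assumes notes consist only of likes and reblogs, so likes = 1 - reblog_share.
    """
    return reblog_share / (1.0 - reblog_share)


for post, share in {"Adidas": 0.713, "Chanel": 0.762}.items():
    print(f"{post}: {reblogs_per_like(share):.1f} reblogs per like")
```

That works out at roughly 2.5 reblogs per like for Adidas and 3.2 for Chanel, so "3x" is a fair round figure for the pair.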
2. Does content diffusion occur through hubs – that is, are there any influencers in my network who spur much of the reblogging?
Some variation here. For the Chanel post, there are no big hubs:
- My original post gains 35 direct, first-hop reblogs out of the 64 total – that’s 55%.
- 2 users, c-rystalcastles and my0wnstunts, get 3 reblogs for their posts
- 5 users get 2 reblogs each and 13 users get 1 further reblog each
- Meaning that 67% of the rebloggers gained no further reblogs themselves
For the Adidas post however:
- Original post gets 24 out of 62 total reblogs
- skullc0de (a 17-year-old German girl and streetwear brand fan) is a hub, driving 10 further reblogs
- 3 users gain 4 reblogs each, 2 users get 3 reblogs, and 10 users get 1 further reblog each
- Again, about 73% of rebloggers gained no further reblogs themselves
- The diffusion of these posts tapped into no hubs more influential than myself – as I’d expect, given my decent audience size plus some hypothesised ‘author bias’ that means you’re more likely to share an original post you see than a reblogged one. In Twitter terms, I wasn’t reblogged by a William Gibson or Warren Ellis amplification type who would spark many more reblogs than the original author directly achieved.
- If you’re Tumblring for attention (and in some way, most of us are), reblogging a post doesn’t get you that much attention. In this micro-dataset, 70% of reblogs don’t spur any further reblog-engagement.
- Rate of reblogging is very likely to correlate with audience size.
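The Chanel and Adidas counts listed above can be sanity-checked with a few lines of arithmetic. This is a sketch under two assumptions of mine: every reblogger is a distinct account, and the percentages use total reblogs as the denominator. The dictionary layout and the `summarise` name are hypothetical.

```python
# "downstream" maps further reblogs gained -> number of rebloggers who gained that many,
# transcribed from the Chanel and Adidas lists above.
posts = {
    "Chanel": {"direct": 35, "total": 64, "downstream": {3: 2, 2: 5, 1: 13}},
    "Adidas": {"direct": 24, "total": 62, "downstream": {10: 1, 4: 3, 3: 2, 1: 10}},
}


def summarise(post):
    """Return consistency and share figures for one post's reblog counts."""
    # Reblogs that came via another reblogger rather than the original post.
    downstream = sum(gained * users for gained, users in post["downstream"].items())
    # Rebloggers who were themselves reblogged at least once.
    amplifiers = sum(post["downstream"].values())
    return {
        "accounted_for": post["direct"] + downstream,
        "direct_share": post["direct"] / post["total"],
        "no_further_share": (post["total"] - amplifiers) / post["total"],
    }


for name, post in posts.items():
    print(name, summarise(post))
```

On these assumptions the direct and downstream reblogs add up to the stated totals (64 and 62), and the share of rebloggers with no further reblogs comes out at about 69% for Chanel and 74% for Adidas, within a couple of points of the figures quoted above.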
3. How far content travels: 6 hops in 80-something shares
By this, I mean that content is reblogged a maximum of 6 times beyond me for each post, e.g. A reblogs it from me, then B reblogs it from A, and C from B etc. This is the same for both posts, which is interesting.
Each network also has a few instances of content being reblogged 4 or 5 times. This suggests a relatively ‘fluid’ motivation for reblogging – overall, approx. half (45% or 61%) of people aren’t reblogging direct from the content creator. This suggests that a relationship with the author isn’t a compulsory motivation for reblogging in this case – liking content, whether direct or via a friend, is just as big.
(However, this may be skewed by the size of my audience relative to that of my followers.)
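Given a list of (source, reblogger) pairs, hop counts like these can be computed with a breadth-first walk out from the original post. This is a minimal sketch on a made-up four-reblog chain, not my actual data; `hop_depths` and the node names are hypothetical.

```python
from collections import defaultdict, deque


def hop_depths(edges, root):
    """Return {blog: hops from root} for every blog reachable from root."""
    children = defaultdict(list)
    for source, reblogger in edges:
        children[source].append(reblogger)

    depths = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in depths:  # keep the first (shortest) path found
                depths[child] = depths[node] + 1
                queue.append(child)
    return depths


# Toy example: a reblogs from me, b from a, c from b, and d direct from me.
edges = [("me", "a"), ("a", "b"), ("b", "c"), ("me", "d")]
depths = hop_depths(edges, "me")

max_hops = max(depths.values())
# Share of rebloggers who did not reblog directly from the original author.
indirect_share = sum(1 for d in depths.values() if d > 1) / (len(depths) - 1)
```

In the toy chain the furthest reblog is 3 hops out and half the rebloggers are indirect; run on a real edge list, the same two numbers correspond to the "6 hops" and "45% or 61%" figures above.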
4. Only one user reblogged both posts
Hypothesis: I don’t have a consistently engaged audience for this kind of content.
I can see I do have a consistently engaged audience in terms of likes on text-based posts, but image-sharing seems more fickle – just what you snatch in the moment from the stream. I suspect that teenage image-bloggers are following a lot of people to build up their own audiences (as evidenced by ‘please reblog’ and ‘team followback’ type posts), and as such only see a small proportion of their followers’ total content.
I can also intuit this by the sequence of reblogs – those via a particular source are clustered closely together, suggesting there was just a brief burst of time when it showed up in their followers’ streams. That said, I wouldn’t want to underestimate how many hours some Tumblr users are putting in on the platform.
5. There are missing links
The network visualisations immediately look wrong – there is one author of the original post (me), so all the nodes should come from one central point (me). But as you can see, for the top post (Adidas) there are in fact two entirely separate network graphs, whereas for Chanel below there is a weird unconnected isolate floating off to the left.
In the list of reblogs on the original post, I can see that “elements234 reblogged this from uggzm1nt” without uggzm1nt having ever reblogged it from me previously. Weird! I suspect reblogs via users who subsequently delete their profiles are themselves deleted from the post engagement data. e.g. A user Alice who reblogged the post from me, then uggzm1nt subsequently reblogged Alice, then Alice deleted herself.
While deleting that data is the right thing to do from a user-data point of view, it means the social network analyst has to hypothesise ghost users at critical bridge points in order to make the network make sense. Hmm.
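Spotting these gaps can be done mechanically: any blog that appears as a source but never as a reblogger (and isn't the original author) must have got the post via a missing link. The sketch below flags those blogs and patches in a single placeholder "ghost" node for each one. That single ghost is purely an assumption, since the real deleted chain could be longer, and the function names and "user-a" are hypothetical.

```python
def find_orphan_sources(edges, root):
    """Blogs that were reblogged from but have no recorded reblog of their own."""
    rebloggers = {reblogger for _, reblogger in edges}
    sources = {source for source, _ in edges}
    return sorted(sources - rebloggers - {root})


def add_ghost_bridges(edges, root):
    """Connect each orphan source back to root via one placeholder ghost node.

    Assumption: exactly one deleted user sat between root and the orphan.
    """
    patched = list(edges)
    for i, orphan in enumerate(find_orphan_sources(edges, root), start=1):
        ghost = f"ghost-{i}"
        patched.append((root, ghost))
        patched.append((ghost, orphan))
    return patched


# Illustrative edge list: the uggzm1nt link is the one described above.
edges = [("hautepop", "user-a"), ("uggzm1nt", "elements234")]
orphans = find_orphan_sources(edges, "hautepop")
patched = add_ghost_bridges(edges, "hautepop")
```

Keeping the ghosts as clearly labelled nodes, rather than wiring orphans straight to the root, makes it obvious in the visualisation which bridges are inferred rather than observed.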
So. Still no benchmarks – still not even an accurate picture of my overall Tumblr network, or any knowledge of how my normal types of posts get shared. (Generally, much less – say 10 likes & 5 shares.) But quite interesting – in summary, analysing these two posts shows:
- Tumblr is harder to analyse than Twitter
- Keyword analysis only gets you 20% of it – you need network & image analysis to really understand it
- Though popular for me, these posts were shared by only a tiny proportion of my nominal audience (followers total) – about 0.4%. Though I’m probably not typical
- People may reblog an image post at 3x the frequency they ‘like’ it. The currency is sharing, not hearts
- Rebloggers aren’t ultra-loyal – they reblog for the sake of the content, not the author
- Tumblr users delete themselves and leave holes in your network data, the rats!
Next steps are of course to analyse all my Tumblr posts, all YOUR Tumblr posts, the most reblogged post ever (11,184,542 notes!) or all the posts getting over 1000 reblogs in X period of time.
With any luck I might be doing some of this in 2013 at FACE with Francesco D’Orazio (and a dev team, and firehose access…). TBC?
* If you would like to talk to us about social media research more seriously, get in touch: Jessica@Facegroup.com.
** Gnip comment that they can do character- and object-recognition in photo posts – nice! But I think it’s fair to reckon that this is difficult and limited, so it doesn’t make the other 80% of posts easily transparent or knowable.