I’m going to make the argument that an individual’s up/downvote behavior on reddit tells you more about their psychological makeup than their behavior on nearly any other platform. This is the real value of reddit.
Twitter and Facebook are more tied to identity, though they also allow some degree of pseudo-anonymity. Users like, share, and retweet publicly, so that activity shows the face the user chooses to present. On reddit, votes are private (at least from other users) and accounts are less tied to identity (at least overtly). People are therefore much more comfortable expressing their true opinions about posts and comments, because they’re less worried about what others will think of them. Consequently, their voting behavior is more indicative of their underlying psychic life.
If I had access to reddit’s backend and took a list of all the posts and comments a user has upvoted, downvoted, or hidden, I would know a lot about that user. For a moderately active user, especially a lurker who votes often, I would have thousands if not tens of thousands of opinions that person either agreed or disagreed with. By applying machine learning to this data, I could predict all sorts of things about this person: their politics, religion, interests, temperament, preferences, and views, down to some pretty specific detail.
This would give me a very accurate psychological profile of a user, far more useful than anything Cambridge Analytica came away with; I’d guess by at least an order of magnitude. And no one’s even talking about what’s happening with this data.
It’s easier to judge the more nuanced parts of a person’s mindset if you present them with a thread containing, say, 50 different replies and they answer “yes/no” or “like/hate” to every one. Each thread is effectively a personality quiz with one question per reply.
Yeah, I know. Reddit is getting people to willingly fill out psychological questionnaires on a very broad range of topics that capture their interest, and people are doing it for free. The questionnaires CA got hold of were primitive compared to this.
I guess I’ve always known this, but it’s only now hitting me how much information this actually provides. With a massive corpus of preference data about each user, plus the whole dataset covering every user in the system, the machine learning implications are huge. You can apply filters to cancel out noise or remove bots from the dataset; recognize common or unique patterns on narrow or broad scales; predict affinity for particular pieces of content for individuals or groups; and, in combination with other entities, match this specific psychological data to the user’s real identity. The possibilities are nearly endless.
Reddit is the perfect petri dish for studying how ideas (memes in the original Dawkinsian sense) actually propagate and mutate across a network. There have been academic studies of reddit content, but they only scratch the surface, because the vote data available to researchers was anonymized. If reddit’s owners and partners are smart about how they use this data, and can keep users from realizing what it reveals, they are sitting on an absolute goldmine.
I just wonder who’s using it and to what ends.