What is Big Data?

Will Ellis
Last Updated on April 25, 2022
No matter who you are, whether you are a celebrity making Tik-Toks or a crackpot living in a cave, not many people know who you are relative to the size of the human race.

But oddly enough, it is highly likely that quite a few companies know who you are. How does that even work? Well, it’s complicated.

There is an industry of companies that specializes in collecting data. Consider Twitter, for example. Twitter does not show you the most recent tweets. Instead, it actually shows you the tweets that are most likely to get you using Twitter more.

It does not want you to see what you want to see. It wants to retain you as a user. To that end, it employs a variety of different programs to track your activity.
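To make the difference concrete, here is a minimal sketch comparing a "most recent first" feed with one sorted by how likely each tweet is to keep you engaged. This is not Twitter's actual ranking code. The field names (`posted_at`, `predicted_engagement`) and the scores are made up for illustration.

```python
# Hypothetical tweets; the engagement scores are invented for this example.
tweets = [
    {"id": 1, "posted_at": 300, "predicted_engagement": 0.10},
    {"id": 2, "posted_at": 200, "predicted_engagement": 0.85},
    {"id": 3, "posted_at": 100, "predicted_engagement": 0.40},
]

def chronological(feed):
    """Newest tweet first: what many people assume they are seeing."""
    return sorted(feed, key=lambda t: t["posted_at"], reverse=True)

def engagement_ranked(feed):
    """Highest predicted engagement first: built to keep you on the app."""
    return sorted(feed, key=lambda t: t["predicted_engagement"], reverse=True)

newest_first = [t["id"] for t in chronological(tweets)]
stickiest_first = [t["id"] for t in engagement_ranked(tweets)]
print(newest_first, stickiest_first)
```

The same three tweets come out in a different order depending on which goal the feed is sorted for.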

Twitter tracks which tweets you like, obviously. It also tracks which links you click on. That makes sense as well, and no one is completely surprised by it. However, some people are surprised to find that Twitter also tracks when you scroll past things, and when you stop scrolling.

This means that Twitter knows what you are stopping to read, and for how long. Twitter has also been proven to track what is being searched, typed, and displayed in other tabs in your browser. This is illegal in many countries, and Twitter's terms and conditions mention it in only a few countries.
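Working out how long you paused on something takes very little code. The sketch below assumes a made-up log format, a list of `(timestamp, item_id)` pairs recording when each item came into view. It is an illustration of the idea, not any platform's real tracking code.

```python
from collections import defaultdict

def dwell_times(events):
    """Sum how many seconds each item stayed in view.

    `events` is a time-sorted list of (timestamp_seconds, item_id) pairs.
    Each pair marks the moment an item came into view. The last pair uses
    None as its item_id and only marks when the session ended.
    """
    totals = defaultdict(float)
    # Pair each event with the next one; the gap is the time spent on the item.
    for (start, item), (end, _next_item) in zip(events, events[1:]):
        if item is not None:
            totals[item] += end - start
    return dict(totals)

# Hypothetical scroll log: the reader lingers on tweet "b".
log = [(0.0, "a"), (1.5, "b"), (9.5, "c"), (10.0, None)]
times = dwell_times(log)
print(times)
```

An item with a long dwell time, like "b" here, is one you stopped to read, even if you never liked it or clicked on it.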

But this information is not just used to pick and choose which tweets to show you. Twitter sells this information to advertisers to make it easier to advertise to you. And it is not the only social media platform that does this. Twitter, Facebook, Amazon, YouTube, they all take part in this scheme.

This is the industry known as “Big Data”.

What Does Big Data Look For? 🔎️

Different people have different reactions to the existence of Big Data. Some people react with horror, some with indifference, and others with exclamations that they knew they were being spied on all along.

But after the initial moment of reaction, you might start to realize something: Big Data is not just a pair of words that describes the industry. It also describes the commodity being traded. You need to know what these companies are trading if you want to advocate for yourself as the owner of your own data.

The Six Vs of Big Data

There are six main elements of what Big Data collects on you. We will go over each of them briefly. Generally speaking though, there are two types of data that Big Data collects: Specifying Factors (information that narrows down your interests and behaviours) and Mitigating Factors (evidence that dilutes the meaning of the Specifying Factor). Some data can be one or the other, but never both.

Volume 📑️

This describes the amount of data that is collected from you. It is also a reflection of how much data you consume, because most apps and websites (though not all) can only collect data on you while you are using them. The more data you consume through them, the more they gather on your habits.

Variety 📚️

This is related to volume, at least most of the time. Variety describes the different places that your data is pulled from. The more varied your social media apps, the more variety you offer. This is because many of the companies that deal in this data are connected to multiple apps.

For instance, Google knows what you search through their search engine (and just about everything else you do if you use Chrome), as well as what videos you watch on YouTube. They know what recommendations you look at and which ones you ignore. They know what you comment on as well.

And this does not just impact you based on the different apps you use. Because all of these apps will sell to similar advertising companies, those companies will end up collecting more data on you based on the variety of different apps you use, even if those apps are unrelated to each other.
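Here is a rough sketch of how data from unrelated apps could end up in one profile. It assumes each app reports a shared advertising identifier. The `ad_id` field, the app names, and the records are all hypothetical.

```python
from collections import defaultdict

def merge_by_ad_id(records):
    """Group records from different apps under the identifier they share."""
    profiles = defaultdict(lambda: {"sources": set(), "interests": set()})
    for record in records:
        profile = profiles[record["ad_id"]]
        profile["sources"].add(record["app"])
        profile["interests"].update(record["interests"])
    return dict(profiles)

# Made-up records from three unrelated apps.
records = [
    {"app": "video_app", "ad_id": "u-123", "interests": ["cooking"]},
    {"app": "fitness_app", "ad_id": "u-123", "interests": ["running", "cooking"]},
    {"app": "news_app", "ad_id": "u-456", "interests": ["politics"]},
]

profiles = merge_by_ad_id(records)
print(profiles["u-123"])
```

Neither app knows about the other, but whoever receives both sets of records ends up with a fuller picture of the person behind "u-123".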

Velocity 📱️

This term can be deceptive. For the most part, it relates to how quickly the data is being gathered on you. This is related more to the infrastructure of data-gathering methods than anything else.

In short, that means if an app is constantly collecting data on you, then the faster its internet connection, the more data it can collect. Velocity is the term for the relative speed at which data is collected, considering factors like internet speed, app optimization, and device power.

Veracity 🤔️

This is one of the big Mitigating Factors in Big Data. Veracity is all about how close to the truth the data being collected really is. You can pull all sorts of data from someone’s behaviour to conclude that they enjoy vegan food. Are they vegan, or does it just seem like it because of their new girlfriend?

Veracity is negative far more often than it is positive. That means it flags that something is possibly untrue far more frequently than it confirms that something is true.
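One way to picture Specifying and Mitigating Factors working against each other is as a simple score. The formula and the weights below are assumptions chosen for illustration, not a method any data company is known to use.

```python
def confidence(specifying, mitigating):
    """Toy confidence score between 0 and 1.

    `specifying` holds weights for signals that support a conclusion.
    `mitigating` holds weights for signals that cast doubt on it.
    """
    support = sum(specifying)
    doubt = sum(mitigating)
    if support == 0:
        return 0.0
    return support / (support + doubt)

# Hypothetical signals for the conclusion "this person is vegan".
vegan_signals = [2.0, 1.0]   # e.g. vegan recipe searches, vegan restaurant visits
doubt_signals = [1.0]        # e.g. the behaviour only began with a new relationship

without_doubt = confidence(vegan_signals, [])
with_doubt = confidence(vegan_signals, doubt_signals)
print(without_doubt, with_doubt)
```

The mitigating signal does not erase the conclusion. It only lowers how much the conclusion can be trusted.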

Value 💰️

This is the V that most people will come up with first if asked. It is the relative financial value of the data collected on you. This comes up frequently, but not in the way you might expect. Generally, money is made off of your internet habits passively, by putting ads in front of you.

This makes all internet behaviour monetizable. But it also relates to your willingness to buy things.

Variability 🗃️

And finally, variability. This is a term for describing the different ways data can be used. Sometimes that data is used to sell you things. But it can also be used to recommend you online content or inform decisions for user experience designers. For instance, if there is a button that you never ever use in an app, then a designer will probably be interested to know that so they can place that button elsewhere.

Conclusion 💡️

That is a very brief overview of Big Data and what it looks for. This should give you an idea of what you are trying to protect online, from your different habits to the things you look at.

