How Do Big MNCs Manage Their Big Data?

Milind Rastogi
5 min read · Sep 20, 2020

In this article, I will explain how big multinational companies like Google, Facebook, and Amazon store, manage, and manipulate their thousands of terabytes of data, or, to be specific, how they manage big data.

Before explaining how these companies manage their data, let me explain what the term Big Data actually means.

What is Big Data?

Most people think that big data is a technology for managing high volumes of data, but big data is actually a problem faced by many multinational companies like Facebook, Google, and Amazon. The data generated by these companies is beyond their storage capacity, and it is so large and complex that none of the traditional data management tools can store or process it efficiently. That is why big data is a problem for many MNCs.

Now let's see how Facebook, Google, and Amazon manage their data.

Facebook

Facebook generates 4 petabytes of data per day, which is four million gigabytes. Its systems produce around 2.5 billion pieces of content every day. All that data is stored in what is known as the Hive, which contains about 300 petabytes of data. This enormous amount of content generation is without a doubt connected to the fact that Facebook users spend more time on the site than users of any other social network, putting in about an hour a day.

For big data management, Facebook designs its own servers and networking and builds its own data centers. Its staff writes most of its own applications and creates virtually all of its own middleware. Everything about its operational IT comes together in one extremely large system that is used by internal and external users alike.

Google

Google is the world's most "data-oriented" company and one of the largest implementers of Big Data technologies.

Map all of the Internet's data. Identify what is used the most, clicked the most, and interacted with the most, and determine what is most beneficial. These are Google's main data tasks.

Building on Search, its first product, Google has created many other data products over time: Google Apps, Google Docs, Google Maps, YouTube, Google Translate, and so on.

Some estimates put the size of Google's data stores at about 10 exabytes, which is 10 million terabytes.

So the question is: how does Google manage such a huge amount of data?

The answer is building its own computing tools and technologies, such as MapReduce (the model that inspired Hadoop) and BigQuery (Google's serverless data warehouse).
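
As a concrete illustration, here is a minimal sketch of running an analytical query with the google-cloud-bigquery Python client. The project, dataset, and table names are hypothetical, and it assumes Google Cloud credentials are already configured:

```python
# pip install google-cloud-bigquery
from google.cloud import bigquery

# Assumes Google Cloud credentials are configured, e.g. via the
# GOOGLE_APPLICATION_CREDENTIALS environment variable.
client = bigquery.Client()

# Hypothetical project/dataset/table, for illustration only:
# count page views per URL over a large event table.
query = """
    SELECT url, COUNT(*) AS views
    FROM `my_project.analytics.page_views`
    GROUP BY url
    ORDER BY views DESC
    LIMIT 10
"""

# BigQuery executes the scan in parallel across many machines,
# so even terabyte-scale tables can be queried in seconds.
for row in client.query(query).result():
    print(row.url, row.views)
```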

Amazon

Amazon runs one of the largest server fleets for hosting data: around 1,000,000,000 gigabytes of data spread across more than 1,400,000 servers.

Amazon generates data two-fold. As a major retailer, it collects and processes data about its regular retail business, including customer preferences and shopping habits. But it is also important to remember that Amazon offers cloud storage to the enterprise world.

Amazon S3 — on top of everything else the company handles — offers a comprehensive cloud storage solution that naturally facilitates the transfer and storage of massive data troves. Because of this, it’s difficult to truly pinpoint just how much data Amazon is generating in total.
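
To see what this looks like for a developer, here is a minimal sketch of storing and retrieving an object in S3 with the boto3 client. The bucket name and file paths are hypothetical, and AWS credentials are assumed to be configured:

```python
# pip install boto3
import boto3

# Assumes AWS credentials are configured, e.g. in ~/.aws/credentials.
s3 = boto3.client("s3")

BUCKET = "my-example-bucket"  # hypothetical bucket name

# Upload a local file as an object under a key.
s3.upload_file("report.csv", BUCKET, "reports/2020/report.csv")

# Download the same object back to a new local file.
s3.download_file(BUCKET, "reports/2020/report.csv", "report_copy.csv")
```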

Instead, it is better to look at the revenue flowing into the company that is directly tied to data handling and storage: Amazon generates more than $258,751.90 in sales and service fees per minute.

Problems faced in managing big data

Here I will discuss two problems of big data:

  1. Volume: Storing big data requires a huge amount of storage, and we may think that buying a bigger hard disk or data server will solve the problem. The catch is that companies do not know in advance how much data they will be storing. Big companies generate so much data that one single hard disk or data server can never be enough; they may end up buying lots of hard disks and still be unable to store all of their data (the quick calculation after this list makes this concrete).
  2. Velocity: Data is generally stored on a hard disk, but storing and retrieving data from a hard disk takes a lot of time. Storing data to and retrieving it from storage is known as input/output (I/O) operations. Performing all I/O against one single hard disk is far too slow, so a single disk cannot be the solution for storing big data.
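
To make the volume problem concrete, here is a quick back-of-the-envelope calculation in Python, using Facebook's 4 petabytes per day from earlier and an assumed 10 TB commodity hard disk:

```python
# Back-of-the-envelope: how many 10 TB disks does 4 PB/day fill?
PB_PER_DAY = 4        # Facebook's figure from earlier in the article
TB_PER_PB = 1000
DISK_TB = 10          # assumed size of one commodity hard disk

disks_per_day = PB_PER_DAY * TB_PER_PB / DISK_TB
print(disks_per_day)        # 400.0 disks filled every single day
print(disks_per_day * 365)  # 146000.0 disks per year
```

Four hundred new disks a day cannot hang off one machine, which is exactly why the solution below distributes storage across many machines.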

How to solve the big data problem?

One way to solve the big data problem is to use a Distributed Storage System. This system follows a master-slave topology: many storage resources are connected to one main master node. The master node is known as the Name Node, and the slave storage nodes are known as Data Nodes. All the Data Nodes contribute their storage to the single master node, and there can be thousands of Data Nodes providing their resources to the Name Node.
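
Here is a toy sketch of that master-slave topology (an illustration of the idea, not a real HDFS implementation). The Name Node keeps only the metadata of which blocks live on which Data Nodes, while the Data Nodes hold the actual bytes:

```python
import itertools

BLOCK_SIZE = 4  # bytes; tiny for demonstration (HDFS uses 128 MB blocks)

class DataNode:
    """A slave node that holds the actual data blocks."""
    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

class NameNode:
    """The master node: stores only metadata (which block is on which node)."""
    def __init__(self, data_nodes):
        self.placement = itertools.cycle(data_nodes)
        self.metadata = {}  # filename -> list of (block_id, data_node)

    def write(self, filename, data):
        # Split the file into fixed-size blocks and spread them round-robin.
        self.metadata[filename] = []
        for i in range(0, len(data), BLOCK_SIZE):
            node = next(self.placement)
            block_id = f"{filename}#{i // BLOCK_SIZE}"
            node.blocks[block_id] = data[i:i + BLOCK_SIZE]
            self.metadata[filename].append((block_id, node))

    def read(self, filename):
        # Reassemble the file by asking each data node for its blocks.
        return b"".join(node.blocks[bid] for bid, node in self.metadata[filename])

nodes = [DataNode(f"dn{i}") for i in range(3)]
name_node = NameNode(nodes)
name_node.write("log.txt", b"hello big data world")
print(name_node.read("log.txt"))  # b'hello big data world'
```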

With the help of this system, we can solve both the volume and the velocity problem. We no longer need an unlimited number of hard disks; instead, we add Data Nodes to the Name Node as the storage requirement grows. Thus the volume problem is easily solved.

The velocity problem can also be solved with this system, because input/output operations can be divided among the different nodes. With thousands of Data Nodes storing and retrieving data in parallel, the speed of I/O operations automatically increases, as the sketch below shows.
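
Here is a minimal sketch of that speed-up, simulating each disk read with a fixed delay. The point is simply that N nodes reading at the same time take roughly 1/N of the sequential time:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def read_block(block_id):
    """Simulate one slow disk read (0.5 s per block)."""
    time.sleep(0.5)
    return f"block-{block_id}"

blocks = range(8)

# Sequential: one disk performs all eight reads, one after another.
start = time.time()
results = [read_block(b) for b in blocks]
print(f"sequential: {time.time() - start:.1f}s")  # ~4.0s

# Parallel: eight data nodes each read one block at the same time.
start = time.time()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(read_block, blocks))
print(f"parallel:   {time.time() - start:.1f}s")  # ~0.5s
```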

Tools for solving the big data problem

  1. Hadoop
  2. MongoDB
  3. Cassandra
  4. Drill
  5. Elasticsearch
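
As a small taste of one of these tools, here is a minimal sketch of storing and querying documents with MongoDB through the pymongo driver. The database and collection names are hypothetical, and it assumes a MongoDB server running locally on the default port:

```python
# pip install pymongo
from pymongo import MongoClient

# Assumes a MongoDB server running locally on the default port.
client = MongoClient("mongodb://localhost:27017")
events = client["bigdata_demo"]["events"]  # hypothetical database/collection

# Insert one document and query it back; in production, MongoDB
# shards collections across many nodes to handle big data volumes.
events.insert_one({"user": "milind", "action": "click", "page": "/home"})
for doc in events.find({"action": "click"}):
    print(doc)
```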
