BIG DATA IMPORTANCE AND USE CASES

Aniket kashyap
7 min read · Sep 22, 2020

What is big data, why should you learn it, and why can no one escape from it? We will also discuss why the industry is shifting from legacy systems to big data, and why it is the biggest paradigm shift the IT industry has ever seen. Why, why and why?

BIG DATA

Conventionally, big data is defined as a set of data so large, complex and unorganized that it defies the common data management methods that were designed and used before this rise in data.

The term “big data” refers to data that is so large, fast or complex that it’s difficult or impossible to process using traditional methods. Such data doesn’t fit into a regular database.

How are we contributing to the creation of Big Data?

500 million Instagram stories are posted per day, and 95 million photos and videos are shared on Instagram per day.

Every second, on average, around 6,000 tweets are tweeted on Twitter, which corresponds to over 350,000 tweets sent per minute, 500 million tweets per day and around 200 billion tweets per year.

YouTube has about 1.3 billion users. 300 hours of video are uploaded to YouTube every minute, and almost 5 billion videos are watched on YouTube every single day.

Google handles 3.8 million searches per minute on average across the globe. That comes out to 228 million searches per hour, roughly 5.5 billion searches per day, or about 2 trillion searches per year! That’s a lot of searches!
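To see where numbers like these come from, here is a minimal sketch that scales a steady per-minute rate up to hours, days and years. The function name `scale_per_minute` is made up for this example, and it assumes the rate stays constant all year, which real traffic does not.

```python
# Scale an average per-minute rate to longer periods.
SECONDS_PER_MINUTE = 60
MINUTES_PER_HOUR = 60
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365


def scale_per_minute(rate_per_minute):
    """Return (per_hour, per_day, per_year) for a steady per-minute rate."""
    per_hour = rate_per_minute * MINUTES_PER_HOUR
    per_day = per_hour * HOURS_PER_DAY
    per_year = per_day * DAYS_PER_YEAR
    return per_hour, per_day, per_year


# Google: 3.8 million searches per minute (figure quoted above).
searches = scale_per_minute(3_800_000)

# Twitter: 6,000 tweets per second, converted to a per-minute rate first.
tweets_per_minute = 6_000 * SECONDS_PER_MINUTE
tweets = scale_per_minute(tweets_per_minute)

for name, (per_hour, per_day, per_year) in [("searches", searches), ("tweets", tweets)]:
    print(f"{name}: {per_hour:,} per hour, {per_day:,} per day, {per_year:,} per year")
```

The outputs land close to the rounded figures quoted in the text; small gaps come from rounding in the source statistics.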

Over 6 billion texts are sent every day (CTIA). Over 180 billion texts are sent every month.

There are 293.6 billion emails sent and received daily.

As users, we are only focused on the outcome of what we do on the web. We don’t dwell on what happens behind the scenes. Every day, each of us contributes to the vast amount of big data. Now imagine the number of people spending time on the Internet visiting different web pages, uploading pictures, and whatnot.

All of this adds up to the stockpile of data.

Characteristics of Big Data

The 3 Vs of big data -: Big data is a collection of data from various sources, often characterized by what have become known as the 3 Vs: volume, variety and velocity. Over time, other Vs have been added to the description of big data:

Volume -: Organisations have to constantly scale their storage solutions, since big data clearly requires a large amount of space to be stored.

Velocity -: Since big data is being generated every second, organisations need to respond in real time to deal with it.

Variety -: Big data comes in a variety of forms. It could be structured or unstructured, and in different formats such as text, videos, images, and more.

Veracity -: Big data, as large as it is, can contain wrong data too. Uncertainty of data is something organisations have to consider while dealing with big data.

Value -: Just collecting big data and storing it is of no consequence unless the data is analyzed and a useful output is produced.

We have lots of data but not enough space on a single machine to store it; this is the big data problem. The solution to this problem is distributed storage.

What is Distributed Storage?

A distributed storage system is infrastructure that can split data across multiple physical servers, and often across more than one data center. It typically takes the form of a cluster of storage units, with a mechanism for data synchronization and coordination between cluster nodes.
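As a rough illustration of the idea, the sketch below cuts some data into chunks and spreads copies of each chunk over several servers. The server names, the chunk size and the round-robin placement rule are all assumptions made up for this example; real systems use more careful placement and synchronization.

```python
# Toy model: split a byte string into chunks and spread replicas over servers.
# Server names, chunk size, and replica count are illustrative assumptions.

def split_into_chunks(data, chunk_size):
    """Cut data into consecutive pieces of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_chunks(num_chunks, servers, replicas=2):
    """Assign each chunk to `replicas` distinct servers, round-robin."""
    if replicas > len(servers):
        raise ValueError("need at least as many servers as replicas")
    placement = {}
    for chunk_id in range(num_chunks):
        placement[chunk_id] = [
            servers[(chunk_id + r) % len(servers)] for r in range(replicas)
        ]
    return placement


servers = ["server-a", "server-b", "server-c"]
chunks = split_into_chunks(b"0123456789", chunk_size=4)
placement = place_chunks(len(chunks), servers, replicas=2)

for chunk_id, holders in placement.items():
    print(chunk_id, chunks[chunk_id], holders)
```

Because every chunk lives on two servers here, losing any single server still leaves one copy of each chunk available.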

Distributed storage is a concept. If you want to implement a concept, you need a product. There is a lot of software available in the market, but one of the most widely used in the corporate world is HADOOP.

What is HADOOP?

Hadoop is a solution to the Big Data problems above. It is a technology for storing massive datasets on a cluster of cheap machines in a distributed manner. Beyond storage, it also provides Big Data analytics through a distributed computing framework.

Hadoop Distributed File System (HDFS) — It is the storage layer of Hadoop

Here we can easily solve both the problems of volume and velocity. The data is split and stored in different storage units, and the pieces of data are processed in parallel at the same time.
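Processing every piece at the same time can be sketched with a small word count, in the spirit of the map and reduce steps of Hadoop's MapReduce framework. This is only a single-machine toy: the sample strings are invented, and worker threads stand in for the separate machines of a real cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(part):
    """'Map' step: count the words in one piece of the data."""
    return Counter(part.split())


def merge_counts(partials):
    """'Reduce' step: add the per-piece counts together."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Each string stands in for one piece of data held by a different storage unit.
parts = [
    "big data is big",
    "data is everywhere",
    "big data needs distributed storage",
]

# Worker threads stand in for the machines that would each process their own piece.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_counts = list(pool.map(count_words, parts))

word_counts = merge_counts(partial_counts)
print(word_counts.most_common(3))
```

The key design point is that `count_words` never needs to see the whole dataset, so each piece can be handled wherever it is stored and only the small partial results have to be combined.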

The master is a high-end machine, whereas the slaves are inexpensive computers. Big Data files are divided into a number of blocks. Hadoop stores these blocks in a distributed fashion on the cluster of slave nodes, and the metadata is stored on the master.
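To get a feel for how a file turns into blocks, here is a small sketch that works out the block sizes for a given file size. It assumes a 128 MB block size, which is the default in recent Hadoop releases but is configurable; the helper name `block_sizes` is made up for this example.

```python
# Assumption: 128 MB is the default HDFS block size in recent Hadoop releases;
# it is configurable, so treat this value as illustrative.
BLOCK_SIZE_MB = 128


def block_sizes(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the size in MB of each block for a file of file_size_mb (whole MB)."""
    if file_size_mb <= 0:
        return []
    full_blocks = file_size_mb // block_size_mb
    remainder = file_size_mb % block_size_mb
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block only holds what is left
    return sizes


sizes = block_sizes(500)
print(len(sizes), sizes)
```

A 500 MB file ends up as three full blocks plus one smaller final block, and each of those blocks can then be placed on a different slave node.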

HDFS has two daemons running for it. They are:

NameNode: The NameNode performs the following functions –

  • NameNode Daemon runs on the master machine.
  • It is responsible for maintaining, monitoring and managing DataNodes.
  • It records the metadata of the files like the location of blocks, file size, permission, hierarchy etc.
  • The NameNode records all changes to the metadata, such as the deletion, creation and renaming of files, in edit logs.
  • It regularly receives heartbeat and block reports from the DataNodes.
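The bookkeeping role described above can be pictured with a toy class. This is a hypothetical in-memory stand-in, not Hadoop's actual implementation or API: the class name, method names and file paths are all invented for illustration.

```python
class ToyNameNode:
    """Hypothetical, in-memory stand-in for the NameNode's bookkeeping."""

    def __init__(self):
        # file path -> {"size": ..., "blocks": {block_id: [datanode names]}}
        self.metadata = {}
        # ordered record of every metadata change
        self.edit_log = []

    def create(self, path, size, blocks):
        self.metadata[path] = {"size": size, "blocks": blocks}
        self.edit_log.append(("CREATE", path))

    def rename(self, old_path, new_path):
        self.metadata[new_path] = self.metadata.pop(old_path)
        self.edit_log.append(("RENAME", old_path, new_path))

    def delete(self, path):
        del self.metadata[path]
        self.edit_log.append(("DELETE", path))

    def locate(self, path):
        """Return which DataNodes hold each block of the file."""
        return self.metadata[path]["blocks"]


namenode = ToyNameNode()
namenode.create(
    "/logs/day1.txt",
    size=200,
    blocks={"blk_1": ["dn1", "dn2"], "blk_2": ["dn2", "dn3"]},
)
namenode.rename("/logs/day1.txt", "/logs/2020-09-22.txt")
print(namenode.locate("/logs/2020-09-22.txt"))
print(namenode.edit_log)
```

Notice that the toy NameNode never holds file contents, only where the blocks live and a log of what changed.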

DataNode: The various functions of DataNode are as follows –

  • The DataNode daemon runs on the slave machines.
  • It stores the actual business data.
  • It serves read and write requests from clients.
  • The DataNode does the groundwork of creating, replicating and deleting blocks on the command of the NameNode.
  • Every 3 seconds, by default, it sends a heartbeat to the NameNode to report that it is alive and healthy.
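The heartbeat idea can be sketched as a small monitor that remembers when each DataNode last reported in. The class name and the 30-second timeout are assumptions chosen to keep the example short; with default settings, real HDFS waits considerably longer (roughly ten minutes) before declaring a DataNode dead.

```python
HEARTBEAT_INTERVAL_S = 3   # default interval mentioned in the text
DEAD_AFTER_S = 30          # illustrative timeout, not the real HDFS setting


class HeartbeatMonitor:
    """Hypothetical tracker of the last heartbeat seen from each DataNode."""

    def __init__(self, dead_after_s=DEAD_AFTER_S):
        self.dead_after_s = dead_after_s
        self.last_seen = {}  # DataNode name -> time of its last heartbeat

    def record_heartbeat(self, datanode, now):
        self.last_seen[datanode] = now

    def live_nodes(self, now):
        return sorted(
            n for n, t in self.last_seen.items() if now - t <= self.dead_after_s
        )

    def dead_nodes(self, now):
        return sorted(
            n for n, t in self.last_seen.items() if now - t > self.dead_after_s
        )


monitor = HeartbeatMonitor()
# Simulated clock in seconds: dn1 keeps reporting, dn2 stops after t=3.
for t in range(0, 61, HEARTBEAT_INTERVAL_S):
    monitor.record_heartbeat("dn1", now=t)
    if t <= 3:
        monitor.record_heartbeat("dn2", now=t)

print(monitor.live_nodes(now=60), monitor.dead_nodes(now=60))
```

Once a node shows up as dead, the master knows that the blocks it held need fresh copies on other machines.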

How is Big data being used in companies?

Big data case studies!

  • 182 million total Netflix users
  • people watch Netflix for ~90 minutes per day

With a huge user base and long hours of streaming, Netflix can collect data from everyone on viewing patterns:

  • Time spent on shows/movies, and location.
  • Types of entertainment (documentary, comedy, drama, horror, etc.).
  • Favorite starring cast.
  • When users stop watching a show, etc.

This data helps Netflix to:

  • Engage users better on current shows
  • Recommend shows they may like
  • And create a show people will like

As such, Big Data analytics is the fuel that fires the ‘recommendation engine’ designed to serve this purpose. More recently, Netflix started positioning itself as a content creator, not just a distribution method. Unsurprisingly, this strategy has been firmly driven by data. Netflix’s recommendation engines and new content decisions are fed by data points such as which titles customers watch, how often playback is stopped, and what ratings are given. The company’s data infrastructure includes Hadoop, Hive and Pig, alongside other traditional business intelligence tools.

Netflix shows us that knowing exactly what customers want becomes achievable when companies stop relying on assumptions and make decisions based on Big Data.
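To make the idea of a data-fed recommendation engine concrete, here is a deliberately tiny sketch that suggests titles watched by viewers with similar histories. It is not Netflix's actual method: the viewing data, user names and scoring rule are all invented for illustration.

```python
from collections import Counter

# Hypothetical viewing history: user -> set of titles watched.
history = {
    "user1": {"Show A", "Show B", "Show C"},
    "user2": {"Show A", "Show B"},
    "user3": {"Show B", "Show C", "Show D"},
    "user4": {"Show A", "Show D"},
}


def recommend(user, history, top_n=2):
    """Suggest unseen titles watched by users who share titles with `user`."""
    seen = history[user]
    scores = Counter()
    for other, titles in history.items():
        if other == user:
            continue
        overlap = len(seen & titles)  # how similar the two viewers are
        if overlap == 0:
            continue
        for title in titles - seen:
            scores[title] += overlap
    # Sort by score (highest first), then by title so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [title for title, _ in ranked[:top_n]]


print(recommend("user2", history))
```

Even this toy version shows why more viewing data helps: every extra history gives the scoring step more overlaps to learn from.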

In 2012, Facebook revealed that it was generating more than 500 terabytes of data every day, including 2.7 billion likes and around 300 million photos per day. Another striking figure is that Facebook was scanning around 105 terabytes of data every half hour.

The main business strategy of Facebook is to understand who its users are. By understanding its users’ behaviors, interests, and geographic locations, Facebook shows customized ads on their timelines. How is this possible?

A huge amount of unstructured data is generated every day, including images, text, and video. With the help of deep learning (AI), Facebook brings structure to unstructured data.

A deep learning analysis tool can learn to recognize images that contain pizza without being told explicitly what a pizza looks like. This can be done by analyzing a large number of images that contain pizza. By recognizing similar images, the deep learning tool can then pick out the images that contain pizza. This is how Facebook brings structure to unstructured data.

Big Data in Banking Sector

The amount of data in the banking sector is skyrocketing every second. According to a GDC prognosis, this data is estimated to grow 700% by 2020.

The study and analysis of big data can help detect:

  • The misuse of credit cards
  • Misuse of debit cards
  • Venture credit hazard treatment
  • Business clarity
  • Customer statistics alteration
  • Money laundering
  • Risk Mitigation

Thank you!!

Have a good day...
