Big Data technologies are widely used for handling different varieties of data. Variety here refers to the type of data to be stored and processed.
Types of Data Variety:
- Structured Data
- Unstructured Data
- Semi-Structured Data
Structured Data
Data that has a defined structure, or more precisely, data that has a rigid schema.
Example:
Relational Databases - IBM DB2, Teradata, Oracle Database, etc.
All of the above databases have a rigid structure, and we have to follow that structure while loading data. Consider a table in a relational database: it is created with a column definition, and whenever we load data into the table, that column definition has to be followed. For example, we cannot enter character data into an integer column.
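The rule described above can be illustrated with a minimal sketch in plain Python. It is not how any particular database implements type checking; the `EMPLOYEE_SCHEMA` column definition and the `validate_row` and `load_row` helpers are hypothetical names introduced only for this example.

```python
# Hypothetical column definition for an "employee" table: column name -> required type.
EMPLOYEE_SCHEMA = {"id": int, "name": str, "salary": float}


def validate_row(row, schema):
    """Raise an error if the row does not match the column definition."""
    if set(row) != set(schema):
        raise ValueError(f"columns {sorted(row)} do not match schema {sorted(schema)}")
    for column, expected_type in schema.items():
        value = row[column]
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise TypeError(
                f"column {column!r} expects {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )


def load_row(table, row, schema):
    """Validate first, and store the row only if it conforms to the schema."""
    validate_row(row, schema)
    table.append(row)


table = []
load_row(table, {"id": 1, "name": "Asha", "salary": 52000.0}, EMPLOYEE_SCHEMA)

try:
    # Character data in the integer "id" column is rejected at load time.
    load_row(table, {"id": "one", "name": "Ravi", "salary": 48000.0}, EMPLOYEE_SCHEMA)
except TypeError as error:
    rejected_message = str(error)
```

Only the first row ends up in `table`; the second is refused before it is stored, which is the behaviour a rigid schema implies.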
Some Useful Terminology:
When the schema must be followed strictly, it is termed a
Rigid Schema.
When the table schema must be followed strictly while loading or writing the data, it is termed
Schema on Write.
When the schema is applied while reading or accessing the data, rather than when the data is stored, it is termed
Schema on Read.
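To contrast with Schema on Write, here is a rough sketch of Schema on Read in plain Python. The storage step accepts any text, and a schema is applied only when the data is read back. The names `raw_store`, `READ_SCHEMA`, `write_raw`, and `read_with_schema` are assumptions made for illustration, as is the comma-separated line format.

```python
# Raw lines are stored exactly as they arrive; nothing is validated on write.
raw_store = []


def write_raw(line):
    """The write path accepts any text unchanged."""
    raw_store.append(line)


# Hypothetical schema applied only at read time: (column name, converter).
READ_SCHEMA = [("id", int), ("name", str), ("salary", float)]


def read_with_schema(lines, schema):
    """Parse each comma-separated line with the schema; collect lines that do not fit."""
    good, bad = [], []
    for line in lines:
        fields = line.split(",")
        if len(fields) != len(schema):
            bad.append(line)
            continue
        try:
            record = {
                name: convert(value.strip())
                for (name, convert), value in zip(schema, fields)
            }
        except ValueError:
            bad.append(line)
            continue
        good.append(record)
    return good, bad


write_raw("1,Asha,52000")
write_raw("one,Ravi,48000")  # accepted on write, although "one" is not an integer
write_raw("3,Meera")         # accepted on write, although a field is missing

records, rejected = read_with_schema(raw_store, READ_SCHEMA)
```

All three lines are stored, but only the first one parses into a record. The other two surface as problems at read time, not at load time.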
Unstructured Data
Data that has no structure, or more precisely, data that has no schema.
Example:
Text data, Facebook posts, Twitter tweets, images, videos, logs (web logs, audit logs, system logs), emails, sensor data, CCTV footage, market events, data from social feeds, mobile phone calls, call center conversations, etc.
Close to 90% of the world's data today is unstructured, and it is growing at a very high speed.
About 90% of this unstructured data was created within the last 3-4 years alone. This sudden data explosion has driven very high growth in Big Data technologies.
In 2010, the Big Data market was ~$3.2 billion
In 2015, forecast: ~$17 billion
In 2017, ~$20 billion
(Do not rely on the forecast values above; they were collected from various websites and are approximations only.)
Semi-Structured Data
Data that has some structure, that is, data that has a schema but is not required to follow it rigidly.
Example: Excel sheets
In Excel sheets, we can store data in the form of rows and columns. We can declare a definition for each column as we do when declaring a relational database table, but here we can still enter character data in a numeric column and numeric data in a character column.
Data that has a structure or schema, but is not required to follow it rigidly while being loaded, is termed Semi-Structured Data.
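The spreadsheet behaviour described above can be sketched in plain Python. This is an illustration only and does not use or imitate any Excel API; `DECLARED_TYPES`, `add_row`, and `mismatched_cells` are hypothetical names. The declared column types act as a hint, so every row is accepted and mismatches can only be reported afterwards.

```python
# Hypothetical declared column types for a spreadsheet-like sheet.
DECLARED_TYPES = {"id": int, "name": str, "salary": float}

sheet = []


def add_row(row):
    """The declared types are only a hint, so every row is accepted."""
    sheet.append(row)


def mismatched_cells(rows, declared):
    """Return (row index, column, value) for cells that differ from the declared type."""
    mismatches = []
    for index, row in enumerate(rows):
        for column, expected_type in declared.items():
            value = row.get(column)
            if not isinstance(value, expected_type):
                mismatches.append((index, column, value))
    return mismatches


add_row({"id": 1, "name": "Asha", "salary": 52000.0})
# Text in a numeric column and a number in a text column are both stored.
add_row({"id": "two", "name": 42, "salary": 48000.0})

issues = mismatched_cells(sheet, DECLARED_TYPES)
```

Both rows are stored, and `issues` lists the two cells in the second row that do not match their declared column type.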