
Saturday 22 June 2013

Data Variety



Big Data technologies are popular for handling many different varieties of data. Variety here refers to the type of data that has to be stored and processed.



Types of Data Variety:

  1. Structured Data
  2. Unstructured Data
  3. Semi-Structured Data



Structured Data

Data which has a defined structure or, more precisely, data which has a rigid schema.

Example:
Relational databases: IBM DB2, Teradata, Oracle Database, etc.

All of the above databases have a rigid structure, and we have to follow that structure while loading data. A table in a relational database is created with a column definition, and whenever we load data into the table, that column definition has to be followed. For example, we cannot enter character data into an Integer column.
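The rule above can be tried out with a small sketch. It uses Python's built-in sqlite3 module only because it needs no setup; the table name employee and its columns are made up for illustration. Note one assumption: SQLite on its own is lenient about column types, so the sketch adds a CHECK constraint to imitate the strict behaviour of the databases listed above.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table. The CHECK constraint is an assumption added to
# imitate strict typing, because SQLite is flexible about types by default.
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER NOT NULL CHECK (typeof(id) = 'integer'),"
    " name TEXT NOT NULL)"
)


def try_insert(emp_id, name):
    """Return True if the row was accepted, False if the schema rejected it."""
    try:
        conn.execute(
            "INSERT INTO employee (id, name) VALUES (?, ?)", (emp_id, name)
        )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert(1, "Asha")      # integer in the Integer column
rejected = try_insert("abc", "Ravi")  # character data in the Integer column
row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

Here the first row follows the column definition and is stored, while the second row is refused at load time, so only one row ends up in the table.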


Some Useful Terminology:
When a schema has to be followed strictly, it is termed a Rigid Schema.

When the table schema has to be followed strictly while loading or writing the data, it is termed Schema on Write.

When the schema is applied only while reading or accessing the data, it is termed Schema on Read.
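The difference between the last two terms can be sketched in a few lines of plain Python. Everything here (the SCHEMA dictionary, the function names, the comma-separated raw lines) is a hypothetical illustration, not the way any particular database or Big Data tool implements it.

```python
# Hypothetical two-column schema used by both approaches.
SCHEMA = {"id": int, "name": str}


def write_with_schema(table, row):
    """Schema on write: check the row before storing it, reject bad rows."""
    for column, expected_type in SCHEMA.items():
        if not isinstance(row.get(column), expected_type):
            raise TypeError(
                f"column {column!r} expects {expected_type.__name__}"
            )
    table.append(row)


def read_with_schema(raw_lines):
    """Schema on read: raw text is stored as-is, schema applied while reading."""
    for line in raw_lines:
        raw_id, name = line.split(",", 1)
        yield {"id": int(raw_id), "name": name}


# Schema on write: validation happens at load time.
table = []
write_with_schema(table, {"id": 1, "name": "Asha"})

# Schema on read: lines were stored without any check.
raw_store = ["1,Asha", "2,Ravi"]
rows = list(read_with_schema(raw_store))
```

With schema on write, a bad row fails when it is loaded. With schema on read, loading always succeeds and a bad line would only fail later, when someone reads it.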



Unstructured Data

Data which has no structure or, more precisely, data which has no schema.

Example:
Text data, Facebook posts, Twitter tweets, images, videos, logs (web logs, audit logs, system logs), emails, sensor data, CCTV footage, market events, data from social feeds, mobile phone calls, call center conversations, etc.


Close to 90% of the world's data today is unstructured, and it is growing at a very high speed.
About 90% of this unstructured data was created within the last 3-4 years alone. This sudden data explosion has given Big Data technologies very high growth.

In 2010, the Big Data market was ~ $3.2 billion
In 2015, forecast: ~ $17 billion
In 2017, forecast: ~ $20 billion
(Don't rely on the above forecast values; they were collected from various websites and are approximations only.)



Semi-Structured Data

Data which has some structure, or we can say data which has some schema, but which is not required to follow that schema rigidly.

Example: Excel sheets

In Excel sheets, we can store data in the form of rows and columns. We can declare a definition for each column, as we do when declaring a relational database table, but here we can still enter character data in a numeric column and numeric data in a character column.

Data that has a structure or schema, but is not required to follow it rigidly while being loaded, is termed Semi-Structured data.
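A small sketch of the same idea in Python, using a made-up sheet in CSV form (the column names item and amount and the values are assumptions for illustration only). The amount column is meant to be numeric, but one cell holds text, and loading still succeeds.

```python
import csv
import io

# Hypothetical sheet: "amount" is intended to be numeric,
# but the second data row holds character data instead.
SHEET = "item,amount\npen,100\nbook,N/A\nbag,250\n"


def load_sheet(text):
    """Load every row; convert amount to int where possible, else keep text."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        try:
            record["amount"] = int(record["amount"])
        except ValueError:
            pass  # not a number: the row is kept anyway, nothing is rejected
        rows.append(record)
    return rows


rows = load_sheet(SHEET)

# Whoever uses the data has to cope with the mixed column.
numeric_total = sum(r["amount"] for r in rows if isinstance(r["amount"], int))
```

All three rows are loaded even though one of them breaks the intended column type, which is the loose behaviour described above. The cost is that every reader of the data must handle the non-numeric cells.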



