High Level Data Flow System on Map Reduce.
Apache Pig is a platform for analyzing large data sets. It consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.
Pig's infrastructure layer consists of -
- A compiler that produces sequences of Map-Reduce programs.
Pig Data Types -
Scalar Types
INT
LONG
FLOAT
DOUBLE
CHARARRAY
BYTEARRAY
BOOLEAN
Complex Types
MAP
TUPLE
BAG
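As a rough illustration of how these types appear in a script, the sketch below declares a schema that mixes scalar and complex fields. A tuple is an ordered set of fields, a bag is a collection of tuples, and a map is a set of key/value pairs. The file name and all field names are hypothetical, chosen only for this example.

```pig
-- Hypothetical input file and field names, for illustration only.
-- name, age, gpa : scalar types
-- address        : TUPLE
-- courses        : BAG of tuples
-- contacts       : MAP with chararray values
students = LOAD 'students.txt' USING PigStorage('\t')
    AS (name:chararray,
        age:int,
        gpa:double,
        address:tuple(city:chararray, zip:chararray),
        courses:bag{c:tuple(title:chararray, credits:int)},
        contacts:map[chararray]);

-- Print the schema Pig has recorded for the relation.
DESCRIBE students;

-- Reach into each complex type: tuple field, map lookup, bag flattening.
details = FOREACH students GENERATE
    name,
    address.city,
    contacts#'email',
    FLATTEN(courses);
```

Here `address.city` reads one field of a tuple, `contacts#'email'` looks up a map key, and `FLATTEN(courses)` produces one output row per tuple in the bag.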
Pig Features -
- Pig's language layer currently consists of a textual language called Pig Latin.
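To give a feel for Pig Latin, here is a minimal word-count sketch. The input and output paths are assumptions made for this example; each statement names an intermediate relation, and Pig compiles the whole chain into one or more Map-Reduce jobs.

```pig
-- Hypothetical paths: 'input.txt' and 'wordcount_out'.
lines   = LOAD 'input.txt' AS (line:chararray);

-- Split each line into words, one word per row.
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;

-- Collect identical words together and count each group.
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS total;

-- Most frequent words first.
ordered = ORDER counts BY total DESC;

STORE ordered INTO 'wordcount_out';
```

Running `EXPLAIN ordered;` instead of the `STORE` statement prints the logical, physical and Map-Reduce plans Pig would generate for this script.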
Pig Origin -
- Pig was originally created at Yahoo! to answer a need similar to the one Hive addresses.
- Many developers did not have the Java and/or MapReduce knowledge required to write standard MapReduce programs.
- But they still needed a query language.
The solution they got was Pig.
Pig With MapReduce -
Running Pig -
Pig Engine - Parser, Optimizer, distributed query execution.
Grunt Shell - Pig’s interactive shell to enter Pig commands.
Script File - Place Pig commands in a script file & run a script.
Embedded Program - Embed Pig commands in a host language & run the program.
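As a small sketch of the first two options: starting Pig with `pig -x local` opens the Grunt shell in local mode, and `pig myscript.pig` runs a script file directly from the command line. Inside Grunt, statements can be typed one at a time, or a script file can be launched with `exec` or `run`. The file names below are hypothetical.

```pig
-- Typed at the grunt> prompt; 'input.txt' and myscript.pig are hypothetical names.
lines = LOAD 'input.txt' AS (line:chararray);
DUMP lines;

-- Run a script in its own context; its aliases are not kept in the shell.
exec myscript.pig

-- Run a script in the current shell context; its aliases stay available afterwards.
run myscript.pig
```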
Pig Features -
Pig provides many features which allow developers to perform sophisticated data analysis without writing MapReduce programs.
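For instance, filtering, joining, grouping, aggregating and sorting are all built-in operators. The sketch below combines them to find the countries with the highest revenue from large orders; the two input files, their columns and the threshold of 100 are assumptions made up for this example.

```pig
-- Hypothetical inputs: users.csv (user_id,country) and orders.csv (order_id,user_id,amount).
users  = LOAD 'users.csv'  USING PigStorage(',') AS (user_id:int, country:chararray);
orders = LOAD 'orders.csv' USING PigStorage(',') AS (order_id:int, user_id:int, amount:double);

-- Keep only large orders, then attach each order to its user.
big    = FILTER orders BY amount > 100.0;
joined = JOIN big BY user_id, users BY user_id;

-- After a JOIN, fields are qualified with the relation they came from.
by_country = GROUP joined BY users::country;
totals     = FOREACH by_country GENERATE group AS country,
                                         SUM(joined.big::amount) AS revenue;

-- Top ten countries by revenue.
top   = ORDER totals BY revenue DESC;
top10 = LIMIT top 10;
DUMP top10;
```

The same analysis written by hand would typically need several chained MapReduce jobs in Java.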
Pig vs Hive -
Both have strengths & weaknesses, so it's better to spend some time investigating each and make an informed decision, choosing either Pig or Hive depending on the requirement and the type of data to be stored & processed.