Wednesday 20 August 2014

Pig Basic Understanding



Pig is a high-level data flow system that runs on MapReduce.
It is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.


Pig consists of two layers -
-  Infrastructure layer: a compiler that produces sequences of MapReduce programs.
-  Language layer: currently a textual language called Pig Latin.
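
To make the idea of compiling a data flow into MapReduce steps more concrete, here is a minimal sketch in plain Python. It is not Pig's actual compiler output; it only imitates the map, shuffle and reduce phases that a simple group-and-sum data flow would pass through. The Pig Latin in the comments, the relation names and the sample rows are all made up for illustration.

```python
from collections import defaultdict

# Illustrative Pig Latin for the same data flow (names are hypothetical):
#   records = LOAD 'input' AS (user:chararray, bytes:int);
#   grouped = GROUP records BY user;
#   totals  = FOREACH grouped GENERATE group, SUM(records.bytes);

# Hypothetical input rows: (user, bytes) pairs invented for this sketch.
records = [("alice", 10), ("bob", 5), ("alice", 7), ("bob", 1), ("carol", 3)]


def map_phase(rows):
    """Emit (key, value) pairs, as a mapper would for GROUP ... BY user."""
    for user, nbytes in rows:
        yield user, nbytes


def shuffle(pairs):
    """Collect all values that share a key, like the MapReduce shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Aggregate each group, as a reducer would for SUM(records.bytes)."""
    return {key: sum(values) for key, values in groups.items()}


totals = reduce_phase(shuffle(map_phase(records)))
print(totals)  # {'alice': 17, 'bob': 6, 'carol': 3}
```

The point of Pig is that the developer writes only the three-line data flow in the comments, and the infrastructure layer works out the MapReduce jobs.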


Pig Origin -
  1. Pig was originally created at Yahoo! to answer a need similar to the one Hive addresses.
  2. Many developers did not have the Java and/or MapReduce knowledge required to write standard MapReduce programs.
  3. But they still needed a query language for their data.
 The solution they got was Pig.


Pig With MapReduce -
















Running Pig -
Pig Engine       - Parser, Optimizer, distributed query execution.
Grunt Shell      - Pig’s interactive shell to enter Pig commands.
Script File      - Place Pig commands in a script file and run the script.
Embedded Program - Embed Pig commands in a host language and run the program.





Pig Data Types -



Scalar Types
INT
LONG
FLOAT
DOUBLE
CHARARRAY
BYTEARRAY
BOOLEAN



Complex Types
MAP
TUPLE
BAG
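
The complex types are easier to picture with a small example. The sketch below models them with ordinary Python values: a tuple as an ordered set of fields, a bag as a collection of tuples, and a map as key/value pairs. This is only an analogy, not how Pig stores data internally; the helper name pig_type and all sample values are hypothetical.

```python
# A tuple is an ordered set of fields.
row = ("alice", 30, 55000.0)          # (chararray, int, double)

# A bag is a collection of tuples; duplicates are allowed.
people = [("alice", 30, 55000.0), ("bob", 25, 48000.0), ("alice", 30, 55000.0)]

# A map is a set of key/value pairs with chararray keys.
props = {"city": "Pune", "dept": "data"}

# Complex types can be nested: a tuple holding a bag and a map.
team = ("team-a", people, props)


def pig_type(value):
    """Return the Pig type name that a Python value most closely resembles."""
    if isinstance(value, bool):        # check bool first: it is a subclass of int
        return "boolean"
    if isinstance(value, int):
        # Pig's int is 32-bit signed; larger values need long.
        return "int" if -2**31 <= value < 2**31 else "long"
    if isinstance(value, float):       # Python floats are 64-bit, like Pig's double
        return "double"
    if isinstance(value, str):
        return "chararray"
    if isinstance(value, (bytes, bytearray)):
        return "bytearray"
    if isinstance(value, tuple):
        return "tuple"
    if isinstance(value, list):
        return "bag"
    if isinstance(value, dict):
        return "map"
    raise TypeError(f"no Pig analogue for {type(value).__name__}")


print([pig_type(field) for field in team])  # ['chararray', 'bag', 'map']
```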








Pig Features


Pig provides many features which allow developers to perform sophisticated data analysis without writing MapReduce programs. 


Pig vs Hive
Both have strengths and weaknesses, so it's better to spend some time investigating each before making an informed decision to choose either Pig or Hive, depending on the requirement and the type of data to be stored and processed.







