Hadoop Guru: Pig Basic Understanding

Pig is a high-level data flow system that runs on top of MapReduce.

It's a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs.

Pig consists of two layers -

- Infrastructure layer: a compiler that produces sequences of MapReduce programs.

- Language layer: currently a textual language called Pig Latin.
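To make the idea of a compiler that produces MapReduce programs more concrete, here is a minimal Python sketch of the map, shuffle and reduce steps that a simple group-and-count data flow roughly corresponds to. It is only an illustration and not Pig's actual compiler output. The sample records, the file name in the comment and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are all hypothetical.

```python
from collections import defaultdict

# The data flow being imitated, written in Pig Latin (hypothetical input file):
#   logs    = LOAD 'logs.txt' AS (user:chararray, url:chararray);
#   grouped = GROUP logs BY user;
#   counts  = FOREACH grouped GENERATE group, COUNT(logs);

# Hypothetical input records: (user, url) pairs.
RECORDS = [("alice", "/home"), ("bob", "/cart"), ("alice", "/cart")]


def map_phase(records):
    """Emit one (key, 1) pair per record; the key is the grouping field."""
    for user, _url in records:
        yield user, 1


def shuffle(pairs):
    """Collect all values emitted for the same key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Aggregate the values of each key; here the aggregate is a count."""
    return {key: sum(values) for key, values in grouped.items()}


def run_job(records):
    """Chain the three steps into a single illustrative 'job'."""
    return reduce_phase(shuffle(map_phase(records)))


print(run_job(RECORDS))  # {'alice': 2, 'bob': 1}
```

The three lines of Pig Latin in the comment express the same logic as the Python functions, which is the kind of hand-written plumbing Pig is meant to spare the developer.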

Pig Origin -

Pig was originally created at Yahoo! to answer a need similar to the one Hive addresses.
Many developers did not have the Java and/or MapReduce knowledge required to write standard MapReduce programs.
But they still needed a way to query and process the data.

The solution they arrived at was Pig.

Pig With MapReduce -

Running Pig -

Pig Engine - Parser, optimizer and distributed query execution.

Grunt Shell - Pig’s interactive shell for entering Pig commands.

Script File - Place Pig commands in a script file and run the script.

Embedded Program - Embed Pig commands in a host language and run the program.

Pig Data Types -

Scalar Types
INT
LONG
FLOAT
DOUBLE
CHARARRAY
BYTEARRAY
BOOLEAN

Complex Types
MAP
TUPLE
BAG
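As a rough analogy for the complex types, a tuple is an ordered set of fields, a bag is a collection of tuples, and a map is a set of key/value pairs. The Python sketch below models them with built-in types and includes a small helper that names the closest Pig type for a Python value. The helper `pig_type_of` and the sample data are hypothetical and only illustrate the type list above; this is not how Pig itself represents data.

```python
# A tuple: an ordered set of fields.
student = ("alice", 21, 3.8)

# A bag: a collection of tuples; duplicates are allowed.
enrollments = [("alice", "math"), ("alice", "physics"), ("alice", "math")]

# A map: key/value pairs.
profile = {"city": "Pune", "degree": "BSc"}

INT_MIN, INT_MAX = -2**31, 2**31 - 1  # range of a 32-bit signed INT


def pig_type_of(value):
    """Return the name of the closest Pig type (illustrative mapping only)."""
    if isinstance(value, bool):  # check bool first: it is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "INT" if INT_MIN <= value <= INT_MAX else "LONG"
    if isinstance(value, float):  # Python floats are double precision
        return "DOUBLE"
    if isinstance(value, str):
        return "CHARARRAY"
    if isinstance(value, bytes):
        return "BYTEARRAY"
    if isinstance(value, tuple):
        return "TUPLE"
    if isinstance(value, list):
        return "BAG"
    if isinstance(value, dict):
        return "MAP"
    raise TypeError(f"no Pig analogue assumed for {type(value).__name__}")


for sample in (student, enrollments, profile, "alice", 21, 2**40):
    print(pig_type_of(sample))
```

Complex types can be nested, so a field of a tuple may itself be a bag or a map.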

Pig Features -

Pig provides many features which allow developers to perform sophisticated data analysis without writing MapReduce programs.

Pig vs Hive -

Both have strengths and weaknesses, so it's better to spend some time investigating each in order to make an informed decision between Pig and Hive, depending on the requirements and the type of data to be stored and processed.


Wednesday, 20 August 2014
