Hive is a petabyte-scale data warehouse system built on Hadoop.
It is a Hadoop-based system for querying and managing structured data.
It is used to query big data in a SQL-like fashion.
For execution, Hive uses MapReduce.
For storage, Hive uses HDFS.
For metadata, Hive uses an RDBMS.
Origin of Hive -
Hive was designed at Facebook for querying petabytes of data. A sudden data explosion at Facebook made it impossible to store and query the data in a traditional DBMS.
Hive made it extremely easy for users to query data stored on HDFS.
Hive has since become a parallel DBMS that uses Hadoop for its storage and execution architecture.
Why Hive -
Hive is a data warehouse system that was designed because existing data warehouse systems did not meet all requirements in a scalable, agile and cost-efficient way.
The programming model used in Hadoop is MapReduce. It is very difficult to write a MapReduce program for every report, small or big, and writing such complex code requires highly skilled resources.
Using Hive, one can simply issue a query much as one would in SQL. Hive then generates the MapReduce code for the user based on the query issued.
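To make the idea concrete, here is a minimal Python sketch of how a query such as SELECT dept, COUNT(*) FROM employees GROUP BY dept could be broken into a map phase and a reduce phase. The table, the column names and the sample rows are made up for illustration, and this is only a simulation of the concept, not the code Hive actually generates.

```python
from collections import defaultdict

# Hypothetical rows of an "employees" table: (name, dept)
ROWS = [
    ("asha", "sales"),
    ("ravi", "eng"),
    ("meena", "eng"),
    ("john", "sales"),
    ("li", "eng"),
]


def map_phase(rows):
    """Emit a (dept, 1) pair for every row, as a mapper would."""
    for _name, dept in rows:
        yield dept, 1


def shuffle(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values of each key, which gives COUNT(*) per dept."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_dept(rows):
    """Run the three steps in order: map, shuffle, reduce."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    print(count_by_dept(ROWS))
```

Writing even this small report by hand takes three functions, which is the effort a single HiveQL statement saves.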
Advantages of Hive -
Hive can work with very large data (hundreds of terabytes).
Hive can work on large Hadoop clusters (hundreds of nodes).
Data stored in Hive has a defined schema.
Hive is also used for batch jobs (load and query).
Where not to use Hive -
If you need responses in seconds.
If you don't want to impose a schema.
If traditional DBMS already can do the job.
If your data is measured in gigabytes or less.
If you don't have enough time & highly skilled resources.
Hive Entities -
Database, Table, Partitions, Bucketing Columns.
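As a rough illustration of how these entities relate, the Python sketch below builds the directory path of a partition (one key=value level per partition column) and assigns a row to a bucket by hashing the bucketing column modulo the number of buckets. The warehouse root, the database and table names, and the use of crc32 as the hash are assumptions made for this example; Hive uses its own hash function.

```python
import zlib

# Assumed warehouse root, used here only for illustration.
WAREHOUSE = "/user/hive/warehouse"


def partition_path(db, table, partition):
    """Build a directory path with one key=value level per partition column."""
    parts = "/".join(f"{col}={val}" for col, val in partition)
    return f"{WAREHOUSE}/{db}.db/{table}/{parts}"


def bucket_for(value, num_buckets):
    """Pick a bucket: hash of the bucketing column modulo the bucket count.

    crc32 is a deterministic stand-in, not the hash function Hive uses.
    """
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


if __name__ == "__main__":
    # Hypothetical database "web", table "logs", partitioned by dt and country.
    print(partition_path("web", "logs", [("dt", "2013-01-01"), ("country", "IN")]))
    # Hypothetical bucketing column value 42, table split into 4 buckets.
    print(bucket_for(42, 4))
```

Partitions let a query skip whole directories, while buckets split the data inside a partition into a fixed number of files.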
Hive Data Types -
Primitive Data Types
TINYINT 1 Byte Signed Integer
SMALLINT 2 Byte Signed Integer
INT 4 Byte Signed Integer
BIGINT 8 Byte Signed Integer
BOOLEAN True or False (Boolean)
FLOAT Single precision floating point
DOUBLE Double precision floating point
STRING Sequence of characters (within single or double quotes)
TIMESTAMP java.sql.Timestamp format
etc...
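The byte sizes above determine the range of values each integer type can hold. A small Python sketch of that arithmetic, assuming the usual two's-complement representation of signed integers:

```python
# Byte sizes of the signed integer types listed above.
SIGNED_INT_BYTES = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8}


def signed_range(num_bytes):
    """Return (min, max) for a two's-complement signed integer of this size."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


if __name__ == "__main__":
    for name, size in SIGNED_INT_BYTES.items():
        low, high = signed_range(size)
        print(f"{name}: {low} to {high}")
```

For example, a TINYINT holds values from -128 to 127.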
Collection Data Types
STRUCT Similar to a struct in C.
MAP Collection of (key, value) pairs
ARRAY Ordered sequence of elements of the same data type.
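The collection types can be pictured with their nearest Python equivalents. The sketch below is only an analogy; the Employee record and its fields are hypothetical and not taken from any real table.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Address:
    """Plays the role of STRUCT<street:STRING, city:STRING>."""
    street: str
    city: str


@dataclass
class Employee:
    """Hypothetical row that uses all three collection types."""
    name: str
    address: Address              # STRUCT: named fields
    deductions: Dict[str, float]  # MAP: (key, value) pairs
    subordinates: List[str]       # ARRAY: ordered, same element type


emp = Employee(
    name="asha",
    address=Address(street="1 Main St", city="Pune"),
    deductions={"tax": 0.2, "insurance": 0.05},
    subordinates=["ravi", "meena"],
)

if __name__ == "__main__":
    print(emp.address.city)      # a struct field is read by name
    print(emp.deductions["tax"]) # a map value is read by key
    print(emp.subordinates[0])   # an array element is read by index
```

HiveQL reads these values in much the same way: address.city, deductions['tax'] and subordinates[0].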
Hive operations -
DDL operations
[CREATE/ALTER/DROP] [TABLE/VIEW/PARTITION]
CREATE TABLE AS SELECT
DML operations
INSERT OVERWRITE
Queries...
Sub-Queries within "FROM" clause.
Joins [Inner join & Outer (Left, Right & Full outer join)]
Multi-Table insert
Sampling
Interfaces
JDBC/ODBC/THRIFT
Hive UDFs -
Hive user-defined functions
UDFs allow users to extend HiveQL.
Built-in functions are available in Hive (e.g. SUM, COUNT).
Custom UDFs can be created by users based on their requirements.
Benefits of UDFs
It's possible to extend HiveQL functionality.
Integration of Systems.
Controlling the Map & Reduce stages.
Formatting output data before sending to Pig, Mahout or other programs.
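The basic idea of a custom UDF is a function that is evaluated once per row on a column value. Real Hive UDFs are written in Java, so the Python below is only an analogy; the function to_upper and the helper apply_udf are hypothetical names invented for this sketch.

```python
def to_upper(value):
    """Hypothetical custom function: upper-case a string, pass NULL (None) through."""
    if value is None:
        return None
    return value.upper()


def apply_udf(udf, rows, column):
    """Evaluate the function once per row on one column, like a scalar UDF in a SELECT."""
    return [udf(row[column]) for row in rows]


if __name__ == "__main__":
    # Hypothetical rows with a single "name" column.
    rows = [{"name": "asha"}, {"name": None}, {"name": "Ravi"}]
    print(apply_udf(to_upper, rows, "name"))
```

Handling NULL explicitly matters, because a custom function sees every value in the column, including missing ones.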