Friday, 27 November 2015

Hive Interview Questions : Hive Lateral View Keyword Use



Question:- Consider a scenario , we have table in hive containing one column as INT and one column as ARRAY . Display all values as one on one mapping/ horizontally.



Solution:

1. Below is the data set to explain the example. '\t' is the field delimiter and Control+B is the collection items delimiter.














2.  Create a managed table in the database using below query on the hive shell:-


hive > CREATE TABLE arrays(emp_id INT, dep_id ARRAY<STRING>, address_id ARRAY<STRING>) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY  '\002'; 




3.  Load data in to table using the below query:-

hive > LOAD DATA LOCAL INPATH '/<pathtofile>/array.txt' overwrite INTO TABLE arrays;

4.  To check the data has been loaded successfully run below query:-

hive > select * from arrays;

Below output will be produced-



So far we have successfully created our data. Now before moving forward explanation of Lateral View keyword is required.

Lateral view is used with user-defined table generating functions such as explode(). explode() is built in table generate function that produces multiple rows for one row as input.

Use Case 1: Query to print department id horizontally with employee id

hive > select emp_id,dep_id from arrays LATERAL VIEW EXPLODE(dep_id) arrays as dep_id;

output:-




Use Case 2: Query to print multiple columns horizontally with employee id

hive > select emp_id,dep_id,address_id from arrays LATERAL VIEW EXPLODE(dep_id) arrays as dep_id LATERAL VIEW EXPLODE(address_id) arrays as address_id;

output:-



Use Case 3: As you can see that row with emp_id 4 is eliminated in output produced this is because that was having null in address_id. To get that row as well we need to use LATERAL View with OUTER.

hive > select emp_id,dep_id,address_id from arrays LATERAL VIEW EXPLODE(dep_id) arrays as dep_id LATERAL VIEW OUTER EXPLODE(address_id) arrays as address_id;

output:- As you can see that employee id with 4 is coming as well. 




Input file is attached with the post.




I hope you like my explanation. Please leave a comment if you like it.





Wednesday, 18 November 2015

Exception in using UDF


Some time may you face below issue while using customizes UDF in hive

java.io.FileNotFoundException: File does not exist: hdfs:


here is complete stack trace

java.io.FileNotFoundException: File does not exist: hdfs://localhost:54310/usr/local/hivetmp/amit.pathak/9381feb3-6c5f-469b-b6b1-9af55abbdabd/udf.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)


If you want to view exact problem you can check this issueLink




This issue comes mainly when you  use UDF in join or Create table tablename as function.

To fix above issue I have two ways.

1) - Use add file command instead of add jar (As using file it make sure your data exist in distributed cache )

Before Changes ::

add jar '/user/hive/udf.jar';
create temporary function convertToJulian as 'com.convertToJulian';

After Changes ::

add file '/user/hive/udf.jar';
create temporary function convertToJulian as 'com.convertToJulian';

2)- Have same file structure on local as well as on hadoop. 

Like if you stored your UDF in below local file system

/user/hive/amit/udf.jar

So you also need to create same directory structure in hadoop filesystem and then put your udf jar in that directory.