Friday, 27 November 2015

Hive Interview Questions : Hive Lateral View Keyword Use



Question: Consider a scenario where a Hive table has one column of type INT and one or more columns of type ARRAY. Display the values as a one-to-one mapping, so that every array element appears on its own row beside the INT value.



Solution:

1. Below is the data set used in the example. '\t' is the field delimiter and Control+B ('\002') is the collection items delimiter.














2. Create a managed table in the database by running the query below in the Hive shell:


hive > CREATE TABLE arrays(emp_id INT, dep_id ARRAY<STRING>, address_id ARRAY<STRING>) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY  '\002'; 




3. Load the data into the table using the query below:

hive > LOAD DATA LOCAL INPATH '/<pathtofile>/array.txt' overwrite INTO TABLE arrays;

4. To check that the data has been loaded successfully, run the query below:

hive > select * from arrays;

The following output is produced:



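So far we have successfully created our data. Before moving forward, the LATERAL VIEW keyword needs some explanation.

Lateral view is used together with user-defined table-generating functions (UDTFs) such as explode(). explode() is a built-in table-generating function that takes one row as input and produces one output row per array element. LATERAL VIEW joins those generated rows back to the input row, so the other columns of the table can be selected alongside them.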

Use Case 1: Query to print each department id on its own row with the employee id

hive > select emp_id,dep_id from arrays LATERAL VIEW EXPLODE(dep_id) arrays as dep_id;

output:-




Use Case 2: Query to print the elements of multiple array columns on their own rows with the employee id

hive > select emp_id,dep_id,address_id from arrays LATERAL VIEW EXPLODE(dep_id) arrays as dep_id LATERAL VIEW EXPLODE(address_id) arrays as address_id;

output:-



Use Case 3: As you can see, the row with emp_id 4 is eliminated from the output above. This is because it has null in address_id, and LATERAL VIEW drops an input row when explode() generates no rows for it. To get that row as well, we need to use LATERAL VIEW with OUTER.

hive > select emp_id,dep_id,address_id from arrays LATERAL VIEW EXPLODE(dep_id) arrays as dep_id LATERAL VIEW OUTER EXPLODE(address_id) arrays as address_id;

output:- As you can see, the row for employee id 4 is now included as well, with NULL in address_id.
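For readers who think in Java rather than HiveQL, the difference between LATERAL VIEW and LATERAL VIEW OUTER can be modelled with a small sketch. This is not how Hive is implemented; the class `LateralViewSketch`, its methods, and the sample values `d1`/`d2` are hypothetical and only mimic the row-generation rule described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical model of LATERAL VIEW row generation; not part of Hive.
class LateralViewSketch {

    // One output value per array element. For a null or empty array:
    // outer == false yields nothing, outer == true yields a single null.
    static List<String> explode(List<String> array, boolean outer) {
        if (array == null || array.isEmpty()) {
            return outer ? Collections.<String>singletonList(null)
                         : Collections.<String>emptyList();
        }
        return array;
    }

    // Two chained lateral views: every dep_id is paired with every address_id.
    static List<String> rows(int empId, List<String> depIds,
                             List<String> addressIds, boolean outerOnAddress) {
        List<String> out = new ArrayList<>();
        for (String dep : explode(depIds, false)) {
            for (String addr : explode(addressIds, outerOnAddress)) {
                out.add(empId + "\t" + dep + "\t" + addr);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> deps = Arrays.asList("d1", "d2"); // made-up sample values
        System.out.println(rows(4, deps, null, false)); // employee 4 disappears
        System.out.println(rows(4, deps, null, true));  // employee 4 kept, address is null
    }
}
```

The nested loop also shows why two lateral views on the same row produce every combination of dep_id and address_id for that employee.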




Input file is attached with the post.




I hope you like my explanation. Please leave a comment if you like it.





Wednesday, 18 November 2015

Exception in using UDF


Sometimes you may face the issue below while using a custom UDF in Hive:

java.io.FileNotFoundException: File does not exist: hdfs:


Here is the complete stack trace:

java.io.FileNotFoundException: File does not exist: hdfs://localhost:54310/usr/local/hivetmp/amit.pathak/9381feb3-6c5f-469b-b6b1-9af55abbdabd/udf.jar
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)


If you want to view the exact problem, you can check this issue link.




This issue mainly occurs when you use the UDF in a join or in a CREATE TABLE tablename AS statement.

There are two ways to fix the above issue.

1) - Use the add file command instead of add jar (adding the jar as a file makes sure it exists in the distributed cache).

Before Changes ::

add jar '/user/hive/udf.jar';
create temporary function convertToJulian as 'com.convertToJulian';

After Changes ::

add file '/user/hive/udf.jar';
create temporary function convertToJulian as 'com.convertToJulian';

2) - Keep the same directory structure on the local file system and on Hadoop.

For example, if you stored your UDF jar at the following path on the local file system:

/user/hive/amit/udf.jar

then you also need to create the same directory structure in the Hadoop file system and put your UDF jar in that directory.









Thursday, 15 October 2015

Java MultiThreading Interview Question



I am publishing some interview questions that were asked of me (and my friends) while I was applying for a job change with a Java profile.

Right now I am adding only the questions; I will add the answers from time to time.

 
Multi threading :  

1. What is context switching in multi-threading?

Ans. Context switching is when the CPU stops running one thread, saves its state, and resumes another thread. Most operating systems use a preemptive, time-sliced (round-robin style) mechanism for this.
Context switching should not happen too frequently: every switch costs time, so excessive switching degrades performance because more time is spent switching context than doing actual processing in the threads.


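2. Difference between deadlock, livelock and starvation?

Deadlock - A situation where two or more threads each wait for a resource that is locked by another of those threads, so none of them can ever proceed.

Livelock - The scenario is similar to deadlock, but the threads are not blocked: each keeps reacting to the other in an attempt to resolve the conflict, and neither makes progress. E.g. two men are stuck in a tunnel that only one can pass through at a time, and both keep giving way to each other.

Starvation - A case where one thread is perpetually neglected by the thread scheduler, for example because other threads always get the CPU or the lock first.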
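One common way to rule out the deadlock described above is to make every thread acquire locks in the same fixed order. The sketch below is a minimal illustration under that assumption; the `Account` class, its `id` field (assumed unique per account), and the method names are hypothetical and exist only for this example.

```java
import java.util.concurrent.locks.ReentrantLock;

class LockOrderingSketch {

    static final class Account {
        final int id;            // assumed unique; used only to order the locks
        long balance;
        final ReentrantLock lock = new ReentrantLock();

        Account(int id, long balance) {
            this.id = id;
            this.balance = balance;
        }
    }

    // Always lock the account with the smaller id first, so two opposite
    // transfers can never each hold one lock while waiting for the other.
    static void transfer(Account from, Account to, long amount) {
        Account first = from.id < to.id ? from : to;
        Account second = (first == from) ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                from.balance -= amount;
                to.balance += amount;
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }

    // Runs transfers in both directions at once and returns the final balances.
    static long[] runOppositeTransfers() {
        Account a = new Account(1, 1000);
        Account b = new Account(2, 1000);
        Thread t1 = new Thread(() -> { for (int i = 0; i < 10_000; i++) transfer(a, b, 1); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 10_000; i++) transfer(b, a, 1); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new long[] { a.balance, b.balance };
    }
}
```

If `transfer` instead locked `from` first and `to` second, the two threads above could each grab one lock and wait forever for the other.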


3. What thread-scheduling algorithm is used in Java?

Java does not define its own scheduling algorithm; it relies on the thread scheduling of the underlying operating system. Operating systems typically use a preemptive strategy, often with round-robin time slicing, for scheduling threads.


4. What is the thread scheduler in Java?
    The thread scheduler is the part of the OS that controls thread execution.
    There is no guarantee about which runnable thread will be chosen to run by the thread scheduler.


 5. How do you handle an unhandled exception in a thread?

Ans : To handle an unhandled exception we need to set our own UncaughtExceptionHandler on the thread.
If we have not set any handler, the thread dies and the exception is printed like below:
                Exception in thread "Thread-0" java.lang.RuntimeException
                    at Main$1.run(Main.java:11)
                    at java.lang.Thread.run(Thread.java:619)

Thread t = new Thread(new Runnable() {
    public void run() {
        throw new RuntimeException();
    }
});
// Called on thread t when run() ends with an exception nobody caught.
t.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
    public void uncaughtException(Thread t, Throwable e) {
        System.out.println("exception " + e + " from thread " + t);
    }
});
t.start();


To set a handler for all threads, use the static method Thread.setDefaultUncaughtExceptionHandler.
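A minimal sketch of the default handler follows. The class name `DefaultHandlerSketch`, the thread name `worker-1`, and the message `boom` are made up for the example; the previous default handler is restored afterwards so the sketch does not leak into the rest of the program.

```java
import java.util.concurrent.atomic.AtomicReference;

class DefaultHandlerSketch {

    // Starts a thread that fails and returns what the default handler recorded.
    static String runAndCapture() {
        AtomicReference<String> seen = new AtomicReference<>();
        Thread.UncaughtExceptionHandler previous = Thread.getDefaultUncaughtExceptionHandler();

        // Used for every thread that has no handler of its own.
        Thread.setDefaultUncaughtExceptionHandler(
                (thread, error) -> seen.set(thread.getName() + ": " + error.getMessage()));
        try {
            Thread worker = new Thread(() -> {
                throw new IllegalStateException("boom");
            }, "worker-1");
            worker.start();
            worker.join(); // the handler runs on the worker before it terminates
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            Thread.setDefaultUncaughtExceptionHandler(previous);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println(runAndCapture());
    }
}
```

A handler set on an individual thread with setUncaughtExceptionHandler takes precedence over the default one.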

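6. What is a thread group, and why is it advised not to use thread groups in Java?

Ans. A thread group comes in handy when you want to manage a bunch of threads as a unit rather than individually.
It provides a mechanism for collecting multiple threads into a single object.
Thread groups are generally discouraged because their group-wide stop(), suspend() and resume() methods are deprecated and unsafe, and the Executor framework covers the same need.


The runtime system puts a thread into a thread group during thread construction.


- You cannot move a thread to a new group after the thread has been created. To choose the group, pass it to one of these constructors: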


public Thread(ThreadGroup group, Runnable target)
public Thread(ThreadGroup group, String name)
public Thread(ThreadGroup group, Runnable target, String name)


Example ::
ThreadGroup myThreadGroup = new ThreadGroup("My Group of Threads");
Thread myThread = new Thread(myThreadGroup, "a thread for my group");
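As a small runnable check of the point above, the sketch below builds a thread with the three-argument constructor and reads its group back. The class and method names are hypothetical; the group and thread names are taken from the example.

```java
class ThreadGroupSketch {

    // Creates a thread inside a named group and reports which group it belongs to.
    static String groupNameOfNewThread() {
        ThreadGroup group = new ThreadGroup("My Group of Threads");
        Thread worker = new Thread(group, () -> { }, "a thread for my group");
        // The group is fixed at construction; Thread has no setter to change it.
        return worker.getThreadGroup().getName();
    }

    public static void main(String[] args) {
        System.out.println(groupNameOfNewThread());
    }
}
```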

7. Why is the Executor framework better than creating and managing threads in the application?
 
8.  Difference between Executor and Executors in Java?
  
9. How do you find which thread is taking the most CPU on a Windows or Linux server?

10. What is the ThreadLocal class used for?
Ans. It is an alternative way to achieve thread safety in Java. It eliminates the need for synchronization by giving each thread its own separate copy of the object instead of sharing one instance.
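The sketch below illustrates the per-thread copy with a simple counter. The class `ThreadLocalSketch` and its methods are hypothetical names for this example; `ThreadLocal.withInitial` requires Java 8 or later.

```java
class ThreadLocalSketch {

    // Each thread that calls COUNTER.get() lazily receives its own int[] starting at 0.
    private static final ThreadLocal<int[]> COUNTER =
            ThreadLocal.withInitial(() -> new int[1]);

    // No synchronization needed: the array is visible only to the calling thread.
    static int incrementAndGet() {
        int[] cell = COUNTER.get();
        cell[0]++;
        return cell[0];
    }

    // Increments on the calling thread, then reports what a brand-new thread sees.
    static int countSeenByOtherThread() {
        incrementAndGet();
        incrementAndGet();
        int[] result = new int[1];
        Thread other = new Thread(() -> result[0] = incrementAndGet());
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result[0]; // the new thread started from its own zero
    }
}
```

In a thread pool, where threads are reused, it is usually wise to call remove() on the ThreadLocal when a task finishes so that state does not carry over to the next task.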



11. How will you design your own custom thread pool in Java, without using the one provided by the JDK?


 Ans. Follow the steps below.
 
1)- Take a BlockingQueue variable (to hold the submitted tasks).
2)- Take a variable that is a list of Thread (the worker threads).
3)- Take a boolean to track whether the pool is running or stopped.
4)- Create a parameterized constructor that takes the task capacity and the number of threads.
5)- Create enqueue and dequeue methods: one to add a task to the queue, one for the workers to take tasks from it.
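The steps above can be sketched as follows. This is one possible minimal design, not a replacement for `java.util.concurrent.ThreadPoolExecutor`; the class `SimpleThreadPool`, its method names, and the 100 ms poll interval are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SimpleThreadPool {
    private final BlockingQueue<Runnable> taskQueue;        // step 1
    private final List<Thread> workers = new ArrayList<>(); // step 2
    private volatile boolean stopped = false;               // step 3

    // Step 4: the constructor takes the number of threads and the task capacity.
    SimpleThreadPool(int threadCount, int maxTasks) {
        taskQueue = new LinkedBlockingQueue<>(maxTasks);
        for (int i = 0; i < threadCount; i++) {
            Thread worker = new Thread(this::workerLoop, "pool-worker-" + i);
            workers.add(worker);
            worker.start();
        }
    }

    // Step 5 (enqueue): blocks the caller while the queue is full.
    void execute(Runnable task) throws InterruptedException {
        if (stopped) {
            throw new IllegalStateException("pool is stopped");
        }
        taskQueue.put(task);
    }

    // Step 5 (dequeue): each worker keeps taking tasks until the pool is
    // stopped and the queue has been drained.
    private void workerLoop() {
        while (!stopped || !taskQueue.isEmpty()) {
            try {
                Runnable task = taskQueue.poll(100, TimeUnit.MILLISECONDS);
                if (task != null) {
                    task.run();
                }
            } catch (InterruptedException e) {
                return; // treat interruption as a request to exit
            } catch (RuntimeException e) {
                // a failing task must not kill the worker thread
            }
        }
    }

    // Stops accepting tasks, lets queued tasks finish, then waits for the workers.
    void shutdownAndAwait() throws InterruptedException {
        stopped = true;
        for (Thread worker : workers) {
            worker.join();
        }
    }

    // Usage example: runs taskCount trivial tasks and returns how many completed.
    static int runDemo(int taskCount) {
        AtomicInteger done = new AtomicInteger();
        SimpleThreadPool pool = new SimpleThreadPool(3, 10);
        try {
            for (int i = 0; i < taskCount; i++) {
                pool.execute(done::incrementAndGet);
            }
            pool.shutdownAndAwait();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }
}
```

Using a bounded queue means a fast producer is slowed down by `put` instead of filling memory with pending tasks.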
 

12. Why are the wait, notify and notifyAll methods in the Object class and not in the Thread class (sleep, by contrast, is a static method of Thread)?

13. Implement Runnable functionality on your own.
14. If a thread holds a class-level lock as well as an instance-level lock, are the two mutually exclusive or not?

15. What is the difference between yield() and join()?

16. What are the thread states?

17. What is the best way to handle an exception thrown by a Callable in multi-threading?
18. Difference between CountDownLatch and CyclicBarrier.

19. sleep vs wait?
  
20. How do you make a class synchronized?

21. What is the best way to handle exceptions when using Callable?

22. Implement a BlockingQueue to solve the producer-consumer problem.

23. How would you implement CountDownLatch?

  
 

Java Collection and DataStructure Interview Questions







1. What is your strategy for handling ConcurrentModificationException in a Java program?

2. How does HashMap identify the bucket index for an element being inserted?

3. What is the default capacity (number of buckets) of HashMap, and what is the formula for the size after rehashing?
4. Comparable VS Comparator
5. Write Comparator for employee class having id and name?
6. ArrayList VS LinkedList
7. Can we add null to a TreeSet? If not, why?

8. Given the code below:
   ArrayList<String> s = new ArrayList<String>();
   ArrayList<Integer> s1 = new ArrayList<Integer>();

    System.out.println(s1.equals(s));

Why is the answer always true?
 

9. Implement own linked list.

10. Implement own linked list.

11. What is the difference between HashMap and a synchronized map?

12. How does a blocking queue work?

13. Which collection should you use to keep elements like a list, without using ArrayList or LinkedList?

14. How do you write your own doubly linked list? How do you handle boundary conditions?

15. Why does Java have the "for loop", the for-each loop, Iterator and ListIterator for traversing an ArrayList?
  List the differences between them and give a valid reason for the one you prefer.


Ans
 for loop - Good for traversing an ArrayList only. If you are not sure of the underlying data structure, do not use an index-based for loop: on a linked list every get(i) walks the list, so the traversal has N^2 time complexity.
For Each Loop -
1 - can call methods of the element

Iterator :
1 - can call methods of the element
2 - can use iter.remove() to remove the current element from the list

ListIterator :
1 - can call methods of the element
2 - can use iter.remove() to remove the current element from the list
3 - can use iter.add(...) to insert a new element into the list, between the current element and the one iter.next() would return
4 - can use iter.set(...) to replace the current element
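The Iterator and ListIterator capabilities listed above can be sketched as below. The class `TraversalSketch`, its method names, and the sample rule (names shorter than three characters, names starting with "a") are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.ListIterator;
import java.util.Locale;

class TraversalSketch {

    // Iterator: safe removal of the current element while traversing.
    static List<String> removeShortNames(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        for (Iterator<String> it = copy.iterator(); it.hasNext(); ) {
            if (it.next().length() < 3) {
                it.remove();
            }
        }
        return copy;
    }

    // ListIterator: replace the current element and insert a new one after it.
    static List<String> upperCaseAndMark(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        for (ListIterator<String> it = copy.listIterator(); it.hasNext(); ) {
            String name = it.next();
            it.set(name.toUpperCase(Locale.ROOT));   // replaces the element just returned
            if (name.startsWith("a")) {
                it.add("<after " + name + ">");      // inserted before the next element
            }
        }
        return copy;
    }
}
```

Removing from the list directly inside a for-each loop, instead of through the iterator, is the usual cause of ConcurrentModificationException.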




16. What is the behavior of a Java program in which infinite recursion takes place with limited stack memory? Does the program complete, or is a StackOverflowError thrown?

17. How to handle multiple request using concurrency utilities?