16 Apr 2016

Hadoop Interview Questions

Capgemini Hadoop Developer Interview Questions

  1. What is speculative execution in Hadoop?
  2. How are big data problems solved in the retail sector?
  3. What is the largest amount of data that you have handled?

Amazon Hadoop Developer Interview Questions

  1. What is the difference between TextInputFormat and KeyValueTextInputFormat in Hadoop?
  2. A log file contains entries like: user A visited page 1, user B visited page 3, user C visited page 2, user D visited page 4. How will you implement a Hadoop job to answer the following queries in real time: which page was visited by user C more than 4 times in a day, and which page was visited by only one user exactly 3 times in a day?
  3. What is the advantage of having a Distributed Cache in Hadoop?
  4. You have a file that contains 200 billion URLs. How will you find the first unique URL using Hadoop MapReduce?
  5. What is InputSplit in Hadoop?
  6. How will you scale a system to handle huge amounts of unstructured data?
  7. Assume that the web server creates a log file with timestamp and query. How will you design the Hadoop architecture (explaining how you will store the data) so that it can return the top 15 queries made in the last 12 hours?
  8. You have a huge file (several GB in size) that contains data in multiple languages. Find the n most frequently occurring patterns in the file using Hadoop MapReduce.
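
For question 4 above (the first unique URL), one possible line of reasoning is to have the mapper emit each URL with its position in the input, have the reducer keep only URLs seen exactly once, and then take the smallest surviving position. The sketch below simulates that flow in plain Python on a small list. It is not Hadoop code: the function names (`map_url`, `reduce_url`, `first_unique_url`) are hypothetical, and the in-memory grouping only stands in for the shuffle that a real cluster would perform.

```python
from collections import defaultdict


def map_url(position, url):
    """Mapper (hypothetical): key is the URL, value is its input position."""
    yield url, position


def reduce_url(url, positions):
    """Reducer (hypothetical): emit (position, url) only for URLs seen once."""
    positions = list(positions)
    if len(positions) == 1:
        yield positions[0], url


def first_unique_url(urls):
    # Simulated shuffle: group mapper output by key (the URL).
    groups = defaultdict(list)
    for position, url in enumerate(urls):
        for key, value in map_url(position, url):
            groups[key].append(value)

    # Reduce phase: collect every URL that occurred exactly once.
    unique = [pair
              for url, positions in groups.items()
              for pair in reduce_url(url, positions)]

    # Final pass: the unique URL with the smallest position comes first.
    return min(unique)[1] if unique else None
```

Calling `first_unique_url(["a", "b", "a", "c", "b"])` returns `"c"`. On a real data set the final minimum would itself need a second job or a single reducer, since the candidates are spread across reducers.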

MindTree Hadoop Developer Interview Questions

  1. What is a heap error and how can you fix it?
  2. How many types of joins does MapReduce support and when will you use each type?
  3. What are sinks and sources in Apache Flume when working with Twitter data?
  4. How many JVMs run on a DataNode and what is their use?
  5. If you have configured Java version 8 for Hadoop and Java version 7 for Apache Spark, how will you set the environment variables in the basic configuration file?
  6. Differentiate between bash and basic profile.

Infosys Hadoop Developer Interview Questions

  1. Implement a word count program in Apache Hive.
  2. Differentiate between bucketing and partitioning. When will you use each of these?
  3. How can you implement global sort and partitioning logic in Apache Hive?

Apple Hadoop Developer Interview Questions

  1. There are 100,000 files spread across multiple servers which need to be processed. How will you do that using Hadoop?
  2. What are the Map and Reduce functions in the standard Hadoop “Hello World” word count program?
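
As a rough illustration of question 2 above, the word count job boils down to a map function that emits `(word, 1)` for every word and a reduce function that sums the values for each word. The sketch below mimics that in plain Python rather than with the Hadoop Java API. The names `map_words`, `reduce_counts` and `word_count` are hypothetical, lowercasing the words is an assumption, and the dictionary grouping stands in for the shuffle step.

```python
from collections import defaultdict


def map_words(line):
    """Map (hypothetical): emit (word, 1) for each whitespace-separated word."""
    for word in line.lower().split():  # lowercasing is an assumption
        yield word, 1


def reduce_counts(word, counts):
    """Reduce (hypothetical): sum all the 1s emitted for one word."""
    return word, sum(counts)


def word_count(lines):
    # Simulated shuffle: group the mapper output by word.
    groups = defaultdict(list)
    for line in lines:
        for word, one in map_words(line):
            groups[word].append(one)

    # Run the reducer once per distinct word.
    return dict(reduce_counts(word, counts) for word, counts in groups.items())
```

For example, `word_count(["Hello World", "hello Hadoop"])` gives `{"hello": 2, "world": 1, "hadoop": 1}`.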

Bloomberg LP Hadoop Interview Questions

  1. How will you manage multiple nodes together without having a master node in your architecture design?

Intuit Hadoop Developer Interview Questions

  1. Find the occurrence of every word (the number of pages on which the word appears) in a huge file or book using Hadoop MapReduce.

Accenture Hadoop Developer Interview Questions

  1. Can you load 3TB of data in Apache Hive?

Microsoft Hadoop Developer Interview Questions

  1. Explain the working of Hadoop architecture with various components.
  2. Why do you need HBase when you can use Hive to query Hadoop?

Expedia Hadoop Developer Interview Questions

  1. Every day a new log file is created that contains User ID details. Given a range of n days, how will you find the top 5 users?

Google Hadoop Developer Interview Questions

  1. There is a table employee (employee_id int, employee_name varchar, employee_salary decimal, employee_manager_id int). We want to get the details of those employees who have a higher salary than their manager or do not have a manager at all. Implement the mapper and reducer functions to achieve this using Hadoop.
  2. Can you design a counter across all the Google servers using the Hadoop stack?
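
One way to think about question 1 above is as a reduce-side self-join: the mapper emits every row twice, once keyed by its own employee_id (as a potential manager) and once keyed by its employee_manager_id (as a report), and the reducer compares salaries within each group. The plain-Python sketch below simulates this under several assumptions: rows are tuples in the column order of the question, a missing manager is represented as `None`, salaries are plain numbers, and an employee whose manager id matches no row is not reported. All function names are hypothetical, and none of this is Hadoop API code.

```python
from collections import defaultdict


def map_employee(emp):
    """Mapper (hypothetical): emit the row under its own id and its manager id."""
    emp_id, name, salary, manager_id = emp
    yield emp_id, ("M", emp)      # "M": this row may be someone's manager
    yield manager_id, ("E", emp)  # "E": this row is a report (key None = no manager)


def reduce_employees(key, tagged):
    """Reducer (hypothetical): keep reports with no manager or a higher salary."""
    tagged = list(tagged)
    managers = [emp for tag, emp in tagged if tag == "M"]
    reports = [emp for tag, emp in tagged if tag == "E"]
    for emp in reports:
        if key is None or (managers and emp[2] > managers[0][2]):
            yield emp


def employees_to_report(rows):
    # Simulated shuffle: group the mapper output by key.
    groups = defaultdict(list)
    for row in rows:
        for key, value in map_employee(row):
            groups[key].append(value)

    result = []
    for key, tagged in groups.items():
        result.extend(reduce_employees(key, tagged))
    return sorted(result)  # sorted by employee_id for stable output
```

With the rows `(1, "Ann", 100, None)`, `(2, "Bob", 120, 1)`, `(3, "Cy", 90, 1)` and `(4, "Di", 95, 3)`, the function returns the rows for Ann (no manager), Bob (120 > 100) and Di (95 > 90).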

Twitter Hadoop Interview Questions

  1. Suggest an algorithm to design Twitter trends.
  2. Will you use Apache Pig or Hadoop MapReduce for ad-hoc and scheduled jobs?

Facebook Hadoop Interview Questions

  1. There is a huge file that cannot fit into memory. You have to calculate the number of unique words present in the file. Assume that you have more than one system available and the problem can be distributed.
  2. How does Facebook handle the single point of failure problem?
  3. Do you know about the AvatarNode implementation at Facebook?
  4. Facebook decides to award an Audi to the user who submits the billionth search query on a particular day, announcing it with a banner on their search results page. Considering the scale of Facebook, how will you implement it?
  5. How does Facebook store users’ status updates and likes?
  6. On which database are all Facebook messages sent from desktop and mobile persisted?

TCS Hadoop Developer Interview Questions

  1. What is the difference between data and big data?
  2. Which object will you use to track the progress of a job?

Hadoop Developer Interview Questions asked at other Top Tech Companies like Cognizant, CTS, Wipro

  1. What Hadoop components will you use to design a Craigslist-based architecture?
  2. Why can you not use Java primitive data types in Hadoop MapReduce?
  3. Can HDFS blocks be broken?
  4. Does Hadoop replace data warehousing systems?
  5. How will you protect the data at rest?
  6. Propose a design to develop a system that can handle ingestion of both periodic data and real-time data.
  7. A folder contains 10,000 files, each larger than 3 GB. The files contain users, their names and dates. How will you get the count of all the unique users across the 10,000 files using Hadoop?
  8. “File could only be replicated to 0 nodes, instead of 1.” Have you ever come across this message? What does it mean?
  9. How do reducers communicate with each other?
  10. How can you back up file system metadata in Hadoop?
  11. What do you understand by a straggler in the context of MapReduce?
