Hadoop Oozie

Topics 

  • What is Oozie, and why use it?
  • Workflow definition file
  • Workflow application
  • Control nodes
  • Action nodes – MapReduce node
  • Steps for building and running a workflow application
  • Action nodes – Other action nodes
  • Oozie coordination

What is Oozie, and why use it?

Why Oozie? 

  • In real-life situations, multiple Hadoop jobs commonly need to be run in a controlled, scheduled manner:
    • MapReduce job 
    • Pig job 
    • Streaming job 
    • HDFS operations (mkdir, chmod, etc.)
  • Ad-hoc solutions were used before Oozie, but they were not adequate:
    • Shell scripts 
    • Cron 
    • Custom job control 
  • A common, simple solution was needed
    • This is why Oozie was born

What is Oozie?

  • Oozie is a server-based workflow scheduling system for managing Hadoop jobs
  • The Oozie server receives job submission requests from the Oozie client
    • The Oozie server runs as a Java web application on Tomcat
  • The Oozie server has two main parts:
    • Workflow engine – stores and runs workflows composed of different types of Hadoop jobs 
    • Coordination engine – runs workflow jobs based on predefined schedules and data availability
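To make the client/server split concrete, here is a minimal Python sketch of what a job submission request to the Oozie server can look like. It assumes the Oozie Web Services API endpoint POST /oozie/v1/jobs (default port 11000) and a Hadoop-style XML configuration body carrying user.name and oozie.wf.application.path; the server address, user name and HDFS path are hypothetical. The sketch only builds the request and does not send it.

```python
import urllib.request
from xml.sax.saxutils import escape


def build_job_config(properties: dict) -> str:
    """Render job properties as a Hadoop-style <configuration> XML document."""
    parts = ['<?xml version="1.0" encoding="UTF-8"?>', "<configuration>"]
    for name, value in properties.items():
        # Escape names/values so characters like < and & stay valid XML.
        parts.append(
            f"  <property><name>{escape(name)}</name>"
            f"<value>{escape(str(value))}</value></property>"
        )
    parts.append("</configuration>")
    return "\n".join(parts)


def build_submit_request(oozie_url: str, properties: dict,
                         start: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST to the assumed Oozie jobs endpoint."""
    url = oozie_url.rstrip("/") + "/v1/jobs"
    if start:
        url += "?action=start"  # submit and start the job in one call
    body = build_job_config(properties).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/xml;charset=UTF-8"},
    )


if __name__ == "__main__":
    req = build_submit_request(
        "http://localhost:11000/oozie",  # hypothetical server address
        {
            "user.name": "alice",  # hypothetical user
            # hypothetical HDFS directory holding the workflow application
            "oozie.wf.application.path": "hdfs://namenode:8020/user/alice/my-wf",
        },
    )
    print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would actually send it; the server is expected to reply with a JSON body containing the new job's id. In day-to-day use, the oozie command-line client (for example oozie job -config job.properties -run) issues this kind of request for you.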
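The workflow engine's core idea, following each action's success or failure transition until the workflow ends or is killed, can be illustrated with a toy Python model. This is not Oozie's implementation or API: the Action class, the run_workflow function and all node names are hypothetical, and each run callable merely stands in for a real Hadoop job (MapReduce, Pig, an HDFS operation). Real workflows are declared in an XML workflow definition file.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Action:
    run: Callable[[], bool]  # stand-in for a Hadoop job; True means success
    ok: str                  # next node on success
    error: str               # next node on failure


def run_workflow(start: str, actions: Dict[str, Action],
                 end: str = "end", kill: str = "kill") -> List[str]:
    """Follow ok/error transitions from `start` until `end` or `kill`.

    Returns the names of the visited nodes, including the final one.
    """
    path: List[str] = []
    node = start
    while node not in (end, kill):
        if node in path:
            # Oozie workflows are acyclic, so revisiting a node is an error.
            raise ValueError(f"cycle detected at {node!r}")
        path.append(node)
        action = actions[node]
        node = action.ok if action.run() else action.error
    path.append(node)
    return path


if __name__ == "__main__":
    actions = {
        "mkdir": Action(run=lambda: True, ok="mapreduce", error="kill"),
        "mapreduce": Action(run=lambda: False, ok="pig", error="kill"),  # simulated failure
        "pig": Action(run=lambda: True, ok="end", error="kill"),
    }
    print(run_workflow("mkdir", actions))  # ['mkdir', 'mapreduce', 'kill']
```

Because the simulated MapReduce step fails, control takes its error transition and the Pig step never runs, which is the kind of controlled execution that shell scripts and cron had to reimplement by hand.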
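The coordination engine's two triggers, a predefined schedule and data availability, can likewise be sketched as a toy Python model. It assumes a fixed frequency between a start and end time and a hypothetical input_exists check (in practice this would correspond to, say, an HDFS directory for each period); none of these names are Oozie's API, and real coordinator jobs are declared in XML.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterator, List


def nominal_times(start: datetime, end: datetime,
                  frequency: timedelta) -> Iterator[datetime]:
    """Yield every scheduled run time in the half-open range [start, end)."""
    t = start
    while t < end:
        yield t
        t += frequency


def ready_runs(start: datetime, end: datetime, frequency: timedelta,
               now: datetime,
               input_exists: Callable[[datetime], bool]) -> List[datetime]:
    """Return runs whose time has arrived AND whose input data is available."""
    return [t for t in nominal_times(start, end, frequency)
            if t <= now and input_exists(t)]


if __name__ == "__main__":
    runs = ready_runs(
        start=datetime(2024, 1, 1, 0, 0),
        end=datetime(2024, 1, 1, 6, 0),
        frequency=timedelta(hours=1),
        now=datetime(2024, 1, 1, 4, 30),
        input_exists=lambda t: t.hour != 2,  # pretend the 02:00 data is late
    )
    print([t.strftime("%H:%M") for t in runs])  # ['00:00', '01:00', '03:00', '04:00']
```

The 02:00 run is held back until its input appears and the 05:00 run is not due yet, so only the remaining runs would trigger their workflow.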
