Hadoop Oozie


  • What is Oozie, and why use it?
  • Workflow definition file 
  • Workflow application 
  • Control nodes 
  • Action nodes – MapReduce node 
  • Steps for building and running Workflow application 
  • Action nodes – Other Action nodes 
  • Oozie coordination

What is Oozie, and why use it?

Why Oozie? 

  • In real-world deployments, multiple Hadoop jobs often need to run in a controlled, scheduled manner
    • MapReduce job 
    • Pig job 
    • Streaming job 
    • HDFS operations (mkdir, chmod, etc.)
  • Ad-hoc solutions were used before Oozie, but they were not adequate
    • Shell scripts 
    • Cron 
    • Custom job control 
  • A common, simple solution was needed
    • This is why Oozie was created

What is Oozie?

  • A server-based workflow scheduling system to manage Hadoop jobs 
  • The Oozie server receives job submission requests from the Oozie client
    • The Oozie server is built on top of Tomcat
  • The Oozie server has two main parts:
    • Workflow engine – stores and runs workflows composed of different types of Hadoop jobs 
    • Coordination engine – runs workflow jobs based on predefined schedules and data availability
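
The split between the two engines can be pictured with a small toy model. This is only a sketch in Python and is not Oozie's API: the class names WorkflowEngine and CoordinationEngine, the tick method, the "daily-report" workflow and the /data/in path are all hypothetical, chosen for illustration. It assumes a workflow is just an ordered list of actions and that "data availability" means a set of required paths is present.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Workflow:
    """Toy stand-in for a workflow: named actions run in a fixed order."""
    name: str
    actions: List[str]  # e.g. ["mkdir", "mapreduce", "pig"]


@dataclass
class WorkflowEngine:
    """Hypothetical model of the workflow engine: stores and runs workflows."""
    stored: Dict[str, Workflow] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def store(self, wf: Workflow) -> None:
        self.stored[wf.name] = wf

    def run(self, name: str) -> None:
        for action in self.stored[name].actions:
            # A real engine would launch a Hadoop job here; we only record it.
            self.log.append(f"{name}:{action}")


@dataclass
class CoordinationEngine:
    """Hypothetical model of the coordination engine: starts a stored
    workflow once its scheduled time has come and its input data exists."""
    engine: WorkflowEngine

    def tick(self, name: str, now: int, start_at: int,
             required: Set[str], available: Set[str]) -> bool:
        # Both conditions must hold: schedule reached and all inputs present.
        if now >= start_at and required <= available:
            self.engine.run(name)
            return True
        return False


engine = WorkflowEngine()
engine.store(Workflow("daily-report", ["mkdir", "mapreduce", "pig"]))
coordinator = CoordinationEngine(engine)

# Scheduled time reached, but the input directory is missing: nothing runs.
ran_early = coordinator.tick("daily-report", now=10, start_at=10,
                             required={"/data/in"}, available=set())
# Input has arrived: the workflow runs, action by action.
ran_later = coordinator.tick("daily-report", now=11, start_at=10,
                             required={"/data/in"}, available={"/data/in"})
print(ran_early, ran_later, engine.log)
```

In this model the workflow engine knows nothing about time or data; the coordination engine only decides when to hand a stored workflow to it.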

