Hadoop Pig Part I


  • What is Pig, and why use it?
  • Executing Pig
  • Pig Latin concepts 
  • Pig Latin script: operators & functions 
  • Pig Latin script: structure 
  • Pig Latin relational operators 
  • Pig examples

What is Apache Pig, and why use it?

  • Apache Pig is a platform for analyzing large data sets. It consists of two parts:
    • Pig Latin: a high-level data-flow scripting language for expressing data analysis programs
    • Pig runtime: infrastructure for evaluating these scripts 
  • Pig scripts written in Pig Latin are automatically converted into MapReduce jobs by the Pig runtime 
  • Pig scripts are typically much shorter and quicker to write and run than equivalent hand-written MapReduce code 
    • Designed to be used by data analysts without the help of MapReduce Java developers 
    • Pig is widely used as an alternative to writing MapReduce jobs by hand 
    • Used extensively at Yahoo, LinkedIn, Twitter, Netflix, and others
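
As a rough illustration of the points above, here is a minimal word-count sketch in Pig Latin. The input file `input.txt` and the output directory `wordcount_out` are hypothetical names chosen for this example, not part of the original material. Each statement describes one step of the data flow; the Pig runtime, not the script author, turns the whole script into MapReduce jobs.

```pig
-- Minimal word-count sketch (assumed file names: input.txt, wordcount_out).

-- Load each line of the input as a single chararray field.
lines = LOAD 'input.txt' USING TextLoader() AS (line:chararray);

-- Split every line into words and flatten the resulting bag into one word per row.
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;

-- Group identical words together.
grouped = GROUP words BY word;

-- Count the rows in each group.
counts = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;

-- Write the (word, count) pairs to the output directory.
STORE counts INTO 'wordcount_out';
```

The same task written directly against the MapReduce Java API usually needs a mapper class, a reducer class, and driver code, which is the gap in effort the bullets above refer to.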

