Pig Functions
Eval Functions
Load/Store Functions
Math Functions
String Functions
Tuple, Bag, Map Functions
User Defined Functions (UDFs)
Pig provides extensive support for user defined functions (UDFs) as a way to specify custom processing. Pig UDFs can currently be implemented in several languages, including Java, Python, JavaScript and Ruby.

Registering UDFs

Registering Java UDFs:

    -- register_java_udf.pig
    register 'your_path_to_piggybank/piggybank.jar';
    divs = load 'NYSE_dividends' as (exchange:chararray, symbol:chararray, date:chararray, dividends:float);

Registering Python UDFs (the Python script must be in your current directory):

    -- register_python_udf.pig
    register 'production.py' using jython as bballudfs;
    players = load 'baseball' as (name:chararray, team:chararray, pos:bag{t:(p:chararray)}, bat:map[]);

Writing UDFs

Java UDFs:

    package myudfs;

    import java.io.IOException;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;
    public class UPPER extends EvalFunc<String> {
        public String exec(Tuple input) throws IOException {
            // Return null for missing input so the row yields null instead of failing the job.
            if (input == null || input.size() == 0 || input.get(0) == null)
                return null;
            try {
                String str = (String) input.get(0);
                return str.toUpperCase();
            } catch (Exception e) {
                throw new IOException("Caught exception processing input row ", e);
            }
        }
    }

Python UDFs:

    #!/usr/bin/python

    # Square - square of a number of any data type.
    # @outputSchemaFunction names a script delegate function that defines
    # the schema for this function depending on the input type.
    @outputSchemaFunction("squareSchema")
    def square(num):
        return ((num)*(num))

    # @schemaFunction marks the delegate function; it is not registered with Pig as a UDF.
    @schemaFunction("squareSchema")
    def squareSchema(input):
        return input

    # Percent - percentage.
    # @outputSchema defines the schema for a script UDF in a format that
    # Pig understands and is able to parse.
    @outputSchema("percent:double")
    def percent(num, total):
        return num * 100 / total
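The decorators used by the Python UDFs above are supplied by Pig when the script is registered, so the file will not run as-is in a plain Python interpreter. The following is a minimal sketch for trying the functions locally before registering them. The three decorator definitions here are hypothetical stand-ins written for this example, not Pig's own implementation; they only record the schema string on the function.

```python
# Hypothetical stand-ins for the decorators Pig provides at registration time.
# Each one just records its argument on the function and returns it unchanged.
def outputSchema(schema):
    def decorator(func):
        func.outputSchema = schema
        return func
    return decorator


def outputSchemaFunction(name):
    def decorator(func):
        func.outputSchemaFunction = name
        return func
    return decorator


def schemaFunction(name):
    def decorator(func):
        func.schemaFunction = name
        return func
    return decorator


# Square of a number; the output schema is delegated to squareSchema.
@outputSchemaFunction("squareSchema")
def square(num):
    return num * num


# Delegate: the output schema is the same as the input schema.
@schemaFunction("squareSchema")
def squareSchema(input):
    return input


# Percentage of num relative to total, declared to Pig as a double.
@outputSchema("percent:double")
def percent(num, total):
    return num * 100 / total


if __name__ == "__main__":
    print(square(4))
    print(square(1.5))
    print(percent(1.0, 4.0))
```

One thing this kind of local check can surface: Jython implements Python 2, where dividing two integers truncates, so percent may return a whole number for integer inputs even though its schema declares a double. Passing floating-point values, as in the calls above, avoids the difference.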