Java Spark User Defined Functions

Dulaj Rajitha
2 min read · Apr 9, 2019
Create Your Own Functions

A Spark Dataset/DataFrame is a distributed collection of data that supports operations such as map, flatMap, filter, and reduce.

The set of built-in functions in spark.sql.functions is limited. We will run into scenarios where we need to compute a new column value from one or more existing columns. That’s where a custom UDF comes into play.

Steps

Register UDF

First, we have to register the UDF under a name with the Spark SQL context.

We can do that as follows.

sparkSession
    .sqlContext()
    .udf()
    .register( "sampleUDF", sampleUdf(), DataTypes.DoubleType );

Here, the first argument is the name that will be used when calling the UDF. The second argument is the UDF itself, which can be a method or a lambda. The third argument is the return type of the UDF.

Call UDF

Then you can call the registered function by the name provided, passing it a column from the dataframe.

dataframe
    .withColumn( "newColumn",
        functions.callUDF( "sampleUDF", dataframe.col( column1 ) ) );

The functions.callUDF method produces a new column, and the withColumn method returns a new dataframe with that column added.

Code sample

Sample UDF with single argument and Double return type
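A minimal sketch of such a UDF, tying the register and call steps together. It assumes Spark SQL is on the classpath and runs against a local master. The class name SampleUdfExample, the helper lengthAsDouble, the column name column1, and the two sample rows are illustrative assumptions and not part of the original sample.

```java
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.functions;
import org.apache.spark.sql.types.DataTypes;

public class SampleUdfExample {

    // Plain Java logic (hypothetical helper): string length as a double, 0.0 for null.
    static double lengthAsDouble(String s) {
        return s == null ? 0.0 : s.length() * 1.0;
    }

    // UDF1<String, Double>: one String argument, Double return type.
    static UDF1<String, Double> sampleUdf() {
        return SampleUdfExample::lengthAsDouble;
    }

    public static void main(String[] args) {
        // Local session for illustration only.
        SparkSession sparkSession = SparkSession.builder()
                .appName("sample-udf")
                .master("local[*]")
                .getOrCreate();

        // Step 1: register the UDF under a name, with its return type.
        sparkSession.udf().register("sampleUDF", sampleUdf(), DataTypes.DoubleType);

        // A tiny single-column dataframe to apply the UDF to.
        Dataset<Row> dataframe = sparkSession
                .createDataset(Arrays.asList("spark", "udf"), Encoders.STRING())
                .toDF("column1");

        // Step 2: call the UDF by name and attach the result as a new column.
        dataframe
                .withColumn("newColumn",
                        functions.callUDF("sampleUDF", dataframe.col("column1")))
                .show();

        sparkSession.stop();
    }
}
```

Keeping the logic in a plain static method means it can be unit tested without starting a Spark session.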

UDF types

  • UDF0: UDF with no arguments
  • UDF1 to UDF22: UDFs with 1 to 22 arguments
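To make the arity naming concrete, here is a hedged sketch that registers one UDF0 and one UDF2. It assumes a Spark version that ships UDF0 (2.3 or later). The class name UdfArityExample, the UDF names constantLabel and combinedLength, and the helper safeLength are made up for illustration.

```java
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF0;
import org.apache.spark.sql.api.java.UDF2;
import org.apache.spark.sql.types.DataTypes;

public class UdfArityExample {

    // Hypothetical helper: null-safe string length.
    static int safeLength(String s) {
        return s == null ? 0 : s.length();
    }

    // UDF0<String>: no arguments, returns a String.
    static UDF0<String> constantLabel() {
        return () -> "n/a";
    }

    // UDF2<String, String, Integer>: two String arguments, returns an Integer.
    static UDF2<String, String, Integer> combinedLength() {
        return (a, b) -> safeLength(a) + safeLength(b);
    }

    // The return DataType passed to register must match the UDF's Java return type.
    static void registerAll(SparkSession sparkSession) {
        sparkSession.udf().register("constantLabel", constantLabel(), DataTypes.StringType);
        sparkSession.udf().register("combinedLength", combinedLength(), DataTypes.IntegerType);
    }

    public static void main(String[] args) {
        SparkSession sparkSession = SparkSession.builder()
                .appName("udf-arity")
                .master("local[*]")
                .getOrCreate();

        registerAll(sparkSession);

        // Registered UDFs can also be called by name from Spark SQL.
        sparkSession
                .sql("SELECT combinedLength('ab', 'cde') AS total, constantLabel() AS label")
                .show();

        sparkSession.stop();
    }
}
```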

Serializable requirements

All UDF methods must be serializable.

For example, if the sample UDF method above uses an instance field of a class that is not serializable, the program will fail with a serialization error, because the enclosing object has to be serialized together with the UDF.
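The effect can be reproduced with plain Java serialization, without Spark. In this sketch the StringToDouble interface is a hypothetical stand-in for Spark's UDF1 (which also extends java.io.Serializable), and Scaler is an invented, deliberately non-serializable class. A lambda that reads an instance field captures `this`, so serializing it fails. Copying the field into a local variable first avoids the capture.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class UdfSerializationSketch {

    // Hypothetical stand-in for Spark's UDF1; like UDF1, it extends Serializable.
    interface StringToDouble extends Serializable {
        Double call(String s);
    }

    // Deliberately NOT Serializable, like a typical driver-side class.
    static class Scaler {
        private final double factor;

        Scaler(double factor) {
            this.factor = factor;
        }

        // Reads the instance field, so the lambda captures `this` (the Scaler).
        StringToDouble capturingField() {
            return s -> s.length() * factor;
        }

        // Copies the field into a local first, so only a double is captured.
        StringToDouble capturingLocal() {
            final double localFactor = factor;
            return s -> s.length() * localFactor;
        }
    }

    // Returns true if Java serialization accepts the object, false otherwise.
    static boolean isSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Scaler scaler = new Scaler(2.0);
        System.out.println(isSerializable(scaler.capturingField())); // prints false
        System.out.println(isSerializable(scaler.capturingLocal())); // prints true
    }
}
```

The same two remedies apply to a real UDF: copy the needed values into local variables before building the lambda, or make the enclosing class Serializable.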

UDF as lambdas

UDFs can also be registered as lambdas; a named method is not necessary.

sparkSession.sqlContext()
    .udf()
    .register( "sampleUDFLambda", ( String s1 ) -> {
        // Return the string length as a double, or 0.0 for null input
        if ( s1 != null )
        {
            return s1.length() * 1.0;
        }
        else
        {
            return 0.0;
        }
    }, DataTypes.DoubleType );

Please refer to the built-in functions provided by the Apache Spark SQL functions library here.

Please leave a comment below if you have any questions or feedback! I’d also love to hear about your UDF application using Apache Spark as well.

If you like this post, follow me on Medium for more similar posts.
