Spark Dataset Operations in Java

In this post I walk through setting up a Spark project step by step and explore a few basic Spark Dataset operations in Java.

Create a Maven project with the following POM:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns=""

Project structure:

Create the bean definition:

import java.io.Serializable;

// Java bean describing one record of the dataset.
public class People implements Serializable {
    private String name;
    private Long age;

    // No-argument constructor, part of the bean convention.
    public People() {}

    public People(String name, Long age) {
        this.name = name;
        this.age = age;
    }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public Long getAge() { return age; }
    public void setAge(Long age) { this.age = age; }
}
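Spark's SQL guide says that, for Java beans, the schema is defined by the BeanInfo obtained through reflection, which is why the bean follows the getter/setter convention. The sketch below uses the JDK's own java.beans.Introspector to list the properties such a bean exposes. Person is a hypothetical stand-in with the same shape as People, and whether Spark applies exactly these rules internally is an assumption, not something the post states.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BeanPropertiesDemo {

    // Hypothetical stand-in with the same shape as the People bean.
    public static class Person {
        private String name;
        private Long age;

        public Person() {}

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public Long getAge() { return age; }
        public void setAge(Long age) { this.age = age; }
    }

    // Returns the sorted names of properties that have both a getter and a setter.
    static List<String> readWriteProperties(Class<?> beanClass) {
        try {
            // Object.class as the stop class keeps the inherited "class" property out.
            BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                if (pd.getReadMethod() != null && pd.getWriteMethod() != null) {
                    names.add(pd.getName());
                }
            }
            Collections.sort(names);
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Could not introspect " + beanClass, e);
        }
    }

    public static void main(String[] args) {
        // Prints the property names discovered on the stand-in bean.
        System.out.println(readWriteProperties(Person.class));
    }
}
```

If a getter or setter is missing or misnamed, the property drops out of this list, which is a quick way to spot a field that a bean-based encoder would probably not see.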

Create a people.json file in the resources directory:

Filter the contents of the dataset:

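import org.apache.spark.api.java.function.FilterFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

public class Application {
    public static void main(String[] args) {
        SparkSession session = SparkSession.builder().appName("dataset example").getOrCreate();
        // Define encoder, used to convert data to binary format in the JVM
        Encoder<People> encode = Encoders.bean(People.class);
        // Load dataset from json
        Dataset ds=
        // Keep only the people older than 30 and print them
        ds.filter((FilterFunction<People>) s -> (s.getAge() > 30)).show();
    }
}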
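One thing to watch in the filter: age is a boxed Long, so a record with no age would make s.getAge() > 30 throw a NullPointerException when the value is unboxed. The sketch below applies a null-safe version of the same predicate to an in-memory list using plain Java streams, so it can be tried without a Spark session. Person, the helper names, and the sample records are hypothetical and only illustrate the condition. This is not the Spark code path.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class AgeFilterDemo {

    // Hypothetical stand-in for the People bean; age is boxed and may be null.
    static final class Person {
        private final String name;
        private final Long age;

        Person(String name, Long age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        Long getAge() { return age; }
    }

    // Null-safe variant of the post's predicate s.getAge() > 30.
    static boolean olderThan30(Person p) {
        return p.getAge() != null && p.getAge() > 30;
    }

    // Names of everyone who passes the predicate, in input order.
    static List<String> namesOlderThan30(List<Person> people) {
        return people.stream()
                .filter(AgeFilterDemo::olderThan30)
                .map(Person::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample records: one above 30, one exactly 30, one without an age.
        List<Person> sample = Arrays.asList(
                new Person("A", 45L),
                new Person("B", 30L),
                new Person("C", null));
        System.out.println(namesOlderThan30(sample));
    }
}
```

The same guard can be carried over to the Dataset call as ds.filter((FilterFunction<People>) s -> s.getAge() != null && s.getAge() > 30).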
