how to save Spark RDD output in single file with header using java

The code snippet below shows how to save RDD output in a single file with a header:

SparkConf conf = new SparkConf().setAppName("test").setMaster("local[2]");
JavaSparkContext jsc = new JavaSparkContext(conf);

JavaRDD headerRDD = jsc.parallelize(Arrays.asList(new String[]{"name,address,city"}), 1);
JavaRDD dataRDD = ….;

// Make sure s.toString() and the header are in sync
dataRDD = dataRDD.map(s -> s.toString());

// Joined RDD
JavaRDD joinedRDD = headerRDD.union(dataRDD);
…
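
Since the excerpt above is truncated, here is a minimal end-to-end sketch of the same idea: union a one-partition header RDD with the data RDD, coalesce to a single partition, and save as a text file. The sample rows, the output path, and the class name are illustrative assumptions, not part of the original post.

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SaveRddWithHeader {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("test").setMaster("local[2]");
        JavaSparkContext jsc = new JavaSparkContext(conf);

        // One-partition header RDD so the header ends up ahead of the data
        JavaRDD<String> headerRDD =
                jsc.parallelize(Arrays.asList("name,address,city"), 1);

        // Placeholder data; in the post this is an existing RDD mapped to CSV strings
        JavaRDD<String> dataRDD = jsc.parallelize(Arrays.asList(
                "John,1 Main St,Pune",
                "Jane,2 High St,Mumbai"));

        // union() puts the header partition before the data partitions
        JavaRDD<String> joinedRDD = headerRDD.union(dataRDD);

        // coalesce(1) without a shuffle keeps partition order, so the header stays first
        // and Spark writes a single part file under the output directory
        joinedRDD.coalesce(1).saveAsTextFile("/tmp/output-with-header"); // path is an assumption

        jsc.stop();
    }
}

Note that coalesce(1) is used rather than repartition(1): without a shuffle the partition order of the union is preserved, which is what keeps the header line at the top of the single output file.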

Spark Dataset Operations in java

In this post I am going to demonstrate a step-by-step setup of a Spark project and explore a few basic Spark Dataset operations in Java. Create a Maven project with the following POM:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.ts.spark</groupId>
    <artifactId>api</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
…
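
With those spark-core and spark-sql dependencies in place, a small, self-contained sketch of a few Dataset operations in Java (Spark 2.x style) might look like the following; the class name, app name, and sample data are assumptions for illustration, not taken from the original post.

import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;

public class DatasetOperations {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("dataset-operations")   // illustrative app name
                .master("local[2]")
                .getOrCreate();

        // Build a Dataset<String> from an in-memory list (sample data, not from the post)
        Dataset<String> names = spark.createDataset(
                Arrays.asList("alice", "bob", "carol"), Encoders.STRING());

        // Convert to a DataFrame with a named column and run a few basic operations
        Dataset<Row> df = names.toDF("name");
        df.printSchema();
        df.filter(col("name").startsWith("a")).show();
        System.out.println("row count: " + df.count());

        spark.stop();
    }
}

In the Java API, createDataset needs an explicit Encoder; converting to a DataFrame with toDF then makes column-based operations such as filter(col(...)) and show() available.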