how to save Spark RDD output in a single file with a header using Java

The code snippet below shows how to save RDD output in a single file with a header:

        SparkConf conf = new SparkConf().setAppName("test").setMaster("local[2]");
        JavaSparkContext jsc = new JavaSparkContext(conf);

        // Single-element, single-partition RDD that holds only the header line
        JavaRDD<String> headerRDD = jsc.parallelize(Arrays.asList("name,address,city"), 1);

        // Your data RDD, built elsewhere
        JavaRDD dataRDD = ....;

        // Make sure s.toString() produces fields in the same order as the header
        dataRDD = dataRDD.map(s -> s.toString());

        // Prepend the header by placing headerRDD first in the union
        JavaRDD<String> joinedRDD = headerRDD.union(dataRDD);

        // Collapse to one partition so Spark writes a single part file; <output> is the target directory
        joinedRDD.repartition(1).saveAsTextFile("<output>");
 
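For reference, here is a minimal end-to-end sketch of the same approach. The class name, output path (/tmp/people-csv), and sample rows are made up for illustration and are not part of the original snippet:

    import java.util.Arrays;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class SaveWithHeader {

        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("save-with-header").setMaster("local[2]");
            JavaSparkContext jsc = new JavaSparkContext(conf);

            // Header line; the sample rows below follow the same field order
            JavaRDD<String> headerRDD = jsc.parallelize(Arrays.asList("name,address,city"), 1);

            // Sample data standing in for the real dataRDD
            JavaRDD<String> dataRDD = jsc.parallelize(Arrays.asList(
                    "john,12 main st,pune",
                    "asha,5 mg road,mumbai"));

            // Header first, then data; coalesce(1) keeps the partition order,
            // so the header stays on the first line of the single part file
            headerRDD.union(dataRDD)
                     .coalesce(1)
                     .saveAsTextFile("/tmp/people-csv");

            jsc.close();
        }
    }

The sketch uses coalesce(1) rather than repartition(1): repartition shuffles the data, so the header line is not guaranteed to end up first, whereas coalesce(1) merges the existing partitions in order. In both cases the output is a directory containing a single part-00000 file, not a bare file with the given name.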

Note: the header and the RDD rows must share the same schema. For example, if the header has N fields, each RDD row should also contain N fields. Spark will not throw an error when this condition is not met, so a mismatch only shows up in the written output.
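One way to keep the header and the rows in sync by construction, sketched below with a hypothetical Person class that is not part of the original post, is to derive both from the same field order:

    // Hypothetical record type, used only to illustrate keeping header and rows in sync
    public class Person implements java.io.Serializable {
        public String name;
        public String address;
        public String city;

        // Single place that defines the column order
        public static final String HEADER = "name,address,city";

        // Emits fields in exactly the order declared in HEADER
        public String toCsvLine() {
            return String.join(",", name, address, city);
        }
    }

With that in place, the header RDD can be built from Person.HEADER and the data RDD mapped with personRDD.map(Person::toCsvLine), so both always carry the same three fields in the same order.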
