java

How to save Spark RDD output in a single file with a header using Java

The code snippet below shows how to save RDD output in a single file with a header:

    SparkConf conf = new SparkConf().setAppName("test").setMaster("local[2]");
    JavaSparkContext jsc = new JavaSparkContext(conf);

    JavaRDD headerRDD = jsc.parallelize(Arrays.asList(new String[]{"name,address,city"}), 1);

    JavaRDD dataRDD = ….;

    // Make sure s.toString() and the header are in sync
    dataRDD = dataRDD.map(s -> s.toString());
    // Joined RDD
    JavaRDD joinedRDD = headerRDD.union(dataRDD);…
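The comment about keeping `s.toString()` and the header in sync can be illustrated without Spark. The sketch below is only an assumption about what the records might look like: the `HeaderSync` and `Person` names, the fields, and the `withHeader` helper are hypothetical and are not taken from the original post. It keeps the header string next to the `toString()` that produces each row, so the column order is defined in one place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HeaderSync {

    // Hypothetical record type; its fields mirror the "name,address,city" header.
    public static final class Person {
        // The header lives beside toString() so the two are edited together.
        public static final String HEADER = "name,address,city";

        private final String name;
        private final String address;
        private final String city;

        public Person(String name, String address, String city) {
            this.name = name;
            this.address = address;
            this.city = city;
        }

        // Emits the fields in exactly the order listed in HEADER.
        @Override
        public String toString() {
            return String.join(",", name, address, city);
        }
    }

    // Header line first, then one line per record: the same ordering that
    // putting the header RDD first in the union is meant to give.
    public static List<String> withHeader(List<Person> rows) {
        List<String> lines = new ArrayList<>();
        lines.add(Person.HEADER);
        for (Person p : rows) {
            lines.add(p.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Person> rows = Arrays.asList(new Person("Ann", "1 Main St", "Pune"));
        withHeader(rows).forEach(System.out::println);
    }
}
```

This sketch does no escaping, so a field value that itself contains a comma would break the column alignment; real data may need quoting.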

Spark Dataset Operations in Java

In this post I will demonstrate the step-by-step setup of a Spark project and explore a few basic Spark Dataset operations in Java. Create a Maven project with this POM:

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
        <groupId>com.ts.spark</groupId>
        <artifactId>api</artifactId>
        <version>1.0-SNAPSHOT</version>

        <dependencies>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_2.11</artifactId>
                <version>2.0.0</version>
            </dependency>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-sql_2.11</artifactId>…

Hive NoClassDefFoundError auxiliary path issue

The Hive NoClassDefFoundError auxiliary path issue is very common. Sometimes, even when you add a jar to the classpath using the Hive commands below, Hive still throws a NoClassDefFoundError:

    add jar /xxx/hive-customserde.jar;
    add jar /xxx/solr-solrj.jar;

The commands above add the resources to the Hive classpath, but suppose your custom…