KafkaUtils class not found in Spark Streaming

Question

I've just started with Spark Streaming and I'm trying to build a sample application that counts words from a Kafka stream. Although it compiles with sbt package, I get a NoClassDefFoundError when I run it. This post seems to describe the same problem, but the solution is for Maven and I haven't been able to reproduce it with sbt.

KafkaApp.scala:

import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka._

object KafkaApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("kafkaApp").setMaster("local[*]")
    val ssc = new StreamingContext(conf, Seconds(1))
    val kafkaParams = Map(
      "zookeeper.connect" -> "localhost:2181",
      "zookeeper.connection.timeout.ms" -> "10000",
      "group.id" -> "sparkGroup"
    )

    val topics = Map(
      "test" -> 1
    )

    // stream of (topic, ImpressionLog)
    val messages = KafkaUtils.createStream(ssc, kafkaParams, topics, storage.StorageLevel.MEMORY_AND_DISK)
    println(s"Number of words: ${messages.count()}")
  }
}

build.sbt:

name := "Simple Project"

version := "1.1"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.1.1",
  "org.apache.spark" %% "spark-streaming" % "1.1.1",
  "org.apache.spark" %% "spark-streaming-kafka" % "1.1.1"
)

resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

And I submit it with:

bin/spark-submit \
  --class "KafkaApp" \
  --master local[4] \
  target/scala-2.10/simple-project_2.10-1.1.jar

Error:

14/12/30 19:44:57 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@192.168.5.252:65077/user/HeartbeatReceiver
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/kafka/KafkaUtils$
    at KafkaApp$.main(KafkaApp.scala:28)
    at KafkaApp.main(KafkaApp.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:329)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaUtils$
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

Tathagata Das · Accepted Answer

spark-submit does not automatically put the package containing KafkaUtils on the classpath. You need to include it in your project JAR. To do that, create an all-inclusive uber-jar using sbt assembly. Here is an example build.sbt.

https://github.com/tdas/spark-streaming-external-projects/blob/master/kafka/build.sbt

You obviously also need to add the assembly plugin to SBT.

https://github.com/tdas/spark-streaming-external-projects/tree/master/kafka/project
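As a rough illustration of the idea in this answer (not a copy of the linked file), the question's build.sbt could be adapted along the following lines. It is a sketch that assumes the sbt-assembly plugin is already declared in project/plugins.sbt. The "provided" scope on spark-core and spark-streaming is an assumption about the deployment: spark-submit already supplies those classes at runtime, so only the Kafka connector needs to be bundled.

```scala
// build.sbt: a sketch, assuming sbt-assembly is declared in project/plugins.sbt
name := "Simple Project"

version := "1.1"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  // Supplied by the Spark installation at runtime, so kept out of the uber-jar.
  "org.apache.spark" %% "spark-core"      % "1.1.1" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.1.1" % "provided",
  // Not shipped with the Spark distribution, so it is bundled into the uber-jar.
  "org.apache.spark" %% "spark-streaming-kafka" % "1.1.1"
)
```

With a build like this, running sbt assembly (rather than sbt package) produces the uber-jar under target/scala-2.10/, and that jar is the one to pass to spark-submit. Depending on the dependency set, duplicate files in the bundled jars may also require a merge strategy.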

Sandeep · Answer

Try including all the dependency jars when submitting the application:

./spark-submit --name "SampleApp" \
  --deploy-mode client \
  --master spark://host:7077 \
  --class com.stackexchange.SampleApp \
  --jars $SPARK_INSTALL_DIR/spark-streaming-kafka_2.10-1.3.0-SNAPSHOT.jar

Vibhuti · Answer

The following build.sbt worked for me. You also need to put the sbt-assembly plugin in a file under the project/ directory.

build.sbt

name := "NetworkStreaming"

// https://github.com/sbt/sbt-assembly/blob/master/Migration.md#upgrading-with-bare-buildsbt

libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-streaming_2.10" % "1.4.1",
  "org.apache.spark" % "spark-streaming-kafka_2.10" % "1.4.1", // kafka
  "org.apache.hbase" % "hbase" % "0.92.1",
  "org.apache.hadoop" % "hadoop-core" % "1.0.2",
  "org.apache.spark" % "spark-mllib_2.10" % "1.3.0"
)

// assemblyMergeStrategy is the key name used by sbt-assembly 0.14.x
assemblyMergeStrategy in assembly := {
  case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
  case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
  case "log4j.properties" => MergeStrategy.discard
  case m if m.toLowerCase.startsWith("meta-inf/services/") => MergeStrategy.filterDistinctLines
  case "reference.conf" => MergeStrategy.concat
  case _ => MergeStrategy.first
}

project/plugins.sbt

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.1")

Gi1ber7 · Answer

Using Spark 1.6 does the job for me without the hassle of handling so many external jars. That can get quite complicated to manage.

Nilesh · Answer

I ran into the same problem and solved it by building the jar with dependencies.

Add the code below to pom.xml:

<build>
  <sourceDirectory>src/main/java</sourceDirectory>
  <testSourceDirectory>src/test/java</testSourceDirectory>
  <plugins>
    <!--
      Bind the maven-assembly-plugin to the package phase.
      This will create a jar file without the storm dependencies,
      suitable for deployment to a cluster.
    -->
    <plugin>
      <artifactId>maven-assembly-plugin</artifactId>
      <configuration>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
        <archive>
          <manifest>
            <mainClass></mainClass>
          </manifest>
        </archive>
      </configuration>
      <executions>
        <execution>
          <id>make-assembly</id>
          <phase>package</phase>
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>

Run mvn package, then submit the "example-jar-with-dependencies.jar".

Walker Rowe · Answer

You can also download the jar file and put it in the Spark lib folder, since it is not installed with Spark, instead of struggling to get the SBT build.sbt to work.

http://central.maven.org/maven2/org/apache/spark/spark-streaming-kafka-0-10_2.10/2.1.1/spark-streaming-kafka-0-10_2.10-2.1.1.jar

Copy it to:

/usr/local/spark/spark-2.1.0-bin-hadoop2.6/jars/
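One way to confirm that a jar copied into that folder is actually visible to the application is to ask the JVM whether it can find the class. The sketch below is illustrative and not part of the answer; ClasspathCheck and isLoadable are hypothetical names. It probes both org.apache.spark.streaming.kafka.KafkaUtils (the class named in the question's stack trace) and org.apache.spark.streaming.kafka010.KafkaUtils (the package used by the 0-10 connector linked above).

```scala
object ClasspathCheck {

  // Returns true if the named class can be found by the current class loader.
  // The class is located but not initialized, so only a missing class
  // (ClassNotFoundException) is reported as "not loadable".
  def isLoadable(className: String): Boolean =
    try {
      Class.forName(className, false, Thread.currentThread().getContextClassLoader)
      true
    } catch {
      case _: ClassNotFoundException => false
    }

  def main(args: Array[String]): Unit = {
    val candidates = Seq(
      "org.apache.spark.streaming.kafka.KafkaUtils",   // 0-8 style connector
      "org.apache.spark.streaming.kafka010.KafkaUtils" // 0-10 connector
    )
    candidates.foreach { name =>
      println(s"$name loadable: ${isLoadable(name)}")
    }
  }
}
```

Pasting the object into spark-shell and calling ClasspathCheck.main(Array.empty) should show which of the two classes, if any, the running Spark installation can see.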

Suresh · Answer

I added the external dependency: Project -> Properties -> Java Build Path -> Libraries -> Add External JARs, then added the required JAR file.

That solved my problem.

Sandeep Sompalle · Answer

import org.apache.spark.streaming.kafka.KafkaUtils

Use the following in build.sbt:

name := "kafka"

version := "0.1"

scalaVersion := "2.11.12"

retrieveManaged := true

fork := true

//libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" % "2.2.0"
//libraryDependencies += "org.apache.spark" % "spark-streaming-kafka-0-8_2.11" % "2.1.0"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.0"

//libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.2.0"

libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.2.0"

// https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-8
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.2.0" % "provided"

// https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-8-assembly
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-8-assembly" % "2.2.0"

This will resolve the issue.