Why does a Spark job fail with "Exit code: 52"?

I had a Spark job fail with a stack trace like this:

./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-Container id: container_1455622885057_0016_01_000008
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-Exit code: 52
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr:Stack trace: ExitCodeException exitCode=52: 
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-      at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-      at org.apache.hadoop.util.Shell.run(Shell.java:456)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-      at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-      at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-      at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-      at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-      at java.util.concurrent.FutureTask.run(FutureTask.java:262)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-      at java.lang.Thread.run(Thread.java:745)
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-
./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-Container exited with a non-zero exit code 52

It took me a while to figure out what "exit code 52" means, so I'm posting it here for the benefit of others who run into the same thing.

Virgil

Exit code 52 comes from org.apache.spark.util.SparkExitCode, where it is defined as val OOM = 52, i.e. an OutOfMemoryError. That makes sense, since I also find this in the container logs:

16/02/16 17:09:59 ERROR executor.Executor: Managed memory leak detected; size = 4823704883 bytes, TID = 3226
16/02/16 17:09:59 ERROR executor.Executor: Exception in task 26.0 in stage 2.0 (TID 3226)
java.lang.OutOfMemoryError: Unable to acquire 1248 bytes of memory, got 0
        at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:120)
        at org.apache.spark.shuffle.sort.ShuffleExternalSorter.acquireNewPageIfNecessary(ShuffleExternalSorter.java:354)
        at org.apache.spark.shuffle.sort.ShuffleExternalSorter.insertRecord(ShuffleExternalSorter.java:375)
        at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.insertRecordIntoSorter(UnsafeShuffleWriter.java:237)
        at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:164)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
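
For reference, the relevant constants in org.apache.spark.util.SparkExitCode look roughly like this in the Spark 1.x sources (a sketch from memory, so the exact comments and set of constants may differ slightly in your version):

package org.apache.spark.util

// Rough sketch of SparkExitCode as found in Spark 1.x
// (reconstructed from memory; check the source tree of your exact Spark version).
private[spark] object SparkExitCode {

  /** The default uncaught exception handler was reached. */
  val UNCAUGHT_EXCEPTION = 50

  /** The default uncaught exception handler was reached, and an exception was thrown
    * while logging the original uncaught exception. */
  val UNCAUGHT_EXCEPTION_TWICE = 51

  /** The default uncaught exception handler was reached, and the uncaught exception
    * was an OutOfMemoryError -- this is the exit code 52 that YARN reports. */
  val OOM = 52
}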

(Note that at this point I'm not really sure whether the problem is in my own code or due to Tungsten memory leaks, but that's a separate question.)
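
If you hit this, the knobs people usually check first are the executor memory settings. A minimal sketch of what that looks like in code, using the Spark 1.x property names (the values below are placeholders to show which settings are involved, not a recommendation):

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical job setup; only the memory-related settings matter here.
val conf = new SparkConf()
  .setAppName("example-job")                          // hypothetical app name
  .set("spark.executor.memory", "8g")                 // executor JVM heap
  .set("spark.yarn.executor.memoryOverhead", "1024")  // extra off-heap room (MB) for the YARN container, Spark 1.x name
  .set("spark.memory.fraction", "0.6")                // share of heap for execution + storage (unified memory manager)

val sc = new SparkContext(conf)

// Increasing the number of shuffle partitions (e.g. rdd.repartition(...) before the
// wide operation) also reduces per-task memory pressure in the shuffle writer.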

Virgil