
aws: EMR cluster fails with "ERROR UserData: Error encountered while try to get user data" when submitting a Spark job

The AWS EMR cluster starts up successfully, but every job submission fails with:

19/07/30 08:37:42 ERROR UserData: Error encountered while try to get user data
java.io.IOException: File '/var/aws/emr/userData.json' cannot be read
    at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:296)
    at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.commons.io.FileUtils.readFileToString(FileUtils.java:1711)
    at com.amazon.ws.emr.hadoop.fs.shaded.org.apache.commons.io.FileUtils.readFileToString(FileUtils.java:1748)
    at com.amazon.ws.emr.hadoop.fs.util.UserData.getUserData(UserData.java:62)
    at com.amazon.ws.emr.hadoop.fs.util.UserData.<init>(UserData.java:39)
    at com.amazon.ws.emr.hadoop.fs.util.UserData.ofDefaultResourceLocations(UserData.java:52)
    at com.amazon.ws.emr.hadoop.fs.util.AWSSessionCredentialsProviderFactory.buildSTSClient(AWSSessionCredentialsProviderFactory.java:52)
    at com.amazon.ws.emr.hadoop.fs.util.AWSSessionCredentialsProviderFactory.<clinit>(AWSSessionCredentialsProviderFactory.java:17)
    at com.amazon.ws.emr.hadoop.fs.rolemapping.DefaultS3CredentialsResolver.resolve(DefaultS3CredentialsResolver.java:22)
    at com.amazon.ws.emr.hadoop.fs.guice.CredentialsProviderOverrider.override(CredentialsProviderOverrider.java:25)
    at com.amazon.ws.emr.hadoop.fs.s3.lite.executor.GlobalS3Executor.executeOverriders(GlobalS3Executor.java:130)
    at com.amazon.ws.emr.hadoop.fs.s3.lite.executor.GlobalS3Executor.execute(GlobalS3Executor.java:86)
    at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.invoke(AmazonS3LiteClient.java:184)
    at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.doesBucketExist(AmazonS3LiteClient.java:90)
    at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.ensureBucketExists(Jets3tNativeFileSystemStore.java:139)
    at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.initialize(Jets3tNativeFileSystemStore.java:116)
    at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.initialize(S3NativeFileSystem.java:508)
    at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.initialize(EmrFileSystem.java:111)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2859)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2896)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2878)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:392)
    at org.apache.spark.deploy.DependencyUtils$.org$apache$spark$deploy$DependencyUtils$$resolveGlobPath(DependencyUtils.scala:190)
    at org.apache.spark.deploy.DependencyUtils$$anonfun$resolveGlobPaths$2.apply(DependencyUtils.scala:146)
    at org.apache.spark.deploy.DependencyUtils$$anonfun$resolveGlobPaths$2.apply(DependencyUtils.scala:144)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
    at org.apache.spark.deploy.DependencyUtils$.resolveGlobPaths(DependencyUtils.scala:144)
    at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$3.apply(SparkSubmit.scala:354)
    at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$3.apply(SparkSubmit.scala:354)
    at scala.Option.map(Option.scala:146)
    at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:354)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

userData.json is not part of my application; it appears to be internal to EMR.

Any idea what is going wrong? I submit jobs via dynamic requests. Cluster configuration: 1 master node (m5.xlarge), 2 core nodes (m4.large), 7 task nodes (m5.4xlarge).
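The stack trace shows that EMRFS builds its STS credentials client from `/var/aws/emr/userData.json` before it ever touches S3, so the submission dies as soon as that file cannot be read by the submitting user. A minimal diagnostic sketch (my assumption, not from this thread; the path comes from the error message and `check_readable` is a hypothetical helper):

```shell
# check_readable: report whether the current user can read a given file.
# (Hypothetical helper for diagnosing the IOException above.)
check_readable() {
  if [ -r "$1" ]; then
    echo "readable: $1"
  else
    echo "NOT readable: $1"
  fi
}

# On the EMR node that runs spark-submit (path taken from the error message):
check_readable /var/aws/emr/userData.json
ls -l /var/aws/emr/userData.json 2>/dev/null || true
```

If the file turns out to be readable only by root, one commonly reported workaround is to relax its mode (e.g. `sudo chmod 444 /var/aws/emr/userData.json`) or to run spark-submit as the `hadoop` user; treat both as suggestions to verify on your cluster, not a confirmed fix.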


I ran into the same issue on AWS EMR emr-5.24.1 (Spark 2.4.1), but the jobs never actually failed.

— hnahak