java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found

spark-issues | Jonathan Kelly (JIRA) | 1 year ago
  1. 0

    When using cluster deploy mode, the classpath of the SparkSubmit process that gets launched only includes the Spark assembly and not spark.driver.extraClassPath. This is of course by design, since the driver actually runs on the cluster and not inside the SparkSubmit process. However, if the SparkSubmit process, minimal as it may be, needs any extra libraries that are not part of the Spark assembly, there is no good way to include them. (I say "no good way" because including them in the SPARK_CLASSPATH environment variable does cause the SparkSubmit process to include them, but this is not acceptable because this environment variable has long been deprecated, and it prevents the use of spark.driver.extraClassPath.)

    An example of when this matters is on Amazon EMR when using an S3 path for the application JAR and running in yarn-cluster mode. The SparkSubmit process needs the EmrFileSystem implementation and its dependencies in the classpath in order to download the application JAR from S3, so it fails with a ClassNotFoundException. (EMR currently gets around this by setting SPARK_CLASSPATH, but as mentioned above this is less than ideal.)

    I have tried modifying SparkSubmitCommandBuilder to include the driver extra classpath whether it's client mode or cluster mode, and this seems to work, but I don't know if there is any downside to this.

    (A hedged SparkLauncher sketch of the spark.{driver,executor}.extraClassPath approach follows the results list below.)

    Example that fails on emr-4.0.0 (if you switch to setting spark.{driver,executor}.extraClassPath instead of SPARK_CLASSPATH):

        spark-submit --deploy-mode cluster --class org.apache.spark.examples.JavaWordCount s3://my-bucket/spark-examples.jar s3://my-bucket/word-count-input.txt

    Resulting Exception:

        Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
            at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
            at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2626)
            at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2639)
            at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:90)
            at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2678)
            at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2660)
            at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:374)
            at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
            at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:233)
            at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:327)
            at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$5.apply(Client.scala:366)
            at org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$5.apply(Client.scala:364)
            at scala.collection.immutable.List.foreach(List.scala:318)
            at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:364)
            at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:629)
            at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:119)
            at org.apache.spark.deploy.yarn.Client.run(Client.scala:907)
            at org.apache.spark.deploy.yarn.Client$.main(Client.scala:966)
            at org.apache.spark.deploy.yarn.Client.main(Client.scala)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:606)
            at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
            at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
            at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
            at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
            at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
        Caused by: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
            at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1980)
            at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2072)
            ... 27 more

    Apache's JIRA Issue Tracker | 1 year ago | Jonathan Kelly
    java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
  2. 0

    [jira] [Updated] (SPARK-10789) Cluster mode SparkSubmit classpath only includes Spark classpath

    spark-issues | 1 year ago | Jonathan Kelly (JIRA)
    java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
  3. 0

    Hadoop Ingestion With an EMR cluster

    Google Groups | 2 years ago | Torche Guillaume
    java.lang.reflect.InvocationTargetException
  4. 0

    EMRFS error when running with Tez on EMR

    Google Groups | 2 years ago | Mike DeLaurentis
    java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
  5. 0

    Issues Google Cloud Storage connector on Spark

    Stack Overflow | 2 years ago | poiuytrez
    java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem not found
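For the EMR case described in the first result above, the intended replacement for the deprecated SPARK_CLASSPATH variable is to put the EMRFS jars on spark.driver.extraClassPath and spark.executor.extraClassPath. The sketch below is a programmatic equivalent of the failing spark-submit command using the SparkLauncher API; the EMRFS jar location is an assumption about the EMR image layout, and, as the report explains, in cluster deploy mode these settings still do not reach the SparkSubmit JVM that has to download the application JAR from S3.

    import org.apache.spark.launcher.SparkLauncher;

    // Hedged sketch: programmatic equivalent of the failing spark-submit command above,
    // using extraClassPath instead of the deprecated SPARK_CLASSPATH environment variable.
    public class SubmitWordCount {
      public static void main(String[] args) throws Exception {
        // Assumed location of the EMRFS jars on an emr-4.x node; verify on your image.
        String emrfsJars = "/usr/share/aws/emr/emrfs/lib/*";

        Process spark = new SparkLauncher()
            .setMaster("yarn")
            .setDeployMode("cluster")
            .setAppResource("s3://my-bucket/spark-examples.jar")
            .setMainClass("org.apache.spark.examples.JavaWordCount")
            .addAppArgs("s3://my-bucket/word-count-input.txt")
            // Reaches the driver and executors once the application runs on the cluster...
            .setConf(SparkLauncher.DRIVER_EXTRA_CLASSPATH, emrfsJars)
            .setConf(SparkLauncher.EXECUTOR_EXTRA_CLASSPATH, emrfsJars)
            // ...but, per SPARK-10789, it is not added to the classpath of the SparkSubmit
            // process itself in cluster mode, so the S3 download of the application JAR can
            // still fail with the ClassNotFoundException shown above.
            .launch();

        spark.waitFor();
      }
    }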

Root Cause Analysis

  1. java.lang.ClassNotFoundException

    Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found

    at org.apache.hadoop.conf.Configuration.getClassByName()
  2. Hadoop
    Path.getFileSystem
    1. org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1980)
    2. org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2072)
    3. org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2626)
    4. org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2639)
    5. org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:90)
    6. org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2678)
    7. org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2660)
    8. org.apache.hadoop.fs.FileSystem.get(FileSystem.java:374)
    9. org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    9 frames
  3. Spark Project YARN Stable API
    Client$$anonfun$prepareLocalResources$5.apply
    1. org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:233)
    2. org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:327)
    3. org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$5.apply(Client.scala:366)
    4. org.apache.spark.deploy.yarn.Client$$anonfun$prepareLocalResources$5.apply(Client.scala:364)
    4 frames
  4. Scala
    List.foreach
    1. scala.collection.immutable.List.foreach(List.scala:318)
    1 frame
  5. Spark Project YARN Stable API
    Client.main
    1. org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:364)
    2. org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:629)
    3. org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:119)
    4. org.apache.spark.deploy.yarn.Client.run(Client.scala:907)
    5. org.apache.spark.deploy.yarn.Client$.main(Client.scala:966)
    6. org.apache.spark.deploy.yarn.Client.main(Client.scala)
    6 frames
  6. Java RT
    Method.invoke
    1. sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    2. sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    3. sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    4. java.lang.reflect.Method.invoke(Method.java:606)
    4 frames
  7. Spark
    SparkSubmit.main
    1. org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
    2. org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    3. org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    4. org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    5. org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    5 frames
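
The chain above bottoms out in Hadoop's configuration layer: Path.getFileSystem asks FileSystem for the implementation class registered for the s3 scheme, and Configuration.getClassByName throws because the EMRFS jar is not on the classpath of the JVM performing the lookup (here the SparkSubmit / YARN Client process). The sketch below is a minimal, self-contained reproduction of that lookup; on a real EMR node the fs.s3.impl mapping normally comes from the cluster's Hadoop configuration and is set explicitly here only as an assumption to keep the example standalone.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Hedged reproduction sketch: resolve an s3:// path in a JVM that does not have the
    // EMRFS jar on its classpath.
    public class EmrFsLookupSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // On EMR this mapping normally comes from core-site.xml; it is set here only to
        // keep the sketch self-contained.
        conf.set("fs.s3.impl", "com.amazon.ws.emr.hadoop.fs.EmrFileSystem");

        // Path.getFileSystem -> FileSystem.get -> Configuration.getClass ->
        // Configuration.getClassByName: with the EMRFS jar missing, Hadoop wraps the
        // ClassNotFoundException in the RuntimeException shown in the trace above.
        FileSystem fs = new Path("s3://my-bucket/spark-examples.jar").getFileSystem(conf);
        System.out.println(fs.getClass().getName());
      }
    }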