java.lang.NoSuchMethodException: org.apache.hadoop.fs.FileSystem$Statistics.getThreadStatistics()

There are no available Samebug tips for this exception.
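A plausible short tip, inferred from the reports below: this message is logged at DEBUG level because Spark probes reflectively for org.apache.hadoop.fs.FileSystem$Statistics.getThreadStatistics(), an API that only exists in newer Hadoop releases (around 2.5 and later). The exception itself is harmless noise; what it usually signals is that an older Hadoop jar is shadowing the one Spark expects, so Spark quietly skips its per-thread bytes-read metrics. A standard way to see which Hadoop artifacts are actually on a Maven classpath (dependency:tree and its -Dincludes filter are stock maven-dependency-plugin options):

    mvn dependency:tree -Dincludes=org.apache.hadoop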

  • I'm using Maven and Eclipse to build my project. When I import the CDK (https://github.com/egonw/cdk) jar files that I need, set up the SparkContext, and try, for instance, reading a file (simply "val lines = sc.textFile(filePath)"), I get the following errors in the log:

    [main] DEBUG org.apache.spark.rdd.HadoopRDD - SplitLocationInfo and other new Hadoop classes are unavailable. Using the older Hadoop location info code.
    java.lang.ClassNotFoundException: org.apache.hadoop.mapred.InputSplitWithLocationInfo
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:191)
        at org.apache.spark.rdd.HadoopRDD$SplitInfoReflections.<init>(HadoopRDD.scala:381)
        at org.apache.spark.rdd.HadoopRDD$.liftedTree1$1(HadoopRDD.scala:391)
        at org.apache.spark.rdd.HadoopRDD$.<init>(HadoopRDD.scala:390)
        at org.apache.spark.rdd.HadoopRDD$.<clinit>(HadoopRDD.scala)
        at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:159)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:194)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
        at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:1328)
        at org.apache.spark.rdd.RDD.foreach(RDD.scala:765)

    Later in the log:

    [Executor task launch worker-0] DEBUG org.apache.spark.deploy.SparkHadoopUtil - Couldn't find method for retrieving thread-level FileSystem input data
    java.lang.NoSuchMethodException: org.apache.hadoop.fs.FileSystem$Statistics.getThreadStatistics()
        at java.lang.Class.getDeclaredMethod(Class.java:2009)
        at org.apache.spark.util.Utils$.invoke(Utils.scala:1733)
        at org.apache.spark.deploy.SparkHadoopUtil$$anonfun$getFileSystemThreadStatistics$1.apply(SparkHadoopUtil.scala:178)
        at org.apache.spark.deploy.SparkHadoopUtil$$anonfun$getFileSystemThreadStatistics$1.apply(SparkHadoopUtil.scala:178)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.AbstractTraversable.map(Traversable.scala:105)
        at org.apache.spark.deploy.SparkHadoopUtil.getFileSystemThreadStatistics(SparkHadoopUtil.scala:178)
        at org.apache.spark.deploy.SparkHadoopUtil.getFSBytesReadOnThreadCallback(SparkHadoopUtil.scala:138)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:220)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
        at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:56)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

    There have also been intermittent issues related to "HADOOP_HOME" not being set, which only occur sometimes. After testing different versions of both CDK and Spark, I've found that Spark version 0.9.1 seems to get things working. That will not solve my problem, though, as I will later need functionality from MLlib that is only in the newer versions of Spark. (See the Maven exclusion sketch and the reflection probe after these reports.)
    via Staffan Arvidsson
  • GitHub comment 24#118637588
    via GitHub by dinomite
    • java.lang.NoSuchMethodException: org.apache.hadoop.fs.FileSystem$Statistics.getThreadStatistics()
      (same stack trace as in the first report above)
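Both reports point at the same Hadoop version conflict: Spark expects a Hadoop 2.x client, while another dependency (the CDK jars in the first report) drags an older Hadoop onto the classpath. A minimal pom.xml sketch of the usual fix, under the assumption that the old Hadoop arrives transitively; the CDK coordinates and version below are illustrative, and the artifact actually worth excluding is whatever mvn dependency:tree reports:

    <dependency>
      <groupId>org.openscience.cdk</groupId>
      <artifactId>cdk-bundle</artifactId>
      <version>1.5.10</version> <!-- hypothetical version, for illustration -->
      <exclusions>
        <!-- Keep the older transitive Hadoop jar off the classpath so that
             Spark's own Hadoop client is the one that actually gets loaded. -->
        <exclusion>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-core</artifactId>
        </exclusion>
      </exclusions>
    </dependency>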
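As for why the NoSuchMethodException surfaces at all: the trace shows SparkHadoopUtil looking the method up reflectively (java.lang.Class.getDeclaredMethod via Utils$.invoke), so that one Spark build can run against Hadoop versions both with and without the API. A minimal Scala sketch of that kind of probe, assuming hadoop-common is on the classpath; only the class and method names are taken from the trace, the rest is illustrative:

    import org.apache.hadoop.fs.FileSystem

    // Probe for the thread-level statistics API the way the trace shows Spark
    // doing it: look the method up reflectively, and fall back gracefully when
    // the Hadoop version on the classpath does not have it.
    def threadStatsSupported: Boolean =
      try {
        classOf[FileSystem.Statistics].getDeclaredMethod("getThreadStatistics")
        true
      } catch {
        // Older Hadoop: the method is absent, so the caller logs at DEBUG and
        // skips per-thread bytes-read metrics instead of failing the job.
        case _: NoSuchMethodException => false
      }

Because the failure is caught and merely logged, this particular message is not fatal by itself; the classpath conflict behind it is the real problem to fix.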

    Users with the same issue

    tyson925 (123 times), four unknown visitors (once each), and 48 more bugmates.