org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.OutOfMemoryError: GC overhead limit exceeded

Stack Overflow | sourabh pandey | 5 months ago

Read, sort and count 20GB CSV file stored in HDFS by using pyspark RDD

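The underlying question is how to read, sort, and count a 20 GB CSV file on HDFS with PySpark without exhausting the JVM heap. A minimal sketch of one way to do this follows; the HDFS path, the header flag and the sort column are placeholders for the real dataset, and the main point is to keep the work distributed instead of collecting rows into a single JVM.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("csv-sort-count").getOrCreate()

    # Placeholder path, header flag and column name -- substitute the real ones.
    df = spark.read.csv("hdfs:///data/big_file.csv", header=True)

    sorted_df = df.orderBy("some_column")   # distributed sort across executors
    print(sorted_df.count())                # count() runs on the cluster; no rows are collected

    # Avoid rdd.collect() (or sortBy(...).collect()) on input of this size:
    # decoding 20 GB of lines into strings inside one JVM is exactly what
    # produces the "GC overhead limit exceeded" failure analysed below.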

Root Cause Analysis

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:57)
    at java.nio.CharBuffer.allocate(CharBuffer.java:331)
    at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:777)
    at org.apache.hadoop.io.Text.decode(Text.java:412)
    at org.apache.hadoop.io.Text.decode(Text.java:389)
    at org.apache.hadoop.io.Text.toString(Text.java:280)
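The frames show the JVM running out of memory while Hadoop's Text.toString() decodes input lines into Java strings, i.e. during plain record reading; "GC overhead limit exceeded" means the JVM was spending nearly all of its time in garbage collection while reclaiming almost no heap. The "localhost" in the task failure suggests local mode, where driver and executor share one JVM, so the usual remedies are to give that JVM more heap when submitting the job (spark-submit's --driver-memory / --executor-memory options) and to process the input in more, smaller partitions. A rough sketch of the latter, with an illustrative partition count and a placeholder path:

    from pyspark import SparkContext

    sc = SparkContext(appName="csv-count")  # heap size itself is set at submit time, not in code

    # More input partitions mean fewer lines (and fewer of the decoded Text
    # strings seen in the trace above) held by any single task at a time.
    rdd = sc.textFile("hdfs:///data/big_file.csv", minPartitions=200)

    print(rdd.count())   # aggregate on the executors; never collect() the raw rows
    sc.stop()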