org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.OutOfMemoryError: GC overhead limit exceeded

Stack Overflow | sourabh pandey | 6 months ago

Read, sort, and count a 20 GB CSV file stored in HDFS using a PySpark RDD


Root Cause Analysis

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:57)
    at java.nio.CharBuffer.allocate(CharBuffer.java:331)
    at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:777)
    at org.apache.hadoop.io.Text.decode(Text.java:412)
    at org.apache.hadoop.io.Text.decode(Text.java:389)
    at org.apache.hadoop.io.Text.toString(Text.java:280)
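
The frames show the JVM exhausting its heap while Hadoop's Text.toString() decodes a CSV line into a Java String: CharsetDecoder.decode allocates a HeapCharBuffer for every record. With a 20 GB input this usually means each task is decoding too large a slice of the file at once, e.g. because the RDD has too few partitions. Below is a minimal sketch of the workload named in the question title; the HDFS path, key column, and partition count are illustrative placeholders, not values from the original post.

from pyspark import SparkContext

sc = SparkContext(appName="csv-sort-count")

# textFile() yields one Python string per line; under the hood each line is a
# Hadoop Text record that must be decoded to chars -- the HeapCharBuffer
# allocation visible in the stack trace. A higher minPartitions keeps each
# task's slice of the 20 GB file small (400 is a placeholder value).
lines = sc.textFile("hdfs:///data/input.csv", minPartitions=400)

# Split lazily and sort by one field; sortBy shuffles across executors and
# never pulls the whole dataset to the driver.
rows = lines.map(lambda line: line.split(","))
sorted_rows = rows.sortBy(lambda row: row[0])

# count() is an action that returns a single number to the driver -- safe at
# this scale, unlike collect(), which would materialize every row in driver
# memory.
print(sorted_rows.count())

If the job still dies with "GC overhead limit exceeded", the usual next steps are repartitioning further or raising executor memory via spark.executor.memory. Note that a single pathological record (e.g. a file with no newlines) would force one giant decode and cannot be fixed by partitioning alone.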