Expected string 'null' but missed at character 'u' at offset 6376

Stack Overflow | iCoder | 6 months ago
  1. Java - Issue with data extraction from PDF (PDFBox - 2.02)

    Stack Overflow | 6 months ago | iCoder
    Expected string 'null' but missed at character 'u' at offset 6376
  2. I am trying to extract text from PDFs. Extracting text from the test file causes exceptions to be thrown. The first:

    Exception in thread "main" java.lang.RuntimeException: Value is not an integer: 636121514401477526485946144
      at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(
      at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(
      at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(
      at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(
      at org.apache.pdfbox.util.PDFStreamEngine.processStream(
      at org.apache.pdfbox.util.PDFTextStripper.processPage(
      at org.apache.pdfbox.util.PDFTextStripper.processPages(
      at org.apache.pdfbox.util.PDFTextStripper.writeText(
    Caused by: Value is not an integer: 636121514401477526485946144
      at org.apache.pdfbox.cos.COSNumber.get(
      at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(
      at org.apache.pdfbox.pdfparser.PDFStreamParser.access$000(
      at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(

    Code to cause the above exception:

      PDFTextStripper ts = new PDFTextStripper();
      PrintWriter out = new PrintWriter(new FileWriter(new File("020747.txt")));
      PDDocument doc = PDDocument.load(new File("020747.pdf").toURI().toURL(), true);
      ts.setForceParsing(true);
      ts.writeText(doc, out);

    Using the following code causes a different exception until org.apache.pdfbox.baseParser.pushBackSize is increased (only tested 1024768). After it is increased I get basically the same exception as above:

      PrintWriter out = new PrintWriter(new FileWriter(new File("020747.txt")));
      PDFParser parser = new PDFParser(new FileInputStream(new File("020747.pdf")));
      parser.parse();
      PDFTextStripper ts = new PDFTextStripper();
      ts.setForceParsing(true);
      ts.writeText(parser.getPDDocument(), out);

    Apache's JIRA Issue Tracker | 3 years ago | William Palmer
    java.lang.RuntimeException: Unknown dir object c=')' cInt=41 peek=')' peekInt=41 16574
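
    The "Value is not an integer: 636121514401477526485946144" message in the report above is consistent with a digit run in the content stream that is too large for a 64-bit integer. The following is a minimal sketch using only the Java standard library; the class NumberTokenCheck and its methods are hypothetical names for illustration, and this is not PDFBox's actual parsing code, which the report does not show.

```java
import java.math.BigInteger;

// Hypothetical helper, not part of PDFBox: checks whether a numeric token
// from a content stream can be represented as a Java long.
final class NumberTokenCheck {

    // True if the token parses as a signed 64-bit integer.
    static boolean fitsInLong(String token) {
        try {
            Long.parseLong(token);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // True if the token is a valid integer whose value lies outside the long range.
    static boolean exceedsLongRange(String token) {
        BigInteger value = new BigInteger(token);
        return value.compareTo(BigInteger.valueOf(Long.MAX_VALUE)) > 0
                || value.compareTo(BigInteger.valueOf(Long.MIN_VALUE)) < 0;
    }

    public static void main(String[] args) {
        String token = "636121514401477526485946144"; // value from the exception message
        System.out.println("fits in long: " + fitsInLong(token));
        System.out.println("exceeds long range: " + exceedsLongRange(token));
    }
}
```

    The token from the report has 27 digits, while Long.MAX_VALUE has 19, so any parser that converts integer tokens through a long would have to reject it or fall back to another representation.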



    Root Cause Analysis


      Expected string 'null' but missed at character 'u' at offset 6376

      at org.apache.pdfbox.pdfparser.BaseParser.readExpectedString()
    2. Apache PDFBox
      1. org.apache.pdfbox.pdfparser.BaseParser.readExpectedString(
      2. org.apache.pdfbox.pdfparser.BaseParser.readExpectedString(
      3. org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(
      4. org.apache.pdfbox.pdfparser.BaseParser.parseCOSArray(
      5. org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(
      5 frames
    3. org.apache.pdfbox
      1. org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(
      2. org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(
      3. org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(
      4. org.apache.pdfbox.text.PDFTextStreamEngine.processPage(
      5. org.apache.pdfbox.text.PDFTextStripper.processPage(
      6. org.apache.pdfbox.text.PDFTextStripper.processPages(
      7. org.apache.pdfbox.text.PDFTextStripper.writeText(
      8. org.apache.pdfbox.text.PDFTextStripper.getText(
      8 frames
    4. main
      1. main.Test.readPDF(
      2. main.Test.main(
      2 frames
    5. Java RT
      1. sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      2. sun.reflect.NativeMethodAccessorImpl.invoke(
      3. sun.reflect.DelegatingMethodAccessorImpl.invoke(
      4. java.lang.reflect.Method.invoke(
      4 frames
    6. IDEA
      1. com.intellij.rt.execution.application.AppMain.main(
      1 frame