java.io.IOException: Expected string 'null' but missed at character 'u' at offset 6376

Stack Overflow | iCoder | 7 months ago
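The message in the title comes from tokenizing a page's content stream. In the root-cause frames, PDFStreamParser.parseNextToken hands an array to BaseParser.parseCOSArray, and its parseDirObject call ends in BaseParser.readExpectedString. A reasonable reading is that the tokenizer saw an 'n', assumed the keyword null, and then found that the next byte was not 'u'. The quoted character is the one it expected, not the one it found. The sketch below is a simplified, hypothetical model of that keyword matching in plain Java, with no PDFBox dependency. The class and method names are illustrative, and the offsets it reports need not match PDFBox's.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified model of keyword matching in a PDF tokenizer.
// This is NOT PDFBox source; it only mirrors the shape of the error message.
final class ExpectedStringDemo {

    // Matches `expected` byte by byte starting at `pos` and returns the
    // position just after it. On the first mismatch it throws, naming the
    // character that was EXPECTED (not the byte actually found) and the
    // index of the mismatched byte.
    static int readExpectedString(byte[] data, int pos, String expected) throws IOException {
        for (char c : expected.toCharArray()) {
            if (pos >= data.length || data[pos] != c) {
                throw new IOException("Expected string '" + expected
                        + "' but missed at character '" + c + "' at offset " + pos);
            }
            pos++;
        }
        return pos;
    }

    // Returns the error message for the input, or null if the keyword matched.
    static String errorFor(String input, int pos, String expected) {
        try {
            readExpectedString(input.getBytes(StandardCharsets.ISO_8859_1), pos, expected);
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Well-formed array element: "null" matches at index 3.
        String ok = errorFor("[1 null 2]", 3, "null");
        System.out.println(ok == null ? "matched" : ok);
        // Malformed token starting with 'n': the tokenizer commits to "null"
        // after seeing 'n', then fails on the following byte 'x'.
        System.out.println(errorFor("[1 nx 2]", 3, "null"));
    }
}
```

The second call prints "Expected string 'null' but missed at character 'u' at offset 4", which has the same shape as the reported error.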
  1.

    Java - Issue with data extraction from PDF (PDFBox - 2.02)

    Stack Overflow | 7 months ago | iCoder
    java.io.IOException: Expected string 'null' but missed at character 'u' at offset 6376
  2.

    I am trying to extract text from PDFs. Extracting text from the test file http://digitalcorpora.org/corp/nps/files/govdocs1/020/020747.pdf causes exceptions to be thrown. The first:

        Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Value is not an integer: 636121514401477526485946144
            at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:187)
            at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:194)
            at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:255)
            at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
            at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
            at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:455)
            at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:379)
            at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:335)
        Caused by: java.io.IOException: Value is not an integer: 636121514401477526485946144
            at org.apache.pdfbox.cos.COSNumber.get(COSNumber.java:104)
            at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:351)
            at org.apache.pdfbox.pdfparser.PDFStreamParser.access$000(PDFStreamParser.java:46)
            at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:182)

    Code that causes the above exception:

        PDFTextStripper ts = new PDFTextStripper();
        PrintWriter out = new PrintWriter(new FileWriter(new File("020747.txt")));
        PDDocument doc = PDDocument.load(new File("020747.pdf").toURI().toURL(), true);
        ts.setForceParsing(true);
        ts.writeText(doc, out);

    Using the following code causes a different exception until org.apache.pdfbox.baseParser.pushBackSize is increased (only tested with 1024768). After it is increased, I get basically the same exception as above:

        PrintWriter out = new PrintWriter(new FileWriter(new File("020747.txt")));
        PDFParser parser = new PDFParser(new FileInputStream(new File("020747.pdf")));
        parser.parse();
        PDFTextStripper ts = new PDFTextStripper();
        ts.setForceParsing(true);
        ts.writeText(parser.getPDDocument(), out);

    Apache's JIRA Issue Tracker | 3 years ago | William Palmer
    java.lang.RuntimeException: java.io.IOException: Unknown dir object c=')' cInt=41 peek=')' peekInt=41 16574
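The "Value is not an integer: 636121514401477526485946144" failure in the second report is easier to see with numbers. The token has 27 digits, while the largest signed 64-bit value, Long.MAX_VALUE = 9223372036854775807, has 19. The sketch below assumes the 1.x parser stores integer tokens as a Java long; the error suggests this, but it is an assumption here. Under that assumption, such a token cannot be represented. The sketch shows the overflow check and one tolerant policy, clamping to the long range. IntegerTokenDemo and its methods are hypothetical and are not PDFBox code. Clamping is also only one choice: a parser could instead reject the token or read it as a real number.

```java
import java.math.BigInteger;

// Hypothetical sketch (not PDFBox code): detecting integer tokens that
// overflow a Java long, and clamping them instead of failing.
final class IntegerTokenDemo {

    static final BigInteger LONG_MAX = BigInteger.valueOf(Long.MAX_VALUE);
    static final BigInteger LONG_MIN = BigInteger.valueOf(Long.MIN_VALUE);

    // True if the decimal token fits in a signed 64-bit long.
    static boolean fitsInLong(String token) {
        try {
            Long.parseLong(token);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Parses a decimal integer token of any length, clamping values outside
    // the long range to Long.MAX_VALUE or Long.MIN_VALUE.
    static long parseClamped(String token) {
        BigInteger value = new BigInteger(token);
        if (value.compareTo(LONG_MAX) > 0) {
            return Long.MAX_VALUE;
        }
        if (value.compareTo(LONG_MIN) < 0) {
            return Long.MIN_VALUE;
        }
        return value.longValue();
    }

    public static void main(String[] args) {
        String token = "636121514401477526485946144"; // value from the report above
        System.out.println(fitsInLong(token));   // false: 27 digits vs. 19 for Long.MAX_VALUE
        System.out.println(parseClamped(token)); // 9223372036854775807
        System.out.println(parseClamped("42"));  // 42
    }
}
```

Clamping keeps the tokenizer moving, at the cost of a wrong value for a token that in a damaged file is likely meaningless anyway.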


    Root Cause Analysis

    1. java.io.IOException

      Expected string 'null' but missed at character 'u' at offset 6376

      at org.apache.pdfbox.pdfparser.BaseParser.readExpectedString()
    2. Apache PDFBox
      PDFStreamParser.parseNextToken
      1. org.apache.pdfbox.pdfparser.BaseParser.readExpectedString(BaseParser.java:1017)
      2. org.apache.pdfbox.pdfparser.BaseParser.readExpectedString(BaseParser.java:1000)
      3. org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:879)
      4. org.apache.pdfbox.pdfparser.BaseParser.parseCOSArray(BaseParser.java:651)
      5. org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:175)
      5 frames
    3. org.apache.pdfbox
      PDFTextStripper.getText
      1. org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:479)
      2. org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:446)
      3. org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:149)
      4. org.apache.pdfbox.text.PDFTextStreamEngine.processPage(PDFTextStreamEngine.java:136)
      5. org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:391)
      6. org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)
      7. org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
      8. org.apache.pdfbox.text.PDFTextStripper.getText(PDFTextStripper.java:227)
      8 frames
    4. main
      Test.main
      1. main.Test.readPDF(Test.java:170)
      2. main.Test.main(Test.java:76)
      2 frames
    5. Java RT
      Method.invoke
      1. sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      2. sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
      3. sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      4. java.lang.reflect.Method.invoke(Method.java:498)
      4 frames
    6. IDEA
      AppMain.main
      1. com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
      1 frame
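The frames above show one malformed content stream aborting PDFTextStripper.getText for the whole document. One possible mitigation is to extract page by page and skip pages that fail. The sketch below separates that loop from PDFBox so it runs on its own; PageExtractor and PerPageExtraction are hypothetical names. With PDFBox 2.0, an extractor could set a PDFTextStripper's setStartPage(n) and setEndPage(n) to the same page and call getText(document), taking the page count from PDDocument.getNumberOfPages(). This has not been tried against the file in question.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of per-page fault isolation for text extraction.
// PageExtractor is an illustrative interface, not part of PDFBox.
final class PerPageExtraction {

    // Extracts the text of one 1-based page; may throw on a malformed page.
    interface PageExtractor {
        String extract(int pageNumber) throws Exception;
    }

    // Collected text plus the numbers of the pages that failed.
    static final class Result {
        final StringBuilder text = new StringBuilder();
        final List<Integer> failedPages = new ArrayList<>();
    }

    static Result extractAll(int pageCount, PageExtractor extractor) {
        Result result = new Result();
        for (int page = 1; page <= pageCount; page++) {
            try {
                result.text.append(extractor.extract(page));
            } catch (Exception e) {
                // Record the failure and continue instead of losing the whole document.
                result.failedPages.add(page);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Simulated three-page document in which page 2 has a broken content stream.
        PageExtractor fake = page -> {
            if (page == 2) {
                throw new IOException("Expected string 'null' but missed at character 'u'");
            }
            return "page " + page + "\n";
        };
        Result r = extractAll(3, fake);
        System.out.print(r.text);                           // page 1, page 3
        System.out.println("failed pages: " + r.failedPages); // failed pages: [2]
    }
}
```

The loop catches Exception rather than only IOException. The 1.x trace in the second report shows that parse errors can surface wrapped in a RuntimeException.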