java.io.IOException


  • I am trying to extract text from PDFs. Extracting text from the test file http://digitalcorpora.org/corp/nps/files/govdocs1/020/020747.pdf throws exceptions. The first is:

        Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Value is not an integer: 636121514401477526485946144
            at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:187)
            at org.apache.pdfbox.pdfparser.PDFStreamParser$1.hasNext(PDFStreamParser.java:194)
            at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:255)
            at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
            at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
            at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:455)
            at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:379)
            at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:335)
        Caused by: java.io.IOException: Value is not an integer: 636121514401477526485946144
            at org.apache.pdfbox.cos.COSNumber.get(COSNumber.java:104)
            at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:351)
            at org.apache.pdfbox.pdfparser.PDFStreamParser.access$000(PDFStreamParser.java:46)
            at org.apache.pdfbox.pdfparser.PDFStreamParser$1.tryNext(PDFStreamParser.java:182)

    Code that causes the above exception:

        PDFTextStripper ts = new PDFTextStripper();
        PrintWriter out = new PrintWriter(new FileWriter(new File("020747.txt")));
        PDDocument doc = PDDocument.load(new File("020747.pdf").toURI().toURL(), true);
        ts.setForceParsing(true);
        ts.writeText(doc, out);
        out.close();
        doc.close();

    The following code throws a different exception until org.apache.pdfbox.baseParser.pushBackSize is increased (I only tested the value 1024768). After increasing it, I get essentially the same exception as above:

        PrintWriter out = new PrintWriter(new FileWriter(new File("020747.txt")));
        PDFParser parser = new PDFParser(new FileInputStream(new File("020747.pdf")));
        parser.parse();
        PDFTextStripper ts = new PDFTextStripper();
        ts.setForceParsing(true);
        ts.writeText(parser.getPDDocument(), out);
        out.close();
        parser.getPDDocument().close();
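The first trace fails in COSNumber.get, apparently because the digit-only token 636121514401477526485946144 is too large for the integer type the parser uses (it exceeds even a 64-bit Java long). The stdlib-only Java sketch below illustrates one possible lenient policy, assuming the goal is to keep extracting text rather than abort the page: an all-digit token that overflows is clamped to Long.MIN_VALUE or Long.MAX_VALUE, while a token that is not numeric at all still raises an IOException. The class LenientIntegerToken and its parse method are hypothetical names for illustration, not part of PDFBox.

```java
import java.io.IOException;

/**
 * Hypothetical lenient parser for integer tokens (NOT the PDFBox API).
 * Overflowing digit strings are clamped instead of aborting processing.
 */
public final class LenientIntegerToken {

    private LenientIntegerToken() {}

    public static long parse(String token) throws IOException {
        if (token == null || token.isEmpty()) {
            throw new IOException("Value is not an integer: " + token);
        }
        try {
            // Long.parseLong accepts an optional leading '+' or '-'.
            return Long.parseLong(token);
        } catch (NumberFormatException e) {
            boolean negative = token.charAt(0) == '-';
            boolean signed = negative || token.charAt(0) == '+';
            String digits = signed ? token.substring(1) : token;
            // Only clamp when the token really is a (too long) ASCII digit string.
            boolean allDigits = !digits.isEmpty()
                    && digits.chars().allMatch(c -> c >= '0' && c <= '9');
            if (allDigits) {
                return negative ? Long.MIN_VALUE : Long.MAX_VALUE;
            }
            throw new IOException("Value is not an integer: " + token, e);
        }
    }

    public static void main(String[] args) throws IOException {
        // The token from the first trace overflows a 64-bit long and is clamped.
        System.out.println(parse("636121514401477526485946144")); // prints 9223372036854775807
        System.out.println(parse("-42"));                          // prints -42
    }
}
```

Clamping keeps the stream parseable, but the resulting number is not the value in the file, so it is only reasonable when the operator consuming it can tolerate a nonsensical operand.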
    Reported by William Palmer.
    • java.io.IOException: Expected string 'null' but missed at character 'u' at offset 6376

          java.io.IOException: Expected string 'null' but missed at character 'u' at offset 6376
              at org.apache.pdfbox.pdfparser.BaseParser.readExpectedString(BaseParser.java:1017)
              at org.apache.pdfbox.pdfparser.BaseParser.readExpectedString(BaseParser.java:1000)
              at org.apache.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:879)
              at org.apache.pdfbox.pdfparser.BaseParser.parseCOSArray(BaseParser.java:651)
              at org.apache.pdfbox.pdfparser.PDFStreamParser.parseNextToken(PDFStreamParser.java:175)
              at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:479)
              at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:446)
              at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:149)
              at org.apache.pdfbox.text.PDFTextStreamEngine.processPage(PDFTextStreamEngine.java:136)
              at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:391)
              at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:319)
              at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:266)
              at org.apache.pdfbox.text.PDFTextStripper.getText(PDFTextStripper.java:227)
              at main.Test.readPDF(Test.java:170)
              at main.Test.main(Test.java:76)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:498)
              at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
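In the second trace, parseDirObject is reading an element of an array in a content stream, sees an 'n', and expects the keyword null. Reading the message as naming the expected character that failed to match, "missed at character 'u'" means the character after 'n' was something other than 'u' (for example, a stray token beginning with 'n'). The stdlib-only Java sketch below is a simplified, hypothetical re-creation of such a keyword check, written only to make the message easier to read; ExpectedKeywordCheck and readExpected are illustrative names, not PDFBox code, and the offset here is simply the index just past the failed read in a toy string.

```java
import java.io.IOException;

/**
 * Hypothetical "expected keyword" check (NOT PDFBox code), used to show
 * how a message like "Expected string 'null' but missed at character 'u'" arises.
 */
public final class ExpectedKeywordCheck {

    private ExpectedKeywordCheck() {}

    /**
     * Matches {@code expected} against {@code input} starting at {@code start}.
     * Returns the offset just past the keyword. On the first mismatch it reports
     * the EXPECTED character that was missed and the offset after the bad read.
     */
    public static int readExpected(String input, int start, String expected) throws IOException {
        int pos = start;
        for (char c : expected.toCharArray()) {
            int actual = pos < input.length() ? input.charAt(pos) : -1; // -1 = end of input
            pos++;
            if (actual != c) {
                throw new IOException("Expected string '" + expected
                        + "' but missed at character '" + c + "' at offset " + pos);
            }
        }
        return pos;
    }

    public static void main(String[] args) {
        // An array element that starts with 'n' but is not the keyword null.
        String content = "[ 1 n 2 ]";
        try {
            readExpected(content, 4, "null");
        } catch (IOException e) {
            // prints: Expected string 'null' but missed at character 'u' at offset 6
            System.out.println(e.getMessage());
        }
    }
}
```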
