java.lang.ClassCastException: org.apache.pdfbox.cos.COSString cannot be cast to org.apache.pdfbox.cos.COSName

Atlassian JIRA | Rodrigo Girardi Adami [Atlassian] | 1 year ago
  1. 0

    Confluence is throwing this error message in the logs:

    {code}
    2015-06-11 08:24:18,444 WARN [Indexer: 4] [apache.pdfbox.cos.COSDocument] getObjectsByType java.lang.ClassCastException: org.apache.pdfbox.cos.COSString cannot be cast to org.apache.pdfbox.cos.COSName
     - referer: http://URL/admin/search-indexes.action | url: /admin/reindex.action | userName:user | action: reindex
    java.lang.ClassCastException: org.apache.pdfbox.cos.COSString cannot be cast to org.apache.pdfbox.cos.COSName
        at org.apache.pdfbox.cos.COSDocument.getObjectsByType(COSDocument.java:294)
        at org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:656)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:244)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1219)
        at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:59)
        at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:41)
    {code}

    The same bug also seems to make the indexer run out of memory:

    {code}
     - referer: http://URL/admin/search-indexes.action | url: /admin/reindex.action | userName: user | action: reindex
    java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Unknown Source)
        at java.io.ByteArrayOutputStream.grow(Unknown Source)
        at java.io.ByteArrayOutputStream.ensureCapacity(Unknown Source)
        at java.io.ByteArrayOutputStream.write(Unknown Source)
        at org.apache.pdfbox.filter.FlateFilter.decompress(FlateFilter.java:172)
        at org.apache.pdfbox.filter.FlateFilter.decode(FlateFilter.java:98)
        at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:308)
        at org.apache.pdfbox.cos.COSStream.doDecode(COSStream.java:248)
        at org.apache.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:183)
        at org.apache.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:107)
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:251)
        at org.apache.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:235)
        at org.apache.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:215)
        at org.apache.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:456)
        at org.apache.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:381)
        at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:340)
    {code}

    This is caused by a bug in PDFBox, described here: https://issues.apache.org/jira/browse/PDFBOX-1756

    Confluence 5.7.3, 5.8.2 and 5.8.4 ship version 1.8.4 of pdfbox, which is affected by the bug.

    h3. Workaround

    1) Disable the indexing of PDF attachments using [this guide|https://confluence.atlassian.com/x/gYCIAw]

    OR

    2) Update pdfbox manually in the Confluence_install\confluence\WEB-INF\lib folder by replacing the original pdfbox JAR with version [1.8.6|http://archive.apache.org/dist/pdfbox/1.8.6/pdfbox-1.8.6.jar] or newer. Newer versions can be downloaded [here|http://archive.apache.org/dist/pdfbox/].
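Before applying workaround 2, it can help to confirm which pdfbox JAR an installation actually ships. The following is a minimal sketch, not part of Confluence or PDFBox: the class `PdfBoxVersionCheck` and the method `needsUpgrade` are hypothetical names, and the code assumes the JAR follows the `pdfbox-<major>.<minor>.<patch>.jar` naming seen in the download link. It treats anything older than 1.8.6 as a candidate for replacement, following the workaround's "1.8.6 or newer" recommendation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not a Confluence or PDFBox API. */
final class PdfBoxVersionCheck {
    // Assumed naming convention, e.g. "pdfbox-1.8.4.jar".
    private static final Pattern JAR_NAME =
            Pattern.compile("pdfbox-(\\d+)\\.(\\d+)\\.(\\d+)\\.jar");

    // Oldest version the workaround recommends installing.
    private static final int[] MINIMUM = {1, 8, 6};

    /** Returns true if the JAR's version is older than 1.8.6. */
    static boolean needsUpgrade(String jarFileName) {
        Matcher m = JAR_NAME.matcher(jarFileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a pdfbox jar name: " + jarFileName);
        }
        // Compare major, minor, patch in order; the first difference decides.
        for (int i = 0; i < MINIMUM.length; i++) {
            int part = Integer.parseInt(m.group(i + 1));
            if (part != MINIMUM[i]) {
                return part < MINIMUM[i];
            }
        }
        return false; // exactly 1.8.6
    }

    public static void main(String[] args) {
        // Pass one or more JAR file names found in WEB-INF\lib.
        for (String name : args) {
            System.out.println(name + " -> " + (needsUpgrade(name) ? "replace" : "ok"));
        }
    }
}
```

Running it with `pdfbox-1.8.4.jar` as the argument flags the JAR for replacement. The check only reads the file name, so a renamed JAR would need to be inspected another way.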

    Atlassian JIRA | 1 year ago | Rodrigo Girardi Adami [Atlassian]
    java.lang.ClassCastException: org.apache.pdfbox.cos.COSString cannot be cast to org.apache.pdfbox.cos.COSName
  3. 0

    Tika 1.2 PDF parse error - org.apache.pdfbox.cos.COSString cannot be cast to org.apache.pdfbox.cos.COSDictionary

    Stack Overflow | 4 years ago | Phani Kumar
    org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@1fbfd6
  5. 0

    RE: Tika 1.2 PDF parse error - org.apache.pdfbox.cos.COSString cannot be cast to org.apache.pdfbox.cos.COSDictionary

    tika-user | 4 years ago | Phani Kumar Samudrala
    org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@2a15cd
  6. 0

    RE: Tika 1.2 PDF parse error - org.apache.pdfbox.cos.COSString cannot be cast to org.apache.pdfbox.cos.COSDictionary

    tika-user | 4 years ago | Markus Jelsma
    org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@1fbfd6

    Root Cause Analysis

    1. java.lang.ClassCastException

      org.apache.pdfbox.cos.COSString cannot be cast to org.apache.pdfbox.cos.COSName

      at org.apache.pdfbox.cos.COSDocument.getObjectsByType()
    2. Apache PDFBox
      PDDocument.load
      1. org.apache.pdfbox.cos.COSDocument.getObjectsByType(COSDocument.java:294)
      2. org.apache.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:656)
      3. org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:244)
      4. org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1219)
    3. com.atlassian.bonnie
      BaseAttachmentContentExtractor.addFields
      1. com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:59)
      2. com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:41)