com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document

Atlassian JIRA | Lauretha Rura [Atlassian] | 1 year ago
  1. 0

    [CONF-38552] Bug in the PDFBox Plugin Causes Blockage during Content Reindexing - Atlassian JIRA

    atlassian.com | 1 year ago
    com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document
  2. 0

    h5. Summary
    [Bug - PDFBOX-2522|https://issues.apache.org/jira/browse/PDFBOX-2522] in the PDFBox plugin causes blockage during content reindexing. Confluence 5.7.x and 5.8.x, which ship with PDFBox version 1.8.4, are affected by the bug.

    h5. Steps to Reproduce
    # Add the problematic .pdf attachment to your Confluence page.
    # Try to manually [rebuild the contents index|https://confluence.atlassian.com/display/DOC/Content+Index+Administration#ContentIndexAdministration-Rebuildingthesearchindex] of your Confluence instance.
    # Notice that during content reindexing the following warning is written to the {{<Confluence-Home>/logs/atlassian-confluence.log}} file, after which the indexing process is blocked:
    {noformat}
    2015-07-14 16:44:49,004 WARN [scheduler_Worker-9] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: xxx.pdf v.1 (22544563))
    com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document
        at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:96)
        at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:41)
        at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:40)
        at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.extractWithLuceneExtractors(ConfluenceDocumentBuilder.java:155)
        at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:102)
        at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:49)
        at com.atlassian.confluence.search.lucene.tasks.UpdateDocumentIndexTask.perform(UpdateDocumentIndexTask.java:43)
        at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager$IndexTaskWriter.apply(DefaultConfluenceIndexManager.java:473)
        at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager$IndexTaskWriter.apply(DefaultConfluenceIndexManager.java:457)
        ...
    Caused by: java.io.IOException: javax.crypto.IllegalBlockSizeException: Input length must be multiple of 16 when decrypting with padded cipher
        at javax.crypto.CipherInputStream.getMoreData(CipherInputStream.java:115)
        at javax.crypto.CipherInputStream.read(CipherInputStream.java:233)
        at javax.crypto.CipherInputStream.read(CipherInputStream.java:209)
        at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.encryptData(SecurityHandler.java:312)
        at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.decryptStream(SecurityHandler.java:413)
        at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.decrypt(SecurityHandler.java:386)
        at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.decryptObject(SecurityHandler.java:361)
        at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.proceedDecryption(SecurityHandler.java:192)
        at org.apache.pdfbox.pdmodel.encryption.StandardSecurityHandler.decryptDocument(StandardSecurityHandler.java:158)
        at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1600)
        at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:946)
        at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:64)
        ... 42 more
    Caused by: javax.crypto.IllegalBlockSizeException: Input length must be multiple of 16 when decrypting with padded cipher
        at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:913)
        at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:824)
        at com.sun.crypto.provider.AESCipher.engineDoFinal(AESCipher.java:436)
        at javax.crypto.Cipher.doFinal(Cipher.java:2048)
        at javax.crypto.CipherInputStream.getMoreData(CipherInputStream.java:112)
    {noformat}

    h5. Workaround
    # Disable the indexing of PDF attachments using [this guide|https://confluence.atlassian.com/x/gYCIAw]
    OR
    # Update the PDFBox plugin manually in the {{<Confluence-Installation>/confluence/WEB-INF/lib}} folder by replacing the original PDFBox plugin with version [1.8.8|http://archive.apache.org/dist/pdfbox/1.8.6/pdfbox-1.8.6.jar] or newer. Download the newer version [here|http://archive.apache.org/dist/pdfbox/].

    Atlassian JIRA | 1 year ago | Lauretha Rura [Atlassian]
    com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document
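The innermost cause in the trace above, `IllegalBlockSizeException`, can be reproduced with the JDK alone. The following is a minimal sketch, not the PDFBox code path: it assumes the failure amounts to handing a padded AES cipher an encrypted stream whose length is not a multiple of the 16-byte block size. The class name `BlockSizeDemo`, its method names, and the all-zero key and IV are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.IllegalBlockSizeException;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

public class BlockSizeDemo {
    // Demo-only key and IV (all zeros); hypothetical values, never use fixed ones in real code.
    private static final SecretKeySpec KEY = new SecretKeySpec(new byte[16], "AES");
    private static final IvParameterSpec IV = new IvParameterSpec(new byte[16]);

    /** Runs AES/CBC with PKCS5 padding in the given mode over the whole input. */
    public static byte[] run(int mode, byte[] data) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(mode, KEY, IV);
        return cipher.doFinal(data);
    }

    /** Encrypts a short message and returns the ciphertext with its last byte dropped. */
    public static byte[] truncatedCiphertext() {
        try {
            byte[] full = run(Cipher.ENCRYPT_MODE, "hello".getBytes(StandardCharsets.UTF_8));
            return Arrays.copyOf(full, full.length - 1);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns true only if decryption fails with IllegalBlockSizeException. */
    public static boolean rejectsAsBadBlockSize(byte[] data) {
        try {
            run(Cipher.DECRYPT_MODE, data);
            return false;
        } catch (IllegalBlockSizeException e) {
            return true;
        } catch (GeneralSecurityException e) {
            return false; // a different failure, for example bad padding
        }
    }

    public static void main(String[] args) {
        byte[] damaged = truncatedCiphertext();
        System.out.println("ciphertext length: " + damaged.length);
        System.out.println("rejected for block size: " + rejectsAsBadBlockSize(damaged));
    }
}
```

Encrypting five bytes with padding yields one full 16-byte block, so dropping a single byte is enough to leave an input that the padded cipher refuses to decrypt.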
  5. 0

    [CONF-18962] Some pdf files don't get correctly indexed - Atlassian JIRA

    atlassian.com | 8 months ago
    com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document

    Root Cause Analysis

    1. com.atlassian.bonnie.search.extractor.ExtractorException

      Error getting content of PDF document

      at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText()
    2. com.atlassian.bonnie
      BaseAttachmentContentExtractor.addFields
      1. com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:96)
      2. com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:41)
      2 frames
    3. com.atlassian.confluence
      DefaultConfluenceIndexManager$IndexTaskWriter.apply
      1. com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:40)
      2. com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.extractWithLuceneExtractors(ConfluenceDocumentBuilder.java:155)
      3. com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:102)
      4. com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:49)
      5. com.atlassian.confluence.search.lucene.tasks.UpdateDocumentIndexTask.perform(UpdateDocumentIndexTask.java:43)
      6. com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager$IndexTaskWriter.apply(DefaultConfluenceIndexManager.java:473)
      7. com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager$IndexTaskWriter.apply(DefaultConfluenceIndexManager.java:457)
      7 frames
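
The frames listed above belong only to the outermost `ExtractorException`; the log entry quoted earlier on this page shows two further `Caused by:` levels beneath it. As a rough sketch of how such a chain can be unwound in code, the helper below follows `getCause()` to the innermost exception. `RootCauseDemo`, `rootCause`, and `sampleChain` are hypothetical names, and a plain `RuntimeException` stands in for the Atlassian `ExtractorException`, which is not on the classpath here.

```java
import java.io.IOException;
import javax.crypto.IllegalBlockSizeException;

public class RootCauseDemo {
    /** Follows getCause() links until the innermost exception is reached. */
    public static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    /** Builds a three-level chain shaped like the one in the log, using stand-in types. */
    public static Throwable sampleChain() {
        Exception inner = new IllegalBlockSizeException(
                "Input length must be multiple of 16 when decrypting with padded cipher");
        Exception middle = new IOException(inner);
        // Stand-in for com.atlassian.bonnie.search.extractor.ExtractorException.
        return new RuntimeException("Error getting content of PDF document", middle);
    }

    public static void main(String[] args) {
        Throwable root = rootCause(sampleChain());
        System.out.println(root.getClass().getName() + ": " + root.getMessage());
    }
}
```

Logging the root cause next to the wrapper message makes it easier to tell an encryption failure like this one apart from other reasons a PDF might fail to extract.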