com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document

Atlassian JIRA | Lauretha Rura [Atlassian] | 2 years ago
  1. 0

    [CONF-38552] Bug in the PDFBox Plugin Causes Blockage during Content Reindexing - Atlassian JIRA

    atlassian.com | 2 years ago
    com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document
  2. 0

    h5. Summary
    [Bug - PDFBOX-2522|https://issues.apache.org/jira/browse/PDFBOX-2522] in the PDFBox plugin blocks content reindexing. Confluence 5.7.x and 5.8.x, which ship with PDFBox version 1.8.4, are affected by the bug.

    h5. Steps to Reproduce
    # Add the problematic .pdf attachment to your Confluence page.
    # Manually [rebuild the content index|https://confluence.atlassian.com/display/DOC/Content+Index+Administration#ContentIndexAdministration-Rebuildingthesearchindex] of your Confluence instance.
    # Notice that during content reindexing the following warning is logged in the {{<Confluence-Home>/logs/atlassian-confluence.log}} file, and the indexing process is then blocked:
    {noformat}
    2015-07-14 16:44:49,004 WARN [scheduler_Worker-9] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: xxx.pdf v.1 (22544563))
    com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document
        at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:96)
        at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:41)
        at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:40)
        at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.extractWithLuceneExtractors(ConfluenceDocumentBuilder.java:155)
        at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:102)
        at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:49)
        at com.atlassian.confluence.search.lucene.tasks.UpdateDocumentIndexTask.perform(UpdateDocumentIndexTask.java:43)
        at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager$IndexTaskWriter.apply(DefaultConfluenceIndexManager.java:473)
        at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager$IndexTaskWriter.apply(DefaultConfluenceIndexManager.java:457)
        ...
    Caused by: java.io.IOException: javax.crypto.IllegalBlockSizeException: Input length must be multiple of 16 when decrypting with padded cipher
        at javax.crypto.CipherInputStream.getMoreData(CipherInputStream.java:115)
        at javax.crypto.CipherInputStream.read(CipherInputStream.java:233)
        at javax.crypto.CipherInputStream.read(CipherInputStream.java:209)
        at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.encryptData(SecurityHandler.java:312)
        at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.decryptStream(SecurityHandler.java:413)
        at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.decrypt(SecurityHandler.java:386)
        at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.decryptObject(SecurityHandler.java:361)
        at org.apache.pdfbox.pdmodel.encryption.SecurityHandler.proceedDecryption(SecurityHandler.java:192)
        at org.apache.pdfbox.pdmodel.encryption.StandardSecurityHandler.decryptDocument(StandardSecurityHandler.java:158)
        at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1600)
        at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:946)
        at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:64)
        ... 42 more
    Caused by: javax.crypto.IllegalBlockSizeException: Input length must be multiple of 16 when decrypting with padded cipher
        at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:913)
        at com.sun.crypto.provider.CipherCore.doFinal(CipherCore.java:824)
        at com.sun.crypto.provider.AESCipher.engineDoFinal(AESCipher.java:436)
        at javax.crypto.Cipher.doFinal(Cipher.java:2048)
        at javax.crypto.CipherInputStream.getMoreData(CipherInputStream.java:112)
    {noformat}

    h5. Workaround
    # Disable the indexing of PDF attachments using [this guide|https://confluence.atlassian.com/x/gYCIAw]
    OR
    # Update the PDFBox plugin manually in the {{<Confluence-Installation>/confluence/WEB-INF/lib}} folder by replacing the original PDFBox plugin with a version [1.8.8|http://archive.apache.org/dist/pdfbox/1.8.6/pdfbox-1.8.6.jar] or newer. Download the newer version [here|http://archive.apache.org/dist/pdfbox/].

    Atlassian JIRA | 2 years ago | Lauretha Rura [Atlassian]
    com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document
  3. 0

    [CONF-18962] Some pdf files don't get correctly indexed - Atlassian JIRA

    atlassian.com | 1 year ago
    com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document
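
For the second workaround quoted in CONF-38552 above (replacing the PDFBox jar in {{WEB-INF/lib}}), a small helper can tell whether a given jar file name is older than the version you are targeting. This is only a sketch: the class and method names are hypothetical, it assumes the jar follows the {{pdfbox-<version>.jar}} naming seen in the download link, and the minimum version is a parameter rather than a constant because the workaround text names 1.8.8 while its link points at a 1.8.6 jar.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Confluence or PDFBox.
public class PdfBoxJarCheck {
    // Assumed naming scheme: pdfbox-<dotted numeric version>.jar
    private static final Pattern JAR = Pattern.compile("pdfbox-(\\d+(?:\\.\\d+)*)\\.jar");

    /** Returns the version embedded in a name such as pdfbox-1.8.4.jar, or null if the name does not match. */
    static String versionOf(String jarFileName) {
        Matcher m = JAR.matcher(jarFileName);
        return m.matches() ? m.group(1) : null;
    }

    /** Compares dotted numeric versions component by component; missing components count as 0. */
    static int compareVersions(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    /** True when the file is a PDFBox jar whose version is below minimumVersion. */
    static boolean needsUpgrade(String jarFileName, String minimumVersion) {
        String version = versionOf(jarFileName);
        return version != null && compareVersions(version, minimumVersion) < 0;
    }

    public static void main(String[] args) {
        // The default minimum is an assumption taken from the workaround's link text.
        String minimum = args.length > 0 ? args[0] : "1.8.8";
        System.out.println("pdfbox-1.8.4.jar needs upgrade: " + needsUpgrade("pdfbox-1.8.4.jar", minimum));
    }
}
```

Versions are compared numerically per component rather than as strings, so a name such as {{pdfbox-1.8.10.jar}} is treated as newer than 1.8.8.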

    Root Cause Analysis

    1. com.atlassian.bonnie.search.extractor.ExtractorException

      Error getting content of PDF document

      at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText()
    2. com.atlassian.bonnie
      BaseAttachmentContentExtractor.addFields
      1. com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:96)
      2. com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:41)
      2 frames
    3. com.atlassian.confluence
      DefaultConfluenceIndexManager$IndexTaskWriter.apply
      1. com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:40)
      2. com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.extractWithLuceneExtractors(ConfluenceDocumentBuilder.java:155)
      3. com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:102)
      4. com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:49)
      5. com.atlassian.confluence.search.lucene.tasks.UpdateDocumentIndexTask.perform(UpdateDocumentIndexTask.java:43)
      6. com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager$IndexTaskWriter.apply(DefaultConfluenceIndexManager.java:473)
      7. com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager$IndexTaskWriter.apply(DefaultConfluenceIndexManager.java:457)
      7 frames
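
The innermost cause in the traces quoted above is {{javax.crypto.IllegalBlockSizeException}}, raised by {{Cipher.doFinal}} while decrypting with a padded AES cipher. The sketch below reproduces that exception type in isolation with the standard JCE API. It does not use PDFBox or Confluence, the class name, all-zero key and sample text are made up for the demonstration, and it makes no claim about what is actually wrong with the affected PDF files; it only shows that ciphertext whose length is not a multiple of the 16-byte AES block size triggers this exception.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.IllegalBlockSizeException;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical demo class, unrelated to the PDFBox code in the trace.
public class BlockSizeDemo {
    // All-zero key and IV: acceptable only for a throwaway demonstration.
    private static final SecretKeySpec KEY = new SecretKeySpec(new byte[16], "AES");
    private static final IvParameterSpec IV = new IvParameterSpec(new byte[16]);

    /** Encrypts with AES/CBC and PKCS5 padding, so the output length is a multiple of 16. */
    static byte[] encrypt(byte[] plain) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, KEY, IV);
        return cipher.doFinal(plain);
    }

    /** Returns true if decrypting the given bytes throws IllegalBlockSizeException. */
    static boolean failsWithBlockSizeError(byte[] cipherText) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.DECRYPT_MODE, KEY, IV);
        try {
            cipher.doFinal(cipherText);
            return false;
        } catch (IllegalBlockSizeException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] full = encrypt("hello".getBytes(StandardCharsets.UTF_8));
        // Dropping one byte leaves 15 bytes, which is no longer a whole AES block.
        byte[] truncated = Arrays.copyOf(full, full.length - 1);
        System.out.println("full ciphertext fails: " + failsWithBlockSizeError(full));
        System.out.println("truncated ciphertext fails: " + failsWithBlockSizeError(truncated));
    }
}
```

Only the exception type is checked here, because the exact message text is produced by the JCE provider.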