com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document

Atlassian JIRA | Tim Wong [Atlassian] | 7 years ago
  1. 0

    Sample pdf attached to ticket. {code} 2010-05-27 11:13:00,410 WARN [DefaultQuartzScheduler_Worker-1] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: Project Milestones as of 6-25.pdf v.1 (786433) admin) com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:66) at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40) at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:45) at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104) at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:102) at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43) at com.atlassian.confluence.search.lucene.tasks.UpdateDocumentIndexTask.perform(UpdateDocumentIndexTask.java:40) at com.atlassian.confluence.search.lucene.tasks.BulkWriteIndexTask.perform(BulkWriteIndexTask.java:44) at com.atlassian.bonnie.LuceneConnection.withWriter(LuceneConnection.java:331) at com.atlassian.confluence.search.lucene.tasks.LuceneConnectionBackedIndexTaskPerformer.perform(LuceneConnectionBackedIndexTaskPerformer.java:20) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager$BatchUpdateAction.perform(DefaultConfluenceIndexManager.java:344) at com.atlassian.bonnie.LuceneConnection.withBatchUpdate(LuceneConnection.java:405) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.processTasks(DefaultConfluenceIndexManager.java:143) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.flushQueue(DefaultConfluenceIndexManager.java:109) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:304) at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149) at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204) at $Proxy36.flushQueue(Unknown Source) at com.atlassian.confluence.search.lucene.IndexQueueFlusher.executeJob(IndexQueueFlusher.java:29) at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.surroundJobExecutionWithLogging(AbstractClusterAwareQuartzJobBean.java:63) at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.executeInternal(AbstractClusterAwareQuartzJobBean.java:46) at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86) at org.quartz.core.JobRunShell.run(JobRunShell.java:202) at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:525) Caused by: java.lang.NullPointerException at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194) at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182) at org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:162) at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:220) at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:50) ... 30 more (END) {code}

    Atlassian JIRA | 7 years ago | Tim Wong [Atlassian]
    com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document
  2. 0

    Sample pdf attached to ticket. {code} 2010-05-27 11:13:00,410 WARN [DefaultQuartzScheduler_Worker-1] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: Project Milestones as of 6-25.pdf v.1 (786433) admin) com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:66) at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40) at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:45) at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104) at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:102) at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43) at com.atlassian.confluence.search.lucene.tasks.UpdateDocumentIndexTask.perform(UpdateDocumentIndexTask.java:40) at com.atlassian.confluence.search.lucene.tasks.BulkWriteIndexTask.perform(BulkWriteIndexTask.java:44) at com.atlassian.bonnie.LuceneConnection.withWriter(LuceneConnection.java:331) at com.atlassian.confluence.search.lucene.tasks.LuceneConnectionBackedIndexTaskPerformer.perform(LuceneConnectionBackedIndexTaskPerformer.java:20) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager$BatchUpdateAction.perform(DefaultConfluenceIndexManager.java:344) at com.atlassian.bonnie.LuceneConnection.withBatchUpdate(LuceneConnection.java:405) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.processTasks(DefaultConfluenceIndexManager.java:143) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.flushQueue(DefaultConfluenceIndexManager.java:109) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:304) at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149) at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204) at $Proxy36.flushQueue(Unknown Source) at com.atlassian.confluence.search.lucene.IndexQueueFlusher.executeJob(IndexQueueFlusher.java:29) at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.surroundJobExecutionWithLogging(AbstractClusterAwareQuartzJobBean.java:63) at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.executeInternal(AbstractClusterAwareQuartzJobBean.java:46) at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86) at org.quartz.core.JobRunShell.run(JobRunShell.java:202) at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:525) Caused by: java.lang.NullPointerException at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194) at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182) at org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:162) at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:220) at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:50) ... 30 more (END) {code}

    Atlassian JIRA | 7 years ago | Tim Wong [Atlassian]
    com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document
  3. 0

    [CONF-18962] Some pdf files don't get correctly indexed - Atlassian JIRA

    atlassian.com | 8 months ago
    com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document
  4. Speed up your debug routine!

    Automated exception search integrated into your IDE

  5. 0

    My site's content index is only partially built, resulting in missing pages in search results. I see http://jira.atlassian.com/browse/CONF-18452 has been filed to fix the failure to completely index when there's a problem with a particular page, but I also wanted to file bugs about the underlying issues. This issue is a problem indexing a particular .pdf document: 2010-02-22 11:10:43,006 WARN [Indexer: 9] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: Wii_Progr amming_Guidelines.pdf v.1 (5341238) jlokey) -- url: /confluence/admin/reindex.action | userName: moise | referer: https://qix.demiurgestudios.com/confluence/admin/search-indexes.action | action: reind ex com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:65) at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:39) at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:43) at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104) at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:102) at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:41) at com.atlassian.bonnie.index.TempIndexWriter.perform(TempIndexWriter.java:72) at com.atlassian.confluence.search.lucene.TempIndexWriterStrategy.perform(TempIndexWriterStrategy.java:43) at com.atlassian.confluence.search.lucene.tasks.TempIndexBackedIndexTaskPerformer.perform(TempIndexBackedIndexTaskPerformer.java:21) at com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker.indexCollection(DefaultObjectQueueWorker.java:73) at com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker$1.doInTransactionWithoutResult(DefaultObjectQueueWorker.java:61) at org.springframework.transaction.support.TransactionCallbackWithoutResult.doInTransaction(TransactionCallbackWithoutResult.java:33) at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:127) at com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker.run(DefaultObjectQueueWorker.java:50) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675) at java.lang.Thread.run(Thread.java:595) Caused by: java.lang.NullPointerException at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194) at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182) at org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:162) at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:220) at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:49) ... 16 more

    Atlassian JIRA | 7 years ago | Andrew Moise
    com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document
  6. 0

    My site's content index is only partially built, resulting in missing pages in search results. I see http://jira.atlassian.com/browse/CONF-18452 has been filed to fix the failure to completely index when there's a problem with a particular page, but I also wanted to file bugs about the underlying issues. This issue is a problem indexing a particular .pdf document: 2010-02-22 11:10:43,006 WARN [Indexer: 9] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: Wii_Progr amming_Guidelines.pdf v.1 (5341238) jlokey) -- url: /confluence/admin/reindex.action | userName: moise | referer: https://qix.demiurgestudios.com/confluence/admin/search-indexes.action | action: reind ex com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:65) at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:39) at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:43) at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104) at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:102) at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:41) at com.atlassian.bonnie.index.TempIndexWriter.perform(TempIndexWriter.java:72) at com.atlassian.confluence.search.lucene.TempIndexWriterStrategy.perform(TempIndexWriterStrategy.java:43) at com.atlassian.confluence.search.lucene.tasks.TempIndexBackedIndexTaskPerformer.perform(TempIndexBackedIndexTaskPerformer.java:21) at com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker.indexCollection(DefaultObjectQueueWorker.java:73) at com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker$1.doInTransactionWithoutResult(DefaultObjectQueueWorker.java:61) at org.springframework.transaction.support.TransactionCallbackWithoutResult.doInTransaction(TransactionCallbackWithoutResult.java:33) at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:127) at com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker.run(DefaultObjectQueueWorker.java:50) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675) at java.lang.Thread.run(Thread.java:595) Caused by: java.lang.NullPointerException at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194) at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182) at org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:162) at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:220) at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:49) ... 16 more

    Atlassian JIRA | 7 years ago | Andrew Moise
    com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document

    Not finding the right solution?
    Take a tour to get the most out of Samebug.

    Tired of useless tips?

    Automated exception search integrated into your IDE

    Root Cause Analysis

    1. java.lang.NullPointerException

      No message provided

      at org.pdfbox.pdmodel.PDPageNode.getAllKids()
    2. PDFBox - Java PDF Library
      PDFTextStripper.writeText
      1. org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194)
      2. org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182)
      3. org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:162)
      4. org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:220)
      4 frames
    3. com.atlassian.bonnie
      BaseAttachmentContentExtractor.addFields
      1. com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:50)
      2. com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40)
      2 frames
    4. com.atlassian.confluence
      ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields
      1. com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:45)
      1 frame
    5. com.atlassian.bonnie
      BaseDocumentBuilder.getDocument
      1. com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104)
      1 frame
    6. com.atlassian.confluence
      BulkWriteIndexTask.perform
      1. com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:102)
      2. com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43)
      3. com.atlassian.confluence.search.lucene.tasks.UpdateDocumentIndexTask.perform(UpdateDocumentIndexTask.java:40)
      4. com.atlassian.confluence.search.lucene.tasks.BulkWriteIndexTask.perform(BulkWriteIndexTask.java:44)
      4 frames
    7. com.atlassian.bonnie
      LuceneConnection.withWriter
      1. com.atlassian.bonnie.LuceneConnection.withWriter(LuceneConnection.java:331)
      1 frame
    8. com.atlassian.confluence
      DefaultConfluenceIndexManager$BatchUpdateAction.perform
      1. com.atlassian.confluence.search.lucene.tasks.LuceneConnectionBackedIndexTaskPerformer.perform(LuceneConnectionBackedIndexTaskPerformer.java:20)
      2. com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager$BatchUpdateAction.perform(DefaultConfluenceIndexManager.java:344)
      2 frames
    9. com.atlassian.bonnie
      LuceneConnection.withBatchUpdate
      1. com.atlassian.bonnie.LuceneConnection.withBatchUpdate(LuceneConnection.java:405)
      1 frame
    10. com.atlassian.confluence
      DefaultConfluenceIndexManager.flushQueue
      1. com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.processTasks(DefaultConfluenceIndexManager.java:143)
      2. com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.flushQueue(DefaultConfluenceIndexManager.java:109)
      2 frames
    11. Java RT
      Method.invoke
      1. sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      2. sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
      3. sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      4. java.lang.reflect.Method.invoke(Method.java:597)
      4 frames
    12. Spring AOP
      ReflectiveMethodInvocation.proceed
      1. org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:304)
      2. org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
      3. org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
      3 frames
    13. Spring Tx
      TransactionInterceptor.invoke
      1. org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106)
      1 frame
    14. Spring AOP
      JdkDynamicAopProxy.invoke
      1. org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
      2. org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
      2 frames
    15. Unknown
      $Proxy36.flushQueue
      1. $Proxy36.flushQueue(Unknown Source)
      1 frame
    16. com.atlassian.confluence
      AbstractClusterAwareQuartzJobBean.executeInternal
      1. com.atlassian.confluence.search.lucene.IndexQueueFlusher.executeJob(IndexQueueFlusher.java:29)
      2. com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.surroundJobExecutionWithLogging(AbstractClusterAwareQuartzJobBean.java:63)
      3. com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.executeInternal(AbstractClusterAwareQuartzJobBean.java:46)
      3 frames
    17. Spring Context Support
      QuartzJobBean.execute
      1. org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86)
      1 frame
    18. quartz
      SimpleThreadPool$WorkerThread.run
      1. org.quartz.core.JobRunShell.run(JobRunShell.java:202)
      2. org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:525)
      2 frames