com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document

Atlassian JIRA | Andrew Moise | 7 years ago
  1. 0

    My site's content index is only partially built, resulting in missing pages in search results. I see http://jira.atlassian.com/browse/CONF-18452 has been filed to fix the failure to completely index when there's a problem with a particular page, but I also wanted to file bugs about the underlying issues. This issue is a problem indexing a particular .pdf document:

    2010-02-22 11:10:43,006 WARN [Indexer: 9] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: Wii_Programming_Guidelines.pdf v.1 (5341238) jlokey)
     -- url: /confluence/admin/reindex.action | userName: moise | referer: https://qix.demiurgestudios.com/confluence/admin/search-indexes.action | action: reindex
    com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document
        at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:65)
        at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:39)
        at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:43)
        at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104)
        at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:102)
        at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:41)
        at com.atlassian.bonnie.index.TempIndexWriter.perform(TempIndexWriter.java:72)
        at com.atlassian.confluence.search.lucene.TempIndexWriterStrategy.perform(TempIndexWriterStrategy.java:43)
        at com.atlassian.confluence.search.lucene.tasks.TempIndexBackedIndexTaskPerformer.perform(TempIndexBackedIndexTaskPerformer.java:21)
        at com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker.indexCollection(DefaultObjectQueueWorker.java:73)
        at com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker$1.doInTransactionWithoutResult(DefaultObjectQueueWorker.java:61)
        at org.springframework.transaction.support.TransactionCallbackWithoutResult.doInTransaction(TransactionCallbackWithoutResult.java:33)
        at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:127)
        at com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker.run(DefaultObjectQueueWorker.java:50)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
        at java.lang.Thread.run(Thread.java:595)
    Caused by: java.lang.NullPointerException
        at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194)
        at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182)
        at org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:162)
        at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:220)
        at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:49)
        ... 16 more

    Atlassian JIRA | 7 years ago | Andrew Moise
    com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document
  3. 0

    [CONF-18962] Some pdf files don't get correctly indexed - Atlassian JIRA

    atlassian.com | 8 months ago
    com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document
  5. 0

    Sample pdf attached to ticket.

    {code}
    2010-05-27 11:13:00,410 WARN [DefaultQuartzScheduler_Worker-1] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: Project Milestones as of 6-25.pdf v.1 (786433) admin)
    com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document
        at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:66)
        at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40)
        at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:45)
        at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104)
        at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:102)
        at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43)
        at com.atlassian.confluence.search.lucene.tasks.UpdateDocumentIndexTask.perform(UpdateDocumentIndexTask.java:40)
        at com.atlassian.confluence.search.lucene.tasks.BulkWriteIndexTask.perform(BulkWriteIndexTask.java:44)
        at com.atlassian.bonnie.LuceneConnection.withWriter(LuceneConnection.java:331)
        at com.atlassian.confluence.search.lucene.tasks.LuceneConnectionBackedIndexTaskPerformer.perform(LuceneConnectionBackedIndexTaskPerformer.java:20)
        at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager$BatchUpdateAction.perform(DefaultConfluenceIndexManager.java:344)
        at com.atlassian.bonnie.LuceneConnection.withBatchUpdate(LuceneConnection.java:405)
        at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.processTasks(DefaultConfluenceIndexManager.java:143)
        at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.flushQueue(DefaultConfluenceIndexManager.java:109)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:304)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
        at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
        at $Proxy36.flushQueue(Unknown Source)
        at com.atlassian.confluence.search.lucene.IndexQueueFlusher.executeJob(IndexQueueFlusher.java:29)
        at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.surroundJobExecutionWithLogging(AbstractClusterAwareQuartzJobBean.java:63)
        at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.executeInternal(AbstractClusterAwareQuartzJobBean.java:46)
        at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:525)
    Caused by: java.lang.NullPointerException
        at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194)
        at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182)
        at org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:162)
        at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:220)
        at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:50)
        ... 30 more
    {code}

    Atlassian JIRA | 7 years ago | Tim Wong [Atlassian]
    com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document
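
The first report above mentions that one failing document can leave the whole index only partially built. As a rough illustration of the general idea of isolating a failure to the single attachment that caused it, here is a minimal Java sketch. Everything in it is hypothetical: `TolerantIndexer`, `Result`, `indexAll` and the stand-in extractor are illustrative names, not part of Confluence, Bonnie, or PDFBox, and this is not how the linked tickets were actually fixed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: none of these names come from Confluence, Bonnie, or PDFBox.
final class TolerantIndexer {

    /** Outcome of one indexing pass: extracted texts plus the names that failed. */
    static final class Result {
        final List<String> indexed = new ArrayList<>();
        final List<String> failed = new ArrayList<>();
    }

    /**
     * Runs the extractor over every attachment name. A RuntimeException
     * (such as the NullPointerException in the traces above) is recorded
     * for that attachment only, and the loop moves on to the next one.
     */
    static Result indexAll(List<String> attachments, Function<String, String> extractor) {
        Result result = new Result();
        for (String name : attachments) {
            try {
                result.indexed.add(extractor.apply(name));
            } catch (RuntimeException e) {
                result.failed.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in extractor that fails on one specific (made-up) file name.
        Function<String, String> extractor = name -> {
            if (name.equals("broken.pdf")) {
                throw new NullPointerException();
            }
            return "text of " + name;
        };
        Result r = indexAll(Arrays.asList("a.pdf", "broken.pdf", "b.pdf"), extractor);
        System.out.println("indexed=" + r.indexed + " failed=" + r.failed);
    }
}
```

Collecting the failed names instead of rethrowing keeps the remaining attachments searchable and still leaves a list of documents to investigate afterwards.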

    Root Cause Analysis

    1. java.lang.NullPointerException

      No message provided

      at org.pdfbox.pdmodel.PDPageNode.getAllKids()
    2. PDFBox - Java PDF Library
      PDFTextStripper.writeText
      1. org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194)
      2. org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182)
      3. org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:162)
      4. org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:220)
      4 frames
    3. com.atlassian.bonnie
      BaseAttachmentContentExtractor.addFields
      1. com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:49)
      2. com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:39)
      2 frames
    4. com.atlassian.confluence
      ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields
      1. com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:43)
      1 frame
    5. com.atlassian.bonnie
      BaseDocumentBuilder.getDocument
      1. com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104)
      1 frame
    6. com.atlassian.confluence
      AddDocumentIndexTask.perform
      1. com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:102)
      2. com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:41)
      2 frames
    7. com.atlassian.bonnie
      TempIndexWriter.perform
      1. com.atlassian.bonnie.index.TempIndexWriter.perform(TempIndexWriter.java:72)
      1 frame
    8. com.atlassian.confluence
      DefaultObjectQueueWorker$1.doInTransactionWithoutResult
      1. com.atlassian.confluence.search.lucene.TempIndexWriterStrategy.perform(TempIndexWriterStrategy.java:43)
      2. com.atlassian.confluence.search.lucene.tasks.TempIndexBackedIndexTaskPerformer.perform(TempIndexBackedIndexTaskPerformer.java:21)
      3. com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker.indexCollection(DefaultObjectQueueWorker.java:73)
      4. com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker$1.doInTransactionWithoutResult(DefaultObjectQueueWorker.java:61)
      4 frames
    9. Spring Tx
      TransactionTemplate.execute
      1. org.springframework.transaction.support.TransactionCallbackWithoutResult.doInTransaction(TransactionCallbackWithoutResult.java:33)
      2. org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:127)
      2 frames
    10. com.atlassian.confluence
      DefaultObjectQueueWorker.run
      1. com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker.run(DefaultObjectQueueWorker.java:50)
      1 frame
    11. Java RT
      Thread.run
      1. java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
      2. java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
      3. java.lang.Thread.run(Thread.java:595)
      3 frames
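
The breakdown above puts the innermost `Caused by` exception first. As a small sketch of how that root cause can be recovered from a wrapped exception in code, the following uses only `Throwable.getCause()` from the JDK. The class name `RootCause` and the use of a plain `RuntimeException` as a stand-in for `ExtractorException` are assumptions made for illustration.

```java
// Sketch using only java.lang.Throwable; RootCause is a hypothetical helper name.
final class RootCause {

    /** Follows getCause() links until an exception with no cause is reached. */
    static Throwable of(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // RuntimeException stands in for ExtractorException, which is not available here.
        Throwable wrapped = new RuntimeException(
                "Error getting content of PDF document",
                new NullPointerException());
        Throwable root = of(wrapped);
        System.out.println(root.getClass().getName());
    }
}
```

For the traces on this page, the same walk would stop at the `java.lang.NullPointerException` raised in `org.pdfbox.pdmodel.PDPageNode.getAllKids`.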