com.atlassian.bonnie.search.extractor.ExtractorException

There are no available Samebug tips for this exception. Do you have an idea how to solve this issue? A short tip would help users who saw this issue last week.

  • My site's content index is only partially built, resulting in missing pages in search results. I see http://jira.atlassian.com/browse/CONF-18452 has been filed to fix the failure to completely index when there's a problem with a particular page, but I also wanted to file bugs about the underlying issues. This issue is a problem indexing a particular .pdf document: 2010-02-22 11:10:43,006 WARN [Indexer: 9] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: Wii_Progr amming_Guidelines.pdf v.1 (5341238) jlokey) -- url: /confluence/admin/reindex.action | userName: moise | referer: https://qix.demiurgestudios.com/confluence/admin/search-indexes.action | action: reind ex com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:65) at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:39) at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:43) at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104) at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:102) at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:41) at com.atlassian.bonnie.index.TempIndexWriter.perform(TempIndexWriter.java:72) at com.atlassian.confluence.search.lucene.TempIndexWriterStrategy.perform(TempIndexWriterStrategy.java:43) at com.atlassian.confluence.search.lucene.tasks.TempIndexBackedIndexTaskPerformer.perform(TempIndexBackedIndexTaskPerformer.java:21) at com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker.indexCollection(DefaultObjectQueueWorker.java:73) at com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker$1.doInTransactionWithoutResult(DefaultObjectQueueWorker.java:61) at org.springframework.transaction.support.TransactionCallbackWithoutResult.doInTransaction(TransactionCallbackWithoutResult.java:33) at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:127) at com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker.run(DefaultObjectQueueWorker.java:50) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675) at java.lang.Thread.run(Thread.java:595) Caused by: java.lang.NullPointerException at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194) at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182) at org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:162) at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:220) at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:49) ... 16 more
    via by Andrew Moise,
  • My site's content index is only partially built, resulting in missing pages in search results. I see http://jira.atlassian.com/browse/CONF-18452 has been filed to fix the failure to completely index when there's a problem with a particular page, but I also wanted to file bugs about the underlying issues. This issue is a problem indexing a particular .pdf document: 2010-02-22 11:10:43,006 WARN [Indexer: 9] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: Wii_Progr amming_Guidelines.pdf v.1 (5341238) jlokey) -- url: /confluence/admin/reindex.action | userName: moise | referer: https://qix.demiurgestudios.com/confluence/admin/search-indexes.action | action: reind ex com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:65) at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:39) at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:43) at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104) at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:102) at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:41) at com.atlassian.bonnie.index.TempIndexWriter.perform(TempIndexWriter.java:72) at com.atlassian.confluence.search.lucene.TempIndexWriterStrategy.perform(TempIndexWriterStrategy.java:43) at com.atlassian.confluence.search.lucene.tasks.TempIndexBackedIndexTaskPerformer.perform(TempIndexBackedIndexTaskPerformer.java:21) at com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker.indexCollection(DefaultObjectQueueWorker.java:73) at com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker$1.doInTransactionWithoutResult(DefaultObjectQueueWorker.java:61) at org.springframework.transaction.support.TransactionCallbackWithoutResult.doInTransaction(TransactionCallbackWithoutResult.java:33) at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:127) at com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker.run(DefaultObjectQueueWorker.java:50) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675) at java.lang.Thread.run(Thread.java:595) Caused by: java.lang.NullPointerException at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194) at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182) at org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:162) at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:220) at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:49) ... 16 more
    via by Andrew Moise,
  • Sample pdf attached to ticket. {code} 2010-05-27 11:13:00,410 WARN [DefaultQuartzScheduler_Worker-1] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: Project Milestones as of 6-25.pdf v.1 (786433) admin) com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:66) at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40) at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:45) at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104) at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:102) at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43) at com.atlassian.confluence.search.lucene.tasks.UpdateDocumentIndexTask.perform(UpdateDocumentIndexTask.java:40) at com.atlassian.confluence.search.lucene.tasks.BulkWriteIndexTask.perform(BulkWriteIndexTask.java:44) at com.atlassian.bonnie.LuceneConnection.withWriter(LuceneConnection.java:331) at com.atlassian.confluence.search.lucene.tasks.LuceneConnectionBackedIndexTaskPerformer.perform(LuceneConnectionBackedIndexTaskPerformer.java:20) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager$BatchUpdateAction.perform(DefaultConfluenceIndexManager.java:344) at com.atlassian.bonnie.LuceneConnection.withBatchUpdate(LuceneConnection.java:405) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.processTasks(DefaultConfluenceIndexManager.java:143) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.flushQueue(DefaultConfluenceIndexManager.java:109) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:304) at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149) at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204) at $Proxy36.flushQueue(Unknown Source) at com.atlassian.confluence.search.lucene.IndexQueueFlusher.executeJob(IndexQueueFlusher.java:29) at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.surroundJobExecutionWithLogging(AbstractClusterAwareQuartzJobBean.java:63) at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.executeInternal(AbstractClusterAwareQuartzJobBean.java:46) at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86) at org.quartz.core.JobRunShell.run(JobRunShell.java:202) at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:525) Caused by: java.lang.NullPointerException at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194) at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182) at org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:162) at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:220) at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:50) ... 30 more (END) {code}
    via by Tim Wong [Atlassian],
  • Sample pdf attached to ticket. {code} 2010-05-27 11:13:00,410 WARN [DefaultQuartzScheduler_Worker-1] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: Project Milestones as of 6-25.pdf v.1 (786433) admin) com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:66) at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40) at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:45) at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104) at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:102) at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43) at com.atlassian.confluence.search.lucene.tasks.UpdateDocumentIndexTask.perform(UpdateDocumentIndexTask.java:40) at com.atlassian.confluence.search.lucene.tasks.BulkWriteIndexTask.perform(BulkWriteIndexTask.java:44) at com.atlassian.bonnie.LuceneConnection.withWriter(LuceneConnection.java:331) at com.atlassian.confluence.search.lucene.tasks.LuceneConnectionBackedIndexTaskPerformer.perform(LuceneConnectionBackedIndexTaskPerformer.java:20) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager$BatchUpdateAction.perform(DefaultConfluenceIndexManager.java:344) at com.atlassian.bonnie.LuceneConnection.withBatchUpdate(LuceneConnection.java:405) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.processTasks(DefaultConfluenceIndexManager.java:143) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.flushQueue(DefaultConfluenceIndexManager.java:109) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:304) at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149) at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204) at $Proxy36.flushQueue(Unknown Source) at com.atlassian.confluence.search.lucene.IndexQueueFlusher.executeJob(IndexQueueFlusher.java:29) at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.surroundJobExecutionWithLogging(AbstractClusterAwareQuartzJobBean.java:63) at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.executeInternal(AbstractClusterAwareQuartzJobBean.java:46) at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86) at org.quartz.core.JobRunShell.run(JobRunShell.java:202) at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:525) Caused by: java.lang.NullPointerException at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194) at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182) at org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:162) at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:220) at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:50) ... 30 more (END) {code}
    via by Tim Wong [Atlassian],
  • I've installed terrier 3.5 on windows xp and started desktop_terrier. After that, I choose a directory to index and started indexing. After about 50 documents terrier throws an execption, because it was not able to index a special pdf-dcument (some other pdfs worked). Is there any chance to tell terrier to skip such exceptions and to go on with indexing ? here is the execption/log: Set TERRIER_HOME to be D:\Java\terrier WARNING: The file terrier.properties was not found at location D:\Java\terrier\etc\terrier.properties Assuming the value of terrier.home from the corresponding system property. INFO - Deleting: D:\Java\terrier\var\index\data_1.direct.bf: true INFO - Deleting: D:\Java\terrier\var\index\data_1.document.fsarrayfile: true INFO - Deleting: D:\Java\terrier\var\index\data_1.meta.idx: true INFO - Deleting: D:\Java\terrier\var\index\data_1.meta.zdata: true INFO - creating the data structures data_1 INFO - BlockIndexer creating direct index INFO - NEXT: D:\Virtual Machines\host\Privat\_dokumente ..... java.lang.NullPointerException at org.pdfbox.util.PDFStreamEngine.showString(PDFStreamEngine.java:254) at org.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:773) at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:139) at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:211) at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:185) at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:161) at org.terrier.indexing.PDFDocument.getReader(PDFDocument.java:111) at org.terrier.indexing.FileDocument.<init>(FileDocument.java:130) at org.terrier.indexing.PDFDocument.<init>(PDFDocument.java:68) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source) at java.lang.reflect.Constructor.newInstance(Unknown Source) at org.terrier.indexing.SimpleFileCollection.makeDocument(SimpleFileCollection.java:342) at org.terrier.indexing.SimpleFileCollection.getDocument(SimpleFileCollection.java:303) at org.terrier.indexing.BlockIndexer.createDirectIndex(BlockIndexer.java:357) at org.terrier.indexing.Indexer.index(Indexer.java:346) at org.terrier.applications.desktop.DesktopTerrier.runIndex(DesktopTerrier.java:1129) at org.terrier.applications.desktop.DesktopTerrier.access$1100(DesktopTerrier.java:114) at org.terrier.applications.desktop.DesktopTerrier$8$1.run(DesktopTerrier.java:498) ERROR - An unexpected exception occured while indexing. Indexing has been aborted. java.lang.NullPointerException at org.terrier.indexing.tokenisation.EnglishTokeniser$EnglishTokenStream.next(EnglishTokeniser.java:97) at org.terrier.indexing.tokenisation.EnglishTokeniser$EnglishTokenStream.next(EnglishTokeniser.java:76) at org.terrier.indexing.FileDocument.getNextTerm(FileDocument.java:221) at org.terrier.indexing.BlockIndexer.createDirectIndex(BlockIndexer.java:371) at org.terrier.indexing.Indexer.index(Indexer.java:346) at org.terrier.applications.desktop.DesktopTerrier.runIndex(DesktopTerrier.java:1129) at org.terrier.applications.desktop.DesktopTerrier.access$1100(DesktopTerrier.java:114) at org.terrier.applications.desktop.DesktopTerrier$8$1.run(DesktopTerrier.java:498)
    via by Ulrich Kaemmerer,
  • We have found that update the pdfbox library to the last stable version (1.2.1) solve all our current issues with pdf text extraction and improve performance. This could help people that want rely on the DSpace "out-of-box" pdf extractor without using XPDF. Below some samples of exception that go away updating the pdfbox version. Patch attached against trunk r5439 == java.io.IOException: Error: Could not find font(COSName{F1.0}) in map={} at org.pdfbox.util.operator.SetTextFont.process(SetTextFont.java:83) at org.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:452) at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:215) at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:174) at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:336) at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:259) at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216) at org.dspace.app.mediafilter.PDFFilter.getDestinationStream(PDFFilter.java:139) === java.lang.ClassCastException: org.pdfbox.cos.COSArray cannot be cast to org.pdfbox.cos.COSDictionary at org.pdfbox.filter.FlateFilter.decode(FlateFilter.java:70) at org.pdfbox.cos.COSStream.doDecode(COSStream.java:290) at org.pdfbox.cos.COSStream.doDecode(COSStream.java:243) at org.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:170) at org.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:101) at org.pdfbox.cos.COSStream.getStreamTokens(COSStream.java:132) at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:202) at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:174) at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:336) at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:259) at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216) at org.dspace.app.mediafilter.PDFFilter.getDestinationStream(PDFFilter.java:139) ==== java.io.IOException: Unknown colorspace array type:COSName{DeviceRGB} at org.pdfbox.pdmodel.graphics.color.PDColorSpaceFactory.createColorSpace(PDColorSpaceFactory.java:116) at org.pdfbox.pdmodel.PDResources.getColorSpaces(PDResources.java:264) at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:193) at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:174) at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:336) at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:259) at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216) at org.dspace.app.mediafilter.PDFFilter.getDestinationStream(PDFFilter.java:139) === java.lang.NullPointerException at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194) at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182) at org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:226) at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216) at org.dspace.app.mediafilter.PDFFilter.getDestinationStream(PDFFilter.java:139) === java.util.zip.ZipException: unknown compression method at java.util.zip.InflaterInputStream.read(Unknown Source) at org.pdfbox.filter.FlateFilter.decode(FlateFilter.java:97) at org.pdfbox.cos.COSStream.doDecode(COSStream.java:290) at org.pdfbox.cos.COSStream.doDecode(COSStream.java:235) at org.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:170) at org.pdfbox.pdfparser.PDFObjectStreamParser.<init>(PDFObjectStreamParser.java:66) at org.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:450) at org.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:908) at org.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:489) at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:204) at org.dspace.app.mediafilter.PDFFilter.getDestinationStream(PDFFilter.java:139) === java.lang.ArrayIndexOutOfBoundsException at java.lang.System.arraycopy(Native Method) at java.io.PushbackInputStream.unread(Unknown Source) at org.pdfbox.pdfparser.BaseParser.parseCOSString(BaseParser.java:524) at org.pdfbox.pdfparser.BaseParser.parseDirObject(BaseParser.java:873) at org.pdfbox.pdfparser.PDFObjectStreamParser.parse(PDFObjectStreamParser.java:94) at org.pdfbox.cos.COSDocument.dereferenceObjectStreams(COSDocument.java:451) at org.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:908) at org.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:489) at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:204) at org.dspace.app.mediafilter.PDFFilter.getDestinationStream(PDFFilter.java:139) === java.io.EOFException: Unexpected end of ZLIB input stream at java.util.zip.InflaterInputStream.fill(Unknown Source) at java.util.zip.InflaterInputStream.read(Unknown Source) at org.pdfbox.filter.FlateFilter.decode(FlateFilter.java:97) at org.pdfbox.cos.COSStream.doDecode(COSStream.java:290) at org.pdfbox.cos.COSStream.doDecode(COSStream.java:235) at org.pdfbox.cos.COSStream.getUnfilteredStream(COSStream.java:170) at org.pdfbox.pdfparser.PDFStreamParser.<init>(PDFStreamParser.java:101) at org.pdfbox.cos.COSStream.getStreamTokens(COSStream.java:132) at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:202) at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:174) at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:336) at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:259) at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216) at org.dspace.app.mediafilter.PDFFilter.getDestinationStream(PDFFilter.java:139)
    via by Andrea Bollini,
    • com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:65) at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:39) at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:43) at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104) at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:102) at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:41) at com.atlassian.bonnie.index.TempIndexWriter.perform(TempIndexWriter.java:72) at com.atlassian.confluence.search.lucene.TempIndexWriterStrategy.perform(TempIndexWriterStrategy.java:43) at com.atlassian.confluence.search.lucene.tasks.TempIndexBackedIndexTaskPerformer.perform(TempIndexBackedIndexTaskPerformer.java:21) at com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker.indexCollection(DefaultObjectQueueWorker.java:73) at com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker$1.doInTransactionWithoutResult(DefaultObjectQueueWorker.java:61) at org.springframework.transaction.support.TransactionCallbackWithoutResult.doInTransaction(TransactionCallbackWithoutResult.java:33) at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:127) at com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker.run(DefaultObjectQueueWorker.java:50) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675) at java.lang.Thread.run(Thread.java:595) Caused by: java.lang.NullPointerException at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194) at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182) at org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:162) at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:220) at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:49) ... 16 more
    No Bugmate found.