com.atlassian.bonnie.search.extractor.ExtractorException

There are no available Samebug tips for this exception. Do you have an idea how to solve this issue? A short tip would help users who saw this issue last week.

  • h3. Symptoms I think that closed issue CONF-18733 is not really resolved. We have many entries in our log showing these kind of issues: {code} 2012-10-12 22:30:34,235 WARN [Indexer: 1] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: meta_mailinfo_sec01.csv v.1 (59509056) g6922) com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Excel document: Invalid header signature; read 0x6D453B74726F6853, expected 0xE11AB1A1E011CFD0 at com.atlassian.confluence.extra.officeconnector.index.excel.ExcelTextExtractor.extractText(ExcelTextExtractor.java:103) at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40) at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:36) at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104) at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:97) at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43) at com.atlassian.bonnie.index.TempIndexWriter.perform(TempIndexWriter.java:73) at com.atlassian.confluence.search.lucene.TempIndexWriterStrategy.perform(TempIndexWriterStrategy.java:43) at com.atlassian.confluence.search.lucene.tasks.TempIndexBackedIndexTaskPerformer.perform(TempIndexBackedIndexTaskPerformer.java:21) at com.atlassian.confluence.search.lucene.ReindexWorkBatch.indexCollection(ReindexWorkBatch.java:125) at com.atlassian.confluence.search.lucene.ReindexWorkBatch$1.doInTransaction(ReindexWorkBatch.java:86) at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:128) at com.atlassian.confluence.search.lucene.ReindexWorkBatch.run(ReindexWorkBatch.java:56) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.IOException: Invalid header signature; read 0x6D453B74726F6853, expected 0xE11AB1A1E011CFD0 at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:120) at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:151) at com.atlassian.confluence.extra.officeconnector.index.excel.ExcelTextExtractor.extractText(ExcelTextExtractor.java:89) ... 18 more {code} h4. Resolution The underlying issue was resolved in Confluence 3.5.11, so the warning message does not occur for csv files uploaded in Confluence 3.5.1.0 or above. However, any csv files that were uploaded in versions older that Confluence 3.5.11 will still have the incorrect Content-type, and when Confluence performs a re-index those warning messages still occur. *To fix the Content-type and resolve the warning message for older documents, you will need to run this command against your database:* {code} UPDATE PUBLIC.ATTACHMENTS SET CONTENTTYPE = 'text/csv' WHERE TITLE LIKE '%.csv' AND CONTENTTYPE LIKE '%excel%'; {code} This only needs to be performed once, to update older documents. Any documents uploaded from Confluence 3.5.11 or above will not suffer from this problem. For more information, please see [this KB article|https://confluence.atlassian.com/display/CONFKB/com.atlassian.bonnie.search.extractor.ExtractorException%3A+Error+reading+content+of+Excel+document].
    via by Michael Jositz,
  • h3. Symptoms I think that closed issue CONF-18733 is not really resolved. We have many entries in our log showing these kind of issues: {code} 2012-10-12 22:30:34,235 WARN [Indexer: 1] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: meta_mailinfo_sec01.csv v.1 (59509056) g6922) com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Excel document: Invalid header signature; read 0x6D453B74726F6853, expected 0xE11AB1A1E011CFD0 at com.atlassian.confluence.extra.officeconnector.index.excel.ExcelTextExtractor.extractText(ExcelTextExtractor.java:103) at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40) at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:36) at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104) at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:97) at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43) at com.atlassian.bonnie.index.TempIndexWriter.perform(TempIndexWriter.java:73) at com.atlassian.confluence.search.lucene.TempIndexWriterStrategy.perform(TempIndexWriterStrategy.java:43) at com.atlassian.confluence.search.lucene.tasks.TempIndexBackedIndexTaskPerformer.perform(TempIndexBackedIndexTaskPerformer.java:21) at com.atlassian.confluence.search.lucene.ReindexWorkBatch.indexCollection(ReindexWorkBatch.java:125) at com.atlassian.confluence.search.lucene.ReindexWorkBatch$1.doInTransaction(ReindexWorkBatch.java:86) at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:128) at com.atlassian.confluence.search.lucene.ReindexWorkBatch.run(ReindexWorkBatch.java:56) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.IOException: Invalid header signature; read 0x6D453B74726F6853, expected 0xE11AB1A1E011CFD0 at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:120) at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:151) at com.atlassian.confluence.extra.officeconnector.index.excel.ExcelTextExtractor.extractText(ExcelTextExtractor.java:89) ... 18 more {code} h4. Resolution The underlying issue was resolved in Confluence 3.5.11, so the warning message does not occur for csv files uploaded in Confluence 3.5.1.0 or above. However, any csv files that were uploaded in versions older that Confluence 3.5.11 will still have the incorrect Content-type, and when Confluence performs a re-index those warning messages still occur. *To fix the Content-type and resolve the warning message for older documents, you will need to run this command against your database:* {code} UPDATE PUBLIC.ATTACHMENTS SET CONTENTTYPE = 'text/csv' WHERE TITLE LIKE '%.csv' AND CONTENTTYPE LIKE '%excel%'; {code} This only needs to be performed once, to update older documents. Any documents uploaded from Confluence 3.5.11 or above will not suffer from this problem. For more information, please see [this KB article|https://confluence.atlassian.com/display/CONFKB/com.atlassian.bonnie.search.extractor.ExtractorException%3A+Error+reading+content+of+Excel+document].
    via by Michael Jositz,
  • h3. Description of The Bug When Confluence space key configured to end with a file format, the attached space logo would consider as the file format configured by lucene instead of picture. Example of spacekey affected * TDOC * TXLS * TPDF h3. How to reproduce the bug # Create a space with a spacekey that end with a file format for example: TDOC # Upload a space logo for the previously created space # Trigger the indexing to run # {{atlassian-confluence.log}} will throw the following stack trace depending by the configured space key #* *TDOC* {code:title=TDOC}2012-04-03 15:59:00,156 WARN [scheduler_Worker-2] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: TDOC v.1 (1540097) admin) com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Word document: The document appears to be corrupted and cannot be loaded. at com.atlassian.confluence.extra.officeconnector.index.word.WordTextExtractor.extractText(WordTextExtractor.java:41) at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40) at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:36) at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104) at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:97) at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43) at com.atlassian.confluence.search.lucene.tasks.UpdateDocumentIndexTask.perform(UpdateDocumentIndexTask.java:40) at com.atlassian.confluence.search.lucene.tasks.BulkWriteIndexTask.perform(BulkWriteIndexTask.java:44) at com.atlassian.bonnie.LuceneConnection.withWriter(LuceneConnection.java:331) at com.atlassian.confluence.search.lucene.tasks.LuceneConnectionBackedIndexTaskPerformer.perform(LuceneConnectionBackedIndexTaskPerformer.java:20) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager$BatchUpdateAction.perform(DefaultConfluenceIndexManager.java:424) at com.atlassian.bonnie.LuceneConnection.withBatchUpdate(LuceneConnection.java:405) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.processTasks(DefaultConfluenceIndexManager.java:197) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.flushQueue(DefaultConfluenceIndexManager.java:149) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307) at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149) at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204) at $Proxy42.flushQueue(Unknown Source) at com.atlassian.confluence.search.lucene.IndexQueueFlusher.executeJob(IndexQueueFlusher.java:30) at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.surroundJobExecutionWithLogging(AbstractClusterAwareQuartzJobBean.java:63) at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.executeInternal(AbstractClusterAwareQuartzJobBean.java:46) at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86) at org.quartz.core.JobRunShell.run(JobRunShell.java:199) at com.atlassian.confluence.schedule.quartz.ConfluenceQuartzThreadPool$1.run(ConfluenceQuartzThreadPool.java:20) at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549) Caused by: com.aspose.words.FileCorruptedException: The document appears to be corrupted and cannot be loaded. at com.aspose.words.Document.a(Unknown Source) at com.aspose.words.Document.b(Unknown Source) at com.aspose.words.Document.a(Unknown Source) at com.aspose.words.Document.<init>(Unknown Source) at com.aspose.words.Document.<init>(Unknown Source) at com.aspose.words.Document.<init>(Unknown Source) at com.atlassian.confluence.extra.officeconnector.index.word.WordTextExtractor.extractText(WordTextExtractor.java:37) ... 31 more Caused by: java.lang.IllegalStateException: java.nio.charset.UnsupportedCharsetException: UTF-7 at asposewobfuscated.mf.vb(Unknown Source) at asposewobfuscated.mf.uX(Unknown Source) at asposewobfuscated.mf.U(Unknown Source) at com.aspose.words.ha.iE(Unknown Source) at com.aspose.words.ha.g(Unknown Source) ... 37 more Caused by: java.nio.charset.UnsupportedCharsetException: UTF-7 at java.nio.charset.Charset.forName(Charset.java:505) ... 42 more{code} #* *TXLS* {code:title=TXLS}2012-04-03 16:10:49,099 WARN [http-8417-5] [confluence.velocity.introspection.AnnotationBoxingUberspect] lookupMethod Velocity template accessing deprecated method com.atlassian.confluence.util.GeneralUtil#format - /com/atlassian/confluence/plugins/schedule/admin/viewscheduledjobs-item.vm[line 23, column 47] -- referer: http://localhost:8417/confluence/admin/scheduledjobs/viewscheduledjobs.action | url: /confluence/admin/scheduledjobs/viewscheduledjobs.action | userName: admin | action: viewscheduledjobs 2012-04-03 16:10:49,136 WARN [scheduler_Worker-2] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: TXLS v.1 (1540099) admin) com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Excel document: Invalid header signature; read 0x0A1A0A0D474E5089, expected 0xE11AB1A1E011CFD0 at com.atlassian.confluence.extra.officeconnector.index.excel.ExcelTextExtractor.extractText(ExcelTextExtractor.java:103) at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40) at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:36) at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104) at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:97) at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43) at com.atlassian.confluence.search.lucene.tasks.UpdateDocumentIndexTask.perform(UpdateDocumentIndexTask.java:49) at com.atlassian.confluence.search.lucene.tasks.BulkWriteIndexTask.perform(BulkWriteIndexTask.java:44) at com.atlassian.bonnie.LuceneConnection.withWriter(LuceneConnection.java:331) at com.atlassian.confluence.search.lucene.tasks.LuceneConnectionBackedIndexTaskPerformer.perform(LuceneConnectionBackedIndexTaskPerformer.java:20) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager$BatchUpdateAction.perform(DefaultConfluenceIndexManager.java:424) at com.atlassian.bonnie.LuceneConnection.withBatchUpdate(LuceneConnection.java:405) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.processTasks(DefaultConfluenceIndexManager.java:197) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.flushQueue(DefaultConfluenceIndexManager.java:149) at sun.reflect.GeneratedMethodAccessor604.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307) at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149) at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204) at $Proxy42.flushQueue(Unknown Source) at com.atlassian.confluence.search.lucene.IndexQueueFlusher.executeJob(IndexQueueFlusher.java:30) at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.surroundJobExecutionWithLogging(AbstractClusterAwareQuartzJobBean.java:63) at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.executeInternal(AbstractClusterAwareQuartzJobBean.java:46) at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86) at org.quartz.core.JobRunShell.run(JobRunShell.java:199) at com.atlassian.confluence.schedule.quartz.ConfluenceQuartzThreadPool$1.run(ConfluenceQuartzThreadPool.java:20) at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549) Caused by: java.io.IOException: Invalid header signature; read 0x0A1A0A0D474E5089, expected 0xE11AB1A1E011CFD0 at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:120) at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:151) at com.atlassian.confluence.extra.officeconnector.index.excel.ExcelTextExtractor.extractText(ExcelTextExtractor.java:89) ... 30 more{code} #* *TPDF* {code:title=TPDF} 2012-04-03 16:54:02,891 WARN [scheduler_Worker-3] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: TPDF v.1 (1540100) admin) com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:66) at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40) at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:36) at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104) at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:97) at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43) at com.atlassian.confluence.search.lucene.tasks.UpdateDocumentIndexTask.perform(UpdateDocumentIndexTask.java:40) at com.atlassian.confluence.search.lucene.tasks.BulkWriteIndexTask.perform(BulkWriteIndexTask.java:44) at com.atlassian.bonnie.LuceneConnection.withWriter(LuceneConnection.java:331) at com.atlassian.confluence.search.lucene.tasks.LuceneConnectionBackedIndexTaskPerformer.perform(LuceneConnectionBackedIndexTaskPerformer.java:20) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager$BatchUpdateAction.perform(DefaultConfluenceIndexManager.java:424) at com.atlassian.bonnie.LuceneConnection.withBatchUpdate(LuceneConnection.java:405) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.processTasks(DefaultConfluenceIndexManager.java:197) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.flushQueue(DefaultConfluenceIndexManager.java:149) at sun.reflect.GeneratedMethodAccessor604.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307) at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149) at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204) at $Proxy42.flushQueue(Unknown Source) at com.atlassian.confluence.search.lucene.IndexQueueFlusher.executeJob(IndexQueueFlusher.java:30) at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.surroundJobExecutionWithLogging(AbstractClusterAwareQuartzJobBean.java:63) at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.executeInternal(AbstractClusterAwareQuartzJobBean.java:46) at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86) at org.quartz.core.JobRunShell.run(JobRunShell.java:199) at com.atlassian.confluence.schedule.quartz.ConfluenceQuartzThreadPool$1.run(ConfluenceQuartzThreadPool.java:20) at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549) Caused by: java.io.IOException: Error: Header doesn't contain versioninfo at org.apache.pdfbox.pdfparser.PDFParser.parseHeader(PDFParser.java:312) at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:169) at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1069) at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1036) at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:40) ... 30 more{code} h3. Notes Workaround is not possible as changing the space key is not a usable solution and changing the attachment name on database level would remove it as the configured space logo.
    via by Septa Cahyadiputra [Atlassian],
  • h3. Description of The Bug When Confluence space key configured to end with a file format, the attached space logo would consider as the file format configured by lucene instead of picture. Example of spacekey affected * TDOC * TXLS * TPDF h3. How to reproduce the bug # Create a space with a spacekey that end with a file format for example: TDOC # Upload a space logo for the previously created space # Trigger the indexing to run # {{atlassian-confluence.log}} will throw the following stack trace depending by the configured space key #* *TDOC* {code:title=TDOC}2012-04-03 15:59:00,156 WARN [scheduler_Worker-2] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: TDOC v.1 (1540097) admin) com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Word document: The document appears to be corrupted and cannot be loaded. at com.atlassian.confluence.extra.officeconnector.index.word.WordTextExtractor.extractText(WordTextExtractor.java:41) at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40) at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:36) at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104) at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:97) at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43) at com.atlassian.confluence.search.lucene.tasks.UpdateDocumentIndexTask.perform(UpdateDocumentIndexTask.java:40) at com.atlassian.confluence.search.lucene.tasks.BulkWriteIndexTask.perform(BulkWriteIndexTask.java:44) at com.atlassian.bonnie.LuceneConnection.withWriter(LuceneConnection.java:331) at com.atlassian.confluence.search.lucene.tasks.LuceneConnectionBackedIndexTaskPerformer.perform(LuceneConnectionBackedIndexTaskPerformer.java:20) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager$BatchUpdateAction.perform(DefaultConfluenceIndexManager.java:424) at com.atlassian.bonnie.LuceneConnection.withBatchUpdate(LuceneConnection.java:405) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.processTasks(DefaultConfluenceIndexManager.java:197) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.flushQueue(DefaultConfluenceIndexManager.java:149) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307) at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149) at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204) at $Proxy42.flushQueue(Unknown Source) at com.atlassian.confluence.search.lucene.IndexQueueFlusher.executeJob(IndexQueueFlusher.java:30) at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.surroundJobExecutionWithLogging(AbstractClusterAwareQuartzJobBean.java:63) at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.executeInternal(AbstractClusterAwareQuartzJobBean.java:46) at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86) at org.quartz.core.JobRunShell.run(JobRunShell.java:199) at com.atlassian.confluence.schedule.quartz.ConfluenceQuartzThreadPool$1.run(ConfluenceQuartzThreadPool.java:20) at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549) Caused by: com.aspose.words.FileCorruptedException: The document appears to be corrupted and cannot be loaded. at com.aspose.words.Document.a(Unknown Source) at com.aspose.words.Document.b(Unknown Source) at com.aspose.words.Document.a(Unknown Source) at com.aspose.words.Document.<init>(Unknown Source) at com.aspose.words.Document.<init>(Unknown Source) at com.aspose.words.Document.<init>(Unknown Source) at com.atlassian.confluence.extra.officeconnector.index.word.WordTextExtractor.extractText(WordTextExtractor.java:37) ... 31 more Caused by: java.lang.IllegalStateException: java.nio.charset.UnsupportedCharsetException: UTF-7 at asposewobfuscated.mf.vb(Unknown Source) at asposewobfuscated.mf.uX(Unknown Source) at asposewobfuscated.mf.U(Unknown Source) at com.aspose.words.ha.iE(Unknown Source) at com.aspose.words.ha.g(Unknown Source) ... 37 more Caused by: java.nio.charset.UnsupportedCharsetException: UTF-7 at java.nio.charset.Charset.forName(Charset.java:505) ... 42 more{code} #* *TXLS* {code:title=TXLS}2012-04-03 16:10:49,099 WARN [http-8417-5] [confluence.velocity.introspection.AnnotationBoxingUberspect] lookupMethod Velocity template accessing deprecated method com.atlassian.confluence.util.GeneralUtil#format - /com/atlassian/confluence/plugins/schedule/admin/viewscheduledjobs-item.vm[line 23, column 47] -- referer: http://localhost:8417/confluence/admin/scheduledjobs/viewscheduledjobs.action | url: /confluence/admin/scheduledjobs/viewscheduledjobs.action | userName: admin | action: viewscheduledjobs 2012-04-03 16:10:49,136 WARN [scheduler_Worker-2] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: TXLS v.1 (1540099) admin) com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Excel document: Invalid header signature; read 0x0A1A0A0D474E5089, expected 0xE11AB1A1E011CFD0 at com.atlassian.confluence.extra.officeconnector.index.excel.ExcelTextExtractor.extractText(ExcelTextExtractor.java:103) at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40) at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:36) at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104) at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:97) at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43) at com.atlassian.confluence.search.lucene.tasks.UpdateDocumentIndexTask.perform(UpdateDocumentIndexTask.java:49) at com.atlassian.confluence.search.lucene.tasks.BulkWriteIndexTask.perform(BulkWriteIndexTask.java:44) at com.atlassian.bonnie.LuceneConnection.withWriter(LuceneConnection.java:331) at com.atlassian.confluence.search.lucene.tasks.LuceneConnectionBackedIndexTaskPerformer.perform(LuceneConnectionBackedIndexTaskPerformer.java:20) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager$BatchUpdateAction.perform(DefaultConfluenceIndexManager.java:424) at com.atlassian.bonnie.LuceneConnection.withBatchUpdate(LuceneConnection.java:405) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.processTasks(DefaultConfluenceIndexManager.java:197) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.flushQueue(DefaultConfluenceIndexManager.java:149) at sun.reflect.GeneratedMethodAccessor604.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307) at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149) at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204) at $Proxy42.flushQueue(Unknown Source) at com.atlassian.confluence.search.lucene.IndexQueueFlusher.executeJob(IndexQueueFlusher.java:30) at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.surroundJobExecutionWithLogging(AbstractClusterAwareQuartzJobBean.java:63) at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.executeInternal(AbstractClusterAwareQuartzJobBean.java:46) at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86) at org.quartz.core.JobRunShell.run(JobRunShell.java:199) at com.atlassian.confluence.schedule.quartz.ConfluenceQuartzThreadPool$1.run(ConfluenceQuartzThreadPool.java:20) at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549) Caused by: java.io.IOException: Invalid header signature; read 0x0A1A0A0D474E5089, expected 0xE11AB1A1E011CFD0 at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:120) at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:151) at com.atlassian.confluence.extra.officeconnector.index.excel.ExcelTextExtractor.extractText(ExcelTextExtractor.java:89) ... 30 more{code} #* *TPDF* {code:title=TPDF} 2012-04-03 16:54:02,891 WARN [scheduler_Worker-3] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: TPDF v.1 (1540100) admin) com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:66) at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40) at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:36) at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104) at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:97) at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43) at com.atlassian.confluence.search.lucene.tasks.UpdateDocumentIndexTask.perform(UpdateDocumentIndexTask.java:40) at com.atlassian.confluence.search.lucene.tasks.BulkWriteIndexTask.perform(BulkWriteIndexTask.java:44) at com.atlassian.bonnie.LuceneConnection.withWriter(LuceneConnection.java:331) at com.atlassian.confluence.search.lucene.tasks.LuceneConnectionBackedIndexTaskPerformer.perform(LuceneConnectionBackedIndexTaskPerformer.java:20) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager$BatchUpdateAction.perform(DefaultConfluenceIndexManager.java:424) at com.atlassian.bonnie.LuceneConnection.withBatchUpdate(LuceneConnection.java:405) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.processTasks(DefaultConfluenceIndexManager.java:197) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.flushQueue(DefaultConfluenceIndexManager.java:149) at sun.reflect.GeneratedMethodAccessor604.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307) at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149) at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204) at $Proxy42.flushQueue(Unknown Source) at com.atlassian.confluence.search.lucene.IndexQueueFlusher.executeJob(IndexQueueFlusher.java:30) at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.surroundJobExecutionWithLogging(AbstractClusterAwareQuartzJobBean.java:63) at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.executeInternal(AbstractClusterAwareQuartzJobBean.java:46) at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86) at org.quartz.core.JobRunShell.run(JobRunShell.java:199) at com.atlassian.confluence.schedule.quartz.ConfluenceQuartzThreadPool$1.run(ConfluenceQuartzThreadPool.java:20) at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549) Caused by: java.io.IOException: Error: Header doesn't contain versioninfo at org.apache.pdfbox.pdfparser.PDFParser.parseHeader(PDFParser.java:312) at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:169) at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1069) at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1036) at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:40) ... 30 more{code} h3. Notes Workaround is not possible as changing the space key is not a usable solution and changing the attachment name on database level would remove it as the configured space logo.
    via by Septa Cahyadiputra [Atlassian],
  • Display Word attachments
    via by vignesh krishnan,
  • Reading binary chars from a CSV file
    via Stack Overflow by alepuzio
    ,
    • com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Excel document: Invalid header signature; read 0x6D453B74726F6853, expected 0xE11AB1A1E011CFD0 at com.atlassian.confluence.extra.officeconnector.index.excel.ExcelTextExtractor.extractText(ExcelTextExtractor.java:103) at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40) at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:36) at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104) at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:97) at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43) at com.atlassian.bonnie.index.TempIndexWriter.perform(TempIndexWriter.java:73) at com.atlassian.confluence.search.lucene.TempIndexWriterStrategy.perform(TempIndexWriterStrategy.java:43) at com.atlassian.confluence.search.lucene.tasks.TempIndexBackedIndexTaskPerformer.perform(TempIndexBackedIndexTaskPerformer.java:21) at com.atlassian.confluence.search.lucene.ReindexWorkBatch.indexCollection(ReindexWorkBatch.java:125) at com.atlassian.confluence.search.lucene.ReindexWorkBatch$1.doInTransaction(ReindexWorkBatch.java:86) at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:128) at com.atlassian.confluence.search.lucene.ReindexWorkBatch.run(ReindexWorkBatch.java:56) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.IOException: Invalid header signature; read 0x6D453B74726F6853, expected 0xE11AB1A1E011CFD0 at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:120) at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:151) at com.atlassian.confluence.extra.officeconnector.index.excel.ExcelTextExtractor.extractText(ExcelTextExtractor.java:89) ... 18 more

    Users with the same issue

    Unknown visitor
    Unknown visitor1 times, last one,
    Unknown visitor
    Unknown visitor1 times, last one,