com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Excel document: Invalid header signature; read 0x6D453B74726F6853, expected 0xE11AB1A1E011CFD0

Atlassian JIRA | Michael Jositz | 4 years ago
  1. 0

    h3. Symptoms I think that closed issue CONF-18733 is not really resolved. We have many entries in our log showing these kind of issues: {code} 2012-10-12 22:30:34,235 WARN [Indexer: 1] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: meta_mailinfo_sec01.csv v.1 (59509056) g6922) com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Excel document: Invalid header signature; read 0x6D453B74726F6853, expected 0xE11AB1A1E011CFD0 at com.atlassian.confluence.extra.officeconnector.index.excel.ExcelTextExtractor.extractText(ExcelTextExtractor.java:103) at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40) at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:36) at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104) at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:97) at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43) at com.atlassian.bonnie.index.TempIndexWriter.perform(TempIndexWriter.java:73) at com.atlassian.confluence.search.lucene.TempIndexWriterStrategy.perform(TempIndexWriterStrategy.java:43) at com.atlassian.confluence.search.lucene.tasks.TempIndexBackedIndexTaskPerformer.perform(TempIndexBackedIndexTaskPerformer.java:21) at com.atlassian.confluence.search.lucene.ReindexWorkBatch.indexCollection(ReindexWorkBatch.java:125) at com.atlassian.confluence.search.lucene.ReindexWorkBatch$1.doInTransaction(ReindexWorkBatch.java:86) at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:128) at com.atlassian.confluence.search.lucene.ReindexWorkBatch.run(ReindexWorkBatch.java:56) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.IOException: Invalid header signature; read 0x6D453B74726F6853, expected 0xE11AB1A1E011CFD0 at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:120) at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:151) at com.atlassian.confluence.extra.officeconnector.index.excel.ExcelTextExtractor.extractText(ExcelTextExtractor.java:89) ... 18 more {code} h4. Resolution The underlying issue was resolved in Confluence 3.5.11, so the warning message does not occur for csv files uploaded in Confluence 3.5.1.0 or above. However, any csv files that were uploaded in versions older that Confluence 3.5.11 will still have the incorrect Content-type, and when Confluence performs a re-index those warning messages still occur. *To fix the Content-type and resolve the warning message for older documents, you will need to run this command against your database:* {code} UPDATE PUBLIC.ATTACHMENTS SET CONTENTTYPE = 'text/csv' WHERE TITLE LIKE '%.csv' AND CONTENTTYPE LIKE '%excel%'; {code} This only needs to be performed once, to update older documents. Any documents uploaded from Confluence 3.5.11 or above will not suffer from this problem. For more information, please see [this KB article|https://confluence.atlassian.com/display/CONFKB/com.atlassian.bonnie.search.extractor.ExtractorException%3A+Error+reading+content+of+Excel+document].

    Atlassian JIRA | 4 years ago | Michael Jositz
    com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Excel document: Invalid header signature; read 0x6D453B74726F6853, expected 0xE11AB1A1E011CFD0
  2. 0

    h3. Symptoms I think that closed issue CONF-18733 is not really resolved. We have many entries in our log showing these kind of issues: {code} 2012-10-12 22:30:34,235 WARN [Indexer: 1] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: meta_mailinfo_sec01.csv v.1 (59509056) g6922) com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Excel document: Invalid header signature; read 0x6D453B74726F6853, expected 0xE11AB1A1E011CFD0 at com.atlassian.confluence.extra.officeconnector.index.excel.ExcelTextExtractor.extractText(ExcelTextExtractor.java:103) at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40) at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:36) at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104) at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:97) at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43) at com.atlassian.bonnie.index.TempIndexWriter.perform(TempIndexWriter.java:73) at com.atlassian.confluence.search.lucene.TempIndexWriterStrategy.perform(TempIndexWriterStrategy.java:43) at com.atlassian.confluence.search.lucene.tasks.TempIndexBackedIndexTaskPerformer.perform(TempIndexBackedIndexTaskPerformer.java:21) at com.atlassian.confluence.search.lucene.ReindexWorkBatch.indexCollection(ReindexWorkBatch.java:125) at com.atlassian.confluence.search.lucene.ReindexWorkBatch$1.doInTransaction(ReindexWorkBatch.java:86) at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:128) at com.atlassian.confluence.search.lucene.ReindexWorkBatch.run(ReindexWorkBatch.java:56) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: java.io.IOException: Invalid header signature; read 0x6D453B74726F6853, expected 0xE11AB1A1E011CFD0 at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:120) at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:151) at com.atlassian.confluence.extra.officeconnector.index.excel.ExcelTextExtractor.extractText(ExcelTextExtractor.java:89) ... 18 more {code} h4. Resolution The underlying issue was resolved in Confluence 3.5.11, so the warning message does not occur for csv files uploaded in Confluence 3.5.1.0 or above. However, any csv files that were uploaded in versions older that Confluence 3.5.11 will still have the incorrect Content-type, and when Confluence performs a re-index those warning messages still occur. *To fix the Content-type and resolve the warning message for older documents, you will need to run this command against your database:* {code} UPDATE PUBLIC.ATTACHMENTS SET CONTENTTYPE = 'text/csv' WHERE TITLE LIKE '%.csv' AND CONTENTTYPE LIKE '%excel%'; {code} This only needs to be performed once, to update older documents. Any documents uploaded from Confluence 3.5.11 or above will not suffer from this problem. For more information, please see [this KB article|https://confluence.atlassian.com/display/CONFKB/com.atlassian.bonnie.search.extractor.ExtractorException%3A+Error+reading+content+of+Excel+document].

    Atlassian JIRA | 4 years ago | Michael Jositz
    com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Excel document: Invalid header signature; read 0x6D453B74726F6853, expected 0xE11AB1A1E011CFD0
  3. 0

    h3. Description of The Bug When Confluence space key configured to end with a file format, the attached space logo would consider as the file format configured by lucene instead of picture. Example of spacekey affected * TDOC * TXLS * TPDF h3. How to reproduce the bug # Create a space with a spacekey that end with a file format for example: TDOC # Upload a space logo for the previously created space # Trigger the indexing to run # {{atlassian-confluence.log}} will throw the following stack trace depending by the configured space key #* *TDOC* {code:title=TDOC}2012-04-03 15:59:00,156 WARN [scheduler_Worker-2] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: TDOC v.1 (1540097) admin) com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Word document: The document appears to be corrupted and cannot be loaded. at com.atlassian.confluence.extra.officeconnector.index.word.WordTextExtractor.extractText(WordTextExtractor.java:41) at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40) at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:36) at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104) at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:97) at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43) at com.atlassian.confluence.search.lucene.tasks.UpdateDocumentIndexTask.perform(UpdateDocumentIndexTask.java:40) at com.atlassian.confluence.search.lucene.tasks.BulkWriteIndexTask.perform(BulkWriteIndexTask.java:44) at com.atlassian.bonnie.LuceneConnection.withWriter(LuceneConnection.java:331) at com.atlassian.confluence.search.lucene.tasks.LuceneConnectionBackedIndexTaskPerformer.perform(LuceneConnectionBackedIndexTaskPerformer.java:20) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager$BatchUpdateAction.perform(DefaultConfluenceIndexManager.java:424) at com.atlassian.bonnie.LuceneConnection.withBatchUpdate(LuceneConnection.java:405) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.processTasks(DefaultConfluenceIndexManager.java:197) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.flushQueue(DefaultConfluenceIndexManager.java:149) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307) at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149) at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204) at $Proxy42.flushQueue(Unknown Source) at com.atlassian.confluence.search.lucene.IndexQueueFlusher.executeJob(IndexQueueFlusher.java:30) at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.surroundJobExecutionWithLogging(AbstractClusterAwareQuartzJobBean.java:63) at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.executeInternal(AbstractClusterAwareQuartzJobBean.java:46) at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86) at org.quartz.core.JobRunShell.run(JobRunShell.java:199) at com.atlassian.confluence.schedule.quartz.ConfluenceQuartzThreadPool$1.run(ConfluenceQuartzThreadPool.java:20) at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549) Caused by: com.aspose.words.FileCorruptedException: The document appears to be corrupted and cannot be loaded. at com.aspose.words.Document.a(Unknown Source) at com.aspose.words.Document.b(Unknown Source) at com.aspose.words.Document.a(Unknown Source) at com.aspose.words.Document.<init>(Unknown Source) at com.aspose.words.Document.<init>(Unknown Source) at com.aspose.words.Document.<init>(Unknown Source) at com.atlassian.confluence.extra.officeconnector.index.word.WordTextExtractor.extractText(WordTextExtractor.java:37) ... 31 more Caused by: java.lang.IllegalStateException: java.nio.charset.UnsupportedCharsetException: UTF-7 at asposewobfuscated.mf.vb(Unknown Source) at asposewobfuscated.mf.uX(Unknown Source) at asposewobfuscated.mf.U(Unknown Source) at com.aspose.words.ha.iE(Unknown Source) at com.aspose.words.ha.g(Unknown Source) ... 37 more Caused by: java.nio.charset.UnsupportedCharsetException: UTF-7 at java.nio.charset.Charset.forName(Charset.java:505) ... 42 more{code} #* *TXLS* {code:title=TXLS}2012-04-03 16:10:49,099 WARN [http-8417-5] [confluence.velocity.introspection.AnnotationBoxingUberspect] lookupMethod Velocity template accessing deprecated method com.atlassian.confluence.util.GeneralUtil#format - /com/atlassian/confluence/plugins/schedule/admin/viewscheduledjobs-item.vm[line 23, column 47] -- referer: http://localhost:8417/confluence/admin/scheduledjobs/viewscheduledjobs.action | url: /confluence/admin/scheduledjobs/viewscheduledjobs.action | userName: admin | action: viewscheduledjobs 2012-04-03 16:10:49,136 WARN [scheduler_Worker-2] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: TXLS v.1 (1540099) admin) com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Excel document: Invalid header signature; read 0x0A1A0A0D474E5089, expected 0xE11AB1A1E011CFD0 at com.atlassian.confluence.extra.officeconnector.index.excel.ExcelTextExtractor.extractText(ExcelTextExtractor.java:103) at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40) at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:36) at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104) at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:97) at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43) at com.atlassian.confluence.search.lucene.tasks.UpdateDocumentIndexTask.perform(UpdateDocumentIndexTask.java:49) at com.atlassian.confluence.search.lucene.tasks.BulkWriteIndexTask.perform(BulkWriteIndexTask.java:44) at com.atlassian.bonnie.LuceneConnection.withWriter(LuceneConnection.java:331) at com.atlassian.confluence.search.lucene.tasks.LuceneConnectionBackedIndexTaskPerformer.perform(LuceneConnectionBackedIndexTaskPerformer.java:20) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager$BatchUpdateAction.perform(DefaultConfluenceIndexManager.java:424) at com.atlassian.bonnie.LuceneConnection.withBatchUpdate(LuceneConnection.java:405) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.processTasks(DefaultConfluenceIndexManager.java:197) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.flushQueue(DefaultConfluenceIndexManager.java:149) at sun.reflect.GeneratedMethodAccessor604.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307) at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149) at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204) at $Proxy42.flushQueue(Unknown Source) at com.atlassian.confluence.search.lucene.IndexQueueFlusher.executeJob(IndexQueueFlusher.java:30) at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.surroundJobExecutionWithLogging(AbstractClusterAwareQuartzJobBean.java:63) at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.executeInternal(AbstractClusterAwareQuartzJobBean.java:46) at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86) at org.quartz.core.JobRunShell.run(JobRunShell.java:199) at com.atlassian.confluence.schedule.quartz.ConfluenceQuartzThreadPool$1.run(ConfluenceQuartzThreadPool.java:20) at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549) Caused by: java.io.IOException: Invalid header signature; read 0x0A1A0A0D474E5089, expected 0xE11AB1A1E011CFD0 at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:120) at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:151) at com.atlassian.confluence.extra.officeconnector.index.excel.ExcelTextExtractor.extractText(ExcelTextExtractor.java:89) ... 30 more{code} #* *TPDF* {code:title=TPDF} 2012-04-03 16:54:02,891 WARN [scheduler_Worker-3] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: TPDF v.1 (1540100) admin) com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:66) at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40) at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:36) at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104) at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:97) at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43) at com.atlassian.confluence.search.lucene.tasks.UpdateDocumentIndexTask.perform(UpdateDocumentIndexTask.java:40) at com.atlassian.confluence.search.lucene.tasks.BulkWriteIndexTask.perform(BulkWriteIndexTask.java:44) at com.atlassian.bonnie.LuceneConnection.withWriter(LuceneConnection.java:331) at com.atlassian.confluence.search.lucene.tasks.LuceneConnectionBackedIndexTaskPerformer.perform(LuceneConnectionBackedIndexTaskPerformer.java:20) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager$BatchUpdateAction.perform(DefaultConfluenceIndexManager.java:424) at com.atlassian.bonnie.LuceneConnection.withBatchUpdate(LuceneConnection.java:405) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.processTasks(DefaultConfluenceIndexManager.java:197) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.flushQueue(DefaultConfluenceIndexManager.java:149) at sun.reflect.GeneratedMethodAccessor604.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307) at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149) at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204) at $Proxy42.flushQueue(Unknown Source) at com.atlassian.confluence.search.lucene.IndexQueueFlusher.executeJob(IndexQueueFlusher.java:30) at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.surroundJobExecutionWithLogging(AbstractClusterAwareQuartzJobBean.java:63) at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.executeInternal(AbstractClusterAwareQuartzJobBean.java:46) at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86) at org.quartz.core.JobRunShell.run(JobRunShell.java:199) at com.atlassian.confluence.schedule.quartz.ConfluenceQuartzThreadPool$1.run(ConfluenceQuartzThreadPool.java:20) at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549) Caused by: java.io.IOException: Error: Header doesn't contain versioninfo at org.apache.pdfbox.pdfparser.PDFParser.parseHeader(PDFParser.java:312) at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:169) at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1069) at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1036) at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:40) ... 30 more{code} h3. Notes Workaround is not possible as changing the space key is not a usable solution and changing the attachment name on database level would remove it as the configured space logo.

    Atlassian JIRA | 5 years ago | Septa Cahyadiputra [Atlassian]
    com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Excel document: Invalid header signature; read 0x0A1A0A0D474E5089, expected 0xE11AB1A1E011CFD0
  4. Speed up your debug routine!

    Automated exception search integrated into your IDE

  5. 0

    h3. Description of The Bug When Confluence space key configured to end with a file format, the attached space logo would consider as the file format configured by lucene instead of picture. Example of spacekey affected * TDOC * TXLS * TPDF h3. How to reproduce the bug # Create a space with a spacekey that end with a file format for example: TDOC # Upload a space logo for the previously created space # Trigger the indexing to run # {{atlassian-confluence.log}} will throw the following stack trace depending by the configured space key #* *TDOC* {code:title=TDOC}2012-04-03 15:59:00,156 WARN [scheduler_Worker-2] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: TDOC v.1 (1540097) admin) com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Word document: The document appears to be corrupted and cannot be loaded. at com.atlassian.confluence.extra.officeconnector.index.word.WordTextExtractor.extractText(WordTextExtractor.java:41) at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40) at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:36) at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104) at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:97) at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43) at com.atlassian.confluence.search.lucene.tasks.UpdateDocumentIndexTask.perform(UpdateDocumentIndexTask.java:40) at com.atlassian.confluence.search.lucene.tasks.BulkWriteIndexTask.perform(BulkWriteIndexTask.java:44) at com.atlassian.bonnie.LuceneConnection.withWriter(LuceneConnection.java:331) at com.atlassian.confluence.search.lucene.tasks.LuceneConnectionBackedIndexTaskPerformer.perform(LuceneConnectionBackedIndexTaskPerformer.java:20) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager$BatchUpdateAction.perform(DefaultConfluenceIndexManager.java:424) at com.atlassian.bonnie.LuceneConnection.withBatchUpdate(LuceneConnection.java:405) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.processTasks(DefaultConfluenceIndexManager.java:197) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.flushQueue(DefaultConfluenceIndexManager.java:149) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307) at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149) at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204) at $Proxy42.flushQueue(Unknown Source) at com.atlassian.confluence.search.lucene.IndexQueueFlusher.executeJob(IndexQueueFlusher.java:30) at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.surroundJobExecutionWithLogging(AbstractClusterAwareQuartzJobBean.java:63) at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.executeInternal(AbstractClusterAwareQuartzJobBean.java:46) at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86) at org.quartz.core.JobRunShell.run(JobRunShell.java:199) at com.atlassian.confluence.schedule.quartz.ConfluenceQuartzThreadPool$1.run(ConfluenceQuartzThreadPool.java:20) at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549) Caused by: com.aspose.words.FileCorruptedException: The document appears to be corrupted and cannot be loaded. at com.aspose.words.Document.a(Unknown Source) at com.aspose.words.Document.b(Unknown Source) at com.aspose.words.Document.a(Unknown Source) at com.aspose.words.Document.<init>(Unknown Source) at com.aspose.words.Document.<init>(Unknown Source) at com.aspose.words.Document.<init>(Unknown Source) at com.atlassian.confluence.extra.officeconnector.index.word.WordTextExtractor.extractText(WordTextExtractor.java:37) ... 31 more Caused by: java.lang.IllegalStateException: java.nio.charset.UnsupportedCharsetException: UTF-7 at asposewobfuscated.mf.vb(Unknown Source) at asposewobfuscated.mf.uX(Unknown Source) at asposewobfuscated.mf.U(Unknown Source) at com.aspose.words.ha.iE(Unknown Source) at com.aspose.words.ha.g(Unknown Source) ... 37 more Caused by: java.nio.charset.UnsupportedCharsetException: UTF-7 at java.nio.charset.Charset.forName(Charset.java:505) ... 42 more{code} #* *TXLS* {code:title=TXLS}2012-04-03 16:10:49,099 WARN [http-8417-5] [confluence.velocity.introspection.AnnotationBoxingUberspect] lookupMethod Velocity template accessing deprecated method com.atlassian.confluence.util.GeneralUtil#format - /com/atlassian/confluence/plugins/schedule/admin/viewscheduledjobs-item.vm[line 23, column 47] -- referer: http://localhost:8417/confluence/admin/scheduledjobs/viewscheduledjobs.action | url: /confluence/admin/scheduledjobs/viewscheduledjobs.action | userName: admin | action: viewscheduledjobs 2012-04-03 16:10:49,136 WARN [scheduler_Worker-2] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: TXLS v.1 (1540099) admin) com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Excel document: Invalid header signature; read 0x0A1A0A0D474E5089, expected 0xE11AB1A1E011CFD0 at com.atlassian.confluence.extra.officeconnector.index.excel.ExcelTextExtractor.extractText(ExcelTextExtractor.java:103) at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40) at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:36) at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104) at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:97) at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43) at com.atlassian.confluence.search.lucene.tasks.UpdateDocumentIndexTask.perform(UpdateDocumentIndexTask.java:49) at com.atlassian.confluence.search.lucene.tasks.BulkWriteIndexTask.perform(BulkWriteIndexTask.java:44) at com.atlassian.bonnie.LuceneConnection.withWriter(LuceneConnection.java:331) at com.atlassian.confluence.search.lucene.tasks.LuceneConnectionBackedIndexTaskPerformer.perform(LuceneConnectionBackedIndexTaskPerformer.java:20) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager$BatchUpdateAction.perform(DefaultConfluenceIndexManager.java:424) at com.atlassian.bonnie.LuceneConnection.withBatchUpdate(LuceneConnection.java:405) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.processTasks(DefaultConfluenceIndexManager.java:197) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.flushQueue(DefaultConfluenceIndexManager.java:149) at sun.reflect.GeneratedMethodAccessor604.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307) at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149) at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204) at $Proxy42.flushQueue(Unknown Source) at com.atlassian.confluence.search.lucene.IndexQueueFlusher.executeJob(IndexQueueFlusher.java:30) at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.surroundJobExecutionWithLogging(AbstractClusterAwareQuartzJobBean.java:63) at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.executeInternal(AbstractClusterAwareQuartzJobBean.java:46) at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86) at org.quartz.core.JobRunShell.run(JobRunShell.java:199) at com.atlassian.confluence.schedule.quartz.ConfluenceQuartzThreadPool$1.run(ConfluenceQuartzThreadPool.java:20) at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549) Caused by: java.io.IOException: Invalid header signature; read 0x0A1A0A0D474E5089, expected 0xE11AB1A1E011CFD0 at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:120) at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:151) at com.atlassian.confluence.extra.officeconnector.index.excel.ExcelTextExtractor.extractText(ExcelTextExtractor.java:89) ... 30 more{code} #* *TPDF* {code:title=TPDF} 2012-04-03 16:54:02,891 WARN [scheduler_Worker-3] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: TPDF v.1 (1540100) admin) com.atlassian.bonnie.search.extractor.ExtractorException: Error getting content of PDF document at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:66) at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40) at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:36) at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104) at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:97) at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43) at com.atlassian.confluence.search.lucene.tasks.UpdateDocumentIndexTask.perform(UpdateDocumentIndexTask.java:40) at com.atlassian.confluence.search.lucene.tasks.BulkWriteIndexTask.perform(BulkWriteIndexTask.java:44) at com.atlassian.bonnie.LuceneConnection.withWriter(LuceneConnection.java:331) at com.atlassian.confluence.search.lucene.tasks.LuceneConnectionBackedIndexTaskPerformer.perform(LuceneConnectionBackedIndexTaskPerformer.java:20) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager$BatchUpdateAction.perform(DefaultConfluenceIndexManager.java:424) at com.atlassian.bonnie.LuceneConnection.withBatchUpdate(LuceneConnection.java:405) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.processTasks(DefaultConfluenceIndexManager.java:197) at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.flushQueue(DefaultConfluenceIndexManager.java:149) at sun.reflect.GeneratedMethodAccessor604.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307) at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149) at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204) at $Proxy42.flushQueue(Unknown Source) at com.atlassian.confluence.search.lucene.IndexQueueFlusher.executeJob(IndexQueueFlusher.java:30) at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.surroundJobExecutionWithLogging(AbstractClusterAwareQuartzJobBean.java:63) at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.executeInternal(AbstractClusterAwareQuartzJobBean.java:46) at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86) at org.quartz.core.JobRunShell.run(JobRunShell.java:199) at com.atlassian.confluence.schedule.quartz.ConfluenceQuartzThreadPool$1.run(ConfluenceQuartzThreadPool.java:20) at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549) Caused by: java.io.IOException: Error: Header doesn't contain versioninfo at org.apache.pdfbox.pdfparser.PDFParser.parseHeader(PDFParser.java:312) at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:169) at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1069) at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1036) at com.atlassian.bonnie.search.extractor.PdfContentExtractor.extractText(PdfContentExtractor.java:40) ... 30 more{code} h3. Notes Workaround is not possible as changing the space key is not a usable solution and changing the attachment name on database level would remove it as the configured space logo.

    Atlassian JIRA | 5 years ago | Septa Cahyadiputra [Atlassian]
    com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Excel document: Invalid header signature; read 0x0A1A0A0D474E5089, expected 0xE11AB1A1E011CFD0
  6. 0

    Getting the Exception error while converting Docx file to XML using Apache POi

    Stack Overflow | 5 years ago | Abhishek
    java.io.IOException: Invalid header signature; read 0x4353414E2023233C, expected 0xE11AB1A1E011CFD0

    2 unregistered visitors
    Not finding the right solution?
    Take a tour to get the most out of Samebug.

    Tired of useless tips?

    Automated exception search integrated into your IDE

    Root Cause Analysis

    1. java.io.IOException

      Invalid header signature; read 0x6D453B74726F6853, expected 0xE11AB1A1E011CFD0

      at org.apache.poi.poifs.storage.HeaderBlockReader.<init>()
    2. POI
      POIFSFileSystem.<init>
      1. org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:120)
      2. org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:151)
      2 frames
    3. com.atlassian.confluence
      ExcelTextExtractor.extractText
      1. com.atlassian.confluence.extra.officeconnector.index.excel.ExcelTextExtractor.extractText(ExcelTextExtractor.java:89)
      1 frame
    4. com.atlassian.bonnie
      BaseAttachmentContentExtractor.addFields
      1. com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40)
      1 frame
    5. com.atlassian.confluence
      ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields
      1. com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:36)
      1 frame
    6. com.atlassian.bonnie
      BaseDocumentBuilder.getDocument
      1. com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104)
      1 frame
    7. com.atlassian.confluence
      AddDocumentIndexTask.perform
      1. com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:97)
      2. com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43)
      2 frames
    8. com.atlassian.bonnie
      TempIndexWriter.perform
      1. com.atlassian.bonnie.index.TempIndexWriter.perform(TempIndexWriter.java:73)
      1 frame
    9. com.atlassian.confluence
      ReindexWorkBatch$1.doInTransaction
      1. com.atlassian.confluence.search.lucene.TempIndexWriterStrategy.perform(TempIndexWriterStrategy.java:43)
      2. com.atlassian.confluence.search.lucene.tasks.TempIndexBackedIndexTaskPerformer.perform(TempIndexBackedIndexTaskPerformer.java:21)
      3. com.atlassian.confluence.search.lucene.ReindexWorkBatch.indexCollection(ReindexWorkBatch.java:125)
      4. com.atlassian.confluence.search.lucene.ReindexWorkBatch$1.doInTransaction(ReindexWorkBatch.java:86)
      4 frames
    10. Spring Tx
      TransactionTemplate.execute
      1. org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:128)
      1 frame
    11. com.atlassian.confluence
      ReindexWorkBatch.run
      1. com.atlassian.confluence.search.lucene.ReindexWorkBatch.run(ReindexWorkBatch.java:56)
      1 frame
    12. Java RT
      Thread.run
      1. java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
      2. java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
      3. java.util.concurrent.FutureTask.run(FutureTask.java:138)
      4. java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      5. java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      6. java.lang.Thread.run(Thread.java:662)
      6 frames