com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Word document: The document appears to be corrupted and cannot be loaded.

Atlassian JIRA | Don Willis [Atlassian] | 5 years ago
  1. 0

    One of our users uploaded a file with a .dot extension to Confluence. The file is not a Word template; in this case it was a DOT language file (http://en.wikipedia.org/wiki/DOT_language). The extractor should make more of an effort to detect a file's actual type rather than assuming it from the file extension and then logging stack traces like this one:

    {noformat}
    2012-03-08 23:36:00,087 WARN [scheduler_Worker-5] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: orgtree.dot v.1 (1973452911) jp olley)
    com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Word document: The document appears to be corrupted and cannot be loaded.
        at com.atlassian.confluence.extra.officeconnector.index.word.WordTextExtractor.extractText(WordTextExtractor.java:41)
        at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40)
        at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:36)
        at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104)
        at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:97)
        at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43)
        at com.atlassian.confluence.search.lucene.tasks.UpdateDocumentIndexTask.perform(UpdateDocumentIndexTask.java:40)
        at com.atlassian.confluence.search.lucene.tasks.BulkWriteIndexTask.perform(BulkWriteIndexTask.java:44)
        at com.atlassian.bonnie.LuceneConnection.withWriter(LuceneConnection.java:331)
        at com.atlassian.confluence.search.lucene.tasks.LuceneConnectionBackedIndexTaskPerformer.perform(LuceneConnectionBackedIndexTaskPerformer.java:20)
        at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager$BatchUpdateAction.perform(DefaultConfluenceIndexManager.java:424)
        at com.atlassian.bonnie.LuceneConnection.withBatchUpdate(LuceneConnection.java:405)
        at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.processTasks(DefaultConfluenceIndexManager.java:197)
        at com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.flushQueue(DefaultConfluenceIndexManager.java:149)
        at sun.reflect.GeneratedMethodAccessor1860.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
        at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
        at $Proxy44.flushQueue(Unknown Source)
        at com.atlassian.confluence.search.lucene.IndexQueueFlusher.executeJob(IndexQueueFlusher.java:30)
        at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.surroundJobExecutionWithLogging(AbstractClusterAwareQuartzJobBean.java:63)
        at com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.executeInternal(AbstractClusterAwareQuartzJobBean.java:46)
        at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:199)
        at com.atlassian.confluence.schedule.quartz.ConfluenceQuartzThreadPool$1.run(ConfluenceQuartzThreadPool.java:20)
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)
    Caused by: com.aspose.words.FileCorruptedException: The document appears to be corrupted and cannot be loaded.
        at com.aspose.words.Document.a(Unknown Source)
        at com.aspose.words.Document.b(Unknown Source)
        at com.aspose.words.Document.a(Unknown Source)
        at com.aspose.words.Document.<init>(Unknown Source)
        at com.aspose.words.Document.<init>(Unknown Source)
        at com.aspose.words.Document.<init>(Unknown Source)
        at com.atlassian.confluence.extra.officeconnector.index.word.WordTextExtractor.extractText(WordTextExtractor.java:37)
        ... 30 more
    {noformat}
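The report asks for content-based detection instead of trusting the extension. A minimal sketch of that idea follows, assuming it is enough to check the leading bytes for the OLE2 compound-file signature (legacy binary Word files) or the ZIP signature (OOXML files). The class `WordContentSniffer` and its methods are hypothetical and are not part of Confluence, Bonnie, or Aspose. Word content stored as RTF or plain XML would not be recognised by this check, and a matching signature only identifies the container, not that the file is really a Word document.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decides from the first bytes whether a file could be a Word container.
public class WordContentSniffer {
    // OLE2 compound-file signature (legacy binary .doc/.dot).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local-file-header signature (OOXML .docx/.dotx are ZIP archives).
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 0x03, 0x04 };

    // Returns true if the given leading bytes start with one of the known signatures.
    public static boolean looksLikeWordContainer(byte[] head) {
        return startsWith(head, OLE2_MAGIC) || startsWith(head, ZIP_MAGIC);
    }

    // Reads up to 8 bytes from the stream and applies the same check.
    public static boolean looksLikeWordContainer(InputStream in) throws IOException {
        byte[] head = new byte[OLE2_MAGIC.length];
        int read = 0;
        while (read < head.length) {
            int n = in.read(head, read, head.length - read);
            if (n < 0) {
                break; // stream ended before 8 bytes were available
            }
            read += n;
        }
        byte[] prefix = new byte[read];
        System.arraycopy(head, 0, prefix, 0, read);
        return looksLikeWordContainer(prefix);
    }

    private static boolean startsWith(byte[] data, byte[] magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Illustrative DOT-language content, standing in for a file like orgtree.dot.
        byte[] dotSource = "digraph orgtree { a -> b; }".getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksLikeWordContainer(dotSource));
    }
}
```

An extractor could run such a check before handing the stream to the Word parser and skip the attachment quietly when it fails, instead of logging a full stack trace.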

    Atlassian JIRA | 5 years ago | Don Willis [Atlassian]
    com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Word document: The document appears to be corrupted and cannot be loaded.
  3. 0

    Saving a modified page that was opened over WebDAV (the "Edit in Word" functionality) is reported to be broken for NeoOffice 3.0.2 in Studio 2.5. [This|http://confluence.atlassian.com/display/DOC/Office+Connector+Prerequisites] page does not mention that this version is incompatible. The exception occurs when the customer saves a modified page back to the server, while the underlying Aspose Words library is reading the document from the request InputStream.

    {code}
    @400000004d59686a16abeabc -- url: /wiki/plugins/servlet/confluence/editinword/7372872/content/*****.doc | userName: *****
    @400000004d59686a16abeea4 org.apache.jackrabbit.webdav.DavException
    @400000004d59686a16abeea4 at com.benryan.servlet.webdav.PageAsDocResource.saveData(PageAsDocResource.java:186)
    @400000004d59686a16abfa5c at com.benryan.servlet.webdav.PageResource.addMember(PageResource.java:64)
    @400000004d59686a16abfe44 at org.apache.jackrabbit.webdav.server.AbstractWebdavServlet.doPut(AbstractWebdavServlet.java:503)
    @400000004d59686a16ac022c at org.apache.jackrabbit.webdav.server.AbstractWebdavServlet.execute(AbstractWebdavServlet.java:240)
    @400000004d59686a16ac0614 at com.atlassian.confluence.extra.webdav.servlet.ConfluenceWebdavServlet.service(ConfluenceWebdavServlet.java:104)
    ...
    @400000004d59686a16b0096c Caused by: com.aspose.words.FileCorruptedException: The document appears to be corrupted and cannot be loaded.
    @400000004d59686a16b00d54 at com.aspose.words.Document.a(Unknown Source)
    @400000004d59686a16b024c4 at com.aspose.words.Document.b(Unknown Source)
    @400000004d59686a16b028ac at com.aspose.words.Document.a(Unknown Source)
    @400000004d59686a16b028ac at com.aspose.words.Document.<init>(Unknown Source)
    @400000004d59686a16b02c94 at com.aspose.words.Document.<init>(Unknown Source)
    @400000004d59686a16b0307c at com.aspose.words.Document.<init>(Unknown Source)
    @400000004d59686a16b0307c at com.benryan.servlet.webdav.PageAsDocResource.saveData(PageAsDocResource.java:176)
    @400000004d59686a16b0401c ... 135 more
    @400000004d59686a16b0401c Caused by: java.lang.IllegalStateException: java.nio.charset.UnsupportedCharsetException: UTF-7
    @400000004d59686a16b04404 at asposewobfuscated.mf.vb(Unknown Source)
    @400000004d59686a16b047ec at asposewobfuscated.mf.uX(Unknown Source)
    @400000004d59686a16b047ec at asposewobfuscated.mf.U(Unknown Source)
    @400000004d59686a16b04bd4 at com.aspose.words.ha.iE(Unknown Source)
    @400000004d59686a16b0578c at com.aspose.words.ha.g(Unknown Source)
    @400000004d59686a16b0578c ... 141 more
    @400000004d59686a16b05b74 Caused by: java.nio.charset.UnsupportedCharsetException: UTF-7
    @400000004d59686a16b05f5c at java.nio.charset.Charset.forName(Charset.java:505)
    @400000004d59686a16b06344 ... 146 more
    {code}

    Here are the version changes between Studio 2.4 and 2.5 in the involved code:

    {code}
    Confluence 3.3.3
    WebDAV Plugin 2.4 (Jackrabbit 1.4)
    Office Connector Plugin 1.13 (Aspose Words 3.2.1)

    Confluence 3.4.7
    WebDAV Plugin 2.5 (Jackrabbit 1.4)
    Office Connector Plugin 1.15 (Aspose Words 3.2.1)
    {code}

    [This|http://www.aspose.com/community/forums/thread/165090.aspx] post makes me suspect that the UTF-7 encoding is only a fallback that hides the original cause. I'm on Linux, so I couldn't verify or reproduce the bug. I asked the customer to edit a page that I had verified to be editable with OpenOffice 3.2, but he simply switched to OpenOffice 3.3, reported that it works, and closed the issue.
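The innermost cause in the trace above is `Charset.forName` throwing `UnsupportedCharsetException` for UTF-7, a charset that a stock JRE does not generally provide. The sketch below only illustrates how calling code can probe for a charset and fall back instead of failing. The class `CharsetFallback` and the choice of UTF-8 as the fallback are assumptions for illustration; the failing lookup in the report happens inside Aspose Words, so this is not a fix for that library.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: resolves a charset by name without throwing.
public class CharsetFallback {
    // Returns the named charset if this JVM supports it, otherwise the fallback.
    public static Charset resolve(String name, Charset fallback) {
        try {
            if (name != null && Charset.isSupported(name)) {
                return Charset.forName(name);
            }
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that are not legal in a charset name.
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Prints whichever charset the running JVM resolves for "UTF-7".
        System.out.println(resolve("UTF-7", StandardCharsets.UTF_8));
    }
}
```

Running `main` on the affected server would show whether UTF-7 is available to that JVM, which is one way to check the suspicion that the encoding is only a fallback.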

    Atlassian JIRA | 6 years ago | Fabian Krämer
    com.aspose.words.FileCorruptedException: The document appears to be corrupted and cannot be loaded. @400000004d59686a16b00d54 at com.aspose.words.Document.a(Unknown Source) @400000004d59686a16b024c4 at com.aspose.words.Document.b(Unknown Source)
    Root Cause Analysis

    1. com.aspose.words.FileCorruptedException

      The document appears to be corrupted and cannot be loaded.

      at com.aspose.words.Document.a()
    2. com.aspose.words
      Document.<init>
      1. com.aspose.words.Document.a(Unknown Source)
      2. com.aspose.words.Document.b(Unknown Source)
      3. com.aspose.words.Document.a(Unknown Source)
      4. com.aspose.words.Document.<init>(Unknown Source)
      5. com.aspose.words.Document.<init>(Unknown Source)
      6. com.aspose.words.Document.<init>(Unknown Source)
      6 frames
    3. com.atlassian.confluence
      WordTextExtractor.extractText
      1. com.atlassian.confluence.extra.officeconnector.index.word.WordTextExtractor.extractText(WordTextExtractor.java:37)
      1 frame
    4. com.atlassian.bonnie
      BaseAttachmentContentExtractor.addFields
      1. com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40)
      1 frame
    5. com.atlassian.confluence
      ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields
      1. com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:36)
      1 frame
    6. com.atlassian.bonnie
      BaseDocumentBuilder.getDocument
      1. com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104)
      1 frame
    7. com.atlassian.confluence
      BulkWriteIndexTask.perform
      1. com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:97)
      2. com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43)
      3. com.atlassian.confluence.search.lucene.tasks.UpdateDocumentIndexTask.perform(UpdateDocumentIndexTask.java:40)
      4. com.atlassian.confluence.search.lucene.tasks.BulkWriteIndexTask.perform(BulkWriteIndexTask.java:44)
      4 frames
    8. com.atlassian.bonnie
      LuceneConnection.withWriter
      1. com.atlassian.bonnie.LuceneConnection.withWriter(LuceneConnection.java:331)
      1 frame
    9. com.atlassian.confluence
      DefaultConfluenceIndexManager$BatchUpdateAction.perform
      1. com.atlassian.confluence.search.lucene.tasks.LuceneConnectionBackedIndexTaskPerformer.perform(LuceneConnectionBackedIndexTaskPerformer.java:20)
      2. com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager$BatchUpdateAction.perform(DefaultConfluenceIndexManager.java:424)
      2 frames
    10. com.atlassian.bonnie
      LuceneConnection.withBatchUpdate
      1. com.atlassian.bonnie.LuceneConnection.withBatchUpdate(LuceneConnection.java:405)
      1 frame
    11. com.atlassian.confluence
      DefaultConfluenceIndexManager.flushQueue
      1. com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.processTasks(DefaultConfluenceIndexManager.java:197)
      2. com.atlassian.confluence.search.lucene.DefaultConfluenceIndexManager.flushQueue(DefaultConfluenceIndexManager.java:149)
      2 frames
    12. Java RT
      Method.invoke
      1. sun.reflect.GeneratedMethodAccessor1860.invoke(Unknown Source)
      2. sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      3. java.lang.reflect.Method.invoke(Method.java:597)
      3 frames
    13. Spring AOP
      ReflectiveMethodInvocation.proceed
      1. org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:307)
      2. org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:182)
      3. org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:149)
      3 frames
    14. Spring Tx
      TransactionInterceptor.invoke
      1. org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:106)
      1 frame
    15. Spring AOP
      JdkDynamicAopProxy.invoke
      1. org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:171)
      2. org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
      2 frames
    16. Unknown
      $Proxy44.flushQueue
      1. $Proxy44.flushQueue(Unknown Source)
      1 frame
    17. com.atlassian.confluence
      AbstractClusterAwareQuartzJobBean.executeInternal
      1. com.atlassian.confluence.search.lucene.IndexQueueFlusher.executeJob(IndexQueueFlusher.java:30)
      2. com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.surroundJobExecutionWithLogging(AbstractClusterAwareQuartzJobBean.java:63)
      3. com.atlassian.confluence.setup.quartz.AbstractClusterAwareQuartzJobBean.executeInternal(AbstractClusterAwareQuartzJobBean.java:46)
      3 frames
    18. Spring Context Support
      QuartzJobBean.execute
      1. org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86)
      1 frame
    19. quartz
      JobRunShell.run
      1. org.quartz.core.JobRunShell.run(JobRunShell.java:199)
      1 frame
    20. com.atlassian.confluence
      ConfluenceQuartzThreadPool$1.run
      1. com.atlassian.confluence.schedule.quartz.ConfluenceQuartzThreadPool$1.run(ConfluenceQuartzThreadPool.java:20)
      1 frame
    21. quartz
      SimpleThreadPool$WorkerThread.run
      1. org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)
      1 frame
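The grouping above starts from the innermost exception in the chain. As a rough sketch of how that innermost exception can be found programmatically, the following walks `Throwable.getCause()` until it runs out. The class `RootCause` is hypothetical, and the nested exceptions in `main` are stand-ins, because the Aspose and Atlassian exception classes are not available outside those products.

```java
// Hypothetical helper: finds the innermost cause of an exception chain.
public class RootCause {
    public static Throwable rootCauseOf(Throwable t) {
        Throwable current = t;
        // Follow the chain; stop if a cause points back at itself.
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-ins for the FileCorruptedException wrapped in an ExtractorException.
        Throwable inner = new IllegalStateException("The document appears to be corrupted and cannot be loaded.");
        Throwable outer = new RuntimeException("Error reading content of Word document", inner);
        System.out.println(rootCauseOf(outer).getClass().getName());
    }
}
```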