com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Word document: The document appears to be corrupted and cannot be loaded.

Atlassian JIRA | Septa Cahyadiputra [Atlassian] | 6 years ago

    h3. Summary of the Bug
    The indexer cannot index (extract text from) RTF documents generated by [{{"ГАРАНТ"}}|http://english.garant.ru/] (GARANT, a Russian database of government legal documents). The following stack trace is recorded in the logs:
    {noformat}
    2011-05-20 22:29:28,850 WARN [Indexer: 2] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: 110-п_от_15_05_2009_Постановление_Правительства_Ханты-Мансийского_АО_-_Югры.rtf v.1 (1179649) adminconf) -- referer: http://localhost:8354/admin/search-indexes.action | url: /admin/reindex.action | userName: adminconf | action: reindex
    com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Word document: The document appears to be corrupted and cannot be loaded.
        at com.atlassian.confluence.extra.officeconnector.index.word.WordTextExtractor.extractText(WordTextExtractor.java:41)
        at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40)
        at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:45)
        at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104)
        at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:102)
        at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43)
        at com.atlassian.bonnie.index.TempIndexWriter.perform(TempIndexWriter.java:73)
        at com.atlassian.confluence.search.lucene.TempIndexWriterStrategy.perform(TempIndexWriterStrategy.java:43)
        at com.atlassian.confluence.search.lucene.tasks.TempIndexBackedIndexTaskPerformer.perform(TempIndexBackedIndexTaskPerformer.java:21)
        at com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker.indexCollection(DefaultObjectQueueWorker.java:78)
        at com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker$1.doInTransactionWithoutResult(DefaultObjectQueueWorker.java:62)
        at org.springframework.transaction.support.TransactionCallbackWithoutResult.doInTransaction(TransactionCallbackWithoutResult.java:33)
        at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:127)
        at com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker.run(DefaultObjectQueueWorker.java:51)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
    Caused by: com.aspose.words.FileCorruptedException: The document appears to be corrupted and cannot be loaded.
        at com.aspose.words.Document.a(Unknown Source)
        at com.aspose.words.Document.b(Unknown Source)
        at com.aspose.words.Document.a(Unknown Source)
        at com.aspose.words.Document.<init>(Unknown Source)
        at com.aspose.words.Document.<init>(Unknown Source)
        at com.aspose.words.Document.<init>(Unknown Source)
        at com.atlassian.confluence.extra.officeconnector.index.word.WordTextExtractor.extractText(WordTextExtractor.java:37)
        ... 16 more
    Caused by: java.lang.NullPointerException: style
        at asposewobfuscated.am.c(Unknown Source)
        at com.aspose.words.aav.a(Unknown Source)
        at com.aspose.words.wp.a(Unknown Source)
        at com.aspose.words.wp.d(Unknown Source)
        at com.aspose.words.fq.gg(Unknown Source)
        at com.aspose.words.fq.d(Unknown Source)
        at com.aspose.words.fq.read(Unknown Source)
        ... 22 more
    {noformat}

    h3. Steps to Reproduce
    # Download the [attached|^110-п_от_15_05_2009_Постановление_Правительства_Ханты-Мансийского_АО_-_Югры (1).rtf] file.
    # Attach it to a page in Confluence.
    # Wait for about a minute (the indexer runs every minute).
    # Check {{atlassian-confluence.log}}.

    h4. Steps to create the bad RTF document
    # Go to [http://english.garant.ru/].
    # Open the demo version.
    # Open any document whose full text is available.
    # Press the "Export to Word" button.

    h3. Workaround
    # Open the problematic document in Microsoft Office.
    # Re-save the document from Microsoft Office.
    # Re-attach the re-saved file in Confluence.
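Step 4 of the reproduction is about finding the indexer's `WARN` entry. The sketch below shows one minimal way to pull the failing attachment's name, version, ID, and user out of such a line. The class and method names are hypothetical. The regular expression is inferred from the single WARN line in the log excerpt above, not from any documented Confluence log format. It requires Java 11+ for `Path.of`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical helper: finds "Error indexing attachment" warnings in a Confluence log.
// ASSUMPTION: the pattern is inferred from the one WARN line in this report.
public final class IndexerWarningParser {

    /** Details of an attachment the indexer failed to extract. */
    public static final class FailedAttachment {
        public final String name;
        public final int version;
        public final long id;
        public final String user;

        FailedAttachment(String name, int version, long id, String user) {
            this.name = name;
            this.version = version;
            this.id = id;
            this.user = user;
        }
    }

    // Matches e.g. "Error indexing attachment (Attachment: file.rtf v.1 (1179649) adminconf)"
    private static final Pattern WARNING = Pattern.compile(
            "Error indexing attachment \\(Attachment: (.+?) v\\.(\\d+) \\((\\d+)\\) ([^\\s)]+)\\)");

    private IndexerWarningParser() {}

    /** Returns the failed attachment described by a log line, or empty if the line is not such a warning. */
    public static Optional<FailedAttachment> parse(String logLine) {
        Matcher m = WARNING.matcher(logLine);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new FailedAttachment(
                m.group(1),
                Integer.parseInt(m.group(2)),
                Long.parseLong(m.group(3)),
                m.group(4)));
    }

    // Usage: java IndexerWarningParser /path/to/atlassian-confluence.log
    public static void main(String[] args) throws IOException {
        try (Stream<String> lines = Files.lines(Path.of(args[0]), StandardCharsets.UTF_8)) {
            lines.map(IndexerWarningParser::parse)
                 .flatMap(Optional::stream)
                 .forEach(a -> System.out.println(a.id + "\tv." + a.version + "\t" + a.name));
        }
    }
}
```

The log is read as UTF-8 because attachment names such as the one in this report contain Cyrillic characters. Each matching warning prints as one tab-separated row, so a repeatedly failing attachment is easy to spot after the indexer's next run.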


    Root Cause Analysis

    1. java.lang.NullPointerException: style

      at asposewobfuscated.am.c()
    2. asposewobfuscated
      am.c
      1. asposewobfuscated.am.c(Unknown Source)
      1 frame
    3. com.aspose.words
      Document.<init>
      1. com.aspose.words.aav.a(Unknown Source)
      2. com.aspose.words.wp.a(Unknown Source)
      3. com.aspose.words.wp.d(Unknown Source)
      4. com.aspose.words.fq.gg(Unknown Source)
      5. com.aspose.words.fq.d(Unknown Source)
      6. com.aspose.words.fq.read(Unknown Source)
      7. com.aspose.words.Document.b(Unknown Source)
      8. com.aspose.words.Document.a(Unknown Source)
      9. com.aspose.words.Document.<init>(Unknown Source)
      10. com.aspose.words.Document.<init>(Unknown Source)
      11. com.aspose.words.Document.<init>(Unknown Source)
      11 frames
    4. com.atlassian.confluence
      WordTextExtractor.extractText
      1. com.atlassian.confluence.extra.officeconnector.index.word.WordTextExtractor.extractText(WordTextExtractor.java:37)
      1 frame
    5. com.atlassian.bonnie
      BaseAttachmentContentExtractor.addFields
      1. com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:40)
      1 frame
    6. com.atlassian.confluence
      ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields
      1. com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:45)
      1 frame
    7. com.atlassian.bonnie
      BaseDocumentBuilder.getDocument
      1. com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104)
      1 frame
    8. com.atlassian.confluence
      AddDocumentIndexTask.perform
      1. com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:102)
      2. com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:43)
      2 frames
    9. com.atlassian.bonnie
      TempIndexWriter.perform
      1. com.atlassian.bonnie.index.TempIndexWriter.perform(TempIndexWriter.java:73)
      1 frame
    10. com.atlassian.confluence
      DefaultObjectQueueWorker$1.doInTransactionWithoutResult
      1. com.atlassian.confluence.search.lucene.TempIndexWriterStrategy.perform(TempIndexWriterStrategy.java:43)
      2. com.atlassian.confluence.search.lucene.tasks.TempIndexBackedIndexTaskPerformer.perform(TempIndexBackedIndexTaskPerformer.java:21)
      3. com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker.indexCollection(DefaultObjectQueueWorker.java:78)
      4. com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker$1.doInTransactionWithoutResult(DefaultObjectQueueWorker.java:62)
      4 frames
    11. Spring Tx
      TransactionTemplate.execute
      1. org.springframework.transaction.support.TransactionCallbackWithoutResult.doInTransaction(TransactionCallbackWithoutResult.java:33)
      2. org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:127)
      2 frames
    12. com.atlassian.confluence
      DefaultObjectQueueWorker.run
      1. com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker.run(DefaultObjectQueueWorker.java:51)
      1 frame
    13. Java RT
      Thread.run
      1. java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      2. java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      3. java.lang.Thread.run(Thread.java:662)
      3 frames
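The Root Cause Analysis list above is the `NullPointerException` trace with its `... 22 more` line expanded. When `Throwable.printStackTrace` prints a "Caused by:" section, it leaves out the trailing frames that the cause shares with the trace of the exception enclosing it and prints `... n more` instead. The cause's full trace is therefore its own printed frames followed by the last n frames of the enclosing trace. Below is a minimal sketch of that expansion. The class name is hypothetical, frames are abbreviated to `Class.method:line` strings copied from the log in this report, and it requires Java 9+ for `List.of`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: rebuilds a cause's complete stack trace from what
// Throwable.printStackTrace printed for it (its own frames plus "... n more").
public final class CauseTraceExpander {

    private CauseTraceExpander() {}

    /** Own frames followed by the last framesInCommon frames of the enclosing trace. */
    public static List<String> expand(List<String> ownFrames, int framesInCommon, List<String> enclosingFull) {
        if (framesInCommon < 0 || framesInCommon > enclosingFull.size()) {
            throw new IllegalArgumentException("framesInCommon out of range: " + framesInCommon);
        }
        List<String> full = new ArrayList<>(ownFrames);
        full.addAll(enclosingFull.subList(enclosingFull.size() - framesInCommon, enclosingFull.size()));
        return full;
    }

    public static void main(String[] args) {
        // Outermost trace: ExtractorException (17 frames, printed in full).
        List<String> extractorException = List.of(
                "WordTextExtractor.extractText:41",
                "BaseAttachmentContentExtractor.addFields:40",
                "ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields:45",
                "BaseDocumentBuilder.getDocument:104",
                "ConfluenceDocumentBuilder.getDocument:102",
                "AddDocumentIndexTask.perform:43",
                "TempIndexWriter.perform:73",
                "TempIndexWriterStrategy.perform:43",
                "TempIndexBackedIndexTaskPerformer.perform:21",
                "DefaultObjectQueueWorker.indexCollection:78",
                "DefaultObjectQueueWorker$1.doInTransactionWithoutResult:62",
                "TransactionCallbackWithoutResult.doInTransaction:33",
                "TransactionTemplate.execute:127",
                "DefaultObjectQueueWorker.run:51",
                "ThreadPoolExecutor$Worker.runTask:886",
                "ThreadPoolExecutor$Worker.run:908",
                "Thread.run:662");
        // "Caused by: com.aspose.words.FileCorruptedException" ... 16 more
        List<String> fileCorrupted = expand(List.of(
                "Document.a", "Document.b", "Document.a",
                "Document.<init>", "Document.<init>", "Document.<init>",
                "WordTextExtractor.extractText:37"), 16, extractorException);
        // "Caused by: java.lang.NullPointerException: style" ... 22 more
        List<String> nullPointer = expand(List.of(
                "am.c", "aav.a", "wp.a", "wp.d", "fq.gg", "fq.d", "fq.read"),
                22, fileCorrupted);
        System.out.println(nullPointer.size() + " frames");
        nullPointer.forEach(frame -> System.out.println("  at " + frame));
    }
}
```

Running `main` prints `29 frames`, followed by the same frames the Root Cause Analysis groups: the seven obfuscated Aspose frames, then `Document.b` through `Document.<init>`, `WordTextExtractor.extractText:37`, and the shared Confluence, Spring, and thread-pool frames down to `Thread.run:662`. The first `Document.a` frame of the `FileCorruptedException` trace is not shared with the `NullPointerException` trace, which is why it does not appear in the expanded list.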