com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Excel document: Invalid header signature; read 8236850760414359372, expected -2226271756974174256

Atlassian JIRA | Andrew Moise | 7 years ago
  1.

    This problem occurs because the browser sends the wrong MIME type during a file upload. It appears that Windows machines on which MS Excel is the registered handler for CSV files upload CSV files with the "application/vnd.ms-excel" MIME type. Confluence then passes the CSV attachment to the Excel content extractor, which fails. This can leave the search index only partially built, so some pages are missing from search results. Sample logs:

    {noformat}
    2010-02-22 11:09:56,038 WARN [Indexer: 2] [bonnie.search.extractor.BaseAttachmentContentExtractor] addFields Error indexing attachment (Attachment: textures-streaming.csv v.1 (3014859) kteich) -- url: /confluence/admin/reindex.action | userName: moise | referer: https://qix.demiurgestudios.com/confluence/admin/search-indexes.action | action: reindex
    com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Excel document: Invalid header signature; read 8236850760414359372, expected -2226271756974174256
        at com.atlassian.bonnie.search.extractor.MsExcelContentExtractor.extractText(MsExcelContentExtractor.java:101)
        at com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:39)
        at com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:43)
        at com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104)
        at com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:102)
        at com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:41)
        at com.atlassian.bonnie.index.TempIndexWriter.perform(TempIndexWriter.java:72)
        at com.atlassian.confluence.search.lucene.TempIndexWriterStrategy.perform(TempIndexWriterStrategy.java:43)
        at com.atlassian.confluence.search.lucene.tasks.TempIndexBackedIndexTaskPerformer.perform(TempIndexBackedIndexTaskPerformer.java:21)
        at com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker.indexCollection(DefaultObjectQueueWorker.java:73)
        at com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker$1.doInTransactionWithoutResult(DefaultObjectQueueWorker.java:61)
        at org.springframework.transaction.support.TransactionCallbackWithoutResult.doInTransaction(TransactionCallbackWithoutResult.java:33)
        at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:127)
        at com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker.run(DefaultObjectQueueWorker.java:50)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
        at java.lang.Thread.run(Thread.java:595)
    Caused by: java.io.IOException: Invalid header signature; read 8236850760414359372, expected -2226271756974174256
        at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:103)
        at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:90)
        at com.atlassian.bonnie.search.extractor.MsExcelContentExtractor.extractText(MsExcelContentExtractor.java:87)
        ... 16 more
    {noformat}

    h3. Workaround

    Stop Confluence, edit the {{confluence/WEB-INF/classes/mime.types}} file and add the following entry:

    {code}
    text/csv csv
    {code}

    This ensures that all files with the .csv extension are mapped to the text/csv MIME type, regardless of what the browser sends. Next, run the following query against the database, then start Confluence:

    {code:sql}
    update attachments set contenttype='text/csv' where lower(title) like '%.csv';
    {code}

    To make the content of the CSV files searchable, you will also need to [run a reindex|http://confluence.atlassian.com/display/DOC/Content+Index+Administration#ContentIndexAdministration-RebuildingtheContentIndexes].

    Atlassian JIRA | 7 years ago | Andrew Moise
    com.atlassian.bonnie.search.extractor.ExtractorException: Error reading content of Excel document: Invalid header signature; read 8236850760414359372, expected -2226271756974174256
  3.

    Getting the Exception error while converting Docx file to XML using Apache POI

    Stack Overflow | 5 years ago | Abhishek
    java.io.IOException: Invalid header signature; read 0x4353414E2023233C, expected 0xE11AB1A1E011CFD0
  5.

    Reading Microsoft Word Document in JAVA - Techie Zone

    hiteshagrawal.com | 7 months ago
    java.io.IOException: Unable to read entire header; 6 bytes read; expected 512 bytes
  6.

    Reading binary chars from a CSV file

    Stack Overflow | 5 years ago | alepuzio
    java.io.IOException: Invalid header signature; read 0x003000310030FEFF, expected 0xE11AB1A1E011CFD0
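
Each "Invalid header signature" message above reports the first eight bytes of the file, read as a little-endian long, next to the OLE2 (Compound File) magic number that POI expects; the Confluence log prints both values in decimal, the later questions in hex. A minimal sketch that decodes such a value back into bytes, so you can see what the file really starts with, might look like this. `HeaderSignatureDecoder` and its methods are hypothetical helpers written for this illustration, not part of POI, and the sketch assumes Java 8 or later.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper (not part of Apache POI): turns the "read" value from
// POI's "Invalid header signature" message back into the file's first 8 bytes.
final class HeaderSignatureDecoder {

    // OLE2 magic bytes D0 CF 11 E0 A1 B1 1A E1, read as a little-endian long.
    // Printed in decimal this is -2226271756974174256, as in the Confluence log.
    static final long OLE2_SIGNATURE = 0xE11AB1A1E011CFD0L;

    private HeaderSignatureDecoder() {}

    // Writes the long back out little-endian, i.e. in the order the bytes sit in the file.
    static byte[] firstBytes(long read) {
        return ByteBuffer.allocate(Long.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(read)
                .array();
    }

    // Space-separated hex dump of the 8 bytes, in file order.
    static String asHex(long read) {
        StringBuilder sb = new StringBuilder();
        for (byte b : firstBytes(read)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Printable ASCII view of the 8 bytes; anything else becomes '.'.
    static String asText(long read) {
        StringBuilder sb = new StringBuilder();
        for (byte b : firstBytes(read)) {
            int c = b & 0xFF;
            sb.append(c >= 0x20 && c < 0x7F ? (char) c : '.');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(OLE2_SIGNATURE == -2226271756974174256L); // true
        System.out.println(asHex(OLE2_SIGNATURE));                   // D0 CF 11 E0 A1 B1 1A E1
        System.out.println(asText(8236850760414359372L));            // Log: ,Or
        System.out.println(asText(0x4353414E2023233CL));             // <## NASC
        System.out.println(asHex(0x003000310030FEFFL));              // FF FE 30 00 31 00 30 00
    }
}
```

Decoded this way, the Confluence value is the plain text `Log: ,Or`, the .docx question's value reads `<## NASC`, and the CSV question's value starts with `FF FE`, a UTF-16LE byte-order mark followed by the UTF-16 characters `010`. None of them is the OLE2 header `D0 CF 11 E0 A1 B1 1A E1`, which is why POI rejects all three files.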

    Root Cause Analysis

    1. java.io.IOException

      Invalid header signature; read 8236850760414359372, expected -2226271756974174256

      at org.apache.poi.poifs.storage.HeaderBlockReader.<init>()
    2. POI
      POIFSFileSystem.<init>
      1. org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:103)
      2. org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:90)
      2 frames
    3. com.atlassian.bonnie
      BaseAttachmentContentExtractor.addFields
      1. com.atlassian.bonnie.search.extractor.MsExcelContentExtractor.extractText(MsExcelContentExtractor.java:87)
      2. com.atlassian.bonnie.search.extractor.BaseAttachmentContentExtractor.addFields(BaseAttachmentContentExtractor.java:39)
      2 frames
    4. com.atlassian.confluence
      ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields
      1. com.atlassian.confluence.plugin.descriptor.ExtractorModuleDescriptor$BackwardsCompatibleExtractor.addFields(ExtractorModuleDescriptor.java:43)
      1 frame
    5. com.atlassian.bonnie
      BaseDocumentBuilder.getDocument
      1. com.atlassian.bonnie.search.BaseDocumentBuilder.getDocument(BaseDocumentBuilder.java:104)
      1 frame
    6. com.atlassian.confluence
      AddDocumentIndexTask.perform
      1. com.atlassian.confluence.search.lucene.ConfluenceDocumentBuilder.getDocument(ConfluenceDocumentBuilder.java:102)
      2. com.atlassian.confluence.search.lucene.tasks.AddDocumentIndexTask.perform(AddDocumentIndexTask.java:41)
      2 frames
    7. com.atlassian.bonnie
      TempIndexWriter.perform
      1. com.atlassian.bonnie.index.TempIndexWriter.perform(TempIndexWriter.java:72)
      1 frame
    8. com.atlassian.confluence
      DefaultObjectQueueWorker$1.doInTransactionWithoutResult
      1. com.atlassian.confluence.search.lucene.TempIndexWriterStrategy.perform(TempIndexWriterStrategy.java:43)
      2. com.atlassian.confluence.search.lucene.tasks.TempIndexBackedIndexTaskPerformer.perform(TempIndexBackedIndexTaskPerformer.java:21)
      3. com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker.indexCollection(DefaultObjectQueueWorker.java:73)
      4. com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker$1.doInTransactionWithoutResult(DefaultObjectQueueWorker.java:61)
      4 frames
    9. Spring Tx
      TransactionTemplate.execute
      1. org.springframework.transaction.support.TransactionCallbackWithoutResult.doInTransaction(TransactionCallbackWithoutResult.java:33)
      2. org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:127)
      2 frames
    10. com.atlassian.confluence
      DefaultObjectQueueWorker.run
      1. com.atlassian.confluence.search.lucene.DefaultObjectQueueWorker.run(DefaultObjectQueueWorker.java:50)
      1 frame
    11. Java RT
      Thread.run
      1. java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
      2. java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
      3. java.lang.Thread.run(Thread.java:595)
      3 frames
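
The root cause sits in `HeaderBlockReader.<init>`: the Excel extractor handed `POIFSFileSystem` a stream that was not an OLE2 file at all. One way to avoid that, in the same spirit as the mime.types workaround of trusting the file rather than the browser, is to look at the attachment's own bytes before choosing an extractor. The following is a minimal sketch under that assumption; `Ole2Sniffer`, `looksLikeOle2` and `effectiveContentType` are hypothetical names, not Confluence or POI API, and the check covers only the 8-byte signature, so a file that passes may still be a corrupt OLE2 document.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical pre-check (not Confluence or POI API): look at the attachment's
// own bytes before routing it to an OLE2-based extractor such as the Excel one.
final class Ole2Sniffer {

    // OLE2 magic D0 CF 11 E0 A1 B1 1A E1 as a little-endian long (the value POI expects).
    static final long OLE2_SIGNATURE = 0xE11AB1A1E011CFD0L;
    static final String EXCEL_TYPE = "application/vnd.ms-excel";

    private Ole2Sniffer() {}

    // True only if the content is at least 8 bytes long and starts with the OLE2 signature.
    static boolean looksLikeOle2(byte[] content) {
        if (content.length < 8) {
            return false;
        }
        long sig = ByteBuffer.wrap(content, 0, 8).order(ByteOrder.LITTLE_ENDIAN).getLong();
        return sig == OLE2_SIGNATURE;
    }

    // Picks the content type to index with, instead of trusting the browser blindly.
    static String effectiveContentType(String fileName, String declaredType, byte[] content) {
        // Same idea as the mime.types workaround: the .csv extension wins.
        if (fileName.toLowerCase(Locale.ROOT).endsWith(".csv")) {
            return "text/csv";
        }
        // Declared as Excel but no OLE2 header: keep it away from the Excel extractor.
        if (EXCEL_TYPE.equals(declaredType) && !looksLikeOle2(content)) {
            return "application/octet-stream";
        }
        return declaredType;
    }

    public static void main(String[] args) {
        byte[] csv = "a,b\n1,2\n".getBytes(StandardCharsets.US_ASCII);
        byte[] ole2 = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
                       (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};
        System.out.println(looksLikeOle2(csv));          // false
        System.out.println(looksLikeOle2(ole2));         // true
        System.out.println(looksLikeOle2(new byte[6]));  // false
        System.out.println(effectiveContentType("textures-streaming.csv", EXCEL_TYPE, csv)); // text/csv
        System.out.println(effectiveContentType("report.xls", EXCEL_TYPE, csv));             // application/octet-stream
    }
}
```

A file shorter than eight bytes, like the 6-byte one in the "Unable to read entire header" result, is also rejected instead of being passed to POI. Returning `application/octet-stream` for a mislabelled file is an arbitrary choice in this sketch; it only serves to keep the file away from the Excel extractor.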