java.net.SocketException: Connection reset." errors from Amazon S3 storage. In the DSpace logs, these errors actually look like: Could not add content ITEM@123456-789.zip with type application/zip and size 466096426 to S3 bucket akiajpoktiep72aase4a.my-backup due to error: Encountered an exception and couldn't reset the stream to retry

Sakai JIRA | Tim Donohue | 3 years ago
  1. 0

    When performing a backup to DuraCloud using the Replication Task Suite, larger files (>400MB) sometimes fail with random "Caused by: java.net.SocketException: Connection reset." errors from Amazon S3 storage. In the DSpace logs, these errors look like:

    Could not add content ITEM@123456-789.zip with type application/zip and size 466096426 to S3 bucket akiajpoktiep72aase4a.my-backup due to error: Encountered an exception and couldn't reset the stream to retry
      at org.dspace.ctask.replicate.store.DuraCloudObjectStore.uploadReplica(DuraCloudObjectStore.java:193)
      at org.dspace.ctask.replicate.store.DuraCloudObjectStore.transferObject(DuraCloudObjectStore.java:159)
      at org.dspace.ctask.replicate.ReplicaManager.transferObject(ReplicaManager.java:259)
      at org.dspace.ctask.replicate.TransmitAIP.perform(TransmitAIP.java:68)
      at org.dspace.curate.ResolvedTask.perform(ResolvedTask.java:88)
      at org.dspace.curate.Curator$TaskRunner.run(Curator.java:563)

    Unfortunately, when this error is encountered (from the command line or the Admin UI), the entire backup to DuraCloud halts and must be restarted from the beginning.

    After talking with the DuraCloud team, it sounds like these are issues in Amazon S3 itself and are essentially temporary timeouts: if you try the upload again, it almost always succeeds the second time. The recommended resolution is to catch the error and automatically retry the upload to DuraCloud a set number of times.

    In addition, we should enhance the error handling in the Replication Task Suite so that individual backup failures can be reported while the backup process continues. A single error should not always produce a complete failure. Instead, we should back up what content we can and report which content failed to be backed up.
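    The two suggestions in this report (retry each upload a set number of times, and keep going while recording what failed) can be sketched roughly as follows. This is only an illustration: `RetryingBackup`, `Upload`, `uploadWithRetry` and `backupAll` are hypothetical names, not part of DSpace, the Replication Task Suite, or DuraCloud. Because the logged error says the stream could not be reset, the sketch assumes each attempt re-opens its own input rather than reusing a half-consumed stream.

```java
import java.io.IOException;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; not part of DSpace or the Replication Task Suite. */
final class RetryingBackup {

    /** Hypothetical stand-in for one upload. Each call should open a fresh stream. */
    interface Upload {
        void run() throws IOException;
    }

    /** Runs the upload up to maxAttempts times; returns true once an attempt succeeds. */
    static boolean uploadWithRetry(Upload upload, int maxAttempts, long pauseMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                upload.run();
                return true;
            } catch (IOException e) {
                System.err.println("Attempt " + attempt + " of " + maxAttempts
                        + " failed: " + e.getMessage());
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis); // brief pause before the next attempt
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and give up
                    return false;
                }
            }
        }
        return false;
    }

    /** Tries every upload and returns the names that still failed after all retries. */
    static List<String> backupAll(Map<String, Upload> uploads, int maxAttempts, long pauseMillis) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Upload> entry : uploads.entrySet()) {
            if (!uploadWithRetry(entry.getValue(), maxAttempts, pauseMillis)) {
                failed.add(entry.getKey()); // record the failure, continue with the rest
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        Map<String, Upload> uploads = new LinkedHashMap<>();
        int[] calls = {0};
        // Simulated transient failure: the first attempt is reset, the second succeeds.
        uploads.put("flaky.zip", () -> {
            if (calls[0]++ == 0) throw new SocketException("Connection reset");
        });
        // Simulated permanent failure: every attempt is reset.
        uploads.put("broken.zip", () -> {
            throw new SocketException("Connection reset");
        });
        System.out.println("Failed: " + backupAll(uploads, 3, 100L));
    }
}
```

    Returning the list of failed names, instead of throwing on the first error, is what lets the caller report individual failures at the end of the run.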

  3. 0
    Before running my test in debug mode, I open the Debug view and remove some breakpoints (such as leftover ones on NullPointerException and on sockets).
    via GitHub by ppoulard

  5. 0
    Here is an animation of the TCP connection life cycle: http://tcp.cs.st-andrews.ac.uk/index.shtml?page=connection_lifecycle
  6. 0
    Set a larger socket timeout (or 0 to disable the timeout).
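    For a plain `java.net.Socket`, the tip above corresponds to `setSoTimeout`, where a value of 0 means blocking reads wait indefinitely. The sketch below only shows that call on an unconnected socket; the class and method names are made up for illustration. How the client library behind the DuraCloud upload exposes its own timeout is not stated on this page, so that part is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;

/** Hypothetical demo class; only the java.net.Socket calls are real API. */
final class SocketTimeoutExample {

    /**
     * Applies a read timeout (SO_TIMEOUT) to a new, unconnected socket and
     * returns the value the socket reports back. 0 means "no timeout".
     */
    static int applyReadTimeout(int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.setSoTimeout(timeoutMillis); // affects blocking reads on this socket
            return socket.getSoTimeout();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Read timeout (ms): " + applyReadTimeout(120_000));
        System.out.println("Read timeout (ms): " + applyReadTimeout(0));
    }
}
```

    A longer read timeout helps when the peer is merely slow. It does not by itself prevent a "Connection reset", which is reported when the connection is torn down by the other side.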


    Root Cause Analysis

    1. java.net.SocketException

      Connection reset." errors from Amazon S3 storage. In the DSpace logs, these errors actually look like: Could not add content ITEM@123456-789.zip with type application/zip and size 466096426 to S3 bucket akiajpoktiep72aase4a.my-backup due to error: Encountered an exception and couldn't reset the stream to retry

      at org.dspace.ctask.replicate.store.DuraCloudObjectStore.uploadReplica()
    2. org.dspace.ctask
      TransmitAIP.perform
      1. org.dspace.ctask.replicate.store.DuraCloudObjectStore.uploadReplica(DuraCloudObjectStore.java:193)
      2. org.dspace.ctask.replicate.store.DuraCloudObjectStore.transferObject(DuraCloudObjectStore.java:159)
      3. org.dspace.ctask.replicate.ReplicaManager.transferObject(ReplicaManager.java:259)
      4. org.dspace.ctask.replicate.TransmitAIP.perform(TransmitAIP.java:68)
      4 frames
    3. DSpace Kernel :: API and Implementation
      Curator$TaskRunner.run
      1. org.dspace.curate.ResolvedTask.perform(ResolvedTask.java:88)
      2. org.dspace.curate.Curator$TaskRunner.run(Curator.java:563)
      2 frames