scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, ip-10-101-124-14): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Container marked as failed: container_1456648448960_0003_01_000002 on host: ip-10-101-124-14. Exit status: 1. Diagnostics: Exception from container-launch. Container id: container_1456648448960_0003_01_000002 Exit code: 1 Stack trace: ExitCodeException exitCode=1:

Apache's JIRA Issue Tracker | Alexander Pivovarov | 9 months ago

    Spark YARN executor containers fail if yarn.nodemanager.local-dirs starts with file://
    {code}
    <property>
      <name>yarn.nodemanager.local-dirs</name>
      <value>file:///data01/yarn/nm,file:///data02/yarn/nm</value>
    </property>
    {code}
    Other applications, e.g. Hadoop MR and Hive, work normally. Spark works only if yarn.nodemanager.local-dirs does not have the file:// prefix, e.g.
    {code}
    <value>/data01/yarn/nm,/data02/yarn/nm</value>
    {code}
    To reproduce the issue, open spark-shell and run:
    {code}
    $ spark-shell
    > sc.parallelize(1 to 10).count
    {code}
    The stack trace in spark-shell is:
    {code}
    scala> sc.parallelize(1 to 10).count
    16/02/28 08:50:37 INFO spark.SparkContext: Starting job: count at <console>:28
    16/02/28 08:50:37 INFO scheduler.DAGScheduler: Got job 0 (count at <console>:28) with 2 output partitions
    16/02/28 08:50:37 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (count at <console>:28)
    16/02/28 08:50:37 INFO scheduler.DAGScheduler: Parents of final stage: List()
    16/02/28 08:50:37 INFO scheduler.DAGScheduler: Missing parents: List()
    16/02/28 08:50:37 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (ParallelCollectionRDD[0] at parallelize at <console>:28), which has no missing parents
    16/02/28 08:50:38 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1096.0 B, free 1096.0 B)
    16/02/28 08:50:38 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 804.0 B, free 1900.0 B)
    16/02/28 08:50:38 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.101.124.13:39374 (size: 804.0 B, free: 511.5 MB)
    16/02/28 08:50:38 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
    16/02/28 08:50:38 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (ParallelCollectionRDD[0] at parallelize at <console>:28)
    16/02/28 08:50:38 INFO cluster.YarnScheduler: Adding task set 0.0 with 2 tasks
    16/02/28 08:50:39 INFO spark.ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 1)
    16/02/28 08:50:40 INFO spark.ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 2)
    16/02/28 08:50:42 INFO cluster.YarnClientSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (ip-10-101-124-14:34681) with ID 1
    16/02/28 08:50:42 INFO spark.ExecutorAllocationManager: New executor 1 has registered (new total is 1)
    16/02/28 08:50:42 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, ip-10-101-124-14, partition 0,PROCESS_LOCAL, 2078 bytes)
    16/02/28 08:50:42 INFO storage.BlockManagerMasterEndpoint: Registering block manager ip-10-101-124-14:58315 with 3.8 GB RAM, BlockManagerId(1, ip-10-101-124-14, 58315)
    16/02/28 08:50:53 INFO cluster.YarnClientSchedulerBackend: Disabling executor 1.
    16/02/28 08:50:53 INFO scheduler.DAGScheduler: Executor lost: 1 (epoch 0)
    16/02/28 08:50:53 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster.
    16/02/28 08:50:53 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(1, ip-10-101-124-14, 58315)
    16/02/28 08:50:53 INFO storage.BlockManagerMaster: Removed 1 successfully in removeExecutor
    16/02/28 08:50:53 ERROR cluster.YarnScheduler: Lost executor 1 on ip-10-101-124-14: Container marked as failed: container_1456648448960_0003_01_000002 on host: ip-10-101-124-14. Exit status: 1. Diagnostics: Exception from container-launch. 
Container id: container_1456648448960_0003_01_000002 Exit code: 1 Stack trace: ExitCodeException exitCode=1: at org.apache.hadoop.util.Shell.runCommand(Shell.java:538) at org.apache.hadoop.util.Shell.run(Shell.java:455) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Container exited with a non-zero exit code 1 16/02/28 08:50:53 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1456648448960_0003_01_000002 on host: ip-10-101-124-14. Exit status: 1. Diagnostics: Exception from container-launch. Container id: container_1456648448960_0003_01_000002 Exit code: 1 Stack trace: ExitCodeException exitCode=1: at org.apache.hadoop.util.Shell.runCommand(Shell.java:538) at org.apache.hadoop.util.Shell.run(Shell.java:455) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Container exited with a non-zero exit code 1 16/02/28 08:50:53 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, ip-10-101-124-14): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Container marked as failed: container_1456648448960_0003_01_000002 on host: ip-10-101-124-14. Exit status: 1. Diagnostics: Exception from container-launch. 
Container id: container_1456648448960_0003_01_000002 Exit code: 1 Stack trace: ExitCodeException exitCode=1: at org.apache.hadoop.util.Shell.runCommand(Shell.java:538) at org.apache.hadoop.util.Shell.run(Shell.java:455) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Container exited with a non-zero exit code 1 16/02/28 08:50:53 INFO cluster.YarnClientSchedulerBackend: Asked to remove non-existent executor 1 16/02/28 08:50:53 INFO spark.ExecutorAllocationManager: Existing executor 1 has been removed (new total is 0) 16/02/28 08:50:53 WARN spark.ExecutorAllocationManager: Attempted to mark unknown executor 1 idle 16/02/28 08:50:57 INFO cluster.YarnClientSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (ip-10-101-124-14:34687) with ID 2 16/02/28 08:50:57 INFO scheduler.TaskSetManager: Starting task 0.1 in stage 0.0 (TID 1, ip-10-101-124-14, partition 0,PROCESS_LOCAL, 2078 bytes) 16/02/28 08:50:57 INFO spark.ExecutorAllocationManager: New executor 2 has registered (new total is 1) 16/02/28 08:50:57 INFO storage.BlockManagerMasterEndpoint: Registering block manager ip-10-101-124-14:42751 with 3.8 GB RAM, BlockManagerId(2, ip-10-101-124-14, 42751) 16/02/28 08:51:07 INFO cluster.YarnClientSchedulerBackend: Disabling executor 2. 16/02/28 08:51:07 INFO scheduler.DAGScheduler: Executor lost: 2 (epoch 0) 16/02/28 08:51:07 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 2 from BlockManagerMaster. 16/02/28 08:51:07 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(2, ip-10-101-124-14, 42751) 16/02/28 08:51:07 INFO storage.BlockManagerMaster: Removed 2 successfully in removeExecutor 16/02/28 08:51:08 ERROR cluster.YarnScheduler: Lost executor 2 on ip-10-101-124-14: Container marked as failed: container_1456648448960_0003_01_000003 on host: ip-10-101-124-14. Exit status: 1. Diagnostics: Exception from container-launch. 
Container id: container_1456648448960_0003_01_000003 Exit code: 1 Stack trace: ExitCodeException exitCode=1: at org.apache.hadoop.util.Shell.runCommand(Shell.java:538) at org.apache.hadoop.util.Shell.run(Shell.java:455) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Container exited with a non-zero exit code 1 16/02/28 08:51:08 WARN scheduler.TaskSetManager: Lost task 0.1 in stage 0.0 (TID 1, ip-10-101-124-14): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Container marked as failed: container_1456648448960_0003_01_000003 on host: ip-10-101-124-14. Exit status: 1. Diagnostics: Exception from container-launch. Container id: container_1456648448960_0003_01_000003 Exit code: 1 Stack trace: ExitCodeException exitCode=1: at org.apache.hadoop.util.Shell.runCommand(Shell.java:538) at org.apache.hadoop.util.Shell.run(Shell.java:455) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Container exited with a non-zero exit code 1 16/02/28 08:51:08 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1456648448960_0003_01_000003 on host: ip-10-101-124-14. Exit status: 1. Diagnostics: Exception from container-launch. 
Container id: container_1456648448960_0003_01_000003 Exit code: 1 Stack trace: ExitCodeException exitCode=1: at org.apache.hadoop.util.Shell.runCommand(Shell.java:538) at org.apache.hadoop.util.Shell.run(Shell.java:455) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Container exited with a non-zero exit code 1 16/02/28 08:51:08 INFO cluster.YarnClientSchedulerBackend: Asked to remove non-existent executor 2 16/02/28 08:51:08 INFO spark.ExecutorAllocationManager: Existing executor 2 has been removed (new total is 0) 16/02/28 08:51:08 WARN spark.ExecutorAllocationManager: Attempted to mark unknown executor 2 idle 16/02/28 08:51:11 INFO cluster.YarnClientSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (ip-10-101-124-14:34693) with ID 3 16/02/28 08:51:11 INFO scheduler.TaskSetManager: Starting task 0.2 in stage 0.0 (TID 2, ip-10-101-124-14, partition 0,PROCESS_LOCAL, 2078 bytes) 16/02/28 08:51:11 INFO spark.ExecutorAllocationManager: New executor 3 has registered (new total is 1) 16/02/28 08:51:11 INFO storage.BlockManagerMasterEndpoint: Registering block manager ip-10-101-124-14:54404 with 3.8 GB RAM, BlockManagerId(3, ip-10-101-124-14, 54404) 16/02/28 08:51:21 INFO cluster.YarnClientSchedulerBackend: Disabling executor 3. 16/02/28 08:51:21 INFO scheduler.DAGScheduler: Executor lost: 3 (epoch 0) 16/02/28 08:51:21 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 3 from BlockManagerMaster. 16/02/28 08:51:21 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(3, ip-10-101-124-14, 54404) 16/02/28 08:51:21 INFO storage.BlockManagerMaster: Removed 3 successfully in removeExecutor 16/02/28 08:51:23 ERROR cluster.YarnScheduler: Lost executor 3 on ip-10-101-124-14: Container marked as failed: container_1456648448960_0003_01_000004 on host: ip-10-101-124-14. Exit status: 1. Diagnostics: Exception from container-launch. 
Container id: container_1456648448960_0003_01_000004 Exit code: 1 Stack trace: ExitCodeException exitCode=1: at org.apache.hadoop.util.Shell.runCommand(Shell.java:538) at org.apache.hadoop.util.Shell.run(Shell.java:455) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Container exited with a non-zero exit code 1 16/02/28 08:51:23 WARN scheduler.TaskSetManager: Lost task 0.2 in stage 0.0 (TID 2, ip-10-101-124-14): ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: Container marked as failed: container_1456648448960_0003_01_000004 on host: ip-10-101-124-14. Exit status: 1. Diagnostics: Exception from container-launch. Container id: container_1456648448960_0003_01_000004 Exit code: 1 Stack trace: ExitCodeException exitCode=1: at org.apache.hadoop.util.Shell.runCommand(Shell.java:538) at org.apache.hadoop.util.Shell.run(Shell.java:455) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Container exited with a non-zero exit code 1 16/02/28 08:51:23 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1456648448960_0003_01_000004 on host: ip-10-101-124-14. Exit status: 1. Diagnostics: Exception from container-launch. 
Container id: container_1456648448960_0003_01_000004 Exit code: 1 Stack trace: ExitCodeException exitCode=1: at org.apache.hadoop.util.Shell.runCommand(Shell.java:538) at org.apache.hadoop.util.Shell.run(Shell.java:455) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Container exited with a non-zero exit code 1 16/02/28 08:51:23 INFO cluster.YarnClientSchedulerBackend: Asked to remove non-existent executor 3 16/02/28 08:51:23 INFO spark.ExecutorAllocationManager: Existing executor 3 has been removed (new total is 0) 16/02/28 08:51:23 WARN spark.ExecutorAllocationManager: Attempted to mark unknown executor 3 idle 16/02/28 08:51:25 INFO cluster.YarnClientSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (ip-10-101-124-14:34699) with ID 4 16/02/28 08:51:25 INFO scheduler.TaskSetManager: Starting task 0.3 in stage 0.0 (TID 3, ip-10-101-124-14, partition 0,PROCESS_LOCAL, 2078 bytes) 16/02/28 08:51:25 INFO spark.ExecutorAllocationManager: New executor 4 has registered (new total is 1) 16/02/28 08:51:25 INFO storage.BlockManagerMasterEndpoint: Registering block manager ip-10-101-124-14:54611 with 3.8 GB RAM, BlockManagerId(4, ip-10-101-124-14, 54611) 16/02/28 08:51:29 INFO cluster.YarnClientSchedulerBackend: Disabling executor 4. 16/02/28 08:51:29 INFO scheduler.DAGScheduler: Executor lost: 4 (epoch 0) 16/02/28 08:51:29 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 4 from BlockManagerMaster. 16/02/28 08:51:29 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(4, ip-10-101-124-14, 54611) 16/02/28 08:51:29 INFO storage.BlockManagerMaster: Removed 4 successfully in removeExecutor 16/02/28 08:51:29 ERROR client.TransportClient: Failed to send RPC 8426247633040168504 to ip-10-101-124-14/10.101.124.14:34677: java.nio.channels.ClosedChannelException java.nio.channels.ClosedChannelException 16/02/28 08:51:29 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to get executor loss reason for executor id 4 at RPC address ip-10-101-124-14:34699, but got no response. Marking as slave lost. 
java.io.IOException: Failed to send RPC 8426247633040168504 to ip-10-101-124-14/10.101.124.14:34677: java.nio.channels.ClosedChannelException at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239) at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226) at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567) at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424) at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801) at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699) at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122) at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633) at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32) at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908) at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960) at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) at java.lang.Thread.run(Thread.java:745) Caused by: java.nio.channels.ClosedChannelException 16/02/28 08:51:29 ERROR cluster.YarnScheduler: Lost executor 4 on ip-10-101-124-14: Slave lost 16/02/28 08:51:29 WARN scheduler.TaskSetManager: Lost task 0.3 in stage 0.0 (TID 3, ip-10-101-124-14): ExecutorLostFailure (executor 4 exited caused by one of the running tasks) Reason: Slave lost 16/02/28 08:51:29 ERROR scheduler.TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job 16/02/28 08:51:29 INFO cluster.YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool 16/02/28 08:51:29 INFO spark.ExecutorAllocationManager: Existing executor 4 has been removed (new total is 0) 16/02/28 08:51:29 INFO cluster.YarnScheduler: Cancelling stage 0 16/02/28 08:51:29 INFO scheduler.DAGScheduler: ResultStage 0 (count at <console>:28) failed in 51.814 s 16/02/28 08:51:29 INFO scheduler.DAGScheduler: Job 0 failed: count at <console>:28, took 51.930873 s 16/02/28 08:51:29 ERROR client.TransportClient: Failed to send RPC 6043284332463358792 to ip-10-101-124-14/10.101.124.14:34677: java.nio.channels.ClosedChannelException java.nio.channels.ClosedChannelException 16/02/28 08:51:29 WARN netty.NettyRpcEndpointRef: Error sending message [message = RequestExecutors(0,0,Map())] in 1 attempts java.io.IOException: Failed to send RPC 6043284332463358792 to ip-10-101-124-14/10.101.124.14:34677: java.nio.channels.ClosedChannelException at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239) at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226) at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567) at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424) at 
io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801) at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699) at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122) at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633) at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32) at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908) at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960) at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) at java.lang.Thread.run(Thread.java:745) Caused by: java.nio.channels.ClosedChannelException org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, ip-10-101-124-14): ExecutorLostFailure (executor 4 exited caused by one of the running tasks) Reason: Slave lost Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929) at org.apache.spark.rdd.RDD.count(RDD.scala:1143) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:28) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:35) at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:37) at $iwC$$iwC$$iwC$$iwC.<init>(<console>:39) at $iwC$$iwC$$iwC.<init>(<console>:41) at $iwC$$iwC.<init>(<console>:43) at $iwC.<init>(<console>:45) at <init>(<console>:47) at .<init>(<console>:51) at .<clinit>(<console>) at 
.<init>(<console>:7) at .<clinit>(<console>) at $print(<console>) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346) at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657) at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) at org.apache.spark.repl.Main$.main(Main.scala:31) at org.apache.spark.repl.Main.main(Main.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) scala> 16/02/28 08:51:32 ERROR client.TransportClient: Failed to send RPC 7722266052838253208 to ip-10-101-124-14/10.101.124.14:34677: java.nio.channels.ClosedChannelException java.nio.channels.ClosedChannelException 16/02/28 08:51:32 WARN netty.NettyRpcEndpointRef: Error sending message [message = RequestExecutors(0,0,Map())] in 2 attempts java.io.IOException: Failed to send RPC 7722266052838253208 to ip-10-101-124-14/10.101.124.14:34677: java.nio.channels.ClosedChannelException at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239) at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226) at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567) at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424) at 
io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801) at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699) at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122) at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633) at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32) at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908) at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960) at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) at java.lang.Thread.run(Thread.java:745) Caused by: java.nio.channels.ClosedChannelException 16/02/28 08:51:33 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null) 16/02/28 08:51:33 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> ip-10-101-124-13, PROXY_URI_BASES -> http://ip-10-101-124-13:8088/proxy/application_1456648448960_0003), /proxy/application_1456648448960_0003 16/02/28 08:51:33 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter 16/02/28 08:51:35 ERROR client.TransportClient: Failed to send RPC 8820149531346579856 to ip-10-101-124-14/10.101.124.14:34677: java.nio.channels.ClosedChannelException java.nio.channels.ClosedChannelException 16/02/28 08:51:35 WARN netty.NettyRpcEndpointRef: Error sending message [message = RequestExecutors(0,0,Map())] in 3 attempts java.io.IOException: Failed to send RPC 8820149531346579856 to ip-10-101-124-14/10.101.124.14:34677: java.nio.channels.ClosedChannelException at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239) at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226) at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567) at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424) at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801) at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699) at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122) at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633) at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32) at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908) at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960) at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) at 
io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) at java.lang.Thread.run(Thread.java:745) Caused by: java.nio.channels.ClosedChannelException 16/02/28 08:51:35 ERROR cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Sending RequestExecutors(0,0,Map()) to AM was unsuccessful org.apache.spark.SparkException: Error sending message [message = RequestExecutors(0,0,Map())] at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:118) at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77) at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$3.apply$mcV$sp(YarnSchedulerBackend.scala:185) at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$3.apply(YarnSchedulerBackend.scala:185) at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$3.apply(YarnSchedulerBackend.scala:185) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: Failed to send RPC 8820149531346579856 to ip-10-101-124-14/10.101.124.14:34677: java.nio.channels.ClosedChannelException at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239) at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226) at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567) at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424) at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801) at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699) at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122) at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633) at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32) at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908) at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960) at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) ... 
1 more Caused by: java.nio.channels.ClosedChannelException 16/02/28 08:51:35 WARN netty.NettyRpcEndpointRef: Error sending message [message = RequestExecutors(0,0,Map())] in 1 attempts org.apache.spark.SparkException: Error sending message [message = RequestExecutors(0,0,Map())] at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:118) at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77) at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$3.apply$mcV$sp(YarnSchedulerBackend.scala:185) at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$3.apply(YarnSchedulerBackend.scala:185) at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$3.apply(YarnSchedulerBackend.scala:185) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: Failed to send RPC 8820149531346579856 to ip-10-101-124-14/10.101.124.14:34677: java.nio.channels.ClosedChannelException at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239) at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226) at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567) at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424) at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801) at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699) at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122) at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633) at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32) at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908) at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960) at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) ... 1 more Caused by: java.nio.channels.ClosedChannelException {code}
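
    The container-launch failures stop once the file:// prefix is dropped from yarn.nodemanager.local-dirs, which suggests the configured value is being consumed somewhere in the executor launch path as a raw filesystem path rather than as a URI. Below is a minimal Scala sketch of that distinction using only the JDK; `normalizeLocalDir` is a hypothetical helper for illustration, not Spark's actual fix.
    {code}
    import java.net.URI

    // Hypothetical helper (illustration only): strip an optional file:// scheme
    // from a yarn.nodemanager.local-dirs entry, leaving plain paths untouched.
    def normalizeLocalDir(dir: String): String = {
      val uri = new URI(dir)
      if (uri.getScheme == "file") uri.getPath else dir
    }

    // Both spellings from the issue description resolve to the same local paths:
    "file:///data01/yarn/nm,file:///data02/yarn/nm".split(",").map(normalizeLocalDir)
    // => Array(/data01/yarn/nm, /data02/yarn/nm)

    // Treating the raw URI string as a path, by contrast, points at a directory
    // that does not exist, so any mkdir/chdir against it fails:
    new java.io.File("file:///data01/yarn/nm").exists  // false, even if /data01/yarn/nm exists
    {code}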

.<init>(<console>:7) at .<clinit>(<console>) at $print(<console>) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346) at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657) at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) at org.apache.spark.repl.Main$.main(Main.scala:31) at org.apache.spark.repl.Main.main(Main.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) scala> 16/02/28 08:51:32 ERROR client.TransportClient: Failed to send RPC 7722266052838253208 to ip-10-101-124-14/10.101.124.14:34677: java.nio.channels.ClosedChannelException java.nio.channels.ClosedChannelException 16/02/28 08:51:32 WARN netty.NettyRpcEndpointRef: Error sending message [message = RequestExecutors(0,0,Map())] in 2 attempts java.io.IOException: Failed to send RPC 7722266052838253208 to ip-10-101-124-14/10.101.124.14:34677: java.nio.channels.ClosedChannelException at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239) at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226) at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567) at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424) at 
io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801) at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699) at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122) at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633) at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32) at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908) at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960) at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) at java.lang.Thread.run(Thread.java:745) Caused by: java.nio.channels.ClosedChannelException 16/02/28 08:51:33 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null) 16/02/28 08:51:33 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> ip-10-101-124-13, PROXY_URI_BASES -> http://ip-10-101-124-13:8088/proxy/application_1456648448960_0003), /proxy/application_1456648448960_0003 16/02/28 08:51:33 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter 16/02/28 08:51:35 ERROR client.TransportClient: Failed to send RPC 8820149531346579856 to ip-10-101-124-14/10.101.124.14:34677: java.nio.channels.ClosedChannelException java.nio.channels.ClosedChannelException 16/02/28 08:51:35 WARN netty.NettyRpcEndpointRef: Error sending message [message = RequestExecutors(0,0,Map())] in 3 attempts java.io.IOException: Failed to send RPC 8820149531346579856 to ip-10-101-124-14/10.101.124.14:34677: java.nio.channels.ClosedChannelException at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239) at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226) at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567) at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424) at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801) at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699) at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122) at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633) at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32) at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908) at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960) at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) at 
io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) at java.lang.Thread.run(Thread.java:745) Caused by: java.nio.channels.ClosedChannelException 16/02/28 08:51:35 ERROR cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Sending RequestExecutors(0,0,Map()) to AM was unsuccessful org.apache.spark.SparkException: Error sending message [message = RequestExecutors(0,0,Map())] at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:118) at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77) at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$3.apply$mcV$sp(YarnSchedulerBackend.scala:185) at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$3.apply(YarnSchedulerBackend.scala:185) at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$3.apply(YarnSchedulerBackend.scala:185) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: Failed to send RPC 8820149531346579856 to ip-10-101-124-14/10.101.124.14:34677: java.nio.channels.ClosedChannelException at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239) at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226) at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567) at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424) at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801) at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699) at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122) at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633) at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32) at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908) at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960) at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) ... 
1 more Caused by: java.nio.channels.ClosedChannelException 16/02/28 08:51:35 WARN netty.NettyRpcEndpointRef: Error sending message [message = RequestExecutors(0,0,Map())] in 1 attempts org.apache.spark.SparkException: Error sending message [message = RequestExecutors(0,0,Map())] at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:118) at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77) at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$3.apply$mcV$sp(YarnSchedulerBackend.scala:185) at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$3.apply(YarnSchedulerBackend.scala:185) at org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnSchedulerEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$3.apply(YarnSchedulerBackend.scala:185) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: Failed to send RPC 8820149531346579856 to ip-10-101-124-14/10.101.124.14:34677: java.nio.channels.ClosedChannelException at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239) at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226) at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567) at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424) at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801) at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699) at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122) at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633) at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32) at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908) at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960) at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) ... 1 more Caused by: java.nio.channels.ClosedChannelException {code}

    Apache's JIRA Issue Tracker | 9 months ago | Alexander Pivovarov
    scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, ip-10-101-124-14): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Container marked as failed: container_1456648448960_0003_01_000002 on host: ip-10-101-124-14. Exit status: 1. Diagnostics: Exception from container-launch. Container id: container_1456648448960_0003_01_000002 Exit code: 1 Stack trace: ExitCodeException exitCode=1:
  3. 0

    [jira] [Updated] (SPARK-13532) Spark yarn executor container fails if yarn.nodemanager.local-dirs starts with file://

    spark-issues | 9 months ago | Alexander Pivovarov (JIRA)
    scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, ip-10-101-124-14): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Container marked as failed: container_1456648448960_0003_01_000002 on host: ip-10-101-124-14. Exit status: 1. Diagnostics: Exception from container-launch. Container id: container_1456648448960_0003_01_000002 Exit code: 1 Stack trace: ExitCodeException exitCode=1:

  4. 0

    [jira] [Commented] (SPARK-13532) Spark yarn executor container fails if yarn.nodemanager.local-dirs starts with file://

    spark-issues | 9 months ago | Alexander Pivovarov (JIRA)
    scheduler.TaskSetManager: Lost task 0.2 in stage 0.0 (TID 2, ip-10-101-124-14): ExecutorLostFailure (executor 3 exited caused by one of the running tasks) Reason: Container marked as failed: container_1456648448960_0003_01_000004 on host: ip-10-101-124-14. Exit status: 1. Diagnostics: Exception from container-launch. Container id: container_1456648448960_0003_01_000004 Exit code: 1 Stack trace: ExitCodeException exitCode=1:
  5. 0

    [SPARK-4879] Use driver to coordinate Hadoop output committing for speculative tasks by JoshRosen · Pull Request #4066 · apache/spark · GitHub

    github.com | 11 months ago
    scheduler.TaskSetManager: Lost task 75.0 in stage 66.0 (TID 6861, ip-172-31-1-124.us-west-2.compute.internal): java.io.IOException: Failed to save output of task: attempt_201502102217_0066_m_000075_6861
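
For the SPARK-13532 reports listed above, the common trigger is a yarn.nodemanager.local-dirs value carrying a file:// scheme. The sketch below is a minimal, hypothetical illustration (not the actual Spark or YARN code path) of why a scheme-prefixed directory string misbehaves when something downstream handles it as a plain filesystem path: the scheme ends up embedded in the path, so the directory effectively does not exist, while normalising through java.net.URI recovers the plain form. The directory value used here is made up for the example.

{code}
import java.io.File
import java.net.URI

object LocalDirSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical value, in the shape the failing configuration uses.
    val configured = "file:///data/yarn/nm"

    // Handled as a raw path, the scheme is treated as part of the file name,
    // so the directory does not exist from the process's point of view.
    val asPlainPath = new File(configured)
    println(s"as plain path : ${asPlainPath.getPath} (exists = ${asPlainPath.exists})")

    // Normalised through java.net.URI, the plain path comes back out.
    val asUri = new URI(configured)
    println(s"via URI       : ${asUri.getPath}")
  }
}
{code}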


    Root Cause Analysis

    1. scheduler.TaskSetManager

      Lost task 0.0 in stage 0.0 (TID 0, ip-10-101-124-14): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Container marked as failed: container_1456648448960_0003_01_000002 on host: ip-10-101-124-14. Exit status: 1. Diagnostics: Exception from container-launch. Container id: container_1456648448960_0003_01_000002 Exit code: 1 Stack trace: ExitCodeException exitCode=1:

      at org.apache.hadoop.util.Shell.runCommand()
    2. Hadoop
      Shell$ShellCommandExecutor.execute
      1. org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
      2. org.apache.hadoop.util.Shell.run(Shell.java:455)
      3. org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
      3 frames
    3. hadoop-yarn-server-nodemanager
      ContainerLaunch.call
      1. org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
      2. org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
      3. org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
      3 frames
    4. Java RT
      Thread.run
      1. java.util.concurrent.FutureTask.run(FutureTask.java:262)
      2. java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      3. java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      4. java.lang.Thread.run(Thread.java:745)
      4 frames
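
Since the launch failure above originates in DefaultContainerExecutor on the NodeManager, one way to confirm whether a node is running with the problematic setting is to read yarn.nodemanager.local-dirs through Hadoop's own configuration classes. This is a small sketch, assuming the Hadoop YARN client jars and the cluster's yarn-site.xml are on the classpath of the JVM running it; the object name and the printed labels are illustrative, not part of any Spark or Hadoop tooling.

{code}
import org.apache.hadoop.yarn.conf.YarnConfiguration

object LocalDirsCheck {
  def main(args: Array[String]): Unit = {
    // Picks up yarn-site.xml from the classpath, if present.
    val conf = new YarnConfiguration()
    val dirs = conf.getTrimmedStrings(YarnConfiguration.NM_LOCAL_DIRS)

    dirs.foreach { dir =>
      val note =
        if (dir.startsWith("file://")) "scheme-prefixed (the pattern reported in SPARK-13532)"
        else "plain path"
      println(s"$dir -> $note")
    }
  }
}
{code}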