com.vmware.bdd.exception.TaskException: task execution failed:

SpringSource Issue Tracker | Rajit Saha | 4 years ago
  1. 0

    Scenario:
    =========
    I had a data/compute separated cluster with 5 compute nodes. I set elasticity to Manual with --targetComputeNodeNum 2. Then I tried to scale out the cluster to 6 compute nodes with the cluster resize command. The command fails with exceptions in serengeti.log. The problems are:

    1. The command fails
    {noformat}
    FAILED 92%

    node group: master, instance number: 1
    roles:[hadoop_namenode, hadoop_jobtracker]
    NAME               IP              STATUS    TASK
    ------------------------------------------------
    hadoop1-master-0   xx.yyy.zzz.150  VM Ready

    node group: data, instance number: 1
    roles:[hadoop_datanode]
    NAME               IP              STATUS    TASK
    ----------------------------------------------
    hadoop1-data-0     xx.yyy.zzz.228  VM Ready

    node group: compute, instance number: 6
    roles:[hadoop_tasktracker]
    NAME               IP              STATUS    TASK
    ------------------------------------------------------------------
    hadoop1-compute-5  xx.yyy.zzz.135  VM Ready  Formatting data disks
    hadoop1-compute-4  xx.yyy.zzz.112  VM Ready
    hadoop1-compute-3  xx.yyy.zzz.145  VM Ready
    hadoop1-compute-2  xx.yyy.zzz.195  VM Ready  Bootstrapping VM
    hadoop1-compute-1  xx.yyy.zzz.239  VM Ready  Bootstrapping VM
    hadoop1-compute-0  xx.yyy.zzz.221  VM Ready  Bootstrapping VM

    node group: client, instance number: 1
    roles:[hadoop_client, pig, hive, hive_server]
    NAME               IP              STATUS    TASK
    ------------------------------------------------
    hadoop1-client-0   xx.yyy.zzz.116  VM Ready

    cluster hadoop1 resize failed: task execution failed: you can get task failure details from serengeti server log at: /opt/serengeti/logs/serengeti*,/opt/serengeti/logs/ironfan*,/opt/serengeti/logs/task/17/1
    {noformat}

    2. Found some exceptions in serengeti.log
    {noformat}
    2013 Jun 01 08:12:50,264+0000 INFO ProgressMonitor-hadoop1| com.vmware.bdd.service.job.software.ProgressMonitor: operation has not finished. wait again
    2013 Jun 01 08:12:53,443+0000 ERROR SimpleAsyncTaskExecutor-17| com.vmware.bdd.software.mgmt.impl.SoftwareManagementClient: Failed run cluseter operation for cluster: hadoop1
    2013 Jun 01 08:12:53,444+0000 ERROR SimpleAsyncTaskExecutor-17| com.vmware.bdd.service.job.software.ironfan.IronfanSoftwareManagementTask: operation : CREATE failed on cluster: hadoop1
    com.vmware.bdd.software.mgmt.exception.SoftwareManagementException: failed run operation CREATE for cluster hadoop1. Error is: Exception was thrown during calling ironfan cluster APIs. Ironfan error message: No route to host - connect(2)
        at com.vmware.bdd.software.mgmt.exception.SoftwareManagementException.CLUSTER_OPERATIOIN_FAILURE(SoftwareManagementException.java:49)
        at com.vmware.bdd.software.mgmt.impl.SoftwareManagementClient.runClusterOperation(SoftwareManagementClient.java:80)
        at com.vmware.bdd.service.job.software.ironfan.IronfanSoftwareManagementTask.call(IronfanSoftwareManagementTask.java:83)
        at com.vmware.bdd.service.job.software.SoftwareManagementStep.executeStep(SoftwareManagementStep.java:83)
        at com.vmware.bdd.service.job.TrackableTasklet.execute(TrackableTasklet.java:48)
        at sun.reflect.GeneratedMethodAccessor1870.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:318)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
        at org.springframework.aop.aspectj.AspectJAfterThrowingAdvice.invoke(AspectJAfterThrowingAdvice.java:55)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
        at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:90)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
        at $Proxy160.execute(Unknown Source)
        at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:386)
        at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:130)
        at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:264)
        at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:76)
        at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:367)
        at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:214)
        at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:143)
        at org.springframework.batch.core.step.tasklet.TaskletStep.doExecute(TaskletStep.java:250)
        at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:195)
        at org.springframework.batch.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:135)
        at org.springframework.batch.core.job.flow.JobFlowExecutor.executeStep(JobFlowExecutor.java:61)
        at org.springframework.batch.core.job.flow.support.state.StepState.handle(StepState.java:60)
        at org.springframework.batch.core.job.flow.support.SimpleFlow.resume(SimpleFlow.java:144)
        at org.springframework.batch.core.job.flow.support.SimpleFlow.start(SimpleFlow.java:124)
        at org.springframework.batch.core.job.flow.FlowJob.doExecute(FlowJob.java:135)
        at org.springframework.batch.core.job.AbstractJob.execute(AbstractJob.java:293)
        at org.springframework.batch.core.launch.support.SimpleJobLauncher$1.run(SimpleJobLauncher.java:120)
        at java.lang.Thread.run(Thread.java:636)
    Caused by: ClusterOperationException(message:Exception was thrown during calling ironfan cluster APIs. Ironfan error message: No route to host - connect(2))
        at com.vmware.bdd.software.mgmt.thrift.SoftwareManagement$runClusterOperation_result$runClusterOperation_resultStandardScheme.read(SoftwareManagement.java:1105)
        at com.vmware.bdd.software.mgmt.thrift.SoftwareManagement$runClusterOperation_result$runClusterOperation_resultStandardScheme.read(SoftwareManagement.java:1083)
        at com.vmware.bdd.software.mgmt.thrift.SoftwareManagement$runClusterOperation_result.read(SoftwareManagement.java:1027)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
        at com.vmware.bdd.software.mgmt.thrift.SoftwareManagement$Client.recv_runClusterOperation(SoftwareManagement.java:106)
        at com.vmware.bdd.software.mgmt.thrift.SoftwareManagement$Client.runClusterOperation(SoftwareManagement.java:93)
        at com.vmware.bdd.software.mgmt.impl.SoftwareManagementClient.runClusterOperation(SoftwareManagementClient.java:76)
        ... 33 more
    2013 Jun 01 08:12:53,445+0000 INFO ProgressMonitor-hadoop1| com.vmware.bdd.service.job.software.ProgressMonitor: Monitor thread was stopped
    ...
    2013 Jun 01 08:12:53,716+0000 INFO SimpleAsyncTaskExecutor-17| com.vmware.bdd.entity.NodeEntity: node hadoop1-master-0 action changed to
    2013 Jun 01 08:12:53,716+0000 INFO SimpleAsyncTaskExecutor-17| com.vmware.bdd.service.job.software.ironfan.IronfanSoftwareManagementTask: updated progress. finished? false
    2013 Jun 01 08:12:53,716+0000 ERROR SimpleAsyncTaskExecutor-17| com.vmware.bdd.service.job.software.ironfan.IronfanSoftwareManagementTask: command execution failed, error message is
    2013 Jun 01 08:12:53,718+0000 INFO SimpleAsyncTaskExecutor-17| com.vmware.bdd.aop.logging.ExceptionHandlerAspect: Aspect for exception handling
    2013 Jun 01 08:12:53,718+0000 ERROR SimpleAsyncTaskExecutor-17| com.vmware.bdd.aop.logging.ExceptionHandlerAspect: Service error
    com.vmware.bdd.exception.TaskException: task execution failed:
        at com.vmware.bdd.exception.TaskException.EXECUTION_FAILED(TaskException.java:27)
        at com.vmware.bdd.service.job.software.SoftwareManagementStep.executeStep(SoftwareManagementStep.java:89)
        at com.vmware.bdd.service.job.TrackableTasklet.execute(TrackableTasklet.java:48)
        at sun.reflect.GeneratedMethodAccessor1870.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:318)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
        at org.springframework.aop.aspectj.AspectJAfterThrowingAdvice.invoke(AspectJAfterThrowingAdvice.java:55)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
        at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:90)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
        at $Proxy160.execute(Unknown Source)
        at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:386)
        at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:130)
        at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:264)
        at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:76)
        at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:367)
        at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:214)
        at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:143)
        at org.springframework.batch.core.step.tasklet.TaskletStep.doExecute(TaskletStep.java:250)
        at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:195)
        at org.springframework.batch.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:135)
        at org.springframework.batch.core.job.flow.JobFlowExecutor.executeStep(JobFlowExecutor.java:61)
        at org.springframework.batch.core.job.flow.support.state.StepState.handle(StepState.java:60)
        at org.springframework.batch.core.job.flow.support.SimpleFlow.resume(SimpleFlow.java:144)
        at org.springframework.batch.core.job.flow.support.SimpleFlow.start(SimpleFlow.java:124)
        at org.springframework.batch.core.job.flow.FlowJob.doExecute(FlowJob.java:135)
        at org.springframework.batch.core.job.AbstractJob.execute(AbstractJob.java:293)
        at org.springframework.batch.core.launch.support.SimpleJobLauncher$1.run(SimpleJobLauncher.java:120)
        at java.lang.Thread.run(Thread.java:636)
    2013 Jun 01 08:12:53,719+0000 ERROR SimpleAsyncTaskExecutor-17| org.springframework.batch.core.step.AbstractStep: Encountered an error executing the step
    com.vmware.bdd.exception.TaskException: task execution failed:
        at com.vmware.bdd.exception.TaskException.EXECUTION_FAILED(TaskException.java:27)
        at com.vmware.bdd.service.job.software.SoftwareManagementStep.executeStep(SoftwareManagementStep.java:89)
        at com.vmware.bdd.service.job.TrackableTasklet.execute(TrackableTasklet.java:48)
        at sun.reflect.GeneratedMethodAccessor1870.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:318)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
        at org.springframework.aop.aspectj.AspectJAfterThrowingAdvice.invoke(AspectJAfterThrowingAdvice.java:55)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
        at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:90)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
        at $Proxy160.execute(Unknown Source)
        at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:386)
        at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:130)
        at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:264)
        at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:76)
        at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:367)
        at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:214)
        at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:143)
        at org.springframework.batch.core.step.tasklet.TaskletStep.doExecute(TaskletStep.java:250)
        at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:195)
        at org.springframework.batch.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:135)
        at org.springframework.batch.core.job.flow.JobFlowExecutor.executeStep(JobFlowExecutor.java:61)
        at org.springframework.batch.core.job.flow.support.state.StepState.handle(StepState.java:60)
        at org.springframework.batch.core.job.flow.support.SimpleFlow.resume(SimpleFlow.java:144)
        at org.springframework.batch.core.job.flow.support.SimpleFlow.start(SimpleFlow.java:124)
        at org.springframework.batch.core.job.flow.FlowJob.doExecute(FlowJob.java:135)
        at org.springframework.batch.core.job.AbstractJob.execute(AbstractJob.java:293)
        at org.springframework.batch.core.launch.support.SimpleJobLauncher$1.run(SimpleJobLauncher.java:120)
        at java.lang.Thread.run(Thread.java:636)
    2013 Jun 01 08:12:53,720+0000 INFO SimpleAsyncTaskExecutor-17| com.vmware.bdd.service.job.SimpleStepExecutionListener: step finished: softwareResizeClusterStep
    2013 Jun 01 08:12:53,725+0000 INFO SimpleAsyncTaskExecutor-17| com.vmware.bdd.service.job.ClusterJobExecutionListener: set cluster hadoop1 status to RUNNING
    2013 Jun 01 08:12:53,729+0000 INFO SimpleAsyncTaskExecutor-17| com.vmware.bdd.service.job.InMemoryJobExecutionStatusHolder: unregistering job execution: 17
    2013 Jun 01 08:12:53,731+0000 INFO SimpleAsyncTaskExecutor-17| com.vmware.bdd.service.job.ResizeClusterJobExecutionListener: Set cluster hadoop1 group compute instance number to 5
    2013 Jun 01 08:12:53,733+0000 INFO SimpleAsyncTaskExecutor-17| org.springframework.batch.core.launch.support.SimpleJobLauncher: Job: [FlowJob: [name=resizeClusterJob]] completed with the following parameters: [{clusterFailureStatus=RUNNING, clusterName=hadoop1, clusterSuccessStatus=RUNNING, groupName=compute, newInstanceNumber=6, oldInstanceNumber=5, timeStamp=1370074136179, verifyNodeScope=group}] and the following status: [FAILED]
    2013 Jun 01 08:12:55,450+0000 ERROR http-8080-1| com.vmware.bdd.manager.JobManager: mark task as failed: task execution failed:
    2013 Jun 01 08:14:19,511+0000 INFO VC_TASK_THREAD| com.vmware.bdd.entity.NodeEntity: node hadoop1-compute-2 status changed to Powered Off
    2013 Jun 01 08:14:19,512+0000 INFO VC_TASK_THREAD| com.vmware.bdd.entity.NodeEntity: node hadoop1-compute-1 status changed to Powered Off
    ...
    {noformat}

    3. Instead of 2, 3 compute VMs were powered on, and the JobTracker shows 3 TTs online, whereas in Manual elasticity mode with a target of 2 only 2 TTs should be up.
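    When triaging a log like the one above, it can help to pull out only the ERROR entries and the underlying Ironfan message ("No route to host" here) instead of reading every repeated stack trace. The following is a minimal Python sketch, not part of Serengeti: the helper names `error_entries` and `ironfan_causes` and the `SAMPLE` text are illustrative assumptions, and the timestamp pattern is inferred only from the excerpt in this report. It splits on the timestamp prefix, so it should work whether the entries are on separate lines or flattened into one.

```python
import re

# Start of a log entry as it appears in the excerpt, for example
# "2013 Jun 01 08:12:53,443+0000 ERROR SimpleAsyncTaskExecutor-17| logger: message".
_TS = r"\d{4} [A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{4}"
ENTRY_START = re.compile(r"(?=" + _TS + r" [A-Z]+ )")  # zero-width split point
ENTRY = re.compile(
    r"(?P<ts>" + _TS + r") (?P<level>[A-Z]+) (?P<thread>[^|]+)\| "
    r"(?P<logger>[\w.$]+): ?(?P<msg>.*)",
    re.DOTALL,
)
# Assumption: the Ironfan message runs up to the first stack frame (" at ")
# or the end of the line; a message that itself contains " at " is cut short.
IRONFAN = re.compile(r"Ironfan error message: (.+?)(?=\s+at\s|$)", re.MULTILINE)


def error_entries(log_text):
    """Yield (timestamp, logger, message) for every ERROR entry in log_text."""
    for chunk in ENTRY_START.split(log_text):
        match = ENTRY.match(chunk.strip())
        if match and match.group("level") == "ERROR":
            yield match.group("ts"), match.group("logger"), match.group("msg")


def ironfan_causes(log_text):
    """Return the distinct Ironfan error messages found in log_text."""
    causes = set()
    for raw in IRONFAN.findall(log_text):
        cause = raw.strip()
        # Drop a closing parenthesis that belongs to a wrapping
        # "ClusterOperationException(message:...)" rather than to the message.
        while cause.endswith(")") and cause.count(")") > cause.count("("):
            cause = cause[:-1]
        causes.add(cause)
    return sorted(causes)


# Shortened, flattened sample built from lines quoted in this report.
SAMPLE = (
    "2013 Jun 01 08:12:50,264+0000 INFO ProgressMonitor-hadoop1| "
    "com.vmware.bdd.service.job.software.ProgressMonitor: operation has not finished. wait again "
    "2013 Jun 01 08:12:53,443+0000 ERROR SimpleAsyncTaskExecutor-17| "
    "com.vmware.bdd.software.mgmt.impl.SoftwareManagementClient: "
    "Failed run cluseter operation for cluster: hadoop1 "
    "2013 Jun 01 08:12:53,444+0000 ERROR SimpleAsyncTaskExecutor-17| "
    "com.vmware.bdd.service.job.software.ironfan.IronfanSoftwareManagementTask: "
    "operation : CREATE failed on cluster: hadoop1 "
    "com.vmware.bdd.software.mgmt.exception.SoftwareManagementException: "
    "failed run operation CREATE for cluster hadoop1. Error is: Exception was thrown "
    "during calling ironfan cluster APIs. Ironfan error message: No route to host - connect(2) "
    "at com.vmware.bdd.software.mgmt.impl.SoftwareManagementClient"
    ".runClusterOperation(SoftwareManagementClient.java:80) "
    "Caused by: ClusterOperationException(message:Exception was thrown during calling "
    "ironfan cluster APIs. Ironfan error message: No route to host - connect(2)) "
    "at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)"
)

if __name__ == "__main__":
    for ts, logger, msg in error_entries(SAMPLE):
        first_line = (msg.splitlines() or [""])[0]
        print(ts, logger.rsplit(".", 1)[-1], "|", first_line[:80])
    print("Ironfan causes:", ironfan_causes(SAMPLE))
```

    Run against the full serengeti.log text, a helper like this would point at the connectivity error between the Serengeti server and a node, which is a reasonable first thing to check before looking at the elasticity mismatch in item 3.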

    SpringSource Issue Tracker | 4 years ago | Rajit Saha
    com.vmware.bdd.exception.TaskException: task execution failed:
  3. 0

    After using "cluster setElasticity --mode manual" to limit the compute nodes to a lower number, using the same command again to bring up more compute nodes gives inconsistent results. Most of the time the command fails. In my cluster setup I have DHCP with the default VM Network. Here is the scenario more precisely: I have a cluster of 4 compute-only nodes. First I want no compute node up. Next I want to bring 1/2/3/4 compute nodes up. The second step fails most of the time.
    {noformat}
    serengeti>cluster setElasticity --mode manual --name hadoop1 --targetComputeNodeNum ...
    ...
    COMPLETED 100%
    cluster name: hadoop1, distro: apache, status: RUNNING
    GROUP NAME     RUNNING NODE NUMBER  I/O PRIORITY
    ------------------------------------------------
    masterNClient  1
    data           1
    compute        0
    serengeti>cluster setElasticity --mode manual --name hadoop1 --targetComputeNodeNum 1
    STARTED 0%
    STARTED 40%
    STARTED 50%
    STARTED 60%
    FAILED 60%
    cluster hadoop1 setElasticity failed: task execution failed: No error message from VHM. you can get task failure details from serengeti server log at: /opt/serengeti/logs/serengeti*,/opt/serengeti/logs/ironfan*
    serengeti>
    {noformat}
    Here is the snippet of vim.log
    {noformat}
    2013 Apr 22 19:43:39.19 [10-EmbeddedVHM] Processing message...
    2013 Apr 22 19:43:39.20 [10-EmbeddedVHM] Progress percent = 10% msg=
    2013 Apr 22 19:43:39.21 [10-EmbeddedVHM] Getting cluster inventory information...
    2013 Apr 22 19:43:39.117 [10-EmbeddedVHM] Finding TT states for compute
    2013 Apr 22 19:43:39.143 [10-EmbeddedVHM] Total TT VMs = 3, total powered-on TT VMs = 1, target powered-on TT VMs = 3
    2013 Apr 22 19:43:39.145 [10-EmbeddedVHM] Progress percent = 30% msg=
    2013 Apr 22 19:43:39.145 [10-EmbeddedVHM] Target TT VMs to enable/disable = 2
    2013 Apr 22 19:43:39.146 [10-EmbeddedVHM] Progress percent = 40% msg=
    2013 Apr 22 19:43:39.147 [10-HadoopAdaptor] TTs length: 2
    2013 Apr 22 19:43:39.148 [10-HadoopConnection] Copying data to remote file /tmp/rlist.txt on jobtracker
    2013 Apr 22 19:43:39.491 [10-HadoopConnection] Executing remote script: /tmp/recommissionTTs.sh on jobtracker
    2013 Apr 22 19:43:43.669 [10-HadoopConnection] Exit status from exec is: 0
    2013 Apr 22 19:43:43.669 [10-HadoopAdaptor] Successfully executed recommission script (/tmp/recommissionTTs.sh);
    2013 Apr 22 19:43:43.671 [10-AbstractEDP] Progress percent = 50% msg=
    2013 Apr 22 19:43:43.672 [10-AbstractEDP] Enabling VM hadoop1-compute-1 ...
    2013 Apr 22 19:43:43.687 [10-AbstractEDP] Enabling VM hadoop1-compute-0 ...
    2013 Apr 22 19:43:43.710 [10-AbstractEDP] Waiting for completion...
    2013 Apr 22 19:43:49.279 [10-AbstractEDP] Done
    2013 Apr 22 19:43:49.281 [10-AbstractEDP] Progress percent = 60% msg=verifying active task trackers
    2013 Apr 22 19:43:49.281 [10-HadoopAdaptor] AffectedTTs:
    2013 Apr 22 19:43:49.282 [10-HadoopAdaptor] proma-1n-dhcp82.eng.vmware.com
    2013 Apr 22 19:43:49.282 [10-HadoopAdaptor] proma-1n-dhcp193.eng.vmware.com
    2013 Apr 22 19:43:49.283 [10-HadoopConnection] Executing remote script: /tmp/checkTargetTTsSuccess.sh on jobtracker
    2013 Apr 22 19:44:16.819 [10-HadoopConnection] Exit status from exec is: 107
    2013 Apr 22 19:44:16.820 [10-HadoopAdaptor] ActiveTTs:
    2013 Apr 22 19:44:16.820 [10-HadoopAdaptor] Target TTs not yet achieved...checking again (1)
    2013 Apr 22 19:44:16.820 [10-HadoopConnection] Executing remote script: /tmp/checkTargetTTsSuccess.sh on jobtracker
    2013 Apr 22 19:44:44.150 [10-HadoopConnection] Exit status from exec is: 107
    …
    …
    2013 Apr 22 19:50:20.896 [10-HadoopAdaptor] Target TTs not yet achieved...checking again (14)
    2013 Apr 22 19:50:20.896 [10-HadoopConnection] Executing remote script: /tmp/checkTargetTTsSuccess.sh on jobtracker
    2013 Apr 22 19:50:49.315 [10-HadoopConnection] Exit status from exec is: 107
    2013 Apr 22 19:50:49.316 [10-HadoopAdaptor] ActiveTTs:
    2013 Apr 22 19:50:49.316 [10-HadoopAdaptor] Target TTs not yet achieved...checking again (15)
    2013 Apr 22 19:50:49.317 [10-HadoopConnection] Executing remote script: /tmp/checkTargetTTsSuccess.sh on jobtracker
    2013 Apr 22 19:51:17.716 [10-HadoopConnection] Exit status from exec is: 107
    2013 Apr 22 19:51:17.716 [10-HadoopAdaptor] ActiveTTs:
    2013 Apr 22 19:51:17.717 [10-HadoopAdaptor] # Active TTs < Target number of TTs -- checked by validator script (/tmp/checkTargetTTsSuccess.sh);
    2013 Apr 22 19:51:17.719 [10-EmbeddedVHM] Progress percent = 90% msg=
    2013 Apr 22 19:51:17.719 [10-EmbeddedVHM] TaskStatus: interpretErrorCode null SUCCEEDED
    2013 Apr 22 19:51:17.720 [10-EmbeddedVHM] TaskStatus: blockOnVMTaskCompletion null SUCCEEDED
    2013 Apr 22 19:51:17.720 
[10-EmbeddedVHM] TaskStatus: blockOnVMTaskCompletion null SUCCEEDED 2013 Apr 22 19:51:17.721 [10-EmbeddedVHM] TaskStatus: testForPowerState null SUCCEEDED 2013 Apr 22 19:51:17.721 [10-EmbeddedVHM] TaskStatus: testForPowerState null SUCCEEDED 2013 Apr 22 19:51:17.722 [10-EmbeddedVHM] TaskStatus: interpretErrorCode # Active TTs < Target number of TTs -- checked by validator script (/tmp/checkTargetTTsSuccess.sh); FAILED {noformat} Snippet of serengeti.log {noformat} 2013 Apr 22 19:48:39,523+0000 ERROR Thread-74| com.vmware.bdd.utils.RabbitMQConsumer: stop receiving messages without normal termination 2013 Apr 22 19:48:39,536+0000 ERROR pool-3-thread-16| com.vmware.bdd.command.MessageTask: No error message from VHM. 2013 Apr 22 19:48:39,539+0000 ERROR SimpleAsyncTaskExecutor-48| com.vmware.bdd.aop.logging.ExceptionHandlerAspect: Service error com.vmware.bdd.exception.TaskException: task execution failed: No error message from VHM. at com.vmware.bdd.exception.TaskException.EXECUTION_FAILED(TaskException.java:27) at com.vmware.bdd.service.job.SetManualElasticityStep.executeStep(SetManualElasticityStep.java:74) at com.vmware.bdd.service.job.TrackableTasklet.execute(TrackableTasklet.java:48) at sun.reflect.GeneratedMethodAccessor1724.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:318) at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150) at org.springframework.aop.aspectj.AspectJAfterThrowingAdvice.invoke(AspectJAfterThrowingAdvice.java:55) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161) at 
org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:90) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202) at $Proxy149.execute(Unknown Source) at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:386) at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:130) at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:264) at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:76) at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:367) at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:214) at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:143) at org.springframework.batch.core.step.tasklet.TaskletStep.doExecute(TaskletStep.java:250) at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:195) at org.springframework.batch.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:135) at org.springframework.batch.core.job.flow.JobFlowExecutor.executeStep(JobFlowExecutor.java:61) at org.springframework.batch.core.job.flow.support.state.StepState.handle(StepState.java:60) at org.springframework.batch.core.job.flow.support.SimpleFlow.resume(SimpleFlow.java:144) at org.springframework.batch.core.job.flow.support.SimpleFlow.start(SimpleFlow.java:124) at org.springframework.batch.core.job.flow.FlowJob.doExecute(FlowJob.java:135) at org.springframework.batch.core.job.AbstractJob.execute(AbstractJob.java:293) at org.springframework.batch.core.launch.support.SimpleJobLauncher$1.run(SimpleJobLauncher.java:120) at 
java.lang.Thread.run(Thread.java:636) 2013 Apr 22 19:48:39,541+0000 ERROR SimpleAsyncTaskExecutor-48| org.springframework.batch.core.step.AbstractStep: Encountered an error executing the step {noformat}

    SpringSource Issue Tracker | 4 years ago | Rajit Saha
    com.vmware.bdd.exception.TaskException: task execution failed: No error message from VHM.
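The log excerpt in this report shows the validator script being run once, then re-run after each "checking again (1)" through "checking again (15)" message, and the task being marked FAILED when the last run still exits non-zero. The following is only a minimal Java sketch of that bounded-retry pattern, not the VHM source: the class name `BoundedPollSketch`, the method `pollUntilSuccess`, and the delay parameter are hypothetical, and the check is modelled as any supplier of an exit status.

```java
import java.util.function.IntSupplier;

final class BoundedPollSketch {

    /**
     * Runs the check once, then repeats it up to maxRetries more times
     * while it keeps returning a non-zero status.
     * Returns the status of the last check that was run (0 means success).
     */
    static int pollUntilSuccess(IntSupplier check, int maxRetries, long delayMillis) {
        int status = check.getAsInt();
        for (int retry = 1; status != 0 && retry <= maxRetries; retry++) {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop polling early.
                Thread.currentThread().interrupt();
                return status;
            }
            status = check.getAsInt();
        }
        return status;
    }

    public static void main(String[] args) {
        // Hypothetical check that never succeeds, mimicking the repeated
        // non-zero exit status seen in the log excerpt.
        int status = pollUntilSuccess(() -> 107, 15, 10L);
        System.out.println("final status: " + status);
    }
}
```

With `maxRetries` set to 15 a check that never succeeds is run 16 times in total, which matches the count visible in the excerpt. What the script's exit status 107 means is not stated in the report, so the sketch treats any non-zero value as "not yet achieved".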
  5. 0

    In my cluster I set the elasticity mode to auto and then set manual elasticity mode with "targetComputeNodeNum" as max ( 4 ). From then on I cannot set manual elasticity mode. Steps followed: 1. serengeti>cluster setParam --name hadoop1 --elasticityMode auto 2. serengeti>cluster setParam --name hadoop1 --elasticityMode manual --targetComputeNodeNum 3 From now on, every attempt to set manual elasticity mode fails with the following message {noformat} cluster hadoop1 setParam failed: task execution failed: No error message from VHM. you can get task failure details from serengeti server log at: /opt/serengeti/logs/serengeti*,/opt/serengeti/logs/ironfan* {noformat} and I get the following exception in serengeti.log. {noformat} 2013 May 18 05:11:45,339+0000 ERROR Thread-92| com.vmware.bdd.utils.RabbitMQConsumer: stop receiving messages without normal termination 2013 May 18 05:11:45,344+0000 INFO Thread-92| com.vmware.bdd.utils.TracedRunnable: cleanup execution: com.vmware.bdd.command.MessageProcessor 2013 May 18 05:11:45,344+0000 INFO Thread-92| com.vmware.bdd.utils.TracedRunnable: execution succeed: com.vmware.bdd.command.MessageProcessor 2013 May 18 05:11:45,345+0000 ERROR pool-3-thread-2| com.vmware.bdd.command.MessageTask: No error message from VHM. 2013 May 18 05:11:45,347+0000 INFO SimpleAsyncTaskExecutor-30| com.vmware.bdd.aop.logging.ExceptionHandlerAspect: Aspect for exception handling 2013 May 18 05:11:45,347+0000 ERROR SimpleAsyncTaskExecutor-30| com.vmware.bdd.aop.logging.ExceptionHandlerAspect: Service error com.vmware.bdd.exception.TaskException: task execution failed: No error message from VHM. 
at com.vmware.bdd.exception.TaskException.EXECUTION_FAILED(TaskException.java:27) at com.vmware.bdd.service.job.SetManualElasticityStep.executeStep(SetManualElasticityStep.java:76) at com.vmware.bdd.service.job.TrackableTasklet.execute(TrackableTasklet.java:48) at sun.reflect.GeneratedMethodAccessor1727.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:318) at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150) at org.springframework.aop.aspectj.AspectJAfterThrowingAdvice.invoke(AspectJAfterThrowingAdvice.java:55) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161) at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:90) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202) at $Proxy152.execute(Unknown Source) at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:386) at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:130) at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:264) at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:76) at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:367) at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:214) at 
org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:143) at org.springframework.batch.core.step.tasklet.TaskletStep.doExecute(TaskletStep.java:250) at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:195) at org.springframework.batch.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:135) at org.springframework.batch.core.job.flow.JobFlowExecutor.executeStep(JobFlowExecutor.java:61) at org.springframework.batch.core.job.flow.support.state.StepState.handle(StepState.java:60) at org.springframework.batch.core.job.flow.support.SimpleFlow.resume(SimpleFlow.java:144) at org.springframework.batch.core.job.flow.support.SimpleFlow.start(SimpleFlow.java:124) at org.springframework.batch.core.job.flow.FlowJob.doExecute(FlowJob.java:135) at org.springframework.batch.core.job.AbstractJob.execute(AbstractJob.java:293) at org.springframework.batch.core.launch.support.SimpleJobLauncher$1.run(SimpleJobLauncher.java:120) at java.lang.Thread.run(Thread.java:636) 2013 May 18 05:11:45,347+0000 ERROR SimpleAsyncTaskExecutor-30| org.springframework.batch.core.step.AbstractStep: Encountered an error executing the step com.vmware.bdd.exception.TaskException: task execution failed: No error message from VHM. 
at com.vmware.bdd.exception.TaskException.EXECUTION_FAILED(TaskException.java:27) at com.vmware.bdd.service.job.SetManualElasticityStep.executeStep(SetManualElasticityStep.java:76) at com.vmware.bdd.service.job.TrackableTasklet.execute(TrackableTasklet.java:48) at sun.reflect.GeneratedMethodAccessor1727.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:318) at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150) at org.springframework.aop.aspectj.AspectJAfterThrowingAdvice.invoke(AspectJAfterThrowingAdvice.java:55) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161) at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:90) at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172) at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202) at $Proxy152.execute(Unknown Source) at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:386) at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:130) at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:264) at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:76) at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:367) at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:214) at 
org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:143) at org.springframework.batch.core.step.tasklet.TaskletStep.doExecute(TaskletStep.java:250) at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:195) at org.springframework.batch.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:135) at org.springframework.batch.core.job.flow.JobFlowExecutor.executeStep(JobFlowExecutor.java:61) at org.springframework.batch.core.job.flow.support.state.StepState.handle(StepState.java:60) at org.springframework.batch.core.job.flow.support.SimpleFlow.resume(SimpleFlow.java:144) at org.springframework.batch.core.job.flow.support.SimpleFlow.start(SimpleFlow.java:124) at org.springframework.batch.core.job.flow.FlowJob.doExecute(FlowJob.java:135) at org.springframework.batch.core.job.AbstractJob.execute(AbstractJob.java:293) at org.springframework.batch.core.launch.support.SimpleJobLauncher$1.run(SimpleJobLauncher.java:120) at java.lang.Thread.run(Thread.java:636) 2013 May 18 05:11:45,348+0000 INFO SimpleAsyncTaskExecutor-30| com.vmware.bdd.service.job.SimpleStepExecutionListener: step finished: setManualElasticityStep 2013 May 18 05:11:45,351+0000 INFO SimpleAsyncTaskExecutor-30| com.vmware.bdd.service.job.ClusterJobExecutionListener: set cluster hadoop1 status to RUNNING 2013 May 18 05:11:45,354+0000 INFO SimpleAsyncTaskExecutor-30| com.vmware.bdd.service.job.InMemoryJobExecutionStatusHolder: unregistering job execution: 30 2013 May 18 05:11:45,355+0000 INFO SimpleAsyncTaskExecutor-30| org.springframework.batch.core.launch.support.SimpleJobLauncher: Job: [FlowJob: [name=setManualElasticityJob]] completed with the following parameters: [{activeComputeNodeNumber=4, clusterFailureStatus=RUNNING, clusterName=hadoop1, clusterSuccessStatus=RUNNING, groupName=["compute"], hadoopJobTrackerIP=10.140.109.215, timeStamp=1368853602938}] and the following status: [FAILED] {noformat} The exception found in vhm.log {noformat} 2013 
May 18 05:34:53.125 [Stats_Thread_352-AbstractStatsProducer] StatsProducer com.vmware.vhadoop.vhm.stats.VCStatsProducer threw uncaught and unexpected exception java.util.ConcurrentModificationException at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1117) at java.util.TreeMap$KeyIterator.next(TreeMap.java:1171) at com.vmware.vhadoop.vhm.stats.StatsCollectorImpl$StatsProducerCallback.addSnapshots(StatsCollectorImpl.java:115) at com.vmware.vhadoop.vhm.stats.AbstractStatsProducer.run(AbstractStatsProducer.java:34) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:636) {noformat}

    SpringSource Issue Tracker | 4 years ago | Rajit Saha
    com.vmware.bdd.exception.TaskException: task execution failed: No error message from VHM.
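The vhm.log trace in this report ends in `java.util.ConcurrentModificationException` thrown from `TreeMap$KeyIterator.next`, which is the fail-fast check that fires when a `TreeMap` is structurally modified while one of its iterators is in use. The sketch below only illustrates that JDK behaviour and one common way around it (a `ConcurrentSkipListMap`, whose iterators are weakly consistent). It is an assumption that this resembles what happens inside `StatsCollectorImpl.addSnapshots`, whose source is not shown here, and all names in the sketch (`SnapshotIterationSketch`, `seed`, `mutateWhileIterating`, the map keys) are hypothetical. The demo modifies the map from the iterating thread for simplicity; a modification from another thread can trip the same check.

```java
import java.util.ConcurrentModificationException;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;

final class SnapshotIterationSketch {

    /** Fills the given map with two illustrative entries and returns it. */
    static Map<String, Integer> seed(Map<String, Integer> stats) {
        stats.put("vm-a", 1);
        stats.put("vm-b", 2);
        return stats;
    }

    /**
     * Iterates over the key set and inserts a new key during the iteration.
     * Returns true if the iterator reported a ConcurrentModificationException.
     */
    static boolean mutateWhileIterating(Map<String, Integer> stats) {
        try {
            for (String key : stats.keySet()) {
                // Structural modification while the iterator is live.
                // The new key sorts before every existing key.
                stats.put("aaa-extra", 0);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("TreeMap threw: "
                + mutateWhileIterating(seed(new TreeMap<>())));
        System.out.println("ConcurrentSkipListMap threw: "
                + mutateWhileIterating(seed(new ConcurrentSkipListMap<>())));
    }
}
```

Other options that avoid the exception include iterating over a copy of the key set or guarding both the iteration and the updates with the same lock. Which one fits depends on how the real collection is shared between threads.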


    Root Cause Analysis

    1. com.vmware.bdd.exception.TaskException

      task execution failed:

      at com.vmware.bdd.exception.TaskException.EXECUTION_FAILED()
    2. com.vmware.bdd
      TrackableTasklet.execute
      1. com.vmware.bdd.exception.TaskException.EXECUTION_FAILED(TaskException.java:27)
      2. com.vmware.bdd.service.job.software.SoftwareManagementStep.executeStep(SoftwareManagementStep.java:89)
      3. com.vmware.bdd.service.job.TrackableTasklet.execute(TrackableTasklet.java:48)
      3 frames
    3. Java RT
      Method.invoke
      1. sun.reflect.GeneratedMethodAccessor1870.invoke(Unknown Source)
      2. sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      3. java.lang.reflect.Method.invoke(Method.java:616)
      3 frames
    4. Spring AOP
      JdkDynamicAopProxy.invoke
      1. org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:318)
      2. org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
      3. org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
      4. org.springframework.aop.aspectj.AspectJAfterThrowingAdvice.invoke(AspectJAfterThrowingAdvice.java:55)
      5. org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:161)
      6. org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:90)
      7. org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
      8. org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:202)
      8 frames
    5. Unknown
      $Proxy160.execute
      1. $Proxy160.execute(Unknown Source)
      1 frame
    6. Spring Batch Core
      TaskletStep$ChunkTransactionCallback.doInTransaction
      1. org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:386)
      1 frame
    7. Spring Tx
      TransactionTemplate.execute
      1. org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:130)
      1 frame
    8. Spring Batch Core
      StepContextRepeatCallback.doInIteration
      1. org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:264)
      2. org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:76)
      2 frames
    9. Spring Batch Infrastructure
      RepeatTemplate.iterate
      1. org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:367)
      2. org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:214)
      3. org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:143)
      3 frames
    10. Spring Batch Core
      SimpleJobLauncher$1.run
      1. org.springframework.batch.core.step.tasklet.TaskletStep.doExecute(TaskletStep.java:250)
      2. org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:195)
      3. org.springframework.batch.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:135)
      4. org.springframework.batch.core.job.flow.JobFlowExecutor.executeStep(JobFlowExecutor.java:61)
      5. org.springframework.batch.core.job.flow.support.state.StepState.handle(StepState.java:60)
      6. org.springframework.batch.core.job.flow.support.SimpleFlow.resume(SimpleFlow.java:144)
      7. org.springframework.batch.core.job.flow.support.SimpleFlow.start(SimpleFlow.java:124)
      8. org.springframework.batch.core.job.flow.FlowJob.doExecute(FlowJob.java:135)
      9. org.springframework.batch.core.job.AbstractJob.execute(AbstractJob.java:293)
      10. org.springframework.batch.core.launch.support.SimpleJobLauncher$1.run(SimpleJobLauncher.java:120)
      10 frames
    11. Java RT
      Thread.run
      1. java.lang.Thread.run(Thread.java:636)
      1 frame