java.lang.AssertionError


  • TestNG testcase : glm_neg_testcase_137
    Test results page : http://172.16.2.161:8080/view/testNG/job/h2o_master_DEV_testng_GLM_testcase/15/testngreports/h2o.testng/TestNG/glm_neg_testcase_137/
    nfolds = 20
    family = gaussian
    solver = irlsm
    Validate Parameters object with testcase: glm_neg_testcase_137
    Create modelParameter object with testcase: glm_neg_testcase_137
    Set _family: gaussian
    Set _standardize:
    Set _lambda_search:
    Set _nfolds: 20
    Set _ignore_const_cols:
    Set _non_negative:
    Set _intercept:
    Create train frame: airquality_train1
    Create validate frame: airquality_train1
    Set train frame
    Set validate frame
    Create success modelParameter object.
    Build model
    Train model
    09-21 17:13:34.957 172.16.2.171:54321 24632 FJ-0-7 INFO: Creating 20 cross-validation splits with random number seed: -5596913177457903046
    09-21 17:13:34.973 172.16.2.171:54321 24632 FJ-0-7 INFO: Building cross-validation model 1 / 20.
    09-21 17:13:34.974 172.16.2.171:54321 24632 FJ-1-5 INFO: Building H2O GLM model with these parameters:
    09-21 17:13:34.974 172.16.2.171:54321 24632 FJ-1-5 INFO:
{"_model_id":null,"_train":{"name":"model_cv_1_airquality_train1.hex_train","type":"Key"},"_valid":{"name":"model_cv_1_airquality_train1.hex_valid","type":"Key"},"_nfolds":0,"_keep_cross_validation_predictions":false,"_fold_assignment":"AUTO","_distribution":"AUTO","_tweedie_power":1.5,"_ignored_columns":null,"_ignore_const_cols":false,"_weights_column":"weights","_offset_column":null,"_fold_column":null,"_score_each_iteration":false,"_response_column":"Ozone","_balance_classes":false,"_max_after_balance_size":5.0,"_class_sampling_factors":null,"_max_hit_ratio_k":10,"_max_confusion_matrix_size":20,"_checkpoint":null,"_standardize":false,"_family":"gaussian","_link":"family_default","_solver":"IRLSM","_tweedie_variance_power":0.0,"_tweedie_link_power":1.0,"_alpha":null,"_lambda":null,"_prior":-1.0,"_lambda_search":false,"_nlambdas":100,"_non_negative":false,"_exactLambdas":false,"_lambda_min_ratio":-1.0,"_use_all_factor_levels":false,"_max_iterations":-1,"_intercept":false,"_beta_epsilon":1.0E-5,"_objective_epsilon":1.0E-5,"_gradient_epsilon":1.0E-4,"_beta_constraints":null,"_max_active_predictors":-1} java.lang.AssertionError: wrong priority for task GLMSingleLambdaTsk, expected 0, but got 1 at water.H2O$H2OCountedCompleter.compute(H2O.java:994) at jsr166y.CountedCompleter.exec(CountedCompleter.java:429) at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263) at jsr166y.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:914) at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:979) at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477) at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104) 09-21 17:13:34.977 172.16.2.171:54321 24632 FJ-1-5 INFO: GLM[dest=model_cv_1, iteration=0, lambda = 1877.9]: All 5 coefficients are active likelihood = 100648.0 09-21 17:13:34.981 172.16.2.171:54321 24632 FJ-0-13 WARN: ADMM solver reached maximum number of iterations (10000) 09-21 17:13:34.981 172.16.2.171:54321 24632 FJ-0-13 WARN: ADMM solver finished 
with gerr = 8449.675328571428 > eps = 1.0E-4 09-21 17:13:34.981 172.16.2.171:54321 24632 FJ-0-13 INFO: GLM[dest=model_cv_1, iteration=1, lambda = 1877.9]: iteration computed in 0 + 3 ms 09-21 17:13:34.981 172.16.2.171:54321 24632 FJ-0-13 INFO: GLM[dest=model_cv_1, iteration=1, lambda = 1877.9]: converged (reached a fixed point with ~ 1e-2147483648 precision), got 0 nzs 09-21 17:13:34.983 172.16.2.171:54321 24632 FJ-0-13 INFO: GLM[dest=model_cv_1, iteration=1, lambda = 1877.9]: hold-out set validation = mse = 109.0, explained_dev = 0.0 09-21 17:13:34.984 172.16.2.171:54321 24632 FJ-0-13 INFO: Solution at lambda = 1877.9142857142856 has 0 nonzeros, gradient err = 8449.675328571428 09-21 17:13:34.984 172.16.2.171:54321 24632 FJ-0-13 INFO: Model Metrics Type: RegressionGLM 09-21 17:13:34.984 172.16.2.171:54321 24632 FJ-0-13 INFO: Description: N/A 09-21 17:13:34.984 172.16.2.171:54321 24632 FJ-0-13 INFO: model id: model_cv_1 09-21 17:13:34.984 172.16.2.171:54321 24632 FJ-0-13 INFO: frame id: model_cv_1_airquality_train1.hex_train 09-21 17:13:34.984 172.16.2.171:54321 24632 FJ-0-13 INFO: MSE: 2875.6572 09-21 17:13:34.984 172.16.2.171:54321 24632 FJ-0-13 INFO: R^2: -1.8281012 09-21 17:13:34.984 172.16.2.171:54321 24632 FJ-0-13 INFO: mean residual deviance: 2875.6572 09-21 17:13:34.984 172.16.2.171:54321 24632 FJ-0-13 INFO: null DOF: 70.0 09-21 17:13:34.984 172.16.2.171:54321 24632 FJ-0-13 INFO: residual DOF: 70.0 09-21 17:13:34.984 172.16.2.171:54321 24632 FJ-0-13 INFO: null deviance: 201296.0 09-21 17:13:34.984 172.16.2.171:54321 24632 FJ-0-13 INFO: residual deviance: 201296.0 09-21 17:13:34.984 172.16.2.171:54321 24632 FJ-0-13 INFO: AIC: 758.134 09-21 17:13:34.984 172.16.2.171:54321 24632 FJ-0-13 INFO: Model Metrics Type: RegressionGLM 09-21 17:13:34.984 172.16.2.171:54321 24632 FJ-0-13 INFO: Description: N/A 09-21 17:13:34.984 172.16.2.171:54321 24632 FJ-0-13 INFO: model id: model_cv_1 09-21 17:13:34.984 172.16.2.171:54321 24632 FJ-0-13 INFO: frame id: 
model_cv_1_airquality_train1.hex_valid 09-21 17:13:34.984 172.16.2.171:54321 24632 FJ-0-13 INFO: MSE: 109.0 09-21 17:13:34.984 172.16.2.171:54321 24632 FJ-0-13 INFO: R^2: -11.111111 09-21 17:13:34.984 172.16.2.171:54321 24632 FJ-0-13 INFO: mean residual deviance: 109.0 09-21 17:13:34.984 172.16.2.171:54321 24632 FJ-0-13 INFO: null DOF: 2.0 09-21 17:13:34.984 172.16.2.171:54321 24632 FJ-0-13 INFO: residual DOF: 2.0 09-21 17:13:34.984 172.16.2.171:54321 24632 FJ-0-13 INFO: null deviance: 218.0 09-21 17:13:34.984 172.16.2.171:54321 24632 FJ-0-13 INFO: residual deviance: 218.0 09-21 17:13:34.984 172.16.2.171:54321 24632 FJ-0-13 INFO: AIC: 17.05845 09-21 17:13:34.984 172.16.2.171:54321 24632 FJ-0-13 INFO: GLM Model (summary): 09-21 17:13:34.984 172.16.2.171:54321 24632 FJ-0-13 INFO: Family Link Regularization Number of Predictors Total Number of Active Predictors Number of Iterations Training Frame 09-21 17:13:34.984 172.16.2.171:54321 24632 FJ-0-13 INFO: gaussian identity Elastic Net (alpha = 0.5, lambda = 1877.9 ) 6 1 1 model_cv_1_airquality_train1.hex_train 09-21 17:13:34.984 172.16.2.171:54321 24632 FJ-0-13 INFO: Scoring History: 09-21 17:13:34.984 172.16.2.171:54321 24632 FJ-0-13 INFO: timestamp duration iteration log_likelihood objective 09-21 17:13:34.984 172.16.2.171:54321 24632 FJ-0-13 INFO: 2015-09-21 17:13:34 0.000 sec 0 100648.00000 1437.82857 09-21 17:13:34.984 172.16.2.171:54321 24632 FJ-0-13 INFO: 2015-09-21 17:13:34 0.006 sec 1 100648.00000 1437.82857 09-21 17:13:34.985 172.16.2.171:54321 24632 FJ-0-7 INFO: Building cross-validation model 2 / 20. 
09-21 17:13:34.986 172.16.2.171:54321 24632 FJ-1-5 INFO: Building H2O GLM model with these parameters: 09-21 17:13:34.986 172.16.2.171:54321 24632 FJ-1-5 INFO: {"_model_id":null,"_train":{"name":"model_cv_2_airquality_train1.hex_train","type":"Key"},"_valid":{"name":"model_cv_2_airquality_train1.hex_valid","type":"Key"},"_nfolds":0,"_keep_cross_validation_predictions":false,"_fold_assignment":"AUTO","_distribution":"AUTO","_tweedie_power":1.5,"_ignored_columns":null,"_ignore_const_cols":false,"_weights_column":"weights","_offset_column":null,"_fold_column":null,"_score_each_iteration":false,"_response_column":"Ozone","_balance_classes":false,"_max_after_balance_size":5.0,"_class_sampling_factors":null,"_max_hit_ratio_k":10,"_max_confusion_matrix_size":20,"_checkpoint":null,"_standardize":false,"_family":"gaussian","_link":"family_default","_solver":"IRLSM","_tweedie_variance_power":0.0,"_tweedie_link_power":1.0,"_alpha":null,"_lambda":null,"_prior":-1.0,"_lambda_search":false,"_nlambdas":100,"_non_negative":false,"_exactLambdas":false,"_lambda_min_ratio":-1.0,"_use_all_factor_levels":false,"_max_iterations":-1,"_intercept":false,"_beta_epsilon":1.0E-5,"_objective_epsilon":1.0E-5,"_gradient_epsilon":1.0E-4,"_beta_constraints":null,"_max_active_predictors":-1} java.lang.AssertionError: wrong priority for task GLMSingleLambdaTsk, expected 0, but got 1 at water.H2O$H2OCountedCompleter.compute(H2O.java:994) at jsr166y.CountedCompleter.exec(CountedCompleter.java:429) at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263) at jsr166y.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:914) at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:979) at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477) at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104) 09-21 17:13:34.988 172.16.2.171:54321 24632 FJ-1-5 INFO: GLM[dest=model_cv_2, iteration=0, lambda = 1760.6]: All 5 coefficients are active likelihood = 90335.0
    by Neeraja Madabhushi
  • Thought I'd try some multi-machine testing. I did a git clone on mr-0xd10 and built, so it's head of master. You can run this from any machine, as it copies the jars to the machines (mr-0xd2 thru mr-0xd10). (One warning: since I use h2o.py, you have to uninstall any h2o python package you installed. I probably need to rename my h2o.py.) Using airlines_all from the usual /home/0xdiag/datasets on each machine. It seems to get past the training: the progress advances to 1.0 while polling. I did it twice, and it failed both times. The last h2o request is ModelMetrics (it finished training, then did Models.json, then Frames.json, then ModelMetrics.json):
    2015-02-25 01:37:53.805546 -- Start http://172.16.2.189:54321/3/ModelMetrics.json/models/GBMModelKey/frames/airlines_all.hex # None;
    Not sure if it does the same thing with fewer machines.
    cd h2o-dev/py2/testdir_single_jvm
    python test_GBM_airlines.py -cj ../testdir_hosts/pytest_config-182-190.json
    ======================================================================
    ERROR: test_GBM_airlines (__main__.Basic)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
    File "test_GBM_airlines.py", line 8, in tearDown h2o.check_sandbox_for_errors()
    File "../h2o_test.py", line 254, in check_sandbox_for_errors python_test_name=python_test_name)
    File "../h2o_sandbox.py", line 289, in check_sandbox_for_errors raise Exception(errorMessage)
    Exception: check_sandbox_for_errors: Errors in sandbox stdout or stderr (or R stdout/stderr). Could have occurred at any prior time
    water.DException$DistributedException: from /172.16.2.187:54321; by class water.KeySnapshot$GlobalUKeySetTask; class java.lang.AssertionError: *** Attempting to block on task (class water.TaskGetKey) with equal or lower priority. Can lead to deadlock!
122 <= 122 at water.RPC.get(RPC.java:252) at water.TaskGetKey.get(TaskGetKey.java:28) 02-25 01:29:55.792 172.16.2.186:54321 27724 # Session WARN: Caught exception: water.DException$DistributedException: from /172.16.2.186:54321; by class water.KeySnapshot$GlobalUKeySetTask; class water.DException$DistributedException: from /172.16.2.187:54321; by class water.KeySnapshot$GlobalUKeySetTask; class java.lang.AssertionError: *** Attempting to block on task (class water.TaskGetKey) with equal or lower priority. Can lead to deadlock! 122 <= 122; Stacktrace: [water.MRTask.getResult(MRTask.java:265), water.MRTask.doAll(MRTask.java:295), water.MRTask.doAllNodes(MRTask.java:287), water.KeySnapshot.globalSnapshot(KeySnapshot.java:234), water.KeySnapshot.globalSnapshot(KeySnapshot.java:221), water.api.ModelMetricsHandler$ModelMetricsList.fetch(ModelMetricsHandler.java:22), water.api.ModelMetricsHandler.fetch(ModelMetricsHandler.java:142), water.api.ModelMetricsHandler.score(ModelMetricsHandler.java:155), sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method), sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57), sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43), java.lang.reflect.Method.invoke(Method.java:606), water.api.Handler.handle(Handler.java:57), water.api.RequestServer.handle(RequestServer.java:602), water.api.RequestServer.serve(RequestServer.java:560), water.NanoHTTPD$HTTPSession.run(NanoHTTPD.java:433), java.lang.Thread.run(Thread.java:745)] at water.DKV.get(DKV.java:210) at water.DKV.get(DKV.java:168) at water.Key.get(Key.java:84) at water.fvec.Frame.vecs_impl(Frame.java:246) at water.fvec.Frame.vecs(Frame.java:232) at water.fvec.Frame.anyVec(Frame.java:208) at water.KeySnapshot$KeyInfo.<init>(KeySnapshot.java:52) at water.KeySnapshot.localSnapshot(KeySnapshot.java:212) at water.KeySnapshot$GlobalUKeySetTask.setupLocal(KeySnapshot.java:249) at water.MRTask.setupLocal0(MRTask.java:339) at 
water.MRTask.dinvoke(MRTask.java:282) at water.RPC$RPCCall.compute2(RPC.java:333) at water.H2O$H2OCountedCompleter.compute(H2O.java:582) at jsr166y.CountedCompleter.exec(CountedCompleter.java:429) at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263) at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974) at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477) at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104) java.lang.AssertionError at water.AutoBuffer.<init>(AutoBuffer.java:132) at water.RPC.response(RPC.java:572) at water.UDPAck.call(UDPAck.java:17) at water.FJPacket.compute2(FJPacket.java:21) at water.H2O$H2OCountedCompleter.compute(H2O.java:582) at jsr166y.CountedCompleter.exec(CountedCompleter.java:429) at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263) at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974) at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477) at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104) ----------------------------------------------------------------------
    by Kevin Normoyle
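
The "equal or lower priority" assertion in the report above reads like a guard against a worker blocking on work that can only run at its own priority level or below, where no free thread may be left to execute it. A minimal Python sketch of such a guard follows. It is an illustration inferred from the log message alone, not H2O's actual code; the names `check_can_block`, `current_priority`, and `target_priority` are hypothetical, and which side of the logged `122 <= 122` is the waiter is an assumption.

```python
class BlockingPriorityError(AssertionError):
    """Raised when a task would block on work of equal or lower priority."""


def check_can_block(current_priority: int, target_priority: int) -> None:
    """Allow blocking only on a task with strictly higher priority.

    Hypothetical guard modeled on the log message; not H2O's implementation.
    """
    if target_priority <= current_priority:
        raise BlockingPriorityError(
            "Attempting to block on task with equal or lower priority. "
            f"Can lead to deadlock! {target_priority} <= {current_priority}"
        )


if __name__ == "__main__":
    check_can_block(current_priority=1, target_priority=2)  # allowed, no error
    try:
        check_can_block(current_priority=122, target_priority=122)
    except BlockingPriorityError as err:
        print(err)
```

Under this reading, equal priorities are rejected as well as lower ones, which matches the `122 <= 122` case in the trace.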
  • This is kind of fluky. Never seen it happen before. Filing a Jira just to keep track of it to see if it happens again. mbp2:h2o tomk$ java -ea -Xmx5g -jar target/h2o.jar 11:32:43.545 main INFO WATER: ----- H2O started ----- 11:32:43.546 main INFO WATER: Build git branch: master 11:32:43.546 main INFO WATER: Build git hash: 469a0537a43e82974ef5f95f7082ddef1b811502 11:32:43.546 main INFO WATER: Build git describe: nn-2-4810-g469a053 11:32:43.546 main INFO WATER: Build project version: 2.3.0.99999 11:32:43.546 main INFO WATER: Built by: 'tomk' 11:32:43.547 main INFO WATER: Built on: 'Thu Mar 20 11:26:00 PDT 2014' 11:32:43.547 main INFO WATER: Java availableProcessors: 8 11:32:43.549 main INFO WATER: Java heap totalMemory: 0.08 gb 11:32:43.549 main INFO WATER: Java heap maxMemory: 4.98 gb 11:32:43.549 main INFO WATER: Java version: Java 1.6.0_65 (from Apple Inc.) 11:32:43.550 main INFO WATER: OS version: Mac OS X 10.8.5 (x86_64) 11:32:43.550 main INFO WATER: ICE root: '/tmp/h2o-tomk' 11:32:43.553 main INFO WATER: Possible IP Address: en0 (en0), fe80:0:0:0:2acf:e9ff:fe1c:ccf%5 11:32:43.553 main INFO WATER: Possible IP Address: en0 (en0), 192.168.1.37 11:32:43.553 main INFO WATER: Possible IP Address: lo0 (lo0), 0:0:0:0:0:0:0:1 11:32:43.554 main INFO WATER: Possible IP Address: lo0 (lo0), fe80:0:0:0:0:0:0:1%1 11:32:43.554 main INFO WATER: Possible IP Address: lo0 (lo0), 127.0.0.1 11:32:43.591 main INFO WATER: Internal communication uses port: 54322 + Listening for HTTP and REST traffic on http://192.168.1.37:54321/ 11:32:43.620 main INFO WATER: H2O cloud name: 'tomk' 11:32:43.620 main INFO WATER: (v2.3.0.99999) 'tomk' on /192.168.1.37:54321, discovery address /225.54.105.89:57654 11:32:43.620 main INFO WATER: If you have trouble connecting, try SSH tunneling from your local machine (e.g., via port 55555): + 1. Open a terminal and run 'ssh -L 55555:localhost:54321 tomk@192.168.1.37' + 2. 
Point your browser to http://localhost:55555 11:32:43.622 main INFO WATER: Cloud of size 1 formed [/192.168.1.37:54321] 11:32:43.622 main INFO WATER: Log dir: '/tmp/h2o-tomk/h2ologs' java.lang.AssertionError: Read to much data from a byte[] backed buffer, AB=[AB read first /192.168.1.161:54321 null 0 <= 110 <= 110 <= 1492] 11:32:43.796 FJ-8-1 INFO WATER: at water.AutoBuffer.getImpl(AutoBuffer.java:450) 11:32:43.797 FJ-8-1 INFO WATER: at water.AutoBuffer.getSp(AutoBuffer.java:437) 11:32:43.797 FJ-8-1 INFO WATER: at water.AutoBuffer.getA1(AutoBuffer.java:824) 11:32:43.797 FJ-8-1 INFO WATER: at water.AutoBuffer.getA1(AutoBuffer.java:815) 11:32:43.797 FJ-8-1 INFO WATER: at water.HeartBeat.read(HeartBeat.java) 11:32:43.798 FJ-8-1 INFO WATER: at water.UDPHeartbeat.call(UDPHeartbeat.java:14) 11:32:43.798 FJ-8-1 INFO WATER: at water.FJPacket.compute2(FJPacket.java:20) 11:32:43.798 FJ-8-1 INFO WATER: at water.H2O$H2OCountedCompleter.compute(H2O.java:710) 11:32:43.798 FJ-8-1 INFO WATER: at jsr166y.CountedCompleter.exec(CountedCompleter.java:429) 11:32:43.798 FJ-8-1 INFO WATER: at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263) 11:32:43.798 FJ-8-1 INFO WATER: at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974) 11:32:43.799 FJ-8-1 INFO WATER: at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477) 11:32:43.799 FJ-8-1 INFO WATER: at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
    by Tom Kraljevic
  • From [~kbn]: nishant got this GBM "Trying to unlock null" assertion during pyunit_citi_bike_large.py. It seems to be a delete of some key related to GBM. The test seemed to keep going, though. From http://mr-0xa1:8080/view/nishant/job/nishant_code_coverage/41/artifact/h2o-py/tests/results/java_0_0.out.txt He later got other assertions that have appeared elsewhere with pyunit_citi_bike_large.py. Here's the one I hadn't seen before:
    06-30 18:01:35.534 172.17.2.154:56789 3951 # Session INFO: Method: GET , URI: /3/Models/GBMModel__8c033c5ded17b06a9f57036a08014faa, route: /3/Models/(?<modelid>.*), parms: {model_id=GBMModel__8c033c5ded17b06a9f57036a08014faa}
    06-30 18:01:35.541 172.17.2.154:56789 3951 # Session INFO: Method: POST , URI: /99/Rapids, route: /99/Rapids, parms: {ast=(removeframe 'pyfdf9ce18-09a0-4dff-98ea-353bf6c7e119')}
    06-30 18:01:54.250 172.17.2.154:56789 3951 # Session INFO: Method: POST , URI: /99/Rapids, route: /99/Rapids, parms: {ast=(removeframe 'py3b2531b3-1876-4756-8e0b-454f46b87fb3')}
    06-30 18:01:54.286 172.17.2.154:56789 3951 # Session INFO: Method: DELETE, URI: /3/DKV/GBMModel__ae8e7b4651349614921bec0629064c9b, route: /3/DKV/(?<key>.*), parms: {key=GBMModel__ae8e7b4651349614921bec0629064c9b}
    barrier onExCompletion for hex.tree.gbm.GBM$GBMDriver@6c5d2aea
    water.DException$DistributedException: from /172.17.2.154:56793; by class water.Lockable$Unlock; class java.lang.AssertionError: Trying to unlock null!
at water.Lockable$Unlock.atomic(Lockable.java:180) at water.Lockable$Unlock.atomic(Lockable.java:176) at water.TAtomic.atomic(TAtomic.java:17) at water.Atomic.compute2(Atomic.java:55) at water.H2O$H2OCountedCompleter.compute(H2O.java:698) at jsr166y.CountedCompleter.exec(CountedCompleter.java:429) at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263) at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974) at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477) at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104) 06-30 18:01:54.570 172.17.2.154:56789 3951 # Session INFO: Method: POST , URI: /99/Rapids, route: /99/Rapids, parms: {ast=(, (gput py25ead33e-ba45-4f4a-ae45-04ed5f7dbab3 (cbind %FALSE 'py99fef45f-7d80-41b3-88df-e65c578e2677' 'py1844e0f1-6095-4b69-9e03-8c3b3a9bd336' 'py82e77fc3-c2d4-4093-b77f-c983ced3e0c4' 'py554158f7-9142-469d-bc9b-2d1479b2b118' 'pyce700595-02fb-455b-a13d-6d4ea0f978f3' 'pyc37f39b1-ba43-4eed-bb0d-9f9e6bc19032' 'pyfa208086-445e-4507-bff5-b42a50f6b1ed' 'pybf318908-1f6b-4625-9a3c-2cd1a24affd9' 'pyf97296f1-6bb5-485f-b77b-ad224f498648' 'py74884c42-f4f2-443e-b015-9c736069f699')) (colnames= %py25ead33e-ba45-4f4a-ae45-04ed5f7dbab3 (: #0 #9) (slist "Days" "start station name" "Month" "DayOfWeek" "Humidity Fraction" "Rain (mm)" "Temperature (C)" "WC1" "Dew Point (C)" "bikes")}
    by Raymond Peck
    • java.lang.AssertionError: wrong priority for task GLMSingleLambdaTsk, expected 0, but got 1
        at water.H2O$H2OCountedCompleter.compute(H2O.java:994)
        at jsr166y.CountedCompleter.exec(CountedCompleter.java:429)
        at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
        at jsr166y.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:914)
        at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:979)
        at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
        at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)