java.lang.AssertionError: GLM2[dest=$1f01ac1002b235a0ffffffff$GLMModel__a3419cb720a03d9fabfa7d9da30440a1_xval_4, iteration=3, lambda = 1.0E-5]: unexpected pending count, expected <= 1, got 2

JIRA | Kevin Normoyle | 2 years ago
  1. 0

    The R log and the corresponding h2o Java log are attached. The job is $0301ac1002b235a0ffffffff$_82080eefbf0abc6dc9890cc25b09425b

    The GLM ends like this:

    02-01 17:17:40.696 172.16.2.178:41012 2977 FJ-0-13 INFO: GLM2[dest=$1f01ac1002b235a0ffffffff$GLMModel__a3419cb720a03d9fabfa7d9da30440a1_xval_3, iteration=1, lambda = 1.0E-5]: Gram computed in 3ms, , step = 1.0, ADMM: 0 iterations, 0ms (0), subgrad_err=0.0
    02-01 17:17:40.699 172.16.2.178:41012 2977 FJ-0-13 INFO: GLM2[dest=$1f01ac1002b235a0ffffffff$GLMModel__a3419cb720a03d9fabfa7d9da30440a1_xval_3, iteration=2, lambda = 1.0E-5]: Gram computed in 2ms, , step = 1.0, ADMM: 0 iterations, 0ms (0), subgrad_err=0.0
    02-01 17:17:40.702 172.16.2.178:41012 2977 FJ-0-13 INFO: GLM2[dest=$1f01ac1002b235a0ffffffff$GLMModel__a3419cb720a03d9fabfa7d9da30440a1_xval_3, iteration=3, lambda = 1.0E-5]: Gram computed in 2ms, , step = 1.0, ADMM: 0 iterations, 0ms (0), subgrad_err=0.0
    02-01 17:17:40.704 172.16.2.178:41012 2977 FJ-0-13 INFO: GLM2[dest=$1f01ac1002b235a0ffffffff$GLMModel__a3419cb720a03d9fabfa7d9da30440a1_xval_3, iteration=4, lambda = 1.0E-5]: Gram computed in 2ms, , step = 1.0, ADMM: 0 iterations, 0ms (0), subgrad_err=0.0
    02-01 17:17:40.733 172.16.2.178:41012 2977 FJ-0-5 INFO: GLM2[dest=$1f01ac1002b235a0ffffffff$GLMModel__a3419cb720a03d9fabfa7d9da30440a1_xval_3, iteration=5, lambda = 1.0E-5]: Gram computed in 2ms, , step = 1.0, ADMM: 0 iterations, 0ms (0), subgrad_err=0.0
    02-01 17:17:40.737 172.16.2.178:41012 2977 FJ-0-5 INFO: GLM2[dest=$1f01ac1002b235a0ffffffff$GLMModel__a3419cb720a03d9fabfa7d9da30440a1_xval_3, iteration=6, lambda = 1.0E-5]: converged by reaching small enough gradient, with max |subgradient| = 1.0388394057267024E-9
    02-01 17:17:40.743 172.16.2.178:41012 2977 FJ-0-5 INFO: callback for task 3
    02-01 17:17:40.743 172.16.2.178:41012 2977 FJ-0-5 INFO: GLM2[dest=$1f01ac1002b235a0ffffffff$GLMModel__a3419cb720a03d9fabfa7d9da30440a1_xval_4, iteration=0, lambda = 1.0E-5]: starting computation of lambda = 1.0E-5, previous lambda = 148.9480499025015
    02-01 17:17:40.743 172.16.2.178:41012 2977 FJ-0-5 INFO: GLM2[dest=$1f01ac1002b235a0ffffffff$GLMModel__a3419cb720a03d9fabfa7d9da30440a1_xval_4, iteration=0, lambda = 1.0E-5]: strong rule at lambda_value=1.0E-5, got 8 active cols out of 8 total.
    02-01 17:17:40.758 172.16.2.178:41012 2977 FJ-0-5 INFO: GLM2[dest=$1f01ac1002b235a0ffffffff$GLMModel__a3419cb720a03d9fabfa7d9da30440a1_xval_4, iteration=1, lambda = 1.0E-5]: Gram computed in 14ms, , step = 1.0, ADMM: 0 iterations, 1ms (1), subgrad_err=0.0
    02-01 17:17:40.759 172.16.2.178:41012 2977 FJ-0-13 INFO: GLM2[dest=$1f01ac1002b235a0ffffffff$GLMModel__a3419cb720a03d9fabfa7d9da30440a1_xval_4, iteration=2, lambda = 1.0E-5]: Gram computed in 1ms, , step = 1.0, ADMM: 0 iterations, 0ms (0), subgrad_err=0.0
    02-01 17:17:40.761 172.16.2.178:41012 2977 FJ-0-13 INFO: GLM2[dest=$1f01ac1002b235a0ffffffff$GLMModel__a3419cb720a03d9fabfa7d9da30440a1_xval_4, iteration=3, lambda = 1.0E-5]: Gram computed in 1ms, , step = 1.0, ADMM: 0 iterations, 0ms (0), subgrad_err=0.0
    02-01 17:17:40.763 172.16.2.178:41012 2977 FJ-0-7 INFO: GLM2[dest=$1f01ac1002b235a0ffffffff$GLMModel__a3419cb720a03d9fabfa7d9da30440a1_xval_4, iteration=3, lambda = 1.0E-5]: unexpected pending count, expected <= 1, got 2
    barrier onExCompletion for hex.glm.GLM$1@397f6d6
    java.lang.AssertionError: GLM2[dest=$1f01ac1002b235a0ffffffff$GLMModel__a3419cb720a03d9fabfa7d9da30440a1_xval_4, iteration=3, lambda = 1.0E-5]: unexpected pending count, expected <= 1, got 2
        at hex.glm.GLM$GLMLambdaTask$Iteration.callback(GLM.java:476)
        at hex.glm.GLM$GLMLambdaTask$Iteration.callback(GLM.java:461)
        at water.H2O$H2OCallback.onCompletion(H2O.java:640)
        at jsr166y.CountedCompleter.tryComplete(CountedCompleter.java:386)
        at water.MRTask.compute2(MRTask.java:437)
        at water.H2O$H2OCountedCompleter.compute(H2O.java:582)
        at jsr166y.CountedCompleter.exec(CountedCompleter.java:429)
        at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
        at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
        at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
        at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
    02-01 17:17:41.346 172.16.2.178:41012 2977 # Session INFO: Method: GET , Path: /3/Jobs/$0301ac1002b235a0ffffffff$_82080eefbf0abc6dc9890cc25b09425b, route: /3/Jobs/(?<key>.*), parms: {key=$0301ac1002b235a0ffffffff$_82080eefbf0abc6dc9890cc25b09425b}

    The R test sees that the job is cancelled, but there's no info about why it got cancelled. It's because of the exception above. How come the exception isn't in the job cancel information, and then reported to the R test? It's very slow to wade through these files since the R run.py gives go/nogo info only, so exception info on jobs should be fed to R as much as possible (in the job completion status).

    > doTest("GLM: Benign Data", glm2Benign)
    [2015-02-01 17:17:37] [INFO]: ======================== Begin Test ===========================
    [2015-02-01 17:17:39] [INFO]: Build the model
    [2015-02-01 17:17:41] [ERROR] : Error: Test failed: 'GLM: Benign Data'
    Not expected: Job key $0301ac1002b235a0ffffffff$_82080eefbf0abc6dc9890cc25b09425b failed
    1: withWarnings(test(conn))
    2: withCallingHandlers(expr, warning = wHandler)
    3: test(conn)
    4: h2o.glm(y = Y, x = colnames(bhexFV)[X], training_frame = bhexFV, family = "binomial", n_folds = 5, alpha = 0, lambda = 1e-05)
    5: .h2o.createModel(training_frame@conn, "glm", parms, dots$envir)
    6: .h2o.__waitOnJob(conn, job_key)
    7: stop("Job key ", job_key, " failed")
    8: .handleSimpleError(function (e) { e$calls <- head(sys.calls()[-seq_len(frame + 7)], -2) signalCondition(e) }, "Job key $0301ac1002b235a0ffffffff$_82080eefbf0abc6dc9890cc25b09425b failed", quote(.h2o.__waitOnJob(conn, job_key))).
    SEED used: 718725153
    [2015-02-01 17:17:41] [ERROR] : TEST FAILED
    No traceback available

    JIRA | 2 years ago | Kevin Normoyle
    java.lang.AssertionError: GLM2[dest=$1f01ac1002b235a0ffffffff$GLMModel__a3419cb720a03d9fabfa7d9da30440a1_xval_4, iteration=3, lambda = 1.0E-5]: unexpected pending count, expected <= 1, got 2
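
The report above asks for the exception to be carried in the job completion status so that the R client can print it. A minimal sketch of that idea follows. It does not use H2O's actual job class or REST schema: `JobStatusSketch`, its `State` enum, and `describe` are hypothetical names invented for illustration.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical job record; the names are illustrative, not H2O's API.
final class JobStatusSketch {
    enum State { RUNNING, DONE, FAILED }

    private volatile State state = State.RUNNING;
    private volatile String exception; // stays null unless the job failed

    State state() { return state; }
    String exception() { return exception; }

    // Runs the work and stores any Throwable (AssertionError included)
    // in the status, instead of leaving it only in the server log.
    static JobStatusSketch run(Runnable work) {
        JobStatusSketch job = new JobStatusSketch();
        try {
            work.run();
            job.state = State.DONE;
        } catch (Throwable t) {
            StringWriter sw = new StringWriter();
            t.printStackTrace(new PrintWriter(sw, true));
            job.exception = sw.toString();
            job.state = State.FAILED;
        }
        return job;
    }

    // What a client-side "wait on job" step could report on failure.
    String describe(String jobKey) {
        return state == State.FAILED
            ? "Job key " + jobKey + " failed: " + exception
            : "Job key " + jobKey + " " + state;
    }

    public static void main(String[] args) {
        JobStatusSketch job = run(() -> {
            throw new AssertionError("unexpected pending count, expected <= 1, got 2");
        });
        System.out.println(job.describe("job_1"));
    }
}
```

With a status like this, the client's failure message could include the assertion text and stack trace rather than only "Job key ... failed".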
  3. 0

    Maybe the first step is to see whether all NOPASS R tests have a JIRA. I assume they don't.

    This goes to my question of gauging doneness. I think test writers know that certain features aren't in yet, and maybe a NOPASS test exposes that, but there's no expectation that test writers are somehow riding herd on project management, i.e. responsible for documenting all "work not yet done".

    I filed a JIRA on this. I think it overlaps with an existing JIRA that gets the same failure.

    It's reasonable to say that the next step for the JIRA should be to modify the test so that it is not intermittent.

    ------------------------------

    It should have a ticket whether or not it's intermittent. Please fill in the test case field.

    On Mar 6, 2015 5:10 PM, "Kevin" <kevin@0xdata.com> wrote:

    There are numerous GLM tests that get these intermittent failures. I changed them to NOPASS, and they're being changed from NOPASS with no underlying change to GLM, so they just fail again intermittently, and I change them back to NOPASS. Maybe the problem is that there's no JIRA filed on these?

    If the test is intermittent, it probably needs modification to run only the case that fails, multiple times, or with a fixed seed that matches the failure. Then it's not intermittent and can be a JIRA.

    http://172.16.2.161:8080/job/h2o_master_DEV_runit_small/7018/artifact/h2o-r/tests/results/java_3_0.out.txt
    http://172.16.2.161:8080/job/h2o_master_DEV_runit_small/7018/console

    True fail list: runit_demo_exec2.R

    http://172.16.2.161:8080/job/h2o_master_DEV_runit_small/7018/artifact/h2o-r/tests/results/java_3_0.out.txt

    barrier onExCompletion for hex.glm.GLM$1@76a4120d
    java.lang.AssertionError: unexpected pending count, expected 1, got 3
        at hex.glm.GLM$GLMLambdaTask$LineSearchIteration.callback(GLM.java:529)
        at hex.glm.GLM$GLMLambdaTask$LineSearchIteration.callback(GLM.java:524)
        at water.H2O$H2OCallback.onCompletion(H2O.java:641)
        at jsr166y.CountedCompleter.tryComplete(CountedCompleter.java:386)
        at water.MRTask.compute2(MRTask.java:439)
        at water.H2O$H2OCountedCompleter.compute(H2O.java:583)
        at jsr166y.CountedCompleter.exec(CountedCompleter.java:429)
        at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
        at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
        at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
        at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

    JIRA | 2 years ago | Kevin Normoyle
    java.lang.AssertionError: unexpected pending count, expected 1, got 3
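
The thread above suggests pinning down an intermittent test by running only the failing case, multiple times, or with a fixed seed that matches the failure. A minimal sketch of both ideas follows, under stated assumptions: `drawFolds` is a hypothetical stand-in for whatever randomized setup a test performs, `countFailures` is an invented helper, and the seed value is the one printed as "SEED used" in the first report. A fixed seed only makes the random inputs repeatable. It does not control thread scheduling, so a failure caused by a race may still need the repeated runs.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.LongPredicate;

final class FixedSeedSketch {
    // Hypothetical randomized setup: assigns each row to one of nFolds folds.
    // The same seed always produces the same assignment.
    static int[] drawFolds(long seed, int rows, int nFolds) {
        Random rng = new Random(seed);
        int[] folds = new int[rows];
        for (int i = 0; i < rows; i++) folds[i] = rng.nextInt(nFolds);
        return folds;
    }

    // Runs one test case `runs` times with the same seed and counts failures.
    // A thrown Throwable (for example an AssertionError) counts as a failure.
    static int countFailures(long seed, int runs, LongPredicate passes) {
        int failures = 0;
        for (int i = 0; i < runs; i++) {
            boolean ok;
            try {
                ok = passes.test(seed);
            } catch (Throwable t) {
                ok = false;
            }
            if (!ok) failures++;
        }
        return failures;
    }

    public static void main(String[] args) {
        long seed = 718725153L; // seed reported by the failing run
        boolean repeatable = Arrays.equals(drawFolds(seed, 100, 5), drawFolds(seed, 100, 5));
        System.out.println("same inputs on every run: " + repeatable);
        int failures = countFailures(seed, 20, s -> drawFolds(s, 100, 5).length == 100);
        System.out.println("failures in 20 runs: " + failures);
    }
}
```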

    Root Cause Analysis

    1. java.lang.AssertionError

      GLM2[dest=$1f01ac1002b235a0ffffffff$GLMModel__a3419cb720a03d9fabfa7d9da30440a1_xval_4, iteration=3, lambda = 1.0E-5]: unexpected pending count, expected <= 1, got 2

      at hex.glm.GLM$GLMLambdaTask$Iteration.callback()
    2. hex.glm
      GLM$GLMLambdaTask$Iteration.callback
      1. hex.glm.GLM$GLMLambdaTask$Iteration.callback(GLM.java:476)
      2. hex.glm.GLM$GLMLambdaTask$Iteration.callback(GLM.java:461)
      2 frames
    3. water
      H2O$H2OCallback.onCompletion
      1. water.H2O$H2OCallback.onCompletion(H2O.java:640)
      1 frame
    4. jsr166y
      CountedCompleter.tryComplete
      1. jsr166y.CountedCompleter.tryComplete(CountedCompleter.java:386)
      1 frame
    5. water
      H2O$H2OCountedCompleter.compute
      1. water.MRTask.compute2(MRTask.java:437)
      2. water.H2O$H2OCountedCompleter.compute(H2O.java:582)
      2 frames
    6. jsr166y
      ForkJoinWorkerThread.run
      1. jsr166y.CountedCompleter.exec(CountedCompleter.java:429)
      2. jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
      3. jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
      4. jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
      5. jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
      5 frames
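
The frames above show the callback being reached from `CountedCompleter.tryComplete`, and the assertion message refers to a "pending count". How H2O's GLM uses that count is not visible in this trace. As general background only, the sketch below shows how the pending count of a completer behaves, using the JDK's `java.util.concurrent.CountedCompleter` as a stand-in for the `jsr166y` class in the trace (an assumption). `PendingCountSketch`, `Parent`, and `Child` are hypothetical names.

```java
import java.util.concurrent.CountedCompleter;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.atomic.AtomicInteger;

final class PendingCountSketch {
    static final class Parent extends CountedCompleter<Void> {
        final int children;
        final AtomicInteger callbacks = new AtomicInteger();
        final AtomicInteger pendingSeenInCallback = new AtomicInteger(-1);

        Parent(int children) { this.children = children; }

        @Override public void compute() {
            setPendingCount(children);   // one pending unit per forked child
            for (int i = 0; i < children; i++) new Child(this).fork();
            tryComplete();               // the parent's own completion attempt
        }

        // Called once, by whichever tryComplete() finds the count at zero.
        @Override public void onCompletion(CountedCompleter<?> caller) {
            callbacks.incrementAndGet();
            pendingSeenInCallback.set(getPendingCount());
        }
    }

    static final class Child extends CountedCompleter<Void> {
        Child(CountedCompleter<?> parent) { super(parent); }
        // A child with no pending work completes and notifies its parent.
        @Override public void compute() { tryComplete(); }
    }

    static Parent run(int children) {
        ForkJoinPool pool = new ForkJoinPool(2);
        try {
            Parent parent = new Parent(children);
            pool.invoke(parent);         // returns after the parent completes
            return parent;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Parent p = run(2);
        System.out.println("callbacks=" + p.callbacks.get()
            + ", pending in callback=" + p.pendingSeenInCallback.get());
    }
}
```

In this sketch each `tryComplete()` either decrements a non-zero pending count or, when the count is already zero, runs `onCompletion`. A count that is higher than the code expects at callback time would mean that more completions were still outstanding than were accounted for.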