java.lang.IllegalArgumentException: Cannot process more than 5000 columns, taking into account expanded categoricals

JIRA | SriSatish Ambati | 2 years ago
  1.

    On Monday, 22 December 2014 12:49:24 UTC+5:30, Kumar wrote:

    My environment: RStudio connecting to a 15-node H2O cluster on Hadoop. I have two data sets, training (pml.training.hex) and test (pml.testing.hex). Both have around 160 columns. Training has 20,000 rows; test has 20 rows.

    I ran PCA on the test data set:

        pml.testing.pca.model = h2o.prcomp(pml.testing.hex)

    It runs fine and returns a model with 70 principal components. Next I ran it on the training data set:

        pml.training.pca.model = h2o.prcomp(pml.training.hex)

    It fails with the following error in the log:

        22-Dec 05:00:08.887 10.65.252.156:54321 11357 # Session INFO WATER: Running PCA on dataset with 6127 expanded columns in Gram matrix
        22-Dec 05:00:08.887 10.65.252.156:54321 11357 # Session ERRR WATER:
        + java.lang.IllegalArgumentException: Cannot process more than 5000 columns, taking into account expanded categoricals
        + at hex.pca.PCA.init(PCA.java:102)
        + at water.Job.fork(Job.java:327)
        + at water.Job.serve(Job.java:311)
        + at water.api.Request.serveGrid(Request.java:165)
        + at water.Request2.superServeGrid(Request2.java:490)
        + at water.Request2.serveGrid(Request2.java:411)
        + at water.api.Request.serve(Request.java:142)
        + at water.api.RequestServer.serve(RequestServer.java:507)
        + at water.NanoHTTPD$HTTPSession.run(NanoHTTPD.java:425)
        + at java.lang.Thread.run(Thread.java:744)

    I changed the line to:

        pml.training.pcamodel = h2o.prcomp(pml.training.hex, tol = 0.2, cols = "", max_pc = 1000, key = "", standardize = TRUE, retx = FALSE)

    It still gives exactly the same error. It seems the max_pc value from the first job request is still being used. But why? Also, what would you suggest to resolve it, other than restarting the cluster?

    ---------- Forwarded message ----------
    From: Kumar <sureshemailid@gmail.com>
    Date: Mon, Dec 22, 2014 at 12:19 AM
    Subject: [h2ostream] Re: h2o.prcomp() issue
    To: h2ostream@googlegroups.com

    OK... I restarted the cloud, and this time, after all the parsing etc., I tried to execute:

        pml.training.pcamodel = h2o.prcomp(pml.training.hex, max_pc = 1000)

    Again I got the same error. Not sure what I am doing wrong. Need help. Attached is the log.

    thx
    -Kumar

    JIRA | 2 years ago | SriSatish Ambati
    java.lang.IllegalArgumentException: Cannot process more than 5000 columns, taking into account expanded categoricals
  3.

    [jdk1.4.2] java.lang.IllegalArgumentException during classpath scanning

    Netbeans Bugzilla | 1 decade ago | lmartinek
    java.lang.IllegalArgumentException: Cannot process more work than scheduled.
  5.

    ArrayIndexOutOfBoundsException while opening NetBeans

    Netbeans Bugzilla | 1 decade ago | beal91
    java.lang.IllegalArgumentException: Cannot process more work than scheduled.
  6.

    Inscrutable error when writing large thrift struct

    GitHub | 3 years ago | colinmarc
    java.lang.IllegalArgumentException: You cannot call toBytes() more than once without calling reset()

    Root Cause Analysis

    java.lang.IllegalArgumentException: Cannot process more than 5000 columns, taking into account expanded categoricals
        at hex.pca.PCA.init(PCA.java:102)
        at water.Job.fork(Job.java:327)
        at water.Job.serve(Job.java:311)
        at water.api.Request.serveGrid(Request.java:165)
        at water.Request2.superServeGrid(Request2.java:490)
        at water.Request2.serveGrid(Request2.java:411)
        at water.api.Request.serve(Request.java:142)
        at water.api.RequestServer.serve(RequestServer.java:507)
        at water.NanoHTTPD$HTTPSession.run(NanoHTTPD.java:425)
        at java.lang.Thread.run(Thread.java:744)
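
    The log above reports 6127 expanded columns against a limit of 5000. One plausible reading, offered as an interpretation rather than a confirmed diagnosis: the limit is checked after each categorical column is expanded into indicator columns. A factor with many distinct values in the 20,000-row training frame can then push the width past 5000, while the 20-row test frame (at most 20 distinct values per column) stays under it. If so, max_pc would not help, because it bounds how many components are returned, not how many input columns are expanded. The Python sketch below is hypothetical: expanded_width, columns_to_drop and the use_all_levels flag are illustrative names, not part of the H2O API, and whether H2O counts every level or drops one reference level per factor is an assumption exposed as a parameter.

```python
# Hypothetical sketch (not H2O code): estimate the width of a one-hot-expanded
# design matrix and choose categorical columns to drop so it fits under a cap.
from typing import Dict, List, Optional

# Limit quoted in the H2O error message; treated here as a plain constant.
MAX_EXPANDED_COLS = 5000


def expanded_width(levels: Dict[str, Optional[int]], use_all_levels: bool = True) -> int:
    """Return the column count after expanding categoricals.

    `levels` maps a column name to its number of factor levels, or to None
    for a numeric column. A numeric column contributes 1. A categorical
    column contributes one indicator per level, or levels - 1 when
    use_all_levels is False (one reference level dropped; an assumption).
    """
    width = 0
    for n in levels.values():
        if n is None:
            width += 1
        elif use_all_levels:
            width += n
        else:
            width += max(n - 1, 0)
    return width


def columns_to_drop(
    levels: Dict[str, Optional[int]],
    cap: int = MAX_EXPANDED_COLS,
    use_all_levels: bool = True,
) -> List[str]:
    """Greedily drop the highest-cardinality categoricals until width <= cap.

    If the numeric columns alone exceed the cap, every categorical is
    returned and the width is still over the cap.
    """
    remaining = dict(levels)
    dropped: List[str] = []
    # Categorical column names, largest level count first.
    categoricals = sorted(
        (name for name, n in levels.items() if n is not None),
        key=lambda name: levels[name],
        reverse=True,
    )
    for name in categoricals:
        if expanded_width(remaining, use_all_levels) <= cap:
            break
        del remaining[name]
        dropped.append(name)
    return dropped


if __name__ == "__main__":
    # Made-up column profile for illustration: 150 numeric columns plus
    # three categoricals with 4000, 1200 and 30 levels.
    profile: Dict[str, Optional[int]] = {f"num_{i}": None for i in range(150)}
    profile.update({"cat_a": 4000, "cat_b": 1200, "cat_c": 30})
    print(expanded_width(profile))   # 5380
    print(columns_to_drop(profile))  # ['cat_a']
```

    Given per-column level counts for a frame, columns_to_drop suggests which categoricals to exclude or re-encode before calling h2o.prcomp again. Dropping the highest-cardinality column first is only one simple heuristic; a column with thousands of levels in a 20,000-row frame is often an ID or timestamp that carries little signal for PCA anyway, but that judgment depends on the data.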