java.lang.IllegalArgumentException: Cannot process more than 5000 columns, taking into account expanded categoricals


  • On Monday, 22 December 2014 12:49:24 UTC+5:30, Kumar wrote:

    My environment: RStudio connecting to a 15-node H2O cluster on Hadoop. I have two data sets, training (pml.training.hex) and test (pml.testing.hex). Both have around 160 columns; training has 20,000 rows and test has 20 rows.

    I tried running PCA on the test data set first:

        pml.testing.pca.model = h2o.prcomp(pml.testing.hex)

    It runs fine and gives a model with 70 principal components. Next I ran it on the training data set:

        pml.training.pca.model = h2o.prcomp(pml.training.hex)

    This fails with the error below in the log:

        22-Dec 05:00:08.887 10.65.252.156:54321 11357 # Session INFO WATER: Running PCA on dataset with 6127 expanded columns in Gram matrix
        22-Dec 05:00:08.887 10.65.252.156:54321 11357 # Session ERRR WATER:
        + java.lang.IllegalArgumentException: Cannot process more than 5000 columns, taking into account expanded categoricals
        +   at hex.pca.PCA.init(PCA.java:102)
        +   at water.Job.fork(Job.java:327)
        +   at water.Job.serve(Job.java:311)
        +   at water.api.Request.serveGrid(Request.java:165)
        +   at water.Request2.superServeGrid(Request2.java:490)
        +   at water.Request2.serveGrid(Request2.java:411)
        +   at water.api.Request.serve(Request.java:142)
        +   at water.api.RequestServer.serve(RequestServer.java:507)
        +   at water.NanoHTTPD$HTTPSession.run(NanoHTTPD.java:425)
        +   at java.lang.Thread.run(Thread.java:744)

    I changed the call to:

        pml.training.pcamodel = h2o.prcomp(pml.training.hex, tol = 0.2, cols = "", max_pc = 1000, key = "", standardize = TRUE, retx = FALSE)

    It still gives exactly the same error. It seems the max_pc value from the first job request is still being used. But why? Also, what would you suggest to resolve this, other than restarting the cluster?

    ---------- Forwarded message ----------
    From: Kumar <sureshemailid@gmail.com>
    Date: Mon, Dec 22, 2014 at 12:19 AM
    Subject: [h2ostream] Re: h2o.prcomp() issue
    To: h2ostream@googlegroups.com

    OK... I restarted the cloud, and this time, after all the parsing, I tried to execute:

        pml.training.pcamodel = h2o.prcomp(pml.training.hex, max_pc = 1000)

    Again I got exactly the same error. Not sure what I am doing wrong; I need help. The log is attached.

    thx
    -Kumar
    via h2ostream by SriSatish Ambati
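
The key number in the log above is the 6127 expanded columns: before building the Gram matrix, H2O's PCA expands every categorical (factor) column into indicator columns, roughly one per level, and the 5000-column limit raised in hex.pca.PCA.init is checked against that expanded count, not the roughly 160 raw columns. That would also explain why max_pc makes no difference here: it appears to cap only the number of components returned, not the width of the expanded input. A quick way to see which columns are responsible is to count factor levels locally. The sketch below is plain base R run on the client; the file name pml-training.csv is an assumption for illustration, not something given in the thread.

    # Minimal diagnostic sketch (plain base R, run locally on the raw file).
    # Assumption: the raw training data is available as "pml-training.csv";
    # adjust the path to wherever your copy actually lives.
    pml <- read.csv("pml-training.csv", stringsAsFactors = TRUE)

    # Approximate expanded width: one column per factor level for categoricals,
    # one column for each numeric. (H2O's exact expansion may differ slightly,
    # e.g. by dropping a reference level; treat this as an estimate.)
    expanded <- sapply(pml, function(x) if (is.factor(x)) nlevels(x) else 1L)

    sum(expanded)                                 # total width PCA would see
    head(sort(expanded, decreasing = TRUE), 10)   # the worst offenders

High-cardinality identifier- or timestamp-like columns that happen to parse as factors are the usual culprits; a handful of them can easily push ~160 raw columns past 5000 expanded ones.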
  • h2o.prcomp() issue
    via h2ostream by Kumar
    • java.lang.IllegalArgumentException: Cannot process more than 5000 columns, taking into account expanded categoricals
          at hex.pca.PCA.init(PCA.java:102)
          at water.Job.fork(Job.java:327)
          at water.Job.serve(Job.java:311)
          at water.api.Request.serveGrid(Request.java:165)
          at water.Request2.superServeGrid(Request2.java:490)
          at water.Request2.serveGrid(Request2.java:411)
          at water.api.Request.serve(Request.java:142)
          at water.api.RequestServer.serve(RequestServer.java:507)
          at water.NanoHTTPD$HTTPSession.run(NanoHTTPD.java:425)
          at java.lang.Thread.run(Thread.java:744)
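
Since the limit is enforced up front in hex.pca.PCA.init, restarting the cluster does not change the outcome (as the second attempt above confirms); the expanded column count itself has to come down. Below is a workaround sketch, under the assumption that the H2O frame supports column subsetting by name in this version of the R package; the names and the threshold are illustrative, not taken from the thread.

    # Workaround sketch: drop the columns whose factor expansion blows up the
    # width, then rerun PCA on what remains. `expanded` comes from the
    # diagnostic sketch above; 50 is an arbitrary illustrative cutoff.
    keep <- names(expanded)[expanded <= 50]

    # Assumption: the parsed H2O data supports [ , <character vector> ] subsetting.
    pml.training.small <- pml.training.hex[, keep]
    pml.training.pca.model <- h2o.prcomp(pml.training.small, standardize = TRUE)

Alternatively, coercing or dropping the offending columns before the data is parsed into H2O avoids the categorical expansion entirely.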