java.lang.IllegalArgumentException: Cannot process more than 5000 columns, taking into account expanded categoricals

JIRA | SriSatish Ambati | 2 years ago

On Monday, 22 December 2014 12:49:24 UTC+5:30, Kumar wrote:

My environment: RStudio connecting to a 15-node H2O cluster on Hadoop. I have two data sets, training (pml.training.hex) and test (pml.testing.hex). Both have around 160 columns; training has 20,000 rows and test has 20 rows.

I first ran PCA on the test data set:

    pml.testing.pca.model = h2o.prcomp(pml.testing.hex)

It runs fine and gives a model with 70 principal components. Next I ran it on the training data set:

    pml.training.pca.model = h2o.prcomp(pml.training.hex)

This fails with the following error in the log:

    22-Dec 05:00:08.887 10.65.252.156:54321 11357 # Session INFO WATER: Running PCA on dataset with 6127 expanded columns in Gram matrix
    22-Dec 05:00:08.887 10.65.252.156:54321 11357 # Session ERRR WATER:
    + java.lang.IllegalArgumentException: Cannot process more than 5000 columns, taking into account expanded categoricals
    +   at hex.pca.PCA.init(PCA.java:102)
    +   at water.Job.fork(Job.java:327)
    +   at water.Job.serve(Job.java:311)
    +   at water.api.Request.serveGrid(Request.java:165)
    +   at water.Request2.superServeGrid(Request2.java:490)
    +   at water.Request2.serveGrid(Request2.java:411)
    +   at water.api.Request.serve(Request.java:142)
    +   at water.api.RequestServer.serve(RequestServer.java:507)
    +   at water.NanoHTTPD$HTTPSession.run(NanoHTTPD.java:425)
    +   at java.lang.Thread.run(Thread.java:744)

I changed the call to:

    pml.training.pcamodel = h2o.prcomp(pml.training.hex, tol = 0.2, cols = "", max_pc = 1000, key = "", standardize = TRUE, retx = FALSE)

It still gives exactly the same error. It seems the max_pc value from the first job request is still being used. But why? Also, what would you suggest to resolve this other than restarting the cluster?

---------- Forwarded message ----------
From: Kumar <sureshemailid@gmail.com>
Date: Mon, Dec 22, 2014 at 12:19 AM
Subject: [h2ostream] Re: h2o.prcomp() issue
To: h2ostream@googlegroups.com

OK... I restarted the cloud, and this time, after all the parsing etc., I tried to execute:

    pml.training.pcamodel = h2o.prcomp(pml.training.hex, max_pc = 1000)

Again I got the same error. Not sure what I am doing wrong. Need help. Attached is the log.

thx
-Kumar
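Since the failing check counts expanded (one-hot) input columns rather than output components, a likely direction is to shrink the expanded width of the frame instead of lowering max_pc. The R sketch below is a hedged, untested example against the legacy h2o R client used in this report: it pulls a small local sample to find factor columns, then passes only the non-factor column names through the cols argument that already appears in the call above. The details (head() and as.data.frame() on the parsed-frame handle, cols accepting a character vector of names) are assumptions to adapt, not confirmed behavior.

    library(h2o)  # legacy h2o R client, as used in the report above

    ## Pull a small local sample to inspect column types without moving the
    ## whole 20,000-row frame into the R session (assumes head() and
    ## as.data.frame() work on the parsed-frame handle).
    sample.df <- as.data.frame(head(pml.training.hex, 1000))

    ## Factor columns are the ones expanded to roughly one Gram-matrix column
    ## per level; high-cardinality factors are what push the count past 5000.
    factor.cols  <- names(sample.df)[sapply(sample.df, is.factor)]
    numeric.cols <- setdiff(names(sample.df), factor.cols)

    ## Run PCA on the numeric columns only, via the cols argument from the
    ## original call. max_pc likely caps only how many components are returned,
    ## so by itself it would not get the frame under the expanded-column limit.
    pml.training.pca.model <- h2o.prcomp(pml.training.hex,
                                         cols = numeric.cols,
                                         standardize = TRUE)

If some categorical columns must stay in, dropping only the highest-cardinality factors (or grouping their rare levels) may be enough to bring the expanded count from 6127 back under 5000.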


    Root Cause Analysis

    java.lang.IllegalArgumentException: Cannot process more than 5000 columns, taking into account expanded categoricals
        at hex.pca.PCA.init(PCA.java:102)
        at water.Job.fork(Job.java:327)
        at water.Job.serve(Job.java:311)
        at water.api.Request.serveGrid(Request.java:165)
        at water.Request2.superServeGrid(Request2.java:490)
        at water.Request2.serveGrid(Request2.java:411)
        at water.api.Request.serve(Request.java:142)
        at water.api.RequestServer.serve(RequestServer.java:507)
        at water.NanoHTTPD$HTTPSession.run(NanoHTTPD.java:425)
        at java.lang.Thread.run(Thread.java:744)
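For context on the check at hex.pca.PCA.init above: the log reports 6127 expanded columns in the Gram matrix even though the raw frame has only about 160 columns, because each categorical (factor) column is expanded to roughly one column per level before PCA runs. The R sketch below illustrates that arithmetic with hypothetical numbers chosen so the total matches the 6127 in the log; the column and level counts are made up, not taken from the actual data set.

    ## Hypothetical expansion arithmetic (numbers are illustrative only).
    ## Assumes roughly one expanded column per factor level; H2O may drop a
    ## reference level per factor, which changes the total only slightly.
    n.numeric.cols <- 100                  # plain numeric columns: 1 column each
    factor.levels  <- c(2000, 3000, 1027)  # levels of three high-cardinality factors

    expanded.width <- n.numeric.cols + sum(factor.levels)
    expanded.width                         # 6127 > 5000, so PCA.init rejects the frame

Under that assumption, trimming or re-binning the largest factors is what brings the expanded width back under the limit, which would explain why changing max_pc alone has no effect.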