py4j.Py4JException: Method __getnewargs__([]) does not exist

newtips.co | 4 months ago
  1. Can I use Spark DataFrame inside regular Spark map operation?

    Stack Overflow | 1 year ago | Igor Sokolov
    py4j.Py4JException: Method __getnewargs__([]) does not exist
  2. Count number of non-NaN entries in each column of Spark dataframe with Pyspark

    Stack Overflow | 1 year ago | RKD314
    py4j.Py4JException: Method __getnewargs__([]) does not exist
  3. Spark __getnewargs__ error

    Stack Overflow | 11 months ago | Paul
    py4j.Py4JException: Method __getnewargs__([]) does not exist
  4. spark: use lookup in map

    Stack Overflow | 8 months ago | user3827333
    py4j.Py4JException: Method __getnewargs__([]) does not exist
  5. Spark does not support nested RDDs or performing Spark actions inside of transformations; this usually leads to NullPointerExceptions (see SPARK-718 as one example). The confusing NPE is one of the most common sources of Spark questions on StackOverflow:

    - https://stackoverflow.com/questions/13770218/call-of-distinct-and-map-together-throws-npe-in-spark-library/14130534#14130534
    - https://stackoverflow.com/questions/23793117/nullpointerexception-in-scala-spark-appears-to-be-caused-be-collection-type/23793399#23793399
    - https://stackoverflow.com/questions/25997558/graphx-ive-got-nullpointerexception-inside-mapvertices/26003674#26003674

    (those are just a sample of the ones that I've answered personally; there are many others).

    I think we can detect these errors by adding logic to {{RDD}} to check whether {{sc}} is null (e.g. turn {{sc}} into a getter function); we can use this to add a better error message.

    In PySpark, these errors manifest themselves slightly differently. Attempting to nest RDDs or perform actions inside of transformations results in pickle-time errors:

    {code}
    rdd1 = sc.parallelize(range(100))
    rdd2 = sc.parallelize(range(100))
    rdd1.mapPartitions(lambda x: [rdd2.map(lambda x: x)])
    {code}

    produces

    {code}
    [...]
      File "/Users/joshrosen/anaconda/lib/python2.7/pickle.py", line 306, in save
        rv = reduce(self.proto)
      File "/Users/joshrosen/Documents/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
      File "/Users/joshrosen/Documents/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 304, in get_return_value
    py4j.protocol.Py4JError: An error occurred while calling o21.__getnewargs__. Trace:
    py4j.Py4JException: Method __getnewargs__([]) does not exist
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
        at py4j.Gateway.invoke(Gateway.java:252)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:207)
        at java.lang.Thread.run(Thread.java:745)
    {code}

    We get the same error when attempting to broadcast an RDD in PySpark. For Python, improved error reporting could be as simple as overriding the {{__getnewargs__}} method to throw a more useful UnsupportedOperation exception with a more helpful error message.

    Users may also see confusing NPEs when calling methods on stopped SparkContexts, so I've added checks for that as well.

    Apache's JIRA Issue Tracker | 2 years ago | Josh Rosen
    py4j.Py4JException: Method __getnewargs__([]) does not exist
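
The JIRA description above proposes overriding `__getnewargs__` so that pickling fails with a readable message. The following is a minimal, Spark-free sketch of that idea, not the actual PySpark implementation: `DriverOnlyHandle` and `try_pickle` are hypothetical names introduced only for illustration, and the error text is made up. It relies on the standard `pickle` behavior of calling `__getnewargs__` when serializing an instance with protocol 2 or higher.

```python
import pickle


class DriverOnlyHandle:
    """Hypothetical stand-in for an object that is only valid on the driver."""

    def __init__(self, name):
        self.name = name

    def __getnewargs__(self):
        # pickle (protocol 2+) calls this hook while serializing the instance,
        # so raising here turns an obscure failure into an explicit message.
        raise RuntimeError(
            "%s cannot be pickled: it looks like it is being referenced "
            "from inside a transformation or broadcast" % self.name
        )


def try_pickle(obj):
    """Return the error message if pickling is refused, otherwise None."""
    try:
        pickle.dumps(obj, protocol=2)
    except RuntimeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(try_pickle(DriverOnlyHandle("rdd2")))  # the explicit message
    print(try_pickle([1, 2, 3]))                 # ordinary data still pickles
```

In this sketch the check fires at serialization time, which is the same point at which the `Method __getnewargs__([]) does not exist` error is reported in the trace above, so the user would see the clearer message in place of the Py4J one.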


    Root Cause Analysis

    1. py4j.Py4JException

      Method __getnewargs__([]) does not exist

      at py4j.reflection.ReflectionEngine.getMethod()
    2. Py4J
      GatewayConnection.run
      1. py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:333)
      2. py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:342)
      3. py4j.Gateway.invoke(Gateway.java:252)
      4. py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
      5. py4j.commands.CallCommand.execute(CallCommand.java:79)
      6. py4j.GatewayConnection.run(GatewayConnection.java:207)
      6 frames
    3. Java RT
      Thread.run
      1. java.lang.Thread.run(Thread.java:745)
      1 frame