org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/zeppelin-0.6.0-bin-netinst/interpreter/spark/pyspark/pyspark.zip/pyspark/worker.py", line 111, in main
    process()
  File "/opt/zeppelin-0.6.0-bin-netinst/interpreter/spark/pyspark/pyspark.zip/pyspark/worker.py", line 106, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/opt/zeppelin-0.6.0-bin-netinst/interpreter/spark/pyspark/pyspark.zip/pyspark/serializers.py", line 267, in dump_stream
    bytes = self.serializer.dumps(vs)
  File "/opt/zeppelin-0.6.0-bin-netinst/interpreter/spark/pyspark/pyspark.zip/pyspark/serializers.py", line 415, in dumps
    return pickle.dumps(obj, protocol)
PicklingError: Can't pickle <type 'itertools._grouper'>: attribute lookup itertools._grouper failed

Stack Overflow | JiaMing Lin | 4 months ago
  1. Using itertools.groupby in pyspark but fail

     Stack Overflow | 4 months ago | JiaMing Lin
  2. How to join different datasets using pyspark and then call a custom function which takes a pandas dataframe to convert into xml file

     Stack Overflow | 1 year ago | Abhay Sagar
     org.apache.spark.api.python.PythonException: Traceback (most recent call last):
       File "/usr/local/spark/python/lib/pyspark.zip/pyspark/worker.py", line 98, in main
         command = pickleSer._read_with_length(infile)
       File "/usr/local/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 164, in _read_with_length
         return self.loads(obj)
       File "/usr/local/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 422, in loads
         return pickle.loads(obj)
     ImportError: No module named my_func
       at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
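This `ImportError` arises on the executor side, while unpickling the driver's function: plain pickle stores a module-level function *by reference* (defining module plus name, never the code), so a module that exists only on the driver cannot be resolved on the workers. A small stdlib sketch makes the mechanism visible (`my_udf` here is a hypothetical stand-in for a function defined in a driver-only `my_func.py`):

```python
import pickle

def my_udf(x):
    # stand-in for a function living in a driver-only module such as my_func.py
    return x * 2

# pickle records only a reference -- the defining module and the function
# name -- so the unpickling process must be able to import that module,
# otherwise it raises ImportError, as on the Spark workers above.
payload = pickle.dumps(my_udf)
contains_name = b"my_udf" in payload
```

The usual remedies are `sc.addPyFile("my_func.py")` before defining the RDD operations, or shipping the file at submit time with `spark-submit --py-files my_func.py`, so every executor can import the module.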
  3. pyspark: "too many values" error after repartitioning

     Stack Overflow | 1 year ago | user1836155
     org.apache.spark.api.python.PythonException: Traceback (most recent call last):
       File "spark-1.5.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
         process()
       File "spark-1.5.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
         serializer.dump_stream(func(split_index, iterator), outfile)
       File "spark-1.5.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/serializers.py", line 133, in dump_stream
         for obj in iterator:
       File "spark-1.5.1-bin-hadoop2.6/python/pyspark/rdd.py", line 1703, in add_shuffle_key
         for k, v in iterator:
     ValueError: too many values to unpack
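The failing frame is telling: `add_shuffle_key` iterates with `for k, v in iterator`, so every element reaching a shuffle operation such as `partitionBy` or `reduceByKey` must be an exact 2-tuple of `(key, value)`. A plain-Python sketch (no Spark needed) reproduces the unpack failure on wider tuples and shows the re-keying fix:

```python
# PySpark's shuffle path does `for k, v in iterator`, so elements handed to
# partitionBy/reduceByKey must be 2-tuples. Wider records break the unpack.
rows = [("a", 1, "x"), ("b", 2, "y")]  # hypothetical 3-column records

def unpacks_as_pairs(iterator):
    try:
        for k, v in iterator:
            pass
        return True
    except ValueError:  # "too many values to unpack"
        return False

before = unpacks_as_pairs(iter(rows))
# Fix: re-key the records into (key, value) pairs before the shuffle,
# e.g. rdd.map(lambda r: (r[0], r[1:])) in the Spark job.
pairs = [(r[0], r[1:]) for r in rows]
after = unpacks_as_pairs(iter(pairs))
```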
  5. Error in using global variables in spark project

     Stack Overflow | 8 months ago | sammy
     org.apache.spark.api.python.PythonException: Traceback (most recent call last):
       File "/usr/local/src/spark/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
         process()
       File "/usr/local/src/spark/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
         serializer.dump_stream(func(split_index, iterator), outfile)
       File "/usr/local/src/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
         vs = list(itertools.islice(iterator, batch))
       File "utils.py", line 6, in returnIfTrue
         if row[1] in settings.ageList:
     AttributeError: 'module' object has no attribute 'ageList'
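Here a module attribute (`settings.ageList`) was assigned at runtime on the driver, but each executor imports its own pristine copy of `settings`, so the attribute does not exist there. The data has to travel with the function instead, via its closure (or `sc.broadcast()` for large values). A hedged stdlib sketch of the closure approach, with `age_list` as a hypothetical stand-in for `settings.ageList`:

```python
# A module attribute set at runtime on the driver never reaches the
# executors; capture the data in the function's closure so it is pickled
# and shipped together with the function.
age_list = [18, 21, 65]  # hypothetical stand-in for settings.ageList

def make_filter(ages):
    def return_if_true(row):
        return row[1] in ages  # `ages` travels inside the pickled closure
    return return_if_true

keep = make_filter(age_list)
matched = keep(("bob", 21))
missed = keep(("amy", 30))
```

In PySpark the equivalent for non-trivial data is `ages_bc = sc.broadcast(age_list)` on the driver and `ages_bc.value` inside the function, which ships the list once per executor rather than once per task.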
  6. Python Spark submit job on cluster issues

     Stack Overflow | 6 months ago | FLFLFLFL
     org.apache.spark.api.python.PythonException: Traceback (most recent call last):
       File "/home/ubuntu/spark-1.6.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 98, in main
         command = pickleSer._read_with_length(infile)
       File "/home/ubuntu/spark-1.6.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/serializers.py", line 164, in _read_with_length
         return self.loads(obj)
       File "/home/ubuntu/spark-1.6.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/serializers.py", line 422, in loads
         return pickle.loads(obj)
       File "/home/ubuntu/spark-1.6.1-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 653, in subimport
         __import__(name)
       File "./CDE_Spark.zip/sympathy/__init__.py", line 36, in <module>
       File "./CDE_Spark.zip/sympathy/types/__init__.py", line 26, in <module>
       File "./CDE_Spark.zip/sympathy/types/datapointer.py", line 27, in <module>
       File "./CDE_Spark.zip/sympathy/types/types.py", line 28, in <module>
     ImportError: ('No module named ply.lex', <function subimport at 0x7f540a1369b0>, ('sympathy.typeutils.adaf',))
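The shipped `CDE_Spark.zip` code imports fine until it reaches a third-party dependency (`ply.lex`) that is not installed on the worker nodes. A configuration sketch of the two usual remedies; the names `app.py` and `deps.zip` are placeholders, not from the original question:

```shell
# Option 1: install the missing dependency on every worker node
# (e.g. via your cluster's provisioning tooling):
pip install ply

# Option 2: bundle pure-Python dependencies into an archive and ship it
# with the job so executors can import them:
spark-submit \
  --master spark://master:7077 \
  --py-files deps.zip \
  app.py
```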


    Root Cause Analysis

    1. org.apache.spark.api.python.PythonException

      PicklingError: Can't pickle <type 'itertools._grouper'>: attribute lookup itertools._grouper failed

      at org.apache.spark.api.python.PythonRunner$$anon$1.read()
    2. Spark
      Executor$TaskRunner.run
      1. org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
      2. org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:207)
      3. org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
      4. org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
      5. org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
      6. org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
      7. org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
      8. org.apache.spark.scheduler.Task.run(Task.scala:89)
      9. org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
      9 frames
    3. Java RT
      ThreadPoolExecutor$Worker.run
      1. java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      2. java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      2 frames