org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/data/3/tmp/hadoop-hadoop/nm-local-dir/usercache/user/appcache/application_1468851295159_0020/container_1468851295159_0020_01_000016/pyspark.zip/pyspark/worker.py", line 111, in main
    process()
  File "/data/3/tmp/hadoop-hadoop/nm-local-dir/usercache/user/appcache/application_1468851295159_0020/container_1468851295159_0020_01_000016/pyspark.zip/pyspark/worker.py", line 106, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/data/3/tmp/hadoop-hadoop/nm-local-dir/usercache/user/appcache/application_1468851295159_0020/container_1468851295159_0020_01_000016/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "/usr/local/spark/python/pyspark/rdd.py", line 1898, in <lambda>
IndexError: list index out of range

Stack Overflow | user1753235 | 4 months ago
  1.

    What is the best way to filter a list based on another list in Spark with Python?

    Stack Overflow | 4 months ago | user1753235
    (identical stack trace to the one at the top of the page)
  2.

    Why do I get "IndexError: list index out of range" when doing TF-IDF with pyspark.ml.feature?

    Stack Overflow | 4 months ago | kiseliu
    org.apache.spark.api.python.PythonException: Traceback (most recent call last):
      File "/Users/lyj/Programs/Apache/spark/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
        process()
      File "/Users/lyj/Programs/Apache/spark/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
        serializer.dump_stream(func(split_index, iterator), outfile)
      File "/Users/lyj/Programs/Apache/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
        vs = list(itertools.islice(iterator, batch))
      File "/mypath/classfication.py", line 20, in <lambda>
        getData = splitData.map(lambda line: [labelMap[line[2]], list(jieba.cut(line[6]+line[13]))])
    IndexError: list index out of range
  3.

    Support Python >= 3.3 in Dataproc (ensure spark-submit shell has access to global env vars)

    GitHub | 11 months ago | nehalecky
    org.apache.spark.api.python.PythonException: Traceback (most recent call last):
      File "/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1453072376140_0005/container_1453072376140_0005_01_000002/pyspark.zip/pyspark/worker.py", line 111, in main
        process()
      File "/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1453072376140_0005/container_1453072376140_0005_01_000002/pyspark.zip/pyspark/worker.py", line 106, in process
        serializer.dump_stream(func(split_index, iterator), outfile)
      File "/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1453072376140_0005/container_1453072376140_0005_01_000002/pyspark.zip/pyspark/serializers.py", line 133, in dump_stream
        for obj in iterator:
      File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1704, in add_shuffle_key
      File "/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1453072376140_0005/container_1453072376140_0005_01_000002/pyspark.zip/pyspark/rdd.py", line 74, in portable_hash
        raise Exception("Randomness of hash of string should be disabled via PYTHONHASHSEED")
    Exception: Randomness of hash of string should be disabled via PYTHONHASHSEED
  4.

    GitHub comment 18#171815606

    GitHub | 11 months ago | dennishuo
    org.apache.spark.api.python.PythonException: Traceback (most recent call last):
      File "/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1452810606380_0004/container_1452810606380_0004_01_000002/pyspark.zip/pyspark/worker.py", line 111, in main
        process()
      File "/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1452810606380_0004/container_1452810606380_0004_01_000002/pyspark.zip/pyspark/worker.py", line 106, in process
        serializer.dump_stream(func(split_index, iterator), outfile)
      File "/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1452810606380_0004/container_1452810606380_0004_01_000002/pyspark.zip/pyspark/serializers.py", line 133, in dump_stream
        for obj in iterator:
      File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1704, in add_shuffle_key
      File "/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1452810606380_0004/container_1452810606380_0004_01_000002/pyspark.zip/pyspark/rdd.py", line 74, in portable_hash
        raise Exception("Randomness of hash of string should be disabled via PYTHONHASHSEED")
    Exception: Randomness of hash of string should be disabled via PYTHONHASHSEED
  5.

    Error using global variables in a Spark project

    Stack Overflow | 8 months ago | sammy
    org.apache.spark.api.python.PythonException: Traceback (most recent call last):
      File "/usr/local/src/spark/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
        process()
      File "/usr/local/src/spark/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
        serializer.dump_stream(func(split_index, iterator), outfile)
      File "/usr/local/src/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
        vs = list(itertools.islice(iterator, batch))
      File "utils.py", line 6, in returnIfTrue
        if row[1] in settings.ageList:
    AttributeError: 'module' object has no attribute 'ageList'
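    The two PYTHONHASHSEED failures above share one cause: from Python 3.3 on, string hashing is salted per process, so a keyed shuffle would partition the same key to different places on different executors. PySpark refuses to run in that state. A minimal sketch of the guard and the usual fix (the `spark.executorEnv.PYTHONHASHSEED` config and `spark-env.sh` routes are assumptions about your deployment; the point is that the variable must reach every executor before its interpreter starts):

    ```python
    import os

    # Sketch of PySpark's portable_hash guard: partitioning by string keys
    # requires every worker process to hash identically, which Python 3.3+'s
    # per-process hash salting breaks unless PYTHONHASHSEED is pinned.
    def portable_hash_guard():
        if os.environ.get("PYTHONHASHSEED") is None:
            raise Exception(
                "Randomness of hash of string should be disabled via PYTHONHASHSEED")

    # The usual fix: pin the seed, and make sure it reaches every executor
    # (e.g. via spark-env.sh or spark.executorEnv.PYTHONHASHSEED).
    os.environ["PYTHONHASHSEED"] = "0"
    portable_hash_guard()  # no longer raises
    ```

    Setting the variable only in the driver shell is not enough; that is why the Dataproc issue above is about making `spark-submit` propagate global env vars.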
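    The last trace is different: each executor re-imports the `settings` module from scratch, so an attribute assigned at runtime on the driver (`settings.ageList = [...]`) never exists on the workers. A minimal sketch of the common workaround, capturing the value in the function so it is pickled and shipped with the closure (names and data here are hypothetical; for large lists a broadcast variable serves the same purpose):

    ```python
    # Driver-side value; a plain local instead of a module attribute.
    age_list = frozenset({18, 21, 65})

    def return_if_true(row, ages=age_list):
        # `ages` travels with the function, so workers see the same value.
        return row if row[1] in ages else None

    rows = [("alice", 21), ("bob", 40)]
    # Stand-in for rdd.filter(returnIfTrue) on a real cluster:
    kept = [r for r in rows if return_if_true(r) is not None]
    ```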


    Root Cause Analysis

    1. org.apache.spark.api.python.PythonException

      Traceback (most recent call last):
        File "/data/3/tmp/hadoop-hadoop/nm-local-dir/usercache/user/appcache/application_1468851295159_0020/container_1468851295159_0020_01_000016/pyspark.zip/pyspark/worker.py", line 111, in main
          process()
        File "/data/3/tmp/hadoop-hadoop/nm-local-dir/usercache/user/appcache/application_1468851295159_0020/container_1468851295159_0020_01_000016/pyspark.zip/pyspark/worker.py", line 106, in process
          serializer.dump_stream(func(split_index, iterator), outfile)
        File "/data/3/tmp/hadoop-hadoop/nm-local-dir/usercache/user/appcache/application_1468851295159_0020/container_1468851295159_0020_01_000016/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
          vs = list(itertools.islice(iterator, batch))
        File "/usr/local/spark/python/pyspark/rdd.py", line 1898, in <lambda>
      IndexError: list index out of range

      at org.apache.spark.api.python.PythonRunner$$anon$1.read()
    2. Spark
      InterruptibleIterator.next
      1. org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
      2. org.apache.spark.api.python.PythonRunner$$anon$1.next(PythonRDD.scala:129)
      3. org.apache.spark.api.python.PythonRunner$$anon$1.next(PythonRDD.scala:125)
      4. org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)
      4 frames
    3. Scala
      Iterator$class.foreach
      1. scala.collection.Iterator$class.foreach(Iterator.scala:727)
      1 frame
    4. Spark
      PythonRunner$WriterThread.run
      1. org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
      2. org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:452)
      3. org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:280)
      4. org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1765)
      5. org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:239)
      5 frames
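
    The root cause traced above is not a Spark bug: the IndexError is raised inside the user lambda when a malformed or short input row is indexed past its length (compare the second example above, where `line[6]+line[13]` requires at least 14 fields). A minimal driver-side sketch of the defensive pattern, with hypothetical field positions matching that example:

    ```python
    # Rows parsed from a text file can be shorter than expected (blank or
    # malformed lines), so indexing row[6] or row[13] blindly raises
    # IndexError inside the executor, far from the code that built the RDD.
    def safe_field(row, idx, default=""):
        # Return the field if present, otherwise a default instead of raising.
        return row[idx] if idx < len(row) else default

    rows = [list("abcdefghijklmn"), ["x", "y"]]  # second row is malformed
    valid = [r for r in rows if len(r) > 13]     # rdd.filter(lambda r: len(r) > 13)
    texts = [r[6] + r[13] for r in valid]        # safe: every row has 14+ fields
    ```

    Filtering on `len(r)` before indexing (or mapping through something like `safe_field`) keeps a handful of bad input rows from killing the whole task.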