org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/Users/lyj/Programs/Apache/spark/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
    process()
  File "/Users/lyj/Programs/Apache/spark/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/Users/lyj/Programs/Apache/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "/mypath/classfication.py", line 20, in <lambda>
    getData = splitData.map(lambda line: [labelMap[line[2]], list(jieba.cut(line[6]+line[13]))])
IndexError: list index out of range

Stack Overflow | kiseliu | 10 months ago
  1.

    Why do I have an error of "IndexError: list index out of range" when I do TF-IDF using pyspark.ml.feature?

    Stack Overflow | 10 months ago | kiseliu
  2.

    What is the best way of filter a list based in other list in spark with python?

    Stack Overflow | 10 months ago | user1753235
    org.apache.spark.api.python.PythonException: Traceback (most recent call last):
      File "/data/3/tmp/hadoop-hadoop/nm-local-dir/usercache/user/appcache/application_1468851295159_0020/container_1468851295159_0020_01_000016/pyspark.zip/pyspark/worker.py", line 111, in main
        process()
      File "/data/3/tmp/hadoop-hadoop/nm-local-dir/usercache/user/appcache/application_1468851295159_0020/container_1468851295159_0020_01_000016/pyspark.zip/pyspark/worker.py", line 106, in process
        serializer.dump_stream(func(split_index, iterator), outfile)
      File "/data/3/tmp/hadoop-hadoop/nm-local-dir/usercache/user/appcache/application_1468851295159_0020/container_1468851295159_0020_01_000016/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
        vs = list(itertools.islice(iterator, batch))
      File "/usr/local/spark/python/pyspark/rdd.py", line 1898, in <lambda>
    IndexError: list index out of range
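
Both tracebacks end in an IndexError raised inside a lambda running on a Spark worker. In the first one, the lambda reads line[2], line[6] and line[13], so any row that splits into fewer than 14 fields would trigger it. The sketch below is one possible guard, not the accepted answer from either question: it filters out short rows before indexing. The helper names, the tab delimiter, and the sample rows are assumptions made for illustration; only labelMap, splitData and jieba.cut come from the original traceback.

```python
# Hypothetical helpers: the names and the tab delimiter are assumptions,
# not taken from the original question.

REQUIRED_FIELDS = 14  # the failing lambda reads line[13], so it needs 14 fields


def has_required_fields(fields, required=REQUIRED_FIELDS):
    """Return True when the row is long enough to index fields[13] safely."""
    return len(fields) >= required


def to_example(fields, label_map, tokenize):
    """Build [label, tokens] the same way the failing lambda does."""
    return [label_map[fields[2]], list(tokenize(fields[6] + fields[13]))]


# Plain-Python demonstration with made-up rows (no Spark or jieba needed).
rows = [
    "\t".join(str(i) for i in range(14)),  # well-formed row: 14 fields
    "a\tb\tc",                             # malformed row: only 3 fields
]
split_rows = [row.split("\t") for row in rows]

# Keep only rows that can be indexed up to position 13.
good = [fields for fields in split_rows if has_required_fields(fields)]

label_map = {"2": 0}  # stand-in for labelMap
examples = [to_example(fields, label_map, str.split) for fields in good]

# With an RDD the same guard would sit in front of the map, for example:
#   getData = (splitData
#              .filter(has_required_fields)
#              .map(lambda line: to_example(line, labelMap, jieba.cut)))
```

Counting the rows that fail has_required_fields before dropping them can help confirm whether the input file, the delimiter, or a header line is the real cause.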

    Root Cause Analysis

    1. org.apache.spark.api.python.PythonException

      at org.apache.spark.api.python.PythonRunner$$anon$1.read()
    2. Spark
      InterruptibleIterator.next
      1. org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
      2. org.apache.spark.api.python.PythonRunner$$anon$1.next(PythonRDD.scala:129)
      3. org.apache.spark.api.python.PythonRunner$$anon$1.next(PythonRDD.scala:125)
      4. org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:43)