org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in stage 2.0 (TID 16, spark-w-0.c.clean-feat-131014.internal): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/usr/lib/spark/python/pyspark/worker.py", line 98, in main
    command = pickleSer._read_with_length(infile)
  File "/usr/lib/spark/python/pyspark/serializers.py", line 164, in _read_with_length
    return self.loads(obj)
  File "/usr/lib/spark/python/pyspark/serializers.py", line 422, in loads
    return pickle.loads(obj)
ImportError: No module named nltk.tokenize


    Unable to load NLTK in spark using PySpark

    Data Science | 1 year ago | krishna Prasad

    Root Cause Analysis

    1. org.apache.spark.scheduler.TaskSetManager

      at org.apache.spark.api.python.PythonRunner$$anon$1.read()
    2. Spark
      Executor$TaskRunner.run
      1. org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
      2. org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:207)
      3. org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
      4. org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
      5. org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
      6. org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
      7. org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
      8. org.apache.spark.scheduler.Task.run(Task.scala:89)
      9. org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    3. Java RT
      Thread.run
      1. java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      2. java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      3. java.lang.Thread.run(Thread.java:745)
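
The ImportError in the trace above is raised inside pyspark/worker.py while the task's function is being unpickled, so it comes from the Python interpreter on the worker node, not from the driver. A likely reading, though the trace alone does not prove it, is that nltk is not installed for the Python that the executors run. The following is a minimal sketch for checking that. The helper names can_import and probe_partition are hypothetical and are not part of Spark or NLTK.

```python
import importlib
import socket


def can_import(module_name):
    """Return True if module_name can be imported in the current Python process."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


def probe_partition(_rows, module_name="nltk.tokenize"):
    """Yield one (hostname, importable) pair for the process handling this partition.

    The rows of the partition are ignored; only the environment is inspected.
    """
    yield (socket.gethostname(), can_import(module_name))


# Hypothetical usage from the driver script, assuming an existing SparkContext
# named `sc` (the partition count of 20 is arbitrary):
#
#   sc.parallelize(range(100), 20).mapPartitions(probe_partition).distinct().collect()
#
# Any host reported with False lacks the module for the executors' Python.
```

If a worker host reports False, installing the package on that node, for the same Python interpreter the executors use, is the usual remedy.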