org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/home/ubuntu/spark/python/lib/pyspark.zip/pyspark/worker.py", line 174, in main process() File "/home/ubuntu/spark/python/lib/pyspark.zip/pyspark/worker.py", line 169, in process serializer.dump_stream(func(split_index, iterator), outfile) File "/home/ubuntu/spark/python/pyspark/rdd.py", line 2407, in pipeline_func return func(split, prev_func(split, iterator)) File "/home/ubuntu/spark/python/pyspark/rdd.py", line 2407, in pipeline_func return func(split, prev_func(split, iterator)) File "/home/ubuntu/spark/python/pyspark/rdd.py", line 2407, in pipeline_func return func(split, prev_func(split, iterator)) File "/home/ubuntu/spark/python/pyspark/rdd.py", line 346, in func return f(iterator) File "/home/ubuntu/spark/python/pyspark/rdd.py", line 1041, in <lambda> return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum() File "/home/ubuntu/spark/python/pyspark/rdd.py", line 1041, in <genexpr> return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum() File "<stdin>", line 9, in <lambda> TypeError: unorderable types: NoneType() < str()

GitHub | md6nguyen | 3 weeks ago
tip
Click on the to mark the solution that helps you, Samebug will learn from it.
As a community member, you’ll be rewarded for you help.
  1. 0

    ch06/extract_airlines.py throws error at airplanes_per_carrier.count()

    GitHub | 3 weeks ago | md6nguyen
    org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/home/ubuntu/spark/python/lib/pyspark.zip/pyspark/worker.py", line 174, in main process() File "/home/ubuntu/spark/python/lib/pyspark.zip/pyspark/worker.py", line 169, in process serializer.dump_stream(func(split_index, iterator), outfile) File "/home/ubuntu/spark/python/pyspark/rdd.py", line 2407, in pipeline_func return func(split, prev_func(split, iterator)) File "/home/ubuntu/spark/python/pyspark/rdd.py", line 2407, in pipeline_func return func(split, prev_func(split, iterator)) File "/home/ubuntu/spark/python/pyspark/rdd.py", line 2407, in pipeline_func return func(split, prev_func(split, iterator)) File "/home/ubuntu/spark/python/pyspark/rdd.py", line 346, in func return f(iterator) File "/home/ubuntu/spark/python/pyspark/rdd.py", line 1041, in <lambda> return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum() File "/home/ubuntu/spark/python/pyspark/rdd.py", line 1041, in <genexpr> return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum() File "<stdin>", line 9, in <lambda> TypeError: unorderable types: NoneType() < str()
  2. 0

    Tips for properly using large broadcast variables?

    Stack Overflow | 9 months ago | captaincapsaicin
    org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main process() File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process serializer.dump_stream(func(split_index, iterator), outfile) File "/usr/lib/spark/python/pyspark/rdd.py", line 2355, in pipeline_func return func(split, prev_func(split, iterator)) File "/usr/lib/spark/python/pyspark/rdd.py", line 2355, in pipeline_func return func(split, prev_func(split, iterator)) File "/usr/lib/spark/python/pyspark/rdd.py", line 317, in func return f(iterator) File "/usr/lib/spark/python/pyspark/rdd.py", line 1006, in <lambda> return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum() File "/usr/lib/spark/python/pyspark/rdd.py", line 1006, in <genexpr> return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum() File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 139, in load_stream yield self._read_with_length(stream) File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 164, in _read_with_length return self.loads(obj) File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 422, in loads return pickle.loads(obj) MemoryError
  3. 0

    Apache-Spark load files from HDFS - BlogoSfera

    co.uk | 2 years ago
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/home/ying/AWS_Tutorial/spark-1.4.0/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main process() File "/home/ying/AWS_Tutorial/spark-1.4.0/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process serializer.dump_stream(func(split_index, iterator), outfile) File "/home/ying/AWS_Tutorial/spark-1.4.0/python/pyspark/rdd.py", line 2318, in pipeline_func return func(split, prev_func(split, iterator)) File "/home/ying/AWS_Tutorial/spark-1.4.0/python/pyspark/rdd.py", line 2318, in pipeline_func return func(split, prev_func(split, iterator)) File "/home/ying/AWS_Tutorial/spark-1.4.0/python/pyspark/rdd.py", line 2318, in pipeline_func return func(split, prev_func(split, iterator)) File "/home/ying/AWS_Tutorial/spark-1.4.0/python/pyspark/rdd.py", line 304, in func return f(iterator) File "/home/ying/AWS_Tutorial/spark-1.4.0/python/pyspark/rdd.py", line 719, in processPartition f(x) File "/home/ying/AWS_Tutorial/spark_codes/sum.py", line 41, in <lambda> temp = datafile.foreach(lambda (path, content): myfunc(str(path).strip('file:'))) File "/home/ying/AWS_Tutorial/spark_codes/sum.py", line 26, in myfunc cr = csv.reader(open(s,"rb")) IOError: [Errno 2] No such file or directory: 'hdfs://localhost:9000/data/test1.csv'
  4. Speed up your debug routine!

    Automated exception search integrated into your IDE

  5. 0

    Apache-Spark load files from HDFS

    Stack Overflow | 2 years ago | Ruofan Kong
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/home/ying/AWS_Tutorial/spark-1.4.0/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main process() File "/home/ying/AWS_Tutorial/spark-1.4.0/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process serializer.dump_stream(func(split_index, iterator), outfile) File "/home/ying/AWS_Tutorial/spark-1.4.0/python/pyspark/rdd.py", line 2318, in pipeline_func return func(split, prev_func(split, iterator)) File "/home/ying/AWS_Tutorial/spark-1.4.0/python/pyspark/rdd.py", line 2318, in pipeline_func return func(split, prev_func(split, iterator)) File "/home/ying/AWS_Tutorial/spark-1.4.0/python/pyspark/rdd.py", line 2318, in pipeline_func return func(split, prev_func(split, iterator)) File "/home/ying/AWS_Tutorial/spark-1.4.0/python/pyspark/rdd.py", line 304, in func return f(iterator) File "/home/ying/AWS_Tutorial/spark-1.4.0/python/pyspark/rdd.py", line 719, in processPartition f(x) File "/home/ying/AWS_Tutorial/spark_codes/sum.py", line 41, in <lambda> temp = datafile.foreach(lambda (path, content): myfunc(str(path).strip('file:'))) File "/home/ying/AWS_Tutorial/spark_codes/sum.py", line 26, in myfunc cr = csv.reader(open(s,"rb")) IOError: [Errno 2] No such file or directory: 'hdfs://localhost:9000/data/test1.csv'
  6. 0

    python - Apache-Spark load files from HDFS - Stack Overflow

    readtiger.com | 2 years ago
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last): File "/home/ying/AWS_Tutorial/spark-1.4.0/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main process() File "/home/ying/AWS_Tutorial/spark-1.4.0/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process serializer.dump_stream(func(split_index, iterator), outfile) File "/home/ying/AWS_Tutorial/spark-1.4.0/python/pyspark/rdd.py", line 2318, in pipeline_func return func(split, prev_func(split, iterator)) File "/home/ying/AWS_Tutorial/spark-1.4.0/python/pyspark/rdd.py", line 2318, in pipeline_func return func(split, prev_func(split, iterator)) File "/home/ying/AWS_Tutorial/spark-1.4.0/python/pyspark/rdd.py", line 2318, in pipeline_func return func(split, prev_func(split, iterator)) File "/home/ying/AWS_Tutorial/spark-1.4.0/python/pyspark/rdd.py", line 304, in func return f(iterator) File "/home/ying/AWS_Tutorial/spark-1.4.0/python/pyspark/rdd.py", line 719, in processPartition f(x) File "/home/ying/AWS_Tutorial/spark_codes/sum.py", line 41, in <lambda> temp = datafile.foreach(lambda (path, content): myfunc(str(path).strip('file:'))) File "/home/ying/AWS_Tutorial/spark_codes/sum.py", line 26, in myfunc cr = csv.reader(open(s,"rb")) IOError: [Errno 2] No such file or directory: 'hdfs://localhost:9000/data/test1.csv'

    Not finding the right solution?
    Take a tour to get the most out of Samebug.

    Tired of useless tips?

    Automated exception search integrated into your IDE

    Root Cause Analysis

    1. org.apache.spark.api.python.PythonException

      Traceback (most recent call last): File "/home/ubuntu/spark/python/lib/pyspark.zip/pyspark/worker.py", line 174, in main process() File "/home/ubuntu/spark/python/lib/pyspark.zip/pyspark/worker.py", line 169, in process serializer.dump_stream(func(split_index, iterator), outfile) File "/home/ubuntu/spark/python/pyspark/rdd.py", line 2407, in pipeline_func return func(split, prev_func(split, iterator)) File "/home/ubuntu/spark/python/pyspark/rdd.py", line 2407, in pipeline_func return func(split, prev_func(split, iterator)) File "/home/ubuntu/spark/python/pyspark/rdd.py", line 2407, in pipeline_func return func(split, prev_func(split, iterator)) File "/home/ubuntu/spark/python/pyspark/rdd.py", line 346, in func return f(iterator) File "/home/ubuntu/spark/python/pyspark/rdd.py", line 1041, in <lambda> return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum() File "/home/ubuntu/spark/python/pyspark/rdd.py", line 1041, in <genexpr> return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum() File "<stdin>", line 9, in <lambda> TypeError: unorderable types: NoneType() < str()

      at org.apache.spark.api.python.PythonRunner$$anon$1.read()
    2. Spark
      Executor$TaskRunner.run
      1. org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
      2. org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
      3. org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
      4. org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
      5. org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
      6. org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
      7. org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
      8. org.apache.spark.scheduler.Task.run(Task.scala:99)
      9. org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
      9 frames
    3. Java RT
      Thread.run
      1. java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      2. java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      3. java.lang.Thread.run(Thread.java:745)
      3 frames