org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/ubuntu/spark/python/lib/pyspark.zip/pyspark/worker.py", line 174, in main
    process()
  File "/home/ubuntu/spark/python/lib/pyspark.zip/pyspark/worker.py", line 169, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/home/ubuntu/spark/python/pyspark/rdd.py", line 2407, in pipeline_func
    return func(split, prev_func(split, iterator))
  File "/home/ubuntu/spark/python/pyspark/rdd.py", line 2407, in pipeline_func
    return func(split, prev_func(split, iterator))
  File "/home/ubuntu/spark/python/pyspark/rdd.py", line 2407, in pipeline_func
    return func(split, prev_func(split, iterator))
  File "/home/ubuntu/spark/python/pyspark/rdd.py", line 346, in func
    return f(iterator)
  File "/home/ubuntu/spark/python/pyspark/rdd.py", line 1041, in <lambda>
    return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
  File "/home/ubuntu/spark/python/pyspark/rdd.py", line 1041, in <genexpr>
    return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
  File "<stdin>", line 9, in <lambda>
TypeError: unorderable types: NoneType() < str()

GitHub | md6nguyen | 1 month ago
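
The TypeError at the bottom of this trace is Python 3 behaviour: None cannot be ordered against a str, so an expression such as None < 'AA' raises where Python 2 would have silently returned a result. The failing comparison happens in the lambda on line 9 of the interactive session, which points at a sort (or min/max) over a field that is None for some records. Below is a minimal sketch of the failure and one possible fix; the names on_time_df, Carrier and TailNum are assumptions for illustration, not taken from the trace:

    # Reproduction of the error: ordering None against str raises TypeError
    # (the exact message wording varies by Python 3 version).
    sorted(['N102UW', None, 'N103US'])
    # TypeError: unorderable types: NoneType() < str()

    # Hypothetical PySpark fix: drop records whose key is None before any
    # operation that has to compare keys (sortBy, sorted, min, max, ...).
    airplanes_per_carrier = (
        on_time_df.rdd
        .map(lambda row: (row.Carrier, row.TailNum))
        .filter(lambda pair: pair[1] is not None)  # discard unusable keys
        .distinct()
        .groupByKey()
        .map(lambda kv: (kv[0], sorted(kv[1])))    # safe: no None left
    )
    print(airplanes_per_carrier.count())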
Your exception is missing from the Samebug knowledge base.
Here are the best solutions we found on the Internet.
  1.

    ch06/extract_airlines.py throws error at airplanes_per_carrier.count()

    GitHub | 1 month ago | md6nguyen
    (stack trace identical to the one at the top of this page)
  2.

    Tips for properly using large broadcast variables?

    Stack Overflow | 10 months ago | captaincapsaicin
    org.apache.spark.api.python.PythonException: Traceback (most recent call last):
      File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
        process()
      File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
        serializer.dump_stream(func(split_index, iterator), outfile)
      File "/usr/lib/spark/python/pyspark/rdd.py", line 2355, in pipeline_func
        return func(split, prev_func(split, iterator))
      File "/usr/lib/spark/python/pyspark/rdd.py", line 2355, in pipeline_func
        return func(split, prev_func(split, iterator))
      File "/usr/lib/spark/python/pyspark/rdd.py", line 317, in func
        return f(iterator)
      File "/usr/lib/spark/python/pyspark/rdd.py", line 1006, in <lambda>
        return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
      File "/usr/lib/spark/python/pyspark/rdd.py", line 1006, in <genexpr>
        return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
      File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 139, in load_stream
        yield self._read_with_length(stream)
      File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 164, in _read_with_length
        return self.loads(obj)
      File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 422, in loads
        return pickle.loads(obj)
    MemoryError
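
    Here the Python worker dies inside pickle.loads while rebuilding the broadcast value, which means the broadcast object itself no longer fits in the worker's memory. A minimal sketch of the usual broadcast pattern follows, with invented names; if a correctly-used broadcast still does not fit, the remedy is more executor memory (or replacing the lookup dict with a join), not a different API call:

        from pyspark import SparkContext

        sc = SparkContext(appName="broadcast-sketch")

        # Build the lookup once on the driver; sc.broadcast ships it to each
        # executor a single time instead of once per task closure.
        lookup = {i: i * i for i in range(100000)}  # stand-in for a big dict
        bc = sc.broadcast(lookup)

        # Reference bc.value inside tasks; capturing `lookup` directly would
        # pickle the whole dict into every task.
        total = sc.parallelize(range(100000)).map(lambda k: bc.value[k]).sum()

        bc.unpersist()  # release executor copies when no longer needed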
  3.

    Apache-Spark load files from HDFS - BlogoSfera

    co.uk | 2 years ago
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
      File "/home/ying/AWS_Tutorial/spark-1.4.0/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
        process()
      File "/home/ying/AWS_Tutorial/spark-1.4.0/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
        serializer.dump_stream(func(split_index, iterator), outfile)
      File "/home/ying/AWS_Tutorial/spark-1.4.0/python/pyspark/rdd.py", line 2318, in pipeline_func
        return func(split, prev_func(split, iterator))
      File "/home/ying/AWS_Tutorial/spark-1.4.0/python/pyspark/rdd.py", line 2318, in pipeline_func
        return func(split, prev_func(split, iterator))
      File "/home/ying/AWS_Tutorial/spark-1.4.0/python/pyspark/rdd.py", line 2318, in pipeline_func
        return func(split, prev_func(split, iterator))
      File "/home/ying/AWS_Tutorial/spark-1.4.0/python/pyspark/rdd.py", line 304, in func
        return f(iterator)
      File "/home/ying/AWS_Tutorial/spark-1.4.0/python/pyspark/rdd.py", line 719, in processPartition
        f(x)
      File "/home/ying/AWS_Tutorial/spark_codes/sum.py", line 41, in <lambda>
        temp = datafile.foreach(lambda (path, content): myfunc(str(path).strip('file:')))
      File "/home/ying/AWS_Tutorial/spark_codes/sum.py", line 26, in myfunc
        cr = csv.reader(open(s,"rb"))
    IOError: [Errno 2] No such file or directory: 'hdfs://localhost:9000/data/test1.csv'
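
    The last two frames are the real problem: Python's built-in open() only reads the local filesystem, so handing it an 'hdfs://...' URI always fails with Errno 2. With sc.wholeTextFiles the file body already arrives as the second element of each (path, content) pair, so it can be parsed directly instead of reopening the path. A rough sketch under that assumption (it presumes an existing SparkContext sc and is not the original sum.py):

        import csv
        import io

        # Each element is (path, content); content is the whole file, already
        # read from HDFS by Spark, so there is nothing to open() again.
        datafile = sc.wholeTextFiles("hdfs://localhost:9000/data")

        def parse_csv(path_and_content):
            path, content = path_and_content
            return list(csv.reader(io.StringIO(content)))

        rows = datafile.flatMap(parse_csv)
        print(rows.count())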
  4.

    Apache-Spark load files from HDFS

    Stack Overflow | 2 years ago | Ruofan Kong
    (stack trace identical to the one in entry 3)
  5.

    python - Apache-Spark load files from HDFS - Stack Overflow

    readtiger.com | 2 years ago
    (stack trace identical to the one in entry 3)



    Root Cause Analysis

    1. org.apache.spark.api.python.PythonException

      TypeError: unorderable types: NoneType() < str()
      (full Python traceback shown at the top of this page)

      at org.apache.spark.api.python.PythonRunner$$anon$1.read()
    2. Spark
      Executor$TaskRunner.run
      1. org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
      2. org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
      3. org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
      4. org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
      5. org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
      6. org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
      7. org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
      8. org.apache.spark.scheduler.Task.run(Task.scala:99)
      9. org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
      9 frames
    3. Java RT
      Thread.run
      1. java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      2. java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      3. java.lang.Thread.run(Thread.java:745)
      3 frames
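
    The JVM frames above merely show PythonRunner re-raising what the Python worker reported, so the fix belongs on the Python side. Besides filtering None keys out (as sketched at the top of this page), a common alternative is a None-safe sort key that orders missing values last; a small standalone example:

        # The (is None, value) tuple orders real values first and None last,
        # without ever comparing None against a str.
        tail_numbers = ['N102UW', None, 'N103US', None]
        print(sorted(tail_numbers, key=lambda t: (t is None, t)))
        # ['N102UW', 'N103US', None, None]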