org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 123, in main
    ("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 3.4 than that in driver 3.5, PySpark cannot run with different minor versions

Stack Overflow | user1780424 | 4 months ago
  1. Spark Exception: Python in worker has different version 3.4 than that in driver 3.5

     Stack Overflow | 4 months ago | user1780424
     (same traceback as shown at the top of this page)
  2. GitHub comment 3#252594408

     GitHub | 2 months ago | ssallys
     org.apache.spark.api.python.PythonException: Traceback (most recent call last):
       File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py", line 172, in main
         process()
       File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py", line 167, in process
         serializer.dump_stream(func(split_index, iterator), outfile)
       File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/rdd.py", line 2371, in pipeline_func
       File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/rdd.py", line 2371, in pipeline_func
       File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/rdd.py", line 2371, in pipeline_func
       File "build/bdist.linux-x86_64/egg/tensorspark/core/spark_session.py", line 177, in _spark_run_fn
       File "build/bdist.linux-x86_64/egg/tensorspark/core/session_worker.py", line 34, in run
         self._run_fn(splitIndex, partition, self._param_bc.value)
       File "build/bdist.linux-x86_64/egg/tensorspark/core/session_worker.py", line 68, in _run_fn
         sutil.restore_session_hdfs(sess, user, session_path, session_meta_path, tmp_local_dir, host, port)
       File "build/bdist.linux-x86_64/egg/tensorspark/core/session_util.py", line 81, in restore_session_hdfs
         saver = tf.train.import_meta_graph(local_meta_path)
       File "/home/etri/anaconda3/envs/tensorflow2.7/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1458, in import_meta_graph
         return _import_meta_graph_def(read_meta_graph_file(meta_graph_or_file))
       File "/home/etri/anaconda3/envs/tensorflow2.7/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1310, in read_meta_graph_file
         raise IOError("File %s does not exist." % filename)
     IOError: File /tmp/session_mnist_try_1476098552130.meta does not exist.
  3. IOError when run the script in spark standalone cluster mode

     GitHub | 2 months ago | ssallys
     org.apache.spark.api.python.PythonException: Traceback (most recent call last):
       File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py", line 172, in main
         process()
       File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py", line 167, in process
         serializer.dump_stream(func(split_index, iterator), outfile)
       File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/pyspark/rdd.py", line 2371, in pipeline_func
         return func(split, prev_func(split, iterator))
       File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/pyspark/rdd.py", line 2371, in pipeline_func
         return func(split, prev_func(split, iterator))
       File "/usr/local/spark-2.0.0-bin-hadoop2.7/python/pyspark/rdd.py", line 2371, in pipeline_func
         return func(split, prev_func(split, iterator))
       File "tensorspark/core/spark_session.py", line 177, in _spark_run_fn
       File "build/bdist.linux-x86_64/egg/tensorspark/core/session_worker.py", line 34, in run
         self._run_fn(splitIndex, partition, self._param_bc.value)
       File "build/bdist.linux-x86_64/egg/tensorspark/core/session_worker.py", line 68, in _run_fn
         sutil.restore_session_hdfs(sess, user, session_path, session_meta_path, tmp_local_dir, host, port)
       File "build/bdist.linux-x86_64/egg/tensorspark/core/session_util.py", line 81, in restore_session_hdfs
         saver = tf.train.import_meta_graph(local_meta_path)
       File "/home/etri/anaconda3/envs/tensorflow2.7/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1458, in import_meta_graph
         return _import_meta_graph_def(read_meta_graph_file(meta_graph_or_file))
       File "/home/etri/anaconda3/envs/tensorflow2.7/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1310, in read_meta_graph_file
         raise IOError("File %s does not exist." % filename)
     IOError: File /tmp/session_mnist_try_1475675085625.meta does not exist.
     (entries 2 and 3 hit the same missing-file problem; see the sketch after this list)

  4. GitHub comment 902#264089144

     GitHub | 1 week ago | mooperd
     org.apache.spark.api.python.PythonException: Traceback (most recent call last):
       File "/hadoop/yarn/local/usercache/centos/appcache/application_1480271222291_0049/container_1480271222291_0049_01_000012/pyspark.zip/pyspark/worker.py", line 172, in main
         process()
       File "/hadoop/yarn/local/usercache/centos/appcache/application_1480271222291_0049/container_1480271222291_0049_01_000012/pyspark.zip/pyspark/worker.py", line 167, in process
         serializer.dump_stream(func(split_index, iterator), outfile)
       File "/hadoop/yarn/local/usercache/centos/appcache/application_1480271222291_0049/container_1480271222291_0049_01_000012/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
         vs = list(itertools.islice(iterator, batch))
       File "/usr/hdp/2.5.0.0-1245/spark2/python/lib/pyspark.zip/pyspark/rdd.py", line 1306, in takeUpToNumLeft
       File "/home/centos/fun-functions/spark-parrallel-read-from-s3/tick.py", line 38, in distributedJsonRead
       File "/usr/lib/python2.7/site-packages/boto3/resources/factory.py", line 520, in do_action
         response = action(self, *args, **kwargs)
       File "/usr/lib/python2.7/site-packages/boto3/resources/action.py", line 83, in __call__
         response = getattr(parent.meta.client, operation_name)(**params)
       File "/usr/lib/python2.7/site-packages/botocore/client.py", line 251, in _api_call
         return self._make_api_call(operation_name, kwargs)
       File "/usr/lib/python2.7/site-packages/botocore/client.py", line 526, in _make_api_call
         operation_model, request_dict)
       File "/usr/lib/python2.7/site-packages/botocore/endpoint.py", line 141, in make_request
         return self._send_request(request_dict, operation_model)
       File "/usr/lib/python2.7/site-packages/botocore/endpoint.py", line 166, in _send_request
         request = self.create_request(request_dict, operation_model)
       File "/usr/lib/python2.7/site-packages/botocore/endpoint.py", line 150, in create_request
         operation_name=operation_model.name)
       File "/usr/lib/python2.7/site-packages/botocore/hooks.py", line 227, in emit
         return self._emit(event_name, kwargs)
       File "/usr/lib/python2.7/site-packages/botocore/hooks.py", line 210, in _emit
         response = handler(**kwargs)
       File "/usr/lib/python2.7/site-packages/botocore/signers.py", line 90, in handler
         return self.sign(operation_name, request)
       File "/usr/lib/python2.7/site-packages/botocore/signers.py", line 147, in sign
         auth.add_auth(request)
       File "/usr/lib/python2.7/site-packages/botocore/auth.py", line 678, in add_auth
         raise NoCredentialsError
     NoCredentialsError: Unable to locate credentials
     (see the credentials sketch after this list)
  5. Add date field to RDD in Spark

     Stack Overflow | 2 years ago
     org.apache.spark.api.python.PythonException: Traceback (most recent call last):
       File "/home/terrapin/Spark_Hadoop/spark-1.1.1-bin-cdh4/python/pyspark/worker.py", line 79, in main
         serializer.dump_stream(func(split_index, iterator), outfile)
       File "/home/terrapin/Spark_Hadoop/spark-1.1.1-bin-cdh4/python/pyspark/serializers.py", line 196, in dump_stream
         self.serializer.dump_stream(self._batched(iterator), stream)
       File "/home/terrapin/Spark_Hadoop/spark-1.1.1-bin-cdh4/python/pyspark/serializers.py", line 127, in dump_stream
         for obj in iterator:
       File "/home/terrapin/Spark_Hadoop/spark-1.1.1-bin-cdh4/python/pyspark/serializers.py", line 185, in _batched
         for item in iterator:
       File "/home/terrapin/Spark_Hadoop/spark-1.1.1-bin-cdh4/python/pyspark/rdd.py", line 1147, in takeUpToNumLeft
         yield next(iterator)
       File "/home/terrapin/Spark_Hadoop/spark-1.1.1-bin-cdh4/test3.py", line 72, in parsedate
         dt=dateutil.parser.parse("01 Jan 1900 00:00:00").date()
     AttributeError: 'module' object has no attribute 'parser'
     (see the dateutil import sketch after this list)
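
A hedged note on entries 2 and 3: tf.train.import_meta_graph() fails because the .meta checkpoint was written to /tmp on one machine, while the restore runs on workers whose /tmp never received it. One way to make a driver-local file visible to every executor is SparkContext.addFile() plus SparkFiles.get(); the sketch below shows only that mechanism, with a hypothetical file name rather than tensorspark's actual restore path.

```python
# Minimal sketch: ship a driver-local file to all executors with
# SparkContext.addFile() and resolve it on each worker with
# SparkFiles.get(). "session_mnist_try.meta" is a hypothetical name.
from pyspark import SparkContext, SparkFiles

sc = SparkContext(appName="ship-meta-file")
sc.addFile("/tmp/session_mnist_try.meta")  # copied into every executor's work dir

def check(_):
    import os
    local_path = SparkFiles.get("session_mnist_try.meta")  # worker-local copy
    return (local_path, os.path.exists(local_path))

print(sc.parallelize(range(2), 2).map(check).collect())
sc.stop()
```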

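A hedged note on entry 4: botocore raises NoCredentialsError inside the task because the YARN executor processes do not inherit the driver's AWS environment variables or ~/.aws files. An instance profile on every node is the cleaner fix; failing that, a common workaround is to capture the credentials on the driver and hand them explicitly to the boto3 client built inside the task, as in this sketch (bucket and key names are placeholders).

```python
# Sketch: pass driver-side AWS credentials into tasks so boto3 can sign
# requests on the executors. Assumes the driver has the variables set.
import os
import boto3
from pyspark import SparkContext

sc = SparkContext(appName="s3-read-with-credentials")

# Read once on the driver; the values travel to the workers in the closure.
aws_key = os.environ["AWS_ACCESS_KEY_ID"]
aws_secret = os.environ["AWS_SECRET_ACCESS_KEY"]

def read_object(key):
    s3 = boto3.client(
        "s3",
        aws_access_key_id=aws_key,
        aws_secret_access_key=aws_secret,
    )
    return s3.get_object(Bucket="my-bucket", Key=key)["Body"].read()  # placeholder bucket

print(sc.parallelize(["a.json", "b.json"]).map(read_object).first())
sc.stop()
```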

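A hedged note on entry 5: `import dateutil` alone does not load the `parser` submodule, which is why `dateutil.parser` raises AttributeError inside the task. Importing the submodule explicitly, inside the mapped function so the import also runs on the executors, avoids it; a minimal sketch:

```python
# Sketch: import dateutil.parser explicitly inside the task function.
from pyspark import SparkContext

sc = SparkContext(appName="parse-dates")

def parsedate(line):
    import dateutil.parser  # 'import dateutil' alone leaves .parser unset
    return dateutil.parser.parse("01 Jan 1900 00:00:00").date()

print(sc.parallelize(["dummy"]).map(parsedate).first())
sc.stop()
```
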
    Root Cause Analysis

    1. org.apache.spark.api.python.PythonException

      Traceback (most recent call last):
        File "/opt/spark/python/lib/pyspark.zip/pyspark/worker.py", line 123, in main
          ("%d.%d" % sys.version_info[:2], version))
      Exception: Python in worker has different version 3.4 than that in driver 3.5, PySpark cannot run with different minor versions

      at org.apache.spark.api.python.PythonRunner$$anon$1.read()
    2. Spark
      Executor$TaskRunner.run
      1. org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
      2. org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
      3. org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
      4. org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
      5. org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
      6. org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
      7. org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
      8. org.apache.spark.scheduler.Task.run(Task.scala:85)
      9. org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
      9 frames
    3. Java RT
      Thread.run
      1. java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      2. java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      3. java.lang.Thread.run(Thread.java:745)
      3 frames
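
For the root-cause exception itself, PySpark aborts when the worker and driver interpreters differ in minor version (3.4 vs 3.5 here). The usual fix is to point both sides at the same interpreter, e.g. by exporting PYSPARK_PYTHON (and PYSPARK_DRIVER_PYTHON) in spark-env.sh or the launching shell. A minimal in-script sketch, assuming /usr/bin/python3.5 exists on every node:

```python
# PYSPARK_PYTHON is captured when the SparkContext is created, so set it
# before constructing the context. The interpreter path is an assumption.
import os
os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3.5"

from pyspark import SparkContext

sc = SparkContext(appName="version-pinned")
# With driver and workers on the same minor version, the check in
# worker.py line 123 no longer raises.
print(sc.parallelize(range(10)).sum())
sc.stop()
```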