org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/cgu.local/thiagovm/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py", line 172, in main
    process()
  File "/home/cgu.local/thiagovm/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py", line 167, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/home/cgu.local/thiagovm/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "/usr/local/lib/python3.5/site-packages/splearn/feature_extraction/text.py", line 289, in <lambda>
    A = Z.transform(lambda X: list(map(analyze, X)), column='X').persist()
  File "/usr/local/lib/python3.5/site-packages/sklearn/feature_extraction/text.py", line 238, in <lambda>
    tokenize(preprocess(self.decode(doc))), stop_words)
  File "/usr/local/lib/python3.5/site-packages/sklearn/feature_extraction/text.py", line 204, in <lambda>
    return lambda x: strip_accents(x.lower())
AttributeError: 'numpy.ndarray' object has no attribute 'lower'

  1. type error when using Sparkit-Learn's SparkCountVectorizer()

    Stack Overflow | 3 months ago | Thiago Marzagão
    (Same traceback as shown at the top of the page.)
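    Both this question and entry 2 below fail at the same place: sklearn's
    analyzer calls x.lower() on each document, so every element that reaches
    the vectorizer must be a plain string, not a numpy array. A minimal
    sketch of the likely cause and fix, assuming sparkit-learn's documented
    ArrayRDD usage (the variable names are illustrative, not taken from the
    original question):

        from pyspark import SparkContext
        from splearn.rdd import ArrayRDD
        from splearn.feature_extraction.text import SparkCountVectorizer

        sc = SparkContext()
        docs = ["first document", "second document", "third document"]

        # Broken: each RDD element is a list, so the blocked partitions
        # become arrays of arrays and the analyzer receives an ndarray.
        # bad_rdd = ArrayRDD(sc.parallelize([[d] for d in docs]))

        # Working: each RDD element is a plain string before blocking.
        good_rdd = ArrayRDD(sc.parallelize(docs))

        result = SparkCountVectorizer().fit_transform(good_rdd)

    If the documents arrive already wrapped in arrays, unwrapping them back
    to strings (for example, mapping each element through str(x[0])) before
    building the ArrayRDD avoids the error.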
  2. Py4JJavaError while fit_transform(X_rdd)

    GitHub | 10 months ago | alonsopg
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 8.0 failed 1 times, most recent failure: Lost task 3.0 in stage 8.0 (TID 35, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
      File "/usr/local/Cellar/apache-spark/1.6.1/libexec/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
        process()
      File "/usr/local/Cellar/apache-spark/1.6.1/libexec/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
        serializer.dump_stream(func(split_index, iterator), outfile)
      File "/usr/local/Cellar/apache-spark/1.6.1/libexec/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
        vs = list(itertools.islice(iterator, batch))
      File "/usr/local/lib/python3.5/site-packages/splearn/feature_extraction/text.py", line 289, in <lambda>
        A = Z.transform(lambda X: list(map(analyze, X)), column='X').persist()
      File "/usr/local/lib/python3.5/site-packages/sklearn/feature_extraction/text.py", line 238, in <lambda>
        tokenize(preprocess(self.decode(doc))), stop_words)
      File "/usr/local/lib/python3.5/site-packages/sklearn/feature_extraction/text.py", line 204, in <lambda>
        return lambda x: strip_accents(x.lower())
    AttributeError: 'numpy.ndarray' object has no attribute 'lower'
  3. GitHub comment 902#264089144

    GitHub | 5 months ago | mooperd
    org.apache.spark.api.python.PythonException: Traceback (most recent call last):
      File "/hadoop/yarn/local/usercache/centos/appcache/application_1480271222291_0049/container_1480271222291_0049_01_000003/pyspark.zip/pyspark/worker.py", line 172, in main
        process()
      File "/hadoop/yarn/local/usercache/centos/appcache/application_1480271222291_0049/container_1480271222291_0049_01_000003/pyspark.zip/pyspark/worker.py", line 167, in process
        serializer.dump_stream(func(split_index, iterator), outfile)
      File "/hadoop/yarn/local/usercache/centos/appcache/application_1480271222291_0049/container_1480271222291_0049_01_000003/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
        vs = list(itertools.islice(iterator, batch))
      File "/usr/hdp/2.5.0.0-1245/spark2/python/lib/pyspark.zip/pyspark/rdd.py", line 1306, in takeUpToNumLeft
      File "/home/centos/fun-functions/spark-parrallel-read-from-s3/tick.py", line 38, in distributedJsonRead
      File "/usr/lib/python2.7/site-packages/boto3/resources/factory.py", line 520, in do_action
        response = action(self, *args, **kwargs)
      File "/usr/lib/python2.7/site-packages/boto3/resources/action.py", line 83, in __call__
        response = getattr(parent.meta.client, operation_name)(**params)
      File "/usr/lib/python2.7/site-packages/botocore/client.py", line 251, in _api_call
        return self._make_api_call(operation_name, kwargs)
      File "/usr/lib/python2.7/site-packages/botocore/client.py", line 526, in _make_api_call
        operation_model, request_dict)
      File "/usr/lib/python2.7/site-packages/botocore/endpoint.py", line 141, in make_request
        return self._send_request(request_dict, operation_model)
      File "/usr/lib/python2.7/site-packages/botocore/endpoint.py", line 166, in _send_request
        request = self.create_request(request_dict, operation_model)
      File "/usr/lib/python2.7/site-packages/botocore/endpoint.py", line 150, in create_request
        operation_name=operation_model.name)
      File "/usr/lib/python2.7/site-packages/botocore/hooks.py", line 227, in emit
        return self._emit(event_name, kwargs)
      File "/usr/lib/python2.7/site-packages/botocore/hooks.py", line 210, in _emit
        response = handler(**kwargs)
      File "/usr/lib/python2.7/site-packages/botocore/signers.py", line 90, in handler
        return self.sign(operation_name, request)
      File "/usr/lib/python2.7/site-packages/botocore/signers.py", line 147, in sign
        auth.add_auth(request)
      File "/usr/lib/python2.7/site-packages/botocore/auth.py", line 678, in add_auth
        raise NoCredentialsError
    NoCredentialsError: Unable to locate credentials
  4. UnicodeEncodeError

    GitHub | 1 year ago | anndro
    org.apache.spark.api.python.PythonException: Traceback (most recent call last):
      File "/home/torque/mucahit/spark-1.5.2-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
        process()
      File "/home/torque/mucahit/spark-1.5.2-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
        serializer.dump_stream(func(split_index, iterator), outfile)
      File "/home/torque/mucahit/spark-1.5.2-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
        vs = list(itertools.islice(iterator, batch))
      File "/home/torque/mucahit/keras_venv/local/lib/python2.7/site-packages/elephas/spark_model.py", line 222, in train
        put_deltas_to_server(deltas, self.master_url)
      File "/home/torque/mucahit/keras_venv/local/lib/python2.7/site-packages/elephas/spark_model.py", line 36, in put_deltas_to_server
        return urllib2.urlopen(request).read()
      File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
        return _opener.open(url, data, timeout)
      File "/usr/lib/python2.7/urllib2.py", line 401, in open
        response = self._open(req, data)
      File "/usr/lib/python2.7/urllib2.py", line 419, in _open
        '_open', req)
      File "/usr/lib/python2.7/urllib2.py", line 379, in _call_chain
        result = func(*args)
      File "/usr/lib/python2.7/urllib2.py", line 1211, in http_open
        return self.do_open(httplib.HTTPConnection, req)
      File "/usr/lib/python2.7/urllib2.py", line 1178, in do_open
        h.request(req.get_method(), req.get_selector(), req.data, headers)
      File "/home/torque/mucahit/keras_venv/lib/python2.7/httplib.py", line 963, in request
      File "/home/torque/mucahit/keras_venv/lib/python2.7/httplib.py", line 997, in _send_request
      File "/home/torque/mucahit/keras_venv/lib/python2.7/httplib.py", line 959, in endheaders
      File "/home/torque/mucahit/keras_venv/lib/python2.7/httplib.py", line 819, in _send_output
      File "/home/torque/mucahit/keras_venv/lib/python2.7/httplib.py", line 795, in send
      File "/usr/lib/python2.7/socket.py", line 224, in meth
        return getattr(self._sock, name)(*args)
    UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 183: ordinal not in range(128)
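    Entry 4 is also unrelated to the vectorizer: Python 2's httplib tries to
    ASCII-encode a unicode request body containing u'\ufffd'. The usual fix
    is to encode the payload to UTF-8 bytes before handing it to urllib2. A
    minimal sketch (the JSON payload and the /update endpoint are
    illustrative assumptions, not elephas's actual wire format):

        import json
        try:
            from urllib2 import Request, urlopen          # Python 2, as in the trace
        except ImportError:
            from urllib.request import Request, urlopen   # Python 3 equivalent

        def put_deltas_to_server(deltas, master_url):
            # Encode to UTF-8 bytes up front so httplib never has to
            # ASCII-encode a unicode string itself.
            body = json.dumps(deltas).encode("utf-8")
            request = Request("http://%s/update" % master_url, data=body,
                              headers={"Content-Type": "application/json; charset=utf-8"})
            return urlopen(request).read()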


    Root Cause Analysis

    1. org.apache.spark.api.python.PythonException

      Traceback ending in AttributeError: 'numpy.ndarray' object has no attribute 'lower' (the full traceback appears at the top of the page)

      at org.apache.spark.api.python.PythonRunner$$anon$1.read()
    2. Spark
      Executor$TaskRunner.run
      1. org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
      2. org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
      3. org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
      4. org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
      5. org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
      6. org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:332)
      7. org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:330)
      8. org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:935)
      9. org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:910)
      10. org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:866)
      11. org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:910)
      12. org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:668)
      13. org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:330)
      14. org.apache.spark.rdd.RDD.iterator(RDD.scala:281)
      15. org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
      16. org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
      17. org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
      18. org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
      19. org.apache.spark.scheduler.Task.run(Task.scala:85)
      20. org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
      20 frames
    3. Java RT
      Thread.run
      1. java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      2. java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      3. java.lang.Thread.run(Thread.java:745)
      3 frames
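
    The Python-side root cause is reproducible in isolation: a numpy array
    has no string methods, so the x.lower() call inside sklearn's analyzer
    fails the moment a document arrives as an array instead of a string:

      import numpy as np

      doc = np.array(["some text"])  # an array wrapping the document, not a string
      doc.lower()                    # AttributeError: 'numpy.ndarray' object has no attribute 'lower'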