org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/cgu.local/thiagovm/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py", line 172, in main
    process()
  File "/home/cgu.local/thiagovm/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py", line 167, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/home/cgu.local/thiagovm/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "/usr/local/lib/python3.5/site-packages/splearn/feature_extraction/text.py", line 289, in <lambda>
    A = Z.transform(lambda X: list(map(analyze, X)), column='X').persist()
  File "/usr/local/lib/python3.5/site-packages/sklearn/feature_extraction/text.py", line 238, in <lambda>
    tokenize(preprocess(self.decode(doc))), stop_words)
  File "/usr/local/lib/python3.5/site-packages/sklearn/feature_extraction/text.py", line 204, in <lambda>
    return lambda x: strip_accents(x.lower())
AttributeError: 'numpy.ndarray' object has no attribute 'lower'
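The bottom frame shows scikit-learn's preprocessor calling x.lower() on each document, so every element that reaches the analyzer must be a plain Python string; here each element is a numpy.ndarray (for example, each document wrapped in an extra array). A minimal local sketch of the same failure and a fix, using illustrative names (nested, flat) rather than the original job's variables:

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer

    # Each "document" is itself a numpy array, as in the failing job.
    nested = [np.array(["first document"]), np.array(["second document"])]

    # CountVectorizer's analyzer does strip_accents(x.lower()), so this raises
    # AttributeError: 'numpy.ndarray' object has no attribute 'lower':
    # CountVectorizer().fit_transform(nested)

    # Unwrap so every element is a plain str before vectorizing.
    flat = [str(d[0]) for d in nested]
    counts = CountVectorizer().fit_transform(flat)  # works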

Your exception is missing from the Samebug knowledge base.
Here are the best solutions we found on the Internet.
  1.

    type error when using Sparkit-Learn's SparkCountVectorizer()

    Stack Overflow | 4 months ago | Thiago Marzagão
    org.apache.spark.api.python.PythonException: Traceback (most recent call last): ... AttributeError: 'numpy.ndarray' object has no attribute 'lower' (identical to the traceback at the top of this page)
  2.

    Py4JJavaError while fit_transform(X_rdd)

    GitHub | 11 months ago | alonsopg
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 8.0 failed 1 times, most recent failure: Lost task 3.0 in stage 8.0 (TID 35, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
      File "/usr/local/Cellar/apache-spark/1.6.1/libexec/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
        process()
      File "/usr/local/Cellar/apache-spark/1.6.1/libexec/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
        serializer.dump_stream(func(split_index, iterator), outfile)
      File "/usr/local/Cellar/apache-spark/1.6.1/libexec/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
        vs = list(itertools.islice(iterator, batch))
      File "/usr/local/lib/python3.5/site-packages/splearn/feature_extraction/text.py", line 289, in <lambda>
        A = Z.transform(lambda X: list(map(analyze, X)), column='X').persist()
      File "/usr/local/lib/python3.5/site-packages/sklearn/feature_extraction/text.py", line 238, in <lambda>
        tokenize(preprocess(self.decode(doc))), stop_words)
      File "/usr/local/lib/python3.5/site-packages/sklearn/feature_extraction/text.py", line 204, in <lambda>
        return lambda x: strip_accents(x.lower())
    AttributeError: 'numpy.ndarray' object has no attribute 'lower'
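    This is the same AttributeError surfacing through fit_transform: the analyzer is mapped over whatever the ArrayRDD blocks contain, so those blocks must hold plain strings. A sketch of the intended usage, loosely following Sparkit-Learn's README-style example (sc is an assumed SparkContext and the document list is illustrative):

        from splearn.rdd import ArrayRDD
        from splearn.feature_extraction.text import SparkCountVectorizer

        docs = ["first document", "second document", "third document", "fourth document"]

        # Each partition becomes a numpy array of str elements, not an array of arrays.
        X_rdd = ArrayRDD(sc.parallelize(docs, 2))

        vect = SparkCountVectorizer()
        X_counts = vect.fit_transform(X_rdd)  # distributed sparse result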
  3.

    GitHub comment 902#264089144

    GitHub | 6 months ago | mooperd
    org.apache.spark.api.python.PythonException: Traceback (most recent call last):
      File "/hadoop/yarn/local/usercache/centos/appcache/application_1480271222291_0049/container_1480271222291_0049_01_000003/pyspark.zip/pyspark/worker.py", line 172, in main
        process()
      File "/hadoop/yarn/local/usercache/centos/appcache/application_1480271222291_0049/container_1480271222291_0049_01_000003/pyspark.zip/pyspark/worker.py", line 167, in process
        serializer.dump_stream(func(split_index, iterator), outfile)
      File "/hadoop/yarn/local/usercache/centos/appcache/application_1480271222291_0049/container_1480271222291_0049_01_000003/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
        vs = list(itertools.islice(iterator, batch))
      File "/usr/hdp/2.5.0.0-1245/spark2/python/lib/pyspark.zip/pyspark/rdd.py", line 1306, in takeUpToNumLeft
      File "/home/centos/fun-functions/spark-parrallel-read-from-s3/tick.py", line 38, in distributedJsonRead
      File "/usr/lib/python2.7/site-packages/boto3/resources/factory.py", line 520, in do_action
        response = action(self, *args, **kwargs)
      File "/usr/lib/python2.7/site-packages/boto3/resources/action.py", line 83, in __call__
        response = getattr(parent.meta.client, operation_name)(**params)
      File "/usr/lib/python2.7/site-packages/botocore/client.py", line 251, in _api_call
        return self._make_api_call(operation_name, kwargs)
      File "/usr/lib/python2.7/site-packages/botocore/client.py", line 526, in _make_api_call
        operation_model, request_dict)
      File "/usr/lib/python2.7/site-packages/botocore/endpoint.py", line 141, in make_request
        return self._send_request(request_dict, operation_model)
      File "/usr/lib/python2.7/site-packages/botocore/endpoint.py", line 166, in _send_request
        request = self.create_request(request_dict, operation_model)
      File "/usr/lib/python2.7/site-packages/botocore/endpoint.py", line 150, in create_request
        operation_name=operation_model.name)
      File "/usr/lib/python2.7/site-packages/botocore/hooks.py", line 227, in emit
        return self._emit(event_name, kwargs)
      File "/usr/lib/python2.7/site-packages/botocore/hooks.py", line 210, in _emit
        response = handler(**kwargs)
      File "/usr/lib/python2.7/site-packages/botocore/signers.py", line 90, in handler
        return self.sign(operation_name, request)
      File "/usr/lib/python2.7/site-packages/botocore/signers.py", line 147, in sign
        auth.add_auth(request)
      File "/usr/lib/python2.7/site-packages/botocore/auth.py", line 678, in add_auth
        raise NoCredentialsError
    NoCredentialsError: Unable to locate credentials
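    Unlike the entries above, this one fails because boto3 running inside the Spark task has no credential source on the YARN worker node (no environment variables, shared credentials file, or instance profile). A hedged sketch, not the original tick.py, of passing keys explicitly to a session created inside the distributed function (access_key and secret_key are placeholders, e.g. shipped via a broadcast variable or the executor environment):

        import boto3

        def read_object(bucket, key, access_key, secret_key):
            # Build the session inside the task so each executor uses explicit
            # credentials instead of the worker node's default lookup chain.
            session = boto3.session.Session(
                aws_access_key_id=access_key,
                aws_secret_access_key=secret_key,
            )
            s3 = session.resource("s3")
            return s3.Object(bucket, key).get()["Body"].read()

    Granting the worker nodes an IAM instance profile, or exporting AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to the executor environment, are alternative ways to satisfy the same credential lookup.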
  4.

    UnicodeEncodeError

    GitHub | 1 year ago | anndro
    org.apache.spark.api.python.PythonException: Traceback (most recent call last):
      File "/home/torque/mucahit/spark-1.5.2-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
        process()
      File "/home/torque/mucahit/spark-1.5.2-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
        serializer.dump_stream(func(split_index, iterator), outfile)
      File "/home/torque/mucahit/spark-1.5.2-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
        vs = list(itertools.islice(iterator, batch))
      File "/home/torque/mucahit/keras_venv/local/lib/python2.7/site-packages/elephas/spark_model.py", line 222, in train
        put_deltas_to_server(deltas, self.master_url)
      File "/home/torque/mucahit/keras_venv/local/lib/python2.7/site-packages/elephas/spark_model.py", line 36, in put_deltas_to_server
        return urllib2.urlopen(request).read()
      File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
        return _opener.open(url, data, timeout)
      File "/usr/lib/python2.7/urllib2.py", line 401, in open
        response = self._open(req, data)
      File "/usr/lib/python2.7/urllib2.py", line 419, in _open
        '_open', req)
      File "/usr/lib/python2.7/urllib2.py", line 379, in _call_chain
        result = func(*args)
      File "/usr/lib/python2.7/urllib2.py", line 1211, in http_open
        return self.do_open(httplib.HTTPConnection, req)
      File "/usr/lib/python2.7/urllib2.py", line 1178, in do_open
        h.request(req.get_method(), req.get_selector(), req.data, headers)
      File "/home/torque/mucahit/keras_venv/lib/python2.7/httplib.py", line 963, in request
      File "/home/torque/mucahit/keras_venv/lib/python2.7/httplib.py", line 997, in _send_request
      File "/home/torque/mucahit/keras_venv/lib/python2.7/httplib.py", line 959, in endheaders
      File "/home/torque/mucahit/keras_venv/lib/python2.7/httplib.py", line 819, in _send_output
      File "/home/torque/mucahit/keras_venv/lib/python2.7/httplib.py", line 795, in send
      File "/usr/lib/python2.7/socket.py", line 224, in meth
        return getattr(self._sock, name)(*args)
    UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 183: ordinal not in range(128)
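    Here the failure is Python 2's httplib applying the implicit ascii codec to the request body, which breaks as soon as the unicode payload contains a non-ASCII character (the u'\ufffd' above). An illustrative Python 2 sketch, not elephas's actual put_deltas_to_server, of encoding the payload to UTF-8 bytes before handing it to urllib2:

        import urllib2

        def post_utf8(url, payload_unicode):
            # Encode the unicode payload to UTF-8 bytes so httplib never has to
            # fall back to the ascii codec when sending the body.
            body = payload_unicode.encode("utf-8")
            request = urllib2.Request(
                url,
                data=body,
                headers={"Content-Type": "application/json; charset=utf-8"},
            )
            return urllib2.urlopen(request).read()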


    Root Cause Analysis

    1. org.apache.spark.api.python.PythonException

      Traceback (most recent call last): ... AttributeError: 'numpy.ndarray' object has no attribute 'lower' (the full Python traceback appears at the top of this page)

      at org.apache.spark.api.python.PythonRunner$$anon$1.read()
    2. Spark
      Executor$TaskRunner.run
      1. org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
      2. org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
      3. org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
      4. org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
      5. org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
      6. org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:332)
      7. org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:330)
      8. org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:935)
      9. org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:910)
      10. org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:866)
      11. org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:910)
      12. org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:668)
      13. org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:330)
      14. org.apache.spark.rdd.RDD.iterator(RDD.scala:281)
      15. org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
      16. org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
      17. org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
      18. org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
      19. org.apache.spark.scheduler.Task.run(Task.scala:85)
      20. org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
      20 frames
    3. Java RT
      Thread.run
      1. java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      2. java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      3. java.lang.Thread.run(Thread.java:745)
      3 frames