org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/home/cgu.local/thiagovm/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py", line 172, in main
    process()
  File "/home/cgu.local/thiagovm/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/worker.py", line 167, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/home/cgu.local/thiagovm/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "/usr/local/lib/python3.5/site-packages/splearn/feature_extraction/text.py", line 289, in <lambda>
    A = Z.transform(lambda X: list(map(analyze, X)), column='X').persist()
  File "/usr/local/lib/python3.5/site-packages/sklearn/feature_extraction/text.py", line 238, in <lambda>
    tokenize(preprocess(self.decode(doc))), stop_words)
  File "/usr/local/lib/python3.5/site-packages/sklearn/feature_extraction/text.py", line 204, in <lambda>
    return lambda x: strip_accents(x.lower())
AttributeError: 'numpy.ndarray' object has no attribute 'lower'

Tip: This exception is missing from the Samebug knowledge base. Here are the best solutions found on the Internet.
  1.

    type error when using Sparkit-Learn's SparkCountVectorizer()

    Stack Overflow | 2 months ago | Thiago Marzagão
    (stack trace identical to the one at the top of the page)
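The `AttributeError` at the bottom of this trace usually means the vectorizer's analyzer received a numpy array where it expected a plain string, so the lowercasing step (`x.lower()`) fails. A minimal sketch of the same failure and a likely fix, using plain scikit-learn (no Spark required) and made-up documents:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# A 2-D array: iterating it yields ndarray rows, not strings, so the
# analyzer's lowercasing step fails exactly as in the trace above.
docs = np.array([["first document"], ["second document"]])

vec = CountVectorizer()
try:
    vec.fit_transform(docs)
except AttributeError as err:
    msg = str(err)
    print(msg)  # 'numpy.ndarray' object has no attribute 'lower'

# Fix: flatten to a 1-D sequence of plain strings before vectorizing.
X = vec.fit_transform(docs.ravel())
print(sorted(vec.vocabulary_))
```

The same shape mismatch applies when feeding an ArrayRDD to Sparkit-Learn's `SparkCountVectorizer`: each block must ultimately yield plain strings to the analyzer.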
  2.

    Py4JJavaError while fit_transform(X_rdd)

    GitHub | 9 months ago | alonsopg
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 8.0 failed 1 times, most recent failure: Lost task 3.0 in stage 8.0 (TID 35, localhost): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
      File "/usr/local/Cellar/apache-spark/1.6.1/libexec/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
        process()
      File "/usr/local/Cellar/apache-spark/1.6.1/libexec/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
        serializer.dump_stream(func(split_index, iterator), outfile)
      File "/usr/local/Cellar/apache-spark/1.6.1/libexec/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
        vs = list(itertools.islice(iterator, batch))
      File "/usr/local/lib/python3.5/site-packages/splearn/feature_extraction/text.py", line 289, in <lambda>
        A = Z.transform(lambda X: list(map(analyze, X)), column='X').persist()
      File "/usr/local/lib/python3.5/site-packages/sklearn/feature_extraction/text.py", line 238, in <lambda>
        tokenize(preprocess(self.decode(doc))), stop_words)
      File "/usr/local/lib/python3.5/site-packages/sklearn/feature_extraction/text.py", line 204, in <lambda>
        return lambda x: strip_accents(x.lower())
    AttributeError: 'numpy.ndarray' object has no attribute 'lower'
  3.

    GitHub comment 902#264089144

    GitHub | 3 months ago | mooperd
    org.apache.spark.api.python.PythonException: Traceback (most recent call last):
      File "/hadoop/yarn/local/usercache/centos/appcache/application_1480271222291_0049/container_1480271222291_0049_01_000019/pyspark.zip/pyspark/worker.py", line 172, in main
        process()
      File "/hadoop/yarn/local/usercache/centos/appcache/application_1480271222291_0049/container_1480271222291_0049_01_000019/pyspark.zip/pyspark/worker.py", line 167, in process
        serializer.dump_stream(func(split_index, iterator), outfile)
      File "/hadoop/yarn/local/usercache/centos/appcache/application_1480271222291_0049/container_1480271222291_0049_01_000019/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
        vs = list(itertools.islice(iterator, batch))
      File "/usr/hdp/2.5.0.0-1245/spark2/python/lib/pyspark.zip/pyspark/rdd.py", line 1306, in takeUpToNumLeft
      File "/home/centos/fun-functions/spark-parrallel-read-from-s3/tick.py", line 38, in distributedJsonRead
      File "/usr/lib/python2.7/site-packages/boto3/resources/factory.py", line 520, in do_action
        response = action(self, *args, **kwargs)
      File "/usr/lib/python2.7/site-packages/boto3/resources/action.py", line 83, in __call__
        response = getattr(parent.meta.client, operation_name)(**params)
      File "/usr/lib/python2.7/site-packages/botocore/client.py", line 251, in _api_call
        return self._make_api_call(operation_name, kwargs)
      File "/usr/lib/python2.7/site-packages/botocore/client.py", line 526, in _make_api_call
        operation_model, request_dict)
      File "/usr/lib/python2.7/site-packages/botocore/endpoint.py", line 141, in make_request
        return self._send_request(request_dict, operation_model)
      File "/usr/lib/python2.7/site-packages/botocore/endpoint.py", line 166, in _send_request
        request = self.create_request(request_dict, operation_model)
      File "/usr/lib/python2.7/site-packages/botocore/endpoint.py", line 150, in create_request
        operation_name=operation_model.name)
      File "/usr/lib/python2.7/site-packages/botocore/hooks.py", line 227, in emit
        return self._emit(event_name, kwargs)
      File "/usr/lib/python2.7/site-packages/botocore/hooks.py", line 210, in _emit
        response = handler(**kwargs)
      File "/usr/lib/python2.7/site-packages/botocore/signers.py", line 90, in handler
        return self.sign(operation_name, request)
      File "/usr/lib/python2.7/site-packages/botocore/signers.py", line 147, in sign
        auth.add_auth(request)
      File "/usr/lib/python2.7/site-packages/botocore/auth.py", line 678, in add_auth
        raise NoCredentialsError
    NoCredentialsError: Unable to locate credentials
  4.

    boto3 seems to be breaking with apache spark in yarn mode. - `NoCredentialsError: Unable to locate credentials`.

    GitHub | 3 months ago | mooperd
    NoCredentialsError: Unable to locate credentials
    (traceback essentially identical to the one in entry 3, differing only in the YARN application and container IDs)
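A common cause of the `NoCredentialsError` above: under YARN, the boto3 calls run on executor machines, which do not share the driver's local AWS configuration (`~/.aws/credentials`, environment variables). One workaround is to pass credentials explicitly into the function shipped to the executors. A minimal sketch; the function name, parameters, and bucket name here are hypothetical, not taken from the original `tick.py`:

```python
def distributed_json_read(key, access_key, secret_key, bucket="my-bucket"):
    """Fetch one S3 object; intended to be called inside rdd.map on executors."""
    # Import inside the function so the module is imported on each worker,
    # not just on the driver.
    import boto3

    s3 = boto3.client(
        "s3",
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
    return s3.get_object(Bucket=bucket, Key=key)["Body"].read()

# Driver side (sketch): close over the credentials rather than relying
# on executor-local configuration, e.g.
# rdd.map(lambda k: distributed_json_read(k, ACCESS_KEY, SECRET_KEY))
```

Alternatives include distributing a credentials file to every node or attaching an IAM instance role to the cluster machines, which avoids shipping secrets through closures.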


    Root Cause Analysis

    1. org.apache.spark.api.python.PythonException

      AttributeError: 'numpy.ndarray' object has no attribute 'lower' (full traceback identical to the one at the top of the page)

      at org.apache.spark.api.python.PythonRunner$$anon$1.read()
    2. Spark
      Executor$TaskRunner.run
      1. org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
      2. org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
      3. org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
      4. org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
      5. org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
      6. org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:332)
      7. org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:330)
      8. org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:935)
      9. org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:910)
      10. org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:866)
      11. org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:910)
      12. org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:668)
      13. org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:330)
      14. org.apache.spark.rdd.RDD.iterator(RDD.scala:281)
      15. org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
      16. org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
      17. org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
      18. org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
      19. org.apache.spark.scheduler.Task.run(Task.scala:85)
      20. org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
      20 frames
    3. Java RT
      Thread.run
      1. java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      2. java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      3. java.lang.Thread.run(Thread.java:745)
      3 frames