org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in stage 2.0 (TID 16, spark-w-0.c.clean-feat-131014.internal): org.apache.spark.api.python.PythonException:
Traceback (most recent call last):
  File "/usr/lib/spark/python/pyspark/worker.py", line 98, in main
    command = pickleSer._read_with_length(infile)
  File "/usr/lib/spark/python/pyspark/serializers.py", line 164, in _read_with_length
    return self.loads(obj)
  File "/usr/lib/spark/python/pyspark/serializers.py", line 422, in loads
    return pickle.loads(obj)
ImportError: No module named nltk.tokenize

Data Science | krishna Prasad | 7 months ago
  1. 0

    Unable to load NLTK in spark using PySpark

    Data Science | 7 months ago | krishna Prasad
    org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in stage 2.0 (TID 16, spark-w-0.c.clean-feat-131014.internal): org.apache.spark.api.python.PythonException:
    Traceback (most recent call last):
      File "/usr/lib/spark/python/pyspark/worker.py", line 98, in main
        command = pickleSer._read_with_length(infile)
      File "/usr/lib/spark/python/pyspark/serializers.py", line 164, in _read_with_length
        return self.loads(obj)
      File "/usr/lib/spark/python/pyspark/serializers.py", line 422, in loads
        return pickle.loads(obj)
    ImportError: No module named nltk.tokenize
  2. 0

    Spark Streaming Checkpoint not working after driver restart

    Stack Overflow | 1 year ago | Knight71
    org.apache.spark.scheduler.TaskSetManager: Lost task 1.0 in stage 509.0 (TID 882): java.lang.Exception: Could not compute split, block input-0-1446778622600 not found
    (a checkpoint-recovery sketch for this case follows the list below)
  3. 0

    My C* has tables
    {code}
    CREATE TABLE csod.role (
        object_id uuid,
        code text,
        description text,
        level int,
        name text,
        solr_query text,
        PRIMARY KEY (object_id)
    )
    {code}
    and
    {code}
    CREATE TABLE csod.user_role (
        role uuid,
        user uuid,
        role_name text,
        solr_query text,
        PRIMARY KEY (role, user)
    )
    {code}
    When I try to use CassandraSQLContext in the Spark shell for joining these tables I get an exception:
    {code}
    scala> csc.sql("select * from role r join user_role ur on r.object_id = ur.role").collect
    WARN 2016-02-10 16:44:46 org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2, 172.26.28.101): scala.MatchError: UUIDType (of class org.apache.spark.sql.cassandra.types.UUIDType$)
        at org.apache.spark.sql.execution.SparkSqlSerializer2$$anonfun$createSerializationFunction$1.apply(SparkSqlSerializer2.scala:232)
        at org.apache.spark.sql.execution.SparkSqlSerializer2$$anonfun$createSerializationFunction$1.apply(SparkSqlSerializer2.scala:227)
        at org.apache.spark.sql.execution.Serializer2SerializationStream.writeKey(SparkSqlSerializer2.scala:65)
        at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:206)
        at org.apache.spark.util.collection.WritablePartitionedIterator$$anon$3.writeNext(WritablePartitionedPairCollection.scala:104)
        at org.apache.spark.util.collection.ExternalSorter.spillToPartitionFiles(ExternalSorter.scala:375)
        at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:208)
        at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:62)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:70)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    {code}
    If I understand right the join should work just like a string composition but it doesn't.

    DataStax JIRA | 10 months ago | Alexander Sedov
    org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2, 172.26.28.101): scala.MatchError: UUIDType (of class org.apache.spark.sql.cassandra.types.UUIDType$)
  4. 0

    1) Create a user defined type with keyspace and udt.
    {code:none}
    CREATE TYPE "Attachment" (
        "Id" text,
        "MimeType" text,
        "FileName" text,
        "FileSize" int,
        "FileLink" text,
        "Description" text,
        "Attributes" frozen<map<text, text>>
    );
    CREATE TABLE "Interaction" (
        "Id" text PRIMARY KEY,
        "AllAttributes" text,
        "Attachments" map<text,frozen<"Attachment">>,
        "AttachmentsContent" map<text, blob>,
        "Attributes" map<text, text>,
        "CanBeParent" boolean,
        "ContactId" text,
        "Content" blob,
        "ContentSize" int,
        "CreatorAppId" int,
        "ESQuery" text,
        "EndDate" timestamp,
        "EntityTypeId" int,
        "ExternalId" text,
        "MediaTypeId" text,
        "MimeType" text,
        "ModifiedDate" timestamp,
        "OwnerId" int,
        "ParentId" text,
        "Participant" set<text>,
        "StartDate" timestamp,
        "Status" int,
        "SubtenantId" int,
        "SubtypeId" text,
        "TenantId" int,
        "ThreadId" text,
        "TypeId" text
    )
    {code}
    2) Insert some data into a few columns (but not the udt) and verify that the data comes back in cqlsh.
    {code}
    INSERT INTO ucs."Interaction"("Id","AllAttributes","CanBeParent","ContactId","CreatorAppId","EntityTypeId","ExternalId","MediaTypeId","MimeType","OwnerId","StartDate","Status","SubtypeId","TenantId","ThreadId","TypeId","Attributes","Attachments")
    VALUES ('000000a5ixIEvmPD','test',true,'xcb9HMoQ',214,1,'yDRh7j3oBW','email','application/json',243,'2015-01-29T07:19:20.000Z',0,'OutboundNew',101,'000000a5ixIEvmPC','Outbound',
            {'Lang':'English','Text':'plop','Subject':'Hello from John Doe','StructuredText':'<html><body>plop</body></html>','StructTextMimeType':'text/html','FromAddress':'s.cooperphd@yahoo.com','FromPersonal':'John Doe','Timeshift':'60'},
            null);
    {code}
    3) Verify that the row comes back in Spark.
    {code}
    dse spark
    val tableRdd = sc.cassandraTable("ticket17224","Interaction")
    val dataColumns = tableRdd.map(row => row.getString("MediaTypeId"))
    dataColumns.count
    {code}
    4) Now insert data into the UDT field.
    {code}
    update ticket17224."Interaction"
    set "Attachments" = "Attachments" + {'rVpgK':{"Id":'rVpgK',"MimeType":'text/plain',"FileName":'notes.txt',"FileSize":7,"FileLink":'toto',"Description":'bug',"Attributes":null}}
    where "Id" = '000000a5ixIEvmPD';
    {code}
    5) Run the same code as step 3. This causes the app to crash:
    {code}
    scala> dataColumns.count
    WARN 2015-06-05 15:22:59 org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 192.168.56.102): java.lang.IllegalArgumentException: Id is not a field defined in this definition
        at com.datastax.driver.core.UserType.getFieldType(UserType.java:165)
        at com.datastax.spark.connector.AbstractGettableData$.get(AbstractGettableData.scala:109)
        at com.datastax.spark.connector.UDTValue$$anonfun$1.apply(UDTValue.scala:18)
        at com.datastax.spark.connector.UDTValue$$anonfun$1.apply(UDTValue.scala:18)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.AbstractTraversable.map(Traversable.scala:105)
        at com.datastax.spark.connector.UDTValue$.fromJavaDriverUDTValue(UDTValue.scala:18)
        at com.datastax.spark.connector.AbstractGettableData$.convert(AbstractGettableData.scala:85)
        at com.datastax.spark.connector.AbstractGettableData$$anonfun$convert$3.apply(AbstractGettableData.scala:84)
        at com.datastax.spark.connector.AbstractGettableData$$anonfun$convert$3.apply(AbstractGettableData.scala:84)
        at scala.collection.GenTraversableViewLike$Mapped$$anonfun$foreach$2.apply(GenTraversableViewLike.scala:81)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
        at scala.collection.IterableLike$$anon$1.foreach(IterableLike.scala:294)
        at scala.collection.GenTraversableViewLike$Mapped$class.foreach(GenTraversableViewLike.scala:80)
        at scala.collection.IterableViewLike$$anon$3.foreach(IterableViewLike.scala:84)
        at scala.collection.TraversableOnce$class.toMap(TraversableOnce.scala:279)
        at scala.collection.IterableViewLike$AbstractTransformed.toMap(IterableViewLike.scala:47)
        at com.datastax.spark.connector.AbstractGettableData$.convert(AbstractGettableData.scala:84)
        at com.datastax.spark.connector.AbstractGettableData$.get(AbstractGettableData.scala:98)
        at com.datastax.spark.connector.CassandraRow$$anonfun$fromJavaDriverRow$1.apply$mcVI$sp(CassandraRow.scala:95)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
        at com.datastax.spark.connector.CassandraRow$.fromJavaDriverRow(CassandraRow.scala:94)
        at com.datastax.spark.connector.rdd.reader.RowReaderFactory$GenericRowReader$$.read(RowReaderFactory.scala:69)
        at com.datastax.spark.connector.rdd.reader.RowReaderFactory$GenericRowReader$$.read(RowReaderFactory.scala:61)
        at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$12.apply(CassandraTableScanRDD.scala:183)
        at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$12.apply(CassandraTableScanRDD.scala:183)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$13.next(Iterator.scala:372)
        at com.datastax.spark.connector.util.CountingIterator.next(CountingIterator.scala:16)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1403)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:927)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:927)
        at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1353)
        at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1353)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:56)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
    ERROR 2015-06-05 15:22:59 org.apache.spark.scheduler.TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 192.168.56.102): java.lang.IllegalArgumentException: Id is not a field defined in this definition
        at com.datastax.driver.core.UserType.getFieldType(UserType.java:165)
        at com.datastax.spark.connector.AbstractGettableData$.get(AbstractGettableData.scala:109)
        at com.datastax.spark.connector.UDTValue$$anonfun$1.apply(UDTValue.scala:18)
        at com.datastax.spark.connector.UDTValue$$anonfun$1.apply(UDTValue.scala:18)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.AbstractTraversable.map(Traversable.scala:105)
        at com.datastax.spark.connector.UDTValue$.fromJavaDriverUDTValue(UDTValue.scala:18)
        at com.datastax.spark.connector.AbstractGettableData$.convert(AbstractGettableData.scala:85)
        at com.datastax.spark.connector.AbstractGettableData$$anonfun$convert$3.apply(AbstractGettableData.scala:84)
        at com.datastax.spark.connector.AbstractGettableData$$anonfun$convert$3.apply(AbstractGettableData.scala:84)
        at scala.collection.GenTraversableViewLike$Mapped$$anonfun$foreach$2.apply(GenTraversableViewLike.scala:81)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
        at scala.collection.IterableLike$$anon$1.foreach(IterableLike.scala:294)
        at scala.collection.GenTraversableViewLike$Mapped$class.foreach(GenTraversableViewLike.scala:80)
        at scala.collection.IterableViewLike$$anon$3.foreach(IterableViewLike.scala:84)
        at scala.collection.TraversableOnce$class.toMap(TraversableOnce.scala:279)
        at scala.collection.IterableViewLike$AbstractTransformed.toMap(IterableViewLike.scala:47)
        at com.datastax.spark.connector.AbstractGettableData$.convert(AbstractGettableData.scala:84)
        at com.datastax.spark.connector.AbstractGettableData$.get(AbstractGettableData.scala:98)
        at com.datastax.spark.connector.CassandraRow$$anonfun$fromJavaDriverRow$1.apply$mcVI$sp(CassandraRow.scala:95)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
        at com.datastax.spark.connector.CassandraRow$.fromJavaDriverRow(CassandraRow.scala:94)
        at com.datastax.spark.connector.rdd.reader.RowReaderFactory$GenericRowReader$$.read(RowReaderFactory.scala:69)
        at com.datastax.spark.connector.rdd.reader.RowReaderFactory$GenericRowReader$$.read(RowReaderFactory.scala:61)
        at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$12.apply(CassandraTableScanRDD.scala:183)
        at com.datastax.spark.connector.rdd.CassandraTableScanRDD$$anonfun$12.apply(CassandraTableScanRDD.scala:183)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$13.next(Iterator.scala:372)
        at com.datastax.spark.connector.util.CountingIterator.next(CountingIterator.scala:16)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1403)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:927)
        at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:927)
        at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1353)
        at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1353)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:56)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
    Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1420)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.ActorCell.invoke(ActorCell.scala:487)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
        at akka.dispatch.Mailbox.run(Mailbox.scala:220)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
    {code}

    DataStax JIRA | 1 year ago | Alex Liu
    org.apache.spark.scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 192.168.56.102): java.lang.IllegalArgumentException: Id is not a field defined in this definition
  5. 0

    Spark SQL Join error on cassandra UUID types

    Stack Overflow | 1 year ago | jguerra
    org.apache.spark.scheduler.TaskSetManager: Lost task 3.0 in stage 0.0 (TID 6, 161.72.45.76): scala.MatchError: UUIDType (of class org.apache.spark.sql.cassandra.types.UUIDType$)
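
    For the Spark Streaming case in item 2 above, the usual recovery pattern is to build the entire streaming graph inside a setup function and hand it to StreamingContext.getOrCreate, so that a restarted driver rebuilds the job from the checkpoint instead of recreating it from scratch; with receiver-based sources, the write-ahead log is typically enabled as well so received blocks survive the restart. The sketch below is a minimal, hedged example: the checkpoint directory, socket source, app name, and batch interval are made-up placeholders, not details from the original report.

    {code}
    # Hedged sketch: checkpoint-friendly driver setup for Spark Streaming (PySpark).
    # Paths, host, port, and intervals below are illustrative placeholders.
    from pyspark import SparkConf, SparkContext
    from pyspark.streaming import StreamingContext

    CHECKPOINT_DIR = "hdfs:///tmp/streaming-checkpoint"  # hypothetical location

    def create_context():
        # Everything that defines the streaming computation must live in here,
        # otherwise it cannot be rebuilt from the checkpoint after a restart.
        conf = SparkConf().setAppName("checkpoint-demo")
        conf.set("spark.streaming.receiver.writeAheadLog.enable", "true")
        sc = SparkContext(conf=conf)
        ssc = StreamingContext(sc, 10)                    # 10-second batches (placeholder)
        ssc.checkpoint(CHECKPOINT_DIR)
        lines = ssc.socketTextStream("localhost", 9999)  # placeholder source
        lines.count().pprint()
        return ssc

    # First run: builds the context via create_context().
    # After a driver restart: restores it (and pending batches) from CHECKPOINT_DIR.
    ssc = StreamingContext.getOrCreate(CHECKPOINT_DIR, create_context)
    ssc.start()
    ssc.awaitTermination()
    {code}

    Whether this resolves a "Could not compute split, block input-... not found" failure depends on the input source; for receiver-based streams, the write-ahead-log setting above is usually the relevant piece.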

    Root Cause Analysis

    1. org.apache.spark.scheduler.TaskSetManager

      Lost task 0.0 in stage 2.0 (TID 16, spark-w-0.c.clean-feat-131014.internal): org.apache.spark.api.python.PythonException:
      Traceback (most recent call last):
        File "/usr/lib/spark/python/pyspark/worker.py", line 98, in main
          command = pickleSer._read_with_length(infile)
        File "/usr/lib/spark/python/pyspark/serializers.py", line 164, in _read_with_length
          return self.loads(obj)
        File "/usr/lib/spark/python/pyspark/serializers.py", line 422, in loads
          return pickle.loads(obj)
      ImportError: No module named nltk.tokenize

      at org.apache.spark.api.python.PythonRunner$$anon$1.read()
    2. Spark
      Executor$TaskRunner.run
      1. org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:166)
      2. org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:207)
      3. org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
      4. org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
      5. org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
      6. org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
      7. org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
      8. org.apache.spark.scheduler.Task.run(Task.scala:89)
      9. org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
      9 frames
    3. Java RT
      Thread.run
      1. java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      2. java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      3. java.lang.Thread.run(Thread.java:745)
      3 frames
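
    The ImportError above means the pickled function reached the Python worker, but the worker's interpreter could not import nltk while unpickling it: the library exists on the driver but not on the executors. A common remedy is to make the module available on every worker node (for example by pip-installing it there, or by shipping a zipped copy with spark-submit --py-files or sc.addPyFile) and to import it inside the mapped function so the import runs on the executor. The snippet below is only a hedged sketch of that pattern; the app name, sample data, and paths are illustrative assumptions, not taken from the original report.

    {code}
    # Hedged sketch: making nltk importable on PySpark executors.
    # Assumes nltk and its tokenizer data are installed on every worker node, e.g.
    #   sudo pip install nltk && python -m nltk.downloader punkt
    from pyspark import SparkContext

    sc = SparkContext(appName="nltk-tokenize-demo")

    # Alternatively, ship a zipped copy of the package with the job instead of
    # installing it on every node:
    # sc.addPyFile("/path/to/nltk.zip")  # hypothetical path

    def tokenize(line):
        # Import inside the function so the import happens on the executor,
        # not only on the driver where the closure is pickled.
        from nltk.tokenize import word_tokenize
        return word_tokenize(line)

    rdd = sc.parallelize(["Lost task 0.0 in stage 2.0",
                          "No module named nltk.tokenize"])
    print(rdd.map(tokenize).collect())
    {code}

    If the cluster is a managed one (the worker hostname in the trace looks like a cloud-provisioned node), a bootstrap or initialization script that installs nltk and its tokenizer data on every node achieves the same effect.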