org.pentaho.di.core.exception.KettleStepException


  • When creating a Regex Evaluation step with a regular expression like .*XXX(140110|145250)XXX.* and not checking 'Create fields for capture groups' and also not specifying any 'Capture Group Fields' then the transformation stops with an error like this:
    Regex Evaluation.0 - ERROR (version 4.2.0-stable, build 15748 from 2011-09-08 13.11.42 by buildguy) : The number of capture groups in the regular expression (3) does not match the number of fields specified (0)!
    Regex Evaluation.0 - ERROR (version 4.2.0-stable, build 15748 from 2011-09-08 13.11.42 by buildguy) : Unexpected error
    Regex Evaluation.0 - ERROR (version 4.2.0-stable, build 15748 from 2011-09-08 13.11.42 by buildguy) : org.pentaho.di.core.exception.KettleStepException:
    Regex Evaluation.0 - ERROR (version 4.2.0-stable, build 15748 from 2011-09-08 13.11.42 by buildguy) : Error in step
    Regex Evaluation.0 - ERROR (version 4.2.0-stable, build 15748 from 2011-09-08 13.11.42 by buildguy) :
    Regex Evaluation.0 - ERROR (version 4.2.0-stable, build 15748 from 2011-09-08 13.11.42 by buildguy) : The number of capture groups in the regular expression (3) does not match the number of fields specified (0)!
    Regex Evaluation.0 - ERROR (version 4.2.0-stable, build 15748 from 2011-09-08 13.11.42 by buildguy) :
    Regex Evaluation.0 - ERROR (version 4.2.0-stable, build 15748 from 2011-09-08 13.11.42 by buildguy) :
    Regex Evaluation.0 - ERROR (version 4.2.0-stable, build 15748 from 2011-09-08 13.11.42 by buildguy) : at org.pentaho.di.trans.steps.regexeval.RegexEval.processRow(RegexEval.java:208)
    Regex Evaluation.0 - ERROR (version 4.2.0-stable, build 15748 from 2011-09-08 13.11.42 by buildguy) : at org.pentaho.di.trans.step.RunThread.run(RunThread.java:40)
    Regex Evaluation.0 - ERROR (version 4.2.0-stable, build 15748 from 2011-09-08 13.11.42 by buildguy) : at java.lang.Thread.run(Thread.java:680)
    Regex Evaluation.0 - ERROR (version 4.2.0-stable, build 15748 from 2011-09-08 13.11.42 by buildguy) : Caused by: org.pentaho.di.core.exception.KettleStepException:
    Regex Evaluation.0 - ERROR (version 4.2.0-stable, build 15748 from 2011-09-08 13.11.42 by buildguy) : The number of capture groups in the regular expression (3) does not match the number of fields specified (0)!
    Regex Evaluation.0 - ERROR (version 4.2.0-stable, build 15748 from 2011-09-08 13.11.42 by buildguy) :
    Regex Evaluation.0 - ERROR (version 4.2.0-stable, build 15748 from 2011-09-08 13.11.42 by buildguy) :
    Regex Evaluation.0 - ERROR (version 4.2.0-stable, build 15748 from 2011-09-08 13.11.42 by buildguy) : at org.pentaho.di.trans.steps.regexeval.RegexEval.processRow(RegexEval.java:161)
    Regex Evaluation.0 - ERROR (version 4.2.0-stable, build 15748 from 2011-09-08 13.11.42 by buildguy) : ... 2 more
    by Axel Christ
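The mismatch in the report above is between the capturing groups a pattern declares and the fields configured to receive them. As a minimal sketch using only java.util.regex (the class and method names are hypothetical, and this is not the step's own code), the group count of a pattern can be checked before it is entered into the step. For the pattern exactly as printed in the report the count is 1; the log reports 3, presumably because the real expression differed from the sanitized one. Rewriting a group as non-capturing with (?:...) brings the count to 0; whether that sidesteps the error in PDI 4.2.0 is an assumption that has not been verified here.

```java
import java.util.regex.Pattern;

/** Sketch: count capturing groups before wiring a regex into a step. */
class CaptureGroupCheck {

    /** Returns the number of capturing groups declared in the pattern. */
    static int countGroups(String regex) {
        // groupCount() depends only on the pattern, so an empty input is enough.
        return Pattern.compile(regex).matcher("").groupCount();
    }

    /** Hypothetical pre-flight check: do the groups match the configured fields? */
    static boolean groupsMatchFields(String regex, int configuredFields) {
        return countGroups(regex) == configuredFields;
    }

    public static void main(String[] args) {
        String capturing = ".*XXX(140110|145250)XXX.*";      // pattern as printed in the report
        String nonCapturing = ".*XXX(?:140110|145250)XXX.*"; // same alternation, no capture

        System.out.println(countGroups(capturing));             // 1
        System.out.println(countGroups(nonCapturing));          // 0
        System.out.println(groupsMatchFields(nonCapturing, 0)); // true
    }
}
```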
  • Implementing a Pentaho MapReduce application. In the Mapper transformation, the customer needs to validate an XML file. The XSD file used to validate the XML is stored in HDFS. In the XSD Validator, I set the following fields:
    XSD Source = is a file, filename is defined in a field
    XSD filename field = xsd_file_url
    where xsd_file_url = hdfs://my.namenode:8020/test/hbase_mr_xml/xsd/Car_v1.xsd
    I get the following error:
    2015/03/25 22:10:28 - XSD Validator.0 - ERROR (version 5.3.0.0-213, build 1 from 2015-02-02_12-17-08 by buildguy) : Unexpected error
    2015/03/25 22:10:28 - XSD Validator.0 - ERROR (version 5.3.0.0-213, build 1 from 2015-02-02_12-17-08 by buildguy) : org.pentaho.di.core.exception.KettleStepException:
    2015/03/25 22:10:28 - XSD Validator.0 - Error while processing
    2015/03/25 22:10:28 - XSD Validator.0 -
    2015/03/25 22:10:28 - XSD Validator.0 - The schema cannot be created by a org.pentaho.hdfs.vfs.HDFSFileObject
    2015/03/25 22:10:28 - XSD Validator.0 -
    2015/03/25 22:10:28 - XSD Validator.0 -
    2015/03/25 22:10:28 - XSD Validator.0 - at org.pentaho.di.trans.steps.xsdvalidator.XsdValidator.processRow(XsdValidator.java:303)
    2015/03/25 22:10:28 - XSD Validator.0 - at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
    2015/03/25 22:10:28 - XSD Validator.0 - at java.lang.Thread.run(Unknown Source)
    2015/03/25 22:10:28 - XSD Validator.0 - Caused by: org.pentaho.di.core.exception.KettleStepException:
    2015/03/25 22:10:28 - XSD Validator.0 - The schema cannot be created by a org.pentaho.hdfs.vfs.HDFSFileObject
    2015/03/25 22:10:28 - XSD Validator.0 -
    2015/03/25 22:10:28 - XSD Validator.0 - at org.pentaho.di.trans.steps.xsdvalidator.XsdValidator.processRow(XsdValidator.java:226)
    2015/03/25 22:10:28 - XSD Validator.0 - ... 2 more
    Reproduction steps:
    1. Download the attached KTR, XML, and XSD files.
    2. Copy the XSD file into an HDFS directory.
    3. Open the transformation. Update file references for the XML and XSD files.
    4. Execute the transformation.
    Expected Results: The transformation runs successfully, and the XML file is reported as "Valid".
    Actual Results: The transformation ends with an error (see above).
    by Hemal Govind
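The error above says the schema could not be created from an HDFS file object. As a point of comparison, the standard javax.xml.validation API can build a Schema from any InputStream, so a schema does not have to come from a local file. The sketch below illustrates that with an in-memory XSD; the class name, the tiny schema, and the sample documents are all hypothetical stand-ins (not Car_v1.xsd), and this is not the XSD Validator step's implementation or the fix Pentaho applied.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

/** Sketch: validate XML against an XSD that is only available as a stream. */
class StreamSchemaValidation {

    /** Compiles the XSD read from the stream and reports whether the XML conforms to it. */
    static boolean isValid(InputStream xsdStream, String xml) {
        Schema schema;
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // StreamSource accepts any InputStream, whatever file system produced it.
            schema = factory.newSchema(new StreamSource(xsdStream));
        } catch (SAXException e) {
            throw new IllegalArgumentException("XSD could not be compiled", e);
        }
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the document violates the schema
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hypothetical in-memory schema standing in for a file opened from a remote store. */
    static InputStream demoXsd() {
        String xsd =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
          + "<xs:element name=\"car\" type=\"xs:string\"/>"
          + "</xs:schema>";
        return new ByteArrayInputStream(xsd.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(isValid(demoXsd(), "<car>Beetle</car>")); // true
        System.out.println(isValid(demoXsd(), "<truck/>"));          // false
    }
}
```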
  • When performing a dimension lookup/update with a "fields tab->table stream field to compare with", if the stream field name does not match the dimension field name, you get a lookup error if the row already exists in the table. If the row does not exist, it is inserted fine, so the error only affects updates. The logic that maps stream field names to return row field names seems wrong in this case. I had a table dimension field called "STEP_START_TIME" and a stream field called "STEP_START_TIMESTAMP", and because the names did not match it threw an error. I was able to work around it by changing the name to match. Since the insert works and the user is allowed to define the stream field name, the step should not be comparing the string names, at least not as it does now. Perhaps it should compare the meta.fieldLookup fields with the data.outputRowMeta if it has to match names. Here is the error.
    {noformat}
    2016/09/29 20:55:53 - PopulateStepDimension.0 - Dimension entry found : [1], [1], [2016/09/16 15:19:42.051000000], [2016/09/16 16:19:43.000000000]
    2016/09/29 20:55:53 - PopulateStepDimension.0 - ERROR (version 6.1.0.1-196, build 1 from 2016-04-07 12.08.49 by buildguy) : Because of an error this step can't continue:
    2016/09/29 20:55:53 - PopulateStepDimension.0 - Error comparing fields - cannot find lookup field [STEP_START_TIMESTAMP]
    2016/09/29 20:55:53 - PopulateStepDimension.0 - ERROR (version 6.1.0.1-196, build 1 from 2016-04-07 12.08.49 by buildguy) : org.pentaho.di.core.exception.KettleStepException:
    2016/09/29 20:55:53 - PopulateStepDimension.0 - Error comparing fields - cannot find lookup field [STEP_START_TIMESTAMP]
    2016/09/29 20:55:53 - PopulateStepDimension.0 -
    2016/09/29 20:55:53 - PopulateStepDimension.0 - at org.pentaho.di.trans.steps.dimensionlookup.DimensionLookup.lookupValues(DimensionLookup.java:645)
    2016/09/29 20:55:53 - PopulateStepDimension.0 - at org.pentaho.di.trans.steps.dimensionlookup.DimensionLookup.processRow(DimensionLookup.java:229)
    2016/09/29 20:55:53 - PopulateStepDimension.0 - at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
    2016/09/29 20:55:53 - PopulateStepDimension.0 - at java.lang.Thread.run(Thread.java:745)
    {noformat}
    Here are debug notes:
    {noformat}
    rowMeta RowMeta (id=417) - stream rows
      [JOB_ID String(32)], [WORKFLOW_NAME String(32)], [PHASE_NAME String(32)], [STEP_NAME String(32)], [STEP_START_TIMESTAMP Timestamp], [DEFAULT_STEP_END_TIMESTAMP Timestamp], [StepEndTimeStamp Timestamp], [Thread String(15)], [Server String(32)], [Workflow String(14)], [Phase String(8)], [Step String(25)], [JobId String(32)], [JobName String(6)], [Queued Number(4, 2)], [Processing Number(4, 2)], [User Number(4, 2)], [System Number(4, 2)], [Returncode Integer(15)]
    row Object[39] (id=419)
      [10000025, PrintDocuments, Assemble, BuildAFPFromDocuments, 2016-09-16 15:19:42.051, 2016-09-29 23:53:27.719477, 2016-09-16 16:19:43.0, 468552274, System, PrintDocuments, Assemble, BuildAFPFromDocuments, 10000025, null, 0.06, 1.04, 0.01, 0.01, 0, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null]
    lookupRowMeta RowMeta (id=422) - database rows for query
      [JOB_ID String(32)], [WORKFLOW_NAME String(32)], [PHASE_NAME String(32)], [STEP_NAME String(32)], [SCD_EFFECTIVE_FROM Date], [SCD_EFFECTIVE_TO Date]
    lookupRow Object[6] (id=424)
      [10000025, PrintDocuments, Assemble, BuildAFPFromDocuments, Thu Sep 29 23:53:27 BST 2016, Thu Sep 29 23:53:27 BST 2016]
    returnRow Object[14] (id=426) -- what was looked up
      [1, 1, 2016-09-16 15:19:42.051, 2016-09-16 16:19:43.0, null, null, null, null, null, null, null, null, null, null]
    returnRowMeta RowMeta (id=497) (first 2 fields are ignored)
      [STEP_KEY Integer(9)], [SCD_VERSION Integer(9)], [STEP_START_TIME Timestamp], [STEP_END_TIME Timestamp]
    meta.fieldStream String[3] (id=396)
      [STEP_START_TIMESTAMP, StepEndTimeStamp, null]
    meta.fieldLookup String[3] (id=524)
      [STEP_START_TIME, STEP_END_TIME, SCD_CURRENT]
    outputRowMeta RowMeta (id=488) -- what we will be writing to the output stream
      [JOB_ID String(32)], [WORKFLOW_NAME String(32)], [PHASE_NAME String(32)], [STEP_NAME String(32)], [STEP_START_TIMESTAMP Timestamp], [DEFAULT_STEP_END_TIMESTAMP Timestamp], [StepEndTimeStamp Timestamp], [Thread String(15)], [Server String(32)], [Workflow String(14)], [Phase String(8)], [Step String(25)], [JobId String(32)], [JobName String(6)], [Queued Number(4, 2)], [Processing Number(4, 2)], [User Number(4, 2)], [System Number(4, 2)], [Returncode Integer(15)], [STEP_KEY Integer(9)]
    fieldnrs (id=477) (indexes into output fields for field stream entries)
      [4, 6, -1]
    {noformat}
    --loop fields in stream that should be added to the stream from the GUI
    {noformat}
    for ( int i = 0; i < meta.getFieldStream().length; i++ ) {
      if ( data.fieldnrs[i] >= 0 ) { // make sure it is a real field
        // Only compare real fields, not last updated row, last version, etc
        //
        // V1==stream data
        ValueMetaInterface v1 = data.outputRowMeta.getValueMeta( data.fieldnrs[i] ); // v1 ValueMetaTimestamp (id=388) STEP_START_TIMESTAMP Timestamp
        Object valueData1 = row[data.fieldnrs[i]]; // 2016-09-16 15:19:42.051
        findColumn = meta.getFieldStream()[i]; // STEP_START_TIMESTAMP
        // find the returnRowMeta based on the field in the fieldLookup list
        ValueMetaInterface v2 = null;
        Object valueData2 = null;
        // Fix for PDI-8122
        // See if it's already been computed.
        returnRowColNum = columnLookupArray[i]; // -1
        if ( returnRowColNum == -1 ) {
          // It hasn't been found yet - search the list and make sure we're comparing
          // the right column to the right column.
          for ( int j = 2; j < data.returnRowMeta.size(); j++ ) { // starting at 2 because I know that 0 and 1 are
            // poked in by Kettle.
            // V2==table data
            v2 = data.returnRowMeta.getValueMeta( j ); // STEP_START_TIME Timestamp, STEP_END_TIME Timestamp
            if ( ( v2.getName() != null ) && ( v2.getName().equalsIgnoreCase( findColumn ) ) ) { // is this the
              // right column?
              //STEP_START_TIME, STEP_END_TIME
              columnLookupArray[i] = j; // yes - record the "j" into the columnLookupArray at [i] for the next time
              // through the loop
              valueData2 = returnRow[j]; // get the valueData2 for comparison
              break; // get outta here.
            } else {
              // Reset to null because otherwise, we'll get a false finding at the end.
              // This could be optimized to use a temporary variable to avoid the repeated set if necessary
              // but it will never be as slow as the database lookup anyway
              v2 = null;
            }
          }
        } else {
          // We have a value in the columnLookupArray - use the value stored there.
          v2 = data.returnRowMeta.getValueMeta( returnRowColNum );
          valueData2 = returnRow[returnRowColNum];
        }
        if ( v2 == null ) {
          // If we made it here, then maybe someone tweaked the XML in the transformation
          // and we're matching a stream column to a column that doesn't really exist. Throw an exception.
          throw new KettleStepException( BaseMessages.getString( PKG, "DimensionLookup.Exception.ErrorDetectedInComparingFields", meta.getFieldStream()[i] ) );
        }
        try {
          cmp = v1.compare( valueData1, v2, valueData2 );
        } catch ( ClassCastException e ) {
          throw e;
        }
    {noformat}
    by Adam Swartz
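The name mismatch described in the report above can be reproduced in isolation with the values from the debug notes. The sketch below is a simplified model, not Pentaho's code and not the fix that was applied: the class and method names are hypothetical, and the three lists are copied from meta.fieldStream, meta.fieldLookup, and the returnRowMeta field names. It shows that searching the return row by the stream field name finds nothing, while searching by the lookup field at the same index finds the expected column.

```java
import java.util.Arrays;
import java.util.List;

/** Simplified model of the column search in the quoted loop (hypothetical names). */
class LookupColumnMapping {

    // Values copied from the debug notes in the report.
    static final List<String> FIELD_STREAM =
        Arrays.asList("STEP_START_TIMESTAMP", "StepEndTimeStamp", null);
    static final List<String> FIELD_LOOKUP =
        Arrays.asList("STEP_START_TIME", "STEP_END_TIME", "SCD_CURRENT");
    static final List<String> RETURN_ROW_NAMES =
        Arrays.asList("STEP_KEY", "SCD_VERSION", "STEP_START_TIME", "STEP_END_TIME");

    /** Case-insensitive search starting at index 2, as in the quoted loop. */
    static int findReturnColumn(String wanted) {
        for (int j = 2; j < RETURN_ROW_NAMES.size(); j++) {
            String name = RETURN_ROW_NAMES.get(j);
            if (name != null && name.equalsIgnoreCase(wanted)) {
                return j;
            }
        }
        return -1; // not found: the quoted code throws KettleStepException in this case
    }

    /** What the quoted loop does: search by the stream field name. */
    static int byStreamName(int i) {
        return findReturnColumn(FIELD_STREAM.get(i));
    }

    /** What the reporter suggests: search by the dimension (lookup) field name. */
    static int byLookupName(int i) {
        return findReturnColumn(FIELD_LOOKUP.get(i));
    }

    public static void main(String[] args) {
        System.out.println(byStreamName(0)); // -1
        System.out.println(byLookupName(0)); // 2
        System.out.println(byLookupName(1)); // 3
    }
}
```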
  • There are 2 errors in 5.0 that did not occur in 4.4.0.0 with the Regex Evaluation step when "Replace previous fields" is checked. First, when the output fields do exist, the Regex Evaluation step gives the following error and fails, so you cannot replace previous fields:
    2013/09/20 08:59:14 - Regex Evaluation.0 - ERROR (version 5.0.0.1, build 1 from 2013-09-11_16-51-19 by buildguy) : java.lang.ArrayIndexOutOfBoundsException: 13
    2013/09/20 08:59:14 - Regex Evaluation.0 - at org.pentaho.di.trans.steps.regexeval.RegexEval.processRow(RegexEval.java:145)
    2013/09/20 08:59:14 - Regex Evaluation.0 - at org.pentaho.di.trans.step.RunThread.run(RunThread.java:60)
    2013/09/20 08:59:14 - Regex Evaluation.0 - at java.lang.Thread.run(Thread.java:724)
    Second, when the output fields do NOT exist, the Regex Evaluation step gives the following error and fails. In 4.4.0.0 the output fields did not have to exist and you could still check the "Replace previous fields" option and the step would still work.
    2013/09/20 08:43:20 - Regex Evaluation.0 - ERROR (version 5.0.0.1, build 1 from 2013-09-11_16-51-19 by buildguy) : Unexpected error
    2013/09/20 08:43:20 - Regex Evaluation.0 - ERROR (version 5.0.0.1, build 1 from 2013-09-11_16-51-19 by buildguy) : org.pentaho.di.core.exception.KettleStepException:
    2013/09/20 08:43:20 - Regex Evaluation.0 - org.pentaho.di.core.exception.KettleStepException:
    2013/09/20 08:43:20 - Regex Evaluation.0 - We cannot find result field to replace [result]
    2013/09/20 08:43:20 - Regex Evaluation.0 -
    2013/09/20 08:43:20 - Regex Evaluation.0 -
    2013/09/20 08:43:20 - Regex Evaluation.0 - We cannot find result field to replace [result]
    2013/09/20 08:43:20 - Regex Evaluation.0 -
    2013/09/20 08:43:20 - Regex Evaluation.0 -
    2013/09/20 08:43:20 - Regex Evaluation.0 - at org.pentaho.di.trans.steps.regexeval.RegexEvalMeta.getFields(RegexEvalMeta.java:507)
    2013/09/20 08:43:20 - Regex Evaluation.0 - at org.pentaho.di.trans.steps.regexeval.RegexEval.processRow(RegexEval.java:83)
    2013/09/20 08:43:20 - Regex Evaluation.0 - at org.pentaho.di.trans.step.RunThread.run(RunThread.java:60)
    2013/09/20 08:43:20 - Regex Evaluation.0 - at java.lang.Thread.run(Thread.java:724)
    2013/09/20 08:43:20 - Regex Evaluation.0 - Caused by: org.pentaho.di.core.exception.KettleStepException:
    2013/09/20 08:43:20 - Regex Evaluation.0 - We cannot find result field to replace [result]
    2013/09/20 08:43:20 - Regex Evaluation.0 -
    2013/09/20 08:43:20 - Regex Evaluation.0 - at org.pentaho.di.trans.steps.regexeval.RegexEvalMeta.getFields(RegexEvalMeta.java:473)
    2013/09/20 08:43:20 - Regex Evaluation.0 - ... 3 more
    by Chris Deptula
    • org.pentaho.di.core.exception.KettleStepException:
      2016/02/01 14:07:56 - PK Columns.0 - Return value KEY_SEQ can't be found in the input row.
      2016/02/01 14:07:56 - PK Columns.0 -
        at org.pentaho.di.trans.steps.streamlookup.StreamLookupMeta.getFields(StreamLookupMeta.java:184)
        at org.pentaho.di.trans.TransMeta.getThisStepFields(TransMeta.java:2042)
        at org.pentaho.di.trans.TransMeta.getStepFields(TransMeta.java:1871)
        at org.pentaho.di.trans.TransMeta.getStepFields(TransMeta.java:1834)
        at org.pentaho.di.trans.TransMeta.getStepFields(TransMeta.java:1834)
        at org.pentaho.di.trans.TransMeta.getPrevStepFields(TransMeta.java:1940)
        at org.pentaho.di.trans.TransMeta.getPrevStepFields(TransMeta.java:1905)
        at org.pentaho.di.trans.steps.groupby.GroupBy.processRow(GroupBy.java:106)
        at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
        at java.lang.Thread.run(Thread.java:745)