Summary

This algorithm filters data records by comparing their values with absolute limits, with statistical reference values (average, average with standard deviation, median and percentiles) and with lists of special values. Changes in value relative to the previous data record can also be checked.

Configuration

Input settings of existing table

Apply to the following columns (System.Object)
Enter the names of the columns to analyze, e.g. A, C-H, K.
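The column list accepts single columns and ranges. As a rough illustration, the Python sketch below expands such a list into individual column names. It assumes spreadsheet-style letter names, commas between entries and a hyphen for ranges; the function names are hypothetical and this is not the operation's own parser.

```python
import re
from typing import List


def column_index(letters: str) -> int:
    """Convert spreadsheet-style letters (A..Z, AA, ...) to a 1-based index."""
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def column_letters(index: int) -> str:
    """Convert a 1-based column index back to spreadsheet-style letters."""
    letters = ""
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def parse_columns(spec: str) -> List[str]:
    """Expand a column list such as 'A, C-H, K' into single column names."""
    columns: List[str] = []
    for part in spec.split(","):
        part = part.strip().upper()
        if not part:
            continue
        match = re.fullmatch(r"([A-Z]+)\s*-\s*([A-Z]+)", part)
        if match:
            # A range such as C-H: expand every column from start to end.
            start, end = (column_index(g) for g in match.groups())
            columns.extend(column_letters(i) for i in range(start, end + 1))
        elif re.fullmatch(r"[A-Z]+", part):
            columns.append(part)
        else:
            raise ValueError(f"invalid column specification: {part!r}")
    return columns


print(parse_columns("A, C-H, K"))  # ['A', 'C', 'D', 'E', 'F', 'G', 'H', 'K']
```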

Settings

Action (System.String)
Which data records should the operation keep in the result?
  • All rows
  • Only rows that fulfill ALL criteria
  • Only rows that fulfill NO criteria

Value less than (System.String, optional)
The entered absolute value is compared with each selected column. The criterion is fulfilled if the column value is less than the entered value.

Value greater than (System.String, optional)
The entered absolute value is compared with each selected column. The criterion is fulfilled if the column value is greater than the entered value.

Valid value quantity (System.String, optional)
Several values separated by semicolons and/or spaces can be entered here. The selected columns are searched for these values, and the flag in the result column is set to 1 when one is found.

Invalid value quantity (System.String, optional)
Several values separated by semicolons and/or spaces can be entered here. The selected columns are searched for these values, and the flag in the result column is set to 1 when one is found.

Statistical method (System.String)
Which statistical method should be used? The comparison values for the method are entered in the fields Value less than ... percent from the statistical method and Value greater than ... percent from the statistical method.
  • Average
  • Median
  • Average with standard deviation
  • Percentile 1
  • Percentile 5
  • Percentile 95
  • Percentile 99

Value less than ... percent from the statistical method (System.String, optional)
Enter a percentage here that is used with the selected statistical method.

Value greater than ... percent from the statistical method (System.String, optional)
Enter a percentage here that is used with the selected statistical method.

Values change in relation to the previous value (System.String, optional)
Here you can select methods that compare each data record with the previous data record. The comparison value is entered in the field Value change against previous value.
  • Increase (compared to previous value) by more than ...
  • Increase (compared to previous value) by not more than ...
  • Increase (compared to previous value) by more than ... percent
  • Increase (compared to previous value) by not more than ... percent
  • Decrease (compared to previous value) by more than ...
  • Decrease (compared to previous value) by not more than ...
  • Decrease (compared to previous value) by more than ... percent
  • Decrease (compared to previous value) by not more than ... percent

Value change against previous value (System.String, optional)
An absolute value for the method selected in the field Values change in relation to the previous value.

Ignore values with 0 as previous value (System.Boolean, optional)
Select whether data records whose previous value is 0 should be ignored.

Result column (System.String)
Name of the column to which the result is written.
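The Action setting and the AND rule described under Remarks can be pictured with the following minimal Python sketch. Each criterion is modelled as a hypothetical predicate on a single cell value, and a row is flagged with 1 only if all configured criteria are fulfilled. The handling of "Only rows that fulfill NO criteria" follows a literal reading (none of the individual criteria is fulfilled), which is an assumption; the operation may instead keep every row whose flag is 0.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical model: a criterion is a predicate on one cell value.
Criterion = Callable[[float], bool]


def evaluate(values: Sequence[float], criteria: Sequence[Criterion],
             action: str) -> List[Tuple[float, int]]:
    """Return (value, flag) pairs for the rows kept by the given action."""
    # Evaluate every criterion for every value.
    results = [[criterion(v) for criterion in criteria] for v in values]
    # AND combination: flag 1 only if at least one criterion is configured
    # and all of them are fulfilled.
    flags = [1 if r and all(r) else 0 for r in results]

    if action == "All rows":
        keep = [True] * len(values)
    elif action == "Only rows that fulfill ALL criteria":
        keep = [f == 1 for f in flags]
    elif action == "Only rows that fulfill NO criteria":
        # Literal reading (assumption): no single criterion is fulfilled.
        keep = [not any(r) for r in results]
    else:
        raise ValueError(f"unknown action: {action!r}")

    return [(v, f) for v, f, k in zip(values, flags, keep) if k]


# Two illustrative criteria: greater than 2 and greater than 6.
criteria = [lambda v: v > 2, lambda v: v > 6]
values = [1, 5, 9]
print(evaluate(values, criteria, "All rows"))                             # [(1, 0), (5, 0), (9, 1)]
print(evaluate(values, criteria, "Only rows that fulfill ALL criteria"))  # [(9, 1)]
print(evaluate(values, criteria, "Only rows that fulfill NO criteria"))   # [(1, 0)]
```

The value 5 shows where the two readings differ: it fulfills one criterion, so its flag is 0, yet it is not kept under the literal "NO criteria" reading.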

Remarks

  • This operator can analyze multiple columns in one run.
  • All calculations and comparisons are performed separately for each column; for example, the average is calculated per column, not across multiple columns.
  • All comparisons are combined with AND: all conditions have to be met.
  • For more detailed information, see: Statistical methods
  • Value change compared to the predecessor: the data records are processed in the order given; no sorting takes place. The first data record is used only as a predecessor, so that it is never counted as an outlier.
  • Problems with 0 values:
    • A 0 in the data can produce a large change in value, so that normal values are flagged as outliers. For this case there is the setting Ignore values with 0 as previous value.
    • For percentage changes where the previous value is 0, a constant is used in place of the 0 to prevent division-by-zero errors. This produces changes in the billions, which represent an almost infinite slope.
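Because every statistic is calculated separately for each column, the reference values can be sketched per column as below. This is an illustration under assumptions: percentiles use linear interpolation (`statistics.quantiles` with `method="inclusive"`), which may differ from the operation's own percentile definition, and "Average with standard deviation" is left out because its exact reference value is not described here. The names `reference_value` and `references_per_column` are hypothetical.

```python
import statistics
from typing import Dict, Mapping, Sequence


def reference_value(values: Sequence[float], method: str) -> float:
    """Reference value of one column for the given statistical method."""
    if method == "Average":
        return statistics.fmean(values)
    if method == "Median":
        return statistics.median(values)
    if method in ("Percentile 1", "Percentile 5", "Percentile 95", "Percentile 99"):
        p = int(method.split()[1])
        # quantiles(n=100) returns the 99 cut points P1..P99; Pp is at index p - 1.
        # "inclusive" interpolates linearly between sorted values (assumption).
        # Requires at least two values.
        return statistics.quantiles(values, n=100, method="inclusive")[p - 1]
    raise ValueError(f"method not covered by this sketch: {method!r}")


def references_per_column(table: Mapping[str, Sequence[float]],
                          method: str) -> Dict[str, float]:
    """Compute the reference separately for each column, never across columns."""
    return {name: reference_value(column, method) for name, column in table.items()}


table = {"B": [2, 4, 4, 5, 10], "C": [1, 3, 8]}  # hypothetical sample data
print(references_per_column(table, "Average"))  # {'B': 5.0, 'C': 4.0}
print(references_per_column(table, "Median"))   # {'B': 4, 'C': 3}
```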

Want to learn more?

Examples

Example: Value smaller than 5

Situation

  • In the value column of data node A01, the data records with values less than 5 should be found.

Settings

  • Add the operation 'Outlier search' to data node A01.
  • Under Analyze, enter the following column(s): 'B'.
  • Under Value less than, enter '5'.

Result


  • Each data record is compared with the entered value; if the value of column B is less than 5, column C is set to 1.
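The absolute comparison amounts to a strict less-than or greater-than test per value, so a value equal to the limit is not flagged. A small sketch with hypothetical sample data for column B; `absolute_flags` is an illustrative name, not part of the operation:

```python
from typing import List, Optional, Sequence


def absolute_flags(values: Sequence[float],
                   less_than: Optional[float] = None,
                   greater_than: Optional[float] = None) -> List[int]:
    """Flag 1 where every configured absolute limit is fulfilled (strict comparison)."""
    flags = []
    for v in values:
        checks = []
        if less_than is not None:
            checks.append(v < less_than)
        if greater_than is not None:
            checks.append(v > greater_than)
        # Both limits set: AND combination, as stated under Remarks.
        flags.append(1 if checks and all(checks) else 0)
    return flags


column_b = [3, 7, 5, 1]  # hypothetical sample data
print(absolute_flags(column_b, less_than=5))     # [1, 0, 0, 1]
print(absolute_flags(column_b, greater_than=5))  # [0, 1, 0, 0]
```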


Project File

-

Example: Value greater than 5

Situation

  • In the value column of data node A01, the data records with values greater than 5 should be found.

Settings

  • Add the operation 'Outlier search' to data node A01.
  • Under Analyze, enter the following column(s): 'B'.
  • Under Value greater than, enter '5'.

Result


  • Each data record is compared with the entered value; if the value of column B is greater than 5, column C is set to 1.


Project File

-


Example: Median (smaller/greater)

Situation

  • In the value column of data node A01, the data records whose values are 20% smaller than the median should be found.

Settings

  • Add the operation 'Outlier search' to data node A01.
  • Under Analyze, enter the following column(s): 'B'.
  • Select 'Median' as the statistical method.
  • Under Value less than ... percent from the statistical method, enter '20'.

Result


  • The median of column B is calculated. For each row, the value in column B is compared with the median; if the value is more than 20 percent below the median, column C is set to 1.
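One way to read "20 percent below the median" is value < median × (1 − 20/100). The sketch below implements that reading, plus the mirrored check for the field Value greater than ... percent from the statistical method. Both formulas are assumptions based on the example wording, they assume a positive reference value, and the sample data and function names are hypothetical.

```python
import statistics
from typing import List, Sequence


def below_reference_flags(values: Sequence[float], percent: float,
                          reference: float) -> List[int]:
    """Flag values more than `percent` percent below a positive reference."""
    limit = reference * (1 - percent / 100)
    return [1 if v < limit else 0 for v in values]


def above_reference_flags(values: Sequence[float], percent: float,
                          reference: float) -> List[int]:
    """Flag values more than `percent` percent above a positive reference."""
    limit = reference * (1 + percent / 100)
    return [1 if v > limit else 0 for v in values]


column_b = [10.0, 9.0, 7.5, 13.0, 11.0]  # hypothetical sample data
median_b = statistics.median(column_b)   # 10.0
print(below_reference_flags(column_b, 20, median_b))  # [0, 0, 1, 0, 0]
print(above_reference_flags(column_b, 20, median_b))  # [0, 0, 0, 1, 0]
```

For the average example, `statistics.fmean(column_b)` can be passed as the reference instead of the median.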


Project File

-

Example: Average (smaller/greater)

Situation

  • In the value column of node A01, all values that are smaller than the average by at least 21% should be found.

Settings

  • Add the operation 'Outlier search' to data node A01.
  • Under Analyze, enter the following column(s): 'B'.
  • Select 'Average' as the statistical method.
  • Under Value less than ... percent from the statistical method, enter '21'.

Result


  • The average of column B is calculated. For each row, the value in column B is compared with the average; if the value is at least 21 percent below the average, column C is set to 1.


Project File

-


Example: Average with Standard deviation (smaller)

Situation

  • In the value column of node A01, all values that are smaller than the average with standard deviation by at least 20% should be found.

Settings

  • Add the operation 'Outlier search' to data node A01.
  • Under Analyze, enter the following column(s): 'B'.
  • Select 'Average with standard deviation' as the statistical method.
  • Under Value less than ... percent from the statistical method, enter '20'.

Result


  • First, the average and the standard deviation of column B are calculated. Each value is then compared with the resulting reference; if the deviation is too large, the flag in column C is set to 1.


Project File

-


Example: Percentile

Situation

  • In column B of data node A04, the data records whose values are 1 percent smaller than the 5th percentile should be flagged.

Settings

  • Add the operation 'Outlier search' to data node A04.
  • Under Analyze, enter the following column(s): 'B'.
  • Under Action, select 'Only rows that fulfill ALL criteria'.
  • Select 'Percentile 5' as the statistical method.
  • Under Value less than ... percent from the statistical method, enter '1'.

Result


  • The 5th percentile of column B is calculated. Data records whose values are more than 1 percent below it are flagged, and because the action keeps only rows that fulfill all criteria, only these data records remain in the result.
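A sketch of this example, assuming the 5th percentile is computed with linear interpolation and that "1 percent smaller than the 5th percentile" means value < P5 × (1 − 1/100). Only the flagged rows are kept, as with the action 'Only rows that fulfill ALL criteria'. The data is invented for illustration.

```python
import statistics

column_b = [4.0, 9.5, 10.0, 10.2, 10.4, 10.6, 10.8, 11.0, 11.2, 11.5]  # hypothetical

# 5th percentile with linear interpolation (assumption); index 4 holds P5.
p5 = statistics.quantiles(column_b, n=100, method="inclusive")[4]
# "1 percent below P5" read as value < P5 * (1 - 1/100).
limit = p5 * (1 - 1 / 100)

flags = [1 if v < limit else 0 for v in column_b]
# Action "Only rows that fulfill ALL criteria": keep only the flagged rows.
retained = [v for v, f in zip(column_b, flags) if f == 1]
print(round(p5, 3), retained)  # 6.475 [4.0]
```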


Project File

-


Example: Valid values

Situation

  • In the value column of data node A03, the data records with the values 0, 10 or 100 should be flagged.

Settings

  • Add the operation 'Outlier search' to data node A03.
  • Under Analyze, enter the following column(s): 'B'.
  • Under Valid value quantity, enter '0;10;100'.
  • Select 'Percentile 5' as the statistical method.
  • Under Value less than ... percent from the statistical method, enter '1'.

Result


  • The entered values are searched for in column B; if a match is found, column C is flagged.
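The special-value list can be parsed by splitting on semicolons and/or whitespace, as the field description states. The sketch compares values numerically, which is an assumption (the operation might compare text); the function names are hypothetical.

```python
import re
from typing import List, Sequence, Set


def parse_special_values(text: str) -> Set[float]:
    """Split the field content on semicolons and/or whitespace."""
    return {float(token) for token in re.split(r"[;\s]+", text.strip()) if token}


def special_value_flags(values: Sequence[float], special: Set[float]) -> List[int]:
    """Flag 1 where the value is one of the special values (numeric comparison)."""
    return [1 if v in special else 0 for v in values]


special = parse_special_values("0;10; 100")
print(sorted(special))                                      # [0.0, 10.0, 100.0]
print(special_value_flags([0, 5, 10, 100, 1000], special))  # [1, 0, 1, 1, 0]
```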


Project File

-

Example: Increase to predecessor by more than...

Situation

  • In column B of data node A02, the data records whose values increase by more than 5 compared with the predecessor should be flagged.

Settings

  • Add the operation 'Outlier search' to data node A02.
  • Under Analyze, enter the following column(s): 'B'.
  • Under Values change in relation to the previous value, select 'Increase (compared to previous value) by more than ...'.
  • Under Value change against previous value, enter '5'.

Result

All values that increase by more than 5 compared with the predecessor are flagged.

  • In the left result, values with 0 as the previous value are ignored.
  • In the right result, values with 0 as the previous value are not ignored.
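The predecessor comparison can be sketched as follows. Data records are processed in the given order, the first record only serves as a predecessor, and a predecessor of 0 is either skipped or, for percentage changes, replaced by a small constant, as described under Remarks. The constant `ZERO_SUBSTITUTE` and the function name are hypothetical; the constant the operation actually uses is not documented.

```python
from typing import List, Sequence

# Hypothetical substitute for a previous value of 0 in percentage mode;
# the constant actually used by the operation is not documented.
ZERO_SUBSTITUTE = 1e-9


def increase_flags(values: Sequence[float], more_than: float,
                   ignore_zero_previous: bool = False,
                   percent: bool = False) -> List[int]:
    """Flag records whose value rises by more than `more_than` versus the previous record."""
    if not values:
        return []
    flags = [0]  # the first record only serves as predecessor
    for prev, cur in zip(values, values[1:]):  # given order, no sorting
        if prev == 0 and ignore_zero_previous:
            flags.append(0)  # comparison skipped
            continue
        if percent:
            base = prev if prev != 0 else ZERO_SUBSTITUTE
            # abs() keeps the sign of the change for negative predecessors (assumption).
            change = (cur - prev) / abs(base) * 100
        else:
            change = cur - prev
        flags.append(1 if change > more_than else 0)
    return flags


column_b = [10, 12, 20, 0, 8, 9]  # hypothetical sample data
print(increase_flags(column_b, 5))                             # [0, 0, 1, 0, 1, 0]
print(increase_flags(column_b, 5, ignore_zero_previous=True))  # [0, 0, 1, 0, 0, 0]
```

The step from 0 to 8 is the only difference between the two runs, which mirrors the left and right results of this example.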

 

Project File

-

Troubleshooting

No known issues so far.


Related topics