Summary

Merge columns from the current table with columns from another data node's result table based on one or more key columns. This operator implements various join operations for relational databases.

Configuration

Settings

Name

Value

Opt.

Description

Merge the following columns with columns from the data source

System.String

opt.

A sequence of columns of the current table that serve as keys, i.e., that will be used to match rows of the other table.

The number of columns in this parameter and the type of the columns must equal the number of columns in parameter "Columns used as key in data source 1" and their type.

Data source

System.Int32

-

The name of the data node whose result table is to be merged with the current table.

Columns used as key in data source 1

System.String

opt.

A sequence of columns of the result table of the data node specified in parameter "Data source" that serve as keys, i.e., that will be used to match rows of the current table.

The number of columns in this parameter and the type of the columns must equal the number of columns in parameter "Merge the following columns with columns from the data source" and their type.

For further information see also Column references

How should the data be merged?

System.String

  • Intersection (same keys)
  • Add rows in data source 1
  • Add rows in data source 2
  • Add rows in both data sources
  • Append (same table structure required)
  • Keys only used in data source 1?
  • Keys only used in data source 2?
  • Keys only used in one data source
  • Cartesian product
  • Rows with common key in data source 2
  • Rows without common key in data source 2

-

An option that defines which rows and which columns from which source(s) are to be included in the resulting table.

See table "Overview of merging options and comparison to set theory and SQL" below for details.

See table "Examples" at the bottom of this page for examples for each of the options.

Ignore Time?

System.Boolean

-

If checked, comparisons will only consider the date and ignore the time of keys that are of type Date.

Generate blank row if nothing has been found

System.Boolean

opt.

If checked, a table containing a single row is returned instead of an empty table.

This row contains NULL in numerical columns and date columns and the text defined in parameter "Text fields content for blank line" in text columns.

Text fields content for blank line

System.String

opt.

The text to be inserted in text columns if parameter "Generate blank row if nothing has been found" is checked.

Double-values in key columns

  • No warning
  • Warning
  • Round to 3 decimals places
  • Round to 6 decimal places
  • Round to 9 decimal places

-

Default is Warning

If columns of data type double are used as Identier than one of the comparison behaviours must be selected.

Want to learn more?

Merge data from the current table with data from another data node's result table based on one or more key columns. This operation results in a table containing rows with matching keys and the set of columns defined in parameter "How should the data be merged?".

Overview of merging options and comparison to set theory and SQL

The following table describes the merge options set in parameter "How should the data be merged?" in detail. These options define which rows and which columns of the two tables are contained in the result table and how they are merged.

Keys play a central role in the merge for all options except "Append (same table structure required)" and "Cartesian product". They are responsible for the selection of rows from the two data sources. The keys are defined in parameters "Merge the following columns with columns from the data source" and "Columns used as key in data source 1". For each data source, more than one column can be defined as key. The number and types of keys set for each data source have to match. Keys given as Integer are treated as Decimals, therefore columns of these types also match.

Note that key values in the data sources are not necessarily unique, i.e., a key can occur in more than one row of a table. In this case, the set theory definition in the table is not adequate for options 1 to 4, 10, and 11. In this case, the result table contains the cartesian product of multiply occurring keys. E.g., if table A contains two occurrences of key A.1 and table B two occurrences of B.1, then the result table contains four rows merging each row with key A.1 with each row with key B.1.

The column "Set theory" in this table describes how the rows are selected according to the keys, i.e., which set of keys is contained in the result table.

The column "Result structure" describes which columns are included in the selected rows (described in column "Set theory"). For better readability, the term "key" is used although a sequence of keys, i.e., multiple columns, can be defined as key. The suffix "fields" refers to all fields of a selected row except those fields that are used as key.

For each option, the table also shows a rough correspondence to an SQL statement,

The current table is referred to by "A" and the table from another data node is referred to by "B".



How should the data be merged?

old name

Deutsch

alte Bezeichnung

SQL 

Set theory

Result column structure

Description

1

Rows with matching identifiers (columns of both data nodes)

Intersection (same keys)


Zeilen mit übereinstimmenden Identifizierern (Spalten beider Datenknoten)

Schnittmenge (gleiche Schlüssel)    

SELECT * FROM A INNER JOIN B 
ON A.key = B.key

A.key n B.key (Intersection)

A.key; A.fields; B.fields

The result table contains a row for each key that occurs in both tables. For each row, all columns from both tables are added. Except for the key column, columns with the same name are renamed.

2

Add to rows of current data node

Add rows in data source 1

Zeilen des aktuellen Datenknotens ergänzen

Zeilen aus Datenknoten 1 ergänzen

SELECT * FROM A LEFT OUTER JOIN B 
ON A.key = B.key

(A.key n B.key) u A.key

A.key; A.fields; B.fields

The result table contains a row for each key that occurs in the current table. To each row, all columns from both tables are added. Rows with keys in both tables are merged. Except for the key column, columns with the same name are renamed.

3

Add to rows of incoming data node

Add rows in data source 2

Zeilen des eingehenden Datenknotens ergänzen

Zeilen aus Datenknoten 2 ergänzen

SELECT * FROM A RIGHT  OUTER JOIN B 
ON A.key = B.key

 (A.key n B.key) u B.key

A.key; A.fields; B.fields

The result table contains a row for each key that occurs in the other data node's table. To each row, all columns from both tables are added. Rows with keys in both tables are merged. Except for the key column, columns with the same name are renamed.

Add to rows of both data nodes

Add rows in both data sources

Zeilen beider Datenknoten ergänzen

Zeilen beider Datenknoten ergänzen

SELECT * FROM A FULL OUTER JOIN B 
ON A.key = B.key

 A.key u B.key (Union)

A.key; A.fields; B.fields

The result table contains a row for each key that occurs in either of the two tables. For each row, all columns from both tables are added. Rows with keys in both tables are merged. Except for the key column, columns with the same name are renamed.

5

Append (same table structure)

Append (same table structure required)

Aneinanderhängen (gleiche Tabellenstruktur)

Aneinanderhängen (nur bei gleicher Tabellenstruktur) 

SELECT * FROM A
 UNION
 SELECT * FROM B

n/a

A.key; A.fields

The result table of the other data node is appended to the current table. Both tables must have the same schema. 

 6

Number of rows occurring only in current data node

Keys only used in data source 2?

Anzahl Zeilen ausschließlich im aktuellen Datenknoten

Schlüssel aus Datenknoten 1 kommen nur dort vor?

SELECT A.key, COUNT(A.key) 
 FROM A 
 WHERE A.key NOT IN (SELECT B.key FROM B) 
 GROUP BY A.key

A.key \ B.key (Difference)

A.key; #

For all rows of the current table whose key also occurs in the other data node's table, the result table contains the key and its number of occurrences  in the current table.

 7 

Number of rows occurring only in incoming data node

Keys only used in one data source

Anzahl Zeilen ausschließlich im eingehenden Datenknoten

Schlüssel aus Datenknoten 1 kommen nur dort vor?

SELECT B.key, COUNT(B.key)
 FROM B 
 WHERE B.key NOT IN (SELECT A.key FROM A)
 GROUP BY B.key

B.key \ A.key (Difference)

B.key; #

For all rows of the other data node's table whose key also occurs in the other data node's table, the result table contains the key and its number of occurrences  in the current table.

 8

Number of rows without reference in the other data node

Keys only used in data source 1?

Anzahl Zeilen ohne Bezug im anderen Datenknoten

Schlüssel ohne Bezug im anderen Knoten

SELECT A.key, COUNT(A.key) 
 FROM A 
 WHERE A.key NOT IN (SELECT B.key FROM B) 
 GROUP BY A.key
 UNION
 SELECT B.key, COUNT(B.key)
 FROM B 
 WHERE B.key NOT IN (SELECT A.key FROM A)
 GROUP BY B.key

(A.key u B.key) \ (A.key n B.key)

(Symmetric difference)

A.key; #

For all rows whose keys occurs only in either of the tables, the result table contains the key and its number of occurrences  in the respective table.

 9

Cartesian product

Kartesisches Produkt

SELECT * FROM A, B

n/a

A.key; A.fields; B.key; B.fields

The result table contains the cartesian product of the current table and the other data node's table.

10 

Rows with identifier in incoming data node

Rows with common key in data source 2

Zeilen mit Identifizierer im eingehenden Datenknoten

Zeilen mit gemeinsamen Schlüssel in Datenknoten 2

SELECT * FROM A 
WHERE A.key IN (SELECT B.key FROM B)

A.key n B.key

A.key; A.fields

The result table contains all rows of the current table, whose key is contained in the other data node's table.

 11

Rows without identifier in incoming data node

Rows without common key in data source 2

Zeilen ohne Identifizierer im eingehenden Datenknoten

Zeilen ohne gemeinsamen Schlüssel in Datenknoten 2

SELECT * FROM A 
WHERE A.key NOT IN (SELECT B.key FROM B)

A.key \ B.key

A.key; A.fields

The result table contains all rows of the current table, whose key is not contained in the other data node's table.

More than one key column

SELECT * FROM A LEFT JOIN B ON {A.X <> B.Y AND A.X2 <> B.Y2}

Examples

Example: Combine two tables using "Intersection"

(Examples for other options see bottom of this page.)


The following tables should be merged:

A01

A02

Settings

  

Result

Each row of A01 is compared with each row of A02. If the values of the columns indicated in the first parameter and the third parameter (i.e.,the keys) coincide,
then the respective row of A01 is completed with the values of the matching row of A02 and shown in the result table.

Project-File

Confluence Op Merge Data.gzip

Example: Boss table

Tow tables should be merged in order to better visualize who is whose boss

The following tables should be merged:

A01

A02

Employee table

Boss table

Settings

In the node with table A01.

Result

Project File

-

Examples: Merge options in the Operator

Calculation method

Data nodes to be merged

Settings

Result

Rows with matching identifiers (columns of both data nodes)

  • old name: Intersection (same keys)


Zeilen mit übereinstimmenden Identifizierern (Spalten beider Datenknoten)


  • alte Bezeichnung: Schnittmenge (gleiche Schlüssel)


A01:

A02:


Each row of A01 is compared with each row of A02. If the values of the columns indicated in the first parameter and the third parameter (i.e.,the keys, columns A of both tables) coincide,  then the respective row of A01 is completed with the values of the matching row of A02 and shown in the result table.


Add rows in data source 1

A01:

A02:

Each row of A01 is compared with each row of A02. If the values of the columns indicated (i.e., the keys, columns A of both tables) coincide, then the respective row of A01 is completed with the values of the matching row of A02 and shown in the result table. All remaining rows of A02 are also added to the result. Fields of these rows whose columns are not contained in A02 are completed with NULL  (denoted by "–", see column "Date").


Add rows in data source 2

A01:

A02:

Each row of A01 is compared with each row of A02. If the values of the columns indicated (i.e., the keys, columns A of both tables) coincide, then the respective row of A02 is completed with the values of the matching row of A02 and shown in the result. All remaining rows of A02 are also added to the result. Fields of these rows whose columns are not contained in A02 are completed with NULL (denoted by "–", see column "Value").

Add rows in both data sources

A01:

A02:

Each row of A01 is compared with each row of A02. If the values of the columns indicated (i.e., the keys, columns A of both tables) coincide, then the respective row of A01 is completed with the values of the matching row of A02 and shown in the result table. All remaining rows of A01 and A02 are also added to the result. Fields of these rows whose columns are not contained in the other table are completed with NULL  (denoted by "–", see columns "Date" and "Value").

Append (same table structure required)

A01:

A03:


Each row of data node A03 is appended to the table of data node A01.


Keys only used in data source 1

A01:

A03:

Each row of A01 is compared with each row of A03 according to their keys (columns A in both tables). Each key which is present only in data node A01 is shown in the result together with its number of occurrences.


Keys only used in data source 2

A01:

A03:

Each row of A01 is compared with each row of A03 according to their keys (columns A in both tables). Each key which is present only in data node A03 is shown in the result together with its number of occurrences.


Keys only used in one data source

A01:

A03:

Each row of A01 is compared with each row of A03 according to their keys (columns A in both tables). Each key which is either only present in either data node A01 or in data node A03 is shown in the result together with its number of occurrences in column "Number", the table where it was found in column "table", and the row number(s) of the table where the occurrences where found in column "Occurrence".


Cartesian product

A01:

A02:

Each row of A02 is appended to each row of A01.

Rows with common key in data source 2

A01:

A02:

Those rows of data node A01 whose values in column A are also present in column A of data node A02 are shown in the result.

Rows without common key in data source 2

A01:

A02:

Those rows of data node A01 whose value in column A is not present in column A of data node A02 are shown in the result.



Troubleshooting

Frequent Problems

Solutions

Caution: You're comparing floating point numbers, which can yield unintended results. Please create text columns of the respective columns using ToString("F3").

Columns are not entered in corresponding order

e.g.

  • node A00: column A = date, column B = number
  • node B00: column A = number, column B = date
  • In this case you can not enter A,B and A,B BUT A,B and B,A

Wrong selection from the list of nodes (this may sound unlikely but it is not)

The tricks explained under Data Node can help.

There is a text in the background, even if it does not look like it. e.g. with Fehlerknoten (old wiki)


There are different data types in the data nodes (e.g. numbers, once imported as text, once imported as number)

The Formula operator (rowwise) can help converting the data to a common data type.

TOO MANY colums are marked for comparison. Among them there are columns which do not have common keys (even if e.g. intersection was selected).


War rot markiert im Wiki: In combination with other operations (e.g. Identifier instances) if it should be used with or without consideration of upper and lower case letters.


War rot markiert im Wiki: Column order should be chosen so that the strongest restrictions are located in the beginning. Typically, FROM-TO should be in the beginning of the list of keys.


Related topics