How to work with large data sets
Overview of approaches to make TIS faster by proper node and operation use
Approach | Description | Pros | Cons
--- | --- | --- | ---
Build the corresponding SQL statements so that selection & aggregation are done by the database (see the sketch after this table) | | Reduces the amount of data that has to be loaded | In version 5.1 one has to go into the operator in the [TIS]Editor and type the SQL statement manually
Have a data set for each part of the job | | Very large savings | In version 5.1 one has to go into the node in the [TIS]Editor and select the data set manually
Separate the processing into several stages | | Very large savings | Can become cumbersome when updating data
(WORK IN PROGRESS) | | Very large savings | Can become cumbersome when updating data
Consider the design recommendations – see: Make TIS faster by proper node & operation use | | Often easy to do | Can become cumbersome with complex data
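To illustrate the first approach, the sketch below shows the general pattern of pushing selection and aggregation into the database so that only the reduced result reaches the processing tool. It is a plain Python/SQLite/pandas analogy, not [TIS]Editor syntax, and the table and column names (orders, order_date, customer_id, amount) are invented for the example.

```python
# Generic analogy (not TIS): push selection & aggregation into the SQL statement.
import sqlite3
import pandas as pd

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_date TEXT, customer_id INTEGER, amount REAL)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("2024-01-01", 1, 10.0), ("2024-01-01", 2, 20.0), ("2024-01-02", 1, 5.0)],
)
con.commit()

# Slow pattern: load every row into the tool, then aggregate there.
everything = pd.read_sql_query("SELECT * FROM orders", con)
slow_totals = everything.groupby("order_date")["amount"].sum()

# Fast pattern: the database selects and aggregates, so only the
# already-reduced result is transferred.
fast_totals = pd.read_sql_query(
    "SELECT order_date, SUM(amount) AS total_amount "
    "FROM orders GROUP BY order_date",
    con,
)
print(fast_totals)
```

The GROUP BY (and any WHERE clause) runs where the data lives, so only the aggregated rows are transferred into the tool.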
Tactical issues in making TIS faster
- Eliminate columns as early as possible (see the sketch after this list)
- Filter as early as possible
- Keep the number of nodes small
- Keep texts that are used very often short
- Choose the merge order carefully (see the Merge example below)
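As a generic illustration of the column, filter and text tactics above (a pandas analogy, not TIS syntax; the column names id, status and comment are invented):

```python
import pandas as pd

raw = pd.DataFrame({
    "id": range(6),
    "status": ["active", "inactive", "active", "active", "inactive", "active"],
    "comment": ["a very long free-text field that is rarely needed downstream"] * 6,
})

# Eliminate columns and filter rows as early as possible: keep only what
# the next steps actually need.
slim = raw.loc[raw["status"] == "active", ["id", "status"]].copy()

# Keep frequently repeated texts short, e.g. replace long status texts
# with one-letter codes.
slim["status"] = slim["status"].map({"active": "A", "inactive": "I"})
print(slim)
```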
Examples of fast and slow Filter and Merge use
Filter
SLOW: two nodes; node A holds the result of a Merge with 1,000,000 rows and a separate node applies the filter afterwards.
FAST: one node that applies the filter directly.
RESULT: filtering directly in the node saves the storage of 1 million records (see the sketch below).
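A pandas analogy of the Filter example (not TIS syntax; the column names id and flag are invented): merging first and filtering afterwards materialises the full 1,000,000-row intermediate result, while filtering as part of the merge step never does.

```python
import pandas as pd

left = pd.DataFrame({"id": range(1_000_000), "value": 1})
right = pd.DataFrame({"id": range(1_000_000),
                      "flag": [i % 100 == 0 for i in range(1_000_000)]})

# SLOW pattern: merge everything, store ~1,000,000 rows, filter afterwards.
slow = pd.merge(left, right, on="id")
slow = slow[slow["flag"]]

# FAST pattern: filter first, so the merge only ever sees the surviving rows.
fast = pd.merge(left, right[right["flag"]], on="id")

assert len(slow) == len(fast)  # same result, very different intermediate size
```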
Merge
ASSUMPTION: a DATE column with no duplicates and an ID column with a constant value.
SLOW: selection in the DataMerge with ID as the first column, then DATE.
FAST: selection in the DataMerge with DATE as the first column, then ID.
RESULT: if the IDs are combined first, the join produces an enormous number of rows that are only reduced later on. This can be avoided by using DATE as the first selection column (see the sketch below).
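A pandas analogy of the Merge example (not TIS syntax; apart from DATE and ID, the names are invented). Under the stated assumption that DATE is unique and ID is constant, combining on ID first behaves like a cross join whose result is only reduced later, while combining on the unique DATE keeps the result at the input size throughout.

```python
import pandas as pd

n = 1_000
a = pd.DataFrame({"DATE": pd.date_range("2024-01-01", periods=n), "ID": 1, "x": range(n)})
b = pd.DataFrame({"DATE": pd.date_range("2024-01-01", periods=n), "ID": 1, "y": range(n)})

# SLOW pattern: combine on the constant ID first -> n * n = 1,000,000
# intermediate rows, which are only reduced by DATE afterwards.
slow = pd.merge(a, b, on="ID", suffixes=("", "_b"))
slow = slow[slow["DATE"] == slow["DATE_b"]]

# FAST pattern: combine on the unique DATE (and ID) directly -> never more
# than n rows at any point.
fast = pd.merge(a, b, on=["DATE", "ID"])

print(len(slow), len(fast))  # both n, but the slow path materialised n * n rows
```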
Develop faster with test data nodes
Idea
During development, replace the large input data with small test data nodes so that each run of the job finishes quickly; switch back to the full data once the logic works as intended.
Examples of how to build test data sets
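A pandas sketch of the idea (not TIS syntax; the column names DATE and value are invented): derive a small test data set from the full input, either as a reproducible random sample or as one narrow, realistic slice, and develop against that before switching back to the full data.

```python
import pandas as pd

full = pd.DataFrame({
    "DATE": pd.date_range("2024-01-01", periods=100_000, freq="min"),
    "value": range(100_000),
})

# Option 1: a random sample with a fixed seed, so test runs are reproducible.
test_sample = full.sample(n=1_000, random_state=42)

# Option 2: one narrow, realistic slice, e.g. a single day.
test_slice = full[full["DATE"].dt.date == pd.Timestamp("2024-01-01").date()]

print(len(test_sample), len(test_slice))
```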