Overview of approaches to making TIS faster by proper node and operation use


Approach: Reduce the size of the data set in SQL
Description: Build corresponding SQL statements in order to use database mechanisms to select and aggregate.
Pro: Reduces data.
Con: In Version 5.1 one has to go into the operator in the [TIS]Editor and type the SQL statement there.

Approach: Have several data sets and switch between them
Description: Have a data set for each part of the job.
Pro: Saves massively.
Con: In Version 5.1 one has to go into the node in the [TIS]Editor and select the data set.

Approach: Pre-process data
Description: Separate the processing into several stages.
Pro: Saves massively.
Con: Can become cumbersome when updating data.

Approach: Pre-process data with Solution Runner
Description: (WORK IN PROGRESS)
Pro: Saves massively.
Con: Can become cumbersome when updating data.

Approach: Make TIS faster by proper node & operation use
Description: Consider the design recommendations – see: Make TIS faster by proper node & operation use.
Pro: Often easy to do.
Con: Can become cumbersome with complex data.
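To illustrate the first approach, here is a minimal sketch using Python's sqlite3 module of pushing selection and aggregation into the database instead of loading the full table into the tool. The table and column names are invented for the example:

```python
import sqlite3

# Hypothetical "measurements" table standing in for a source
# that would otherwise be loaded in full.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (id TEXT, day TEXT, value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [(f"ID{i % 10}", f"2024-01-{1 + i % 28:02d}", float(i)) for i in range(10_000)],
)

# Without SQL-side reduction: every row crosses into the tool.
all_rows = conn.execute("SELECT id, day, value FROM measurements").fetchall()

# With SQL-side reduction: the database selects and aggregates,
# so only a handful of rows have to be transferred and stored.
aggregated = conn.execute(
    "SELECT id, SUM(value) FROM measurements "
    "WHERE day >= '2024-01-15' GROUP BY id"
).fetchall()

print(len(all_rows), len(aggregated))  # 10000 raw rows vs. 10 aggregated rows
```

The reduction here (10,000 rows down to 10) is exactly what the approach buys: the database does the work, and the tool only stores the result.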


                      Tactical issues in making TIS faster


• Eliminate columns as early as possible – reduces the data to be stored and retrieved.
• Filter as early as possible – reduces the data to be stored and retrieved.
• Keep the number of nodes small – for each data node its result table is stored; with many operations in the same data node, less data has to be stored and retrieved.
• Keep texts that are used very often short – reduces the data to be stored and retrieved.
• Merge order – within the Merge data operator, use those columns first that restrict the result set most strongly.
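The column-elimination tactic can be made concrete with a small Python sketch (the row layout is invented): storing only the columns a later node actually uses shrinks every intermediate result table, especially when a wide text column is involved.

```python
import sys

# Invented node result: four columns, one of them a wide text field.
rows = [(i, f"name-{i}", "x" * 200, i * 0.5) for i in range(1_000)]

def stored_size(table):
    # Rough stand-in for the size of a persisted node result table.
    return sum(sum(sys.getsizeof(field) for field in row) for row in table)

full = stored_size(rows)
# Keep only the two columns that later nodes actually use.
trimmed = stored_size([(row[0], row[3]) for row in rows])

print(full, trimmed)  # the trimmed table is several times smaller
```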


Examples of fast and slow Filter and Merge use


                                              Filter

SLOW: Two nodes
Node A: result of the Merge – 1,000,000 rows
Node B: refers to A – the filter has a result set of 1,000 rows

FAST: One node
Result of the Merge – 1,000,000 rows
Result of the filter – 1,000 rows

RESULT: Filtering directly in the node saves the storage of 1,000,000 records.
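The storage effect can be sketched in Python with the same row counts (the data and filter condition are invented); what matters is how many intermediate rows each variant has to persist between nodes:

```python
# Stand-ins for the Merge result and the filter condition.
merged = list(range(1_000_000))
keep = lambda row: row % 1000 == 0

# SLOW: node A stores the full Merge result, node B stores the filtered rows.
node_a = merged
node_b = [row for row in node_a if keep(row)]
stored_slow = len(node_a) + len(node_b)

# FAST: one node merges and filters, so only the filtered result is stored.
stored_fast = len([row for row in merged if keep(row)])

print(stored_slow - stored_fast)  # 1,000,000 rows of storage saved
```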

                                               

                                              Merge

ASSUMPTION: a DATE column with no duplicates and an ID column with a constant value

SLOW: Selection in the Merge data operator: ID, DATE
FAST: Selection in the Merge data operator: DATE, ID

If the IDs are combined first, the join produces an enormous number of rows that are only reduced later on. This can be avoided by using DATE as the first column.
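A small Python sketch of this effect (30 invented daily rows per side, joined naively): combining on the constant ID first creates the full cross product, which DATE then has to shrink back down, while combining on the unique DATE first pairs the rows 1:1 immediately.

```python
from itertools import product

# DATE is unique per side, ID is the same constant value everywhere.
left  = [("2024-01-%02d" % d, "ID1") for d in range(1, 31)]   # 30 rows
right = [("2024-01-%02d" % d, "ID1") for d in range(1, 31)]   # 30 rows

# SLOW: combine on ID first – the constant ID matches every row with every row.
by_id = [(l, r) for l, r in product(left, right) if l[1] == r[1]]
slow_intermediate = len(by_id)                                 # 30 * 30 = 900 pairs
result_slow = [(l, r) for l, r in by_id if l[0] == r[0]]       # reduced to 30 afterwards

# FAST: combine on DATE first – the unique dates pair up 1:1 right away.
by_date = [(l, r) for l, r in product(left, right) if l[0] == r[0]]
result_fast = [(l, r) for l, r in by_date if l[1] == r[1]]

print(slow_intermediate, len(by_date))  # 900 intermediate rows vs. 30
```

Both orders end in the same result; only the size of the intermediate set differs, and that is what gets stored and processed.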

                                               


Develop faster with test data nodes


                                              Idea

• Prepare small test data nodes to facilitate the design. Only when everything seems to work fine, switch to the complete data set.

Examples of how to build test data sets

• Prepare your SQL statement so that it is easy to restrict the number of rows returned.
• Prepare additional data nodes where you filter large data sets down to one identifier, or to a short period of time (with many identifiers) – depending on what you develop.
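A sketch of the first suggestion, assuming a SQL source (sqlite3 is used here and all table, column, and identifier names are invented): parameterising the restriction makes it a one-line change to go from the small test set back to the full data set.

```python
import sqlite3

# Hypothetical source table with 5,000 rows across 50 identifiers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id TEXT, day TEXT, value REAL)")
conn.executemany(
    "INSERT INTO source VALUES (?, ?, ?)",
    [(f"ID{i % 50}", f"2024-01-{1 + i % 28:02d}", float(i)) for i in range(5_000)],
)

ROW_LIMIT = 100   # during development; raise or remove for the full run
TEST_ID = "ID7"   # test data node restricted to one identifier

dev_rows = conn.execute(
    "SELECT * FROM source WHERE id = ? LIMIT ?", (TEST_ID, ROW_LIMIT)
).fetchall()
print(len(dev_rows))  # at most 100 rows instead of 5,000
```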