3. Basic Terms

Environment (BT, RT)

Core Auto can manage multiple environments. Each environment has its own set of metadata tables, UNIX directories (for scripts and logs), steps, and connections, and its processing parameters can be defined independently. Environments are independent of each other, and each can run one BT manager and one RT manager. Apart from the obvious environments (prod, test, dev), developers can create private environments (sandboxes) in which they can prepare changes without disturbing each other.

Event (RT)

An event can be generated by any external system using the collector endpoint. An event is identified by its event name, which must be defined in the metadata using the GUI. Each event contains a logical source name (e.g., a username or system name) and a JSON payload with all the details. Steps can access this payload through the collector endpoint. An event may be handled by a single step or by a multistep.

Step (BT, RT)

A step is an atomic element of processing. It can only be executed as a whole. It is assumed that each step is restartable, i.e., it can be repeated without compromising the correctness of the processed data. The step definition includes its unique name, area, and the path to the script (or scripts) with the code. In the case of a classic script (ASCII file), you also provide the name of the program (e.g., python3) to which the file name should be passed. If the given path leads to an executable program (or shell script), the program name is not provided.

Core Auto can start any script that can be executed from the Unix level (or from Windows, if using the Windows Agent).
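The rule for the program name can be illustrated with a minimal sketch. The function name build_command and the example paths are hypothetical and are not part of Core Auto; the sketch only shows how a command line could be assembled from a step definition under the rule described above.

```python
def build_command(script_path, program=None):
    """Assemble the command line for a step (illustrative only).

    script_path: path to the script or executable from the step definition.
    program:     interpreter name (e.g. "python3") for a classic script,
                 or None when script_path is itself executable.
    """
    if program:
        # Classic script: the file name is passed to the named program.
        return [program, script_path]
    # Executable program or shell script: run it directly.
    return [script_path]


# Hypothetical step definitions.
classic = build_command("/opt/ca/scripts/load_sales.py", "python3")
direct = build_command("/opt/ca/scripts/cleanup.sh")
```

A list built this way could be passed to a process launcher such as Python's subprocess.run.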

Area (BT)

Steps can be divided thematically into areas. Each step is assigned to one area. Areas serve an auxiliary purpose: when there are many steps, they make the monitoring process easier to navigate. For example, you can pause and then resume the execution of selected areas. Just like steps, areas must have unique names. They also allow a more convenient organization of script directories, because the path defined for the area becomes the default directory for scripts defined at the step level.

Multistep, step relations (BT, RT)

A multistep consists of many steps. In BT, all steps are treated as one unnamed multistep, and Core Auto's goal is to complete all of them. For RT, many multisteps can be defined to handle the appropriate events.
Relationships between steps arise naturally. If step A creates an object used in step B, step B cannot start until step A completes successfully. Each step may depend on many others.

The figure below shows an example of the connections between several steps.

Core Auto maintains these relations during multistep execution. Sometimes, however, a step is inactive, either because its status has been changed to Inactive or (BT only) because it should be executed only on certain days of the week or month. In that case Core Auto ignores the step and generates substitute connections. For example, if step S3 is inactive, Core Auto will generate the substitute connections marked in red.

In general, the reduction algorithm adds the set of parents of the deleted step to the parent sets of all steps that depend on the deleted step.
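The reduction rule can be sketched as follows. This is a minimal illustration, not Core Auto's implementation: the function name remove_inactive and the example graph are assumptions, and the graph does not reproduce the figure. Dependencies are represented as a mapping from each step to the set of its parents.

```python
def remove_inactive(parents, inactive):
    """Return a new parent map without `inactive`, with substitute connections.

    parents:  dict mapping step name -> set of parent step names.
    inactive: name of the step to drop.
    """
    inherited = parents[inactive]
    reduced = {}
    for step, deps in parents.items():
        if step == inactive:
            continue
        if inactive in deps:
            # Replace the edge to the removed step with edges to its parents.
            deps = (deps - {inactive}) | inherited
        reduced[step] = set(deps)
    return reduced


# Hypothetical graph: S3 depends on S1 and S2; S4 and S5 depend on S3.
parents = {
    "S1": set(),
    "S2": set(),
    "S3": {"S1", "S2"},
    "S4": {"S3"},
    "S5": {"S2", "S3"},
}
reduced = remove_inactive(parents, "S3")
# S4 and S5 now depend directly on S1 and S2.
```

If several steps are inactive, the same function can be applied once per inactive step.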

Generation

The division of steps into generations is performed automatically and its purpose is to facilitate tracking of processing in the GUI. Generations are defined recursively:
* Generation #1 contains only steps that are independent of the others
* Generation N contains steps that depend only on steps from generations 1..N-1 and that depend on at least one step from generation N-1. In other words, each step is assigned to the lowest possible generation.
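The recursive definition above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name assign_generations and the example steps are hypothetical, and the dependency graph is assumed to be acyclic. A step without parents gets generation 1; any other step gets one more than the highest generation among its parents.

```python
def assign_generations(parents):
    """Map each step to its generation number.

    parents: dict mapping step name -> set of parent step names (acyclic).
    """
    generations = {}

    def generation(step):
        if step not in generations:
            deps = parents[step]
            # Independent steps form generation 1; otherwise the step sits
            # one generation after its latest parent.
            generations[step] = 1 if not deps else 1 + max(generation(p) for p in deps)
        return generations[step]

    for step in parents:
        generation(step)
    return generations


# Hypothetical multistep.
parents = {
    "A": set(),
    "B": set(),
    "C": {"A"},
    "D": {"A", "C"},
    "E": {"B"},
}
gens = assign_generations(parents)
# D depends on A (generation 1) and C (generation 2), so it lands in generation 3.
```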

The figure below shows an example multistep for which automatic division created five generations.

GCP (global critical path)

A global critical path (GCP) is a sequence of steps in which each element (except the first) is directly dependent on the previous one, and the sum of the execution times of these steps is the largest among all such paths. The GCP determines the minimum possible execution time because the steps belonging to it cannot be parallelized.

In the figure below, the numbers in the circles show example execution times of the steps in minutes; the 3-element GCP, with a duration of 12 minutes, is marked in red.

It is worth noting that the GCP does not have to extend to the last generation. It may also "skip" some generations (generation 3 in the case above). However, it must start in the first generation, and it cannot contain two elements from the same generation, because the steps within a generation are independent of each other.

Generally, the goal of the Core Auto optimizer is to organize the processing so that its total duration comes down to the length of the GCP. To this end, Core Auto must ensure that there are no gaps between steps on the critical path and that no step "exceeds" the last GCP step.
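Finding a GCP amounts to finding the longest weighted path in the dependency graph. The sketch below shows one common way to do this, under stated assumptions: the function name global_critical_path, the step names, and the durations are hypothetical and do not reproduce the figure, the graph is assumed to be acyclic, and this is not necessarily how the Core Auto optimizer computes it.

```python
def global_critical_path(parents, minutes):
    """Return (total_minutes, path) for the longest dependency chain.

    parents: dict mapping step name -> set of parent step names (acyclic).
    minutes: dict mapping step name -> execution time in minutes.
    """
    best = {}  # step -> (total minutes, longest chain ending at that step)

    def longest_to(step):
        if step not in best:
            total, path = max(
                (longest_to(p) for p in parents[step]),
                key=lambda result: result[0],
                default=(0, []),  # a step without parents starts a chain
            )
            best[step] = (total + minutes[step], path + [step])
        return best[step]

    return max((longest_to(s) for s in parents), key=lambda result: result[0])


# Hypothetical multistep with execution times in minutes.
parents = {"A": set(), "B": set(), "C": {"A"}, "D": {"B", "C"}, "E": {"B"}}
minutes = {"A": 4, "B": 2, "C": 6, "D": 3, "E": 1}
total, path = global_critical_path(parents, minutes)
# The chain A, C, D takes 4 + 6 + 3 = 13 minutes; no other chain is longer.
```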

LCP (local critical path)

In practice, it may turn out that it is more important for the user to complete, as quickly as possible, a step that does not necessarily belong to the GCP. If such a step is defined (dynamically via the GUI), Core Auto determines the local critical path leading to the indicated step and changes the operating strategy to one that leads to the execution of the indicated step as quickly as possible. In the figure above, the 5-element LCP leading to step S5a (9 minutes) is marked in green.
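One way to read the LCP is as the longest chain of dependent steps that ends at the indicated step. The sketch below follows that interpretation, which is an assumption rather than a description of Core Auto's internals; the function name local_critical_path, the step names, and the durations are hypothetical and do not reproduce the figure.

```python
def local_critical_path(parents, minutes, target):
    """Return (total_minutes, path) for the longest chain ending at `target`.

    parents: dict mapping step name -> set of parent step names (acyclic).
    minutes: dict mapping step name -> execution time in minutes.
    """
    best = {}  # step -> (total minutes, longest chain ending at that step)

    def longest_to(step):
        if step not in best:
            total, path = max(
                (longest_to(p) for p in parents[step]),
                key=lambda result: result[0],
                default=(0, []),
            )
            best[step] = (total + minutes[step], path + [step])
        return best[step]

    # Only the target and its ancestors are visited.
    return longest_to(target)


# Hypothetical multistep; step F is not on the longest overall chain (A, C, D).
parents = {
    "A": set(), "B": set(), "C": {"A"},
    "D": {"B", "C"}, "E": {"B"}, "F": {"A", "E"},
}
minutes = {"A": 4, "B": 2, "C": 6, "D": 3, "E": 1, "F": 2}
lcp_total, lcp_path = local_critical_path(parents, minutes, "F")
# F waits for A (4 min) and for B then E (3 min), so the chain A, F takes 6 minutes.
```

A scheduler favoring step F would then give priority to the steps on this chain over steps that F does not depend on.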

Host

A host is a server on which some steps should be executed. Initially, only the Loading Node is defined as a host. If some steps must be executed on another server for some reason, that server must be added to the hosts list in the GUI. A host is identified by a logical name that is unrelated to the real host name. Such an additional host (called a remote host) must have the Postgres client installed and the Core Auto component called CAagnt running.

Agent

One agent supports all environments defined in CA. It is needed to start remote workers (separately for each environment) and, in emergency situations, to interrupt suspended processes (e.g., due to deadlock). CAagnt is called with a parameter that defines the logical remote host name, which is used only by Core Auto.

Core Auto does not require any remote host credentials. It uses the Postgres notification mechanism (NOTIFY, LISTEN) to communicate with the Agent and remote workers.

Flag

Flags allow external conditions to be defined for starting a given step. For example, retrieving data from the source system may wait for a “file ready” flag. Any number of flags can be defined for each step. The collector endpoint is used to set the flags.

KPI

A KPI can be defined to check whether the execution of a given workload stays within the given time brackets.

SLA

The SLA view groups one or more KPIs to show whether workloads stay within the given SLA.

Business Area

A Business Area groups steps into business steps (e.g., completion of Steps 1, 2, and 3 means that the OLAP Cube X refresh is ready). It lets users know whether a given chain of execution is ready or what its estimated completion time is.