Building a Universal Data Mapper: Part 2

21 min readApr 30, 2019

UDM Reference Implementation

In continuation of the previous article titled Building a Universal Data Mapper: Part 1, as promised a reference implementation of the specification is ready.

UDM has design-time and run-time components. The design-time component is a UI-heavy functionality that helps you build the mapper with appropriate rules around data consideration, mapping & transformation, and chaining. The output of the design activity is a JSON construct describing the mapper definition. The run-time component is an engine that takes an input file along with optional reference files for lookups and diff processing and applies a mapper to produce mapped output.

UDM reference implementation has a web application interface that provides design-time functionality as well as a repository to hold mappers and jobs that can be managed from the portal. The portal supports multi-user functionality.

Apologies for a long write-up, all with the good intention of sharing details :)

Tech Stack

UDM’s design-time component requires heavy user interaction to define mapping and transformations. To facilitate rapid development, NodeJS and React framework was made based on the ease of development and community support with tons of modules supporting mundane to pretty interesting features. A JSON Db, file-based data store was used to store mapper definitions, jobs, users and other artifacts that are created and generated on the portal. Intentionally, the components are chosen to facilitate a quick prototyping exercise. Throughout the application, material-UI is used for UI design and numerous node modules (npm modules) are put to use. My sincere thanks to the contributors of those modules. At present, the reference implementation supports only Excel (xlsx) file format for input and output. Extending this to support other desired formats should not be an issue.

The concept and the architecture can be expanded to use more robust industry-standard components as appropriate. You are welcome to explore this for further enhancements. The reference implementation is open-sourced on GitHub at Git — Universal Data Mapper

Demo Flow

To describe the features and mapper design flow in the UDM tool, a standard SAP FBL5N (Customer Line Items) report as the mapping source and a custom report with extracts of customers, sales (transactions) and payment details as the mapping target are considered.

SAP FLN5N is a standard SAP transaction code available within R/3 SAP systems that provides transaction details as customer line items. The target format is a pure-custom template with the goal of extracting entity details with the scope of organizing data proprietary to a target system.

This write-up takes the approach of UDM walk-through with a sample mapper design in a step-by-step manner, aligned with the specifications described in the article Building a Universal Data Mapper: Part 1 of the series.

If and when you are ready with your own instance of this reference implementation running on your local machine or in a docker container, you would be able to sign up for an account and login. The same instance can be used to sign up new users.

So, let us begin.

A quickie on the terms used in UDM

Following are some of the terms used and referred to in the UDM.

Named Lookups — A named lookup source either internal (source file) or external source file to be used for looking up a value to return an attribute from the source file

Static Value — A plain static value that can be assigned to a target attribute

Lookup Value — A lookup value from the source sheet, a simple value lookup by attribute name

Empty Value — In conditional mapping, it is a check for empty value on a given attribute. On the preprocessors and condition evaluations, it represents the action of resetting the attribute value to empty value.

Skip Row — In conditional mapping, it represents the intent to skip the current source row (ignore row) in the mapping process

vLookup Name — A construct that allows you to specify the named lookup to be used for lookup purposes. Yes, you can have multiple named lookups defined and put to use as appropriate.

Simple Mapping — A simple mapping that assigns a source value to a target attribute

Conditional Mapping — A condition evaluates to TRUE or FALSE and assigns the result of the set condition

Switch Mapping — A traditional switch-case construct allowing multiple conditions to be evaluated to arrive at the desired value for the target attribute. Switch mapping also supports a default condition (a default value) to be assigned to the target attribute when all conditions fail.

Aggregated Mapping — A statistical aggregation of an attribute on the subset of rows grouped by a distinct mode of data consideration and assign the resulting value to the target attribute.

Custom Mapping — A developer friendly, JS code execution and assign the resulting value to the target attribute. There are conventions to be followed while attempting to access the source or target data (later)!

Finally — Even though the mapping and transformation constructs supported in the UDM would do the job, at times you may want an override facility to alter the value before it gets assigned to the target attribute. The finally option simply serves the purpose by providing a facility to execute a custom JS code snippet.

Joiner — Relevant when attempting to construct a value by concatenation of results from two or more conditions. It can be either a single character or a short sting, as appropriate.

Condition Joiners — When using multiple conditions for evaluation, this helps define how the results of the conditions are joined together. Joiner conditions can be either a logical condition (AND, OR) or an arithmetic condition (Add, Multiply, Subtract, Divide). Computed and Aggregated mapping supports this condition joiners as they support multiple conditions in them.

Stage 1: Mapper Specification

Let us begin by identifying the new mapper with a name. Followed by specifying the source and target templates.

The uploaded templates are used to identify the sheet and header composition of source and target data, any data (rows) if present will be ignored.

Note that the mapper design is a guided activity with a step-wise progression towards completing a mapper. At any point in time, you would be able to change the data source or target or other configurations — however, be aware that the change might reset certain dependent configurations allowing you to redefine the strategy.

You can either use the numbered step navigation or the prev-next navigation to move along the design process.

If the mapping is about tweaking and making minor changes to the source data, the target template can be set as the source itself. This can serve the use cases like filter & copy, data subsetting on source data. All you have to do is to checkmark the Reuse Source file option.

Occasionally, there could be a use case to lookup value for an attribute dynamically based on the value found on a specific attribute at runtime — (similar to the vLookup feature of Excel). For this purpose, you would be able to define the lookup source. There are two options available for lookup source definition.

a) The same source file (same or different sheet) can be used as the lookup source (Internal Lookup)

b) A distinct file that serves purely for the purpose of lookup can be uploaded and used at runtime (External Lookup)

Use of Named Lookups is on need basis and optional

Stage 2: Source Validation

UDM parses the source template and identifies the sheets with their attribute composition (from the first row on the sheet).

Though Excel identifies data attributes as a <rowNum, Colum>, UDM uses a notation of <rowNum, Col Name> making it easy to operate on a human-readable attribution.

In the Source Validation stage, UDM presents the identified sheets and attributes on each sheet.

Let us analyze some of the controls available on this screen.

Ignore Sheet — Mark the sheet as ignored for the mapping.

Ignore Sheet

Header Row — Typically, the first row carries the header names (column name attribute). If the header column is present in a row other than the first row, you can specify the row number and reload the sheet level configuration to load attribute names appropriately.

This is useful when the report contains meta-information that is not to be considered for mapping at the beginning of the report.

Merge Sheet — If the data in the current sheet need to be merged with another sheet for consolidated processing in the mapping, you can specify the merge destination sheet here.

Attribute Setting — On the set of attributes defined on the sheet, you can

a) Identify an attribute as the key (futuristic, scope and use case is evolving)

b) Identify the attributes that are required (not empty)

c) identify the attributes that are expected to have a unique value in the dataset

d) Most importantly, identify the data type expected for the attribute.

Currently, String, Number, and Date types are supported with String as the default data type.

Attribute-level Validation

Preprocessors — It is often useful to preprocess the source data to ensure that the corrections and amendments are made to the data before mapping. Sheet preprocessors serve the purpose of data cleanup and corrections with a list of standard preprocessors.

You can define multiple preprocessors — essentially a data reduction mechanism to derive at a focused dataset to be put through the mapping process.

Preprocessors will be executed in sequence as a pipeline with input for each preprocessor from the previous preprocessor

List of supported preprocessor functions

In the current exercise, let us define the following source validation rules.

Convert the attribute Name1 to uppercase
Mark the Customer and Name1 attribute as required
Posting Date, Clearing Date, and Document Date attributes to be of type Date
Amount in local currency attribute to be of type Number

Stage 3: Target Validation

UDM parses the target template and identifies the sheets with their attribute composition (from the first row on the sheet). Though Excel identifies data attributes as a <rowNum, Colum>, UDM uses a notation of <rowNum, Col Name> making it easy to operate on a human-readable attribution.

In the Target Validation stage, UDM presents the identified sheets and attributes on each sheet.

Let us analyze some of the controls available on this screen.

Ignore Sheet — Mark the sheet as ignored for the mapping. It simply means that the target sheet will not be presented in the Map & Transform stage.

Attribute Setting — On the set of attributes defined on the sheet, you can

a) Identify an attribute as the key (futuristic, scope and use case is evolving)

b) Identify the attributes that are required (not empty)

c) identify the attributes that are expected to have a unique value in the dataset

d) Most importantly, identify the data type expected for the attribute.

Currently, String, Number, and Date types are supported with String as the default data type.

In the current exercise, let us define the following target validation rules.

On the Sales sheet, update the data types of the following attributes:

Due Date as Date

Total Amount as Number

2. On the Payment Receipts sheet, update the data types of following attributes:

Payment Date as Date

Payment Amount as Number

3. On the Payment Receipt Details sheet, update the data types of following attributes:

Ref Document Date as Date

Payment Amount and Adjustment Amount as Number

Stage 4: Map & Transform

UDM presents the chosen target sheets for building mapping and transformation rules.

Note: that the target sheets that are marked as Ignore will not be presented as a candidate for mapping.

Start defining mapping and transformations as required for each of the target sheet attributes.

Formulate the strategy for mapping of each sheet:

a) Mapping Consideration

b) Identify the source sheet

c) Identify the data-consideration strategy

d) Define preprocessors

e) Define a mapping for the target attribute

f) Define transformation at attribute-level as appropriate

Mapping Consideration — Decide whether the target sheet requires mapping or to be ignored. If appropriate, enable the copy attributes by name in order to pre-populate the mapping of target attributes with the source attributes of the same name. This is a handy feature to save time and get a quick and simple mapping for attributes, use only when applicable.

Identify Source sheet — For a given target sheet, choose the source sheet where the data will be picked up for mapping.

Data Consideration Strategy — Identify a suitable strategy to consider rows of data from the chosen source sheet for mapping. Following strategies are available:

a) All Rows — Consider all rows from the source sheet. This will result in mapping every single row from the source sheet to the target sheet.

b) Distinct Rows — Consider rows from the source sheet with distinct values on the specified attribute.

c) Distinct Composite Rows — Consider rows that are distinct based on the composite values on the specified attributes (typically more than one)

An All Rows mode is a simple row-for-row mapping from source to target. However, the Distinct modes segregate the source data as groups based on the composite scheme and may result in multiple groups with each group having one or more rows in them. In such cases, the mapping would default to values found on the first row in the group. But, for number values — use of Aggregate mapping would consolidate the number values from all the rows in the group.

Use of Distinct and Distinct Composite modes for data consideration should be thoroughly evaluated for use case fitment. Otherwise, it might result in partial data processing.

Now, let us go mapping!

Each of the attributes in the target sheet can be mapped to attributes found on the source sheet. The mapping can range from a simple assignment to mapping with transformation. Wide range of transformations is supported — simple arithmetic, string manipulation to statistical aggregations along with conditional evaluation and executions. If the standard transformation options does not suffice, a custom transformation in the form of a plain JS script can be used as well.

a. Simple Mapping — Used for assigning static value, lookup value from a lookup source or a looked up value from the source sheet.

Assign a looked up value from the source sheet found on the specified attribute

Assign a looked up value from a pre-defined lookup source

Assign a static value to the target attribute

b. Conditional Mapping — Assigning a value for the target attribute based on a logical condition. This gives an opportunity to choose the value to be assigned to a target attribute based on the TRUE or FALSE outcome of the condition evaluation.

The condition could be based on a looked up value from the source sheet or a looked up value from a lookup source.

Conditions — Essence of conditional mapping is to evaluate a logical condition to arrive at a TRUE or FALSE result. Based on the mapping defined for the TRUE/FALSE conditions, the resulting value will be assigned to the target attribute. Following are the conditions supported in the conditional mapping.

List of supported functions for transformations

Result— A conditional mapping evaluates the specified condition and arrives at a TRUE or FALSE value. The TRUE and FALSE condition can be associated with its own evaluation construct ranging from Skip Row, Empty Value to Static, Lookup and Looked Up values.

c. Switch Mapping — Used for evaluating multiple conditions and assign a value from the condition that successfully evaluates to TRUE.

Note that the conditions supported for evaluation are the same as the list of functions supported in the Conditional Mapping.

They are Is Empty, Is Not Empty, Equals to, Not Equals to, Less than, Less than or equals to, Greater than or equals to, Starts With, Ends With, Contains and Substring.

d. Computed Mapping — Used for evaluating multiple conditions and assign the resulting values joined by appropriate join functions.

The supported conditions for evaluation are Uppercase, Lowercase, Concatenate, Substring, Convert to String, Convert to Number, MD5 Token, Add, Subtract, Multiply and Divide.

Ensure that the right functions are chosen based on the data types defined on the target and source attributes.

Join the result of the conditions via a Join function (concatenate)

Here is a list of functions supported in the Computed Mapping.

e. Aggregated Mapping — Used for evaluating multiple numeric functions and assign the resulting values joined by appropriate join functions.

The supported conditions are typical statistical and aggregation functions such as Sum, Average, Min, Max, Mean, Median, Variance, Standard Deviation, and Percentile.

Ensure that the right functions are chosen and data types defined on the target and source attribute is Number.

Join the result of the conditions via a Join function (Add)

Skip Conditions functionality allows you to skip rows from the aggregation activity. Use it as appropriate.

Skip Rows that have the amount value less than 0

The final form of Aggregated Mapping with Skip Condition

f. Custom Mapping — A pure JS code snippet execution and assigning the returned value to the target attribute.

Use of Finally functionality — an example

The final block executes the code snippet and the result is assigned to the target attribute

A bit about the source data template:-

It contains financial transactions like Invoices, Credit & Debit Notes and Payment Receipts
It also contains clearing details around the knock-off, association of one or more credit entries with one or more debit entries

On the same note, the target template expects the data to be organized in a manner

Extract distinct customer details referred in the source data in Customers sheet
Consolidate transactions of type Invoice, Credit and Debit notes in the Sales sheet
Consolidate unutilized payments (uncleared, i.e., Clearing Document reference is empty) and capture a record with the aggregated sum of receipt amounts in the Payment Receipts sheet
Aggregate the customer level outstanding on the Customer Outstanding sheet

It should be noted that the mapper needs to account for Cleared and Open receipts, which are distinct in nature and cannot be processed in a single-pass at the source data. Hence, another mapper is engaged specifically focusing on processing cleared documents and chained to the main mapper. The chained mapper processes

Consolidate utilized payments (cleared, i.e., Clearing Document reference is found) and capture a record with the aggregated sum of receipt amounts in the Payment Receipts sheet
Consolidate the transactional documents (Invoice, Credit and Debit notes) that are cleared by a clearing document in the Payment Receipt Details sheet

In essence, both the mappers operate on the same source but executes different logic and consolidate them into a single output file conforming to the specified target template.

Main Mapper — SAP FBL5N Mapper (Stage 1)

In the Main mapper, the focus is to map the open documents (not cleared)

Customers Sheet Mapping

Following mapping & transformation applied to generate Customers sheet

a) Choose Distinct rows from the source on Customer attribute

b) From the grouped records by the distinct key, a simple map for Customer to the Customer ID attribute

c) From the grouped records by the distinct key, a simple map for Name 1 to the Customer Name attribute

Rest of the attributes are not required, hence not mapped

Sales Sheet Mapping

Following mapping & transformation applied to generate Sales sheet.

Apply following preprocessors:-

i) Skip rows that are of payment types by checking the Document Type attribute value equal to DZ

ii) Skip rows that are cleared — non-empty value on Cleared Document attribute

a) Choose Distinct Composite rows from the source on Customer and Document Number attributes

b) A simple map for Document Number to the Doc # attribute

c) A switch map for on Document Type — If RV, assign Tax Invoice else Debit Note as the default value for the Type attribute

d) A simple map for Customer to the Customer ID attribute

e) A simple map for Name 1 to the Customer Name attribute

f) A simple map for Document Date to the Document Date attribute

g) A computed map for Due Date by adding 45 to the Document Date attribute

h) An aggregated map for Total Amount by summing the Amount in local currency values on the grouped records. Add a final block and check if the aggregated sum is found to be < 0, update the Type attribute on the constructed row with Credit Note.

i) A computed map for Doc Id by concatenating Customer and Document Number attributes

Rest of the attributes are not required, hence not mapped.

Payment Receipts Sheet Mapping

Following mapping & transformation applied to generate Payment Receipts sheet.

Apply following preprocessors:-

i) Skip rows that are of payment types by checking the Document Type attribute value equal to DZ

ii) Skip rows that are cleared — non-empty value on Cleared Document attribute

a) Choose Distinct Composite rows from the source on Customer and Document Number attributes

b) A simple map for Document Number to the Doc # attribute

c) A simple map for Customer to the Customer ID attribute

d) A simple map for Name 1 to the Customer Name attribute

e) A simple map for Payment Date to the Document Date attribute

f) An aggregated map for Payment Amount by summing the Amount in local currency values on the grouped records.

g) A computed map for Doc Id by concatenating Customer and Document Number attributes

Rest of the attributes are not required, hence not mapped.

Second Mapper: SAP FBL5N Mapper (Stage 2)

In the Main mapper, the focus is to map the cleared documents and populate the outstanding sheet.

Sales Sheet Mapping

The following mapping & Following mapping & transformation applied to generate Sales sheet.

Apply following preprocessors:-

i) Skip rows that are of payment types by checking the Document Type attribute value not equal to DZ

ii) Skip rows that are not cleared — empty value on Cleared Document attribute