Data Mapping
At some point, you will have data coming into your VSM workspace from different sources. Soon you will ask yourself: How do I map data coming to the same object from different sources? Or how do I clean up duplicate data efficiently? Here's how you do it!
The VSM mapping logic
Every integration that LeanIX adds to the VSM product comes with its own ID field, called the external ID. The name of the field usually indicates the type of integration.
Some Fact Sheet types have multiple external IDs, as multiple data sources are intended to write into one object. This also means one Fact Sheet will be referenced by different identifiers in different source systems.
Autogenerated ID
Each Fact Sheet comes with a type-specific, autogenerated ID. This ID will hold a unique identifier for each Fact Sheet, allowing you to reference the Fact Sheet from other places, e.g., via a tag in your Cloud Environment or some 3rd party tool.
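As a sketch, such a reference could be as simple as a tag on a cloud resource. Both the tag key and the placeholder value below are hypothetical, not a LeanIX convention:
# Hypothetical tag on a cloud resource, pointing back to the Fact Sheet
tags:
  leanix-factsheet-id: <autogenerated-fact-sheet-id>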

The Software Artifact can have up to 6 IDs currently.
To map different sources onto one object, the external IDs need to be set correctly.
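Conceptually, one Software Artifact then carries one identifier per source system side by side. Here is a sketch using the values from the manifest example later in this article:
# One Software Artifact, referenced by a different identifier in each source system
name: myfirstapp
cicdId: myfirstapp         # the manifest id, stored as the CI/CD external ID
kubernetesId: my-first-app
sonarQubeId: firstApp
muleSoftId: my-1st-app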
Removing duplicates
Especially when you use multiple integrations to discover information about your software artifacts, you will soon encounter duplicates, as the same object is discovered by different integrations but not properly mapped.

The same Software Artifact was discovered twice.
When you inspect the two Fact Sheets in detail, you will notice that they use different external ID fields. In our scenario, the duplicate Fact Sheet was discovered via the Kubernetes integration, and the initial Fact Sheet was created through the CI/CD integration.

The duplicate Fact Sheet with a Kubernetes ID

The original Fact Sheet with a CICD ID
Rerouting the duplicate is simple: take the ID from the duplicate Fact Sheet and paste it into the corresponding ID field on the original Fact Sheet.
In our example, we paste the Kubernetes ID orderFulfillment1 from the duplicate into the Kubernetes ID field on the original Fact Sheet.

The correct mapping.
This way, you have created a simple mapping between Kubernetes and CI/CD by hand. All that is left to do is remove the duplicate.
Cleaning up incorrect IDs
Incorrect IDs can manifest in two ways:
1. Unwanted Fact Sheets are created
You may encounter Fact Sheets with strange names or IDs as names. This is likely the result of an incorrect mapping.

Fact Sheet created due to a mapping mistake.
In this scenario, a mapping mistake caused the creation of the Fact Sheet "rder-flfilment-1.0".

Fact Sheet with a wrong CICD ID.
By inspecting the IDs on the Fact Sheet, you can identify where the Fact Sheet is coming from and what reference created the mismatch.
To resolve this issue, you then have to go to the source of the integration and fix the ID there. In our scenario, we can look up the owner of the Order Fulfillment Service and contact them about the mapping issue.
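As an illustration, if the CI/CD manifest were the source of the broken reference, the fix would be a correction of the id field in the repository. The corrected value below is only our assumption of what was intended:
# Hypothetical fix in the repository's manifest file
id: order-fulfilment-1.0   # was: rder-flfilment-1.0, the typo that created the unwanted Fact Sheet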
2. Referencing warnings are thrown in the Sync Log
If you navigate to Administration > Sync Log, you might find warnings in your integration runs that mention external IDs.
Especially if these warnings arise on a processor for relation creation, there is a high likelihood that the reference (external ID) the integration is trying to use points to a Fact Sheet that does not exist.
By copying the mentioned external ID from the message and pasting it into the search, you can easily validate whether the Fact Sheet exists.
If it doesn't exist, it is very likely that the wrong ID is being referenced.
To resolve this issue, you need to determine the correct ID and find the source that sends the wrong one. With both in hand, you can correct the ID at the source and run the integration again.
Best practice on data maintenance
Over time, it can be challenging to keep data quality high and to maintain an overview of which data is properly mapped and where open to-dos remain.
For this reason, we introduced a data quality tag group.

The data quality tag group with the "reviewed" tag.
We suggest that you implement a review process for your key Fact Sheet types: add the tag to every Fact Sheet whose mapping you deem correct.
On a regular basis, you can then use the filter shown below to search for Software Artifacts that do not have the tag, revealing any unreviewed data.

The facet filter set to Fact Sheets without the "Reviewed" tag.
Mapping via the Manifest
Prerequisite
You need to use our native CI/CD connector API, either via a plugin or directly, to use this mapping approach.
With the manifest file that you establish in every repository, you have an ideal location to create and maintain a mapping of multiple sources onto the Software Artifact.
In the regular setup, every repository, and therefore every manifest file, represents one Software Artifact. You can thus simply add additional key/value pairs to the manifest.yaml that hold the IDs of the other source systems you are trying to map.
id: myfirstapp
name: myfirstapp
self: https://github.com/Yannik-Lacher/myfirstapp/blob/master/lx-manifest.yaml
description: First app to showcase CICD integration.
owner: [email protected]
links:
  - name: Github
    url: https://github.com/Yannik-Lacher/myfirstapp
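# Additional key/value pairs: the IDs of the other source systems you want to map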
kubernetesId: my-first-app
sonarQubeId: firstApp
muleSoftId: my-1st-app
Our CI/CD connector API will automatically pick up any additional data sent via the manifest file and transfer it to the workspace. With some simple custom Integration API processors, you can then write the references onto the Software Artifact; the additional manifest keys are read from data.custom, as the following example for the Kubernetes ID shows. Please also make sure the processor is configured with Processing Direction: inbound and Processing Mode: full.
{
  "processors": [
    {
      "processorType": "inboundFactSheet",
      "processorName": "Software Artifact ID mappings",
      "processorDescription": "Update the ID mappings for Software Artifacts based on the YAML manifest.",
      "type": "Microservice",
      "filter": {
        "exactType": "service"
      },
      "identifier": {
        "external": {
          "id": {
            "expr": "${content.id}"
          },
          "type": {
            "expr": "cicdId"
          }
        }
      },
      "updates": [
        {
          "key": {
            "expr": "kubernetesId.externalId"
          },
          "values": [
            {
              "expr": "${data.custom.kubernetesId}"
            }
          ]
        }
      ],
      "enabled": true
    }
  ],
  "variables": {},
  "executionGroups": [
    "vsmCiCd"
  ]
}
To execute this processor whenever the CI/CD integration runs, you need to create a custom Integration API processor set and add an execution group. To add the execution group, simply paste the following snippet at the highest level of your processor configuration (as already done in the example above).
"executionGroups": [
"vsmCiCd"
]
You can inspect the Sync Log to verify that your mapping runs without errors.
One-time mapping using Excel
Most of the methods above focus on solving individual mapping issues one by one. Especially towards the beginning of your VSM journey, however, you will face many mapping issues or duplicates at once. In such a scenario, we recommend using our Excel export/import functionality to load the right mapping directly into the workspace.
In this example we will do a mapping of Kubernetes IDs onto existing Software Artifact Fact Sheets that were created through a CI/CD integration. However, the same method is applicable to any set of IDs that you want to map.
1. Filter a list of Fact Sheets in the inventory
To start, filter the inventory for the specific scope that you want to conduct a mapping for.

Filtered list of objects you want to map with.
2. Export the list with an empty column for the ID you want to add

Export the mapping elements.
3. Fill out the Excel file with your ID mapping
You can use any trick in the book to make your life easier when adding the IDs to the Excel sheet. For example, if your IDs follow a certain schema, you can use a transformation rule on the name column or another ID column to auto-generate all the IDs; if, say, the Kubernetes ID is simply the lowercased Fact Sheet name, a formula like =LOWER(B2) (assuming the name sits in column B) can fill the whole column. However, make sure to double-check the results against the originals on your clusters.

Mapping completed in Excel.
4. Reimport the Excel file to your workspace
Once the Excel file is complete, you can hit the import button and upload your file. It is recommended to do a test run before you actually apply the data, to understand whether there are any errors. If all goes smoothly, hit the import button, and you are done.

Conduct a test run and import the Excel file.
Regular updates
Over time, you will see incorrect mappings pop up again and again. This is only natural, as human errors occur and new software artifacts appear.
For this reason, it can be helpful to introduce a data quality tag group. Whenever you see an incorrect or missing mapping, tag the respective Fact Sheet accordingly. On a regular cadence, you can then apply the method described above to all tagged Fact Sheets.
Advanced: Dynamic mapping from Kubernetes or Github to CI/CD
Prerequisites
- You need to use our native CI/CD connector API, either via a plugin or directly, to use this mapping approach.
- You have sufficient control and knowledge over your pipelines to execute some small code snippet at runtime.
- The Kubernetes or GitHub IDs follow a specific pattern or are accessible from the pipeline.
We regularly see the Kubernetes or GitHub repository connector used in combination with the CI/CD connector. With such a combination, a more automated approach to ID mapping is possible.
If the prerequisites above are met, you can apply the same manifest mapping approach as sketched out above, but in an automated manner.
While the pipeline is running, you have access to the manifest file as well as to the Kubernetes or GitHub ID. You can therefore dynamically write the ID as a key/value pair into the manifest.yaml at build time.
Make sure to include this custom step before you run the CI/CD connector.
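As an illustration, here is a minimal sketch of such a step for a GitHub Actions pipeline. The step name and the derivation rule are our assumptions; it presumes the Kubernetes deployment name matches the repository name, so adapt it to your own naming pattern:
# Hypothetical GitHub Actions step: append the Kubernetes ID to the manifest
# before the LeanIX CI/CD connector step runs
- name: Add Kubernetes ID to manifest
  run: |
    # GITHUB_REPOSITORY has the form owner/repo; keep only the repo part
    echo "kubernetesId: ${GITHUB_REPOSITORY##*/}" >> lx-manifest.yaml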
Thereafter, the approach follows the same steps as elaborated in the manifest mapping section above: create a custom processor and add the execution group.