Part 4 – Convergence and Divergence

We now turn to event data over multiple objects – or Object-Centric Process Mining. Most processes operate on multiple distinct, related data objects.

This part introduces a simple, realistic example of an ordering process that illustrates the various complexities of a process handling many different objects. We revisit how event data of processes over multiple objects differs from classical event logs and how to extract a classical event log from it. Finally, we discuss how extracting a classical event log transforms the information in the source event data in undesired ways. We specifically discuss which errors and which false information are introduced by the event log extraction.

This part and several subsequent parts of the course make use of the Chapter “Process Mining over Multiple Behavioral Dimensions with Event Knowledge Graphs” in the open access Process Mining Handbook.

  1. Example: Complexities of many real-life processes
    1. Exercise: Objects, Relations, Types, and Entities
    2. Reflection: which real-life processes have similar complexities?
  2. Recognizing Entities and Relations in Event Data
    1. Exercise: Recognize entities in the BPI Challenge 2017 event log
    2. Exercise: Recognize entities in ERP system data
  3. Classical Process Mining: Extracting Event Logs for a Single Case Identifier
    1. Exercise: Extract an Event Log Yourself – Order Process Example
    2. Event Log Extraction in Industrial Practice
    3. Advanced Exercise: Extract an Event Log Yourself – ERP data
  4. Convergence and Divergence
    1. Hands-On: see the consequence of convergence and divergence
    2. Case Study: Convergence and Divergence on ERP System data
  5. Convergence and Divergence cannot be solved with pre-/postprocessing
  6. Reflection
  7. Next steps: Prevent convergence and divergence from the start – object-centric process mining

Example: Complexities of many real-life processes

The video introduced a small example that packs in many characteristics of a real-life process. The remainder of the course will build on this example. To fully understand it, read the section “A Second Look at Processes” in the open access Process Mining Handbook. The event data for this example is available as an event table.

Exercise: Objects, Relations, Types, and Entities

The process is complex because it operates over multiple related data objects. To better understand the situation we are looking at, create an Entity-Relationship Diagram or a UML class diagram for the objects in this process:

  • Write down which (data) objects are involved in the process. Which data objects have the same type?
  • Write down which (data) objects are related to each other. What is the type of each relation?

There are more “things” involved in the process than just the (data) objects. The term entity is typically used for things of interest. So, which other entities are there that are not objects in a strict sense?

  • Write down the non-object entities involved in the process. Which entities have the same type?

Reflection: which real-life processes have similar complexities?

Revisit the list of processes you noted down in Part 1 – What are Processes?. Which of your processes have characteristics similar to this example?

Recognizing Entities and Relations in Event Data

For someone working with event data and process mining techniques, it is an important skill to recognize entities and relations in event data, both in source event data and in event logs.

Read the section “Multi-entity Event Data” of the Process Mining Handbook. It explains the basic concepts – and provides the necessary technical definitions – alongside the example of the order process.

The following two exercises show two other ways in which entities appear in event data.

Exercise: Recognize entities in the BPI Challenge 2017 event log

The table on the left is a subset of the BPI Challenge 2017 event log. This loan application process also handles multiple distinct entities. However, they are not stored in the data in the same way as in the order process.

Which values identify different entities? Which features in the data (values/attribute names/…) define the type of an entity?

Recognize the entities and relations in the event data. Create an ER-diagram/UML class diagram for this example.

Hint: you should find at least 4 different entity types.

The event table for this example is also available as a file to load into Excel:

Exercise: Recognize entities in ERP system data

The tables on the left are a simplified example of an order process as it would be recorded in an ERP system. In practice, the event data we studied in the order process example at the beginning would most likely also have been extracted from such a system.

The event data here is from a slightly different process.

Which values identify different entities? Which features in the data (values/attribute names/…) define the type of an entity?

Recognize the entities and relations in the event data. Create an ER-diagram/UML class diagram for this example.

Classical Process Mining: Extracting Event Logs for a Single Case Identifier

Classical Process Mining assumes that all events are grouped into sequences of events; the events in each sequence are assumed to be part of the same process execution. The default method for creating this grouping is to choose a case identifier, e.g., by picking one column in the data so that each distinct value in this column defines a case, and then to assign every event to the case it “belongs” to. These steps are non-trivial.

The Section “Classical Event Log Extraction” of the open access Process Mining Handbook explains these steps alongside the running example of the order process.
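As a rough illustration of the grouping step, the following Python sketch extracts a classical event log from a tiny event table. The table, its column names (`order`, `item`), and the function `extract_event_log` are hypothetical and made up for this sketch; they are not the course's event table and not the Handbook's procedure.

```python
from collections import defaultdict

# Hypothetical toy event table (NOT the course's data). Each event lists the
# orders and items it relates to; "time" is a simple integer timestamp.
EVENT_TABLE = [
    {"id": "e1", "activity": "Create Order", "time": 1, "order": ["O1"], "item": ["X1", "X2"]},
    {"id": "e2", "activity": "Create Order", "time": 2, "order": ["O2"], "item": ["Y1"]},
    {"id": "e3", "activity": "Pick Item", "time": 3, "order": ["O1"], "item": ["X1"]},
    {"id": "e4", "activity": "Pick Item", "time": 4, "order": ["O2"], "item": ["Y1"]},
    {"id": "e5", "activity": "Pack Item", "time": 5, "order": ["O1"], "item": ["X1"]},
    {"id": "e6", "activity": "Pick Item", "time": 6, "order": ["O1"], "item": ["X2"]},
    {"id": "e7", "activity": "Pack Item", "time": 7, "order": ["O2"], "item": ["Y1"]},
    {"id": "e8", "activity": "Ship", "time": 8, "order": ["O1", "O2"], "item": ["X1", "Y1"]},
    {"id": "e9", "activity": "Pack Item", "time": 9, "order": ["O1"], "item": ["X2"]},
    {"id": "e10", "activity": "Ship", "time": 10, "order": ["O1"], "item": ["X2"]},
]


def extract_event_log(event_table, case_column):
    """Return {case id: events ordered by time} for the chosen case column."""
    log = defaultdict(list)
    for event in event_table:
        # An event related to several case values is copied into each case.
        for case_id in event[case_column]:
            log[case_id].append(event)
    for trace in log.values():
        trace.sort(key=lambda e: e["time"])
    return dict(log)


order_log = extract_event_log(EVENT_TABLE, "order")
for case_id, trace in order_log.items():
    print(case_id, [e["activity"] for e in trace])
```

Choosing `"item"` instead of `"order"` as the case column yields a different log from the same source data, which is one reason the choice of case identifier is non-trivial.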

Exercise: Extract an Event Log Yourself – Order Process Example

Take the event table from the order process example and repeat the steps of event log extraction yourself, for example in Excel.

Event Log Extraction in Industrial Practice

The following two chapters of the open access Process Mining Handbook give a more in-depth treatment of event log extraction.

Advanced Exercise: Extract an Event Log Yourself – ERP data

This exercise is for advanced readers.

Extract the event log for the ERP example data. Choose the “Sales Order” documents as your case identifier, i.e., you should have two cases S1 and S2.

  • First just extract the events for object creation and build an event log based on these events.
  • Next, you can also extract the other events and add them to the traces.

Convergence and Divergence

Extracting an event log from event data in which entities are in a 1-to-n, n-to-1, or n-to-m relation always causes convergence and divergence. The phenomenon was first observed in a Master's thesis at TU Eindhoven:

“A divergent event log contains audit trail entries which execute the same activity on one process instances several times. In a database structure, this is presented as 1:n cardinality. For example the goods receipt in several parts of a purchase order.”

“A convergent event log contains audit trail entries which execute one activity on several process instances at once. In a database structure, this becomes clear as n:1 cardinality. For example the payment of several invoices related to several purchase orders.”

Segers, I. E. A. Investigating the application of process mining for auditing purposes. Master Thesis Eindhoven University of Technology. 31 Aug 2007 (pages 41-42)

Read the section “False Behavioral Information in Classical Event Logs” of the open access Process Mining Handbook, which explains the phenomenon and the problem using the running example of the order process. Note: the published chapter of the Handbook contains a small mistake: the terms convergence and divergence have been swapped. For clarity:

  • Convergence: the same activity is executed in multiple process instances at once, i.e., the event is duplicated, and statistics are inflated (counting more created orders than existed in reality).
  • Divergence: for one instance, multiple executions of the same activity are observed – but they belong to different objects, i.e., the ordering of activities observed in the trace is wrong.
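Both effects can be made visible on a very small dataset. The sketch below uses a hypothetical toy event table (invented for illustration, not the course's data) in which one "Ship" event relates to two orders and one order contains two items. Treating consecutive events that share no item as "unrelated" is an assumption of this sketch.

```python
from collections import Counter

# Hypothetical toy data (NOT the course's event table):
# (event id, activity, time, related orders, related items)
EVENTS = [
    ("e1", "Create Order", 1, ["O1"], ["X1", "X2"]),
    ("e2", "Create Order", 2, ["O2"], ["Y1"]),
    ("e3", "Pick Item", 3, ["O1"], ["X1"]),
    ("e4", "Pick Item", 4, ["O2"], ["Y1"]),
    ("e5", "Pack Item", 5, ["O1"], ["X1"]),
    ("e6", "Pick Item", 6, ["O1"], ["X2"]),
    ("e7", "Pack Item", 7, ["O2"], ["Y1"]),
    ("e8", "Ship", 8, ["O1", "O2"], ["X1", "Y1"]),
    ("e9", "Pack Item", 9, ["O1"], ["X2"]),
    ("e10", "Ship", 10, ["O1"], ["X2"]),
]

# Classical extraction with the order as case identifier.
log = {}
for event in sorted(EVENTS, key=lambda e: e[2]):
    for order in event[3]:
        log.setdefault(order, []).append(event)

# Convergence: compare activity counts in the source data and in the log.
source_counts = Counter(e[1] for e in EVENTS)
log_counts = Counter(e[1] for trace in log.values() for e in trace)
inflated = {
    activity: (source_counts[activity], log_counts[activity])
    for activity in source_counts
    if log_counts[activity] > source_counts[activity]
}
print("inflated counts (source, log):", inflated)


# Divergence: consecutive events in one trace that share no item.
def unrelated_steps(trace):
    return [
        (a[1], b[1])
        for a, b in zip(trace, trace[1:])
        if not set(a[4]) & set(b[4])
    ]


for order, trace in log.items():
    print(order, "steps between unrelated items:", unrelated_steps(trace))
```

In this toy data the single shared "Ship" event is counted once per order, and the trace of O1 contains steps such as "Pick Item" followed by "Ship" that no individual item ever went through.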

Hands-On: see the consequence of convergence and divergence

Download the event log extracted from the order process event table

  • Import the event log into a process mining tool of your choice.
  • Identify the consequence of convergence and divergence in the event data.
  • Compare the analysis insights generated with the ground truth that you have about the process and note down any differences you observe.

Case Study: Convergence and Divergence on ERP System data

The top part of the poster on the left and the following blog post explain the consequence of convergence and divergence on ERP system event data.

Artifact-Centric Process Mining for ERP-Systems with Multiple Case Identifiers

This post also explains one of the first object-centric process mining techniques that was implemented to prevent convergence and divergence.

Convergence and Divergence cannot be solved with pre-/postprocessing

The convergence and divergence problem is fundamental to event logs and cannot be reliably solved by filtering in the event log or post-processing the analysis results.

  • Convergence: replicating the same event into multiple traces is inevitable if the executed activity belongs to each of these cases. Comparing with the ground-truth relational data makes it possible to detect the replication and correct counting statistics. But performance analysis is harder to fix: what is the correct average waiting time for an activity if it is replicated into multiple different cases? For a single replicated event, this is the minimum waiting time over all cases in which the event occurs. But globally, we cannot take the minimum waiting time as the average. This is just a simple example.
  • Divergence: this is the worse of the two problems. Divergence introduces false behavioral observations or dependencies into the data and into the model. As a consequence, in almost every real-life dataset that suffers from divergence, over 50% of the directly-follows edges in a process map are wrong, i.e., they show a step from one activity to the next that did not occur. Worse, these wrong edges are often also more frequent than the correct edges. See
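One way to get a feeling for the share of wrong directly-follows edges is to compare the edges of the extracted log with the edges observed per individual object. The following sketch does this on a hypothetical toy event table (invented for illustration). Using the per-item ordering as a stand-in for the ground truth is an assumption of this sketch, and the helper names `traces_by` and `directly_follows` are made up.

```python
from collections import Counter

# Hypothetical toy data (NOT the course's event table):
# (event id, activity, time, related orders, related items)
EVENTS = [
    ("e1", "Create Order", 1, ["O1"], ["X1", "X2"]),
    ("e2", "Create Order", 2, ["O2"], ["Y1"]),
    ("e3", "Pick Item", 3, ["O1"], ["X1"]),
    ("e4", "Pick Item", 4, ["O2"], ["Y1"]),
    ("e5", "Pack Item", 5, ["O1"], ["X1"]),
    ("e6", "Pick Item", 6, ["O1"], ["X2"]),
    ("e7", "Pack Item", 7, ["O2"], ["Y1"]),
    ("e8", "Ship", 8, ["O1", "O2"], ["X1", "Y1"]),
    ("e9", "Pack Item", 9, ["O1"], ["X2"]),
    ("e10", "Ship", 10, ["O1"], ["X2"]),
]
ORDERS, ITEMS = 3, 4  # tuple positions of the two object columns


def traces_by(events, column):
    """Group events by the object ids found at the given tuple position."""
    traces = {}
    for event in sorted(events, key=lambda e: e[2]):
        for obj in event[column]:
            traces.setdefault(obj, []).append(event)
    return traces


def directly_follows(traces):
    """Count how often activity a is directly followed by activity b."""
    edges = Counter()
    for trace in traces.values():
        for a, b in zip(trace, trace[1:]):
            edges[(a[1], b[1])] += 1
    return edges


order_edges = directly_follows(traces_by(EVENTS, ORDERS))  # classical log
item_edges = directly_follows(traces_by(EVENTS, ITEMS))    # assumed ground truth

# Edges that appear in the order-based log but for no individual item.
false_edges = {edge: n for edge, n in order_edges.items() if edge not in item_edges}
share = len(false_edges) / len(order_edges)
print(f"{len(false_edges)} of {len(order_edges)} edges are unsupported ({share:.0%})")
```

Filtering these edges out of the log afterwards would also remove the events that produce them, which illustrates why filtering alone does not restore the correct per-object behavior.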

Reflection

  • Revisit the processes you noted down in Part 1 – What are Processes?. Which of the analysis questions you noted down for your processes might suffer from convergence and divergence?
  • If you have already worked with event data involving multiple entities: Have you experienced analysis issues that can be traced to convergence and divergence? How much effort and time did it cost you to resolve them? Could you resolve them?
  • Feel free to extend the list of processes you noted down if the discussion above gave you new inspiration. It will be useful for the coming parts.

Next steps: Prevent convergence and divergence from the start – object-centric process mining

The next parts of the course discuss how to prevent convergence and divergence from occurring right from the start. The fundamental step we have to make is: abandon the unique case identifier and the classical event log.

The consequence is that we have to re-think process mining techniques from the ground up as every tool and technique we applied so far either no longer applies or needs serious reconsideration. The techniques and methods that come with this are so fundamentally different that they require a distinct name: object-centric process mining. The next parts are all about this paradigm.
