Part 5 – Event Knowledge Graphs

Part 4 introduced the convergence and divergence problem. In essence: grouping events from multiple different objects into the same trace falsifies the information in the data, and applying process mining to such falsified data leads to false results and insights.

The consequence is that we have to rethink process mining techniques from the ground up, as every tool and technique we applied so far either no longer applies or needs serious reconsideration. The techniques and methods that result are so fundamentally different that they deserve a distinct name: object-centric process mining. This and the next parts are all about this paradigm. We start by discussing data models that can correctly model events over multiple objects/entities without convergence and divergence.

This part introduces not only a different data model to correctly capture the behavioral information over multiple data objects, but also an easily accessible technology stack to build your own first object-centric process mining analysis.

  1. Data Models for Event Data – a short review
    1. Exercise: extracting traces per object
  2. Event Knowledge Graphs: A graph-based data model for events and entities
    1. Reading: definition and construction of Event Knowledge Graphs
    2. Exercise: your first event knowledge graph
    3. Exercise for Process Mining Professionals: your first event knowledge graph for ERP System data
  3. Hands-On: Build your own object-centric process mining analysis using the neo4j graph database
    1. Analysis through querying
    2. New performance measures
    3. More queries
    4. Object-centric process discovery
  4. Industrial Case Studies with Event Knowledge Graphs
  5. Further Tutorials, Data, Tools

Data Models for Event Data – a short review

The problems of convergence and divergence originate from flattening and denormalizing the relations between different entities into a flat event log of global sequences. The fundamental idea to prevent this problem is to not construct a single sequential trace for the entire process execution. Instead, we construct for each data object its own individual trace.

This idea of separating event data per data object has been studied extensively for at least 12 years, likely longer, leading to multiple proposals for various data models.

This Background section covers

  • fundamental data features that have to be recorded in any event data format to enable process analysis,
  • requirements for data models for event data that enable analysis over multiple data objects while avoiding convergence and divergence (based on an extensive literature analysis)
  • a review of existing data models (from event tables with multiple identifiers to triple stores)

The fundamental idea of all effective data models for events over multiple objects is to either explicitly store one trace per object or to provide data operations to construct traces per object as needed. The traces of the different objects meet or synchronize in shared events – for example those that update two different objects. This idea can be expressed and implemented in various forms and data models.

Exercise: extracting traces per object

Extract the object-traces for each of the objects in the running example of the Order Process introduced in Part 4 – Convergence and Divergence.

  • Download the event table (Excel format) of the order process: example_order_process_event_table_orderhandling.xlsx
  • Open the file in a Spreadsheet program.
  • For each object type (Order, Supplier order, Item, Invoice, Payment)…
  • … and for each object identifier per type (O1, O2) then (A,B) etc …
  • filter the table to only show the event records containing the object identifier (e.g., all rows where O1 occurs, all rows where X1 occurs alone or with other values)
  • each filtered-down table is the trace for one object.

In terms of relational databases, this is a selection operation on the rows related to one object. In terms of event logs (where we consider sequences of events), this is a projection operation on the elements of the sequence related to the same object.
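This selection/projection per object can be sketched in a few lines of Python. The event records below are made up for illustration and only mimic the shape of the running example, they are not the actual data:

```python
# Sketch: extracting one trace per object from a flat event table.
# The records and identifiers are illustrative, not the course's actual data.
events = [
    {"id": "e1", "activity": "Create Order",  "objects": {"O1"}},
    {"id": "e2", "activity": "Create Order",  "objects": {"O2"}},
    {"id": "e3", "activity": "Pack Shipment", "objects": {"O1", "X1"}},
    {"id": "e4", "activity": "Pack Shipment", "objects": {"O2", "X2"}},
]

def object_trace(events, obj):
    """Select all event records referring to the given object, keeping their
    original (temporal) order -- a selection on rows of the event table,
    or equivalently a projection on the global event sequence."""
    return [e for e in events if obj in e["objects"]]

print([e["id"] for e in object_trace(events, "O1")])  # trace of object O1
```

Note that e3 would appear in the trace of O1 *and* in the trace of X1, which is exactly the event duplication (convergence) discussed next.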

Notice that through this filtering, we avoid divergence: each object-trace only contains the events related to this very object, and no events of unrelated objects appear. But we are still creating multiple copies of events that are related to multiple objects. For example, e7 occurs in the object-trace of O2 and of B. In reality, both O2 and B synchronize in this shared event e7.

To also avoid convergence, we have to use a data model that contains each event only once and can express that this event is part of two different traces.

In the following, we discuss a graph-based data model. Graphs are a natural data model for such structures. By materializing the event data and the object-traces in graph form, we also avoid the need to repeatedly filter the source data to obtain the object-traces. At the end of this part, we point to various other ideas and implementations of object-centric process mining.

Event Knowledge Graphs: A graph-based data model for events and entities

Technically, we turn to a particular form of knowledge graphs (labeled property graphs). We show how to use knowledge graph concepts to

  • naturally model that one event is related to (or operates on) multiple different data objects – resulting in a network of events and objects
  • naturally model the trace of each object as a path of events in the graph – resulting in a network of event paths.

Another benefit is that knowledge graph technology has been researched and developed extensively and we can rely on established data stores and tooling.

Regardless of the specific technical flavor (labeled property graphs, knowledge graphs, or graph-based data models in general), using this (or a similar) data model is the fundamental prerequisite and enabler for object-centric process mining. The following video introduces the idea of event knowledge graphs on our running example.

Reading: definition and construction of Event Knowledge Graphs

Read the Section “Event Knowledge Graphs” of the open access Process Mining Handbook to understand the concepts explained in the video, specifically:

  • Labeled Property Graphs
  • the node and edge types of Event Knowledge Graphs
  • the steps for constructing an Event Knowledge Graph from an event table
  • directly-follows paths in event knowledge graphs.
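The construction steps named above (import events, infer entities, correlate events to entities, derive directly-follows paths per entity) can be sketched over a tiny in-memory graph in Python. In practice you would run the corresponding Cypher queries in a graph database instead; all data and names below are illustrative:

```python
from collections import defaultdict

# Illustrative event table; "objects" lists the entity identifiers each
# event refers to (made-up data, not the handbook's example).
event_table = [
    {"id": "e1", "activity": "Create Order",  "time": 1, "objects": ["O1"]},
    {"id": "e2", "activity": "Pack Shipment", "time": 2, "objects": ["O1", "X1"]},
    {"id": "e3", "activity": "Ship",          "time": 3, "objects": ["O1"]},
]

# Step 1: import events as nodes.
events = {e["id"]: e for e in event_table}

# Steps 2+3: infer entity nodes and correlation (CORR) edges.
corr = defaultdict(list)            # entity -> its correlated events
for e in event_table:
    for obj in e["objects"]:
        corr[obj].append(e["id"])
entities = set(corr)

# Step 4: derive directly-follows (DF) edges *per entity* by ordering
# each entity's correlated events by timestamp.
df = []                             # (entity, from_event, to_event)
for obj, eids in corr.items():
    ordered = sorted(eids, key=lambda eid: events[eid]["time"])
    df += [(obj, a, b) for a, b in zip(ordered, ordered[1:])]

print(sorted(df))
```

Each event node exists exactly once; e2 is correlated to both O1 and X1, and the DF edges form one path per entity, meeting in shared events.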

Exercise: your first event knowledge graph

The video shows the steps of constructing an event knowledge graph on a small example. To get fully familiar with the concept, carry out these construction steps yourself.

Exercise for Process Mining Professionals: your first event knowledge graph for ERP System data

Consider the simplified ERP system data in the tables on the left. Can you adapt the steps for event knowledge graph construction to this example? Some hints:

  • First transform the relational data into event tables. But instead of creating one global event table under a single case identifier, create multiple object-type specific event tables, e.g. one for Sales Orders, one for Return Orders, etc.
  • Then “import” the events from all tables.
  • Then continue with inferring entities.
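The first two hints can be sketched as follows; all table and column names are hypothetical stand-ins for the actual ERP tables, and each timestamp column of a record becomes one event:

```python
# Sketch: turning simplified relational ERP records into object-type-
# specific event tables (table and column names are hypothetical).
sales_orders = [
    {"so_id": "SO1", "created": 1, "released": 4},
]
return_orders = [
    {"ro_id": "RO1", "so_id": "SO1", "created": 6},
]

def so_events(rows):
    # One event per timestamp column of each Sales Order record.
    out = []
    for r in rows:
        out.append({"activity": "Create SO",  "time": r["created"],
                    "SalesOrder": r["so_id"]})
        out.append({"activity": "Release SO", "time": r["released"],
                    "SalesOrder": r["so_id"]})
    return out

def ro_events(rows):
    # Return Order events keep the foreign key to the Sales Order,
    # so entity inference can later correlate both object types.
    return [{"activity": "Create RO", "time": r["created"],
             "ReturnOrder": r["ro_id"], "SalesOrder": r["so_id"]}
            for r in rows]

all_events = so_events(sales_orders) + ro_events(return_orders)
print(len(all_events))  # events imported from two object-type tables
```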

Can this be done differently? Faster?

Hands-On: Build your own object-centric process mining analysis using the neo4j graph database

Event Knowledge Graphs use Labeled Property Graphs as their underlying generic data model. This data model in turn is supported by a variety of graph database systems. One of them is Neo4j, which is freely available for private use and developers. This makes this platform a good starting ground for getting familiar with event knowledge graphs and doing your first object-centric process mining analysis.

Analysis through querying

Having built the event knowledge graph, your next step is to query it for analysis. Here are some possible analysis tasks for the data. Formulate the corresponding Cypher queries to answer them:

  • All events related to Order O1
  • All events related to Order O2 and a Supplier Order
  • All events where at least two Item entities are involved.
  • The Item which had the shortest time between Unpack and Pack Shipment
  • Which object had the longest overall waiting time between any two subsequent steps? What was this waiting time?
  • Which Supplier Order had 3 subsequent Unpack events for 3 different Items?
  • Which events of which objects precede each Pack Shipment event?
  • Which events followed Update SO? Which objects were affected by it?

Which other kinds of analysis questions can you think of?
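To give a feeling for how such questions translate into queries, here is the first one ("all events related to Order O1") answered over a small in-memory stand-in for the graph in Python; the Cypher comment sketches one possible phrasing, assuming the :Event/:Entity labels and :CORR relationship type used when constructing the graph. All data is illustrative:

```python
# In Cypher, assuming :Event and :Entity nodes connected by :CORR edges,
# the same question could be asked roughly as:
#   MATCH (e:Event)-[:CORR]->(n:Entity {ID: "O1"}) RETURN e
corr = [  # (event id, entity id) pairs -- illustrative data
    ("e1", "O1"), ("e2", "O1"), ("e2", "X1"), ("e3", "O2"),
]

def events_of(entity):
    """All events correlated to the given entity."""
    return [ev for ev, ent in corr if ent == entity]

print(events_of("O1"))  # -> ['e1', 'e2']
```

The other questions follow the same pattern, matching paths over :CORR and directly-follows edges instead of single correlation edges.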

New performance measures

Once we are in an object-centric “world”, other performance analysis questions arise. Traditionally, we measure waiting time and sojourn time. For steps where multiple objects synchronize in the same event, such as batching in the Pack Shipment activities, new performance measures are needed. The following paper defines such new measures:

Study Figure 2 of this paper and answer the following questions for the example using Neo4j/Cypher:

  • What is the minimum, average, maximum waiting time for Pack Shipment?
  • What is the minimum, average, maximum pooling time for Pack Shipment?
  • What is the minimum, average, maximum lagging time for Pack Shipment?
  • What is the minimum, average, maximum synchronization time for Pack Shipment?
  • What is the minimum, average, maximum flow time for Pack Shipment?
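As a rough illustration of how such measures derive from per-entity timestamps, the sketch below uses simplified stand-in definitions; the paper's precise definitions of pooling, lagging, and synchronization time differ in detail, and all timestamps here are made up:

```python
# Illustrative computation for a synchronizing event such as Pack Shipment.
# NOTE: simplified stand-in definitions, not the paper's exact measures.
# t_ready[obj] is when that entity's previous event completed;
# t_event is when the shared Pack Shipment event occurs.
t_event = 10
t_ready = {"O1": 4, "X1": 7, "X2": 9}   # made-up timestamps

# Per-entity waiting time: how long each entity waits for the shared event.
waiting = {obj: t_event - t for obj, t in t_ready.items()}

# A synchronization-style measure: how long the earliest-ready entity
# waits for the latest entity to become available.
first, last = min(t_ready.values()), max(t_ready.values())
sync_window = last - first

print(waiting, sync_window)
```

In the graph, `t_ready` corresponds to the timestamps of each entity's directly-follows predecessor of the shared event, so these measures can be computed with a Cypher query over the DF edges.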

More queries

This part of the course is still under development. More material will be added in the future. In the meantime, here are some pointers:

Object-centric process discovery

A later part will cover object-centric process discovery in more detail. But if you are interested, here are two tutorials on how to perform object-centric process discovery by simple aggregation in graph databases: multi-object process maps and proclet models.
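The core aggregation idea behind these tutorials can be sketched briefly: per entity type, count how often one activity directly follows another; the weighted edges form a multi-object process map. The data below is illustrative:

```python
from collections import Counter

# Sketch of object-centric discovery by aggregation: directly-follows
# edges between events, lifted to activities per entity type.
df_edges = [  # (entity type, from activity, to activity) -- made-up data
    ("Order", "Create Order", "Pack Shipment"),
    ("Order", "Create Order", "Pack Shipment"),
    ("Item",  "Unpack",       "Pack Shipment"),
]

# Aggregate: each distinct (entity type, from, to) edge with its frequency
# is one weighted arc of the multi-object process map.
process_map = Counter(df_edges)
print(process_map[("Order", "Create Order", "Pack Shipment")])  # -> 2
```

In a graph database, the same aggregation is a single Cypher query grouping DF edges by entity type and activity pair.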

Industrial Case Studies with Event Knowledge Graphs

Although the concept of event knowledge graphs is rather young, it has already been applied successfully in industrial case studies (all open access):

Further Tutorials, Data, Tools

The field of object-centric process mining is just emerging. Many other tools and ideas are being developed. Explore them to get familiar with the field:
