The hidden assumptions and forgotten topics of Process Mining, part 2 (Deviations, Patterns, Features)

This year, we are again organizing the International Workshop on Event Data and Behavioral Analytics, now in its 3rd edition (EdbA’22), at ICPM’22. Part of our workshop philosophy is to dedicate half of the time to open discussions on the topics of the workshop. The discussion often turns to what makes the problems we are looking at difficult, and it is usually hidden assumptions and under-researched topics. The second and third sessions, on “Deviation Analysis” and “…beyond Control-Flow”, surfaced the following topics (for which I managed to record a few short bullet points and questions).

See also The hidden assumptions and forgotten topics of Process Mining, part 1 (Human Behavior) for my notes on “Human Behavior”.

Conditions in process discovery/conformance checking

  • How can we be sure that data conditions are driving the choice between two options and that we did not miss other confounding variables (in the event log itself or in the analysis) that may have influenced the choice?
  • Are we analyzing something that is specific to the case or something that is affecting all instances in the process?
  • How can we distinguish between process-specific and case-specific contexts that determine a choice? Do we actually observe when the context relevant for decisions changes? For example, a change in season or a new compliance rule becoming effective influences all ongoing cases.

Using Frequency for Deviation Analysis

  • We measure the frequency of various behavioral features by counting how often activities occur, but this natural frequency may be misleading. For example, an activity repeated several times in direct succession may signify something different from the same activity repeated with more time in between. A global frequency count cannot capture this difference.
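
To make the point about global frequency concrete, here is a minimal sketch in Python. The traces and activity names are made up for illustration, and the helper functions `global_frequency` and `run_lengths` are hypothetical, not part of any process mining library. The sketch only looks at the order of events; a fuller version would presumably also use timestamps to measure the time between repetitions.

```python
from itertools import groupby

def global_frequency(trace, activity):
    """Count how often an activity occurs anywhere in the trace."""
    return sum(1 for label in trace if label == activity)

def run_lengths(trace, activity):
    """Lengths of the maximal blocks of directly consecutive occurrences."""
    return [len(list(group)) for label, group in groupby(trace) if label == activity]

# Two hypothetical traces (assumed data, not taken from a real event log).
trace_a = ["register", "check", "check", "check", "decide"]
trace_b = ["register", "check", "decide", "check", "review", "check"]

for name, trace in [("a", trace_a), ("b", trace_b)]:
    print(name, global_frequency(trace, "check"), run_lengths(trace, "check"))
```

Both traces contain "check" three times, so the global count cannot tell them apart, while the run lengths separate one block of three direct repetitions from three isolated occurrences.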

Analyzing Deviations

Typical structure in deviation analysis: first detect where something happened that we didn’t expect, then explain why it happened. Do we have to separate both steps, or should they be combined/integrated?

  • Separating the analysis into two sequential tasks is necessary when developing techniques/detectors: we first need to identify the patterns that constitute the deviations, and only then can we detect and analyze causes based on these patterns.
  • Deviation patterns are crucial for explaining and showing to domain experts how the observed behavior differs from the expected behavior. This is needed to confirm that the deviations are relevant, to agree on the analysis goal, and to focus the next steps.
  • How do we best visualize changes and deviations in processes in a way that makes sense to domain experts, so that we can reason together about relevance and causes? Process visualizations themselves are already complex, and visualizing complex patterns of how the process deviates is not straightforward.

Method selection for deviation analysis

How do we pick the right approach for detecting anomalies?

  • Pattern-based, ML-based, rule-based?
  • Research has often obtained acceptable results only by searching for the best-performing method among a variety of options. Sometimes different methods even have to be used within the same process and task for different target classes. This need to search for and engineer the most suitable method is a huge hurdle to adoption in industry.

Analyzing processes beyond control-flow

Adding more features to event data and analytics (data attributes/properties) makes the analysis more complex.

  • How do we find which features matter for analysis and prediction? The space of possibilities is exploding as we add related data objects and entities to the event data feature space.
  • There is a lot of domain knowledge available about how objects are related, which attributes are part of which objects, how the systems work (e.g., UML diagrams) – we should use it, but not store it inside the event log.
  • Domain knowledge does not invalidate the use of event logs. Event logs provide additional insights into this kind of domain knowledge: how are the structures used over time? How are the structures described in the domain documentation used in practice differently from their documentation?

The common theme of the questions above is that our standard approaches lack concepts and data structures for modeling the context of a particular behavior that lies outside the specific case. The two dimensions of Multi-Dimensional Process Thinking, the inner and the outer scope of a process (analysis), may help to think about these concepts and data structures by taking a different perspective on processes.
