Analyzing Statistics

This chapter explains in detail how the statistics view works. Refer to Analyzing Process Maps and Analyzing Cases to learn how to use the map and cases views.

The Statistics View

While the Map view (see Analyzing Process Maps) gives you an understanding about the actual process flow, the Statistics view provides you with additional overview information and detailed performance metrics about your process.

You get to the Statistics view by simply changing to the Statistics tab as shown in Figure 1.

_images/StatisticsView.png

Figure 1: The Statistics view in Disco.

Depending on the statistics view that you have selected on the left (4-7), the following information areas are shown:

Overview information (1)
Key overview figures about the selected statistics view. For example, for the global statistics shown in Figure 1, you can see how many events and cases are in your data set, how many different activities there are, what the median and mean case duration is, and which time frame is covered in the log.
Performance charts (2)
A number of pre-generated charts visualize relevant performance metrics for the current statistics view. Charts can be exported as explained in Exporting Charts and Tables.
Detailed information (3)
In the lower part of the screen, detailed statistics are shown in a table format. Every table in Disco can be exported as a CSV file to further process the information with other tools such as Excel or Minitab. Refer to Exporting Charts and Tables to learn how to do that.

The statistics are organized into the following views:

Global statistics (4)
Overview statistics about the whole data set (see Overview Statistics), individual cases (see Case Statistics), and variants (see Variant Statistics).
Activity statistics (5)
Statistics about the different process steps in your data set (see Activity Statistics).
Resource statistics (6)
Statistics about the people or organizational units in your data set (see Resource Statistics).
Attribute statistics (7)
Statistics about all further attributes (see Attribute Statistics).

Finally, like in any of the three analysis views (Map, Statistics, Cases), you have the following functionalities available at the bottom of your screen:

Filtering (8)
The log filter controls for the current data set can be accessed from each of the analysis views. Filters are really important to drill into specific aspects of your process and to focus your analysis. Read the Filtering chapter for detailed information on how filtering works in Disco.
Copy, Remove, and Export data set (9)
Data sets can be copied (see Copying Data Sets) and deleted (see Deleting Data Sets) right from the current analysis view. Process maps and data sets can be exported via this export button in the lower right corner. Cases, variants, statistics and charts can be directly exported via right-click from the Statistics view. You can find a detailed overview about all export functions of Disco in the Export chapter.

Overview Statistics

In the upper right area of the screen, a few summary statistics are shown, see (1) in Figure 1. The summary figures are:

  • Events: Total number of events in the data set. Think of this as the number of rows in your CSV file that were imported.
  • Cases: Total number of process instances in the data set. This is the number of different case IDs in your file (see The Minimum Requirements for an Event Log for more information on case IDs).
  • Activities: Total number of different activities in the data set (see The Minimum Requirements for an Event Log for more information on activity names).
  • Attributes: Total number of attributes (the number of columns that you have configured as Other during the import; see also Import Configuration Settings).
  • Median case duration: The median for all case durations in your data set (refer to Performance Metrics for an explanation of the difference between median and mean).
  • Mean case duration: The average of all case durations in your data set (refer to Performance Metrics for an explanation of the difference between mean and median).
  • Start and End: The range of time covered by your data set (from earliest to latest timestamp observed).
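If you want to cross-check these summary figures outside of Disco, they can be recomputed from the raw event data. The following is a minimal sketch in Python, assuming a hypothetical in-memory log with one (case ID, activity, timestamp) row per event. The sample data and all names are illustrative and not part of Disco, and the Attributes figure is left out because it depends on your import configuration.

```python
from datetime import datetime
from statistics import mean, median

# Hypothetical event log: one (case ID, activity, timestamp) row per event.
log = [
    ("c1", "Register", datetime(2024, 1, 1, 9, 0)),
    ("c1", "Check", datetime(2024, 1, 1, 11, 0)),
    ("c1", "Close", datetime(2024, 1, 2, 9, 0)),
    ("c2", "Register", datetime(2024, 1, 1, 10, 0)),
    ("c2", "Close", datetime(2024, 1, 1, 14, 0)),
    ("c3", "Register", datetime(2024, 1, 3, 8, 0)),
    ("c3", "Check", datetime(2024, 1, 3, 9, 0)),
    ("c3", "Check", datetime(2024, 1, 3, 12, 0)),
    ("c3", "Close", datetime(2024, 1, 3, 16, 0)),
]


def overview(rows):
    """Recompute the summary figures for a list of event rows."""
    times_by_case = {}
    for case_id, _activity, timestamp in rows:
        times_by_case.setdefault(case_id, []).append(timestamp)
    # Case duration: time between the first and the last event of a case.
    hours = [
        (max(times) - min(times)).total_seconds() / 3600
        for times in times_by_case.values()
    ]
    all_times = [timestamp for _, _, timestamp in rows]
    return {
        "events": len(rows),
        "cases": len(times_by_case),
        "activities": len({activity for _, activity, _ in rows}),
        "median_case_duration_h": median(hours),
        "mean_case_duration_h": mean(hours),
        "start": min(all_times),
        "end": max(all_times),
    }


stats = overview(log)
print(stats)
```

In this made-up log, one long case (24 hours) pulls the mean case duration up to 12 hours while the median stays at 8 hours, which is the kind of difference between the two figures to watch for.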

You can change the chart that is displayed on the left, see (2) in Figure 1, by selecting a metric from the list. Most of these metrics can also be used to filter your data set with the Performance Filter, where the same charts are used to guide your configuration.

The following metrics are available in the chart view of the global statistics:

_images/1-EventsOverTime.png

Figure 2: Events over time.

Events over time

The log timeline on the horizontal axis represents the total timeframe covered by your log (from the earliest to the latest timestamp observed). The events over time metric then shows the level of activity in your process by plotting the number of performed activities in the process on the vertical axis. You can hover over the graph with your mouse to inspect the different data points as shown in Figure 2.

In many processes, the events over time chart can show you seasonal or weekly patterns. For example, you will see spikes on weekdays and less activity on the weekend. Or you will see that more activity occurs in the process at certain times of the year.
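The idea behind this chart can be illustrated by counting events per calendar day and per weekday. This is only a sketch under assumptions: the timestamps are invented, the function names are hypothetical, and the daily bucket size is chosen for illustration (the granularity that Disco uses for its chart is not specified here).

```python
from collections import Counter
from datetime import datetime

# Hypothetical event timestamps (2024-01-01 was a Monday).
timestamps = [
    datetime(2024, 1, 1, 9, 0),
    datetime(2024, 1, 1, 15, 30),
    datetime(2024, 1, 2, 10, 0),
    datetime(2024, 1, 6, 11, 0),  # a Saturday
    datetime(2024, 1, 8, 9, 0),
    datetime(2024, 1, 8, 9, 45),
    datetime(2024, 1, 8, 16, 0),
]


def events_per_day(stamps):
    """Return (date, number of events) pairs in chronological order."""
    counts = Counter(stamp.date() for stamp in stamps)
    return sorted(counts.items())


def events_per_weekday(stamps):
    """Count events per weekday, where 0 is Monday and 6 is Sunday."""
    return Counter(stamp.weekday() for stamp in stamps)


daily = events_per_day(timestamps)
weekly = events_per_weekday(timestamps)
for day, count in daily:
    print(day, count)
```

Grouping by weekday makes a weekly pattern visible even when the timeline covers many weeks.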

_images/2-ActiveCasesOverTime.png

Figure 3: Active cases over time.

Active cases over time

The active cases over time metric shows you the development of the number of cases that are in progress at the same time over the timeframe of your data set. When new cases are started, then the value on the vertical axis rises. When cases are completed, then the level of active cases drops.

In many processes, the active cases over time chart can help you understand changes in workload. For example, you might see that the number of cases that are in the process is piling up for a while and that, later, the workload has been reduced again (either through the deployment of additional resources or due to external fluctuations in workload).

Note that for each data set the number of cases in the system naturally starts and ends with 0 and, therefore, the graph will always rise at the beginning and descend again towards the end of the timeline. This does not mean anything particular and simply is due to the fact that Disco has no knowledge of the “current state” of your process before the data set starts. Similar to many simulation studies, you should analyze the workload levels of your process while ignoring this “warm-up” and “cool-down” period with respect to the number of active cases in the system.
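To see why the curve necessarily starts and ends at 0, it helps to compute the number of active cases yourself. The sketch below assumes that each case is reduced to a hypothetical (first event, last event) pair and walks through these moments in time order. It illustrates the principle and is not Disco's implementation.

```python
from datetime import datetime

# Hypothetical cases, each given as (first event, last event).
case_spans = [
    (datetime(2024, 1, 1), datetime(2024, 1, 5)),
    (datetime(2024, 1, 2), datetime(2024, 1, 3)),
    (datetime(2024, 1, 4), datetime(2024, 1, 6)),
]


def active_cases_over_time(spans):
    """Return (timestamp, active cases) pairs after each start or end."""
    changes = []
    for start, end in spans:
        changes.append((start, +1))  # a case becomes active
        changes.append((end, -1))    # a case is completed
    # Process starts before ends at identical timestamps (an assumption),
    # so that the level never drops below zero.
    changes.sort(key=lambda change: (change[0], -change[1]))
    level = 0
    timeline = []
    for timestamp, delta in changes:
        level += delta
        timeline.append((timestamp, level))
    return timeline


timeline = active_cases_over_time(case_spans)
levels = [level for _, level in timeline]
print(levels)
```

Because every case contributes exactly one start and one end, the level always returns to 0 after the last event in the data set.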

Keep also in mind that while the active cases over time metric considers active cases as cases that have started but not ended yet, Disco has no knowledge of what we call incomplete cases in the process mining world. It simply takes the first event in each case as the start moment and the last event in each case as the end moment. You will need to use the Endpoints Filter to remove incomplete cases if you do not want to see incomplete cases in this view. If you want to include open cases in a way that lets Disco recognize that incomplete cases are open (and ongoing) until the very end, take a look at this guide “How to analyze open cases with process mining” [1].

This chart is used as a back-drop in the Cases view (see Analyzing Cases), where it serves as a visual reference with respect to the timeframe that the selected case was active in comparison to the overall log timeline. The active cases over time chart is also used in the background visualization of the Timeframe Filter.

_images/3-CaseVariants.png

Figure 4: Pareto chart showing the cumulative number of cases covered by the variants so far.

_images/4-CaseVariants2.png

Figure 5: Histogram displaying the number of cases for each variant.

Case variants

In addition to the summary overview of the variants shown at the bottom of the global statistics view (see Variant Statistics), this chart gives you a visual representation of how the variants are distributed over the cases.

There are two alternative views to inspect the case variant chart:

  • In Figure 4 you can see the Pareto chart, which is the default view. The variants are lined up from the most frequent on the left of the horizontal axis to the least frequent ones on the right. When you hover over the chart with your mouse, the displayed bubbles show you both the number of cases that follow the activity sequence pattern of the current variant and the total percentage of cases in your data set that are currently covered by all variants from the left up to this point. For example, in Figure 4 you can see that the most frequent 40 variants cover almost 90% of the cases, and that Variant 40 in particular is followed by 2 cases.
  • A normal histogram view is shown when you click on the corresponding symbol in the upper right corner of the chart (see red highlight in Figure 5). In the histogram view, the variant frequencies are simply displayed from least frequent to most frequent as shown in Figure 5, where, again, the activity sequence of Variant 40 is followed by 2 cases.

The case variants charts can give you a better understanding of how your variants are distributed. In the Cases view (see Analyzing Cases) you can see how many variants there are in total, and you can explore the variants by inspecting example cases for each of them. However, how many of your variants are covering 80% of your process? Are there a few very dominant variants and then a long tail of many unique cases? Different processes have different variant distribution shapes and you can use that information to understand how much variation there is.
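The numbers behind the Pareto view can be sketched in a few lines of Python. The example assumes a hypothetical mapping from case ID to the ordered list of activities in that case. A variant is then simply a distinct activity sequence, and the cumulative coverage is the running share of cases from the most frequent variant onwards. All names and data are illustrative.

```python
from collections import Counter

# Hypothetical traces: case ID -> ordered activity sequence.
traces = {
    "c1": ["Register", "Check", "Close"],
    "c2": ["Register", "Close"],
    "c3": ["Register", "Check", "Close"],
    "c4": ["Register", "Check", "Check", "Close"],
    "c5": ["Register", "Check", "Close"],
}


def variant_coverage(traces_by_case):
    """Return (variant, cases, cumulative share of cases), most frequent first."""
    counts = Counter(tuple(trace) for trace in traces_by_case.values())
    total_cases = sum(counts.values())
    rows = []
    covered = 0
    for variant, frequency in counts.most_common():
        covered += frequency
        rows.append((variant, frequency, covered / total_cases))
    return rows


coverage = variant_coverage(traces)
for variant, frequency, share in coverage:
    print(frequency, f"{share:.0%}", " > ".join(variant))
```

Reading down the last column tells you how many variants you need to cover, for example, 80% of your cases.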

Refer to Variant Statistics and Inspecting Variants for more information about variants.

_images/5-EventsPerCase.png

Figure 6: Events per case.

Events per case

The events per case metric shows the distribution of how many events occur in each case. The horizontal axis runs from the minimum number of events that has been observed on the left up to the maximum number of events on the right. On the vertical axis, the number of cases with that number of events is displayed.

For example, in Figure 6 you can see that in total 80 cases each performed 17 activities in the process. These 17 activities may have been performed in different sequences, they may be different activities or all the same activities. So, the events per case metric gives you a general sense of how many steps have been made in the process for each case.
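The distribution can be derived by counting twice: first the events per case, then the cases per event count. The following sketch assumes a hypothetical list that holds the case ID of every event row. The data is invented.

```python
from collections import Counter

# Hypothetical case ID column of an event log (one entry per event).
case_ids = ["c1", "c1", "c1", "c2", "c2", "c3", "c3", "c3"]

# Step 1: number of events in each case.
events_per_case = Counter(case_ids)

# Step 2: number of cases for each observed event count.
histogram = Counter(events_per_case.values())

for number_of_events in sorted(histogram):
    print(number_of_events, "events:", histogram[number_of_events], "cases")
```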

In many processes, the number of steps can be used as a proxy for the amount of effort that is expended. Think, for example, of an incident management process at an IT Service Desk: each event (each step) corresponds to a time that an employee has touched an incident ticket to work on it, forward it, or change the status. Inefficiencies like rework and ping-pong behavior in the process are then reflected by a higher number of steps. Process improvement initiatives for these kinds of processes often aim to cut costs by reducing the number of steps that are needed to complete the process.

Disco also calculates the number of events for each variant and case in the Cases view (see Analyzing Cases) and displays them in the Case statistics (see Case Statistics) and Variant statistics (see Variant Statistics). Furthermore, the Performance Filter can be used to focus your analysis on particularly long or short cases based on the events per case metric.

_images/6-CaseDuration.png

Figure 7: Case duration.

Case duration

The case duration metric shows you the throughput time of the process from the very beginning (start of first activity) to the very end (completion of last activity). For example, in Figure 7 one can see that there are a number of short-running cases and then, towards the right, a smaller number of very long-running cases (the ones pointed out by the mouse lasted 84 days and 13 hours).

The mean and median duration of all cases shown in this histogram are displayed in the summary statistics on the right, see (1) in Figure 1.

For many processes, the case duration is an important metric because it shows the total time that the process needed to complete. For example, if a customer-facing process is taking too long then it might result in a bad customer experience. If contractual service levels are not met, there can be financial or legal consequences.

Again, keep in mind that Disco cannot know which cases are incomplete in the sense that they have not finished yet. The case duration for an incomplete case is measured as the time between the first and the last event in the case. You will need to use the Endpoints Filter to remove incomplete cases if you do not want to see incomplete cases in this view. If you want to include open cases in a way that lets Disco recognize that incomplete cases are open (and ongoing) until the very end, take a look at this guide “How to analyze open cases with process mining” [1].

You can use the Performance Filter to focus on particularly fast or slow cases, or on cases that meet or do not meet a certain service level target for your process. If you want to see the distribution of time it takes to get just from one specific activity in the process to another one instead of the complete process, you can apply an Endpoints Filter in trim mode.

_images/7-CaseUtilization.png

Figure 8: Case utilization.

Case utilization (only available if you have start and end timestamps for your activities)

The case utilization metric gives you a sense of how much time in your process is spent in activities (active time) relative to the total case duration. If the case utilization is 1.0 (100%), then all the time has been actively spent performing activities (this is trivially true if only one activity was performed once).

In many processes, the case utilization is low because there are often much longer waiting times (also called idle times) between activities than the time that it takes to actually perform an activity in the process. For example, in the case utilization chart in Figure 8 the highest case utilization that was achieved in the process was 41.2%.

The case utilization can only be calculated if your data set has start and completion timestamps for each activity. Refer to Including Multiple Timestamp Columns to learn more about how multiple timestamps can be imported in Disco.
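As a rough illustration of the ratio described above, the following sketch divides the active time of a case by its total duration. It assumes that each activity execution is given as a hypothetical (start, completion) pair and that executions within a case do not overlap. How Disco treats overlapping activities is not covered here.

```python
from datetime import datetime, timedelta

# Hypothetical activity executions of one case: (start, completion).
activities = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 10, 0)),
    (datetime(2024, 1, 1, 13, 0), datetime(2024, 1, 1, 14, 0)),
]


def case_utilization(executions):
    """Active time divided by total case duration (non-overlapping executions)."""
    active = sum((end - start for start, end in executions), timedelta())
    case_start = min(start for start, _ in executions)
    case_end = max(end for _, end in executions)
    total = case_end - case_start
    if total == timedelta():
        # Assumption: a case without any elapsed time counts as fully utilized.
        return 1.0
    return active / total


utilization = case_utilization(activities)
print(f"{utilization:.1%}")
```

Here two hours of work are spread over a five-hour case, so the remaining three hours are waiting time.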

_images/8-MeanActivityDuration.png

Figure 9: Mean activity duration.

Mean activity duration (only available if you have start and end timestamps for your activities)

The mean activity duration shows how much time was spent, on average, per activity for each case.

For example, in Figure 9 the average activity execution time was around 19 minutes for 44 cases in the data set.

The mean activity duration can only be calculated if your data set has start and completion timestamps for each activity. Refer to Including Multiple Timestamp Columns to learn more about how multiple timestamps can be imported in Disco.
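For a single case, this metric can be sketched as the average of the execution times of its activities. The example assumes hypothetical (start, completion) pairs. The names and values are illustrative only.

```python
from datetime import datetime, timedelta

# Hypothetical activity executions of one case: (start, completion).
activities = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 10)),
    (datetime(2024, 1, 1, 11, 0), datetime(2024, 1, 1, 11, 20)),
    (datetime(2024, 1, 2, 8, 0), datetime(2024, 1, 2, 8, 30)),
]


def mean_activity_duration(executions):
    """Average execution time over all activity executions of a case."""
    durations = [end - start for start, end in executions]
    return sum(durations, timedelta()) / len(durations)


mean_duration = mean_activity_duration(activities)
print(mean_duration)
```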

_images/9-MeanWaitingTime.png

Figure 10: Mean waiting time.

Mean waiting time (only available if you have start and end timestamps for your activities)

The mean waiting time indicates the average time that was spent inactively between two activities (after an activity has been completed and before the next one is started).

For example, in Figure 10 there was one case where, on average, 44 days were spent without the execution of any activity in the process.

The mean waiting time can only be calculated if your data set has start and completion timestamps for each activity. Refer to Including Multiple Timestamp Columns to learn more about how multiple timestamps can be imported in Disco.
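The waiting time between two activities can be sketched as the gap between the completion of one execution and the start of the next one. The example below assumes hypothetical (start, completion) pairs for one case and treats overlapping executions as a gap of zero, which is an assumption made for this illustration.

```python
from datetime import datetime, timedelta

# Hypothetical activity executions of one case: (start, completion).
activities = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 30)),
    (datetime(2024, 1, 1, 10, 30), datetime(2024, 1, 1, 11, 0)),
    (datetime(2024, 1, 1, 14, 0), datetime(2024, 1, 1, 14, 15)),
]


def mean_waiting_time(executions):
    """Average idle time between consecutive activity executions of a case."""
    ordered = sorted(executions)  # order by start time
    gaps = [
        max(next_start - previous_end, timedelta())  # no negative gaps
        for (_, previous_end), (next_start, _) in zip(ordered, ordered[1:])
    ]
    if not gaps:
        return timedelta()  # a single execution has no waiting time
    return sum(gaps, timedelta()) / len(gaps)


mean_wait = mean_waiting_time(activities)
print(mean_wait)
```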

Case Statistics

In the lower part of the Overview statistics screen, you find a list of all cases in your data set, see (3) in Figure 1.

For each case, you can see the number of events, their variant, the earliest timestamp, the latest timestamp, and the duration as shown in Figure 11. You can sort the table by clicking on the header of the column that you want to sort by. For example, in Figure 11 the case statistics are sorted by their case duration and the longest case is displayed at the very top (108 days and 7 hours). When you click on the same header again, the table will be sorted in the opposite direction.

_images/Overview-Cases.png

Figure 11: A list of all cases with their number of events, variant, start and end time, and duration.

You can scroll through the list of cases, and sort the table according to different dimensions, to get an overview. When you right-click somewhere in the table, you get the following additional options (see 1-4 in Figure 12).

_images/Overview-Cases-2.png

Figure 12: When you right-click on the case statistics table, you can choose to inspect a particular case, filter for the case in the current row, copy the cell content to your clipboard, or export the whole case statistics table.

Show case details (1)
To inspect the full history of a case, you can choose the Show case details option. This is a great way to inspect extreme cases to see what happened, or to ensure that there is not a data quality problem. For example, in Figure 12 we have sorted the cases based on the number of steps. We can see that case 20 has a considerably shorter duration (just 11 days and 22 hours) compared with the other cases that took many steps. To investigate what might be the reason, and to potentially learn from this case to promote it as a best practice, we want to look at this case in more detail. The Show case details option brings you directly to the case in the row that you have clicked on in the Cases View (see also Search and Inspection Short-cuts).
Filter for case (2)
Instead of just inspecting a case, you can also use the Filter for case short-cut to filter your data set to keep only this one case. This can be useful, for example, if you want to see the process map (and the remaining statistics) just for this one case. Refer to Filtering Cases, Variants, Activities, Resources, and Attributes for further information.
Copy to clipboard (3)
Sometimes you just want to copy the content of a particular cell to your clipboard, where you can paste it into another program outside of Disco. For example, you could copy the case ID from the case statistics table and paste it into your operational system to look up more detailed information about a particular case. The Copy option is available via right-click for all tables throughout Disco. Refer to Copying Values and Results to the Clipboard for more information.
Export case statistics (4)
The whole case statistics table can be exported as a CSV file. For example, the exported file can be opened in Excel to create some custom charts, or you can import it into a statistics tool to perform additional analyses with it. Refer to Exporting Cases for further information about the case statistics export.

To let these additional options disappear again, you can simply click somewhere in the table with your left mouse button.

Variant Statistics

Alternatively, you can switch to see the variants as shown in Figure 13. A variant in Disco is a specific sequence of activities, and there may be multiple cases that follow the same sequence through the process (see also Inspecting Variants). In the variant statistics you see at a glance how many cases follow each variant, how many events make up the sequence, and what the median and average duration is for all cases that follow this variant.

Like in the case statistics (see Case Statistics), or any other statistics table in Disco, you can sort the variant statistics by clicking on the header of the column that you want to sort by. For example, the variants in Figure 13 are sorted by their frequency.

_images/Overview-Variants.png

Figure 13: A list of all variants with their frequency, number of events, and median and average duration.

The variant statistics give you an overview about your variants from a bird’s eye perspective. You can sort the table according to different dimensions to get an overview. When you right-click somewhere in the table, you get the following additional options (see 1-4 in Figure 14).

_images/Overview-Variants-2.png

Figure 14: When you right-click on the variant statistics table, you can choose to inspect a particular variant, filter for the variant in the current row, copy the cell content to your clipboard, or export the whole variant statistics table.

Show variant details (1)
To inspect examples of this variant, you can choose the Show variant details option. For example, in Figure 14 we have sorted the variants based on the number of cases that follow them to focus on the most significant variants. Among the top-ten most frequent variants, we can see that variant 10 has a much longer median and mean duration (47 and 48 days) compared with the other variants in the top ten list. To investigate what might be causing the delay, we want to look at some example cases of this variant in more detail. The Show variant details option brings you directly to the variant in the Cases View (see also Search and Inspection Short-cuts).
Filter for variant (2)
Instead of just inspecting example cases for a variant, you can also use the Filter for variant short-cut to filter your data set to keep only this one variant. This can be useful, for example, if you want to see the process map (and the remaining statistics) just for the cases that follow this one variant. Refer to Filtering Cases, Variants, Activities, Resources, and Attributes for further information.
Copy to clipboard (3)
Sometimes you just want to copy the content of a particular cell to your clipboard, where you can paste it into another program outside of Disco. The Copy option is available via right-click for all tables throughout Disco. Refer to Copying Values and Results to the Clipboard for more information.
Export variant statistics (4)
The whole variant statistics table can be exported as a CSV file. The exported file will not only contain the variant summary that you see in Disco but also the variants themselves, including all the steps that make up the sequence in each variant. Refer to Exporting Variants for further information about the variant statistics export.

To let these additional options disappear again, you can simply click somewhere in the table with your left mouse button.

Activity Statistics

The second statistics view is the Activity statistics, see (5) in Figure 1. It contains performance metrics about the activities in your process.

Depending on your import configuration, the activity may have been composed out of multiple columns. For example, in the call center example in Figure 15, the activity name has been composed from the Operation and from the Agent Position column to distinguish activities that take place in the first level support and with the back-office specialists. Refer to Combining Multiple Case ID, Activity, or Resource Columns to learn more about how to combine multiple activity columns.

_images/Activity1.png

Figure 15: Activity statistics in Disco.

The following information is available in the Activity statistics (see Figure 15):

Statistics type (1)

Like in the Global statistics (see Overview Statistics), there are a number of different metrics available to display in the chart view:

  • Frequency How often each activity has occurred in the data set. This is the same metric that is also available as the Absolute frequency metric in your process map (see Frequency Metrics), but here you have a sorted chart view of all activities. If there are no start and end timestamps for the activity in your data set, then this is the only chart that is displayed. Refer to Including Multiple Timestamp Columns to learn how to include multiple timestamp columns.
  • Median duration (only available if you have start and end timestamps for your activities) The median time between the start and the end of each activity. This is the same metric that is also available as the Median duration performance metric in your process map (see Performance Metrics), but here you have a sorted chart view of all activities.
  • Mean duration (only available if you have start and end timestamps for your activities) The average time between the start and the end of each activity. This is the same metric that is also available as the Mean duration performance metric in your process map (see Performance Metrics), but here you have a sorted chart view of all activities.
  • Duration range (only available if you have start and end timestamps for your activities) The time span between the minimum observed execution time and the maximum observed execution time for each activity. By showing how much difference there has been between the fastest and the slowest executions of each of your activities, you can see how homogenous or heterogeneous the execution times are.
  • Aggregate duration (only available if you have start and end timestamps for your activities) The sum of the durations of all executions for each activity. This is the same metric that is also available as the Total duration performance metric in your process map (see Performance Metrics). It shows you on which activity, in a cumulative view, most of the active time has been spent in your process.
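The five metrics in this list can be summarized in a small sketch. It assumes a hypothetical list of (activity name, execution time) pairs, where the execution time is the difference between the completion and the start timestamp of one execution. The activity names and durations are invented for illustration.

```python
from collections import defaultdict
from datetime import timedelta
from statistics import median

# Hypothetical activity executions: (activity name, execution time).
executions = [
    ("Inbound Call", timedelta(minutes=5)),
    ("Inbound Call", timedelta(minutes=7)),
    ("Inbound Call", timedelta(minutes=12)),
    ("Handle Email", timedelta(minutes=10)),
    ("Handle Email", timedelta(minutes=20)),
]


def activity_statistics(rows):
    """Frequency and duration metrics per activity."""
    durations_by_activity = defaultdict(list)
    for name, duration in rows:
        durations_by_activity[name].append(duration)
    result = {}
    for name, durations in durations_by_activity.items():
        total = sum(durations, timedelta())
        result[name] = {
            "frequency": len(durations),
            "median_duration": median(durations),
            "mean_duration": total / len(durations),
            "duration_range": (min(durations), max(durations)),
            "aggregate_duration": total,
        }
    return result


activity_stats = activity_statistics(executions)
for name, metrics in activity_stats.items():
    print(name, metrics)
```

In this example, Handle Email occurs less often than Inbound Call but has the larger aggregate duration, which is the kind of difference that the Aggregate duration chart brings out.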
Pareto or normal view (2)

All of the charts can be viewed either as a Pareto chart or as a normal histogram.

In Figure 15 you see the Pareto chart view for the activity frequencies, where like in some of the Global statistics charts (see Overview Statistics) a yellow line is displayed that shows the cumulative, relative sum (how much out of 100%) for the values below. To change to the histogram view, you can press the corresponding symbol in the upper right corner of the chart (2).

For example, Figure 16 shows the alternative histogram view for the same activity frequencies as Figure 15.

_images/Activity2.png

Figure 16: Histogram view of the chart.

Summary statistics (3)

In the overview information area, the following statistics are displayed:

  • Activities: The number of different activities.
  • Minimal frequency: How often the least frequent activity has occurred.
  • Median frequency: The median of how often each activity has occurred.
  • Mean frequency: How often each activity has occurred on average.
  • Maximal frequency: How often the most frequent activity has occurred.
  • Frequency std. deviation: The standard deviation for the frequency of activities.
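These summary statistics describe the distribution of the activity frequencies themselves. The sketch below assumes a hypothetical activity column with one entry per event. It uses the population standard deviation, which is an assumption: whether Disco reports the population or the sample standard deviation is not stated here.

```python
from collections import Counter
from statistics import mean, median, pstdev

# Hypothetical activity column of an event log (one entry per event).
activity_column = ["A"] * 4 + ["B"] * 2 + ["C"] * 2 + ["D"] * 8

frequencies = list(Counter(activity_column).values())

summary = {
    "activities": len(frequencies),
    "minimal_frequency": min(frequencies),
    "median_frequency": median(frequencies),
    "mean_frequency": mean(frequencies),
    "maximal_frequency": max(frequencies),
    # Population standard deviation (assumption, see above).
    "frequency_std_deviation": pstdev(frequencies),
}
print(summary)
```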
Value statistics (4)

In the detailed table view, you can see a list of all activities sorted by their absolute and relative frequency. If start and end timestamps for the activity are available in your data set (refer to Including Multiple Timestamp Columns to learn more about how multiple timestamps can be imported in Disco), then also the median duration, the mean duration and the duration range for each activity are displayed.

You can sort the table by another column through clicking on the column header. For example, you sometimes want to see the activity statistics sorted by activity name, or their duration. By clicking on the column header again, the statistics table is sorted in the opposite direction. You can see which column the table is currently sorted by through the little triangle displayed to the left of the column title.

Finally, you can switch to a view where you only see those activities that have occurred as the first or the last activity in a case. This can be useful when you want to check whether you have incomplete cases in your data set: There are often only a few valid start and end activities in a process, and cases that start or end with any other activity have either been started before the data extraction period began or were not finished yet when the data was extracted.
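The check described above can be sketched by counting the first and the last activity of every case and comparing them with the start and end activities that you consider valid. The traces and the sets of valid endpoints below are hypothetical. In practice, the valid endpoints come from your knowledge of the process.

```python
from collections import Counter

# Hypothetical traces: case ID -> ordered activity sequence.
traces = {
    "c1": ["Register", "Check", "Close"],
    "c2": ["Register", "Close"],
    "c3": ["Check", "Close"],     # started before the extraction period
    "c4": ["Register", "Check"],  # not finished yet
}

# How often each activity occurs as the first or last step of a case.
first_activities = Counter(trace[0] for trace in traces.values())
last_activities = Counter(trace[-1] for trace in traces.values())

# Assumed domain knowledge about valid endpoints of the process.
valid_starts = {"Register"}
valid_ends = {"Close"}

incomplete_cases = sorted(
    case_id
    for case_id, trace in traces.items()
    if trace[0] not in valid_starts or trace[-1] not in valid_ends
)
print(first_activities, last_activities, incomplete_cases)
```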

The Endpoints Filter can be used to clean your data of incomplete cases.

The activity statistics give you an overview about your activities from a bird’s eye perspective. You can sort the table according to different dimensions to get an overview. When you right-click somewhere in the table, you get the following additional options (see 1-4 in Figure 17).

_images/Activity3.png

Figure 17: When you right-click on the activity statistics table, you can choose to filter for the activity in the current row, search for cases with the activity in the Cases view, copy the cell content to your clipboard, or export the whole activity statistics table.

Filter for activity (1)
If you want to filter your data set for all cases that contain a particular activity, you can use the Filter for activity short-cut. This way, you will see the process map and the remaining statistics just for the cases that perform this process step. Refer to Filtering Cases, Variants, Activities, Resources, and Attributes for further information.
Search activity (2)
To inspect example cases that contain this activity, you can choose the Search activity option. For example, in Figure 17 we have discovered that an outbound call was made six times from the back-office specialists. Normally, all outgoing calls should be placed by the main contact in the first level support according to the business rules for this process. To investigate whether there was a special reason to make an exception from this rule, we want to look at some example cases with this activity in more detail. The Search activity option brings you directly to the Cases View with all the cases, where this activity has occurred (see also Search and Inspection Short-cuts).
Copy to clipboard (3)
Sometimes you just want to copy the content of a particular cell to your clipboard, where you can paste it into another program outside of Disco. The Copy option is available via right-click for all tables throughout Disco. Refer to Copying Values and Results to the Clipboard for more information.
Export activity statistics (4)
The whole activity statistics table can be exported as a CSV file. Refer to Exporting Charts and Tables for further information about the activity statistics export.

To let these additional options disappear again, you can simply click somewhere in the table with your left mouse button.

Resource Statistics

The Resource statistics view, see (6) in Figure 1, is built up in exactly the same way as the Activity statistics view, but shows you information about what you have configured as a Resource during import (see Import Configuration Settings).

For example, in Figure 18 you can see that there are 48 different people working in the call center process. Refer to Activity Statistics for details about the charts, overview statistics and table view.

_images/Resource.png

Figure 18: Resource statistics in Disco.

Note that only activity, case ID, and resource columns can be combined from multiple columns in your data set. If you have multiple data attributes that you would like to analyze in combination, the Resource import type can be used as a way to combine them without influencing the process map. Refer to Combining Multiple Case ID, Activity, or Resource Columns to learn more about how to combine multiple resource columns.

Finally, even if you have start and end timestamps for your activities in the data set, the duration statistics are only calculated for the activity and resource fields. Regular attributes (see Attribute Statistics) are only shown with their frequency metrics in the statistics view. Therefore, if you have particular data attributes for which you need to calculate these duration statistics, you could use the Resource import type for these attributes.

Attribute Statistics

Any column that you have included during import but configured as ‘Other’, rather than as a Case, Activity, Resource, or Timestamp column, will be included as a separate attribute here in the Statistics view, see (7) in Figure 1. Refer to Import Configuration Settings to learn more about the import settings.

These additional attributes often provide valuable information about your process. Typical attributes are, for example, product types in a support process, the value of a purchase in a purchasing process, the type of issue in an incident handling process, and so on.

The attribute statistics then give you a breakdown of the frequencies for the different values in each of these attributes. Particularly in combination with filtering, you can use them in two powerful ways:

  • Use attributes to split out different processes: Sometimes, you actually have different types of processes. For example, when a customer service process can be started either via the call center or through a dealer, then the processes that are performed in these two situations often differ, there may be different service level agreements in place, and different parts of your organization may be in charge of them. So, you want to analyze them in isolation.
This can easily be done if you have an attribute in your data set that indicates the channel. Simply use an Attribute Filter to first create a copy of your data set for the channel attribute value call center and then another copy for the channel attribute value dealer. As a result, you will have two separate data sets that can be compared and analyzed independently from each other.
  • Inspect attributes after filtering: In return, the attribute values can give you feedback about which types of cases have problems after you applied another, for example, performance-oriented filter. Say, for example, you have applied a Performance Filter to focus on cases that do not meet the agreed-upon service level because they took too long to complete. By inspecting the attribute value frequencies after filtering you can see for which channels, products, or other categories this problem is more prevalent than for others.
_images/Attribute.png

Figure 19: Attribute statistics in Disco.

Figure 19 shows a screenshot of an Attribute statistics view for the Agent Position attribute from the callcenter example, which has just two values. FL stands for frontline, which indicates that the activity took place in the first-level support desk, and BL stands for backline, which tells us that the activity was performed by a back-office specialist. You can see that ca. 94% of the activities occurred in the 1st level support (frontline) and ca. 6% were performed in the 2nd level support (backline).
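If you want to reproduce such a frequency breakdown outside of Disco, for example on an exported CSV file, the following minimal Python sketch shows the idea. It is not how Disco computes its statistics internally. The event rows and the field name agent_position are made up for illustration.

```python
from collections import Counter

# Hypothetical event rows; the "agent_position" field name is an assumption.
events = [
    {"case_id": "c1", "activity": "Inbound Call", "agent_position": "FL"},
    {"case_id": "c1", "activity": "Handle Case", "agent_position": "BL"},
    {"case_id": "c2", "activity": "Inbound Call", "agent_position": "FL"},
    {"case_id": "c3", "activity": "Inbound Email", "agent_position": "FL"},
]


def value_frequencies(rows, attribute):
    """Return {value: (absolute count, relative share)} for one attribute."""
    counts = Counter(row[attribute] for row in rows)
    total = sum(counts.values())
    return {value: (n, n / total) for value, n in counts.items()}


frequencies = value_frequencies(events, "agent_position")
for value, (count, share) in sorted(frequencies.items()):
    print(f"{value}: {count} events ({share:.0%})")
```

The shares are calculated per event, not per case, which matches the way the percentages are read in the paragraph above (share of activities that occurred in the frontline or backline).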

In fact, in this situation the Agent Position attribute has been combined with the Operation attribute to form the activity name (see also Figure 15). Although Agent Position is used as part of the activity name here, the attribute statistics of each of the combined columns are still available through an individual attribute column. To remind you that the attribute is also used as part of the activity, there is a little envelope symbol embedded in the speech bubble symbol. This works in the same way if you choose to combine multiple columns to determine the resources in your process.

Refer to Combining Multiple Case ID, Activity, or Resource Columns to learn more about when it is useful to combine multiple Activity or Resource columns, and how to do it.

Sorting Tables

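Throughout the statistics view, Disco provides you with tables with statistics about some aspect of your process. You can find these tables in the Case Statistics, Variant Statistics, Activity Statistics, Resource Statistics, and Attribute Statistics.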

It is often handy to sort these tables according to the dimension that most interests you. For example, if the throughput time is most critical for your process then you most likely want to sort the Case Statistics table based on the case duration.

You can do this by simply clicking on the header of the column that you want the table to be sorted by. If you click on the header again, the table will be sorted in the opposite direction.

_images/SortingTables.gif

Figure 20: Any table in Disco can be sorted by clicking on the header of the column you want to sort it by. If you click again, the table will be sorted in the opposite direction.

For example, in Figure 20 the Case Statistics table is sorted based on the number of steps that were made in each case.

The little triangle next to the header title indicates by which column the table is currently sorted, and in which direction. If the triangle is pointing upwards, then the largest values appear at the top of the table. If the triangle is pointing downwards, then the largest values appear at the bottom.
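The same two-direction sorting can be applied to an exported table outside of Disco. The following Python sketch assumes a small, made-up case statistics table. The column names and the sort_table helper are hypothetical and only illustrate the principle.

```python
from datetime import timedelta

# Hypothetical case statistics rows (case ID, number of events, duration).
case_stats = [
    {"case_id": "A", "events": 5, "duration": timedelta(days=3)},
    {"case_id": "B", "events": 12, "duration": timedelta(days=40)},
    {"case_id": "C", "events": 7, "duration": timedelta(hours=6)},
]


def sort_table(rows, column, descending=True):
    """Return a new list of rows ordered by the given column."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)


# One direction: the longest cases appear at the top.
by_duration = sort_table(case_stats, "duration")
# The opposite direction on the same column: the shortest cases appear at the top.
by_duration_reversed = sort_table(case_stats, "duration", descending=False)
```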

Filtering Cases, Variants, Activities, Resources, and Attributes

You can filter cases, variants, activities, resources, and any other attribute by manually adding an Attribute Filter. However, when you use the filter short-cuts from the statistics view you will be even faster and you will add the filter directly from the context in which you are currently inspecting your process.

Filtering Cases

For example, let’s say that we are inspecting the purchasing process based on throughput times. We have sorted the Case Statistics table based on the duration column (see also Sorting Tables) and we see that there is one case that took 108 days and 7 hours. We now want to filter the data set to focus only on this extreme case. To do this, you can right-click on the first row in the table and select the Filter for case “655” option from the context menu (see screenshot in Figure 21).

_images/Filter-Case-1.png

Figure 21: Right-click on the row of the case that you want to filter and select the Filter for case option.

This will automatically add a pre-configured Attribute Filter with the right case selected. You can directly add the filter to the current data set (by using the Apply filter button) or create a copy by using the Copy and filter button (see screenshot in Figure 22).

_images/Filter-Case-2.png

Figure 22: A pre-configured Attribute Filter will be automatically added to your filter stack. All you have to do is to apply the filter.

Once you apply the filter, only the filtered case will remain in your data set. As a result, you now see all the statistics only for this one case. Furthermore, if you switch to the Map View then you will now see the process map only for this one case. For example, after filtering for case 655 in the purchasing process we can see that it went through the additional Amend Request for Quotation Requester step four times (see screenshot in Figure 23).

_images/Filter-Case-3.png

Figure 23: After applying the filter the statistics and the process map will be shown only for the filtered case.

Note that if your case has no repetitions and no parallelism, then the process map will show just a simple sequence. Filtering individual cases can be particularly useful for extreme cases and complex processes with parallelism or rework. Looking at the journey of one particular case can be instructive to understand what has gone wrong. Furthermore, by replaying the case in the animation (see Process Animation) you can bring it to life and tell a story based on the evidence in your data.

Be careful not to get too caught up in the inspection of individual cases, because they might be outliers and not representative. However, especially once you have identified a general problem in your process, picking example cases that exhibit this behavior can be an important communication tool to illustrate your findings.

Filtering Variants

Similar to filtering cases, from the Variant Statistics table you can directly filter for one particular variant. For example, in Figure 24 we can see that among the top ten most frequent variants Variant 10 takes considerably longer (ca. 47 days in the median and 48 days on average). To focus on this variant in our analysis, we can right-click on this row and select the Filter for “Variant 10” option from the context menu (see screenshot in Figure 24).

_images/Filter-Variant-1.png

Figure 24: Right-click on the row of the variant that you want to filter and select the Filter for variant option.

This will automatically add a pre-configured Attribute Filter with the right variant selected. You can directly add the filter to the current data set (by using the Apply filter button) or create a copy by using the Copy and filter button (see screenshot in Figure 25).

_images/Filter-Variant-2.png

Figure 25: A pre-configured Attribute Filter will be automatically added to your filter stack. All you have to do is to apply the filter.

Once you apply the filter, only the cases from the filtered variant will remain in your data set. As a result, you now see all the statistics only for this one variant. For example, we can see that the case durations of the cases in this variant differ from each other quite a bit (see screenshot in Figure 26).

_images/Filter-Variant-3.png

Figure 26: After applying the filter the statistics will be shown only for the cases in the filtered variant.

Furthermore, if you switch to the Map View then you will now see the process map only for the cases in this one variant. For example, after filtering for Variant 10 in the purchasing process we can see that the biggest bottleneck for this variant is between the process step Amend Request for Quotation Requester and the Analyze Request for Quotation step (see screenshot in Figure 27).

_images/Filter-Variant-4.png

Figure 27: When you switch to the process map, only the process map for the cases in the selected variant will be shown.
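To make the effect of a variant filter more concrete, here is a minimal Python sketch. It assumes that a variant is the ordered sequence of activities of a case and that the events are already sorted by time within each case. The activity names are shortened, made-up examples, and the helper functions are hypothetical, not part of Disco.

```python
from collections import defaultdict

# Hypothetical event log as (case ID, activity), ordered by time within each case.
events = [
    ("1", "Create Request"), ("1", "Analyze Request"), ("1", "Approve"),
    ("2", "Create Request"), ("2", "Analyze Request"), ("2", "Approve"),
    ("3", "Create Request"), ("3", "Amend Request"), ("3", "Analyze Request"),
    ("3", "Approve"),
]


def variants_by_case(rows):
    """Map each case ID to its variant, i.e. its ordered activity sequence."""
    traces = defaultdict(list)
    for case_id, activity in rows:
        traces[case_id].append(activity)
    return {case_id: tuple(trace) for case_id, trace in traces.items()}


def filter_variant(rows, variant):
    """Keep only the events of cases that follow exactly the given variant."""
    variants = variants_by_case(rows)
    return [(c, a) for c, a in rows if variants[c] == variant]


rework = ("Create Request", "Amend Request", "Analyze Request", "Approve")
filtered = filter_variant(events, rework)
```

In this sketch, cases 1 and 2 share the same variant, while case 3 is the only case that remains after filtering for the rework variant.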

Variant filtering can be useful in many situations. Sometimes, you just want to see the average or median waiting times between the process steps for one particularly dominant process map. Furthermore, sometimes it can also serve as a short-hand to quickly get to the variant filter and then expand the selection from just this one, pre-configured variant to a few more based on which variants reflect the ideal process (so they are the “good” variants).

If you want to simplify your process map based on the most frequent variants, you can also use the Variation Filter.

Filtering Activities

When you are inspecting the Activity Statistics, you can also make use of a short-cut to filter cases that contain a particular activity. For example, we can see that in the purchasing process the activity Settle dispute with supplier Purchasing Agent has been performed 26 times. To focus our analysis on just these cases containing this particular activity, you can right-click on this row and select the Filter for activity option from the context menu (see screenshot in Figure 28).

_images/Filter-Activity-1.png

Figure 28: Right-click on the row of the activity that you want to filter and select the Filter for activity option.

This will automatically add a pre-configured Attribute Filter with the right activity and the right filter mode selected. You can create a copy (by using the Copy and filter button) or directly add the filter to the current data set by using the Apply filter button (see screenshot in Figure 29).

_images/Filter-Activity-2.png

Figure 29: A pre-configured Attribute Filter will be automatically added to your filter stack. All you have to do is to apply the filter.

When we inspect the process map for the filtered data set, we can see that 24 cases perform this additional step (so, some of them go through the dispute more than once). Furthermore, we can see that in one of the cases also the Financial manager got involved in the dispute (see screenshot in Figure 30).

_images/Filter-Activity-3.png

Figure 30: After applying the filter the statistics and the process map will be shown only for the cases that contain the filtered activity.
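The difference between the number of times an activity was performed and the number of cases that contain it can be illustrated with a small Python sketch. The event data below is made up and the filter_cases_with_activity helper is hypothetical. It keeps all events of every case in which the activity occurs at least once, which is the behavior described above.

```python
# Hypothetical (case ID, activity) pairs.
events = [
    ("10", "Create Order"), ("10", "Settle dispute"), ("10", "Settle dispute"),
    ("11", "Create Order"), ("11", "Pay invoice"),
    ("12", "Create Order"), ("12", "Settle dispute"), ("12", "Pay invoice"),
]


def filter_cases_with_activity(rows, activity):
    """Keep all events of every case in which the activity occurs at least once."""
    matching_cases = {case_id for case_id, act in rows if act == activity}
    return [(case_id, act) for case_id, act in rows if case_id in matching_cases]


filtered = filter_cases_with_activity(events, "Settle dispute")
occurrences = sum(1 for _, act in filtered if act == "Settle dispute")
cases = len({case_id for case_id, _ in filtered})
print(occurrences, "occurrences in", cases, "cases")
```

Here the activity occurs three times but only in two cases, because case 10 goes through the dispute twice.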

This activity filter short-cut from the statistics view is equivalent to the Filter this activity short-cut from the process map (refer to Filtering Activities from the Process Map to learn how to use the process map filter short-cuts).

Filtering Resources

Similar to activities, you can make use of a short-cut to filter cases that contain activities performed by a particular resource when inspecting the Resource Statistics.

For example, when we sort the employees working on the purchasing process based on their median activity duration (refer to Sorting Tables to learn how to do that), we can see that Francis Odell is the fastest with a median time of just four minutes. To learn more about what he is doing right, we want to focus our data set on just the cases containing this particular resource. To do this, you can right-click on the corresponding resource row and select the Filter for resource option from the context menu (see screenshot in Figure 31).

_images/Filter-Resource-1.png

Figure 31: Right-click on the row of the resource that you want to filter and select the Filter for resource option.

This will automatically add a pre-configured Attribute Filter with the right resource and the right filter mode selected. You can create a copy (by using the Copy and filter button) or directly add the filter to the current data set by using the Apply filter button (see screenshot in Figure 32).

_images/Filter-Resource-2.png

Figure 32: A pre-configured Attribute Filter will be automatically added to your filter stack. All you have to do is to apply the filter.
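A ranking of resources by median activity duration, like the one used in the example above, can be sketched in Python as follows. This assumes that every event has both a start and an end timestamp. The resource names and timestamps are invented for illustration, and the code is not taken from Disco.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Hypothetical events: (resource, start timestamp, end timestamp).
events = [
    ("Anna", datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 9, 4)),
    ("Anna", datetime(2024, 1, 8, 10, 0), datetime(2024, 1, 8, 10, 6)),
    ("Anna", datetime(2024, 1, 8, 11, 0), datetime(2024, 1, 8, 11, 20)),
    ("Ben", datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 9, 10)),
    ("Ben", datetime(2024, 1, 8, 13, 0), datetime(2024, 1, 8, 13, 30)),
]


def median_duration_minutes(rows):
    """Return {resource: median activity duration in minutes}."""
    durations = defaultdict(list)
    for resource, start, end in rows:
        durations[resource].append((end - start).total_seconds() / 60)
    return {resource: median(values) for resource, values in durations.items()}


medians = median_duration_minutes(events)
# Fastest resource first.
ranking = sorted(medians.items(), key=lambda item: item[1])
```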

Filtering Attributes

Attributes in your data set can contain important information in all kinds of dimensions. For example, when we look at the Channel Attribute Statistics in the refund customer service process, we can see that there are three different channels through which the refund order can be handled. There are actually different guidelines and different processes behind these channels, so we want to analyze them in isolation.

To do this, you can simply right-click on the corresponding attribute row and select the Filter for attribute value option from the context menu (see screenshot in Figure 33).

_images/Filter-Attribute-1.png

Figure 33: Right-click on the row of the attribute that you want to filter and select the Filter for attribute value option.

This will automatically add a pre-configured Attribute Filter with the right attribute and the right attribute value selected. You can directly add the filter to the current data set (by using the Apply filter button). However, to keep all the different channels it is better to create a copy by using the Copy and filter button (see screenshot in Figure 34).

_images/Filter-Attribute-2.png

Figure 34: A pre-configured Attribute Filter will be automatically added to your filter stack. All you have to do is to apply the filter.

As a result, we can now inspect the process just for the callcenter channel in isolation (see screenshot in Figure 35). Simply create two more copies for the other channels and you can easily switch back and forth to look at the processes for the different channels (see also Switching Between Data Sets in the Analysis View).

_images/Filter-Attribute-3.png

Figure 35: After applying the filter, the statistics, cases and process map will all be shown only for the cases with the filtered attribute value.
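Conceptually, creating one filtered copy per channel splits the data set into independent parts. The following Python sketch illustrates this under the assumption that the channel is a case-level attribute, meaning that it has the same value for all events of a case. The channel names, the field names, and the split_by_attribute helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical events; "channel" is assumed to be constant within a case.
events = [
    {"case_id": "r1", "channel": "Callcenter", "activity": "Order created"},
    {"case_id": "r1", "channel": "Callcenter", "activity": "Refund issued"},
    {"case_id": "r2", "channel": "Dealer", "activity": "Order created"},
    {"case_id": "r3", "channel": "Internet", "activity": "Order created"},
    {"case_id": "r3", "channel": "Internet", "activity": "Order canceled"},
]


def split_by_attribute(rows, attribute):
    """Return one separate list of events per attribute value."""
    datasets = defaultdict(list)
    for row in rows:
        datasets[row[attribute]].append(row)
    return dict(datasets)


copies = split_by_attribute(events, "channel")
for channel, rows in copies.items():
    print(channel, len(rows), "events")
```

The original list stays untouched, in the same way that the Copy and filter button leaves the original data set in place.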

Search and Inspection Short-cuts

In many situations, you don’t need to actually filter (see Filtering Cases, Variants, Activities, Resources, and Attributes) but you just want to see some examples. Being able to look at a concrete case that, for example, takes very long, or looking at example cases for a process variant that takes many steps, allows you to gradually build a better understanding of what is going on. And inspecting and searching for example cases is even faster than filtering.

When you look at an example case in the Cases view (see Analyzing Cases), then you can see the full history for this case including all the attributes, which helps you to understand the context in which it took place. Disco allows you to quickly “jump” to example cases from the Statistics view in the following three ways.

Inspecting Cases

From the Case Statistics table, you can directly jump to a case by right-clicking on the corresponding case row and selecting the Show case details option from the context menu (see screenshot in Figure 36). This will open the case right in the Cases view for further inspection.

For example, in Figure 36 we have the Case Statistics table sorted by the case duration (see Sorting Tables for how to do that). We can see that the longest case (with the case ID Case160) takes 60 days and 19 hours. To find out what exactly happened in this particular case, we simply right-click on the Case160 row and select the Show case details option to be able to look at the case history in more detail.

_images/Inspecting-Cases.gif

Figure 36: Using the Show case details option allows you to quickly inspect the history of a particular case.

Inspecting Variants

From the Variant Statistics table, you can directly jump to a variant by right-clicking on the corresponding variant row and selecting the Show variant details option from the context menu (see screenshot in Figure 37). This will open the variant with all the cases that follow this particular variant right in the Cases view for further inspection.

For example, in Figure 37 we have the Variant Statistics table sorted by the variant frequency. We can see that Variant 5 has the longest average throughput time among the top 10 variants. To find out what exactly happened in this particular variant, we simply right-click on the Variant 5 row and select the Show variant details option to be able to look at example cases following this variant in more detail.

_images/Inspecting-Variants.gif

Figure 37: Using the Show variant details option allows you to quickly inspect example cases for a particular variant.

Searching Activities, Resources, and Attribute values

Seeing example cases, where a particular activity, resource, or other attribute value was involved can also be a powerful way to build a better understanding of what is happening. Disco includes a powerful full-text search feature in the Cases view (see Search) and the Statistics view gives you convenient short-cuts to directly search your data set for any activity, resource or attribute value.

For example, when we look at the Activity Statistics for the purchasing process, we see that the Amend Purchase Requisition activity only occurred 11 times in the whole data set. We are curious to see in which context these amendments were made and why. This can be particularly helpful if we have additional data attributes, for example, free-text or comment attributes that give us additional context about the case. To search for cases where a particular activity occurred, we can simply right-click on the activity name in the Activity Statistics table and choose the Search in data option from the context menu as shown in Figure 38.

_images/Searching-1.png

Figure 38: Using the Search in data option allows you to quickly inspect example cases that contain a particular activity, resource, or attribute value.

As a result, the activity name is automatically entered in the Cases view’s search field and Disco shows you all cases where this particular activity has occurred. The matching event rows are highlighted in orange to quickly guide you to the right place in the case (see Figure 39). You can now start looking at the case histories, what happened before and after the highlighted activities, and potential additional case information in your data set.

_images/Searching-2.png

Figure 39: The search term is automatically entered into the search window in the Cases view and all matching cases are shown, with an orange highlight of the activities matching the search value.
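The idea behind such a search can be sketched in a few lines of Python. The sketch assumes a simple case-insensitive substring match across all values of an event, which may differ from how the search in Disco actually matches terms. The event rows, the resource names, and the search_cases helper are made up for illustration.

```python
# Hypothetical event rows with a case ID, an activity, and a resource.
events = [
    {"case_id": "1", "activity": "Create Purchase Requisition", "resource": "Resource A"},
    {"case_id": "1", "activity": "Analyze Purchase Requisition", "resource": "Resource B"},
    {"case_id": "2", "activity": "Create Purchase Requisition", "resource": "Resource A"},
    {"case_id": "2", "activity": "Amend Purchase Requisition", "resource": "Resource C"},
]


def search_cases(rows, term):
    """Return {case ID: [matching event rows]} for a case-insensitive search."""
    needle = term.lower()
    hits = {}
    for row in rows:
        if any(needle in str(value).lower() for value in row.values()):
            hits.setdefault(row["case_id"], []).append(row)
    return hits


hits = search_cases(events, "amend purchase")
for case_id, rows in hits.items():
    print(case_id, [row["activity"] for row in rows])
```

The returned rows per case correspond to the highlighted event rows, while the keys of the result correspond to the cases that are shown in the search result.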

This search short-cut does not only work from the Activity Statistics table but also from the Resource Statistics table and any of the other Attribute Statistics tables in the same way. This allows you to quickly inspect cases that went through a particular organizational group or status, or that fall into some of your attribute categories, without having to filter at all.

Furthermore, once you are in the Cases view, inspecting example cases, you can continue to use the search short-cut from the Cases view as well. For example, in Figure 39 we can see that the Amend Purchase Requisition activity was performed by the resource ‘Immanuel Karagianni’. If we would like to see more example cases where this person was involved, we can now right-click on the employee name in the case history table and select the Search in data option again (see Figure 40).

_images/Searching-3.png

Figure 40: The Search in data option is also available from the Cases view, which allows you to continue your search based on the example cases that you see.

As a result, the resource’s name is automatically entered in the Cases view’s search field and Disco shows you all cases, where this particular resource was involved. The orange rows highlight all process steps that were performed by this person (see Figure 41).

_images/Searching-4.png

Figure 41: The search term is automatically entered into the search window in the Cases view and all matching cases are shown, with an orange highlight of the events matching the search value.

Copying Values and Results to the Clipboard

The context menu that pops up when you right-click into any table in Disco does not only allow you to filter (see Filtering Cases, Variants, Activities, Resources, and Attributes) or search the data set (see Search and Inspection Short-cuts), but it also gives you the option to copy the content of the current cell to the clipboard. The clipboard is a computer mechanism to transfer data between documents or applications for copy and paste operations.

There are many different scenarios, where you might want to use the Copy to clipboard functionality in Disco. Here are three examples.

Example 1: Searching for an activity in the process map

Sometimes, you see an activity in the Activity Statistics, or in one of the cases in the Cases view (see Analyzing Cases), which you would like to further inspect in the process map to see how it fits into the overall process flow. Rather than remembering the name and manually searching for the activity in the process map (which can be cumbersome if your process map has many different activities), you can quickly find this activity back in the map in the following way.

Right-click on the activity name (either in the Activity Statistics or in the example case that you are currently inspecting) and choose the Copy option from the context menu (see Figure 42).

_images/CopyClipboard-1.png

Figure 42: The Copy option copies the value of the cell that you have clicked on into your computer’s clipboard.

This will copy the activity name into the clipboard of your computer. You can now change to the Map view (see Analyzing Process Maps) and paste the copied activity name into the search field at the top of the process map, which will find and highlight the activity in your process (see Figure 43).

_images/CopyClipboard-2.png

Figure 43: If you have copied an activity name to the clipboard, you can paste it into the search field above the process map to quickly find this activity back in your process.

Example 2: Looking up a case in another data set

One of the big powers of process mining and Disco is that you can focus your analysis on just a subset of the process very easily. You can take different views on the process by Copying Data Sets, and then you can navigate between them by Switching Between Data Sets in the Analysis View. However, sometimes you would like to look up a case from one data set in another data set view.

For example, in Figure 44, we have used the Trim mode of the Endpoints Filter to cut out the invoicing part of the purchasing process. While inspecting the variants of the invoicing phase, we discover that for some cases the mandatory process step Release Supplier’s Invoice has not been performed. This looks like a compliance problem.

To investigate whether the Release Supplier’s Invoice step might have occurred earlier in the process (in the part that we have cut off to inspect the final phase of the process more closely), we want to look up some example cases in the full data set. To do this, we can copy the case ID into the clipboard by right-clicking on the case ID as shown in Figure 44 and pressing the Copy button in the context menu.

_images/CopyClipboard-3.png

Figure 44: Right-click on the case ID in The Cases View to copy the case ID into the clipboard. This way you don’t have to remember the case ID if you want to look up the case in another data set.

You can then change the data set by clicking on the drop-down list at the top (see Switching Between Data Sets in the Analysis View). Navigate to the data set, where you want to look up the case. In our example scenario, we choose the data set Process map 100% detail, which contains the full purchasing process without the Endpoints Filter in Trim mode applied (see Figure 45).

_images/CopyClipboard-4.png

Figure 45: You can then change to the data set, where you want to look up the case.

Finally, you can paste the case ID into the Search field in the upper right corner of The Cases View. The case will be highlighted in the search result and you can inspect the full history of the case in more detail as shown in Figure 46.

_images/CopyClipboard-5.png

Figure 46: Paste the case ID from the clipboard to the Search field to find your case in the other data set.

Example 3: Copy and paste an attribute value outside of Disco

Imagine that you want to further research a case outside of Disco, for example, to look up this case in the operational system (or in your original source data). Or you want to copy and paste a particular activity name or attribute value into an email to your colleague, or into a report that you are creating. Of course, you can export all the statistics, maps, cases and variants (refer to the Export chapter to learn how to do that). But if you just need to copy this one attribute, activity, or case ID name, then copying the value to the clipboard is much faster.

To do this, just right-click on any cell in any of the tables of Disco. Then choose the Copy option from the context menu as shown in Figure 47. The content of the cell will be copied to the clipboard of your computer. Afterwards, you can paste it into any application, email or document outside of Disco.

_images/CopyClipboard-6.png

Figure 47: Right-click on any table cell and choose the Copy option from the context menu to copy the content of the cell to the clipboard.

Footnotes

[1] How to analyze open cases with process mining: http://coda.fluxicon.com/assets/downloads/Training/Articles/Analyzing-Open-Cases-With-Process-Mining.pdf