Exploring OpenMethods Splunk Events

This page is a quick reference: a punch list of run-anywhere (copy/paste) Splunk searches for exploring the data in the OpenMethods Splunk events. Some advanced techniques are used intentionally; the foundational explanation for those techniques will be covered in other articles.

 


 


 

Conventions

To conserve space on this page, queries and query results follow these conventions:
- When readability matters, it is a best practice to author Splunk queries with the pipe symbol as the first character of each new line. On this page, the queries are compacted onto one line instead.
- Search command code blocks are in gray; search results are in light blue.
- Sample results may be omitted, included as screenshots, or converted to JSON and shown as a code block like the one below (the same result set as in 1.1 below).
[{"index":"main","countofevents":"1905405642"},{"index":"\"_audit\"","countofevents":"23781423"},{"index":"\"_internal\"","countofevents":"10004574"},{"index":"\"_introspection\"","countofevents":"1755528"},{"index":"summary","countofevents":"85111"},{"index":"\"_telemetry\"","countofevents":"24343"},{"index":"lastchanceindex","countofevents":"4"},{"index":"\"_thefishbucket\"","countofevents":"0"},{"index":"demomatricindex","countofevents":"0"},{"index":"history","countofevents":"0"},{"index":"netaddinsimport","countofevents":"0"},{"index":"popflowscriptindex","countofevents":"0"},{"index":"tunnelmetricsindex","countofevents":"0"}]
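For comparison, the readable layout mentioned in the first convention looks like this for the query in 1.1. It is the same search, only reflowed so that each pipe starts a new line:

```
| eventcount summarize=false index=* index=_*
| dedup index
| rename count as countofevents
| fields index countofevents
| sort countofevents DESC
```

Both layouts should behave identically in the search bar; the compact form is used on this page only to save space.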

Section 1: Foundations

1. Basic Internal Structure of Splunk

In this section, we’ll glance at Splunk’s own internal structure and how it manages indexes, sources, storage, event sizes, and types, and we’ll spot check the indexes _introspection and _internal.

1.1. What are the Splunk indexes where OM data is stored (even if there is currently zero data in the index)?
| eventcount summarize=false index=* index=_* | dedup index | rename count as countofevents | fields index countofevents | sort countofevents DESC
index="main": the default index where all OM data lives.
index="demomatricsindex" (or ‘netaddinsimport’, ‘popflowscriptindex’, ‘tunnelmetricsindex’): indexes created by OM for targeted research/projects.
All the remaining indexes are Splunk internal. More queries will be built up in this section as time permits, but for now the focus shifts to OM-specific data.
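As a minimal sketch of the spot check mentioned in the section introduction, the search below counts recent events by sourcetype in the two internal indexes. It assumes your role is allowed to search the internal indexes; the 15-minute window is an arbitrary choice to keep the search cheap.

```
index=_internal OR index=_introspection earliest=-15m
| stats count by index sourcetype
| sort - count
```

If no rows come back, the likely cause is role permissions on the internal indexes rather than missing data.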

 

2. Basic OpenMethods Topology

In this section, we’ll discover broadly where/how to look for events that give an overall view of how our software is deployed and being used.

2.0. I am not sure what I am looking for, how do I just explore Splunk data?
index="main" earliest="-30m" source="mediabar"
Per the Splunk Search App Primer, after running this query, the data can be explored by looking at the search app results and exploring fields in the Fields sidebar.
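As a complement to clicking through the Fields sidebar, the ‘fieldsummary’ command can list the fields found in the same events along with a few sample values. This is only a sketch; ‘maxvals=5’ is an arbitrary limit on the number of sample values shown per field.

```
index="main" earliest="-30m" source="mediabar"
| fieldsummary maxvals=5
| fields field count distinct_count values
| sort - count
```

Fields with a high ‘count’ and a low ‘distinct_count’ are usually good candidates for grouping in a later ‘stats ... by’ clause.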
2.1. List of customers and their CTI environment/location of agents (or at least the URL agents use to access HIS)?
Results (JSON + screenshot): (a blank HIS URL implies the site hasn’t been used in the given time window or is Popflow-only)

 

2.1.a. What are agent states for UCCE and their stats?
2.2. List of customers and their sites/versions?
2.3. How to segment agent usage by production versus lower environments?
This is also the simplest form of unique agent logins by host (customer URL).
2.3.a. The Splunk ‘lookup’ data structure that made the above query possible:
2.4. How to convert Splunk events to look like the regular HIS/CS log statements I am used to?
2.4.a. Simplify Log-Style Statements to Fewer Fields
The log-style statements above can be simplified down to a few meaningful fields and one agent. For this case, let’s say we don’t know which agent, so we use the ‘top’ agent, found with a sub-search. If the agent id is known, the sub-search (it starts with a left bracket '[') can be removed. In practice, a sub-search is usually a performance hit and can be avoided by restructuring almost any search.
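The sub-search pattern described above might look roughly like the sketch below. The field name ‘agentId’ is hypothetical (substitute whatever field carries the agent id in your events), and the output columns are only a suggestion. The sub-search finds the single most frequent agent and hands it back to the outer search as a filter.

```
index="main" earliest="-30m" source="mediabar"
    [ search index="main" earliest="-30m" source="mediabar"
      | top limit=1 agentId
      | fields agentId ]
| table _time agentId mb.className mb.functionName message
```

If the agent id is already known, the bracketed sub-search can be replaced with a plain filter on that field and value, which avoids the extra search.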
2.5. How to identify, at a high level, the major components in use by the customer?
Currently, the majority of searches are centered around component names, ‘mb.className’ and ‘mb.functionName’, and string matching.
For example, a quick glance at the component names is enough to determine whether an agent is getting screen pops from Harmony or another way.
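A minimal sketch of such a quick glance is to count events by the two component fields named above. The base search (index, time window, source) is carried over from 2.0 and is an assumption about where these events live.

```
index="main" earliest="-30m" source="mediabar"
| stats count by mb.className mb.functionName
| sort - count
```

Adding ‘host’ to the ‘by’ clause should break the same counts out per customer URL.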

 

 

2.5.a. Component names by version
Logging design is still undergoing changes, so the component names can vary by version.

3. Popflow Events

3.1. How to identify customer/agent using Popflow and how they are using it, aka Popflow Overview?
Explanation:
a) Why the use of the ‘| search crmgroup="" class="*"’ clause and all the string matching?
i) As described previously on this page, we are still dependent on string matching and class names. Writing fixed data points or metrics will be a better interface.

ii) The field ‘msgctx’ is present for context and would be used in the case where we are not filtering out ‘mb.className’. The query tries to populate the ‘mytitle’ and ‘jsonctx’ fields; if they are blank, it might mean there is a message that I am not expecting, so the parsing isn’t working on it. Finally, collapsing 2 fields down to 1 is simply to save space, so I can still see the ‘message’ field without scrolling.

b) One of the most important statements in this query is the use of regular expressions (pattern matching):
| rex field=message "^(?<mytitle>[^{\n]*)(?P<myjson>{.*})"
There is a page dedicated to tools for pattern matching and JSON manipulation for Splunk; keep checking back for updates.
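As a run-anywhere sketch of what this ‘rex’ does, the search below builds a single made-up event with ‘makeresults’ and applies the same pattern to it. The sample message text and the ‘eventId’ key are invented for illustration and are not taken from real OM events. The final ‘spath’ step is an optional extra that tries to parse the captured JSON into fields.

```
| makeresults
| eval message="got popflow {\"eventId\":\"123\"}"
| rex field=message "^(?<mytitle>[^{\n]*)(?P<myjson>{.*})"
| spath input=myjson
| table message mytitle myjson eventId
```

The pattern is meant to put everything before the first left brace into ‘mytitle’ and the braces plus everything between them into ‘myjson’. A message with no brace at all will not match, which leaves both fields empty.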

 


From here, we are going to keep building upon the Popflow Overview, extracting new information until we have a fully populated breakdown of the events.

Workflows are authored to act on events. An "event detected" message can have a corresponding action to fetch a workflow, logged as a "getting popflow for eventId" message. It is followed by a "got popflow" message, which loads the workflow and starts to run activities of different types, tracked by "starting activity" and "activity complete" messages.


 

3.2. What Popflow Events are Being Triggered and are the Most Frequent?

 

3.3. What Popflow Activities are Being Run?
In overall product usage tracking, I like to track workflows being run and the number of instructions (aka Activities) as an overall indicator of scale and volume. But let’s start with an Activity overview in a log-format style.

 

From the Fields panel, click on ‘custom.displayName’ for the Top 10 Values.
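The same Top 10 view can be sketched as a search with the ‘top’ command. The base search is carried over from 2.0 and is an assumption; adjust it to whatever search returns your Activity events.

```
index="main" earliest="-30m" source="mediabar"
| top limit=10 custom.displayName
```

Unlike the Fields panel, the search form can be saved, shared, and extended with further pipes.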

 

 

3.4. Start Normalizing the Data, Put Events, Popflow Scripts, and Activities All Together in Context

What did we add over the previous queries?

a) The 2 or 3 ‘rex’ commands are now all handled in one ‘rex’ command.

b) We extracted ‘eventId’ by string parsing of the ‘message’ field, extracted ‘typeid’ (activity type id) from JSON, and then used a lookup table to translate them to friendly names.

c) Multiple ‘eval’ commands were moved to a single pipe, as there is overhead for each pipe.

d) There is no single normalized field common to all event types (which makes it difficult to manipulate and combine the data later), so we added ‘msgtype’.

e) We tightened the search patterns on the ‘message’ field in the very first segment of the search. When Splunk finds a match in a pipe, it stops processing the rest, so I made the search patterns more explicit and ordered them by frequency of occurrence, so there is a higher chance Splunk will find a match early and do less processing. Note: the technique for finding the frequency of occurrence of the ‘message’ field is the same one we’ve used on this page, which goes something like: ‘<your search> | stats count(msghdr) as cntmsghdr by msghdr | sort cntmsghdr DESC’
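Points d) and e) can be sketched with a single ‘eval’ that uses ‘case()’, which returns the value for the first condition that matches. The message fragments come from the Popflow description earlier on this page. The ‘msgtype’ values, the order of the branches (which stands in for "most frequent first"), and the assumption that the text matches case-sensitively are all illustrative guesses, not the actual query.

```
index="main" earliest="-30m" source="mediabar"
| eval msgtype=case(
    like(message, "%starting activity%"), "activity_start",
    like(message, "%activity complete%"), "activity_complete",
    like(message, "%event detected%"), "event",
    like(message, "%getting popflow for eventId%"), "popflow_fetch",
    like(message, "%got popflow%"), "popflow_loaded",
    true(), "other")
| stats count as cntmsgtype by msgtype
| sort - cntmsgtype
```

Running this once and reordering the ‘case()’ branches to follow the resulting counts is one way to apply the frequency ordering described in e).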

 

 

4. Omis Events

4.1. How to identify customer/agent using HIS/Harmony stack and how are they using it, aka Omis Overview?

 

 

4.2. What are all the possible Omis message types and how do I work with them?

 

 

4.3. How do I check if there are any Omis message types I don't know about?
Previously on this page, it was stated that in a long evaluation or conditional command (for example, a string match), Splunk grabs the first match and stops processing. In theory, ordering the match patterns by frequency of occurrence therefore reduces processing and improves performance.
In practice, leveraging that concept had no immediately obvious performance impact, but it had a useful side effect: a search command that verifies your query is structured to process every message type, because certain fields will be null for any type that is not known. A similar concept could possibly be used to uniquely identify every Omis ERROR across every CTI platform and customer.
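The null-field side effect can be sketched as a run-anywhere search. The three sample messages and the two ‘msgtype’ values are made up for illustration and are not real Omis message types. Because the ‘case()’ has no catch-all branch, any message that matches none of the known patterns is left with a null ‘msgtype’, and the ‘where’ clause surfaces it.

```
| makeresults count=3
| streamstats count as n
| eval message=case(n==1, "known type A: sample", n==2, "known type B: sample", n==3, "something new: sample")
| eval msgtype=case(
    like(message, "known type A%"), "typeA",
    like(message, "known type B%"), "typeB")
| where isnull(msgtype)
| table message
```

Against real data, replacing the first three lines with your base search and ending with ‘| stats count by message’ would list the unknown message types and how often each occurs.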

 

5. Interactions

 

6. Agents

 

 

7. Sessions

 

Section 2: Combining Primary Entities

 

1. Screenpops 3 Different Ways

 

 

2. Omis PopFlow Combined “Overview”

 

This document may contain confidential and/or privileged information belonging to OpenMethods. If you are not the intended recipient (or have received this document in error) please notify the sender immediately and destroy this document. Any unauthorized copying, disclosure, or distribution of the material in this document is strictly forbidden.