Exploring OM Splunk Data

This page is a quick reference: a punch list of run-anywhere (copy/paste) Splunk searches to help explore the data in the OpenMethods Splunk events. Some advanced techniques are used intentionally; the foundational explanation for them will be covered in other articles.

Conventions

To conserve space on this page when showing queries and query results:

  • When readability matters, it is a best practice to author Splunk queries with the pipe symbol as the first character of each new line. On this page, however, the queries are compacted.
  • Search command code blocks are in gray, search results are in light blue.
  • Sample results may not be included, or will be included as screenshots or converted to JSON and shown as a code block like this below (same results set as in 1.1 below).


1. Internal Structure of Splunk

In this section, we’ll glance at Splunk’s own internal structure and how it manages indexes, sources, storage, event sizes, and types, and we’ll spot check the indexes _introspection and _internal. For now, let's just do one quick query and move on to the good stuff.

What are the Splunk indexes where OM data is stored (even if there is currently zero data in the index)?

| eventcount summarize=false index=* index=_* | dedup index | rename count as countofevents | fields index countofevents | sort countofevents DESC
| index              | countofevents |
|--------------------|---------------|
| main               | 1396348189    |
| _internal          | 65454703      |
| _audit             | 18321677      |
| _introspection     | 4582237       |
| summary            | 43476         |
| _telemetry         | 19029         |
| lastchanceindex    | 4             |
| _thefishbucket     | 0             |
| demomatricindex    | 0             |
| history            | 0             |
| netaddinsimport    | 0             |
| popflowscriptindex | 0             |
| tunnelmetricsindex | 0             |

Definitions:

  • index="main": the default index, where all OM data lives
  • index="demomatricindex" (or "netaddinsimport", "popflowscriptindex", "tunnelmetricsindex"): indexes created by OM for targeted research/projects
  • All the remaining indexes are Splunk internal. More queries will be added to this section as time permits; for now, the focus shifts to OM-specific data.



2. Basic OM Data Topology

In this section, we’ll discover broadly where/how to look for events that give an overall view of how our software is deployed and being used.

I am not sure what I am looking for, how do I just explore Splunk data?

index="main" earliest="-30m" source="mediabar"

Per the Splunk Search App Primer, after running this query the data can be explored by looking at search app results and fields in the Fields sidebar.

List of customers, their CTI environment, and the URL (sometimes spells out a location) agents use to access HIS?

Partial results (JSON + screenshot) are shown below. The related article on tools shows how to generate the JSON from the Splunk results. Note that a blank HIS URL in the results implies that the site hasn't been used in the given time window or that it is Popflow-only.

[
  {
    "crm.customer": "arval",
    "hiscti": "\"RNA-I3\"",
    "hisurl": [
      "\"https://harmony_his_p.intra.corp:443",
      "https://harmony_his_s.intra.corp:443\""
    ]
  },
  {
    "crm.customer": "ascena",
    "hiscti": "",
    "hisurl": []
  },
  {
    "crm.customer": "chewy",
    "hiscti": "\"RNA-CiscoUCCE\"",
    "hisurl": [
      "\"https://chewy-fll2this1.openmethodscloud.com:8443",
      "https://chewy-fll2this2.openmethodscloud.com:8443",
      "https://chewy-iad1this4.openmethodscloud.com:8443\""
    ]
  }
]
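
The note about blank HIS URLs can be checked mechanically once results are exported as JSON. The following is a minimal Python sketch, assuming the export has the shape shown above. The sample rows are copied from the partial results; the helper names `clean` and `customers_without_his` are hypothetical. It also strips the literal double quotes that surround some exported values.

```python
import json

# Sample rows copied from the partial results above.
results_json = r'''
[
  {"crm.customer": "arval", "hiscti": "\"RNA-I3\"",
   "hisurl": ["\"https://harmony_his_p.intra.corp:443",
              "https://harmony_his_s.intra.corp:443\""]},
  {"crm.customer": "ascena", "hiscti": "", "hisurl": []},
  {"crm.customer": "chewy", "hiscti": "\"RNA-CiscoUCCE\"",
   "hisurl": ["\"https://chewy-fll2this1.openmethodscloud.com:8443",
              "https://chewy-fll2this2.openmethodscloud.com:8443",
              "https://chewy-iad1this4.openmethodscloud.com:8443\""]}
]
'''


def clean(value):
    """Strip the literal double quotes left around exported values."""
    return value.strip('"')


def customers_without_his(rows):
    """Return customers whose HIS URL list is empty after cleaning."""
    missing = []
    for row in rows:
        urls = [clean(u) for u in row.get("hisurl", []) if clean(u)]
        if not urls:
            missing.append(row["crm.customer"])
    return missing


rows = json.loads(results_json)
for row in rows:
    print(row["crm.customer"], clean(row["hiscti"]), [clean(u) for u in row["hisurl"]])
print("No HIS URL in window:", customers_without_his(rows))
```

A customer listed by `customers_without_his` is only a candidate for "unused or Popflow-only"; the time window of the original search still decides which of the two applies.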





What are agent states for UCCE and their stats?



index="main" earliest="-2h" source="mediabar" "network.his.model"=RNA-CiscoUCCE 
| top mb.agent.state by network.his.model 
| fields - network.his.model

Results:

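| mb.agent.state       | count  | percent   |
|----------------------|--------|-----------|
| Handling interaction | 514840 | 73.018975 |
| Unavailable          | 108028 | 15.321447 |
| Available            | 43564  | 6.178616  |
| Mixed states         | 18121  | 2.570074  |
| Connected to Harmony | 12569  | 1.782642  |
| Wrap up              | 7955   | 1.128246  |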
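
For readers new to `top`, the percent column is each state's count divided by the total count within the group. The Python sketch below recomputes it from the counts in the results above, assuming the six states shown are all the states returned for this model. The function name `top_percentages` is hypothetical.

```python
# Counts copied from the results above (mb.agent.state -> count).
state_counts = {
    "Handling interaction": 514840,
    "Unavailable": 108028,
    "Available": 43564,
    "Mixed states": 18121,
    "Connected to Harmony": 12569,
    "Wrap up": 7955,
}


def top_percentages(counts):
    """Return (state, count, percent) rows sorted by count, similar to `top`."""
    total = sum(counts.values())
    rows = [(state, n, round(100.0 * n / total, 6)) for state, n in counts.items()]
    return sorted(rows, key=lambda r: r[1], reverse=True)


for state, n, pct in top_percentages(state_counts):
    print(f"{state:22} {n:8d} {pct:10.6f}")
```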


How to identify, at a high level, the major components in use by the customer? 

index="main" earliest="8/3/2020:06:00:00"  latest="8/3/2020:06:30:00" source="mediabar" | eval crmcust='crm.customer' | eval agent='crm.id' | eval class='mb.className' . "-" . 'mb.functionName' | search crmcust="*" agent="*" class="*" | stats values(class) as lc, count(class) as cc by crmcust, agent | where ((crmcust="veritas" AND cc > 1850) OR (crmcust="chewy" AND cc > 400) OR (crmcust="arval" AND cc > 1200))

How to convert Splunk events to look like the regular HIS/CS log statements I am used to?

| tstats count WHERE (index="main"  earliest="8/1/2020:06:00:00"  latest="8/1/2020:06:30:00" source="mediabar" host="https://chewy.custhelp.com" ) BY _time logLevel crm.instanceId crm.groupId crm.id mb.className mb.functionName message span=1s

Currently, the majority of searches are centered on the component names, 'mb.className' and 'mb.functionName', and on string matching. With some familiarity, a quick glance at the components is enough to determine whether an agent is getting screen pops from Harmony.

Components by Version

Logging design is still undergoing changes, so the component names ('mb.className' and 'mb.functionName') can vary by version.

| tstats count WHERE (index="main" earliest="8/17/2020:08:00:00" latest="8/18/2020:08:00:00" source="mediabar") BY  mb.version mb.className mb.functionName | eval major=mvindex(split('mb.version', "."), 0) | eval class='mb.className' . "-" . 'mb.functionName' | search class="*" 
| stats values(class) as lc, count(class) as cc by major
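
The grouping logic of this query can be sketched outside Splunk as well. The Python below is an illustration only: the version numbers in `sample` are hypothetical, and `components_by_major` is a hypothetical helper that mirrors `mvindex(split('mb.version', "."), 0)` followed by `stats values(class) by major`.

```python
from collections import defaultdict


def components_by_major(rows):
    """Group distinct 'className-functionName' strings by major version."""
    grouped = defaultdict(set)
    for version, class_name, function_name in rows:
        if not class_name or not function_name:
            # Mirrors `search class="*"`: a missing part leaves class empty.
            continue
        major = version.split(".")[0]  # mvindex(split('mb.version', "."), 0)
        grouped[major].add(f"{class_name}-{function_name}")
    return {major: sorted(names) for major, names in grouped.items()}


# Hypothetical rows: (mb.version, mb.className, mb.functionName)
sample = [
    ("19.2.1", "OmisService", "login"),
    ("20.1.0", "OmisService", "login"),
    ("20.1.3", "PopflowRuntimeService", "run"),
    ("20.1.3", "PopflowRuntimeService", None),
]
print(components_by_major(sample))
```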

Simplify Log-Style Statements to Fewer Fields (save screen real-estate)

Show fewer fields in our log-style statements, in this case for one agent. This demonstrates using a subsearch (it starts with a left bracket '[') to run an inexpensive query and pass its result back to the main search.

| tstats count WHERE (index="main" earliest="-24h" host="https://faq.arval.it" 
[ | tstats count WHERE (index="main" earliest="-24h" host="https://faq.arval.it") BY crm.id | top limit=1 crm.id| rename count as c | rename percent as p | fields - c p | format]) BY _time logLevel crm.id mb.className mb.functionName message span=1s  | eval class='mb.className' . "-" . 'mb.functionName' | search class="*" | table _time logLevel crm.id class message

Significant 'mb.className' and 'mb.functionName' Values

| HIS | PF | QA |
|-----|----|----|
| OmisService | PopflowRuntimeService | QueueAdapterService |
| login, startSession, endSession, bind*, screenPopChat, updateAgentStatus, screenPop | run, executePopFlow, nextActivity, activityMapper, orkflowQueue.subscribe | connect, disconnect, popChat, acceptEmail, chatComplete, chatConcludeRelease |

How to filter data by production versus lower environments?

This query shows how to distinguish between production and lower environments and is also the simplest form of unique agent logins by host (customer URL).

| tstats distinct_count(crm.username) as agents_dc_per_h WHERE (index="main"  earliest="-1h@h" [|inputlookup spl-customer-host.csv | where cloudenv="prod" | fields displaycustomer hostlookup | lookup spl-customer-host.csv displaycustomer cloudenv OUTPUT hostlookup | fields - displaycustomer | rename hostlookup as host | format]) BY _time host span=1h

List of customers and their sites/versions?

| tstats count WHERE (index="main" earliest=-5d@d latest=now source="mediabar") BY crm.customer mb.version network.appManagerDomain network.crmHost | sort mb.version DESC | dedup network.crmHost 
| table crm.customer mb.version network.crmHost | sort crm.customer



The Splunk lookup table (spl-customer-host.csv) that made the production-versus-lower-environments query above possible:

| inputlookup spl-customer-host.csv | WHERE  NOT (displaycustomer in ("omdemo", "omdev", "omqa", "omtrain") ) | dedup displaycustomer  | lookup spl-customer-host.csv displaycustomer OUTPUT crmcustomer cloudenv hostlookup

3. Popflow Events

How to identify Popflow and how it is being used, aka "Popflow Overview"? 

| tstats count WHERE (index="main" earliest="8/17/2020:08:00:00" latest="8/18/2020:00:00:00" "mb.className"=PopflowRuntimeService  (( message="*Event '*' *ed" ) OR ( message="*Activity complete*" ) OR ( message="*Starting Activity*" ) OR ( message="*Activity event*" ) OR ( message="*Got*popflow*" AND message!="*Got* 0*" ) OR ( message="*Getting*" )) ) BY _time logLevel crm.customer crm.instanceId crm.groupId crm.id  mb.className mb.functionName message  span=1s | rex field=message "^(?<mytitle>[^{\n]*)(?P<myjson>{.*})" | eval jsonctx=substr(myjson, 1, 40), msgctx=substr(message, 1, 40) | eval class='mb.className' . "-" . 'mb.functionName',  crmgroup='crm.instanceId' . "-" . 'crm.groupId'| search crmgroup="*" class="*"
| table _time logLevel crm.customer crmgroup crm.id class mytitle jsonctx msgctx



Explanation:
  • The ‘| search crmgroup="*" class="*"’ clause and all the string matching are there because, as previously described, we are still dependent on string matching and class names.
  • The field ‘msgctx’ is present for context and would be used if we removed the filter ‘mb.className’=PopflowRuntimeService.
  • We populate the ‘mytitle’ and ‘jsonctx’ fields this way for two reasons: if they are blank, it may indicate a message we aren't expecting and/or are parsing incorrectly; and collapsing 2 fields down to 1 simply saves space, so the ‘message’ field can be viewed without scrolling.
  • One of the most important statements in this query is the use of regular expressions (pattern matching):
    | rex field=message "^(?<mytitle>[^{\n]*)(?P<myjson>{.*})"
    There is a page dedicated to tools for pattern matching and JSON manipulation for Splunk; keep checking back for updates.
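
To experiment with the ‘rex’ pattern outside Splunk, it can be reproduced in Python. This is a minimal sketch: the two sample messages are hypothetical, and `split_message` is a hypothetical helper. Python needs the braces escaped and uses the `(?P<name>...)` spelling for both groups, but the pattern is otherwise the same.

```python
import re

# Python spelling of: rex field=message "^(?<mytitle>[^{\n]*)(?P<myjson>{.*})"
MESSAGE_PATTERN = re.compile(r"^(?P<mytitle>[^{\n]*)(?P<myjson>\{.*\})")


def split_message(message):
    """Return (mytitle, myjson); both are None when no JSON part is found."""
    match = MESSAGE_PATTERN.search(message)
    if match is None:
        return None, None
    return match.group("mytitle"), match.group("myjson")


# Hypothetical messages, for illustration only.
with_json = "Starting Activity 'Open Incident' {\"typeId\": 7, \"formData\": {}}"
without_json = "Event 'On Answer' detected"

print(split_message(with_json))
print(split_message(without_json))
```

A message with no `{...}` part leaves both fields empty, which is the "blank" case the explanation above treats as a hint that a message is unexpected or parsed incorrectly.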

From here, we are going to keep building upon the Popflow Overview, extract some new information, until we have a fully populated breakdown of the events.

Workflows are authored to act on events. An "event detected" message can have a corresponding action to fetch a workflow, logged as a "getting popflow for eventId" message. It is followed by a "got popflow" message, which loads the workflow and starts running activities of different types; these are tracked by "starting activity" and "activity complete" messages.

What Popflow Events are Being Triggered and are the Most Frequent?

index="main" earliest="8/13/2020:08:00:00" latest="8/15/2020:02:32:54"  host="https://lanebryant.custhelp.com" mb.className="PopflowRuntimeService" (message="*Event '*' *ed") | eval evname=mvindex(split(message, "'"), 1)
| rex field=message "^(?<mytitle>[^{\n]*)(?P<myjson>{.*})" | eval contextyn=if(isnotnull(myjson), 1, 0)
| table _time host crm.id evname | stats count(evname) as cntevname by evname | sort cntevname DESC


| evname                 | cntevname |
|------------------------|-----------|
| On Answer              | 1187      |
| Open Create Incident   | 620       |
| Check Order/Billing    | 608       |
| Show Open Incidents    | 607       |
| Populate Custom Object | 578       |
| Wismo                  | 140       |
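
The event name comes from `mvindex(split(message, "'"), 1)`, which takes the text between the first pair of single quotes. A minimal Python sketch of that extraction and the count that follows it; the messages are hypothetical (the event names are borrowed from the results above) and `event_name` is a hypothetical helper.

```python
from collections import Counter


def event_name(message):
    """Text between the first pair of single quotes, or None if there is none."""
    parts = message.split("'")
    return parts[1] if len(parts) > 1 else None


# Hypothetical messages; event names borrowed from the results above.
messages = [
    "Event 'On Answer' detected",
    "Event 'Wismo' detected",
    "Event 'On Answer' detected",
    "Got  1 popflow(s) from server",
]

# Like `stats count(evname) by evname`, messages without a name are not counted.
counts = Counter(name for name in map(event_name, messages) if name is not None)
for name, n in counts.most_common():
    print(name, n)
```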

What Popflow Activities are Being Run?

In overall product usage tracking, I like to track workflows being run and the number of instructions (aka Activities) as an overall indicator of scale and volume. But let’s start with an Activity overview in a log-format style.

index="main"  earliest="8/14/2020:08:00:00" latest="8/16/2020:00:00:00" "mb.className"=PopflowRuntimeService host="https://lanebryant.custhelp.com" ((message="*Starting Activity*"))
| rex field=message "^(?<mytitle>[^{\n]*)(?P<myjson>{.*})" 
| eval jsonctx = if(myjson!="null", substr('myjson', 1, 60), substr('custom.formData.content', 1, 60)), newmsg=if(isnotnull(mytitle), 'mytitle', 'message'), activityname=mvindex(split(message, "'"), 1)
| rex field=newmsg "(([[](INFO|DEBUG|TRACE)[]][[:blank:]])?)(?<msghdr>[^\n]*)"
| table _time logLevel crm.customer crm.id activityname msghdr jsonctx

From the Fields panel, click on ‘custom.displayName’ to see its Top 10 Values.

Normalize the Data, Put Events, Popflow Scripts, and Activities Together

index="main" earliest="8/17/2020:08:00:00" latest="8/18/2020:00:00:00" "mb.className"=PopflowRuntimeService  crm.customer="helenoftroy" ((message="Got  1 popflow(s) from server") OR (message="[*] Got  1 popflow(s) from server") OR (message="Got  1 popflow(s) from cache") OR (message="Getting popflow*") OR (message="[*] Getting popflow*") OR (message="Event '*' detected") OR (message="[*] Event '*' detected") OR (message="Activity complete*") OR (message="[*] Activity complete*") OR (message="Starting Activity*")  OR (message="[*] Starting Activity*") OR (message="*Activity event*") ) | eval const_actstart_pattern="\bStarting Activity\b", const_actcompl_pattern="\bActivity complete\b", enum_eventtype_activity=1 | rex field=message "(([[](INFO|DEBUG|ERROR|EXCEPTION|TRACE|WARN)[]][[:blank:]])?)(?<msghdr>[^{\n]*)((?P<myjson>{.*})?)" | eval jsonctx=substr(myjson, 1, 80), msgctx=substr(message, 1, 80), s1=mvindex(split(msghdr, "'"), 1) | eval s1=if(isnull(s1) AND 'mb.className'=="PopflowRuntimeService", 'msghdr','s1') | eval evttype=case('mb.className' == "PopflowRuntimeService" AND match(msghdr, 'const_actstart_pattern'), 'enum_eventtype_activity', 'mb.className' == "PopflowRuntimeService" AND match(msghdr, 'const_actcompl_pattern'), 'enum_eventtype_activity')
| eval pfactvid=case('evttype' == 'enum_eventtype_activity' and isnotnull(myjson), spath(myjson,"typeId")), formdatactx=case('evttype' == 'enum_eventtype_activity' and isnotnull(myjson), spath(myjson,"formData"))
| rex field=message "((Getting[[:blank:]]popflow[[:blank:]]from[[:blank:]]server([.]{3})[[:blank:]]eventId:[[:blank:]]){1})(?<pfevid>[^\n][0-9]*)" | table _time crm.customer crm.id evttype s1 pfevid pfevname pfactvid pfactvname formdatactx msgtype msghdr msgctx jsonctx | lookup pfactivitytype.csv activityevent as pfactvid OUTPUT activityname as pfactvname | lookup pfeventtypesCSV.csv pfeventid as pfevid OUTPUT pfeventname as pfevname
| table _time crm.customer crm.id evttype s1 pfevid pfevname pfactvid pfactvname formdatactx msgtype msghdr msgctx jsonctx | eval msgtype=case(match(msgctx, "\bGetting popflow from server\b"), "Getting popflow from server", match(msgctx, "\bEvent '.*' detected\b"), "Event detected", match(msgctx, "\bStarting Activity\b"), "Starting Activity", match(msgctx, "\bActivity complete\b"), "Activity complete", match(msgctx, "\bGot  1 popflow\(s\) from server\b"), "Got  1 from server", match(msgctx, "\bGot  1 popflow\(s\) from cache\b"), "Got  1 from cache", match(msgctx, "\bActivity event\b"), "Activity event") | eval s1=if(isnull(s1), 'msgtype', 's1') | table _time logLevel crm.customer crm.id msgtype evttype s1 pfevid pfevname pfactvid pfactvname formdatactx msgtype msghdr msgctx jsonctx

Explanation:

What did we add over the previous queries?
a) What had been 2 or 3 ‘rex’ commands is now handled in one ‘rex’ command.
b) We extracted ‘eventId’ by string parsing of the ‘message’ field and extracted ‘typeId’ (activity type id) from the JSON, then used lookup tables to translate them to friendly names.
c) Multiple ‘eval’ commands were moved into a single pipe, as there is overhead for each pipe.
d) There is no single normalized field common to all event types (which makes it difficult to manipulate and combine the data later), so we added ‘msgtype’.
e) The search patterns on the ‘message’ field in the very first segment of the search: when Splunk finds a match in a pipe it stops processing the rest, so I made the search patterns more explicit and ordered them by frequency of occurrence, which gives a higher chance that Splunk finds a match early and does less processing. Note: the technique for finding the frequency of occurrences of the ‘message’ field is the same one used elsewhere on this page, which goes something like: '<your search> | stats count(msghdr) as cntmsghdr by msghdr | sort cntmsghdr DESC'
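
The normalization in points b) and d) can be prototyped outside Splunk before changing the search. The Python sketch below is an illustration under stated assumptions: the rule list copies the `match()` patterns from the query in the same order, the sample messages in the usage lines are hypothetical, and `EVENT_ID` is a simplified stand-in for the query's POSIX-class ‘rex’ for ‘eventId’ (it assumes single spaces and a numeric id). The helper names are hypothetical.

```python
import json
import re

# (pattern, msgtype) pairs in the same order as the case() expression above.
MSGTYPE_RULES = [
    (re.compile(r"\bGetting popflow from server\b"), "Getting popflow from server"),
    (re.compile(r"\bEvent '.*' detected\b"), "Event detected"),
    (re.compile(r"\bStarting Activity\b"), "Starting Activity"),
    (re.compile(r"\bActivity complete\b"), "Activity complete"),
    (re.compile(r"\bGot  1 popflow\(s\) from server\b"), "Got  1 from server"),
    (re.compile(r"\bGot  1 popflow\(s\) from cache\b"), "Got  1 from cache"),
    (re.compile(r"\bActivity event\b"), "Activity event"),
]

# Simplified stand-in for the eventId rex in the query (an assumption).
EVENT_ID = re.compile(r"Getting popflow from server\.{3} eventId: (?P<pfevid>[0-9]+)")


def classify(message):
    """Return the first matching msgtype, or None (case() with no default)."""
    msgctx = message[:80]  # substr(message, 1, 80)
    for pattern, msgtype in MSGTYPE_RULES:
        if pattern.search(msgctx):
            return msgtype
    return None


def event_id(message):
    """Extract the numeric eventId from a 'Getting popflow' message."""
    match = EVENT_ID.search(message)
    return match.group("pfevid") if match else None


def activity_type_id(message):
    """Read typeId from the trailing JSON, like spath(myjson, "typeId")."""
    start = message.find("{")
    if start == -1:
        return None
    try:
        return json.loads(message[start:]).get("typeId")
    except (json.JSONDecodeError, AttributeError):
        return None


# Hypothetical messages, for illustration only.
print(classify("[INFO] Getting popflow from server... eventId: 42"))
print(event_id("[INFO] Getting popflow from server... eventId: 42"))
print(activity_type_id("Starting Activity 'Show Incident' {\"typeId\": 7}"))
```

Because the first matching rule wins, the order of `MSGTYPE_RULES` plays the same role as the ordering of patterns discussed in point e).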

4. Omis Events

4.1. How to identify customer/agent using HIS/Harmony stack and how are they using it, aka Omis Overview?

4.2. What are all the possible Omis message types and how do I work with them?


4.3. How do I check if there are any Omis message types I don't know about?


5. Interactions


6. Agents

7. Sessions



This document may contain confidential and/or privileged information belonging to OpenMethods. If you are not the intended recipient (or have received this document in error) please notify the sender immediately and destroy this document. Any unauthorized copying, disclosure, or distribution of the material in this document is strictly forbidden.