This page is a Quick Reference containing a punch list of run anywhere (copy/paste) Splunk searches to help explore the data in the OpenMethods Splunk events. The use of some advanced techniques is intentional and the foundational explanation for the techniques will be covered in other articles.
1. Basic Internal Structure
In this section, we’ll tackle Splunk’s own internal structure and how it manages indexes, sources, storage, event sizes, and types, and we’ll spot check the indexes _introspection and _internal.
1.1 What is the list of all indexes even if there is zero data in the index?
| eventcount summarize=false index=* index=_* | dedup index | rename count as countofevents | fields index countofevents | sort countofevents DESC |
Note: in order to conserve space on this page for the remaining queries,
- It is a best practice, when readability matters, to author Splunk queries per above with the pipe symbol as the first character of each new line. Instead, the queries will be compacted.
- Sample results will either not be included or will be converted to JSON and shown as a code block like this below (same results set as above).
[{"index":"1905405642","countofevents":"main"},{"index":"23781423","countofevents":"\"_audit\""},{"index":"10004574","countofevents":"\"_internal\""},{"index":"1755528","countofevents":"\"_introspection\""},{"index":"85111","countofevents":"summary"},{"index":"24343","countofevents":"\"_telemetry\""},{"index":"4","countofevents":"lastchanceindex"},{"index":"0","countofevents":"\"_thefishbucket\""},{"index":"0","countofevents":"demomatricindex"},{"index":"0","countofevents":"history"},{"index":"0","countofevents":"netaddinsimport"},{"index":"0","countofevents":"popflowscriptindex"},{"index":"0","countofevents":"tunnelmetricsindex"}]
list of data size, usage, average no. of bytes, events, event size of _raw and license index
2. Basic OpenMethods Topology
In this section, we’ll discover broadly where/how to look for events that give an overall view of how our software is deployed and being used.
2.0 I am not sure what I am looking for, how do I just explore Splunk data?
index="main" earliest="-30m" source="mediabar"
2.1 List of customers and their CTI environment/location of agents (or at least URL agents use to access HIS)?
index="main" earliest="-4h" | stats values(network.his.uri) as hisurl, values(network.his.model) as hiscti by crm.customer
2.2 List of customers and their sites/versions?
| tstats count WHERE (index="main" earliest=-5d@d latest=now source="mediabar") BY crm.customer mb.version network.appManagerDomain network.crmHost | sort mb.version DESC | dedup network.crmHost | table crm.customer mb.version network.crmHost | sort crm.customer
2.3 How to segment agent usage by production versus lower environments?
| tstats count WHERE (index="main" earliest="-1h" [|inputlookup spl-customer-host.csv | where cloudenv="prod" | fields displaycustomer hostlookup | lookup spl-customer-host.csv displaycustomer cloudenv OUTPUT hostlookup | fields - displaycustomer | rename hostlookup as host | format]) BY _time crm.username host | stats dc(crm.username) as total_User by host
The Splunk ‘lookup’ data structure that made the above query possible:
| inputlookup spl-customer-host.csv | WHERE NOT (displaycustomer in ("omdemo","omdev", "omqa", "omtrain")) | dedup displaycustomer | lookup spl-customer-host.csv displaycustomer OUTPUT crmcustomer cloudenv hostlookup
The high water mark of unique agent logins across production:
| tstats distinct_count(crm.username) as dc1 WHERE (index="main" earliest="7/1/2020:00:00:00" latest="8/1/2020:00:00:00" [|inputlookup spl-customer-host.csv | where cloudenv="prod" | fields displaycustomer hostlookup | lookup spl-customer-host.csv displaycustomer cloudenv OUTPUT hostlookup | fields - displaycustomer | rename hostlookup as host | format]) BY _time host | timechart sum(dc1) as all_agents_prod_by_day span=1d | stats max(all_agents_prod_by_day) as all_agents_prod
2.4 How to convert Splunk events to look like basic log statements I am used to for troubleshooting?
| tstats count WHERE (index="main" earliest="8/1/2020:06:00:00" latest="8/1/2020:06:30:00" source="mediabar" host="https://chewy.custhelp.com" ) BY _time logLevel crm.instanceId crm.groupId crm.id mb.className mb.functionName message span=1s
Or to simplify down to a few meaningful fields and one agent (but we don’t know which agent). If we know the agent id, the sub search can be removed which can be a performance issue in some cases.
| tstats count WHERE (index="main" earliest="-24h" host="https://faq.arval.it" [ | tstats count WHERE (index="main" earliest="-24h" host="https://faq.arval.it") BY crm.id | top limit=1 crm.id| rename count as c | rename percent as p | fields - c p | format] ) BY _time logLevel crm.id mb.className mb.functionName message span=1s | eval class='mb.className' . "-" . 'mb.functionName' | search class="*" | table _time logLevel crm.id class message
2.5 How to identify, at a high level, the major components in use by the customer?
index="main" earliest="8/3/2020:06:00:00" latest="8/3/2020:06:30:00" source="mediabar" | eval crmcust='crm.customer' | eval agent='crm.id' | eval class='mb.className' . "-" . 'mb.functionName' | search crmcust="*" agent="*" class="*" | stats values(class) as lc, count(class) as cc by crmcust, agent | where ((crmcust="veritas" AND cc > 1850) OR (crmcust="chewy" AND cc > 400) OR (crmcust="arval" AND cc > 1200)) At a quick glance simply of component, it can be easily determined if an agent is getting screen pops from Harmony or another way. |