Splunk

Splunk is a log indexing/searching tool.

TODO:

thoroughly read manual, organize and document with query syntax filed under SPL2

Documentation

official docs https://docs.splunk.com/Documentation/Splunk
search docs https://docs.splunk.com/Documentation/Splunk/8.0.6/Search/GetstartedwithSearch

Query Syntax

Ad-Hoc Log Contents

You can define records with your own custom fields ad hoc and test queries against them.
This executes much more quickly than searching real indexed logs.

| makeresults | eval foo=split("a,b,c", ",") | eval c=mvcount(foo)
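
A slightly larger sketch of the same idea, generating several rows and then filtering them. The field names n and status are made up for illustration, and the # notes are annotations as elsewhere on this page, not SPL syntax.

| makeresults count=3                    # three empty events
| streamstats count AS n                 # number them 1..3
| eval status=if(n=2, "404", "200")      # give one row a different status
| where status="404"                     # should keep only the n=2 row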

Target Hosts

Splunk logs the queries you run, so a search term can match Splunk's own records of earlier searches. Restrict the search to your host (allowlist), or exclude the Splunk hosts (denylist).

searchterm host!=host.domain.com
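
Both approaches might look like this; the host names are placeholders.

searchterm host=myhost.domain.com                                   # allowlist: only your host
searchterm NOT host IN (splunk01.domain.com, splunk02.domain.com)   # denylist: drop the Splunk hosts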

Time Ranges

# earliest/latest can be specified in searchbar
# and can use relative time ranges
#
# DateFmt: %m/%d/%Y:%H:%M:%S
#
# Timezone of UTC is assumed

Error earliest=07/06/2020:17:57:05 latest=+10m  # relative times count from now, not from earliest

earliest=08/26/2020:15:00:00  # no latest: results run up to now, like tailing a log
searchterm  earliest=07/06/2020:17:57:05  latest=+10m
| rex "id=(?<id>\d+)" # regex match, extract 'id'
| where status="404"  # select by attribute values
| dedup id            # remove duplicate results
| table id colA       # select only these fields in result
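
The relative time ranges mentioned in the notes above might look like the following. Relative modifiers count from the current time, and @ snaps to a unit boundary.

searchterm earliest=-15m                  # last 15 minutes, up to now
searchterm earliest=-1d@d latest=@d       # all of yesterday (midnight to midnight)
searchterm earliest=-7d@d latest=now      # from midnight seven days ago until now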

Operators

user IN (will, alex, maize)  # grouped OR statements
user=will OR user=alex       # statements separated by OR (note: parentheses ineffective)
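
A few more combinations, as a sketch. The status field and its values are assumptions for illustration.

searchterm user IN (will, alex) status=404   # terms placed side by side are ANDed
searchterm NOT user IN (will, alex)          # exclude those users
searchterm user=wil*                         # * is a wildcard in field values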

Sorting

Sorting is performed on a field.
If the field is not already defined, you can extract it with rex.

"SELECT * FROM"
  | rex field=_raw "\((?<query_duration>\d+\.\d+)ms\)"   # assigns 'query_duration' field (number only, so it sorts numerically)
  | sort num(query_duration)                             # sorts results by 'query_duration', ascending
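
To sort in descending order or keep only the top results, something like the following may work. It assumes the raw events contain durations shaped like "(12.34ms)" and captures only the number so that the sort is numeric. The field name duration_ms is made up for this sketch.

"SELECT * FROM"
  | rex field=_raw "\((?<duration_ms>\d+\.\d+)ms\)"   # capture the number without the parentheses or unit
  | sort 10 -num(duration_ms)                          # ten slowest queries first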

Counts, Sums, Distinct-Counts

MyJob | stats dc(job_id) AS distinct_job_id_count  # distinct job_id values in records (count() would count events instead)
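
Counts, sums, and distinct counts can be combined in a single stats call and grouped with BY. In this sketch, duration is a hypothetical numeric field: count gives the number of events, dc the number of distinct job_id values, and sum the total of duration, each per host.

MyJob | stats count AS events, dc(job_id) AS distinct_jobs, sum(duration) AS total_duration BY host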

RestAPI

CLI