Exploring Splunk for Investigative Analysis
Comprehensive Splunk Series — Part 2
TryHackMe — Splunk 2 — Writeup
In the previous post, we installed the Splunk daemon on a Linux-based host. Today, our focus shifts to Splunk investigations.
To embark on this investigative journey, we require a dataset. I’ve imported a substantial dataset that is available from this GitHub repository. The dataset is in the public domain, and we’ll use it to learn how to conduct investigations in Splunk.
Our mission will involve thoroughly examining this dataset to address the questions presented within the TryHackMe room, which can be accessed through this link.
Data Credits: The data included in this app was generated in August of 2017 by members of Splunk’s Security Specialist team — Dave Herrald, Ryan Kovar, Steve Brant, Jim Apger, John Stoner, Ken Westin, David Veuve, and James Brodsky.
Installing Apps
To effectively manipulate our data, it’s necessary to install several applications as outlined on the GitHub page.
Below, you’ll find a brief example illustrating how to go about installing applications.
Metadata Command
| metadata type=sourcetypes index=botsv2 | eval firstTime=strftime(firstTime,"%Y-%m-%d %H:%M:%S") | eval lastTime=strftime(lastTime,"%Y-%m-%d %H:%M:%S") | eval recentTime=strftime(recentTime,"%Y-%m-%d %H:%M:%S") | sort - totalCount
The metadata command returns a list of the sources, sourcetypes, or hosts in a specified index, depending on the type argument. For each entry it reports summary values such as firstTime, lastTime, recentTime, and totalCount. The search above converts those epoch timestamps into readable dates and sorts the sourcetypes by event count.
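For readers who want to see what the strftime calls in the search above are doing, here is a minimal Python sketch of the same epoch-to-text conversion. The function name format_epoch is hypothetical, and the sketch pins the output to UTC, whereas Splunk renders times in the user's configured time zone.

```python
from datetime import datetime, timezone


def format_epoch(epoch_seconds: float) -> str:
    """Format epoch seconds as YYYY-MM-DD HH:MM:SS, pinned to UTC."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    # Epoch zero is used purely as an illustration, not a value from the dataset.
    print(format_epoch(0))
```

The format string is the same one passed to strftime in the metadata search, so the output layout matches what appears in the Splunk results table.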
TSTATS Command
| tstats count WHERE index=botsv2 by sourcetype
tstats is a generating command (note the leading pipe) that performs statistical queries on indexed fields stored in tsidx (time-series index) files. It is highly efficient because it reads the index files instead of loading the raw events, making it faster and lighter on resources than an equivalent stats search.
Here it counts the events for each sourcetype within the specified index, which is “botsv2” in this case.
100 Series Questions
Question 1
Amber Turing was hoping for Frothly to be acquired by a potential competitor which fell through, but visited their website to find contact information for their executive team. What is the website domain that she visited?
To kick things off, we’ll search for the term “amber” within the “botsv2” index. This initial step reduces our search results to approximately 56,000 events, which is still far too many to review.
Reviewing the first few events, we find that the sourcetype is consistently marked as "PAN:traffic." To refine our search further, we can click the sourcetype field and select "PAN:traffic."
Having identified Amber’s IP address as 10.0.2.101, our next step is to create a targeted search query using her IP address. Given our focus on HTTP traffic, we will refine our search to specifically include events with the sourcetype “stream:http.”
index="botsv2" 10.0.2.101 sourcetype="stream:http"
Since we are focused on finding which website Amber visited, we will narrow our search using the ‘site’ interesting field.
index="botsv2" 10.0.2.101 sourcetype="stream:http" | stats count by site | sort - count
Question 2
Amber found the executive contact information and sent him an email. What image file displayed the executive’s contact information? Answer example: /path/image.ext
Now that we’ve identified the website Amber visited, we can proceed to craft a more precise query to isolate and filter the HTTP traffic specifically related to that website.
index="botsv2" 10.0.2.101 sourcetype="stream:http" www.berkbeer.com
index="botsv2" 10.0.2.101 sourcetype="stream:http" www.berkbeer.com | table uri_path
Question 3
What is the CEO’s name? Provide the first and last name.
Question 4
What is the CEO’s email address?
Building upon the previous question, I searched for Amber’s email address. To do this, I refined the search to focus specifically on SMTP traffic.
index="botsv2" amber sourcetype="stream:smtp"
Reviewing Amber’s email, I came across an exchange involving mberk@berkbeer.com. I incorporated this address into the search, which left only one event. After examining the ‘Content Body,’ it became evident that the CEO’s name is Martin Berk.
index="botsv2" amber sourcetype="stream:smtp" aturing@froth.ly | spath sender | search sender="mberk@berkbeer.com"
Question 5
After the initial contact with the CEO, Amber contacted another employee at this competitor. What is that employee’s email address?
Running this search returns 16 events. Upon reviewing them, we can see that the other employee Amber contacted was Heinz Bernhard.
Question 6
What is the name of the file attachment that Amber sent to a contact at the competitor?
The query described above also provided the solution to this particular question.
Question 7
What is Amber’s personal email address?
In an email exchange with Heinz Bernhard, he inquired about Amber’s personal email address. Amber promptly responded in the subsequent email, as evidenced by the email’s subject line.
The content body appears to be encoded, likely in base64, which the “content_transfer_encoding” field corroborates. CyberChef makes it easy to decode the body and extract the information within.
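As an alternative to CyberChef, the same decoding step can be sketched in a few lines of Python. The function name decode_base64_body is hypothetical, and the sketch assumes the body is UTF-8 text. The sample string is generated on the spot and is not taken from the dataset.

```python
import base64


def decode_base64_body(encoded: str, charset: str = "utf-8") -> str:
    """Decode a base64 email body, ignoring the line breaks added by MIME wrapping."""
    compact = "".join(encoded.split())
    return base64.b64decode(compact).decode(charset, errors="replace")


if __name__ == "__main__":
    # Hypothetical sample text, encoded here only to demonstrate the round trip.
    sample = base64.b64encode(b"Hello from a test message").decode("ascii")
    print(decode_base64_body(sample))
```

Paste the content body from the Splunk event in place of the sample. If the output looks garbled, the message may use a different charset than the assumed UTF-8.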
200 Series Questions
Question 1
What version of TOR Browser did Amber install to obfuscate her web browsing? Answer guidance: Numeric with one or more delimiter.
The question asks for the version of the TOR browser that Amber installed. Therefore, let’s begin by conducting a keyword search for ‘Amber’ and ‘TOR.’
index="botsv2" amber tor
This initial search yields over 300 events, which can be overwhelming to review comprehensively. To narrow down our search results, let’s incorporate another keyword. Since we know she installed the TOR browser, let’s include the keyword ‘install’ in our search.
index="botsv2" amber tor install
With the inclusion of the keyword ‘install,’ our search results have been reduced to just over 100, making it a more manageable number to review. Interestingly, the very first event contains the information we are seeking.
Question 2
What is the public IPv4 address of the server running www.brewertalk.com?
As we know that we are communicating with a web server operating on port 80, we will construct a search query incorporating keywords and the destination port number.
index="botsv2" sourcetype="stream:http" www.brewertalk.com dest_port=80
We can determine the IP address of the web server from the event.
Question 3
Provide the IP address of the system used to run a web vulnerability scan against www.brewertalk.com.
Upon running the search and inspecting the src_ip field, it becomes evident that 45.77.65.211 is the IP address that is generating the highest volume of traffic directed towards the web server.
index="botsv2" www.brewertalk.com
While we can reasonably assume that 45.77.65.211 is the IP address responsible for scanning the web server, we should refine our search further to confirm this with greater certainty.
index="botsv2" www.brewertalk.com src_ip="45.77.65.211"
After examining the “form_data” field, we can say with a high level of confidence that 45.77.65.211 is indeed the IP address responsible for scanning the web server.
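The "busiest src_ip" reasoning used for this question can also be reproduced offline on events exported from Splunk. The sketch below is only an illustration: the function name top_sources is hypothetical, the events are assumed to be dictionaries with a src_ip key, and the sample addresses come from documentation ranges rather than the dataset.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple


def top_sources(events: Iterable[Mapping[str, str]], n: int = 3) -> List[Tuple[str, int]]:
    """Count events per src_ip and return the n busiest sources."""
    counts = Counter(e["src_ip"] for e in events if e.get("src_ip"))
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical events using documentation-range addresses.
    sample = [
        {"src_ip": "198.51.100.7"},
        {"src_ip": "198.51.100.7"},
        {"src_ip": "192.0.2.1"},
    ]
    print(top_sources(sample, n=1))
```

A source that dwarfs the others in request volume is a candidate scanner, but as in the walkthrough, the request contents should still be checked before drawing a conclusion.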
Question 4
The IP address from Q#2 is also being used by a likely different piece of software to attack a URI path. What is the URI path? Answer guidance: Include the leading forward slash in your answer. Do not include the query string or other parts of the URI. Answer example: /phpinfo.php
index="botsv2" src_ip="45.77.65.211"
We have narrowed down our search to the attacker’s IP address and have explored the “uri_path” field. It is evident that “/member.php” is the URI path that receives the highest number of hits.
Question 5
What SQL function is being abused on the URI path from the previous question?
To provide an answer to this question, we will build upon the previous search query and continue our investigation.
index="botsv2" src_ip="45.77.65.211" uri_path="/member.php"
By examining the ‘form_data’ field, it becomes apparent that there is evidence of an SQL injection attack.
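When a form_data value is URL-encoded, the injected SQL can be hard to read by eye. The following is a minimal sketch, assuming the value has been copied out of the event as a string. The function name sql_function_names is hypothetical, and the sample payload is invented for illustration and does not reflect the answer to this question.

```python
import re
from typing import List
from urllib.parse import unquote_plus

# Matches an identifier immediately followed by an opening parenthesis.
FUNC_CALL = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*\(")


def sql_function_names(form_data: str) -> List[str]:
    """URL-decode form data and list the function-like names it calls, in order."""
    decoded = unquote_plus(form_data)
    seen: List[str] = []
    for name in FUNC_CALL.findall(decoded):
        lowered = name.lower()
        if lowered not in seen:
            seen.append(lowered)
    return seen


if __name__ == "__main__":
    # Hypothetical payload, not taken from the dataset.
    print(sql_function_names("q=1%27+AND+sleep%285%29--+"))
```

The regular expression is deliberately loose, so it will also flag harmless function-like text. Treat the output as a list of candidates to look at, not as a verdict.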
Question 6
What was the value of the cookie that Kevin’s browser transmitted to the malicious URL as part of an XSS attack? Answer guidance: All digits. Not the cookie name or symbols like an equal sign.
As usual, we’ll commence with a keyword search for ‘Kevin.’ We have successfully identified the full name of Kevin as Kevin Lagerfield.
With some understanding of XSS, we can refine our search by focusing on the value ‘document.cookie,’ which is often associated with XSS attacks involving the manipulation of cookies within a web application.
index="botsv2" sourcetype="stream:http" document.cookie
We have observed a ‘cookie’ value listed under the interesting fields, and we can include this in our search query to obtain the answer we are seeking.
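If the cookie field holds several name=value pairs, the digits-only value the question asks for can be separated from the cookie names with Python's standard library. This is a sketch under assumptions: the function name cookie_values is hypothetical, and the sample header is invented rather than copied from the event.

```python
from http.cookies import SimpleCookie
from typing import Dict


def cookie_values(header: str) -> Dict[str, str]:
    """Parse a Cookie header string into a name-to-value mapping."""
    jar = SimpleCookie()
    jar.load(header)
    return {name: morsel.value for name, morsel in jar.items()}


if __name__ == "__main__":
    # Hypothetical header, not taken from the dataset.
    print(cookie_values("example_id=12345; theme=dark"))
```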
Question 7
What brewertalk.com username was maliciously created by a spear phishing attack?
We initiate our search by performing a keyword search for ‘brewertalk.com.’
Recognizing that we are in search of a user creation event, I focus and filter the events by utilizing different sources and specifically opting for ‘xmlwineventlogs.’ This narrows down the search to 3 events.
Upon reviewing the second event, it becomes apparent that these attackers are employing subtle and deceptive tactics.
300 Series Questions
Question 1
Mallory’s critical PowerPoint presentation on her MacBook gets encrypted by ransomware on August 18. What is the name of this file after it was encrypted?
For our search, we have the following information: the user’s name is Mallory, they work on a MacBook, the date is August 18, and we are searching for a specific file. Let’s commence the search with a keyword search based on this information.
The search has returned an excessive number of events, and we need to further refine it. To narrow down our search, we can utilize the “host” field, and we are specifically seeking a PowerPoint presentation.
index="botsv2" host="MACLORY-AIR13" (*.ppt OR *.pptx)
Question 2
There is a Games of Thrones movie file that was encrypted as well. What season and episode is it?
In this instance, we are searching for a movie file, which could have various extensions like .avi or others. We have information about the host and sourcetype from our previous investigation, and we also know that the file extension after encryption is ‘crypt.’ Let’s leverage this information to formulate a search query.
index="botsv2" host="MACLORY-AIR13" sourcetype=ps *.crypt
Question 3
Kevin Lagerfield used a USB drive to move malware onto kutekitten, Mallory’s personal MacBook. She ran the malware, which obfuscates itself during execution. Provide the vendor name of the USB drive Kevin likely used. Answer Guidance: Use time correlation to identify the USB drive.
After conducting a keyword search for ‘kutekitten’ and ‘usb,’ we have successfully reduced the number of events to 40. Further examination of the ‘interesting_fields’ reveals the presence of ‘tag’ and ‘vendor.’ Let’s proceed to refine our search by incorporating these fields.
A swift online search for the vendor_id has unveiled the answer we were seeking.
Question 4
What programming language is at least part of the malware from the question above written in?
We initiate our search with a keyword query and promptly observe the presence of the user ‘mkraeusen.’
index="botsv2" kutekitten
index="botsv2" kutekitten mkraeusen
I examine the interesting fields to further refine my search, and in the process I come across the ‘name’ field.
index="botsv2" kutekitten mkraeusen name=file_events
After examining three out of the five events, I discover a hash value. I intend to submit this hash value to VirusTotal for further analysis.
It certainly appears to be malicious, and interestingly, there is even an indication of the programming language used in the suspicious content.
Question 5
When was this malware first seen in the wild? Answer Guidance: YYYY-MM-DD
Question 6
The malware infecting kutekitten uses dynamic DNS destinations to communicate with two C&C servers shortly after installation. What is the fully-qualified domain name (FQDN) of the first (alphabetically) of these destinations?
Question 7
From the question above, what is the fully-qualified domain name (FQDN) of the second (alphabetically) contacted C&C server?
The relations tab provides all the information needed for the task.
400 Series Questions
Question 1
A Federal law enforcement agency reports that Taedonggang often spear phishes its victims with zip files that have to be opened with a password. What is the name of the attachment sent to Frothly by a malicious Taedonggang actor?
Let’s commence our search with a keyword query. Based on the information provided, we are specifically looking for zip files and emails.
index="botsv2" sourcetype="stream:smtp" *.zip
This concise search query effectively reduces our search results from 60 million events to just 6 events, demonstrating the impressive capabilities of Splunk.
Question 2
What is the password to open the zip file?
Question 3
The Taedonggang APT group encrypts most of their traffic with SSL. What is the “SSL Issuer” that they use for the majority of their traffic? Answer guidance: Copy the field exactly, including spaces.
In line with the question’s focus on finding the “SSL Issuer,” let’s initiate a keyword search using terms like SSL, TLS, and HTTPS.
By examining the ‘sourcetype’ field, we will proceed to filter the events specifically by ‘stream:tcp.’ There are many interesting fields related to SSL, including ‘ssl_issuer.’
Having identified multiple SSL certificate issuers and being aware of the attacker’s IP address, let’s further narrow down our search, resulting in just one SSL issuer.
index="botsv2" (SSL OR TLS OR HTTPS) sourcetype="stream:tcp" dest_ip="45.77.65.211"
Question 4
What unusual file (for an American company) does winsys32.dll cause to be downloaded into the Frothly environment?
We start with a keyword search based on the question.
index="botsv2" winsys32.dll
The results show FTP being run on the system to download a file.
index="botsv2" sourcetype="stream:ftp" loadway=Download
Question 5
What is the first and last name of the poor innocent sap who was implicated in the metadata of the file that executed PowerShell Empire on the first victim’s workstation? Answer example: John Smith
Question 6
Within the document, what kind of points is mentioned if you found the text?
Question 7
To maintain persistence in the Frothly network, Taedonggang APT configured several Scheduled Tasks to beacon back to their C2 server. What single webpage is most contacted by these Scheduled Tasks? Answer example: index.php or images.html
We should concentrate on scheduled tasks and create a search query accordingly.
index="botsv2" schtasks.exe
We narrow it down by using ‘Account_Domain’ as a refinement since there is only one, and this reduces the number of events to a more manageable 36.
I’ve come across a particularly intriguing event in which a PowerShell command is being executed, and this has furnished us with additional information for our next search.
index="botsv2" \\Software\\Microsoft\\Network
We will further narrow down our search by focusing on “WinRegistry.”
These events pertain to modifications made to Registry values. To find the answer, decode the base64-encoded data included in these events.
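The decoding step can be sketched in Python as follows. The function name decode_powershell_base64 is hypothetical. The UTF-16LE encoding is an assumption, chosen because PowerShell commonly encodes payloads that way, and it has not been confirmed against these particular events. The sample command is invented for the demonstration.

```python
import base64


def decode_powershell_base64(encoded: str) -> str:
    """Decode base64 data that is assumed to hold UTF-16LE text."""
    raw = base64.b64decode("".join(encoded.split()))
    return raw.decode("utf-16-le", errors="replace")


if __name__ == "__main__":
    # Hypothetical command, encoded here only to demonstrate the round trip.
    sample = base64.b64encode("Write-Host hi".encode("utf-16-le")).decode("ascii")
    print(decode_powershell_base64(sample))
```

If the result is unreadable, the data may be plain UTF-8 or compressed, in which case a different decoding step is needed.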
I must admit, this task turned out to be far more enjoyable than originally anticipated. The learning curve was quite steep, but the experience was truly remarkable. It’s astonishing to think that with just a few straightforward searches, I managed to reduce the number of events from 60 million to just 6.
I want to thank everyone for reading. Please stay tuned, as there will be more to come.