Ransomware has occupied the news headlines in the past few weeks with the WannaCry infection significantly impacting global organisations. As of this writing, it is estimated that over 400,000 computers have been infected. In this blog we explore how the Elastic Stack can be used during the triage phase of a malware outbreak to identify potential infections within your organisation.
The ability to quickly search through network and operating system events can enable the rapid identification of machines which have been compromised, given our knowledge of specific malware signatures. The Elastic Stack cannot prevent infection - that requires a combination of people, process, and other technology - or exhaustively identify new malware attack vectors, but it lets you gain rapid insight into your current situation.
If your organisation was recently impacted by the rapid spread of WannaCry, or even if you would just like additional reassurance that no machines in your infrastructure are exhibiting signs of exploitation, this blog provides some quick triage techniques you can use if Packetbeat and Winlogbeat are running in your infrastructure. If you don’t have these installed, we provide some simple instructions to prepare yourself for the next time...maybe HaftaCry, GonnaCry, WillCry, SurelyCry?
To test these techniques, we simulated a WannaCry infection in our malware lab by configuring two Windows 7 SP1 VMs (unpatched as distributed by the Edge initiative) and a Windows 2012 R2 server VM (also unpatched). The server acted as domain controller, DNS, and DHCP server, as well as managing the SMB shares. Whilst a domain controller may not be required, we experienced issues (similar to those reported here) getting the malware to propagate when machines belonged only to a workgroup. A fourth Linux VM hosted our Elasticsearch and Kibana instances. This entire environment was hosted on an OSX host and configured with internal networking only, i.e., no external access.
We installed the following Elastic Stack components on each Windows VM:
- Packetbeat to monitor network traffic with the default configuration which collects flow events.
- Winlogbeat, combined with the Sysinternals Sysmon tool, to collect detailed system events on the Windows host. Our setup was inspired by the following community post describing the installation and setup process. The default configuration of Winlogbeat is sufficient, with a minor addition to the event_logs section to capture the events generated by Sysmon.
winlogbeat.event_logs:
  - name: Microsoft-Windows-Sysmon/Operational
The capabilities of Sysmon extend far beyond this post. This tool is hugely configurable and is capable of monitoring a plethora of Windows system events such as process creations, network connections, and changes to file creation time. Our Sysmon configuration, heavily inspired by this excellent example, and the accompanying Beat configurations have been provided here for reference.
WannaCrypt or WannaCry is an interesting combination of old-time worm and ransomware, with infection occurring due to an SMBv1 vulnerability. For our purpose, we deliberately infect a machine and track its infection, thus producing signatures you can subsequently identify using Kibana capabilities.
The specific behaviour of WannaCry varies depending on the variant. The well publicised kill switch, for example, is no longer present in more recent copies. We summarise its behaviour below. This isn’t an exhaustive analysis - others have performed comprehensive analysis at 1, 2 and 3. Sites such as hybrid-analysis.com have allowed the community to identify a comprehensive set of signatures for each variant by uploading samples.
After initial infection, the WannaCry ransomware may proceed to:
- Attempt to resolve one of two external domains before issuing a GET request to the selected target. If a response is received, the malware terminates. This kill switch was discovered by a researcher who subsequently registered the domains, thus preventing execution of the virus in some environments. We explore the detection of this termination below. More recent variants are now in distribution with this kill switch removed.
- Assuming the kill switch is not present, or is not successful as the domain(s) cannot be reached e.g., due to a proxy, execution continues, and the malware proceeds to encrypt files on the user’s hard drive which match a set of extensions. During this phase of the attack, numerous signatures are generated that can be captured by Sysmon and Winlogbeat. The user is presented with a desktop background and prompt requesting a bitcoin transfer in exchange for the decryption keys required to recover the files.
- Spawns 2 threads - the first checks the IP address of the infected machine and attempts to connect via SMB on port TCP/445 to each host in the subnet. The second thread proceeds to generate random public IP addresses and attempts the same action. External traffic from your organisation over port TCP/445 is highly unusual and thus suspicious.
- If WannaCry is able to connect to a machine via SMB, it proceeds to exploit a known SMBv1 vulnerability addressed by Microsoft in the bulletin MS17-010 (ETERNALBLUE) in order to implant the DOUBLEPULSAR backdoor. The backdoor is used to execute WannaCry on the newly compromised system.
- The above process repeats, causing the infection to rapidly spread through an organisation. Interestingly variants also exist which do not attempt to propagate via this exploit - rather just encrypting network shares and relying on user execution of these files when accessing these shares.
The specific SMB exploit has been well publicised and has been patched by Microsoft. Aspects of the following assume unpatched systems.
Applying the Elastic Stack
For completeness, we selected a variant of the malware which exhibits all of the above signatures. For those looking to replicate the following, this variant has a SHA256 hash of 24d004a104d4d54034dbcffc2a4b19a11f39008a575aa614ea04703480b1022c. Obtaining this file is left to the user, who should take the typical precautions when dealing with such files.
Detecting a WannaCry Download
As a first step we can utilise the Elastic Stack to identify instances where your users may have inadvertently downloaded or received a copy of the virus. To achieve this we exploit Sysmon’s ability to detect when an Alternate Data Stream (ADS) is added to a file. ADSs are used by browsers and email clients to mark files as originating from the Internet or other foreign sources. In our Sysmon configuration we configure the FileCreateStreamHash event. This causes Sysmon to generate an event when it detects an ADS has been added to a file in a specific set of locations, e.g. the “Downloads” folder. Included in this event is a hash of the file contents. These events are subsequently indexed into Elasticsearch by Winlogbeat.
To simulate this, we host a copy of our malware with filename “run_me.exe” on our Linux image, serving it over HTTP. Accessing the webserver from one of our Windows 7 instances, we download the file to the local machine.
This causes the sequence of events we describe above. At this point we should have a document in Elasticsearch in the winlogbeat-* index describing the creation of the file stream, with a field indicating the SHA256 hash. To identify whether any WannaCry variants have been downloaded in your infrastructure, we could simply search for all of the known SHA256 hashes. In our artificial case, we are aware of the identifier and simply search using Kibana Discover for “24d004a104d4d54034dbcffc2a4b19a11f39008a575aa614ea04703480b1022c”.
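For those who prefer to script this search, the following is a minimal sketch of how a list of known hashes could be turned into an Elasticsearch query body. The helper name build_hash_query is our own invention for illustration, and we assume the hashes are searched across all fields, mirroring the Discover search above, rather than against a specific field.

```python
import json


def build_hash_query(sha256_hashes):
    """Build a search body matching any of the given hashes (hypothetical helper)."""
    if not sha256_hashes:
        raise ValueError("at least one hash is required")
    # Quote each hash so it is treated as a single phrase, then OR them together.
    quoted = ['"{}"'.format(h) for h in sha256_hashes]
    return {"query": {"query_string": {"query": " OR ".join(quoted)}}}


if __name__ == "__main__":
    known = ["24d004a104d4d54034dbcffc2a4b19a11f39008a575aa614ea04703480b1022c"]
    # The printed body could be sent to the _search endpoint of the winlogbeat-* indices.
    print(json.dumps(build_hash_query(known), indent=2))
```

In practice the list would come from your organisation's threat intelligence feed rather than being hard-coded.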
The Elasticsearch document provides the details required to confirm the download of the WannaCry executable to our Windows 7 machine.
Expanding beat.name in the left-hand column allows us to quickly identify machines exhibiting this signature - in this case fortunately just our test image! From here a Kibana data table might be an effective means of visualising a comprehensive list of machines at risk.
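The same list of affected machines can be sketched outside Kibana with a terms aggregation on beat.name. The helper names below are hypothetical, and the sample response is hand-written to show the shape of a terms aggregation result; it is not output from our lab.

```python
def build_hosts_aggregation(sha256_hash, max_hosts=100):
    """Search body that counts matching events per host (hypothetical helper)."""
    return {
        "size": 0,  # we only need the aggregation, not the individual documents
        "query": {"query_string": {"query": '"{}"'.format(sha256_hash)}},
        "aggs": {"hosts": {"terms": {"field": "beat.name", "size": max_hosts}}},
    }


def hosts_from_response(response):
    """Extract (host, event count) pairs from a terms aggregation response."""
    buckets = response["aggregations"]["hosts"]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]


if __name__ == "__main__":
    # Illustrative response only; the host name is made up.
    sample = {"aggregations": {"hosts": {"buckets": [{"key": "win7-a", "doc_count": 1}]}}}
    print(hosts_from_response(sample))
```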
In practice, we would likely wish to receive proactive alerts if subsequent risky files were downloaded. Most organisations maintain a list of known threat hashes. Using the Elastic Stack Alerting capabilities coupled with these threat identifiers, a simple watch could in turn inform administrators of the potential risk during early stages of infection.
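As a rough sketch of what such a watch could look like, the snippet below assembles a watch definition as a Python dictionary. The overall layout (trigger, input, condition, actions) follows the Watcher format, but the helper name, the five minute interval, and the logging action are our own assumptions; a real deployment would more likely notify by email or chat.

```python
import json


def build_hash_watch(sha256_hashes, interval="5m"):
    """Assemble a watch body that fires when a known hash is seen (hypothetical helper)."""
    query = " OR ".join('"{}"'.format(h) for h in sha256_hashes)
    search_body = {
        "query": {
            "bool": {
                "must": {"query_string": {"query": query}},
                # Only consider events newer than the last scheduled run.
                "filter": {"range": {"@timestamp": {"gte": "now-" + interval}}},
            }
        }
    }
    return {
        "trigger": {"schedule": {"interval": interval}},
        "input": {"search": {"request": {"indices": ["winlogbeat-*"], "body": search_body}}},
        "condition": {"compare": {"ctx.payload.hits.total": {"gt": 0}}},
        "actions": {
            "log_hit": {
                "logging": {"text": "Known threat hash seen in {{ctx.payload.hits.total}} event(s)"}
            }
        },
    }


if __name__ == "__main__":
    watch = build_hash_watch(["24d004a104d4d54034dbcffc2a4b19a11f39008a575aa614ea04703480b1022c"])
    print(json.dumps(watch, indent=2))
```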
Detecting WannaCry Execution
Hopefully at this point your infrastructure is not showing signs of potential infection. If you’re unfortunate enough to detect potential compromise, your next likely question is “Where are there signs of successful execution?” At this point we’re likely using Elasticsearch only to help with triaging of the threat in large infrastructures, given WannaCry’s obvious visual indicator of successful infection, i.e., after we execute on our Windows 7 VM:
However, for other malware infections, signs may be more subtle, so looking for other signs of infection may be valuable.
The following list of signatures is not an exhaustive list, but provides a set of fairly good indicators that WannaCry has successfully executed and they are easily found by searching the Elasticsearch indices.
Many of the processes and commands below occur in a subdirectory of “C:\ProgramData” with a random hexadecimal name. Sysmon captures activities through ProcessCreate and ProcessAccess events, as well as identifying when the files are created via FileCreate.
Presence of a suspicious SHA256 hash for a ProcessCreate Event.
Sysmon provides a SHA256 hash for processes created. Results for this search indicate the root WannaCry process has executed.
Execution of the process mssecsvc.exe. This in turn executes tasksche.exe.
Mssecsvc.exe is dropped by the infection on initial execution. This uses tasksche.exe to test for the kill switch domains. tasksche.exe is used below also to change directory permissions and find files for encryption.
tasksche.exe spawning a number of processes, including but not limited to:
tasksche.exe checks for disk drives and network shares - specifically for files matching a set of extensions. This file is responsible for the encryption itself. This process also assigns permissions to files in the current directory and those beneath through the icacls and attrib commands.
Creation of the files taskse.exe and taskdl.exe by tasksche.exe
tasksche.exe creates these files on the host in order to achieve a number of tasks. Taskdl.exe deletes temporary files whilst taskse.exe is used in the next step.
Launching of @WanaDecryptor@.exe
taskse.exe is responsible for launching @WanaDecryptor@.exe, which in turn displays the above ransom note to the user. This process is also indirectly responsible for Tor traffic.
tor.exe is dropped, with its supporting DLLs, into a “Tor/” directory inside the hex-named subdirectory of “C:\ProgramData”.
The tor.exe file is executed by @wanadecryptor@.exe. This initiates network connections to Tor nodes thus allowing WannaCry to preserve anonymity via the Tor network.
Network activity to known addresses and domains e.g. 126.96.36.199 and gx7ekbenv2riucmf.onion, as well as traffic from source port 9050.
Exact addresses used depend on the variant.
Misc files on disk e.g. @Please_Read_Me@.txt
These files indicate the user files have been encrypted and accompany user files on disk.
Deletion of shadow copies on the machine
Deleting shadow copies makes recovery considerably harder.
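Several of the signatures above come down to spotting known executable names in the event_data.Image field of Sysmon events. The following is a minimal sketch of that check applied to events already retrieved from Elasticsearch. The function name and the shape of the event dictionary are assumptions for illustration, and the name list only covers the processes mentioned in this post. Note that tor.exe is included because it appears above, although it may also match legitimate Tor installations.

```python
import ntpath

# Executable names taken from the signatures described in this post.
INDICATOR_IMAGES = {
    "mssecsvc.exe",
    "tasksche.exe",
    "taskse.exe",
    "taskdl.exe",
    "@wanadecryptor@.exe",
    "tor.exe",
}


def matched_indicator(event):
    """Return the matching executable name for a Sysmon event, or None (hypothetical helper)."""
    image = event.get("event_data", {}).get("Image", "")
    # ntpath parses Windows-style paths regardless of the OS running this script.
    name = ntpath.basename(image).lower()
    return name if name in INDICATOR_IMAGES else None


if __name__ == "__main__":
    # The directory name here is a placeholder, not a real WannaCry path.
    event = {"event_data": {"Image": "C:\\ProgramData\\abc123\\tasksche.exe"}}
    print(matched_indicator(event))
```

Matching on file name alone is easy to evade by renaming, so this is best treated as a complement to the hash search rather than a replacement.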
Detecting the Infection Spread
After executing the ransomware, we eventually see our second Windows VM infected.
As an extra analysis step, it may be interesting to see indicators of machines through which the infection has attempted to spread, via the SMB exploit described above. We should see SMB outbound activity on port TCP/445 and TCP/139, specifically:
- TCP/445 traffic to each host in the subnet of the infected machine
- TCP/445 traffic to random IP addresses on the Internet
If we look for activity on port TCP/445 from our infected host, we can see where it may have been communicating internally via the Packetbeat flow data. A search for dest.port:445 quickly highlights SMB traffic flowing between the hosts.
Using a quick time series builder visualisation we can immediately see the spike in 445 traffic occurring in our small network at the point of execution.
Likewise we can identify those hosts which have recently experienced a sudden increase in the amount of SMB traffic, by visualising a derivative of the sum of bytes on the 445 port and grouping by host (beat.name).
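To make the idea of a derivative concrete, here is a small sketch of the same calculation done by hand: for each host we take the sum of bytes on port 445 per time bucket, compute the change between consecutive buckets, and flag hosts whose largest increase exceeds a threshold. The function name, host names, byte counts, and threshold are all made up for illustration.

```python
def byte_rate_spikes(bytes_per_bucket, threshold):
    """Return hosts whose largest bucket-to-bucket increase meets the threshold."""
    spikes = {}
    for host, series in bytes_per_bucket.items():
        # The derivative of a bucketed series is the difference between neighbours.
        deltas = [later - earlier for earlier, later in zip(series, series[1:])]
        largest = max(deltas, default=0)
        if largest >= threshold:
            spikes[host] = largest
    return spikes


if __name__ == "__main__":
    # Illustrative numbers only: bytes on port 445 per time bucket, per host.
    traffic = {"win7-a": [100, 120, 50000], "win7-b": [100, 110, 105]}
    print(byte_rate_spikes(traffic, threshold=10000))
```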
The Sysmon data also allows us to detect the original process responsible for the infection by searching within the Winlogbeat event data for “event_data.DestinationPort:445” and referencing the event_data.Image field.
The spreading of the malware within the network relies on a compromised srv2.sys (SMB driver) injecting a launcher.dll into the user-mode process lsass.exe on other target hosts. This acts as the loader for the mssecsvc.exe described here. As other machines in the network are compromised, the process repeats. This process injection actually caused our second Windows 7 instance to memory dump and restart in our tests, before also succumbing to the infection. This infection exploit can be examined by searching for “"lsass.exe" AND *taskse.exe” across the Winlogbeat events.
We weren’t able to identify outbound traffic coming from our infected hosts. Given that they were isolated from the Internet, it is possible the malware terminates this thread, or our variant simply does not contain this behaviour. This would be an excellent signature to alert on using the Elastic Stack alerting capabilities, given external SMB traffic is particularly unusual.
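As a sketch of the check such an alert might perform, the function below flags flow records whose destination is port 445 on a globally routable address. We assume each flow is a dictionary with a nested dest object holding ip and port, which matches the dest.port field used above; the function name and sample addresses are ours.

```python
import ipaddress


def is_external_smb(flow):
    """True if the flow targets TCP/445 on a globally routable address (hypothetical helper)."""
    dest = flow.get("dest", {})
    if dest.get("port") != 445:
        return False
    try:
        address = ipaddress.ip_address(dest.get("ip", ""))
    except ValueError:
        # Missing or malformed addresses are ignored rather than flagged.
        return False
    return address.is_global


if __name__ == "__main__":
    flows = [
        {"dest": {"ip": "192.168.56.101", "port": 445}},  # internal lab address
        {"dest": {"ip": "8.8.8.8", "port": 445}},  # public address, illustration only
    ]
    print([is_external_smb(flow) for flow in flows])
```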
Detecting Kill Switch Activity
If you detect incomplete execution of WannaCry, with no reports of ransom notes in your infrastructure, it is possible that execution of the malware was halted by the kill switch.
As discussed earlier, prior to execution WannaCry is known to test several domains by attempting a DNS resolution followed by an HTTP GET request to confirm their availability. A successful response causes the malware to terminate. This kill switch was possibly inserted for testing purposes. The following domains are contacted:
Whilst these domains have subsequently been registered, helping reduce the spread of the infection, our environment is isolated from the Internet. In order to elicit this behaviour, we add both domains to our DNS server, hosted on the Windows 2012 VM, prior to executing. Each of these domains resolves to the Windows Server, causing the malware to issue a GET request on port TCP/80. To ensure a response is also provided to this request, we configured IIS to run and respond on this port.
Note that the malware's HTTP connection is unfortunately not proxy aware, so machines behind proxies did not benefit from the registration of the above domains. If proxy logs are being collected in Elasticsearch, then searching for blocked requests to these domains would be a good indication of kill switch activity.
After execution of our downloaded binary we can see the DNS request by searching for the domains above in our “packetbeat-*” index.
This is followed by an HTTP request to our Windows server at 192.168.56.101.
This again might be a signature that warrants an alert using the Elastic Stack.
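A search along these lines could be scripted as follows. This is only a sketch: the helper name is hypothetical, the domains are passed in by the caller (the value in the example is a placeholder, not a real kill switch domain), and we search across all fields rather than assuming specific Packetbeat DNS or HTTP field names.

```python
import json


def build_kill_switch_query(domains, since="now-24h"):
    """Search body for recent events mentioning any given domain (hypothetical helper)."""
    if not domains:
        raise ValueError("at least one domain is required")
    phrases = " OR ".join('"{}"'.format(domain) for domain in domains)
    return {
        "query": {
            "bool": {
                "must": {"query_string": {"query": phrases}},
                "filter": {"range": {"@timestamp": {"gte": since}}},
            }
        },
        # Oldest first, so a DNS lookup appears before the HTTP request that follows it.
        "sort": [{"@timestamp": "asc"}],
    }


if __name__ == "__main__":
    # Placeholder domain for illustration; substitute the domains for your variant.
    print(json.dumps(build_kill_switch_query(["killswitch-one.example"]), indent=2))
```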
This blog post shows how the Elastic Stack can be used to quickly detect signatures related to the download, infection, spread, and kill switch activity of the WannaCry ransomware, helping to gain insight into the state of infection within your infrastructure, during initial triage.
Further work in this area could include refining the Sysmon configuration to detect and analyse additional signatures. Additionally, the new machine learning capabilities for the Elastic Stack may be applicable in larger infrastructures to detect unusual network activity associated with this or similar malware.
As usual we’d love to hear from community members who have datasets which prove these theories!
The data collected for the purposes of this blog has been made available as an Elasticsearch snapshot with instructions for restoring here.