Editor’s Note: Elastic joined forces with Endgame in October 2019, and has migrated some of the Endgame blog content to elastic.co. See Elastic Security to learn more about our integrated security solutions.
Last week, WannaCry left its mark across the globe, affecting hundreds of thousands of machines in over 100 countries. While it certainly has been more widespread than previous ransomware, WannaCry is just the latest example of the growing prevalence of ransomware. As I explained in a previous post, ransomware is now a billion dollar industry, and is only growing in popularity among attackers due to the risk calculus and profitable business model.
Although WannaCry included fairly unsophisticated ransomware, it leveraged the EternalBlue SMBv1 exploit from the Shadow Brokers data dump to propagate to other hosts. It is important to catch these kinds of attacks as early as possible in the attack chain, but what happens when those steps are circumvented by extremely customized and sophisticated techniques? This is why a layered approach to detection and prevention is necessary. While Endgame's layered protection approach ensures that we have the means to prevent and detect threats throughout multiple steps of the attack chain, there needs to be an additional line of defense in the event that ransomware or other destructive malware manages to be invoked on a host. Endgame ransomware protection provides that capability.
The core functionality of ransomware often is different compared to other malware. Ransomware attackers have a very straightforward mission: make files inaccessible and collect ransom payments from the affected users in order to restore access. Therefore, the code can be as simple as modifying files and providing recovery capabilities. It often does not contain large amounts of, or any, code for network communications with a command and control server, evading endpoint defenses, persistence mechanisms, or user surveillance capabilities, all of which are common attributes of other types of malware. Because of this, simple ransomware often gets through other endpoint protection mechanisms. This, combined with the enormous impact of a ransomware campaign, means that defenders require specialized capabilities to detect and block ransomware activity on a host at runtime before critical data is lost.
Endgame has provided customers with ransomware protection in our endpoint agent, and will continually enhance this feature going forward. Our approach to solving this problem at runtime involves many pieces operating in parallel. I will discuss three of these in this post:
- File operations
- Shannon entropy
- Anomaly scores
File operations drive the analysis of file data and metadata, Shannon entropy is used to make key observations about file content in a mathematical context, and anomaly scores are derived from file entropy and a variety of other measurements and characteristics. Together, these concepts provide introspection into each active process and allow us to detect ransomware activity.
File Operations and Anomaly Scores
The first building block is to gain visibility on activities on the filesystem. While a Windows host is active, files are constantly being created, modified, and read in the background. These file operations contain a wealth of information that can serve as the foundation for any ransomware detection capability, but the number of operations that may occur in a minute can be in the tens or even hundreds of thousands. In the face of this deluge of data, filtering plays a crucial role in separating the signal from the noise.
Our filtering approach is based around the following three pieces of data:
- File operation type
- Process that invoked the operation
- Affected filepath
If we were to analyze a file every single time one is opened, closed, read, or modified, that would result in an overwhelming amount of redundant analysis. When it comes to ransomware, though, we are primarily concerned with whether a file has been modified. For the sake of data simplification, we group file modifications by the following high level operations:
Once we're limited to those four file operations, we attribute each incoming file operation to a single process. Since we are attempting to distinguish ransomware activity on a per-process basis, we need the means to analyze each relevant operation and how it relates to the process by which it was invoked. This grouping will help us maintain metrics on processes over time and quickly determine anomalous activity as it begins to occur.
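The filtering and attribution step described above can be sketched roughly as follows. This is a minimal illustration, not Endgame's implementation: the `FileEvent` record, the `group_by_process` helper, and the operation names passed in are all hypothetical placeholders chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEvent:
    """One file operation reported by a (hypothetical) filesystem event source."""
    pid: int   # process that invoked the operation
    op: str    # operation label; the labels used here are illustrative only
    path: str  # affected filepath


def group_by_process(events, modifying_ops):
    """Drop non-modifying operations and bucket the rest by process ID."""
    per_process = defaultdict(list)
    for event in events:
        if event.op in modifying_ops:
            per_process[event.pid].append(event)
    return dict(per_process)


# Assumed labels for operations that modify files (placeholders).
ASSUMED_MODIFYING_OPS = {"write", "rename"}

sample_events = [
    FileEvent(10, "read", "a.txt"),
    FileEvent(10, "write", "a.txt"),
    FileEvent(20, "rename", "b.doc"),
    FileEvent(10, "close", "a.txt"),
]
grouped = group_by_process(sample_events, ASSUMED_MODIFYING_OPS)
```

Keeping one bucket per process means that later scoring only has to look at the operations a given process actually performed, which is the per-process view the detection relies on.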
Each file that is modified by a particular process, along with the process itself, is assigned a score that reflects the level of anomalous characteristics discovered through analysis. The effect that each characteristic has on the anomaly score varies depending on how abnormal the characteristic is when compared to a particular baseline. This weighted score falls along a scale, with more anomalous attributes being weighted higher. The scale is derived from a combination of applied mathematics and domain expertise honed through thorough manual and automated analysis of ransomware samples.
Finally, the affected filepath provides several key data points that can be used to filter out and group file operations:
- File extension
- File name
Certain file directories might typically see higher volumes of data modifications than others. In these cases, we can opt for less rigorous analysis of these directories to avoid overtaxing our detector. There are also other directories that are only typically modified by one or more specific processes with a certain level of privileged access, so any processes modifying these directories that do not fit within the normal range would immediately appear to be anomalous.
Depending on the extension of the file that has been modified, a particular operation may be viewed as more or less relevant. For instance, consider a file with a known temporary file extension compared to a Word document file. Both files will be analyzed in the same manner, but a higher weight will be factored into any anomaly score calculations for operations relating to the Word document as opposed to those pertaining to the file with the temporary extension.
The number of unique file extensions, the number of files per unique file extension, and the specific file extensions that are modified all factor into the anomaly score for each process. A process that modifies several files across extensions that are known to be typically targeted by ransomware would generally be viewed as more anomalous than a process that modifies mostly temporary or helper files avoided by ransomware.
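One way to picture the extension-based factors above is a small scoring helper. The weights, the default weight, and the per-extension bonus below are invented for illustration; the actual values and formula used in the product are not described in this post.

```python
import ntpath  # Windows-style path handling, usable on any platform
from collections import Counter

# Hypothetical weights: higher means an extension is assumed to be a more
# typical ransomware target. These numbers are placeholders, not measurements.
EXTENSION_WEIGHTS = {
    ".docx": 3.0, ".xlsx": 3.0, ".pdf": 2.5, ".jpg": 2.0,
    ".tmp": 0.1, ".log": 0.2,
}
DEFAULT_WEIGHT = 1.0        # assumed weight for extensions not listed above
UNIQUE_EXTENSION_BONUS = 0.5  # assumed bonus per distinct extension touched


def extension_score(paths):
    """Combine files-per-extension, extension weight, and extension variety."""
    counts = Counter(ntpath.splitext(p)[1].lower() for p in paths)
    weighted = sum(
        EXTENSION_WEIGHTS.get(ext, DEFAULT_WEIGHT) * n
        for ext, n in counts.items()
    )
    return weighted + UNIQUE_EXTENSION_BONUS * len(counts)


targeted = [r"C:\Users\a\report.docx", r"C:\Users\a\budget.xlsx",
            r"C:\Users\a\scan.pdf", r"C:\Users\a\photo.jpg"]
helper = [r"C:\Temp\a.tmp", r"C:\Temp\b.tmp", r"C:\Temp\c.log", r"C:\Temp\d.tmp"]

targeted_score = extension_score(targeted)
helper_score = extension_score(helper)
```

Under these assumed weights, a process touching documents, spreadsheets, and images scores well above one that only rewrites temporary and log files, which mirrors the intuition in the text.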
While I could write a long-winded description of Shannon entropy and how it relates to file contents, I'd rather recommend this excellent write-up by Lance Mueller of ForensicKB instead. In short, entropy is a measure of the randomness of a specified set of data. The more random the data is, the higher its calculated entropy will be. Low-to-middle entropy data tends to contain only a subset of the 256 possible byte values (0x00 - 0xFF), while high entropy data contains byte values that span the entire range. From this we can presume that text-based file types (e.g. XML, HTML, TXT) will generally have lower entropy values than binary file types (e.g. EXE, DLL, MSI), among others.
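For readers who prefer code to prose, byte-level Shannon entropy is H = -Σ p(b) · log2 p(b), summed over each byte value b that occurs in the data, where p(b) is that value's relative frequency. The function below is a generic sketch of that formula; the name `shannon_entropy` is our own and does not refer to any Endgame API.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0  # convention chosen here: empty input carries no information
    total = len(data)
    counts = Counter(data)  # iterating over bytes yields integer byte values
    return -sum(
        (count / total) * math.log2(count / total)
        for count in counts.values()
    )


single_value = shannon_entropy(b"aaaaaaaa")       # only one byte value present
two_values = shannon_entropy(b"ab" * 10)          # two equally likely values
all_values = shannon_entropy(bytes(range(256)))   # every byte value exactly once
```

A buffer made of one repeated byte sits at the bottom of the scale, while a buffer that uses all 256 byte values equally often reaches the 8 bits / byte ceiling.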
So, how can entropy be used to help detect ransomware? For individual file types / extensions, we can devise expected entropy ranges based on manual inspection of the file specification as well as the average entropy calculated over a sufficiently large set of sample files. Since typical encryption algorithms produce high entropy output, we're interested in files that have been modified or created and now possess entropy values that exceed the predetermined range for their file type. Also, if the average entropy of the files being modified by a given process is higher than the average entropy expected for their file types, this can reasonably be assessed as an even stronger indicator of potentially anomalous activity than a single file exceeding its typical entropy range.
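The two checks described above, a per-file range check and a per-process average comparison, might look something like the sketch below. The expected values in the table are made-up placeholders (the post does not publish its ranges), and both helper names are hypothetical.

```python
# Hypothetical expectations in bits per byte: extension -> (expected mean,
# upper bound of the normal range). Placeholder numbers, not measured values.
EXPECTED_ENTROPY = {
    ".xml": (5.0, 6.0),
    ".html": (5.0, 6.0),
    ".txt": (4.5, 5.5),
}


def exceeds_expected(ext: str, entropy: float) -> bool:
    """True if a file's entropy is above the assumed upper bound for its type."""
    expected = EXPECTED_ENTROPY.get(ext.lower())
    if expected is None:
        return False  # unknown type: this check alone cannot judge it
    return entropy > expected[1]


def average_excess(observations) -> float:
    """Mean observed entropy minus mean expected entropy for known types.

    `observations` is a list of (extension, observed_entropy) pairs for the
    files one process has modified. A positive result means the process is
    writing data that is, on average, more random than its file types suggest.
    """
    known = [(ext.lower(), h) for ext, h in observations
             if ext.lower() in EXPECTED_ENTROPY]
    if not known:
        return 0.0
    observed_mean = sum(h for _, h in known) / len(known)
    expected_mean = sum(EXPECTED_ENTROPY[ext][0] for ext, _ in known) / len(known)
    return observed_mean - expected_mean


suspicious = average_excess([(".xml", 7.9), (".txt", 7.8), (".html", 7.95)])
ordinary = average_excess([(".xml", 5.2), (".txt", 4.4)])
```

The per-process average is useful because a single high-entropy file can have an innocent explanation, whereas a sustained shift across many files written by one process is much harder to explain away.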
Take for instance an XML file. XML files typically consist of text data that is represented by bytes within the ASCII range (0x00 - 0x7F), though they do not use the full extent of the characters within that range. As 8.0 bits / byte is the highest possible entropy value and 0.0 bits / byte is the lowest possible entropy value, we should generally expect XML files to fall somewhere within the middle range of possible entropy values due to their relatively limited usage of byte values.
Now, when that same file is run through an encryption algorithm (AES-256 in CBC mode in this instance), we can see that the contents become scrambled and incoherent, and no discernible words in English are readable.
It should come as no surprise, then, that the entropy value has significantly increased from 5.212 to 7.918. For a file type such as XML, the encrypted entropy value far exceeds its expected range and is much closer to being perfectly random than a normal file of its type should be, which would come across as very anomalous to our protection feature.
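The same effect can be reproduced in a few lines. Note the assumptions: the XML below is a synthetic sample rather than the file measured above, and `os.urandom` stands in for real AES-256-CBC output (the Python standard library has no AES, and good ciphertext is statistically close to random bytes). The exact figures will therefore differ from 5.212 and 7.918, and the random value changes from run to run.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of `data` in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(
        (c / total) * math.log2(c / total) for c in Counter(data).values()
    )


# Synthetic XML document (illustrative content only).
record = b'<user id="42"><name>Example</name><role>analyst</role></user>\n'
plain_xml = b"<users>\n" + record * 200 + b"</users>\n"

# Stand-in for the encrypted file: random bytes of the same length.
ciphertext_stand_in = os.urandom(len(plain_xml))

plain_entropy = shannon_entropy(plain_xml)
random_entropy = shannon_entropy(ciphertext_stand_in)

print(f"plain XML:           {plain_entropy:.3f} bits/byte")
print(f"ciphertext stand-in: {random_entropy:.3f} bits/byte")
```

Running this shows the plain XML landing in the middle of the scale and the stand-in for encrypted content sitting close to the 8.0 bits / byte maximum.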
Detecting high entropy is not a complete solution on its own. Compressed data also tends to possess high entropy, so the acceptable entropy range for certain file types needs to be adjusted, and this makes it difficult to tell the difference between compressed data and encrypted data. While Monte Carlo pi approximation, chi-square tests, and other calculations may help distinguish between encrypted and compressed content, the additional overhead introduced by these calculations may cause unacceptable slowdown when processing thousands of files per minute.
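As an illustration of one of those extra calculations, the sketch below computes a chi-square statistic for the byte histogram of a buffer against a uniform distribution. With 256 possible byte values there are 255 degrees of freedom, so truly random data is expected to score around 255; data that deviates from uniform scores higher. This is a generic textbook calculation, not a description of how Endgame evaluates files, and deciding where to draw the line between compressed and encrypted content is left open here.

```python
from collections import Counter


def chi_square_bytes(data: bytes) -> float:
    """Chi-square statistic of the byte histogram versus a uniform distribution."""
    if not data:
        raise ValueError("cannot compute a chi-square statistic for empty input")
    expected = len(data) / 256  # expected count per byte value if uniform
    counts = Counter(data)
    return sum(
        (counts.get(value, 0) - expected) ** 2 / expected
        for value in range(256)
    )


perfectly_uniform = chi_square_bytes(bytes(range(256)) * 4)  # every value 4 times
single_value = chi_square_bytes(b"\x00" * 256)               # one value only
```

Note that this requires a full pass over the data plus 256 arithmetic steps per file, which hints at the overhead concern raised above when thousands of files per minute are involved.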
Another issue is that some ransomware variants employ encryption routines that produce much lower entropy data than typical encryption algorithms, so the absence of high entropy does not establish that a file is benign. Other approaches are needed to deal with this case, and they are beyond the scope of this post.
Entropy is not the be-all, end-all measurement that can affirm whether or not a file contains encrypted content, but it provides a very useful window through which we can gather further evidence of processes that may be modifying files in an abnormal manner.
Each file operation will be subjected to further proprietary anomaly screening beyond filepath and entropy analysis, which results in modifications to the file anomaly score. The additional data that is yielded, when combined with the results of filepath and entropy analysis, provides an extensive overview of a given file operation and allows for immediate detection of anomalous behavior. Various approaches were tested and integrated into our scoring throughout our research, leading to the scoring system in the product today.
How It All Comes Together
As each file operation passes through our ransomware-detecting Rube Goldberg machine, all necessary data associated with the affected file will be extracted and analyzed, and a minimal amount of data summarizing the operation is maintained for posterity. The file's anomaly score will then be logged and added to the process anomaly score.
In the event that the process anomaly score meets or exceeds a predetermined threshold, the process will be suspended immediately. A pop-up dialog will alert the user to the suspended ransomware activity and provide them with the option to terminate or resume the offending process.
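The accumulate-then-suspend flow can be summarized in a short sketch. Everything here is an assumption made for illustration: the `ProcessScoreboard` class, the threshold value, and the `suspend` callback are hypothetical, and the real agent's suspension and user prompt logic are not shown.

```python
from collections import defaultdict


class ProcessScoreboard:
    """Accumulates per-file anomaly scores into per-process totals."""

    def __init__(self, threshold, suspend):
        self.threshold = threshold    # assumed, predetermined cut-off
        self.suspend = suspend        # hypothetical callback taking a PID
        self.scores = defaultdict(float)
        self.suspended = set()

    def record(self, pid, file_score):
        """Add one file's score; return True if the process is suspended."""
        if pid in self.suspended:
            return True  # already suspended, nothing more to accumulate
        self.scores[pid] += file_score
        if self.scores[pid] >= self.threshold:  # "meets or exceeds"
            self.suspended.add(pid)
            self.suspend(pid)
            return True
        return False


suspend_calls = []  # stand-in for actually suspending a process
board = ProcessScoreboard(threshold=10.0, suspend=suspend_calls.append)

results = [
    board.record(1, 4.0),
    board.record(1, 4.0),
    board.record(2, 1.0),
    board.record(1, 2.0),  # process 1 reaches the threshold here
]
```

Passing the suspension action in as a callback keeps the scoring logic separate from whatever mechanism pauses the process and notifies the user.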
Case Study: WannaCry
When WannaCry began grabbing headlines earlier in May, as we detailed, our research team immediately obtained the dropper and the core encryptor binary (tasksche.exe) in order to perform offline testing against our ransomware protection feature. The embedded video below walks through launching the encryptor on a virtual machine with ransomware protection enabled.
Endgame ransomware protection detects the presence of ransomware activity on the machine shortly after the encryptor is launched and before thousands or even hundreds of files can be encrypted. The speed with which the ransomware is detected and mitigated protects against critical data loss, expediting the return to business as usual. An alert containing detailed process activity data is also generated and sent back to the Endgame sensor management platform, allowing for further triage of the ransomware and the workflow which resulted in it being invoked on the system. In addition to MalwareScore™, Endgame ransomware protection serves as another line of defense against extensive critical data loss caused by ransomware such as WannaCry.
As long as ransomware remains a profitable criminal venture, attackers will continue to pursue new means to compromise networks and deploy ransomware. Just like other forms of malware, ransomware can be stopped at various points along the attack chain. However, given the customization of techniques and the persistence of targeted attackers, it is essential to be able to provide an additional line of defense. Endgame’s ransomware protection provides this, integrating detection techniques based on filepaths, entropy, and our own proprietary algorithms to protect against the broad range of ransomware in the wild today. As the WannaCry example demonstrates, our ransomware protection is effective at stopping not only well-known but also emergent strains of ransomware.