
How Endgame Protects Against Phishing From Macro-Enabled Documents

Editor’s Note: Elastic joined forces with Endgame in October 2019, and has migrated some of the Endgame blog content to the Elastic website. See Elastic Security to learn more about our integrated security solutions.

Phishing continues to be one of the most effective methods of compromise according to Verizon’s Data Breach Investigations Report. Adversaries often use crafted documents containing malicious macros and a deceptive lure to achieve initial access to target users and networks. These macro-based attacks remain difficult to stop for several reasons. Adversaries are becoming more clever with their phishing schemes, and mail filtering will never stop a determined adversary from delivering payloads to inboxes. In addition, human nature and a lot of history tell us that users will open and interact with malicious attachments. Finally, security products do a poor job of providing the necessary safety net of detection and prevention without prior knowledge of the specific attack.

Today, we are introducing MalwareScore for macros - a machine learning-based detector for malicious macro-enabled Microsoft Office documents - to protect against phishing designed to gain a foothold on a targeted system. This new capability was just released into VirusTotal and prevents compromise before malicious macro documents are even opened. Endgame already provides extensive signatureless protection against this class of attack, including prevention against documents exploiting a vulnerability in rendering software like Adobe Reader or Microsoft Word, inline protections against non-macro attacks, tradecraft analytics on suspicious behaviors of commonly targeted software like Office, and our full suite of protections against payloads delivered via the initial macro-based attack. With the addition of MalwareScore for macros, Endgame provides unparalleled protection against phishing campaigns with macro-enabled documents that seek to gain access to endpoints.

World, Meet MalwareScore for Macros

Endgame has already proven excellence in known and unknown malware detection with MalwareScore for Windows PE files and Mac executable files. To counter malicious macros, Endgame Research built MalwareScore for macros, a static machine-learning driven malware protection for malicious Office macros. This high-efficacy macro malware classifier was released today in VirusTotal and will soon be available to Endgame customers in our 3.0 product release.

In creating MalwareScore for macros, we applied many lessons learned from building and maintaining MalwareScore for Windows and Mac. However, creating a classifier for macros has its own quirks and challenges to overcome. There are multiple file types to consider, unique issues related to file similarity, totally new feature engineering requirements, difficulty in gathering large benign and malicious datasets, and difficulty generating high-quality labels on the training and test data. These are just a few of the challenges we encountered. Before addressing how we overcame each of these challenges in creating MalwareScore for macros, let’s take a quick look at why this protection is so essential in the first place.

Macros Gone Bad

Visual Basic for Applications (VBA) has been used in Office documents since it was introduced in Excel in 1993. One of the first widespread macro viruses, Melissa, appeared in 1999 and forced several tech giants to shut down their email systems to prevent the virus from spreading. Since then, malicious Word and Excel documents with seemingly important information have been flowing freely to inboxes everywhere, with cleverly constructed content encouraging the user to click “Enable Content”. The most effective of today’s macro-enabled attacks are not the easy-to-spot scam emails of previous eras, but are instead extremely sophisticated. To deliver both targeted and widespread phishing attacks, they leverage everything from modified real documents and the vast wealth of personal information available online to blurry text and fake mentions of unlocking encryption.

Email remains the most frequent delivery mechanism for phishing attacks, but social media is also an increasingly popular attack vector. Groups such as Iran’s Cobalt Gypsy/OilRig target individuals at strategic organizations, connecting with them via social media and eventually convincing them to download malicious macro-enabled documents onto their corporate networks. Earlier this year, the Pyeongchang Winter Olympics served as a decoy to target organizations with a macro-based phishing campaign, hoping the malicious document would enable compromise and access to corporate information. APT28 and other aggressive Russian actors have consistently used macro-enabled documents to gain access to their highest value targets in the US and abroad.

We could cite dozens or hundreds of additional examples, but the campaigns share common traits. The weaponized macro-based documents often evade detection because they are multi-stage, take advantage of legitimate functionality within Windows and the Office suite, and leverage credible-looking user prompts for execution. Given how frequently the attack documents change and how legitimate these attacks appear, a machine learning-based classification approach can drastically improve prevention rates when modeled carefully and robustly. However, this is not a trivial task.

Challenges with Creating A Macro-Based Classifier

When creating a macro-based classifier, there is a range of unique considerations. The many years of evolving Office versions that the classifier must support, several distinct file types, and the absence of trustworthy labels on samples, especially malicious documents newly seen in the wild, are some of the special challenges to overcome. There are also the typical considerations of guaranteeing high detection efficacy, negligible false positives, and performance at scale and speed. I’ll address four of the most important challenges we solved when creating this new capability: 1) Parsing macros effectively; 2) Feature engineering; 3) Lack of solid labels; and 4) Identifying similar samples across different documents.


Parsing Macros Effectively

Parsing macro-enabled Office docs is an exercise in enumeration and iteration. Not only are there multiple file types to deal with, such as Word documents, Excel spreadsheets, and other document types, there are also multiple Office versions that have different ideas of file structure. Pre-2007 Office versions use a binary file format, whereas Office 2007 and later use an XML-based file format, which is effectively a zip file that houses the contents.

By checking for specific combinations of byte strings in the file, we can determine the type and format of the file and thus the location of relevant code within the file. With that in hand, the OLE (Object Linking and Embedding) Streams must be parsed to get the full account of macro text. OLE Streams in documents are analogous to an internal filesystem, and a document can contain anywhere from very few streams to many thousands. Parsing these streams lets us determine which ones contain macro code and collect that code as text for analysis.
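As a rough illustration of the byte-string check described above, the following Python sketch distinguishes the two container formats by their well-known leading signatures and, for the zip-based format, pulls out any embedded `vbaProject.bin` part (itself an OLE container). The function names are hypothetical and this is not Endgame's parser; a real implementation would go on to walk the OLE streams and decompress the VBA source.

```python
import io
import zipfile

OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file signature
ZIP_MAGIC = b"PK\x03\x04"                      # zip local file header signature


def sniff_office_format(data: bytes) -> str:
    """Classify raw bytes as 'ole' (binary Office), 'ooxml' (zip-based), or 'unknown'.

    Simplification: any valid zip is reported as 'ooxml'; a stricter check
    would also look for Office-specific parts inside the archive.
    """
    if data.startswith(OLE_MAGIC):
        return "ole"
    if data.startswith(ZIP_MAGIC) and zipfile.is_zipfile(io.BytesIO(data)):
        return "ooxml"
    return "unknown"


def find_vba_containers(data: bytes) -> list[bytes]:
    """Return the OLE blobs that may hold macro streams."""
    kind = sniff_office_format(data)
    if kind == "ole":
        # The whole file is the OLE container.
        return [data]
    if kind == "ooxml":
        # Macro projects are stored as vbaProject.bin parts inside the zip.
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            return [
                archive.read(name)
                for name in archive.namelist()
                if name.lower().endswith("vbaproject.bin")
            ]
    return []
```

Each blob returned by `find_vba_containers` would then be handed to an OLE stream parser to recover the macro text.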

Finally, once all code streams (see Figure 1 below) are parsed, the text segments are passed to our feature generation process, described next.

Figure 1: Office VBA File Format Structure (Source: Microsoft)


Feature Engineering

As with most applied machine learning problems, feature engineering for malicious macro classification is one of the most important and impactful, and thus guarded, steps in the model creation process. Feature engineering means converting raw input, in this case the code streams we parsed in the previous step, into an array of numbers which our chosen modeling software can use for building a model.

For this capability, our feature engineering focused on analysis of code streams, driven by close collaboration between reverse engineers, threat researchers, and data scientists. Some of our features depend on counting reserved keywords such as “connect” and “thisdocument”. Others collect string metrics, perhaps to detect obfuscation or encoding, while others conduct more involved textual analysis. All in all, we generate hundreds of features per sample for analysis. The final feature set is the end product of months of iteration, experimentation, and testing to ensure our desired levels of efficacy.
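To make the idea of turning macro text into numbers concrete, here is a minimal Python sketch of the two feature families mentioned above: keyword counts and string metrics. Only “connect” and “thisdocument” come from this post; the other keywords, the entropy measure, and the function names are illustrative assumptions, and the real feature set is far larger and not public.

```python
import math
import re
from collections import Counter

# "connect" and "thisdocument" are named in the post; the rest are illustrative guesses.
KEYWORDS = ("connect", "thisdocument", "shell", "createobject", "autoopen")


def shannon_entropy(text: str) -> float:
    """Bits per character; high values can hint at encoded or obfuscated content."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def macro_features(code: str) -> list[float]:
    """Convert one macro code stream into a fixed-length numeric vector."""
    tokens = re.findall(r"[a-z_][a-z0-9_]*", code.lower())
    token_counts = Counter(tokens)
    # Simplified string-literal match; ignores VBA's doubled-quote escaping.
    string_literals = re.findall(r'"([^"]*)"', code)
    longest_literal = max((len(s) for s in string_literals), default=0)

    features = [float(token_counts[keyword]) for keyword in KEYWORDS]
    features += [
        float(len(code)),             # overall size of the stream
        float(len(string_literals)),  # number of string literals
        float(longest_literal),       # very long literals often carry payloads
        shannon_entropy(code),        # character-level entropy of the stream
    ]
    return features
```

Because every sample yields a vector of the same length and ordering, the output can be stacked into a matrix and passed to whatever modeling library is in use.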


Lack of Solid Labels

Supervised classifiers require large, reliably labeled datasets. This is a significant challenge for any machine learning problem in security, but good labels for macros are especially hard to come by. Having no AV products call a file bad is not a stellar indicator of non-maliciousness for any file in security, and that’s an issue we’ve dealt with successfully with past classifiers. In the realm of macro-enabled documents, this issue is especially challenging.

To increase certainty in a label, the industry often relies on internal, and sometimes external, crowdsourcing to generate signatures and compile blacklists. We sought to build upon that idea by developing a framework we dubbed Active Labeling to make it quick and easy for Endgame reverse engineers to provide a label for a given sample, and have that feed back into the training pipeline.

First, we generated the list of samples that would make the “biggest” impact on classification performance, specifically targeting samples that have significant uncertainty or that can most influence the decision boundary of our classifier. These samples are often scored near the good/bad threshold for a given model (e.g. 0.49-0.51). Next, we automated the extraction of the macro, IOCs, and any other metadata that could aid in labeling a sample, and displayed them to the analyst in a web UI. This provides an intuitive interface for the analysts to grab a sample and make a judgement as quickly as possible. These “human labeled” samples are fed back into our machine learning training to further improve performance. Active Labeling allows us not only to detect troublesome samples, but also to efficiently enhance and refine our future machine learning models to better predict new and unknown samples. Our classifier would not have been shippable without significant effort in this area.
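The selection step resembles what the active learning literature calls uncertainty sampling. The Python sketch below ranks samples by how close their model score sits to the decision threshold and returns the top few for analyst review. The function name, the 0.5 threshold, and the `budget` parameter are assumptions for illustration; the post does not describe how Active Labeling actually ranks samples.

```python
def select_for_labeling(
    scores: dict[str, float], threshold: float = 0.5, budget: int = 3
) -> list[str]:
    """Pick the sample IDs whose scores are closest to the decision threshold.

    scores maps a sample ID to the model's maliciousness score in [0, 1].
    Ties are broken by sample ID so the ordering is deterministic.
    """
    ranked = sorted(
        scores,
        key=lambda sample_id: (abs(scores[sample_id] - threshold), sample_id),
    )
    return ranked[:budget]


# Example: the two most ambiguous samples are queued for an analyst.
queue = select_for_labeling(
    {"a": 0.02, "b": 0.495, "c": 0.52, "d": 0.97, "e": 0.60}, budget=2
)
print(queue)  # ['b', 'c']
```

Confidently scored samples such as `a` and `d` are skipped, since a human label for them is unlikely to move the decision boundary much.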


Identifying Similar Samples Across Different Documents

One of the few things everyone (mostly) agrees about in security is how to identify a file. The file hash, such as SHA-256, uniquely identifies each malware sample. With PE or Mach-O executable malware, because of polymorphism and code modifications over time, looking for a hash you already know about is an imperfect method for finding that malware in the future. Although it is not a great solution for robust future detections, a hash is a very useful, and in fact industry-standard, property to key off of when it comes to whitelisting, blacklisting, and similar actions. We need to think a little differently about macro-enabled documents.

Just like with executables, the hash of a document can be used to find that exact same file in a network. However, if someone changes even one cell in the fake spreadsheet, the entire file hash changes while the malicious macro, the part we are most concerned about, remains unaffected. We should instead look at the macro itself, not the phishing content, as our anchor for sameness.

We’ve implemented an idea similar to ImpHash which we internally call MacroHash. Instead of hashing the entire file, we perform some light sanitization on the OLE Streams, order them, and hash them in aggregate. This way we uniquely fingerprint the same combination of macros across multiple host Office documents. The fingerprint is unaffected by changes in the file contents, and thus in the file hash, when we seek identical samples in our training set or provide necessary customer-facing features like whitelisting.
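The sanitize, order, and hash steps could look roughly like the Python sketch below. The normalization rules (trimming whitespace, dropping blank lines, lowercasing) and the choice of SHA-256 are assumptions made for illustration; the post does not specify what MacroHash's sanitization does or which digest it uses.

```python
import hashlib


def normalize_stream(code: str) -> str:
    """Light cleanup so cosmetic edits do not change the fingerprint.

    Lowercasing is a simplification: it also folds case inside string literals.
    """
    lines = (line.strip().lower() for line in code.splitlines())
    return "\n".join(line for line in lines if line)


def macro_hash(streams: list[str]) -> str:
    """Fingerprint a set of macro code streams, independent of stream order."""
    normalized = sorted(normalize_stream(stream) for stream in streams)
    digest = hashlib.sha256()
    for stream in normalized:
        encoded = stream.encode("utf-8")
        # Length prefix keeps ["ab", "c"] distinct from ["a", "bc"].
        digest.update(len(encoded).to_bytes(8, "big"))
        digest.update(encoded)
    return digest.hexdigest()
```

Two documents with different lure content but the same macro streams produce the same value, which is the property needed for deduplicating training data or whitelisting a known-good macro.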


Conclusion

MalwareScore for macros is now live in VirusTotal! Macro-enabled phishing attacks aren’t going away anytime soon. They continue to be the easiest way into many target networks, and defenses have been woefully inadequate. Creating MalwareScore for macros was truly a collaborative process across Endgame requiring significant cross-functional innovation. We’re very excited to share it with the wider security community through inclusion in VirusTotal. The description of challenges we faced when building MalwareScore for macros furthers our commitment to demystifying security and provides transparency to our current and future customers about how Endgame takes features from idea to product.