Classification and Identification of Malware

So far in this series, we have covered various malware analysis topics, from an introduction to malware analysis through its different methods, with a particular focus on static analysis up to this point. In this article, I will take you through malware classification and identification, and through the creation of Yara rules to help you identify malware.

To distinguish between the different types of malware, it is important to classify them. Classification also gives a better understanding of how each type can infect an organization’s computers and personal devices.

What is malware classification?

An organization faces many different kinds of malware attacks, so it is important for its security engineers to classify the malware that attackers commonly use against the organization and to protect against it. Classification also eases the analysis process: once some information about a sample has been gathered during analysis, the team can quickly identify and classify the malware and stop it. It also enables the security team to prioritize incidents effectively as they arise.

The malware analysis phase of incident response mainly involves identifying and understanding the type of malware detected, and its results are used as input for malware classification. A good analysis result includes a set of Indicators of Compromise (IoCs) and detailed information such as the characteristics, propagation method and behavior of the malware.
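To make this more concrete, here is a minimal Python sketch of how an analysis result might be held as a data structure, plus one simple way to compare two results by the IoCs they share. The class name, the field names and the use of Jaccard similarity are my own illustrative assumptions, not a standard report format or an established classification method. The IoC values are made up (documentation-only domain and IP ranges).

```python
from dataclasses import dataclass, field


@dataclass
class AnalysisResult:
    """Hypothetical record of one sample's analysis output (field names are illustrative)."""

    sample_name: str
    iocs: set[str] = field(default_factory=set)  # e.g. domains, IPs, mutex names, file hashes
    characteristics: list[str] = field(default_factory=list)
    propagation_method: str = "unknown"
    behavior: list[str] = field(default_factory=list)


def ioc_overlap(a: AnalysisResult, b: AnalysisResult) -> float:
    """Jaccard similarity of the two IoC sets: shared IoCs / all distinct IoCs (0.0 to 1.0)."""
    union = a.iocs | b.iocs
    if not union:
        return 0.0  # neither sample has any IoCs, so there is nothing to compare
    return len(a.iocs & b.iocs) / len(union)


# Usage with made-up IoCs.
known = AnalysisResult(
    "known_sample",
    iocs={"evil.example.com", "192.0.2.10", "Global\\Mutex01"},
    propagation_method="phishing attachment",
)
new = AnalysisResult(
    "new_sample",
    iocs={"evil.example.com", "192.0.2.10", "198.51.100.7"},
)
print(f"IoC overlap: {ioc_overlap(known, new):.2f}")
```

Here the two samples share 2 of 4 distinct IoCs, so the overlap is 0.50. A high overlap only hints that two samples may be related; it does not replace the fuller analysis described above.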

In practice, classifying malware is hard and depends heavily on the goal the security team wants to achieve. This is because real-world malware often uses a wide range of protection methods, propagation methods and target distributions. Additionally, malware families share numerous similarities, with only minor modifications between variants, which causes confusion during classification.

Many of us may think that classifying malware by its unique hash is a good idea. In my view it isn’t, for the following reasons:

  • A cryptographic hash only identifies data that stays exactly the same. If even a single byte of the file changes, for example because one line of code was edited, the hash changes completely.
  • Attackers know that malware and forensic analysts rely on hash-based identification and classification, so they change the contents of their samples to evade it.

Note that an attacker does not have to rewrite the whole malware sample: changing even a small part of it changes the hash completely, while its functionality can stay exactly the same.
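The short Python sketch below illustrates this with the standard hashlib module: two stand-in "samples" that differ in a single byte produce completely different SHA-256 digests. The byte strings are made-up placeholders, not real malware.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def differing_hex_chars(h1: str, h2: str) -> int:
    """Count the positions at which two equal-length hex digests differ."""
    return sum(c1 != c2 for c1, c2 in zip(h1, h2))


# Placeholder "samples" that differ only in their last byte ("1" vs "2").
original = b"MZ stand-in for a malware sample, build 1"
modified = b"MZ stand-in for a malware sample, build 2"

h_orig = sha256_hex(original)
h_mod = sha256_hex(modified)
print(h_orig)
print(h_mod)
print("differing hex characters:", differing_hex_chars(h_orig, h_mod), "of", len(h_orig))
```

You would typically see most of the 64 hex characters change, with no visible relationship between the two digests, which is exactly why a hash cannot tell you that two files are "almost the same".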

For example, attackers commonly insert random strings into a sample to change its hash and avoid hash-based detection. Malware analysts call these garbage strings. Attackers also use them to waste analysts’ time during malware analysis, since one of their main aims is to hide the sample’s main functionality.
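As a rough illustration, the Python sketch below appends random garbage to the same stand-in payload three times. Each variant gets a different hash, yet a fixed string inside the payload is still present in all of them, which is why matching on content holds up better than matching on hashes. Appending garbage at the end is a simplification (real samples may have junk inserted anywhere), and the payload and signature values are hypothetical.

```python
import hashlib
import secrets

# Hypothetical stand-in for a sample's real content (not actual malware).
CORE_PAYLOAD = b"MZ|stand-in payload|connect_to_c2|encrypt_files|"
# A string that stays the same in every variant, so it can serve as a signature.
SIGNATURE = b"connect_to_c2"


def add_garbage(payload: bytes, n_bytes: int = 32) -> bytes:
    """Return payload with random hex 'garbage' appended; the original bytes are kept intact."""
    garbage = secrets.token_hex(n_bytes).encode("ascii")
    return payload + garbage


variants = [add_garbage(CORE_PAYLOAD) for _ in range(3)]
hashes = {hashlib.sha256(v).hexdigest() for v in variants}

print("distinct hashes:", len(hashes))
print("signature present in every variant:", all(SIGNATURE in v for v in variants))
```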

For these reasons, hash-based detection and identification are unreliable and should not be relied on for accurate malware classification or identification. This mostly affects malware samples that have been collected at random from the internet.

An example of a tool that helps in malware identification and classification is YARA.

Yara is a tool that works by matching patterns across various malware samples for the purpose of malware identification and classification. Some of the things that Yara can do include:

  • Defining rules that describe particular signatures, which can then be used to detect similar infections in the future.
  • Signature-based identification of files that match those rules.
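To show the idea behind pattern matching, here is a toy, Yara-inspired matcher in plain Python. It is not the Yara engine and not the yara-python API; the rule name, the patterns and the min_matches parameter are hypothetical, with min_matches=2 playing roughly the role of a Yara condition such as "2 of them".

```python
from dataclasses import dataclass


@dataclass
class SimpleRule:
    """A toy, Yara-inspired rule: named byte patterns plus a minimum match count.

    This is NOT the Yara engine; it only illustrates matching known patterns
    across many samples.
    """

    name: str
    patterns: dict[str, bytes]
    min_matches: int  # how many patterns must be present for the rule to fire

    def matched_patterns(self, data: bytes) -> list[str]:
        """Return the identifiers of the patterns found anywhere in data."""
        return [ident for ident, pat in self.patterns.items() if pat in data]

    def matches(self, data: bytes) -> bool:
        """True if at least min_matches patterns occur in data."""
        return len(self.matched_patterns(data)) >= self.min_matches


rule = SimpleRule(
    name="Hypothetical_Family_A",
    patterns={
        "$c2": b"connect_to_c2",
        "$ransom": b"encrypt_files",
        "$note": b"README_RESTORE.txt",
    },
    min_matches=2,  # roughly like Yara's "2 of them"
)

# Made-up sample contents: two variants of the same "family" and one benign file.
samples = {
    "variant_1": b"MZ|connect_to_c2|encrypt_files|garbage-1f3a",
    "variant_2": b"MZ|encrypt_files|README_RESTORE.txt|garbage-9b7c",
    "benign": b"MZ|hello world|print_report",
}

for name, data in samples.items():
    print(name, rule.matches(data), rule.matched_patterns(data))
```

Both variants are caught even though their garbage suffixes (and therefore their hashes) differ, while the benign sample matches nothing.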

To use Yara effectively, you need to create Yara rules. These rules identify malware based on the specific strings or binary patterns you define.

Basic Structure of Yara Rules

For an introduction to Yara, you can look at my article Threat Hunting with Yara. A Yara rule is made up of three main sections:

Meta: describes the rule and the malware it targets. For example, you can include your name in the author field, the creation date, a description, the file hashes of the sample and the name of the malware, among others.

Strings: this section lists the strings found in the malware, which can include bytes from the file’s PE header. These strings are the key features the rule uses to identify the malware.

Condition: this is the logic of the rule, where we state when the rule should match based on the strings defined in the Strings section, as shown in the screenshot above.
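Putting the three sections together, the Python sketch below renders a rule's meta, strings and condition into Yara rule text. The helper names, the rule name and the example values are hypothetical. Text strings are written in double quotes and byte strings as Yara hex strings such as { 4D 5A } (the "MZ" magic at the start of a PE file). Treat the generated text as a starting point and check it with Yara itself before use.

```python
def quote_text(text: str) -> str:
    """Quote a text value for Yara, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def hex_string(data: bytes) -> str:
    """Render bytes as a Yara hex string, e.g. b"MZ" -> { 4D 5A }."""
    return "{ " + " ".join(f"{byte:02X}" for byte in data) + " }"


def build_yara_rule(name: str, meta: dict, strings: dict, condition: str) -> str:
    """Assemble meta, strings and condition sections into Yara rule text."""
    lines = [f"rule {name}", "{"]
    if meta:
        lines.append("    meta:")
        for key, value in meta.items():
            if isinstance(value, bool):  # check bool before int: bool is a subclass of int
                rendered = "true" if value else "false"
            elif isinstance(value, int):
                rendered = str(value)
            else:
                rendered = quote_text(str(value))
            lines.append(f"        {key} = {rendered}")
    if strings:
        lines.append("    strings:")
        for ident, value in strings.items():
            # bytes become hex strings; str values become quoted text strings
            rendered = hex_string(value) if isinstance(value, bytes) else quote_text(value)
            lines.append(f"        ${ident} = {rendered}")
    lines += ["    condition:", f"        {condition}", "}"]
    return "\n".join(lines)


rule_text = build_yara_rule(
    name="Hypothetical_Family_A",
    meta={"author": "Your Name", "description": "Example rule built from static analysis findings"},
    strings={"mz": b"MZ", "c2": "connect_to_c2", "ransom": "encrypt_files"},
    condition="$mz at 0 and any of ($c2, $ransom)",
)
print(rule_text)
```

The condition here requires the MZ header at offset 0 and at least one of the two text strings anywhere in the file.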

After performing static analysis, we have collected important information about the malware we are analyzing. Using that information, a threat or malware analyst can create a Yara rule that identifies future infections by that specific malware or by malware with similar patterns.
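One common first step from static analysis to a rule is pulling the printable strings out of the sample, similar to what the strings utility does, and choosing distinctive ones for the Strings section. Below is a minimal Python sketch that extracts ASCII and UTF-16LE ("wide") strings. The sample bytes are a made-up placeholder, and the minimum length of 4 is an arbitrary default.

```python
import re


def extract_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return printable ASCII strings, then UTF-16LE ("wide") strings, of at least min_len chars."""
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    # Wide strings: each printable ASCII char followed by a NUL byte, as in UTF-16LE.
    wide_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide_re.finditer(data)]
    return found


# Made-up bytes standing in for part of a PE file (not a real sample).
sample = (
    b"MZ\x90\x00\x03\x00\x00\x00"
    b"connect_to_c2\x00\x01\x02"
    b"http://198.51.100.7/gate.php\xff\xff"
    + "README_RESTORE.txt".encode("utf-16-le")
)

for s in extract_strings(sample):
    print(s)
```

Not every extracted string makes a good signature: generic strings such as common library names match many unrelated files, so analysts usually prefer strings that are specific to the sample.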

Article by Christine Wambiru. Wambiru is a final-year student pursuing a Bachelor of Science (Mathematics and Computer Science) at Machakos University. She is passionate about tech, especially cybersecurity. She is a vibrant member of SheHacks KE and a trainer who has trained on information gathering and social engineering. Engage with her on her socials: LinkedIn: Christine Wambiru, Twitter: @cwambiru

A community of Women in Cybersecurity from various backgrounds and counties across Kenya.