Practical Malware Analysis (.PDF)

Arslan Sabir
Published in System Weakness
4 min read · Feb 9, 2022


In this post we are going to analyze a malicious PDF file. We will use several tools for the analysis, mainly Didier Stevens' PDF tools.

Tools To Have

https://docs.remnux.org/install-distro/get-virtual-appliance
https://blog.didierstevens.com/didier-stevens-suite/

Test Case:

You are working as a malware analyst at ABC Company, and your EDR solution detects a malicious PDF file. You check the file's hash reputation on several threat intelligence platforms and see that many security vendors mark it as malicious. Even if it were not marked as malicious, you would still check the file manually for malicious URLs or scripts.
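Before looking up the reputation you need the file's hash. The following is a minimal sketch of computing a SHA-256 digest in Python; the function name `file_sha256` and the chunk size are assumptions made for illustration, and which hash type a given threat intelligence platform expects is something to confirm on that platform.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks.

    Reading in chunks avoids loading a large sample into memory at once.
    The file is only read as bytes here; it is never opened in a PDF viewer.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example (hypothetical file name):
#   print(file_sha256("badpdf.pdf"))
```

The resulting hex string is what you would paste into the search box of a threat intelligence platform.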

First, set up the sandbox in which you are going to examine the file. You can use REMnux, or you can install the Didier Stevens Suite on Kali, whichever you are comfortable with. Your mindset should be like ….

Now let's start the investigation. There are many tools you can use, but the ones mentioned below are, in my opinion, easy to use. If you are not comfortable with them, you can use whichever tools you prefer:

1. Pdfid

Identifies PDF object types and filters. Useful for triaging PDF documents.

pdfid.py filename

As you can see, it reports the PDF header, the number of objects and streams, and then counts for a list of keywords found in the PDF. It tells us the file has an /OpenAction keyword and multiple /JavaScript keywords, which is a strong hint that the file does something malicious when it is opened. We will now go ahead and use pdf-parser to examine these.
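To make the idea behind this triage step concrete, here is a rough sketch of keyword counting in Python. It is not pdfid's actual implementation: the keyword list is a small subset picked for this example, the function names are hypothetical, and decoding `#xx` escapes across the whole file (including binary streams) is a simplification.

```python
import re

# A small subset of names worth flagging during triage (chosen for this sketch).
SUSPICIOUS = ["/OpenAction", "/AA", "/JavaScript", "/JS", "/Launch", "/EmbeddedFile"]


def decode_name_escapes(data: bytes) -> bytes:
    """Undo #xx hex escapes in PDF names, e.g. /J#61vaScript -> /JavaScript."""
    return re.sub(rb"#([0-9A-Fa-f]{2})", lambda m: bytes([int(m.group(1), 16)]), data)


def count_keywords(data: bytes) -> dict:
    """Count occurrences of each suspicious name in the raw PDF bytes."""
    normalized = decode_name_escapes(data)
    counts = {}
    for keyword in SUSPICIOUS:
        # Require that no letter or digit follows, so /JS does not match /JSomething.
        pattern = re.escape(keyword.encode()) + rb"(?![A-Za-z0-9])"
        counts[keyword] = len(re.findall(pattern, normalized))
    return counts


# Example (hypothetical file name):
#   with open("badpdf.pdf", "rb") as fh:
#       print(count_keywords(fh.read()))
```

A non-zero count for /OpenAction together with /JavaScript or /JS is the combination to look at first, since it means a script may run automatically when the document is opened.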

2. Pdf-Parser

Parses, searches, and extracts data from PDF documents.

Below are some of the flags you will use with pdf-parser while analyzing.

pdf-parser.py --search openaction filename

This command searches the objects for the given word. As we can see, the /OpenAction keyword is in the catalog object, and it points to JavaScript that will run as soon as we open the file. If we did not already know that the file contains JavaScript, we know now.
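The search step can be pictured as splitting the file into indirect objects and checking each one for the word. The sketch below assumes objects appear in the plain `N G obj ... endobj` form and uses a regular expression rather than a real PDF parser, so it will miss objects stored inside object streams; the names `OBJ_RE` and `find_objects` are hypothetical.

```python
import re

# Matches "<number> <generation> obj ... endobj" and captures the object body.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)
# Used to drop stream contents so only the object's dictionary is searched.
STREAM_BODY_RE = re.compile(rb"stream\r?\n.*?endstream", re.DOTALL)


def find_objects(data: bytes, keyword: str) -> list:
    """Return the numbers of objects whose non-stream part contains keyword.

    The comparison is case-insensitive, so "openaction" finds /OpenAction.
    """
    needle = keyword.lower().encode()
    hits = []
    for match in OBJ_RE.finditer(data):
        obj_num = int(match.group(1))
        body = STREAM_BODY_RE.sub(b"", match.group(3))
        if needle in body.lower():
            hits.append(obj_num)
    return hits


# Example (hypothetical file name):
#   with open("badpdf.pdf", "rb") as fh:
#       print(find_objects(fh.read(), "openaction"))
```

The returned object numbers are the ones worth opening individually in the next steps.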

Now we will search for JavaScript in the file using the command below:

pdf-parser.py --search JavaScript badpdf.pdf

As you can see, JavaScript appears in 3 objects. Now we will look at object 13 using the command below:

pdf-parser.py --object 13 badpdf.pdf

As you can see, pdf-parser shows that object 13 is a stream object with a /FlateDecode filter applied to it. By default, pdf-parser will not decode the filter.

Now we will decode it using the command below:

pdf-parser.py --object 13 -f -w badpdf.pdf

  • -f passes the stream through its filters (decodes it)
  • -w shows the output in raw format
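/FlateDecode is zlib/deflate compression, so the decoding step can be reproduced with Python's standard `zlib` module. The sketch below assumes you already have the raw bytes of a single object that uses only /FlateDecode (no chained filters or predictors), and it locates the stream with a regular expression instead of honouring /Length; the names `STREAM_RE` and `inflate_stream` are hypothetical.

```python
import re
import zlib

# Captures everything between the "stream" keyword's end-of-line and "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_stream(obj_body: bytes) -> bytes:
    """Decompress the /FlateDecode stream found in one PDF object."""
    match = STREAM_RE.search(obj_body)
    if match is None:
        raise ValueError("no stream found in object")
    # decompressobj() stops at the end of the zlib data, so the end-of-line
    # bytes that precede "endstream" are left over instead of causing an error.
    return zlib.decompressobj().decompress(match.group(1))


# Example with a synthetic object (not taken from the analyzed sample):
script = b"app.alert('hello');"
packed = zlib.compress(script)
obj = (
    b"13 0 obj\n<< /Length " + str(len(packed)).encode()
    + b" /Filter /FlateDecode >>\nstream\n" + packed + b"\nendstream\nendobj\n"
)
decoded = inflate_stream(obj)
```

Here `decoded` holds the original script bytes, which is the same kind of output the -f flag produces for a /FlateDecode stream.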

This is the JavaScript in the file, which is going to run as soon as the file is opened. We could also analyze the JavaScript itself, but we will do that in another post.
