Diaphora - how to automate diffing?

Diaphora is a plugin for IDA Pro that aims to help with typical binary diffing (BinDiffing) tasks. It’s similar to other competing products and open source projects such as Zynamics BinDiff, DarunGrim or TurboDiff. However, it’s able to perform more actions than any of the previous IDA plugins or projects.

Diaphora is distributed as a compressed file with various files and folders inside it. The structure is similar to the following one:

1.diaphora.py: The main IDAPython plugin. It contains all the code for the heuristics, graph display, export interface, etc.

2.jkutils/kfuzzy.py: This is an unmodified version of the kfuzzy.py library, part of the DeepToad project, a tool and a library for performing fuzzy hashing of binary files. It’s included because fuzzy hashes of the pseudo-code are used by several of the implemented heuristics.

3.jkutils/factor.py: This is a modified version of a private malware clustering toolkit based on graph theory. This library offers the ability to factor numbers quickly in Python and to compare arrays of prime factors. Diaphora uses it to compare fuzzy AST hashes and call graph fuzzy hashes based on small-primes-products (an idea first coined and implemented by Thomas Dullien and Rolf Rolles, authors or former authors of the Zynamics BinDiff commercial product, in their paper “Graph-based comparison of Executable Objects”, Zynamics).

4.Pygments/: This directory contains an unmodified distribution of the Python pygments library, a “generic syntax highlighter suitable for use in code hosting, forums, wikis or other applications that need to prettify source code”.
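
The small-primes-product idea mentioned for jkutils/factor.py can be illustrated with a minimal sketch. This is not the code shipped in factor.py: the prime table, the function names and the similarity measure below are all assumptions made for illustration. Each feature kind (for example, an AST node type) is assigned a small prime, the hash of a function is the product of the primes of its features (so ordering does not matter), and two hashes are compared through their prime factors.

```python
from collections import Counter

# Hypothetical mapping from feature kinds (e.g. AST node types) to small primes.
PRIMES = {"if": 2, "loop": 3, "call": 5, "assign": 7, "return": 11}


def spp_hash(features):
    """Multiply the primes of all features; the product ignores ordering."""
    product = 1
    for feature in features:
        product *= PRIMES[feature]
    return product


def factorize(n):
    """Trial-division factorization; adequate for products of small primes."""
    factors = Counter()
    divisor = 2
    while divisor * divisor <= n:
        while n % divisor == 0:
            factors[divisor] += 1
            n //= divisor
        divisor += 1
    if n > 1:
        factors[n] += 1
    return factors


def factor_similarity(a, b):
    """Shared prime factors divided by the larger factor count (0.0 to 1.0)."""
    fa, fb = factorize(a), factorize(b)
    total = max(sum(fa.values()), sum(fb.values()))
    if total == 0:
        return 1.0  # both hashes are 1: no features on either side
    shared = sum((fa & fb).values())  # multiset intersection of the factors
    return shared / total


original = spp_hash(["if", "call", "call", "return"])
reordered = spp_hash(["call", "return", "if", "call"])
patched = spp_hash(["if", "call", "return"])
```

Here original and reordered produce the same hash because multiplication is commutative, while patched shares three of the four prime factors of original, which this hypothetical measure scores as 0.75.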

Download

git clone https://github.com/joxeankoret/diaphora.git

Running

To run Diaphora, simply unpack the compressed distribution file wherever you prefer and execute “diaphora.py” directly from the IDA Pro menu File → Script file. Once the script diaphora.py is executed, a dialog with the export and diffing options will be opened.

The first field is the path of the SQLite database file that will be created with all the information extracted from the current database. The second field is the other SQLite database to diff the current database against. If this field is left empty, Diaphora will just export the current database to SQLite format. If the second field is not empty, it will diff both databases. The other fields, the check-boxes, are explained below:
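
Because the export is an ordinary SQLite file, it can be inspected with Python's standard sqlite3 module. The sketch below deliberately assumes nothing about Diaphora's schema: it only lists whatever tables exist and counts their rows. The helper names are hypothetical.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in the database, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def count_rows(conn, table):
    """Count the rows of one table; the identifier is quoted defensively."""
    quoted = '"' + table.replace('"', '""') + '"'
    return conn.execute("SELECT COUNT(*) FROM " + quoted).fetchone()[0]


def summarize(conn):
    """Map every table name to its row count."""
    return {table: count_rows(conn, table) for table in list_tables(conn)}
```

To use it, open the exported file with sqlite3.connect(path), where path is whatever was entered in the first field of the dialog, and pass the connection to summarize.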

1.Use the decompiler if available. If the Hex-Rays decompiler is installed with IDA and the IDAPython bindings are available, Diaphora will use the decompiler to extract a lot of useful information that helps during the bindiffing process.

2.Export only non-IDA generated functions. Self-explanatory: only functions whose names were not autogenerated by IDA will be exported.

3.Do not export instructions and basic blocks. Export only function summaries. When exporting huge databases, it may help speed up operations. However, the diffing capabilities will be more limited.

4.Use probably unreliable methods. Diaphora uses many heuristics to try to match functions in both databases being compared. However, some heuristics are not really reliable or the ratio of similarity is very low. Check this box if you also want to see the likely unreliable matches Diaphora may find. Unreliable results are shown in their own list; they are not mixed with the “Best results” (results with a ratio of 1.00) or the “Partial results” (results with a ratio of 0.50 or higher).

5.Use slow heuristics. Some heuristics can be quite expensive and take a long time. For medium to big databases this option is disabled by default, and it is recommended to leave it unchecked unless the results of a run with the option disabled are not good enough. It will likely find better matches than the normal, faster heuristics, but it will take significantly longer.

6.Relaxed calculations of different ratios. Diaphora uses, by default, a rather aggressive method to calculate difference ratios between matches. It’s possible to relax that aggressiveness by checking this option. Under the hood, the function SequenceMatcher.quick_ratio is used when this option is unchecked and SequenceMatcher.real_quick_ratio when it is checked. Also, when the option is checked, Diaphora will use the difference ratio of the prime numbers calculated from the AST of the pseudo-code of the two functions, taking the highest ratio among the AST, assembly and pseudo-code comparisons.

7.Use experimental heuristics. It says it all: experimental heuristics are enabled only if this check-box is marked. Disabled by default as they are likely not useful.

8.Ignore automatically generated names. When comparing databases, this tells Diaphora to ignore, in the “Same name” heuristic, functions that share the same IDA autogenerated name (e.g., when both databases contain a function named sub_01020304 but they aren’t actually the same function). Used only when comparing.

9.Ignore all function names. Just disable the “Same name” heuristic. Used only when comparing.

10.Ignore small functions. Ignore functions with fewer than 6 assembly instructions. Used only when comparing.
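
The two SequenceMatcher methods named in option 6 come from Python's standard difflib module, and a small sketch shows why the relaxed setting yields higher ratios. Both methods are upper bounds on SequenceMatcher.ratio: quick_ratio counts the elements the two sequences have in common regardless of order, while real_quick_ratio looks only at the sequence lengths. The similarity helper and the sample instruction lists are assumptions for illustration; this is not Diaphora's own comparison code.

```python
from difflib import SequenceMatcher


def similarity(a, b, relaxed=False):
    """Pick the ratio function the way option 6 is described above."""
    matcher = SequenceMatcher(None, a, b)
    if relaxed:
        return matcher.real_quick_ratio()  # depends only on the lengths
    return matcher.quick_ratio()  # depends on the shared elements


# Two hypothetical functions that differ in a single instruction.
func_a = ["push ebp", "mov ebp, esp", "xor eax, eax", "pop ebp", "ret"]
func_b = ["push ebp", "mov ebp, esp", "mov eax, 1", "pop ebp", "ret"]

strict = similarity(func_a, func_b)
relaxed = similarity(func_a, func_b, relaxed=True)
```

In this example four of the five instructions are shared, so the strict value is 0.8, while the relaxed value is 1.0 because both lists have the same length.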

0x01 - setting the environment variables

$ export DIAPHORA_AUTO=1
$ export DIAPHORA_EXPORT_FILE=/path/to/your/new.idb
$ export DIAPHORA_USE_DECOMPILER=1 # optionally, use the decompiler
$ idaq -A -B -Sdiaphora.py your_binary

0x02 - run IDA in batch mode

The diffing results can be exported to a .sqlite database, and the export can be automated.

For console export, use the following environment variables:

DIAPHORA_AUTO_DIFF: Set to 1.

DIAPHORA_EXPORT_FILE: Where to put the exported data.

DIAPHORA_AUTO: Set to 1. Otherwise, it considers it's running from an interactive IDA.
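
To export many binaries in one go, the shell commands from section 0x01 can be wrapped in a small Python driver. The sketch below only mirrors those commands: the environment variable names and the idaq flags are taken from the text above, while the function names and the example paths are hypothetical. It assumes idaq is on the PATH and that diaphora.py can be found by IDA.

```python
import os
import subprocess


def build_export_job(binary, export_file, use_decompiler=False,
                     ida="idaq", script="diaphora.py", base_env=None):
    """Return the (command, environment) pair for one batch export."""
    env = dict(os.environ if base_env is None else base_env)
    env["DIAPHORA_AUTO"] = "1"  # tell Diaphora it is not interactive
    env["DIAPHORA_EXPORT_FILE"] = export_file
    if use_decompiler:
        env["DIAPHORA_USE_DECOMPILER"] = "1"  # optional, as in 0x01
    cmd = [ida, "-A", "-B", "-S" + script, binary]
    return cmd, env


def run_export(binary, export_file, **kwargs):
    """Run IDA in batch mode and return its exit code."""
    cmd, env = build_export_job(binary, export_file, **kwargs)
    return subprocess.call(cmd, env=env)


def export_all(jobs, **kwargs):
    """Export every (binary, export_file) pair; map binary to exit code."""
    return {binary: run_export(binary, out, **kwargs) for binary, out in jobs}
```

Keeping the command construction in build_export_job, separate from the call that launches IDA, makes it possible to check the generated command line and environment without having IDA installed.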

Harry Ren
