Slide presentation of the paper "EVIL: Exploiting Software via Natural Language", accepted at the 32nd International Symposium on Software Reliability Engineering (ISSRE 2021)
EVIL: Exploiting Software via Natural Language
1. EVIL: Exploiting Software Via Natural Language
Pietro Liguori⋆, Erfan Al-Hossami † , Vittorio Orbinato⋆,
Roberto Natella⋆, Samira Shaikh †, Domenico Cotroneo⋆, Bojan Cukic †
⋆ Università degli Studi di Napoli Federico II, Naples, Italy
† University of North Carolina at Charlotte, NC, USA
⋆{pietro.liguori, vittorio.orbinato, roberto.natella, cotroneo}@unina.it
†{ealhossa, samirashaikh, bcukic}@uncc.edu
The 32nd International Symposium on Software Reliability Engineering
4. Proposed Solution
An exploit writer needs to master programming and obfuscation techniques to develop a successful exploit.
We propose EVIL, an approach to automatically generate exploits in assembly/Python from their descriptions in natural language, using Neural Machine Translation (NMT) techniques.
Example intent: "Move /bin//sh into the ebx register"

push 0x68732f2f ; "//sh" in little-endian byte order
push 0x6e69622f ; "/bin" in little-endian byte order
mov ebx, esp    ; ebx now points to the string "/bin//sh"
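As a quick sanity check (an illustration, not from the slide), the two pushed constants really do spell "/bin//sh" once laid out in little-endian byte order on the stack. In Python:

import struct

# The stack grows downward, so 0x6e69622f (pushed second) sits at the
# lower address; little-endian packing reproduces the in-memory string.
data = struct.pack("<II", 0x6E69622F, 0x68732F2F)
assert data == b"/bin//sh"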
5. Contributions
• A substantive dataset containing exploits collected from shellcode databases, along with their descriptions in the English language
• A new approach that applies NMT techniques to automatically generate exploits from their descriptions in the English language
• An extensive experimental study to evaluate the feasibility of the proposed approach in generating real exploits
6. EVIL Approach
EVIL pipeline (from the figure):
1. Pre-processing: the user's natural-language intent goes through the Intent Parser, the Standardizer, and the Embedding step.
2. NMT models: Seq2Seq and CodeBERT, trained on the processed training data, translate the standardized intent into a standardized code snippet.
3. Post-processing: the De-standardizer and the Code Snippet Cleaning step restore the concrete values and clean up the generated code.

Example flow:
NL intent: "Perform xor between dl and 0xbb and jump to next_cycle if the result is zero"
Standardized intent: "Perform xor var0 and var1 and jump var2 if result zero"
Model output: xor var0, var1 \n jz var2
De-standardized output: xor dl, 0xbb \n jz next_cycle
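To make the standardization/de-standardization steps concrete, here is a minimal Python sketch of the idea (our illustration, not the authors' implementation; the placeholder scheme and the simplified regex are assumptions):

import re

# Simplified patterns for registers, hex immediates, and labels.
TOKEN_PATTERN = r"\b(?:e?[abcd][xlh]|esi|edi|esp|ebp|0x[0-9a-fA-F]+|\w+_cycle)\b"

def standardize(intent):
    # Replace concrete registers/values with var0, var1, ... and
    # remember the mapping for the de-standardization step.
    mapping = {}
    def repl(match):
        placeholder = f"var{len(mapping)}"
        mapping[placeholder] = match.group(0)
        return placeholder
    return re.sub(TOKEN_PATTERN, repl, intent), mapping

def destandardize(snippet, mapping):
    # Restore the concrete tokens in the generated code.
    for placeholder, token in mapping.items():
        snippet = snippet.replace(placeholder, token)
    return snippet

intent, mapping = standardize(
    "Perform xor between dl and 0xbb and jump to next_cycle if the result is zero")
print(intent)  # Perform xor between var0 and var1 and jump to var2 if the result is zero
print(destandardize("xor var0, var1\njz var2", mapping))  # xor dl, 0xbb / jz next_cycle (two lines)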
7. Data
Python (Encoder) Dataset: 15,540 intent-snippet pairs. It extends the Django dataset ("Learning to Generate Pseudo-Code from Source Code Using Statistical Machine Translation", Oda et al., ASE, 2015).

                       NL Intents   Python Code Snippets
Unique Lines           15,421       14,034
Unique Tokens          10,605       9,511
Avg. Tokens per Line   14.90        9.53

Assembly (Decoder) Dataset: 3,715 intent-snippet pairs. It extends the Shellcode_IA32 dataset ("Shellcode_IA32: A Dataset for Automatic Shellcode Generation", Liguori et al., NLP4Prog, 2021).

                       NL Intents   Assembly Code Snippets
Unique Lines           3,689        2,542
Unique Tokens          1,924        1,657
Avg. Tokens per Line   9.53         4.75
8. Test Set
The test sets cover 20 different exploits (i.e., they are not randomly split):
• 20 Python encoders
• 20 assembly decoders
The intent-snippet pairs in the test set are unique and are not included in the training/dev sets.
Detailed information on the test sets can be found in the appendix of the paper:
https://github.com/dessertlab/EVIL/tree/main/Appendix
Statistics   Python Encoders   Assembly Decoders
Average      23.4              26.4
Median       19                24
Min          11                17
Max          64                46
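Since the split is by exploit rather than random, leakage between the splits has to be checked explicitly. A minimal sketch of such a check, under the same (intent, snippet) pair assumption as above:

def assert_no_leakage(train_pairs, dev_pairs, test_pairs):
    # The test pairs must be unique and absent from the training/dev data.
    test_set = set(test_pairs)
    assert len(test_set) == len(test_pairs), "duplicate pairs in the test set"
    assert test_set.isdisjoint(train_pairs), "test pairs found in the training set"
    assert test_set.isdisjoint(dev_pairs), "test pairs found in the dev set"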
9. Automated Evaluation
Python Dataset

Model              BLEU-1 (%)   BLEU-2 (%)   BLEU-3 (%)   BLEU-4 (%)   ACC (%)
Seq2Seq no IP      72.34        62.55        56.88        52.2         31.2
Seq2Seq with IP    86.92        83.68        81.42        79.69        45.33
CodeBERT no IP     79.23        74.12        69.84        65.76        48.00
CodeBERT with IP   89.22        86.78        84.94        83.50        56.00

Assembly Dataset

Model              BLEU-1 (%)   BLEU-2 (%)   BLEU-3 (%)   BLEU-4 (%)   ACC (%)
Seq2Seq no IP      35.83        26.1         21.38        17.69        25.25
Seq2Seq with IP    88.38        86.37        85.51        85.05        40.98
CodeBERT no IP     33.42        28.39        25.38        22.72        45.9
CodeBERT with IP   88.21        85.53        84.05        82.99        45.9

(IP = Intent Parser)
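BLEU-1 through BLEU-4 and exact-match accuracy (ACC) can be computed as in this sketch using NLTK (an illustration; the paper's own evaluation code is in the GitHub repo):

from nltk.translate.bleu_score import corpus_bleu

def evaluate(references, hypotheses):
    # references, hypotheses: lists of token lists, one entry per test snippet
    refs = [[ref] for ref in references]  # corpus_bleu expects a list of reference lists
    scores = {f"BLEU-{n}": corpus_bleu(refs, hypotheses, weights=(1.0 / n,) * n)
              for n in range(1, 5)}
    # ACC: fraction of predictions that match the ground truth exactly
    scores["ACC"] = sum(h == r for h, r in zip(hypotheses, references)) / len(references)
    return scores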
10. Human Evaluation
Example (syntactically correct, but semantically incorrect):
NL Intent: res2 is the result of the bitwise and operation between res2 and val1
Predicted Snippet: res2 = val2 | val1

Python Dataset

Model              Syntactic Correctness (%)   Semantic Correctness (%)
Seq2Seq with IP    88.53                       50.13
CodeBERT with IP   93.60*                      67.73*

Assembly Dataset

Model              Syntactic Correctness (%)   Semantic Correctness (%)
Seq2Seq with IP    90.16                       56.36
CodeBERT with IP   87.05                       61.90*
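This example can be checked mechanically: the predicted snippet parses fine (syntactically correct) but computes something other than the described bitwise and (semantically incorrect). A small Python illustration with made-up values:

import ast

ast.parse("res2 = val2 | val1")  # no SyntaxError: syntactically correct

# Semantically, the intent asks for res2 & val1, not val2 | val1:
res2, val1, val2 = 0b1100, 0b1010, 0b0110
assert (res2 & val1) != (val2 | val1)  # 0b1000 vs 0b1110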
11. Whole Exploit Evaluation
$\text{Syntactic correctness}(\text{program } i) = n_{syn}^i / n_t^i$
$\text{Semantic correctness}(\text{program } i) = n_{sem}^i / n_t^i$

where:
• $n_{syn}^i$ is the number of automatically generated snippets of the i-th program that are syntactically correct
• $n_{sem}^i$ is the number of automatically generated snippets of the i-th program that are semantically correct
• $n_t^i$ is the total number of lines of the i-th program
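In code, the per-program metrics reduce to simple ratios. A minimal sketch, assuming per-line correctness labels from the human evaluation:

def program_correctness(lines):
    # lines: one (syntactically_ok, semantically_ok) boolean pair per
    # automatically generated line of the i-th program
    n_t = len(lines)
    n_syn = sum(syn for syn, _ in lines)
    n_sem = sum(sem for _, sem in lines)
    return n_syn / n_t, n_sem / n_t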
12. Qualitative Analysis
Python Dataset

NL Intent: Append 0xAA to the string encode
Ground Truth Snippet: encode += "0xAA"
Model Output: encode += "0x02x" %170

NL Intent: Convert the value of leader to hexadecimal, then slice it at index 3, convert it to an int16 and set its value to the variable sb
Ground Truth Snippet: sb = int(hex(leader)[3:], 16)
Model Output: sb = int(hex(leader)[3:], 32)

Assembly Dataset

NL Intent: Perform the xor between the current byte of the shellcode and the dl register
Ground Truth Snippet: xor byte [esi], dl
Model Output: xor byte [esi], dl

NL Intent: Zero out the ecx register and move 25 in the lower 8 bits of the register
Ground Truth Snippet: xor ecx, ecx \n mov cl, 25
Model Output: xor ecx, ecx \n mov al, 25
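In the second Python example, a single wrong token (base 32 instead of base 16) silently changes the computed value. A quick worked example with an illustrative leader value:

leader = 0x1234
print(hex(leader)[3:])           # '234' (slicing drops '0x' and the first digit)
print(int(hex(leader)[3:], 16))  # 564  -- ground-truth behavior
print(int(hex(leader)[3:], 32))  # 2148 -- model-output behavior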
13. Conclusion (1/2)
We presented EVIL, an approach that applies NMT techniques to automatically generate exploits for security assessment purposes.
We developed and released two datasets of real exploits, in Python and in assembly language, to enable neural network training and experimental evaluation.
We evaluated the feasibility of the approach, showing its ability to generate software exploits with high syntactic and semantic correctness.
14. Conclusion (2/2)
Future work includes:
• improving the post-processing to increase the accuracy of the program generation task
• developing a single engine that generates the encoding and decoding schemes at the same time
We invite readers to take a look at our GitHub repo, which contains the datasets and the code to reproduce the experiments in the paper:
https://github.com/dessertlab/EVIL