Lit Ctf 2022 Pwn Writeup - Part 2: Regex

A challenge with a new heap exploitation technique: House of Muney. In this challenge, all we need is: a leak, an arbitrary free, and only one malloc (with input) to control RIP.
And only applicable to binary without FULL RELRO protection.

Regex (1 solve)

regex
I didn’t manage to solve this challenge sadly :(
In the same manner of the last challenge, this is also another very restrictive binary with a more subtle bug, I couldn’t exploit the bug I found due to it using a very recent exploitation technique.
It also does not help that this is one of the more obscure technique that you most likely don’t know about.

Before we head into the actual challenge you should read more about House of Muney
How we are gonna solve it is the same idea as shown in the blog.

Understand the binary, and try to look for anything abnormal

int __cdecl __noreturn main(int argc, const char **argv, const char **envp)
{
  const char *v3; // rdi
  int v4; // [rsp+4h] [rbp-Ch]
  char v5; // [rsp+Bh] [rbp-5h]
  int v6; // [rsp+Ch] [rbp-4h]

  puts("Welcome to my regex pattern searching program!\n");
  puts("First you need to initialize the pattern.\n");
  v6 = -1;
  while ( v6 == -1 || v6 == 8 || v6 == 7 || v6 == 9 || v6 == 5 || v6 == 10 || v6 == 2 || v6 == 13 )
  {
    v6 = input_pattern();
    if ( v6 )
    {
      puts("Invalid expression, try again.\n");
      free(preg);
    }
  }
  puts("Pattern initialized.\n");
  divider();
  v3 = "Now you can match strings to your pattern.\n";
  puts("Now you can match strings to your pattern.\n");
  while ( 1 )
  {
    do
    {
      match_string(v3, argv);
      puts("Would you like to match another string? (y/N)");
      v5 = getchar();
      getchar();
      v3 = &byte_2018;
      puts(&byte_2018);
    }
    while ( v5 == 'y' );
    if ( v5 != 89 )
    {
      regfree(preg);
      divider();
      puts("I hope you liked our program!\n");
      puts("Before you go, could you leave a review?\n");
      puts("Input your review length:");
      v4 = insz();
      free(s);
      s = (char *)malloc(v4);
      puts("Input your review:");
      instr(v4);
      puts("Thanks!");
      _exit(0);
    }
  }
}

We first look at 2 input function: insz (input size) and instr (input string)

__int64 insz()
{
  int v1; // [rsp+Ch] [rbp-4h] BYREF

  __isoc99_scanf("%d", &v1);
  getchar();
  puts(&byte_2018);
  if ( v1 <= 0 || v1 > 196608 )
    invalid("size, out of bounds");
  return (unsigned int)v1;
}

We can malloc a very large size here smaller than 0x30000 bytes

int __fastcall instr(int a1)
{
  char *v1; // rbx

  fgets(s, a1 + 1, stdin);
  v1 = s;
  v1[strcspn(s, "\n")] = 0;
  return puts(&byte_2018);
}

Input string uses fgets thus makes it very safe.

__int64 input_pattern()
{
  int v1; // [rsp+Ch] [rbp-4h]

  puts("Input the size of your pattern:");
  v1 = insz();
  free(s);
  s = (char *)malloc(v1 + 1);
  puts("Input your pattern:");
  instr(v1);
  preg = (regex_t *)malloc(0x40uLL);
  return (unsigned int)regcomp(preg, s, 0);
}

Throughout the binary, global s is used for every input string operation, and will get freed before malloc again.
Also there is nothing that looks like vulnerability, it does simple job of input string and regex compile it.

int match_string()
{
  int v1; // [rsp+4h] [rbp-Ch] BYREF
  int sz; // [rsp+8h] [rbp-8h]
  unsigned int i; // [rsp+Ch] [rbp-4h]

  puts("Input the size of the string you'd like to match:");
  sz = insz();
  free(s);
  s = (char *)malloc(sz + 1);
  puts("Input the string you'd like to match:");
  instr(sz);
  puts("Input the number of subexpressions matches you'd like to list:");
  __isoc99_scanf("%d", &v1);
  getchar();
  puts(&byte_2018);
  if ( v1 < 0 || v1 > 4096 )
    invalid("# subexpressions, out of bounds");
  ++v1;
  free(pmatch);
  pmatch = (regmatch_t *)malloc(8LL * v1);
  regexec(preg, s, v1, pmatch, 0);
  puts("Matching complete.\n");
  puts("Here is the first match interval found:");
  printf("[%d, %d]\n", (unsigned int)pmatch->rm_so, (unsigned int)(pmatch->rm_eo - 1));
  puts(&byte_2018);
  puts("Here are the subexpressions found:");
  for ( i = 1; (int)i < v1; ++i )
    printf("%d: [%d, %d]\n", i, (unsigned int)pmatch[i].rm_so, (unsigned int)(pmatch[i].rm_eo - 1));
  return puts(&byte_2018);
}

match_string just simply input string to s, and then regexec it with number of subexpressions of our choice.
The pmatch array is also malloc with enough size, thus no overflow either.

if ( v5 != 89 )
    {
      regfree(preg);
      divider();
      puts("I hope you liked our program!\n");
      puts("Before you go, could you leave a review?\n");
      puts("Input your review length:");
      v4 = insz();
      free(s);
      s = (char *)malloc(v4);
      puts("Input your review:");
      instr(v4);
      puts("Thanks!");
      _exit(0);
    }

After we are done with matching string, the binary regfree(), malloc and gives us the last time to input.
There is a off-by-null vulnerability here, but due to the binary exiting right after it becomes not possible to exploit.
It happens because, it malloc(v4) but in instr we will fgets(v4+1) instead, so when fgets puts null byte at the end of the string, it will be out of bound.

After a quick review, that’s the only abnormal thing I found.
There is also a leak inside the match_string function, because the function print out everything inside pmatch array.
However there is situation when pmatch remains uninitialized after regexec thus it will print out stuffs on the heap, which means we can leak heap address and libc address

The most abnormal thing is the long condition for a GLIBC function

while ( v6 == -1 || v6 == 8 || v6 == 7 || v6 == 9 || v6 == 5 || v6 == 10 || v6 == 2 || v6 == 13 )
  {
    v6 = input_pattern();
    if ( v6 )
    {
      puts("Invalid expression, try again.\n");
      free(preg);
    }
  }

GLIBC function often returns 0 for success, so the while condition being that long is always a red-herring for bug.
With that condition there will be cases when regcomp() fails, preg gets freed, but the binary continues moving forward, leading to an UAF

The rabbit hole I felt into: Digging too deep into GLIBC source code for regexec()

The two times we gets to interact with the freed preg is either regexec or regfree at the end.

  • Regfree will give us the ability to arbitrary free, because regfree will have to free a lot of thing in struct re_pattern_buffer stored at preg, which with UAF we can modify the struct.
  • Regexec is a more complicated function, that try as I might I could not understand how our fake struct re_pattern_buffer could lead to any meaningful exploitation.

However due to the binary exiting right after regfree, I could not think of a way to exploit that. In a similar way I solved House of Cockarocha, I tried to find anything useful in regexec.

And that’s how I spent the majority of my time during the CTF. Obviously, I could not find anything useful :d

The newly discovered territory: The mmap chunk

We know that we have arbitrary free, but freeing a normal chunk does not help us much. Because we only get to malloc and input once then the binary exits.
Thus the solution is to free a fake mmap chunk.

Mmap chunk like the name suggests, is a chunk that was allocated through syscall mmap.
And when that chunk gets free, it gets munmap-ed and that memory section gets deleted out of existence.

  • So what if we free a fake mmap chunk? We can delete or munmap an entire memory section.

  • So what if we delete the read-only memory section of libc? We can then try to malloc a very large chunk which triggers malloc to mmap a new memory section, which might overlap with the read-only section and then we can write into that section.

  • So what if we can write into read-only memory section of libc? We can then write into .dynsym segment, which means we can control what function gets returned from _dl_run_time_resolve

That’s why it’s only applicable to binary without FULL RELRO protection because everything is resolved beforehand.

That was the abstract idea of House of Muney

The details of the technique

First we need to know what makes a valid mmap chunk so we can free it later on.

static void
munmap_chunk (mchunkptr p)
{
  size_t pagesize = GLRO (dl_pagesize);
  INTERNAL_SIZE_T size = chunksize (p);

  assert (chunk_is_mmapped (p));

  /* Do nothing if the chunk is a faked mmapped chunk in the dumped
     main arena.  We never free this memory.  */
  if (DUMPED_MAIN_ARENA_CHUNK (p))
    return;

  uintptr_t mem = (uintptr_t) chunk2mem (p);
  uintptr_t block = (uintptr_t) p - prev_size (p);
  size_t total_size = prev_size (p) + size;
  /* Unfortunately we have to do the compilers job by hand here.  Normally
     we would test BLOCK and TOTAL-SIZE separately for compliance with the
     page size.  But gcc does not recognize the optimization possibility
     (in the moment at least) so we combine the two values into one before
     the bit test.  */
  if (__glibc_unlikely ((block | total_size) & (pagesize - 1)) != 0
      || __glibc_unlikely (!powerof2 (mem & (pagesize - 1))))
    malloc_printerr ("munmap_chunk(): invalid pointer");

  atomic_decrement (&mp_.n_mmaps);
  atomic_add (&mp_.mmapped_mem, -total_size);

  /* If munmap failed the process virtual memory address space is in a
     bad shape.  Just leave the block hanging around, the process will
     terminate shortly anyway since not much can be done.  */
  __munmap ((char *) block, total_size);
}  

We have to bypass this check:

if (__glibc_unlikely ((block | total_size) & (pagesize - 1)) != 0
      || __glibc_unlikely (!powerof2 (mem & (pagesize - 1))))
    malloc_printerr ("munmap_chunk(): invalid pointer");

pagesize - 1 is often just 0xfff, so to bypass:

__glibc_unlikely ((block | total_size) & (pagesize - 1)) != 0

We just need block and toltal_size both begins with 0x000. Like this:
fake
And also the size of the chunk has to have second bit set (to show that this is a mmap chunk)
We just have to calculate the size a little bit so part of LIBC gets munmap-ed.

Now we just need to malloc a large chunk, so we will get a mmap chunk instead. I’m gonna reference the blog above:

“Very large chunks (mmap chunks) are assigned in a completely different way in order to prevent fragmentation with one-off extremely large chunks. The mmap_threshold is used in order to determine the largest chunk size that should go onto the standard section of the heap. In order to allocate an mmap chunk, the size has to be larger than the mmap_threshold.

When an mmap chunk is freed, the size of the chunk is checked with the previously largest freed mmap chunk (or the default size), which is known as the mmap_theshold. If the size being freed is larger, then the mmap_threshold is updated. For example, mmap_theshold is 0x100000. If we free a chunk of size 0x200000 then the mmap_theshold gets updated to 0x200000.

If the size of the chunk being allocated is NOT larger than the mmap_threshold value, then the chunk will be put into the normal heap section instead of getting a special mmap chunk.”

To get a mmap chunk we needs to malloc a size larger than mmap_threshold. But if we free a mmap chunk with size larger than mmap_threshold, the mmap_threshold gets updated. So maybe after update, our limited malloc might not be enough for a mmap chunk?
Luckily this is not a issue, because mmap_threshold has a MAX limit.

if (chunk_is_mmapped (p))                       /* release mmapped memory. */
    {
      /* See if the dynamic brk/mmap threshold needs adjusting.
	 Dumped fake mmapped chunks do not affect the threshold.  */
      if (!mp_.no_dyn_threshold
          && chunksize_nomask (p) > mp_.mmap_threshold
          && chunksize_nomask (p) <= DEFAULT_MMAP_THRESHOLD_MAX
	  && !DUMPED_MAIN_ARENA_CHUNK (p))
        {
          mp_.mmap_threshold = chunksize (p);
          mp_.trim_threshold = 2 * mp_.mmap_threshold;
          LIBC_PROBE (memory_mallopt_free_dyn_thresholds, 2,
                      mp_.mmap_threshold, mp_.trim_threshold);
        }
      munmap_chunk (p);
      return;
    }

If the chunksize is larger than DEFAULT_MMAP_THRESHOLD_MAX then mp_.mmap_threshold is not updated.

define DEFAULT_MMAP_THRESHOLD_MAX (4 * 1024 * 1024 * sizeof(long))

DEFAULT_MMAP_THRESHOLD_MAX is only 0x2000000 bytes which is smaller than the size we need to munmap LIBC region.
With some empirical testing, we can find that very often mmap chunk is placed right below LIBC when we malloc with size >= 0x21000
If I remember correctly when the size is larger than some threshold, it will be placed above LIBC and under LD.

Now we can just write into GLIBC read-only segment, we can just copy libc data completely and only change the part that is important.
In our case because _exit will gets resolved we only need to change _exit in .dynsym.

typedef struct {
        Elf64_Word      st_name;
        unsigned char   st_info;
        unsigned char   st_other;
        Elf64_Half      st_shndx;
        Elf64_Addr      st_value;
        Elf64_Xword     st_size;
} Elf64_Sym;

st_value is where we change it to one_gadget, the other info will stay the same.

The full exploit plan

  • We will input an invalid regex pattern, but still pass the while condition check => leading to UAF.
  • We will use UAF, to write into re_pattern_buffer, replace struct re_dfa_t *__REPB_PREFIX(buffer) (it’s the first 8 bytes) with address that points to region filled with null. This will make regexec returns immediately and pmatch array is not initialized, thus leaking content on heap. Use this to leak both heap and libc address.
  • Use UAF again, to point struct re_dfa_t *__REPB_PREFIX(buffer) to a fake mmap chunk.
  • regfree will free our fake mmap chunk, LIBC read-only segment gets munmap-ed
  • malloc a chunk with size larger than 0x21000 bytes, it will gets placed under libc. We want to achieve this:
    mmap
  • Write into libc data, to change Elf64_Sym of _exit.

Exploit script from the challenge author because I didn’t solve it.

Conclusion

Sadly I could not solve this due to not knowing this recent exploitation technique. But glad that I learnt something new from the ctf.
There is also this great arcticle Munmap Madness that I found last month, but I never read it :(
If I did, I might be able to solve this challenge.

piers

My personal blog on the journey of learning how2pwn


Writeup on the pwn challenge: Regex

By Piers, 26-07-2022