Do you like to read? I can take over your Kindle with an e-book

August 6, 2021

Research By: Slava Makkaveev

Introduction

Since 2007, Amazon has sold tens of millions of Kindles, which is impressive. But this also means that tens of millions of people could potentially have been hacked through a software bug in those same Kindles. Their devices could have been turned into bots, their private local networks could have been compromised, and perhaps even the information in their billing accounts could have been stolen.

The easiest way to remotely reach a user’s Kindle is through an e-book. A malicious book can be published and made available for free access in any virtual library, including the Kindle Store, via the “self-publishing” service, or sent directly to the end-user device via the Amazon “send to kindle” service.
While you might not be happy with the writing in a particular book, nobody expects to download one that is malicious. No such scenarios have been publicized. Antiviruses do not have signatures for e-books.
But… we succeeded in making a malicious book. If you had opened this book on a Kindle device, it could have caused a hidden piece of code to be executed with root rights. From that moment on, you could assume that you had lost control of your e-reader.

The issues we found were reported to Amazon in February 2021 and fixed in the 5.13.5 version of Kindle’s firmware in April 2021. The patched firmware will be installed automatically on devices connected to the Internet.

Kindle Touch architecture

Basically, the Kindle OS is a Linux kernel with a set of native programs mainly provided by busybox, the LIPC subsystem for inter-process communication, and the Java and Webkit subsystems for user interface (UI) and services.

Figure 1: Kindle Touch architecture.

The LIPC is a D-Bus-based IPC library and its environment that links all Kindle components together. A Kindle process can use this library to start apps, expose application properties/settings, listen for or emit events. For example, a Webkit application, written in HTML and Javascript, can use the LIPC to interact with a Java service or a native application.

Most of the UI is written in Java. The Java subsystem (the framework) provides LIPC handlers for both services and the UI (so-called Booklets). For example, the Kindle home UI window is the com.lab126.booklet.home booklet managed by the framework.

The Webkit subsystem (HTML5 and Javascript) is another way to create UI elements. The built-in experimental browser is a part of the Webkit subsystem. The pillow is a library that allows access to the LIPC from Javascript.

Who parses e-books?

The latest version (5.13.4) of the Kindle e-reader firmware is publicly available for download on the official Amazon website. The source code is also partially available there. But the source code did not help in our research because it mainly consists of third-party open-source projects, including the Linux kernel, with small Amazon tweaks. There is no source code for the components responsible for parsing and rendering e-books.

Our first goal was to discover a vulnerability in the e-book parsing framework. The files from the firmware are enough for this, so there is no need for a real Kindle device.
Let’s look at the components responsible for handling e-books.

The /mnt/us/documents directory is where regular e-books are stored when you download a new book to your Kindle device. Who is going to handle the file first?
The /usr/bin/scanner service periodically scans the document directory for new files and, depending on the file extension, uses one of the “extractor” libraries to extract metadata from the e-book. All extractors are listed in the /var/local/appreg.db sqlite database. There is a handler for each of the supported Kindle e-book formats:

File format             Extractor
kfx                     /usr/lib/ccat/libyjextractorE.so
azw1, tpz               /usr/lib/ccat/libtopazE.so
pdf                     /usr/lib/ccat/libpdfE.so
azw3                    /usr/lib/ccat/libmobi8extractorE.so
azw, mbp, mobi, prc     /usr/lib/ccat/libEBridge.so

If the scanner does not recognize the file extension or a parsing error occurs, the e-book is not shown to the user.
We did not go deep into the scanning process because extracting metadata is too simple an operation to be a likely source of parsing errors.
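The extension-based dispatch described above can be sketched as a simple lookup. This is only an illustration of the logic, not the scanner's actual code: the function name pick_extractor is hypothetical, and treating extensions case-insensitively is an assumption.

```python
import os

# Extension-to-extractor mapping, copied from the table above.
EXTRACTORS = {
    "kfx": "/usr/lib/ccat/libyjextractorE.so",
    "azw1": "/usr/lib/ccat/libtopazE.so",
    "tpz": "/usr/lib/ccat/libtopazE.so",
    "pdf": "/usr/lib/ccat/libpdfE.so",
    "azw3": "/usr/lib/ccat/libmobi8extractorE.so",
    "azw": "/usr/lib/ccat/libEBridge.so",
    "mbp": "/usr/lib/ccat/libEBridge.so",
    "mobi": "/usr/lib/ccat/libEBridge.so",
    "prc": "/usr/lib/ccat/libEBridge.so",
}


def pick_extractor(path):
    """Return the extractor library for a file, or None if the extension is unknown.

    Lower-casing the extension is an assumption made for this sketch.
    """
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    return EXTRACTORS.get(ext)


if __name__ == "__main__":
    print(pick_extractor("/mnt/us/documents/book.pdf"))
    print(pick_extractor("/mnt/us/documents/notes.txt"))  # unknown extension
```

A file with an unknown extension yields None, which corresponds to the book not being shown to the user.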

After the scanner does its job, a thumbnail of the new book is displayed on the home screen. From this moment on, the Java framework is responsible for opening the book when you click on it. Java archive (JAR) files that implement the logic for opening and rendering e-books can be found in the /opt/amazon/ebook/lib firmware directory. Primarily, these are MobiReader-impl.jar, YJReader-impl.jar, PDFReader-impl.jar, HTMLReader-impl.jar and TopazReader-impl.jar files.
For further research, we decided to focus our attention on the PDF file format, as it is one of the most common and, at the same time, one of the most complex formats.

Let’s take a look at the implementation of the PDF book opening function in the PDFReader-impl.jar (com.amazon.ebook.booklet.pdfreader.impl.PDFModel class):

As you can see, this function is only a wrapper over the nativeOpenPDFDocument native function with the body in the /usr/java/lib/libPDFClientJNI.so library.

The nativeOpenPDFDocument function starts the PDF server /usr/bin/pdfreader, forking the process, and synchronously sends it an “openBook” message via the open source HTTP client/server library /usr/lib/libsoup-2.4.so. In fact, it sends a GET request to https://127.0.0.1:7667/command/openBook.

The pdfreader server is the main target of our research. Eventually, we will run our payload in the context of this process.
At startup, the pdfreader server lowers itself to the permissions of the “framework” user (uid 9000) with a setuid call. Then it launches a soup server listening on port 7667, defining dozens of handlers for high-level PDF operations, including the “openBook” and “startRendering” ones that we are interested in.
The /usr/lib/libFoxitWrapper.so library, written by Amazon, provides an API for working with PDF files. The pdfreader uses this library in its soup handlers. For example, the “openBook” handler looks like this:

Note the following significant functions of the libFoxitWrapper.so library:

  • openPDFDocumentFromLibrary(char *file, char* password, uint32_t* handle) – Opens the PDF document.
  • getCurrentPage(uint32_t handle, uint32_t page, uint32_t flag) – Parses the PDF page to internal structures.
  • renderPageFromLibrary(uint32_t handle, uint32_t page, uint32_t width, uint32_t height, float scale, uint8_t landscape, uint8_t* out) – Renders the PDF page converting it to an image. When called, the stream filters begin to be parsed.

These functions are good entry points for fuzzing a PDF tree structure.

As the name implies, libFoxitWrapper.so is a wrapper for a popular Foxit PDF SDK presented on Kindle devices by the /usr/lib/libfpdfemb.so library. The libfpdfemb.so is a closed-source library proprietary to Foxit Software Inc. The Foxit Embedded PDF SDK manual can be found on the Internet.

Fuzzing PDF filters

We tried to fuzz the mentioned functions from the libFoxitWrapper.so library, but this approach did not produce any results, except for a set of null pointer dereferences. A more promising approach to the PDF format is to choose one specific object or stream filter as the target for the test. So, we decided to fuzz the libfpdfemb.so library.

But first, let’s take a look at the classic fuzzing model.
The easiest way to fuzz any closed-source library is to write an executable file that loads the library into memory and calls the target functions. This loader takes a file with mutated data as a command line parameter, reads it in, and passes the data to the function under test. Next, the loader is instrumented or run on an emulator to collect the code coverage matrix for each test case. A third-party fuzzer/mutator is used to generate new test cases based on the coverage matrix.
To fuzz the libfpdfemb.so library, we chose a combination of American Fuzzy Lop (AFL) and Quick emulator (Qemu). The host machine is Ubuntu.
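The core of this model, mutate an input, feed it to the function under test, and record crashes, can be sketched in a few lines. This is a deliberately simplified illustration: it omits the coverage feedback that AFL provides, and toy_target is a hypothetical stand-in for the real library function.

```python
import random


def mutate(data, rng, flips=1):
    """Return a copy of data with `flips` randomly chosen bit inversions applied."""
    buf = bytearray(data)
    if not buf:
        return bytes(buf)
    for _ in range(flips):
        pos = rng.randrange(len(buf))
        buf[pos] ^= 1 << rng.randrange(8)
    return bytes(buf)


def crashes(target, data):
    """Feed one test case to the function under test; True means it raised."""
    try:
        target(data)
    except Exception:
        return True
    return False


def toy_target(data):
    """Hypothetical function under test: fails when the first byte has its top bit set."""
    if data and data[0] & 0x80:
        raise ValueError("simulated parser crash")


def fuzz(target, seed_input, iterations, rng):
    """Run a blind mutation loop and return the inputs that crashed the target."""
    found = []
    for _ in range(iterations):
        case = mutate(seed_input, rng)
        if crashes(target, case):
            found.append(case)
    return found


if __name__ == "__main__":
    hits = fuzz(toy_target, b"\x00" * 4, 1000, random.Random(0))
    print(len(hits), "crashing inputs found")
```

In the real setup the loader is a native ARM executable and the mutation strategy is driven by coverage, but the loop has the same shape.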


Figure 2: The fuzzing scheme.

We need to note one more thing. Kindle devices are based on an ARM processor. Therefore, our loader was compiled using arm-linux-gnueabi-g++. Qemu easily emulates ARM on x86.

A simple search for the words “CPDF” and “Codec” in the libfpdfemb.so library allowed us to find all the possible stream filters/codecs: Predictor, Decrypt, Flate, Fax, Lzw, AsciiHex, RunLen, Ascii85, Jpeg, Jbig2 and Jpx. Let’s take a look at one of them with an example.

Figure 3: Fragment of PDF page with jbig2 filter.

As you can see, an image Im1 with jbig2 filter is declared. Jbig2 is an image compression standard for bi-level images. The jbig2 encoder segments the input page into regions: text, halftone images, refinement, and others. These regions are held in the JBIG2Globals stream. When rendering a PDF page, libfpdfemb.so parses the JBIG2Globals stream and reconstructs the image.

The Jbig2Module object, defined in the libfpdfemb.so library, is responsible for decoding jbig2 compressed images.

Figure 4: Jbig2Module object.

Its StartDecode method is declared as follows:


Among other filters, we fuzzed the jbig2 decoding algorithm using the StartDecode function as the entry point and mutated the image size (width and height arguments), the image stream (src_buf, src_size) and the JBIG2Globals stream (global_data, global_size). Below you can see the harness we used to invoke the StartDecode. The base variable is the address of the libfpdfemb.so library in memory.


As a result, we discovered a valuable heap overflow vulnerability in the JBIG2Globals decoding algorithm.

CVE-2021-30354. Heap overflow

Let’s take a look at the following JBIG2Globals stream:


Figure 5: Malformed JBIG2Globals stream.

Two page regions are defined here:

  • The image information region (first 0x23 bytes). The image width is 0x80, the height is 1 and the stride is 0x10. The stride is calculated as ((width + 31) >> 5) << 2.
  • The “refinement” region (from 0x23 to 0x4D bytes). This region contains jbig2 encoded information to refine the image. As only a part of the image can be refined, it also contains the coordinates of the refining rectangle. In our case, the provided rectangle parameters are: width – 0, height – 0x10, x – 0, y – 0x40000000.
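The stride formula above rounds the width in bits up to a whole number of 32-bit words and expresses the result in bytes. A minimal check of the quoted values (the function name jbig2_stride is hypothetical):

```python
def jbig2_stride(width):
    """Bytes per image row: width in bits rounded up to a multiple of 32, divided by 8."""
    words = (width + 31) >> 5   # number of 32-bit words needed for one row
    return words << 2           # 4 bytes per word


if __name__ == "__main__":
    print(hex(jbig2_stride(0x80)))  # 0x10, the stride quoted for the 0x80-wide image
```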

This is a malformed stream. An oversized rectangle is defined in the refinement region.
What happens in this case? The algorithm tries to expand the base image to the new dimensions. The height of the new image is recalculated as height + y, and (height + y) * stride bytes of heap memory are allocated for the resized image. But there is a mistake in the expanding function that leads to a heap overflow: a missing check against INT_MAX when calculating the in-memory size of the new image. The 32-bit register overflows, and 0x100 bytes are allocated for the image instead of 0x400000100.
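The truncation can be reproduced with plain integer arithmetic. This sketch assumes that "height" in the formula refers to the refinement rectangle's height (0x10), since that is the reading that yields the sizes quoted above. The function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def alloc_size(height, y, stride):
    """Return (mathematically correct size, size after truncation to 32 bits)."""
    full = (height + y) * stride
    return full, full & MASK32


if __name__ == "__main__":
    # Assumption: height is the rectangle height 0x10, y is 0x40000000, stride is 0x10.
    full, truncated = alloc_size(0x10, 0x40000000, 0x10)
    print(hex(full), hex(truncated))  # 0x400000100 0x100
```

The decoder then treats the 0x100-byte buffer as if it had 0x40000010 rows, so any row index beyond the first 0x10 rows addresses memory outside the allocation.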


Figure 6: The expand function.

This means that by using refinement regions, we can “refine” the data outside of the image, and get the arbitrary write primitive. In the following example, the second refinement region overwrites 0x10 (stride) bytes at an offset 0x1234 * 0x10 bytes from the beginning of the image in the heap. The data blob (0x71 to 0x79 bytes) is decompressed by the jbig2 algorithm and then XORed with the heap content.


Figure 7: Controlled heap overflow.

We can create any number of refinement regions and overwrite parts of memory that are at a distance from one another. In addition, the fact that the writing is done through a XOR operation allows us to change only specific bits of memory, rather than whole words, and bypass ASLR protection if required.
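The effect of a XOR-based write can be simulated on a byte array. In this sketch the heap is a Python bytearray, the image is assumed to start at offset 0, and the helper names are hypothetical. It shows that a mask with a single bit set changes exactly one bit at row * stride and leaves everything else untouched.

```python
def mask_for(old, new):
    """XOR mask that turns `old` into `new`; zero bytes in the mask leave memory unchanged."""
    return bytes(a ^ b for a, b in zip(old, new))


def xor_write(heap, image_base, row, stride, mask):
    """XOR up to one row (stride bytes) of `mask` into the simulated heap at the given row."""
    offset = image_base + row * stride
    for i, m in enumerate(mask[:stride]):
        heap[offset + i] ^= m
    return offset


if __name__ == "__main__":
    heap = bytearray(0x12400)                      # simulated heap contents
    heap[0x12340:0x12344] = b"\x78\x56\x34\x12"    # some existing value
    old = bytes(heap[0x12340:0x12350])
    new = b"\x78\x56\x34\x13" + old[4:]            # flip one bit in the fourth byte
    offset = xor_write(heap, 0, 0x1234, 0x10, mask_for(old, new))
    print(hex(offset), heap[0x12340:0x12344].hex())
```

Because only the differing bits are touched, a value such as a pointer can be adjusted relative to its current content without knowing its absolute value.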

As mentioned previously, the libfpdfemb.so library is part of the pdfreader process. The data and heap segments of this process are read/write/execute. ASLR is built into the Linux kernel and is controlled by the parameter /proc/sys/kernel/randomize_va_space. Its default value on Kindle devices is 1, which means the base address of the data segment is located immediately after the end of the executable code segment. In other words, there is no randomization for the data segment and the heap. These two facts make exploiting the discovered jbig2 vulnerability trivial.
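For reference, the meaning of the randomize_va_space values can be summarized in a small lookup, based on the Linux kernel sysctl documentation. The helper names are hypothetical, and reading the file only works on a Linux system.

```python
ASLR_MODES = {
    0: "randomization disabled",
    1: "stack, mmap base and VDSO randomized; heap placed right after the executable",
    2: "same as 1, plus heap (brk) randomization",
}


def describe_aslr(value):
    """Return a human-readable description of a randomize_va_space value."""
    return ASLR_MODES.get(value, "unknown value")


def heap_is_randomized(value):
    """The brk-based heap is only randomized in mode 2."""
    return value == 2


def read_aslr(path="/proc/sys/kernel/randomize_va_space"):
    """Read the current setting, or return None if it is unavailable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    value = read_aslr()
    print(value, describe_aslr(value))
```

With the value 1 reported for Kindle devices, heap_is_randomized returns False, which matches the observation above.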

CVE-2021-30355. Improper Privilege Management

We now have an RCE vulnerability in the context of the pdfreader process. A user downloads the PDF book to their Kindle device. When the book is opened, a malicious payload is launched.

The pdfreader process has the rights of the framework user: uid=9000(framework) gid=150(javausers) groups=150(javausers). It can send LIPC messages and access special internal files, but it is still limited. We want to become root to lift all restrictions.
So, the second stage of the research is to find an LPE vulnerability that allows the framework user to run code as root.

First, we jailbroke one of our Kindles, because the firmware files alone are not enough to search for a logic-based LPE. We need to see running processes and open ports, and to be able to debug Kindle services.

A software jailbreak for some versions of Kindle firmware can be found on the Internet. But the most general way is to jailbreak through the serial port. Although this requires disassembling the device, this is what we did.


Figure 8: Jailbreak the Kindle via the serial port.

With a jailbroken device in hand, we analyzed the services that have root rights, as well as the resources they access. Eventually, we found a logical error, or more accurately, improper privilege management, in one of the Kindle services. Great, there is no need to fuzz the device drivers.

The framework user has full access to the /var/tmp/framework directory, where it can create any executable file. In fact, this is the user’s working directory. For example, we can create a bash script file payload.sh that logs user privileges:

The framework user has read/write access to the /var/local/appreg.db sqlite database, which is essentially an application registry. This means that we can modify a database entry using the /usr/lib/libsqlite3.so library or by simply editing the file. We want to patch one of the “command” entries in the properties table.


Figure 9: properties table in appreg.db.

For example, we can patch the entry com.lab126.browser: set the value field to /var/tmp/framework/payload.sh instead of /usr/bin/mesquite. The following SQL request does the work:

The framework can request the application manager, represented by the appmgrd service, to start an arbitrary application. We can send an LIPC message to open the browser app using the /usr/lib/liblipc.so library. This shell command does the same:

The application manager is responsible for launching built-in apps. To do this, it listens for the appropriate LIPC events. To start the browser app, it reads the entry com.lab126.browser from the appreg.db, and executes the command specified in the value field. As we patched this database entry, our payload.sh script is launched.
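The flaw described above comes down to a root service trusting a registry that a less privileged user can write to. The sketch below models this with an in-memory SQLite database standing in for appreg.db. The column names (handlerId, name, value) and the helper functions are assumptions made for illustration; nothing is executed.

```python
import sqlite3


def make_registry():
    """Create an in-memory stand-in for the properties table (schema is assumed)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE properties (handlerId TEXT, name TEXT, value TEXT)")
    db.execute(
        "INSERT INTO properties VALUES (?, ?, ?)",
        ("com.lab126.browser", "command", "/usr/bin/mesquite"),
    )
    db.commit()
    return db


def set_command(db, handler, command):
    """What a user with write access to the registry can do."""
    db.execute(
        "UPDATE properties SET value = ? WHERE handlerId = ? AND name = 'command'",
        (command, handler),
    )
    db.commit()


def lookup_command(db, handler):
    """What the application manager does before launching an app: read the command."""
    row = db.execute(
        "SELECT value FROM properties WHERE handlerId = ? AND name = 'command'",
        (handler,),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    db = make_registry()
    print(lookup_command(db, "com.lab126.browser"))
    set_command(db, "com.lab126.browser", "/var/tmp/framework/payload.sh")
    print(lookup_command(db, "com.lab126.browser"))
```

Since the lookup result is executed by a root process without checking who wrote it, the write permission on the registry is effectively a root-level capability.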

The appmgrd service has root rights. The “root: uid=0(root) gid=0(root)” string is logged by the payload.sh.

The described LPE vulnerability can be easily exploited from the pdfreader process that we owned. The libsqlite3.so and liblipc.so libraries are already loaded into the process memory. By combining the two discovered vulnerabilities, any malicious payload can be run as root.

Conclusion

We demonstrated how an e-book can function as malware. As the malware code is executed with root user rights, just opening such a book could have led to irreparable damage. The attacker could have deleted your e-books, potentially gained full access to your Amazon account, converted your Kindle into a bot, attacked other devices in your local network, and more.

The described vulnerabilities were reported to Amazon in February 2021 and fixed in the 5.13.5 version of Kindle’s firmware in April 2021.