Reverse-Engineering WebAssembly [pdf] (pnfsoftware.com)
76 points by ingve 5 days ago | 23 comments

WebAssembly is actually simple to work with.

If you want C-like pseudocode, you can feed a wasm file to wasm2c [1].

You can recover the WebAssembly text format, with folded expressions, using wasm2wat [1].

You can obtain a call-graph from a WebAssembly module by generating the wat representation using wasm2wat and pasting it into main.wat on https://webassembly.studio/ (-> Empty Wat Project). Then save and build; right click the new main.wasm and select "Generate Call Graph."
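A minimal sketch of scripting the two WABT tools mentioned above, assuming they are installed and on your PATH. The function names `wabt_commands` and `run_available` are made up for illustration; `--fold-exprs` and `-o` are the wasm2wat/wasm2c options as I understand them, so check `--help` on your build.

```python
import shutil
import subprocess
from pathlib import Path


def wabt_commands(wasm_path: str) -> dict[str, list[str]]:
    """Build the wasm2wat / wasm2c command lines for one module (hypothetical helper)."""
    src = Path(wasm_path)
    return {
        # Text format with folded (nested S-expression) instructions.
        "wat": ["wasm2wat", "--fold-exprs", str(src), "-o", str(src.with_suffix(".wat"))],
        # Instruction-by-instruction translation to C source.
        "c": ["wasm2c", str(src), "-o", str(src.with_suffix(".c"))],
    }


def run_available(wasm_path: str) -> list[str]:
    """Run each command whose tool is found on PATH; return the kinds that ran."""
    ran = []
    for kind, cmd in wabt_commands(wasm_path).items():
        if shutil.which(cmd[0]) is None:
            continue  # tool not installed: skip instead of failing
        subprocess.run(cmd, check=True)
        ran.append(kind)
    return ran
```

Calling `run_available("main.wasm")` would write `main.wat` and `main.c` next to the input if both tools are present.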

That said, check out this encrypted and anonymous "pastebin" I built [2], with the crypto written in Rust and bindings generated using wasm-bindgen [3]. It is surprisingly hard to debug when optimized using wasm-opt [4].

[1] Part of WebAssembly Binary Toolkit: https://github.com/WebAssembly/wabt

[2] Source code on GitHub: https://github.com/psychonautwiki/impis/blob/master/core/src... Demo paste: https://imp.is/n/7NFsfEiCjkFBVgC6A4JS6GyqN7puN5Sg7ed11m8VrtT...

[3] https://github.com/rustwasm/wasm-bindgen

[4] Part of Binaryen: https://github.com/WebAssembly/binaryen

Re WebAssembly.Studio, you can also just drag your .wat or .wasm file in without creating a project.

WebAssembly is not "simple to work with", especially when it comes to analyzing non-trivial, large, optimized programs. The tool [1] generates a one-to-one translation of wasm instructions to C code. I guess you could qualify that as a "decompiler", but real decompilers, the ones used for malware analysis such as JEB or IDA, are optimizing decompilers that produce higher-level (i.e. more legible) output than the input disassembly/binary.

The future looks like everyone is going to compile their favorite language to WebAssembly.

Am I the only one who feels like it's the end of the web as we knew it in the 90s and 00s, where you could open any web page, understand how it works and learn from it?

I think that disassembling WebAssembly is easier than trying to make sense of a highly minified/obfuscated JavaScript bundle. The time of easily readable web code has been dead for quite a while now I would say.

I don't think it's that different from minified and sometimes intentionally obfuscated JS. In fact I think it's been getting better recently, as tools are being developed to debug that kind of code: sourcemaps etc.

If it means the end of JavaScript, I'm all in. Let's get back to having real application languages for applications, and markup languages for text.

I hope that we'll come around to the idea that this two-decades long fascination with abusing the hell out of web technology was a fever dream, and go on to build something better on more substantial foundations.

I'm probably going to be disappointed...

I don't see JavaScript going away anytime soon, tbh. We've already reached the point where it's the first language for a mind-boggling number of developers, it absolutely dominates the frontend frameworks, and it's been steadily becoming more popular on the server side.

> where you could open any web page, understand how it works and learn from it?

Web browsers already show (or soon will show) wasm disassembly in the browser developer tools. A file can contain label metadata, which makes it very readable.
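To make the "label metadata" point concrete: I'm assuming it refers to the custom section called "name" that the spec defines for debug names. Here is a rough sketch that pulls function names out of such a section's payload. The helper names and the hand-built sample bytes are mine, not from any tool.

```python
def read_u32_leb128(data: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def function_names(custom_payload: bytes) -> dict[int, str]:
    """Map function index -> name, given the payload of one custom section.

    Returns {} if the custom section is not the one called "name".
    """
    name_len, pos = read_u32_leb128(custom_payload, 0)
    if custom_payload[pos:pos + name_len] != b"name":
        return {}
    pos += name_len
    names = {}
    while pos < len(custom_payload):
        subsection_id = custom_payload[pos]
        size, pos = read_u32_leb128(custom_payload, pos + 1)
        end = pos + size
        if subsection_id == 1:  # subsection 1 holds function names
            count, pos = read_u32_leb128(custom_payload, pos)
            for _ in range(count):
                index, pos = read_u32_leb128(custom_payload, pos)
                length, pos = read_u32_leb128(custom_payload, pos)
                names[index] = custom_payload[pos:pos + length].decode("utf-8")
                pos += length
        pos = end  # skip subsections we do not decode
    return names


# Hand-built sample: section name "name", then subsection 1 (6 bytes)
# with one entry naming function 0 "add".
PAYLOAD = bytes([0x04]) + b"name" + bytes([0x01, 0x06, 0x01, 0x00, 0x03]) + b"add"
print(function_names(PAYLOAD))  # {0: 'add'}
```

When a toolchain strips this section, the disassembly falls back to numeric indices, which is a big part of why optimized builds are harder to read.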

Yep, it is the plugins' revenge: the only way to be competitive against native apps is for the browsers to become yet another general-purpose VM.

I think that time is long gone, as so much of the web today is simply bespoke remote UIs for black box databases rather than static documents.

I thought wasm was going to have a human-readable equivalent, called wast. See: https://webassembly.org/getting-started/advanced-tools/

My understanding (maybe wrong) was that this was going to be available in the browser.

It's only human-readable like disassembling a binary into assembly code. https://webassembly.github.io/spec/core/text/index.html Whether you can make heads or tails of the code in that format depends on how friendly the compiler was that produced the binary.

I'm not an expert, but my understanding is that WASM has two formats: a text-based format called WAT, and a binary format called WASM.

In order to run the code in the browser, the code will have to be compiled to the binary format.

So where WAT comes in is that there are now two ways to produce WASM files:

Source in <otherlang> -> WAT -> WASM

Source in <otherlang> -> WASM


So the human-readable WAT can either be used as a compile target for another language, which can easily be compiled into WASM, or you can write the WAT manually and compile it. Alternatively other languages might be able to compile directly to the binary format, skipping WAT representation entirely.
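For a feel of what the binary format looks like underneath, here is a minimal sketch that walks the section headers of a module. The tiny module is assembled by hand for illustration; `list_sections` is a made-up helper and it does no validation beyond the magic number and version.

```python
WASM_MAGIC = b"\x00asm"

SECTION_NAMES = {
    0: "custom", 1: "type", 2: "import", 3: "function", 4: "table",
    5: "memory", 6: "global", 7: "export", 8: "start", 9: "element",
    10: "code", 11: "data", 12: "data count",
}


def read_u32_leb128(data: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 integer; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def list_sections(module: bytes) -> list[tuple[str, int]]:
    """Return (section name, content size in bytes) for each section."""
    if module[:4] != WASM_MAGIC:
        raise ValueError("not a WebAssembly binary")
    if int.from_bytes(module[4:8], "little") != 1:
        raise ValueError("unsupported version")
    pos = 8
    sections = []
    while pos < len(module):
        section_id = module[pos]
        size, pos = read_u32_leb128(module, pos + 1)
        sections.append((SECTION_NAMES.get(section_id, f"unknown({section_id})"), size))
        pos += size  # skip over the section content
    return sections


# Hand-assembled module: header plus a type section declaring one
# function type with no parameters and no results.
TINY_MODULE = bytes([
    0x00, 0x61, 0x73, 0x6D,  # magic "\0asm"
    0x01, 0x00, 0x00, 0x00,  # version 1
    0x01, 0x04,              # section id 1 (type), 4 bytes of content
    0x01, 0x60, 0x00, 0x00,  # 1 entry: func type, 0 params, 0 results
])
print(list_sections(TINY_MODULE))  # [('type', 4)]
```

The text format describes this same structure in S-expressions, which is why converting between the two is mechanical.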

Generally, WAT is produced from the binary format. Compilers don't go through WAT; they produce the binary output directly.

The translation between WAT and the binary format is lossless, so there's no advantage to producing WAT as an intermediate step.

Since WAT -> WASM is already easy to do, compiling <otherlanguage> to WAT makes it really easy for people to create their own abstractions for writing WebAssembly in nearly _any_ other programming language, not just those that can compile directly to the binary format.

I don't understand why that would be true.

It's also just as easy to get WASM from WAT as it is WAT from WASM. I don't know of any languages that compile to WAT and then compile to WASM; as far as I know 100% of languages compile directly to WASM.

I think it's quite the opposite in terms of languages. JavaScript will have the best WASM interop story for quite a while.

As WASM gets adopted we'll see it get used in all sorts of places outside the browser. Many projects need a high-level scripting language and JavaScript will be the obvious choice.

Well, someone is bound to make disassemblers/decompilers/debuggers for us to dissect the innards of the VM. As a side note, I am half expecting the announcement of a web assembly ISA any day.

It is just a matter of someone bothering to create an FPGA for it. :)

It has been explored and it's been decided that it's better to JIT on a classical CPU. Obviously it's still very early so we'll see, but I redirected my excitement to JITs and kernel mode / ring 0 execution.

WASM instructions are fairly straightforward, so an obfuscator can be written quite easily. I could easily create a proxy tool that introduces randomization/non-determinism on a per-download basis if it were worth it. There is no execution of arbitrary memory, so there are limits. JS can create new WASM modules and link them at runtime, but invocations across import/export might have a performance hit. But moving functions around, subdividing functions, etc. is really easy.

Also, the paper has Emscripten-specific reverse engineering details (such as the locations in memory where the stack starts vs where the heap starts) that don't apply to many other WASM compilers.

Radare2 can also disassemble, analyze, assemble and even decompile wasm via r2dec. Support for wasm was added in March 2017.
