Emulator MetaLanguage

Towards a faster MAME


What's the big idea?

Emulators are hard to write. Repetitive, error-prone and basically pretty boring. The excitement of the goal drives people to write them, but man is it hard work!

Writing a fast emulator is even more challenging. And repetitive. Gone is

    if ((opcode & 0xC0) == 0xC0) regptr = &regs.A;

in favor of a bigass jump table and a different routine for every opcode.
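
For anyone who hasn't seen the jump-table style, here is a rough sketch in C. It is an illustration only: the Regs struct, the handler names and the two opcode values are assumptions made up for this example, not code from MAME or from any real CPU core.

```c
#include <stdint.h>

/* Hypothetical register file for an imaginary 8-bit CPU. */
typedef struct { uint8_t A, B; } Regs;

typedef void (*OpHandler)(Regs *r);

/* One tiny routine per opcode: no bit-field decoding at run time. */
static void op_inc_a(Regs *r) { r->A++; }
static void op_inc_b(Regs *r) { r->B++; }
static void op_nop(Regs *r)   { (void)r; }

/* The jump table: 256 entries, indexed directly by the opcode byte. */
static OpHandler op_table[256];

static void init_op_table(void)
{
    for (int i = 0; i < 256; i++)
        op_table[i] = op_nop;      /* unassigned opcodes do nothing here */
    op_table[0x4C] = op_inc_a;     /* opcode values chosen for illustration */
    op_table[0x5C] = op_inc_b;
}

/* Run `len` single-byte opcodes from `code`. */
static void run(Regs *r, const uint8_t *code, int len)
{
    for (int pc = 0; pc < len; pc++)
        op_table[code[pc]](r);
}
```

Call init_op_table() once, then run() with a buffer of opcode bytes. The speed comes from replacing a chain of mask-and-compare tests with a single indexed call; the price is writing (or generating) up to 256 near-identical routines.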

Now suppose that instead of emulating a CPU, you want to recompile a whole pile of CPU code at once - a ROM image, for example. You want maximum speed. Or, you want excellent speed but a controlled memory footprint, so you want to implement a Just-In-Time compiler with dynamic compilation of "hotspots".

Now suppose you want to do that, for 20 different CPUs.

Machines are our slaves. Let's write a program to do this.

And you want to do this ... why?

MAME is a great program - some of the nicest code I've seen in a long time ... and I see a lot of code! Furthermore, it does something excellently cool - it emulates old arcade games. Now, it's not the fastest emulator, because the MAME team values other things above speed. Code quality is part of that, but above all else is the idea that MAME serves to document these old machines, not just emulate them. Emulation is then a neat side-effect. But, within these parameters, the faster, the better ...

No, that's not the reason at all. But it's a brilliant post-hoc rationalization. The real reason is that I thought it would be really cool.

So how does it work?

We define the CPU using a meta-programming language, or meta-language. The language in this instance is called EML - the Emulator MetaLanguage. This language is then read by a compiler which can produce a variety of things based on the description in the EML file.

The first pass would be simple and relatively non-threatening. We write a compiler which produces an emulator written in C. This can be verified against working emulators. With automatic loop unrolling and a few other tricks I have up my sleeve, there should be a 10%-20% speed gain over existing emulators.
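
To make "a compiler which produces an emulator written in C" a little more concrete, here is a toy generator, itself written in C. It is only a sketch under heavy assumptions: the OpDesc table stands in for a real EML description (which would be far richer), and the names and opcode values are invented for the example.

```c
#include <stdio.h>

/* A hypothetical, drastically simplified "CPU description": each entry
   names an opcode and the C statement that implements it. */
typedef struct {
    unsigned    opcode;
    const char *name;
    const char *body;
} OpDesc;

static const OpDesc ops[] = {
    { 0x4C, "inca", "regs.A++;" },
    { 0x5C, "incb", "regs.B++;" },
};

/* Emit C source for one handler per opcode into `out`.
   Returns the number of characters written (excluding the final NUL).
   If the buffer fills up, the partial line is dropped and we stop. */
static int generate_handlers(char *out, size_t size)
{
    size_t used = 0;
    for (size_t i = 0; i < sizeof ops / sizeof ops[0]; i++) {
        int n = snprintf(out + used, size - used,
                         "static void op_%02X_%s(void) { %s }\n",
                         ops[i].opcode, ops[i].name, ops[i].body);
        if (n < 0 || (size_t)n >= size - used) {
            if (used < size)
                out[used] = '\0';   /* discard the truncated line */
            break;
        }
        used += (size_t)n;
    }
    return (int)used;
}
```

Given a big enough buffer, this writes one line of C per table entry, for example a handler named op_4C_inca. The generated text refers to a regs variable that the surrounding emulator would have to supply.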

The second pass would be more spine-tingling. We write a compiler which produces an emulator written in ASM. The benefit of ASM is that you call the shots with register allocation and (in particular) with the function call mechanism. Function calls can be made as simple as a JSR in ASM, while in C there is endless baggage. It would be possible to reserve registers so even the register mine that is the x86 would be able to keep important data onboard. An ASM emulator should achieve a 50%-100% gain over a portable hand-written C emulator.

The third pass is dangling by your teeth from a helicopter a mile above a volcano, with an atomic bomb strapped to your groin. We write a program which emulates the CPU, but monitors itself doing so. When it encounters a common piece of code, it compiles it there and then, and starts using that code. It caches these just-in-time (JIT) compiled code fragments and discards old fragments to keep the cache size reasonable. I expect the gains from this to be anything from 500% to 1000%.
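
The bookkeeping side of that third pass (spot the hot code, cache the compiled fragment, throw out old fragments) can be sketched without any actual code generation. What follows is a minimal C illustration, not a design for the real thing: the threshold, the cache size, the least-recently-used eviction policy and every name in it are assumptions chosen for the example, and "compiling" is reduced to claiming a cache slot.

```c
#include <stdint.h>

#define HOT_THRESHOLD 3   /* entries before a block counts as "hot" (arbitrary) */
#define CACHE_SLOTS   2   /* deliberately tiny so eviction is easy to see */

typedef struct {
    int      valid;
    uint16_t pc;          /* emulated address the fragment was compiled from */
    unsigned last_used;   /* timestamp for least-recently-used eviction */
} Fragment;

static Fragment cache[CACHE_SLOTS];
static uint8_t  hits[65536];   /* per-address entry counters */
static unsigned now;           /* monotonically increasing "clock" */

/* Return the cache slot holding pc, or -1 if it is not compiled. */
static int find_fragment(uint16_t pc)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].valid && cache[i].pc == pc)
            return i;
    return -1;
}

/* Pick a free slot or, failing that, the least recently used one. */
static int pick_slot(void)
{
    int victim = 0;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (!cache[i].valid)
            return i;
        if (cache[i].last_used < cache[victim].last_used)
            victim = i;
    }
    return victim;
}

/* Called at each block entry. Returns 1 if compiled code would run,
   0 if the interpreter would handle the block. */
static int execute_block(uint16_t pc)
{
    int slot = find_fragment(pc);
    now++;
    if (slot < 0) {
        if (hits[pc] < HOT_THRESHOLD)
            hits[pc]++;
        if (hits[pc] < HOT_THRESHOLD)
            return 0;              /* still cold: interpret */
        slot = pick_slot();        /* hot: "compile" it, evicting if needed */
        cache[slot].valid = 1;
        cache[slot].pc = pc;
    }
    cache[slot].last_used = now;
    return 1;
}
```

With these settings a block is interpreted on its first two entries and runs compiled from the third. Because the counters are never reset, an evicted block gets recompiled the next time it is entered; a real implementation would probably want something smarter.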

Each pass seems reasonable, given the existence of the pass before.

Do you want to know the really wacky thing? The emulated code itself is a program in a language. So when we write a CPU emulator we are writing a program which writes a program to compile a program into native code. Don't think about that for too long!

Incidentally, don't get scared by all this talk of meta-programming. Every time you write a macro in C, or a template in C++, you're meta-programming. It's the same stuff, except you can pass some really wild things to functions - like a variable name, rather than its contents or its address.
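
Here is what that looks like with plain C macros. This shows the general idea only, it is not EML; the DESCRIBE and DEFINE_REG macros are made up for the example.

```c
#include <stdio.h>

/* The macro receives the *name* of a variable, not its value or address.
   #var turns the name into a string literal; var itself is used normally.
   buf must be a char array (not a pointer) so that sizeof works. */
#define DESCRIBE(buf, var) \
    snprintf((buf), sizeof(buf), "%s = %d", #var, (var))

/* Token pasting (##) builds new identifiers from the name passed in. */
#define DEFINE_REG(name) \
    static int  reg_##name; \
    static void set_##name(int v) { reg_##name = v; } \
    static int  get_##name(void)  { return reg_##name; }

DEFINE_REG(A)   /* expands to reg_A, set_A(), get_A() */
DEFINE_REG(B)   /* expands to reg_B, set_B(), get_B() */
```

With int score = 42 and a char buf[32], DESCRIBE(buf, score) fills buf with the text "score = 42". No ordinary function could do that, because a function only ever sees the value 42, never the name.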

What is EML?

It looks like C. It has some features from C++. It has many brand-new features. It is bad-ass strict about casting. It allows you to access individual bits in a number very easily.

The basic types are uN and sN, where N is any number greater than zero. So in EML you can manipulate a 1024-bit value as easily as an 8-bit value. There are masked arrays (which automatically wrap).
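
For comparison, this is roughly what you have to write by hand in C to get the same effects. It is C, not EML syntax, it is limited to 32 bits, and it reflects my reading of "masked array" as an array whose index is ANDed with a mask. The helper names are invented for the sketch.

```c
#include <stdint.h>

/* Keep only the low n bits of x (1 <= n <= 32): roughly what storing
   into an n-bit unsigned type would do. */
static uint32_t wrap_u(uint32_t x, unsigned n)
{
    return n >= 32 ? x : x & ((UINT32_C(1) << n) - 1);
}

/* Read bit `bit` of x (bit 0 is the least significant; bit < 32). */
static unsigned get_bit(uint32_t x, unsigned bit)
{
    return (x >> bit) & 1u;
}

/* A "masked array": the size is a power of two, so ANDing the index
   with size-1 makes out-of-range accesses wrap instead of overflow. */
#define RAM_SIZE 256u                 /* must be a power of two */
static uint8_t ram[RAM_SIZE];

static uint8_t ram_read(uint32_t addr)
{
    return ram[addr & (RAM_SIZE - 1)];
}

static void ram_write(uint32_t addr, uint8_t v)
{
    ram[addr & (RAM_SIZE - 1)] = v;
}
```

So a 3-bit counter holding 5 plus 6 ends up as wrap_u(11, 3), which is 3, and a write to address 0x105 lands in the same cell as address 0x05. Doing this by hand everywhere is exactly the repetitive, error-prone work a dedicated language could take over.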

The definitive statement of EML currently is the sample code for the 6809 emulator.

This project is on hold.

