8052 emulator in C

Started by joolzg May 24, 2011
Hi Anders,

On 5/24/2011 3:28 PM, Anders.Montonen@kapsi.spam.stop.fi.invalid wrote:
> hamilton<hamilton@nothere.com> wrote:
>> I took a compiler class 30 years ago, and my professor at the time
>> stated that it was not possible.
>> With the better compiler available today it would be even more impossible.
>
> Have you looked at e.g. Hex-Rays? From what I've seen of it, it's pretty
> good at what it does.
Yes. It has knowledge of how various compilers generate code and uses that "backwards" to deduce what the code should have been, based on the binary.
On 5/25/2011 2:52 AM, upsidedown@downunder.com wrote:
>>>> You don't want an emulator, you want a de-compiler or reverse compiler.
>>>>
>>>> An emulator will just execute the binary code as the real hardware would.
>>>>
>>>> Using the binary to get the C back is impossible !!!!
>>>
>>> Actually, for some simple-minded compilers, you can often reverse
>>> engineer the code to get much of the "C" source (neglecting
>>> variable names, some expressions, etc.). This is especially
>>> true of old/early compilers that didn't do much optimization.
>>
>> For years I have heard that story.
>>
>> I have always asked to show me any links with the compiler in question,
>> So I will ask if you have any links to this "simple compiler" ?
>>
>> I took a compiler class 30 years ago, and my professor at the time
>> stated that it was not possible.
>> With the better compiler available today it would be even more impossible.
>
> I guess "simple compiler" refers to some 1970's compilers for PDP-11,
> Intel Intellecs and Motorola Exorcisers.

In my case, these were PC based tools. The only compilers I had access to on the MDS were silly things like PL/M (which, actually, was *an* improvement).
> Writing compilers for these platforms was problematic due to the 64
> KiB address space limit. Overlay loading helped a lot (each
> compilation phase in a separately loaded overlay branch), but you
> still had to reserve space for the symbol table, that had to be kept
> constantly in memory. Overlay loading with floppies was also very
> slow, thus, much optimization could not be done. For this reason,
> getting assembly output from a compiler was not the standard
> situation.

I think the problem from that timeframe (mid 80's, in my case) was a combination of things:

- targets were pretty crippled. They really weren't designed with HLL's in mind (with the exception of the bigger 16/32 bit machines). E.g., support for stack frames was tedious at best. And, even then, limited (e.g., "index registers" with +- 128 byte offsets)
- there were *lots* of different processor *families*. 6800/3/5 6809, 68HC11, 8080/85/Z80/Z180, Z8000, 68000, 9900, 99000, 1802/5, 6502/816, 2650, 8x300, etc. With no single market leader. A "compiler vendor" was almost forced to try to address *all* of these targets to increase the chance for a sale. So, you ended up with a core compiler and varying backends.
- the PC was becoming a viable development platform (previously, we used CP/M boxes or vendor supported "development systems"). So, you had lots of folks putting forth products to try to sell to that "development system". Almost all were "command line" driven tools, no IDE's, etc.
- users were anxious to get their hands on *anything* that could expedite development. ASM was just *painfully* slow for bigger projects.
- resources were starting to become affordable. The $50 2KB EPROMs were a thing of the past. And, you could actually think of putting more than a few *hundred* bytes of RAM into a system!
> I once wrote an object code disassembler for PDP-11. Compared to
> ordinary disassemblers, the object code disassembler can also display
> the global symbols defined in this module as well as displaying any
> external function names (including library function names) in plain
> text.
Cool! From this, you could probably port to a 68K disassembler with little trouble. Or even a 32K.
> I analyzed quite a few object codes generated by Fortran, Pascal and C
> compilers and I was capable of detecting by "organic matching" how
> each compiler will generate code. After this, it was quite easy to
> reverse engineer some algorithms.
Exactly. The compiler tends to do the same thing, the same way. It's hard for "hand-written" code to achieve that same level of discipline.
> These days with good compilers, it is much harder to reverse
> engineer things based purely on the executable code.
Have you looked at some of the code compiled for PICs?
Hi David,

On 5/25/2011 11:14 AM, David Brown wrote:
> On 24/05/11 10:11, joolzg wrote:
>> Anybody got a simple 8052 emulator in C source, im trying to reverse
>> engineer
>> some code and would like to emulate/simulate the code to get a better
>> understanding as it looks like it was written in C and compiled by a
>> very bad compiler
>
> It shouldn't be too hard to write a simulator yourself for a processor
> like this. It's quite an effort if you want it to be fast, or to
> accurately simulate interrupts and peripherals, but the core itself is
> easy - you have an array to hold "ram", and array for "flash", a struct
> holding the registers, and a huge switch statement interpreting each
> instruction.

You know, given the OP's apparent need to modify "whatever" to accommodate the peculiarities of his situation, this is probably the *best* (quickest to having something "useful") answer!

Even IRQs and peripherals could be hacked in -- if you *don't* need it to be fast!

It would be a great "school project" to show how a processor works and how to write "boilerplate" code (to implement the machine).
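For anyone wanting to try this, here is a minimal sketch of the structure David describes: one array for internal RAM, one for code memory, a struct for the registers, and a switch over opcodes. All names (`Cpu8052`, `cpu_load`, `cpu_step`) are made up for illustration, only a handful of standard 8051 opcodes are decoded, and ADD updates only the carry flag (AC, OV and P are left out). Interrupts, SFRs and timing are not modelled at all.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names; a sketch of the "arrays + struct + switch" layout. */
enum { CODE_SIZE = 0x10000, IRAM_SIZE = 256 };

typedef struct {
    uint8_t  a;                 /* accumulator */
    uint8_t  psw;               /* program status word, CY is bit 7 */
    uint16_t pc;                /* program counter */
    uint8_t  iram[IRAM_SIZE];   /* internal "ram" */
    uint8_t  code[CODE_SIZE];   /* program memory ("flash") */
} Cpu8052;

/* Reset the machine and copy a binary image to code address 0. */
void cpu_load(Cpu8052 *c, const uint8_t *image, size_t len)
{
    memset(c, 0, sizeof *c);
    memcpy(c->code, image, len < CODE_SIZE ? len : CODE_SIZE);
}

static uint8_t fetch(Cpu8052 *c) { return c->code[c->pc++]; }

/* Relative jumps are taken from the address after the instruction. */
static void jump_rel(Cpu8052 *c, uint8_t rel)
{
    c->pc = (uint16_t)(c->pc + (int8_t)rel);
}

/* Execute one instruction.
 * Returns 0 on success, -1 on an opcode this sketch does not decode
 * (pc is left pointing at the offending opcode). */
int cpu_step(Cpu8052 *c)
{
    uint8_t op = fetch(c);

    switch (op) {
    case 0x00:                              break;  /* NOP        */
    case 0x04: c->a++;                      break;  /* INC A      */
    case 0x14: c->a--;                      break;  /* DEC A      */
    case 0x74: c->a = fetch(c);             break;  /* MOV A,#imm */
    case 0xE4: c->a = 0;                    break;  /* CLR A      */
    case 0x24: {                                    /* ADD A,#imm */
        unsigned sum = (unsigned)c->a + fetch(c);
        /* Only CY is updated here; AC, OV and P are omitted. */
        c->psw = (uint8_t)((c->psw & 0x7F) | (sum > 0xFF ? 0x80 : 0));
        c->a = (uint8_t)sum;
        break;
    }
    case 0x60: {                                    /* JZ rel     */
        uint8_t rel = fetch(c);
        if (c->a == 0) jump_rel(c, rel);
        break;
    }
    case 0x70: {                                    /* JNZ rel    */
        uint8_t rel = fetch(c);
        if (c->a != 0) jump_rel(c, rel);
        break;
    }
    case 0x80: {                                    /* SJMP rel   */
        uint8_t rel = fetch(c);
        jump_rel(c, rel);
        break;
    }
    default:
        c->pc--;
        return -1;
    }
    return 0;
}
```

Calling `cpu_step()` in a loop and dumping `a`, `psw` and `pc` after each call gives a crude single-stepper. Peripherals on a non-standard part would have to be hooked in wherever SFR addresses are read or written, which this sketch does not do.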
D Yuniskis wrote:
>I remember thinking about "peephole optimizers" and wondering how
>they could be effective ("Shirley the compiler knows what code it
>*just* emitted? Why would it ever do something as inane as
>'STORE X; LOAD X'?"). But, if you saw how stanzas were "pasted"
>together, you could see lots of opportunities for this kind
>of micro-optimization!

My favorite example of obvious but missed optimization was a Fortran compiler for HP-1000 minis, that would always generate a return instruction at the end of a function even when the last statement was RETURN. (OK, for purists, the "return" instruction was an indirect jump through the first word of the function, let's call it JUMPBACK for now.)

So,

   FUNCTION xxx
   ...
   END

would generate a bunch of instructions ending in

   ...
   JUMPBACK

while this,

   FUNCTION xxx
   ...
   RETURN
   END

would generate a bunch of instructions ending in

   ...
   JUMPBACK
   JUMPBACK

To hamilton,

Yes - For "simple minded" compilers you can reverse generate code that is very close to the original. This FORTRAN compiler was very predictable, but I doubt you can find a Hewlett-Packard RTE-II system running it. You may try Ron Cain's original Small-C compiler for the 8080.

--
Roberto Waltman

[ Please reply to the group. Return address is invalid ]
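The two cases mentioned here ('STORE X; LOAD X' and the doubled JUMPBACK) can be sketched as a tiny peephole pass. This is only an illustration on a made-up instruction representation (`Insn`, `Op` and `peephole` are hypothetical names, not from any real compiler), and it assumes a single-accumulator machine with no label or branch target between the two instructions of a pair.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical instruction form for a single-accumulator machine. */
typedef enum { OP_LOAD, OP_STORE, OP_ADD, OP_JUMPBACK } Op;

typedef struct {
    Op   op;
    char operand[8];   /* variable name, empty for JUMPBACK */
} Insn;

/* Look at each instruction together with the one kept just before it
 * and drop the second instruction of two redundant pairs.
 * Compacts the array in place and returns the new instruction count.
 * Assumes no label sits between the two instructions of a pair. */
size_t peephole(Insn *code, size_t n)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        if (out > 0) {
            const Insn *prev = &code[out - 1];

            /* STORE X; LOAD X: X is still in the accumulator. */
            if (prev->op == OP_STORE && code[i].op == OP_LOAD &&
                strcmp(prev->operand, code[i].operand) == 0)
                continue;

            /* JUMPBACK; JUMPBACK: the second one is never reached. */
            if (prev->op == OP_JUMPBACK && code[i].op == OP_JUMPBACK)
                continue;
        }
        code[out++] = code[i];
    }
    return out;
}
```

The window is just two instructions wide, which is why such passes are cheap to bolt on after code generation: the code generator can keep pasting stanzas together without knowing what the neighbouring stanza ended with.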
In message <irjg31$mee$1@speranza.aioe.org>, Anders.Montonen@kapsi.spam.stop.fi.invalid writes
>Chris H <chris@phaedsys.org> wrote:
>> In message <irhc0q$p5d$1@speranza.aioe.org>, Anders.Montonen@kapsi.spam.
>> stop.fi.invalid writes
>>>joolzg <joolzg@btinternet.com> wrote:
>>>> Anybody got a simple 8052 emulator in C source, im trying to reverse
>>>> engineer some code and would like to emulate/simulate the code to get a
>>>> better understanding as it looks like it was written in C and compiled
>>>> by a very bad compiler
>>>There's the Daniel's s51 simulator[1] which is used in the SDCC[2]
>>>debugger.
>> I doubt it will work.
>
>Of course you do.

I know what the SDCC does and I know what the ADuC84* is.

--
Chris Hills Staffs England
On 25/05/11 21:16, D Yuniskis wrote:
> Even IRQs and peripherals could be hacked in -- if you *don't*
> need it to be fast!
>
> it would be a great "school project" to show how a processor works
> and how to write "boilerplate" code (to implement the machine).
Actually, fast run-time is not /that/ hard either - you just have to separate the "interpret" phase from the "run" phase, and have the "interpret" phase generate C code equivalent to the disassembly (i.e., when you see an instruction "add a, b", you write out the line "regs.a += regs.b;"). The joy is getting the flag registers right, and perhaps the timing, and including some way for interrupts to jump in. In the end, you have a humongous C function with a few lines per original disassembly line. Run that through your host compiler, and your 8052 runs at a few hundred MIPS.
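A rough sketch of that "interpret" phase, under loud assumptions: the function names (`translate_one`, `translate_all`) and the `regs` struct referenced in the generated text are hypothetical, only a few standard 8051 opcodes are handled, and flags, timing and interrupts (the "joy" part) are ignored entirely. Each instruction becomes one labelled C statement so that relative jumps can turn into `goto`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Translate the instruction at p (located at address addr) into one
 * line of C text in out. Returns the instruction length in bytes, or
 * 0 if the opcode is not handled or its operand byte is missing.
 * The emitted text assumes a hypothetical "regs" struct with a
 * uint8_t member "a" in the generated program. */
int translate_one(const uint8_t *p, size_t avail, uint16_t addr,
                  char *out, size_t outsz)
{
    if (avail == 0)
        return 0;

    unsigned a      = addr;
    unsigned arg    = (avail > 1) ? p[1] : 0;
    /* Relative jumps count from the address after a 2-byte instruction. */
    unsigned target = (uint16_t)(addr + 2 + (int8_t)arg);
    int len;

    switch (p[0]) {
    case 0x00: len = 1; snprintf(out, outsz, "L%04X: ;", a);            break; /* NOP   */
    case 0x04: len = 1; snprintf(out, outsz, "L%04X: regs.a++;", a);    break; /* INC A */
    case 0x14: len = 1; snprintf(out, outsz, "L%04X: regs.a--;", a);    break; /* DEC A */
    case 0xE4: len = 1; snprintf(out, outsz, "L%04X: regs.a = 0;", a);  break; /* CLR A */
    case 0x74: len = 2;                                                        /* MOV A,#imm */
        snprintf(out, outsz, "L%04X: regs.a = 0x%02X;", a, arg);
        break;
    case 0x70: len = 2;                                                        /* JNZ rel */
        snprintf(out, outsz, "L%04X: if (regs.a != 0) goto L%04X;", a, target);
        break;
    case 0x80: len = 2;                                                        /* SJMP rel */
        snprintf(out, outsz, "L%04X: goto L%04X;", a, target);
        break;
    default:
        return 0;
    }
    return ((size_t)len <= avail) ? len : 0;
}

/* Walk a code image linearly and write one C statement per instruction.
 * Returns the number of bytes translated before stopping. */
size_t translate_all(const uint8_t *code, size_t len, uint16_t base, FILE *out)
{
    char line[96];
    size_t i = 0;

    while (i < len) {
        int n = translate_one(code + i, len - i, (uint16_t)(base + i),
                              line, sizeof line);
        if (n == 0)
            break;
        fprintf(out, "    %s\n", line);
        i += (size_t)n;
    }
    return i;
}
```

A linear walk like this breaks down as soon as data is mixed into the code space or a jump lands in the middle of what was decoded as another instruction, and computed jumps (`JMP @A+DPTR`) need a dispatch table instead of a plain `goto`. It is a sketch of the idea, not a usable recompiler.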
In message <1ZGdndkNyORi8UDQnZ2dnUVZ7rOdnZ2d@lyse.net>, David Brown <david.brown@removethis.hesbynett.no> writes
>Actually, fast run-time is not /that/ hard either - you just have to
>separate the "interpret" phase from the "run" phase, and have the
>"interpret" phase generate C code equivalent to the disassembly (i.e.,
>when you see an instruction "add a, b", you write out the line "regs.a
>+= regs.b". The joy is getting the flag registers right, and perhaps
>the timing, and including some way for interrupts to jump in. In the
>end, you have a humongous C function with a few lines per original
>disassembly line. Run that through your host compiler, and your 8052
>runs at a few hundred MIPS.

AFAIR the ADuC84* is not a standard 8052 core and neither are the peripherals or memory map.

--
Chris Hills Staffs England
Hi David,

On 5/25/2011 1:56 PM, David Brown wrote:

> Actually, fast run-time is not /that/ hard either - you just have to
> separate the "interpret" phase from the "run" phase, and have the
> "interpret" phase generate C code equivalent to the disassembly (i.e.,
> when you see an instruction "add a, b", you write out the line "regs.a
> += regs.b". The joy is getting the flag registers right, and perhaps the
> timing, and including some way for interrupts to jump in. In the end,
> you have a humongous C function with a few lines per original
> disassembly line. Run that through your host compiler, and your 8052
> runs at a few hundred MIPS.
I was thinking of this more from the standpoint of debugging code in an IDE... "step, step, run-to-here, examine registers, etc.". I.e., if the OP doesn't know what the code *does*, I would think he would want to watch it work and *see* what it is doing instead of just hoping to catch some "results" in a "console window", etc. (i.e., how would you know what to feed it if you can't see what it is expecting).
On 5/25/2011 3:09 PM, Chris H wrote:
> AFAIR the ADuC84* is not a standard 8052 core and neither are the
> peripherals or memory map.

I understand extra peripheral registers added by a vendor to their 8052 core, but the base core registers should be the same, no?

hamilton
On Tue, 24 May 2011 13:46:31 -0600, hamilton <hamilton@nothere.com> wrote:

>On 5/24/2011 1:30 PM, D Yuniskis wrote:
>>
>> Actually, for some simple-minded compilers, you can often reverse
>> engineer the code to get much of the "C" source (neglecting
>> variable names, some expressions, etc.). This is especially
>> true of old/early compilers that didn't do much optimization.
>
>For years I have heard that story.
>
>I have always asked to show me any links with the compiler in question,
>So I will ask if you have any links to this "simple compiler" ?
>
>I took a compiler class 30 years ago, and my professor at the time
>stated that it was not possible.
>With the better compiler available today it would be even more impossible.
Either your professor was mistaken or you misunderstood. The point of decompiling is not to recover the original code, but rather simply to get something that's easier to work with than an assembler listing. You can reverse engineer the output of any compiler. However, the best you can achieve is /equivalent/ source to that which was originally compiled. Only for very simple programs can anything looking like the original source be recovered.
>>> Even if you have the compiler sources and understood the compile
>>> process, you still would not be able to get the binary -> C conversion
>>> to work.

Compilers generate code from templates - each construct in the source language is mapped to one or more sequences of assembler that are strung together to realize the construct. Once you learn the patterns generated by your compiler, many source language constructs are easily identifiable in /unoptimized/ assembly.

Certain optimizations make things a lot harder. Inline expansion, loop and function fusion and others can modify the code so that patterns and boundaries visible in the original source no longer are recognizable. However, the ordering of data dependencies has to be maintained (otherwise the optimizer is broken), so by identifying these dependencies and constructing chains of dependent operations, you can always work backwards to a sequence of source language code that will produce equivalent results.

This is how decompilers work. There are a number of them available for various languages.

George
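To make the "chains of dependent operations" idea concrete, here is a toy sketch, not how any particular decompiler is implemented. It assumes a made-up single-accumulator instruction set (`T_LOAD`, `T_ADD`, `T_SUB`, `T_STORE`) and straight-line code with no branches. It tracks symbolically what the accumulator holds and, at each store, writes out the expression the chain has built up. The names `TInsn` and `decompile` are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical accumulator machine, straight-line code only. */
typedef enum { T_LOAD, T_ADD, T_SUB, T_STORE } TOp;

typedef struct {
    TOp         op;
    const char *var;   /* memory operand name */
} TInsn;

/* Rebuild source-like assignments by following the dependency chain
 * through the accumulator. Writes one "x = expr;" line per STORE into
 * out. Returns the number of statements, or -1 if out is too small.
 * Overlong expressions are silently truncated in this sketch. */
int decompile(const TInsn *code, size_t n, char *out, size_t outsz)
{
    char acc[128] = "";   /* expression currently held in the accumulator */
    char tmp[128];
    size_t used = 0;
    int stmts = 0;

    if (outsz == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        switch (code[i].op) {
        case T_LOAD:
            snprintf(acc, sizeof acc, "%s", code[i].var);
            break;
        case T_ADD:
        case T_SUB:
            snprintf(tmp, sizeof tmp, "(%s %c %s)", acc,
                     code[i].op == T_ADD ? '+' : '-', code[i].var);
            memcpy(acc, tmp, sizeof acc);
            break;
        case T_STORE: {
            int w = snprintf(out + used, outsz - used, "%s = %s;\n",
                             code[i].var, acc);
            if (w < 0 || (size_t)w >= outsz - used)
                return -1;
            used += (size_t)w;
            stmts++;
            /* The accumulator now equals the variable just stored. */
            snprintf(acc, sizeof acc, "%s", code[i].var);
            break;
        }
        }
    }
    return stmts;
}
```

The recovered text is /equivalent/ source in the sense described above: the variable names here only survive because the toy input carries them, and a real binary would yield addresses or invented names instead.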