EmbeddedRelated.com

8052 emulator in C

Started by joolzg May 24, 2011
In reply to "Anders.Montonen@kapsi.spam.stop.fi.invalid" who wrote the 
following:

> joolzg <joolzg@btinternet.com> wrote:
> > Anybody got a simple 8052 emulator in C source, im trying to reverse
> > engineer some code and would like to emulate/simulate the code to get
> > better understanding as it looks like it was written in C and compiled
> > by a very bad compiler
>
> There's the Daniel's s51 simulator[1] which is used in the SDCC[2]
> debugger.
>
> -a
>
> [1] <http://mazsola.iit.uni-miskolc.hu/~drdani/embedded/ucsim/>
> [2] <http://sdcc.sourceforge.net/>
Thanks, but it's in Delphi/Pascal, so I would have to learn Pascal again to do my mods.

joolz
On Tue, 24 May 2011 13:46:31 -0600, hamilton <hamilton@nothere.com>
wrote:

>On 5/24/2011 1:30 PM, D Yuniskis wrote:
>> Hi Hamilton,
>>
>> On 5/24/2011 12:23 PM, hamilton wrote:
>>> On 5/24/2011 2:11 AM, joolzg wrote:
>>>> Anybody got a simple 8052 emulator in C source, im trying to reverse
>>>> engineer
>>>> some code and would like to emulate/simulate the code to get a better
>>>> understanding as it looks like it was written in C and compiled by a
>>>> very bad compiler
>>>
>>> You don't what a emulator, you want a de-compiler or reverse compiler.
>>>
>>> An emulator will just execute the binary code as the real hardware would.
>>>
>>> Using the binary to get the C back is impossible !!!!
>>
>> Actually, for some simple-minded compilers, you can often reverse
>> engineer the code to get much of the "C" source (neglecting
>> variable names, some expressions, etc.). This is especially
>> true of old/early compilers that didn't do much optimization.
>
>For years I have heard that story.
>
>I have always asked to show me any links with the compiler in question,
>So I will ask if you have any links to this "simple compiler" ?
>
>I took a compiler class 30 years ago, and my professor at the time
>stated that it was not possible.
>With the better compiler available today it would be even more impossible.
I guess "simple compiler" refers to some 1970s compilers for the PDP-11, Intel Intellecs and Motorola Exorcisers. Writing compilers for these platforms was difficult because of the 64 KiB address-space limit. Overlay loading helped a lot (each compilation phase in a separately loaded overlay branch), but you still had to reserve space for the symbol table, which had to stay in memory at all times. Overlay loading from floppies was also very slow, so little optimization could be done. For this reason, assembly-language output from a compiler was not the norm.

I once wrote an object code disassembler for the PDP-11. Compared to an ordinary disassembler, an object code disassembler can also display the global symbols defined in the module, as well as any external function names (including library function names), in plain text. I analyzed quite a few object files generated by Fortran, Pascal and C compilers and was able to work out, by "organic matching", how each compiler generated code. After that, it was quite easy to reverse engineer some algorithms. These days, with good compilers, it is much harder to reverse engineer things from the executable code alone.
joolzg <joolzg@btinternet.com> wrote:
> In reply to "Anders.Montonen@kapsi.spam.stop.fi.invalid" who wrote the
> following:
>> joolzg <joolzg@btinternet.com> wrote:
>> > Anybody got a simple 8052 emulator in C source, im trying to reverse
>> > engineer some code and would like to emulate/simulate the code to get
>> > better understanding as it looks like it was written in C and compiled
>> > by a very bad compiler
>> There's the Daniel's s51 simulator[1] which is used in the SDCC[2]
>> debugger.
>> [1] <http://mazsola.iit.uni-miskolc.hu/~drdani/embedded/ucsim/>
>> [2] <http://sdcc.sourceforge.net/>
> thanks but its in delphi, pascal so would have to learn pascal again to do my
> mods
The DOS version was written in Pascal; the Unix version is written in C++, as you would have noticed if you'd downloaded the source code.

-a
In message <4ddcbba5$0$1206$c3e8da3$4e334b76@news.astraweb.com>, joolzg
<joolzg@btinternet.com> writes
>In reply to "hamilton" who wrote the following:
>
>> On 5/24/2011 2:11 AM, joolzg wrote:
>> > Anybody got a simple 8052 emulator in C source, im trying to reverse
>> > engineer
>> > some code and would like to emulate/simulate the code to get a better
>> > understanding as it looks like it was written in C and compiled by a very
>> > bad
>> > compiler
>> >
>> > joolz
>>
>> You don't what a emulator, you want a de-compiler or reverse compiler.
>>
>> An emulator will just execute the binary code as the real hardware would.
>>
>> Using the binary to get the C back is impossible !!!!
>>
>> Except for very simple programs.
>>
>> Even if you have the compiler sources and understood the compile
>> process, you still would not be able to get the binary -> C conversion
>> to work.
>>
>> But, have fun and good luck.
>>
>> hamilton
>
>Ive got that already, i want to SIMULATE THE CODE and give the code
>real inputs
>so i can validate my findings
>
>I will be rewriting it for another cpu as well so want to find out as much
>
>joolz
Use the Keil simulator.

--
Chris Hills  Staffs  England
In message <irhc0q$p5d$1@speranza.aioe.org>, Anders.Montonen@kapsi.spam.stop.fi.invalid writes
>joolzg <joolzg@btinternet.com> wrote:
>> Anybody got a simple 8052 emulator in C source, im trying to reverse
>> engineer some code and would like to emulate/simulate the code to get a
>> better understanding as it looks like it was written in C and compiled
>> by a very bad compiler
>
>There's the Daniel's s51 simulator[1] which is used in the SDCC[2]
>debugger.
I doubt it will work.

--
Chris Hills  Staffs  England
In message <4ddcbb4c$0$1469$c3e8da3$fb483528@news.astraweb.com>, joolzg
<joolzg@btinternet.com> writes
>In reply to "Chris H" who wrote the following:
>
>> In message <4ddb6833$0$1509$c3e8da3$efbdef2c@news.astraweb.com>, joolzg
>> <joolzg@btinternet.com> writes
>> > Anybody got a simple 8052 emulator in C source, im trying to reverse
>> > engineer
>> > some code and would like to emulate/simulate the code to get a better
>> > understanding as it looks like it was written in C and compiled by a very
>> > bad
>> > compiler
>>
>> What is the target MCU? The 51 family is huge (over 600 variants) and
>> whilst the cores are similar there are some big differences.
>>
>Analog Devices ADuC84x
This is NOT a true 8051/52 core. Read the documentation: it is "based on" an 8052. Not all 8051 simulators will handle non-standard 8051 parts like this one.
>> Why do you want the source of the simulator?
>>
>So i can add in a serial driver, also the output display, you know make the
>simulator behave like the real thing with inputs and outputs
Then use the Keil simulator, which can already do this.
>> How do you know the binary was written in C?
>I can tell from the way the code is written!! cant you tell the differnece
>between human and machine created code
Yes. However, you cannot tell which HLL was used.
>> How big is the binary?
>64k but not all used
>> What is it supposed to do?
>cant say
Use the Keil simulator.

--
Chris Hills  Staffs  England

joolzg wrote:

> In reply to "Chris H" who wrote the following:
>
> > In message <4ddb6833$0$1509$c3e8da3$efbdef2c@news.astraweb.com>, joolzg
> > <joolzg@btinternet.com> writes
> > > Anybody got a simple 8052 emulator in C source, im trying to reverse
> > > engineer
> > > some code and would like to emulate/simulate the code to get a better
> > > understanding as it looks like it was written in C and compiled by a very
> > > bad
> > > compiler
> >
> > What is the target MCU? The 51 family is huge (over 600 variants) and
> > whilst the cores are similar there are some big differences.
> >
> Analog Devices ADuC84x
>
> > Why do you want the source of the simulator?
> >
> So i can add in a serial driver, also the output display, you know make the
> simulator behave like the real thing with inputs and outputs
>
> > How do you know the binary was written in C?
> >
> I can tell from the way the code is written!! cant you tell the differnece
> between human and machine created code
>
> > How big is the binary?
> >
> 64k but not all used
>
> > What is it supposed to do?
> >
> cant say
You are going to a lot of work to reverse engineer an application. Why is this needed?

w..
Chris H <chris@phaedsys.org> wrote:
> In message <irhc0q$p5d$1@speranza.aioe.org>, Anders.Montonen@kapsi.spam.stop.fi.invalid writes
>>joolzg <joolzg@btinternet.com> wrote:
>>> Anybody got a simple 8052 emulator in C source, im trying to reverse
>>> engineer some code and would like to emulate/simulate the code to get a
>>> better understanding as it looks like it was written in C and compiled
>>> by a very bad compiler
>>There's the Daniel's s51 simulator[1] which is used in the SDCC[2]
>>debugger.
> I doubt it will work.
Of course you do.

-a
On 24/05/11 10:11, joolzg wrote:
> Anybody got a simple 8052 emulator in C source, im trying to reverse engineer
> some code and would like to emulate/simulate the code to get a better
> understanding as it looks like it was written in C and compiled by a very bad
> compiler
>
> joolz
It shouldn't be too hard to write a simulator yourself for a processor like this. It's quite an effort if you want it to be fast, or to accurately simulate interrupts and peripherals, but the core itself is easy: you have an array to hold "ram", an array for "flash", a struct holding the registers, and a huge switch statement interpreting each instruction.
Hi Hamilton,

On 5/24/2011 12:46 PM, hamilton wrote:
>>> Using the binary to get the C back is impossible !!!!
>>
>> Actually, for some simple-minded compilers, you can often reverse
>> engineer the code to get much of the "C" source (neglecting
>> variable names, some expressions, etc.). This is especially
>> true of old/early compilers that didn't do much optimization.
>
> For years I have heard that story.
>
> I have always asked to show me any links with the compiler in question,
> So I will ask if you have any links to this "simple compiler" ?
>
> I took a compiler class 30 years ago, and my professor at the time
> stated that it was not possible.
<grin> It's relatively easy to disprove a negative. :> I'll drag out some examples and post them here. I think you'll see that most of these early compilers were pretty "straightforward" in the way they emitted code. You could look at stanzas and deduce what they were created from (of course, you couldn't tell "a == b" from "b == a", though sometimes you could distinguish "a > b" from "b < a"!).

I remember thinking about "peephole optimizers" and wondering how they could be effective ("Shirley the compiler knows what code it *just* emitted? Why would it ever do something as inane as 'STORE X; LOAD X'?"). But once you saw how stanzas were "pasted" together, you could see lots of opportunities for this kind of micro-optimization!

Perhaps Walter can shed some light on what his products were doing in the mid 80's and how they've progressed (along with *why*)?
> With the better compiler available today it would be even more impossible.
A lot depends on the code being compiled, the level of optimization used, the optimizations *available* and the target itself. E.g., older "single register" machines required lots of shuffling to get arguments into an accumulator where they could be operated on. Also, older devices didn't have niceties like "MUL" or (gasp!) "DIV", so the repertoire of "helper functions" gave you lots of insight into what the code was actually doing. And those helpers didn't have "short-circuits" where the compiler could do a "partial" operation, etc.
>> I was able to recreate C source for a client's libraries from
>> binaries using this approach. Though it required a fair bit of
>> "organic computing" to recognize the "patterns" in the code
>> (a decompiler wasn't available). Of course, familiarity with
>> the product (application) goes a long way -- especially when
>> it comes to annotating the sources!
>
> Being familiar with the code is the only way to get back the C code.
I disagree. You can get back code that will recompile into the same binary. You can further embellish that with some ideas as to what the code is *likely* doing. As for the ultimate application... <shrug>

If you have the compiler (and binary libraries) available, you have a huge head start. You can feed it test cases to see what the code looks like for various C constructs. You can see which helper functions get dragged in and, thus, start giving those real "names".

If you have the hardware available (or at least the memory map), you have known starting points for the code, instead of picking a spot "at random".

Chances are, it uses some part of the standard libraries. These are relatively easy to recognize, so you can put names on their entry points and back-annotate all references to them as they are encountered.

It's trivial to identify the strings in most applications (though some might go to some lengths to protect or hide them, that is rare and starts competing with the compiler, since *it* has a notion of what constitutes a "string"). So, library functions that use strings (e.g., printf et al.) can be identified. Also, strings often give you information about the data *referenced* there (e.g., "%d records processed.\n").

Finally, most older processors used in embedded systems were small. Few systems could afford gobs of (EP)ROM for multimegabyte images. Likewise, tens of KB of RAM was a lot. It's not like trying to reverse engineer MSBloatware...
> But the OP seems to have no knowledge of the application.
>
> I have lost sources in disk crashes and have had to re-create the C
> sources by watching the operation of the application.
>
> reverse-engineering is always easier when you have a good idea of what
> is suppose to happen.
Sure. But it isn't a necessary prerequisite. There are (big name) firms whose businesses are based on reverse engineering other people's products, e.g., to make something "compatible" with a closed system. In the process, one can often find obvious "mistakes" or opportunities for improvement that the original designers overlooked.

One of my first jobs was at a firm that designed marine navigation equipment (among other things). I recall the "excitement" when a Japanese firm expressed an interest in one of our RADAR sets. I think they purchased 25 of them "for evaluation". Some time later, *they* produced a similar product. It was very obvious that it was "heavily inspired" (avoiding the term "copied") by our set.

My boss grumbled at the lost business and at having been "suckered". In the next breath, he pointed out how the "competing design" had lots of little changes that were incredibly obvious after the fact... but that had been omitted from our design!

E.g., the antenna (rotor) emitted rotational pulses to tell the display which way it was pointed. This allowed the sweep on the display to be synchronized (angularly) to the antenna's position. Of course, this was done by mounting an optointerrupter and encoder wheel (slotted disc) on the antenna's shaft. I think the encoder had perhaps 1 degree azimuth resolution, or something like that. The disc was relatively costly to manufacture since it was made photographically, etc.

The competing product had a crude disc with perhaps 9 (!) slots cut in it. It looked like something a child would fashion out of cardboard. *But* the disc was mounted on the high-speed side of the reduction gearbox that drove the antenna shaft, so it rotated 40 times faster than the antenna! (I.e., the same sort of information as coming from the antenna, but at much lower manufacturing cost.) Without seeing "our design" with that modification made to it, I doubt it would ever have occurred to anyone! <:-(
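Taking the recalled figures at face value (about 9 slots, a 40:1 gear reduction), the arithmetic shows why the cheap disc was enough:

```latex
9\,\frac{\text{slots}}{\text{disc rev}} \times 40\,\frac{\text{disc rev}}{\text{antenna rev}}
  = 360\,\frac{\text{pulses}}{\text{antenna rev}}
  \quad\Longrightarrow\quad
  \frac{360^\circ}{360\ \text{pulses}} = 1^\circ \text{ per pulse}
```

If the numbers are as remembered, that matches the roughly 1 degree resolution of the photographically made disc on the antenna shaft.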
