Minix Port
Home ] New Stuff ] [ Minix Port ] Magic-2? ] Overview ] Photo Gallery ] Construction ] Technical Info ] My Other Projects ] Links ]

 

Minix Port Diary

1/04/09

Well, it's been quite a while since I gave a Minix update.  First, the biggest change is that I finally got around to rewriting the boot loader such that I no longer have to boot Minix from files loaded into my old Magic-1 Monitor/OS.  The new boot loader is quite a bit more flexible (it has a shell with which you can set environment variables that can be read by Minix during the boot process), and most importantly it understands my variant of the Minix FS - and will read the kernel from the target file system.  I've still got some cleanup to do, but I'm pretty close to completely retiring my the original OS.  I'll keep the ability to boot it up - just for old times sake.

Other than that, there's just been a lot of performance tuning, and the addition of some new applications (such as an IRC client, Tom Pittman's Tiny Basic, the Small C Compiler, and probably some others I've forgotten about).  I also reworked the directory structure a bit - in particular putting the games at /usr/local/bin/games.

I'll probably archive this page shortly, and put any updates on the "New Stuff" page.  Other than completion of the demand paging implementation (which I may, or may not, ever get around to), the Magic-1 Minix port is pretty much done.  [Oh, and I also plan on posting all of the source code.  I just need to review the various licenses of the code I munged to make sure I'm not stepping on anyone's toes].

6/10/08

The web server is running at an acceptable speed now (though I'd still like it to be a bit quicker).  There were a number of performance issues.  First, until I get the new VM rewrite completed, fork/exec will continue to be painfully slow.  The standard way of handling httpd servers under Minix is to use a small utility (tcpd) which will look for incoming packets at particular ports, and then fork/exec a handler to deal with them.  The fork of tcpd is a little painful, but the exec of httpd (reading the image from disk) cost a couple of seconds per session.  Fortunately, the author of the code included a daemon loop which allows httpd to stay alive and simply fork copies of itself for each new session.  Enabling that mode helped a bit.

Second, I moved the www root into a ram disk.

Third, I eliminated a reverse host lookup on incoming ip addresses.  Useful for logs and debugging, but not worth the delay for Magic-1.

The last  was the costliest problem.  The code which reads the incoming requests does so a single charactor at a time, while setting up timeout handlers in between each char.   Because request headers are pretty long now and reads end up threading through several layers of the OS, this caused a multiple-second delay per request.  I'm going to think about alternate ways of handling this, but for now I'm just halting the parse as soon as I see it's an HTTP_GET & URI.

Now I need to rework Magic-1's web site.  The old uIP one was nice in that it delivered all sorts of real-time stats about the stack and Magic-1.  I'll want the same under Minix.  Shouldn't be too hard - I've already added some cgi programs to do a "ps -ef" and 'finger".

6/9/08

Solved the issue with the FTP (and other) hangs.  Turns out it was a configuration problem.  My previous setup forced a MTU of 576 bytes for the SLIP connection (on the NSLU2).  The Minix networking code, however, would send bigger datagrams, and they would just get dropped before making it out onto the network.  I reconfigured the NSLU2 and all is well.  FTP works, as does the web server.  However, the web server is incredibly slow - so slow that I think there must be some sort of timeout happening.  It takes 10 or so seconds to serve a single small page.  Need to investiage this further.

6/7/2008

Just passed one of the last remaining hurdles for the Minix port - I've (mostly) gotten the full Minix TCP/IP stack working.  Minix network support exists in the INET server, and it's a somewhat large (relatively) and complex piece of code.  When I first attempted to build it last year, the code size was much bigger than would fit in a Magic-1's 64Kbyte code spaces.  At the time I figured that I'd have to support code overlays to get it working.

However, while getting sc to work I did another round of code generation improvements.  Just for kicks, I decided to built INET again to see if it benefited from the improved code generation.  As it turns out, it didn't - but this time I looked at the INET code a bit closer and realized that a big chuck of it was unnecessary.  Magic-1 doesn't have native ethernet interfaces - but uses one of the serial ports with SLIP.  I spent a couple of hours slicing out the ethernet-specific code in INET and found that it was within about 6K bytes of fitting.    So, I dug a little deeper and found some functionality I could live without.  Specifically, Minix's INET has some moderately capable routing support - which I would never use.  So, I sliced that out and INET fit with room to spare.

Amazingly, it also worked on first attempt (at least partially).  It took a few iterations to get it configured properly to use the correct nameservers, but most everything works.  I can ping out, create multiple incoming telnet sessions, finger and talk.  Ftp is a bit odd - I can ftp into Magic-1, but downloading via ftp doesn't work.  Also, I still get an ocassional telnet session hang.  Not sure what's going on there - I probably sliced out a bit too much code somewhere.

As expected, the performance is noticably slower than with Adam Dunkels' uIP TCP/IP stack that I was using.  Also, the cannonical Minix web server is way too heavyweight and slow for Magic-1.  I'll probably graft uIP's httpd with a low-level of the Minix stack to get Magic-1 back in the web server business.  Meanwhile, though, it's looking good.  It's fun to have more comprehensive network capabilities.  And, at least for the time being, I've opened up some (probably unsafe) ports.  You can see who is currently online by doing a:

finger @magic-1.org

from a Linux machine (don't know if there is a finger for Windows).  Talk also works internally on my local network, but rejects conversions from the outside world.  If I find a few minutes, I'll change the code to open it up to everyone.  Anyway, try it out.  Previously, telnet sessions were supported using a nifty gadget called a "device server".  When you telnetted into Magic-1, you were really telnetting into the device server, which was hooked up to Magic-1's console serial port.  Now, when you telnet in, you're coming in on the SLIP line.  The console port will be attached the a real VT100 terminal that I picked up from eBay a while back.  I should have the web page back online within the next wek or so.

5/26/2008

Getting side-tracked on hobby projects is part of the fun, I guess.  I'd intended to work on a native compiler tool chain for Magic-1, but instead spent part of Saturday porting the old "sc" spreadsheet.  This is a text-mode (curses) spreadsheet originally written by James Gosling of Java fame, but heavily modified since.  There is a current version you can install on Linux these days, but it is much too big to fit on Magic-1.  To get it working, I had to go all the way back to a set of source files (version 2.1) posted to the Usenix group net.sources back in 1987.  It was written in an ancient C dialect and took a bit of fixing up to work with my ANSII compliant lcc C compiler.  I also fixed a minor bug along the way and made it work directly with the curses library instead of being hard-coded for a VT100 terminal.

However, it works great - and it amuses me to no end to have a working spreadsheet on Magic-1.   It's very slow, but it works.  Telnet in and try "sc".  I found a tutorial file from a later version that is somewhat useful.  Try:

sc /usr/lib/sc/tutorial.sc

The arrow keys will move you around.  "q" quits.  Type "=" to enter a value or function.  It only knows about three functions, @sum(a0:a5), @prod(a0:a5), @avg(a0:a5)  [replace "a0:a5" with whatever cell range you want].

I also ran cross a nice version of yahtzee and ported it.  Try "yahtzee".

Anyway, lots to do at work and around the house, so I'm not sure when I'll get back to the toolchain work.  We'll see.

 

5/19/2008

It's been awhile since an update - mostly because I haven't done too much with the project lately.  Over the weekend, though, I played a bit with the old Small-C Compiler.  This was an early compiler for a simple subset of C which was popularized through Dr. Dobb's Journal back in the early 80's.  I remembered it while reading about Steve Chamberlin's ongoing homebrew project, and thought it might be a good match for his machine.  Just for kicks, I retargeted it for Magic-1.  It generates pretty awful code, but I can see how to significantly improve it.   That little effort has kick-started my interest in Magic-1.  One of the missing pieces for Magic-1 right now is that it doesn't have a native tool chain.  I have to build all of its software in a cross-development environment under Linux and then copy it over.

The problem is that my C compiler, LCC, requires far more memory that Magic-1's 64K code/64K data model allows.  Anyway, the Small-C compiler is now running natively on Magic-1 as "scc".  It generates assembly code, and though it's a simple subset of C it is written in that subset and can compile itself.  Log onto Magic-1 and try it out.  There's an example file in the guest home directory:

scc -t hello_world.c

...and look at the result code in hello_world.s

Of course, scc is only part of what is needed for a native toolchain.  I previously wrote an assembler and linker - but they also require too much memory to run natively.    I've decided to write a new macro assembler for Magic-1 that will run natively, as well as a new linker.  My current assembler is a quick-and-dirty affair - single pass with backpatching and no macro facility.  The new one will be a two-pass assembler with support for nested include files and macros. I know how to write a macro assembler, so that shouldn't be a problem.  The more challenging thing will be to write a native linker which can link targets that will use up all available code and data space.   The actual symbol resolution won't be the hard part - managing the memory in a limited system will be the challenge.  It's an interesting challenge though - hope to get to it soon.

1/15/2008

I've been doing quite a bit of fiddling with the project lately - but not much on the Minix port.  The main thing I've been doing is replacing the Ubuntu Linux box I'd been using as a SLIP bridge for Magic-1 with a very slick little gadget: a Linksys NSLU2.  The NSLU2, or "slug", is sold as a cheap ($80) device to convert USB hard drives into network storage.

By itself, that isn't particularly interesting, but a few years ago my brother Jim Buzbee (and a few other folks) discovered a back door into the device and were eventually able to login to the device (which runs a streamlined version of Linux) and take control.  From there, a large and active developer community arose and now there are multiple distribution of Linux that can be flashed onto the tiny box.  You can turn it into a file server, media server - just about anything.

What I wanted to do was to get rid of an aging PC that I pretty much only used to give Magic-1 an internet connection via SLIP.  It took a bit of fiddling, but it works.  I reflashed my "slug" with the OpenWRT distribution, and am using a USB to RS232 serial adapter to connect to Magic-1.  Via port forwarding, HTTP requests sent to my home DSL line (magic-1.org) eventually wind up at the slug, which forwards them on to Magic-1 via the serial port.

In case anyone else needs to do something similar - be aware that most of the stripped-down Linux distributions for the slug were compiled with SLIP support excluded.  I had to build my own kernel, as well as my own slattach.  Along the way, I also ported kermit to the box - and best of all, am now using it as the cvs server for all Magic-1 source code.

Oh, and I forgot to mention.  All of the slug software is stored in the internal flash of the device (and I've got a 1GB USB thumb drive for cvs storage).  The little box is fanless, totally quiet and draws a trivial amount of power.  It comes with a standard wall wart for power, but I hacked up a cable to run it directly off of Magic-1's power supply.

Here's the place to go to learn more about the NSLU2, as well as some of the other devices that have been enhanced via creative hacking:  http://www.nslu2-linux.org.  There is also an active Yahoo group: http://tech.groups.yahoo.com/group/nslu2-linux

Finally, here links to some of the articles my brother Jim wrote about slug-hacking:

http://www.nslu2-linux.org/wiki/Main/Articles

In other Magic-1 news, I finally got around to adding SLIP compressed header support to my uIP TCP/IP stack.  I used an early version of Van Jacobson's code as a base.  It took me a while to get it working, because I missed one of the places in the code that assumed an int was 32-bits wide.    Anyway, it works well now and I believe I it's noticeably faster.

12/8/2007

Logic analyzers are wonderful things.  Before starting this project, I'd never known how fantastically useful they can be to assist software debugging.  Over the last several days, I've been tracking down a very nasty bug.   Thanks to my logic analyzer, I just found it.  Here's a picture of the setup:

The bug turned to be exceedingly subtle and nasty - a microcode error in the ENTER instruction that only shows up in a very unusual situation.  ENTER is Magic-1's frame creation instruction.  On entry to a procedure, ENTER will allocate an activation record on the stack by subtracting the size of the activation record from stack pointer (SP), and then push the previous SP.   The bug showed up under the following conditions:

bulletWhen the frame size is added to SP, the resulting address is odd and points to the last byte of a page.
bulletThe page pointed to is present, and writeable.
bulletThe previous page (i.e. - the "next" page when a stack grows down) is *not* present.

What happens is when ENTER attempts to push the previous SP on the frame, the store of the first byte of the PSP succeeds and SP is decremented and updated.  Next, when ENTER tries to write the 2nd byte it traps with a Page Not Present page fault.  The bug is that SP has already been updated following the write of the first byte.  So, after the page fault handler completes, ENTER will be re-executed - but with a bad SP value.  From that point on, the stack is corrupted.

What made the problem so very nasty  is that this situation is not only very rare, but the effect of the bug won't be felt until the called procedure tries to return.  At that point control will fly off into the weeds.  The only program that I have that would hit it occasionally is the web server.  And, I couldn't easily reproduce it - all I knew for sure what that every now and then, the web server would hang.  Luckily for me, though, the web server would tend to hang by falling into a tight infinite loop, and it stayed alive enough that I could probe its state with an interrupt handler.

Here's how the logic analyzer helped.  Among the capabilities of a logic analyzer is recording state traces.  You connect probes to the signals you care about, as well as a clock.  You can then set up conditions to tell the logic analyzer when to start and stop recording the state of those signals when the clock fires.  What I did was attach state probes to the address bus, data bus, RW signal, interrupt enable signal and instruction init signal.  The logic analyzer was then set to record all state and stop whenever the web server started executing in the "weeds".  This would give me a trace of the instruction and data flow just before control flow went wacky.

I could then examine the state trace backwards and try to figure out how we got off into the weeds.  It took five or six traces for me to get enough info (and the web server would sometime run for hours before the bug was triggered).  One problem  was that several hundred thousand clock cycles separated the bad ENTER and the eventual flying off into the weeds and my ancient logic analyzer can store no more than 1024 states.  Fortunately, you can set up the logic analyzer to only store certain states.  Each failure trace gave me a little more info, and I was eventually able to find the smoking gun.

This bug has been in Magic-1's microcode for years.  Without a logic analyzer I doubt I ever would have found it.

Tomorrow, I'll burn new microcode EPROMs and put Magic-1 back online.  I still haven't found the cause of the occasional RS232 framing errors.  I'll look at that next.  After that - completion of the Minix demand paging implementation.

Oh, and here's another picture of the logic analyzer setup:

12/2/2007

I spent a little time yesterday looking at the serial port issues.  First off, I cobbled together an RS232 loop-back plug.  This is a device that plugs into a serial port and connects the transmit and receive lines - so that anything transmitted goes right back in.    Then, I wrote some simple programs to pump data through the serial port and then receive it again and test that it was all properly received.  This flushed out two software errors on my part.  First, I wasn't correctly configuring the Minix serial ports, causing some handshaking errors.  Next, I found that I was getting some data overrun about once every 200K chars sent/received.  The latter problem turned out to be an interrupt latency issue.  I had previously convinced myself that Magic-1 could respond to a hardware-handshake signal within the time it took to transmit 2 chars at 19,200 BAUD.  Occasionally, though, it apparently cannot.  By widening the response window from 2 chars at 19,200 to 8 chars that problem went away.  I'd still like to know where that long-latency path is.  When I get a chance, I should be able to find out via my logic analyzer.  Meanwhile, though, I'll just keep the window at 8 chars.

Unfortunately, I still haven't tracked down the sporadic framing error problem.  I've eliminated a few of the easy possibilities, and currently tend to believe it may be a noise issue.  I added some extra bypass capacitors to the device card, but that didn't help.  A Google search turned up one schematic with a note that suggested that the Max233 chip I'm using to generate the RS232 signals is very susceptible to noise and recommended using pull-up resistors.  Perhaps I'll try that.

In the afternoon, though, I got a little sidetracked.  My uIP-based web server has been hanging quite a lot lately and I decided to try to track the problem down.  To give me some more info about where it was hanging, I added a bit of code to cause the web server to generate a stack trace upon receipt of a SIGUSR1 signal.  This worked well, but showed me that whatever the problem is, it causes the web server to fly off into the weeds with a corrupted stack.  So, I've got a memory corruption problem, but don't know where.  I think I see a way to use the logic analyzer to help debug this.  Fortunately, the hang consistently sends control to a code region that just happens to cause a tight loop.  I can set up the logic analyzer to record a program counter trace and stop tracing when control flies into the bad region.

The logic analyzer, however, takes up a lot of space - so I'll have to take over the kitchen table for this debugging.  Monica and the girls, though, have the table pretty much locked up with Christmas decorations and craft projects.   Further debugging of this problem will have to wait for a week or two.  Meanwhile, I went ahead and added a very ugly kludge to the web server to keep it running.  A timer goes off every 15 seconds and examines the program counter.  It if happens to be in the bad region, it assumes that the web server is hung and resets it.  Ugly, but that should keep things running better until I can debug further.

Finally, last night I got an email from another homebrew CPU builder.  Very nice project - check out:

www.stevechamberlin.com/cpu

11/22/2007

Time to continue with the demand paging work - before I forget where I am.  I have been doing a little fiddling with Magic-1 since VCF.  Just before the show, I got ahold of an old VT-100 terminal.  I'd hoped to use it at VCF, but it was acting a little flakey.  It turns out that the main logic board was defective.  I replaced it and added the advanced graphics option board and it works great now.

There was, however, a bit of a problem.  The VT-100 does not understand hardware handshaking.  Flow control was done using the XON-XOFF protocol, which I hadn't implemented in my rewrite of the Minix RS232 driver (I rewrote the driver to enable use of the 16550's FIFO mode).   I expected adding XON-XOFF flow control to be trivial, but I could not get it to work.  I finally borrowed an ancient HP Serial Data Analyzer from work and eventually tracked the problem down to a long-standing Minix bug.  The Minix TTY/RS232 code that examines the termios structure to look for the flag that enables XON/XOFF flow control was looking at the wrong field.  It was likely just a typo - the code was looking for the IXON flag in the "c_lflag" field rather than the "c_iflag" field.  Once that was fixed, the old VT-100 was up and running all the way to 19200 BAUD.

That led me to my current problem.  While I was messing around with the serial ports, I decided to look a little closer at the throughput of my web server (which uses SLIP to connect to the internet via Magic-1's auxiliary serial port).  I've noticed some strange behavior.  It is sometimes much slower than I think it should be, and it also get quite a few packet checksum errors.  To make a long story shorter, it turns out that I am getting a significant number of framing errors on the auxiliary serial port.   I'm pretty confident that my sending and receiving baud rates within spec, and I've also noticed that the framing errors seem to only show up at the beginning of a packet transmission.  I ran several hours of a ping flood without getting any framing errors.  However, with sporadic TCP/IP packets I can usually get one every couple of minutes.

There are quite a lot of things I can do the track this down further.  I don't yet know whether I have the same issue on serial port 0, or even if the framing errors are showing up on the sending side (i.e. - the Linux box on the other end of the SLIP connection).  Some things to look at:

bulletis there a cable problem?   Try with a different cable.
bulletWhat magic incantation do I need to get Linux to report framing errors on its end of the SLIP connection?
bulletSwap serial ports 0 and 1, and see if the problem show up using port 0.
bulletThe UARTs are clocked by a 1.8432 Mhz oscillator located on the device card.  Perhaps there is too much noise on the card.  Rewire the oscillator to minimize crosstalk (diagonal wire, perhaps even go twisted pair with a grounded wire).
bulletDouble-check wiring, particularly the signal ground.
bulletBreak out the oscilloscope and look for noise.
bulletBorrow Ken's more recent serial data analyzer - perhaps it will tell me something.

Once I get this problem tracked down, I'll revising demand paging.  And after that works, I think I'll be done with the project.

10/30/2007

I think I've run out of time.  Demand paging is partially working, but looking ahead at the week it's clear that I won't have enough time to fully debug it by Saturday morning.  I've decided to revert to an pre-paging version of Minix for VCF.  I'll need what time I can find between now and Saturday to clean the system up and get it ready for the show.  My plan is to have it running with a pair of terminals to show off the multi-user aspect of the Minix port.  I've got a VT100, but unfortunately it is acting flakey at the moment.  I'll probably end up using a borrowed Data General Dasher terminal and Ken Sumrall's old HP terminal.

I did some testing this evening with the Data General Dasher.  I don't have the termcap entries quite right, but it mostly works.  It's kind of fun to see timesharing in action.  I was running the old "worms" curses program that causes some text-mode worms (or snakes) to randomly wiggle across the screen.  With two sessions running at once (one on the DG and one via Kermit on my laptop), you clearly see the worms alternately move from one session to the next.  And, or course, lots of blinky lights.

10/28/2007

I probably only need a few evenings to complete the demand paging implementation, but I'm not sure I'm going to find the time between now and this weekend's VCF.   I may need to roll back to a previous version of my Minix port, as the current state is very slow (now that I've ripped out the 2nd-level block cache and eliminated read-ahead).  I'll see what I can be done by Wednesday and make the call then.  For the time being, I'll continue to keep the machine off-line.

10/22/2007

Another busy weekend, but I was able to find a few hours to work on the demand paging implementation.  Mostly I worked in FS, the File System server.  I've removed the 2nd-level disk block cache, as well as the code to opportunistically request read-ahead.  This slows the system down even further, but I expect to make it up (and more) during the next stage when I implement a much more capable page caching mechanism.  My intent is to replace the explicit disk reads in FS with the equivalent of mmap().  Essentially, when the file system wants to read a disk block on behalf of a user read() system call, it will simply register a mapping between that block and a portion of its virtual address space.  Then, we can just copy from that virtual space to the user's buffer.  On the first access, we'll get a page fault, which will be handled by the paging mechanism to either read the block in from disk, or just update FS's page table with a previuosly cached page containing that block.

So far, I've dummied up the old get_block() to send a message to the PAGER to fault in a page rather than send a message to the block device driver to read the page.  Next up - adding a hash table lookup to MAPPER to store device/block pairs to physical pages.

 

10/15/2007

OK - things are looking better now.  The problem I was having in getting the page fault control flow working turned out to be a spurious use of one of the special instructions I added to Magic-1 (that now look pretty useless).  When designing Magic-1's instruction set architecture, I had Minix in mind.  One of my concerns was the cost of message passing, so I created a couple of instructions that would quickly copy between kernel memory and user memory.  The way they worked is that either the source or destination pointer would be translated using the current user-mode page table whiel the other address would use the kernel page table.

It seemed like a good idea at the time, but the problem is that I can't really do any cross-process copy in kernel mode without first verifying that the source & target pages are present and have the correct access permissions.  Otherwise, we take a page fault - which is generally a bad thing to do in kernel mode.

I think I'm going to have to replace all uses of TOSYS and FROMSYS with the much more expensive vir2vir_copy() calls.  The latter will validate the source and/or target locations as needed before attempting the copy and will pre-emptively handle potential page faults.

Anyway, it now looks like the basic mechanism is in place.  Normal user-mode page faults go through the trap handling system, which will block the faulting process and send a message to the PAGER task to service the fault.  For all system calls done on behalf of a user process, the kernel tasks will always perform a "validate_pagerange()" call prior to reading or writing user memory.  If a potential page fault is discovered, validate_pagerange() will attempt to fix the problem by passing a similar pagefault message to the PAGER task.  Note that the calling kernel task will be blocked until the fault is handled.

10/14/2007

Each October, the town I live in (Half Moon Bay, CA) holds something called the "Pumpkin Festival".  It has become somewhat popular, and the result is that so many people come to town folks who live here generally just try to stay at home to avoid the traffic.  This weekend was Pumpkin Festival, and staying at home is mostly what I did.

As a result, I was able to spend some time on the project.  I worked on the control plumbing for handling page faults, and part of it seems to work well.  If a user-mode process takes a page fault, the fault handler constructs a message for the PAGER task and removes the faulting process from the ready list. The PAGER task will eventually run, handle the page fault and then make the user process runable.

I tested this by allocating memory as usual, but turning off the page present and page writable bits in the process's page table.  The user process would fault on first access, and PAGER would "handle" the fault by simple turning the present and writeable bits back on.  It all worked just fine.

The part that doesn't work is taking a similar action whenever another kernel task needs to fault in pages on behalf of the user process for which it is performing a system call.   My (probably faulty) understanding of the message passing mechanism is likely the problem here.  For things to work as I envisioned, some kernel tasks would need to send a message to PAGER and then block awaiting a reply.  I thought a simple send(PAGER,&msg), recieve(PAGER,&msg) pair would do the trick, but Magic-1 tends to fly off into the weeds after a few of these.  To simplify matters, i tried doing a similar send/recieve from the SYSTEM task to PAGER with a null request - and it failed as well.   I think I need to review the message passing mechanism in a little more detail.  

10/9/2007

The file system has been rebuilt, and Magic-1 is back on-line.  I've re-enabled the guest account:

user: guest

pasword: magic

Telnet to magic-1.org and check it out.  Lots of applications to run at /bin, /usr/bin and /usr/local/bin

10/8/2007

OK - found a bit of time over the weekend and rolled my Minix port to using a file system with 2K blocks (rather than the stock 1K block size).   Now I'm in the process of re-bootstrapping my disk image - which is a bit of a pain.  I still don't have a very good way of moving files directly between my Linux development environment and Magic-1's Minix file system, so for the moment I'm still using a simple utility on the old Magic-1 Monitor/OS which knows a little about the Minix FS.  Anyway, it will probably be a day or two before it's completely rebuild and I can put the machine back online.

I'm hoping the next couple of weeks won't be quite as busy as the last two have been.  I'd like to complete the demand paging implementation in time for VCF 10.

9/30/2007

Still very busy on the home/work front, so not much progress.  I did, however, decide to go ahead and do a more complete implementation of mmap() and munmap().  I had originally planned on just doing enough to support demand paging of application code & data - which would not have requried me to keep track of modified pages or write them back to the disk file.   It is considerably more complex to do read/write mmap() support, but I find it an interesting topic.  One of the main differences is that I'll have to retain much more complete data on which pages are mapped by which processes.  I'm also planning on integrating a much larger disk block cache mechanism into PAGER.  Minix currently has a small disk caching mechanism in FS, but Magic-1's 16-bit address space make it not especially useful.  What I have in mind will open up all unused RAM pages for disk cache.

The other planned change is to rework the file system to use 2K blocks, rather than stock Minix 1K blocks.  Making the disk block size the same as my memory page size will make life much simpler.  I could have gotten away with a size mismatch when I was just going to support execution demand paging, but full mmap() support will be much simplified when page size and block size are the same.  The unfortunate aspect of this is that I will have to completely re-bootstrap my file system.

I expect to continue to have little to no free time over the next week, but things should ease up a bit after that.   I hope to get all of these changes done by the end of October so I can demonstrate Magic-1 running a fully multi-user, multi-tasking, demand-paged Minix 2.0 at VCF 10 (which will be held Nov. 3-4 at the Computer History Museum in Mountain View, CA.

9/22/2007

Given that I'm probably going to have to put further implementation on the back burner for the next week or so, I've decided to do a brain dump of my current thoughts for the implementation of copy-on-write and demand paging.  It will evolve as my thoughts evolve.

Here it is: Minix Demand Paging Design

9/21/2007

Time for another update: extremely busy at work (and home) lately, so I haven't done much coding on the demand paging enhancement (but have been doing a lot of thinking...).  A week or so ago I did go ahead and eliminate phys_copy() from the kernel.  In its place I now have four memory copying routines: vir2vir_copy(), vir2phys_copy(), phys2vir_copy() and phys2phys_copy().  Additionally, the old kernel routines that converted from virtual to physical addresses, umap() and numap() have been replaced with validate_pages().  This new routine is intended to verify that a virtual address range is resident before a kernel task attempts to access it.  It will force page faults prior, if applicable, and is generally called from within the above copy routines which take virtual address sources or targets.

I'm generally pleased with the new copy routines.  They actually speed things up a bit, but more than that make the code more explicit about what is going on (as well support non-contiguous physical <=> virtual mappings).

I expect the paying job to keep me pretty busy for the next couple of weeks, so I probably won't be making too much progress on fleshing out demand paging.  Meanwhile, if anyone would like to check out the current state of the port I've enabled an open guest account.  Telnet to www.magic-1.org, and use:

user: guest

password: magic

Note that there isn't a native C compiler, so you won't be able to build new applications.  To see what applications are available, check out /usr/bin and /usr/local/bin.    Do keep in mind that it's a very slow system by today's standards (though I hope to make it seem much more responsive when I get demand paging working).

Over the last several years I've gotten quite a few emails from folks who say that'd really like to build a system like Magic-1, but can't because they don't know how.  I'm afraid they're missing the point.  If I had known how to do it, I wouldn't have bothered.  It was interesting to me *because* I didn't know how to do it.  With that in mind, I'll have to say that adding demand paging to Minix is an especially interesting task.  And, I think I'm getting a pretty good handle on how I'm going to do it.  More on that later.

9/10/2007

The next stage of the implementation of demand paging is going to mostly be thinking about it.  I've looked through the source code a bit, and did a simple implementation of the new fmap() system call (which, given a file descriptor and a memory range, returns the device and list of raw blocks that represent the range).  I expect that at the end of the day, I will end up adding a very small amount of code to get this working.  Most of my hobby project time over the next few days or weeks will involve thinking through what that small amount of code should be.

The first problem is how best to break Minix's underlying assumption that the physical memory that is associated with virtual memory spaces is contiguous.  A key kernel routine is phys_copy( phys_addr, phys_addr, length), which copies data from one place in physical memory to another.  The problem here is that this routine is incorrectly used throughout the kernel to copy virtual memory ranges to either physical memory or another virtual memory range. 

For example, if a process wants to write a buffer full of data to a disk, it will call the write() system call.  That system call will eventually need to copy that data to kernel memory and then copy push it through the disk interface hardware.    When the process makes the write() system call, it passes it's notion of the buffer's address - which in reality is a virtual address specific to that process.  The kernel will then convert that virtual address to a physical address, which it then passes to phys_copy() to copy the buffer's contents to kernel memory in order to process it.

Here's the problem in a nutshell: In a paged memory architecture like Magic-1, a process's memory consists of a series of physical memory pages along with some address translation hardware to convert the process's virtual addresses to the actual physical addresses of the underlying physical memory.  The actual addresses of the physical pages need have no ordering relationship relative to the virtual pages.  So, address 0x17ff in virtual space might map to physical memory address 0x337ff in physical memory and the following byte in virtual space, 0x1800 could map to physical memory 0x10800.  Minix's phys_copy() routine is fine if you are really wanting to copy physical memory, but it should not be used to copy virtual memory ranges which might possibly cross a page boundary.  It works only because existing Minix implementions require contiguous virtual memory ranges to be associated with contiguous physical memory ranges.

There are a couple of approaches I could use, and I'm not sure yet which one I'll use.  First, I could preserve the way Minix uses phys_copy() by redefining what a 'phys_addr' is.  In all existing Minix implementations (at least that I know of), a phys_addr is defined as an unsigned long, or some other integer-type that is sufficiently large to address all possible physical memory.  I could, I suppose, redefine it for Magic-1 to be a struct/union which could either be a true physical address, or a [process/virtual address] pair.  Phys_copy would then be changed such that it would redo the virtual<=>physical address mapping any time either source or target pointer crossed a page boundary.  That would likely allow the least perturbation of the source code, but it feels a bit ugly to me because it continues to use phys_copy() in a way I think is wrong.

The other possibility is to add a few more xxx_copy() routines which are explicit about what they are trying to do: virtual_to_virtual_copy, virtual_to_physical_copy, physical_to_virtual_copy, etc.  This would involve a bit more work, but I think it would also make the code a little more understandable by explicitly making clear the intent of the action.

Anyway, some things to think about.

9/9/2007

Spent a few hours on the project today in between dropping the kids off at various activities.   A good chunk of the infrastructure for demand paging is now in place.  I've added a MAPPER task, which runs in user space like FS and MM, but is given kernel task priority.  I've put three system call placeholders: mmap, munmap and fmap.  The latter system call lives in the FS server and given a file descriptor, starting offset and length will return a vector of block addresses on the drive deive.  I've also implemented that system call, which simply amounted to adding a loop wrapper around an existing routine which would take a file pointer and offest and return the physical block number that it represents.

This continues to look like the addition of at least my minimal demand paging on Minix wil turn out to be almost trivial.  The ony substantive issue (at least that I know of now) is working out he details of having the system task directly invoke the appropriate block device driver to service the fault (using the mapping info ).  This would be easy if I enforced a busy-wait mechanims for fetcing disk blocks,  II also need to spend a little time thinkging abut the implecations (and likely necessity) of maintaing reference counts on the disk lblocks.  Also, I've had the beginnings of an idea about how to all "unused" physical pages as a victim cache pool.

9/8/2007

I'm starting to think a bit about adding demand paging to my Minix port.  I'm still in the early, fuzzy thoughts stage - but find it useful to write down what I'm thinking.  So, most all of this is likely to be badly wrong, but here goes:

First, demand paging is generally used in situations in which you have a larger virtual address space then your physical memory.  This is not the situation for Magic-1.  I actually have plenty of physical memory (4MB), but tiny per-process virtual address spaces (64K code and 64K data).  So, my intent is not to use paging to support applications whose virtual address needs exceed physical memory, but rather to do "lazy" fetching of code and data from disk.  The primary goal is to improve the perceived responsiveness of the overall system by only fetching from disk the portions of applications which are needed.  (Note: my IDE hard drive interface is extremely slow - I haven't measured it recently, but I think that my top transfer speed is in the vicinity of 20K bytes per second).

A significant implication of this is that I can essentially ignore what is generally a big part of demand paging - swapping and page replacement mechanisms.  Once I page in some data, it can stay resident until the process dies.  I'll design things in such a way that swapping and page replacement could be added later - but it isn't something I'll ever likely need.  If it do it, it would just be as an exercise.

To implement demand paging, I'll need to add two new system calls: mmap() and munmap().  These syscalls associate a virtual address range with a file.  There are a lot of possible features and options that other Unix-like implementations have added to mmap(), but at least at first I will only need a tiny subset.  To meet the immediate goals, I need only support READ, EXECUTE  and PRIVATE fixed-address range mappings.

Looking only at the simple cases first, I could see supporting these syscalls in the MM (memory manager) server.  This is where fork and exec happen.  The main change would be in do_exec(), which currently calls out to the FS (file server) to open an executable, read the header and then read in the code/data and finally close the file.  I think the general idea would be that do_exec() would open the file as usual, but not read in the code/data - and further, not close the file.  It would keep it open and associate the fd with the process.  It would allocate a page table with all entries unmapped, but tagged as file-backed.

When the applcation is launched, it would take a page fault, sending control to the kernel.  The kernel has a copy of the page tables, and after verfifying that the page fault was recoverable, it would contruct a message to send to MM to service the fault.  MM would then allocate a fresh page and either copy from an existing page (if it was a copy-on-write page fault) or invoke the FS to fill the page from the backing file on disk.  Once all that completed, the process would be made ready to run and re-attempt the instruction which caused the fault.

Now, if this was all there was to the problem, it would be quite simple.  What I've described above could be done with a negligable amount of new code as most of what is needed already exists in Minix.  The difficult part - at least I'm having difficulty wrapping my head around it, is what to do if the page fault happens not during normal execution, but as part of a system call made on behalf of the application.

For example, suppose a program does a file read, and requests that the newly-read data be placed in a buffer that has not yet been mapped.  If we didn't know ahead of time that the user buffer was unmapped, we'd face the situtation that the FS would take a page fault in the middle of an operation and then need to be invoked itself to service that fault before it could continue.  Minix's single-threaded design makes this difficult.  There are other similar issues, but they generally boil down to how to handle or avoid a page fault when doing a system call or other OS-level operation.

At the moment, I'm thinking in terms of validating system call arguments on entry to the MM and FS, with the intent of making sure the pages are in place before proceeding.  However, this gets a bit ugly as only the kernel and MM have copies of the page mappings.  I really don't like the idea of having all servers get a copy of the mapping, but the overhead additional message passing to MM to verify seems way too much.  Alternately, I suppose we could take another approach where we just run as if things are good, take the fault and then somehow back up and restart the system call after the page fault has been dealt with.

Not sure what to do here yet.  Need to think on it a bit.

[Update] - I think I'm seeing the beginning of a solution to this problem.  The key problem is at the time of page fault, we would need to invoke MM to allocate a page of physical memory, and then FS to locate and read the physical sectors on the disk the page is to be loaded from, but page faults could originate on behalf on MM and FS - which are not re-entrant.  One thing I hadn't mentioned above is that because MM and FS are user-mode processes, they would never actually generate a real page fault on behalf of a client.  Instead, they call back into the kernel's SYSTEM task to perform memory copies to and from client processes memory.  The SYSTEM task uses umap() to identify the target user memory pages and knows whether the target pages are mapped.

So, SYSTEM would be in a position to "service" the fault before it happened.  That, however, still leaves the problem of having to re-enter MM and FS to allocate a memory page and locate the bits to be faulted in from the block device.  But what if we did not have to re-enter MM and FS?  In other words, what if the ability to allocate a physical page of memory and the knowledge of the mapping between an unmapped virtual memory page and its disk-backed storage were available outside of MM and FS?

Assume we have a new server called MAPPER.  This server could be viewed as a data repository which never initiates requests to other tasks or servers, and thus would avoid participating in a possible deadlock situation.  It would have two jobs:

  1. Keep a list of free physical memory pages and support requests to add and remove pages from that free lsit.
  2. Maintain a mapping between physical disk blocks (or sectors) and the unmapped virtual memory pages that those blocks are backing up.

On startup, MM would hand off the MAPPER a list of all free pages.  At the time of a fork(), MM would no longer allocate physical pages to a child, but would simply copy the parent's page table and ask the SYSTEM task to mark both the parent and child's data page as read-only (i.e. - copy-on-write initialization).   For an exec(), MM would not allocate physical memory pages, but simply initialize an unmapped page table.  As before, it would call FS to read the new program's header to determine code, data and bss sizes, but instead of actually reading the code and data into physical memory, it would ask FS to return  mapping structures which identify which disk blocks (or sectors) are backing the unmapped virtual pages.  MM would then pass this structure off to MAPPER.

When a process takes a recoverable page fault during normal execution, the kernel would construct a message describing the fault and sent it to the SYSTEM task.  SYSTEM would ask MAPPER to allocate a fresh physical memory page and also return the device/block/sector info describing where the needed bits live.  SYSTEM would updage the process' page table with the new physical page info.  Finally, the appropriate bits would be copied to the new page either directly by SYSTEM if this was a copy-on-write fault, or by sending a read message to the appropriate device driver task.  Once the fault was serviced, the faulting process would be made runnable.

The above actions would also be taken by the SYSTEM task if it detected an impending page fault that would be caused by a SYS_COPY request made by the MM or FS servers prior to performing the SYS_COPY (it would discover this during the execution of the umap() call that happens prior to phys_copy()).  This appears to solve the re-entrancy problem, as neither MM nor FS would need to be invoked to service the fault.  And, at least right know, I don't see any deadlock possibilities.

Other details and questions include:

bulletMM keeps an open file descriptor for all active mapped files. 
bulletShould FS actually be the one to tell MAPPER directly about mappings?  That way when a file's open count goes to zero it can tell MAPPER to release the mapping (or should this mapping be process-based rather than file based, which which case it would more logically be MM's job).  Think about this. (leaning towards MM and process-based...)
bulletThe page <=> device/block mapping could be fairly large.  Assuming 2 1K disk blocks per 2K page represented by 32-bit sector starts, 64 pages per process and a maximum of 32 active processes, the table could be as large as 16K bytes for Magic-1.  This is too large to include in the kernel proper (as all kernel tasks currently share the same memory map).  However, MAPPER probably doesn't need access to any other kernel data, so it should be fine for it to have it's own distinct memory mappings and page table.  I probably would want to treat it as a kernel task as far as scheduling priority goes.  So, it would look like a user-mode server such as MM or FS, but would be scheduled as a kernel task.
bulletThe mmap() and munmap() system calls would be handled by MM.
bulletA new system call, say "fmap()", would be added to FS.  It would be handed a file descriptor, plus a starting page-aligned offset within that file and a length.  It would return a device identifier plus a list of the disk blocks/sectors on which that range is stored.  This would be used by MM.  Should I also expose it for general use?  I think this could be quite simply implemented using the fseek() code in FS (i.e. repeated fseek()s to each block start and record where we are).
bulletWhat should the device identifier that FS associates with the fmap() range be?  SYSTEM will need to generate the read (and perhaps later, write) message that would be sent directly to the device driver.  Ideally it shouldn't have to even know what kind of device it is dealing with.  Complication: we'll probably have to bypass FS's block caching mechanism.  It's possible, I suppose, that the block cache could be broken out as a server in its own right that could filter requests from both the FS and SYSTEM's page fault mechanism.  Or, maybe this isn't a big issue any more as the MAPPER itself will act as a block cache for application code and data pages.  The FS cache would no longer contain paged-in blocks, and would largely be concerned with directory blocks and data files.  Yes, I think this is not an issue.   The MAPPER will serve as a block cache for any mapped file regions, and the FS's caching mechanism won't even see those blocks.  The disk blocks that the FS cares about will be subject to local FS caching (which will now become much more effective).  Win-win - cool!

I'll think about this more over the weekend, but implementing demand paging for Magic-1 Minix could turn out to be quite simple.

9/7/2007

Last night I added the vfork() system call to my Minix port in hopes of speeding up fork/exec.  It works and is measurably faster - but not noticiably faster.  I had hoped it would improve the apparent responsiveness of the system.

I guess the next step is to go ahead and implement demand paging - which is likely to be a significant bit of work.   In my previous Monitor/OS I implemented a simple form of demand paging and it made a noticable difference in the percieved speed of program launch.  In short, demand paging works by loading just the portions of the program that are needed as the program runs.  Currently in Minix, if you exec() a program all of the code and data is loaded from disk before any execution begins.  For Magic-1 and a typical program, this can take up to 4 or 5 seconds if the program is on the hard drive (or a second or two if it's on the RAM disk or in the 2nd-level RAM disk cache).

With demand paging, you can start getting output much quicker because only a small portion of the program typically needs to load before a prompt of some sort is display.  In some cases, demand paging can reduce overall performance (because if you really need to load everything it is more efficient to do it all at once), but is can make the program "feel" much more responsive - and that's Magic-1's biggest issue right now.

I have some ideas about how to proceed, but the first step in any case is to rework Minix to allow programs to have non-contiguous code and data physical memory.  I can go ahead and do that while working out the details of the paging support (which will likely take the form of a new PAGER kernel task).  The tricky part will be handling (or avoiding) page faults on behalf on user processes that would occur during the execution of system calls by that process.

9/2/2007

Lots of good progress over the holiday weekend.  I completed the instruction set changes along with some LCC tuning and saw dramatic improvements in 32-bit arithmetic (as well as large memory copies).  The new instructions are:

bulletadd.16    (--A),(--B)
bulletadc.16    (--A),(--B)
bulletsub.16    (--A),(--B)
bulletsbc.16    (--A),(--B)
bulletor.16    (--A),(--B)
bulletand.16    (--A),(--B)
bulletxor.16    (--A),(--B)
bulletmemcopy4    (A++),(B++)

Now, vi and awk fit in my 64K code space - and both work great (although I did have to find and fix a subtle bug in my a.out.h header file that was breaking any program that had code on the final page of code space).  This should be about the end of any microcode or instruction set changes.  I've only got a tiny amount of free microcode space and had to carefully examine and rework a lot of the existing microcode to make space these instructions.

I had hoped that I might also be able to get the last of the 16-bit Minix programs to work - INET, the TCP/IP server.  However, it's just a bit to big (about 15K bytes over).  Whereas the problem with the other programs was the (now fixed) advantage x86 had over Magic-1 due to memory-to-memory operations, the issue in INET is that Magic-1's runtime calling convention is a bit more verbose that that used by the x86 Minix compiler.  And, INET does a lot of small procedure calls.  The x86 Minix convention pushes args on the stack, and can do this in just a few instrucion bytes.  Magic-1 uses a fixed frame convention and generally has to to a load/store pair to pass an argument.  I don't see an easy fix for this, so I'll likely have to either implement code overlays or remove some of the INET functionality.

I'm very happy with my new 4-byte memory copy instruction.  Stock Minix doesn't support paging, and ends up doing a lot of memory copying.  I wrote a kernel-only memcpy() routine that includes a massively unrolled loop of memcopy4's that nearly triples the speed of page (2K) and file system block (1K) copies.  The improvement is noticeable.  The system is almost usable now, and once I do something about the poor fork/exec performance it should give a reasonable responsiveness (considering the 4.09 Mhz clock speed).

Here's a partial list of things I need to do:

  1. Track down and fix the problems I've run into with zmodem and kermit file transfer.   This has been a bit of a pain - I have to jump through hoops to copy programs from my Linux development environment to Magic-1's file system. 
  2. Fix the telnet problem that causes excessive echoing from some telnet clients.  This may just an issue with the configuratin of my Lantronic device server, but it is annoying.
  3. Implement proper handling of .bss data sections.  When I first wrote my assembler and linker, I took a shortcut and just included .bss with initialized data.  By properly handling bss, I should be able to speed both program both fork() and exec().
  4. Rework the ram disk driver to support multiple ram disks.  Right now, it knows about a single ram disk, which can either be used as the root drive or as a 2nd level block cache.  I've got enough memory that I can easily afford both, and perhaps also a scratch file system that can be used with abandon by guest accounts and temp files.
  5. As a temporary hack, implement vfork().  vfork() is a variant of fork() that can be used to improve performance on systems without proper memory management.  Currently, when a Minix process forks, the entire image is copied (or at least the data section) to the child.  However, this is generally a wasteful operation and the child typically immediately does an exec(), which will release the copied pages and allocate & load new ones.  vfork() is a pretty vile hack that blocks the parent and allows the child to use the parents code and data pages until it either does an exec() or _exit().  The proper solution is copy on write, which is that my old Magic-1 OS did.  However, implementing copy on write will take a bit of effort.  There are a lot of places in Minix which assume that a process's physical code and data memory are in contiguous chunks.  To implement copy on write, I'll need to change all of the virtual <=> physical memory address translation routines to query the process's page table (and also deal with subtleties dealing with page crossing).  There is also a bit of nastiness because of the requirement that kernel code must never cause a page fault while servicing a page fault.
  6. Get pseudo terminals working.  I've tested the multi-user feature of Minix using a tty hanging off of each of my two serial ports, but my end goal is to have one serial port assigned to the console, and a series of pseudo terminals sharing the SLIP internet connection managed by the other serial port.
  7. Finally, my last stretch goal is to get my 2nd favorite old game running: Larn, which is similar to Rogue.  I played a lot of Larn on HP-UX systems long ago (on 1200-baud remote terminals), and it would be a kick to see that running on Magic-1.  This, however, will certainly require me to implement code overlays.

I expect that I'll be ready soon to open up a general guest account for folks to try the system out.

8/26/2007

One of the great things about building your own machine is that you can do whatever you want.  I'd gotten used to big-endian bit and byte ordering from my days working on HP's PA-RISC machines, so it seemed natural to make Magic-1 a big-endian box.  I also did it because it was a bit contrary to the rest of the world these days.  On Magic-1, the most significant bit in a byte, word or long is numbered bit 0, and 16-bit integers are stored with the most significant byte at the lower address.  I kept up this convention when doing code generation for 32-bit integers.  A 32-bit integer 0x12345678 would be stored with 0x12 in the first byte, 0x34 in the second, 0x56 in the third and 0x78 in the fourth.  It seemed the consistent and right thing to do.

However, I wasn't thinking clearly when I designed my latest batch of instructions to support 32-bit moves and 32-bit arithmetic.  The specifications, and microcode, correctly deal with big-endian ordering within a 16-bit integer, but treat the two 16-bit words that make up a 32-bit long in little endian, rather than big endian order.  I specified the instructions as using post increment:

    add.16 (a++),(b++)

with the intention that I could support multiple precision integers by simply  doing a series of these operations in line.  See the problem?  We have to do arithmetic starting with the least significant words - which in a big endian world come after.  So instead,  I need to make my new instructions pre-decrement:

   add.16 (--a),(--b)

For example, if we have the following C code:

long foo;
long bar;
foo += bar;

The code I need to generate would be something like this:

lea    a,foo+4(dp)
lea    b,bar+4(dp)
add.16    (--a),(--b)
adc.16    (--a),(--b)
 

In other words, we need to point just past our operands, and then walk from least to most significant using pre-decrement rather than post-increment.  Similarly, I had designed the new 4-byte memcopy instruction to not alter its base pointers.  The idea was that for a 3-operand operation, I could avoid one of the leas.  Instead, I'll need to have the memcopy update its base pointers as well.  This will also be useful for general memory copies, but will be a bit of a challenge to code (again, the issue with not committing changes until we're sure we're not going to fault).  The following code

long foo;
long bar;

long foobar;
foobar = foo + bar;

  would be generated as:

lea    a,foobar(dp)
lea    b,foo(dp)
memcopy4 (a++),(b++)

lea    b,bar+4(dp)
add.16    (--a),(--b)
adc.16    (--a),(--b)

I haven't yet reworked the microcode to see what changing from post-increment to pre-decrement will do.  I'm guessing it won't hurt the arithmetic instructions, but it probably will cause my 4-byte moves to slow down a cycle or two.

[update] - Okay, I looked at the microcode.  The arithmetic operations can be trivially changed from post-increment to pre-decrement without cost.  However,  adding post-increment to the memcopy4 will be costly.  One problem is that I don't have an easy way of generating a "4" in microcode.  My microcode definition includes the ability generate constant 1,0,-1 and -2, and to get a 4 I would have to burn two clock ticks to add 1+1=2, and then 2+2=4.  It looks like it would cost 4 additional clock ticks.  This is too much - moving a 4-byte copy from 13 to 17 clock cycles.  Much of the time I won't need the auto-increment, so those 4 cycles would just be wasted.  I guess I'll just use lea to adjust when necessary.

8/24/2007

Time for an update.  I've given quite a bit more thought to my 32-bit code generation issues.  First, I took a close look at the performance and code-size consequences of moving 32-bit mem copy, addition, subtraction, and & or from my current in-line sequences to fast runtime "millicode" calls.  The good news was that doing this would largely solve my code size problem (both vi and awk would fit in a 64Kbyte code space without having to resort to overlays).  The bad news is that it would go a lot slower.  So, I decided to try to figure out any instruction set changes that could help.

As far as ISA differences between Magic-1 and X86 go, it is really the memory-to-memory operations that gave x86 the edge in code size when doing a lot of 32-bit integer math.  I've avoided doing memory-to-memory ops for Magic-1 for several reasons.  The most important is that it's kind of tricky to get right.  The main issue is how to handle instruction rollback in the event of a page fault part-way through an operation.  For example, suppose you have a 16-bit memory to memory add instruction, where  *op1 += *op2.  Suppose also that op1 is located in memory at such that one byte of it falls on one page, and the other on the next page.  With a machine with an 8-bit data bus (like Magic-1), you need to do this operation a byte at a time.  If the first byte from each operand is on a page that is resident, you can load add and store them.  However, if you take a page fault while loading the second byte, you cannot roll the instruction back to retry it later (after the page is faulted in), because you've already stored the first byte.    This is less of an issue for a memory copy, because at worst you will redundantly copy the first byte after a restart, but when you're doing a 2-operand addition, you've trashed your first operand.

Most modern CPU's either disallow page-crossing (requiring 4-byte int alignment) or have a probe mechanism to test whether a fault would happen before committing a change.  I considered adding a probe facility, but decided that the cost would be too high (a decision I now question, after working with some of the Minix code).  Anyway, I'll cut this short.  I bit the bullet and designed a new set of 16-bit memory to memory arithmetic operations: add.16 (a++),(b++), adc.16 (a++),(b++), and.16 (a++),(b++), or.16 (a++),(b++) & xor.16 (a++),(b++).  These are all 1-byte opcodes, and will not only enable me to keep 32-bit arithmetic inline, but will do it in fewer bytes than an millicode call and significantly faster than even the old in-line version.  The instructions will treat the first operand, pointed to by the address in register A, as both the first source operand and the target.  Additionally, they will auto-increment the two pointers to allow multi-precision arithmetic.  For example, a 32-bit addition of two frame locals would look like:

lea    a,<offset1>(sp)
lea    b,<offset2>(sp)
add.16    (a++),(b++)
adc.16    (a++),(b++)

In most cases, this will require only 6 bytes of instructions and 36 clock ticks, vs 19 bytes and 44 clock ticks for the current code.  This is a big win.

To make this work and allow instruction restart, however, I've had to get a little unclean.  I wrote the microcode in such a way that the first byte of the result is not stored until we're sure that there will be no page fault.  So, the instruction starts off by loading operand 1 and then immediately storing it back.  If we survive this, we know that the page[s] are both readable and writeable.  Next, operand 2 is loaded.  Once this is done, we're golden and can therefore commit changes to memory and registers.  Note that this means these instructions should never be used to access I/O devices, which often have side effects on loads and stores.

There is one additional pretty ugly hack.  All of these instruction will destroy the contents of register C.  I simply needed more temp storage to hold on to operand 1 until it's safe to commit.  Normally, this isn't too bad - register C is also modified in the counted instructions (memcopy, strcopy, vshl, vshr).  However, the gross hack part is that C is not rolled back in the event of a page fault part-way in.  This is not very clean, but the performance payoff here seems to justify it.  We'll just document it and call it a feature.

This is a large change.  I had a few free instructions that I was saving for FORTH opcodes - but not enough to add all of these new ops.  So, I've had to remove several instructions from the ISA, and rearrange others.  This means that once I update the microcode, all previously compiled Magic-1 code (on disk or in ROM) will be broken.  I plan on making this change in very small steps over the next week or so.

After these changes are made, the only bit of Minix that still has a code expansion problem will be INET, the TCP/IP stack.  I'll examine that in detail later - perhaps I can completely avoid having to write an overlay mechanism.

8/20/2007

I decided it was time that Magic-1 started serving web pages again, so I recompiled my uIP web server.  I had to fiddle a bit with non-blocking I/O, but it seems to work.  The performance is about 15-20% slower than what I was getting with my old Monitor/OS.  I hope to close the gap a bit with tuning, but am unlikely to reach the same performance level (my old OS didn't do nearly as much, and was highly tuned for uIP).  Anyway, it's back serving web pages now, as well as periodically sync'ng Magic-1's real-time clock with a network time server.  One other thing I need to do is change the appearance of the web page to highlight the fact that it's now running Minix.  The old site had a feature that would load current stats about the OS (last visitor's comment, number of commands executed, etc.).  I think it would be nice to do this for Minix as well - something like push a button and do a remote "ps -ef" or "who".

In other news, I've been looking more closely at my 32-bit code generation (which is much, much worse than the x86 Minix code).  The big problem is code expansion, which largely caused by the following factors:

bulletInline arithmetic:  My lcc port generates macros for 32-bit arithmetic ops, which are expanded inline.
bulletUseless copies: My strategy for supporting long, float and double types in my lcc retargeting is pretty gross, and results in lots of what lcc thinks are register-to-register copies, but in fact are memory copies.
bulletx86 memory-op richness:  x86 has a lot of memory-to-memory operations, while Magic-1 must load to register, perform the op and then store.

Some of this I can fix with a peephole optimization pass, but the biggest bang for the buck will come from supporting 32-bit operations out-of-line and by adding some new instructions.  The first two instructions are short-form "lea [a/b],#u8(sp)" ops.  These generate an SP-relative address, but take a 8-bit offset rather than a 16-bit one.  The vast majority of current SP-relative leas will fit in 8-bits, so this will save a lot.   Next, I'll add a special-purpose 4-byte memcopy instruction.  Not only will it reduce the code size of a 32-bit copy from 8 bytes down to 5 (lea, lea, memcopy4), but it will do the copy in 18 clock ticks rather than the current 28.

8/16/2007

I'm  a little less confident that I've got the new serial port driver correct.  I've been trying to get Adam Dunkels' uIP TCP/IP stack running and am having some difficulty.  This is the stack I used for Magic-1's original OS.  It's getting closer to working, and at the moment it seems to correctly respond to a single ping, but then something hangs.

On the other hand, I have successfully run several tests of pumping large files across the serial port at 38400 BAUD, all the while running CPU benchmarks in the background and performing other disk I/O.  The handshaking seems to work just fine - we'll get a burst of data, stop for a while and then take in another burst.

I think my next step will be to go back to a untouched copy of the uIP code (rather than start the port with my old Magic-1 copy).  I played some hacky tricks to improve the performance of the stack, and the things I did might not play well with Minix.

Also, it's just about time to break out the logic analyzer for some performance measurements.  If I remember correctly, my logic analyzer has a feature that will enable me to time and then plot the periods in which interrupts are disabled.  I've done quite a few reviews of the code and believe I am keeping locked regions to an acceptable level, but I'd feel better with some actual measurements.

8/14/2007

I'm now back from vacation and am resuming work on the port.  I decided to completely rewrite the rs232 interrupt handler to take advantage of the 16550 FIFOs.  After a few frustrating coding/debugging sessions, it seems to be working just fine.  The big problem I ran into turned out to be in tty.c's in_process().  This code is responsible for copying incoming data from the low-level driver's buffer to the primary tty input buffer.  For some reason, it was silently discarding data when the buffer filled up and the tty was in canonical mode (but not raw mode).  I'm probably missing some subtlety, but I can't imagine why it would do that.

Anyway, I've changed to code to not silently discard the data and everything seems to be working just fine.  It may be that I'm the only person who has ever encountered  that particular bit of code because of Magic-1's relatively slow computational speed relative to the high UART speed of 38.4K BAUD.

The new rs232 interrupt handler is pretty fast, though I could probably get about 25% speedup by recoding it in assembly.  I ended up deciding to have the entire handler run with interrupts disabled.  I'll redo the calculations be be sure, but I believe that it is fast enough that one serial port won't starve the other and cause data overflow, and I shouldn't have any lost clock ticks.

Next up I want to build and test zmodem binary file transfers.  Getting that working will make life a lot easier.  Right now I have to jump through some awkward hoops to copy files to and from Magic-1's file system.

8/11/2007

Haven't made any code changes, but have given some more thought to the issues around getting the best serial I/O.  Here's a brain dump of my current thinking:

bulletA key design goal of Magic-1 was that it should be capable of internet connection.  I decided against putting in an native ethernet adaptor, and instead am using a dedicated serial port and SLIP/PPP to one of my Linux boxes.  So, I have one serial port for a console and another for the SLIP connection.  It is the SLIP connection serial port that I really want to get the best performance out of.
bulletThe fastest speed my UART supports is 38.4K BAUD.  If thinking in terms of serial port speed vs. CPU clock speed, something closer to 2400 or 9600 BAUD is more realistic for a 4 Mhz computer.   Consider the following "back of the envelope" calculations:
bulletAt 38,400 BAUD and 4.09 Mhz main clock, each bit takes 109 clock cycles to transmit or receive.  Assuming 8-bit words, 1 stop and 1 start bit, one byte is transmitted in 1090 clock cycles.  Magic-1 typically takes about 5 cycles per instruction, which means that at maximum speed Magic-1 can execute only about 200 instructions for each byte received.
bulletIf you factor in the interrupt-driven nature of serial I/O, it's clear that at 38.4K BAUD, Magic-1 would be almost completely occupied by echoing data - it wouldn't have many cycles left over to process data.
bulletStill, to obtain the fastest TCP/IP response times, we would like to make the actual packet receive and transmit as fast as possible.
bulletOne way to reduce communication overhead in an interrupt-driven system is to have FIFO buffers on the transmit/receive hardware.  Magic-1 uses 16550 UARTs, which feature 16-byte deep FIFOs.  The UARTs can be programmed to buffer up input and only fire off an interrupt when 2, 4, 8 or 14 bytes have been received.  Similarly, they can receive up to 16 bytes for output and will fire an interrupt when the transmit FIFO has drained.
bulletAlthough Magic-1 can't handle a sustained high BAUD rate, it can use hardware flow control allow "bursty" high-speed communication.  In other words, the UARTs communicate at high speed to fill up their FIFOs and secondary buffers, and then block further transmissions until  the received data has been processed. 
bulletFor the UARTs, we can use the RTS/CTR signals (a.k.a. "hardware handshaking").   The UART receives data at high speed, and buffers up incoming bytes in the FIFO until a threshold is reached (probably 8 bytes).  It then fires an interrupt.  Magic-1 stops what it is doing and copies the incoming bytes from the FIFO to an internal circular buffer.  If that secondary buffer get close to filling up, Magic-1 will drop it's RTS signal, which tells the sender to stop sending.  Note that the sender may continue to send data for a little while before it responds to the dropping of RTS.
bulletMagic-1 will then process the incoming data in the secondary buffer.  At some point, the secondary buffer will be sufficiently drained that RTS can be raised, allowing the sender to give up more data.
bulletThere are two things we're trying to do here.  First, communicate at high speed to reduce response latency.  Second, enable FIFO buffering to reduce the number of interrupts that must be handled.  With no buffering, you have an interrupt overhead of 1 interrupt per byte received.  With a FIFO threshold of 14, you can do as well as 1 interrupt per 14 bytes.
bulletAs is typical in the world, the goals of high speed for reduced latency and high FIFO threshold for reduced overhead are conflicting.    A requirement of the FIFO buffering is that we must have enough time to recognize an impending buffer overflow and tell the sender to stop transmitting.  However, the faster the transmission speed the smaller our window to accomplish this.
bulletThe minimum window is the time taken by the RS232 interrupt handler to launch, talk to the UART to see what it wants, drain the FIFO, fill the secondary buffer and drop CTS.  This window is then further reduced by the maximum time Magic-1 spends with interrupts disabled plus the total time spent by any possible combination of nested interrupts accepted while the key RS232 interrupt handler is running before it begins draining the FIFO.
bulletAs it stands now, from the point of interrupt, the RS232 interrupts handler takes about
bullet23 instructions from interrupt until interrupts re-enabled
bullet30 instruction from that point until 1st FIFO byte drained.
bulletAt the toughest setting, 38.4K BAUD and a FIFO threshold of 14, we must be able to start draining within 2000 clock cycles, or 400 instructions.  Subtracting 23 and 30 from 400, we're down to about 350 instructions worth of time that can have signals blocked or the handling of nested interrupts.
bulletIt shouldn't be very difficult to keep periods interrupts are blocked down to ~50 instructions.  The difficult thing is nested interrupts
bulletI think I can make this work by doing the following:
bulletThe SLIP serial port set to either 19.8K or 38.4K BAUD, with a FIFO threshold of 14
bulletThe interrupt handler for the SLIP port is given top priority, and unlike the other handlers, it runs with interrupts disabled until it has had a chance to drain the incoming FIFO (at which point it re-enables interrupts).
bulletAll other handlers are made interruptable early (as is done now).
bulletImplement mutexes using my load and clear instructions (actually, this isn't really necessary - but I think it will be cleaner in some places).  We must still use interrupt disable/enable to protect structures that may be modified by interrupt handlers, but we can use to mutex protection to mediate data structures that can be modified by tasks.
bulletThe longest period of disabled interrupts covers modification of the linked list of TTY wakeup timers.   If necessary, this lock/unlock can be eliminated by having the task handlers set a flag before they start looking it.  The clock tick handler would look for this flag and abort if it is set (after incrementing "lost_ticks").  This would avoid contention at the expense of slightly delaying timer firing.

8/6/2007

Just wanted to jot down a few thoughts on possible ways to reduce the maximum latency for responding to serial port input (needed to get the high communication speeds I'm after):

bulletThere are two main problems right now - both caused by me. 
bulletBecause I don't yet have the ability to disable individual interrupt lines, I'm blocking all signals while in the main handler loop for the serial port.  This causes problems when I've got input coming into both ports at the same time.   I think it will be just fine for one line to interrupt the other - they don't share any critical structures that might change.  The problem to avoid is a line interrupting itself.  My plan to solve this is to have the handler for each serial port interrupt line replace it's handler pointer in the interrupt vector with a stub that will simply record an attempted re-entry and then return.  When the main handler is ready to return, it will first check to see if it needs to be run again.  Before exiting, it will re-arm itself.
bulletwhen I designed Magic-1's instruction set, I added some instructions that could copy to and from user space - allowing the address translation hardware to map to physical memory.  I believed this would make for a faster physical memory copy.    As it turns out, I think this may not be useful.  It is faster, but it is not re-entrant.    This may be useful in a few places (passing small messages), but it won't be able to use it for general physical mem copies because I can't afford to keep interrupts disabled.   Several things need to happen here:
bulletHand-code a fast phys memcopy in assembly, paying particular attention to message-sized copies, aligned disk sector-sized copies and aligned page-sized copies.  These routines will all be interruptable.
bulletSplit out the code so that there is one physical mem copy routine that is used only by kernel tasks (which actually run in user mode) and another phys mem copy that is only used by true kernel (supervisor mode) code.  These copy routines will work by remapping reserved src and tgt page table entries onto the source and target physical memory.  Because each kernel task has its own private page table, we don't need to worry about tasks stepping on each other's phys copies.  I will, however, have to ensure that the kernel phys copy routine does not re-enter itself.
bulletThe Minix kernel code uses enabling and disabling of interrupts to protect critical regions.  This is sometimes less than ideal.  It generally works in a single-cpu machine, as you can't re-enter a code region when interrupts are disabled as it is only through interrupts that a thread will be pre-empted.  However, it is a bit of a sledge-hammer approach when all you are trying to do is ensure mutual exclusion.  When I designed Magic-1's instruction set, I included an atomic semaphore primitive that may be useful here.  It will load the contents of a memory byte and then set that memory byte to zero.   I think it would be good to use this to create mutexes to protect the various critial regions used by the kernel tasks from each other (I'll still need to disable interrupts when the supervisor-mode kernel code is running).   A quick pass over the kernel code suggests that there are only a handful of structures that need to be protected, and only one that keeps interrupts blocked for a significant amount of time.

Anyway, by implementing the above, I should be able to dramatically reduce the worst-case interrupt responce latency to a small and manageable amount.  That, in turn, should allow me to run the UARTs in FIFO mode at a much higher speed than 4.09 Mhz CPU would generally support.

8/5/2007

I'm almost, but not quite, ready to declare a successful Minix port.  It's running pretty well, and I've got all but a handful of the Minix applications ported over (the vast majority required only a recompile).  The biggest remaining problem is related to interrupts and the rs232 driver.

Because the serial ports are the primary modes of getting data on and off of Magic-1, I need them to reliably run at the highest possible speed.  This was probably the area I spent most of my time on when I wrote the original Magic-1 OS/monitor, and it took me quite a while to get it working well.  Things are not working so well right now with the Minix rs232 driver.  When I push a big chunk of data into the machine, the driver hangs.

There is a hardware handshaking mechanism by which the driver tells the sender to stop sending data.  I suspect that's where the problem is.  The key to making this mechanism work is to ensure that the driver can quickly respond to incoming data and shut off the sender before buffers overflow.  Right now I've got too many places where I'm masking interrupts (and for too long).  Also, the Minix rs232 driver is intended to support an old UART which does not have a FIFO - something that has to work to get the good speeds.

Anyway, the first step will be for me to carefully examine and eliminate or reduce the periods where interrupts are disabled.  Next up, I have some ideas about how to give priority to responding to incoming serial port data in code regions that must otherwise be locked.  As part of this process, I may eliminate the existing code that allows multiple interrupt handlers to share a single interrupt line (something that x86 needs, but Magic-1 does not).

The other big issue is that I have run into code size problems with vi.  Vi has a lot of 32-bit arithmetic, which my lcc port handles poorly.  As a result, vi won't fit into a Magic-1 64 Kbyte code space.  I've got to address this either by improving my lcc retargeting or by supporting code overlays (or both).  I won't be able to consider this a real system until I have something that looks like vi.

Some other minor issues that other folks might run into with similar porting efforts:

bulletI fought with what I thought were line termination problems ( CR vs LF-CR ).  I was having difficulty logging in, and it appeared that I was getting spurious carriage returns.  It turns out that I had special file permission problems with /dev/tty01.
bulletRight now I'm supporting remote access using a Lantronix device server.  You telnet into the device server, and it connects to Magic-1 via a serial port.  However, there is some weirdness with the connection.  It seems I need to ensure that the telnet program in use uses character rather than line mode ("mode character" at the telnet command prompt).
bulletMost of the porting problems I had were due to my C compiler, lcc, being much less forgiving about non-ANSI code, and simply required cleaning up declarations.  However, there were a few more subtle problems related to what I understand are new ANSI rules about how to treat unsigned values.  Pay attention to compiler warnings.  Two in particular to watch out for were:
bulletComparing an unsigned to -1.  Lcc will treat this as a compile-time constant.
bulletComparing an signed to a "sizeof(xxx)" result using a relational operator other than "==" or "!=".
bulletBe careful with /etc/rc during bringup.  In particular, make sure that you have a way to launch a single-user shell up front.  I caused myself some extra problems by not having this ability at first, and had to rebuild a complete disk image because of a typo in /etc/rc.

Much, however, does work pretty well.  I've got two serial ports and thus support two simultaneous sessions.  Cron is running, man pages can be displayed, local mail works, fsck fsck's when appropriate and so on.  Besides vi, the only other important missing pieces are awk (code too big because of my poor lcc floating point support) and the tcp/ip stack (inet server - also code too big because of my poor lcc 32-bit integer support).  I plan on fixing those two via a code overlay mechanism.  Oh, and I've also got a workaround in place for the Minix shell: ash.  I haven't debugged it yet, but ash hangs when I use the "readline" package - so I've disabled that for the moment.

The last issue is speed.  The system is painfully slow - it can take several seconds for even the simplest of commands.  This seems less bad when I remember what machine speeds were like 20 years ago, but I believe I can make significant improvements.  First off is just tuning, which I haven't done yet.  i'm sure I can significantly speed up the physical memory copy routine for message passing.  Second, I never got around to supporting bss in my tool chain (I just add bss to initialized data).  This means I have to copy more from disk than is really necessary when I load a program.  Finally, the biggest improvement should come from either implementing demand paging and copy-on-write or something like the old vfork() hack.  Fork/exec is much, much slower at the moment than process launch is with my old monitor/OS (which did demand paging and copy-on-write).   Fixing that alone should go a long, long way towards making the system feel more responsive.

Oh, and other other thing: there is no native tool chain, and probably won't be.  All of the system is built using a cross development environment.  Without too much difficulty, I could get the linker, object file tools, libraries, cpp and the assembler running natively.  However, lcc is awfully large - too large for my 64K code and 64K data limitations.  I'm not especially interested in massaging lcc to make it fit - this is a hobby project after all and that sounds too much like real work.  However, I have toyed with the idea of creating a web service wrapper around lcc.  In this scenario I'd have the linker, assembler, etc. running natively.  The cc driver, though, would bundle up and compress the input source file, squirt it over to my Linux box where it would be compiled with the cross-lcc compiler.  The results would then be shipped back for linking or error reporting.  We'll see.

Once I get things a bit more stable, I'll open the machine up again for guest logins.

I've got a busy week coming up, so I doubt I'll be able to make much progress this week.  Still, it looks pretty good.

7/28/2007

WooHoo!

7/26/2007

Mostly good news tonight.  I took a look at my sendsig code and noticed that I had neglected to change the sigcontext structure to reflect a change I made earlier to how I was handling the page table base register in the proc structure.  One simple edit later and it appears that at least basic signal capabilities work (I haven't done extensive testing yet).

Then, I decided it was time I started running a shell as my simple test program.  I now realize that I had not ever previously tried to compile "ash" - the Minix shell.  It's a bit of a mess when dealing with a cross-compilation environment - the make process includes the building of a half-dozen or so native programs to generate and massage the source code.  Also, ash on the 16-bit x86 version of Minix almost completely fills up a 64K code segment.

Anyway, I spent most of the evening trying to build ash.  Some of the native build actions I was able to perform using gcc and my x86 build host - but some I felt safest running on Magic-1.  I never quite got it to completely build.  Ash uses setjmp and longjmp, which I haven't implemented yet.  However, I did go ahead a stub them out to see how big the code was.  Good news: ash will fit on Magic-1.  It's even slightly smaller than the x86 version.  One of my Magic-1 instruction set architecture goals was code density equal to or better than x86 for exactly this reason.

So, I'm going to have to get setjmp/longjmp working before I proceed with trying out a full-featured shell.  Meanwhile, though, I cobbled together a trivial shell.  That's now running on Magic-1 - and I'll probably flesh it out enough to handle the needs of the boot process.  Ash is complicated enough that I'd prefer to tackle it after the rest of Minix is booting.

7/25/2007

Just a few small changes to my C start-up code, and exec() now works.  I've got a nice test that forks and execs - looking good.  Next up is working out the problem with sendsig.  There's a bit of machine/calling convention specific code there that I've likely gotten wrong.  Shouldn't be too hard to fix.  With luck, I should be in a position to start working through the real INIT code by this weekend.

7/24/2007

Tried out fork() and exec() last night.  After one minor change (the copy of the proc table entry for the child wiped out its page table base), fork appears to work just fine.  I've had to do a bit more work on exec(), which isn't surprising.  Magic-1 uses a quite different calling convention as well as a.out format than x86 Minix.  However, it now appears to be basically working.  I'm sure I'm not yet getting argc, argv and envp set up quite correctly - but they should be pretty close.

I did run into some odd behavior with sleep(), and possibly exit().  Earlier I did a quick test for sleep() and it seemed to work, so there may be a problem in fork or exec that is manifesting itself in sleep().   I'll track those down next, as well as finish off the argument setup for exec.

The way I'm testing things is to replace INIT with my test program.

7/23/2007

Just a quick update - shortly after writing last night's diary entry I noticed the primary problem with the TTY.  The code in rs232.c was written for an early version of the UART I'm using in Magic-1.  Magic-1 uses a 16550, which has a built-in FIFO.   The one Minix is coded for uses the same command set, but doesn't have the FIFO.   Anyway, the FIFO adds some extra status bits in the interrupt status register, and this was causing the switch statement in rs232_handler() to ignore all interrupts.  After masking out those extra bits, things started working pretty well.  I've still got some issues with nested interrupts, which I'm working around at the moment by blocking all interrupts for longer than I'd like.  But, so far, so good.  I've got reliable interrupt-driven TTY output and input.

Just for kicks, I replaced INIT with a slightly modified version of ps.  After all the tasks and servers are initialized, ps is launched.  Here's a screen dump:

I also had time this event to do some minor file I/O testing.  I added /etc/rc and /etc/ttytab to the skeleton Minix file system and then put some code at the beginning of INIT to just open those files and copy them to stdout.  Worked like a charm on first attempt.

There isn't that much left before I start working through the real INIT.  The next thing is to take a close look at fork() and exec() to make sure I've correctly reworked the low-level code to handle Magic-1's memory allocation model and a.out header changes.

7/22/2007

Steady progress, though I didn't get to spend much time on the project this weekend (the final Harry Potter book was released early Saturday, and kids (and I....) spent much of the weekend reading).  Anyway, a lot is working.  Magic 1 currently:

bulletInitializes all system tasks.
bulletInitializes the two user-mode servers: MM (the memory manager) and FS (the file system)
bulletCorrectly identifies the attached hard drive (my old 20-MB HP Kittyhawk drive)
bulletIdentifies a skeleton Minix file system on a 500Kbyte Minix partition on the hard drive.
bulletAllocates a RAM disk and copies that file system into it as the root device.
bulletBasic interrupt functionality is there - the heartbeat timer is running throughout this process and the hard drive driver is fully interrupt-driven (until recently it was busy-wait).
bulletAfter all of the tasks and servers are initialized, a user-mode program is launched (this will eventually be INIT).  The user-mode program does a file open on /dev/tty00 and then falls into a loop writing a message and then sleeping for a couple of seconds (see below).
bulletThe FS server correctly identifies /dev/tty00 as a special character device and routes the message to the TTY driver (and the rs232 driver).
bulletsleep() works, as do other timeout timers.