BMOW title

Archive for the 'BMOW 1' Category

Apple II Machine Language Monitor

While waiting for the parts I need for the video system, I ported the machine language monitor from the Apple II ROMs to BMOW. Somehow I’d never gotten around to writing an interactive program to examine and modify memory, which I’ll certainly need once I start trying to poke bytes into video memory and see something on the screen. Rather than write my own monitor program, I decided it would be interesting to port the Apple II’s. BMOW’s instruction set is mostly a superset of the 6502 CPU in the Apple II, and the monitor program’s source code is listed in the famous Apple II “Red Book”. It was written by Steve Wozniak and Allen Baum in 1977. I figured it shouldn’t be too hard to convert it for BMOW’s purposes. After a longer-than-usual amount of fiddling I have it mostly working now, but it proved to be a major pain in the ass! Here’s a photo of the monitor program in action, showing a disassembly of itself.

monitor program

The monitor is pretty darn impressive for a 2K program. Its features:

  • Examine a single memory location or a range of addresses
  • Disassemble a program at a given address
  • Store one or more user-supplied bytes beginning at a given address
  • Block move memory
  • Compare two blocks of memory
  • Run a program at a given address
  • Single-step a program at a given address
  • Perform hexadecimal math using a built-in calculator

The original Apple II version also included an assembler, as well as functions to read and write data from a cassette tape. Ah, those were the days!
To port the monitor program to BMOW, I expected I’d need to solve a couple of problems:

  • Replace the original keyboard input and screen output code with BMOW equivalents
  • Adapt the output for a 20×4 LCD, instead of the original 40×24 television

That wasn’t too difficult. The bigger problem, which I hadn’t expected, was that BMOW was much less 6502-compatible than I’d thought. In particular, the condition codes set by each instruction differed quite a lot between the 6502 and BMOW. It turned out that the Apple II monitor relied substantially on the fact that certain instructions like INC set the zero and negative flags but leave the carry flag untouched. BMOW wasn’t designed to work that way, and unfortunately it took me many hours of digging through the source code to understand how it worked before I picked up on this detail, and even longer to find and fix all the instances where it caused bugs. My BMOW simulator proved to be extremely valuable, and I even spent some time adding rudimentary source-level debugging to it. The good news is that I found and fixed a lot of BMOW microcode bugs, and I was eventually able to change BMOW’s condition code usage to match the 6502’s by reprogramming a GAL, with no wiring changes at all. And now I have a working monitor program!
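To make the INC problem concrete, here is a small Python model of the flag behavior. It is a sketch, not BMOW’s actual microcode: `inc_6502` follows the real 6502 rule (INC updates Z and N and leaves C alone), while `inc_clobbers_carry` is a hypothetical variant standing in for an INC that also writes C. A monitor-style sequence of compare, increment something unrelated, then branch on the compare’s carry only works with the first one.

```python
def inc_6502(value, flags):
    """6502 INC: result wraps at 8 bits; Z and N are updated, C is untouched."""
    result = (value + 1) & 0xFF
    new_flags = dict(flags)
    new_flags["Z"] = result == 0
    new_flags["N"] = bool(result & 0x80)
    return result, new_flags


def inc_clobbers_carry(value, flags):
    """Hypothetical INC that also sets C from the carry out of bit 7."""
    result, new_flags = inc_6502(value, flags)
    new_flags["C"] = value == 0xFF
    return result, new_flags


def cmp_6502(a, m):
    """6502 CMP: C = (A >= M) unsigned, Z = (A == M), N = bit 7 of A - M."""
    diff = (a - m) & 0xFF
    return {"C": a >= m, "Z": a == m, "N": bool(diff & 0x80)}


def carry_after_compare_then_inc(inc):
    """CMP sets C, an INC of an unrelated counter follows, then a BCS reads C."""
    flags = cmp_6502(0x10, 0x05)      # A >= M, so C is set
    _, flags = inc(0x20, flags)       # bump some unrelated counter
    return flags["C"]                 # the value a later BCS would test


print(carry_after_compare_then_inc(inc_6502))            # True
print(carry_after_compare_then_inc(inc_clobbers_carry))  # False
```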

After spending all that time in the monitor source code, I’ve come to know it pretty well, and I have to say that Woz and Baum must be crazy! OK, not crazy, but they certainly approached programming very differently from how people would today. The program makes widespread use of some pretty bizarre techniques, which I’m sure were intended to economize every last byte. Functions constantly call into the bodies of other functions to borrow a little snippet here or there, so it becomes difficult to identify which bits of source code are related to any particular program feature. Conditional branches are used in many cases where it’s obvious that the condition will always evaluate true, which I think is because the conditional branch instruction takes less program space than an unconditional jump.
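The byte saving behind an always-taken branch is easy to quantify. On the 6502, a conditional branch such as BNE is two bytes (opcode plus a signed 8-bit offset measured from the address of the following instruction), while JMP absolute is three bytes. The helper below is a sketch for illustration: it encodes a branch offset and rejects targets outside the -128..+127 range, which is the price paid for the shorter encoding.

```python
BRANCH_SIZE = 2    # opcode + signed 8-bit offset
JMP_ABS_SIZE = 3   # opcode + 16-bit absolute address


def branch_offset(branch_addr, target):
    """Return the operand byte for a 6502 relative branch located at branch_addr."""
    offset = target - (branch_addr + BRANCH_SIZE)  # relative to the next instruction
    if not -128 <= offset <= 127:
        raise ValueError(f"target ${target:04X} is out of branch range")
    return offset & 0xFF  # two's-complement byte


print(f"{branch_offset(0xFC10, 0xFC20):02X}")  # forward branch: 0E
print(f"{branch_offset(0xFC10, 0xFC00):02X}")  # backward branch: EE
print("bytes saved per branch:", JMP_ABS_SIZE - BRANCH_SIZE)
```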

Beyond that, it also looks as if they intentionally designed the program so that certain parts of it would end up at very specific memory addresses. The program is precisely 2048 bytes. There are NOPs and padding bytes in a couple of places, and in a few places I saw a bigger instruction used where a smaller one could have been. For example, some code branches to another instruction that is just an RTS (return from subroutine). Why not put the RTS in directly instead of having the branch? It seems that Woz and Baum may have started with goal addresses in mind for portions of the program, so that certain address arithmetic would work out nicely and save a few instructions, and then went out of their way to make sure those goal addresses were achieved.

Maybe Woz or Baum or someone else who knows will follow up and explain what they were thinking; I’d be curious. Years ago I received a shareware registration from Woz for a Macintosh game I wrote: a $10 check, signed simply “Woz”. In retrospect it might have been fun to hang on to, but I cashed it. After that we exchanged a couple of emails, though, and I even sent him a game T-shirt. I wonder if it’s still wedged at the bottom of his dresser drawer.


Video Palette Setup

I’ve been thinking some more about the best way to set up the palette for BMOW’s video system. It’s the only part of the design that I’m not really happy with. There are a number of different options I’m considering. Here’s a summary of each one, along with a sample palette image showing all the possible colors. I’ve tried to arrange the colors in each sample in the most pleasing way, but try not to let the ordering of the colors sway your opinion.

8-bit palette entries, RRRGGGBB format

RRRGGGBB palette sample
Alternative RRRGGGBB palette sample

This is the current plan. Two alternative orderings of the same 256 colors are shown here. Each entry is a single byte, with 3 bits of red, 3 bits of green, and 2 bits of blue. It’s simple to understand and build, and requires two 3-bit DACs (digital-to-analog converters) and one 2-bit DAC. There are 256 possible colors, and the division of bits between red, green, and blue is asymmetric. Only 8 or 4 intensities of any given hue are possible. For grayscale, it’s limited to either 4 shades of gray, or eight shades where some “grays” have too much or too little blue, since there’s less precision for blue than for red and green.
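A sketch of how an RRRGGGBB byte turns into three channel levels, assuming ideal linear DACs. Each field is scaled to 0-255 only for comparison with modern 24-bit color; that scaling is an assumption for illustration, not part of the hardware.

```python
def decode_rrrgggbb(byte):
    """Split an RRRGGGBB palette byte and scale each field to 0..255."""
    r = (byte >> 5) & 0b111   # bits 7-5
    g = (byte >> 2) & 0b111   # bits 4-2
    b = byte & 0b11           # bits 1-0
    return (round(r * 255 / 7), round(g * 255 / 7), round(b * 255 / 3))


print(decode_rrrgggbb(0b11100000))  # (255, 0, 0), brightest red
print(decode_rrrgggbb(0b00000010))  # (0, 0, 170), one of only 4 blue levels
```

Blue steps in thirds (0, 85, 170, 255) while red and green step in sevenths, which is why evenly matched gray levels are hard to come by.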

8-bit palette entries, HHHIIIII format

HHHIIIII palette sample

Each entry is a single byte, with three bits of hue and five bits of intensity. The three hue (H) bits switch the R, G, and B color outputs on or off. Then for any color outputs that are on, the five intensity (I) bits define the intensity for all of them. There are still only 256 possible colors, but it’s symmetric in that all the color channels are treated identically, and it permits 32 intensities per hue. It requires a single 5-bit DAC whose output is used for all three color channels, and three transistors to switch the analog outputs on and off according to the H bits (I think this would work, but I need to sketch up an appropriate analog circuit). The big drawback is that there are only 8 hues, so some colors like orange and brown can’t be rendered. Also, 39 of the 256 possible colors are black, which seems wasteful: that’s the entire top row and left column in the sample image.
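A sketch of the HHHIIIII decoding, assuming bit 7 switches red, bit 6 green, and bit 5 blue (the text doesn’t say which H bit drives which channel). Counting the all-black entries reproduces the figure of 39: 32 entries with hue 000, plus 7 more with a nonzero hue but zero intensity.

```python
def decode_hhhiiiii(byte):
    """Three on/off hue bits gate one shared 5-bit intensity onto R, G, B."""
    hue = byte >> 5                              # assumed order: R, G, B
    level = round((byte & 0x1F) * 255 / 31)      # the single 5-bit DAC
    return tuple(level if hue & mask else 0 for mask in (0b100, 0b010, 0b001))


black = sum(1 for v in range(256) if decode_hhhiiiii(v) == (0, 0, 0))
print(black)  # 39
```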

8-bit palette entries, HHHHIIII format

HHHHIIII palette sample

In theory this would yield 16 hues, with 16 intensities per hue, which sounds like a decent compromise. In practice, I can’t think of a simple way to define 16 hues from the four H bits. Three bits of H are easy to treat as on/off controls for the R, G, and B channels, but four bits? If anyone can think of an obvious way to map eight bits of HHHHIIII to individual R, G, B outputs, I’d love to hear it. The image shows one possible mapping, but it’s a big switch statement in the program I wrote to generate the sample images, and I’m not sure how I’d implement it efficiently in hardware. Certainly some extra chips would be needed in the output logic.
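One way to think about the hardware side is that any 4-bit hue mapping, however irregular, is just a 16-entry lookup table, which could live in a small ROM or be folded into GAL equations. The sketch below uses a made-up table of per-channel weights (0 = off, 1 = half, 2 = full), where extra hues such as orange come from half-weight channels. It is not the mapping used for the sample image, and the table contents are purely illustrative.

```python
# Hypothetical hue table: (R, G, B) weights, 0 = off, 1 = half, 2 = full.
HUE_WEIGHTS = [
    (2, 2, 2), (2, 0, 0), (0, 2, 0), (0, 0, 2),   # gray, red, green, blue
    (2, 2, 0), (0, 2, 2), (2, 0, 2), (2, 1, 0),   # yellow, cyan, magenta, orange
    (1, 2, 0), (0, 2, 1), (0, 1, 2), (1, 0, 2),   # chartreuse, spring green, azure, violet
    (2, 0, 1), (2, 1, 1), (1, 2, 1), (1, 1, 2),   # rose, pale red, pale green, pale blue
]


def decode_hhhhiiii(byte):
    """Look up the hue weights, then scale by the 4-bit intensity (0..15)."""
    weights = HUE_WEIGHTS[byte >> 4]
    intensity = byte & 0x0F
    # Channel level in 0..255, rounded half up: intensity/15 * weight/2 * 255.
    return tuple((intensity * w * 255 + 15) // 30 for w in weights)


print(decode_hhhhiiii(0x0F))  # (255, 255, 255): full-intensity gray
print(decode_hhhhiiii(0x76))  # (102, 51, 0): a dim orange, i.e. brown
```

With a table like this, brown falls out naturally as a low-intensity orange, and no hue code is wasted on black.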

8-bit palette entries, RRGGBBSS format

RRGGBBSS palette sample

Each entry is one byte, with four bits each for R, G, and B. However, the least-significant two bits of each color are shared among all three colors. There are 256 possible colors in total. They consist of 64 basic hues, and each hue can be biased brighter by one of 4 intensity levels, with the total intensity bias being as much as one-fifth (3 of 15 steps) of the black-to-white intensity range. It requires three 4-bit DACs, but no other special output logic.
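A sketch of the RRGGBBSS expansion: each channel’s 4-bit DAC input is its own two bits on top, with the shared SS bits appended as the low two bits. The bit positions (RR in bits 7-6 down to SS in bits 1-0) are assumed from the format name.

```python
def decode_rrggbbss(byte):
    """Return the three 4-bit DAC inputs (0..15) for an RRGGBBSS byte."""
    shared = byte & 0b11                       # SS: common low bits
    return tuple((((byte >> shift) & 0b11) << 2) | shared for shift in (6, 4, 2))


print(decode_rrggbbss(0b11000000))  # (12, 0, 0): pure red, no bias
print(decode_rrggbbss(0b11000011))  # (15, 3, 3): same hue biased brighter
```

Note that the bias also lifts the channels that were off, so biased colors drift slightly toward white.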

8-bit palette entries, RRGGBBII format

RRGGBBII palette sample

This is very similar to RRGGBBSS format, except that the extra two bits set the overall intensity of the pixel, rather than serving as the least-significant bits of the color data. Each of the 64 primary hues can be rendered at 25%, 50%, 75%, or 100% of normal intensity. It’s a similar idea to the Amiga’s Extra Half-Brite mode. The advantage over RRGGBBSS format is that there’s more fidelity among the darker colors. There is some redundancy in the palette, however: red = 01 (binary) with 100% intensity is the same as red = 10 (binary) with 50% intensity. I’m not sure what the best way to implement this in hardware would be. Possibly as a 2-bit intensity DAC, whose output is used as the reference voltage for three more 2-bit color DACs.
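A sketch using exact fractions to confirm the redundancy. It assumes II = 00 means 25% and II = 11 means 100%, and that each 2-bit color field maps linearly to 0, 1/3, 2/3, or 1 of full scale.

```python
from fractions import Fraction


def decode_rrggbbii(byte):
    """Return (R, G, B) as exact fractions of full scale for an RRGGBBII byte."""
    scale = Fraction((byte & 0b11) + 1, 4)     # assumed: 00 -> 25%, 11 -> 100%
    return tuple(Fraction((byte >> shift) & 0b11, 3) * scale for shift in (6, 4, 2))


# Red = 01 at 100% and red = 10 at 50% produce the same color.
print(decode_rrggbbii(0b01000011) == decode_rrggbbii(0b10000001))  # True
distinct = {decode_rrggbbii(v) for v in range(256)}
print(len(distinct), "distinct colors out of 256 codes")
```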

15-bit palette entries, RRRRRGGGGGBBBBB format

RRRRRGGGGGBBBBB palette sample

Each entry is two bytes, with five bits each for R, G, and B. There are 32768 possible colors. It requires three 5-bit DACs. Since palette entries are twice as large as in the 8-bit formats, it also requires two palette RAMs arranged in parallel. Storing the data in a single RAM that’s twice as big won’t work, because all 15 bits for the pixel need to be output simultaneously. It also requires two output registers to latch the output of the palette RAMs, instead of one. Overall this would provide the best-looking results, but it would require the most hardware and double the amount of palette data that the CPU needs to generate.
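A sketch of the 15-bit entry layout and how it would split across the two parallel palette RAMs, assuming red in the top bits and the “high” RAM holding bits 14-8 (its bit 7 unused). Both RAMs see the same address, so each contributes its half of the entry in the same cycle.

```python
def pack_555(r, g, b):
    """Pack 5-bit R, G, B values into a 15-bit palette entry (RRRRRGGGGGBBBBB)."""
    for c in (r, g, b):
        if not 0 <= c <= 31:
            raise ValueError("each channel must be 0..31")
    return (r << 10) | (g << 5) | b


def split_for_rams(entry):
    """Return (high RAM byte, low RAM byte); the high byte's bit 7 is unused."""
    return entry >> 8, entry & 0xFF


def unpack_555(entry):
    """Recover (R, G, B) from a 15-bit entry."""
    return (entry >> 10) & 0x1F, (entry >> 5) & 0x1F, entry & 0x1F


entry = pack_555(31, 16, 0)
print(hex(entry), split_for_rams(entry))
```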


Notes on Video

The high-level design of the BMOW video system is mostly complete now, but is scattered across eight pages of scrawled notes and the inside of my head. It’s proving to be a complex beast, and there are still many smaller details I need to resolve, but I think I’m finished with the worst of it. Hans Summers’ custom-built Z80 video circuit design validated my original approach, and provided a few key ideas for getting more flexibility from the same amount of hardware.

My design has a screen resolution of 320×240, and devotes 64 bytes of video memory to each scan line. Of these, 40 are used to create the visible portion of the line, and 24 are unused. This wastes a considerable amount of memory, but it simplifies the hardware, and memory is cheap. The simplest case is text mode, in which each of the 40 bytes translates to a character to display, and each character is composed of an 8-pixel-wide black-and-white bitmap, so 40 × 8 = 320 and the screen gets filled with text. Here the resolution is 1 bit per pixel, as each of those 320 pixels is one of two colors.
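The payoff of the 64-byte line stride is that the video address is just the line counter and the column counter wired side by side, with no multiplier or adder. A small sketch of that arithmetic, and of how much memory the unused 24 bytes per line cost:

```python
BYTES_PER_LINE = 64    # power of two, so addressing is pure bit concatenation
VISIBLE_BYTES = 40     # 40 bytes x 8 pixels = 320 pixels in text mode
LINES = 240


def video_address(line, column):
    """Address of a byte in video memory: line in the high bits, column in the low 6."""
    if not (0 <= line < LINES and 0 <= column < BYTES_PER_LINE):
        raise ValueError("line or column out of range")
    return (line << 6) | column    # identical to line * 64 + column


total = LINES * BYTES_PER_LINE
wasted = LINES * (BYTES_PER_LINE - VISIBLE_BYTES)
print(total, wasted)   # 15360 5760
```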

Things get more interesting in the graphics modes. I’m especially happy that I found a way to support the 1bpp text mode as well as 2bpp, 4bpp, and 8bpp graphics modes all through software changes to the color palette, without any hardware differences for the different bit depths. At the same time, I was also able to eliminate the shift register that most video generators need for shifting out the bits in a byte to create pixel data. Again, the answer is the color palette. Here’s a diagram:

palette diagram

At the left, bytes are clocked in from video RAM or the character generator. For text mode they’re clocked at one byte every 8 pixels, and for graphics modes at one byte every 4 pixels. The byte is then passed to the palette RAM as part of its address, along with the lowest three bits of the column count. Four bits of palette select are also provided, so the software can quickly switch between several different palettes. Think of the RAM as a lookup table that determines the color to display as a function of the data byte and the three bits of column count. Using those inputs, it’s possible to create 8bpp, 4bpp, 2bpp, or 1bpp just by storing different color data in the palette. For each increase in bit depth, the pixels get twice as wide, so you can get 320 pixels @ 2bpp, 160 @ 4bpp, or 80 @ 8bpp.

Here’s how it works. For 1bpp operation during text mode, you set the palette so that the three column bits select one of eight “sub-palettes”, corresponding to the eight columns. Each of those sub-palettes then contains the value for white whenever the bit in the corresponding column is set, and black otherwise. No shift register is needed to create the eight output pixels. The palette looks like this (X means “don’t care”):

column[2:0]   data byte   color
000           XXXXXXX0    black
000           XXXXXXX1    white
001           XXXXXX0X    black
001           XXXXXX1X    white
010           XXXXX0XX    black
010           XXXXX1XX    white
011           XXXX0XXX    black
011           XXXX1XXX    white
100           XXX0XXXX    black
100           XXX1XXXX    white
101           XX0XXXXX    black
101           XX1XXXXX    white
110           X0XXXXXX    black
110           X1XXXXXX    white
111           0XXXXXXX    black
111           1XXXXXXX    white

For 2bpp, you create four sub-palettes, selected by the two higher-order column bits. Each of these contains one of four color entries, depending on the pair of data bits corresponding to the column count. Similarly for 4bpp to select one of 16 colors. For 8bpp, the column bits don’t matter, and the entire 8-bit byte is used to select one of 256 colors.

If this seems confusing, another way of viewing it is that the horizontal resolution is always 320, and the hardware outputs four pixels at a time. Those pixels can all be different colors, or pairs of identically colored pixels, or a single 4-pixel block of color.
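The scheme can be sketched in a few lines of Python that fill one palette-select block of the palette RAM for any of the four bit depths. Two details are assumptions: the RAM address is taken to be column[2:0] in bits 10-8 above the data byte in bits 7-0 (the four palette select bits would sit above that), and `colors` holds ready-made 8-bit color values. Following the description of the sub-palettes, the data field for an n-bpp mode is chosen by column // n, so higher bit depths use only the higher-order column bits.

```python
def build_palette_block(bpp, colors):
    """Return the 2048 palette entries for one palette-select value.

    Index = (column << 8) | data_byte, with column in 0..7.
    """
    if bpp not in (1, 2, 4, 8) or len(colors) != 1 << bpp:
        raise ValueError("need bpp in {1, 2, 4, 8} and 2**bpp colors")
    mask = (1 << bpp) - 1
    block = bytearray(8 * 256)
    for column in range(8):
        field = column // bpp              # which bpp-wide field of the byte
        for data in range(256):
            index = (data >> (bpp * field)) & mask
            block[(column << 8) | data] = colors[index]
    return block


# 1bpp text mode: column n shows bit n of the character bitmap.
text = build_palette_block(1, [0x00, 0xFF])   # black, white
print([text[(c << 8) | 0b00001000] for c in range(8)])  # [0, 0, 0, 255, 0, 0, 0, 0]
```

The same function fills the 2bpp, 4bpp, and 8bpp cases, which is the point of the design: only the table contents change, never the hardware.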

Since the palette RAM is a normal memory with 8 bits per address, the palette entries are 8 bits wide. They define 3 bits of green, 3 bits of red, and 2 bits of blue color data, which is then latched in an output register, and finally converted to an analog voltage for the VGA monitor. 8-bit palette entries certainly aren’t ideal. There’s more precision for red and green than blue, and the overall precision is poor. In fact for the 8bpp indexed color mode, the indexes are as large as the palette entries themselves, and you could just as easily use them directly as color data, skipping the palette entirely!

I’m hoping that 8-bit palette entries will be good enough for my purposes, but if not, I could add a second palette RAM in parallel with the first, and pass it the same address data. That would effectively create a 16-bit wide memory: enough for 555 or 565 encoding of the RGB color data, which should look pretty good. Unfortunately it would add an additional RAM chip, a second output register, and require doubling the size of the DAC, so it might well be more trouble than it’s worth.


Video Tests

The loftiest of my stretch goals for BMOW is to build a primitive video circuit and connect the machine to a standard VGA monitor. To that end, I recently did some prototyping on a breadboard to experiment with video generation. This was a completely stand-alone test circuit, not integrated into BMOW’s hardware in any way, but the results look quite promising! I still think it will be a lengthy, challenging project, but BMOW video output now looks likely.

I started with a standard 15-pin VGA connector from Radio Shack, and soldered the necessary wires to it. I then dug through all the technical docs on the VGA signal format I could find. The first hurdle was to build a circuit to generate the hsync and vsync signals at the required frequencies, and with the correct on and off durations for each cycle. I built two counters, using three GALs: a 7-bit horizontal pixel counter, and a 10-bit vertical line counter. The horizontal counter had a range of 127 pixels, with a 4 MHz clock (250 ns period), for a total duration of 127 × 250 ns = 31.75 µs per line. That’s very close to the VGA spec of 31.77 µs per line for 640×480@60Hz. The vertical line counter used the horizontal counter’s rollover as its clock, and had a range of 525 lines. Again, that’s the VGA spec for 640×480. The extra lines between 480 and 525 are part of the overscan and vertical retrace, and aren’t displayed.
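The timing numbers are easy to double-check. This sketch only restates the arithmetic from the paragraph above (127 clocks per line at 4 MHz, 525 lines per frame) and derives the resulting frame rate:

```python
CLOCK_HZ = 4_000_000     # 4 MHz pixel clock, 250 ns per count
COUNTS_PER_LINE = 127
LINES_PER_FRAME = 525

line_time = COUNTS_PER_LINE / CLOCK_HZ          # seconds per scan line
frame_rate = 1 / (line_time * LINES_PER_FRAME)  # frames per second
print(f"{line_time * 1e6:.2f} us per line, {frame_rate:.2f} Hz")
# 31.75 us per line, 59.99 Hz
```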

The sync signals were generated by the GAL counters. Hsync was defined as active during a portion of the 127 horizontal range, and vsync during a portion of the 525 vertical range. Lastly, I defined an “image” signal that was 1 whenever the horizontal counter was 64, and 0 otherwise. Connected to one of the VGA color inputs, that should have created a vertical line down the middle of the screen at pixel 64. Here’s a photo of my test circuit, including a ROM (stolen from BMOW) that I added during a later test:

video test circuit

Unfortunately when I connected this Frankenstein up to a monitor, it did… nothing. It acted as if nothing were connected at all. I tried a second monitor and got the same results. Then I spent a long time rereading all the VGA signal documents and scratching my head, but I couldn’t figure out what was wrong. I even connected the circuit to the oscilloscope, and verified that the hsync and vsync signals looked as I expected. But no video = no progress.

One thing I’d been wondering about was the relationship of hsync to vsync. The spec defined the requirements for both, but I never found anything that talked about one relative to the other. Should hsync and vsync both become active simultaneously, at the end of a frame? Or should they be out of phase? On a hunch, I tried altering the circuit slightly so that hsync would be asserted in the middle of vsync, rather than simultaneously with its beginning or end, and… Eureka!

video test 1

Holy crap, video! My circuit was generating honest to God VGA video, at the completely oddball resolution of 127×480. What’s not visible in the photo is the tremendous amount of noise in the image outside that green line, so there were some signal quality issues to contend with, but it was clearly working.
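A small software model of the two counters makes the hsync/vsync fix easy to see. The sync windows below are hypothetical (the post doesn’t give the exact counts), and sync polarity is ignored. Because vsync only changes at a line boundary, when the horizontal counter is 0, placing the hsync pulse later in the line (here at count 100) guarantees that hsync is asserted partway through vsync rather than at the same instant vsync starts or ends.

```python
H_TOTAL, V_TOTAL = 127, 525
HSYNC_COUNTS = range(100, 115)   # hypothetical hsync window within a line
VSYNC_LINES = range(490, 492)    # hypothetical vsync window within a frame


def frame_signals():
    """Yield (h, v, hsync, vsync, image) for every 250 ns tick of one frame."""
    for v in range(V_TOTAL):
        for h in range(H_TOTAL):
            yield h, v, h in HSYNC_COUNTS, v in VSYNC_LINES, h == 64


def hsync_starts_within_vsync():
    """Offsets, in ticks from the start of vsync, of each hsync rising edge during vsync."""
    offsets, vsync_start, prev_hsync = [], None, False
    for tick, (_, _, hsync, vsync, _) in enumerate(frame_signals()):
        if vsync and vsync_start is None:
            vsync_start = tick
        if vsync and hsync and not prev_hsync:
            offsets.append(tick - vsync_start)
        prev_hsync = hsync
    return offsets


print(hsync_starts_within_vsync())  # [100, 227]
```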

From there I began to experiment. Since I already had a 7-bit horizontal counter, why not try connecting the R, G, and B color inputs to the bits of the counter?

video test 2

Or connect some of them to bits of the horizontal counter, and others to bits of the vertical counter?

video test 3

So far, so good. But to do anything useful, I needed to generate an image from data stored in memory, not just use bits from a counter. Luckily that proved to be pretty easy: I just used the 7-bit horizontal counter and 10-bit vertical counter to form a 17-bit address to a ROM, and then used bits from the ROM’s output data for the RGB channels. By sheer coincidence, the BMOW ROMs have 17 address bits (128K), so it worked out perfectly. Here’s what it looked like, using one of the BMOW ROMs as the image source:

video test 4

Mmm, noise! But memory-mapped noise, which meant I was getting close to something good. Now all I needed was a way to generate a 127×525 image as a contiguous, uncompressed file, so I could program it into the ROM for display. The 256-color BMP format proved to be ideal, as it’s uncompressed, and has a 1078-byte header that can be stripped out using a hex editor, leaving behind only the image pixels themselves, one byte per pixel. So I fired up the Windows Paint program, drew some awesome art using the pencil tool, saved it as BMP and stripped the header, and…

video test 5

Argh, what the heck?! Are BMPs stored backwards? So close, just one more try…

video test 6

Ta-da! Memory-mapped video generation, using a 5-component test circuit.
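The upside-down result comes from the BMP format itself: when the height field in the header is positive, rows are stored bottom-to-top, and each row is padded to a multiple of 4 bytes (so a 127-pixel row occupies 128 bytes). Rather than stripping the header by hand in a hex editor, a short script could do the conversion. This is a sketch assuming an uncompressed 8-bit BMP; `bmp8_to_rom` is an illustrative name, and it reads the pixel-data offset from the header instead of hard-coding the header size.

```python
import struct


def bmp8_to_rom(data: bytes, row_bytes: int = 128) -> bytes:
    """Convert an uncompressed 8-bit BMP into top-to-bottom rows of row_bytes each."""
    if data[:2] != b"BM":
        raise ValueError("not a BMP file")
    pixel_offset = struct.unpack_from("<I", data, 10)[0]   # bfOffBits
    width, height = struct.unpack_from("<ii", data, 18)    # biWidth, biHeight
    bit_count = struct.unpack_from("<H", data, 28)[0]      # biBitCount
    compression = struct.unpack_from("<I", data, 30)[0]    # biCompression
    if bit_count != 8 or compression != 0:
        raise ValueError("expected an uncompressed 256-color BMP")
    if width > row_bytes:
        raise ValueError("image is wider than one ROM row")
    stride = (width * 8 + 31) // 32 * 4                    # rows padded to 4 bytes
    rows = [
        data[pixel_offset + i * stride : pixel_offset + i * stride + width]
        for i in range(abs(height))
    ]
    if height > 0:
        rows.reverse()                                     # stored bottom-up
    return b"".join(row.ljust(row_bytes, b"\x00") for row in rows)
```

With `row_bytes=128`, pixel h of line v lands at ROM offset (v << 7) | h, matching the 17-bit address formed by concatenating the two counters.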

All this looks promising, but there’s still lots more I’d need to do to get a useful integration with BMOW:

  • There’s a LOT of noise in the image. In the last photo you can see a few stray vertical lines; in reality the image looks much worse. Grounding problems? Signal cross-talk?
  • The horizontal resolution of 127 is too low to be useful. To get even 40 columns of text, with each letter 6 pixels wide, I need a horizontal resolution of 240. But higher resolutions require faster memory and other components, to get the pixel data quickly enough.
  • Speaking of text, I probably need a text mode, with a character ROM that generates font glyph pixel data indirectly from a single ASCII character byte, in addition to the direct mode used in this test. That would require much less work from the CPU when displaying text on the screen.
  • Horizontal and vertical blank signals are probably needed, in addition to hsync and vsync. These would be used to suppress all data on the RGB channels during the portion of each line and frame that shouldn’t be visible. Failure to do this seems to confuse the monitor.
  • The memory-mapped video image needs to be stored in RAM, not ROM, so it can be updated dynamically by the CPU.
  • Some way to arbitrate access to the video RAM between the CPU and the display circuit is needed.
  • A D-to-A converter is needed in order to get more than 1 bit per channel for red, green, and blue. With only 1 bit per channel, there are just 8 colors total.
  • The finished system, including components for all the functions not part of my test circuit, has to squeeze into the space remaining on the BMOW system board. That’s space for maybe 15 components, depending on their size. Ideally, it would still leave some space for audio circuitry.
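For the character ROM mentioned in the list above, a common approach is to form the ROM address from the character code and the row within the glyph, so one byte of text selects a whole stack of glyph rows. A sketch, assuming 8-row glyphs and a 2K ROM holding 256 characters (neither is decided in the post; `char_rom_address` and `text_scanline` are illustrative names):

```python
GLYPH_ROWS = 8   # assumed glyph height; 256 chars x 8 rows = 2048 bytes


def char_rom_address(char_code, glyph_row):
    """Character code in the high bits, row within the glyph in the low 3 bits."""
    return (char_code << 3) | glyph_row


def text_scanline(text, scanline, char_rom):
    """Return the 1bpp bitmap bytes for one scan line of a row of text."""
    glyph_row = scanline % GLYPH_ROWS
    return bytes(char_rom[char_rom_address(c, glyph_row)] for c in text)


rom = bytearray(256 * GLYPH_ROWS)
rom[char_rom_address(ord("A"), 0)] = 0b00011000   # top row of a made-up "A"
print(text_scanline(b"AA", 8, rom))  # scan line 8 is row 0 of the second text row
```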

It’s a tall order, but I’m pretty confident I can do it. I’ll post more updates whenever I have interesting news to report. Meanwhile, any suggestions on how best to approach the remaining design work are very welcome!


Multitasking Success

After a LOT of fiddling, I finally pulled off a successful multitasking demo. It’s not the most exciting thing in the world to watch, but once you understand what’s happening, you’ll appreciate the nerd factor. The photo below shows BMOW powering up and starting two processes, which are then multitasked to create the appearance of simultaneous execution. First the boot code prints “boot” to prove that the machine is alive, and then a few punctuation marks that serve as status codes while it’s initializing the two processes. Following that is the output from the processes themselves. One prints the letters A-M, and the other prints the digits 0-9.

multitasking demo shot

The two processes have different delay lengths between each printed character, which is why the sequence doesn’t appear strictly as letter-number-letter-number. The differing delays were intended to help expose race conditions: wait long enough, and the two processes should attempt to print to the LCD at the same time, exposing any possible bugs with resource contention.

How it Works

BMOW multitasking required writing a simple process manager and scheduler, and takes advantage of the machine’s semi-protected memory banks. BMOW memory is organized into banks of 64K each, with the OS occupying bank 0, and other processes each occupying a private bank. Code executing outside of bank 0 is unable to directly view or modify memory in other banks. It can only invoke specific system procedures to do such work on its behalf, using the JSP instruction (jump to system procedure).

At power-on time, the boot code copies the letters program into bank 1, and the numbers program into bank 2. It then calls a scheduler routine to add process table entries for the two processes. This routine sets up some initial stack data in the appropriate bank, to mimic what would be on the stack if the process had already been running, but was suspended by an interrupt. It then initializes the process table entry to contain the process’ stack pointer value, and marks that table entry as active. Finally, the boot code sets the real-time clock to generate an interrupt every 2 ms, and halts.

Two milliseconds later, the first interrupt is triggered. The interrupt service routine first pushes the current register contents and program counter value onto the stack. Since each process (each bank) has a separate stack, this means the registers are saved onto the stack of whatever process was interrupted. Then the ISR checks to see if it’s a timer interrupt (it could also be keyboard or USB). If it’s a timer interrupt, it calls a scheduler routine to find a new process to run.

The scheduler stores the value of the stack pointer into the current process’ table entry. Then it scans the process table, beginning at the next entry after the current process, looking for active entries. When it finds one, it copies the value in the process table entry into the stack pointer, and then executes the RTI instruction (return from interrupt). This instruction pops the program counter and register contents off the stack, and resumes execution of the new process where it left off.
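The scheduling steps above can be modeled in Python to check the logic. This is a sketch, not BMOW’s code: a Python list stands in for each bank’s private stack, `saved_sp` mirrors the stack pointer stored in the process table, and a `Frame` stands in for the registers and program counter the ISR pushes. Slot 0 represents the halted boot code, which is never marked active; the program addresses in the usage lines are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """What the ISR pushes: program counter plus registers (simplified)."""
    pc: int
    a: int = 0
    x: int = 0


@dataclass
class ProcessEntry:
    active: bool = False
    stack: list = field(default_factory=list)   # stands in for the bank's stack
    saved_sp: int = 0                           # stack depth saved by the scheduler


class Scheduler:
    def __init__(self, slots):
        self.table = [ProcessEntry() for _ in range(slots)]
        self.current = 0                        # slot 0: boot code in bank 0

    def add_process(self, slot, entry_pc):
        entry = self.table[slot]
        entry.stack.append(Frame(pc=entry_pc))  # fake frame, as if interrupted
        entry.saved_sp = len(entry.stack)
        entry.active = True

    def timer_interrupt(self, running):
        """Save the running context, pick the next active process, 'RTI' into it."""
        cur = self.table[self.current]
        cur.stack.append(running)               # ISR pushes onto the interrupted stack
        cur.saved_sp = len(cur.stack)           # scheduler saves SP
        n = len(self.table)
        for step in range(1, n + 1):            # scan from the next entry, wrapping
            candidate = (self.current + step) % n
            if self.table[candidate].active:
                break
        else:
            raise RuntimeError("no active process")
        self.current = candidate
        nxt = self.table[candidate]
        frame = nxt.stack[nxt.saved_sp - 1]     # load SP, then RTI pops the frame
        nxt.saved_sp -= 1
        del nxt.stack[nxt.saved_sp:]
        return frame


sched = Scheduler(3)
sched.add_process(1, 0x1000)   # letters program in bank 1
sched.add_process(2, 0x2000)   # numbers program in bank 2
print(hex(sched.timer_interrupt(Frame(pc=0x0042)).pc))  # 0x1000: letters start
print(hex(sched.timer_interrupt(Frame(pc=0x1005)).pc))  # 0x2000: numbers start
print(hex(sched.timer_interrupt(Frame(pc=0x2007)).pc))  # 0x1005: letters resume
```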

Each process knows nothing about the other processes, and thinks that it’s in complete control of a 64K machine. From the point of view of the process, it can’t even tell that anything special happens when it’s suspended and restored, except that some time has passed that can’t be accounted for. It’s a bit like an alien abduction for a BMOW process.

Observant readers may have wondered: where do the letters and numbers programs come from? On a typical computer, the OS loads and runs programs from disk or some other storage medium, but BMOW has none. For this demo, the programs are compiled separately, and then embedded into the OS itself, which is stored in ROM. The programs appear as data to the OS, and it just blindly copies their bytes into the appropriate memory banks. This works, but obviously isn’t an ideal solution, because it ties the OS and the demo programs together into a single unit. It’s a reasonable work-around until BMOW gets some storage medium that can be altered independently of the OS boot ROM.

The Road from Here to There

Planning everything out and writing the scheduler code was actually pretty simple. I had the multitasking demo working fine on the BMOW simulator a week ago, but on the real hardware, none of the processes would start. Then I put off looking into it further, and built the Robo-Flower instead (see previous post).

It turned out that there were multiple hardware problems to overcome. After a lot of experimentation, I was able to determine that the processes weren’t starting because their program code wasn’t being correctly copied into their memory banks. A bug in the memory copy routine, maybe? But then why wouldn’t it show up in the simulator? I tried writing a simpler memory copy routine just for this purpose, and it worked, but I was stumped as to why, and the simpler routine wouldn’t work in the general case.

Finally I noticed that my original copy routine was doing an indirect load through a pointer that happened to be stored at an address ending in 7F hex. Reading the two byte pointer required incrementing the address from 7F to 80 in order to read the high byte. On a hunch, I changed the program slightly to store the pointer at 80 instead (incrementing to 81 for the high byte), and it worked!

This looked suspiciously like a very nasty clock glitch problem I ran into a few months ago, that laid waste to BMOW for a while. In that case, the low byte of the program counter would sometimes increment from FF to 01, or from 7F to 81, as if it were being incremented twice. Now I had the address register doing the same thing. I “fixed” the problem with the PC by making a nonsensical change to the GAL equations that implemented it, but I never really resolved the underlying issue. Unfortunately when I made the same change to the address register GAL equations, it didn’t help.
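One observation (an interpretation, not a diagnosis) is that both failing transitions are exactly the ones where every bit of the low byte changes at once: 7F to 80 and FF to 00. A quick check of how many bits flip on each 8-bit increment:

```python
def bits_toggled(value):
    """Number of bits that change when an 8-bit value is incremented (with wraparound)."""
    return bin(value ^ ((value + 1) & 0xFF)).count("1")


worst = [f"{v:02X}" for v in range(256) if bits_toggled(v) == 8]
print(worst)  # ['7F', 'FF']
```

That fits the idea that a noisy clock edge matters most when a change ripples through every bit; moving the pointer to 80 meant the high-byte read only needed 80 to 81, which flips a single bit.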

Back when the PC incrementing problem occurred, I was pretty sure it was due to some noise or signal reflection on the clock wire. I tried a couple of quick signal termination tests, but they didn’t help, and I didn’t follow up at the time. Now I was considering more signal termination experiments, when it occurred to me to wonder why I had my clock signal daisy-chained to so many chips to begin with. The clock signal comes from a buffer with eight outputs, so I could easily run a new clock line to some of the chips, shortening the existing clock line, and hopefully reducing noise and reflections. All it took was a few minutes with the wire wrap tool, and it worked like a champ! I don’t know why I didn’t do that in the first place.

I wasn’t quite out of the woods yet, though. The two processes ran now, but the output was a mess. One in every 10-20 characters was a garbage symbol rather than a letter or number, and the display didn’t word-wrap properly. After an initial heart attack, I quickly realized this was likely the result of a bug in the microcode for BRK (break) that I fixed a while back, where the X register wasn’t correctly preserved across interrupts. Although I’d made a fix in the microcode source a few weeks ago, I’d postponed reprogramming the micro-ROMs, since every time I touch them something bad seems to happen. This time proved to be no exception: I reprogrammed the micro-ROMs, powered up the machine, and it failed to boot at all. After a second heart attack, I tried pushing and prodding the micro-ROMs until BMOW came back to life, which it eventually did with successful results. Obviously there’s a loose connection somewhere on one of the micro-ROM pins, which will undoubtedly come back to haunt me later.

Future Plans

I’m pretty happy with how the multitasking demo worked out, and I can now check off one of the “stretch goals” listed on the About BMOW page. To really finish the multitasking support, though, there’s still a fair bit of work left to be done. For one, there’s no way to stop a process once it’s started! There’s also no way to get info about running processes. A lot of time is wasted in the scheduler and the system procedure interface that could be optimized. There’s no test-and-set atomic operation that can be used as a semaphore. There’s no dedicated stack for interrupt handling: it uses the interrupted process’ stack, which means if that process had an invalid SP, the whole machine will crash.

Really the whole JSP (jump to system procedure) interface for calling OS routines needs to be reworked. JSP is implemented as a jump to an address in bank 0. Supply a bad address, and it will crash. This should really be replaced by some kind of indexed trap system, where JSP takes a trap number as a parameter rather than an address.

At this point, I’m going to archive what I’ve got, and then turn BMOW back into a single-process machine. Without a windowed graphical display or multiple output terminals, there’s not really any compelling use case for multitasking. But running only a single process on a multitasking-capable machine is very inefficient, at least how I’ve implemented it. The scheduler overhead and the indirection of the JSP interface eat up a lot of time that could otherwise be better spent by the single process.

I’ll try to avoid doing anything major that would prevent switching back to multi-tasking in the future, so for now I’ll just disable the scheduler interrupt, and maybe provide some lighter-weight interface to the OS routines. I’ll keep the 64K bank model for the time being, since I don’t yet have any examples that would benefit from more than 64K, and changing it would be non-trivial. In particular I’d need to change the microcode for all the instructions to expect 3-byte addresses instead of 2-byte ones, which would add an extra clock cycle to the execution time of any instruction that manipulates an address. Worse, certain address-manipulation instructions might become impossible for 3-byte addresses unless I added a second temporary register somehow. So I’ll push those problems off into the future, and concentrate on other things.


User Mode Programs

I made a little progress this week on separating what I hope will eventually become the operating system software from everything else. Up until now, all the test programs I’ve written have essentially been the operating system, or part of it: their executable code exists in ROM, they execute in memory bank 0, and they make direct subroutine calls to library routines for handling the keyboard and LCD. I finally cleaned this up a little, and got my first working example of a user program, separate from the OS. It just prints the letter “A” endlessly and doesn’t make for an exciting demo, but it’s a start.

Since there’s no disk or other storage media yet, the user program was actually embedded in the OS boot code, as a piece of data. At boot-up, the OS copies the user program into memory bank 1, which is RAM instead of ROM, and then jumps to it to begin execution. The program then executes inside of bank 1, and has no visibility or access to other banks. In order to access hardware like the keyboard or LCD, it uses a new instruction I added named JSP, or “jump to system procedure”. This instruction takes a two-byte immediate argument, which indicates the subsystem (LCD, keyboard, clock, etc.) and procedure to be invoked. The microcode for this instruction switches to bank 0, and then uses the arguments to look up the address of a system procedure handler routine in a jump table. The handler does the requested work, and an RSP instruction (return from system procedure) returns control to the original program in its original bank.

The idea is that JSP should allow user programs to access hardware resources in a controlled way, without doing a jump directly to some bank 0 handler address, since handlers might change location between versions of the OS. Jumping to the wrong bank 0 address might also cause the OS to write invalid data or crash in the worst case. Of course the trade-off for this extra layer of protection is that access to hardware resources is more cumbersome and slow.
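The two-byte JSP argument (subsystem and procedure) amounts to a two-level jump table lookup. The Python sketch below models that dispatch with a bounds check, so an unknown subsystem or procedure number is rejected instead of landing somewhere arbitrary in bank 0. The subsystem numbering and handler names are hypothetical, and the bounds check is an addition for illustration rather than something the post describes.

```python
class SystemProcedureError(Exception):
    """Raised for a JSP call that names no valid system procedure."""


lcd_output = []


def lcd_print_char(ch):
    lcd_output.append(ch)


def clock_get_ticks():
    return 0   # placeholder: a real handler would read the clock hardware


# JUMP_TABLE[subsystem][procedure] -> handler (hypothetical numbering)
JUMP_TABLE = [
    [lcd_print_char],     # subsystem 0: LCD
    [],                   # subsystem 1: keyboard (no procedures in this sketch)
    [clock_get_ticks],    # subsystem 2: clock
]


def jsp(subsystem, procedure, *args):
    """Look up and run a system procedure, like the JSP microcode's table lookup."""
    if not 0 <= subsystem < len(JUMP_TABLE):
        raise SystemProcedureError(f"bad subsystem {subsystem}")
    procedures = JUMP_TABLE[subsystem]
    if not 0 <= procedure < len(procedures):
        raise SystemProcedureError(f"bad procedure {subsystem}:{procedure}")
    return procedures[procedure](*args)   # returning here plays the role of RSP


jsp(0, 0, "A")
print(lcd_output)  # ['A']
```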

This is all still executing as a single process for now: The OS program starts, sets up the user program, and jumps to it. Once that happens, there is no more OS program running. However, the next logical step is to add an interrupt handler to swap between several user programs. In theory this shouldn’t be difficult: just store a table of saved registers for each process somewhere in bank 0. Whenever a timer interrupt occurs, the handler would copy the current register contents into an entry in the table, fill the registers using another entry in the table, and then return. I think the harder part is actually thinking of two programs that would make sense to run at the same time on BMOW. Multiprocessing makes a lot more sense on a graphical display with windows than on a text screen. With only one LCD, output from two programs would be jumbled together. Maybe I could make one program that uses the keyboard and LCD for I/O, while another runs simultaneously using the USB interface. Or maybe someone here can suggest a more impressive multiprocessing demo?

