Archive for June, 2008

Notes on Video

The high-level design of the BMOW video system is mostly complete now, but is scattered across eight pages of scrawled notes and the inside of my head. It’s proving to be a complex beast, and there are still many smaller details I need to resolve, but I think I’m finished with the worst of it. Hans Summers’ custom-built Z80 video circuit design validated my original approach, and provided a few key ideas for getting more flexibility from the same amount of hardware.

My design has a screen resolution of 320×240, and devotes 64 bytes of video memory to each scan line. Of these, 40 are used to create the visible portion of the line, and 24 are unused. This wastes a considerable amount of memory, but it simplifies the hardware, and memory is cheap. The simplest case is text mode, in which each of the 40 bytes translates to a character to display, and each character is composed of an 8-pixel wide black-and-white bitmap, so 40*8 = 320 and the screen gets filled with text. Here the resolution is 1-bit-per-pixel, as each of those 320 pixels is one of two colors.
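One likely reason the 64-byte stride simplifies the hardware is that 64 is a power of two, so a video address can be formed by wiring the line count and the column count side by side, with no multiplication. Here is a minimal sketch of that addressing. The names `BYTES_PER_LINE`, `VISIBLE_BYTES`, and `video_address` are made up for illustration and are not part of the design notes.

```python
BYTES_PER_LINE = 64   # stride in video memory (a power of two)
VISIBLE_BYTES = 40    # bytes per line that actually reach the screen
LINES = 240

def video_address(line: int, column_byte: int) -> int:
    """Return the video RAM offset of a byte by concatenating line and column bits."""
    if not (0 <= line < LINES and 0 <= column_byte < VISIBLE_BYTES):
        raise ValueError("outside the visible area")
    # 64 = 2**6, so the low 6 address bits are the column and the rest are the line.
    return (line << 6) | column_byte

total_bytes = LINES * BYTES_PER_LINE                     # 15360 bytes reserved
wasted_bytes = LINES * (BYTES_PER_LINE - VISIBLE_BYTES)  # 5760 bytes never displayed
```

With a 40-byte stride the same lookup would need `line * 40 + column_byte`, which means an adder or multiplier in hardware.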

Things get more interesting in the graphics modes. I’m especially happy that I found a way to support the 1bpp text mode as well as 2bpp, 4bpp, and 8bpp graphics modes all through software changes to the color palette, without any hardware differences for the different bit depths. At the same time, I was also able to eliminate the shift register that most video generators need for shifting out the bits in a byte to create pixel data. Again, the answer is the color palette. Here’s a diagram:

palette diagram

At the left, bytes are clocked in from video RAM or the character generator. For text mode they’re clocked at one byte every 8 pixels, and for graphics modes bytes are clocked at one every 4 pixels. The byte is then passed to the palette RAM as part of its address, along with the lowest three bits of the column count. Four bits of palette select are also provided, so the software can quickly switch between several different palettes. Think of the RAM as a lookup table that determines the color to display as a function of the data byte and the three bits of column count. Using those inputs, it’s possible to create 8bpp, 4bpp, 2bpp, or 1bpp just by storing different color data in the palette. For each increase in bit depth, the pixels get twice as wide, so you can get 320 pixels @ 2bpp, 160 @ 4bpp, or 80 @ 8bpp.

Here’s how it works. For 1bpp operation during text mode, you set the palette so that the three column bits select one of eight “sub-palettes”, corresponding to the eight columns. Each of those sub-palettes then contains the value for white whenever the bit in the corresponding column is set, and black otherwise. No shift register is needed to create the eight output pixels. The palette looks like this (X means “don’t care”):

column[2:0] data byte color
000 XXXXXXX0 black
000 XXXXXXX1 white
001 XXXXXX0X black
001 XXXXXX1X white
010 XXXXX0XX black
010 XXXXX1XX white
011 XXXX0XXX black
011 XXXX1XXX white
100 XXX0XXXX black
100 XXX1XXXX white
101 XX0XXXXX black
101 XX1XXXXX white
110 X0XXXXXX black
110 X1XXXXXX white
111 0XXXXXXX black
111 1XXXXXXX white
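The table above can be generated mechanically. The sketch below fills a palette for 1bpp text mode and shows the eight pixels that come out for one data byte. The address layout (column bits above the data byte) and the color values for black and white are assumptions made for illustration, and the palette select bits are left out.

```python
BLACK = 0x00
WHITE = 0xFF  # assumed: all color bits set means white

def build_1bpp_palette() -> list:
    """Build 8 sub-palettes of 256 entries each, one sub-palette per pixel column."""
    palette = [BLACK] * (8 * 256)
    for column in range(8):
        for data in range(256):
            # Column n looks only at bit n of the data byte, as in the table above.
            if (data >> column) & 1:
                palette[(column << 8) | data] = WHITE
    return palette

def pixels_for_byte(palette: list, data: int) -> list:
    """Colors produced as the column count steps from 0 to 7 over one data byte."""
    return [palette[(column << 8) | data] for column in range(8)]

palette = build_1bpp_palette()
row = pixels_for_byte(palette, 0b00000101)  # bits 0 and 2 set
```

Here `row` comes out white, black, white, then five black pixels, with no shifting of the data byte at display time.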

For 2bpp, you create four sub-palettes, selected by the two lower-order column bits, since a graphics-mode byte spans only four pixel columns. Each of these contains one of four color entries, depending on the pair of data bits corresponding to the column count. 4bpp works the same way to select one of 16 colors. For 8bpp, the column bits don’t matter, and the entire 8-bit byte is used to select one of 256 colors.

If this seems confusing, another way of viewing it is that the horizontal resolution is always 320, and the hardware outputs four pixels at a time. Those pixels can all be different colors, or pairs of identically-colored pixels, or a single 4-pixel block of color.

Since the palette RAM is a normal memory with 8 bits per address, the palette entries are 8 bits wide. They define 3 bits of green, 3 bits of red, and 2 bits of blue color data, which is then latched in an output register, and finally converted to an analog voltage for the VGA monitor. 8-bit palette entries certainly aren’t ideal. There’s more precision for red and green than blue, and the overall precision is poor. In fact for the 8bpp indexed color mode, the indexes are as large as the palette entries themselves, and you could just as easily use them directly as color data, skipping the palette entirely!
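To make the 3-3-2 split concrete, here is a small sketch of packing and unpacking an 8-bit palette entry. The post does not say which bits hold which channel, so the layout used here (green in the top three bits, then red, then blue) is purely an assumption, as are the function names.

```python
def pack_color(green: int, red: int, blue: int) -> int:
    """Pack 3-bit green, 3-bit red, and 2-bit blue into one 8-bit palette entry."""
    if not (0 <= green < 8 and 0 <= red < 8 and 0 <= blue < 4):
        raise ValueError("channel value out of range")
    # Assumed bit layout: GGGRRRBB
    return (green << 5) | (red << 2) | blue

def unpack_color(entry: int) -> tuple:
    """Split an 8-bit palette entry back into (green, red, blue)."""
    return (entry >> 5) & 0x7, (entry >> 2) & 0x7, entry & 0x3

white = pack_color(7, 7, 3)   # every bit set
levels = (8, 8, 4)            # distinct levels per channel: green, red, blue
```

Blue gets only four levels against eight for the other two channels, which is the uneven precision described above.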

I’m hoping that 8-bit palette entries will be good enough for my purposes, but if not, I could add a second palette RAM in parallel with the first, and pass it the same address data. That would effectively create a 16-bit wide memory: enough for 555 or 565 encoding of the RGB color data, which should look pretty good. Unfortunately it would add an additional RAM chip, a second output register, and require doubling the size of the DAC, so it might well be more trouble than it’s worth.

Video Tests

My loftiest of stretch goals for BMOW is to build a primitive video circuit, and connect the machine to a standard VGA monitor. To that end, I recently built a prototype on a breadboard to experiment with video generation. This was a completely stand-alone test circuit, not integrated into BMOW’s hardware in any way, but the results look quite promising! I still think it will be a lengthy, challenging project, but BMOW video output now looks likely.

I started with a standard 15-pin VGA connector from Radio Shack, and soldered the necessary wires to it. I then dug through all the technical docs on the VGA signal format I could find. The first hurdle was to build a circuit to generate the hsync and vsync signals at the required frequencies, and with the correct on and off durations for each cycle. I built two counters, using three GALs: a 7-bit horizontal pixel counter, and a 10-bit vertical line counter. The horizontal counter had a range of 127 pixels, with a 4MHz clock (250ns period), for a total duration of 127 * 250ns = 31.75us per line. That’s very close to the VGA spec of 31.77us per line for 640×480@60Hz. The vertical line counter used the horizontal counter’s rollover as its clock, and had a range of 525 lines. Again, that’s the VGA spec for 640×480. The extra lines between 480 and 525 are part of the overscan and vertical retrace, and aren’t displayed.
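The timing arithmetic above is easy to check, and it also gives the resulting refresh rate. This sketch uses only the numbers quoted in the paragraph. The variable names are invented for illustration.

```python
PIXEL_CLOCK_HZ = 4_000_000
PIXELS_PER_LINE = 127
LINES_PER_FRAME = 525

pixel_period_ns = 1e9 / PIXEL_CLOCK_HZ                    # 250 ns
line_time_us = PIXELS_PER_LINE * pixel_period_ns / 1000   # 31.75 us per line
frame_time_ms = line_time_us * LINES_PER_FRAME / 1000     # about 16.67 ms per frame
refresh_hz = 1000 / frame_time_ms                         # just under 60 Hz
hsync_khz = 1000 / line_time_us                           # about 31.5 kHz
```

A 128-count line would take 32.0 us instead, which is further from the 31.77 us target than 127 counts.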

The sync signals were generated by the GAL counters. Hsync was defined as active during a portion of the 127 horizontal range, and vsync during a portion of the 525 vertical range. Lastly, I defined an “image” signal that was 1 whenever the horizontal counter was 64, and 0 otherwise. Connected to one of the VGA color inputs, that should have created a vertical line down the middle of the screen at pixel 64. Here’s a photo of my test circuit, including a ROM (stolen from BMOW) that I added during a later test:

video test circuit

Unfortunately when I connected this Frankenstein up to a monitor, it did… nothing. It acted as if nothing were connected at all. I tried a second monitor and got the same results. Then I spent a long time rereading all the VGA signal documents, and scratching my head, but I couldn’t figure out what was wrong. I even connected the circuit up to the oscilloscope, and verified that the hsync and vsync signals looked as I expected. But no video = no progress.

One thing I’d been wondering about was the relationship of hsync to vsync. The spec defined the requirements for both, but I never found anything that talked about one relative to the other. Should hsync and vsync both become active simultaneously, at the end of a frame? Or should they be out of phase? On a hunch, I tried altering the circuit slightly so that hsync would be asserted in the middle of vsync, rather than simultaneously with its beginning or end, and… Eureka!

video test 1

Holy crap, video! My circuit was generating honest to God VGA video, at the completely oddball resolution of 127×480. What’s not visible in the photo is the tremendous amount of noise in the image outside that green line, so there were some signal quality issues to contend with, but it was clearly working.

From there I began to experiment. Since I already had a 7-bit horizontal counter, why not try connecting the R, G, and B color inputs to the bits of the counter?

video test 2

Or connect some of them to bits of the horizontal counter, and others to bits of the vertical counter?

video test 3

So far, so good. But to do anything useful, I needed to generate an image from data stored in memory, not just use bits from a counter. Luckily that proved to be pretty easy: I just used the 7-bit horizontal counter and 10-bit vertical counter to form a 17-bit address to a ROM, and then used bits from the ROM’s output data for the RGB channels. By sheer coincidence, the BMOW ROMs are 17-bit (128K), so it worked out perfectly. Here’s what it looked like, using one of the BMOW ROMs as the image source:
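The address formation can be sketched as a simple bit concatenation. Which counter drives the high address lines is not stated, so putting the line count in the upper bits is an assumption here, and `rom_address` is a made-up name.

```python
H_BITS = 7    # horizontal counter width
V_BITS = 10   # vertical counter width

def rom_address(line: int, column: int) -> int:
    """Combine the two counters into one ROM address (assumed: line in the high bits)."""
    if not (0 <= column < (1 << H_BITS) and 0 <= line < (1 << V_BITS)):
        raise ValueError("counter value out of range")
    return (line << H_BITS) | column

rom_size = 1 << (H_BITS + V_BITS)   # 131072 bytes = 128K
last_used = rom_address(524, 126)   # highest address the counters reach
```

Because the counters stop at 126 and 524, only part of the 128K address space is ever read.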

video test 4

Mmm, noise! But memory-mapped noise, which meant I was getting close to something good. Now all I needed was a way to generate a 127×525 image as a contiguous, uncompressed file, so I could program it into the ROM for display. The 256-color BMP format proved to be ideal, as it’s uncompressed, and has a 1078 byte header (a 14-byte file header, a 40-byte info header, and a 1024-byte color table) that can be stripped out using a hex editor, leaving behind only the image pixels themselves, one byte per pixel. So I fired up the Windows Paint program, drew some awesome art using the pencil tool, saved it as BMP and stripped the header, and…

video test 5

Argh, what the heck?! Are BMPs stored backwards? So close, just one more try…

video test 6

Ta-da! Memory-mapped video generation, using a 5-component test circuit.
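The BMP conversion can also be scripted rather than done in a hex editor. The sketch below assumes the “backwards” problem was the BMP format’s bottom-up row order, which is a guess, since the post doesn’t say what the fix was. It reads the pixel data offset from the file header instead of hard-coding the header size, and `bmp_to_rom_image` is a hypothetical helper name.

```python
import struct

def bmp_to_rom_image(bmp: bytes) -> bytes:
    """Strip the header from an 8-bit uncompressed BMP and put the rows top-first."""
    if bmp[:2] != b"BM":
        raise ValueError("not a BMP file")
    pixel_offset, = struct.unpack_from("<I", bmp, 10)    # where pixel data starts
    width, height = struct.unpack_from("<ii", bmp, 18)   # signed 32-bit fields
    bits_per_pixel, = struct.unpack_from("<H", bmp, 28)
    if bits_per_pixel != 8 or height <= 0:
        raise ValueError("expected an 8-bit, bottom-up BMP")
    stride = (width + 3) & ~3   # each row is padded to a multiple of 4 bytes
    rows = [bmp[pixel_offset + i * stride: pixel_offset + (i + 1) * stride]
            for i in range(height)]
    return b"".join(reversed(rows))   # the file stores the bottom row first
```

One convenient side effect of the row padding: a 127-pixel-wide image is stored with 128 bytes per row, which lines up with a 7-bit column address.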

All this looks promising, but there’s still lots more I’d need to do to get a useful integration with BMOW:

  • There’s a LOT of noise in the image. In the last photo you can see a few stray vertical lines, but in reality the image looks much worse. Grounding problems? Signal cross-talk?
  • The horizontal resolution of 127 is too low to be useful. To get even 40 columns of text, with each letter 6 pixels wide, I need a horizontal resolution of 240. But higher resolutions require faster memory and other components, to get the pixel data quickly enough.
  • Speaking of text, I probably need a text mode, with a character ROM that generates font glyph pixel data indirectly from a single ASCII character byte, in addition to the direct mode used in this test. That would require much less work from the CPU when displaying text on the screen.
  • Horizontal and vertical blank signals are probably needed, in addition to hsync and vsync. These would be used to suppress all data on the RGB channels during the portion of each line and frame that shouldn’t be visible. Failure to do this seems to confuse the monitor.
  • The memory-mapped video image needs to be stored in RAM, not ROM, so it can be updated dynamically by the CPU.
  • Some way to arbitrate access to the video RAM between the CPU and the display circuit is needed.
  • A D-to-A converter is needed in order to get more than 1 bit per channel for red, green, and blue (8 colors total).
  • The finished system, including components for all the functions not part of my test circuit, has to squeeze into the space remaining on the BMOW system board. That’s space for maybe 15 components, depending on their size. Ideally, it would still leave some space for audio circuitry.

It’s a tall order, but I’m pretty confident I can do it. I’ll post more updates whenever I have interesting news to report. Meanwhile, any suggestions on how best to approach the remaining design work are very welcome!

Multitasking Success

After a LOT of fiddling, I finally pulled off a successful multitasking demo. It’s not the most exciting thing in the world to watch, but once you understand what’s happening, you’ll appreciate the nerd factor. The photo below shows BMOW powering-up and starting two processes, which are then multitasked to create the appearance of simultaneous execution. First the boot code prints “boot” to prove that the machine is alive, and then a few punctuation marks that serve as status codes while it’s initializing the two processes. Following that is the output from the processes themselves. One prints the letters A-M, and the other prints the digits 0-9.

multitasking demo shot

The two processes have different delay lengths between each printed character, which is why the sequence doesn’t appear strictly as letter-number-letter-number. The differing delays were intended to help expose race conditions: wait long enough, and the two processes should attempt to print to the LCD at the same time, exposing any possible bugs with resource contention.

How it Works

BMOW multitasking required writing a simple process manager and scheduler, and takes advantage of the machine’s semi-protected memory banks. BMOW memory is organized into banks of 64K each, with the OS occupying bank 0, and other processes each occupying a private bank. Code executing outside of bank 0 is unable to directly view or modify memory in other banks. It can only invoke specific system procedures to do such work on its behalf, using the JSP instruction (jump to system procedure).

At power-on time, the boot code copies the letters program into bank 1, and the numbers program into bank 2. It then calls a scheduler routine to add process table entries for the two processes. This routine sets up some initial stack data in the appropriate bank, to mimic what would be on the stack if the process had already been running, but was suspended by an interrupt. It then initializes the process table entry to contain the process’ stack pointer value, and marks that table entry as active. Finally, the boot code sets the real-time clock to generate an interrupt every 2 ms, and halts.

Two milliseconds later, the first interrupt is triggered. The interrupt service routine first pushes the current register contents and program counter value onto the stack. Since each process (each bank) has a separate stack, this means the registers are saved onto the stack of whatever process was interrupted. Then the ISR checks to see if it’s a timer interrupt (it could also be keyboard or USB). If it’s a timer interrupt, it calls a scheduler routine to find a new process to run.

The scheduler stores the value of the stack pointer into the current process’ table entry. Then it scans the process table, beginning at the next entry after the current process, looking for active entries. When it finds one, it copies the value in the process table entry into the stack pointer, and then executes the RTI instruction (return from interrupt). This instruction pops the program counter and register contents off the stack, and resumes execution of the new process where it left off.
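The round-robin logic described above can be modeled in a few lines. This is a simulation sketch only, not BMOW code: the class and method names are invented, the table size is arbitrary, and the handling of the very first interrupt (when no process is running yet) is an assumption.

```python
from dataclasses import dataclass

@dataclass
class ProcessEntry:
    active: bool = False
    saved_sp: int = 0   # stack pointer captured when the process was suspended

class Scheduler:
    def __init__(self, slots: int) -> None:
        self.table = [ProcessEntry() for _ in range(slots)]
        self.current = None   # assumed: no process is running while boot code is halted

    def add_process(self, slot: int, initial_sp: int) -> None:
        """Mark a slot active; initial_sp points at a pre-built fake interrupt frame."""
        self.table[slot] = ProcessEntry(active=True, saved_sp=initial_sp)

    def switch(self, sp_now: int) -> int:
        """Save the interrupted process' SP and return the SP of the next active one."""
        if self.current is not None:
            self.table[self.current].saved_sp = sp_now
        start = -1 if self.current is None else self.current
        n = len(self.table)
        # Scan from the entry after the current one, wrapping around the table.
        for step in range(1, n + 1):
            candidate = (start + step) % n
            if self.table[candidate].active:
                self.current = candidate
                return self.table[candidate].saved_sp
        raise RuntimeError("no active process")
```

The value returned by `switch` stands in for what gets loaded into the stack pointer just before RTI. The register contents never appear in the table because they already sit on each process’ own stack.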

Each process knows nothing about the other processes, and thinks that it’s in complete control of a 64K machine. From the point of view of the process, it can’t even tell that anything special happens when it’s suspended and restored, except that some time has passed that can’t be accounted for. It’s a bit like an alien abduction for a BMOW process.

Observant readers may have wondered: where do the letters and numbers programs come from? On a typical computer, the OS loads and runs programs from disk or some other storage medium, but BMOW has none. For this demo, the programs are compiled separately, and then embedded into the OS itself, which is stored in ROM. The programs appear as data to the OS, and it just blindly copies their bytes into the appropriate memory banks. This works, but obviously isn’t an ideal solution, because it ties the OS and the demo programs together into a single unit. It’s a reasonable work-around until BMOW gets some storage medium that can be altered independently of the OS boot ROM.

The Road from Here to There

Planning everything out and writing the scheduler code was actually pretty simple. I had the multitasking demo working fine on the BMOW simulator a week ago, but on the real hardware, none of the processes would start. Then I procrastinated looking into it further, and built the Robo-Flower instead (see previous post).

It turned out that there were multiple hardware problems to overcome. After a lot of experimentation, I was able to determine that the processes weren’t starting because their program code wasn’t correctly copied into their memory banks. A bug in the memory copy routine, maybe? But then why wouldn’t it show up in the simulator? I tried writing a simpler memory copy routine just for this purpose, and it worked, but I was stumped as to why, and the simpler routine wouldn’t work in the general case.

Finally I noticed that my original copy routine was doing an indirect load through a pointer that happened to be stored at an address ending in 7F hex. Reading the two byte pointer required incrementing the address from 7F to 80 in order to read the high byte. On a hunch, I changed the program slightly to store the pointer at 80 instead (incrementing to 81 for the high byte), and it worked!

This looked suspiciously like a very nasty clock glitch problem I ran into a few months ago, that laid waste to BMOW for a while. In that case, the low byte of the program counter would sometimes increment from FF to 01, or from 7F to 81, as if it were being incremented twice. Now I had the address register doing the same thing. I “fixed” the problem with the PC by making a nonsensical change to the GAL equations that implemented it, but I never really resolved the underlying issue. Unfortunately when I made the same change to the address register GAL equations, it didn’t help.
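To see why a double increment breaks a pointer stored at 7F, here is a small model of the symptom. It only imitates the “incremented twice” behavior described above and says nothing about the electrical cause. The memory contents and function names are invented for the example.

```python
def increment_low_byte(value: int, glitch: bool = False) -> int:
    """Increment an 8-bit register; a glitched clock edge counts twice."""
    steps = 2 if glitch else 1
    return (value + steps) & 0xFF

def read_pointer(memory: dict, address: int, glitch: bool = False) -> int:
    """Read a two-byte pointer, low byte first, within a single 256-byte page."""
    low = memory[address]
    high = memory[increment_low_byte(address, glitch)]
    return (high << 8) | low

normal = [increment_low_byte(v) for v in (0xFF, 0x7F)]                 # 0x00, 0x80
glitched = [increment_low_byte(v, glitch=True) for v in (0xFF, 0x7F)]  # 0x01, 0x81

memory = {0x7F: 0x34, 0x80: 0x12, 0x81: 0xAB}   # pointer 0x1234 stored at 7F/80
good = read_pointer(memory, 0x7F)               # 0x1234
bad = read_pointer(memory, 0x7F, glitch=True)   # high byte fetched from 81 instead
```

In the glitched case the low byte is still correct, so the copy routine ends up with a plausible-looking but wrong address.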

Back when the PC incrementing problem occurred, I was pretty sure it was due to some noise or signal reflection on the clock wire. I tried a couple of quick signal termination tests, but they didn’t help, and I didn’t follow up at the time. Now I was considering more signal termination experiments again, when it occurred to me to wonder why I had my clock signal daisy-chained to so many chips to begin with. The clock signal comes from a buffer with eight outputs, so I could easily run a new clock line to some of the chips, shortening the existing clock line, and hopefully reducing noise and reflections. All it took were a few minutes with the wire wrap tool, and it worked like a champ! I don’t know why I didn’t do that in the first place.

I wasn’t quite out of the woods yet, though. The two processes ran now, but the output was a mess. Every 10-20 characters was some garbage symbol rather than a letter or number, and the display didn’t word-wrap properly. After an initial heart attack, I quickly realized this was likely the result of a bug in the microcode for BRK (break) that I fixed a while back, where the X register wasn’t correctly preserved across interrupts. Although I’d made a fix in the microcode source a few weeks ago, I’d postponed reprogramming the micro-ROMs, since every time I touch them something bad seems to happen. This time proved to be no exception: I reprogrammed the micro-ROMs, powered up the machine, and it failed to boot at all. After a second heart attack, I tried pushing and prodding the micro-ROMs until BMOW came back to life, which it eventually did with successful results. Obviously there’s a loose connection somewhere on one of the micro-ROM pins, which will undoubtedly come back to haunt me later.

Future Plans

I’m pretty happy with how the multitasking demo worked out, and I can now check off one of the “stretch goals” listed on the About BMOW page. To really finish the multitasking support, though, there’s still a fair bit of work left to be done. For one, there’s no way to stop a process once it’s started! There’s also no way to get info about running processes. A lot of time is wasted in the scheduler and the system procedure interface that could be optimized. There’s no test-and-set atomic operation that can be used as a semaphore. There’s no dedicated stack for interrupt handling: it uses the interrupted process’ stack, which means if that process had an invalid SP, the whole machine will crash.

Really the whole JSP (jump to system procedure) interface for calling OS routines needs to be reworked. JSP is implemented as a jump to an address in bank 0. Supply a bad address, and it will crash. This should really be replaced by some kind of indexed trap system, where JSP takes a trap number as a parameter rather than an address.

At this point, I’m going to archive what I’ve got, and then turn BMOW back into a single-process machine. Without a windowed graphical display or multiple output terminals, there’s not really any compelling use case for multitasking. But running only a single process on a multitasking-capable machine is very inefficient, at least how I’ve implemented it. The scheduler overhead and the indirection of the JSP interface eat up a lot of time that could otherwise be better-spent by the single process.

I’ll try to avoid doing anything major that would prevent switching back to multi-tasking in the future, so for now I’ll just disable the scheduler interrupt, and maybe provide some lighter-weight interface to the OS routines. I’ll keep the 64K bank model for the time being, since I don’t yet have any examples that would benefit from more than 64K, and changing it would be non-trivial. In particular I’d need to change the microcode for all the instructions to expect 3-byte addresses instead of 2-byte ones, which would add an extra clock cycle to the execution time of any instruction that manipulates an address. Worse, certain address-manipulation instructions might become impossible for 3-byte addresses unless I added a second temporary register somehow. So I’ll push those problems off into the future, and concentrate on other things.

Robo-Flower

I recently took a break from BMOW’s multitasking support to work on something completely different. (Actually, I have a multitasking demo working in the simulator, but it doesn’t work on the real hardware yet.) The “something different” was a robo-flower: a pulsing light powered from solar energy that’s made to look like a flower from Junkyard Wars. The circuit was based on this Make Magazine article, but the construction design is my own. Here’s a photo of the result:

Robo-Flower

During the day, the solar cell charges the main storage capacitor. At night, the storage capacitor powers the circuit, making the LED in the flower’s center pulse organically every few seconds. I followed the circuit design from the Make article, with these exceptions:

  • Replaced the 1000uF “pumm” capacitor (C3) with a 3300uF one. This was one of the suggested alterations.
  • Replaced the rechargeable battery with a 1F “super capacitor”. A 10F capacitor was a suggested alteration, but I couldn’t find one.
  • Where their diagram shows resistor R3 connected to the positive power supply, I instead connected it through a diode to the negative end of the LED. This makes the LED turn off all the way at the end of each pumm, which I think looks better.

The robo-flower works, but I’m pretty disappointed with how it turned out. For one thing, the pulses of light are surprisingly dim, compared to what I’d expected. They reach a brightness I’d call “on”, but not bright. If you look at the circuit diagram, you’ll see that the pulses come from the 3300uF capacitor (charged to about 3V) discharging through an LED. With no current-limiting resistor, there should be a brief pulse of high current = high brightness, but that’s not what I’m seeing.

The bigger disappointment is that even after leaving the flower in the sun until the main 1F storage capacitor is fully charged (as measured with a voltmeter), the pulsing only lasts for about 15 minutes after the sun’s illumination is lost. That’s pretty lame. Other designs that use rechargeable batteries pulse nearly all night, I think, so I was expecting much more. Yes, it’s a capacitor powering the circuit and not a battery, but 1F is a mighty big capacitance.

If I get motivated enough, I may build another copy of the circuit on a protoboard, so I can experiment with it further, or measure the voltage at various points on the oscilloscope, to see if I can’t improve the behavior. Otherwise it’ll be one more piece of odd electronics junk littering my home. 🙂

User Mode Programs

I made a little progress this week on separating what I hope will eventually become the operating system software from everything else. Up until now, all the test programs I’ve written have essentially been the operating system, or part of it: their executable code exists in ROM, they execute in memory bank 0, and they make direct subroutine calls to library routines for handling the keyboard and LCD. I finally cleaned this up a little, and got my first working example of a user program, separate from the OS. It just prints the letter “A” endlessly and doesn’t make for an exciting demo, but it’s a start.

Since there’s no disk or other storage media yet, the user program was actually embedded in the OS boot code, as a piece of data. At boot-up, the OS copies the user program into memory bank 1, which is RAM instead of ROM, and then jumps to it to begin execution. The program then executes inside of bank 1, and has no visibility or access to other banks. In order to access hardware like the keyboard or LCD, it uses a new instruction I added named JSP, or “jump to system procedure”. This instruction takes a two byte immediate argument, which indicates the subsystem (LCD, keyboard, clock, etc.) and procedure to be invoked. The microcode for this instruction switches to bank 0, and then uses the arguments to index into a jump table to find the address of a system procedure handler routine to invoke. The handler does the requested work, and an RSP instruction (return from system procedure) returns control to the original program in its original bank.
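The jump table lookup can be pictured with a short model. Everything specific here is hypothetical: the subsystem and procedure numbers, the handler names, and the error handling are invented for illustration, and the real dispatch happens in microcode rather than in software like this.

```python
class SystemCallError(Exception):
    """Raised when no handler exists for the requested subsystem/procedure pair."""

def lcd_put_char(ch: str) -> str:
    return f"LCD <- {ch}"

def clock_read() -> str:
    return "clock read"

# Hypothetical numbering: the first byte picks the subsystem, the second the procedure.
JUMP_TABLE = {
    0: {0: lcd_put_char},   # subsystem 0: LCD
    2: {0: clock_read},     # subsystem 2: clock
}

def jsp(subsystem: int, procedure: int, *args):
    """Look up a handler by (subsystem, procedure) and call it, as bank 0 code would."""
    try:
        handler = JUMP_TABLE[subsystem][procedure]
    except KeyError:
        raise SystemCallError(f"no handler for {subsystem}/{procedure}") from None
    return handler(*args)   # returning from here plays the role of RSP

result = jsp(0, 0, "A")
```

Because user code only ever names table indexes, handlers can move around inside bank 0 between OS versions without breaking existing programs.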

The idea is that JSP should allow user programs to access hardware resources in a controlled way, without doing a jump directly to some bank 0 handler address, since handlers might change location between versions of the OS. Jumping to the wrong bank 0 address might also cause the OS to write invalid data or crash in the worst case. Of course the trade-off for this extra layer of protection is that access to hardware resources is more cumbersome and slow.

This is all still executing as a single process for now: The OS program starts, sets up the user program, and jumps to it. Once that happens, there is no more OS program running. However, the next logical step is to add an interrupt handler to swap between several user programs. In theory this shouldn’t be difficult: just store a table of saved registers for each process somewhere in bank 0. Whenever a timer interrupt occurs, the handler would copy the current register contents into an entry in the table, fill the registers using another entry in the table, and then return. I think the harder part is actually thinking of two programs that would make sense to run at the same time on BMOW. Multiprocessing makes a lot more sense on a graphical display with windows than on a text screen. With only one LCD, output from two programs would be jumbled together. Maybe I could make one program that uses the keyboard and LCD for I/O, while another runs simultaneously using the USB interface. Or maybe someone here can suggest a more impressive multiprocessing demo?
