SOFTWARE

Timings for the VGA port are given here, but a detailed version can be found on many other sites (I recommend http://tinyvga.com/vga-timing). The dot clock for 800x600 resolution at 60 Hz vertical frequency is exactly 40 MHz, which makes it very convenient for signal generation by a PIC running at 80 MHz, as its instruction cycle frequency is 40 MHz. So, each pixel on the screen corresponds to one instruction cycle of the microcontroller.
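The relation between the dot clock and the instruction rate can be checked with a few lines of arithmetic. This is only a sketch: the totals of 1056 pixels per scan line and 628 scan lines per frame are the usual published figures for this mode and are an assumption here, not values taken from this page.

```python
# Assumed standard figures for 800x600 @ 60 Hz (not stated on this page).
H_TOTAL = 1056              # visible pixels plus horizontal blanking
V_TOTAL = 628               # visible scan lines plus vertical blanking
DOT_CLOCK_HZ = 40_000_000   # from the text

CPU_CLOCK_HZ = 80_000_000   # from the text
CLOCKS_PER_INSTRUCTION = 2  # 80 MHz clock gives a 40 MHz instruction cycle

instruction_rate_hz = CPU_CLOCK_HZ // CLOCKS_PER_INSTRUCTION
pixels_per_instruction = DOT_CLOCK_HZ / instruction_rate_hz

# Instruction cycles available during one complete scan line.
instructions_per_scan_line = H_TOTAL * instruction_rate_hz // DOT_CLOCK_HZ

line_rate_hz = DOT_CLOCK_HZ / H_TOTAL
frame_rate_hz = line_rate_hz / V_TOTAL

print(pixels_per_instruction, instructions_per_scan_line, round(frame_rate_hz, 2))
```

With these assumed totals, one scan line leaves 1056 instruction cycles, of which 800 fall in the visible area.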

It would be easy to generate the video in software in graphic mode, at the full 800x600 resolution with 256 colours, but there is one problem - the amount of internal Video RAM required for that application. You would need about 480 Kbytes, but the PIC with the largest internal RAM is the dsPIC33FJ256xx710, with only 30 Kbytes of RAM. Although it is possible to add external RAM, it would not be usable, as the access would be too slow.

Using text mode decreases the number of possible applications, but it requires only a small amount of video memory. In this project, it needs just 60x25=1500 bytes for character storage, and the same quantity for attribute storage (colour and blinking bits). That makes the software significantly more complicated, as it has not only to read the byte and move it to the port, but also to "pass it through" the character generator, combine the result with the attribute bytes (ink and paper colour), keep track of the character row counter, turn the output on and off for blinking and draw the cursor - all that in real time.
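The memory figures quoted above follow from simple arithmetic. The sketch below takes one byte per pixel for 256 colours and reads "30 Kbytes" as 30 x 1024 bytes, which is an assumption about the rounding.

```python
# Graphic mode: one byte per pixel gives 256 colours.
GRAPHIC_BYTES = 800 * 600 * 1

# Largest internal RAM mentioned in the text, taken as 30 * 1024 bytes (assumed).
LARGEST_PIC_RAM_BYTES = 30 * 1024

# Text mode: one character byte plus one attribute byte per screen cell.
COLUMNS, ROWS = 60, 25
TEXT_CHAR_BYTES = COLUMNS * ROWS
TEXT_ATTR_BYTES = COLUMNS * ROWS
TEXT_TOTAL_BYTES = TEXT_CHAR_BYTES + TEXT_ATTR_BYTES

print(GRAPHIC_BYTES, TEXT_TOTAL_BYTES, LARGEST_PIC_RAM_BYTES)
```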

Of course, all that can not be done in only one instruction, so the dot clock had to be lowered. In this application, one character pixel is actually two screen pixels wide, which gives the final resolution of 400x300 pixels. Still, the requirements for the software are high, and the explanation might be a little bit confusing. If you are not interested in details, skip it and jump to the next page. You do not have to understand the theory of operation to embed this project in your application.

The video signal generation for each text row is arranged in four steps.

1. NEXT COLOUR SETUP. The first step starts even before the row scanning starts, in scan lines 22 and 24 of the previous text row. Here, the colour attributes for each character of the next text row are prepared in the 60-byte part of the line buffer. This task is represented by green areas on the text row drawing.

2. CHARACTER SETUP. When the first scan line starts, that part of the line buffer is completed, and the preparation of pixels for the first two scan lines begins (yellow areas on the text row drawing). This must be completed before the actual pixel generation starts.

3. CURSOR SETUP. This routine adds cursor pixels on the areas on the screen which are defined by cursor1 and cursor2 X and Y positions, with colour defined by cursor1 and cursor2 colours and cursor blinking attributes. This is represented by the gray areas on the drawing.

4. VIDEO OUTPUT GENERATION. This is the main part of the routine, which combines data in line buffers to generate VGA video signal.
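The distribution of the four steps over the scan lines of one text row can be summarized in a small model. It uses the scan line ranges given in the step headings of this page and assumes that scan lines within a text row are numbered from 1 to 24; the function name is hypothetical.

```python
LINES_PER_TEXT_ROW = 24  # 12 character lines, each displayed on two scan lines

def steps_for_scan_line(line):
    """Return the step numbers that run on scan line 1..24 of a text row."""
    if not 1 <= line <= LINES_PER_TEXT_ROW:
        raise ValueError("scan line must be in 1..24")
    steps = []
    if line in (22, 24):                 # step 1: next colour setup
        steps.append(1)
    if line % 2 == 1:                    # step 2: character setup, odd lines
        steps.append(2)
    if line % 2 == 0 and 6 <= line <= 20:  # step 3: cursor setup
        steps.append(3)
    steps.append(4)                      # step 4: video output, every line
    return steps

for n in range(1, LINES_PER_TEXT_ROW + 1):
    print(n, steps_for_scan_line(n))
```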

 

STEP 1   Next colour setup (only the scan lines 22 and 24)

The first step presets the 60-byte wide portion of the line buffer with the colour data, picked from the odd locations of the video memory, which are reserved for attribute storage. Each character in video memory is represented by two bytes: the first one contains the ASCII character, and the second one is the attribute byte for that character. The format of those 2-byte locations is represented on the next drawing.
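As an illustration of the two-byte cell layout, the offsets of the character and attribute bytes for a given screen position could be computed as below. The row-major order starting at offset 0 is an assumption, and the function name is hypothetical.

```python
COLUMNS = 60  # characters per text row, from the text

def cell_offsets(row, col):
    """Return (character_offset, attribute_offset) inside video memory.

    Assumes cells are stored row by row, starting at offset 0, with the
    character byte at the even address and the attribute byte at the odd one.
    """
    index = row * COLUMNS + col
    return 2 * index, 2 * index + 1

print(cell_offsets(0, 0), cell_offsets(1, 0), cell_offsets(24, 59))
```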

As there is not enough time to do this in a single routine, the task is divided in two parts. The first one, contained in the subroutine LINE4, presets the first 30 locations of the line buffer, and LINE5 presets the second half of the line buffer. In addition, there are two 60-byte wide line buffers for temporary attribute storage - one is at LINE_BUFFER+60 (we shall name it Line Buffer 2), and the other one is at LINE_BUFFER+120 (Line Buffer 3). The reason is that the whole line buffer has to be ready when the text row begins, and the program has to preset the next row while the current one is being read; so, the two line buffers are written and read alternately, and the bit FLAG,#14 decides which one shall be read, and which one written.
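The alternating use of the two attribute buffers can be sketched as follows. The offsets and the bit number come from the text; the polarity of the bit (which buffer is read when it is set) and the assumption that it toggles once per text row are illustrative guesses, and the function names are hypothetical.

```python
LINE_BUFFER_2_OFFSET = 60    # LINE_BUFFER+60
LINE_BUFFER_3_OFFSET = 120   # LINE_BUFFER+120
SELECT_BIT = 14              # FLAG,#14

def buffer_roles(flag):
    """Return (read_offset, write_offset) for the current text row.

    Polarity is assumed: bit set means Line Buffer 3 is being displayed.
    """
    if flag & (1 << SELECT_BIT):
        return LINE_BUFFER_3_OFFSET, LINE_BUFFER_2_OFFSET
    return LINE_BUFFER_2_OFFSET, LINE_BUFFER_3_OFFSET

def next_text_row(flag):
    """Swap the roles of the two buffers (assumed to happen once per text row)."""
    return flag ^ (1 << SELECT_BIT)

flag = 0
for _ in range(3):
    print(buffer_roles(flag))
    flag = next_text_row(flag)
```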

This routine does not only move data from the attribute (odd) bytes of the video memory to the line buffer, but also processes blinking for both ink (foreground) and paper (background). RGB ink bits (2,1,0) are simply transferred to the line buffer or reset to 0,0,0, depending on the state of the blink bit (3) and the real-time blinking counter output (FLAG,#11). That is why blink bits from the video memory are represented as 0 in the line buffers - they are already embedded in the RGB bits and they are not needed any more. In the same step, the same operation is done with the upper (paper) nibble of the byte. The whole operation is greatly sped up by the use of the lookup table BLINKTAB.
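The blink processing can be modelled with a 256-entry table. This is not the actual BLINKTAB, whose layout is not given here; it is a sketch that assumes the paper nibble mirrors the ink nibble (RGB in bits 6,5,4 and blink in bit 7) and that a blinking colour is forced to 0,0,0 during the "off" half of the blink period.

```python
INK_RGB, INK_BLINK = 0x07, 0x08
PAPER_RGB, PAPER_BLINK = 0x70, 0x80   # assumed mirror of the ink nibble

def blink_off_entry(attr):
    """Colour byte for the 'off' half of the blink period."""
    out = attr & (INK_RGB | PAPER_RGB)   # blink bits are always stored as 0
    if attr & INK_BLINK:
        out &= ~INK_RGB                  # blinking ink goes dark
    if attr & PAPER_BLINK:
        out &= ~PAPER_RGB                # blinking paper goes dark
    return out & 0xFF

# Hypothetical stand-in for BLINKTAB: one lookup replaces the bit tests.
BLINK_OFF_TABLE = [blink_off_entry(a) for a in range(256)]

def line_buffer_colour(attr, blink_off_phase):
    """Value written to Line Buffer 2 or 3 for one attribute byte."""
    if blink_off_phase:
        return BLINK_OFF_TABLE[attr]
    return attr & (INK_RGB | PAPER_RGB)

print(hex(line_buffer_colour(0x9A, False)), hex(line_buffer_colour(0x9A, True)))
```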

Note that Line Buffer 1 is built at the beginning of each odd scan line (12 times during one text row generation, each time with a new scan line read from the character generator), and Line Buffers 2 and 3 only once for one text row, as their contents remain the same for all scan lines. Actually, not both buffers but only one of them - the one which is not used for video signal generation at that moment.

The first step is executed at the end of the generation of the previous text row. It is represented by the green areas on the next drawing.

 

STEP 2   Character setup (all odd scan lines)

This step reads the character byte from video memory, reads the character generator byte for that character (only for the required row, placed in WREG1H) and puts the resulting black-and-white pixel pattern in Line Buffer 1 (LINE_BUF...LINE_BUF+59). Only two instructions are employed for each byte:

                                                                                  
   mov.b    [w3+0],w1    ; in the next cycles, it will read from w3+2, w3+4...    
   tblrdl.b [w1],[w5++]  ; read character generator and put pixels in line buffer 
                                                                                  
where:

w3 = video memory read pointer
w1 = ASCII byte
w5 = line buffer write pointer

There is no time for looping, so the whole sequence of 60 bytes is realized in a string of 120 instructions. There is one more problem which could slow down the operation - RAW, Read After Write dependency. This problem is solved by using two sets of registers alternatively, to make some kind of pipeline. The final routine looks like:

                                                                                  
   mov.b    [w3+0],w2    ; read byte 1 (fill the queue)
   mov.b    [w3+2],w1    ; read byte 2
   tblrdl.b [w2],[w5++]  ; read character generator and put pixels in line buffer
   mov.b    [w3+4],w2    ; read byte 3
   tblrdl.b [w1],[w5++]  ; read character generator for byte 2
   mov.b    [w3+6],w1    ; read byte 4
   tblrdl.b [w2],[w5++]  ; read character generator for byte 3
   mov.b    [w3+8],w2    ; read byte 5
   tblrdl.b [w1],[w5++]  ; read character generator for byte 4
                         ; ...
                                                                                  

...and so on, until the byte 60 is translated and written into the line buffer.
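The alternating-register sequence above can be modelled in a few lines to show that it produces the same line buffer as a plain byte-by-byte translation. This is only a behavioural sketch: `glyph_row_table` is a hypothetical 256-entry table standing in for the character generator row selected for the current scan line, and the list `regs` stands in for w2 and w1.

```python
def character_setup(video_memory, row_start, glyph_row_table, columns=60):
    """Model of the pipelined character setup for one scan line."""
    line_buffer = []
    regs = [0, 0]                          # stand-ins for w2 and w1
    regs[0] = video_memory[row_start]      # read byte 1 (fill the queue)
    for n in range(1, columns):
        # Read the next character while the previous one is translated,
        # so no instruction uses a register loaded just before it.
        regs[n % 2] = video_memory[row_start + 2 * n]
        line_buffer.append(glyph_row_table[regs[(n - 1) % 2]])
    line_buffer.append(glyph_row_table[regs[(columns - 1) % 2]])  # last byte
    return line_buffer

# Example data: characters at even offsets, attributes (0x07) at odd offsets.
video_memory = bytearray()
for i in range(60):
    video_memory += bytes([32 + i, 0x07])
glyph_row_table = [(c * 7 + 3) & 0x3F for c in range(256)]  # made-up pixel data

line_buffer_1 = character_setup(video_memory, 0, glyph_row_table)
print(len(line_buffer_1), line_buffer_1[:4])
```

The model performs 60 reads and 60 table lookups, which matches the 120 instructions mentioned in the text.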

All this is done at the beginning of subroutine LINE1 (yellow areas on the drawing), before the actual video signal output starts. It should be noted that this step has to be done only at the beginning of odd lines, as the same line buffer contents will be used in the next (even) scan line (remember that one pixel is represented by 2x2 screen pixels).

 

STEP 3   Cursor setup (only the even scan lines 6...20)

This step writes the graphic line for two cursors in both line buffers, one for colour (Line Buffer 2 or Line Buffer 3, prepared in step 1) and the other one for data (Line Buffer 1, which was prepared in step 2). Subroutine LINE2 does this, but as the program calls it only at the beginning of even lines, when one (odd) scan line of character graphics has already been displayed without the cursor, the cursor will be visible only in even lines. This makes the cursors pseudo-transparent, as the block does not cover all lines of the character. This is represented by the gray areas on the drawing (excluding idle cycles).

 

STEP 4   Video output generation (all scan lines)

This is the most important and critical step. Each dot data (RGB ink or RGB paper) is created in two instruction cycles. Those "magic" instructions are:

                                                                        
   and.b   w1,[w2],w3     ; mask the desired bit from the Line Buffer 1 
   and.b   w4,[w3],[w5]   ; if it is =0, translate it to paper colour,  
                          ; ...or if it is =1 to ink colour and write   
                          ; ...it in the output port addressed by w5    
                                                                        

where:

w1 = 0b00100000 for the leftmost pixel in the character
         0b00010000 for the next pixel
         0b00001000 for the next pixel
         0b00000100 for the next pixel
         0b00000010 for the next pixel
         0b00000001 for the rightmost pixel in the character (used only for frame pseudographics)
w2 = Line Buffer 1 (for character data) pointer. This pointer will be post-incremented only after the last (6th) pixel output.
w3 = Byte from Line Buffer 1, used for translation via BITTAB (the high byte is preset to the page in internal RAM at which the table is located).
w4 = Colour (bits 012= Ink, bits 456=Paper), taken from Line Buffer 2 or Line Buffer 3
w5 = Literal address of output latch, used as the output port for video signal out.

Only the rightmost pixel (bit 0 in the character set) is represented as three screen pixels (1.5 character pixels), as it is executed in three instructions. The third, "extra" instruction reads the colour byte from the attribute portion of the line buffer (Line Buffer 2 or Line Buffer 3). That makes the rightmost pixel 50% wider than the other ones, but it does not affect the character shape, as this pixel is used only by the pseudographic frame characters, while all the ASCII characters use it for horizontal spacing. So, the character spacing is actually 150% of one character pixel.

The same problem with execution speed appears here as in step 2. As the program has to be highly optimized for speed, there is no time for looping, so the whole routine must be written as one string, about 800 instructions long. Also, the same problem with RAW is present, so there are two sets of registers used alternately here.
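The length of that instruction string can be estimated from the figures above: five pixels of each character take two instruction cycles and the rightmost one takes three. The short calculation below uses only those numbers.

```python
PIXELS_PER_CHAR = 6       # bits 5...0 of the character generator byte
NORMAL_PIXEL_CYCLES = 2   # two instructions per pixel
LAST_PIXEL_CYCLES = 3     # the rightmost pixel has one extra instruction
COLUMNS = 60

cycles_per_char = (PIXELS_PER_CHAR - 1) * NORMAL_PIXEL_CYCLES + LAST_PIXEL_CYCLES
cycles_per_text_line = COLUMNS * cycles_per_char

# One instruction cycle is one screen pixel; one character pixel is two of them.
char_width_in_char_pixels = cycles_per_char / 2

print(cycles_per_char, cycles_per_text_line, char_width_in_char_pixels)
```

This gives 780 instructions for the 60 characters, which agrees with "about 800" and fits inside the 800 visible screen pixels.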

Literals 0b00100000 to 0b00000001, which are represented by w1 in our example, are actually loaded in six registers, w9...w13. This seems like very "bad economy" of microcontroller resources, but it was the only way to maintain the required speed, as there was no time to load or shift their contents in real time. Anyhow, there are no consequences "visible" from the user's program, as all those registers are saved on the stack at the beginning of the interrupt routine.

In order to correct the uneven delays caused by interrupt latency, the routine aligns its flow with the TMR2 contents at the beginning of each scan line:

                                                              
   neg      TMR2,WREG      ; get -TMR2                        
   add      #hor_align,w0                                     
   repeat   w0                                                
   nop                     ; wait for #hor_align-TMR2 cycles  
                                                              

After this sequence, the program flow is synchronized with TMR2. As the OCx (which is driven by TMR2) generates the horizontal sync, the video signal is now synchronized with the program flow and is therefore jitter-free.
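The effect of this alignment can be illustrated with a simple model: whatever value TMR2 holds when the routine is entered, the wait of hor_align minus TMR2 cycles makes the output start at the same timer value. The sketch assumes that TMR2 counts once per instruction cycle; `fixed_overhead` is a hypothetical constant for the alignment instructions themselves, and the exact iteration count of the repeat instruction is not modelled.

```python
def output_start_time(tmr2_at_entry, hor_align, fixed_overhead=3):
    """Timer value at which pixel output begins, in instruction cycles."""
    wait = hor_align - tmr2_at_entry       # what neg + add compute in w0
    if wait < 0:
        raise ValueError("interrupt latency exceeds hor_align")
    return tmr2_at_entry + fixed_overhead + wait

# Entry times that differ because of interrupt latency...
entries = [10, 11, 13, 16, 20]
starts = [output_start_time(t, hor_align=40) for t in entries]
print(starts)   # ...all lead to the same start time
```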

 

The attached drawing explains the operation. In the first instruction cycle, the mask literal (w1 in our example) is ANDed with the Line Buffer 1 contents, and the result is written to w3 in our example. In the program, registers w6 and w7 are used alternately (because of the RAW dependency problem) and the high byte of both registers is preset to the BITTAB page in RAM. The result is that the output of this lookup table will contain ink bits (2,1,0) set if the currently processed bit (5...0) in Line Buffer 1 is set, and paper bits (6,5,4) set if it is reset. So, in the next AND operation, this output will set output port "ink" pins if the bit was set, and "paper" pins if it was reset.
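The two-instruction pixel generation can be modelled as a mask followed by a table lookup. The table below is a stand-in built from the description above (index 0 selects the paper bits, any non-zero index selects the ink bits); the real BITTAB contents are not listed on this page, so treat it as an assumption.

```python
INK_BITS = 0x07     # bits 2,1,0
PAPER_BITS = 0x70   # bits 6,5,4

# Assumed table contents: a masked value of 0 means the pixel is paper.
BITTAB = [PAPER_BITS] + [INK_BITS] * 63

# Masks for the six pixels, leftmost first.
PIXEL_MASKS = [0b100000, 0b010000, 0b001000, 0b000100, 0b000010, 0b000001]

def character_pixels(pattern, colour):
    """Return the six port values for one character on one scan line."""
    out = []
    for mask in PIXEL_MASKS:
        selected = pattern & mask               # and.b w1,[w2],w3
        out.append(colour & BITTAB[selected])   # and.b w4,[w3],[w5]
    return out

# Pattern with pixels 1 and 3 lit; white ink (0x07) on paper colour 0x40.
print([hex(v) for v in character_pixels(0b101000, 0x47)])
```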