FPGA VGA Graphics in Verilog Part 3

Introduction

Welcome back to my FPGA graphics tutorial series using the Digilent Arty or Basys3 boards. In part 2 we introduced bitmapped displays and loaded graphics files into memory. In this third part, we animate sprites using bitmaps and double buffering: this classic technique is a staple of 2D games. By the end of this tutorial, you'll be able to control a sprite using the switches on your FPGA board.

Find the code and resources for this and other FPGA tutorials at github.com/WillGreen/timetoexplore.

Feedback to @WillFlux is most welcome.

Requirements

The requirements for this part are the same as the previous parts:

  1. Digilent Arty or Basys3 board
  2. VGA Pmod if using the Arty (Basys3 has VGA built-in)
  3. VGA capable monitor & cable
  4. Micro USB cable to program and power the board
  5. Xilinx Vivado installed
  6. Arty or Basys3 board file installed, so Vivado knows your board specification

Other FPGA Boards

This tutorial requires at least 964 Kbits of on-board FPGA memory (block or distributed RAM): two 450 Kbit frame buffers plus 64 Kbits for the sprites. Provided you can meet the RAM requirement, it should be relatively easy to adapt this tutorial to other boards by making changes to the top.v module:

  1. Hardware I/O: update hardware ports, such as CLK and VGA_R, to match your board.
  2. Clock: if your board clock isn't 100 MHz, you need to update the pixel clock code.
  3. VGA Outputs: if your VGA output isn't 4-bits per colour, adjust VGA assign statements.
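As a rough illustration of the clock change in step 2: the pixel strobe used later in top.v adds a fixed increment to a 16-bit counter and takes the carry as the strobe, so the increment works out as 2^16 x (pixel clock / board clock). The sketch below wraps that idea in a small module. The module name pix_strobe and the INCREMENT parameter are my own inventions for this example, not part of the tutorial code, and the 50 MHz figure is just an assumed alternative board clock.

```verilog
// Sketch only: pix_strobe and INCREMENT are illustrative names.
// INCREMENT = 2^16 * (pixel clock / board clock)
//   100 MHz board clock: 2^16 / 4 = 16'h4000
//    50 MHz board clock: 2^16 / 2 = 16'h8000
module pix_strobe #(parameter [15:0] INCREMENT = 16'h4000) (
    input wire i_clk,        // board clock
    output reg o_pix_stb     // high for one tick per 25 MHz pixel
    );

    reg [15:0] cnt = 0;

    // the carry out of the 16-bit addition becomes the strobe
    always @(posedge i_clk)
        {o_pix_stb, cnt} <= cnt + INCREMENT;
endmodule
```

On a board with a 50 MHz clock you would instantiate this with .INCREMENT(16'h8000), or simply change the constant in the existing top.v line.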

Double Buffering

The principle of double buffering is simple: you draw on one buffer while driving the monitor from the other. You have an entire frame (1/60th second) to create your image while avoiding any screen tearing. There are two downsides: you need twice the memory, and you increase latency by one frame.
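To make the swap concrete, here is a minimal simulation-only sketch of the buffer selection on its own. It borrows the we_a and we_b names from the top.v design later in this post; frame_end is a hypothetical stand-in for the end-of-frame pulse, and the module name is made up for this example.

```verilog
// Sketch only: simulates three frame boundaries and reports the buffer roles.
module buffer_swap_demo;
    reg clk = 0;
    always #5 clk = ~clk;

    reg frame_end = 0;       // hypothetical one-tick end-of-frame pulse
    reg we_a = 0, we_b = 1;  // exactly one buffer is written at a time

    // swap the roles of the two buffers once per frame
    always @(posedge clk)
        if (frame_end)
        begin
            we_a <= ~we_a;
            we_b <= ~we_b;
        end

    initial begin
        repeat (3)
        begin
            @(negedge clk) frame_end = 1;
            @(negedge clk) frame_end = 0;
            $display("draw to %s, display from %s",
                     we_a ? "A" : "B", we_a ? "B" : "A");
        end
        $finish;
    end
endmodule
```

Each frame boundary flips both write enables, so the buffer that was just drawn becomes the one driving the monitor.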

As well as the memory for our two buffers, we're going to need some memory to store sprites. We have therefore halved the horizontal and vertical resolution from part 2's 640x360 to 320x180. A 320 x 180 x 8-bit buffer requires 450 Kbits of RAM, so two buffers need 900 Kbits; with 1,800 Kbits of block RAM on our FPGA, this provides plenty of wiggle room.
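If you want to check the memory budget for a different resolution or sprite count, the arithmetic can be sketched as a few localparams. This is a simulation-only helper of my own (the module and parameter names are not part of the tutorial design), using 1 Kbit = 1,024 bits.

```verilog
// Sketch only: prints the memory needed by two frame buffers plus sprites.
module mem_budget;
    localparam SCREEN_WIDTH  = 320;
    localparam SCREEN_HEIGHT = 180;
    localparam BPP           = 8;   // colour bits per pixel
    localparam SPRITE_SIZE   = 32;  // sprites are 32 x 32 pixels
    localparam SPRITE_COUNT  = 8;

    localparam FB_BITS     = SCREEN_WIDTH * SCREEN_HEIGHT * BPP;
    localparam SPRITE_BITS = SPRITE_SIZE * SPRITE_SIZE * SPRITE_COUNT * BPP;
    localparam TOTAL_BITS  = 2 * FB_BITS + SPRITE_BITS;

    initial begin
        $display("one frame buffer: %0d Kbits", FB_BITS / 1024);      // 450
        $display("sprite buffer:    %0d Kbits", SPRITE_BITS / 1024);  // 64
        $display("total:            %0d Kbits", TOTAL_BITS / 1024);   // 964
    end
endmodule
```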

By halving the resolution, we make the driver implementation easy: dividing by two can be efficiently implemented using a right shift. In Verilog a right shift is implemented with the >> operator, just like in C, Go, or Python.
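A tiny simulation-only sketch shows the effect of the shift: each pair of neighbouring physical pixels maps to the same low-resolution coordinate. The names h_pixel and halve_demo are made up for this example.

```verilog
// Sketch only: maps positions on a 640-pixel line to a 320-pixel line.
module halve_demo;
    reg  [9:0] h_pixel;           // position within the 640-pixel line
    wire [9:0] x = h_pixel >> 1;  // position within the 320-pixel line

    initial begin
        h_pixel = 0;   #1 $display("%0d -> %0d", h_pixel, x);  // 0 -> 0
        h_pixel = 1;   #1 $display("%0d -> %0d", h_pixel, x);  // 1 -> 0
        h_pixel = 638; #1 $display("%0d -> %0d", h_pixel, x);  // 638 -> 319
        h_pixel = 639; #1 $display("%0d -> %0d", h_pixel, x);  // 639 -> 319
    end
endmodule
```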

VGA 320x180

Create a new RTL project in Vivado called vga03 with the Arty or Basys 3 board as the target. If you need advice on project creation see part 1 of my introductory tutorial series.

We're going to use a modified version of our familiar VGA driver module from part 2. It halves the horizontal and vertical resolutions while maintaining standard 640x480 VGA timings. There is just one change from the 640x360 version in part 2:

  • We right shift o_x and o_y to halve the effective resolution

Create a new module called vga320x180.v with the following content [view on github]:

module vga320x180(
    input wire i_clk,           // base clock
    input wire i_pix_stb,       // pixel clock strobe
    input wire i_rst,           // reset: restarts frame
    output wire o_hs,           // horizontal sync
    output wire o_vs,           // vertical sync
    output wire o_blanking,     // high during blanking interval
    output wire o_active,       // high during active pixel drawing
    output wire o_screenend,    // high for one tick at the end of screen
    output wire o_animate,      // high for one tick at end of active drawing
    output wire [9:0] o_x,      // current pixel x position
    output wire [8:0] o_y       // current pixel y position
    );

    // VGA timings https://timetoexplore.net/blog/video-timings-vga-720p-1080p
    localparam HS_STA = 16;              // horizontal sync start
    localparam HS_END = 16 + 96;         // horizontal sync end
    localparam HA_STA = 16 + 96 + 48;    // horizontal active pixel start
    localparam VS_STA = 480 + 11;        // vertical sync start
    localparam VS_END = 480 + 11 + 2;    // vertical sync end
    localparam VA_STA = 60;              // vertical active pixel start
    localparam VA_END = 420;             // vertical active pixel end
    localparam LINE   = 800;             // complete line (pixels)
    localparam SCREEN = 524;             // complete screen (lines)

    reg [9:0] h_count;      // line position
    reg [9:0] v_count;      // screen position

    // generate sync signals (active low for 640x480)
    assign o_hs = ~((h_count >= HS_STA) & (h_count < HS_END));
    assign o_vs = ~((v_count >= VS_STA) & (v_count < VS_END));

    // keep x and y bound within the active pixels
    assign o_x = ((h_count < HA_STA) ? 0 : (h_count - HA_STA)) >> 1;
    assign o_y = ((v_count >= VA_END) ? 
                    (VA_END - VA_STA - 1) : (v_count - VA_STA)) >> 1;

    // blanking: high within the blanking period
    assign o_blanking = ((h_count < HA_STA) | (v_count > VA_END - 1));

    // active: high during active pixel drawing
    assign o_active = ~((h_count < HA_STA) | 
                        (v_count > VA_END - 1) | 
                        (v_count < VA_STA));

    // screenend: high for one tick at the end of the screen
    assign o_screenend = ((v_count == SCREEN - 1) & (h_count == LINE));

    // animate: high for one tick at the end of the final active pixel line
    assign o_animate = ((v_count == VA_END - 1) & (h_count == LINE));

    always @ (posedge i_clk)
    begin
        if (i_rst)  // reset to start of frame
        begin
            h_count <= 0;
            v_count <= 0;
        end
        if (i_pix_stb)  // once per pixel
        begin
            if (h_count == LINE)  // end of line
            begin
                h_count <= 0;
                v_count <= v_count + 1;
            end
            else 
                h_count <= h_count + 1;

            if (v_count == SCREEN)  // end of screen
                v_count <= 0;
        end
    end
endmodule

Learn more about video display timings.

Remember Me?

We use exactly the same sram module as in the previous part.

Add a design source called sram.v [view on github]:

module sram #(parameter ADDR_WIDTH=8, DATA_WIDTH=8, DEPTH=256, MEMFILE="") (
    input wire i_clk,
    input wire [ADDR_WIDTH-1:0] i_addr, 
    input wire i_write,
    input wire [DATA_WIDTH-1:0] i_data,
    output reg [DATA_WIDTH-1:0] o_data 
    );

    reg [DATA_WIDTH-1:0] memory_array [0:DEPTH-1]; 

    initial begin
        if (MEMFILE != "")  // skip when no init file is given (frame buffers)
        begin
            $display("Loading memory init file '%s' into array.", MEMFILE);
            $readmemh(MEMFILE, memory_array);
        end
    end

    always @ (posedge i_clk)
    begin
        if(i_write) begin
            memory_array[i_addr] <= i_data;
        end
        else begin
            o_data <= memory_array[i_addr];
        end     
    end
endmodule

Sprite Sheet

We have eight sprites in our design. Each sprite is 32 x 32 pixels. We store these in a single 8-bit PNG of 32 x 256 pixels. You can see our sprites below (rotated to the horizontal to fit better on this page).

I've used the same FPGATools script as in part 2 to convert a PNG sprite sheet into Verilog memory initialization format.

  1. Copy sprites.mem and sprites_palette.mem from github
  2. In Vivado select "Add Sources" then "Files of type: Memory Initialization Files"
  3. Locate sprites.mem and sprites_palette.mem and select "OK" then "Finish"

If you'd rather create your own sprites, you can. Save eight 32x32 pixel sprites on one 32x256 image then use the img2fmem.py script from FPGATools to generate the memory initialization files.

With the sprites loaded into the project, we're ready to get drawing!

The Emptiness of Space

We're going to take a simplistic approach to sprites: draw every pixel in the frame every time. We start with the background, then add our ship afterwards. This approach is wasteful in that we end up redrawing pixels that haven't changed, but it avoids the need to track what needs redrawing.

To create our backdrop, we tile the background sprite to fill the screen. While we're drawing sprites in one buffer, we'll be outputting the other buffer to the VGA monitor.
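Tiling falls out of the address arithmetic: because the sprites are 32 pixels square, the bottom five bits of the screen position give the position within the tile. Here is a simulation-only sketch of that calculation, reusing the names from the top.v design that follows. The demo module itself is my own and is not part of the design.

```verilog
// Sketch only: sprite buffer address for a tiled 32 x 32 background.
module tile_addr_demo;
    localparam SPRITE_SIZE      = 32;
    localparam SPRITE_BG_INDEX  = 7;
    localparam SPRITE_BG_OFFSET = SPRITE_BG_INDEX * SPRITE_SIZE * SPRITE_SIZE;

    reg [9:0] draw_x;
    reg [8:0] draw_y;

    // [4:0] keeps the position modulo 32, so the sprite repeats every 32 pixels
    wire [12:0] address_s = SPRITE_BG_OFFSET
                          + (SPRITE_SIZE * draw_y[4:0]) + draw_x[4:0];

    initial begin
        draw_x = 0;   draw_y = 0;   #1 $display("%0d", address_s);  // 7168
        draw_x = 33;  draw_y = 2;   #1 $display("%0d", address_s);  // 7233
        draw_x = 319; draw_y = 179; #1 $display("%0d", address_s);  // 7807
    end
endmodule
```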

Create a design source called top.v with the following design [view on github]:

module top(
    input wire CLK,             // board clock: 100 MHz on Arty & Basys 3
    input wire RST_BTN,         // reset button
    input wire [3:0] sw,        // four switches
    output wire VGA_HS_O,       // horizontal sync output
    output wire VGA_VS_O,       // vertical sync output
    output reg [3:0] VGA_R,     // 4-bit VGA red output
    output reg [3:0] VGA_G,     // 4-bit VGA green output
    output reg [3:0] VGA_B      // 4-bit VGA blue output
    );

    wire rst = ~RST_BTN;  // reset is active low on Arty

    // generate a 25 MHz pixel strobe
    reg [15:0] cnt;
    reg pix_stb;
    always @(posedge CLK)
        {pix_stb, cnt} <= cnt + 16'h4000;  // divide by 4: (2^16)/4 = 0x4000

    wire [9:0] x;       // current pixel x position: 10-bit value: 0-1023
    wire [8:0] y;       // current pixel y position:  9-bit value: 0-511
    wire blanking;      // high within the blanking period
    wire active;        // high during active pixel drawing
    wire screenend;     // high for one tick at the end of screen
    wire animate;       // high for one tick at end of active drawing

    vga320x180 display (
        .i_clk(CLK), 
        .i_pix_stb(pix_stb),
        .i_rst(rst),
        .o_hs(VGA_HS_O), 
        .o_vs(VGA_VS_O), 
        .o_x(x), 
        .o_y(y),
        .o_blanking(blanking),
        .o_active(active),
        .o_screenend(screenend),
        .o_animate(animate)
    );

    // VRAM frame buffers (read-write)
    localparam SCREEN_WIDTH = 320;
    localparam SCREEN_HEIGHT = 180;
    localparam VRAM_DEPTH = SCREEN_WIDTH * SCREEN_HEIGHT; 
    localparam VRAM_A_WIDTH = 16;  // 2^16 > 320 x 180
    localparam VRAM_D_WIDTH = 8;   // colour bits per pixel

    reg [VRAM_A_WIDTH-1:0] address_a, address_b;
    reg [VRAM_D_WIDTH-1:0] datain_a, datain_b;
    wire [VRAM_D_WIDTH-1:0] dataout_a, dataout_b;
    reg we_a = 0, we_b = 1;  // write enable bit

    // frame buffer A VRAM
    sram #(
        .ADDR_WIDTH(VRAM_A_WIDTH), 
        .DATA_WIDTH(VRAM_D_WIDTH), 
        .DEPTH(VRAM_DEPTH), 
        .MEMFILE("")) 
        vram_a (
        .i_addr(address_a), 
        .i_clk(CLK), 
        .i_write(we_a),
        .i_data(datain_a), 
        .o_data(dataout_a)
    );

    // frame buffer B VRAM
    sram #(
        .ADDR_WIDTH(VRAM_A_WIDTH), 
        .DATA_WIDTH(VRAM_D_WIDTH), 
        .DEPTH(VRAM_DEPTH), 
        .MEMFILE("")) 
        vram_b (
        .i_addr(address_b), 
        .i_clk(CLK), 
        .i_write(we_b),
        .i_data(datain_b), 
        .o_data(dataout_b)
    );

    // sprite buffer (read-only)
    localparam SPRITE_SIZE = 32;  // dimensions of square sprites in pixels
    localparam SPRITE_COUNT = 8;  // number of sprites in buffer
    localparam SPRITEBUF_D_WIDTH = 8;  // colour bits per pixel
    localparam SPRITEBUF_DEPTH = SPRITE_SIZE * SPRITE_SIZE * SPRITE_COUNT;    
    localparam SPRITEBUF_A_WIDTH = 13;  // 2^13 == 8,192 == 32 x 256 

    reg [SPRITEBUF_A_WIDTH-1:0] address_s;
    wire [SPRITEBUF_D_WIDTH-1:0] dataout_s;

    // sprite buffer memory
    sram #(
        .ADDR_WIDTH(SPRITEBUF_A_WIDTH), 
        .DATA_WIDTH(SPRITEBUF_D_WIDTH), 
        .DEPTH(SPRITEBUF_DEPTH), 
        .MEMFILE("sprites.mem"))
        spritebuf (
        .i_addr(address_s), 
        .i_clk(CLK), 
        .i_write(0),  // read only
        .i_data(0), 
        .o_data(dataout_s)
    );

    reg [11:0] palette [0:255];  // 256 x 12-bit colour palette entries
    reg [11:0] colour;
    initial begin
        $display("Loading palette.");
        $readmemh("sprites_palette.mem", palette);
    end

    // Sprites to load and position of player sprite in frame
    localparam SPRITE_BG_INDEX = 7;  // background sprite
    localparam SPRITE_PL_INDEX = 0;  // player sprite
    localparam SPRITE_BG_OFFSET = SPRITE_BG_INDEX * SPRITE_SIZE * SPRITE_SIZE;
    localparam SPRITE_PL_OFFSET = SPRITE_PL_INDEX * SPRITE_SIZE * SPRITE_SIZE;
    localparam SPRITE_PL_X = (SCREEN_WIDTH - SPRITE_SIZE) >> 1; // centre
    localparam SPRITE_PL_Y = SCREEN_HEIGHT - SPRITE_SIZE;     // bottom

    reg [9:0] draw_x;
    reg [8:0] draw_y;
    reg [9:0] pl_x = SPRITE_PL_X; 
    reg [9:0] pl_y = SPRITE_PL_Y; 
    reg [9:0] pl_pix_x; 
    reg [8:0] pl_pix_y;

    // pipeline registers for address calculation
    reg [VRAM_A_WIDTH-1:0] address_fb1;  
    reg [VRAM_A_WIDTH-1:0] address_fb2;

    always @ (posedge CLK)
    begin
        // reset drawing
        if (rst)
        begin
            draw_x <= 0;
            draw_y <= 0;
            pl_x <= SPRITE_PL_X; 
            pl_y <= SPRITE_PL_Y; 
            pl_pix_x <= 0; 
            pl_pix_y <= 0;
        end

        // draw background
        if (address_fb1 < VRAM_DEPTH)
        begin
            if (draw_x < SCREEN_WIDTH - 1)
                draw_x <= draw_x + 1;
            else
            begin
                draw_x <= 0;
                draw_y <= draw_y + 1;
            end

            // calculate address of sprite and frame buffer (with pipeline)
            address_s <= SPRITE_BG_OFFSET + 
                        (SPRITE_SIZE * draw_y[4:0]) + draw_x[4:0];
            address_fb1 <= (SCREEN_WIDTH * draw_y) + draw_x;
            address_fb2 <= address_fb1;

            if (we_a)
            begin
                address_a <= address_fb2;
                datain_a <= dataout_s;
            end
            else
            begin
                address_b <= address_fb2;
                datain_b <= dataout_s;
            end
        end

        if (pix_stb)  // once per pixel
        begin
            if (we_a)  // when drawing to A, output from B
            begin
                address_b <= y * SCREEN_WIDTH + x;
                colour <= active ? palette[dataout_b] : 0;
            end
            else  // otherwise output from A
            begin
                address_a <= y * SCREEN_WIDTH + x;
                colour <= active ? palette[dataout_a] : 0;
            end

            if (screenend)  // switch active buffer once per frame
            begin
                we_a <= ~we_a;
                we_b <= ~we_b;
                // reset background position at start of frame
                draw_x <= 0;
                draw_y <= 0;
                // reset player position
                pl_pix_x <= 0;
                pl_pix_y <= 0;
                // reset frame address
                address_fb1 <= 0;
            end
        end

        VGA_R <= colour[11:8];
        VGA_G <= colour[7:4];
        VGA_B <= colour[3:0];
    end
endmodule

Memory Latency

You might have spotted two unusual registers: address_fb1 and address_fb2.

address_fb1 <= (SCREEN_WIDTH * draw_y) + draw_x;
address_fb2 <= address_fb1;
...
address_a <= address_fb2;

The memory we're using has a two clock cycle latency: it takes two clock cycles to retrieve the pixel colour from the sprite buffer. Thus, the write to the frame buffer address needs to be delayed two clock cycles too. If we didn't do this, our sprites would be drawn two pixels to the right.

If you're not using an Arty or Basys 3 board, then you might need to adjust the length of this delay by removing or inserting additional registers.
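You can see the alignment in a small simulation. This is only a sketch under simplifying assumptions: a 16-entry array stands in for the sprite buffer, pos stands in for the draw position, and the module name is made up. One cycle goes on registering the read address and one on the synchronous read, as in the sram module above.

```verilog
// Sketch only: shows address_fb2 lining up with the data read from memory.
module latency_demo;
    reg clk = 0;
    always #5 clk = ~clk;

    reg [7:0] rom [0:15];   // stand-in for the sprite buffer
    reg [3:0] pos = 0;      // stand-in for the draw position
    reg [3:0] address_s;
    reg [7:0] dataout_s;
    reg [3:0] address_fb1, address_fb2;
    integer i;

    initial
        for (i = 0; i < 16; i = i + 1)
            rom[i] = 100 + i;   // recognisable data: rom[n] = 100 + n

    always @(posedge clk)
    begin
        pos         <= pos + 1;
        address_s   <= pos;             // cycle 1: register the read address
        dataout_s   <= rom[address_s];  // cycle 2: synchronous memory read
        address_fb1 <= pos;             // cycle 1: matching write address
        address_fb2 <= address_fb1;     // cycle 2: delayed to meet the data
    end

    initial begin
        repeat (4) @(negedge clk);
        // dataout_s belongs at address_fb2; address_fb1 is one position ahead
        $display("fb1=%0d fb2=%0d data=%0d",
                 address_fb1, address_fb2, dataout_s);  // fb1=3 fb2=2 data=102
        $finish;
    end
endmodule
```

Writing dataout_s to address_fb1 instead would place every pixel one position too far along in this sketch, which is the kind of offset the extra register removes.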

Programming the Board

Constraints

Create a constraints file called arty.xdc (or whatever your board is called) with the following content [view on github].

## FPGA VGA Graphics Part 3: Arty Board Constraints
## Adapted from Digilent master file:
##   https://github.com/Digilent/digilent-xdc/blob/master/Arty-Master.xdc
## Learn more at https://timetoexplore.net/blog/arty-fpga-vga-verilog-03

## Clock
set_property -dict {PACKAGE_PIN E3  IOSTANDARD LVCMOS33} [get_ports {CLK}];
create_clock -add -name sys_clk_pin -period 10.00 \
    -waveform {0 5} [get_ports {CLK}];

## Reset Button (active low)
set_property -dict {PACKAGE_PIN C2  IOSTANDARD LVCMOS33} [get_ports {RST_BTN}];

## Switches
set_property -dict {PACKAGE_PIN A8  IOSTANDARD LVCMOS33} [get_ports {sw[0]}];
set_property -dict {PACKAGE_PIN C11 IOSTANDARD LVCMOS33} [get_ports {sw[1]}];
set_property -dict {PACKAGE_PIN C10 IOSTANDARD LVCMOS33} [get_ports {sw[2]}];
set_property -dict {PACKAGE_PIN A10 IOSTANDARD LVCMOS33} [get_ports {sw[3]}];

## VGA Pmod Header JB
set_property -dict {PACKAGE_PIN E15 IOSTANDARD LVCMOS33} [get_ports {VGA_R[0]}];
set_property -dict {PACKAGE_PIN E16 IOSTANDARD LVCMOS33} [get_ports {VGA_R[1]}];
set_property -dict {PACKAGE_PIN D15 IOSTANDARD LVCMOS33} [get_ports {VGA_R[2]}];
set_property -dict {PACKAGE_PIN C15 IOSTANDARD LVCMOS33} [get_ports {VGA_R[3]}];
set_property -dict {PACKAGE_PIN J17 IOSTANDARD LVCMOS33} [get_ports {VGA_B[0]}];
set_property -dict {PACKAGE_PIN J18 IOSTANDARD LVCMOS33} [get_ports {VGA_B[1]}];
set_property -dict {PACKAGE_PIN K15 IOSTANDARD LVCMOS33} [get_ports {VGA_B[2]}];
set_property -dict {PACKAGE_PIN J15 IOSTANDARD LVCMOS33} [get_ports {VGA_B[3]}];

## VGA Pmod Header JC
set_property -dict {PACKAGE_PIN U12 IOSTANDARD LVCMOS33} [get_ports {VGA_G[0]}];
set_property -dict {PACKAGE_PIN V12 IOSTANDARD LVCMOS33} [get_ports {VGA_G[1]}];
set_property -dict {PACKAGE_PIN V10 IOSTANDARD LVCMOS33} [get_ports {VGA_G[2]}];
set_property -dict {PACKAGE_PIN V11 IOSTANDARD LVCMOS33} [get_ports {VGA_G[3]}];
set_property -dict {PACKAGE_PIN U14 IOSTANDARD LVCMOS33} [get_ports {VGA_HS_O}];
set_property -dict {PACKAGE_PIN V14 IOSTANDARD LVCMOS33} [get_ports {VGA_VS_O}];

If you're using the Basys3 board, you need to modify the constraints for your board. Change the pins for the clock, reset button, switches, and VGA ports. The Basys3 push buttons are active high, so also remove the inversion from wire rst = ~RST_BTN; in top.v. See the Basys3 reference manual and Basys3 master XDC for details.

Build & Program

Run synthesis, implementation, and bitstream generation. See the FPGA introductory post if you need a reminder on how to do this.

Next, hook up your VGA Pmod to the middle two connectors (JB and JC) on your Arty and use your VGA cable to connect your monitor to the VGA Pmod. Basys3 users can connect the VGA cable directly to their board. Finally, connect your board to your computer via USB and program it with vga03/vga03.runs/impl_1/top.bit.

You should see a purple star field. If your screen is black, check you correctly added sprites.mem and sprites_palette.mem to your project.

Ready Player One

Next, we add our player ship sprite at the bottom of the screen.

Within top.v, add the following after the draw background block and before the line if (pix_stb) [view full source on github]:

// draw player ship
if (address_fb1 >= VRAM_DEPTH)  // background drawing is finished 
begin
    if (pl_pix_y < SPRITE_SIZE)
    begin
        if (pl_pix_x < SPRITE_SIZE - 1)
            pl_pix_x <= pl_pix_x + 1;
        else
        begin
            pl_pix_x <= 0;
            pl_pix_y <= pl_pix_y + 1;
        end

        address_s <= SPRITE_PL_OFFSET 
                    + (SPRITE_SIZE * pl_pix_y) + pl_pix_x;
        address_fb1 <= SCREEN_WIDTH * (pl_y + pl_pix_y) 
                    + pl_x + pl_pix_x;
        address_fb2 <= address_fb1;

        if (we_a)
        begin
            address_a <= address_fb2;
            datain_a <= dataout_s;
        end
        else
        begin
            address_b <= address_fb2;
            datain_b <= dataout_s;
        end
    end
end

Regenerate the bitstream and program your board again. You should see a ship at the bottom of the screen.

Learning to Fly

A static ship is hardly a ship. Let's control the ship's position with the switches on our FPGA board. There are four switches labelled SW0 to SW3 in the middle bottom of the Arty board.

Add the following to the bottom of the if (screenend) block, below the line address_fb1 <= 0; [view full source on github]:

// update ship position based on switches
if (sw[0] && pl_x < SCREEN_WIDTH - SPRITE_SIZE)
    pl_x <= pl_x + 1;
if (sw[1] && pl_x > 0)
    pl_x <= pl_x - 1;      
if (sw[2] && pl_y < SCREEN_HEIGHT - SPRITE_SIZE)
    pl_y <= pl_y + 1;
if (sw[3] && pl_y > 0)
    pl_y <= pl_y - 1;

Regenerate the bitstream and program your board again.

Try experimenting with the switches (not the push buttons; they won't do anything). You can make your ship move diagonally by combining switches, e.g. SW1 and SW3.
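If you turn on two opposing switches, such as SW0 and SW1, the ship doesn't stand still: when several non-blocking assignments to the same register happen in one clock cycle, the last one wins. The simulation-only sketch below isolates the horizontal logic to show this. The module name is made up, and I've assumed a starting position of 144.

```verilog
// Sketch only: both horizontal switches on at once.
module switch_priority_demo;
    localparam SCREEN_WIDTH = 320;
    localparam SPRITE_SIZE  = 32;

    reg clk = 0;
    reg [3:0] sw = 4'b0011;  // SW0 (x + 1) and SW1 (x - 1) both on
    reg [9:0] pl_x = 144;

    always @(posedge clk)
    begin
        if (sw[0] && pl_x < SCREEN_WIDTH - SPRITE_SIZE)
            pl_x <= pl_x + 1;
        if (sw[1] && pl_x > 0)
            pl_x <= pl_x - 1;  // the later assignment takes effect
    end

    initial begin
        #1 clk = 1;  // a single rising clock edge
        #1 $display("pl_x = %0d", pl_x);  // pl_x = 143
    end
endmodule
```

If you'd prefer opposing switches to cancel out, one option is to chain the tests with else if so that only one branch can fire.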

You can also change which sprites are drawn by updating the relevant lines in top.v:

localparam SPRITE_BG_INDEX = 7;  // background sprite
localparam SPRITE_PL_INDEX = 0;  // player sprite

For example, if SPRITE_PL_INDEX is set to 4, then you'll get the alien spaceship instead.

What's Next?

I'm currently writing the next part, which will make use of the other sprites to add asteroids and aliens. Follow @WillFlux for updates.

©2018 Will Green.

Graphics Credit: The sample spaceship game graphics come from KenneyNL and are in the public domain.