FPGA VGA Graphics in Verilog Part 3

Introduction

Welcome back to the third installment of my FPGA graphics tutorial series using Digilent boards. In part 2 we introduced bitmapped displays and loaded graphics files into memory. In this third part, we animate sprites using bitmaps and double buffering: this classic technique is a staple of 2D games. By the end of this tutorial, you'll be able to control a sprite using the switches on your FPGA board.

Find the code and resources for this and other FPGA tutorials at github.com/WillGreen/timetoexplore.

Feedback to @WillFlux is most welcome. Updated January 2019.

Requirements

The requirements for this part are the same as the previous parts:

  1. Digilent Arty A7-35T, Arty S7-50T, Basys 3, or Nexys Video board (see below for other boards)
  2. VGA Pmod if using the Arty or Nexys Video (Basys 3 has VGA built-in)
  3. VGA capable monitor & cable
  4. Micro USB cable to program and power the board
  5. Xilinx Vivado installed (including Digilent board files)

Other FPGA Boards

This tutorial requires at least 964 Kbits of on-board FPGA memory (block or distributed RAM). Provided you can meet the RAM requirement, it should be relatively easy to adapt this tutorial to other boards by making changes to the top.v module:

  1. Hardware I/O: update hardware ports, such as CLK and VGA_R, to match your board.
  2. Clock: if your board clock isn't 100 MHz, you need to update the pixel clock code.
  3. VGA Outputs: if your VGA output isn't 4-bits per colour, adjust VGA assign statements.
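As a sketch of the clock change in step 2: the pixel strobe in top.v adds a constant to a 16-bit counter and uses the carry-out as the strobe, so the constant is 2^16 x 25 MHz / board clock. The example below assumes a 50 MHz board clock; the module name pix_strobe_50mhz and its port names are made up for illustration and are not part of the tutorial code.

```verilog
// Sketch only: 25 MHz pixel strobe from an ASSUMED 50 MHz board clock.
// pix_strobe_50mhz is a hypothetical name, not part of the tutorial repo.
module pix_strobe_50mhz(
    input wire i_clk,       // assumed 50 MHz board clock
    output reg o_pix_stb    // high for one i_clk tick per pixel
    );

    // increment = 2^16 * 25 MHz / 50 MHz = 0x8000 (divide by 2)
    reg [15:0] cnt = 0;
    initial o_pix_stb = 0;

    // the carry out of the 16-bit addition becomes the strobe
    always @(posedge i_clk)
        {o_pix_stb, cnt} <= cnt + 16'h8000;
endmodule
```

Board clocks that aren't a whole multiple of 25 MHz produce an uneven strobe with this approach, so a PLL or MMCM is probably the better choice there.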

Double Buffering

The principle of double buffering is simple: you draw on one buffer while driving the monitor from the other. You have an entire frame (1/60th second) to create your image while avoiding any screen tearing. There are two downsides: you need twice the memory, and you increase latency by one frame.
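In hardware, the swap can come down to a single flag that flips once per frame. Here is a minimal sketch, assuming a one-tick end-of-frame pulse is available; the module and port names are hypothetical, and the top.v module later in this tutorial uses a pair of write-enable registers instead.

```verilog
// Sketch only: one flag selects which buffer is drawn to; the other is displayed.
// buffer_select and its ports are hypothetical names for illustration.
module buffer_select(
    input wire i_clk,
    input wire i_frame_end,     // assumed: high for one tick at end of frame
    output reg o_draw_a         // 1: draw to A, display B; 0: draw to B, display A
    );

    initial o_draw_a = 0;

    always @(posedge i_clk)
        if (i_frame_end)
            o_draw_a <= ~o_draw_a;  // swap roles once per frame
endmodule
```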

As well as the memory for our two buffers, we're going to need some memory to store sprites. We have therefore halved the 640x360 resolution from part 2 to 320x180. A 320 x 180 x 8-bit buffer requires 450 Kbits of RAM; with 1,800 Kbits of block RAM on our FPGA, this provides plenty of wiggle room.
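The memory figures can be checked with a few lines of simulation-only Verilog. This is just a sketch of the arithmetic; the module and parameter names are made up for illustration, and the total matches the 964 Kbits quoted under Other FPGA Boards.

```verilog
// Sketch only: memory budget for this design (simulation, not synthesis).
// mem_budget and its localparam names are hypothetical.
module mem_budget;
    localparam FB_BITS     = 320 * 180 * 8;     // one 8-bit frame buffer
    localparam SPRITE_BITS = 32 * 32 * 8 * 8;   // eight 32x32 sprites, 8 bits per pixel
    localparam TOTAL_BITS  = 2 * FB_BITS + SPRITE_BITS;

    initial begin
        $display("frame buffer: %0d Kbits", FB_BITS / 1024);      // 450
        $display("sprites:      %0d Kbits", SPRITE_BITS / 1024);  // 64
        $display("total:        %0d Kbits", TOTAL_BITS / 1024);   // 964
    end
endmodule
```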

By halving the resolution, we make the driver implementation easy: dividing by two can be efficiently implemented using a right shift. In Verilog a right shift is implemented with the >> operator; just like in C, Go, or Python.
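As a quick illustration of the shift, here is a simulation-only sketch; the module and signal names are hypothetical.

```verilog
// Sketch only: halving a coordinate with a right shift.
// halve_demo, full_x and half_x are hypothetical names.
module halve_demo;
    reg [9:0] full_x;
    wire [9:0] half_x = full_x >> 1;  // divide by two, rounding down

    initial begin
        full_x = 10'd639;
        #1 $display("%0d >> 1 = %0d", full_x, half_x);  // 639 >> 1 = 319
    end
endmodule
```

Two neighbouring full-resolution pixels map to the same half-resolution coordinate, which is what doubles each pixel on screen.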

VGA 320x180

Create a new RTL project in Vivado called vga03 with the appropriate board as the target. If you need advice on project creation see part 1 of my introductory tutorial series.

We're going to use a modified version of our familiar VGA driver module from part 2. It halves the horizontal and vertical resolutions while maintaining standard 640x480 VGA timings. There is just one change from the 640x360 version in part 2:

  • We right shift o_x and o_y to halve the effective resolution

Create a new module called vga320x180.v with the following content:

module vga320x180(
    input wire i_clk,           // base clock
    input wire i_pix_stb,       // pixel clock strobe
    input wire i_rst,           // reset: restarts frame
    output wire o_hs,           // horizontal sync
    output wire o_vs,           // vertical sync
    output wire o_blanking,     // high during blanking interval
    output wire o_active,       // high during active pixel drawing
    output wire o_screenend,    // high for one tick at the end of screen
    output wire o_animate,      // high for one tick at end of active drawing
    output wire [9:0] o_x,      // current pixel x position
    output wire [8:0] o_y       // current pixel y position
    );

    // VGA timings https://timetoexplore.net/blog/video-timings-vga-720p-1080p
    localparam HS_STA = 16;              // horizontal sync start
    localparam HS_END = 16 + 96;         // horizontal sync end
    localparam HA_STA = 16 + 96 + 48;    // horizontal active pixel start
    localparam VS_STA = 480 + 10;        // vertical sync start
    localparam VS_END = 480 + 10 + 2;    // vertical sync end
    localparam VA_STA = 60;              // vertical active pixel start
    localparam VA_END = 420;             // vertical active pixel end
    localparam LINE   = 800;             // complete line (pixels)
    localparam SCREEN = 525;             // complete screen (lines)

    reg [9:0] h_count;      // line position
    reg [9:0] v_count;      // screen position

    // generate sync signals (active low for 640x480)
    assign o_hs = ~((h_count >= HS_STA) & (h_count < HS_END));
    assign o_vs = ~((v_count >= VS_STA) & (v_count < VS_END));

    // keep x and y bound within the active pixels
    assign o_x = ((h_count < HA_STA) ? 0 : (h_count - HA_STA)) >> 1;
    assign o_y = ((v_count >= VA_END) ? 
                    (VA_END - VA_STA - 1) : (v_count - VA_STA)) >> 1;

    // blanking: high within the blanking period
    assign o_blanking = ((h_count < HA_STA) | (v_count > VA_END - 1));

    // active: high during active pixel drawing
    assign o_active = ~((h_count < HA_STA) | 
                        (v_count > VA_END - 1) | 
                        (v_count < VA_STA));

    // screenend: high for one tick at the end of the screen
    assign o_screenend = ((v_count == SCREEN - 1) & (h_count == LINE - 1));

    // animate: high for one tick at the end of the final active pixel line
    assign o_animate = ((v_count == VA_END - 1) & (h_count == LINE - 1));

    always @ (posedge i_clk)
    begin
        if (i_rst)  // reset to start of frame
        begin
            h_count <= 0;
            v_count <= 0;
        end
        else if (i_pix_stb)  // once per pixel
        begin
            // h_count runs from 0 to LINE-1, giving exactly LINE pixels per line
            if (h_count == LINE - 1)  // end of line
            begin
                h_count <= 0;
                if (v_count == SCREEN - 1)  // end of screen
                    v_count <= 0;
                else
                    v_count <= v_count + 1;
            end
            else 
                h_count <= h_count + 1;
        end
    end
endmodule

Learn more about video display timings.

Remember Me?

We use exactly the same sram module as in the previous part.

Add a design source called sram.v:

module sram #(parameter ADDR_WIDTH=8, DATA_WIDTH=8, DEPTH=256, MEMFILE="") (
    input wire i_clk,
    input wire [ADDR_WIDTH-1:0] i_addr, 
    input wire i_write,
    input wire [DATA_WIDTH-1:0] i_data,
    output reg [DATA_WIDTH-1:0] o_data 
    );

    reg [DATA_WIDTH-1:0] memory_array [0:DEPTH-1]; 

    initial begin
        if (MEMFILE > 0)
        begin
            // Verilog has no string concatenation with +, so use a format specifier
            $display("Loading memory init file '%s' into array.", MEMFILE);
            $readmemh(MEMFILE, memory_array);
        end
    end

    always @ (posedge i_clk)
    begin
        if(i_write) begin
            memory_array[i_addr] <= i_data;
        end
        else begin
            o_data <= memory_array[i_addr];
        end     
    end
endmodule

Sprite Sheet

We have eight sprites in our design. Each sprite is 32 x 32 pixels. We store these in a single 8-bit PNG of 32 x 256 pixels. You can see our sprites below (rotated to the horizontal to fit better on this page).

I've used the same FPGATools script as in part 2 to convert a PNG sprite sheet into Verilog memory initialization format.

  1. Copy sprites.mem and sprites_palette.mem from the tutorial git repo
  2. In Vivado select "Add Sources" then "design sources" and "Files of type: Memory Initialization Files"
  3. Locate sprites.mem and sprites_palette.mem and select "OK" then "Finish"

If you'd rather create your own sprites, you can. Save eight 32x32 pixel sprites on one 32x256 image then use the img2fmem.py script from FPGATools to generate the memory initialization files.
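Because the sheet is 32 pixels wide with the sprites stacked vertically, each sprite occupies a contiguous block of 32 x 32 = 1,024 memory locations. The sketch below shows the address calculation that top.v performs inline; the sprite_addr function and the module around it are hypothetical helpers for illustration, and they assume the memory file stores pixels row by row.

```verilog
// Sketch only: address of pixel (x, y) within sprite number idx.
// sprite_addr is a hypothetical helper; top.v computes this inline.
module sprite_addr_demo;
    localparam SPRITE_SIZE = 32;

    function [12:0] sprite_addr;
        input [2:0] idx;   // sprite number: 0-7
        input [4:0] x;     // column within sprite: 0-31
        input [4:0] y;     // row within sprite: 0-31
        begin
            sprite_addr = idx * SPRITE_SIZE * SPRITE_SIZE
                        + y * SPRITE_SIZE + x;
        end
    endfunction

    initial begin
        $display("%0d", sprite_addr(3'd7, 5'd0, 5'd0));    // 7168
        $display("%0d", sprite_addr(3'd0, 5'd31, 5'd31));  // 1023
    end
endmodule
```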

With the sprites loaded into the project, we're ready to get drawing!

The Emptiness of Space

We're going to take a simplistic approach to sprites: draw every pixel in the frame every time. We start with the background, then add our ship afterwards. This approach is wasteful in that we end up redrawing pixels that haven't changed, but avoids the need to track what needs redrawing.

To create our backdrop, we tile the background sprite to fill the screen. While we're drawing sprites in one buffer, we'll be outputting the other buffer to the VGA monitor.
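Tiling falls out of the sprite size being a power of two: the low five bits of a screen coordinate wrap around every 32 pixels, so they can index the sprite directly. A simulation-only sketch with hypothetical names:

```verilog
// Sketch only: the low five bits of a coordinate repeat every 32 pixels.
// tile_demo and tile_x are hypothetical names.
module tile_demo;
    reg [9:0] draw_x;
    wire [4:0] tile_x = draw_x[4:0];  // same value as draw_x % 32

    initial begin
        draw_x = 10'd33;
        #1 $display("%0d", tile_x);  // 1
    end
endmodule
```

The screen is 320 pixels wide, so exactly ten tiles fit across; 180 is not a multiple of 32, so the bottom row of tiles is cut off after 20 lines.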

Create a design source called top.v with the following design:

module top(
    input wire CLK,             // board clock: 100 MHz on Arty/Basys3/Nexys
    input wire RST_BTN,         // reset button
    input wire [3:0] sw,        // four switches
    output wire VGA_HS_O,       // horizontal sync output
    output wire VGA_VS_O,       // vertical sync output
    output reg [3:0] VGA_R,     // 4-bit VGA red output
    output reg [3:0] VGA_G,     // 4-bit VGA green output
    output reg [3:0] VGA_B      // 4-bit VGA blue output
    );

    wire rst = ~RST_BTN;    // reset is active low on Arty & Nexys Video
    // wire rst = RST_BTN;  // reset is active high on Basys3 (BTNC)

    // generate a 25 MHz pixel strobe
    reg [15:0] cnt;
    reg pix_stb;
    always @(posedge CLK)
        {pix_stb, cnt} <= cnt + 16'h4000;  // divide by 4: (2^16)/4 = 0x4000

    wire [9:0] x;       // current pixel x position: 10-bit value: 0-1023
    wire [8:0] y;       // current pixel y position:  9-bit value: 0-511
    wire blanking;      // high within the blanking period
    wire active;        // high during active pixel drawing
    wire screenend;     // high for one tick at the end of screen
    wire animate;       // high for one tick at end of active drawing

    vga320x180 display (
        .i_clk(CLK), 
        .i_pix_stb(pix_stb),
        .i_rst(rst),
        .o_hs(VGA_HS_O), 
        .o_vs(VGA_VS_O), 
        .o_x(x), 
        .o_y(y),
        .o_blanking(blanking),
        .o_active(active),
        .o_screenend(screenend),
        .o_animate(animate)
    );

    // VRAM frame buffers (read-write)
    localparam SCREEN_WIDTH = 320;
    localparam SCREEN_HEIGHT = 180;
    localparam VRAM_DEPTH = SCREEN_WIDTH * SCREEN_HEIGHT; 
    localparam VRAM_A_WIDTH = 16;  // 2^16 > 320 x 180
    localparam VRAM_D_WIDTH = 8;   // colour bits per pixel

    reg [VRAM_A_WIDTH-1:0] address_a, address_b;
    reg [VRAM_D_WIDTH-1:0] datain_a, datain_b;
    wire [VRAM_D_WIDTH-1:0] dataout_a, dataout_b;
    reg we_a = 0, we_b = 1;  // write enable bit

    // frame buffer A VRAM
    sram #(
        .ADDR_WIDTH(VRAM_A_WIDTH), 
        .DATA_WIDTH(VRAM_D_WIDTH), 
        .DEPTH(VRAM_DEPTH), 
        .MEMFILE("")) 
        vram_a (
        .i_addr(address_a), 
        .i_clk(CLK), 
        .i_write(we_a),
        .i_data(datain_a), 
        .o_data(dataout_a)
    );

    // frame buffer B VRAM
    sram #(
        .ADDR_WIDTH(VRAM_A_WIDTH), 
        .DATA_WIDTH(VRAM_D_WIDTH), 
        .DEPTH(VRAM_DEPTH), 
        .MEMFILE("")) 
        vram_b (
        .i_addr(address_b), 
        .i_clk(CLK), 
        .i_write(we_b),
        .i_data(datain_b), 
        .o_data(dataout_b)
    );

    // sprite buffer (read-only)
    localparam SPRITE_SIZE = 32;  // dimensions of square sprites in pixels
    localparam SPRITE_COUNT = 8;  // number of sprites in buffer
    localparam SPRITEBUF_D_WIDTH = 8;  // colour bits per pixel
    localparam SPRITEBUF_DEPTH = SPRITE_SIZE * SPRITE_SIZE * SPRITE_COUNT;    
    localparam SPRITEBUF_A_WIDTH = 13;  // 2^13 == 8,192 == 32 x 256 

    reg [SPRITEBUF_A_WIDTH-1:0] address_s;
    wire [SPRITEBUF_D_WIDTH-1:0] dataout_s;

    // sprite buffer memory
    sram #(
        .ADDR_WIDTH(SPRITEBUF_A_WIDTH), 
        .DATA_WIDTH(SPRITEBUF_D_WIDTH), 
        .DEPTH(SPRITEBUF_DEPTH), 
        .MEMFILE("sprites.mem"))
        spritebuf (
        .i_addr(address_s), 
        .i_clk(CLK), 
        .i_write(0),  // read only
        .i_data(0), 
        .o_data(dataout_s)
    );

    reg [11:0] palette [0:255];  // 256 x 12-bit colour palette entries
    reg [11:0] colour;
    initial begin
        $display("Loading palette.");
        $readmemh("sprites_palette.mem", palette);
    end

    // sprites to load and position of player sprite in frame
    localparam SPRITE_BG_INDEX = 7;  // background sprite
    localparam SPRITE_PL_INDEX = 0;  // player sprite
    localparam SPRITE_BG_OFFSET = SPRITE_BG_INDEX * SPRITE_SIZE * SPRITE_SIZE;
    localparam SPRITE_PL_OFFSET = SPRITE_PL_INDEX * SPRITE_SIZE * SPRITE_SIZE;
    localparam SPRITE_PL_X = (SCREEN_WIDTH - SPRITE_SIZE) >> 1; // centre
    localparam SPRITE_PL_Y = SCREEN_HEIGHT - SPRITE_SIZE;     // bottom

    reg [9:0] draw_x;
    reg [8:0] draw_y;
    reg [9:0] pl_x = SPRITE_PL_X; 
    reg [9:0] pl_y = SPRITE_PL_Y; 
    reg [9:0] pl_pix_x; 
    reg [8:0] pl_pix_y;

    // pipeline registers for address calculation
    reg [VRAM_A_WIDTH-1:0] address_fb1;  
    reg [VRAM_A_WIDTH-1:0] address_fb2;

    always @ (posedge CLK)
    begin
        // reset drawing
        if (rst)
        begin
            draw_x <= 0;
            draw_y <= 0;
            pl_x <= SPRITE_PL_X; 
            pl_y <= SPRITE_PL_Y; 
            pl_pix_x <= 0; 
            pl_pix_y <= 0;
        end

        // draw background
        if (address_fb1 < VRAM_DEPTH)
        begin
            if (draw_x < SCREEN_WIDTH)
                draw_x <= draw_x + 1;
            else
            begin
                draw_x <= 0;
                draw_y <= draw_y + 1;
            end

            // calculate address of sprite and frame buffer (with pipeline)
            address_s <= SPRITE_BG_OFFSET + 
                        (SPRITE_SIZE * draw_y[4:0]) + draw_x[4:0];
            address_fb1 <= (SCREEN_WIDTH * draw_y) + draw_x;
            address_fb2 <= address_fb1;

            if (we_a)
            begin
                address_a <= address_fb2;
                datain_a <= dataout_s;
            end
            else
            begin
                address_b <= address_fb2;
                datain_b <= dataout_s;
            end
        end

        if (pix_stb)  // once per pixel
        begin
            if (we_a)  // when drawing to A, output from B
            begin
                address_b <= y * SCREEN_WIDTH + x;
                colour <= active ? palette[dataout_b] : 0;
            end
            else  // otherwise output from A
            begin
                address_a <= y * SCREEN_WIDTH + x;
                colour <= active ? palette[dataout_a] : 0;
            end

            if (screenend)  // switch active buffer once per frame
            begin
                we_a <= ~we_a;
                we_b <= ~we_b;
                // reset background position at start of frame
                draw_x <= 0;
                draw_y <= 0;
                // reset player position
                pl_pix_x <= 0;
                pl_pix_y <= 0;
                // reset frame address
                address_fb1 <= 0;
            end
        end

        VGA_R <= colour[11:8];
        VGA_G <= colour[7:4];
        VGA_B <= colour[3:0];
    end
endmodule

Memory Latency

You might have spotted two unusual registers: address_fb1 and address_fb2.

address_fb1 <= (SCREEN_WIDTH * draw_y) + draw_x;
address_fb2 <= address_fb1;
...
address_a <= address_fb2;

The memory we're using has a two clock cycle latency: it takes two clock cycles to retrieve the pixel colour from the sprite buffer. Thus, the write to the frame buffer address needs to be delayed two clock cycles too. If we didn't do this, our sprites would be drawn two pixels to the right.

If you're not using an Arty, Basys 3 or Nexys Video board, then you might need to adjust the length of this delay by removing or inserting additional registers.
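If you do need a different delay, one option is to replace the fixed registers with a parameterised delay line. The following is a sketch only: addr_delay and its ports are hypothetical names, not part of the tutorial code, and LATENCY must be at least 1. With LATENCY=2 and the unregistered address calculation as input, it plays the role of the address_fb1 and address_fb2 pair.

```verilog
// Sketch only: delay an address by LATENCY clock cycles.
// addr_delay is a hypothetical module, not part of the tutorial repo.
module addr_delay #(parameter WIDTH=16, LATENCY=2) (
    input wire i_clk,
    input wire [WIDTH-1:0] i_addr,
    output wire [WIDTH-1:0] o_addr   // i_addr as it was LATENCY cycles ago
    );

    reg [WIDTH-1:0] stage [0:LATENCY-1];  // one register per cycle of delay
    integer i;

    always @(posedge i_clk)
    begin
        stage[0] <= i_addr;
        for (i = 1; i < LATENCY; i = i + 1)
            stage[i] <= stage[i-1];
    end

    assign o_addr = stage[LATENCY-1];
endmodule
```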

Programming the Board

Constraints

Obtain the appropriate constraints file for your board from the Time To Explore git repo and add it to your project.

NB. The slide switch constraints have not been tested on the S7-50T. Please report any success or failure to @WillFlux.

For reference the Nexys Video constraints look like this:

## FPGA VGA Graphics Part 3: Nexys Video Board Constraints

## Clock
set_property -dict {PACKAGE_PIN R4  IOSTANDARD LVCMOS33} [get_ports {CLK}];
create_clock -add -name sys_clk_pin -period 10.00 \
    -waveform {0 5} [get_ports {CLK}];

## Reset Button (active low)
set_property -dict {PACKAGE_PIN G4  IOSTANDARD LVCMOS15} [get_ports {RST_BTN}];

## Slide Switches
set_property -dict {PACKAGE_PIN E22 IOSTANDARD LVCMOS12} [get_ports {sw[0]}];
set_property -dict {PACKAGE_PIN F21 IOSTANDARD LVCMOS12} [get_ports {sw[1]}];
set_property -dict {PACKAGE_PIN G21 IOSTANDARD LVCMOS12} [get_ports {sw[2]}];
set_property -dict {PACKAGE_PIN G22 IOSTANDARD LVCMOS12} [get_ports {sw[3]}];

## VGA Pmod Header JB
set_property -dict {PACKAGE_PIN V9  IOSTANDARD LVCMOS33} [get_ports {VGA_R[0]}];
set_property -dict {PACKAGE_PIN V8  IOSTANDARD LVCMOS33} [get_ports {VGA_R[1]}];
set_property -dict {PACKAGE_PIN V7  IOSTANDARD LVCMOS33} [get_ports {VGA_R[2]}];
set_property -dict {PACKAGE_PIN W7  IOSTANDARD LVCMOS33} [get_ports {VGA_R[3]}];
set_property -dict {PACKAGE_PIN W9  IOSTANDARD LVCMOS33} [get_ports {VGA_B[0]}];
set_property -dict {PACKAGE_PIN Y9  IOSTANDARD LVCMOS33} [get_ports {VGA_B[1]}];
set_property -dict {PACKAGE_PIN Y8  IOSTANDARD LVCMOS33} [get_ports {VGA_B[2]}];
set_property -dict {PACKAGE_PIN Y7  IOSTANDARD LVCMOS33} [get_ports {VGA_B[3]}];

## VGA Pmod Header JC
set_property -dict {PACKAGE_PIN Y6  IOSTANDARD LVCMOS33} [get_ports {VGA_G[0]}];
set_property -dict {PACKAGE_PIN AA6 IOSTANDARD LVCMOS33} [get_ports {VGA_G[1]}];
set_property -dict {PACKAGE_PIN AA8 IOSTANDARD LVCMOS33} [get_ports {VGA_G[2]}];
set_property -dict {PACKAGE_PIN AB8 IOSTANDARD LVCMOS33} [get_ports {VGA_G[3]}];
set_property -dict {PACKAGE_PIN R6  IOSTANDARD LVCMOS33} [get_ports {VGA_HS_O}];
set_property -dict {PACKAGE_PIN T6  IOSTANDARD LVCMOS33} [get_ports {VGA_VS_O}];

Build & Program

Run synthesis, implementation, and bitstream generation. See the FPGA introductory post if you need a reminder on how to do this.

Next, hook up your VGA Pmod to the middle two connectors (JB and JC) on your Arty and use your VGA cable to connect your monitor to the VGA Pmod. Basys 3 users can connect the VGA cable directly to their board. Finally, connect your board to your computer via USB and program it with vga03/vga03.runs/impl_1/top.bit.

You should see a purple star field. If your screen is black, check you correctly added sprites.mem and sprites_palette.mem to your project.

Ready Player One

Next, we add our player ship sprite at the bottom of the screen.

Within top.v, add the following after the draw background block and before the line if (pix_stb):

// draw player ship
if (address_fb1 >= VRAM_DEPTH)  // background drawing is finished 
begin
    if (pl_pix_y < SPRITE_SIZE)
    begin
        if (pl_pix_x < SPRITE_SIZE - 1)
            pl_pix_x <= pl_pix_x + 1;
        else
        begin
            pl_pix_x <= 0;
            pl_pix_y <= pl_pix_y + 1;
        end

        address_s <= SPRITE_PL_OFFSET 
                    + (SPRITE_SIZE * pl_pix_y) + pl_pix_x;
        address_fb1 <= SCREEN_WIDTH * (pl_y + pl_pix_y) 
                    + pl_x + pl_pix_x;
        address_fb2 <= address_fb1;

        if (we_a)
        begin
            address_a <= address_fb2;
            datain_a <= dataout_s;
        end
        else
        begin
            address_b <= address_fb2;
            datain_b <= dataout_s;
        end
    end
end

Regenerate the bitstream and program your board again. You should see a ship at the bottom of the screen.

Learning to Fly

A static ship is hardly a ship. Let's control the ship's position with slide switches on our FPGA board. We use four slide switches labelled SW0 to SW3.

Add the following to the bottom of the if (screenend) block, below the line address_fb1 <= 0;:

// update ship position based on switches
if (sw[0] && pl_x < SCREEN_WIDTH - SPRITE_SIZE)
    pl_x <= pl_x + 1;
if (sw[1] && pl_x > 0)
    pl_x <= pl_x - 1;      
if (sw[2] && pl_y < SCREEN_HEIGHT - SPRITE_SIZE)
    pl_y <= pl_y + 1;
if (sw[3] && pl_y > 0)
    pl_y <= pl_y - 1;

Regenerate the bitstream and program your board again.

Try experimenting with the slide switches (not the push buttons; they won't do anything). You can make your ship move diagonally by combining switches, e.g. SW1 and SW3.

You can also change which sprites are drawn by updating the relevant lines in top.v:

localparam SPRITE_BG_INDEX = 7;  // background sprite
localparam SPRITE_PL_INDEX = 0;  // player sprite

For example, if SPRITE_PL_INDEX is set to 4, then you'll get the alien spaceship instead.

Bonus: Super VGA & Hires Sprites

If you're lucky enough to have a Nexys Video (or other FPGA with at least 9 Mbit of BRAM), then try creating an 800x600 buffer and using high-resolution sprites: hires_sprites.mem and hires_sprites_palette.mem.

These sprites are 128x128 pixels, so you'll need to increase the size of the sprite buffer to 128 * 1024 and the address width (SPRITEBUF_A_WIDTH) to 17. See the end of part 1 if you need help creating an 800x600 VGA driver and updating your top module.
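As a starting point, the sizes involved work out as in the sketch below. Only the sprite buffer size and address width come from the text above; the frame buffer values are my own arithmetic for an 800x600 buffer, the module wrapper is hypothetical, and none of this has been tested on hardware.

```verilog
// Sketch only: parameter values for 800x600 with 128x128 sprites (untested).
// hires_params is a hypothetical wrapper so the arithmetic can be simulated.
module hires_params;
    localparam SCREEN_WIDTH  = 800;
    localparam SCREEN_HEIGHT = 600;
    localparam VRAM_DEPTH    = SCREEN_WIDTH * SCREEN_HEIGHT;  // 480,000
    localparam VRAM_A_WIDTH  = 19;   // assumption: 2^19 = 524,288 > 480,000

    localparam SPRITE_SIZE   = 128;
    localparam SPRITE_COUNT  = 8;
    localparam SPRITEBUF_DEPTH   = SPRITE_SIZE * SPRITE_SIZE * SPRITE_COUNT;  // 128 * 1024
    localparam SPRITEBUF_A_WIDTH = 17;  // 2^17 = 131,072

    initial begin
        $display("VRAM_DEPTH = %0d", VRAM_DEPTH);
        $display("SPRITEBUF_DEPTH = %0d", SPRITEBUF_DEPTH);
    end
endmodule
```

Other widths in top.v would need checking too: for example, a 9-bit y only reaches 511, and tiling a 128-pixel sprite needs the low seven bits of the coordinate rather than five.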

What's Next?

I'm currently writing the next part, which will make use of the other sprites to add asteroids and aliens. Follow @WillFlux for updates.

Find more on FPGAs and Verilog in the Time to Explore FPGA Index.

©2018-2019 Will Green.

Graphics Credit: The sample spaceship game graphics come from KenneyNL and are in the public domain.