Block Ram in Verilog with Vivado

Welcome to the FPGA Cookbook.

This is part of a new series of handy recipes to solve common FPGA development problems. Look out for more FPGA cookbook posts soon.

You want to use Block Ram in Verilog with Vivado

This recipe is currently in draft form. Additional content will be added soon.

There are two types of internal memory available on a typical FPGA:

  • Distributed Ram: made from the FPGA logic (LUTs)
  • Block Ram: dedicated memory blocks within the FPGA; also known as bram

Many modern FPGAs have relatively generous allocations of block ram (bram). For example, the Xilinx Artix-7 A35T, found in the Arty board, has 1,800 Kbits (225 kilobytes) of bram. This makes it an attractive resource for many designs: graphical FPGA projects on this site make extensive use of it. Moreover, Xilinx advises block ram for most use cases where more than 64 data bits are required (see chapter 4 of 7 Series FPGAs CLB User Guide).

However, persuading Vivado to make use of block ram isn't simply a case of changing a preference. You need to create a Verilog implementation that Vivado can infer as block ram. This recipe looks at inferring block ram on Xilinx 7 Series FPGAs (Spartan-7, Artix-7, Kintex-7, and Virtex-7), but this information should be relevant to other developers too.

Feedback to @WillFlux is most welcome.

All testing was done with Vivado 2017.4 using Vivado synthesis and implementation defaults.

Simple Memory Array

The simplest way to create a memory array in Verilog is in one line with the data and address sizes.

For example, to create a 4,096 byte memory array:

reg [7:0] memory_array [0:4095];  // 32,768 bits

Given Xilinx's advice on larger memories you might expect this to use block ram, but Vivado implements this as distributed ram. My understanding is that Vivado uses distributed ram because this design is asynchronous: we've not explicitly tied use of the ram to the clock.
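To make the asynchronous case concrete, here's a minimal sketch of the array wrapped in a module with a clocked write and an unclocked read. The module name async_ram and its port names are made up for illustration, following the naming used elsewhere in this recipe; treat it as a picture of the pattern rather than a tested design.

```verilog
// Hypothetical module for illustration: clocked write, asynchronous read.
module async_ram (
    input wire clk,
    input wire [11:0] i_addr,   // 12 bits address 4,096 items
    input wire i_write,
    input wire [7:0] i_data,
    output wire [7:0] o_data
    );

    reg [7:0] memory_array [0:4095];  // 32,768 bits

    // writes are tied to the clock
    always @ (posedge clk)
    begin
        if (i_write) begin
            memory_array[i_addr] <= i_data;
        end
    end

    // the read is not: o_data follows i_addr without waiting for a clock edge
    assign o_data = memory_array[i_addr];
endmodule
```

The continuous assignment on the read side is what makes this design asynchronous: the output changes as soon as the address does.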

For simple ram like this, one LUT can store 64 bits, so this distributed memory will use 512 LUTs (32,768 / 64). You get a handy summary of resource usage on the Vivado Project Summary tab.

Only certain slices (SLICEM) support distributed ram usage. In the case of the Artix-7 XC7A35T, 9,600 of the 20,800 LUTs can be used as ram. So while this array can easily be implemented with distributed ram, it's almost certainly better as block ram (and saves logic into the bargain).

Simple Synchronous Ram

In order to persuade Vivado to infer block memory we need a slightly more complex implementation. The following sram module provides simple synchronous ram that Vivado can infer as block ram. It's almost as easy to use as the basic memory array and requires little logic. Because it's synchronous, we need to be conscious of allowing clock ticks between actions. For example, you can't set the address and read the data back on the same clock. We'll see this when we test the memory in the next section.

module sram #(parameter ADDR_WIDTH = 8, DATA_WIDTH = 8, DEPTH = 256) (
    input wire clk,
    input wire [ADDR_WIDTH-1:0] i_addr,
    input wire i_write,
    input wire [DATA_WIDTH-1:0] i_data,
    output reg [DATA_WIDTH-1:0] o_data
    );

    reg [DATA_WIDTH-1:0] memory_array [0:DEPTH-1];

    // all memory access happens on the rising clock edge
    always @ (posedge clk)
    begin
        if(i_write) begin
            memory_array[i_addr] <= i_data;  // write: o_data keeps its previous value
        end
        else begin
            o_data <= memory_array[i_addr];  // read: data is available after this edge
        end
    end
endmodule

The sram module accepts three parameters:

  • ADDR_WIDTH - address size in bits (default: 8)
  • DATA_WIDTH - data size in bits (default: 8)
  • DEPTH - number of items in the memory array (default: 256)

You need to ensure the address width, ADDR_WIDTH, is appropriate for the depth; otherwise some data will be inaccessible. The depth must be no greater than 2^ADDR_WIDTH. For example, if your depth were 30,000 then the address width should be 15 bits (2^15 = 32,768).
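If you'd rather not work out the address width by hand, one option is to derive it from the depth with the $clog2 system function (part of Verilog-2005). The following is a sketch only: sram_sized is a hypothetical variant of the sram module above, and it assumes your tools accept $clog2 in a parameter default.

```verilog
// Hypothetical variant of sram: ADDR_WIDTH defaults to ceil(log2(DEPTH)).
module sram_sized #(
    parameter DATA_WIDTH = 8,
    parameter DEPTH = 30000,
    parameter ADDR_WIDTH = $clog2(DEPTH)  // assumption: tool supports $clog2 here
    ) (
    input wire clk,
    input wire [ADDR_WIDTH-1:0] i_addr,
    input wire i_write,
    input wire [DATA_WIDTH-1:0] i_data,
    output reg [DATA_WIDTH-1:0] o_data
    );

    reg [DATA_WIDTH-1:0] memory_array [0:DEPTH-1];

    // same behaviour as sram: write when i_write is set, otherwise read
    always @ (posedge clk)
    begin
        if (i_write) begin
            memory_array[i_addr] <= i_data;
        end
        else begin
            o_data <= memory_array[i_addr];
        end
    end
endmodule
```

With the default depth of 30,000 the address width works out as 15 bits, matching the example above. A depth of 1 would give a zero address width, so this sketch is only appropriate for a depth of 2 or more.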

The module provides four inputs and one output:

  • clk - the clock
  • i_addr - the address of the item to act on
  • i_write - write enable (read data when false)
  • i_data - input for writing data
  • o_data - output for reading data

Testing Simple Synchronous Ram

To test and demonstrate the module's usage we can create a simple test bench:

`timescale 1ns / 1ps

module sram_basic_testbench();
    parameter ADDR_WIDTH = 8;
    parameter DATA_WIDTH = 8;
    parameter DEPTH = 256;

    reg clk;
    reg [ADDR_WIDTH-1:0] address;
    reg write_enable;    
    reg [DATA_WIDTH-1:0] data_in;
    wire [DATA_WIDTH-1:0] data_out;

    initial begin
        $display("sram test bench from timetoexplore.net.");
        clk = 1;

        #10 write_enable = 1;
        address = 0;
        data_in = 8'haa;  // 1010 1010
        #10 address = 1;
        data_in = 8'h55;  // 0101 0101

        #10 write_enable = 0;
        #10 $display("0x%02h", data_out);  // expect 0x55
        #10 address = 0;
        #10 $display("0x%02h", data_out);  // expect 0xaa
        #10 address = 1;
        #10 $display("0x%02h", data_out);  // expect 0x55

        #10 write_enable = 1;
        address = 1;
        data_in = 8'h2a;  // 0010 1010

        #10 write_enable = 0;
        #10 $display("0x%02h", data_out);  // expect 0x2a

        #40 $finish;
    end

    always begin
        #5 clk = ~clk;  // timescale is 1ns so #5 provides 100MHz clock
    end

    sram sram_test (
        .clk(clk), 
        .i_addr(address), 
        .i_write(write_enable), 
        .i_data(data_in),
        .o_data(data_out));

endmodule

If we look at the waveform we can see what's happening:

Practical Usage

The following example shows usage in a simple application.

Section coming soon...

Dual Ports

The implementation described above is very simple. This is appropriate for many uses, but sometimes you want to read and write at the same time. For this we can extend our design to provide two ports, which can be read and written to independently. Dual ports allow more complex use patterns: for example, we can change one pixel while reading out another for display. External memories (such as DDR3) rarely provide dual ports, so this can be a good reason to build block ram into your design.
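As a rough illustration, the sram module could be extended to two ports along these lines. This is a sketch under assumptions rather than a finished design: the sram_dual name and the _a/_b port suffixes are invented here, both ports share a single clock, and it hasn't been run through Vivado synthesis, so whether it infers block ram needs checking.

```verilog
// Hypothetical dual-port version of sram: two independent ports, one clock.
module sram_dual #(parameter ADDR_WIDTH = 8, DATA_WIDTH = 8, DEPTH = 256) (
    input wire clk,
    // port a
    input wire [ADDR_WIDTH-1:0] i_addr_a,
    input wire i_write_a,
    input wire [DATA_WIDTH-1:0] i_data_a,
    output reg [DATA_WIDTH-1:0] o_data_a,
    // port b
    input wire [ADDR_WIDTH-1:0] i_addr_b,
    input wire i_write_b,
    input wire [DATA_WIDTH-1:0] i_data_b,
    output reg [DATA_WIDTH-1:0] o_data_b
    );

    reg [DATA_WIDTH-1:0] memory_array [0:DEPTH-1];

    // port a: write when i_write_a is set, otherwise read
    always @ (posedge clk)
    begin
        if (i_write_a) begin
            memory_array[i_addr_a] <= i_data_a;
        end
        else begin
            o_data_a <= memory_array[i_addr_a];
        end
    end

    // port b: same behaviour, independent address and data
    always @ (posedge clk)
    begin
        if (i_write_b) begin
            memory_array[i_addr_b] <= i_data_b;
        end
        else begin
            o_data_b <= memory_array[i_addr_b];
        end
    end
endmodule
```

In the pixel example, port a could write the changed pixel while port b reads out the pixel being displayed. If both ports write the same address on the same clock, the stored result isn't defined by this sketch, so the surrounding design needs to avoid that case.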

Section coming soon...

Testing Dual-Port Synchronous Ram

Section coming soon...

Xilinx 7 Series Block Ram Considerations

Section coming soon...

More Complex Memories

If you have more complex requirements or need to extract more performance from block ram you'll need to dig into 7 Series FPGAs Memory Resources. However, I hope these simple designs have helped you quickly get block ram working in your design.

©2018 Will Green.

Image Credit: The image is a small section of MT4C1024 DRAM from Zeptobars and is licensed under a Creative Commons licence.