How can I calculate propagation delay through a series of combinational circuits using Verilog and an FPGA?

Question

I'm new to FPGAs and HDLs, but I'm trying to learn and can't figure this out. How can I calculate or estimate the propagation delay through several levels of combinational logic? Can I only determine this empirically, or can I figure it out at design time? In this situation I'm using an FPGA to implement a parity setting and checking circuit. The circuit would look like a tree network of XOR gates like the example pictures, except that I intend to XOR 16 registers, so there will be more levels of XOR operations. I would like to be able to calculate the propagation delay through each "level" of XOR logic so I can determine what fraction of a clock cycle, or how many nanoseconds, the entire parity checking and setting operation will take. I hope I'm making sense.

Thanks very much for your help.

Recommended answer

You need "The Knowledge", as I explain in "The Art of High Performance FPGA Design" (http://www.fpgacpu.org/log/aug02.html#art): "You have to ... crank up your tools and design some test circuits, and then open up the timing analyzer and the FPGA editor and pour over what came out, what the latencies (logic and routing) tend to be, etc."

After you do that for a while, you will look at this kind of question and just know the answer (or have a pretty good idea).

In this case, for example, I know that in an FPGA a 16-input XOR will be built out of a tree of 4- or 6-input lookup tables (4-LUTs or 6-LUTs) two levels deep, and that it cannot be implemented in a circuit only one LUT deep. Therefore the minimum delay for such a circuit in a pipelined implementation is going to be the sum of (in Xilinx timing nomenclature):

  • tCKO -- clock-to-output delay of any of the 16 flip-flops

  • tILO -- delay through the first-level LUTs

  • tAS -- delay through the second-level LUT plus the flip-flop setup time, assuming both are implemented in the same slice

For a Virtex-6 at speed grade -1, I would expect this to be ~1.5 ns.
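The "two LUTs deep" claim can be checked with a little arithmetic. Below is a minimal Python sketch that is not part of the original answer. The helper names `lut_levels` and `parity_two_level` are made up for illustration, and the model assumes each LUT simply reduces up to `lut_size` signals to one, ignoring carry chains and other dedicated resources.

```python
def lut_levels(n_inputs, lut_size):
    """Number of LUT levels needed to reduce n_inputs signals to one,
    assuming every LUT combines up to lut_size signals (illustrative model)."""
    if lut_size < 2:
        raise ValueError("lut_size must be at least 2")
    levels = 0
    signals = n_inputs
    while signals > 1:
        signals = -(-signals // lut_size)  # ceiling division
        levels += 1
    return levels


def parity_two_level(word):
    """Parity of a 16-bit word computed as a two-level tree of 4-input XORs,
    mirroring how a reduction XOR could map onto 4-LUTs."""
    first = []
    for g in range(4):
        nibble = (word >> (4 * g)) & 0xF
        first.append(bin(nibble).count("1") & 1)      # one first-level LUT
    return first[0] ^ first[1] ^ first[2] ^ first[3]  # the second-level LUT


print(lut_levels(16, 4), lut_levels(16, 6))  # prints "2 2"
```

Under this model both 4-LUTs and 6-LUTs need two levels for 16 inputs, and the two-level tree gives the same result as a plain reduction XOR for every 16-bit input.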

As others have said, the component switching delays are in the data sheet for the device in question, but the net routing delays are not. Indeed, in time you may even start to remember the key delays and develop a sense for how many FPGA primitives like LUTs you can chain together and still meet a particular clock period / clock frequency target.

Anyway, I just tried this with some throwaway Verilog I coded up:

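module t(clk, i, o);
  input clk;
  input [15:0] i;
  output reg o;

  reg [15:0] d;        // input register stage (source flip-flops)
  always @(posedge clk) begin
    d <= i;            // register the 16 inputs
    o <= ^d;           // reduction XOR: registered parity of d
  end
endmodule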

and a simple UCF file:

net clk period = 1.5 ns;

and the total delay in my device was about 1.4 ns. Try it for yourself and see!

Here is one path from the static timing analyzer output:

Paths for end point o (SLICE_X3Y68.A5), 6 paths
--------------------------------------------------------------------------------
Slack (setup path):     0.198ns (requirement - (data path - clock path skew + uncertainty))
  Source:               d_13 (FF)
  Destination:          o (FF)
  Requirement:          1.500ns
  Data Path Delay:      1.248ns (Levels of Logic = 2)
  Clock Path Skew:      -0.019ns (0.089 - 0.108)
  Source Clock:         clk_BUFGP rising at 0.000ns
  Destination Clock:    clk_BUFGP rising at 1.500ns
  Clock Uncertainty:    0.035ns

  Clock Uncertainty:          0.035ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter (TSJ):  0.070ns
    Total Input Jitter (TIJ):   0.000ns
    Discrete Jitter (DJ):       0.000ns
    Phase Error (PE):           0.000ns

  Maximum Data Path at Slow Process Corner: d_13 to o
    Location             Delay type         Delay(ns)  Physical Resource
                                                       Logical Resource(s)
    -------------------------------------------------  -------------------
    SLICE_X3Y67.BQ       Tcko                  0.337   d<15>
                                                       d_13
    SLICE_X2Y68.A2       net (fanout=1)        0.590   d<13>
    SLICE_X2Y68.A        Tilo                  0.068   d<11>
                                                       d[15]_reduce_xor_21_xo<0>1
    SLICE_X3Y68.A5       net (fanout=1)        0.180   d[15]_reduce_xor_21_xo<0>
    SLICE_X3Y68.CLK      Tas                   0.073   d<10>
                                                       d[15]_reduce_xor_21_xo<0>3
                                                       o
    -------------------------------------------------  ---------------------------
    Total                                      1.248ns (0.478ns logic, 0.770ns route)
                                                       (38.3% logic, 61.7% route)

As you can see, the logic delays from the data sheet add up to only about 480 ps, whereas the net routing delays come to 770 ps. Clock skew and uncertainty add a bit more, for a total of about 1.3 ns. This is actually faster than the component switching limit (Fmax) of the global clock tree, 700 MHz / 1.43 ns...
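To see how the report's numbers fit together, here is a small Python sketch (not part of the original answer) that redoes the arithmetic. All values are copied from the report above, and the slack formula is the one printed on its "Slack (setup path)" line. The variable names are my own.

```python
# Delays in ns, copied from the timing report above.
logic = [0.337, 0.068, 0.073]   # Tcko, Tilo, Tas
route = [0.590, 0.180]          # the two net delays
requirement = 1.500             # clock period constraint
skew = -0.019                   # clock path skew
uncertainty = 0.035             # clock uncertainty

data_path = sum(logic) + sum(route)
# slack = requirement - (data path - clock path skew + uncertainty)
slack = requirement - (data_path - skew + uncertainty)
min_period = requirement - slack        # shortest period this path allows
fmax_mhz = 1000.0 / min_period

print(f"data path  {data_path:.3f} ns")
print(f"slack      {slack:.3f} ns")
print(f"min period {min_period:.3f} ns (~{fmax_mhz:.0f} MHz)")
```

This reproduces the report's 1.248 ns data path and 0.198 ns slack, and implies a minimum period of 1.302 ns for this path, roughly 768 MHz.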

So in summary, as you try some test circuits and try tuning them, you will gain experience that helps you estimate how fast your circuit will run when implemented in FPGA primitives like LUTs.
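As a rough illustration of that kind of back-of-the-envelope estimate, here is a Python sketch that is not from the original answer. The functions `path_delay` and `max_levels` are hypothetical helpers. The component delays come from the report above. The 0.385 ns per-net figure is just the average of the two nets in that report and is only a guess for any other placement. Clock skew and uncertainty are ignored.

```python
def path_delay(levels, tcko, tilo, tas, net):
    """Rough register-to-register delay (ns) through `levels` LUT levels.
    The last LUT is assumed to share a slice with the destination flip-flop,
    so its delay is folded into tas, as in the report above."""
    return tcko + levels * net + (levels - 1) * tilo + tas


def max_levels(period, tcko, tilo, tas, net):
    """Largest number of LUT levels whose estimated delay fits in `period`."""
    levels = 0
    while path_delay(levels + 1, tcko, tilo, tas, net) <= period:
        levels += 1
    return levels


# Illustrative parameters: Tcko, Tilo, Tas from the report, net delay assumed.
params = dict(tcko=0.337, tilo=0.068, tas=0.073, net=0.385)

print(f"{path_delay(2, **params):.3f} ns for two levels")
print(max_levels(1.5, **params), "levels fit in 1.5 ns")
print(max_levels(2.5, **params), "levels fit in 2.5 ns")
```

With these assumed numbers, two levels fit in a 1.5 ns period and four in 2.5 ns. Real routing delays vary widely with placement and fanout, so treat the result as a starting guess to check against the timing analyzer.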

And if it really matters, there is no substitute for taking the design through synthesis, place-and-route, and static timing analysis. Don't forget to add timing constraints to give the tools something to target, and then experiment with lowering the minimum clock period iteratively until you converge on the smallest period that still meets timing.
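One way to organize that iteration is a simple bisection over the period constraint. This Python sketch is not from the original answer. `find_min_period` is a hypothetical helper, and `fake_tool_run` is a stand-in for a full synthesis, place-and-route, and timing run, which in practice you would launch yourself and read pass/fail from the timing report.

```python
def find_min_period(meets_timing, lo, hi, resolution=0.01):
    """Bisect for the smallest clock period (ns) that still meets timing.
    meets_timing(period) stands in for one full tool run; lo must fail
    and hi must pass."""
    while hi - lo > resolution:
        mid = (lo + hi) / 2
        if meets_timing(mid):
            hi = mid    # passed: try a tighter constraint
        else:
            lo = mid    # failed: relax the constraint
    return hi


# Stand-in for a real run: pretend the design closes timing at 1.302 ns,
# the minimum period implied by the report above.
def fake_tool_run(period):
    return period >= 1.302


best = find_min_period(fake_tool_run, lo=1.0, hi=2.0)
print(f"converged on about {best:.2f} ns")
```

Bisection assumes pass/fail is monotonic in the period. Real tools may work harder under a tighter constraint, so results can be noisier than this model suggests.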

Happy hacking!
