1 Introduction
This study concerns high level VHDL programming [1, 2, 3, 4]. A first approach is to provide a solution to combinational synthesis of loops. Sequential synthesis is a second approach in order to explore many multi-cycles solutions. A simple solution to deal with combinational loops of N iterations is to unroll them in 1 to N clock cycles by reducing the combinational loop and introducing a counter of cycles as we will see in the second part of this paper. Automatic pipeline synthesis is the last approach aimed to improve the throughput of the synthesized component. Several tools have introduced such possibilities in order to explore architectural solutions and compromises between full combinational, full sequential, pipelined or not solutions [5, 6, 7]. The aim of this paper is to present the two first approaches of loop synthesis in the two first parts. The last part will illustrate these approachs with a simple algorithm, and then the conclusion will summarize the topic and propose future work.
2 Combinational dynamic loop synthesis
We have considered here that the data types used are from integer type. This is not a restriction because any type can be transformed, through coding, in a range of integer.
The two main loop constructs are for loops and while loops. Both can be synthesized by means of loop unrolling [8].
For loops follow this structure :
for i in A to B loop….end loop ;
It can be transformed in :
for i in min(A) to max(B) loop
if i>=A and i<=B then…
with min(A) the left value of A type and max(B) the right value of B type. A and B are the result of dynamic expressions. The size of A and B can be obtained with range calculus if their expressions are only arithmetic and logical ones, that means if there is no condition in their calculus. In that case, the size of A and B can be calculated by propagating the size of the input types in the expressions using A and B. This method points out the important aspect of well defining the data types in order to restrict them to their real range. By this way, it is possible to obtain the range of the result of all arithmetic and logical expressions.
If we consider R(A)=[A-,A+] the range of A and R(B)=[B-,B+], we can find the range of all the results of the VHDL operators using A and B like :
R(B-A)=[(B-)-A+,(B+)-A-]
R(A mod B)=[0,min(A+, (B+)-1)]
R(kA)=[kA-,kA+] if k>=0
R(kA)=[kA+,kA-] otherwise,
and so on for all the operators.
With such operations on ranges, it is possible to propagate the ranges in the direct data dependencies graph of the algorithm to obtain the range of the for loops size. The general case uses dynamic conditions that have to be explored exhaustively, that means we have to consider all the cases of the conditions and to obtain thus all the possible ranges of the final data (A and B in our example).
This step of ranges is a maximal sufficient condition. That means if we take the union of this ranges, we can provide a solution to the maximum size of the loop, that is in fact less or equal than this size. We could improve this result by propagating ranges in the conditions and reducing each range in accordance to the condition. This is possible in the general with set treatment and not range treatment. We have limited at present our study to ranges. Our future work will focus on set definitions and operations in order to improve maximum loop size calculus.
For example, if we consider this condition “ if A<4 then ”, the range of A after the condition is reduced to (A-,4), but with this condition “ if A mod 2 = 1 then ”, the range of A after the condition is (A- +1, A+ -1) when A- mod 2=0 and A+ mod 2=0 else (A-, A+) when A- mod 2=1 and A+ mod 2=1 else (A- +1, A+) when A- mod 2=0 and A+ mod 2=1 else (A-, A+ +1) when A- mod 2=1 and A+ mod 2=0.
With a set and not a range, the resulting set of A would have half the number of values of the initial set of A (even or odd values).
Most of real cases don’t require the use of sets to find the maximum size because conditions are often simples and not nested, so the range of the set is sufficient.
The second loop construct is
while condition loop…end loop ;
That can be transformed in :
for i in 0 to N-1 loop
if condition then…
with N the maximum size of the loop.
The maximal size can be obtained by the same way than for the previous solution, but here, by unrolling the loop and propagating the ranges of the inputs in order to determine the maximal unrolling, that is to say when the condition is always met. If there is no maximal size loop, the unrolling will not converge to meet the condition, and the tool will then detect this non convergence and inform the designer.
The treatment is the following one :
The unrolling of the loop will conduct to define arrays of values (one value per iteration of the loop) for each data used in the loop. Also, the condition can be expressed like a comparison of a data to zero (c>0, c=0…). Let us consider this example :
R(A)=(0,15)
R(B)=(1,15)
while (AFirst step : modify the condition and extract all the dependencies of the condition from the loop, that means eliminate the code that is not linked to the loop exit condition. In our case, we obtain :
while C>0 loop A :=A*2 ;
C :=B-A ;
end loop ;
Second step : organize each data as an array of values, one per iteration :
A[0] :=A
B[0] :=B
C[0] := B[0]- A[0]
while C(i)>0 loop
A[i+1] :=A[i]*2 ;
C[i+1] :=B[0]-A[i+1] ;
end loop ;
Last step : unroll the loop with ranges until the condition is met and verify the convergence.
R(A[0]) :=R(A)
R(B[0]) :=R(B)
R(C[0]) := R(B[0]- A[0])
:= [(B-)-A+,(B+)-A-]
i :=0
if C[0]+>0
i :=1
R(A[1]) :=R(2*A[0]) :=(2*A[0]-, 2*A[0]+)
R(B[1]) :=R(B[0])
R(C[1]) := R(B[0]- A[1])
:= [(B[0]-)- A[1]+,( B[0]+)- A[1]-] :=[(B[0]-)-2*A[0]+,(B[0]+)- 2*A[0]-]
if C[1]+>0
i :=2
R(A[2]) :=R(4*A[0]) :=(4*A[0]-, 4*A[0]+)
R(B[2]) :=R(B[0])
R(C[2]) := R(B[1]- 2*A[1])
:= [(B[0]-)- 2*A[1]+,( B[0]+)- 2*A[1]-] :=[(B[0]-)-4*A[0]+,(B[0]+)- 4*A[0]-]
Idem if C[2]+>0
i :=3
R(C[3]) := R(B[2]- A[2])
:=[(B[0]-)-8*A[0]+,(B[0]+)- 8*A[0]-]
Considering the previous range definition, we obtain :
R(C[0])=(-15, 14)
R(C[1])=(-30, 13)
R(C[2])=(-60, 11)
R(C[3])=(-120, 7)
R(C[4])=(-240,-1)
So N=4 because C[4]<0. The maximal size of the loop is 4. The propagation of ranges can also be performed in more complex algorithms with conditions as mentioned in the previous part. We are developing an automatic VHDL precompiler that modifies such dynamic loops.
3 Sequential loop synthesis
The most difficult part is combinational loop synthesis. The sequential synthesis is only an over set of combinational loop synthesis. This can be done with the splitting of the loops in two parts : one combinational nested loop and one sequential loop. The sequential one is performed with one cycle per iteration. This is a multicycles solution for the implementation of a loop. For example, let us consider the previous case. The synthesizable while loop obtained was :
for i in 0 to N-1 loop
if condition then…
The sequential one organized with k clock cycles is then :
if rising_edge(clk) then
if start_algo then j :=k
elsif j>0 then
for i in 0 to N/k-1 loop
if condition then…
The k constant has to be provided by the user as a timing constraint to the CAD tool that calculates it directly according to these constraints.
The pipeline solution, which is similar to software pipelining [8], is another approach to sequential loop synthesis where each iteration or each set of iterations of the loop is designed as a stage of the pipeline, that is to say that the datas calculated in the iteration are organized as an array propagated from stage to stage according to the iterations of the loop.
By this way, the loop is also propagating the “ start_algo ” signal to indicate the validity of the datas at the end of the pipeline. The previous example can be expressed as follows, with k2 stages and k cycles per stage (multi-cycles stages) :
if (rising_edge(clk)) then
for i in k2-1 downto 0 loop
if start_algo(i) then
j(i) :=k ;
nb(i) :=0 ;
start_algo(i) :=start_algo(i-1) ;
A[i] :=A[i-1] ;
B[i] :=B[i-1] ;
elsif j(i)>0 then
for u in 0 to N/k-1 loop
if A[i]< B[i] then
A[i] := A[i]*2 ;
nb(i) :=nb(i)+1 ;
end if ;
end loop ;
else
j(i) :=k ;
nb(i) :=0 ;
start_algo(i) :=start_algo(i-1) ;
A[i] :=A[i-1] ;
B[i] :=B[i-1] ;
end if ;
4 Full example
4.1 Algorithm and hypothesis
We take here, as an example to illustrate this method, the typical algorithm used to trace segments of pixels in an image. This algorithm is the following one, written in the C language :
Void segment (int xd, int yd, int xf, int yf)
{int dx, dy, s, xinc, yinc, x, y ;
if ((dx=xf-xd)<0)
{ dx=-dx ; xinc=-1 ;} else xinc=1 ;
if ((dy=yf-yd)<0)
{ dy=-dy ; yinc=-1 ;} else yinc=1 ;
x=xd ;
y=yd ;
if (dx>=dy)
{s=-(dx>>1) ;
while (x != xf)
{point (x,y) ;
x+=xinc ;
if ((s+=dy)>=0)
{ y+=yinc ;
s-=dx ;}
}
}
else
{s=-(dy>>1) ;
while (y != yf)
{point (x,y) ;
y+=yinc ;
if ((s+=dx)>=0)
{ x+=xinc ;
s-=dy ;}
}
}
point(xf,yf) ;
}
where the “ point ” function allows to print a point on a screen.
Before we treat this example, we make the following hypothesis :
-the ranges of x, xd, xf, are from 0 to 639 and those of y, yd, yf from 0 to 479. (like those of a PC screen)
-dx > 0, dy > 0, that is to say the direction of the segment is always from the smallest coordonnates to the highest ones.
-dx > dy, that is to say the segment will be under the diagonal of the image. This simplifies the problem but does not change anything of the result because the two loops of the algorithm are completly symetrics.
With these hypothesis, the loop can be written this way :
while (x < xf)
{point (x,y) ;
x+=xinc ;
if ((s+=dy)>=0)
{ y+=yinc ;
s-=dx ; }
}
4.2 Combinational loop
If we consider the case of a combinational dynamic loop synthesis, we obtain then the following loop, in VHDL, by transforming the previous one as explained before (we suppose that all the points are put in an array of values) :
while (c > 0) loop
tabx[h] :=x ;
taby[h] :=y ;
h :=h+1 ;
x :=xinc+x;
c :=xf-x ;
s :=s+dy ;
if (s>=0) then
y :=yinc+y ;
s :=s-dx ;
end if ;
end loop ;
When we begin to unroll the loop, the range for x and xf is [0,639] and the range of c is [-639, 639]. At the first step of the unrolling, these ranges become [1, 640] for x and [-640, 638] for c Keeping on unrolling the loop until the convergence is verified, that is until c is negative, we obtain the maximal size of the loop when the range for x is [5639,1278] and the range for c is [-1278, 0], and they are obtained at the 638th step. Of course, because of the symmetry of the algorithm, if we have chosen dy>dx, we would have obtained a maximal loop size of 478.
4.3 Sequential loop
As we have already said, this loop will be divided in two loops, one nested combinational and one sequential. So, knowing that the maximal size of the loop is 638, and if we want to do the algorithm in k clock cycles, the loop becomes the following one :
if rising_edge(clk) then
if start_algo then
j :=k;
h :=0 ;
elsif j>0 then
for i in 0 to 638/k-1 loop
if c>0 then
tabx[h] :=x ;
taby[h] :=y;
h :=h+1 ;
x :=xinc+x ;
c :=xf-x ;
s :=s+dy ;
if (s>=0) then
y :=yinc +y;
s :=s-dx ;
end if ;
end if;
end loop ;
j :=j-1 ;
h :=0 ;
So, to obtain the segment in k cycles, we treat k times N/k iterations simultaneously.
5 Conclusion
The loop synthesis method presented here can be used in a tool dedicated to help the implementation of regular algorithms composed of loops such as a digital image and signal processing. The first results obtained are encouraging. We are preparing a more general version of our method in order to obtain a complete tool making possible the automatic evaluation of many implementation solutions from the full combinational one to the full sequential one. This will be completed with the introduction of sequential operators, that is to say with multicycles or pipelined versions of the VHDL operators (/, *, mod...) inside the loops to obtain full sequential versions. We are studying heuristics in order to limit the field of investigations by the tool according to timing constraints such as bandwidth and latency. The synthesis of one iteration provides a unit time which is used to guide the dimensions of the pipeline. The use of sets of data is also under study in order to provide a more general tool for any kind of loops.
References:
[1] X1. Author, Title of the Paper, International Journal of Science and Technology, Vol.X, No.X, 19XX, pp. XX-XX.
[2] R. Airiau, J.M. Bergé, V. Olive, L. Rouillard, “ VHDL : langage, modélisation, synthèse ”, Press Polytechniques et Universitaires Romandes, 1998.
[2] K.C. Chang, “ Digital Design and Modeling with VHDL and Synthesis ”, IEEE Computer Society Press, 1997.
[3] IEEE Standard VHDL Language Reference Manual, STD 1076-1993, IEEE, 1993.
[4] E.D. Douglas, T.J. Wilderotter, “ VHDL Synthesis ”, Kluwer Academic Publishers1994.
[5] R. Antri-Bouzar, “ Du câblage à la micro-programmation : Le micro-programme câblé ”, Ph.D. Thesis, INPT, France, 1998.
[6] H. Lidonne, J. Elliot (Mentor Graphics), “ Explorer tous les choix d’architectures ASIC avant la synthèse ”, Revue Electronique, N°86, Nov. 1998.
[7] J.N. Contensou, “ Le câblage des algorithmes ”, Ed. Hermès, France 1995.
[8] K. Ebcioglu, “ A compilation technique for software pipelining of loops with conditional jumps ”, 20th Annual Workshop on Microprogramming, Dec. 1987.
4>