Some Trends in High-level Synthesis Research Tools Tanguy Risset



Date: 27.10.2017


Some Trends in High-level Synthesis Research Tools

  • Tanguy Risset

  • Compsys, Lip, ENS-Lyon

  • http://www.ens-lyon.fr/COMPSYS


Outline

  • Context: Why High level synthesis?

  • HLS Hard problems

  • Some solutions in existing tools

  • Some on-going projects



Context: Embedded Computing Systems design

  • SoCs or MPSoCs for multimedia applications will soon include:

    • Network on chip
    • dozens of initiators (CPU, DMA,…)
    • Mbytes of code
    • Operating systems
    • Shared memory coherency protocols
  • SoC Design problems:

    • Time to market
    • Design space exploration
    • Software complexity


Some envisaged solutions

  • Time to market

    • IP re-use
    • High level design
  • Design space exploration

    • Fast prototyping and performance evaluation, refinement methodology (specification, algorithm, TLM, CABA)
  • Software complexity

    • Tools for embedded code generation/embedded OS
  • High level synthesis is only a small part of the « High level Design » process



Definition of High Level Synthesis

  • HLS: generates a register-transfer level description from a behavioral specification, in an automatic or semi-automatic way.

  • Input:

    • A behavioral specification
    • Design constraints
    • Library of available RTL components
  • Output:



Refinement : from algorithm to hardware



Abstraction levels for HLS

  • AL = Algorithm: prior to HW/SW partition

  • TLM = Transaction-Level Model: after HW/SW partition; models bit-true behavior, register bank, data transfers, system synchronisation; no timing needed

  • T-TLM = Timed TLM (also PVT): TLM + timing annotation; refined communication model

  • CABA = Cycle Accurate-Bit Accurate: models state at each clock edge

  • RT = Register Transfer (ASIC flow entry point): synthesisable model



Pros and Cons

  • « Traditional » motivations:

    • Fast design
    • Safe design : formal refinement approach
    • « Must be used » to cope with Moore’s law
  • But!

    • Commercial tools are not there yet
    • A new tool is a big investment
    • Designers have managed without it


New motivations ?

  • IP-reuse

    • Slightly change design parameters to re-use an IP
  • New target technologies and languages (FPGA, SystemC, etc.)

    • Tools can easily re-target the designs
  • CAD tools companies are investing a lot in « high level-like » synthesis tools

    • Monet, Behavioural compiler, VCC, …
  • Technological advantage



Outline

  • Context: Why High level synthesis?

  • HLS Hard problems

  • Some solutions in existing tools

  • Some on-going projects



HLS Hard Problems

  • Huge design space

    • Complex design space exploration
    • Multi-criteria optimization techniques
  • Integration into a design environment

    • Lack of standard interchange format
    • SoC simulation time is a crucial issue
  • Acceptance by the designers

    • Find a language common to SoC designers and tool designers
  • Refinement technical problems

    • (detailed hereafter)


HLS technical problems

  • Compilation occurs when the target architecture is precisely known

  • In HLS, the target architecture is only partially specified. Examples:

    • Data-flow architecture/systolic arrays : pure RTL description
    • FSM+data path : closer to processor description
  • HLS technical problems :

    • Initial specification format / language
    • Specification refinement : fixed point arithmetic
    • Scheduling/Mapping refinement: resource constraints
    • Technological Mapping refinement


Initial specification format

  • Restrictions on the expressivity of the input language are necessary

  • … but designers hate new languages

  • C-like languages (Handel-C, Silicon-C, Hardware-C, etc.) are actually hardware description languages

  • Main problems:

    • How to express parallelism/sequentiality
      • Data-flow, CSP-like, process network, event-driven
    • How to express both algorithmic and RTL description
    • How much expressivity
      • Dynamic control, loops
    • How to introduce constraints/hints


Fixed point arithmetic

  • Problem: translate a floating point computation to fixed point computation

  • Most of the tools start with an initial fixed point specification found by extensive simulation.

  • Automatic techniques do not handle loops

  • For signal processing applications, signal processing theory can help (the transfer function is used to compute the signal-to-noise ratio).
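The simulation-based approach mentioned above can be illustrated with a minimal sketch: quantize a floating-point signal to a fixed-point grid and measure the resulting signal-to-quantization-noise ratio. The helper names (`quantize`, `sqnr_db`), the 16-bit word length, and the sine test signal are assumptions made for illustration; they do not come from any of the tools discussed here.

```python
import math


def quantize(value, frac_bits, total_bits=16):
    """Round value to a signed fixed-point grid, saturating on overflow."""
    scale = 1 << frac_bits
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    code = max(lowest, min(highest, round(value * scale)))
    return code / scale


def sqnr_db(samples, frac_bits, total_bits=16):
    """Signal-to-quantization-noise ratio in dB, measured by simulation."""
    signal_power = sum(s * s for s in samples)
    noise_power = sum(
        (s - quantize(s, frac_bits, total_bits)) ** 2 for s in samples
    )
    if noise_power == 0:
        return math.inf
    return 10 * math.log10(signal_power / noise_power)


# Hypothetical test signal: a sine wave of amplitude 0.9.
samples = [0.9 * math.sin(0.1 * k) for k in range(200)]
for bits in (4, 8, 12):
    print(bits, "fractional bits ->", round(sqnr_db(samples, bits), 1), "dB")
```

A designer would typically sweep the number of fractional bits like this until the measured ratio meets the application's requirement, which is why the approach needs extensive simulation.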



Scheduling/Mapping

  • For a « basic block », resource-constrained scheduling is NP-hard, but widely studied.

  • Computations

    • Currently, two ways to handle loops:
      • Unroll them
      • Keep them sequential
    • Other solutions:
      • Use software pipelining theory
      • Use the polyhedral model
  • Memory and communication

    • Memory mapping is usually strongly guided by the user
      • Highly active research field (Catthoor, Darte)
    • Communication refinement is also an important issue
      • Highly dependent on the chosen computation model (Gajski, Kienhuis)
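Since exact resource-constrained scheduling of a basic block is NP-hard, heuristics are common. Below is a minimal sketch of one classical heuristic, list scheduling, on a small data-flow graph. The function name, the data layout, and the 4-tap FIR example are assumptions for illustration, and the operators are assumed to be non-pipelined. The graph must be acyclic and every resource kind must have at least one unit, otherwise the loop would not terminate.

```python
def list_schedule(ops, deps, resources, latency):
    """Greedy list scheduling of a data-flow graph.

    ops:       dict, operation name -> resource kind (priority = dict order)
    deps:      dict, operation name -> list of predecessor operations
    resources: dict, resource kind -> number of available units
    latency:   dict, resource kind -> latency in cycles
    Returns a dict, operation name -> start cycle.
    """
    start, finish = {}, {}
    busy = {kind: [] for kind in resources}  # finish times of issued ops
    remaining = list(ops)
    cycle = 0
    while remaining:
        for op in list(remaining):
            kind = ops[op]
            ready = all(
                p in finish and finish[p] <= cycle for p in deps.get(op, [])
            )
            in_use = sum(1 for f in busy[kind] if f > cycle)
            if ready and in_use < resources[kind]:
                start[op] = cycle
                finish[op] = cycle + latency[kind]
                busy[kind].append(finish[op])
                remaining.remove(op)
        cycle += 1
    return start


# Hypothetical 4-tap FIR body: four multiplications and a chain of additions.
ops = {"m0": "mul", "m1": "mul", "m2": "mul", "m3": "mul",
       "a1": "add", "a2": "add", "a3": "add"}
deps = {"a1": ["m0", "m1"], "a2": ["a1", "m2"], "a3": ["a2", "m3"]}
plan = list_schedule(ops, deps, {"mul": 2, "add": 1}, {"mul": 2, "add": 1})
print(plan)
```

With two multipliers and one adder, the multiplications are issued in two waves and the addition chain follows as operands become available; changing the resource counts shows the area/latency trade-off that the scheduler explores.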


Technological mapping refinement

  • Fine technological mapping is very target-dependent

  • Predefined libraries are not precise enough

    • Delays on wires
    • Power consumption
  • VLSI designers' « tricks » are difficult to integrate in tools

  • Submicron technology constraints are changing too fast for high level tools

    • Cross talk
    • Capacitance


Outline

  • Context: Why High level synthesis?

  • HLS Hard problems

  • Some solutions in existing tools

  • Some on-going projects



Some solutions in existing tools

  • Digital signal processing circuits:

    • Gaut: http://lester.univ-ubs.fr:8080
    • Source: signal processing (one infinite loop)
    • Target: RTL + FSM
  • FSM+datapath

    • Ugh: http://www-asim.lip6.fr/recherche/disydent/
    • Source: restricted C
    • Target: FSM+data path
  • Regular computation and polyhedral Model

    • MMAlpha: http://www.irisa.fr/cosi/ALPHA/
    • Source : functional specification
    • Target: systolic-like architectures


GAUT: Génération Automatique d’Unité de Traitement

  • Developed first at LASTI (Lannion) and then LESTER (Lorient): free

  • Generates an RTL description from a behavioral description of signal processing algorithms

  • Kernel technology: highly optimized resource-constrained scheduling

  • Inputs are

      • a behavioral VHDL description (one process repeated infinitely)
      • Libraries of pre-characterized operators
      • Some design constraints
  • Outputs are

      • a synthesizable RTL VHDL description (data path, memory, and communication units)
      • Gantt chart for I/O specification




Gaut : VHDL Input code

  • Sequential instructions in a single process (no clock, no reset, no sensitivity list)



Gaut : Input code

  • Types

    • Bit, boolean, std_logic, Integer (single size), Bit_Vector, Std_Logic_Vector
    • Arrays (to be inlined)
  • Sequential instructions

    • Signal and variable assignments
    • Only one level of if
    • For and While loops (to be inlined)
    • Procedure calls (to be inlined)
    • Function calls corresponding to library elements


Gaut step1: Source code transformation

  • Control dependence elimination

    • Loop unrolling
      • Before:
          y(0) := x(0) * h(0);
          for i in 1 to n - 1 loop
            y(i) := y(i - 1) + x(i) * h(i);
          end loop;
      • After (iterations i = 1 to 3 shown):
          y(0) := x(0) * h(0);
          y(1) := y(1 - 1) + x(1) * h(1);
          y(2) := y(2 - 1) + x(2) * h(2);
          y(3) := y(3 - 1) + x(3) * h(3);

    • Procedure inlining
    • Static single assignment
      • Before:
          b := x + z;
          a := b + c;
          b := e + f;
          y := b;
      • After:
          b := x + z;
          a := b + c;
          b0001 := e + f;
          y := b0001;
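The two transformations of this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Gaut's implementation: the functions `unroll` and `to_ssa` are hypothetical, statements are modelled as simple tuples, and the renaming scheme (first definition keeps its name, later ones get a four-digit suffix) merely imitates the `b0001` naming shown on the slide. The sketch also folds the index arithmetic (`1 - 1` becomes `0`) and does not guard against a generated name clashing with an existing variable.

```python
def unroll(first, last, body):
    """Fully unroll a loop with constant bounds; body(i) yields one statement."""
    return [body(i) for i in range(first, last + 1)]


def to_ssa(statements):
    """Rename variables so that each name is assigned exactly once.

    statements: list of (destination, operands) pairs, straight-line code only.
    """
    definitions = {}  # name -> number of assignments seen so far
    current = {}      # name -> name that currently holds its value
    result = []
    for dest, operands in statements:
        renamed = tuple(current.get(name, name) for name in operands)
        count = definitions.get(dest, 0)
        new_dest = dest if count == 0 else f"{dest}{count:04d}"
        definitions[dest] = count + 1
        current[dest] = new_dest
        result.append((new_dest, renamed))
    return result


unrolled = unroll(1, 3, lambda i: f"y({i}) := y({i - 1}) + x({i}) * h({i});")
print("\n".join(unrolled))

program = [("b", ("x", "z")), ("a", ("b", "c")),
           ("b", ("e", "f")), ("y", ("b",))]
ssa = to_ssa(program)
print(ssa)
```

After both steps the code is a straight-line sequence with a single definition per name, which is what makes the later data-flow graph construction straightforward.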



Gaut step1: Source code transformation

  • Simple expression generation

    • Before:
        b := x + z * u;
    • After:
        tmp := z * u;
        b := x + tmp;

  • Constant propagation

  • Generation of GC Graph (Data-Flow Graph Format of Synchronous Programming)
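The simple expression generation step above amounts to splitting a compound expression into statements with one operator each. A minimal sketch follows, using Python's standard `ast` module to parse the right-hand side. The function `flatten`, the temporary naming (`tmp1`, `tmp2`, ...), and the restriction to `+`, `-` and `*` are assumptions for illustration only.

```python
import ast

_OPERATORS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}


def flatten(target, expression):
    """Split `target := expression` into single-operator statements."""
    code = []
    counter = 0

    def visit(node, dest=None):
        nonlocal counter
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
            left = visit(node.left)
            right = visit(node.right)
            if dest is None:  # inner sub-expression: needs a temporary
                counter += 1
                dest = f"tmp{counter}"
            code.append(f"{dest} := {left} {_OPERATORS[type(node.op)]} {right};")
            return dest
        raise ValueError("unsupported expression")

    root = ast.parse(expression, mode="eval").body
    value = visit(root, dest=target)
    if not isinstance(root, ast.BinOp):  # plain copy such as `b := x`
        code.append(f"{target} := {value};")
    return code


steps = flatten("b", "x + z * u")
print("\n".join(steps))
```

Each generated statement then corresponds to one node of the data-flow graph, which is the form a scheduler can work on directly.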



GAUT step 2: Scheduling/Mapping

  • In addition to throughput and clock cycle, the user can give:

    • Resource constraints and mapping constraints
    • Memory constraints
    • I/O constraints
    • Optimization type
  • The result is an architecture and Gantt charts

    • For computations
    • For I/O
    • For memory
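To see how a throughput constraint and a clock period interact with resource constraints, one can compute a simple lower bound on the number of operators of each kind. The sketch below is a textbook-style estimate, assuming non-pipelined operators; it is not claimed to be the allocation method used by GAUT, and the function name and the numbers in the example are hypothetical.

```python
import math


def min_operators(op_counts, op_delay_ns, clock_ns, iteration_period_ns):
    """Lower bound on operators needed to meet a throughput constraint.

    op_counts:           dict, operator kind -> operations per iteration
    op_delay_ns:         dict, operator kind -> combinational delay (ns)
    clock_ns:            clock period (ns)
    iteration_period_ns: time allowed per iteration (ns)
    """
    cycles_available = iteration_period_ns // clock_ns
    bound = {}
    for kind, count in op_counts.items():
        cycles_per_op = math.ceil(op_delay_ns[kind] / clock_ns)
        bound[kind] = math.ceil(count * cycles_per_op / cycles_available)
    return bound


# Hypothetical 16-tap filter: 16 multiplications and 15 additions per sample.
bound = min_operators(
    op_counts={"mul": 16, "add": 15},
    op_delay_ns={"mul": 18, "add": 8},
    clock_ns=10,
    iteration_period_ns=100,
)
print(bound)
```

The bound ignores data dependencies, so the scheduler may need more operators than this; it is mainly useful as a starting point for the exploration.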




Gaut step 3: memory and communication synthesis

  • Optimizing memory layout and minimizing buses



Gaut: summary

  • Advantages

    • Advanced development status (still research tool)
    • User guided synthesis
    • Open library
    • Active research team: memory optimization, communication synthesis
  • Drawbacks

    • Loop flattening (complexity problem)
    • Predefined timing characteristics
    • Hard to get out of 1D signal processing


Ugh: User Guided High Level Synthesis

  • Developed at LIP6 (Paris), as part of the Disydent project (Digital System Design Environment): open source

  • Behavioral level synthesis tool for control-dominated coprocessors

  • Emphasis on precise timing estimation

  • Kernel technology: resource-constrained scheduling and (GNU-like) compiler construction technology

  • Inputs are

      • a C or VHDL behavioral description with KPN communication primitives
      • a draft data-path
      • a cycle time constraint TC
  • Outputs are

      • a synthesizable RTL VHDL model
      • a cycle accurate simulation model


Coprocessor System Environment



UGH Structure



Input 1 : UGH-C

    library IEEE;
    use IEEE.std_logic_arith.all;

    entity HCF is
      port (CK    : in  bit;
            DINA  : in  integer;
            READA : out bit;
            ROKA  : in  bit;
            DINB  : in  integer;
            READB : out bit;
            ROKB  : in  bit;
            DOUT  : out integer;
            WRITE : out bit;
            WOK   : in  bit);
    end HCF;
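The behavioral code that goes with this interface is not reproduced here. As a rough illustration of the kind of control-dominated behavior such a coprocessor could implement, here is a sketch that assumes HCF stands for "highest common factor" and that the A, B and output ports are FIFO-style channels. Both points are assumptions, as are the function name and the use of Python queues to model the channels. Inputs must be strictly positive integers, otherwise the loop does not terminate.

```python
from collections import deque


def hcf_step(fifo_a, fifo_b, fifo_out):
    """Read one value from each input channel and write their HCF."""
    a = fifo_a.popleft()  # models a blocking read on input channel A
    b = fifo_b.popleft()  # models a blocking read on input channel B
    while a != b:         # Euclid by subtraction: comparator + subtractor only
        if a > b:
            a -= b
        else:
            b -= a
    fifo_out.append(a)    # models a blocking write on the output channel


inputs_a, inputs_b, outputs = deque([12, 35]), deque([18, 14]), deque()
hcf_step(inputs_a, inputs_b, outputs)
hcf_step(inputs_a, inputs_b, outputs)
print(list(outputs))
```

The data-dependent `while` loop is typical of what an FSM+data-path target handles well: the number of cycles is not known statically, so the control automaton has to test a condition computed by the data path.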



Input 2 : Draft Data-path



OUTPUT 1 : Refined Data path



OUTPUT 2 : FSM for control



Ugh summary

  • Advantages

    • Precise timing information
    • Multi-cycle operations
    • Almost a compiler approach (restricted target architecture)
    • Interfacing (Integrated in a SoC design environment)
  • Drawbacks

    • Development status (research tool)
    • Low level information given by the user
    • Highly dependent on a commercial tool (Synopsys)
    • Dedicated to control oriented applications


MMAlpha

  • Developed at Irisa (Rennes): open source

  • High level synthesis of highly pipelined accelerators

  • Kernel technology: polyhedral model and systolic design methodology

  • Emphasis on loop transformations

  • Input :

    • functional specification (Alpha language)
  • Output :

    • RTL description of systolic-like architecture (Alpha or VHDL)


MMAlpha design flow



What is the polyhedral model?

  • Abstracts a loop nest as the polyhedron described by the values taken by the loop indices during execution of the loop

  • Can be used for any index-based structure : memory (arrays), communications (accesses), etc…

  • example: convolution (FIR filter)
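For the FIR example, the idea can be made concrete with a small sketch: the iteration domain is described by affine inequalities on the indices (n, k), and candidate schedules and processor allocations are affine functions of those indices. Everything specific below is an assumption for illustration: the sizes N and K, the extra constraint n - k >= 0, the schedule t = n + k, and the allocation p = k. The code only enumerates and checks; it is not how MMAlpha itself works.

```python
from itertools import product

N, K = 4, 3  # hypothetical sizes: N output samples, K filter taps

# Each constraint (a, b, c) means a*n + b*k + c >= 0.
constraints = [
    (1, 0, 0),       # n >= 0
    (-1, 0, N - 1),  # n <= N - 1
    (0, 1, 0),       # k >= 0
    (0, -1, K - 1),  # k <= K - 1
    (1, -1, 0),      # n - k >= 0, so that x(n - k) exists
]


def integer_points(constraints, box):
    """Integer points of the polyhedron, found by scanning a bounding box."""
    return [(n, k) for n, k in product(box, box)
            if all(a * n + b * k + c >= 0 for a, b, c in constraints)]


def schedule(n, k):
    return n + k  # candidate affine schedule


def processor(n, k):
    return k      # candidate allocation: one cell per filter tap


domain = integer_points(constraints, range(-1, max(N, K) + 1))
domain_set = set(domain)

# The accumulation y(n, k) = y(n, k - 1) + h(k) * x(n - k) makes point
# (n, k) depend on (n, k - 1): it must be scheduled at least one step later.
schedule_is_valid = all(
    schedule(n, k) >= schedule(n, k - 1) + 1
    for n, k in domain if (n, k - 1) in domain_set
)

# No two computations may use the same processor at the same time.
space_time = {(schedule(n, k), processor(n, k)) for n, k in domain}
no_conflict = len(space_time) == len(domain)

print(len(domain), schedule_is_valid, no_conflict)
```

Because the domain, the dependence, and the mapping are all affine, the same checks can in principle be done symbolically for parametric N and K, which is what makes the model attractive for parameterised designs.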



FIR: iteration space



FIR polyhedral representation (MMAlpha input language)



MMAlpha polyhedral scheduling



MMAlpha space time transformation



MMAlpha mapping



MMAlpha resulting architecture



MMAlpha current features

  • Tool box for designers:

    • Powerful analysis tools
    • Pipelining, change of basis, multi-dimensional scheduling, control signal generation.
    • Code generation (C, VHDL)
    • Hierarchical design methodology
  • Work in progress:

    • Resource-constrained scheduling (extension to Z-polyhedra)
    • Multi-dimensional scheduling and memory synthesis


MMAlpha summary

  • Advantages

    • Design tool integrating loop transformation
    • Parameterised design (N: size of the filter not fixed until VHDL generation)
    • Formal approach for refinement (functional to operational)
    • A real language that syntactically captures HLS input restriction
  • Drawbacks

    • Does not yet handle resource constraints
    • A language (Alpha) and design methodology very different from designers’ habits
    • Implementation status (research tool)


Some Design results

  • Ugh compares IDCT with CoWare and Gaut but the results are highly dependent upon design parameters

  • MMAlpha demonstrates a real implementation on an FPGA co-processor board (DLMS algorithm)



Outline

  • Context: Why High level synthesis?

  • HLS Hard problems

  • Some solutions in existing tools

  • Conclusion and on-going projects



HLS conclusion

  • HLS tools are not mature enough to produce the famous « C-to-VHDL » magic tool

  • Most tool designers agree that a highly « user guided » approach is mandatory

  • CAD tool vendors are still actively developing tools (Mentor: Catapult-C, CoWare: Cocentric, ...)

  • Some progress has been made

    • Domain specific constraints are more clearly identified (control oriented or data flow)
    • Interfacing is studied together with the synthesis
    • Fast simulation is an important issue addressed by HLS tools


On-going project: Data-Flow IP interface

  • Gaut (Lester) and MMAlpha (Irisa, Lip) are developing a common interface for their IPs (data-flow IPs)



On-going project: SocLib

  • SocLib environment

    • Public domain SystemC simulation models for SoC IP:
      • Cycle-accurate hardware simulation
      • TLM simulation
    • VCI interconnection standard
    • French open academic initiative (should become European through EuroSoc): http://soclib.lip6.fr/
  • Typical platform:



On-going project: Loop transformation for compilation

  • Unified loop nest transformation framework for optimization of compute/data intensive programs (Alchemy Inria project: http://www-rocq.inria.fr/~acohen/software.html).

  • WRaP-IT: an Open-64/ORC interface tool



Thanks

  • Slides prepared with help from Lester and LIP6

  • Here are some tools I did not talk about: Amical, Cathedral, High2, RapidPath, Flash, A/RT, Compaan, Syndex, Phideo, Bach, SPARK, CriticalBlue, Chinook, SCE, CodeSign, Esterel, precisionC, Polis, Atomium, Ptolemy, Handel-C, Cyber, Bridge, MCSE, Madeo, SpecC, and many more...



