Some Trends in High-level Synthesis Research Tools Tanguy Risset
Tanguy Risset Compsys, Lip, ENS-Lyon http://www.ens-lyon.fr/COMPSYS
Outline
- Context: why high-level synthesis?
- HLS hard problems
- Some solutions in existing tools
- Some on-going projects
Context: Embedded Computing Systems design
SoCs or MPSoCs for multimedia applications will soon include:
- A network on chip
- Dozens of initiators (CPU, DMA, …)
- Megabytes of code
- Operating systems
- Shared-memory coherency protocols
- …
SoC design problems:
- Time to market
- Design space exploration
- Software complexity
Some envisaged solutions
- Time to market
  - IP re-use
  - High-level design
- Design space exploration
  - Fast prototyping and performance evaluation, refinement methodology (specification, algorithm, TLM, CABA)
- Software complexity
  - Tools for embedded code generation / embedded OS
High-level synthesis is only a small part of the « High level Design » process.
Definition of High Level Synthesis
HLS: generates a register-transfer level (RTL) description from a behavioral specification, in an automatic or semi-automatic way.
- Input:
- Output:
  - RTL description
  - Performance evaluations
Refinement : from algorithm to hardware
Abstraction levels for HLS
- AL = Algorithm
  - prior to HW/SW partition
- TLM = Transaction-Level Model
  - after HW/SW partition
  - models bit-true behavior, register bank, data transfers, system synchronisation
  - no timing needed
- T-TLM = Timed TLM (also PVT)
  - TLM + timing annotation
  - refined communication model
- CABA = Cycle Accurate-Bit Accurate
  - models state at each clock edge
- RT = Register Transfer (ASIC flow entry point)
  - synthesisable model
Pros and cons
« Traditional » motivations:
- Fast design
- Safe design: formal refinement approach
- « Must be used » to cope with Moore's law
But!
- Commercial tools are not here yet
- A new tool is a big investment
- Designers have managed without it
New motivations?
- IP re-use
  - Slightly change design parameters to re-use an IP
- New target technologies and languages (FPGA, SystemC, etc.)
  - Tools can easily re-target the designs
- CAD tool companies are investing a lot in « high level-like » synthesis tools
  - Technological advantage
  - Traditional RTL design will be de-localized to Asia
HLS hard problems
- Huge design space
  - Complex design space exploration
  - Multi-criteria optimization techniques
- Integration into a design environment
  - Lack of a standard interchange format
  - SoC simulation time is a crucial issue
- Acceptance by the designers
  - Find a language common to SoC designers and tool designers
- Refinement technical problems
HLS technical problems
Compilation occurs when the target architecture is precisely known. In HLS, the target architecture is only partially specified. Examples:
- Data-flow architecture / systolic arrays: pure RTL description
- FSM + data path: closer to a processor description
HLS technical problems:
- Initial specification format / language
- Specification refinement: fixed point arithmetic
- Scheduling/mapping refinement: resource constraints
- Technological mapping refinement
Initial specification format
- Restrictions on the expressivity of the input language are necessary … but designers hate new languages
- C-like languages (Handel-C, silicon-C, hardware-C, etc.) are actually hardware description languages
- Main problems:
  - How to express parallelism and sequentiality: data-flow, CSP-like, process network, event-driven
  - How to express both algorithmic and RTL descriptions
  - How much expressivity
  - How to introduce constraints/hints
Fixed point arithmetic
- Problem: translate a floating point computation into a fixed point computation
- Most tools start from an initial fixed point specification found by extensive simulation
- Automatic techniques do not handle loops
- For signal processing applications, signal processing theory can help (the transfer function is used to compute the signal-to-noise ratio)
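To make the simulation-based approach concrete, here is a minimal sketch of float-to-fixed quantization and of measuring the resulting signal-to-quantization-noise ratio on a test signal. The function names (`to_fixed`, `to_float`, `sqnr_db`), the round-to-nearest and saturation policy, and the sine test signal are all assumptions made for illustration; they are not taken from any of the tools discussed here.

```python
import math

def to_fixed(x, frac_bits, word_bits):
    """Quantize x to a signed fixed point integer (round to nearest, saturate)."""
    scale = 1 << frac_bits
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    q = int(round(x * scale))
    return max(lo, min(hi, q))

def to_float(q, frac_bits):
    """Convert a fixed point integer back to a real value."""
    return q / (1 << frac_bits)

def sqnr_db(signal, frac_bits, word_bits):
    """Signal-to-quantization-noise ratio in dB, measured by simulation."""
    p_signal = sum(s * s for s in signal)
    p_noise = sum(
        (s - to_float(to_fixed(s, frac_bits, word_bits), frac_bits)) ** 2
        for s in signal
    )
    return float("inf") if p_noise == 0 else 10 * math.log10(p_signal / p_noise)

# Hypothetical test signal: one period of a sine wave of amplitude 0.9.
samples = [0.9 * math.sin(2 * math.pi * k / 64) for k in range(64)]
for frac_bits in (7, 11, 15):
    print(frac_bits, round(sqnr_db(samples, frac_bits, frac_bits + 1), 1))
```

Sweeping the word length this way is the brute-force counterpart of the analytical, transfer-function-based estimate mentioned above.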
Scheduling/Mapping
- Computations
  - For a « basic block », resource-constrained scheduling is NP-hard, but widely studied
  - Currently, two ways to handle loops:
    - Unroll them
    - Keep them sequential
  - Other solutions:
    - Use software pipelining theory
    - Use the polyhedral model
- Memory mapping is usually strongly guided by the user
  - Highly active research field (Catthoor, Darte)
- Communication refinement is also an important issue
  - Highly dependent on the chosen computation model (Gajski, Kienhuis)
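Because the exact problem is NP-hard, tools typically rely on heuristics. The sketch below shows one classical heuristic, list scheduling, on a single basic block. It is only an illustration: the data layout, the one-cycle latency for every operation, the alphabetical priority rule, and the operation names (`m0`, `a1`, …) are assumptions, not a description of how Gaut or Ugh schedule.

```python
def list_schedule(ops, deps, resources):
    """Greedy list scheduling of one basic block.

    ops:       {operation name: functional unit type}
    deps:      {operation name: [predecessor names]} (must be acyclic)
    resources: {functional unit type: number of units available}
    Every operation is assumed to take exactly one cycle.
    Returns {operation name: start cycle}.
    """
    for name, unit in ops.items():
        if resources.get(unit, 0) < 1:
            raise ValueError(f"no '{unit}' unit available for {name}")
    start = {}
    cycle = 0
    while len(start) < len(ops):
        free = dict(resources)          # units still free in this cycle
        for name in sorted(ops):        # arbitrary but deterministic priority
            if name in start:
                continue
            ready = all(p in start and start[p] < cycle
                        for p in deps.get(name, []))
            if ready and free[ops[name]] > 0:
                start[name] = cycle
                free[ops[name]] -= 1
        cycle += 1
    return start

# Hypothetical 4-tap FIR body: y = ((m0 + m1) + m2) + m3
ops = {"m0": "mul", "m1": "mul", "m2": "mul", "m3": "mul",
       "a1": "add", "a2": "add", "a3": "add"}
deps = {"a1": ["m0", "m1"], "a2": ["a1", "m2"], "a3": ["a2", "m3"]}

one_mul = list_schedule(ops, deps, {"mul": 1, "add": 1})
two_mul = list_schedule(ops, deps, {"mul": 2, "add": 1})
print(one_mul["a3"] + 1, two_mul["a3"] + 1)  # schedule lengths in cycles
```

Changing the resource table and re-running is a small-scale version of the design space exploration a user performs with such tools.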
Technological mapping refinement
- Fine technological mapping is very target-dependent
- Predefined libraries are not precise enough
  - Delays on wires
  - Power consumption
- VLSI designers' « tricks » are difficult to integrate in tools
- Sub-micron technology constraints are changing too fast for high-level tools
Some solutions in existing tools
- Digital signal processing circuits
  - Gaut: http://lester.univ-ubs.fr:8080
    - Source: signal processing (one infinite loop)
    - Target: RTL + FSM
- FSM + data path
  - Ugh: http://www-asim.lip6.fr/recherche/disydent/
    - Source: restricted C
    - Target: FSM + data path
- Regular computation and polyhedral model
  - MMAlpha: http://www.irisa.fr/cosi/ALPHA/
    - Source: functional specification
    - Target: systolic-like architectures
GAUT: Génération Automatique d'Unité de Traitement (automatic generation of processing units)
- Developed first at LASTI (Lannion) and then at LESTER (Lorient): free
- Generates an RTL description from a behavioral description of a signal processing algorithm
- Kernel technology: highly optimized resource-constrained scheduling
- Inputs are
- Outputs are
  - a synthesizable RTL VHDL description (data path, memory, and communication units)
  - a Gantt chart for I/O specification
Gaut: VHDL input code
Sequential instructions in a single process (no clock, no reset, no sensitivity list)
Gaut: input code
- Types
  - Bit, boolean, std_logic, Integer (single size), Bit_Vector, Std_Logic_Vector
  - Arrays (to be inlined)
- Sequential instructions
  - Signal and variable assignments
  - Only one level of if
  - For and While loops (to be inlined)
  - Procedure calls (to be inlined)
  - Function calls corresponding to library elements
Gaut step 1: Source code transformation
Control dependence elimination

Before:
    y(0) := x(0) * h(0);
    for i in 1 to n - 1 loop
        y(i) := y(i - 1) + x(i) * h(i);
    end loop;
After:
    y(0) := x(0) * h(0);
    y(1) := y(1 - 1) + x(1) * h(1);
    y(2) := y(2 - 1) + x(2) * h(2);
    y(3) := y(3 - 1) + x(3) * h(3);

Before:
    b := x + z;
    a := b + c;
    b := e + f;
    y := b;
After:
    b := x + z;
    a := b + c;
    b0001 := e + f;
    y := b0001;
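The two rewrites on this slide, full loop unrolling and renaming of re-assigned variables, can be sketched in a few lines. This is not Gaut's implementation: the statement representation (a target plus a list of operands), the function names, and the four-digit suffix rule (chosen only to reproduce the `b0001` name seen above) are assumptions made for illustration.

```python
def unroll_fir(n):
    """Fully unroll 'for i in 1 to n-1: y(i) := y(i-1) + x(i)*h(i)'.

    Each statement is a (target, operands) pair; operators are omitted
    because only the data dependences matter here.
    """
    stmts = [("y(0)", ["x(0)", "h(0)"])]
    for i in range(1, n):
        stmts.append((f"y({i})", [f"y({i - 1})", f"x({i})", f"h({i})"]))
    return stmts

def rename_reassigned(stmts):
    """Give a fresh name to every re-assignment of a variable.

    Later reads are redirected to the most recent name, so each name
    ends up being written exactly once.
    """
    current = {}   # original name -> name that currently holds its value
    count = {}     # original name -> number of assignments seen so far
    out = []
    for target, operands in stmts:
        reads = [current.get(v, v) for v in operands]
        count[target] = count.get(target, 0) + 1
        if count[target] == 1:
            new = target
        else:
            new = f"{target}{count[target] - 1:04d}"
        current[target] = new
        out.append((new, reads))
    return out

block = [("b", ["x", "z"]), ("a", ["b", "c"]), ("b", ["e", "f"]), ("y", ["b"])]
for stmt in rename_reassigned(block):
    print(stmt)
for stmt in unroll_fir(4):
    print(stmt)
```

After both steps every value has a single producer, which is what allows the program to be turned into a data-flow graph.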
Gaut step 1: Source code transformation
- Simple expression generation
    b := x + z * u;
  becomes
    tmp := z * u;
    b := x + tmp;
- Constant propagation
- Generation of the GC graph (data-flow graph format of synchronous programming)
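Simple expression generation amounts to producing statements with one operator each. The sketch below does this for a single assignment, using Python's standard `ast` module as a stand-in parser, so the input uses Python's `=` rather than VHDL's `:=`. The function name `flatten` and the `tmp1`, `tmp2`, … naming are assumptions for illustration only.

```python
import ast

OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}

def flatten(assignment):
    """Split e.g. 'b = x + z * u' into statements with one operator each."""
    stmt = ast.parse(assignment).body[0]
    if not isinstance(stmt, ast.Assign) or not isinstance(stmt.targets[0], ast.Name):
        raise ValueError("expected a simple assignment to a name")
    out = []
    counter = 0

    def visit(node):
        """Return the name holding the value of node, emitting temporaries."""
        nonlocal counter
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            left, right = visit(node.left), visit(node.right)
            counter += 1
            tmp = f"tmp{counter}"
            out.append(f"{tmp} = {left} {OPS[type(node.op)]} {right}")
            return tmp
        raise ValueError("unsupported expression")

    target, value = stmt.targets[0].id, stmt.value
    if isinstance(value, ast.BinOp) and type(value.op) in OPS:
        # The outermost operation writes the target directly.
        left, right = visit(value.left), visit(value.right)
        out.append(f"{target} = {left} {OPS[type(value.op)]} {right}")
    else:
        out.append(f"{target} = {visit(value)}")
    return out

for line in flatten("b = x + z * u"):
    print(line)
```

Each resulting statement corresponds to one operation node of the data-flow graph generated afterwards.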
GAUT step 2: Scheduling/Mapping
In addition to throughput and clock cycle, the user can give:
- Resource constraints and mapping constraints
- Memory constraints
- I/O constraints
- Optimization type
The result is an architecture and Gantt charts:
- For computations
- For I/O
- For memory
Gaut step 3: memory and communication synthesis
Optimizing memory layout and minimizing buses
Gaut: summary
- Advantages
  - Advanced development status (still a research tool)
  - User-guided synthesis
  - Open library
  - Active research team: memory optimization, communication synthesis
- Drawbacks
  - Loop flattening (complexity problem)
  - Predefined timing characteristics
  - Hard to get out of 1D signal processing
Ugh: User Guided High Level Synthesis
- Developed at LIP6 (Paris), as part of the Disydent project (Digital System Design Environment): open source
- Behavioral-level synthesis tool for control-dominated coprocessors
- Emphasis on precise timing estimation
- Kernel technology: resource-constrained scheduling and (GNU-like) compiler construction technology
- Inputs are
  - a C or VHDL behavioral description with KPN communication primitives
  - a draft data path
  - a cycle time constraint TC
- Outputs are
Coprocessor System Environment
UGH Structure
Input 1: UGH-C
    Library IEEE;
    Use ieee.std_logic_arith.all;
    entity HCF is
      port (CK    : in bit;
            DINA  : in integer;
            READA : out bit;
            ROKA  : in bit;
            DINB  : in integer;
            READB : out bit;
            ROKB  : in bit;
            DOUT  : out integer;
            WRITE : out bit;
            WOK   : in bit);
    end HCF;
Input 2 : Draft Data-path
OUTPUT 1 : Refined Data path
OUTPUT 2 : FSM for control
Ugh summary
- Advantages
  - Precise timing information
  - Multi-cycle operations
  - Almost a compiler approach (restricted target architecture)
  - Interfacing (integrated in a SoC design environment)
- Drawbacks
  - Development status (research tool)
  - Low-level information must be given by the user
  - Highly dependent on a commercial tool (Synopsys)
  - Dedicated to control-oriented applications
MMAlpha
- Developed at Irisa (Rennes): open source
- High-level synthesis of highly pipelined accelerators
- Kernel technology: polyhedral model and systolic design methodology
- Emphasis on loop transformations
- Input: functional specification (Alpha language)
- Output: RTL description of a systolic-like architecture (Alpha or VHDL)
MMAlpha design flow
What is the polyhedral model?
- Abstracts a loop nest by the polyhedron described by the loop indices during execution of the loop
- Can be used for any index-based structure: memory (arrays), communications (accesses), etc.
- Example: convolution (FIR filter)
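As a small illustration of the idea, the sketch below describes the iteration domain of a FIR filter y(i) = sum over j of h(j) * x(i - j) by affine inequalities and then runs the computation over the integer points of that domain. The constraint encoding (triples `(a, b, c)` meaning `a*i + b*j + c >= 0`), the function names, and the boundary convention `i - j >= 0` are assumptions for illustration; this is Python, not the Alpha language.

```python
from itertools import product

def domain_points(constraints, box):
    """Integer points (i, j) inside the box that satisfy a*i + b*j + c >= 0
    for every constraint (a, b, c)."""
    return [(i, j)
            for i, j in product(range(box[0]), range(box[1]))
            if all(a * i + b * j + c >= 0 for a, b, c in constraints)]

def fir_domain(m, n):
    """Iteration domain of the FIR loop nest for m outputs and n taps."""
    constraints = [
        (1, 0, 0),        # i >= 0
        (-1, 0, m - 1),   # i <= m - 1
        (0, 1, 0),        # j >= 0
        (0, -1, n - 1),   # j <= n - 1
        (1, -1, 0),       # i - j >= 0, so that x(i - j) is defined
    ]
    return domain_points(constraints, (m, n))

def fir(x, h):
    """Evaluate the FIR by visiting every point of its iteration domain."""
    y = [0] * len(x)
    for i, j in fir_domain(len(x), len(h)):
        y[i] += h[j] * x[i - j]
    return y

print(fir_domain(4, 3))
print(fir([1, 2, 3, 4], [1, 1, 1]))
```

Because the domain is a set of affine constraints rather than an explicit loop, its size parameters can stay symbolic until late in the flow, which is the property the parameterised designs in MMAlpha rely on.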
FIR: iteration space
FIR polyhedral representation (MMAlpha input language)
MMAlpha space time transformation
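A space-time transformation assigns to each iteration point (i, j) a time step and a processor. The sketch below checks a candidate linear schedule against a set of dependence vectors and applies the mapping to a small FIR-like domain. The dependence vectors, the schedule t = i + j, and the allocation p = j are assumptions chosen for illustration (the slide's figure is not available in this transcript), and the function names are hypothetical.

```python
def is_valid_schedule(lam, deps):
    """A linear schedule t = lam . (i, j) is valid if every dependence
    vector is delayed by at least one time step."""
    return all(lam[0] * di + lam[1] * dj >= 1 for di, dj in deps)

def space_time(points, lam, alloc):
    """Map each point (i, j) to (time, processor)."""
    return {(i, j): (lam[0] * i + lam[1] * j, alloc[0] * i + alloc[1] * j)
            for i, j in points}

# Assumed dependences for a pipelined FIR: accumulation along j,
# propagation of coefficients along i, propagation of samples along (1, 1).
deps = [(0, 1), (1, 0), (1, 1)]
points = [(i, j) for i in range(4) for j in range(3)]

print(is_valid_schedule((1, 1), deps))   # candidate schedule t = i + j
print(is_valid_schedule((1, 0), deps))   # t = i ignores the accumulation order
mapping = space_time(points, lam=(1, 1), alloc=(0, 1))
print(mapping[(2, 1)])
```

With this choice each tap j gets its own processing element and no two iterations share the same (time, processor) slot, which is the shape of a systolic array.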
MMAlpha mapping
MMAlpha resulting architecture
MMAlpha current features
- Tool box for designers:
  - Powerful analysis tools
  - Pipelining, change of basis, multi-dimensional scheduling, control signal generation
  - Code generation (C, VHDL)
  - Hierarchical design methodology
- Work in progress:
  - Resource-constrained scheduling (extension to Z-polyhedra)
  - Multi-dimensional scheduling and memory synthesis
MMAlpha summary
- Advantages
  - Design tool integrating loop transformations
  - Parameterised design (N, the size of the filter, is not fixed until VHDL generation)
  - Formal approach for refinement (functional to operational)
  - A real language that syntactically captures HLS input restrictions
- Drawbacks
  - Does not yet handle resource constraints
  - A language (Alpha) and design methodology very different from designers' habits
  - Implementation status (research tool)
Some design results
- Ugh compares IDCT with CoWare and Gaut, but the results are highly dependent upon design parameters
- MMAlpha demonstrates a real implementation on an FPGA co-processor board (DLMS algorithm)
Outline
- Context: why high-level synthesis?
- HLS hard problems
- Some solutions in existing tools
- Conclusion and on-going projects
HLS conclusion
- HLS tools are not mature enough to produce the famous « C-to-VHDL » magic tool
- Most tool designers agree that a highly « user guided » approach is mandatory
- CAD tool companies are still actively developing tools (Mentor: Catapult-C, CoWare: Cocentric, …)
- Domain-specific constraints are more clearly identified (control-oriented or data-flow)
- Interfacing is studied together with the synthesis
- Fast simulation is an important issue addressed by HLS tools
On-going project: Data-flow IP interface
Gaut (Lester) and MMAlpha (Irisa, Lip) are developing a common interface for their IPs (data-flow IPs)
On-going project: SocLib
- SocLib environment
  - Public-domain SystemC simulation models for SoC IPs:
    - Cycle-accurate hardware simulation
    - TLM simulation
  - VCI interconnection standard
  - French open academic initiative (should become European through EuroSoc): http://soclib.lip6.fr/
- Typical platform:
On-going project: Loop transformation for compilation
- Unified loop nest transformation framework for the optimization of compute/data-intensive programs (Alchemy Inria project: http://www-rocq.inria.fr/~acohen/software.html)
- WRaP-IT: an Open-64/ORC interface tool
Thanks
Slides with help from Lester, LIP6.
Here are some tools I did not talk about: Amical, Cathedral, High2, RapidPath, Flash, A/RT, Compaan, Syndex, Phideo, Bach, SPARK, CriticalBlue, Chinook, SCE, CodeSign, Esterel, precisionC, Polis, Atomium, Ptolemy, Handel-C, Cyber, Bridge, MCSE, Madeo, SpecC, and many more…