This project is based on YosysHQ/icestorm, which was published and announced in 2015:
Project IceStorm aims at documenting the bitstream format of Lattice iCE40 FPGAs and providing simple tools for analyzing and creating bitstream files. (…) The focus of the project is on the iCE40 LP/HX 1K/4K/8K chips. Most of the work was done on HX1K-TQ144 and HX8K-CT256 parts.
Later, support for devices of Lattice’s iCE40 Ultra Plus family was added. Moreover, Lattice embraced the open source community by providing a list of Community Sourced development boards. Today, the list of available boards with iCE40 FPGAs supported by open source synthesis tools comprises, but is not limited to: iCEstick Evaluation Kit, iCE40-HX8K Breakout Board, iCEBreaker FPGA, ICESugar, icoBOARD 1.0, Kéfir I iCE40-HX4K, Nandland Go board, myStorm, BlackIce, eCow-Logic, TinyFPGA… See hdl.github.io/awesome/boards.
This site documents existing examples, bootloaders and references about dynamic reconfiguration through the cold/warm boot feature available in iCE40 FPGAs.
This article, examples and tool enhancements are the joint effort of several people.
In 2015, Claire (@clairexen) released Icestorm, including icemulti.
In March 2017, Juanma (@juanmard) and Unai (@umarcor) met and the latter introduced the former to the concept of (partial) dynamic reconfiguration in FPGAs (using high-end devices as a market reference). Juanma was eager to know more about it, so Unai analysed the configuration options in the iCE40 family manuals and datasheets, and then explained the possibilities that warm/cold boot provided. Over a couple of months, they kept up a continuous exchange of feedback. Juanma modified IceStorm’s icemulti and iceprog for prototyping and proving the features, while Unai proposed solutions for extending the scope beyond four addressable images (see Hypothesis). Juanma published a demo, and enhancements to icemulti and iceprog were made available in juanmard/icestorm.
In January 2018, independently, Luke (@tinyfpga) developed the TinyFPGA-Bootloader. It implemented a USB to SPI core for using the reset image as a passthrough for programming the flash memory. Therefore, although warm/cold boot are not explicitly mentioned in the description, it was, as far as we are aware, the first open source and documented practical use of the feature.
In December 2018, Luke helped Tim (@mithro) and Sean (@xobs) implement im-tomu/foboot. It was also based on loading a bootloader upon reset for allowing programming a user image in one of the cold/warm boot addresses. However, due to size constraints, it’s a completely different implementation based on a soft core.
In May 2021, Sylvain (@sylefeb) and Bruno (@brunolevy01) were discussing SPI-flash difficulties on Twitter, when Unai jumped in and let them know about the existing work done together with Juanma, as well as the similarities with both the TinyFPGA and FOMU bootloaders. As had happened four years earlier, Sylvain got excited about the feature, and a productive exchange of feedback developed between him and Juanma. Sylvain implemented a demo and a tutorial about warmboot. Within a couple of days, he implemented the first actual demo using more than four images (eight, precisely): Dynamic warm boot on the ice40, proof of concept.
Overall, this document is an attempt at gathering the information that all those projects have in common (from a theoretical/technical point of view) and for linking to all the specific implementations and examples.
Introduction to iCE40 configuration modes
Since iCE40 FPGAs are SRAM-based, and thus volatile, it is common practice to include a flash memory in any board design. It is typically used for automatically loading a configuration on power-up through SPI. As a result, it is common for FPGA devices to include hard IP cores implementing SPI controllers. Furthermore, according to TN1248: iCE40 Programming and Configuration, the hard SPI cores in the iCE40 devices support more than the master mode required for loading from flash: in slave mode, "the iCE40 configuration data can be downloaded from an external processor, microcontroller, or DSP processor using the SPI interface".
On top of that, some devices support so-called Cold Boot and/or Warm Boot configuration options. That allows writing up to four addressable images/bitstreams to the flash memory, so that any of them can be loaded afterwards, without requiring any additional external communication. That is known as Dynamic Reconfiguration in the FPGA community.
There is another configuration mode: the one-time programmable NVCM (Non-Volatile Configuration Memory). That is, naturally, out of the scope of this project.
Introduction to cold/warm boot
To avoid mixing terms, image is used for referring to the bitstream corresponding to a single design, and pack relates to multiple images packed in a single bitstream.
Since most of the boards are based on the iCEstick, its design is the reference for the tests explained below. As shown in the iCEstick User Manual, the FPGA, the Flash memory and the FTDI chip (which is a processor) are all connected to the same SPI bus:
FTDI is always a master. Writes/reads to/from the FPGA or the Flash.
Flash memory is always a slave. It is read from the FTDI or the FPGA.
FPGA is master/slave depending on the configuration mode.
In slave mode, it is written by the FTDI.
In master mode, data is read from the Flash.
On top of that, the FTDI chip controls the programming reset signal of the FPGA.
After exiting the Power-On Reset (POR) state or when CRESET_B returns High after being held Low, the iCE40 device samples the logical value on its SPI_SS_B pin.
If the SPI_SS_B pin is sampled as a logic ‘1’ (High), then …
If enabled to configure from NVCM, the device configures itself using the Nonvolatile Configuration Memory (NVCM).
If not enabled to configure from NVCM, then the device configures using the SPI Master Configuration Interface.
If the SPI_SS_B pin is sampled as a logic ‘0’ (Low), then the device waits to be configured from an external controller or from another device in SPI Master Configuration Mode using an SPI-like interface.
TN1248, pp 3-4
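The decision quoted above can be summarised as a small truth table. The following Python sketch only models the quoted behaviour; the names `ConfigMode` and `select_config_mode` are hypothetical and do not correspond to any tool in the IceStorm toolchain.

```python
from enum import Enum


class ConfigMode(Enum):
    """Configuration sources described in the TN1248 excerpt (names are illustrative)."""
    NVCM = "configure from NVCM"
    SPI_MASTER = "SPI master: read the image from the external flash"
    SPI_SLAVE = "SPI slave: wait for an external controller"


def select_config_mode(spi_ss_b_high: bool, nvcm_enabled: bool) -> ConfigMode:
    """Model the choice made after POR or after CRESET_B returns High."""
    if not spi_ss_b_high:
        # SPI_SS_B sampled Low: the device waits to be configured externally.
        return ConfigMode.SPI_SLAVE
    # SPI_SS_B sampled High: NVCM if enabled, otherwise load from flash.
    return ConfigMode.NVCM if nvcm_enabled else ConfigMode.SPI_MASTER


if __name__ == "__main__":
    for ss in (True, False):
        for nvcm in (True, False):
            print(f"SPI_SS_B={'1' if ss else '0'} NVCM={nvcm}: "
                  f"{select_config_mode(ss, nvcm).value}")
```

On boards such as the iCEstick, the two SPI branches are the relevant ones, since NVCM is out of the scope of this project.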
Therefore a single bitstream can be directly loaded to the FPGA with:
That is, in the iCEstick and similar boards, the FTDI resets the FPGA by asserting
CRESET_B and lets it power up in
slave SPI mode by keeping
Then, the image is written to the SRAM directly.
The flash memory ignores any command, because asserting the communication to the FPGA disables the memory’s chip select.
This is explained in detail in [TN1248, pp 17-20].
However, with the option above, the FPGA will lose its functionality as soon as it is powered off. To avoid that, the following command can be used instead:
This time, the FTDI explicitly holds the FPGA in reset state and asserts the chip select signal of the flash memory. Then, the image is written to the flash memory. When the transfer is complete, the reset is released and the FPGA powers up in master SPI mode. Therefore, the image that was just written is loaded from the flash memory. For more information check [TN1248, pp 10-13].
TODO: to pos 0? Is an applet added?
Coarse understanding of the bitstream format
Instead of thoroughly analyzing the details of the format, which is explained at bygone.clairexen.net/icestorm/format, a naive approach was followed. Four bitstreams generated with Yosys and nextpnr were analyzed:
Checking the size reveals that all of the images require the same number of bytes: 31.4 KB (32220 bytes), although 32KB are required on disk.
A hexadecimal dump of the images reveals that, as expected, the first eight bytes are the same:
$ hexdump -C img01_counter8.bin > img01.dump
$ hexdump -C img02_blink.bin > img02.dump
$ hexdump -C img03_led_on.bin > img03.dump
$ hexdump -C img04_pushbutton_and.bin > img04.dump

00000000 ff 00 00 ff 7e aa 99 7e 51 00 01 05 92 00 20 62
00000010 01 4b 72 00 90 82 00 00 11 00 01 01 00 00 00 00
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Actually, in the example images at least the first 32 bytes are the same. However, that may change if more heterogeneous sets of designs are used.
As it is explained below, that is the applet, a table/index of addressable images.
Packing up to four images
In [TN1248, pp 14-15] the Cold Boot Configuration Option is explained. The procedure is roughly the same as the second one explained in the single bitstream example above, but up to four images can be written at the same time. The advantage is that this allows the user to later change from one image to another without requiring an external processor for transferring it.
To support such a feature, an applet is written to the first addresses of the flash. Then, when the cold boot option is enabled:
(…) the iCE40 FPGA boots normally from power-on or a master reset (CRESET_B = Low pulse), but monitors the value on two PIO pins that are borrowed during configuration (…). These pins, labeled PIO2/CBSEL0 and PIO2/CBSEL1, tell the FPGA which of the four possible SPI configurations to load into the device.
(…) If the applet is written, but the cold boot option is disabled:
(…) the FPGA configuration starts from the default location (image 0) defined in the Cold/Warm Boot applet.
TN1248, pp 3-4
Actually, five images can be addressed, since there is a fifth one identified as the power-on reset image.
Packing images is achieved with a tool named icemulti from the IceStorm toolchain.
Usage: icemulti [options] input-files
    -c
        coldboot mode, power on reset image is selected by CBSEL0/CBSEL1
    -p0, -p1, -p2, -p3
        select power on reset image when not using coldboot mode
    -a<n>, -A<n>
        align images at 2^<n> bytes. -A also aligns image 0.
    -o filename
        write output image to file instead of stdout
    -v
        verbose (repeat to increase verbosity)
For example, to program four images at a time, by setting the first one as the default and not enabling cold boot:
$ icemulti -p0 -o pack_cp0.bin img01_counter8.bin img02_blink.bin img03_pushbutton_and.bin img04_led_on.bin
$ iceprog pack_cp0.bin
init..
cdone: high
reset..
cdone: low
flash ID: 0x20 0xBA 0x16 0x10 0x00 0x00 0x23 0x54 0x82 0x46 0x06 0x00 0x56 0x00 0x29 0x19 0x01 0x16 0xA4 0xB5
file size: 130524
erase 64kB sector at 0x000000..
erase 64kB sector at 0x010000..
programming..
reading..
VERIFY OK
cdone: high
Bye.
The same pack can be generated with the second image as the default option by changing
When programming any of these packs, the transfer will take longer than in the single image example, because four full images are being written.
However, there will be no functional difference, since only the default image will be used by the FPGA.
This is a good starting point for understanding how packs are generated.
The size of both packs is the same: 127 KB (130524 bytes), on disk 128KB.
As done previously, a hexdump of one of the packs was generated.
Comparing it with the hexdump of a single image, the starting point of each image is easily found.
Indeed, looking for
ff 7e aa 99 7e is enough.
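The search described above can be sketched in a few lines of Python. This is an illustration only: `find_image_offsets` is a hypothetical helper, and it assumes (based on the dumps on this page) that each image starts with `ff 00 00 ff 7e aa 99 7e`, so the matched pattern begins three bytes after the start of the image. The same five bytes could, in principle, also occur inside configuration data, so the result should be cross-checked against the applet.

```python
# Pattern mentioned in the text; in the dumps it is preceded by "ff 00 00".
PATTERN = bytes.fromhex("ff7eaa997e")
LEAD_BYTES = 3  # assumption: bytes of the image header that precede the pattern


def find_image_offsets(pack: bytes) -> list[int]:
    """Return the presumed start offset of every image found in a pack."""
    offsets = []
    pos = pack.find(PATTERN)
    while pos != -1:
        offsets.append(max(pos - LEAD_BYTES, 0))
        pos = pack.find(PATTERN, pos + 1)
    return offsets


if __name__ == "__main__":
    import sys

    # Usage: python find_images.py pack_cp0.bin
    with open(sys.argv[1], "rb") as f:
        for offset in find_image_offsets(f.read()):
            print(f"0x{offset:06x}")
```

Note that the applet entries start with `7e aa 99 7e` without the leading `ff`, which is why they are not reported by this search.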
In the following block, only the most meaningful parts are shown:
00000000 7e aa 99 7e 92 00 00 44 03 00 01 00 82 00 00 01 00000010 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000020 7e aa 99 7e 92 00 00 44 03 00 01 00 82 00 00 01 00000030 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000040 7e aa 99 7e 92 00 00 44 03 00 80 00 82 00 00 01 00000050 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000060 7e aa 99 7e 92 00 00 44 03 01 00 00 82 00 00 01 00000070 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00000080 7e aa 99 7e 92 00 00 44 03 01 80 00 82 00 00 01 00000090 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 000000a0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff * 00000100 ff 00 00 ff 7e aa 99 7e 51 00 01 05 92 00 20 62 00000110 01 4b 72 00 90 82 00 00 11 00 01 01 00 00 00 00 * 00008000 ff 00 00 ff 7e aa 99 7e 51 00 01 05 92 00 20 62 00008010 01 4b 72 00 90 82 00 00 11 00 01 01 00 00 00 00 * 00010000 ff 00 00 ff 7e aa 99 7e 51 00 01 05 92 00 20 62 00010010 01 4b 72 00 90 82 00 00 11 00 01 01 00 00 00 00 * 00018000 ff 00 00 ff 7e aa 99 7e 51 00 01 05 92 00 20 62 00018010 01 4b 72 00 90 82 00 00 11 00 01 01 00 00 00 00
Then, we can derive the following memory map:
These addresses match columns 10-12 in lines
Hence, those three bytes tell the warm/cold boot feature where to load the bitstream from.
Furthermore, the address of img01 is also present in columns 10-12 at address 0x00000000.
That is the power-on reset image.
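To make the observation about the pointer bytes concrete, the sketch below reads the three address bytes of each applet entry from a pack. It is based only on the hexdump shown on this page: entries are assumed to be spaced 32 bytes apart, with the address stored big-endian in the tenth to twelfth byte of each entry. The helper name `read_applet_pointers` is hypothetical; see the icemulti sources for the authoritative layout.

```python
ENTRY_SIZE = 32   # spacing between applet entries in the dump (assumption)
ADDR_OFFSET = 9   # zero-based offset of columns 10-12 within an entry
NUM_ENTRIES = 5   # power-on reset entry followed by the four images


def read_applet_pointers(pack: bytes) -> list[int]:
    """Return the 24-bit addresses stored in the applet, in order."""
    pointers = []
    for i in range(NUM_ENTRIES):
        start = i * ENTRY_SIZE + ADDR_OFFSET
        pointers.append(int.from_bytes(pack[start:start + 3], "big"))
    return pointers


if __name__ == "__main__":
    import sys

    # Usage: python read_applet.py pack_cp0.bin
    with open(sys.argv[1], "rb") as f:
        applet = f.read(NUM_ENTRIES * ENTRY_SIZE)
    labels = ["power-on reset", "image 0", "image 1", "image 2", "image 3"]
    for label, addr in zip(labels, read_applet_pointers(applet)):
        print(f"{label:>14}: 0x{addr:06x}")
```

Applied to the dump above, the first entry and the entry at 0x20 both decode to 0x000100, which is where the first image starts.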
Every image except img01 is placed in a 32KB section, which makes sense if no compression is used at all.
The space for img01 is smaller, because of the applet.
However, since images require 32220 bytes, there are still 292 free bytes.
Indeed, there are 3*548+292=1936 free bytes between images in the space 0x00000000-0x0001FFFF, which can be used for user applications, even if cold boot is active.
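The arithmetic behind those figures can be checked with a short calculation. The sketch below uses only numbers stated on this page (32220-byte images, 32KB sections, first image at 0x100); the function name `free_bytes` is hypothetical.

```python
def free_bytes(image_size: int = 32220,
               slot_size: int = 0x8000,
               first_image_addr: int = 0x100,
               num_images: int = 4) -> tuple[int, int, int]:
    """Return (gap after the first image, gap after each other image, total)."""
    # The first section also holds the applet, so the image starts at 0x100.
    first_gap = slot_size - first_image_addr - image_size
    # Every other image starts at the beginning of its own section.
    other_gap = slot_size - image_size
    total = first_gap + (num_images - 1) * other_gap
    return first_gap, other_gap, total


if __name__ == "__main__":
    first_gap, other_gap, total = free_bytes()
    print(f"after img01: {first_gap} bytes")
    print(f"after each other image: {other_gap} bytes")
    print(f"total: {total} bytes")
```

The unused `ff` bytes between the end of the applet (0xA0) and the first image (0x100) are not counted here, matching the figure given in the text.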
Moreover, the hexdump of the second pack is equal to the previous one, except for the power-on reset image, which is set to img02 instead of img01. Actually, they differ in a single byte:
00000000 7e aa 99 7e 92 00 00 44 03 00 01 00 82 00 00 01 | pack_cp0.bin
00000000 7e aa 99 7e 92 00 00 44 03 00 80 00 82 00 00 01 | pack_cp1.bin
Therefore, even though vector addresses (4) are mentioned in [TN1248, Fig. 11], a single one is used when
is not passed to icemulti.
TODO: what’s the byte-difference between cold-boot active/inactive? See icemulti sources.
TODO: option "-c" to activate cold boot and select with CBSELx
Warm boot demo
The warm boot feature is functionally the same as the cold boot.
The same external memory layout is used.
The only difference is that warm boot is triggered from inside the FPGA.
That is, a hard module/component named SB_WARMBOOT needs to be instantiated in each of the designs which should change to some other under certain conditions.
It has two bits for selecting one of the four images, along with an additional bit for triggering the reboot/reload.
That replaces the external pins used for cold-boot.
Beyond four images/bitstreams
After diving into the existing documentation, and having performed some experiments, the contributors to this project realized that the cold/warm boot feature can be extended far beyond the limit of four (in)directly addressable images/bitstreams. For instance, ~128 images can fit in the 4MB (32Mb) flash included in the iCEstick.
The configuration defaults to reading pointers in fixed positions and directly jumping to them.
Three bytes (24 bits) are reserved for each pointer, so 0xFFFFFF is the largest value they can take.
As a result, up to floor((2^24-1)/2^15)=511 images can be addressed, if a memory of at least 16MB (128Mb) is used and 32KB are used for each image.
The size of the flash memory in the iCEstick is 4MB (32Mb), so up to floor((2^22-1)/2^15)=127 images can be addressed.
The expression for computing the address corresponding to an image in position x = 0,…,$number_of_images-1 is x==0 ? 0x000100 : x*0x8000, which matches the memory map derived above (img01 at 0x000100, img02 at 0x008000, and so on).
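As a sanity check of the numbers in this section, the following sketch computes image addresses and the maximum number of addressable images. It assumes the layout observed in the hexdump of the pack (first image at 0x000100, every other image at a multiple of 0x8000) and 24-bit pointers; `image_address` and `max_images` are hypothetical helper names.

```python
POINTER_BITS = 24  # three bytes are reserved for each pointer


def image_address(x: int, slot_size: int = 0x8000, first_addr: int = 0x100) -> int:
    """Address of the image in position x (0-based) within a pack."""
    if x < 0:
        raise ValueError("position must be non-negative")
    # Image 0 shares its section with the applet, hence the 0x100 offset.
    return first_addr if x == 0 else x * slot_size


def max_images(flash_bytes: int, slot_size: int = 0x8000) -> int:
    """Images that can be addressed, limited by flash size and pointer width."""
    addressable = min(flash_bytes, 1 << POINTER_BITS)
    return (addressable - 1) // slot_size


if __name__ == "__main__":
    print([hex(image_address(x)) for x in range(4)])
    print("iCEstick (4MB):", max_images(4 * 1024 * 1024))
    print("16MB flash:", max_images(16 * 1024 * 1024))
```

Flash memories larger than 16MB do not increase the count in this model, because the pointers cannot hold addresses above 0xFFFFFF.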
If images are appended without free space between them, slightly larger packs can fit:
Apparently, this extended memory map can not be addressed through
However, either the processor or a component in the FPGA can be used for updating just the pointers (applet), allowing
changing between groups of four images in the extended pack.
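A minimal sketch of that idea, operating on an in-memory copy of the applet only. It assumes the entry layout seen in the hexdump of the pack (32-byte entries, big-endian address at byte offset 9, power-on reset entry first) and a hypothetical grouping where group g holds images 4g to 4g+3 at 32KB boundaries. The names `set_applet_pointers` and `group_addresses` are illustrative; they are not part of icemulti or iceprog, and writing the result back to flash (including the required sector erase) is not covered.

```python
ENTRY_SIZE = 32    # spacing between applet entries (assumption from the dump)
ADDR_OFFSET = 9    # zero-based offset of the 3 address bytes in an entry
SLOT_SIZE = 0x8000


def group_addresses(group: int) -> list[int]:
    """Pointers for one group of four images; the first is also the POR image."""
    first = 4 * group
    images = [0x100 if x == 0 else x * SLOT_SIZE for x in range(first, first + 4)]
    return [images[0]] + images


def set_applet_pointers(applet: bytearray, addresses: list[int]) -> None:
    """Overwrite the five address fields of an applet buffer in place."""
    if len(addresses) != 5:
        raise ValueError("expected the power-on reset pointer plus four images")
    for i, addr in enumerate(addresses):
        if not 0 <= addr <= 0xFFFFFF:
            raise ValueError(f"address 0x{addr:x} does not fit in 24 bits")
        start = i * ENTRY_SIZE + ADDR_OFFSET
        applet[start:start + 3] = addr.to_bytes(3, "big")


if __name__ == "__main__":
    applet = bytearray(5 * ENTRY_SIZE)  # placeholder; normally read from the pack
    set_applet_pointers(applet, group_addresses(1))
    print(applet[ADDR_OFFSET:ADDR_OFFSET + 3].hex())  # pointer of the POR entry
```

Only 15 bytes change between groups, which is what makes updating the applet attractive compared to rewriting whole images.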
The cold/warm boot features of Lattice’s iCE40 FPGAs allow mimicking high-performance SoC designs which include programmable logic, such as Xilinx’s Zynq or Intel/Altera’s Arria/Cyclone. The main orchestrator in those systems is expected to be a CPU (either a PC or a microcontroller), which is already true for most of the available open source boards. Furthermore, embedded CPUs can be synthesized. Actually, that’s the case of FOMU, which loads a RISC-V based design as the demo design. That leaves quite a large list of devices to choose from. Although not exclusively, examples here are focused on the following:
USB-TTL adapter: FTDI, CH340, PL2303…
External uC: AVR, ARM, FTDI, ESP32…
Embedded uC: VexRiscv (RISC-V), Lattuino (AVR)…
Depending on the design of the boards, multiple connection schemes might be possible in order to achieve the same functional result. See the specific documentation of each of the examples.
The upstream icemulti allows packing up to four images for using the default cold/warm boot features. It also allows selecting the power-on reset image. However, it does not currently support features beyond the default usage.
@juanmard extended both icemulti and iceprog for allowing packing any number of images, up to the size of the target memory. See juanmard/icestorm. His version also allows modifying entries in the header/applet for switching the addressable images efficiently. Moreover, he used some spare bytes at the beginning of each image for writing a string identifier of the bitstream. That allows listing the content of the memory (through iceprog) and getting a human readable output.
@sylefeb complemented @juanmard’s solution by writing a hot-swap HDL core that can manipulate the header/applet in the external memory, so that an external CPU or PC is not required for switching addresses/pointers. Furthermore, he cleverly implemented it by employing an unused region of the external memory as a scratchpad and modifying the header on the fly (while passing through a reduced footprint HDL). Chunks of 256 bytes are used. The actual HDL is a RISC-V soft core requiring ~2K LUTs. Yet, as he explains in sylefeb/Silice: draft/projects/ice40-dynboot, there is room for improvement there!
When using development boards with iCE40 devices which don’t use FTDI for programming,
iceprog cannot be used.
That is the case of e.g. FOMU, which uses
However, @sylefeb found that dfu-util is happy to upload binary files larger than the default bitstream size.
Hence, data can be concatenated, and it is then available at address 262144 (warmboot slot) + 104106 (bitstream size) (see im-tomu/foboot: doc/FLASHLAYOUT.md).
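The address computation can be illustrated as follows. This sketch only reproduces the arithmetic stated above: the slot base (262144) and bitstream size (104106) are the values quoted in the text, while `build_payload` and `appended_data_address` are hypothetical helpers. How the file is uploaded with dfu-util is not shown here.

```python
WARMBOOT_SLOT = 262144    # slot base address, as quoted in the text
BITSTREAM_SIZE = 104106   # FOMU bitstream size, as quoted in the text


def build_payload(bitstream: bytes, extra: bytes) -> bytes:
    """Concatenate a bitstream and the extra data to be uploaded together."""
    return bitstream + extra


def appended_data_address(bitstream_size: int = BITSTREAM_SIZE,
                          slot_base: int = WARMBOOT_SLOT) -> int:
    """Flash address at which the appended data starts."""
    return slot_base + bitstream_size


if __name__ == "__main__":
    addr = appended_data_address()
    print(f"appended data starts at {addr} (0x{addr:06x})")
```

With the quoted values, the appended data starts at flash address 366250.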
From the potentially hundreds of images available in external memory, only five of them can be directly loaded by the FPGA.
Therefore, all others can be stored in a compressed format.
From a software point of view, there are many compression algorithms adapted for being executed on low power/performance
It might be more challenging to achieve it with a pure HDL solution.
Still, dictionary or block based compressions such as LZO might be efficient enough.
Some quick experiments show that the size of each bitstream can be reduced to 3-5% (from 32KB to 1-1.6KB) by using
Preserving BRAM data
It would be interesting to know whether BRAM data is necessarily overwritten when a new bitstream is loaded. If that is the case, it might be possible to use the pipeline approach from Sylvain for hot-replacing the content of the BRAMs when an image is changed. That would allow the implementation of complex algorithms on the same data. The advantage would be that freshly loaded images could start computation straight away after load. However, depending on the use case, it might be more efficient to have some custom save/load mechanism.
Nevertheless, nextpnr allows specifying absolute placement constraints in the HDL sources. See YosysHQ/nextpnr: master/docs/constraints.md. Hence, that is worth exploring before considering it a dead-end. Sylvain did some preliminary tests, without success: gitter.im/im-tomu/warmboot?at=60919f992cc8c84d850db0dd.
Change default pointer only, through FTDI.
Rearrange the pointers, through FTDI.
Pack more than four images and write the binary to flash.
Set a fifth image as default (which is not referred by any of the four pointers).
Rearrange the pointers, through FTDI.
TODO: CLI to rearrange pointers. measure and compare reconfiguration time.
iCE40 FPGAs do have hard SPI modules, which can be instantiated for user applications (see iCE40™ LP/HX/LM Family Handbook, page 62). Hence, it might be possible to prototype a module/component in HDL for overwriting the pointers in the applet without requiring an external CPU. A look-up table and an FSM would be required, apart from enough BRAM for holding the minimal amount of data that needs to be read from the flash.
Cold/warm boot allows dynamic reconfiguration, but partial dynamic reconfiguration is not supported. Therefore, the warm boot module/controller needs to be implemented in each of the images which needs to be capable of dynamically changing to another one. That would provide the illusion of partial reconfiguration with a stop-the-world approach.
It would also be possible to decompress an image and overwrite one of the existing addressable locations with it, instead of modifying the pointers. However, dealing with decompression algorithms in HDL might be non-trivial.
gatecat/hrt: a proof-of-concept for using partial dynamic reconfiguration on ECP5 devices.
hackaday.com: Three Part Deep Dive Explains Lattice ICE40 FPGA Details
USB CDC ACM