Architecture
FPOA Architecture Overview
The Field Programmable Object Array™ (or FPOA™) contains hundreds of heterogeneous, medium-grained processing elements called “objects”. The objects are interconnected by multiple 16-bit data paths and 5-bit control paths that operate at speeds up to 1 GHz. Multiple objects can be cascaded to create wider data paths while maintaining 1 GHz operation. Each object has its own program and data memories and operates without the aid of global control. Within the array, the data paths and control paths are loosely coupled and independently configured.
Silicon Objects
FPOA objects come in two basic types: core objects and periphery objects. Core objects operate at clock rates up to 1 GHz and typically perform high-speed computations. Periphery objects provide additional memory resources and access to I/O.
The three types of core objects are the Arithmetic Logic Unit (ALU) for logical and mathematical functions, the Multiply Accumulator (MAC) for 16x16 multiply-accumulate operations, and the Register File (RF) for buffering data as a FIFO or configurable RAM. Periphery objects include dedicated Internal RAM (IRAM), External DRAM (XRAM), High-Speed I/O (RX/TX) and General Purpose I/O (GPIO). All core and periphery objects are interconnected by a 1 GHz Programmable Communication Framework. The ratio and placement of different objects allow the FPOA to be programmed for high-performance image, video and signal processing applications.
1 GHz Programmable Communication Framework
Communication between objects is achieved via two complementary mechanisms. First, each object can transmit data to, or receive data from, each of its eight adjacent objects via Nearest Neighbor connections with zero latency. Second, as the distance between objects increases, Party Line connections provide pipelined connectivity, allowing data transfer at the full core clock rate. For FPOAs operating at 1 GHz, Party Line connections move data a distance of up to four objects within a single clock cycle. Objects can be programmed to change communication patterns on a per-clock basis.
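As a rough illustration of the two mechanisms, the sketch below estimates how many clock cycles a transfer might take along a straight route. It assumes, purely for illustration, that each Party Line pipeline stage costs exactly one clock and covers at most four objects; the function name and the straight-line distance model are hypothetical and are not taken from MathStar documentation.

```python
import math

# From the text: at 1 GHz a Party Line reaches up to four objects per clock.
PARTY_LINE_REACH = 4


def estimated_cycles(hops: int) -> int:
    """Estimate transfer latency in clocks for a straight route of `hops` objects.

    Assumption (not from the source): every pipelined Party Line segment
    adds exactly one clock cycle.
    """
    if hops < 0:
        raise ValueError("hops must be non-negative")
    if hops <= 1:
        # Same object or an adjacent object: Nearest Neighbor, zero latency.
        return 0
    # Longer routes: one pipeline stage per group of up to four objects.
    return math.ceil(hops / PARTY_LINE_REACH)
```

Under these assumptions, a route of 4 objects takes one cycle and a route of 5 objects takes two.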

The ALU is a 1 GHz, programmable, multi-state core object that provides a general-purpose 16-bit arithmetic logic block for data operations, and four general-purpose truth functions for control bit operations. The ALU object is made up of four components:
- Arithmetic Logic Block (ALB): performs a wide range of arithmetic, logical, and multiplexing functions. The ALB operates on R bits and is controlled through an instruction state machine.
- Instruction State Machine: the ALU contains an eight-state instruction machine, where each state supplies the ALB input selection, an ALB operation code (opcode), and result destination information.
- Truth Functions (TFs): four general-purpose truth functions operate on and generate control bits.
- Truth Function for the ALB (TFA): an additional truth function provides control signals to the ALU's instruction state machine. These controls allow dynamic instruction flow control (next state determination), allow ALU instruction operation overrides, and can block ALU result writes to destination registers.
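The interaction between the instruction state machine and the TFA can be pictured with a small behavioral model. This is only a sketch: the opcode names, register names, and the `State` fields are hypothetical placeholders, and opcode overrides are left out. It shows each state supplying input selection, an opcode, and a destination, with TFA-style signals choosing the next state and optionally blocking the result write.

```python
from dataclasses import dataclass

MASK16 = 0xFFFF   # the ALB is described as a 16-bit block
MAX_STATES = 8    # eight-state instruction machine (from the text)


@dataclass
class State:
    opcode: str      # hypothetical opcode name
    src_a: str       # ALB input selection (hypothetical register names)
    src_b: str
    dest: str        # result destination
    next_state: int  # default next state
    alt_state: int   # next state when the TFA requests a branch


# Illustrative opcodes only; the real ALB instruction set is not listed here.
OPS = {
    "ADD": lambda a, b: (a + b) & MASK16,
    "AND": lambda a, b: a & b,
    "PASS_A": lambda a, b: a,  # simple multiplexing
}


def step(states, current, regs, tfa_branch=False, tfa_block_write=False):
    """Execute one state and return the index of the next state."""
    assert len(states) <= MAX_STATES
    s = states[current]
    result = OPS[s.opcode](regs[s.src_a], regs[s.src_b])
    if not tfa_block_write:  # the TFA can block the destination write
        regs[s.dest] = result
    return s.alt_state if tfa_branch else s.next_state
```

For example, a two-state program that adds and then masks can be stepped one state at a time, with `tfa_block_write=True` leaving the destination register unchanged for that state.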

The MAC performs multiply and accumulate functions at speeds up to 1 GHz. The multiplier function multiplies two 16-bit inputs and generates a 32-bit result plus carry. The accumulator function adds a 32-bit input to an existing number and, depending on the configuration, provides a 40-bit output result. Operands and results can be signed or unsigned integers or signed fixed fractional numbers (Q15 format).
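To make the Q15 arithmetic concrete, the following sketch models a signed fractional multiply-accumulate with a 40-bit accumulator. It assumes the product of two Q15 operands is kept in Q30 form and that the accumulator wraps in two's complement at 40 bits; the actual MAC object may scale, saturate, or report carry differently. All function names are illustrative.

```python
def to_signed(value: int, bits: int) -> int:
    """Wrap an integer into a signed two's-complement range of `bits` bits."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def float_to_q15(x: float) -> int:
    """Convert a float in [-1, 1) to Q15, clamping to the 16-bit range."""
    return max(-32768, min(32767, int(round(x * 32768))))


def mac_q15(acc: int, a: int, b: int) -> int:
    """Add the Q30 product of two Q15 operands to a 40-bit accumulator.

    Assumption: wrap-around on overflow; the carry output is not modeled.
    """
    product = a * b  # 16x16 signed product, fits in 32 bits
    return to_signed(acc + product, 40)


def q30_to_float(acc: int) -> float:
    """Interpret the accumulator as a Q30 fixed-point value."""
    return acc / (1 << 30)
```

With these definitions, accumulating 0.5 × 0.25 four times yields an accumulator equivalent to 0.5, and the eight guard bits above the 32-bit product leave headroom for many accumulations before wrap-around.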
The RF object is essentially a 1 GHz, dual-ported local memory resource. The RF contains 64 memory locations of 20 bits each (16 data bits + 4 control bits). The RF object supports three operating modes: RAM mode, FIFO mode, and Read Sequence mode. All modes support a simultaneous read and write on every clock cycle. The Register File content can be preloaded during FPOA initialization.
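A behavioral sketch of FIFO mode may help here. It models 64 entries of 20 bits with one optional read and one optional write per clock. The bit packing (control bits above data bits), the read-before-write ordering, and the policy of dropping writes when full are assumptions made for illustration, not documented behavior.

```python
class RegisterFileFifo:
    """Toy model of the RF object in FIFO mode (illustrative only)."""

    DEPTH = 64  # 64 memory locations (from the text)

    def __init__(self):
        self.mem = [0] * self.DEPTH
        self.rd = 0     # read pointer
        self.wr = 0     # write pointer
        self.count = 0  # entries currently stored

    def clock(self, write=None, read=False):
        """Advance one clock; `write` is an optional (data16, ctrl4) tuple.

        Returns the (data, ctrl) pair read this cycle, or None.
        """
        out = None
        if read and self.count > 0:
            word = self.mem[self.rd]
            out = (word & 0xFFFF, word >> 16)
            self.rd = (self.rd + 1) % self.DEPTH
            self.count -= 1
        if write is not None and self.count < self.DEPTH:
            data, ctrl = write
            # Assumed layout: 4 control bits stored above 16 data bits.
            self.mem[self.wr] = ((ctrl & 0xF) << 16) | (data & 0xFFFF)
            self.wr = (self.wr + 1) % self.DEPTH
            self.count += 1
        return out
```

Calling `clock(write=(0xBEEF, 0xA), read=True)` performs both port operations in the same cycle, mirroring the simultaneous read and write described above.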
The FPOA architecture supports internal SRAM (also known as block RAM) in the periphery. The configuration and size of IRAM blocks depend on the FPOA product family. The Arrix Family of FPOAs supports multiple IRAM blocks that are configured as 2048x76 bits, for a total capacity of ~19 KB per block.
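The ~19 KB figure follows directly from the block geometry, as this short check shows. The constant and function names are illustrative, and the result is expressed with 1 KB taken as 1024 bytes.

```python
IRAM_WORDS = 2048     # words per IRAM block (from the text)
IRAM_WORD_BITS = 76   # bits per word (from the text)


def iram_block_bytes(words: int = IRAM_WORDS, word_bits: int = IRAM_WORD_BITS) -> int:
    """Capacity of one IRAM block in bytes."""
    return words * word_bits // 8


if __name__ == "__main__":
    size = iram_block_bytes()
    print(f"{size} bytes = {size / 1024:.1f} KB per block")
```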
The FPOA architecture supports external DRAM (XRAM) in the periphery. The configuration and size of the XRAM controller depend on the FPOA product family. The Arrix family of FPOAs has two independent XRAM controllers, each providing access to external 36-bit Double-Data-Rate (DDR) Reduced Latency DRAM (RLDRAM-II) memory. Each XRAM controller supports three possible configurations of external RLDRAM-II with memory sizes up to 144 Mbytes and very low latency access. The controller contains all necessary external memory refresh logic. The maximum supported XRAM clock frequency is 300 MHz and the maximum bandwidth per XRAM interface is 2.7 GBytes/sec.
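The quoted peak bandwidth can be reproduced from the interface width and clock rate, assuming all 36 bits carry data on both clock edges. The function name below is illustrative.

```python
def ddr_bandwidth_bytes_per_sec(width_bits: int, clock_hz: int) -> int:
    """Peak DDR bandwidth: two transfers per clock, `width_bits` each."""
    return width_bits * 2 * clock_hz // 8


if __name__ == "__main__":
    # 36-bit RLDRAM-II interface at the 300 MHz maximum clock.
    bw = ddr_bandwidth_bytes_per_sec(36, 300_000_000)
    print(f"{bw / 1e9:.1f} GBytes/sec per XRAM interface")
```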
High-speed I/O is accomplished through separate transmit (TX) and receive (RX) interfaces. Each interface is a high-speed, parallel LVDS interface 17 bits in width (16 data and 1 tag) with a source-synchronous clock input. The RX interface can be configured to operate in either double data rate (DDR) or single data rate (SDR) mode. The RX clock frequencies supported are between 116 MHz and 500 MHz for DDR, and between 116 MHz and 640 MHz for SDR. The TX interface can likewise be configured to operate in either DDR or SDR mode. In DDR mode, data is transmitted on both the rising and falling edges of the clock. The TX clock frequencies supported are between 18.75 MHz and 500 MHz for DDR, and between 18.75 MHz and 640 MHz for SDR. The Arrix Family of FPOAs has two RX and two TX ports for a total high-speed I/O bandwidth of 64 Gbps.
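The 64 Gbps total is consistent with all four ports running DDR at 500 MHz and counting only the 16 data bits (not the tag bit). That reading is an assumption; the sketch below simply works through the arithmetic with illustrative names.

```python
DATA_BITS = 16  # data bits per port; the tag bit is excluded here


def port_bandwidth_bps(clock_hz: int, ddr: bool, data_bits: int = DATA_BITS) -> int:
    """Peak payload bandwidth of one RX or TX port in bits per second."""
    transfers_per_clock = 2 if ddr else 1
    return data_bits * transfers_per_clock * clock_hz


if __name__ == "__main__":
    ddr_port = port_bandwidth_bps(500_000_000, ddr=True)
    sdr_port = port_bandwidth_bps(640_000_000, ddr=False)
    total = 4 * ddr_port  # two RX ports + two TX ports
    print(f"DDR port: {ddr_port / 1e9} Gbps, SDR port: {sdr_port / 1e9} Gbps")
    print(f"Total (4 ports, DDR): {total / 1e9} Gbps")
```

Under this reading, DDR at 500 MHz gives the higher per-port rate even though SDR supports a faster clock.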
The GPIO interface facilitates communication between the FPOA and external devices.
There are typically hundreds of GPIO pins on an FPOA, each supporting LVCMOS-style buffering and both synchronous and asynchronous operation. In the Arrix Family of FPOAs, the GPIO data and clock pins allow operation at frequencies up to 100 MHz.
FPOA Initialization and Control
There are three interfaces involved in initialization and control. The PROM controller oversees the loading and initialization process of the FPOA. A JTAG controller provides an alternate way to load an FPOA configuration and provides access to memory, such as IRAM, to aid in debugging. The Control object can be used to stop the core clock. It also contains a PLL, which multiplies an external reference clock to generate the FPOA core clock.
| Document | Description |
|---|---|
| Arrix FPOA Architecture Guide | Theory of operations for the Arrix Product Family |
| Arrix FPOA Data Sheet | Detailed timing, pin descriptions, configuration options and operating modes for the Arrix FPOA Product Family |
If you would like to learn more about MathStar's architecture, please contact your local sales rep today.
