Patents Assigned to Ascenium, Inc.

SEMANTIC ORDERING FOR PARALLEL ARCHITECTURE WITH COMPUTE SLICES

Publication number: 20250085970

Abstract: Techniques for managing compute slice tasks are disclosed. A processing unit comprising compute slices, load-store units (LSUs), a control unit, and a memory system is accessed. The compute slices are coupled. Each compute slice includes an LSU which is coupled to a predecessor LSU and a successor LSU. A compiled program is executed as the control unit distributes slice tasks to the compute slices for execution. A slice task, which includes a load instruction, is distributed to a current compute slice. The current compute slice can execute the slice task speculatively. A previously executed store instruction is committed to memory by a predecessor LSU. Address aliasing is checked between an address associated with the previously executed store instruction and the load address associated with the load instruction. The slice task running on the current compute slice can be cancelled when aliasing is detected.

Type: Application

Filed: September 6, 2024

Publication date: March 13, 2025

Applicant: Ascenium, Inc.

Inventor: Jacob John Vorland Taylor
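
The aliasing check described in the abstract above can be pictured with a small sketch. It is only an illustration under stated assumptions: the abstract does not say how addresses are compared, so byte-range overlap is assumed here, and the names `SliceTask`, `ranges_alias`, and `on_store_commit` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SliceTask:
    """Hypothetical model of a speculatively running slice task with one load."""
    load_addr: int
    load_size: int
    cancelled: bool = False


def ranges_alias(addr_a: int, size_a: int, addr_b: int, size_b: int) -> bool:
    """Return True when the two byte ranges overlap (assumed aliasing rule)."""
    return addr_a < addr_b + size_b and addr_b < addr_a + size_a


def on_store_commit(task: SliceTask, store_addr: int, store_size: int) -> bool:
    """Cancel the task if a store committed by a predecessor aliases its load."""
    if ranges_alias(store_addr, store_size, task.load_addr, task.load_size):
        task.cancelled = True
    return task.cancelled
```

In this sketch a store to bytes 0x102..0x105 cancels a task that loaded 0x100..0x103, while a store starting at 0x104 leaves it running.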

PARALLEL ARCHITECTURE WITH COMPILER-SCHEDULED COMPUTE SLICES

Publication number: 20250021405

Abstract: Techniques for task processing based on compiler-scheduled compute slices are disclosed. A processing unit comprising compute slices, barrier register sets, a control unit, and a memory system is accessed. Each compute slice includes an execution unit and is coupled to other compute slices by a barrier register set. A first slice task is distributed to a first compute slice. A second slice task is allotted to a second compute slice, based on a branch prediction logic. The second compute slice is coupled to the first by a first barrier register set. Pointers are initialized. A compiled program is executed, beginning at the first compute slice. The second slice task can be executed in parallel while a branch decision is being made. If the branch decision determines that the second slice task is not the next sequential slice task, results from the second compute slice are discarded.

Type: Application

Filed: July 11, 2024

Publication date: January 16, 2025

Applicant: Ascenium, Inc.

Inventors: Tore Jahn Bastiansen, Peter Aaser, Trond Hellem Bø

PARALLEL PROCESSING ARCHITECTURE WITH BLOCK MOVE BACKPRESSURE

Publication number: 20240419507

Abstract: Techniques for monitoring block moves in an array of compute elements and applying backpressure are disclosed. An array of compute elements is accessed. The array of compute elements is coupled to at least one data cache. The data cache provides memory storage for the array of compute elements. Control for the array of compute elements is enabled by a stream of wide control words generated by the compiler. A load address and a store address comprising memory block move addresses are generated. The memory block move addresses point to memory storage locations in the at least one data cache. Load buffers are coupled to the array of compute elements. The load buffers are located adjacent to at least one edge of the array of compute elements. A memory block move is executed using at least one of the load buffers, based on the memory block move addresses.

Type: Application

Filed: August 30, 2024

Publication date: December 19, 2024

Applicant: Ascenium, Inc.

Inventor: Peter Foley

PARALLEL PROCESSING ARCHITECTURE WITH BLOCK MOVE SUPPORT

Publication number: 20240385965

Abstract: Techniques for task processing are disclosed. An array of compute elements is accessed. Each compute element within the array is known to a compiler and is coupled to its neighboring compute elements. The array of compute elements is coupled to at least one data cache. The data cache provides memory storage for the array. Control for the compute elements is provided on a cycle-by-cycle basis. Control is enabled by a stream of wide control words generated by the compiler. A load address and a store address are generated. The load and the store addresses comprise memory block move addresses. The memory block move addresses point to memory storage locations in the data cache. A memory block move is executed, based on the memory block move addresses. The data for the memory block move is transferred outside of the array.

Type: Application

Filed: July 26, 2024

Publication date: November 21, 2024

Applicant: Ascenium, Inc.

Inventor: Peter Foley

PARALLEL PROCESSING HAZARD MITIGATION AVOIDANCE

Publication number: 20240264974

Abstract: Techniques for parallel processing based on hazard mitigation avoidance are disclosed. An array of compute elements is accessed. Each compute element within the array is known to a compiler and is coupled to its neighboring compute elements within the array of compute elements. Control for the compute elements is provided on a cycle-by-cycle basis. Control is enabled by a stream of wide control words generated by the compiler. Memory access operation hazard mitigation is enabled. The hazard mitigation is enabled by a control word tag. The control word tag supports memory access precedence information and is provided by the compiler at compile time. A hazardless memory access operation is executed. The hazardless memory access operation is determined by the compiler, and the hazardless memory access operation is designated by a unique set of precedence information contained in the tag. The tag is modified during runtime by hardware.

Type: Application

Filed: April 19, 2024

Publication date: August 8, 2024

Applicant: Ascenium, Inc.

Inventor: Peter Foley

PARALLEL PROCESSING ARCHITECTURE FOR BRANCH PATH SUPPRESSION

Publication number: 20240193009

Abstract: Techniques for a parallel processing architecture for branch path suppression are disclosed. An array of compute elements is accessed. Each element is known to a compiler and is coupled to its neighboring elements. Control for the elements is provided on a cycle-by-cycle basis. Control is enabled by a stream of wide control words generated by the compiler. The control includes a branch. A plurality of compute elements is mapped. The mapping distributes parallelized operations to the compute elements. The mapping is determined by the compiler. A column of compute elements is enabled to perform vertical data access suppression and a row of compute elements is enabled to perform horizontal data access suppression. Both sides of the branch are executed. The executing includes making a branch decision. Branch operation data accesses are suppressed, based on the branch decision and an invalid indication. The invalid indication is propagated among compute elements.

Type: Application

Filed: February 23, 2024

Publication date: June 13, 2024

Applicant: Ascenium, Inc.

Inventor: Peter Foley

PARALLEL PROCESSING WITH HAZARD DETECTION AND STORE PROBES

Publication number: 20240168802

Abstract: Techniques for parallel processing using hazard detection and store probes are disclosed. An array of compute elements is accessed. Each compute element within the array of compute elements is known to a compiler and is coupled to its neighboring compute elements within the array of compute elements. Control for the compute elements is provided on a cycle-by-cycle basis. Control is enabled by a stream of wide control words generated by the compiler. Data to be stored by the array of compute elements is managed. The data to be stored is targeted to a data cache coupled to the array of compute elements. The managing includes detecting and mitigating memory hazards. Pending data cache accesses are probed for hazards. The examining comprises a store probe. Store data is committed to the data cache. The committing is based on a result of the store probe.

Type: Application

Filed: January 30, 2024

Publication date: May 23, 2024

Applicant: Ascenium, Inc.

Inventor: Peter Foley

PARALLEL PROCESSING WITH SWITCH BLOCK EXECUTION

Publication number: 20240078182

Abstract: Techniques for parallel processing based on parallel processing with switch block execution are disclosed. An array of compute elements is accessed. Each compute element within the array of compute elements is known to a compiler and is coupled to its neighboring compute elements within the array of compute elements. Control for the compute elements is provided on a cycle-by-cycle basis. Control is enabled by a stream of wide control words generated by the compiler. A plurality of compute elements is initialized within the array with a switch statement. The switch statement is mapped into a primitive operation in each element of the plurality of compute elements. The initializing is based on a control word from the stream of control words. Each of the primitive operations is executed in an architectural cycle. A result is returned for the switch statement. The returning is determined by a decision variable.

Type: Application

Filed: November 13, 2023

Publication date: March 7, 2024

Applicant: Ascenium, Inc.

Inventor: Peter Foley

PARALLEL PROCESSING USING HAZARD DETECTION AND MITIGATION

Publication number: 20240070076

Abstract: Techniques for parallel processing using hazard detection and mitigation are disclosed. An array of compute elements is accessed. Each compute element within the array of compute elements is known to a compiler and is coupled to its neighboring compute elements within the array of compute elements. Control for the compute elements is provided on a cycle-by-cycle basis. Control is enabled by a stream of wide control words generated by the compiler. Memory access operations are tagged with precedence information. The tagging is contained in the control words. The tagging is provided by the compiler at compile time. Memory access operations are monitored. The monitoring is based on the precedence information and a number of architectural cycles of the cycle-by-cycle basis. The tagging is augmented at run time, based on the monitoring. Memory access data is held before promotion, based on the monitoring.

Type: Application

Filed: November 7, 2023

Publication date: February 29, 2024

Applicant: Ascenium, Inc.

Inventor: Peter Foley

PARALLEL PROCESSING ARCHITECTURE WITH BIN PACKING

Publication number: 20240028340

Abstract: Techniques for parallel processing based on a parallel processing architecture with bin packing are disclosed. An array of compute elements is accessed. Each compute element is known to a compiler and is coupled to its neighboring compute elements. A plurality of compressed control words is generated by the compiler. The plurality of control words enables compute element operation and compute element memory access. The compressed control words are operationally sequenced. The compressed control words are linked by the compiler. Linking information is contained in at least one field of each of the compressed control words. The compressed control words are loaded into a control word cache coupled to the array of compute elements. The compressed control words are loaded into the control word cache in an operationally non-sequenced order. The plurality of compressed control words is ordered into an operationally sequenced execution order, based on the linking information.

Type: Application

Filed: August 22, 2023

Publication date: January 25, 2024

Applicant: Ascenium, Inc.

Inventor: Peter Foley
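
The reordering step in the abstract above (control words cached out of order, then sequenced from linking information) resembles following a linked chain. The sketch below is a rough illustration only: the field names `id`, `op`, and `next` and the function `order_by_links` are hypothetical, and the abstract does not describe the real field layout.

```python
def order_by_links(cache_entries, first_id):
    """Rebuild execution order by following each entry's link field.

    Assumes every entry carries a unique id and a link to its successor
    (None for the last entry), and that the chain has no cycles.
    """
    by_id = {entry["id"]: entry for entry in cache_entries}
    ordered_ops = []
    current = first_id
    while current is not None:
        entry = by_id[current]
        ordered_ops.append(entry["op"])
        current = entry["next"]
    return ordered_ops
```

Entries may sit in the cache in any order; only the link field and the id of the first control word are needed to recover the sequence.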

PARALLEL PROCESSING ARCHITECTURE WITH MEMORY BLOCK TRANSFERS

Publication number: 20230409328

Abstract: Techniques for task processing based on a parallel processing architecture with memory block transfers are disclosed. An array of compute elements is accessed. Each compute element is known to a compiler and is coupled to its neighboring compute elements. Control for the array is provided on a cycle-by-cycle basis. The control is enabled by a stream of wide control words generated by the compiler. A control word from the stream of control words includes a source address, a target address, a block size, and a stride. Memory block transfer control logic is used. The memory block transfer logic is implemented outside of the array of compute elements. A memory block transfer is executed. The memory block transfer is initiated by a control word from the stream of wide control words. Data for the memory block transfer is moved independently from the array of compute elements.

Type: Application

Filed: August 30, 2023

Publication date: December 21, 2023

Applicant: Ascenium, Inc.

Inventor: Peter Foley
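
The four control word fields named in the abstract above (source address, target address, block size, stride) suggest a strided copy. The sketch below assumes the stride applies to the source side and that the target is written contiguously; the abstract does not state this, and `block_transfer` is a hypothetical name.

```python
def block_transfer(memory, source, target, block_size, stride):
    """Copy block_size elements from a strided source to a contiguous target.

    Reads are gathered before any write so that overlapping source and
    target regions cannot corrupt the data being moved.
    """
    values = [memory[source + i * stride] for i in range(block_size)]
    for i, value in enumerate(values):
        memory[target + i] = value
    return memory
```

With a 16-element memory holding 0..15, a transfer from address 0 to address 8 with block size 4 and stride 2 gathers elements 0, 2, 4, and 6 into addresses 8 through 11.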

PARALLEL PROCESSING ARCHITECTURE WITH DUAL LOAD BUFFERS

Publication number: 20230376447

Abstract: Techniques for parallel processing based on a parallel processing architecture with dual load buffers are disclosed. A two-dimensional array of compute elements is accessed. Each compute element is known to a compiler and is coupled to its neighboring compute elements. A first data cache is coupled to the array. The first data cache enables loading data to a first portion of the array. The first data cache supports an address space. A second data cache is coupled to the array. The second data cache enables loading data to a second portion of the array. The second data cache supports the address space. Instructions are executed within the array. Instructions executed within the first portion of the array of compute elements use data loaded from the first data cache, and instructions executed within the second portion of the array of compute elements use data loaded from the second data cache.

Type: Application

Filed: July 31, 2023

Publication date: November 23, 2023

Applicant: Ascenium, Inc.

Inventor: Peter Foley

PARALLEL PROCESSING ARCHITECTURE WITH COUNTDOWN TAGGING

Publication number: 20230350713

Abstract: Techniques for parallel processing based on a parallel processing architecture with countdown tagging are disclosed. A two-dimensional array of compute elements is accessed. Each compute element within the array is known to a compiler and is coupled to its neighboring compute elements within the array of compute elements. A load operation is tagged with a countdown tag. Tagging is performed by the compiler, and the load operation is targeted to a memory system associated with the array of compute elements. The countdown tag comprises a time value. The time value is decremented as the load operation is being performed. The time value that is decremented is based on an architectural cycle. Countdown tag status is monitored by a control unit. The monitoring occurs as the load operation is performed. A load status is generated by the control unit, based on the monitoring. The load status allows compute element operation.

Type: Application

Filed: July 11, 2023

Publication date: November 2, 2023

Applicant: Ascenium, Inc.

Inventor: Peter Foley
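
The countdown tag in the abstract above can be modeled as a counter that drops by one per architectural cycle while a control unit watches it. This is a minimal sketch under assumptions: the class `CountdownTag`, the function `load_status`, and the "pending"/"ready" status strings are hypothetical and do not come from the application.

```python
class CountdownTag:
    """Hypothetical countdown tag: a time value attached to a load operation."""

    def __init__(self, cycles: int):
        self.remaining = cycles

    def tick(self) -> None:
        """Decrement the time value once per architectural cycle, stopping at zero."""
        if self.remaining > 0:
            self.remaining -= 1


def load_status(tag: CountdownTag) -> str:
    """Status a control unit might derive from monitoring the tag."""
    return "ready" if tag.remaining == 0 else "pending"
```

A tag created with a time value of 3 reports "pending" after the first two cycles and "ready" after the third, at which point compute element operation could proceed.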

PARALLEL PROCESSING ARCHITECTURE WITH SPLIT CONTROL WORD CACHES

Publication number: 20230342152

Abstract: Techniques for a parallel processing architecture with split control word caches are disclosed. A two-dimensional array of compute elements is accessed. Each compute element is known to a compiler and is coupled to its neighboring compute elements. A first control word cache is coupled to the array. The first control word cache enables loading control words to a first array portion. A second control word cache is coupled to the array. The second control word cache enables loading control words to a second array portion. The control words are split between the first and the second control word caches. The splitting is based on the constituency of the first and the second array portions. Instructions are executed within the array. Instructions executed within the first array portion use control words loaded from the first cache. Instructions executed within the second array portion use control words loaded from the second cache.

Type: Application

Filed: June 29, 2023

Publication date: October 26, 2023

Applicant: Ascenium, Inc.

Inventor: Peter Foley

PARALLEL PROCESSING OF MULTIPLE LOOPS WITH LOADS AND STORES

Publication number: 20230281014

Abstract: Techniques for parallel processing of multiple loops with loads and stores are disclosed. A two-dimensional array of compute elements is accessed. Each compute element within the array is known to a compiler and is coupled to its neighboring compute elements within the array. Control for the compute elements is provided on a cycle-by-cycle basis. Control is enabled by a stream of wide control words generated by the compiler. Memory access operations are tagged with precedence information. The tagging is contained in the control words and is implemented for loop operations. The tagging is provided by the compiler at compile time. Control word data is loaded for multiple, independent loops into the compute elements. The multiple, independent loops are executed. Memory is accessed based on the precedence information. The memory access includes loads and/or stores for data relating to the independent loops.

Type: Application

Filed: May 10, 2023

Publication date: September 7, 2023

Applicant: Ascenium, Inc.

Inventor: Peter Foley

HIGHLY PARALLEL PROCESSING ARCHITECTURE WITH OUT-OF-ORDER RESOLUTION

Publication number: 20230273818

Abstract: Techniques for task processing based on a highly parallel processing architecture with out-of-order resolution are disclosed. A two-dimensional array of compute elements is accessed. Each compute element within the array of compute elements is known to a compiler and is coupled to its neighboring compute elements within the array of compute elements. The array of compute elements is coupled to supporting logic and to memory, which, along with the array of compute elements, comprise compute hardware. A set of directions is provided to the hardware, through a control word generated by the compiler, for compute element operation. The set of directions is augmented with data access ordering information. The data access ordering is performed by the hardware. A compiled task is executed on the array of compute elements, based on the set of directions that was augmented.

Type: Application

Filed: March 9, 2023

Publication date: August 31, 2023

Applicant: Ascenium, Inc.

Inventor: Peter Foley

AUTONOMOUS COMPUTE ELEMENT OPERATION USING BUFFERS

Publication number: 20230221931

Abstract: Techniques for task processing based on autonomous compute element operation using buffers are disclosed. A two-dimensional array of compute elements is accessed. Each compute element is known to a compiler and is coupled to its neighboring compute elements within the array of compute elements. Control is provided for the array of compute elements on a cycle-by-cycle basis. The control is enabled by a stream of wide control words generated by the compiler. An autonomous operation buffer is loaded with at least two operations contained in control words. The autonomous operation buffer is integrated in a compute element. A compute element operation counter coupled to the autonomous operation buffer is set. The compute element operation counter is integrated in the compute element. The at least two operations are executed using the autonomous operation buffer and the compute element operation counter. The operations complete autonomously from direct compiler control.

Type: Application

Filed: March 21, 2023

Publication date: July 13, 2023

Applicant: Ascenium, Inc.

Inventor: Peter Foley

COMPUTE ELEMENT PROCESSING USING CONTROL WORD TEMPLATES

Publication number: 20230128127

Abstract: Techniques for task processing based on compute element processing using control word templates are disclosed. One or more control word templates are generated for use in a two-dimensional array of compute elements. Each compute element within the array is known to a compiler and is coupled to its neighboring compute elements within the array of compute elements. Each control word template designates a topological set of compute elements from the array of compute elements. The one or more control word templates are customized with a specific set of compute element operations. The one or more control word templates that were customized are stored. The specific set of compute element operations is executed on the topological set of compute elements. The one or more control word templates that were stored are reused. The one or more control word templates that were stored are modified and executed using compute elements.

Type: Application

Filed: December 23, 2022

Publication date: April 27, 2023

Applicant: Ascenium, Inc.

Inventors: Ionut Hristodorescu, Peter Foley

LOAD LATENCY AMELIORATION USING BUNCH BUFFERS

Publication number: 20230031902

Abstract: Techniques for task processing based on load latency amelioration using bunch buffers are disclosed. A two-dimensional array of compute elements is accessed. Each compute element within the array of compute elements is known to a compiler and is coupled to its neighboring compute elements within the array of compute elements. Control for the compute elements is provided on a cycle-by-cycle basis. The control is enabled by a stream of wide control words generated by the compiler. Sets of control word bits are loaded into buffers. Each buffer is associated with and coupled to a unique compute element within the array of compute elements. The sets of control word bits provide operational control for the compute element with which it is associated. Operations are executed within the array of elements. The operations are based on a selected set of control word bits which comprise a control word bunch.

Type: Application

Filed: October 11, 2022

Publication date: February 2, 2023

Applicant: Ascenium, Inc.

Inventor: Peter Foley

PARALLEL PROCESSING ARCHITECTURE FOR ATOMIC OPERATIONS

Publication number: 20220374286

Abstract: Techniques for task processing in a parallel processing architecture for atomic operations are disclosed. A two-dimensional array of compute elements is accessed, where each compute element within the array of compute elements is known to a compiler and is coupled to its neighboring compute elements within the array of compute elements. Control for the array of compute elements is provided on a cycle-by-cycle basis. The control is enabled by a stream of wide control words generated by the compiler. At least one of the control words involves an operation requiring at least one additional operation. A bit of the control word is set, where the bit indicates a multicycle operation. The control word is executed, on at least one compute element within the array of compute elements, based on the bit. The multicycle operation comprises a read-modify-write operation.

Type: Application

Filed: August 3, 2022

Publication date: November 24, 2022

Applicant: Ascenium, Inc.

Inventor: Peter Foley
