Patents Assigned to Akeana, Inc.
-
Publication number: 20250147919
Abstract: Techniques for sharing processor data within a network are disclosed. A system-on-a-chip (SOC) is accessed. The SOC includes a network-on-a-chip (NOC), which comprises an M×N mesh topology. The mesh includes a coherent tile at each mesh point. Each tile includes local snoop vectors (LSVs). A first coherent tile initiates a snoop operation. The tile generates a snoop vector that indicates other tiles to be notified of the snoop operation. The first coherent tile creates directional snoop vectors (DSVs) by logically combining the snoop vector with each of the LSVs. A coherent tile adjacent to the first coherent tile is selected. The adjacent tile is located in a cardinal direction from the first tile. A first DSV is chosen based on the cardinal direction. The first tile sends the snoop operation and the chosen first DSV to the selected adjacent tile.
Type: Application
Filed: November 6, 2024
Publication date: May 8, 2025
Applicant: Akeana, Inc.
Inventors: Madhavi Kondapaneni, Aqdas Javaid, Ayesha Zahid
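The DSV construction reduces to bitwise ANDs, so it can be sketched compactly. A minimal C++ sketch, assuming a hypothetical 4×4 mesh (16 tiles, one bit per tile) and illustrative LSV contents; the names makeDSVs and the direction encoding are invented for illustration:

```cpp
#include <array>
#include <bitset>
#include <cstdio>

constexpr int kTiles = 16;                 // hypothetical 4x4 mesh
using SnoopVector = std::bitset<kTiles>;   // one bit per coherent tile

enum Dir { N = 0, E = 1, S = 2, W = 3 };   // cardinal directions

// The logical combination from the abstract: for each direction, keep
// only the tiles that both need the snoop and lie in that direction.
std::array<SnoopVector, 4> makeDSVs(const SnoopVector& snoop,
                                    const std::array<SnoopVector, 4>& lsv) {
    std::array<SnoopVector, 4> dsv;
    for (int d = 0; d < 4; ++d)
        dsv[d] = snoop & lsv[d];
    return dsv;
}

int main() {
    SnoopVector snoop("0000111100001010");      // tiles to notify
    std::array<SnoopVector, 4> lsv = {          // illustrative LSVs
        SnoopVector("1111000000000000"),        // tiles north of us
        SnoopVector("0000000010101010"),        // tiles east of us
        SnoopVector("0000000000001111"),        // tiles south of us
        SnoopVector("0000111101010000"),        // tiles west of us
    };
    auto dsv = makeDSVs(snoop, lsv);
    // The snoop operation plus dsv[E] would go to the eastern neighbor,
    // dsv[N] to the northern neighbor, and so on.
    for (int d = 0; d < 4; ++d)
        printf("DSV[%d] = %s\n", d, dsv[d].to_string().c_str());
}
```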
-
Publication number: 20250147893
Abstract: Techniques for coherent processor cache control are disclosed. A plurality of processor cores is accessed. Each processor of the plurality of processor cores includes a shared local cache, and the plurality of processor cores implements special cache coherency operations. An evict buffer is coupled to the plurality of processor cores. The evict buffer is shared among the plurality of processor cores, and the evict buffer enables delayed writes. Evict buffer writes are monitored. The monitoring identifies a special cache coherency operation. The special cache coherency operation that was identified comprises a global snoop operation. The global snoop operation is initiated from a non-local agent within a globally coherent system. An evict buffer entry is marked. The marking corresponds to the special cache coherency operation that was identified, and the marking enables management of cache evict duplication.
Type: Application
Filed: November 5, 2024
Publication date: May 8, 2025
Applicant: Akeana, Inc.
Inventor: Sanjay Patel
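One plausible reading of the marking scheme as a C++ sketch; the EvictEntry and EvictBuffer types and the snooped flag are assumptions, not the patent's actual structures:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical evict buffer shared among the cores.
struct EvictEntry {
    uint64_t lineAddr = 0;
    bool     valid    = false;
    bool     snooped  = false;  // marked when a global snoop hits this entry
};

struct EvictBuffer {
    std::vector<EvictEntry> entries;

    // Monitoring: a global snoop from a non-local agent checks pending
    // evictions; a hit marks the entry so the later write-back is not
    // duplicated against the snoop response.
    void onGlobalSnoop(uint64_t addr) {
        for (auto& e : entries)
            if (e.valid && e.lineAddr == addr)
                e.snooped = true;   // manage cache evict duplication
    }

    // Delayed writes drain the buffer; marked entries take a
    // coherency-aware path instead of a plain memory write.
    void drain() {
        for (auto& e : entries) {
            if (!e.valid) continue;
            if (e.snooped) { /* hand the line to coherency logic */ }
            else           { /* normal write-back to memory      */ }
            e.valid = false;
        }
    }
};
```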
-
Publication number: 20250138828
Abstract: Disclosed embodiments provide techniques for instruction execution in computer processors. A dispatch unit dispatches instructions to one or more issue queues. Instructions from the issue queues feed into execution pipelines. Each execution pipeline includes instruction queue control logic and two execution engines. A first execution engine is assigned to variable latency instructions while a second execution engine is assigned to fixed latency instructions. While a variable latency instruction executes, fixed latency instructions can be issued, executed, and completed. When the variable latency instruction finishes execution, a request is issued by the first execution engine to the instruction queue control logic. In response, the instruction queue control logic introduces a stall in a common write-back pipeline, allowing the variable latency instruction to complete. The result of the variable latency instruction is provided to a depending fixed latency instruction via a bypass path.
Type: Application
Filed: October 31, 2024
Publication date: May 1, 2025
Applicant: Akeana, Inc.
Inventors: Ricardo Ramirez, Abhijit Sil
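A toy cycle-level model of the stall handshake may help; the Pipeline type and its fields are hypothetical, and the real control logic would be considerably more involved:

```cpp
#include <cstdio>
#include <queue>

// Toy model of one execution pipeline: a fixed-latency engine and a
// variable-latency engine sharing one common write-back port.
struct Pipeline {
    int varRemaining = 0;      // cycles left in the variable-latency op
    bool varDone = false;      // engine has requested a write-back slot
    std::queue<int> fixedOps;  // fixed-latency ops, one ready per cycle

    void cycle() {
        if (varRemaining > 0 && --varRemaining == 0)
            varDone = true;    // request sent to queue control logic
        if (varDone) {
            // Control logic stalls the common write-back pipeline for
            // one cycle so the variable-latency instruction completes;
            // its result is bypassed to any dependent fixed-latency op.
            printf("stall: variable-latency op writes back\n");
            varDone = false;
        } else if (!fixedOps.empty()) {
            printf("fixed-latency op %d writes back\n", fixedOps.front());
            fixedOps.pop();    // fixed ops keep flowing meanwhile
        }
    }
};
```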
-
Publication number: 20250117226
Abstract: Disclosed techniques enable processors that are capable of performing a wide range of vector operations. A processor can support multiple types of instructions. The instructions can include one or more operands, and the one or more operands can include different data types. An A-type instruction can have dependencies on a B-type instruction. An A-type instruction includes a vector instruction. A B-type instruction includes an integer instruction or a floating-point instruction. A datapath is provided to enable intermediate results from a B-type instruction to be supplied to the A-type instruction that depends on it, without utilizing register file resources, such as general-purpose register (GPR) resources. Vector instruction performance is thereby enabled without the additional resources used with GPR register access.
Type: Application
Filed: October 4, 2024
Publication date: April 10, 2025
Applicant: Akeana, Inc.
Inventors: Ricardo Ramirez, Abhijit Sil
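A sketch of what such a bypass might look like at the operand-read step, with the BypassBus type and readOperand helper invented for illustration:

```cpp
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical forwarding network: a just-produced B-type (integer or
// floating-point) result rides a bypass bus to the dependent A-type
// (vector) instruction instead of going through the GPR file.
struct BypassBus {
    std::optional<std::pair<int, uint64_t>> result;  // (dest reg, value)
};

// Operand fetch for the A-type instruction: take the bypassed value when
// the producing B-type instruction just finished, else read the GPR.
uint64_t readOperand(int reg, const BypassBus& bus, const uint64_t gpr[32]) {
    if (bus.result && bus.result->first == reg)
        return bus.result->second;   // intermediate result, no GPR port used
    return gpr[reg];
}
```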
-
Publication number: 20250021336
Abstract: A processor core includes a local cache hierarchy, prefetch logic, and a prefetch table, where the processor core is coupled to an external memory system. A data stream is detected, where the data stream includes multiple load instructions, including a load instruction that causes a cache miss, resulting in prefetching. The prefetch table is initialized with information pertaining to load instructions, and includes a Positive or Negative value (PON), a stride, and a saturation count. Information in the prefetch table is updated as new load instructions are prefetched. An underlying stride of the data stream is discovered, based on the updating. Data is prefetched using an offset, where a polarity of the offset is based on the PON, enabling effective stride detection with dynamic directionality and out-of-order instructions.
Type: Application
Filed: July 10, 2024
Publication date: January 16, 2025
Applicant: Akeana, Inc.
Inventor: Rabin Sugumar
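A guess at the update rule, expressed in C++; the saturation threshold, the PON update, and the field names are assumptions based only on the abstract:

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical prefetch-table entry, one per tracked load.
struct PrefetchEntry {
    uint64_t lastAddr = 0;
    int64_t  stride   = 0;   // candidate stride between accesses
    int      pon      = 0;   // Positive or Negative direction value
    int      satCount = 0;   // confidence; saturates at a threshold
};

constexpr int kSatThreshold = 3;  // illustrative value

// Update on each tracked load; once the count saturates, prefetch with
// an offset whose polarity follows the PON, so descending streams and
// out-of-order arrivals still converge on the underlying stride.
void onLoad(PrefetchEntry& e, uint64_t addr, void (*prefetch)(uint64_t)) {
    int64_t delta = (int64_t)(addr - e.lastAddr);
    if (delta == e.stride && delta != 0) {
        if (e.satCount < kSatThreshold) ++e.satCount;   // stride confirmed
    } else {
        e.stride = delta;                               // new candidate
        e.satCount = 0;
    }
    e.pon += (delta >= 0) ? 1 : -1;                     // track direction
    e.lastAddr = addr;
    if (e.satCount >= kSatThreshold) {
        int64_t off = std::abs(e.stride);
        prefetch(addr + (e.pon >= 0 ? off : -off));     // PON sets polarity
    }
}
```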
-
Publication number: 20240419551
Abstract: Disclosed embodiments provide techniques for enhancing security of a processor. Multiple consistency units are distributed within a processor core. Instructions are executed in an architecturally defined mode. The architecturally defined mode can be based on an instruction set architecture (ISA). In response to detecting an error in at least one consistency unit, disclosed embodiments reduce the functionality of the processor core. The reduced functionality includes halting the processor core, shutting down the processor core, switching the functionality of the processor core to a safe mode, and/or other suitable actions. The consistency unit can include a program counter comparison function. The consistency unit can include a completion signal check function. The consistency unit can include an address check function. The consistency unit can include a temporal proximity check function. Disclosed embodiments provide safeguards against various environmental attacks, such as voltage and/or clock alterations.
Type: Application
Filed: May 17, 2024
Publication date: December 19, 2024
Applicant: Akeana, Inc.
Inventor: Rabin Sugumar
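As one concrete example of a consistency unit, consider the program counter comparison function: a shadow copy of the PC advanced in lockstep and compared each step. The PcConsistencyUnit type and its safe-mode action are hypothetical:

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical consistency unit: a shadow program counter advanced in
// lockstep with the core and compared every step.
struct PcConsistencyUnit {
    uint64_t shadowPc = 0;

    void step(uint64_t corePc, uint64_t nextPc) {
        if (shadowPc != corePc)
            reduceFunctionality();   // mismatch suggests a glitch (e.g.,
                                     // voltage or clock alteration): halt,
                                     // shut down, or enter a safe mode
        shadowPc = nextPc;           // advance in lockstep with the core
    }
    static void reduceFunctionality() { std::abort(); }  // stand-in action
};
```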
-
Publication number: 20240419595
Abstract: Techniques for coherency management based on coherent hierarchical cache line tracking are disclosed. A plurality of processor cores is accessed. Each processor of the plurality of processor cores includes a local cache. A hierarchical cache is coupled to the plurality of processor cores. The hierarchical cache is shared among the plurality of processor cores. Coherency between the plurality of processor cores and the hierarchical cache is managed by a compute coherency block (CCB). A cache line directory is provided for the CCB. The cache line directory includes a core list field and a cache line present field. A cache line operation is detected. The cache line operation is detected by the CCB. The cache line operation is represented by an entry in the cache line directory. The cache line operation is performed, based on corresponding values of the core list field and the cache line present field.
Type: Application
Filed: June 5, 2024
Publication date: December 19, 2024
Applicant: Akeana, Inc.
Inventor: Sanjay Patel
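A compact model of the directory entry and one operation that consumes both fields; the types and the onWrite policy are assumptions, not the patent's actual logic:

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical CCB directory entry for one tracked cache line.
struct DirEntry {
    uint32_t coreList = 0;       // bit per core holding a local copy
    bool linePresent  = false;   // line also in the hierarchical cache
};

struct CacheLineDirectory {
    std::unordered_map<uint64_t, DirEntry> dir;  // keyed by line address

    // Example operation: on a write from `core`, the core list field
    // tells the CCB exactly which local caches to invalidate, and the
    // cache line present field says whether the shared hierarchical
    // cache must be updated as well.
    void onWrite(uint64_t addr, unsigned core) {
        DirEntry& e = dir[addr];
        uint32_t others = e.coreList & ~(1u << core);
        if (others)        { /* invalidate local copies in `others` */ }
        if (e.linePresent) { /* update/invalidate the shared copy   */ }
        e.coreList = 1u << core;   // writer becomes the sole holder
    }
};
```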
-
Publication number: 20240419599
Abstract: Disclosed embodiments provide techniques for direct cache transfer with shared cache. A system on a chip (SOC) is accessed. The SOC includes a plurality of coherent request nodes and a home node. The home node includes a directory-based snoop filter (DSF). A request node requests ownership of a coherent cache line within the SOC. The requesting includes an address associated with the coherent cache line. The home node detects that the coherent cache line is shared with one or more other request nodes. The home node determines a current owner of the coherent cache line. The home node sends an invalidating snoop instruction to the one or more other request nodes and transmits a forwarding snoop instruction. The forwarding snoop instruction establishes a direct cache transfer between the request node and the current owner of the coherent cache line.
Type: Application
Filed: June 14, 2024
Publication date: December 19, 2024
Applicant: Akeana, Inc.
Inventor: Madhavi Kondapaneni
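The message flow can be sketched directly from the abstract; node ids, the DsfEntry layout, and sendSnoop are invented stand-ins:

```cpp
#include <cstdio>
#include <vector>

enum class Snoop { Invalidate, ForwardTo };

// Hypothetical home-node handling of an ownership request, using its
// directory-based snoop filter (DSF) to find the sharers and the owner.
struct HomeNode {
    struct DsfEntry { std::vector<int> sharers; int owner; };

    void requestOwnership(int requester, const DsfEntry& line) {
        // Invalidating snoop to every other sharer of the line.
        for (int node : line.sharers)
            if (node != requester && node != line.owner)
                sendSnoop(node, Snoop::Invalidate, requester);
        // Forwarding snoop: the current owner sends the line straight to
        // the requester (direct cache transfer), not back via the home.
        sendSnoop(line.owner, Snoop::ForwardTo, requester);
    }
    void sendSnoop(int node, Snoop op, int target) {
        printf("snoop %s -> node %d (target %d)\n",
               op == Snoop::Invalidate ? "invalidate" : "forward",
               node, target);
    }
};
```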
-
Publication number: 20240272905
Abstract: Techniques for processor request arbitration using access request dynamic multilevel arbitration are disclosed. A plurality of processor cores is accessed. The plurality of processor cores is coupled to a memory subsystem. A plurality of access requests is generated within the processor cores coupled to the memory subsystem. The plurality of access requests is generated by the plurality of processor cores. Multiple access requests are made in a single processor cycle. Only one access request is serviced in a single processor cycle. A set of at least two criteria is associated with each access request in the plurality of access requests; the criteria are dynamically assigned. The requests are organized in two vectors and a stack. The vectors are organized as linear vectors. The stack is organized as a push-pop stack. A request is granted, based on data in the two vectors and the stack.
Type: Application
Filed: February 9, 2024
Publication date: August 15, 2024
Applicant: Akeana, Inc.
Inventor: Sanjay Patel
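The abstract leaves the vector and stack semantics open; one hedged interpretation, with the stack holding requesters to favor, is sketched below. Every name here is hypothetical:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical dynamic multilevel arbiter: two linear vectors hold the
// pending requests and a per-request priority criterion; a push-pop
// stack records requester ids to favor (e.g., recently preempted ones).
struct Arbiter {
    std::vector<bool> pending;   // one slot per requester
    std::vector<int>  priority;  // dynamically assigned criterion
    std::vector<int>  stack;     // push-pop stack of requester ids

    int grant() {                // only one grant per processor cycle
        while (!stack.empty()) { // stacked requesters win first
            int id = stack.back();
            stack.pop_back();
            if (pending[id]) return id;
        }
        int best = -1;           // otherwise, highest dynamic priority
        for (size_t i = 0; i < pending.size(); ++i)
            if (pending[i] && (best < 0 || priority[i] > priority[best]))
                best = (int)i;
        return best;             // -1 if nothing is pending
    }
};
```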
-
Publication number: 20240241830
Abstract: Techniques for cache management using memory queues are disclosed. A plurality of processor cores is accessed. The plurality of processor cores comprises a coherency domain. Two or more processor cores within the plurality of processor cores generate read operations for a common memory structure coupled to the plurality of processor cores. Coherency for the coherency domain is managed using a compute coherency block (CCB). The CCB includes a memory queue for controlling transfer of cache lines determined by the CCB. The memory queue includes an evict queue and a miss queue. Snoop requests are generated by the CCB. The snoop requests correspond to entries in the memory queue. Cache lines are transferred between the CCB and a bus interface unit. The transferring is controlled by the memory queue. The bus interface unit controls memory accesses.
Type: Application
Filed: January 16, 2024
Publication date: July 18, 2024
Applicant: Akeana, Inc.
Inventors: Sanjay Patel, Yogesh Shamkant Thombre
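A rough shape for the split queues and the snoop-gated handoff to the bus interface unit; the snoopDone flag and transferReady are assumptions about unstated details:

```cpp
#include <cstdint>
#include <deque>

// Hypothetical CCB memory queue: misses and evictions queue separately,
// and entries can require a snoop before the bus transfer is allowed.
struct MemoryQueue {
    struct Entry { uint64_t addr; bool snoopDone; };
    std::deque<Entry> missQueue;    // lines to fetch via the bus unit
    std::deque<Entry> evictQueue;   // dirty lines to write back

    // The CCB generates snoop requests corresponding to queue entries;
    // only snoop-resolved entries are handed to the bus interface unit,
    // so the memory queue controls the cache line transfers.
    void transferReady(void (*toBus)(uint64_t)) {
        for (auto* q : { &missQueue, &evictQueue }) {
            while (!q->empty() && q->front().snoopDone) {
                toBus(q->front().addr);
                q->pop_front();
            }
        }
    }
};
```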
-
Publication number: 20240220416
Abstract: Techniques for address translation are disclosed. A processor core is accessed. The processor core includes a memory management unit (MMU) and a unified translation lookaside buffer (TLB) within the MMU. The TLB is configured to support a plurality of page sizes, and the processor core is coupled to an external memory system. The TLB receives a lookup request for a virtual memory address, wherein the virtual memory address corresponds to a process running on the processor core. The TLB accesses a linked list that comprises a page size priority order for the plurality of page sizes. A lookup is performed in the TLB on the virtual memory address, and the lookup is conducted in the page size priority order. The linked list is updated; the updating moves a page size associated with the lookup to a location in the linked list, and a physical address is returned.
Type: Application
Filed: December 28, 2023
Publication date: July 4, 2024
Applicant: Akeana, Inc.
Inventor: Abbas Rashid
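The lookup loop and list update translate almost directly into code. The page-size set, the move-to-front policy, and the probe stub are illustrative choices, since the abstract only says the hitting size moves to "a location" in the list:

```cpp
#include <cstdint>
#include <list>
#include <optional>

enum class PageSize { K4, M2, G1 };   // illustrative subset of sizes

// Hypothetical unified-TLB lookup: probe page sizes in the order given
// by a linked list, then move the hitting size toward the front so the
// most common size for the running process is tried first next time.
struct UnifiedTlb {
    std::list<PageSize> priority{PageSize::K4, PageSize::M2, PageSize::G1};

    std::optional<uint64_t> lookup(uint64_t vaddr) {
        for (auto it = priority.begin(); it != priority.end(); ++it) {
            if (auto pa = probe(vaddr, *it)) {
                priority.splice(priority.begin(), priority, it);  // update
                return pa;            // physical address returned
            }
        }
        return std::nullopt;          // miss: walk page tables via the MMU
    }
    std::optional<uint64_t> probe(uint64_t vaddr, PageSize sz) {
        (void)vaddr; (void)sz;
        return std::nullopt;          // stand-in for the real CAM probe
    }
};
```

Move-to-front is a natural policy here because a process tends to use one dominant page size, so the common case hits on the first probe.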
-
Publication number: 20240220267
Abstract: Techniques for providing a return address stack with branch mispredict recovery are disclosed. A processor core is accessed. The processor core includes a return address stack (RAS), a local cache hierarchy, and branch prediction logic. RAS state information, including a write pointer, a read pointer, and a RAS count, is sent to a branch execution unit. One or more call instructions are detected in an instruction stream. The detecting generates a predicted return address for each of the one or more call instructions; each predicted return address is pushed onto the RAS. The pushing is directed by the write pointer. One or more return instructions are recognized in the instruction stream. The write pointer and the read pointer for the RAS are updated, based on information from the branch execution unit. The predicted return address for each of the one or more return instructions is popped from the RAS.
Type: Application
Filed: December 29, 2023
Publication date: July 4, 2024
Applicant: Akeana, Inc.
Inventors: James Youngsae Cho, Rabin Sugumar
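A circular-buffer RAS with the three exposed state values; the depth and the exact pointer arithmetic are assumptions:

```cpp
#include <cstdint>

// Hypothetical circular RAS whose pointers and count are visible to the
// branch execution unit, enabling snapshot-based mispredict recovery.
constexpr int kDepth = 16;

struct Ras {
    uint64_t stack[kDepth] = {};
    int wr = 0, rd = 0, count = 0;   // state sent to branch execution

    void push(uint64_t retAddr) {    // on a detected call instruction
        stack[wr] = retAddr;         // pushing directed by write pointer
        rd = wr;
        wr = (wr + 1) % kDepth;
        if (count < kDepth) ++count;
    }
    uint64_t pop() {                 // on a recognized return instruction
        uint64_t addr = stack[rd];   // predicted return address
        wr = rd;                     // slot is reused by the next push
        rd = (rd + kDepth - 1) % kDepth;
        --count;
        return addr;
    }
    // Recovery: the branch execution unit sends back the snapshot it was
    // given, rolling the pointers past any wrong-path pushes and pops.
    void restore(int w, int r, int c) { wr = w; rd = r; count = c; }
};
```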
-
Publication number: 20240220412
Abstract: Techniques for coherency management using distributed snoop requests are disclosed. A plurality of processor cores is accessed. The plurality of processor cores comprises a coherency domain. Two or more processor cores within the plurality of processor cores generate read operations for a shared memory structure coupled to the plurality of processor cores. Snoop requests are ordered in a two-dimensional matrix. The snoop requests are based on physical addresses for the shared memory structure. The two-dimensional matrix is extensible along each axis of the two-dimensional matrix. Snoop responses are mapped to a first-in first-out (FIFO) mapping queue. Each snoop response corresponds to a snoop request. Each processor core of the plurality of processor cores is coupled to at least one FIFO mapping queue. A memory access operation is completed, based on a comparison of the snoop requests and the snoop responses.
Type: Application
Filed: December 29, 2023
Publication date: July 4, 2024
Applicant: Akeana, Inc.
Inventor: Sanjay Patel
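A loose sketch of the bookkeeping; the address hash, the growth policy, and the id-matching completion check are all guesses at details the abstract does not state:

```cpp
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical distributed snoop bookkeeping. Requests live in a 2D
// matrix (row = physical-address hash, column = issue order) that can
// grow along either axis; each core's responses drain through its own
// FIFO mapping queue in request order.
struct SnoopFabric {
    std::vector<std::vector<uint64_t>> matrix;   // snoop request ids
    std::vector<std::queue<uint64_t>> coreFifo;  // one FIFO per core

    void request(uint64_t physAddr, uint64_t id) {
        size_t row = (physAddr >> 6) & 7;              // toy address hash
        if (matrix.size() <= row) matrix.resize(row + 1);  // extensible
        matrix[row].push_back(id);                     // grows along x
    }
    void response(size_t core, uint64_t id) {
        if (coreFifo.size() <= core) coreFifo.resize(core + 1);
        coreFifo[core].push(id);                       // preserves order
    }
    // A memory access completes only when the response at the head of
    // the FIFO matches the snoop request it corresponds to.
    bool complete(size_t core, uint64_t id) {
        auto& q = coreFifo[core];
        if (q.empty() || q.front() != id) return false;
        q.pop();
        return true;
    }
};
```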
-
Publication number: 20240211259
Abstract: Disclosed embodiments provide techniques for data prefetching. A processor core is accessed. The processor core includes prefetch logic and a local cache hierarchy and is coupled to a memory system. A stride of a data stream is detected. The data stream comprises two or more load instructions that cause two or more misses in the local cache hierarchy. Information about the data stream is accumulated. The information includes a stride count. Prefetch operations to the memory system are generated, based on the information. The prefetch operations include prefetch addresses. A rate of the prefetch operations is limited, based on the stride count. Based on the stride count, the prefetcher can enter a saturation state. The saturation state keeps the cache supplied with prefetched data. A number of stride prefetch operations is based on the stride of the data stream. The number is stored in a software-updatable configuration register array.
Type: Application
Filed: December 27, 2023
Publication date: June 27, 2024
Applicant: Akeana, Inc.
Inventors: James Youngsae Cho, Rabin Sugumar
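One way the stride count could gate prefetch depth, with the thresholds and the register-array mapping invented for illustration:

```cpp
#include <cstdint>

// Hypothetical rate limiter: the stride count gates how many prefetches
// may be outstanding; past a threshold the stream is "saturated" and
// runs at the deepest configured rate to keep the cache supplied.
struct StridePrefetcher {
    int strideCount = 0;                      // confirmed strides so far
    int inFlight    = 0;                      // outstanding prefetches
    int cfgDepth[4] = {1, 2, 4, 8};           // software-updatable
                                              // configuration registers

    int allowedDepth() const {                // rate limit by confidence
        if (strideCount >= 8) return cfgDepth[3];  // saturation state
        if (strideCount >= 4) return cfgDepth[2];
        if (strideCount >= 2) return cfgDepth[1];
        return cfgDepth[0];
    }
    void maybePrefetch(uint64_t addr, int64_t stride,
                       void (*issue)(uint64_t)) {
        while (inFlight < allowedDepth()) {
            addr += stride;                   // next prefetch address
            issue(addr);
            ++inFlight;
        }
    }
};
```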
-
Publication number: 20240211366
Abstract: Techniques for processor performance profiling using agents are disclosed. A processor core is accessed. The processor core includes a performance counter, a performance counter storage area, and a performance counter control register. The processor core includes a performance monitoring interface. The performance counter, performance counter storage area, and performance counter control register are assigned to an external profiling agent, which loads the performance counter and the performance counter control register. The loading is based on a particular event in the processor core. A program state is saved to the storage area, based on a counter event in the performance counter and an enable bit in the performance counter control register being set. The program state that is saved corresponds to code being executed on the processor core. The program state is read, from the storage area, by the external profiling agent.
Type: Application
Filed: December 20, 2023
Publication date: June 27, 2024
Applicant: Akeana, Inc.
Inventor: Rabin Sugumar
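A register-level caricature of the interface; the field layouts and the enable-bit position are assumptions:

```cpp
#include <cstdint>

// Hypothetical view of the performance monitoring interface.
struct PerfMonitor {
    uint64_t counter = 0;          // performance counter
    uint64_t control = 0;          // control register; bit 0 = enable bit
    uint64_t savedState[2] = {};   // performance counter storage area

    // The external profiling agent loads the counter and the control
    // register, triggered by some particular event in the core.
    void agentArm(uint64_t initial) {
        counter = initial;         // e.g., overflow after N events
        control |= 1;              // set the enable bit
    }
    // The core saves program state for the code it is executing when a
    // counter event fires while the enable bit is set.
    void onCounterEvent(uint64_t pc, uint64_t extra) {
        if (control & 1) { savedState[0] = pc; savedState[1] = extra; }
    }
    // The agent reads the program state back from the storage area.
    uint64_t agentReadPc() const { return savedState[0]; }
};
```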
-
Publication number: 20240192961
Abstract: Techniques for processor instruction exception handling are disclosed. A processor core is accessed. The processor core executes at least one instruction thread. The processor core executes one or more instructions out of order. An ordered list of instructions is maintained. The ordered list is based on instructions that are presented to the processor core for execution. The ordered list is organized using one or more pointers. An execution exception is detected in the processor core. The execution exception corresponds to one of the instructions in the ordered list. The execution exception requires initiating an exception handling routine. An effective age of an instruction in the ordered list is determined. The effective age corresponds to the execution exception. The exception handling routine is initiated, based on matching the effective age of an instruction in the ordered list with one of the one or more pointers.
Type: Application
Filed: December 6, 2023
Publication date: June 13, 2024
Applicant: Akeana, Inc.
Inventors: Ricardo Ramirez, Rabin Sugumar
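A minimal model of the age-matching check; the single retire pointer here stands in for whatever set of pointers the design actually keeps:

```cpp
#include <cstdint>
#include <deque>

// Hypothetical ordered instruction list: entries carry an effective age
// (position in program order) even though execution is out of order.
struct OrderedList {
    struct Entry { uint64_t pc; uint64_t age; bool excepted; };
    std::deque<Entry> list;   // program order, managed by pointers
    uint64_t retirePtr = 0;   // age of the next instruction to retire

    // An execution exception is recorded against the entry's effective
    // age; the handling routine is initiated only when that age matches
    // the pointer, so exceptions raised by younger, possibly wrong-path
    // instructions never fire early.
    bool shouldInitiateHandler() const {
        return !list.empty()
            && list.front().age == retirePtr
            && list.front().excepted;
    }
};
```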
-
Publication number: 20240192958
Abstract: Disclosed embodiments provide techniques for branch prediction. A processor core is accessed. The processor core is coupled to memory and includes branch prediction circuitry. The branch prediction circuitry includes a branch target buffer (BTB) and an indirect branch target buffer (BTBI). A hashed program counter within the processor core is read. The BTB and BTBI are searched. The searching the BTB is accomplished with the hashed program counter and the searching the BTBI is accomplished with the hashed program counter and branch history information. A predicted branch target address within the BTBI or the BTB is matched. The matching within the BTBI is based on an indirect branch instruction, and the matching within the BTB is based on other branch instruction types. The predicted branch target address that was matched is predicted taken. The processor core is directed to fetch a next instruction from the predicted branch target address.
Type: Application
Filed: December 11, 2023
Publication date: June 13, 2024
Applicant: Akeana, Inc.
Inventors: James Youngsae Cho, Chandramouli Banerjee, Rabin Sugumar
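The two-table lookup can be sketched with hash maps standing in for the hardware arrays; the hash function and history fold are toy versions of whatever the design uses:

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>

// Hypothetical predictor front end: the BTB is indexed by the hashed
// program counter alone, while the BTBI folds branch history into the
// index so one indirect branch can map to different targets.
struct BranchPredictor {
    std::unordered_map<uint32_t, uint64_t> btb;   // other branch types
    std::unordered_map<uint32_t, uint64_t> btbi;  // indirect branches
    uint32_t history = 0;                         // branch history bits

    std::optional<uint64_t> predict(uint64_t pc, bool isIndirect) {
        uint32_t hashedPc = (uint32_t)(pc ^ (pc >> 13));  // toy hash
        auto& table = isIndirect ? btbi : btb;
        uint32_t idx = isIndirect ? (hashedPc ^ history) : hashedPc;
        if (auto it = table.find(idx); it != table.end())
            return it->second;  // predicted taken: fetch from this target
        return std::nullopt;    // no match; fall through sequentially
    }
};
```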
-
Publication number: 20240168882
Abstract: Techniques for processor and network-on-chip coherency management are disclosed. A plurality of processor cores is accessed. Each processor of the plurality of processor cores accesses a common memory through a coherent network-on-chip. The coherent network-on-chip maintains global coherency. A local cache is coupled to a grouping of two or more processor cores. The local cache is shared among the two or more processor cores. The grouping of two or more processor cores and the shared local cache operate using local coherency. The local coherency is distinct from the global coherency. A cache maintenance operation is performed in the grouping of two or more processor cores and the shared local cache. The cache maintenance operation generates cache coherency transactions between the global coherency and the local coherency. The cache coherency transactions enable coherency among the plurality of processor cores, local caches, and the memory.
Type: Application
Filed: November 21, 2023
Publication date: May 23, 2024
Applicant: Akeana, Inc.
Inventors: Sanjay Patel, Hai Ngoc Nguyen
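A thin sketch of the local/global boundary; the transaction record and the maintenance-operation flow are assumptions about structure the abstract only implies:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical cluster: a group of cores keeps local coherency for its
// shared local cache; a cache maintenance operation also emits a
// transaction onto the coherent NoC so the global coherency domain
// observes the same state change.
struct Cluster {
    struct Txn { uint64_t addr; bool writeback; };
    std::vector<Txn> nocOut;   // transactions toward global coherency

    // Example: clean+invalidate a line in the cluster's shared cache.
    void cacheMaintenance(uint64_t addr, bool dirty) {
        // Local step: invalidate the line in each core and the shared
        // local cache (local coherency only; not modeled here).
        // Global step: tell the NoC so other clusters and the common
        // memory stay coherent with this cluster.
        nocOut.push_back({addr, /*writeback=*/dirty});
    }
};
```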