Patents Assigned to Akeana, Inc.
-
Patent number: 12639254
Abstract: Techniques for sharing processor data within a network are disclosed. A system-on-a-chip (SOC) is accessed. The SOC includes a network-on-a-chip (NOC), which comprises an M×N mesh topology. The mesh includes a coherent tile at each mesh point. Each tile includes local snoop vectors (LSVs). A first coherent tile initiates a snoop operation. The tile generates a snoop vector that indicates other tiles to be notified of the snoop operation. The first coherent tile creates directional snoop vectors (DSVs). The creating logically combines the snoop vector with each of the LSVs. A coherent tile adjacent to the first coherent tile is selected. The adjacent tile is located in a cardinal direction from the first tile. A first DSV is chosen based on the cardinal direction. The first tile sends the snoop operation and the chosen first DSV to the selected adjacent tile.
Type: Grant
Filed: November 6, 2024
Date of Patent: May 26, 2026
Assignee: Akeana, Inc.
Inventors: Madhavi Kondapaneni, Aqdas Javaid, Ayesha Zahid
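The directional snoop vector step in the abstract above can be illustrated with a minimal sketch. It assumes that "logically combines" means a bitwise AND, that each tile holds one LSV bitmask per cardinal direction, and that the mesh is 3×3 with a simple row-based partition of tiles among directions. None of these details come from the patent; every name below is hypothetical.

```python
# Hypothetical 3x3 mesh; tile id = row * COLS + col (assumption, not from the patent).
ROWS, COLS = 3, 3


def tile_bit(row, col):
    """Bitmask with the single bit for the tile at (row, col) set."""
    return 1 << (row * COLS + col)


def local_snoop_vectors(row, col):
    """Assumed LSV layout: one bitmask per cardinal direction.

    Tiles in the same row go east or west; tiles in other rows go
    north (smaller row) or south (larger row). This partition is an
    illustrative assumption.
    """
    lsv = {"N": 0, "S": 0, "E": 0, "W": 0}
    for r in range(ROWS):
        for c in range(COLS):
            if (r, c) == (row, col):
                continue
            if r < row:
                lsv["N"] |= tile_bit(r, c)
            elif r > row:
                lsv["S"] |= tile_bit(r, c)
            elif c > col:
                lsv["E"] |= tile_bit(r, c)
            else:
                lsv["W"] |= tile_bit(r, c)
    return lsv


def directional_snoop_vectors(snoop_vector, lsv):
    """Combine the snoop vector with each LSV (bitwise AND assumed)."""
    return {direction: snoop_vector & mask for direction, mask in lsv.items()}


# Tile (1, 1) snoops three other tiles.
targets = tile_bit(0, 0) | tile_bit(1, 2) | tile_bit(2, 1)
dsv = directional_snoop_vectors(targets, local_snoop_vectors(1, 1))
```

Under these assumptions, the DSV sent to the northern neighbor names only the tiles that neighbor is responsible for reaching, so each target is notified along exactly one path.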
-
Publication number: 20260119176
Abstract: Disclosed embodiments provide techniques for improved performance in processing vector instructions. A processor core is accessed. The processor core is coupled to a memory hierarchy, and the processor core includes one or more vector execution units (VUs), and one or more load store units (LSUs). The processor core includes a vector register file (VRF). The VRF includes multiple vector registers, and each vector register includes multiple vector elements. Vector elements that have a source or destination in contiguous memory are identified. Load store units (LSUs) take advantage of the contiguous memory condition by executing a vector load or vector store operation as a single memory access, requiring a reduced number of clock cycles. The single memory access satisfies each memory operation for each vector element within the vector register file.
Type: Application
Filed: October 25, 2024
Publication date: April 30, 2026
Applicant: Akeana, Inc.
Inventors: Hai Ngoc Nguyen, Rabin Sugumar
-
Patent number: 12596647
Abstract: Techniques for cache management based on cache management using memory queues are disclosed. A plurality of processor cores is accessed. The plurality of processor cores comprises a coherency domain. Two or more processor cores within the plurality of processor cores generate read operations for a common memory structure coupled to the plurality of processor cores. Coherency for the coherency domain is managed using a compute coherency block (CCB). The CCB includes a memory queue for controlling transfer of cache lines determined by the CCB. The memory queue includes an evict queue and a miss queue. Snoop requests are generated by the CCB. The snoop requests correspond to entries in the memory queue. Cache lines are transferred between the CCB and a bus interface unit. The transferring is controlled by the memory queue. The bus interface unit controls memory accesses.
Type: Grant
Filed: January 16, 2024
Date of Patent: April 7, 2026
Assignee: Akeana, Inc.
Inventors: Sanjay Patel, Yogesh Shamkant Thombre
-
Publication number: 20260093489
Abstract: A processor core is coupled to a memory hierarchy. The processor core is configured to execute vector floating-point instructions and micro-operations. A vector floating-point instruction is decoded. The decoding includes replacing the vector floating-point instruction with one or more vector floating-point micro-operations (VFPMs). A reorder buffer assigns a reorder buffer ID (ROBID) to each of the one or more VFPMs, in which the assigning includes a micro-sequencer ID (MSID). The processor core executes the one or more VFPMs. The executing includes requiring, by a first VFPM within the one or more VFPMs, a first update to an architectural floating-point flag. The architectural floating-point flag is set, based on the first update. The setting occurs after the one or more VFPMs have been committed by the processor core. A temporary floating-point flag is revised. The revising is based on the first update.
Type: Application
Filed: November 12, 2025
Publication date: April 2, 2026
Applicant: Akeana, Inc.
Inventors: Abhijit Sil, Ricardo Ramirez
-
Publication number: 20260086733
Abstract: A system-on-chip (SoC) is accessed. The SoC includes a mesh network and one or more coherency ordering agents (COAs). The COAs coordinate coherency for one or more processors coupled to the mesh network. The COAs are coupled to one or more communication converters (CCs) by the mesh network. A processor sends a request to a target device. The request is based on a first communications protocol and includes a memory address. The request is sent by a COA to a CC. A request queue within the CC stores the request. The request is checked against one or more additional requests. The CC translates the request, resulting in a converted request, based on a second communications protocol. The translating is based on the checking. The CC transmits the converted request to the target device.
Type: Application
Filed: September 25, 2025
Publication date: March 26, 2026
Applicant: Akeana, Inc.
Inventors: Ali Shair Khan, Madhavi Kondapaneni
-
Patent number: 12578967
Abstract: Disclosed embodiments provide techniques for prefetching. A processor core that executes instructions out of order (OOO) is accessed. The processor core includes a local cache hierarchy, data prefetch logic, and a prefetch table and is coupled to an external memory system. A first load instruction with a first address is detected and causes a miss in the local cache hierarchy. Information pertaining to the first load instruction is saved in an entry of the prefetch table. The information includes the first address, a confidence count, and an out-of-order mask. A second load instruction with a second address is identified. The information is updated based on the detecting. The information is advanced. The second address is the next sequential address after the first address. The advancing is based on the detecting. One or more data prefetch instructions are issued to the second address plus an offset.
Type: Grant
Filed: May 1, 2024
Date of Patent: March 17, 2026
Assignee: Akeana, Inc.
Inventor: Rabin Sugumar
-
Publication number: 20260064619
Abstract: Disclosed embodiments provide techniques for communication. A system-on-a-chip (SoC) is accessed. The SoC includes a mesh network that includes a plurality of nodes. At least one node within the plurality of nodes includes a quality-of-service (QoS) agent. Network traffic data is collected by a first QoS agent within a first node. The network traffic data is associated with the first node and the traffic occurs during a first timing window. The first QoS agent receives a request by a primary device within the first node to send data to a secondary device in a second node. The first QoS agent analyzes the network traffic data. A first routing agent within the first node selects an intermediate node within the plurality of nodes, based on the analyzing. The primary device sends the data to the intermediate node.
Type: Application
Filed: August 29, 2025
Publication date: March 5, 2026
Applicant: Akeana, Inc.
Inventor: Yogesh Shamkant Thombre
-
Publication number: 20260064421
Abstract: A processor core is accessed. The processor core supports atomic memory operations. The atomic memory operations include multi-operand operations. A compare and swap (CAS) instruction is issued in the processor core. The CAS instruction necessitates three source operands. One of the source operands comprises a destination register. The CAS instruction is split into a plurality of micro-operations. A first value is written from a memory location indicated by a first source operand into a temporary register. A memory word location addressed by a second source operand is accessed using a second micro-operation. The first micro-operation and the second micro-operation are interlocked. Contents of the memory word location are compared. A third source operand is stored to the memory word location addressed by the second source operand. The storing is based on a match of the comparing.
Type: Application
Filed: August 27, 2025
Publication date: March 5, 2026
Applicant: Akeana, Inc.
Inventors: Ricardo Ramirez, Abhijit Sil
-
Publication number: 20260064600
Abstract: A processor core is accessed. The processor core supports virtual memory addressing. The processor core includes a memory management unit (MMU) and a load store unit (LSU). A page table walk is performed by the MMU. The page table walk is responsive to a memory operation. The page table walk identifies a page table entry (PTE) for a virtual to physical address translation. The PTE is read. The reading obtains a first value from the PTE and includes determining, by the MMU, to update one or more status bits within the PTE. The PTE is re-read. The re-reading obtains a second value from the PTE. The PTE is updated to include the one or more status bits, based on a match between the first and second value. The updated PTE is stored in a page table. The re-reading, the updating, and the storing are performed atomically.
Type: Application
Filed: August 29, 2025
Publication date: March 5, 2026
Applicant: Akeana, Inc.
Inventors: Ricardo Ramirez, Sundeep Chadha, Hai Ngoc Nguyen, Abbas Rashid
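The read, re-read, and conditional update sequence in the abstract above resembles a compare-and-swap on the PTE. The sketch below models that idea in software only. The lock stands in for whatever hardware mechanism provides atomicity, the bit positions are hypothetical, and returning `False` on a mismatch (so the caller can retry) is an assumption the abstract does not state.

```python
import threading

# Hypothetical status bit positions, chosen only for illustration.
ACCESSED = 1 << 6
DIRTY = 1 << 7


class PageTable:
    """Toy page table: index -> integer PTE value."""

    def __init__(self):
        self._entries = {}
        # The lock models the atomic re-read / update / store step.
        self._lock = threading.Lock()

    def write(self, index, value):
        with self._lock:
            self._entries[index] = value

    def read(self, index):
        """First read of the PTE during the page table walk."""
        return self._entries[index]

    def update_status(self, index, first_value, status_bits):
        """Re-read the PTE and set status bits only if it is unchanged."""
        with self._lock:
            second_value = self._entries[index]
            if second_value != first_value:
                return False  # PTE changed since the first read (assumed: caller retries)
            self._entries[index] = second_value | status_bits
            return True


table = PageTable()
table.write(0, 0x1001)
first = table.read(0)
updated = table.update_status(0, first, ACCESSED)
# A second attempt with the now-stale first value is rejected.
stale_rejected = not table.update_status(0, first, DIRTY)
```

The point of the comparison is that the status bits are merged into the PTE only when no other agent modified the entry between the two reads.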
-
Publication number: 20260056740
Abstract: A processor core is accessed. The processor core is configured to execute vector instructions, scalar instructions, and micro-operations. A vector memory instruction is decoded. The vector memory instruction is associated with a memory addressing mode. The decoding includes replacing the vector memory instruction with one or more vector memory micro-operations (VMMOs). The one or more VMMOs are substituted with one or more vector memory element micro-operations (VMEMOs). The substituting is based on the memory addressing mode. At least one VMEMO within the one or more VMEMOs is forwarded to a memory queue within a plurality of memory queues. A memory operation is issued to a load-store unit within the processor core. The issuing includes selecting, from the plurality of memory queues, the memory operation. The replacing is based on a micro-operation sequencer. One or more destination registers for the vector memory instruction are determined.
Type: Application
Filed: October 30, 2025
Publication date: February 26, 2026
Applicant: Akeana, Inc.
Inventors: Hai Ngoc Nguyen, Abhijit Sil
-
Patent number: 12554503
Abstract: Disclosed embodiments provide techniques for instruction execution with a processor pipeline for data transfer operations. A processor core is accessed. The processor core executes one or more instructions out of order. The processor core supports integer operations and floating-point operations. An instruction in the processor core is decoded. The instruction is a data transfer operation. The data transfer operation necessitates a floating-point operation and an integer operation. The floating-point operation and the integer operation are dispatched to one or more issue queues. The floating-point operation and the integer operation are interlocked. The interlocking is accomplished using at least one entry in the one or more issue queues. A first operation of the floating-point operation and the integer operation is executed. A second operation of the floating-point operation and the integer operation is executed. The execution of the second operation is based on the interlocking.
Type: Grant
Filed: April 26, 2024
Date of Patent: February 17, 2026
Assignee: Akeana, Inc.
Inventors: Ricardo Ramirez, Albert Anthony Martin, Abhijit Sil, Rabin Sugumar
-
Publication number: 20260044348
Abstract: Disclosed techniques enable vector instruction processing. A processor core is accessed. The processor core is coupled to a memory hierarchy, and is configured to execute vector operations, scalar operations, and micro-operations. A decode unit decodes a vector memory operation. The vector memory operation is associated with a unit stride addressing mode. The decoding includes dividing the vector memory operation into one or more vector memory micro-operations. A dispatch unit sends at least one vector micro-operation within the one or more vector micro-operations to a scalar request queue within a plurality of request queues. The at least one vector micro-operation is issued to a load-store unit within the processor core. The issuing includes selecting, from the plurality of request queues, the at least one vector memory micro-operation.
Type: Application
Filed: September 29, 2025
Publication date: February 12, 2026
Applicant: Akeana, Inc.
Inventors: Hai Ngoc Nguyen, Abhijit Sil, Rabin Sugumar
-
Publication number: 20260044339
Abstract: A processor core is coupled to a memory hierarchy. The processor core is configured to execute vector instructions, scalar instructions, and micro-operations. A dispatch unit within the processor core receives a vector memory operation. The dispatch unit sends the vector memory operation to a first vector input queue of multiple vector input queues. The sending is based on the memory addressing mode. A micro-operation sequencer splits the vector memory operation into one or more memory micro-operations, which includes forwarding each micro-operation within the one or more micro-operations to a first memory queue within multiple memory queues. A memory operation is then issued to a load-store unit within the processor core. The issuing includes selecting, from the multiple memory queues, the memory operation. The vector memory operation comprises either a vector load operation or a vector store operation.
Type: Application
Filed: August 5, 2025
Publication date: February 12, 2026
Applicant: Akeana, Inc.
Inventors: Hai Ngoc Nguyen, Abhijit Sil
-
Patent number: 12547407
Abstract: Techniques for providing a return address stack with branch mispredict recovery are disclosed. A processor core is accessed. The processor core includes a return address stack (RAS), a local cache hierarchy, and branch prediction logic. RAS state information, including a write pointer, a read pointer, and a RAS count, is sent to a branch execution unit. One or more call instructions are detected in an instruction stream. The detecting generates a predicted return address for each of the one or more call instructions which are pushed on the RAS. The pushing is directed by the write pointer. One or more return instructions are recognized in the instruction stream. The write pointer and the read pointer for the RAS are updated, based on information from the branch execution unit. The predicted return address for each of the one or more return instructions is popped from the RAS.
Type: Grant
Filed: December 29, 2023
Date of Patent: February 10, 2026
Assignee: Akeana, Inc.
Inventors: James Youngsae Cho, Rabin Sugumar
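To make the recovery idea in the abstract above concrete, here is a minimal software sketch of a circular return address stack whose pointer state can be snapshotted and later restored. The class, its method names, and the exact pointer update rules are hypothetical; the abstract only states that a write pointer, a read pointer, and a count are sent to the branch execution unit and later used to update the RAS.

```python
class ReturnAddressStack:
    """Toy circular RAS with snapshot-based mispredict recovery (illustrative)."""

    def __init__(self, size=8):
        self.size = size
        self.entries = [0] * size
        self.write_ptr = 0          # slot the next push will fill
        self.read_ptr = size - 1    # slot the next pop will read
        self.count = 0              # number of live predictions

    def snapshot(self):
        """State assumed to travel with a branch to the branch execution unit."""
        return (self.write_ptr, self.read_ptr, self.count)

    def push(self, return_address):
        """Called when a call instruction is detected."""
        self.entries[self.write_ptr] = return_address
        self.read_ptr = self.write_ptr
        self.write_ptr = (self.write_ptr + 1) % self.size
        self.count = min(self.count + 1, self.size)

    def pop(self):
        """Called when a return instruction is recognized."""
        if self.count == 0:
            return None
        address = self.entries[self.read_ptr]
        self.write_ptr = self.read_ptr
        self.read_ptr = (self.read_ptr - 1) % self.size
        self.count -= 1
        return address

    def recover(self, state):
        """Restore pointers and count after a branch mispredict."""
        self.write_ptr, self.read_ptr, self.count = state


ras = ReturnAddressStack()
ras.push(0x100)
saved = ras.snapshot()
ras.push(0x200)          # speculative call on a mispredicted path
ras.recover(saved)       # mispredict detected: roll the pointers back
predicted = ras.pop()
```

Restoring only the pointers is cheap but does not undo entries that wrong-path pushes overwrote; how a real design handles that case is outside what this sketch models.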
-
Publication number: 20260037599
Abstract: An accelerator is accessed. The accelerator includes a weight-stationary systolic array of one or more multiply-accumulate units. The accelerator is coupled to a memory hierarchy and a processor core. The processor core sends a work request to the accelerator. The work request is based on execution of a machine learning model and an activation matrix. In response to the work request, the accelerator loads a weight matrix and the activation matrix. The loading uses the memory hierarchy. The accelerator multiplies the weight matrix by the activation matrix. The multiplication results in an answer matrix. The accelerator stores the answer matrix in the memory hierarchy. The processor core obtains the answer matrix that was stored. The machine learning model is trained. The training produces the weight matrix, which is transposed and saved to the memory hierarchy.
Type: Application
Filed: August 4, 2025
Publication date: February 5, 2026
Applicant: Akeana, Inc.
Inventors: David Cureton Baker, David St Clair Scott, Yogesh Shamkant Thombre
-
Patent number: 12517734
Abstract: Disclosed techniques enable processors that are capable of performing a wide range of vector operations. A processor can support multiple types of instructions. The instructions can include one or more operands, and the one or more operands can include different data types. An A-type instruction can have dependencies on a B-type instruction. An A-type instruction includes a vector instruction. A B-type instruction includes an integer instruction or a floating-point instruction. A datapath is provided to enable intermediate results from a B-type instruction to be supplied to the A-type instruction on which it depends, without utilizing register file resources, such as general-purpose register (GPR) register resources. Vector instruction performance is thereby enabled without the additional resources used with GPR register access.
Type: Grant
Filed: October 4, 2024
Date of Patent: January 6, 2026
Assignee: Akeana, Inc.
Inventors: Ricardo Ramirez, Abhijit Sil
-
Publication number: 20260003631
Abstract: Disclosed embodiments provide techniques for prefetching. A processor core that executes instructions out of order (OOO) is accessed. The processor core includes a local cache hierarchy, data prefetch logic, and a prefetch table and is coupled to an external memory system. A first load instruction with a first address is detected and causes a miss in the local cache hierarchy. Information pertaining to the first load instruction is saved in an entry of the prefetch table. The information includes the first address, a confidence count, and an out-of-order mask. A second load instruction with a second address is identified. The information is updated based on the detecting. The information is advanced. The second address is the next sequential address after the first address. The advancing is based on the detecting. One or more data prefetch instructions are issued to the second address plus an offset.
Type: Application
Filed: May 1, 2024
Publication date: January 1, 2026
Applicant: Akeana, Inc.
Inventor: Rabin Sugumar
-
Patent number: 12499056
Abstract: Techniques for address translation are disclosed. A processor core is accessed. The processor core includes a memory management unit (MMU) and a unified translation lookaside buffer (TLB) within the MMU. The TLB is configured to support a plurality of page sizes, and the processor core is coupled to an external memory system. The TLB receives a lookup request for a virtual memory address, wherein the virtual memory address corresponds to a process running on the processor core. The TLB accesses a linked list that comprises a page size priority order for the plurality of page sizes. A lookup is performed in the TLB on the virtual memory address, and the lookup is conducted in the page size priority order. The linked list is updated, the updating moves a page size associated with the lookup to a location in the linked list, and a physical address is returned.
Type: Grant
Filed: December 28, 2023
Date of Patent: December 16, 2025
Assignee: Akeana, Inc.
Inventor: Abbas Rashid
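The priority-ordered lookup in the abstract above can be sketched in a few lines. This is an illustration only: a Python list stands in for the linked list, the update policy is assumed to be move-to-front on a hit (the abstract says only that the page size moves "to a location"), and all names are hypothetical.

```python
class UnifiedTLB:
    """Toy unified TLB that probes page sizes in a mutable priority order."""

    def __init__(self, page_sizes):
        # A Python list stands in for the linked list of page sizes.
        self.priority = list(page_sizes)
        # (page_size, virtual page number) -> physical page base address
        self.entries = {}

    def insert(self, vaddr, paddr, page_size):
        """Install a translation for the page containing vaddr."""
        base = paddr - (paddr % page_size)
        self.entries[(page_size, vaddr // page_size)] = base

    def lookup(self, vaddr):
        """Probe each page size in priority order; return a physical address or None."""
        for size in self.priority:
            base = self.entries.get((size, vaddr // size))
            if base is not None:
                # Assumed policy: move the hitting page size to the front.
                self.priority.remove(size)
                self.priority.insert(0, size)
                return base + (vaddr % size)
        return None


KB4, MB2, GB1 = 4096, 2 * 1024 * 1024, 1 << 30
tlb = UnifiedTLB([KB4, MB2, GB1])
tlb.insert(0x40000000, 0x80000000, MB2)
physical = tlb.lookup(0x40001234)
```

With a move-to-front policy, a workload dominated by one page size tends to hit on the first probe, which is one plausible motivation for keeping the order adaptive.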
-
Publication number: 20250370932
Abstract: Techniques for data sharing are disclosed. A system-on-a-chip (SoC) is accessed. The SoC includes one or more cache coherency blocks (CCBs) and one or more coherency ordering agents (COAs). Each COA includes a directory snoop filter (DSF). Each CCB is communicatively coupled to each COA by a network-on-a-chip (NOC) interface. A CCB requests a cache line associated with a memory address. The CCB is not a sharer of the cache line. A directory snoop filter (DSF) within a COA is read. The reading reveals one or more CCB sharers of the cache line and indicates there is no CCB owner. The COA includes a coherent last level cache (LLC) that contains a valid copy of the cache line. The COA assigns ownership of the cache line to the CCB. The assigning is recorded in the DSF. The cache line is forwarded by the coherent LLC to the CCB.
Type: Application
Filed: May 29, 2025
Publication date: December 4, 2025
Applicant: Akeana, Inc.
Inventor: Madhavi Kondapaneni
-
Publication number: 20250342127
Abstract: Techniques for managing computer processors that implement speculative reads are disclosed. A circular queue is accessed. The circular queue comprises a plurality of entries and includes a head pointer and a tail pointer. The head pointer and the tail pointer move independently in a single direction within the circular queue. A software agent selects a read entry associated with a read index within the circular queue. A validity of the read entry is interpreted based on a head wrap bit, a tail wrap bit, a read index, a head index, and a tail index. The circular queue returns an invalid signal to the software agent. The read entry is not modified when the read entry is interpreted as invalid. The circular queue sends data within the read entry to the software agent. The head wrap bit is calculated to be equal to the tail wrap bit.
Type: Application
Filed: April 30, 2025
Publication date: November 6, 2025
Applicant: Akeana, Inc.
Inventor: Nagesh Suranna Kanakapura
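One way to read the validity check in the abstract above is sketched below. It assumes the head index marks the oldest live entry, the tail index marks the next free slot, and each pointer toggles its wrap bit whenever it passes the end of the queue. These conventions and all names are assumptions for illustration, not details taken from the application.

```python
def is_valid(read_index, head_index, tail_index, head_wrap, tail_wrap):
    """Decide whether read_index falls inside the live region of the queue."""
    if head_wrap == tail_wrap:
        # Tail has not lapped the end relative to head: live region is [head, tail).
        return head_index <= read_index < tail_index
    # Tail has wrapped once more than head: live region is [head, end) plus [0, tail).
    return read_index >= head_index or read_index < tail_index


class CircularQueue:
    """Toy circular queue with one wrap bit per pointer (illustrative)."""

    def __init__(self, size):
        self.size = size
        self.entries = [None] * size
        self.head = self.tail = 0
        self.head_wrap = self.tail_wrap = 0

    def enqueue(self, data):
        if self.head == self.tail and self.head_wrap != self.tail_wrap:
            return False  # queue is full
        self.entries[self.tail] = data
        self.tail += 1
        if self.tail == self.size:
            self.tail, self.tail_wrap = 0, self.tail_wrap ^ 1
        return True

    def dequeue(self):
        if self.head == self.tail and self.head_wrap == self.tail_wrap:
            return None  # queue is empty
        data = self.entries[self.head]
        self.head += 1
        if self.head == self.size:
            self.head, self.head_wrap = 0, self.head_wrap ^ 1
        return data

    def speculative_read(self, read_index):
        """Return (valid, data); an invalid read leaves the entry untouched."""
        if not is_valid(read_index, self.head, self.tail,
                        self.head_wrap, self.tail_wrap):
            return (False, None)
        return (True, self.entries[read_index])


q = CircularQueue(4)
for item in ("a", "b", "c"):
    q.enqueue(item)
q.dequeue()
q.dequeue()
q.enqueue("d")   # tail wraps to index 0 and toggles its wrap bit
q.enqueue("e")   # stored at index 0
```

Under these conventions, equal indices with equal wrap bits mean the queue is empty, while equal indices with differing wrap bits mean it is full, so every read index is valid in the latter case.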