Patents Assigned to Akeana, Inc.
  • Patent number: 12639254
    Abstract: Techniques for sharing processor data within a network are disclosed. A system-on-a-chip (SOC) is accessed. The SOC includes a network-on-a-chip (NOC), which comprises an M×N mesh topology. The mesh includes a coherent tile at each mesh point. Each tile includes local snoop vectors (LSVs). A first coherent tile initiates a snoop operation. The tile generates a snoop vector that indicates other tiles to be notified of the snoop operation. The first coherent tile creates directional snoop vectors (DSVs). The creating logically combines the snoop vector with each of the LSVs. A coherent tile adjacent to the first coherent tile is selected. The adjacent tile is located in a cardinal direction from the first tile. A first DSV is chosen based on the cardinal direction. The first tile sends the snoop operation and the chosen first DSV to the selected adjacent tile.
    Type: Grant
    Filed: November 6, 2024
    Date of Patent: May 26, 2026
    Assignee: Akeana, Inc.
    Inventors: Madhavi Kondapaneni, Aqdas Javaid, Ayesha Zahid
  • Publication number: 20260119176
    Abstract: Disclosed embodiments provide techniques for improved performance in processing vector instructions. A processor core is accessed. The processor core is coupled to a memory hierarchy and includes one or more vector execution units (VUs) and one or more load store units (LSUs). The processor core includes a vector register file (VRF). The VRF includes multiple vector registers, and each vector register includes multiple vector elements. Vector elements that have a source or destination in contiguous memory are identified. The LSUs take advantage of the contiguous memory condition by executing a vector load or vector store operation as a single memory access, which requires fewer clock cycles. The single memory access satisfies the memory operation for each vector element within the vector register file.
    Type: Application
    Filed: October 25, 2024
    Publication date: April 30, 2026
    Applicant: Akeana, Inc.
    Inventors: Hai Ngoc Nguyen, Rabin Sugumar
  • Patent number: 12596647
    Abstract: Techniques for cache management using memory queues are disclosed. A plurality of processor cores is accessed. The plurality of processor cores comprises a coherency domain. Two or more processor cores within the plurality of processor cores generate read operations for a common memory structure coupled to the plurality of processor cores. Coherency for the coherency domain is managed using a compute coherency block (CCB). The CCB includes a memory queue for controlling transfer of cache lines determined by the CCB. The memory queue includes an evict queue and a miss queue. Snoop requests are generated by the CCB. The snoop requests correspond to entries in the memory queue. Cache lines are transferred between the CCB and a bus interface unit. The transferring is controlled by the memory queue. The bus interface unit controls memory accesses.
    Type: Grant
    Filed: January 16, 2024
    Date of Patent: April 7, 2026
    Assignee: Akeana, Inc.
    Inventors: Sanjay Patel, Yogesh Shamkant Thombre
  • Publication number: 20260093489
    Abstract: A processor core is coupled to a memory hierarchy. The processor core is configured to execute vector floating-point instructions and micro-operations. A vector floating-point instruction is decoded. The decoding includes replacing the vector floating-point instruction with one or more vector floating-point micro-operations (VFPMs). A reorder buffer assigns a reorder buffer ID (ROBID) to each of the one or more VFPMs, in which the assigning includes a micro-sequencer ID (MSID). The processor core executes the one or more VFPMs. The executing includes requiring, by a first VFPM within the one or more VFPMs, a first update to an architectural floating-point flag. The architectural floating-point flag is set, based on the first update. The setting occurs after the one or more VFPMs have been committed by the processor core. A temporary floating-point flag is revised. The revising is based on the first update.
    Type: Application
    Filed: November 12, 2025
    Publication date: April 2, 2026
    Applicant: Akeana, Inc.
    Inventors: Abhijit Sil, Ricardo Ramirez
  • Publication number: 20260086733
    Abstract: A system-on-chip (SoC) is accessed. The SoC includes a mesh network and one or more coherency ordering agents (COAs). The COAs coordinate coherency for one or more processors coupled to the mesh network. The COAs are coupled to one or more communication converters (CCs) by the mesh network. A processor sends a request to a target device. The request is based on a first communications protocol and includes a memory address. The request is sent by a COA to a CC. A request queue within the CC stores the request. The request is checked against one or more additional requests. The CC translates the request, resulting in a converted request, based on a second communications protocol. The translating is based on the checking. The CC transmits the converted request to the target device.
    Type: Application
    Filed: September 25, 2025
    Publication date: March 26, 2026
    Applicant: Akeana, Inc.
    Inventors: Ali Shair Khan, Madhavi Kondapaneni
  • Patent number: 12578967
    Abstract: Disclosed embodiments provide techniques for prefetching. A processor core that executes instructions out of order (OOO) is accessed. The processor core includes a local cache hierarchy, data prefetch logic, and a prefetch table and is coupled to an external memory system. A first load instruction with a first address is detected and causes a miss in the local cache hierarchy. Information pertaining to the first load instruction is saved in an entry of the prefetch table. The information includes the first address, a confidence count, and an out-of-order mask. A second load instruction with a second address is identified, where the second address is the next sequential address after the first address. Based on the detecting, the information is updated and advanced. One or more data prefetch instructions are issued to the second address plus an offset.
    Type: Grant
    Filed: May 1, 2024
    Date of Patent: March 17, 2026
    Assignee: Akeana, Inc.
    Inventor: Rabin Sugumar
  • Publication number: 20260064619
    Abstract: Disclosed embodiments provide techniques for communication. A system-on-a-chip (SoC) is accessed. The SoC includes a mesh network that includes a plurality of nodes. At least one node within the plurality of nodes includes a quality-of-service (QoS) agent. Network traffic data is collected by a first QoS agent within a first node. The network traffic data is associated with the first node and the traffic occurs during a first timing window. The first QoS agent receives a request by a primary device within the first node to send data to a secondary device in a second node. The first QoS agent analyzes the network traffic data. A first routing agent within the first node selects an intermediate node within the plurality of nodes, based on the analyzing. The primary device sends the data to the intermediate node.
    Type: Application
    Filed: August 29, 2025
    Publication date: March 5, 2026
    Applicant: Akeana, Inc.
    Inventor: Yogesh Shamkant Thombre
  • Publication number: 20260064421
    Abstract: A processor core is accessed. The processor core supports atomic memory operations. The atomic memory operations include multi-operand operations. A compare and swap (CAS) instruction is issued in the processor core. The CAS instruction necessitates three source operands. One of the source operands comprises a destination register. The CAS instruction is split into a plurality of micro-operations. Using a first micro-operation, a first value is written from a memory location indicated by a first source operand into a temporary register. A memory word location addressed by a second source operand is accessed using a second micro-operation. The first micro-operation and the second micro-operation are interlocked. Contents of the memory word location are compared. A third source operand is stored to the memory word location addressed by the second source operand. The storing is based on a match of the comparing.
    Type: Application
    Filed: August 27, 2025
    Publication date: March 5, 2026
    Applicant: Akeana, Inc.
    Inventors: Ricardo Ramirez, Abhijit Sil
  • Publication number: 20260064600
    Abstract: A processor core is accessed. The processor core supports virtual memory addressing. The processor core includes a memory management unit (MMU) and a load store unit (LSU). A page table walk is performed by the MMU. The page table walk is responsive to a memory operation. The page table walk identifies a page table entry (PTE) for a virtual to physical address translation. The PTE is read. The reading obtains a first value from the PTE and includes determining, by the MMU, to update one or more status bits within the PTE. The PTE is re-read. The re-reading obtains a second value from the PTE. The PTE is updated to include the one or more status bits, based on a match between the first and second values. The updated PTE is stored in a page table. The re-reading, the updating, and the storing are performed atomically.
    Type: Application
    Filed: August 29, 2025
    Publication date: March 5, 2026
    Applicant: Akeana, Inc.
    Inventors: Ricardo Ramirez, Sundeep Chadha, Hai Ngoc Nguyen, Abbas Rashid
  • Publication number: 20260056740
    Abstract: A processor core is accessed. The processor core is configured to execute vector instructions, scalar instructions, and micro-operations. A vector memory instruction is decoded. The vector memory instruction is associated with a memory addressing mode. The decoding includes replacing the vector memory instruction with one or more vector memory micro-operations (VMMOs). The one or more VMMOs are substituted with one or more vector memory element micro-operations (VMEMOs). The substituting is based on the memory addressing mode. At least one VMEMO within the one or more VMEMOs is forwarded to a memory queue within a plurality of memory queues. A memory operation is issued to a load-store unit within the processor core. The issuing includes selecting, from the plurality of memory queues, the memory operation. The replacing is based on a micro-operation sequencer. One or more destination registers for the vector memory instruction are determined.
    Type: Application
    Filed: October 30, 2025
    Publication date: February 26, 2026
    Applicant: Akeana, Inc.
    Inventors: Hai Ngoc Nguyen, Abhijit Sil
  • Patent number: 12554503
    Abstract: Disclosed embodiments provide techniques for instruction execution with a processor pipeline for data transfer operations. A processor core is accessed. The processor core executes one or more instructions out of order. The processor core supports integer operations and floating-point operations. An instruction in the processor core is decoded. The instruction is a data transfer operation. The data transfer operation necessitates a floating-point operation and an integer operation. The floating-point operation and the integer operation are dispatched to one or more issue queues. The floating-point operation and the integer operation are interlocked. The interlocking is accomplished using at least one entry in the one or more issue queues. One of the two operations (floating-point or integer) is executed first, and the other is executed second. The execution of the second operation is based on the interlocking.
    Type: Grant
    Filed: April 26, 2024
    Date of Patent: February 17, 2026
    Assignee: Akeana, Inc.
    Inventors: Ricardo Ramirez, Albert Anthony Martin, Abhijit Sil, Rabin Sugumar
  • Publication number: 20260044348
    Abstract: Disclosed techniques enable vector instruction processing. A processor core is accessed. The processor core is coupled to a memory hierarchy and is configured to execute vector operations, scalar operations, and micro-operations. A decode unit decodes a vector memory operation. The vector memory operation is associated with a unit stride addressing mode. The decoding includes dividing the vector memory operation into one or more vector memory micro-operations. A dispatch unit sends at least one vector memory micro-operation within the one or more vector memory micro-operations to a scalar request queue within a plurality of request queues. The at least one vector memory micro-operation is issued to a load-store unit within the processor core. The issuing includes selecting, from the plurality of request queues, the at least one vector memory micro-operation.
    Type: Application
    Filed: September 29, 2025
    Publication date: February 12, 2026
    Applicant: Akeana, Inc.
    Inventors: Hai Ngoc Nguyen, Abhijit Sil, Rabin Sugumar
  • Publication number: 20260044339
    Abstract: A processor core is coupled to a memory hierarchy. The processor core is configured to execute vector instructions, scalar instructions, and micro-operations. A dispatch unit within the processor core receives a vector memory operation. The dispatch unit sends the vector memory operation to a first vector input queue of multiple vector input queues. The sending is based on a memory addressing mode of the vector memory operation. A micro-operation sequencer splits the vector memory operation into one or more memory micro-operations; the splitting includes forwarding each of the micro-operations to a first memory queue within multiple memory queues. A memory operation is then issued to a load-store unit within the processor core. The issuing includes selecting, from the multiple memory queues, the memory operation. The vector memory operation comprises either a vector load operation or a vector store operation.
    Type: Application
    Filed: August 5, 2025
    Publication date: February 12, 2026
    Applicant: Akeana, Inc.
    Inventors: Hai Ngoc Nguyen, Abhijit Sil
  • Patent number: 12547407
    Abstract: Techniques for providing a return address stack with branch mispredict recovery are disclosed. A processor core is accessed. The processor core includes a return address stack (RAS), a local cache hierarchy, and branch prediction logic. RAS state information, including a write pointer, a read pointer, and a RAS count, is sent to a branch execution unit. One or more call instructions are detected in an instruction stream. The detecting generates a predicted return address for each of the one or more call instructions, and the predicted return addresses are pushed onto the RAS. The pushing is directed by the write pointer. One or more return instructions are recognized in the instruction stream. The write pointer and the read pointer for the RAS are updated, based on information from the branch execution unit. The predicted return address for each of the one or more return instructions is popped from the RAS.
    Type: Grant
    Filed: December 29, 2023
    Date of Patent: February 10, 2026
    Assignee: Akeana, Inc.
    Inventors: James Youngsae Cho, Rabin Sugumar
  • Publication number: 20260037599
    Abstract: An accelerator is accessed. The accelerator includes a weight-stationary systolic array of one or more multiply-accumulate units. The accelerator is coupled to a memory hierarchy and a processor core. The processor core sends a work request to the accelerator. The work request is based on execution of a machine learning model and an activation matrix. In response to the work request, the accelerator loads a weight matrix and the activation matrix. The loading uses the memory hierarchy. The accelerator multiplies the weight matrix by the activation matrix. The multiplication results in an answer matrix. The accelerator stores the answer matrix in the memory hierarchy. The processor core obtains the answer matrix that was stored. The machine learning model is trained. The training produces the weight matrix, which is transposed and saved to the memory hierarchy.
    Type: Application
    Filed: August 4, 2025
    Publication date: February 5, 2026
    Applicant: Akeana, Inc.
    Inventors: David Cureton Baker, David St Clair Scott, Yogesh Shamkant Thombre
  • Patent number: 12517734
    Abstract: Disclosed techniques enable processors that are capable of performing a wide range of vector operations. A processor can support multiple types of instructions. The instructions can include one or more operands, and the one or more operands can include different data types. An A-type instruction can have dependencies on a B-type instruction. An A-type instruction includes a vector instruction. A B-type instruction includes an integer instruction or a floating-point instruction. A datapath is provided to enable intermediate results from a B-type instruction to be supplied to the A-type instruction on which it depends, without utilizing register file resources, such as general-purpose register (GPR) resources. Vector instructions can thereby execute without the additional resources that GPR access would require.
    Type: Grant
    Filed: October 4, 2024
    Date of Patent: January 6, 2026
    Assignee: Akeana, Inc.
    Inventors: Ricardo Ramirez, Abhijit Sil
  • Publication number: 20260003631
    Abstract: Disclosed embodiments provide techniques for prefetching. A processor core that executes instructions out of order (OOO) is accessed. The processor core includes a local cache hierarchy, data prefetch logic, and a prefetch table and is coupled to an external memory system. A first load instruction with a first address is detected and causes a miss in the local cache hierarchy. Information pertaining to the first load instruction is saved in an entry of the prefetch table. The information includes the first address, a confidence count, and an out-of-order mask. A second load instruction with a second address is identified, where the second address is the next sequential address after the first address. Based on the detecting, the information is updated and advanced. One or more data prefetch instructions are issued to the second address plus an offset.
    Type: Application
    Filed: May 1, 2024
    Publication date: January 1, 2026
    Applicant: Akeana, Inc.
    Inventor: Rabin Sugumar
  • Patent number: 12499056
    Abstract: Techniques for address translation are disclosed. A processor core is accessed. The processor core includes a memory management unit (MMU) and a unified translation lookaside buffer (TLB) within the MMU. The TLB is configured to support a plurality of page sizes, and the processor core is coupled to an external memory system. The TLB receives a lookup request for a virtual memory address, wherein the virtual memory address corresponds to a process running on the processor core. The TLB accesses a linked list that comprises a page size priority order for the plurality of page sizes. A lookup is performed in the TLB on the virtual memory address, and the lookup is conducted in the page size priority order. The linked list is then updated; the updating moves the page size associated with the lookup to a location in the linked list. A physical address is returned.
    Type: Grant
    Filed: December 28, 2023
    Date of Patent: December 16, 2025
    Assignee: Akeana, Inc.
    Inventor: Abbas Rashid
  • Publication number: 20250370932
    Abstract: Techniques for data sharing are disclosed. A system-on-a-chip (SoC) is accessed. The SoC includes one or more cache coherency blocks (CCBs) and one or more coherency ordering agents (COAs). Each COA includes a directory snoop filter (DSF). Each CCB is communicatively coupled to each COA by a network-on-a-chip (NOC) interface. A CCB requests a cache line associated with a memory address; the CCB is not a sharer of the cache line. The DSF within a COA is read. The reading reveals one or more CCB sharers of the cache line and indicates there is no CCB owner. The COA includes a coherent last level cache (LLC) that contains a valid copy of the cache line. The COA assigns ownership of the cache line to the CCB. The assigning is recorded in the DSF. The cache line is forwarded by the coherent LLC to the CCB.
    Type: Application
    Filed: May 29, 2025
    Publication date: December 4, 2025
    Applicant: Akeana, Inc.
    Inventor: Madhavi Kondapaneni
  • Publication number: 20250342127
    Abstract: Techniques for managing computer processors that implement speculative reads are disclosed. A circular queue is accessed. The circular queue comprises a plurality of entries and includes a head pointer and a tail pointer. The head pointer and the tail pointer move independently in a single direction within the circular queue. A software agent selects a read entry associated with a read index within the circular queue. The validity of the read entry is interpreted based on a head wrap bit, a tail wrap bit, the read index, a head index, and a tail index. When the read entry is interpreted as invalid, the circular queue returns an invalid signal to the software agent and the read entry is not modified. Otherwise, the circular queue sends data within the read entry to the software agent. The head wrap bit is calculated to be equal to the tail wrap bit.
    Type: Application
    Filed: April 30, 2025
    Publication date: November 6, 2025
    Applicant: Akeana, Inc.
    Inventor: Nagesh Suranna Kanakapura
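The directional snoop vector scheme in patent 12639254 can be illustrated with bitmasks. The following Python sketch does not reproduce the patented design. It makes three assumptions: tiles are numbered row-major, each snoop vector and LSV is an integer bitmask over tile IDs, and each tile's hypothetical LSVs partition the other tiles by the first hop of X-then-Y routing. Under those assumptions, creating the DSVs is a bitwise AND of the snoop vector with each LSV, and the snoop is forwarded only in directions whose DSV is nonzero.

```python
from typing import Dict

DIRECTIONS = ("north", "south", "east", "west")


def tile_id(row: int, col: int, n_cols: int) -> int:
    """Row-major tile numbering (assumption for this sketch)."""
    return row * n_cols + col


def local_snoop_vectors(row: int, col: int, m_rows: int, n_cols: int) -> Dict[str, int]:
    """Hypothetical LSVs: each bit marks a tile reached by first moving in that direction.

    Tiles in other columns are reached east/west first (X-then-Y routing);
    tiles in the same column are reached north/south.
    """
    lsv = {d: 0 for d in DIRECTIONS}
    for r in range(m_rows):
        for c in range(n_cols):
            bit = 1 << tile_id(r, c, n_cols)
            if c > col:
                lsv["east"] |= bit
            elif c < col:
                lsv["west"] |= bit
            elif r < row:
                lsv["north"] |= bit
            elif r > row:
                lsv["south"] |= bit
            # r == row and c == col is the tile itself, so it is in no LSV
    return lsv


def directional_snoop_vectors(snoop_vector: int, lsv: Dict[str, int]) -> Dict[str, int]:
    """Logically combine (AND) the snoop vector with each LSV."""
    return {d: snoop_vector & lsv[d] for d in DIRECTIONS}


def snoop_fan_out(snoop_vector: int, lsv: Dict[str, int]) -> Dict[str, int]:
    """Keep only the cardinal directions that still have tiles to notify."""
    dsvs = directional_snoop_vectors(snoop_vector, lsv)
    return {d: v for d, v in dsvs.items() if v}


if __name__ == "__main__":
    # 2x3 mesh: tile (0, 1) snoops tiles 0, 2, and 4.
    lsv = local_snoop_vectors(0, 1, 2, 3)
    print(snoop_fan_out(0b10101, lsv))  # {'south': 16, 'east': 4, 'west': 1}
```

Each neighbor receives only the DSV for its direction, so it never forwards the snoop back toward tiles already covered by another branch.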
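For publication 20260119176, the contiguity check that lets an LSU replace per-element accesses with one access can be sketched as follows. This is an illustrative Python model only. The function name and the (address, length) access tuples are hypothetical. A real LSU would also limit the single access to its maximum access width and to cache-line boundaries, which this sketch ignores.

```python
from typing import List, Tuple

Access = Tuple[int, int]  # (start address, length in bytes)


def plan_vector_access(elem_addrs: List[int], elem_bytes: int) -> List[Access]:
    """Return one access if the elements are contiguous, else one access per element."""
    if not elem_addrs:
        return []
    base = elem_addrs[0]
    contiguous = all(addr == base + i * elem_bytes for i, addr in enumerate(elem_addrs))
    if contiguous:
        # A single memory access covers every element of the vector register.
        return [(base, elem_bytes * len(elem_addrs))]
    return [(addr, elem_bytes) for addr in elem_addrs]


if __name__ == "__main__":
    print(plan_vector_access([0x100, 0x104, 0x108, 0x10C], 4))  # [(256, 16)]
    print(plan_vector_access([0x100, 0x108], 4))  # [(256, 4), (264, 4)]
```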
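The deferred flag update in publication 20260093489 can be modeled in a few lines. In this hedged sketch, each micro-operation ORs its exception flags into a temporary flag word. That word is keyed by the parent instruction's ROB ID, which is an assumption; the MSID is not modeled. The architectural flags change only when the instruction commits. The flag bit values follow the RISC-V `fflags` layout, and the class and method names are hypothetical.

```python
from typing import Dict

# RISC-V fflags bit positions: NV=4, DZ=3, OF=2, UF=1, NX=0
NX, UF, OF, DZ, NV = 1 << 0, 1 << 1, 1 << 2, 1 << 3, 1 << 4


class FloatFlagTracker:
    def __init__(self) -> None:
        self.architectural = 0               # accrued flags visible to software
        self.temporary: Dict[int, int] = {}  # ROB ID -> flags raised so far

    def uop_update(self, robid: int, flags: int) -> None:
        """A micro-operation reports its flags; only the temporary copy changes."""
        self.temporary[robid] = self.temporary.get(robid, 0) | flags

    def commit(self, robid: int) -> None:
        """After the instruction's micro-operations commit, set the architectural flags."""
        self.architectural |= self.temporary.pop(robid, 0)

    def flush(self, robid: int) -> None:
        """A squashed instruction leaves the architectural flags untouched."""
        self.temporary.pop(robid, None)


if __name__ == "__main__":
    t = FloatFlagTracker()
    t.uop_update(7, NX)
    t.uop_update(7, OF)
    print(t.architectural)  # 0
    t.commit(7)
    print(t.architectural)  # 5
```

Because the flags are sticky, ORing at commit time gives the same result as updating per micro-operation, and it does so without exposing flags from wrong-path work.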
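Patent 12578967 (and the matching application 20260003631) tracks a stream in a prefetch table entry that holds an address, a confidence count, and an out-of-order mask. The sketch below is one plausible reading, not the claimed logic. The mask records sequential lines that arrived ahead of order. The entry advances over any lines that become contiguous. Once confidence reaches a threshold, a prefetch is issued a fixed number of lines ahead. The line size, window, threshold, offset, and all names are assumptions.

```python
from dataclasses import dataclass
from typing import List

LINE_BYTES = 64    # cache-line size (assumption)
WINDOW = 8         # how far ahead an out-of-order load may land (assumption)
THRESHOLD = 2      # confidence needed before prefetching (assumption)
OFFSET_LINES = 4   # prefetch distance in lines (assumption)


@dataclass
class PrefetchEntry:
    line: int            # last line of the stream seen in order
    confidence: int = 0  # number of in-order advances observed
    ooo_mask: int = 0    # bit i set => line + 1 + i already seen out of order


class SequentialPrefetcher:
    def __init__(self) -> None:
        self.table: List[PrefetchEntry] = []  # unbounded here; hardware uses a fixed table

    def observe_load(self, addr: int, missed: bool = True) -> List[int]:
        """Update the table for a load and return any prefetch addresses to issue."""
        line = addr // LINE_BYTES
        for entry in self.table:
            delta = line - entry.line
            if 1 <= delta <= WINDOW:
                entry.ooo_mask |= 1 << (delta - 1)
                # Advance over every line that is now contiguous with the stream.
                while entry.ooo_mask & 1:
                    entry.line += 1
                    entry.ooo_mask >>= 1
                    entry.confidence += 1
                if entry.confidence >= THRESHOLD:
                    return [(entry.line + OFFSET_LINES) * LINE_BYTES]
                return []
        if missed:
            self.table.append(PrefetchEntry(line))  # a cache miss starts a new stream
        return []


if __name__ == "__main__":
    p = SequentialPrefetcher()
    for a in (0, 64, 192, 128):  # line 3 arrives before line 2
        print(a, p.observe_load(a))
```

When line 2 finally arrives, the entry advances over both line 2 and the already-recorded line 3 in one step. That is the role the out-of-order mask plays in this reading.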
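The operand roles below follow the RISC-V Zacas `amocas.w` instruction:

- `rd` supplies the compare value and receives the original memory word.
- `rs1` holds the address.
- `rs2` holds the swap value.

That mapping, the two micro-operation names, and the dict-based register and memory model are assumptions, not necessarily the split described in publication 20260064421. Real hardware must also keep the memory access atomic, which this single-threaded model gets for free.

```python
from typing import Dict, List, Tuple

MicroOp = Tuple[str, ...]


def split_cas(rd: str, rs1: str, rs2: str) -> List[MicroOp]:
    """Crack a CAS into two hypothetical micro-operations."""
    return [
        ("copy_to_temp", rd),                  # uop 1: capture the compare value
        ("load_compare_store", rs1, rs2, rd),  # uop 2: access the memory word at [rs1]
    ]


def execute(uops: List[MicroOp], regs: Dict[str, int], mem: Dict[int, int]) -> None:
    temp = None  # temporary register written by uop 1
    for op in uops:
        if op[0] == "copy_to_temp":
            temp = regs[op[1]]
        elif op[0] == "load_compare_store":
            if temp is None:
                # Interlock: uop 2 must not run before uop 1 has produced temp.
                raise RuntimeError("load_compare_store issued before copy_to_temp")
            _, rs1, rs2, rd = op
            addr = regs[rs1]
            old = mem[addr]
            if old == temp:
                mem[addr] = regs[rs2]  # store the swap value only on a match
            regs[rd] = old             # rd receives the original memory word
        else:
            raise ValueError(f"unknown micro-op {op[0]}")


if __name__ == "__main__":
    regs = {"a0": 5, "a1": 0x100, "a2": 9}
    mem = {0x100: 5}
    execute(split_cas("a0", "a1", "a2"), regs, mem)
    print(mem[0x100], regs["a0"])  # 9 5
```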
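The re-read-and-compare update in publication 20260064600 resembles a compare-and-swap on the PTE. This Python sketch makes several assumptions:

- The accessed (A) and dirty (D) bits sit at RISC-V Sv39-style positions.
- A `threading.Lock` stands in for the hardware's atomic re-read, update, and store.
- A mismatch returns `None`, and the caller is assumed to retry the walk; the abstract does not say what happens then.

`PageTable` and `update_status_bits` are hypothetical names.

```python
import threading
from typing import Dict, Optional

# RISC-V Sv39/Sv48 PTE bits: V is bit 0, A is bit 6, D is bit 7
PTE_V, PTE_A, PTE_D = 1 << 0, 1 << 6, 1 << 7


class PageTable:
    def __init__(self) -> None:
        self.entries: Dict[int, int] = {}
        self.lock = threading.Lock()  # stands in for an atomic memory update


def update_status_bits(pt: PageTable, index: int, is_store: bool) -> Optional[int]:
    """Set A (and D for stores) in a PTE using read, re-read, compare, then store.

    Returns the PTE value in effect, or None if the PTE changed between the
    first read and the re-read (the caller would then redo the walk).
    """
    first = pt.entries[index]                 # read during the page table walk
    wanted = PTE_A | (PTE_D if is_store else 0)
    if not first & PTE_V or (first & wanted) == wanted:
        return first                          # invalid, or bits already set
    with pt.lock:                             # re-read, update, and store atomically
        second = pt.entries[index]
        if second != first:
            return None
        updated = second | wanted
        pt.entries[index] = updated
        return updated


if __name__ == "__main__":
    pt = PageTable()
    pt.entries[3] = PTE_V
    print(update_status_bits(pt, 3, is_store=True))  # 193
```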
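Publications 20260056740, 20260044348, and 20260044339 all break a vector memory instruction into smaller memory micro-operations according to its addressing mode. The sketch below makes the following assumptions:

- The instruction uses one of the three RISC-V Vector addressing modes: unit-stride, strided, or indexed.
- One address is computed per element.
- Unit-stride requests are routed to one queue and the others to a separate element queue.

The queue names and that routing policy are hypothetical illustrations, not the filings' queue assignments.

```python
from typing import List, Optional, Sequence, Tuple


def element_addresses(mode: str, base: int, vl: int, eew_bytes: int,
                      stride: Optional[int] = None,
                      indices: Optional[Sequence[int]] = None) -> List[int]:
    """Per-element addresses for RISC-V V-style addressing modes."""
    if mode == "unit":
        return [base + i * eew_bytes for i in range(vl)]
    if mode == "strided":
        if stride is None:
            raise ValueError("strided mode needs a byte stride")
        return [base + i * stride for i in range(vl)]
    if mode == "indexed":
        if indices is None or len(indices) < vl:
            raise ValueError("indexed mode needs vl byte offsets")
        return [base + indices[i] for i in range(vl)]
    raise ValueError(f"unknown addressing mode: {mode}")


def to_memory_micro_ops(mode: str, addrs: List[int], eew_bytes: int) -> List[Tuple[str, int, int]]:
    """Hypothetical routing: (queue name, address, size) per element micro-operation."""
    queue = "unit_stride_q" if mode == "unit" else "element_q"
    return [(queue, addr, eew_bytes) for addr in addrs]


if __name__ == "__main__":
    print(element_addresses("strided", 0x1000, 3, 4, stride=16))  # [4096, 4112, 4128]
```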
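Patent 12547407 sends RAS state (write pointer, read pointer, and count) to the branch execution unit so that the stack can be repaired after a misprediction. The following is a minimal Python model of that idea:

- The stack is circular.
- A snapshot of the three fields is taken at a predicted branch.
- A mispredict restores that snapshot.

The class and method names are hypothetical. This simple scheme only undoes wrong-path pops; a wrong-path push that overwrites a live entry is not repaired here.

```python
from typing import List, NamedTuple, Optional


class RasState(NamedTuple):
    wp: int     # next slot to write
    rp: int     # slot holding the current top of stack
    count: int  # number of valid entries


class ReturnAddressStack:
    def __init__(self, depth: int = 8) -> None:
        self.depth = depth
        self.entries: List[int] = [0] * depth
        self.wp, self.rp, self.count = 0, depth - 1, 0

    def snapshot(self) -> RasState:
        """State handed to the branch execution unit with a predicted branch."""
        return RasState(self.wp, self.rp, self.count)

    def restore(self, s: RasState) -> None:
        """Mispredict recovery: roll the pointers and count back."""
        self.wp, self.rp, self.count = s

    def push(self, ret_addr: int) -> None:
        """Call detected: push its predicted return address at the write pointer."""
        self.entries[self.wp] = ret_addr
        self.rp = self.wp
        self.wp = (self.wp + 1) % self.depth
        self.count = min(self.count + 1, self.depth)

    def pop(self) -> Optional[int]:
        """Return detected: pop the predicted return address at the read pointer."""
        if self.count == 0:
            return None
        addr = self.entries[self.rp]
        self.wp = self.rp
        self.rp = (self.rp - 1) % self.depth
        self.count -= 1
        return addr
```

Pops only move pointers, so the entries they passed over are still intact when a snapshot is restored.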
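The multiply step in publication 20260037599 uses a weight-stationary systolic array. The cycle-by-cycle Python model below illustrates the general dataflow rather than Akeana's accelerator:

- Processing element (k, j) holds weight W[k][j] for the whole computation.
- Activations enter row k skewed by k cycles and move one column right per cycle.
- Partial sums move one row down per cycle.

As a result, each column's bottom output is one element of the product of an M×K activation matrix A and a K×N weight matrix W. The operand order, any transposition, and the function name are assumptions.

```python
from typing import List, Optional

Matrix = List[List[int]]


def systolic_ws_matmul(A: Matrix, W: Matrix) -> Matrix:
    M, K, N = len(A), len(W), len(W[0])
    if any(len(row) != K for row in A):
        raise ValueError("A must be M x K when W is K x N")
    act: List[List[Optional[int]]] = [[None] * N for _ in range(K)]
    psum: List[List[Optional[int]]] = [[None] * N for _ in range(K)]
    out = [[0] * N for _ in range(M)]
    for t in range(M + K + N - 2):  # cycles until the last result drains
        new_act: List[List[Optional[int]]] = [[None] * N for _ in range(K)]
        new_psum: List[List[Optional[int]]] = [[None] * N for _ in range(K)]
        for k in range(K):
            for j in range(N):
                if j == 0:
                    m = t - k  # skewed injection: row k starts k cycles late
                    a = A[m][k] if 0 <= m < M else None
                else:
                    a = act[k][j - 1]  # activation arriving from the left
                new_act[k][j] = a
                if a is not None:
                    upstream = 0 if k == 0 else psum[k - 1][j]  # partial sum from above
                    new_psum[k][j] = upstream + a * W[k][j]  # the weight stays in place
        act, psum = new_act, new_psum
        for j in range(N):
            result = psum[K - 1][j]
            if result is not None:
                out[t - (K - 1) - j][j] = result  # the bottom row emits C[m][j]
    return out


if __name__ == "__main__":
    print(systolic_ws_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```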
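Patent 12499056 probes a unified TLB in a page-size priority order kept in a linked list and then updates that order. In this hedged Python sketch, `collections.OrderedDict` stands in for the linked list. The hit page size is moved to the front; the abstract only says it moves to "a location", so move-to-front is an assumption. The 4 KiB, 2 MiB, and 1 GiB sizes are illustrative, and `UnifiedTLB` is a hypothetical name.

```python
from collections import OrderedDict
from typing import Dict, Optional, Tuple

PAGE_SIZES = (4 << 10, 2 << 20, 1 << 30)  # 4 KiB, 2 MiB, 1 GiB (illustrative)


class UnifiedTLB:
    def __init__(self) -> None:
        # (page size, virtual page number) -> physical base address of the page
        self.entries: Dict[Tuple[int, int], int] = {}
        # Page-size priority order; the OrderedDict plays the role of the linked list.
        self.priority: "OrderedDict[int, None]" = OrderedDict((s, None) for s in PAGE_SIZES)

    def insert(self, vaddr: int, paddr_base: int, page_size: int) -> None:
        self.entries[(page_size, vaddr // page_size)] = paddr_base

    def lookup(self, vaddr: int) -> Optional[int]:
        for size in list(self.priority):  # probe sizes in priority order
            base = self.entries.get((size, vaddr // size))
            if base is not None:
                self.priority.move_to_end(size, last=False)  # promote the hit size
                return base + vaddr % size  # physical address
        return None  # TLB miss: a page table walk would follow


if __name__ == "__main__":
    tlb = UnifiedTLB()
    tlb.insert(0x4000_0000, 0x8000_0000, 1 << 30)
    print(hex(tlb.lookup(0x4000_1234)))  # 0x80001234
```

If a process mostly touches huge pages, later lookups probe the 1 GiB size first and skip the smaller-size probes that would otherwise miss.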
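Publication 20250370932 handles one specific case: the requester is not a sharer, the DSF shows sharers but no owner, and the coherent LLC holds a valid copy. That case reduces to a small directory update. The Python model below covers only this case. `DsfEntry`, `CoherencyOrderingAgent`, and `request_line` are hypothetical names. The abstract does not say whether the new owner is also added to the sharer set, so the model leaves that out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class DsfEntry:
    sharers: Set[int] = field(default_factory=set)  # CCB IDs holding a copy
    owner: Optional[int] = None                     # CCB ID of the owner, if any


class CoherencyOrderingAgent:
    def __init__(self) -> None:
        self.dsf: Dict[int, DsfEntry] = {}  # directory snoop filter, keyed by line address
        self.llc: Dict[int, bytes] = {}     # coherent LLC: line address -> valid data

    def request_line(self, ccb: int, line_addr: int) -> Optional[bytes]:
        entry = self.dsf.setdefault(line_addr, DsfEntry())
        no_owner = entry.owner is None
        if ccb not in entry.sharers and no_owner and line_addr in self.llc:
            entry.owner = ccb           # assign ownership and record it in the DSF
            return self.llc[line_addr]  # the LLC forwards its valid copy
        return None  # other directory cases (e.g., snooping an owner) are not modeled
```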
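The validity test in publication 20250342127 is the familiar wrap-bit comparison for circular queues. This sketch assumes entries are written at the tail and retired at the head. The valid region therefore runs from the head index up to, but not including, the tail index. Equal wrap bits mean that region does not cross the end of the array; differing wrap bits mean it does. `QueuePointers` and `speculative_read` are hypothetical names, and `None` stands in for the invalid signal.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QueuePointers:
    head_idx: int
    head_wrap: int
    tail_idx: int
    tail_wrap: int


def entry_is_valid(read_idx: int, p: QueuePointers) -> bool:
    if p.head_wrap == p.tail_wrap:
        # Same lap: valid entries are [head, tail); head == tail means empty.
        return p.head_idx <= read_idx < p.tail_idx
    # Tail is one lap ahead: valid entries are [head, end) plus [0, tail).
    return read_idx >= p.head_idx or read_idx < p.tail_idx


def speculative_read(entries: List[int], read_idx: int, p: QueuePointers) -> Optional[int]:
    """Return the entry's data, or None (invalid signal) without touching the entry."""
    if not entry_is_valid(read_idx, p):
        return None
    return entries[read_idx]
```

With head equal to tail, the wrap bits alone distinguish an empty queue (equal bits) from a full one (different bits).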
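Publication 20260064619 does not say how the QoS agent analyzes traffic, so the following Python sketch substitutes a simple least-loaded policy. The agent counts bytes sent toward each candidate node during the current timing window, and the routing agent picks the candidate with the smallest count. The policy, the names, and the byte-count metric are all assumptions.

```python
from collections import Counter
from typing import Iterable


class QoSAgent:
    """Counts bytes sent toward each node during one timing window."""

    def __init__(self) -> None:
        self.window: "Counter[str]" = Counter()

    def record(self, node: str, nbytes: int) -> None:
        self.window[node] += nbytes

    def new_window(self) -> None:
        self.window.clear()


def choose_intermediate(agent: QoSAgent, candidates: Iterable[str]) -> str:
    """Pick the least-loaded candidate in the current window (ties go to the lowest name)."""
    return min(candidates, key=lambda node: (agent.window[node], node))


if __name__ == "__main__":
    agent = QoSAgent()
    agent.record("B", 100)
    agent.record("C", 10)
    print(choose_intermediate(agent, ["B", "C"]))  # C
```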