Patents Assigned to Akeana, Inc.
-
Publication number: 20250147919
Abstract: Techniques for sharing processor data within a network are disclosed. A system-on-a-chip (SOC) is accessed. The SOC includes a network-on-a-chip (NOC), which comprises an M×N mesh topology. The mesh includes a coherent tile at each mesh point. Each tile includes local snoop vectors (LSVs). A first coherent tile initiates a snoop operation. The tile generates a snoop vector that indicates other tiles to be notified of the snoop operation. The first coherent tile creates directional snoop vectors (DSVs) by logically combining the snoop vector with each of the LSVs. A coherent tile adjacent to the first coherent tile is selected. The adjacent tile is located in a cardinal direction from the first tile. A first DSV is chosen based on the cardinal direction. The first tile sends the snoop operation and the chosen first DSV to the selected adjacent tile.
Type: Application
Filed: November 6, 2024
Publication date: May 8, 2025
Applicant: Akeana, Inc.
Inventors: Madhavi Kondapaneni, Aqdas Javaid, Ayesha Zahid
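The DSV construction reduces to bitwise ANDs, so it can be sketched compactly. A minimal C++ sketch, assuming a hypothetical 4×4 mesh (16 tiles, one bit per tile) and illustrative LSV contents; the names makeDSVs and the direction encoding are invented for illustration:

```cpp
#include <array>
#include <bitset>
#include <cstdio>

constexpr int kTiles = 16;                 // hypothetical 4x4 mesh
using SnoopVector = std::bitset<kTiles>;   // one bit per coherent tile

enum Dir { N = 0, E = 1, S = 2, W = 3 };   // cardinal directions

// The logical combination from the abstract: for each direction, keep
// only the tiles that both need the snoop and lie in that direction.
std::array<SnoopVector, 4> makeDSVs(const SnoopVector& snoop,
                                    const std::array<SnoopVector, 4>& lsv) {
    std::array<SnoopVector, 4> dsv;
    for (int d = 0; d < 4; ++d)
        dsv[d] = snoop & lsv[d];
    return dsv;
}

int main() {
    SnoopVector snoop("0000111100001010");      // tiles to notify
    std::array<SnoopVector, 4> lsv = {          // illustrative LSVs
        SnoopVector("1111000000000000"),        // tiles north of us
        SnoopVector("0000000010101010"),        // tiles east of us
        SnoopVector("0000000000001111"),        // tiles south of us
        SnoopVector("0000111101010000"),        // tiles west of us
    };
    auto dsv = makeDSVs(snoop, lsv);
    // The snoop operation plus dsv[E] would go to the eastern neighbor,
    // dsv[N] to the northern neighbor, and so on.
    for (int d = 0; d < 4; ++d)
        printf("DSV[%d] = %s\n", d, dsv[d].to_string().c_str());
}
```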
-
Publication number: 20250147893
Abstract: Techniques for coherent processor cache control are disclosed. A plurality of processor cores is accessed. Each processor of the plurality of processor cores includes a shared local cache, and the plurality of processor cores implements special cache coherency operations. An evict buffer is coupled to the plurality of processor cores. The evict buffer is shared among the plurality of processor cores, and the evict buffer enables delayed writes. Evict buffer writes are monitored. The monitoring identifies a special cache coherency operation. The special cache coherency operation that was identified comprises a global snoop operation. The global snoop operation is initiated from a non-local agent within a globally coherent system. An evict buffer entry is marked. The marking corresponds to the special cache coherency operation that was identified, and the marking enables management of cache evict duplication.
Type: Application
Filed: November 5, 2024
Publication date: May 8, 2025
Applicant: Akeana, Inc.
Inventor: Sanjay Patel
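One plausible reading of the marking scheme as a C++ sketch; the EvictEntry and EvictBuffer types and the snooped flag are assumptions, not the patent's actual structures:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical evict buffer shared among the cores.
struct EvictEntry {
    uint64_t lineAddr = 0;
    bool     valid    = false;
    bool     snooped  = false;  // marked when a global snoop hits this entry
};

struct EvictBuffer {
    std::vector<EvictEntry> entries;

    // Monitoring: a global snoop from a non-local agent checks pending
    // evictions; a hit marks the entry so the later write-back is not
    // duplicated against the snoop response.
    void onGlobalSnoop(uint64_t addr) {
        for (auto& e : entries)
            if (e.valid && e.lineAddr == addr)
                e.snooped = true;   // manage cache evict duplication
    }

    // Delayed writes drain the buffer; marked entries take a
    // coherency-aware path instead of a plain memory write.
    void drain() {
        for (auto& e : entries) {
            if (!e.valid) continue;
            if (e.snooped) { /* hand the line to coherency logic */ }
            else           { /* normal write-back to memory      */ }
            e.valid = false;
        }
    }
};
```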
-
Publication number: 20250138828
Abstract: Disclosed embodiments provide techniques for instruction execution in computer processors. A dispatch unit dispatches instructions to one or more issue queues. Instructions from the issue queues feed into execution pipelines. Each execution pipeline includes instruction queue control logic and two execution engines. A first execution engine is assigned to variable latency instructions while a second execution engine is assigned to fixed latency instructions. While a variable latency instruction executes, fixed latency instructions can be issued, executed, and completed. When the variable latency instruction finishes execution, a request is issued by the first execution engine to the instruction queue control logic. In response, the instruction queue control logic introduces a stall in a common write-back pipeline, allowing the variable latency instruction to complete. The result of the variable latency instruction is provided to a depending fixed latency instruction via a bypass path.
Type: Application
Filed: October 31, 2024
Publication date: May 1, 2025
Applicant: Akeana, Inc.
Inventors: Ricardo Ramirez, Abhijit Sil
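A toy cycle-level model of the stall handshake may help; the Pipeline type and its fields are hypothetical, and the real control logic would be considerably more involved:

```cpp
#include <cstdio>
#include <queue>

// Toy model of one execution pipeline: a fixed-latency engine and a
// variable-latency engine sharing one common write-back port.
struct Pipeline {
    int varRemaining = 0;      // cycles left in the variable-latency op
    bool varDone = false;      // engine has requested a write-back slot
    std::queue<int> fixedOps;  // fixed-latency ops, one ready per cycle

    void cycle() {
        if (varRemaining > 0 && --varRemaining == 0)
            varDone = true;    // request sent to queue control logic
        if (varDone) {
            // Control logic stalls the common write-back pipeline for
            // one cycle so the variable-latency instruction completes;
            // its result is bypassed to any dependent fixed-latency op.
            printf("stall: variable-latency op writes back\n");
            varDone = false;
        } else if (!fixedOps.empty()) {
            printf("fixed-latency op %d writes back\n", fixedOps.front());
            fixedOps.pop();    // fixed ops keep flowing meanwhile
        }
    }
};
```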
-
Publication number: 20250117226
Abstract: Disclosed techniques enable processors that are capable of performing a wide range of vector operations. A processor can support multiple types of instructions. The instructions can include one or more operands, and the one or more operands can include different data types. An A-type instruction can have dependencies on a B-type instruction. An A-type instruction includes a vector instruction. A B-type instruction includes an integer instruction or a floating-point instruction. A datapath is provided to enable intermediate results from a B-type instruction to be supplied to the A-type instruction that depends on it, without utilizing register file resources, such as general-purpose register (GPR) resources. Vector instruction performance is thereby enabled without the additional resources used with GPR register access.
Type: Application
Filed: October 4, 2024
Publication date: April 10, 2025
Applicant: Akeana, Inc.
Inventors: Ricardo Ramirez, Abhijit Sil
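A sketch of what such a bypass might look like at the operand-read step, with the BypassBus type and readOperand helper invented for illustration:

```cpp
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical forwarding network: a just-produced B-type (integer or
// floating-point) result rides a bypass bus to the dependent A-type
// (vector) instruction instead of going through the GPR file.
struct BypassBus {
    std::optional<std::pair<int, uint64_t>> result;  // (dest reg, value)
};

// Operand fetch for the A-type instruction: take the bypassed value when
// the producing B-type instruction just finished, else read the GPR.
uint64_t readOperand(int reg, const BypassBus& bus, const uint64_t gpr[32]) {
    if (bus.result && bus.result->first == reg)
        return bus.result->second;   // intermediate result, no GPR port used
    return gpr[reg];
}
```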
-
Publication number: 20250021336
Abstract: A processor core includes a local cache hierarchy, prefetch logic, and a prefetch table, where the processor core is coupled to an external memory system. A data stream is detected, where the data stream includes multiple load instructions, including a load instruction that causes a cache miss, resulting in prefetching. The prefetch table is initialized with information pertaining to load instructions, and includes a Positive or Negative value (PON), a stride, and a saturation count. Information in the prefetch table is updated as new load instructions are prefetched. An underlying stride of the data stream is discovered, based on the updating. Data is prefetched using an offset, where a polarity of the offset is based on the PON, enabling effective stride detection with dynamic directionality and out-of-order instructions.
Type: Application
Filed: July 10, 2024
Publication date: January 16, 2025
Applicant: Akeana, Inc.
Inventor: Rabin Sugumar
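A guess at the update rule, expressed in C++; the saturation threshold, the PON update, and the field names are assumptions based only on the abstract:

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical prefetch-table entry, one per tracked load.
struct PrefetchEntry {
    uint64_t lastAddr = 0;
    int64_t  stride   = 0;   // candidate stride between accesses
    int      pon      = 0;   // Positive or Negative direction value
    int      satCount = 0;   // confidence; saturates at a threshold
};

constexpr int kSatThreshold = 3;  // illustrative value

// Update on each tracked load; once the count saturates, prefetch with
// an offset whose polarity follows the PON, so descending streams and
// out-of-order arrivals still converge on the underlying stride.
void onLoad(PrefetchEntry& e, uint64_t addr, void (*prefetch)(uint64_t)) {
    int64_t delta = (int64_t)(addr - e.lastAddr);
    if (delta == e.stride && delta != 0) {
        if (e.satCount < kSatThreshold) ++e.satCount;   // stride confirmed
    } else {
        e.stride = delta;                               // new candidate
        e.satCount = 0;
    }
    e.pon += (delta >= 0) ? 1 : -1;                     // track direction
    e.lastAddr = addr;
    if (e.satCount >= kSatThreshold) {
        int64_t off = std::abs(e.stride);
        prefetch(addr + (e.pon >= 0 ? off : -off));     // PON sets polarity
    }
}
```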
-
Publication number: 20240419551
Abstract: Disclosed embodiments provide techniques for enhancing security of a processor. Multiple consistency units are distributed within a processor core. Instructions are executed in an architecturally defined mode. The architecturally defined mode can be based on an instruction set architecture (ISA). In response to detecting an error in at least one consistency unit, disclosed embodiments reduce the functionality of the processor core. The reduced functionality includes halting the processor core, shutting down the processor core, switching the functionality of the processor core to a safe mode, and/or other suitable actions. The consistency unit can include a program counter comparison function. The consistency unit can include a completion signal check function. The consistency unit can include an address check function. The consistency unit can include a temporal proximity check function. Disclosed embodiments provide safeguards against various environmental attacks, such as voltage and/or clock alterations.
Type: Application
Filed: May 17, 2024
Publication date: December 19, 2024
Applicant: Akeana, Inc.
Inventor: Rabin Sugumar
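As one concrete example of a consistency unit, consider the program counter comparison function: a shadow copy of the PC advanced in lockstep and compared each step. The PcConsistencyUnit type and its safe-mode action are hypothetical:

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical consistency unit: a shadow program counter advanced in
// lockstep with the core and compared every step.
struct PcConsistencyUnit {
    uint64_t shadowPc = 0;

    void step(uint64_t corePc, uint64_t nextPc) {
        if (shadowPc != corePc)
            reduceFunctionality();   // mismatch suggests a glitch (e.g.,
                                     // voltage or clock alteration): halt,
                                     // shut down, or enter a safe mode
        shadowPc = nextPc;           // advance in lockstep with the core
    }
    static void reduceFunctionality() { std::abort(); }  // stand-in action
};
```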
-
Publication number: 20240419595
Abstract: Techniques for coherency management based on coherent hierarchical cache line tracking are disclosed. A plurality of processor cores is accessed. Each processor of the plurality of processor cores includes a local cache. A hierarchical cache is coupled to the plurality of processor cores. The hierarchical cache is shared among the plurality of processor cores. Coherency between the plurality of processor cores and the hierarchical cache is managed by a compute coherency block (CCB). A cache line directory is provided for the CCB. The cache line directory includes a core list field and a cache line present field. A cache line operation is detected. The cache line operation is detected by the CCB. The cache line operation is represented by an entry in the cache line directory. The cache line operation is performed, based on corresponding values of the core list field and the cache line present field.
Type: Application
Filed: June 5, 2024
Publication date: December 19, 2024
Applicant: Akeana, Inc.
Inventor: Sanjay Patel
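A compact model of the directory entry and one operation that consumes both fields; the types and the onWrite policy are assumptions, not the patent's actual logic:

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical CCB directory entry for one tracked cache line.
struct DirEntry {
    uint32_t coreList = 0;       // bit per core holding a local copy
    bool linePresent  = false;   // line also in the hierarchical cache
};

struct CacheLineDirectory {
    std::unordered_map<uint64_t, DirEntry> dir;  // keyed by line address

    // Example operation: on a write from `core`, the core list field
    // tells the CCB exactly which local caches to invalidate, and the
    // cache line present field says whether the shared hierarchical
    // cache must be updated as well.
    void onWrite(uint64_t addr, unsigned core) {
        DirEntry& e = dir[addr];
        uint32_t others = e.coreList & ~(1u << core);
        if (others)        { /* invalidate local copies in `others` */ }
        if (e.linePresent) { /* update/invalidate the shared copy   */ }
        e.coreList = 1u << core;   // writer becomes the sole holder
    }
};
```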
-
Publication number: 20240419599
Abstract: Disclosed embodiments provide techniques for direct cache transfer with shared cache. A system on a chip (SOC) is accessed. The SOC includes a plurality of coherent request nodes and a home node. The home node includes a directory-based snoop filter (DSF). A request node requests ownership of a coherent cache line within the SOC. The requesting includes an address associated with the coherent cache line. The home node detects that the coherent cache line is shared with one or more other request nodes. The home node determines a current owner of the coherent cache line. The home node sends an invalidating snoop instruction to the one or more other request nodes and transmits a forwarding snoop instruction. The forwarding snoop instruction establishes a direct cache transfer between the request node and the current owner of the coherent cache line.
Type: Application
Filed: June 14, 2024
Publication date: December 19, 2024
Applicant: Akeana, Inc.
Inventor: Madhavi Kondapaneni
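The message flow can be sketched directly from the abstract; node ids, the DsfEntry layout, and sendSnoop are invented stand-ins:

```cpp
#include <cstdio>
#include <vector>

enum class Snoop { Invalidate, ForwardTo };

// Hypothetical home-node handling of an ownership request, using its
// directory-based snoop filter (DSF) to find the sharers and the owner.
struct HomeNode {
    struct DsfEntry { std::vector<int> sharers; int owner; };

    void requestOwnership(int requester, const DsfEntry& line) {
        // Invalidating snoop to every other sharer of the line.
        for (int node : line.sharers)
            if (node != requester && node != line.owner)
                sendSnoop(node, Snoop::Invalidate, requester);
        // Forwarding snoop: the current owner sends the line straight to
        // the requester (direct cache transfer), not back via the home.
        sendSnoop(line.owner, Snoop::ForwardTo, requester);
    }
    void sendSnoop(int node, Snoop op, int target) {
        printf("snoop %s -> node %d (target %d)\n",
               op == Snoop::Invalidate ? "invalidate" : "forward",
               node, target);
    }
};
```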
-
Publication number: 20240272905
Abstract: Techniques for processor request arbitration using access request dynamic multilevel arbitration are disclosed. A plurality of processor cores is accessed. The plurality of processor cores is coupled to a memory subsystem. A plurality of access requests is generated within the processor cores coupled to the memory subsystem. The plurality of access requests is generated by the plurality of processor cores. Multiple access requests are made in a single processor cycle. Only one access request is serviced in a single processor cycle. A set of at least two criteria is associated with each access request in the plurality of access requests; the criteria are dynamically assigned. The requests are organized in two vectors and a stack. The vectors are organized as linear vectors. The stack is organized as a push-pop stack. A request is granted, based on data in the two vectors and the stack.
Type: Application
Filed: February 9, 2024
Publication date: August 15, 2024
Applicant: Akeana, Inc.
Inventor: Sanjay Patel
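The abstract leaves the vector and stack semantics open; one hedged interpretation, with the stack holding requesters to favor, is sketched below. Every name here is hypothetical:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical dynamic multilevel arbiter: two linear vectors hold the
// pending requests and a per-request priority criterion; a push-pop
// stack records requester ids to favor (e.g., recently preempted ones).
struct Arbiter {
    std::vector<bool> pending;   // one slot per requester
    std::vector<int>  priority;  // dynamically assigned criterion
    std::vector<int>  stack;     // push-pop stack of requester ids

    int grant() {                // only one grant per processor cycle
        while (!stack.empty()) { // stacked requesters win first
            int id = stack.back();
            stack.pop_back();
            if (pending[id]) return id;
        }
        int best = -1;           // otherwise, highest dynamic priority
        for (size_t i = 0; i < pending.size(); ++i)
            if (pending[i] && (best < 0 || priority[i] > priority[best]))
                best = (int)i;
        return best;             // -1 if nothing is pending
    }
};
```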
-
Publication number: 20240241830
Abstract: Techniques for cache management using memory queues are disclosed. A plurality of processor cores is accessed. The plurality of processor cores comprises a coherency domain. Two or more processor cores within the plurality of processor cores generate read operations for a common memory structure coupled to the plurality of processor cores. Coherency for the coherency domain is managed using a compute coherency block (CCB). The CCB includes a memory queue for controlling transfer of cache lines determined by the CCB. The memory queue includes an evict queue and a miss queue. Snoop requests are generated by the CCB. The snoop requests correspond to entries in the memory queue. Cache lines are transferred between the CCB and a bus interface unit. The transferring is controlled by the memory queue. The bus interface unit controls memory accesses.
Type: Application
Filed: January 16, 2024
Publication date: July 18, 2024
Applicant: Akeana, Inc.
Inventors: Sanjay Patel, Yogesh Shamkant Thombre
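A rough shape for the split queues and the snoop-gated handoff to the bus interface unit; the snoopDone flag and transferReady are assumptions about unstated details:

```cpp
#include <cstdint>
#include <deque>

// Hypothetical CCB memory queue: misses and evictions queue separately,
// and entries can require a snoop before the bus transfer is allowed.
struct MemoryQueue {
    struct Entry { uint64_t addr; bool snoopDone; };
    std::deque<Entry> missQueue;    // lines to fetch via the bus unit
    std::deque<Entry> evictQueue;   // dirty lines to write back

    // The CCB generates snoop requests corresponding to queue entries;
    // only snoop-resolved entries are handed to the bus interface unit,
    // so the memory queue controls the cache line transfers.
    void transferReady(void (*toBus)(uint64_t)) {
        for (auto* q : { &missQueue, &evictQueue }) {
            while (!q->empty() && q->front().snoopDone) {
                toBus(q->front().addr);
                q->pop_front();
            }
        }
    }
};
```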
-
Publication number: 20240220416
Abstract: Techniques for address translation are disclosed. A processor core is accessed. The processor core includes a memory management unit (MMU) and a unified translation lookaside buffer (TLB) within the MMU. The TLB is configured to support a plurality of page sizes, and the processor core is coupled to an external memory system. The TLB receives a lookup request for a virtual memory address, wherein the virtual memory address corresponds to a process running on the processor core. The TLB accesses a linked list that comprises a page size priority order for the plurality of page sizes. A lookup is performed in the TLB on the virtual memory address, and the lookup is conducted in the page size priority order. The linked list is updated; the updating moves a page size associated with the lookup to a location in the linked list, and a physical address is returned.
Type: Application
Filed: December 28, 2023
Publication date: July 4, 2024
Applicant: Akeana, Inc.
Inventor: Abbas Rashid
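The lookup loop and list update translate almost directly into code. The page-size set, the move-to-front policy, and the probe stub are illustrative choices, since the abstract only says the hitting size moves to "a location" in the list:

```cpp
#include <cstdint>
#include <list>
#include <optional>

enum class PageSize { K4, M2, G1 };   // illustrative subset of sizes

// Hypothetical unified-TLB lookup: probe page sizes in the order given
// by a linked list, then move the hitting size toward the front so the
// most common size for the running process is tried first next time.
struct UnifiedTlb {
    std::list<PageSize> priority{PageSize::K4, PageSize::M2, PageSize::G1};

    std::optional<uint64_t> lookup(uint64_t vaddr) {
        for (auto it = priority.begin(); it != priority.end(); ++it) {
            if (auto pa = probe(vaddr, *it)) {
                priority.splice(priority.begin(), priority, it);  // update
                return pa;            // physical address returned
            }
        }
        return std::nullopt;          // miss: walk page tables via the MMU
    }
    std::optional<uint64_t> probe(uint64_t vaddr, PageSize sz) {
        (void)vaddr; (void)sz;
        return std::nullopt;          // stand-in for the real CAM probe
    }
};
```

Move-to-front is a natural policy here because a process tends to use one dominant page size, so the common case hits on the first probe.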
-
Publication number: 20240220267
Abstract: Techniques for providing a return address stack with branch mispredict recovery are disclosed. A processor core is accessed. The processor core includes a return address stack (RAS), a local cache hierarchy, and branch prediction logic. RAS state information, including a write pointer, a read pointer, and a RAS count, is sent to a branch execution unit. One or more call instructions are detected in an instruction stream. The detecting generates a predicted return address for each of the one or more call instructions; each predicted return address is pushed onto the RAS. The pushing is directed by the write pointer. One or more return instructions are recognized in the instruction stream. The write pointer and the read pointer for the RAS are updated, based on information from the branch execution unit. The predicted return address for each of the one or more return instructions is popped from the RAS.
Type: Application
Filed: December 29, 2023
Publication date: July 4, 2024
Applicant: Akeana, Inc.
Inventors: James Youngsae Cho, Rabin Sugumar
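A circular-buffer RAS with the three exposed state values; the depth and the exact pointer arithmetic are assumptions:

```cpp
#include <cstdint>

// Hypothetical circular RAS whose pointers and count are visible to the
// branch execution unit, enabling snapshot-based mispredict recovery.
constexpr int kDepth = 16;

struct Ras {
    uint64_t stack[kDepth] = {};
    int wr = 0, rd = 0, count = 0;   // state sent to branch execution

    void push(uint64_t retAddr) {    // on a detected call instruction
        stack[wr] = retAddr;         // pushing directed by write pointer
        rd = wr;
        wr = (wr + 1) % kDepth;
        if (count < kDepth) ++count;
    }
    uint64_t pop() {                 // on a recognized return instruction
        uint64_t addr = stack[rd];   // predicted return address
        wr = rd;                     // slot is reused by the next push
        rd = (rd + kDepth - 1) % kDepth;
        --count;
        return addr;
    }
    // Recovery: the branch execution unit sends back the snapshot it was
    // given, rolling the pointers past any wrong-path pushes and pops.
    void restore(int w, int r, int c) { wr = w; rd = r; count = c; }
};
```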
-
Publication number: 20240220412
Abstract: Techniques for coherency management using distributed snoop requests are disclosed. A plurality of processor cores is accessed. The plurality of processor cores comprises a coherency domain. Two or more processor cores within the plurality of processor cores generate read operations for a shared memory structure coupled to the plurality of processor cores. Snoop requests are ordered in a two-dimensional matrix. The snoop requests are based on physical addresses for the shared memory structure. The two-dimensional matrix is extensible along each axis of the two-dimensional matrix. Snoop responses are mapped to a first-in first-out (FIFO) mapping queue. Each snoop response corresponds to a snoop request. Each processor core of the plurality of processor cores is coupled to at least one FIFO mapping queue. A memory access operation is completed, based on a comparison of the snoop requests and the snoop responses.
Type: Application
Filed: December 29, 2023
Publication date: July 4, 2024
Applicant: Akeana, Inc.
Inventor: Sanjay Patel
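A loose sketch of the bookkeeping; the address hash, the growth policy, and the id-matching completion check are all guesses at details the abstract does not state:

```cpp
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical distributed snoop bookkeeping. Requests live in a 2D
// matrix (row = physical-address hash, column = issue order) that can
// grow along either axis; each core's responses drain through its own
// FIFO mapping queue in request order.
struct SnoopFabric {
    std::vector<std::vector<uint64_t>> matrix;   // snoop request ids
    std::vector<std::queue<uint64_t>> coreFifo;  // one FIFO per core

    void request(uint64_t physAddr, uint64_t id) {
        size_t row = (physAddr >> 6) & 7;              // toy address hash
        if (matrix.size() <= row) matrix.resize(row + 1);  // extensible
        matrix[row].push_back(id);                     // grows along x
    }
    void response(size_t core, uint64_t id) {
        if (coreFifo.size() <= core) coreFifo.resize(core + 1);
        coreFifo[core].push(id);                       // preserves order
    }
    // A memory access completes only when the response at the head of
    // the FIFO matches the snoop request it corresponds to.
    bool complete(size_t core, uint64_t id) {
        auto& q = coreFifo[core];
        if (q.empty() || q.front() != id) return false;
        q.pop();
        return true;
    }
};
```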
-
Publication number: 20240211259
Abstract: Disclosed embodiments provide techniques for data prefetching. A processor core is accessed. The processor core includes prefetch logic and a local cache hierarchy and is coupled to a memory system. A stride of a data stream is detected. The data stream comprises two or more load instructions that cause two or more misses in the local cache hierarchy. Information about the data stream is accumulated. The information includes a stride count. Prefetch operations to the memory system are generated, based on the information. The prefetch operations include prefetch addresses. A rate of the prefetch operations is limited, based on the stride count. Based on the stride count, the prefetcher can enter a saturation state. The saturation state keeps the cache supplied with prefetched data. A number of stride prefetch operations is based on the stride of the data stream. The number is stored in a software-updatable configuration register array.
Type: Application
Filed: December 27, 2023
Publication date: June 27, 2024
Applicant: Akeana, Inc.
Inventors: James Youngsae Cho, Rabin Sugumar
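One way the stride count could gate prefetch depth, with the thresholds and the register-array mapping invented for illustration:

```cpp
#include <cstdint>

// Hypothetical rate limiter: the stride count gates how many prefetches
// may be outstanding; past a threshold the stream is "saturated" and
// runs at the deepest configured rate to keep the cache supplied.
struct StridePrefetcher {
    int strideCount = 0;                      // confirmed strides so far
    int inFlight    = 0;                      // outstanding prefetches
    int cfgDepth[4] = {1, 2, 4, 8};           // software-updatable
                                              // configuration registers

    int allowedDepth() const {                // rate limit by confidence
        if (strideCount >= 8) return cfgDepth[3];  // saturation state
        if (strideCount >= 4) return cfgDepth[2];
        if (strideCount >= 2) return cfgDepth[1];
        return cfgDepth[0];
    }
    void maybePrefetch(uint64_t addr, int64_t stride,
                       void (*issue)(uint64_t)) {
        while (inFlight < allowedDepth()) {
            addr += stride;                   // next prefetch address
            issue(addr);
            ++inFlight;
        }
    }
};
```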
-
Publication number: 20240211366
Abstract: Techniques for processor performance profiling using agents are disclosed. A processor core is accessed. The processor core includes a performance counter, a performance counter storage area, and a performance counter control register. The processor core includes a performance monitoring interface. The performance counter, performance counter storage area, and performance counter control register are assigned to an external profiling agent, which loads the performance counter and the performance counter control register. The loading is based on a particular event in the processor core. A program state is saved to the storage area, based on a counter event in the performance counter and an enable bit in the performance counter control register being set. The program state that is saved corresponds to code being executed on the processor core. The program state is read, from the storage area, by the external profiling agent.
Type: Application
Filed: December 20, 2023
Publication date: June 27, 2024
Applicant: Akeana, Inc.
Inventor: Rabin Sugumar
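A register-level caricature of the interface; the field layouts and the enable-bit position are assumptions:

```cpp
#include <cstdint>

// Hypothetical view of the performance monitoring interface.
struct PerfMonitor {
    uint64_t counter = 0;          // performance counter
    uint64_t control = 0;          // control register; bit 0 = enable bit
    uint64_t savedState[2] = {};   // performance counter storage area

    // The external profiling agent loads the counter and the control
    // register, triggered by some particular event in the core.
    void agentArm(uint64_t initial) {
        counter = initial;         // e.g., overflow after N events
        control |= 1;              // set the enable bit
    }
    // The core saves program state for the code it is executing when a
    // counter event fires while the enable bit is set.
    void onCounterEvent(uint64_t pc, uint64_t extra) {
        if (control & 1) { savedState[0] = pc; savedState[1] = extra; }
    }
    // The agent reads the program state back from the storage area.
    uint64_t agentReadPc() const { return savedState[0]; }
};
```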
-
Publication number: 20240192961
Abstract: Techniques for processor instruction exception handling are disclosed. A processor core is accessed. The processor core executes at least one instruction thread. The processor core executes one or more instructions out of order. An ordered list of instructions is maintained. The ordered list is based on instructions that are presented to the processor core for execution. The ordered list is organized using one or more pointers. An execution exception is detected in the processor core. The execution exception corresponds to one of the instructions in the ordered list. The execution exception requires initiating an exception handling routine. An effective age of an instruction in the ordered list is determined. The effective age corresponds to the execution exception. The exception handling routine is initiated, based on matching the effective age of an instruction in the ordered list with one of the one or more pointers.
Type: Application
Filed: December 6, 2023
Publication date: June 13, 2024
Applicant: Akeana, Inc.
Inventors: Ricardo Ramirez, Rabin Sugumar
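A minimal model of the age-matching check; the single retire pointer here stands in for whatever set of pointers the design actually keeps:

```cpp
#include <cstdint>
#include <deque>

// Hypothetical ordered instruction list: entries carry an effective age
// (position in program order) even though execution is out of order.
struct OrderedList {
    struct Entry { uint64_t pc; uint64_t age; bool excepted; };
    std::deque<Entry> list;   // program order, managed by pointers
    uint64_t retirePtr = 0;   // age of the next instruction to retire

    // An execution exception is recorded against the entry's effective
    // age; the handling routine is initiated only when that age matches
    // the pointer, so exceptions raised by younger, possibly wrong-path
    // instructions never fire early.
    bool shouldInitiateHandler() const {
        return !list.empty()
            && list.front().age == retirePtr
            && list.front().excepted;
    }
};
```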
-
Publication number: 20240192958
Abstract: Disclosed embodiments provide techniques for branch prediction. A processor core is accessed. The processor core is coupled to memory and includes branch prediction circuitry. The branch prediction circuitry includes a branch target buffer (BTB) and an indirect branch target buffer (BTBI). A hashed program counter within the processor core is read. The BTB and BTBI are searched. The searching the BTB is accomplished with the hashed program counter and the searching the BTBI is accomplished with the hashed program counter and branch history information. A predicted branch target address within the BTBI or the BTB is matched. The matching within the BTBI is based on an indirect branch instruction, and the matching within the BTB is based on other branch instruction types. The predicted branch target address that was matched is predicted taken. The processor core is directed to fetch a next instruction from the predicted branch target address.
Type: Application
Filed: December 11, 2023
Publication date: June 13, 2024
Applicant: Akeana, Inc.
Inventors: James Youngsae Cho, Chandramouli Banerjee, Rabin Sugumar
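The two-table lookup can be sketched with hash maps standing in for the hardware arrays; the hash function and history fold are toy versions of whatever the design uses:

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>

// Hypothetical predictor front end: the BTB is indexed by the hashed
// program counter alone, while the BTBI folds branch history into the
// index so one indirect branch can map to different targets.
struct BranchPredictor {
    std::unordered_map<uint32_t, uint64_t> btb;   // other branch types
    std::unordered_map<uint32_t, uint64_t> btbi;  // indirect branches
    uint32_t history = 0;                         // branch history bits

    std::optional<uint64_t> predict(uint64_t pc, bool isIndirect) {
        uint32_t hashedPc = (uint32_t)(pc ^ (pc >> 13));  // toy hash
        auto& table = isIndirect ? btbi : btb;
        uint32_t idx = isIndirect ? (hashedPc ^ history) : hashedPc;
        if (auto it = table.find(idx); it != table.end())
            return it->second;  // predicted taken: fetch from this target
        return std::nullopt;    // no match; fall through sequentially
    }
};
```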
-
Publication number: 20240168882
Abstract: Techniques for processor and network-on-chip coherency management are disclosed. A plurality of processor cores is accessed. Each processor of the plurality of processor cores accesses a common memory through a coherent network-on-chip. The coherent network-on-chip maintains global coherency. A local cache is coupled to a grouping of two or more processor cores. The local cache is shared among the two or more processor cores. The grouping of two or more processor cores and the shared local cache operate using local coherency. The local coherency is distinct from the global coherency. A cache maintenance operation is performed in the grouping of two or more processor cores and the shared local cache. The cache maintenance operation generates cache coherency transactions between the global coherency and the local coherency. The cache coherency transactions enable coherency among the plurality of processor cores, local caches, and the memory.
Type: Application
Filed: November 21, 2023
Publication date: May 23, 2024
Applicant: Akeana, Inc.
Inventors: Sanjay Patel, Hai Ngoc Nguyen
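A thin sketch of the local/global boundary; the transaction record and the maintenance-operation flow are assumptions about structure the abstract only implies:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical cluster: a group of cores keeps local coherency for its
// shared local cache; a cache maintenance operation also emits a
// transaction onto the coherent NoC so the global coherency domain
// observes the same state change.
struct Cluster {
    struct Txn { uint64_t addr; bool writeback; };
    std::vector<Txn> nocOut;   // transactions toward global coherency

    // Example: clean+invalidate a line in the cluster's shared cache.
    void cacheMaintenance(uint64_t addr, bool dirty) {
        // Local step: invalidate the line in each core and the shared
        // local cache (local coherency only; not modeled here).
        // Global step: tell the NoC so other clusters and the common
        // memory stay coherent with this cluster.
        nocOut.push_back({addr, /*writeback=*/dirty});
    }
};
```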