Dynamic Instruction Dependency Checking, Monitoring Or Conflict Resolution Patents (Class 712/216)
-
Patent number: 11402822
Abstract: To provide a numerical controller that can detect a position in a machining program at which a speed control abnormality is likely to occur due to an insufficient number of look-ahead blocks that are used to determine an acceleration/deceleration operation, and start a look-ahead processing function from that position in parallel with looking ahead at the machining program from the start of the machining program.
Type: Grant
Filed: October 25, 2019
Date of Patent: August 2, 2022
Assignee: FANUC CORPORATION
Inventors: Daisuke Uenishi, Chikara Tango
-
Patent number: 11392410
Abstract: Operand pool instruction reservation clustering in a scheduler circuit in a processor is disclosed. The scheduler circuit includes a plurality of operand pool reservation circuits each having an assigned number of source operands for an instruction stored that must be ready before the instruction is issued. Instructions having the same number of source operands that are not yet ready for issuance can be stored in an operand pool reservation circuit having the same assigned number of source operands. In this manner, the number of reservation entries and associated comparator circuits in the clustered scheduler circuit is distributed among the plurality of operand pool reservation circuits to avoid or reduce an increase in the number of scheduling path connections and complexity in each reservation circuit. This can avoid or reduce an increase in scheduling latency for a given number of reservation entries in the clustered scheduler circuit.
Type: Grant
Filed: April 8, 2020
Date of Patent: July 19, 2022
Assignee: Microsoft Technology Licensing, LLC
Inventors: Shivam Priyadarshi, Yusuf Cagatay Tekmen, Rodney Wayne Smith, Vignyan Reddy Kothinti Naresh
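A minimal Python sketch of the clustering idea in this abstract: instructions are grouped by how many source operands are still outstanding, so each pool only needs wakeup comparators for that many operand tags. The class and method names are illustrative assumptions, not the patented circuit.

```python
# Each pool key is the number of not-yet-ready source operands at insert time;
# a broadcast of a produced tag only needs to match against that many operands.
from collections import defaultdict

class ClusteredScheduler:
    def __init__(self):
        self.pools = defaultdict(list)   # operand count -> reservation entries

    def insert(self, instr, not_ready_operands):
        self.pools[len(not_ready_operands)].append(
            {"instr": instr, "waiting": set(not_ready_operands)})

    def broadcast(self, produced_tag):
        """Wake entries whose last outstanding operand just became ready."""
        ready = []
        for pool in self.pools.values():
            for entry in pool:
                entry["waiting"].discard(produced_tag)
            ready += [e for e in pool if not e["waiting"]]
            pool[:] = [e for e in pool if e["waiting"]]
        return [e["instr"] for e in ready]

sched = ClusteredScheduler()
sched.insert("add r3, r1, r2", not_ready_operands=["r1", "r2"])  # 2-operand pool
sched.insert("neg r4, r1", not_ready_operands=["r1"])            # 1-operand pool
print(sched.broadcast("r1"))  # ['neg r4, r1']
print(sched.broadcast("r2"))  # ['add r3, r1, r2']
```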
-
Patent number: 11392380
Abstract: Systems, methods, and apparatuses relating to circuitry to precisely monitor memory store accesses are described.
Type: Grant
Filed: December 28, 2019
Date of Patent: July 19, 2022
Assignee: Intel Corporation
Inventors: Ahmad Yasin, Raanan Sade, Liron Zur, Igor Yanover, Joseph Nuzman
-
Patent number: 11372662
Abstract: Disclosed herein are methods, systems, and processes to perform granular and selective agent-based throttling of command executions. A resource consumption threshold is allocated to an agent process that is configured to perform data collection tasks on a host computing device. A desired throttle level is generated for the agent process based on the resource consumption threshold allocated to the agent process, and execution of the agent process is controlled in polling intervals. For each polling interval, a current throttle level for the agent process is determined based on a run count and a skip count of the agent process, the agent process is suspended if the agent process is active and the current throttle level is greater than the desired throttle level, and the agent process is resumed if the agent process is idle and the current throttle level is not greater than the desired throttle level.
Type: Grant
Filed: June 9, 2021
Date of Patent: June 28, 2022
Assignee: Rapid7, Inc.
Inventor: Shreyas Khare
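A toy model of the per-interval decision described above, assuming the current throttle level is the fraction of intervals in which the agent has run and the desired throttle is the allowed run fraction; all names and the accounting scheme are illustrative, not taken from the patent.

```python
class AgentThrottle:
    def __init__(self, desired_throttle):
        self.desired = desired_throttle   # allowed fraction of intervals the agent may run
        self.run_count = 0
        self.skip_count = 0
        self.active = True                # True = running, False = suspended

    def current_throttle(self):
        total = self.run_count + self.skip_count
        return self.run_count / total if total else 0.0

    def poll(self):
        """One polling interval: suspend or resume, then account for this interval."""
        if self.active and self.current_throttle() > self.desired:
            self.active = False           # over budget: suspend the agent
        elif not self.active and self.current_throttle() <= self.desired:
            self.active = True            # back under budget: resume the agent
        if self.active:
            self.run_count += 1
        else:
            self.skip_count += 1
        return self.active

throttle = AgentThrottle(desired_throttle=0.5)
print([throttle.poll() for _ in range(8)])   # alternates run/skip around the 50% budget
```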
-
Patent number: 11366664
Abstract: Systems, apparatuses and methods are disclosed for efficient management of registers in a graph stream processing (GSP) system. The GSP system includes a thread scheduler module operative to initiate a Single Instruction Multiple Data (SIMD) thread, the SIMD thread including a dispatch mask with an initial value. A thread arbiter module is operative to select an instruction from the instructions and provide the instruction to each of one or more compute resources, and an instruction iterator module, associated with each of the one or more compute resources, is operative to determine a data type of the instruction. The instruction iterator module iteratively executes the instruction based on the data type and the dispatch mask.
Type: Grant
Filed: December 8, 2019
Date of Patent: June 21, 2022
Assignee: Blaize, Inc.
Inventors: Kamaraj Thangam, Srinivasulu Nagisetty, Venkata Divya Bharathi Palaparthy, Aswathy Asok, Satyaki Koneru
-
Patent number: 11366669
Abstract: Data processing apparatus, data processing methods, a method and a computer program product are disclosed. The data processing apparatus includes a processor core operable to execute sequences of instructions of a plurality of program threads. The processor core has a plurality of pipeline stages, one of which is an instruction schedule stage having scheduling logic operable, in response to a thread pause instruction within a program thread, to prevent scheduling of instructions from that program thread following the thread pause instruction and instead to schedule instructions from another program thread for execution within the plurality of pipeline stages.
Type: Grant
Filed: November 30, 2016
Date of Patent: June 21, 2022
Assignee: Swarm64 AS
Inventor: Eivind Liland
-
Patent number: 11360773
Abstract: Reusing fetched, flushed instructions after an instruction pipeline flush in response to a hazard in a processor to reduce instruction re-fetching is disclosed. An instruction processing circuit is configured to detect fetched performance degrading instructions (PDIs) in a pre-execution stage in an instruction pipeline that may cause a precise interrupt that would cause flushing of the instruction pipeline. In response to detecting a PDI in an instruction pipeline, the instruction processing circuit is configured to capture the fetched PDI and/or its successor, younger fetched instructions that are processed in the instruction pipeline behind the PDI, in a pipeline refill circuit.
Type: Grant
Filed: June 22, 2020
Date of Patent: June 14, 2022
Assignee: Microsoft Technology Licensing, LLC
Inventors: Rami Mohammad Al Sheikh, Michael Scott McIlvaine
-
Patent number: 11354129
Abstract: A system for predicting latency of at least one variable-latency instruction, wherein a microprocessor includes at least one pipeline, the at least one pipeline having an instruction stream. The microprocessor is configured to issue at least one dependent instruction, execute the at least one pipeline to serve at least one variable-latency instruction, generate a result of the at least one variable-latency instruction, and serve the at least one dependent instruction by using the result of the at least one variable-latency instruction.
Type: Grant
Filed: October 9, 2015
Date of Patent: June 7, 2022
Assignee: SPREADTRUM HONG KONG LIMITED
Inventor: Jeremy L. Branscome
-
Patent number: 11340787
Abstract: The present disclosure includes apparatuses and methods related to a memory protocol. An example apparatus can perform operations on a number of block buffers of the memory device based on commands received from a host using a block configuration register, wherein the operations can read data from the number of block buffers and write data to the number of block buffers on the memory device.
Type: Grant
Filed: December 18, 2019
Date of Patent: May 24, 2022
Assignee: Micron Technology, Inc.
Inventors: Robert M. Walker, James A. Hall, Jr.
-
Patent number: 11281466
Abstract: A floating point unit includes a non-pickable scheduler queue (NSQ) that offers a load operation concurrently with a load store unit retrieving load data for an operand that is to be loaded by the load operation. The floating point unit also includes a renamer that renames architectural registers used by the load operation and allocates physical register numbers to the load operation in response to receiving the load operation from the NSQ. The floating point unit further includes a set of pickable scheduler queues that receive the load operation from the renamer and store the load operation prior to execution. A physical register file is implemented in the floating point unit and a free list is used to store physical register numbers of entries in the physical register file that are available for allocation.
Type: Grant
Filed: October 22, 2019
Date of Patent: March 22, 2022
Assignees: Advanced Micro Devices, Inc., ATI TECHNOLOGIES ULC
Inventors: Arun A. Nair, Michael Estlick, Erik Swanson, Sneha V. Desai, Donglin Ji
-
Patent number: 11281187
Abstract: To provide a numerical controller that can detect a position in a machining program at which a speed control abnormality is likely to occur due to an insufficient number of look-ahead blocks that are used to determine an acceleration/deceleration operation, and supplement the look-ahead blocks at that position in order to stabilize feed rate, cutting speed and other factors. A numerical controller includes a required look-ahead block setting unit that sets a required number of look-ahead blocks, which is the number of look-ahead blocks required to execute a machining program, and an operation limitation unit that compares the number of look-ahead blocks calculated by a look-ahead block calculation unit to the required number and, if it is less than the required number, limits execution of the machining program until the number of look-ahead blocks reaches the required number.
Type: Grant
Filed: October 22, 2019
Date of Patent: March 22, 2022
Assignee: FANUC CORPORATION
Inventors: Daisuke Uenishi, Chikara Tango
-
Patent number: 11275712
Abstract: In an embodiment, a method for processing data in a single instruction multiple data (SIMD) computer architecture is provided. A processing element (PE) may determine, based on a masking instruction, a predication state indicative of one of a conditional predication mode and an absolute predication mode. The PE may receive a predicated instruction and determine, based on a value of a head bit of the bits of a predication mask and on the value indicative of the predication state, whether to commit a computation corresponding to execution of the predicated instruction. In another embodiment, a SIMD controller stores loops and sections of a program as a separate instruction stream record for generating the memory address of the next instruction. For data streams, the SIMD controller records information for each data memory access that references the same register files that are used by the instruction streams.
Type: Grant
Filed: August 20, 2020
Date of Patent: March 15, 2022
Assignee: NORTHROP GRUMMAN SYSTEMS CORPORATION
Inventors: Paul Kenton Tschirhart, Brian Konigsburg
-
Patent number: 11249985
Abstract: The present disclosure relates to systems, methods, and non-transitory computer readable storage media for implementing a scalable, secure, efficient, and adaptable distributed digital ledger transaction network. Indeed, the disclosed systems can reduce storage and processing requirements, improve security of implementing computing devices and underlying digital assets, accommodate a wide variety of different digital programs (or "smart contracts"), and scale to accommodate billions of users and associated digital transactions. For example, the disclosed systems can utilize a host of features that improve storage, account/address management, digital transaction execution, consensus, and synchronization processes. The disclosed systems can also utilize a new programming language that improves efficiency and security of the distributed digital ledger transaction network.
Type: Grant
Filed: June 15, 2019
Date of Patent: February 15, 2022
Assignee: Facebook, Inc.
Inventors: Qinfan Wu, Benjamin D. Maurer
-
Patent number: 11243774
Abstract: Methods, systems and computer program products for dynamically selecting an OSC hazard avoidance mechanism are provided. Aspects include receiving a load instruction that is associated with an operand store compare (OSC) prediction. The OSC prediction is stored in an entry of an OSC history table (OHT) and includes a multiple dependencies indicator (MDI). Responsive to determining the MDI is in a first state, aspects include applying a first OSC hazard avoidance mechanism in relation to the load instruction. Responsive to determining that the load instruction is dependent on more than one store instruction, aspects include placing the MDI in a second state. The MDI being in the second state provides an indication to apply a second OSC hazard avoidance mechanism in relation to the load instruction.
Type: Grant
Filed: March 20, 2019
Date of Patent: February 8, 2022
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: James Raymond Cuffney, Adam Collura, James Bonanno, Edward Malley, Anthony Saporito, Jang-Soo Lee, Michael Cadigan, Jr., Jonathan Hsieh
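A hedged Python sketch of the selection flow in this abstract: an OSC history table entry carries a multiple-dependencies indicator, and the indicator's state picks between two avoidance mechanisms. The table layout, the two mechanism names, and the update hook are assumptions for illustration only.

```python
SINGLE_DEP, MULTI_DEP = 0, 1     # the two MDI states

osc_history_table = {}           # load PC -> {"mdi": state}

def on_load_dispatch(load_pc):
    entry = osc_history_table.setdefault(load_pc, {"mdi": SINGLE_DEP})
    if entry["mdi"] == SINGLE_DEP:
        return "mechanism_1"     # e.g. wait on a single predicted store
    return "mechanism_2"         # e.g. a stronger scheme for multi-store dependence

def on_dependency_resolved(load_pc, num_store_dependencies):
    # If the load turned out to depend on more than one store, flip the MDI so
    # the second avoidance mechanism is applied the next time this load is seen.
    if num_store_dependencies > 1:
        osc_history_table.setdefault(load_pc, {"mdi": SINGLE_DEP})["mdi"] = MULTI_DEP

print(on_load_dispatch(0x400))        # mechanism_1
on_dependency_resolved(0x400, 2)
print(on_load_dispatch(0x400))        # mechanism_2
```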
-
Patent number: 11210193
Abstract: A method for improving performance of a system including a first processor and a second processor includes obtaining a code region specified to be executed on the second processor, the code region including a plurality of instructions, calculating a performance improvement of executing at least one of the plurality of instructions included in the code region on the second processor over executing the at least one instruction on the first processor, removing the at least one instruction from the code region in response to a condition including that the performance improvement does not exceed a first threshold, and repeating the calculating and the removing to produce a modified code region specified to be executed on the second processor.
Type: Grant
Filed: October 28, 2019
Date of Patent: December 28, 2021
Assignee: International Business Machines Corporation
Inventor: Kazuaki Ishizaki
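A minimal sketch of the pruning loop described above, assuming a stand-in cost model: instructions whose estimated gain on the second processor does not exceed a threshold are dropped from the offloaded region, and the step repeats until nothing more is removed. Function names and the cost model are illustrative.

```python
def prune_code_region(code_region, speedup_estimate, threshold):
    """code_region: iterable of instructions; speedup_estimate(instr) -> estimated gain."""
    region = list(code_region)
    changed = True
    while changed:                          # repeat the calculate-and-remove steps
        changed = False
        for instr in list(region):
            if speedup_estimate(instr) <= threshold:
                region.remove(instr)        # not worth running on the second processor
                changed = True
    return region

gains = {"matmul": 8.0, "copy": 1.1, "branch": 0.9}   # hypothetical per-instruction gains
print(prune_code_region(gains.keys(), gains.get, threshold=1.0))  # ['matmul', 'copy']
```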
-
Patent number: 11204768
Abstract: Instruction length based parallel instruction demarcators and methods for parallel instruction demarcation are included, wherein an instruction sequence is received at an instruction buffer, the instruction sequence comprising a plurality of instruction syllables, and the instruction sequence is stored at the instruction buffer. A length of instructions and at least one boundary are determined using one or more logic blocks arranged in a sequence. Additionally, using a controlling logic block, the sequence is demarcated into individual instructions.
Type: Grant
Filed: August 12, 2020
Date of Patent: December 21, 2021
Inventor: Sitaram Yadavalli
-
Patent number: 11204773
Abstract: A data processing apparatus is provided. It includes processing circuitry for speculatively executing a plurality of instructions. Storage circuitry stores a current state of the processing circuitry and a plurality of previous states of the processing circuitry. Execution of the plurality of instructions changes the current state of the processing circuitry. Flush circuitry replaces, in response to a misprediction, the current state of the processing circuitry with a replacement one of the plurality of previous states of the processing circuitry.
Type: Grant
Filed: September 7, 2018
Date of Patent: December 21, 2021
Assignee: Arm Limited
Inventors: William Elton Burky, Glen Andrew Harris, Yasuo Ishii
-
Patent number: 11200064
Abstract: Methods and parallel processing units for avoiding inter-pipeline data hazards wherein inter-pipeline data hazards are identified at compile time. For each identified inter-pipeline data hazard, the primary instruction and secondary instruction(s) thereof are identified as such and are linked by a counter which is used to track that inter-pipeline data hazard. Then, when a primary instruction is output by the instruction decoder for execution, the value of the counter associated therewith is adjusted (e.g. incremented) to indicate that there is a hazard related to the primary instruction, and when the primary instruction has been resolved by one of multiple parallel processing pipelines the value of the counter associated therewith is adjusted (e.g. decremented) to indicate that the hazard related to the primary instruction has been resolved.
Type: Grant
Filed: October 14, 2020
Date of Patent: December 14, 2021
Assignee: Imagination Technologies Limited
Inventors: Luca Iuliano, Simon Nield, Yoong-Chert Foo, Ollie Mower
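A simple Python model of the counter mechanism described above: issuing a primary instruction increments the counter linked to its hazard, resolution decrements it, and the linked secondary instructions proceed only while the counter is zero. Counter IDs and method names are illustrative assumptions.

```python
class HazardCounters:
    def __init__(self, n):
        self.counters = [0] * n              # one counter per compile-time-identified hazard

    def on_primary_issued(self, counter_id):
        self.counters[counter_id] += 1       # hazard now outstanding

    def on_primary_resolved(self, counter_id):
        self.counters[counter_id] -= 1       # hazard cleared by one of the pipelines

    def secondary_may_proceed(self, counter_id):
        return self.counters[counter_id] == 0   # stall the secondary while nonzero

hc = HazardCounters(n=4)
hc.on_primary_issued(counter_id=2)
print(hc.secondary_may_proceed(2))   # False: producer still in flight
hc.on_primary_resolved(counter_id=2)
print(hc.secondary_may_proceed(2))   # True
```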
-
Patent number: 11188255
Abstract: An integrated circuit may include a memory controller circuit for communicating with an off-chip memory device. The memory controller is operable in a read-write major mode that is capable of dynamically adapting to any memory traffic pattern, which results in improved memory scheduling efficiency across different user applications. The memory controller may include at least a write command queue, a read command queue, an arbiter, and a command scheduler. The command scheduler may monitor a write command count, a read command count, a write stall count, and a read stall count to determine whether to dynamically adjust a read burst threshold setting and a write burst threshold setting.
Type: Grant
Filed: March 28, 2018
Date of Patent: November 30, 2021
Assignee: Intel Corporation
Inventors: Chee Hak Teh, Yu Ying Ong, Kevin Chao Ing Teoh
-
Patent number: 11188334
Abstract: Obsoleting values stored in registers in a processor based on processing obsolescent register-encoded instructions is disclosed. The processor is configured to support execution of read and/or write instructions that include obsolescence encoding indicating that one or more of its source and/or target register operands are to be obsoleted by the processor. A register encoded as obsolescent means the data value stored in such register will not be used by subsequent instructions in an instruction stream, and thus does not need to be retained. Thus, such register can be set as being in an obsolescent state so that the data value stored in such register can be ignored to improve performance. As one example, data values for registers having an obsolescent state can be ignored and thus not stored in a saved context for a process being switched out, thus conserving memory and improving processing time for a process switch.
Type: Grant
Filed: December 2, 2019
Date of Patent: November 30, 2021
Assignee: Microsoft Technology Licensing, LLC
Inventors: Thomas Andrew Sartorius, Thomas Philip Speier, Michael Scott McIlvaine, James Norris Dieffenderfer, Rodney Wayne Smith
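An illustrative Python sketch of the context-switch example in this abstract: registers marked obsolescent are simply omitted from the saved context. The register names and the shape of the obsolescence set are assumptions.

```python
def save_context(register_file, obsolescent):
    """Return the saved context, skipping registers whose values will never be read again."""
    return {reg: val for reg, val in register_file.items() if reg not in obsolescent}

regs = {"r0": 7, "r1": 42, "r2": 9}
obsolescent = {"r1"}                      # e.g. an instruction encoded r1 as a last use
print(save_context(regs, obsolescent))    # {'r0': 7, 'r2': 9}
```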
-
Patent number: 11182207
Abstract: Techniques are disclosed for reducing the latency between the completion of a producer task and the launch of a consumer task dependent on the producer task. Such latency exists when the information needed to launch the consumer task is unavailable when the producer task completes. Thus, various techniques are disclosed, where a task management unit initiates the retrieval of the information needed to launch the consumer task from memory in parallel with the producer task being launched. Because the retrieval of such information is initiated in parallel with the launch of the producer task, the information is often available when the producer task completes, thus allowing for the consumer task to be launched without delay. The disclosed techniques, therefore, enable the latency between completing the producer task and launching the consumer task to be reduced.
Type: Grant
Filed: June 24, 2019
Date of Patent: November 23, 2021
Assignee: NVIDIA CORPORATION
Inventors: Gentaro Hirota, Brian Pharris, Jeff Tuckey, Robert Overman, Stephen Jones
-
Patent number: 11182168
Abstract: A computer data processing system includes an instruction pipeline having a front end and a back end, a decoding and dispatch unit to dispatch a current instruction, and a pipeline by-pass unit to invoke an out-of-order pipeline by-pass operation. The pipeline by-pass unit by-passes a section of the instruction pipeline such that the current instruction architecturally completes before initiating instruction execution. The computer data processing system further includes a post-completion execution unit that executes the current instruction after the current instruction architecturally completes.
Type: Grant
Filed: December 21, 2020
Date of Patent: November 23, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Avery Francois, Christian Jacobi, Gregory William Alexander
-
Patent number: 11163675
Abstract: Mutation testing can indicate whether mutants of a software application, created by intentionally altering source code of the software application, are successfully "killed" by test cases executed against the mutants. Mutation testing can be performed via parallel threads by, within each parallel thread, modifying individual source code class files and recompiling the modified class files to generate and test mutants. Individual mutation test results produced within each of the parallel threads can be aggregated to generate an aggregated test result report that indicates overall testing metrics associated with the mutation testing across the parallel threads.
Type: Grant
Filed: April 7, 2021
Date of Patent: November 2, 2021
Assignee: State Farm Mutual Automobile Insurance Company
Inventors: Andrew L Pearson, Nate Shepherd
-
Patent number: 11150979
Abstract: A method for handling load faults in an out-of-order processor is described. The method includes detecting, by a memory ordering buffer of the out-of-order processor, a load fault corresponding to a load instruction that was executed out-of-order by the out-of-order processor; determining, by the memory ordering buffer, whether instant reclamation is available for resolving the load fault of the load instruction; and performing, in response to determining that instant reclamation is available for resolving the load fault of the load instruction, instant reclamation to re-fetch the load instruction for execution prior to attempting to retire the load instruction.
Type: Grant
Filed: August 13, 2019
Date of Patent: October 19, 2021
Assignee: Intel Corporation
Inventors: Zeev Sperber, Stanislav Shwartsman, Jared W. Stark, IV, Lihu Rappoport, Igor Yanover, George Leifman
-
Patent number: 11150902
Abstract: Systems and methods of performing processor pipeline management include receiving an instruction for processing, determining that data in a first memory sub-group of a memory group needed to process the instruction is not available in a cache that ensures fixed latency access, and determining that the instruction should be put in a sleep state. The sleep state indicates that the instruction will not be reissued until the instruction is moved to a wakeup state. The methods also include associating the instruction with a ticket identifier (ID) that corresponds with a second memory sub-group of the memory group, and moving the instruction to the wakeup state based on the second memory sub-group of the memory group being moved into the cache.
Type: Grant
Filed: February 11, 2019
Date of Patent: October 19, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Taylor J. Pritchard, Jonathan Hsieh, Michael Cadigan, Jr.
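A rough Python model of the ticket-based sleep/wake flow above: an instruction that misses the fixed-latency cache is parked under a ticket identifier tied to the memory sub-group it needs, and is only handed back for reissue when that sub-group reaches the cache. Structure and names are illustrative, not the patented design.

```python
class TicketedIssueQueue:
    def __init__(self):
        self.sleeping = {}                     # ticket id -> instructions in the sleep state

    def put_to_sleep(self, instr, ticket_id):
        self.sleeping.setdefault(ticket_id, []).append(instr)

    def on_subgroup_cached(self, ticket_id):
        """The memory sub-group tied to ticket_id is now cached: wake its sleepers."""
        return self.sleeping.pop(ticket_id, [])

q = TicketedIssueQueue()
q.put_to_sleep("load r5, [A]", ticket_id=3)
print(q.on_subgroup_cached(3))   # ['load r5, [A]'], ready to reissue
```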
-
Patent number: 11150907
Abstract: An execution unit circuit for use in a processor core provides efficient use of area and energy by reducing the per-entry storage requirement of a load-store unit issue queue. The execution unit circuit includes a recirculation queue that stores the effective address of the load and store operations and the values to be stored by the store operations. A queue control logic controls the recirculation queue and issue queue so that after the effective address of a load or store operation has been computed, the effective address of the load operation or the store operation is written to the recirculation queue and the operation is removed from the issue queue, so that address operands and other values that were in the issue queue entry no longer require storage. When a load or store operation is rejected by the cache unit, it is subsequently reissued from the recirculation queue.
Type: Grant
Filed: July 30, 2018
Date of Patent: October 19, 2021
Assignee: International Business Machines Corporation
Inventors: Salma Ayub, Sundeep Chadha, Robert Allen Cordes, David Allen Hrusecky, Hung Qui Le, Dung Quoc Nguyen, Brian William Thompto
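A sketch of the recirculation-queue idea above, under assumed data structures: once an operation's effective address is computed, the smaller record moves from the issue queue to a recirculation queue (freeing operand storage), and cache rejects reissue from the recirculation queue.

```python
from collections import deque

issue_queue = [{"op": "store", "base": "r1", "offset": 8, "value": 0xAB}]
recirculation_queue = deque()

def address_generated(entry, effective_address):
    issue_queue.remove(entry)                    # operand storage no longer needed
    recirculation_queue.append({"op": entry["op"],
                                "ea": effective_address,
                                "value": entry.get("value")})

def cache_rejected():
    entry = recirculation_queue.popleft()        # reissue from the recirculation queue
    recirculation_queue.append(entry)
    return entry

address_generated(issue_queue[0], effective_address=0x1000)
print(len(issue_queue), list(recirculation_queue))
print(cache_rejected())
```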
-
Patent number: 11113055
Abstract: A computer implemented method for marking a store instruction overlap in a processor pipeline is provided. A non-limiting example of the method includes detecting a second store instruction subsequent to a first store instruction in an instruction stream, in which there is a match between the operand address information of the first store instruction and a load instruction. The operand address information of the first store instruction is compared with the operand address information of the second store instruction to determine whether there is a match. In the event of a match, the second store instruction is delayed in the processor pipeline in response to determining that there is a memory image overlap between the operand address information of the second store instruction and the first store instruction.
Type: Grant
Filed: March 19, 2019
Date of Patent: September 7, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Edward Malley, Jang-Soo Lee, Anthony Saporito, Chung-Lung K. Shum, Gregory William Alexander
-
Patent number: 11106431
Abstract: A computing device to implement a fast floating-point adder tree for neural network applications is disclosed. The fast floating-point adder tree comprises a data preparation module, a fast fixed-point Carry-Save Adder (CSA) tree, and a normalization module. The floating-point input data comprises a sign bit, exponent part and fraction part. The data preparation module aligns the fraction part of the input data and prepares the input data for subsequent processing. The fast adder uses a signed fixed-point CSA tree to quickly add a large number of fixed-point data into 2 output values and then uses a normal adder to add the 2 output values into one output value. The fast adder for a large number of operands is based on multiple levels of fast adders for a small number of operands. The output from the signed fixed-point Carry-Save Adder tree is converted to a selected floating-point format.
Type: Grant
Filed: February 22, 2020
Date of Patent: August 31, 2021
Assignee: DINOPLUSAI HOLDINGS LIMITED
Inventors: Yutian Feng, Yujie Hu
-
Patent number: 11086526
Abstract: The present disclosure provides techniques for implementing a computing system that includes a processing sub-system, a memory sub-system, and one or more memory controllers. The processing sub-system includes processing circuitry that performs an operation based on a target data block and a processor-side cache coupled between the processing circuitry and a system bus. The memory sub-system includes a memory that stores data blocks in a memory array and a memory-side cache coupled between the memory channel and the system bus. The one or more memory controllers control caching in the processor-side cache based at least in part on a temporal relationship between previous data block targeting by the processing circuitry, and control caching in the memory-side cache based at least in part on a spatial relationship between data block storage locations in the memory channel.
Type: Grant
Filed: August 2, 2018
Date of Patent: August 10, 2021
Assignee: Micron Technology, Inc.
Inventors: Richard C. Murphy, Anton Korzh, Stephen S. Pawlowski
-
Patent number: 11080060
Abstract: Managing application execution by receiving a store instruction, including a store instruction itag and store instruction address, creating a hash of the store instruction address, receiving a load instruction and matching a hash of a store instruction address associated with the load instruction with the hash of the store instruction address associated with the store instruction. The store instruction itag is sent to an instruction sequencing unit (ISU). The ISU delays execution of the load instruction according to the received itag.
Type: Grant
Filed: April 23, 2019
Date of Patent: August 3, 2021
Assignee: International Business Machines Corporation
Inventors: Ehsan Fatehi, Brian W. Thompto, John B. Griswell, Jr.
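A minimal sketch of the hash-match-and-delay flow above, under assumed structures: the store's instruction address is hashed into a small table, a later load whose associated store address hashes to the same value yields that store's itag, and the ISU would hold the load behind it. The hash function and table shape are illustrative, not from the patent.

```python
def addr_hash(instruction_address, bits=10):
    return (instruction_address ^ (instruction_address >> bits)) & ((1 << bits) - 1)

store_hash_table = {}            # hash -> itag of the most recent matching store

def on_store(store_itag, store_instruction_address):
    store_hash_table[addr_hash(store_instruction_address)] = store_itag

def on_load(associated_store_instruction_address):
    """Return the store itag the ISU should delay this load behind, if any."""
    return store_hash_table.get(addr_hash(associated_store_instruction_address))

on_store(store_itag=17, store_instruction_address=0x4008)
print(on_load(0x4008))   # 17 -> the ISU delays the load until itag 17 has executed
```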
-
Patent number: 11048579
Abstract: In one embodiment, the present invention includes a method for receiving incoming data in a processor and performing a checksum operation on the incoming data in the processor pursuant to a user-level instruction for the checksum operation. For example, a cyclic redundancy checksum may be computed in the processor itself responsive to the user-level instruction. Other embodiments are described and claimed.
Type: Grant
Filed: August 12, 2019
Date of Patent: June 29, 2021
Assignee: Intel Corporation
Inventors: Steven R. King, Frank L. Berry, Michael E. Kounavis
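As an illustration of the kind of checksum such a user-level instruction computes (not the patented circuit), here is a short software model of CRC-32C, the Castagnoli polynomial used by Intel's SSE4.2 CRC32 instruction; the hardware performs the equivalent reduction per instruction.

```python
def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C (reflected polynomial 0x82F63B78), byte at a time."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

print(hex(crc32c(b"123456789")))   # 0xe3069283, the standard CRC-32C check value
```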
-
Patent number: 11023241
Abstract: Systems and methods selectively bypass address-generation hardware in processor instruction pipelines. In an embodiment, a processor includes an address-generation stage and an address-generation-bypass-determination unit (ABDU). The ABDU receives a load/store instruction. If an effective address for the load/store instruction is not known at the ABDU, the ABDU routes the load/store instruction via the address-generation stage of the processor. If, however, the effective address of the load/store instruction is known at the ABDU, the ABDU routes the load/store instruction to bypass the address-generation stage of the processor.
Type: Grant
Filed: August 21, 2018
Date of Patent: June 1, 2021
Assignee: Advanced Micro Devices, Inc.
Inventors: Andrej Kocev, Jay Fleischman, Kai Troester, Johnny C. Chu, Tim J. Wilkens, Neil Marketkar, Michael W. Long
-
Patent number: 11010164
Abstract: Predicting a Table of Contents (TOC) pointer value responsive to branching to a subroutine. A subroutine is called from a calling module executing on a processor. Based on calling the subroutine, a value of a pointer to a reference data structure, such as a TOC, is predicted. The predicting is performed prior to executing a sequence of one or more instructions in the subroutine to compute the value. The value that is predicted is used to access the reference data structure to obtain a variable value for a variable of the subroutine.
Type: Grant
Filed: October 2, 2019
Date of Patent: May 18, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Michael K. Gschwind, Valentina Salapura
-
Patent number: 10969995
Abstract: Systems and methods are disclosed for monitoring processor performance. Embodiments described herein relate to differentiating function performance by input parameters. In one embodiment, a method includes configuring a counter contained in a processor to count occurrences of an event in the processor and to overflow upon the count of occurrences reaching a specified value, configuring a precise event based sampling (PEBS) handler circuit to generate and store a PEBS record into a PEBS memory buffer after at least one overflow, the PEBS record containing at least one stack entry read from a stack after the at least one overflow, enabling the PEBS handler circuit to generate and store the PEBS record after the at least one overflow, generating and storing the PEBS record into the PEBS memory buffer after the at least one overflow, and storing contents of the PEBS memory buffer to a PEBS trace file in a memory.
Type: Grant
Filed: October 16, 2018
Date of Patent: April 6, 2021
Assignee: Intel Corporation
Inventors: Ahmad Yasin, Stanislav Bratanov
-
Patent number: 10963389
Abstract: An apparatus to facilitate data prefetching is disclosed. The apparatus includes a cache, one or more execution units (EUs) to execute program code, prefetch logic to maintain tracking information of memory instructions in the program code that trigger a cache miss, and compiler logic to receive the tracking information, insert one or more pre-fetch instructions in updated program code to prefetch data from a memory for execution of one or more of the memory instructions that triggered a cache miss, and download the updated program code for execution by the one or more EUs.
Type: Grant
Filed: February 11, 2020
Date of Patent: March 30, 2021
Assignee: Intel Corporation
Inventors: Vasileios Porpodas, Guei-Yuan Lueh, Subramaniam Maiyuran, Wei-Yu Chen
-
Patent number: 10949205
Abstract: A computer system includes a dispatch routing network to dispatch a plurality of instructions, and a processor in signal communication with the dispatch routing network. The processor determines a move instruction from the plurality of instructions to move data produced by an older second instruction, and copies a slice target file (STF) tag from a source register of the move instruction to a destination register of the move instruction without physically copying data in a slice target register and without assigning a new STF tag destination to the move instruction.
Type: Grant
Filed: December 20, 2018
Date of Patent: March 16, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Joshua Bowman, Dung Q. Nguyen, Hung Le, Brian Thompto, Maureen A. Delaney, Cliff Kucharski, Steven J Battle
-
Patent number: 10922159
Abstract: A method for performing a data dump includes detecting an error in a segmented application having an address space and a buffer. In response to detecting the error, the method quiesces the address space and copies content of the address space to another location while the address space is quiesced. The method reactivates the address space after the content of the address space is completely copied. The method suspends write access to the buffer and copies content of the buffer to another location while write access to the buffer is suspended. While write access to the buffer is suspended, the method redirects writes intended for the buffer to a temporary storage area, and directs reads intended for the buffer to one of the buffer and the temporary storage area, depending on where valid data is stored. A corresponding system and computer program product are also disclosed.
Type: Grant
Filed: April 16, 2019
Date of Patent: February 16, 2021
Assignee: International Business Machines Corporation
Inventors: Thomas C. Reed, David C. Reed
-
Patent number: 10838733
Abstract: A load request to restore a plurality of architected registers is obtained. Based on obtaining the load request, one or more architected registers of the plurality of architected registers are restored. The restoring uses a snapshot that maps architected registers to physical registers to replace one or more physical registers currently assigned to the one or more architected registers with one or more physical registers of the snapshot corresponding to the one or more architected registers.
Type: Grant
Filed: April 18, 2017
Date of Patent: November 17, 2020
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Michael K. Gschwind, Valentina Salapura, Chung-Lung K. Shum, Timothy J. Slegel
-
Patent number: 10838725
Abstract: Techniques are disclosed relating to fetching items from a compute command stream that includes compute kernels. In some embodiments, stream fetch circuitry sequentially pre-fetches items from the stream and stores them in a buffer. In some embodiments, fetch parse circuitry iterates through items in the buffer using a fetch parse pointer to detect indirect-data-access items and/or redirect items in the buffer. The fetch parse circuitry may send detected indirect data accesses to indirect-fetch circuitry, which may buffer requests. In some embodiments, execute parse circuitry iterates through items in the buffer using an execute parse pointer (e.g., which may trail the fetch parse pointer) and outputs both item data from the buffer and indirect-fetch results from indirect-fetch circuitry for execution. In various embodiments, the disclosed techniques may reduce fetch latency for compute kernels.
Type: Grant
Filed: September 26, 2018
Date of Patent: November 17, 2020
Assignee: Apple Inc.
Inventors: Andrew M. Havlir, Jeffrey T. Brady
-
Patent number: 10824430
Abstract: Managing program instruction execution by receiving a first OSC (operand store compare) instruction, the first OSC instruction comprising a first itag and a first instruction address, and creating a first OSC table entry according to the first itag and first instruction address. Further, receiving a second OSC instruction, the second OSC instruction comprising a second itag and a second instruction address, and creating a second OSC table entry according to the second itag and an itag delta between the first itag and the second itag, then appending the second OSC table entry according to an itag delta between the second itag and a third itag, and providing an itag delta from the second OSC table entry to an instruction sequencing unit (ISU).
Type: Grant
Filed: April 25, 2019
Date of Patent: November 3, 2020
Assignee: International Business Machines Corporation
Inventors: Ehsan Fatehi, Brian W. Thompto
-
Patent number: 10776897
Abstract: Embodiments described herein provide an apparatus comprising a processor to configure a plurality of contexts of a command engine to execute a graphics workload comprising a plurality of walkers, allocate, from a pool of execution units of a graphics processor, a subset of execution units to each walker in the plurality of walkers based at least in part on the predetermined number of walkers configured for the context, for each context in the plurality of contexts, dispatch one or more walkers of the plurality of walkers to the execution units, and upon dispatch of the one or more walkers of the plurality of walkers, write an opcode to a computer-readable memory indicating that the dispatch of the walker is complete, wherein the opcode comprises dependency data for the one or more walkers of the plurality of walkers. Other embodiments may be described and claimed.
Type: Grant
Filed: March 8, 2019
Date of Patent: September 15, 2020
Assignee: INTEL CORPORATION
Inventors: James Valerio, Vasanth Ranganathan, Joydeep Ray, Abhishek R. Appu, Ben J. Ashbaugh, Brandon Fliflet, Jeffery S. Boles, Srinivasan Embar Raghukrishnan, Rahul Kulkarni
-
Patent number: 10776125
Abstract: In an embodiment, at least one CPU processor and at least one coprocessor are included in a system. The CPU processor may issue operations to the coprocessor to perform, including load/store operations. The CPU processor may generate the addresses that are accessed by the coprocessor load/store operations, as well as executing its own CPU load/store operations. The CPU processor may include a memory ordering table configured to track at least one memory region within which there are outstanding coprocessor load/store memory operations that have not yet completed. The CPU processor may delay CPU load/store operations until the outstanding coprocessor load/store operations are complete. In this fashion, the proper ordering of CPU load/store operations and coprocessor load/store operations may be maintained.
Type: Grant
Filed: December 5, 2018
Date of Patent: September 15, 2020
Assignee: Apple Inc.
Inventors: Aditya Kesiraju, Brett S. Feero, Nikhil Gupta
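A sketch of the memory ordering table described above, under an assumed 4 KiB region granularity: regions with outstanding coprocessor load/stores are counted, and a CPU load/store to an overlapping region is held until the count drains. The structures and the region size are illustrative.

```python
REGION_SIZE = 4096
memory_ordering_table = {}      # region base -> count of outstanding coprocessor ops

def region(addr):
    return addr & ~(REGION_SIZE - 1)

def coprocessor_op_issued(addr):
    memory_ordering_table[region(addr)] = memory_ordering_table.get(region(addr), 0) + 1

def coprocessor_op_completed(addr):
    r = region(addr)
    memory_ordering_table[r] -= 1
    if memory_ordering_table[r] == 0:
        del memory_ordering_table[r]

def cpu_op_may_proceed(addr):
    return region(addr) not in memory_ordering_table   # otherwise delay the CPU load/store

coprocessor_op_issued(0x10020)
print(cpu_op_may_proceed(0x10FF0))   # False: same 4 KiB region still outstanding
coprocessor_op_completed(0x10020)
print(cpu_op_may_proceed(0x10FF0))   # True
```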
-
Patent number: 10762458
Abstract: Systems and methods are provided for scheduling objects having pair-wise and cumulative constraints. The systems and methods presented can utilize a directed acyclic graph to increase or maximize a utilization function. The objects can comprise satellites in a constellation of satellites. In some implementations, the satellites are imaging satellites, and the systems and methods for scheduling can use human collaboration to determine events of interest for acquisition of images. In some implementations, dominant edges are removed from the directed acyclic graph. In some implementations, dynamic weights are assigned to nodes associated with downlink events in the directed acyclic graph.
Type: Grant
Filed: October 24, 2014
Date of Patent: September 1, 2020
Assignee: Planet Labs, Inc.
Inventor: Sean Augenstein
-
Patent number: 10761594
Abstract: In an embodiment, a processor includes a first core and a power management agent (PMA), coupled to the first core, to include a static table that stores a list of operations, and a plurality of columns each to specify a corresponding flow that includes a corresponding subset of the operations. Execution of each flow is associated with a corresponding state of the first core. The PMA includes a control register (CR) that includes a plurality of storage elements to receive one of a first value and a second value. The processor includes execution logic, responsive to a command to place the first core into a first state, to execute an operation of a first flow when a corresponding storage element stores the first value and to refrain from execution of an operation of the first flow when the corresponding element stores the second value. Other embodiments are described and claimed.
Type: Grant
Filed: June 15, 2017
Date of Patent: September 1, 2020
Assignee: Intel Corporation
Inventors: Israel Diamand, Asaf Rubinstein, Arik Gihon, Tal Kuzi, Tomer Ziv, Nadav Shulman
-
Patent number: 10761854
Abstract: Preventing hazard flushes in an instruction sequencing unit of a multi-slice processor including receiving a load instruction in a load reorder queue, wherein the load instruction is an instruction to load data from a memory location; subsequent to receiving the load instruction, receiving a store instruction in a store reorder queue, wherein the store instruction is an instruction to store data in the memory location; determining that the store instruction causes a hazard against the load instruction; preventing a flush of the load reorder queue based on a state of the load instruction; and re-executing the load instruction.
Type: Grant
Filed: April 19, 2016
Date of Patent: September 1, 2020
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Robert A. Cordes, David A. Hrusecky, Elizabeth A. McGlone
-
Patent number: 10747539
Abstract: Systems, apparatuses, and methods for instruction next fetch prediction. A scan-on-fill target predictor in a processor generates a predicted next fetch address for the instruction fetch unit. When a group of instructions is used to fill an instruction cache but is not currently being retrieved from the instruction cache for processing by other pipeline stages, the group of instructions is scanned to identify exit points of basic blocks within the group. An entry of a table in the scan-on-fill target predictor is allocated for an instruction in a basic block in the group when the basic block has an exit point with a target address that can be resolved within a single clock cycle. The scan-on-fill target predictor may perform a lookup of the table with the current fetch address. The prediction may be compared to a main branch predictor at a later pipeline stage for training purposes.
Type: Grant
Filed: November 14, 2016
Date of Patent: August 18, 2020
Assignee: Apple Inc.
Inventors: James Robert Howard Hakewill, Constantin Pistol
-
Patent number: 10740269
Abstract: Arbitration circuitry is provided for allocating up to M resources to N requesters, where M ≥ 2. The arbitration circuitry comprises group allocation circuitry to control a group allocation in which the N requesters are allocated to M groups of requesters, with each requester allocated to one of the groups; and M arbiters each corresponding to a respective one of the M groups. Each arbiter selects a winning requester from the corresponding group, which is to be allocated a corresponding resource of the M resources. In response to a given requester being selected as the winning requester by the arbiter for a given group, the group allocation is changed so that in a subsequent arbitration cycle the given requester is in a different group to the given group.
Type: Grant
Filed: July 17, 2018
Date of Patent: August 11, 2020
Assignee: ARM Limited
Inventor: Andrew David Tune
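A small Python model of the scheme above: N requesters are split into M groups, each group's arbiter picks one winner per cycle, and each winner is moved to a different group for the next cycle. The rotation policy and pick-first arbiter here are stand-ins for the patented group-allocation and arbiter circuitry.

```python
def arbitrate(groups, requesting):
    """groups: list of M lists of requester ids; requesting: set of active requesters."""
    winners = []
    for m, group in enumerate(groups):
        candidates = [r for r in group if r in requesting]
        if candidates:
            winners.append((candidates[0], m))       # arbiter m grants resource m
    for winner, m in winners:                        # then change the group allocation
        groups[m].remove(winner)
        groups[(m + 1) % len(groups)].append(winner) # winner moves to a different group
    return winners

groups = [[0, 1], [2, 3]]                            # N=4 requesters, M=2 groups/resources
print(arbitrate(groups, requesting={0, 2, 3}))       # [(0, 0), (2, 1)]
print(groups)                                        # [[1, 2], [3, 0]]: winners changed groups
```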
-
Patent number: 10740107
Abstract: Operation of a multi-slice processor that includes a plurality of execution slices and an instruction sequencing unit. Operation of such a multi-slice processor includes: receiving, at the instruction sequencing unit, a load instruction indicating load address data and a load data length; determining a previous store instruction in an issue queue such that store address data for the previous store instruction corresponds to the load address data, wherein the previous store instruction corresponds to a store data length; and generating, in dependence upon the store data length matching the load data length, an indication in the issue queue that indicates a dependency between the load instruction and the previous store instruction.
Type: Grant
Filed: June 1, 2016
Date of Patent: August 11, 2020
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Salma Ayub, Joshua W. Bowman, Jeffrey C. Brownscheidle, Kurt A. Feiste, Dung Q. Nguyen, Salim A. Shah, Brian W. Thompto
-
Patent number: 10684834
Abstract: Embodiments of the present invention disclose a method and an apparatus for detecting inter-instruction data dependency. The method comprises: comparing a thread number corresponding to a historical access operation with a thread number corresponding to a write access operation, and if the thread number corresponding to the write access operation is less than the thread number corresponding to the historical access operation, which indicates existence of data dependency for a to-be-detected instruction, terminating the detection; or comparing a thread number corresponding to a historical write access operation with a thread number corresponding to a read access operation, and if the thread number corresponding to the read access operation is less than the thread number corresponding to the historical write access operation, which indicates existence of data dependency for the to-be-detected instruction, terminating the detection.
Type: Grant
Filed: March 21, 2019
Date of Patent: June 16, 2020
Assignee: HUAWEI TECHNOLOGIES CO., LTD.
Inventors: Hongyuan Liu, Cho-Li Wang, KingTin Lam, Huanxin Lin, Bin Zhang, Junchao Ma
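A condensed Python model of the two comparisons above: a write by a lower-numbered thread after another thread's access, or a read by a lower-numbered thread after another thread's write, signals a cross-thread data dependency for the instruction under test. The per-address record layout is an assumption.

```python
history = {}   # address -> {"last_access_tid": ..., "last_write_tid": ...}

def check_write(addr, tid):
    rec = history.setdefault(addr, {"last_access_tid": None, "last_write_tid": None})
    dependent = rec["last_access_tid"] is not None and tid < rec["last_access_tid"]
    rec["last_access_tid"] = tid
    rec["last_write_tid"] = tid
    return dependent           # True -> data dependency exists, detection can terminate

def check_read(addr, tid):
    rec = history.setdefault(addr, {"last_access_tid": None, "last_write_tid": None})
    dependent = rec["last_write_tid"] is not None and tid < rec["last_write_tid"]
    rec["last_access_tid"] = tid
    return dependent

print(check_write(0x100, tid=5))   # False: first access to this address
print(check_read(0x100, tid=3))    # True: thread 3 reads after thread 5 wrote
```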
-
Patent number: 10684859
Abstract: Providing memory dependence prediction in block-atomic dataflow architectures is provided, in one aspect, by a memory dependence prediction circuit. The memory dependence prediction circuit comprises a predictor table configured to store multiple predictor table entries, each comprising a store instruction identifier, a block reach set, and a load set. Using this data, the memory dependence prediction circuit determines, upon a fetch of an instruction block by an execution pipeline, whether the instruction block contains store instructions that reach dependent load instructions. If so, the store instructions are marked as having dependent load instructions to wake. In some aspects, the memory dependence prediction circuit is configured to determine whether the instruction block contains dependent load instructions reached by store instructions. If so, the memory dependence prediction circuit delays execution of the dependent load instructions.
Type: Grant
Filed: September 19, 2016
Date of Patent: June 16, 2020
Assignee: QUALCOMM Incorporated
Inventors: Chen-Han Ho, Gregory Michael Wright
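A rough sketch of the predictor table lookup described above, with assumed field names: each entry pairs a store with the blocks it can reach and the loads known to depend on it, and on a block fetch the stores that reach dependent loads in that block are marked so those loads can be woken (or delayed).

```python
predictor_table = [
    {"store_id": "S1", "block_reach_set": {"B2", "B3"}, "load_set": {"L7"}},
    {"store_id": "S4", "block_reach_set": {"B9"},       "load_set": {"L2"}},
]

def on_block_fetch(block_id, loads_in_block):
    """Return store identifiers to mark as having dependent loads to wake in this block."""
    marked = []
    for entry in predictor_table:
        if block_id in entry["block_reach_set"] and entry["load_set"] & loads_in_block:
            marked.append(entry["store_id"])
    return marked

print(on_block_fetch("B2", loads_in_block={"L7", "L9"}))   # ['S1']
print(on_block_fetch("B9", loads_in_block={"L7"}))          # []
```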