Patents Examined by Daniel H. Pan
  • Patent number: 10402201
    Abstract: A method and apparatus for detecting potential memory conflicts in a parallel computing environment by executing two parallel program threads. The parallel program threads include special operands that are used by a processing core to identify memory addresses that have the potential for conflict. These memory addresses are combined into a composite access record for each thread. The composite access records are compared to each other in order to detect a potential memory conflict.
    Type: Grant
    Filed: March 9, 2017
    Date of Patent: September 3, 2019
    Inventors: Joel Kevin Jones, Ananth Jasty
  • Patent number: 10394558
    Abstract: Technical solutions are described for out-of-order (OoO) execution of one or more instructions by a processing unit includes receiving, by a load-store unit (LSU) of the processing unit, an OoO window of instructions including a plurality of instructions to be executed OoO, and issuing, by the LSU, instructions from the OoO window. The issuing includes selecting an instruction from the OoO window, the instruction using an effective address. Further, in response to the instruction being a load instruction, it is determined whether the effective address is present in an effective address directory (EAD). In response to the effective address being present in the EAD, the load instruction is issued using the effective address. Further, in response to the instruction being a store instruction, a real address mapped to the effective address is determined from an effective-real translation (ERT) table, and the store instruction is issued using the real address.
    Type: Grant
    Filed: October 6, 2017
    Date of Patent: August 27, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Christopher Gonzalez, Bryan Lloyd, Balaram Sinharoy
  • Patent number: 10394729
    Abstract: A system for executing a data flow graph comprises: at least two first actors each comprising means for independently executing a computation of a same data set comprising at least one datum, and producing a quality descriptor of the data set, the execution of the computation by each of at least two first actors being triggered by a synchronization system; a third actor, comprising means for triggering the execution of the computation by each of at least two first actors, and initializing a clock configured to emit an interrupt signal when a duration has elapsed; a fourth actor, comprising means for executing, at the latest at the interrupt signal from the clock: the selection, from the set of at least two first actors having produced a quality descriptor, of the one whose descriptor exhibits the most favorable value; the transfer of the data set computed by the selected actor.
    Type: Grant
    Filed: September 22, 2015
    Date of Patent: August 27, 2019
    Assignee: COMMISSARIAT A L'ENERGIE ATOMIQUE ET AUX ENERGIES ALTERNATIVES
    Inventors: Paul Dubrulle, Thierry Goubier, Stéphane Louise
  • Patent number: 10387153
    Abstract: Techniques for synchronizing a set of code branches are disclosed. A synchronization process is triggered by an event and/or a schedule. The synchronization process includes traversing each code branch, such that parent branches of a particular branch are “in sync” prior to being merged into the particular branch. In an embodiment, a hierarchical order for a set of branches is determined. The branch represented by the top node of the hierarchical order does not have any parents. A branch that is a child of the branch represented by the top node is in the second level of the hierarchical order. The branch in the second level is updated by incorporating the current state of the branch represented by the top node. Thereafter, each branch is iteratively updated by incorporating the current state of the branch's parent branch. Hence, changes to any parent branch are propagated through all its descendant branches.
    Type: Grant
    Filed: November 27, 2017
    Date of Patent: August 20, 2019
    Assignee: Oracle International Corporation
    Inventors: Maurizio Cimadamore, Brian Goetz
  • Patent number: 10387161
    Abstract: Techniques are disclosed for implementing an extensible, light-weight, flexible (ELF) processing platform that can efficiently capture state information from multiple threads during execution of instructions (e.g., an instance of a game). The ELF processing platform supports execution of multiple threads in a single process for parallel execution of multiple instances of the same or different program code or games. Upon capturing the state information, one or more threads may be executed in the ELF platform to compute one or more actions to perform at any state of execution by each of those threads. The threads can easily access the state information from a shared memory space and use the state information to implement rule-based and/or learning-based techniques for determining subsequent actions for execution for the threads.
    Type: Grant
    Filed: September 1, 2017
    Date of Patent: August 20, 2019
    Assignee: Facebook, Inc.
    Inventors: Yuandong Tian, Qucheng Gong, Yuxin Wu
  • Patent number: 10379864
    Abstract: In an embodiment, a processor comprises a prefetch history array and a prefetch circuit. The prefetch history array comprises a plurality of entries corresponding to prefetch addresses, each entry of the plurality of entries comprising a sublength value associated with a frequency that a stride is repeated. The prefetch circuit is to: for each entry of the plurality of entries, adjust the sublength value based on stride matches for an address of the entry; adjust a short stream counter based on the sublength values of the plurality of entries in the prefetch history array; determine whether the short stream counter has exceeded a throttling threshold; and in response to a determination that the short stream counter has exceeded the throttling threshold, throttle a prefetch level of the prefetch circuit. Other embodiments are described and claimed.
    Type: Grant
    Filed: December 26, 2016
    Date of Patent: August 13, 2019
    Assignee: Intel Corporation
    Inventors: Chunhui Zhang, Seth H. Pugsley, Mark J. Dechene
  • Patent number: 10379855
    Abstract: A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode an instruction. The instruction is to indicate a packed data register of the plurality of packed data registers that is to store a source packed memory address information. The source packed memory address information is to include a plurality of memory address information data elements. An execution unit is coupled with the decode unit and the plurality of packed data registers, the execution unit, in response to the instruction, is to load a plurality of data elements from a plurality of memory addresses that are each to correspond to a different one of the plurality of memory address information data elements, and store the plurality of loaded data elements in a destination storage location. The destination storage location does not include a register of the plurality of packed data registers.
    Type: Grant
    Filed: September 30, 2016
    Date of Patent: August 13, 2019
    Assignee: Intel Corporation
    Inventors: William C. Hasenplaugh, Chris J. Newburn, Simon C. Steely, Jr., Samantika S. Sury
  • Patent number: 10379869
    Abstract: There are provided a system, a method and a computer program product for selecting an active data stream (a lane) while running SPMD (Single Program Multiple Data) code on SIMD (Single Instruction Multiple Data) machine. The machine runs an instruction stream over input data streams. The machine increments lane depth counters of all active lanes upon the thread-PC reaching a branch operation. The machine updates the lane-PC of each active lane according to targets of the branch operation. The machine selects an active lane and activates only lanes whose lane-PCs match the thread-PC. The machine decrements the lane depth counters of the selected active lanes and updates the lane-PC of each active lane upon the instruction stream reaching a first instruction. The machine assigns the lane-PC of a lane with a largest lane depth counter value to the thread-PC and activates all lanes whose lane-PCs match the thread-PC.
    Type: Grant
    Filed: February 7, 2018
    Date of Patent: August 13, 2019
    Assignee: International Business Machines Corporation
    Inventors: Gheorghe Almasi, Jose Moreira, Jessica H. Tseng, Peng Wu
  • Patent number: 10372450
    Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor generation of a predicate mask based on vector comparison in response to a single instruction are described.
    Type: Grant
    Filed: July 11, 2017
    Date of Patent: August 6, 2019
    Assignee: Intel Corporation
    Inventors: Victor W. Lee, Daehyun Kim, Tin-Fook Ngai, Jayashankar Bharadwaj, Albert Hartono, Sara Baghsorkhi, Nalini Vasudevan
  • Patent number: 10372668
    Abstract: Methods and apparatuses relating to tightly-coupled heterogeneous computing are described. In one embodiment, a hardware processor includes a plurality of execution units in parallel, a switch to connect inputs of the plurality of execution units to outputs of a first buffer and a plurality of memory banks and connect inputs of the plurality of memory banks and a plurality of second buffers in parallel to outputs of the first buffer, the plurality of memory banks, and the plurality of execution units, and an offload engine with inputs connected to outputs of the plurality of second buffers.
    Type: Grant
    Filed: January 12, 2018
    Date of Patent: August 6, 2019
    Assignee: Intel Corporation
    Inventors: Chang Yong Kang, Pierre Laurent, Hari K. Tadepalli, Prasad M. Ghatigar, T.J. O'Dwyer, Serge Zhilyaev
  • Patent number: 10365927
    Abstract: Embodiments relate to non-default instruction handling within a transaction. An aspect includes entering a transaction, the transaction comprising a first plurality of instructions and a second plurality of instructions, wherein a default manner of handling of instructions in the transaction is one of atomic and non-atomic. Another aspect includes encountering a non-default specification instruction in the transaction, wherein the non-default specification instruction comprises a single instruction that specifies the second plurality of instructions of the transaction for handling in a non-default manner comprising one of atomic and non-atomic, wherein the non-default manner is different from the default manner. Another aspect includes handling the first plurality of instructions in the default manner. Yet another aspect includes handling the second plurality of instructions in the non-default manner.
    Type: Grant
    Filed: November 9, 2017
    Date of Patent: July 30, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan D. Bradbury, Michael K. Gschwind, Maged M. Michael, Eric M. Schwarz, Valentina Salapura, Chung-Lung K. Shum
  • Patent number: 10365929
    Abstract: A Spin Loop Delay instruction. The instruction has a field associated therewith that indicates one or more conditions to be checked. Dispatching of the instruction is initially delayed. The instruction is subsequently dispatched based on a timeout, provided the instruction has not been previously dispatched based on meeting at least one condition of the one or more conditions to be checked.
    Type: Grant
    Filed: November 13, 2017
    Date of Patent: July 30, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Fadi Y. Busaba, Christian Jacobi, Anthony Saporito, Eric M. Schwarz, Timothy J. Slegel
  • Patent number: 10360037
    Abstract: A fetch unit configured to, in response to detecting a subroutine call and link instruction, calculate and store a predicted target address for the corresponding subroutine return instruction in a prediction stack, and if certain conditions are met, also cause to be stored in the prediction stack a predicted target instruction bundle. The fetch unit is also configured to, in response to detecting a subroutine return instruction, use the predicted target address in the prediction stack to determine the address of the next instruction bundle to be fetched, and if certain conditions are met, cause any valid predicted target instruction bundle in the prediction stack to be the next bundle to be decoded.
    Type: Grant
    Filed: September 30, 2016
    Date of Patent: July 23, 2019
    Assignee: MIPS Tech, LLC
    Inventor: Philip Day
  • Patent number: 10353681
    Abstract: A method for improving performance of an access triggered architecture for a computer implemented application is provided. The method first executes typical operations of the access triggered architecture according to an execution time, wherein the typical operations comprise: obtaining a dataset and an instruction set; and using the instruction set to transmit the dataset to a functional block associated with an operation, wherein the functional block performs the operation using the dataset to generate a revised dataset. The method further creates a pipeline of the typical operations to reduce the execution time of the typical operations, to create a reduced execution time; and executes the typical operations according to the reduced execution time, using the pipeline.
    Type: Grant
    Filed: August 28, 2017
    Date of Patent: July 16, 2019
    Assignee: HONEYWELL INTERNATIONAL INC.
    Inventors: Thom Kreider, Jon Douglas Gilreath, Gary Warnica, Paul D. Kammann, Vince J. Gavagan, IV, Ronald E. Strong
  • Patent number: 10353736
    Abstract: Associating working sets and threads is disclosed. An indication of a stalling event is received. In response to receiving the indication of the stalling event, a state of a processor associated with the stalling event is saved. At least one of an identifier of a guest thread running in the processor and a guest physical address referenced by the processor is obtained from the saved processor state.
    Type: Grant
    Filed: August 25, 2017
    Date of Patent: July 16, 2019
    Assignee: TidalScale, Inc.
    Inventors: Isaac R. Nassi, Kleoni Ioannidou, David P. Reed, I-Chun Fang, Michael Berman, Mark Hill, Brian Moffet
  • Patent number: 10339094
    Abstract: A computer processor is disclosed. The computer processor comprises one or more processor resources. The computer processor further comprises a plurality of hardware thread units coupled to the one or more processor resources. The computer processor may be configured to permit simultaneous access to the one or more processor resources by only a subset of hardware thread units of the plurality of hardware thread units. The number of hardware threads in the subset may be less than the total number of hardware threads of the plurality of hardware thread units.
    Type: Grant
    Filed: May 19, 2015
    Date of Patent: July 2, 2019
    Assignee: OPTIMUM SEMICONDUCTOR TECHNOLOGIES, INC.
    Inventors: Mayan Moudgill, Gary J. Nacer, C. John Glossner, Arthur Joseph Hoane, Paul Hurtley, Murugappan Senthilvelan, Pablo Balzola, Vitaly Kalashnikov, Sitij Agrawal
  • Patent number: 10339057
    Abstract: A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces address of data elements for the nested loops. A steam head register stores data elements next to be supplied to functional units for use as operands. A stream template specifies loop count and loop dimension for each nested loop. A format definition field in the stream template specifies the number of loops and the stream template bits devoted to the loop counts and loop dimensions. This permits the same bits of the stream template to be interpreted differently enabling trade off between the number of loops supported and the size of the loop counts and loop dimensions.
    Type: Grant
    Filed: December 20, 2016
    Date of Patent: July 2, 2019
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventor: Joseph Zbiciak
  • Patent number: 10331447
    Abstract: Providing efficient recursion handling using compressed return address stacks (CRASs) in processor-based systems is disclosed. In one aspect, a processor-based system provides a branch prediction circuit including a CRAS. Each CRAS entry within the CRAS includes an address field and a counter field. When a call instruction is encountered, a return address of the call instruction is compared to the address field of a top CRAS entry indicated by a CRAS top-of-stack (TOS) index. If the return address matches the top CRAS entry, the counter field of the top CRAS entry is incremented instead of adding a new CRAS entry for the return address. When a return instruction is subsequently encountered in the instruction stream, the counter field of the top CRAS entry is decremented if its value is greater than zero (0), or, if not, the top CRAS entry is removed from the CRAS.
    Type: Grant
    Filed: August 30, 2017
    Date of Patent: June 25, 2019
    Assignee: QUALCOMM Incorporated
    Inventors: Vignyan Reddy Kothinti Naresh, Anil Krishna
  • Patent number: 10318303
    Abstract: A method and apparatus for performing branch prediction is disclosed. A branch predictor includes a history buffer configured to store a branch history table indicative of a history of a plurality of previously fetched branch instructions. The branch predictor also includes a branch target cache (BTC) configured to store branch target addresses for fetch addresses that have been identified as including branch instructions but have not yet been predicted. A hash circuit is configured to form a hash of a fetch address, history information received from the history buffer, and hit information received from the BTC, wherein the fetch address includes a branch instruction. A branch prediction unit (BPU) configured to generate a branch prediction for the branch instruction included in the fetch address based on the hash formed from the fetch address, history information, and BTC hit information.
    Type: Grant
    Filed: March 28, 2017
    Date of Patent: June 11, 2019
    Assignee: Oracle International Corporation
    Inventors: Manish Shah, Jared Smolens
  • Patent number: 10318433
    Abstract: A streaming engine employed in a digital data processor specifies a fixed read only data stream defined by plural nested loops. An address generator produces address of data elements for the nested loops. A steam head register stores data elements next to be supplied to functional units for use as operands. A stream template register independently specifies a linear address or a circular address mode for each of the nested loops.
    Type: Grant
    Filed: December 20, 2016
    Date of Patent: June 11, 2019
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventor: Joseph Zbiciak