Patents Examined by Jyoti Mehta
  • Patent number: 9658877
    Abstract: The disclosure relates generally to techniques, methods, and apparatus for controlling context switching at a central processing unit. Alternatively, methods and apparatus are provided for securing memory blocks. Alternatively, methods and apparatus are provided for enabling transactional processing using a multi-core device.
    Type: Grant
    Filed: August 23, 2010
    Date of Patent: May 23, 2017
    Assignee: EMPIRE TECHNOLOGY DEVELOPMENT LLC
    Inventor: James Barwick
  • Patent number: 9645949
    Abstract: Embodiments of the invention relate to a data processing apparatus including a processor adapted to operate under control of an executable comprising instructions, and in any of a plurality of operating modes including a non-privileged mode and a privileged mode, the apparatus comprising: means for storing a plurality of stacks; a first stack pointer register for storing a pointer to an address in a first of said stacks; a second stack pointer register for storing a pointer to an address in a second of said stacks, wherein said processing apparatus is adapted to use said second stack pointer when said processor is operating in either the non-privileged mode or the privileged mode; and means for transferring operation of said processor from the non-privileged mode to the privileged mode in response to at least one of said instructions. Embodiments of the invention also relate to a method of operating a data processing apparatus.
    Type: Grant
    Filed: May 27, 2009
    Date of Patent: May 9, 2017
    Assignee: Cambridge Consultants Ltd.
    Inventors: Alistair G. Morfey, Karl Leighton Swepson, Peter Giles Lloyd
  • Patent number: 9645866
    Abstract: This disclosure describes communication techniques that may be used within a multiple-processor computing platform. The techniques may, in some examples, provide software interfaces that may be used to support message passing within a multiple-processor computing platform that initiates tasks using command queues. The techniques may, in additional examples, provide software interfaces that may be used for shared memory inter-processor communication within a multiple-processor computing platform. In further examples, the techniques may provide a graphics processing unit (GPU) that includes hardware for supporting message passing and/or shared memory communication between the GPU and a host CPU.
    Type: Grant
    Filed: September 16, 2011
    Date of Patent: May 9, 2017
    Assignee: QUALCOMM Incorporated
    Inventors: Alexei V. Bourd, Colin Christopher Sharp, David Rigel Garcia Garcia, Chihong Zhang
  • Patent number: 9632778
    Abstract: Embodiments relate to packed loading and storing of data. An aspect includes a system for packed loading and storing of distributed data. The system includes memory and a processing element configured to communicate with the memory. The processing element is configured to perform a method including fetching and decoding an instruction for execution by the processing element. Based on the instruction, a plurality of individually addressable data elements, which are narrower than a nominal width of register file elements in the processing element, is gathered from non-contiguous locations in the memory. The processing element packs and loads the data elements into register file elements of a register file entry based on the instruction, such that at least two of the data elements gathered from the non-contiguous locations in the memory are packed and loaded into a single register file element of the register file entry.
    Type: Grant
    Filed: August 8, 2012
    Date of Patent: April 25, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Jaime H. Moreno, Ravi Nair, Daniel A. Prener
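    The gather-and-pack behaviour described in this abstract can be pictured with the minimal Python sketch below; the 8-bit element width, 64-bit register width, and little-endian packing order are assumptions of the example, not details taken from the patent.
```python
# Illustrative model of packed gather-and-load: narrow data elements fetched
# from non-contiguous memory addresses are packed so that several of them
# share a single wide register file element.

ELEMENT_BITS = 8          # width of each gathered data element (assumed)
REGISTER_BITS = 64        # nominal width of a register file element (assumed)
PER_REGISTER = REGISTER_BITS // ELEMENT_BITS

def packed_gather_load(memory, addresses):
    """Gather bytes at the given (non-contiguous) addresses and pack them
    into 64-bit register file elements, several bytes per element."""
    registers = []
    current, count = 0, 0
    for addr in addresses:
        element = memory[addr] & ((1 << ELEMENT_BITS) - 1)
        current |= element << (count * ELEMENT_BITS)   # pack little-endian
        count += 1
        if count == PER_REGISTER:
            registers.append(current)
            current, count = 0, 0
    if count:                                          # partially filled element
        registers.append(current)
    return registers

if __name__ == "__main__":
    mem = {0x10: 0xAA, 0x53: 0xBB, 0x9F: 0xCC, 0x200: 0xDD}
    regs = packed_gather_load(mem, [0x10, 0x53, 0x9F, 0x200])
    print([hex(r) for r in regs])   # all four bytes packed into one register element
```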
  • Patent number: 9632777
    Abstract: Embodiments relate to packed loading and storing of data. An aspect includes a method for packed loading and storing of data distributed in a system that includes memory and a processing element. The method includes fetching and decoding an instruction for execution by the processing element. Based on the instruction, the processing element gathers a plurality of individually addressable data elements, which are narrower than a nominal width of register file elements in the processing element, from non-contiguous locations in the memory. The data elements are packed and loaded into register file elements of a register file entry by the processing element based on the instruction, such that at least two of the data elements gathered from the non-contiguous locations in the memory are packed and loaded into a single register file element of the register file entry.
    Type: Grant
    Filed: August 3, 2012
    Date of Patent: April 25, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Bruce M. Fleischer, Thomas W. Fox, Hans M. Jacobson, Jaime H. Moreno, Ravi Nair, Daniel A. Prener
  • Patent number: 9619228
    Abstract: The data processor includes a CPU operable to execute an instruction included in an instruction set. The instruction set includes a load instruction for reading data on a memory space. The data read according to the load instruction includes data of a format type having a data-read-branching-occurrence bit region. The CPU includes a data-read-branching control register, a data-read-branching address register, and a read-data-analyzing unit. On condition that a bit value showing the occurrence of data read branching has been set in the data-read-branching-occurrence bit region, and a value showing that the data-read-branching-occurrence bit remains valid has been set in the data-read-branching control register, switching between processes is performed by branching to an address stored in the data-read-branching address register.
    Type: Grant
    Filed: March 28, 2011
    Date of Patent: April 11, 2017
    Assignee: Renesas Electronics Corporation
    Inventors: Takafumi Yuasa, Hiroaki Nakata, Motoki Kimura, Kazushi Akie
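    A minimal behavioural sketch of the data-read-branching mechanism in this abstract is given below; the bit position of the occurrence flag, the register names, and the 4-byte fall-through increment are illustrative assumptions.
```python
# If loaded data carries a "branching occurrence" bit and the control register
# marks that bit as valid, execution branches to the address held in a
# dedicated branch-address register.

BRANCH_OCCURRENCE_BIT = 1 << 31   # assumed position of the occurrence bit

class Cpu:
    def __init__(self):
        self.pc = 0
        self.branch_control = 0     # 1 = occurrence bit is treated as valid
        self.branch_address = 0     # target used when branching is triggered

    def load(self, memory, address):
        data = memory[address]
        if (data & BRANCH_OCCURRENCE_BIT) and self.branch_control:
            self.pc = self.branch_address        # switch processes by branching
        else:
            self.pc += 4                         # fall through to next instruction
        return data & ~BRANCH_OCCURRENCE_BIT     # strip the control bit

if __name__ == "__main__":
    cpu = Cpu()
    cpu.branch_control = 1
    cpu.branch_address = 0x1000
    mem = {0x40: BRANCH_OCCURRENCE_BIT | 0x7F}
    value = cpu.load(mem, 0x40)
    print(hex(cpu.pc), hex(value))   # 0x1000 0x7f
```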
  • Patent number: 9612844
    Abstract: A method and apparatus are provided for executing instructions of a multi-threaded processor having multiple hardware threads (32, 34) with differing hardware resources. The method comprises receiving a plurality of streams of instructions (38, 44), determining which hardware threads are able to receive instructions for execution (40, 46), determining whether a thread found to be available for executing an instruction has the hardware resources required by that instruction (36), and executing the instruction in dependence on the result of that determination (50).
    Type: Grant
    Filed: January 18, 2010
    Date of Patent: April 4, 2017
    Assignee: Imagination Technologies Limited
    Inventor: Andrew Webber
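    The issue decision in this abstract can be sketched as below; the resource names, queue structure, and one-instruction-per-thread-per-cycle policy are assumptions of the example rather than details from the patent.
```python
# A thread's instruction is executed only if the thread can accept
# instructions and the hardware resources that instruction needs are free.

def issue(threads, free_resources):
    """Return the instructions issued this cycle under the availability checks."""
    issued = []
    for thread in threads:
        if not thread["ready"]:
            continue
        instr = thread["queue"][0] if thread["queue"] else None
        if instr is None:
            continue
        if instr["needs"] <= free_resources:      # all required resources free?
            free_resources -= instr["needs"]
            issued.append((thread["id"], instr["op"]))
            thread["queue"].pop(0)
    return issued

if __name__ == "__main__":
    threads = [
        {"id": 0, "ready": True,  "queue": [{"op": "fmul", "needs": {"fpu"}}]},
        {"id": 1, "ready": True,  "queue": [{"op": "add",  "needs": {"alu"}}]},
        {"id": 2, "ready": False, "queue": [{"op": "ld",   "needs": {"lsu"}}]},
    ]
    # Only thread 1 issues: no FPU is free and thread 2 is not ready.
    print(issue(threads, {"alu", "lsu"}))
```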
  • Patent number: 9606797
    Abstract: In one embodiment, the present invention includes a processor with a vector execution unit to execute a vector instruction on a vector having a plurality of individual data elements, where the vector instruction is of a first width and the vector execution unit is of a smaller width. The processor further includes a control logic coupled to the vector execution unit to compress a number of execution cycles consumed in execution of the vector instruction when at least some of the individual data elements are not to be operated on by the vector instruction. Other embodiments are described and claimed.
    Type: Grant
    Filed: December 21, 2012
    Date of Patent: March 28, 2017
    Assignee: Intel Corporation
    Inventors: Aniruddha S. Vaidya, Anahita Shayesteh, Dong Hyuk Woo, Saikat Saharoy, Mani Azimi
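    The cycle-compression idea in this abstract is sketched below; the 16-lane vector, 4-lane execution unit, and mask encoding are assumptions chosen for illustration.
```python
# A wide vector instruction is executed on a narrower unit over several
# passes, and passes whose lanes are all masked off are skipped entirely.

VECTOR_LANES = 16      # logical vector width (assumed)
UNIT_LANES = 4         # physical execution unit width (assumed)

def execute_vector(op, src_a, src_b, mask):
    """Apply `op` lane-wise, skipping execution cycles whose lanes are all inactive."""
    result = list(src_a)            # inactive lanes keep their old value
    cycles = 0
    for base in range(0, VECTOR_LANES, UNIT_LANES):
        chunk = range(base, base + UNIT_LANES)
        if not any(mask[i] for i in chunk):
            continue                # compressed away: no cycle consumed
        cycles += 1
        for i in chunk:
            if mask[i]:
                result[i] = op(src_a[i], src_b[i])
    return result, cycles

if __name__ == "__main__":
    a = list(range(16))
    b = [10] * 16
    mask = [1, 1, 0, 0] + [0] * 8 + [0, 1, 1, 1]   # only first and last chunks active
    out, cycles = execute_vector(lambda x, y: x + y, a, b, mask)
    print(out, cycles)   # 2 cycles instead of 4
```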
  • Patent number: 9600289
    Abstract: Methods and processors for managing load-store dependencies in an out-of-order instruction pipeline. A load-store dependency predictor includes a table for storing entries for load-store pairs that have been found to be dependent and execute out of order. Each entry in the table includes hashed values to identify load and store operations. When a load or store operation is detected, the PC and an architectural register number are used to create a hashed value that can be used to uniquely identify the operation. Then, the load-store dependency predictor table is searched for any matching entries with the same hashed value.
    Type: Grant
    Filed: May 30, 2012
    Date of Patent: March 21, 2017
    Assignee: Apple Inc.
    Inventors: Stephan G. Meier, John H. Mylius, Gerard R. Williams, III, Suparn Vats
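    One way to picture the hashed load-store pair table in this abstract is the Python sketch below; the hash function, table size, and FIFO replacement are assumptions of the example, not details from the patent.
```python
# Entries are identified by a hash of an operation's PC and an architectural
# register number; a lookup checks for a matching hashed value.

TABLE_SIZE = 256   # assumed number of predictor entries

def hash_op(pc, arch_reg):
    """Combine PC and architectural register number into a small hashed tag."""
    return ((pc >> 2) ^ (arch_reg * 0x9E3779B1)) & 0xFFFF

class LoadStoreDependencyPredictor:
    def __init__(self):
        self.table = []   # list of (store_hash, load_hash) pairs found dependent

    def train(self, store_pc, store_reg, load_pc, load_reg):
        """Record a load-store pair that was observed to execute out of order."""
        entry = (hash_op(store_pc, store_reg), hash_op(load_pc, load_reg))
        if entry not in self.table:
            self.table.append(entry)
            if len(self.table) > TABLE_SIZE:
                self.table.pop(0)           # simple FIFO replacement (assumed)

    def load_must_wait(self, load_pc, load_reg):
        """Return True if the load matches a recorded dependent pair."""
        h = hash_op(load_pc, load_reg)
        return any(load_hash == h for _, load_hash in self.table)

if __name__ == "__main__":
    predictor = LoadStoreDependencyPredictor()
    predictor.train(store_pc=0x4000, store_reg=3, load_pc=0x4010, load_reg=3)
    print(predictor.load_must_wait(0x4010, 3))   # True: hold the load behind the store
    print(predictor.load_must_wait(0x5000, 7))   # False: no dependency recorded
```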
  • Patent number: 9582276
    Abstract: Methods and processors for enforcing an order of memory access requests in the presence of barriers in an out-of-order processor pipeline. A speculative color is assigned to instruction operations in the front-end of the processor pipeline, while the instruction operations are still in order. The instruction operations are placed in any of multiple reservation stations and then issued out-of-order from the reservation stations. When a barrier is encountered in the front-end, the speculative color is changed, and subsequent instruction operations are assigned the new speculative color. A core interface unit maintains an architectural color, and the architectural color is changed when a barrier retires. The core interface unit stalls instruction operations with a speculative color that does not match the architectural color.
    Type: Grant
    Filed: September 27, 2012
    Date of Patent: February 28, 2017
    Assignee: Apple Inc.
    Inventors: Stephan G. Meier, Gerard R. Williams, III
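    The speculative/architectural coloring in this abstract can be modelled roughly as below; the two-bit color width and the simple wrap-around increment are assumptions of the example.
```python
# The front end stamps each operation with the current speculative color and
# bumps the color at every barrier; the core interface holds an architectural
# color that advances when a barrier retires, and stalls any operation whose
# speculative color does not yet match it.

COLOR_BITS = 2   # assumed color width

class ColorTracker:
    def __init__(self):
        self.speculative_color = 0    # assigned in the in-order front end
        self.architectural_color = 0  # advanced as barriers retire

    def tag(self, op):
        """Stamp an operation with the current speculative color."""
        return {"op": op, "color": self.speculative_color}

    def barrier_dispatched(self):
        self.speculative_color = (self.speculative_color + 1) % (1 << COLOR_BITS)

    def barrier_retired(self):
        self.architectural_color = (self.architectural_color + 1) % (1 << COLOR_BITS)

    def can_issue_to_memory(self, tagged_op):
        """Operations behind an unretired barrier (color mismatch) must stall."""
        return tagged_op["color"] == self.architectural_color

if __name__ == "__main__":
    t = ColorTracker()
    before = t.tag("load A")
    t.barrier_dispatched()
    after = t.tag("load B")
    print(t.can_issue_to_memory(before), t.can_issue_to_memory(after))  # True False
    t.barrier_retired()
    print(t.can_issue_to_memory(after))                                 # True
```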
  • Patent number: 9582284
    Abstract: A method utilizes information provided by performance monitoring hardware to dynamically adjust the number of levels of speculative branch prediction allowed (typically 3 or 4 per thread) for a processor core. The information includes cycles-per-instruction (CPI) for the processor core and the number of memory accesses per unit time. If the CPI is below a CPI threshold and the number of memory accesses (NMA) per unit time is above a prescribed threshold, the number of levels of speculative branch prediction is reduced per thread for the processor core. Likewise, the number of levels of speculative branch prediction can be increased, from a low level back to the maximum allowed, if the CPI threshold is exceeded or the number of memory accesses per unit time is below the prescribed threshold.
    Type: Grant
    Filed: December 1, 2011
    Date of Patent: February 28, 2017
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Robert H. Bell, Jr., Wen-Tzer T. Chen
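    The throttling policy in this abstract lowers the speculation depth when the core is running efficiently but memory-intensively, and otherwise restores the maximum; the thresholds and depth limits in the sketch below are illustrative assumptions.
```python
# Adjust the per-thread speculative branch prediction depth from CPI and
# memory-access-rate measurements.

MAX_LEVELS = 4             # typical maximum levels of speculation per thread
MIN_LEVELS = 1             # assumed lower bound
CPI_THRESHOLD = 1.5        # assumed
NMA_THRESHOLD = 1_000_000  # memory accesses per unit time, assumed

def adjust_speculation_levels(current_levels, cpi, memory_accesses):
    """Return the new per-thread speculative branch prediction depth."""
    if cpi < CPI_THRESHOLD and memory_accesses > NMA_THRESHOLD:
        return max(MIN_LEVELS, current_levels - 1)   # memory-bound enough: reduce
    return MAX_LEVELS                                # otherwise allow maximum depth

if __name__ == "__main__":
    levels = MAX_LEVELS
    for cpi, nma in [(1.2, 2_000_000), (1.1, 3_000_000), (2.0, 500_000)]:
        levels = adjust_speculation_levels(levels, cpi, nma)
        print(levels)   # 3, 2, 4
```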
  • Patent number: 9582282
    Abstract: A data processing apparatus has prefetch circuitry for prefetching cache lines of instructions into an instruction cache. A prefetch lookup table is provided for storing prefetch entries, with each entry corresponding to a region of a memory address space and identifying at least one block of one or more cache lines within the corresponding region from which processing circuitry accessed an instruction on a previous occasion. When the processing circuitry executes an instruction from a new region, the prefetch circuitry looks up the table, and if the table stores a prefetch entry for the new region, the at least one block identified by that entry is prefetched into the cache.
    Type: Grant
    Filed: July 17, 2014
    Date of Patent: February 28, 2017
    Assignee: ARM Limited
    Inventors: Mitchell Bryan Hayenga, Christopher Daniel Emmons
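    The region-based prefetch table in this abstract can be sketched as below; the 4 KB region size, 64-byte block size, and bitmap representation are assumptions of the example.
```python
# Each table entry covers a region of the address space and remembers, as a
# bitmap, which cache-line blocks were touched the last time the region was
# executed; on re-entry those blocks are prefetched.

REGION_SIZE = 4096   # bytes per region (assumed)
BLOCK_SIZE = 64      # bytes per cache-line block (assumed)
BLOCKS_PER_REGION = REGION_SIZE // BLOCK_SIZE

class InstructionPrefetcher:
    def __init__(self):
        self.table = {}          # region base -> bitmap of accessed blocks
        self.current_region = None

    def on_instruction_fetch(self, address, prefetch):
        region = address - (address % REGION_SIZE)
        block = (address % REGION_SIZE) // BLOCK_SIZE
        if region != self.current_region:
            self.current_region = region
            # New region: prefetch every block recorded on the previous visit.
            for b in range(BLOCKS_PER_REGION):
                if self.table.get(region, 0) & (1 << b):
                    prefetch(region + b * BLOCK_SIZE)
        # Record this access for the next time the region is entered.
        self.table[region] = self.table.get(region, 0) | (1 << block)

if __name__ == "__main__":
    pf = InstructionPrefetcher()
    issued = []
    pf.on_instruction_fetch(0x1000, issued.append)   # first visit: nothing known
    pf.on_instruction_fetch(0x1080, issued.append)
    pf.on_instruction_fetch(0x2000, issued.append)   # leave the region
    pf.on_instruction_fetch(0x1004, issued.append)   # re-enter: prefetch recorded blocks
    print([hex(a) for a in issued])                  # ['0x1000', '0x1080']
```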
  • Patent number: 9569272
    Abstract: A method and device for digital data processing based on a data flow processing model are suitable for the execution, in a distributed manner on multiple calculation nodes, of multiple data processing operations modelled by directed graphs, where two different processing operations include at least one common calculation node. The device includes an identification processor configured to identify a coordination module for each chunk, starting from a valued directed multi-graph that is made up of the union of several distinct processing graphs, is divided into several valued directed sub-multi-graphs called chunks, and whose input and output nodes are buffer memory nodes of the multi-graph. Furthermore, each identified coordination module is configured to synchronize portions of processing operations that are to be executed in the chunk with which the respective coordination module is associated, independently of portions of processing operations that are to be executed in other chunks.
    Type: Grant
    Filed: July 9, 2010
    Date of Patent: February 14, 2017
    Assignee: Commissariat à l'énergie atomique et aux énergies alternatives
    Inventor: Yvain Thonnart
  • Patent number: 9552206
    Abstract: Traditionally, providing parallel processing within a multi-core system has been very difficult. Here, however, a system is provided where serial source code is automatically converted into parallel source code, and a processing cluster is reconfigured “on the fly” to accommodate the parallelized code based on an allocation of memory and compute resources. Thus, the processing cluster and its corresponding system programming tool provide a system that can perform parallel processing from a serial program that is transparent to a user. Generally, a control node connected to the address and data leads of a host processor uses messages to control the processing of data in a processing cluster. The cluster includes nodes of parallel processors, shared function memory, a global load/store, and hardware accelerators all connected to the control node by message busses. A crossbar data interconnect routes data to the cluster circuits separate from the message busses.
    Type: Grant
    Filed: September 14, 2011
    Date of Patent: January 24, 2017
    Assignee: Texas Instruments Incorporated
    Inventors: William M. Johnson, Murali S. Chinnakonda, Jeffrey L. Nye, Toshio Nagata, John W. Glotzbach, Hamid R. Sheikh, Ajay Jayaraj, Stephen Busch, Shalini Gupta, Robert J.P. Nychka, David H. Bartley, Ganesh Sundararajan
  • Patent number: 9542191
    Abstract: A hardware profiling mechanism implemented by performance monitoring hardware enables page level automatic binary translation. The hardware during runtime identifies a code page in memory containing potentially optimizable instructions. The hardware requests allocation of a new page in memory associated with the code page, where the new page contains a collection of counters and each of the counters corresponds to one of the instructions in the code page. When the hardware detects a branch instruction having a branch target within the code page, it increments one of the counters that has the same position in the new page as the branch target in the code page. The execution of the code page is repeated and the counters are incremented when branch targets fall within the code page. The hardware then provides the counter values in the new page to a binary translator for binary translation.
    Type: Grant
    Filed: March 30, 2012
    Date of Patent: January 10, 2017
    Assignee: Intel Corporation
    Inventors: Paul Caprioli, Matthew C. Merten, Muawya M. Al-Otoom, Omar M. Shaikh, Abhay S. Kanhere, Suresh Srinivas, Koichi Yamada, Vivek Thakkar, Pawel Osciak
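    The counter-page profiling in this abstract can be pictured with the sketch below; the 4 KB page size, one counter per byte offset, and the hot-target threshold are assumptions, and the binary translation step itself is not modelled.
```python
# A shadow page of counters parallels a code page: every time a branch lands
# on a target inside the code page, the counter at the matching offset in the
# shadow page is incremented, and the counts are handed to a binary translator.

PAGE_SIZE = 4096   # assumed code page size in bytes

class CodePageProfiler:
    def __init__(self, code_page_base):
        self.code_page_base = code_page_base
        self.counters = [0] * PAGE_SIZE      # the "new page" of counters

    def on_branch(self, target_address):
        """Count branch targets that fall inside the profiled code page."""
        offset = target_address - self.code_page_base
        if 0 <= offset < PAGE_SIZE:
            self.counters[offset] += 1

    def hot_targets(self, threshold):
        """Offsets the binary translator might treat as hot entry points."""
        return [off for off, n in enumerate(self.counters) if n >= threshold]

if __name__ == "__main__":
    profiler = CodePageProfiler(code_page_base=0x40000)
    for _ in range(3):
        profiler.on_branch(0x40010)          # loop head inside the page
    profiler.on_branch(0x7FFF0)              # target outside the page, ignored
    print(profiler.hot_targets(threshold=2)) # [16]
```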
  • Patent number: 9529596
    Abstract: In accordance with embodiments disclosed herein, there are provided methods, systems, and apparatuses for scheduling instructions in a multi-strand out-of-order processor. For example, an apparatus for scheduling instructions in a multi-strand out-of-order processor includes an out-of-order instruction fetch unit to retrieve a plurality of interdependent instructions for execution from a multi-strand representation of a sequential program listing; an instruction scheduling unit to schedule the execution of the plurality of interdependent instructions based at least in part on operand synchronization bits encoded within each of the plurality of interdependent instructions; and a plurality of execution units to execute at least a subset of the plurality of interdependent instructions in parallel.
    Type: Grant
    Filed: July 1, 2011
    Date of Patent: December 27, 2016
    Assignee: Intel Corporation
    Inventors: Boris A. Babayan, Vladimir M. Pentkovski, Alexander V. Butuzov, Sergey Y. Shishlov, Alexey Y. Sivtsov, Nikolay E. Kosarev
  • Patent number: 9514094
    Abstract: There is provided a method for processing multiple sets of data concurrently in a statically scheduled pipelined stream processor by allowing a data set to enter the pipeline while another data set is being processed. Dedicated logic units enable independent control of each of the data sets being processed.
    Type: Grant
    Filed: July 10, 2012
    Date of Patent: December 6, 2016
    Assignee: MAXELER TECHNOLOGIES LTD
    Inventors: Oliver Pell, Itay Greenspon, James Barry Spooner, Robert Gwilym Dimond, Jacob Bower, Richard Berry
  • Patent number: 9495157
    Abstract: Embodiments relate to fingerprint-based branch prediction. An aspect includes, based on encountering a branch instruction during execution of software on a processor of a computer system, determining a fingerprint of the software, the fingerprint comprising a representation of a sequence of behavior that occurs in the processor while the software is executing. Another aspect includes, based on determining that a match for the fingerprint and the branch instruction is located in an entry in the prediction table, predicting the branch instruction according to the associated prediction field. Another aspect includes, based on determining that no match for the fingerprint and the branch instruction is located in an entry in the prediction table, creating a new entry in the prediction table for the fingerprint and the branch instruction.
    Type: Grant
    Filed: December 7, 2015
    Date of Patent: November 15, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan D. Bradbury, Giles R. Frazier, Michael Karl Gschwind, Christian Jacobi, Anthony Saporito, Chung-Lung K. Shum
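    The prediction-table lookup in this abstract is sketched below; using a hash of recent branch outcomes as the fingerprint, the 16-bit history width, and the default-taken prediction are assumptions of the example.
```python
# A prediction table keyed by (fingerprint, branch address): a hit returns the
# stored prediction, a miss allocates a new entry.

class FingerprintBranchPredictor:
    def __init__(self):
        self.table = {}          # (fingerprint, branch_pc) -> predicted taken?
        self.history = 0         # recent branch outcomes, newest in bit 0

    def fingerprint(self):
        # One possible "sequence of behavior": recent outcomes (an assumption).
        return self.history & 0xFFFF

    def predict(self, branch_pc, default=True):
        key = (self.fingerprint(), branch_pc)
        if key in self.table:
            return self.table[key]           # match: use the stored prediction
        self.table[key] = default            # no match: create a new entry
        return default

    def update(self, branch_pc, taken):
        """Record the actual outcome for this fingerprint and fold it into history."""
        self.table[(self.fingerprint(), branch_pc)] = taken
        self.history = ((self.history << 1) | int(taken)) & 0xFFFF

if __name__ == "__main__":
    p = FingerprintBranchPredictor()
    print(p.predict(0x400))   # True (new entry, default prediction)
    p.update(0x400, taken=False)
    print(p.predict(0x400))   # False: same fingerprint now predicts not-taken
```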
  • Patent number: 9483243
    Abstract: A vector data access unit includes data access ordering circuitry for issuing data access requests indicated by elements of an earlier and a later vector instruction, one of which is a write instruction. An element indicating the next data access for each of the instructions is determined. The next data accesses for the earlier and the later instructions may be reordered. The next data access of the earlier instruction is selected if the position of the earlier instruction's next data element is less than or equal to the position of the later instruction's next data element minus a predetermined value. The next data access of the later instruction may be selected if the position of the earlier instruction's next data element is higher than the position of the later instruction's next data element minus a predetermined value. Thus, data accesses from the earlier and later instructions are partially interleaved.
    Type: Grant
    Filed: March 23, 2015
    Date of Patent: November 1, 2016
    Assignee: ARM Limited
    Inventor: Alastair David Reid
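    The position-comparison rule in this abstract can be exercised with the greedy merge below; the margin value, the access labels, and the simple two-stream merge loop are assumptions of the example.
```python
# Merge the element accesses of an earlier and a later vector instruction:
# the earlier instruction's next access is chosen while its element position
# is at most the later instruction's position minus a predetermined value,
# otherwise the later instruction's next access may proceed.

MARGIN = 2   # the "predetermined value" separating the two access streams (assumed)

def interleave(earlier_accesses, later_accesses, margin=MARGIN):
    """Return a merged access order obeying the position comparison rule."""
    order = []
    e, l = 0, 0   # next element position for the earlier / later instruction
    while e < len(earlier_accesses) or l < len(later_accesses):
        if e < len(earlier_accesses) and (l >= len(later_accesses) or e <= l - margin):
            order.append(("earlier", earlier_accesses[e]))
            e += 1
        elif l < len(later_accesses):
            order.append(("later", later_accesses[l]))
            l += 1
    return order

if __name__ == "__main__":
    earlier = [f"st[{i}]" for i in range(4)]   # earlier vector store
    later = [f"ld[{i}]" for i in range(4)]     # later vector load
    # Prints a merged order in which the two access streams are partially interleaved.
    for who, access in interleave(earlier, later):
        print(who, access)
```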
  • Patent number: 9430419
    Abstract: A data processing apparatus is provided with a plurality of processing units executing respective streams of program instructions corresponding to respective processing threads. Exception control circuitry controls exception processing for a group of the processing units in response to an exception triggering event. Each of the processing units moves only once, and in sequence, through the normal, in-exception, and done-exception states in response to a given exception event. The group of processing units moves in sequence through the normal, triggering, and completing states in response to the exception event. A counter value is used to track the number of processing units which have entered exception processing and then to track the number of processing units which have completed their exception processing.
    Type: Grant
    Filed: October 13, 2011
    Date of Patent: August 30, 2016
    Assignee: ARM Limited
    Inventors: Simon Jones, Joe Dominic Michael Tapply
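    The per-unit exception sequencing and counter in this abstract can be modelled as below; the state names follow the abstract, while the group size and the increment/decrement use of the single counter are assumptions about how the tracking might be realised.
```python
# Each processing unit moves once through normal -> in-exception ->
# done-exception for a given exception event, while a shared counter first
# tracks how many units have entered exception processing and then how many
# remain before the group has completed.

NORMAL, IN_EXCEPTION, DONE_EXCEPTION = "normal", "in-exception", "done-exception"

class ExceptionGroup:
    """Group of processing units sharing one exception-tracking counter."""
    def __init__(self, num_units):
        self.states = [NORMAL] * num_units
        self.counter = 0          # first counts units entering, then units remaining

    def trigger(self):
        """Exception triggering event: each unit enters exception processing once."""
        for i, state in enumerate(self.states):
            if state == NORMAL:
                self.states[i] = IN_EXCEPTION
                self.counter += 1      # track how many units have entered

    def unit_done(self, i):
        """A unit completes its exception handling and moves to done-exception."""
        if self.states[i] == IN_EXCEPTION:
            self.states[i] = DONE_EXCEPTION
            self.counter -= 1          # now tracks how many units are still busy

    def group_complete(self):
        return all(s == DONE_EXCEPTION for s in self.states) and self.counter == 0

if __name__ == "__main__":
    group = ExceptionGroup(num_units=4)
    group.trigger()
    for unit in range(4):
        group.unit_done(unit)
    print(group.states, group.group_complete())
```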