Processing Control Patents (Class 712/220)
  • Publication number: 20130297915
    Abstract: In one embodiment, a processor includes an instruction decoder to receive and decode an instruction having a prefix and an opcode, an execution unit to execute the instruction based on the opcode, and flag modification override logic to prevent the execution unit from modifying a flag register of the processor based on the prefix of the instruction.
    Type: Application
    Filed: November 14, 2011
    Publication date: November 7, 2013
    Inventors: Jonathan D. Combs, Jason W. Brandt, Robert Valentine
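    The flag override described in this abstract can be pictured with a minimal sketch. It is not taken from the application: the 32-bit ADD, the `ZF`/`CF` flag names, and the `suppress_flags` argument (standing in for the instruction prefix) are all assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF


def execute_add(regs, flags, dst, src, suppress_flags=False):
    """Hypothetical 32-bit ADD; suppress_flags models the override prefix."""
    wide = regs[dst] + regs[src]
    result = wide & MASK32
    if not suppress_flags:
        # Normal behavior: the execution unit updates the flag register.
        flags["ZF"] = result == 0
        flags["CF"] = wide > MASK32
    # With the prefix present, the flags dictionary is left untouched.
    regs[dst] = result
```

    Calling `execute_add` with `suppress_flags=True` still writes the destination register but leaves `flags` exactly as it was before the call.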
  • Publication number: 20130297914
    Abstract: A method, system and computer program product are disclosed for maintaining data coherence, for use in a multi-node processing system where each of the nodes includes one or more components. In one embodiment, the method comprises establishing a data domain, assigning a group of the components to the data domain, sending a coherence message from a first component of the processing system to a second component of the processing system, and determining if that second component is assigned to the data domain. In this embodiment, if that second component is assigned to the data domain, the coherence message is transferred to all of the components assigned to the data domain to maintain data coherency among those components. In an embodiment, if that second component is assigned to the data domain, the first component is assigned to the data domain.
    Type: Application
    Filed: July 8, 2013
    Publication date: November 7, 2013
    Inventors: Kattamuri Ekanadham, Il Park, Pratap Pattnaik
  • Publication number: 20130290683
    Abstract: Eliminating redundant masking operations in instruction processing circuits and related processor systems, methods, and computer-readable media are disclosed. In one embodiment, a first instruction in an instruction stream indicating an operation writing a value to a first register is detected by an instruction processing circuit, the value having a value size less than a size of the first register. The circuit also detects a second instruction in the instruction stream indicating a masking operation on the first register. The masking operation is eliminated upon a determination that the masking operation indicates a read operation and a write operation on the first register and has an identity mask size equal to or greater than the value size. In this manner, the elimination of the masking operation avoids potential read-after-write hazards and improves performance of a CPU by removing redundant operations from an execution pipeline.
    Type: Application
    Filed: October 19, 2012
    Publication date: October 31, 2013
    Applicant: QUALCOMM INCORPORATED
    Inventors: Melinda J. Brown, Michael William Morrow, James Norris Dieffenderfer, Brian Michael Stempel, Michael Scott McIlvaine
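    The elimination rule in this abstract can be sketched as a small software peephole pass. This is only an illustration under stated assumptions: the application describes a hardware circuit, and the tuple instruction format, the `load` and `and_imm` opcode names, and both function names below are hypothetical.

```python
def identity_mask_width(mask):
    """Return n if mask == 2**n - 1 (n low bits set), else None."""
    if mask > 0 and mask & (mask + 1) == 0:
        return mask.bit_length()
    return None


def eliminate_redundant_masks(instrs):
    """Drop AND-with-identity-mask instructions that cannot change a register.

    Assumed instruction shapes (hypothetical):
      ("load", dst, bits)            zero-extending write of a `bits`-wide value
      ("and_imm", dst, src, mask)    dst = src & mask
      (other_op, dst, ...)           any other write to dst
    """
    known_bits = {}  # register -> width of the narrow value it currently holds
    kept = []
    for ins in instrs:
        op = ins[0]
        if op == "load":
            _, dst, bits = ins
            known_bits[dst] = bits
        elif op == "and_imm":
            _, dst, src, mask = ins
            width = identity_mask_width(mask)
            if (dst == src and src in known_bits
                    and width is not None and width >= known_bits[src]):
                continue  # mask reads and writes the same register and is a no-op
            known_bits.pop(dst, None)  # conservatively forget what dst holds
        else:
            known_bits.pop(ins[1], None)
        kept.append(ins)
    return kept
```

    For example, a `("load", "r0", 8)` followed by `("and_imm", "r0", "r0", 0xFF)` keeps only the load, while a mask of `0x0F` on the same register is retained because it is narrower than the loaded value.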
  • Patent number: 8572624
    Abstract: A system, method and computer program product for providing multiple quiesce state machines. The system includes a first controller including logic for processing a first quiesce request. The system also includes a second controller including logic for processing a second quiesce request. All or a portion of the processing of the second quiesce request overlaps in time with the processing of the first quiesce request. Thus, multiple quiesce requests may be active in the system at the same time.
    Type: Grant
    Filed: February 26, 2008
    Date of Patent: October 29, 2013
    Assignee: International Business Machines Corporation
    Inventors: Lisa C. Heller, Norbert Hagspiel, Ute Gaertner, Hanno Ulrich, Rebecca S. Wisniewski
  • Patent number: 8572588
    Abstract: One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on a multi-core graphics processing unit (GPU) so that they can be executed by a general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.
    Type: Grant
    Filed: March 31, 2009
    Date of Patent: October 29, 2013
    Assignee: Nvidia Corporation
    Inventors: Vinod Grover, Bastiaan Joannes Matheus Aarts, Michael Murphy
  • Patent number: 8572355
    Abstract: One embodiment of the present invention sets forth a method for executing a non-local return instruction in a parallel thread processor. The method comprises the steps of receiving, within the thread group, a first long jump instruction and, in response, popping a first token from the execution stack. The method also comprises determining whether the first token is a first long jump token that was pushed onto the execution stack when a first push instruction associated with the first long jump instruction was executed, and when the first token is the first long jump token, jumping to the second instruction based on the address specified by the first long jump token, or, when the first token is not the first long jump token, disabling the active thread until the first long jump token is popped from the execution stack.
    Type: Grant
    Filed: September 13, 2010
    Date of Patent: October 29, 2013
    Assignee: Nvidia Corporation
    Inventors: Guillermo Juan Rozas, Brett W. Coon
  • Publication number: 20130283015
    Abstract: Circuits, methods, and apparatus that provide parallel execution relationships to be included in a function call or other appropriate portion of a command or instruction in a sequential programming language. One example provides a token-based method of expressing parallel execution relationships. Each process that can be executed in parallel is given a separate token. Later processes that depend on earlier processes wait to receive the appropriate token before being executed. In another example, counters are used in place of tokens to determine when a process is completed. Each function is a number of individual functions or threads, where each thread performs the same operation on a different piece of data. A counter is used to track the number of threads that have been executed. When each thread in the function has been executed, a later function that relies on data generated by the earlier function may be executed.
    Type: Application
    Filed: January 7, 2013
    Publication date: October 24, 2013
    Inventors: Ian A. Buck, Bastiaan Aarts
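    The counter-based variant in this abstract can be approximated in software as follows. This is a minimal sketch, not the method claimed in the application: the `CompletionCounter` class, the `run_dependent` function, and the use of Python threads are assumptions chosen for illustration.

```python
import threading


class CompletionCounter:
    """Tracks how many threads of one function are still outstanding."""

    def __init__(self, total):
        self._remaining = total
        self._cond = threading.Condition()

    def thread_done(self):
        with self._cond:
            self._remaining -= 1
            if self._remaining == 0:
                self._cond.notify_all()

    def wait(self):
        # Blocks until every thread of the earlier function has reported in.
        with self._cond:
            self._cond.wait_for(lambda: self._remaining == 0)


def run_dependent(data):
    """Earlier function: square each element in its own thread.
    Later function: sum the squares once the counter reaches zero."""
    squared = [None] * len(data)
    counter = CompletionCounter(len(data))

    def square(i):
        squared[i] = data[i] * data[i]
        counter.thread_done()

    workers = [threading.Thread(target=square, args=(i,)) for i in range(len(data))]
    for w in workers:
        w.start()
    counter.wait()        # the dependent step starts only after all threads finish
    total = sum(squared)
    for w in workers:
        w.join()
    return total
```

    Each thread performs the same operation on a different element, and the dependent summation does not begin until the counter shows that all of them have completed.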
  • Patent number: 8565811
    Abstract: A radio control board passes a plurality of digital samples between a memory of a computing device and a radio frequency (RF) transceiver coupled to a system bus of the computing device. Processing of the digital samples is carried out by one or more cores of a multi-core processor to implement a software-defined radio.
    Type: Grant
    Filed: August 4, 2009
    Date of Patent: October 22, 2013
    Assignee: Microsoft Corporation
    Inventors: Kun Tan, Jiansong Zhang, Yongguang Zhang
  • Patent number: 8566567
    Abstract: Method, apparatus, and system for monitoring performance within a processing resource, which may be used to modify user-level software. Some embodiments of the invention pertain to an architecture to allow a user to improve software running on a processing resource on a per-thread basis in real-time and without incurring significant processing overhead.
    Type: Grant
    Filed: June 21, 2012
    Date of Patent: October 22, 2013
    Assignee: Intel Corporation
    Inventors: Chris J. Newburn, Robert Knight, Robert Geva, Dion Rodgers, Xiang Zou, Hong Wang, Bryant E. Bigbee, Ittai Anati
  • Publication number: 20130275725
    Abstract: An integrated circuit device comprising at least one digital signal processor (DSP) module, the at least one DSP module comprising a first data register and at least one further data register and at least one data execution unit (DEU) module arranged to execute operations on target data stored within the first data register and the at least one further data register. The at least one DEU module is arranged, upon receipt of a conditional negation instruction, to retrieve at least one conditional bit value from the first data register, and conditionally perform negation of target data within the at least one further data register according to the at least one retrieved conditional bit value.
    Type: Application
    Filed: January 3, 2011
    Publication date: October 17, 2013
    Applicant: Freescale Semiconductor, Inc.
    Inventors: Ilia Moskovich, Fabrice Aidan, Avi Gal, Dmitry Lachover
  • Patent number: 8560814
    Abstract: Systems and methods for efficient execution of operations in a multi-threaded processor. Each thread may include a blocking instruction. A blocking instruction blocks other threads from utilizing hardware resources for an appreciable amount of time. One example of a blocking type instruction is a Montgomery multiplication cryptographic instruction. Each thread can operate in a thread-based mode that allows the insertion of stall cycles during the execution of blocking instructions, during which other threads may utilize the previously blocked hardware resources. At times when multiple threads are scheduled to execute blocking instructions, the thread-based mode may be changed to increase throughput for these multiple threads. For example, the mode may be changed to disallow the insertion of stall cycles. Therefore, the time for sequential operation of the blocking instructions corresponding to the multiple threads may be reduced.
    Type: Grant
    Filed: May 4, 2010
    Date of Patent: October 15, 2013
    Assignee: Oracle International Corporation
    Inventors: Robert T. Golla, Christopher H. Olson, Gregory F. Grohoski
  • Patent number: 8555251
    Abstract: A signal processing apparatus for performing signal processing including a plurality of steps in data units by software signal processing includes signal processing modules performing the steps, a circuit configuration information storing and managing unit storing the signal processing modules and circuit configuration information, a signal processing order determining unit determining a signal processing order by performing path routing, a signal processing executing unit executing the signal processing in the determined order, and a circuit configuration changing unit changing circuit configuration information and causing the signal processing order determining unit to re-execute path routing to determine a signal processing order for the changed circuit configuration information during a period from the end of the software signal processing in the data unit to the beginning of the subsequent data unit.
    Type: Grant
    Filed: March 21, 2006
    Date of Patent: October 8, 2013
    Assignee: Sony Corporation
    Inventor: Kosei Yamashita
  • Patent number: 8555036
    Abstract: A system includes a processor having an instruction register for storing an instruction having a predefined opcode, a predicate register for storing a predicate condition to select an output register for a result of the instruction, a first output register, and a second output register. The processor further includes processor circuitry operable to execute the instruction to produce a result, and processor circuitry operable to store the result of the instruction in the first output register if the predicate condition to select the output is true, and to store the result in the second output register if the predicate condition to select the output is false. A single instruction is used to produce the result, and to store the result of the instruction.
    Type: Grant
    Filed: May 17, 2010
    Date of Patent: October 8, 2013
    Assignee: NVIDIA Corporation
    Inventors: Timo Aila, Samuli Laine
  • Patent number: 8554856
    Abstract: A computer system includes one or more devices that are capable of multitasking (performing at least two tasks in parallel or substantially in parallel). In response to detecting that one of the devices is performing a first one of the tasks, the system prevents the devices from performing at least one of the tasks other than the first task (such as all of the tasks other than the first task). In response to detecting that one of the devices is performing a second one of the tasks, the system prevents the devices from performing at least one of the tasks other than the second task (such as all of the tasks other than the second task).
    Type: Grant
    Filed: November 8, 2011
    Date of Patent: October 8, 2013
    Assignee: Yagi Corp.
    Inventor: Robert Plotkin
  • Publication number: 20130262834
    Abstract: A system and method is provided for improving efficiency, power, and bandwidth consumption in parallel processing. Rather than requiring memory polling to ensure ordered execution of processes or threads, the techniques disclosed herein provide a system and method to allow any process or thread to run out of order as long as needed, but ensure ordered execution of multiple ordered instructions when needed. These operations are handled efficiently in hardware, but are flexible enough to be implemented in all manner of programming models.
    Type: Application
    Filed: March 29, 2012
    Publication date: October 3, 2013
    Inventors: Laurent Lefebvre, Michael Mantor
  • Publication number: 20130262835
    Abstract: An information processing apparatus generates first and second operation trees representing a dependency relationship among the instructions included in a first code, and computes first and second operation sequences from the first and second operation trees. Then, the information processing apparatus computes the longest ones of operation subsequences common to the first and second operation sequences, evaluates, for each longest operation subsequence, the utilization of computing resources used for executing the combinations of instructions of the first and second operation trees corresponding to the operations included in the longest operation subsequence, and selects a combination pattern of instructions indicated by any one of the longest operation subsequences on the basis of the evaluation results.
    Type: Application
    Filed: March 12, 2013
    Publication date: October 3, 2013
    Applicant: FUJITSU LIMITED
    Inventors: Takashi ARAKAWA, Shuichi Chiba, Toshihiro Konda
  • Publication number: 20130262832
    Abstract: A method, computer program product, and system are provided for scheduling a plurality of instructions in a computing system. For example, the method can generate a plurality of instruction lineages, in which the plurality of instruction lineages is assigned to one or more registers. Each of the plurality of instruction lineages has at least one node representative of an instruction from the plurality of instructions. The method can also determine a node order based on respective priority values associated with each of the nodes. Further, the method can include scheduling the plurality of instructions based on the node order and the one or more registers assigned to the plurality of instruction lineages.
    Type: Application
    Filed: March 30, 2012
    Publication date: October 3, 2013
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Gang CHEN, Srinivasa B. Yadavalli
  • Publication number: 20130262819
    Abstract: An apparatus includes a processor to determine an extremum among a series of values that are successively provided to a first register and a second register. The processor is configured to execute a single cycle search instruction, including compare a value in the first register with a value in a first accumulator, and store an extremum of the two values in the first accumulator; and compare a value in the second register with a value in a second accumulator, and store an extremum of the two values in the second accumulator. The processor is configured to execute a single cycle select instruction, including compare the value in the first accumulator with the value in the second accumulator, and store an extremum of the two values in the first accumulator, the extremum stored in the first accumulator representing the extremum of the series of numbers.
    Type: Application
    Filed: April 2, 2012
    Publication date: October 3, 2013
    Inventors: Srinivasan Iyer, Carsten Aagaard Pedersen
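    The two-accumulator search and the final select step in this abstract can be sketched in software. This is an illustration only: the function name `find_extremum`, the `pick` parameter, and the choice to seed both accumulators with the first value are assumptions, and the single-cycle hardware behavior is not modeled.

```python
def find_extremum(values, pick=max):
    """Two-lane extremum search; pick is max or min."""
    if not values:
        raise ValueError("values must be non-empty")
    # Seed both accumulators with a real element so no sentinel is needed.
    acc0 = acc1 = values[0]
    # "Search" step: each iteration feeds one value to each register.
    for i in range(0, len(values), 2):
        reg0 = values[i]
        # For an odd-length series, reuse the last value in the second lane.
        reg1 = values[i + 1] if i + 1 < len(values) else values[i]
        acc0 = pick(acc0, reg0)
        acc1 = pick(acc1, reg1)
    # "Select" step: reduce the two accumulators to the overall extremum.
    return pick(acc0, acc1)
```

    Passing `pick=min` turns the same routine into a minimum search, since the abstract speaks of an extremum rather than specifically a maximum.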
  • Patent number: 8549258
    Abstract: A configurable processing apparatus includes a plurality of processing units, at least an instruction synchronization control circuit, and at least a configuration memory. Each processing unit has a stall-output signal generating circuit to output a stall-output signal, wherein the stall-output signal indicates that an unexpected stall has occurred in the processing unit. The processing unit has a stall-in signal, and an external circuit of the processing unit can control whether the processing unit is stalled according to the stall-in signal. The instruction synchronization control circuit generates the stall-in signals to the processing units in response to a content stored in the configuration memory and the stall-output signals of the processing units, so as to determine operation modes and instruction synchronization of the processing units.
    Type: Grant
    Filed: February 7, 2010
    Date of Patent: October 1, 2013
    Assignee: Industrial Technology Research Institute
    Inventors: Tzu-Fang Lee, Chien-Hong Lin, Jing-Shan Liang, Chi-Lung Wang
  • Patent number: 8549185
    Abstract: A computer program product is provided for performing an input/output (I/O) processing operation at a host computer system. The computer program product is configured to perform: obtaining a transport command word (TCW) at a channel subsystem for an I/O operation, the TCW including an address of a transport command control block (TCCB) having a transport command area (TCA) configured to hold a first plurality of device command words (DCW) and control data associated with respective DCWs, the first plurality of DCWs including a transfer TCA extension (TTE) DCW that specifies a TCA extension, the TCA extension configured to hold one or more DCWs and control data associated with respective DCWs; gathering the TCCB from one or more locations specified in the TCCB address and transferring the TCCB to the control unit; gathering the TCA extension specified by the TTE DCW; and transferring the TCA extension to the control unit.
    Type: Grant
    Filed: June 30, 2011
    Date of Patent: October 1, 2013
    Assignee: International Business Machines Corporation
    Inventors: Susan K. Candelaria, Scott M. Carlson, Daniel F. Casper, John R. Flanagan, Roger G. Hathorn, Matthew J. Kalos, Louis W. Ricci, Dale F. Riedy, Cynthia Sittmann
  • Publication number: 20130254517
    Abstract: An apparatus for processing an invalid operation in a prologue and/or an epilogue of a loop includes a register file including a first region for storing a data validity value indicating whether data is valid or invalid, and a second region for storing the data; and a functional unit configured to determine whether an operation is valid or invalid based on a value of a first region of each of one or more input sources received from the register file, and output a destination including a value based on the value of the first region of each of the input sources.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 26, 2013
    Applicants: Seoul National University R&DB Foundation, Samsung Electronics Co., Ltd.
    Inventors: Seong-Hun Jeong, Bernhard Egger, Won-Sub Kim
  • Patent number: 8544031
    Abstract: A system, method and computer program product for managing a plurality of applications in a computer cluster. Each application is able to run on a particular node in the cluster. In one embodiment, associations are maintained among a plurality of modes and the plurality of applications, with each application being associated with at least one mode. Responsive to designation of at least one mode as active for the cluster, each application that is associated with an active mode is flagged as eligible for activation, each inactive application that is not associated with any active mode is flagged as ineligible for activation, and each active application that is not associated with any active mode is flagged as ineligible for activation and inactivated. Flagging as eligible, flagging as ineligible and flagging as ineligible and inactivating may be performed in any order, and inactivating is sequenced according to dependencies among the applications.
    Type: Grant
    Filed: February 22, 2012
    Date of Patent: September 24, 2013
    Assignee: International Business Machines Corporation
    Inventor: Michael P. Clarke
  • Publication number: 20130246755
    Abstract: Embodiments of the invention relate to run-time instrumentation reporting. An instruction stream is executed by a processor. Run-time instrumentation information of the executing instruction stream is captured by the processor. Run-time instrumentation records are created based on the captured run-time instrumentation information. A run-time instrumentation sample point of the executing instruction stream on the processor is detected. A reporting group is stored in a run-time instrumentation program buffer. The storing is based on the detecting and the storing includes: determining a current address of the run-time instrumentation program buffer, the determining based on instruction accessible run-time instrumentation controls; and storing the reporting group into the run-time instrumentation program buffer based on an origin address and the current address of the run-time instrumentation program buffer, the reporting group including the created run-time instrumentation records.
    Type: Application
    Filed: March 16, 2012
    Publication date: September 19, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Mark S. Farrell, Charles W. Gainey, JR., Marcel Mitran, Chung-Lung K. Shum, Brian L. Smith
  • Publication number: 20130246754
    Abstract: Embodiments of the invention relate to implementing run-time instrumentation indirect sampling by address. An aspect of the invention includes reading sample-point addresses from a sample-point address array, and comparing, by a processor, the sample-point addresses to an address associated with an instruction from an instruction stream executing on the processor. A sample point is recognized upon execution of the instruction associated with the address matching one of the sample-point addresses. Run-time instrumentation information is obtained from the sample point. The run-time instrumentation information is stored in a run-time instrumentation program buffer as a reporting group.
    Type: Application
    Filed: March 16, 2012
    Publication date: September 19, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan D. Bradbury, Charles W. Gainey, JR., Michael K. Gschwind
  • Publication number: 20130246751
    Abstract: Processing of character data is facilitated. A Find Element Not Equal instruction is provided that compares data of multiple vectors for inequality and provides an indication of inequality, if inequality exists. An index associated with the unequal element is stored in a target vector register. Further, the same instruction, the Find Element Not Equal instruction, also searches a selected vector for null elements, also referred to as zero elements. A result of the instruction is dependent on whether the null search is provided, or just the compare.
    Type: Application
    Filed: March 15, 2012
    Publication date: September 19, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan D. Bradbury, Michael K. Gschwind, Eric M. Schwarz, Timothy J. Slegel
  • Publication number: 20130246756
    Abstract: Disclosed is a hardware protocol stack in which header information of the protocol to be analyzed is stored in a register unit, the information recorded in the header of an input frame is compared with the header information stored in the register unit to determine whether they match, and data is extracted based on the result of the comparison.
    Type: Application
    Filed: March 11, 2013
    Publication date: September 19, 2013
    Applicant: LSIS CO., LTD.
    Inventors: Soo Gang LEE, Dae Hyun KWON
  • Publication number: 20130246752
    Abstract: Processing of character data is facilitated. A Find Element Equal instruction is provided that compares data of multiple vectors for equality and provides an indication of equality, if equality exists. An index associated with the equal element is stored in a target vector register. Further, the same instruction, the Find Element Equal instruction, also searches a selected vector for null elements, also referred to as zero elements. A result of the instruction is dependent on whether the null search is provided, or just the compare.
    Type: Application
    Filed: March 15, 2012
    Publication date: September 19, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan D. Bradbury, Michael K. Gschwind, Eric M. Schwarz, Timothy J. Slegel
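    The behavior summarized in this abstract can be sketched as a plain element-wise search. This is an assumption-laden illustration, not the instruction's architected definition: the function name `find_element_equal`, the `zero_search` flag, the choice of the first operand as the vector searched for zero elements, and returning the vector length when nothing is found are all hypothetical.

```python
def find_element_equal(a, b, zero_search=False):
    """Return the lowest index i where a[i] == b[i].

    With zero_search=True, a zero element in `a` also ends the search.
    Returns len(a) when no such index exists (an assumed convention).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of elements")
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y or (zero_search and x == 0):
            return i
    return len(a)
```

    With byte strings as operands, this mirrors the character-data use case: `find_element_equal(b"abcd", b"xbzd")` stops at the shared `b` in position 1.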
  • Publication number: 20130246753
    Abstract: Processing of character data is facilitated. A Vector String Range Compare instruction is provided that compares each element of a vector with a range of values based on a set of controls to determine if there is a match. An index associated with the matched element or a mask representing the matched element is stored in a target vector register. Further, the same instruction, the Vector String Range Compare instruction, also searches a selected vector for null elements, also referred to as zero elements.
    Type: Application
    Filed: March 15, 2012
    Publication date: September 19, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan D. Bradbury, Eric M. Schwarz, Timothy J. Slegel
  • Patent number: 8533436
    Abstract: In one embodiment, a method includes receiving an instruction for decoding in a processor core and dynamically handling the instruction with one of multiple behaviors based on whether contention is predicted. If no contention is predicted, the instruction is executed in the core, and if contention is predicted data associated with the instruction is marshaled and sent to a selected remote agent for execution. Other embodiments are described and claimed.
    Type: Grant
    Filed: June 26, 2009
    Date of Patent: September 10, 2013
    Assignee: Intel Corporation
    Inventors: Joshua B. Fryman, Edward T. Grochowski, Toni Juan, Andrew Thomas Forsyth, John Mejia, Ramacharan Sundararaman, Eric Sprangle, Roger Espasa, Ravi Rajwar
  • Patent number: 8533437
    Abstract: A microprocessor includes a cache memory, an instruction set having first and second prefetch instructions each configured to instruct the microprocessor to prefetch a cache line of data from a system memory into the cache memory, and a memory subsystem configured to execute the first and second prefetch instructions. For the first prefetch instruction the memory subsystem is configured to forego prefetching the cache line of data from the system memory into the cache memory in response to a predetermined set of conditions. For the second prefetch instruction the memory subsystem is configured to complete prefetching the cache line of data from the system memory into the cache memory in response to the predetermined set of conditions.
    Type: Grant
    Filed: May 17, 2010
    Date of Patent: September 10, 2013
    Assignee: VIA Technologies, Inc.
    Inventors: G. Glenn Henry, Colin Eddy, Rodney E. Hooker
  • Patent number: 8533721
    Abstract: A method and system to schedule out of order operations without the requirement to execute compare, ready and pick logic in a single cycle. A lazy out-of-order scheduler splits each scheduling loop into two consecutive cycles. The scheduling loop includes a compare stage, a ready stage and a pick stage. The compare stage and the ready stage are executed in a first of the two consecutive cycles and the pick stage is executed in a second of the two consecutive cycles. By splitting each scheduling loop into two consecutive cycles, selecting the oldest operation by default and checking the readiness of the oldest operation, it relieves the system of timing requirements and avoids the need for power hungry logic. Every execution of an operation does not appear as one extra cycle longer and the lazy out-of-order scheduler retains most of the performance of a full out-of-order scheduler.
    Type: Grant
    Filed: March 26, 2010
    Date of Patent: September 10, 2013
    Assignee: Intel Corporation
    Inventors: Stephen J. Robinson, Deepak Limaye
  • Publication number: 20130232321
    Abstract: Receiving an instruction indicating first and second operands. Each of the operands having packed data elements that correspond in respective positions. A first subset of the data elements of the first operand and a first subset of the data elements of the second operand each corresponding to a first lane. A second subset of the data elements of the first operand and a second subset of the data elements of the second operand each corresponding to a second lane. Storing result, in response to instruction, including: (1) in first lane, only lowest order data elements from first subset of first operand interleaved with corresponding lowest order data elements from first subset of second operand; and (2) in second lane, only highest order data elements from second subset of first operand interleaved with corresponding highest order data elements from second subset of second operand.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 5, 2013
    Inventors: Asaf Hargil, Doron Orenstein
  • Patent number: 8527741
    Abstract: A task matching circuit for synchronizing software on a plurality of processors is disclosed. The task matching circuit includes first and second inputs, an analysis sub-circuit, and an output. The first input is from a first processor configured to receive a first software routine identifier. The second input is from a second processor configured to receive a second software routine identifier. The analysis sub-circuit determines if the first software routine identifier corresponds with the second software routine identifier. The output is coupled to at least one of the first or second processors and indicates when the first and second software routine identifiers do not correspond. One of the first and second processors is delayed until the first and second software routine identifiers correspond.
    Type: Grant
    Filed: July 3, 2006
    Date of Patent: September 3, 2013
    Assignee: ViaSat, Inc.
    Inventors: Albert J. Bourdon, Gary G. Christensen, Michael J. Godfrey
  • Patent number: 8521993
    Abstract: A method and apparatus for providing fairness in a multi-processing element environment is herein described. Mask elements are utilized to associate portions of a reservation station with each processing element, while still allowing common access to another portion of reservation station entries. Additionally, bias logic biases selection of processing elements in a pipeline away from a processing element associated with a blocking stall to provide fair utilization of the pipeline.
    Type: Grant
    Filed: April 9, 2007
    Date of Patent: August 27, 2013
    Assignee: Intel Corporation
    Inventors: Morris Marden, Matthew Merten, Alexandre Farcy, Avinash Sodani, James Hadley, Ilhyun Kim
  • Patent number: 8516226
    Abstract: A method and system for flexible prefetching of data and/or instructions for applications are described. A prefetching mechanism monitors program instructions and tag information associated with the instructions. The tag information is used to determine when a prefetch operation is desirable. The prefetching mechanism then requests data and/or instructions. Furthermore, the prefetching mechanism determines when entry into a different execution phase of an application program occurs, and executes a different prefetching policy based on the application's program instructions and tag information for that execution phase as well as profile information from previous executions of the application in that execution phase.
    Type: Grant
    Filed: January 23, 2006
    Date of Patent: August 20, 2013
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Jean-Francois Collard, Norman Paul Jouppi
  • Publication number: 20130212358
    Abstract: A data processing system comprises a processor unit that includes an instruction decode/issue unit including a re-order buffer having entries that include an execution queue tag that indicates an execution queue location of an instruction to which a re-order buffer entry is assigned, a result valid indicator to indicate that a corresponding instruction has executed with a status bit valid result, and a forward indicator to indicate that the status bit can be forwarded to an execution queue of an instruction pointed to that is waiting to receive the status bit.
    Type: Application
    Filed: February 15, 2012
    Publication date: August 15, 2013
    Inventors: Thang M. Tran, Trinh Huy Nguyen
  • Publication number: 20130212361
    Abstract: Method, apparatus, and program means for performing a string comparison operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources store a result of a comparison between each data element of a first and second operand corresponding to a first and second text string, respectively.
    Type: Application
    Filed: March 15, 2013
    Publication date: August 15, 2013
    Inventors: Michael A. Julier, Jeffrey D. Gray, Srinivas Chennupaty, Sean P. Mirkes, Mark P. Seconi
  • Publication number: 20130212357
    Abstract: Systems and methods for generating a floating point constant value from an instruction are disclosed. A first field of the instruction is decoded as a sign bit of the floating point constant value. A second field of the instruction is decoded to correspond to an exponent value of the floating point constant value. A third field of the instruction is decoded to correspond to the significand of the floating point constant value. The first field, the second field, and the third field are combined to form the floating point constant value. The exponent value may include a bias, and a bias constant may be added to the exponent value to compensate for the bias. The third field may comprise the most significant bits of the significand. Optionally, the second field and the third field may be shifted by first and second shift values respectively before they are combined to form the floating point constant value.
    Type: Application
    Filed: February 9, 2012
    Publication date: August 15, 2013
    Applicant: QUALCOMM INCORPORATED
    Inventors: Erich James Plondke, Lucian Codrescu, Charles Joseph Tabony, Swaminathan Balasubramanian
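    Note: the field decoding described in this abstract can be sketched as follows. The abstract gives no field widths, so everything concrete here is an assumption made for illustration: an 8-bit immediate laid out as 1 sign bit, 3 exponent bits, and 4 significand bits, a bias constant of 124, and an IEEE 754 single-precision target. The name `decode_fp_constant` is hypothetical.

```python
import struct

# Assumed bias constant: maps the 3-bit exponent field 0..7 onto the
# single-precision biased exponents 124..131 (values from 2**-3 to 2**4).
EXP_BIAS_CONSTANT = 124


def decode_fp_constant(imm8: int) -> float:
    """Build a float32 constant from a hypothetical 8-bit immediate (s eee mmmm)."""
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("imm8 must fit in 8 bits")

    sign = (imm8 >> 7) & 0x1        # first field: sign bit
    exp_field = (imm8 >> 4) & 0x7   # second field: exponent value
    frac_field = imm8 & 0xF         # third field: top bits of the significand

    # Add the bias constant, then shift each field into its float32 position.
    biased_exp = exp_field + EXP_BIAS_CONSTANT
    bits = (sign << 31) | (biased_exp << 23) | (frac_field << 19)

    # Reinterpret the 32-bit pattern as a single-precision float.
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

    Under these assumptions, `decode_fp_constant(0x30)` yields 1.0 and `decode_fp_constant(0xC8)` yields -3.0. The shifts by 23 and 19 play the role of the two optional shift values mentioned in the abstract.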
  • Patent number: 8510538
    Abstract: A large-scale data processing system and method including a plurality of processes, wherein a master process assigns input data blocks to respective map processes and partitions of intermediate data are assigned to respective reduce processes. In each of the plurality of map processes, an application-independent map program retrieves a sequence of input data blocks assigned thereto by the master process and applies an application-specific map function to each input data block in the sequence to produce the intermediate data and stores the intermediate data in high speed memory of the interconnected processors. Each of the plurality of reduce processes receives a respective partition of the intermediate data from the high speed memory of the interconnected processors while the map processes continue to process input data blocks, and an application-specific reduce function is applied to the respective partition of the intermediate data to produce output values.
    Type: Grant
    Filed: April 13, 2010
    Date of Patent: August 13, 2013
    Assignee: Google Inc.
    Inventors: Grzegorz Malewicz, Marian Dvorsky, Christopher B. Colohan, Derek P. Thomson, Joshua Louis Levenberg
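    Note: the division of labor in this abstract (an application-independent driver that applies application-specific map and reduce functions, with intermediate data split into partitions) can be sketched in a few lines. This is a single-process illustration under stated assumptions, not the patented system: it runs sequentially, so it does not model reduce processes receiving data while map processes are still running, and the names `run_mapreduce`, `word_count_map`, and `word_count_reduce` are hypothetical.

```python
from collections import defaultdict


def run_mapreduce(blocks, map_fn, reduce_fn, num_partitions=2):
    """Application-independent driver: map each block, partition, then reduce."""
    # One in-memory bucket per reduce partition (stands in for high speed memory).
    partitions = [defaultdict(list) for _ in range(num_partitions)]

    # Map phase: apply the application-specific map function to each block
    # and route every intermediate (key, value) pair to a partition by key.
    for block in blocks:
        for key, value in map_fn(block):
            partitions[hash(key) % num_partitions][key].append(value)

    # Reduce phase: apply the application-specific reduce function per key.
    output = {}
    for partition in partitions:
        for key, values in partition.items():
            output[key] = reduce_fn(key, values)
    return output


def word_count_map(block):
    """Example map function: emit (word, 1) for every word in a text block."""
    return [(word, 1) for word in block.split()]


def word_count_reduce(key, values):
    """Example reduce function: sum the counts for one word."""
    return sum(values)
```

    For example, `run_mapreduce(["a b", "a"], word_count_map, word_count_reduce)` returns a dictionary mapping "a" to 2 and "b" to 1.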
  • Publication number: 20130205122
    Abstract: In one embodiment, the present invention includes a method for directly communicating between an accelerator and an instruction sequencer coupled thereto, where the accelerator is a heterogeneous resource with respect to the instruction sequencer. An interface may be used to provide the communication between these resources. Via such a communication mechanism a user-level application may directly communicate with the accelerator without operating system support. Further, the instruction sequencer and the accelerator may perform operations in parallel. Other embodiments are described and claimed.
    Type: Application
    Filed: March 8, 2013
    Publication date: August 8, 2013
    Inventors: Hong Wang, John Shen, Hong Jiang, Richard Hankins, Per Hammarlund, Dion Rodgers, Gautham Chinya, Baiju Patel, Shiv Kaushik, Bryant Bigbee, Gad Sheaffer, Yoav Talgam, Yuval Yosef, James P. Held
  • Publication number: 20130205120
    Abstract: A technique for processing an instruction sequence that includes a barrier instruction, a load instruction preceding the barrier instruction, and a subsequent memory access instruction following the barrier instruction includes determining that the load instruction is resolved based upon receipt of an earliest of a good combined response for a read operation corresponding to the load instruction and data for the load instruction. The technique also includes if execution of the subsequent memory access instruction is not initiated prior to completion of the barrier instruction, initiating in response to determining the barrier instruction completed, execution of the subsequent memory access instruction. The technique further includes if execution of the subsequent memory access instruction is initiated prior to completion of the barrier instruction, discontinuing in response to determining the barrier instruction completed, tracking of the subsequent memory access instruction with respect to invalidation.
    Type: Application
    Filed: February 8, 2012
    Publication date: August 8, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Guy L. Guthrie, William J. Starke, Derek E. Williams
  • Publication number: 20130205121
    Abstract: A technique for processing an instruction sequence that includes a barrier instruction, a load instruction preceding the barrier instruction, and a subsequent memory access instruction following the barrier instruction includes determining, by a processor core, that the load instruction is resolved based upon receipt by the processor core of an earliest of a good combined response for a read operation corresponding to the load instruction and data for the load instruction. The technique also includes if execution of the subsequent memory access instruction is not initiated prior to completion of the barrier instruction, initiating by the processor core, in response to determining the barrier instruction completed, execution of the subsequent memory access instruction.
    Type: Application
    Filed: November 28, 2012
    Publication date: August 8, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: International Business Machines Corporation
  • Patent number: 8505015
    Abstract: A “group work sorting” technique is used in a parallel computing system that executes multiple items of work across multiple parallel processing units, where each parallel processing unit processes one or more of the work items according to their positions in a prioritized work queue that corresponds to the parallel processing unit. When implementing the technique, one or more of the parallel processing units receives a new work item to be placed into a first work queue that corresponds to the parallel processing unit and receives data that indicates where one or more other parallel processing units would prefer to place the new work item in the prioritized work queues that correspond to the other parallel processing units. The parallel processing unit uses the received data as a guide in placing the new work item into the first work queue.
    Type: Grant
    Filed: October 29, 2009
    Date of Patent: August 6, 2013
    Assignee: Teradata US, Inc.
    Inventor: Curtis Stehley
  • Patent number: 8504804
    Abstract: In one embodiment, the present invention includes a method for determining if an instruction of a first thread dispatched from a first queue associated with the first thread is stalled in a pipestage of a pipeline, and if so, dispatching an instruction of a second thread from a second queue associated with the second thread to the pipeline if the second thread is not stalled. Other embodiments are described and claimed.
    Type: Grant
    Filed: September 13, 2012
    Date of Patent: August 6, 2013
    Assignee: Intel Corporation
    Inventors: Matthew Merten, Avinash Sodani, James Hadley, Alexandre Farcy, Iredamola Olopade
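    Note: the dispatch rule in this abstract (if the first thread's instruction is stalled, dispatch from the second thread's queue provided that thread is not itself stalled) can be sketched as a selection function. This is a software illustration only; the name `dispatch`, the `stalled` flag list, and the `preferred` parameter are hypothetical and stand in for hardware state the abstract does not detail.

```python
from collections import deque


def dispatch(queues, stalled, preferred=0):
    """Pick the next instruction to send to the pipeline.

    queues:    one deque of pending instructions per thread
    stalled:   one boolean per thread, True if that thread is stalled
    preferred: thread tried first (the "first thread" of the abstract)

    Returns (thread_id, instruction), or None if nothing can be dispatched.
    """
    # Try the preferred thread first, then fall back to the others in order.
    order = [preferred] + [t for t in range(len(queues)) if t != preferred]
    for thread_id in order:
        if not stalled[thread_id] and queues[thread_id]:
            return thread_id, queues[thread_id].popleft()
    return None
```

    With `queues = [deque(["a0"]), deque(["b0"])]` and `stalled = [True, False]`, the call `dispatch(queues, stalled)` returns `(1, "b0")` and leaves the first thread's queue untouched.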
  • Publication number: 20130198493
    Abstract: The subject invention relates to systems and methods that facilitate display, selection, and management of context associated with execution of add-on instructions. The systems and methods track add-on instruction calls and provide a user with call and data context, wherein the user can select a particular add-on instruction context from a plurality of contexts in order to observe values and/or edit parameters associated with the add-on instruction. The add-on instruction context can include information such as instances of data for particular lines of execution, the add-on instruction called, a caller of the instruction, a location of the instruction call, references to complex data types and objects, etc. The systems and methods further provide a technique for automatic routine selection based on the add-on instruction state information such that the add-on instruction executed corresponds to a current state.
    Type: Application
    Filed: January 7, 2013
    Publication date: August 1, 2013
    Applicant: Rockwell Automation Technologies, Inc.
    Inventor: Rockwell Automation Technologies, Inc.
  • Publication number: 20130191618
    Abstract: Various embodiments of the present invention provide systems and methods for data processing using variable scaling.
    Type: Application
    Filed: March 8, 2013
    Publication date: July 25, 2013
    Applicant: LSI Corporation
    Inventor: LSI Corporation
  • Patent number: 8495344
    Abstract: A multi-core microprocessor includes first and second processing cores and a bus coupling the first and second processing cores. The bus conveys messages between the first and second processing cores. The cores are configured such that: the first core stops executing user instructions and interrupts the second core via the bus, in response to detecting a predetermined event; the second core stops executing user instructions, in response to being interrupted by the first core; each core outputs its state after it stops executing user instructions; and each core waits to begin fetching and executing user instructions until it receives a notification from the other core via the bus that the other core is ready to begin fetching and executing user instructions. In one embodiment, the predetermined event comprises detecting that the first core has retired a predetermined number of instructions. In one embodiment, microcode waits for the notification.
    Type: Grant
    Filed: March 29, 2010
    Date of Patent: July 23, 2013
    Assignee: VIA Technologies, Inc.
    Inventors: G. Glenn Henry, Jui-Shuan Chen
  • Patent number: 8494782
    Abstract: Method and system for rapid and accurate determination of each of a sequence of unknown polymer components, such as nucleic acid components. A self-assembling monolayer of a selected substance is optionally provided on an interior surface of a pipette tip, and the interior surface is immersed in a selected liquid. A selected electrical field is impressed in a longitudinal direction, or in a transverse direction, in the tip region, a polymer sequence is passed through the tip region, and a change in an electrical current signal is measured as each polymer component passes through the tip region. Each of the measured changes in electrical current signals is compared with a database of reference electrical change signals, with each reference signal corresponding to an identified polymer component, to identify the unknown polymer component with a reference polymer component.
    Type: Grant
    Filed: April 21, 2011
    Date of Patent: July 23, 2013
    Assignee: The United States of America as Represented by the Administrator of the National Aeronautics & Space Administration (NASA)
    Inventors: Viktor Stolc, Matthew W. Brock
  • Patent number: 8495343
    Abstract: A microprocessor includes a plurality of execution units configured to receive instructions and operands thereof and to execute the instructions. An instruction scheduler issues the instructions to the execution units and selects sources of the instruction operands. At least one of the execution units detects one of the operands of one of the instructions is a denormal operand, generates an indication that the instruction needs to be replayed in response to detecting the denormal operand, and provides the denormal operand to the instruction scheduler in response to detecting the denormal operand, rather than normalizing the denormal operand. The instruction scheduler normalizes the denormal operand, in response to the indication, and causes the normalized operand, rather than the denormal operand, to be provided to the execution unit when the instruction is replayed.
    Type: Grant
    Filed: June 4, 2010
    Date of Patent: July 23, 2013
    Assignee: VIA Technologies, Inc.
    Inventors: G. Glenn Henry, Gerard M. Col, Timothy A. Elliott, Rodney E. Hooker, Terry Parks
  • Patent number: RE44494
    Abstract: A processor including a first execution core section clocked to perform execution operations at a first clock frequency, and a second execution core section clocked to perform execution operations at a second clock frequency which is different than the first clock frequency. The second execution core section runs faster and includes a data cache and critical ALU functions, while the first execution core section includes latency-tolerant functions such as instruction fetch and decode units and non-critical ALU functions. The processor may further include an I/O ring which may be still slower than the first execution core section. Optionally, the first execution core section may include a third execution core section whose clock rate is between that of the first and second execution core sections. Clock multipliers/dividers may be used between the various sections to derive their clocks from a single source, such as the I/O clock.
    Type: Grant
    Filed: November 24, 2004
    Date of Patent: September 10, 2013
    Assignee: Intel Corporation
    Inventors: David J. Sager, Thomas D. Fletcher, Glenn J. Hinton, Michael D. Upton