Processing Control Patents (Class 712/220)

Arithmetic operation instruction processing (Class 712/221)

Floating point or vector (Class 712/222)

Logic operation instruction processing (Class 712/223)

Masking (Class 712/224)

Processing control for data transfer (Class 712/225)

Instruction modification based on condition (Class 712/226)

Specialized instruction processing in support of testing, debugging, emulation (Class 712/227)

Context preserving (e.g., context swapping, checkpointing, register windowing) (Class 712/228)

Mode switch or change (Class 712/229)

Generating next microinstruction address (Class 712/230)

Detecting end or completion of microprogram (Class 712/231)

Hardwired controller (Class 712/232)

Branching (e.g., delayed branch, loop control, branch predict, interrupt) (Class 712/233)

Processing sequence control (i.e., microsequencing) (Class 712/245)

FLAG NON-MODIFICATION EXTENSION FOR ISA INSTRUCTIONS USING PREFIXES

Publication number: 20130297915

Abstract: In one embodiment, a processor includes an instruction decoder to receive and decode an instruction having a prefix and an opcode, an execution unit to execute the instruction based on the opcode, and flag modification override logic to prevent the execution unit from modifying a flag register of the processor based on the prefix of the instruction.

Type: Application

Filed: November 14, 2011

Publication date: November 7, 2013

Inventors: Jonathan D. Combs, Jason W. Brandt, Robert Valentine
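
The flag-override behavior described in the abstract above can be sketched as a toy model. This is a minimal illustration under stated assumptions, not the patented design: the `Cpu` class, the `execute_add` function, the `no_flags_prefix` parameter, and the two-flag register are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Cpu:
    # Hypothetical two-register machine with a two-bit flag register.
    regs: dict = field(default_factory=lambda: {"r0": 0, "r1": 0})
    flags: dict = field(default_factory=lambda: {"ZF": 0, "CF": 0})


def execute_add(cpu, dst, src, no_flags_prefix=False, width=32):
    """Toy ADD: dst += src. Flags are written only when the prefix is absent."""
    mask = (1 << width) - 1
    total = cpu.regs[dst] + cpu.regs[src]
    cpu.regs[dst] = total & mask
    if not no_flags_prefix:
        # Without the prefix, the execution unit updates the flags as usual.
        cpu.flags["ZF"] = int(cpu.regs[dst] == 0)
        cpu.flags["CF"] = int(total > mask)
    # With the prefix set, the override leaves the flag register untouched.
```

In this model the same opcode produces the same arithmetic result either way; only the side effect on the flag register differs.
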
MAINTAINING DATA COHERENCE BY USING DATA DOMAINS

Publication number: 20130297914

Abstract: A method, system and computer program product are disclosed for maintaining data coherence, for use in a multi-node processing system where each of the nodes includes one or more components. In one embodiment, the method comprises establishing a data domain, assigning a group of the components to the data domain, sending a coherence message from a first component of the processing system to a second component of the processing system, and determining if that second component is assigned to the data domain. In this embodiment, if that second component is assigned to the data domain, the coherence message is transferred to all of the components assigned to the data domain to maintain data coherency among those components. In an embodiment, if that second component is assigned to the data domain, the first component is assigned to the data domain.

Type: Application

Filed: July 8, 2013

Publication date: November 7, 2013

Inventors: Kattamuri Ekanadham, Il Park, Pratap Pattnaik
Eliminating Redundant Masking Operations Instruction Processing Circuits, And Related Processor Systems, Methods, And Computer-Readable Media

Publication number: 20130290683

Abstract: Eliminating redundant masking operations in instruction processing circuits and related processor systems, methods, and computer-readable media are disclosed. In one embodiment, a first instruction in an instruction stream indicating an operation writing a value to a first register is detected by an instruction processing circuit, the value having a value size less than a size of the first register. The circuit also detects a second instruction in the instruction stream indicating a masking operation on the first register. The masking operation is eliminated upon a determination that the masking operation indicates a read operation and a write operation on the first register and has an identity mask size equal to or greater than the value size. In this manner, the elimination of the masking operation avoids potential read-after-write hazards and improves performance of a CPU by removing redundant operations from an execution pipeline.

Type: Application

Filed: October 19, 2012

Publication date: October 31, 2013

Applicant: QUALCOMM INCORPORATED

Inventors: Melinda J. Brown, Michael William Morrow, James Norris Dieffenderfer, Brian Michael Stempel, Michael Scott McIlvaine
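
The elimination rule in the abstract above can be sketched as a peephole pass over a toy instruction stream. The tuple encoding, the `load` and `and` mnemonics, and both function names are assumptions made for illustration. The sketch further assumes that the write zero-extends its value and that the masking instruction immediately follows the write, which is narrower than what the abstract covers.

```python
def identity_mask_size(mask):
    """Count the contiguous low-order 1 bits of a mask (0xFF -> 8)."""
    size = 0
    while mask & 1:
        size += 1
        mask >>= 1
    return size


def eliminate_redundant_masks(stream):
    """Drop an AND that re-masks a register just written with a narrower value.

    Hypothetical encoding:
      ("load", reg, value_size_bits)   zero-extended write of a narrow value
      ("and", dst, src, mask)          masking operation
    """
    out = []
    for ins in stream:
        prev = out[-1] if out else None
        redundant = (
            ins[0] == "and"
            and prev is not None
            and prev[0] == "load"
            and ins[1] == ins[2] == prev[1]              # reads and writes the same register
            and identity_mask_size(ins[3]) >= prev[2]    # mask keeps every written bit
        )
        if not redundant:
            out.append(ins)
    return out
```

A mask narrower than the written value is kept, since it actually changes the register contents.
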
Providing multiple quiesce state machines in a computing environment

Patent number: 8572624

Abstract: A system, method and computer program product for providing multiple quiesce state machines. The system includes a first controller including logic for processing a first quiesce request. The system also includes a second controller including logic for processing a second quiesce request. All or a portion of the processing of the second quiesce request overlaps in time with the processing of the first quiesce request. Thus, multiple quiesce requests may be active in the system at the same time.

Type: Grant

Filed: February 26, 2008

Date of Patent: October 29, 2013

Assignee: International Business Machines Corporation

Inventors: Lisa C. Heller, Norbert Hagspiel, Ute Gaertner, Hanno Ulrich, Rebecca S. Wisniewski
Thread-local memory reference promotion for translating CUDA code for execution by a general purpose processor

Patent number: 8572588

Abstract: One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on a multi-core graphics processing unit (GPU) for execution by a general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.

Type: Grant

Filed: March 31, 2009

Date of Patent: October 29, 2013

Assignee: Nvidia Corporation

Inventors: Vinod Grover, Bastiaan Joannes Matheus Aarts, Michael Murphy
Support for non-local returns in parallel thread SIMD engine

Patent number: 8572355

Abstract: One embodiment of the present invention sets forth a method for executing a non-local return instruction in a parallel thread processor. The method comprises the steps of receiving, within the thread group, a first long jump instruction and, in response, popping a first token from the execution stack. The method also comprises determining whether the first token is a first long jump token that was pushed onto the execution stack when a first push instruction associated with the first long jump instruction was executed, and when the first token is the first long jump token, jumping to the second instruction based on the address specified by the first long jump token, or, when the first token is not the first long jump token, disabling the active thread until the first long jump token is popped from the execution stack.

Type: Grant

Filed: September 13, 2010

Date of Patent: October 29, 2013

Assignee: Nvidia Corporation

Inventors: Guillermo Juan Rozas, Brett W. Coon
EXPRESSING PARALLEL EXECUTION RELATIONSHIPS IN A SEQUENTIAL PROGRAMMING LANGUAGE

Publication number: 20130283015

Abstract: Circuits, methods, and apparatus that provide parallel execution relationships to be included in a function call or other appropriate portion of a command or instruction in a sequential programming language. One example provides a token-based method of expressing parallel execution relationships. Each process that can be executed in parallel is given a separate token. Later processes that depend on earlier processes wait to receive the appropriate token before being executed. In another example, counters are used in place of tokens to determine when a process is completed. Each function is a number of individual functions or threads, where each thread performs the same operation on a different piece of data. A counter is used to track the number of threads that have been executed. When each thread in the function has been executed, a later function that relies on data generated by the earlier function may be executed.

Type: Application

Filed: January 7, 2013

Publication date: October 24, 2013

Inventors: Ian A. Buck, Bastiaan Aarts
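
The counter-based variant in the abstract above can be sketched as follows. This is a sequential simulation under stated assumptions: `CompletionCounter`, `run_function`, and `run_dependent` are hypothetical names, and the "threads" run one after another rather than in parallel.

```python
class CompletionCounter:
    """Tracks how many threads of a function have finished."""

    def __init__(self, num_threads):
        self.num_threads = num_threads
        self.done = 0

    def thread_finished(self):
        self.done += 1

    def complete(self):
        return self.done == self.num_threads


def run_function(kernel, data, counter):
    """Apply the same kernel to each data item, one simulated thread per item."""
    results = []
    for item in data:
        results.append(kernel(item))
        counter.thread_finished()
    return results


def run_dependent(kernel, inputs, counter):
    """Run a later function only after every earlier thread has been counted."""
    if not counter.complete():
        raise RuntimeError("earlier function has not finished")
    return kernel(inputs)
```

For example, a sum over squared values may be started only once the counter for the squaring function has reached its thread count.
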
Software-defined radio using multi-core processor

Patent number: 8565811

Abstract: A radio control board passes a plurality of digital samples between a memory of a computing device and a radio frequency (RF) transceiver coupled to a system bus of the computing device. Processing of the digital samples is carried out by one or more cores of a multi-core processor to implement a software-defined radio.

Type: Grant

Filed: August 4, 2009

Date of Patent: October 22, 2013

Assignee: Microsoft Corporation

Inventors: Kun Tan, Jiansong Zhang, Yongguang Zhang
System to profile and optimize user software in a managed run-time environment

Patent number: 8566567

Abstract: Method, apparatus, and system for monitoring performance within a processing resource, which may be used to modify user-level software. Some embodiments of the invention pertain to an architecture to allow a user to improve software running on a processing resource on a per-thread basis in real-time and without incurring significant processing overhead.

Type: Grant

Filed: June 21, 2012

Date of Patent: October 22, 2013

Assignee: Intel Corporation

Inventors: Chris J. Newburn, Robert Knight, Robert Geva, Dion Rodgers, Xiang Zou, Hong Wang, Bryant E. Bigbee, Ittai Anati
INTEGRATED CIRCUIT DEVICE AND METHOD FOR PERFORMING CONDITIONAL NEGATION OF DATA

Publication number: 20130275725

Abstract: An integrated circuit device comprising at least one digital signal processor (DSP) module, the at least one DSP module comprising a first data register and at least one further data register and at least one data execution unit (DEU) module arranged to execute operations on target data stored within the first data register and the at least one further data register. The at least one DEU module is arranged, upon receipt of a conditional negation instruction, to retrieve at least one conditional bit value from the first data register, and conditionally perform negation of target data within the at least one further data register according to the at least one retrieved conditional bit value.

Type: Application

Filed: January 3, 2011

Publication date: October 17, 2013

Applicant: Freescale Semiconductor, Inc.

Inventors: Ilia Moskovich, Fabrice Aidan, Avi Gal, Dmitry Lachover
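
The conditional negation described in the abstract above can be illustrated with a short sketch. The function name, the list-of-lanes representation of the target register, and the mapping of bit i to lane i are assumptions made for illustration only.

```python
def conditional_negate(cond_reg, lanes):
    """Negate lane i of the target data when bit i of the condition register is set."""
    return [-value if (cond_reg >> i) & 1 else value
            for i, value in enumerate(lanes)]
```

With a condition register of 0b0101, lanes 0 and 2 are negated and lanes 1 and 3 pass through unchanged.
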
Thread fairness on a multi-threaded processor with multi-cycle cryptographic operations

Patent number: 8560814

Abstract: Systems and methods for efficient execution of operations in a multi-threaded processor. Each thread may include a blocking instruction. A blocking instruction blocks other threads from utilizing hardware resources for an appreciable amount of time. One example of a blocking type instruction is a Montgomery multiplication cryptographic instruction. Each thread can operate in a thread-based mode that allows the insertion of stall cycles during the execution of blocking instructions, during which other threads may utilize the previously blocked hardware resources. At times when multiple threads are scheduled to execute blocking instructions, the thread-based mode may be changed to increase throughput for these multiple threads. For example, the mode may be changed to disallow the insertion of stall cycles. Therefore, the time for sequential operation of the blocking instructions corresponding to the multiple threads may be reduced.

Type: Grant

Filed: May 4, 2010

Date of Patent: October 15, 2013

Assignee: Oracle International Corporation

Inventors: Robert T. Golla, Christopher H. Olson, Gregory F. Grohoski
Signal processing apparatus with user-configurable circuit configuration

Patent number: 8555251

Abstract: A signal processing apparatus for performing signal processing including a plurality of steps in data units by software signal processing includes signal processing modules performing the steps, a circuit configuration information storing and managing unit storing the signal processing modules and circuit configuration information, a signal processing order determining unit determining a signal processing order by performing path routing, a signal processing executing unit executing the signal processing in the determined order, and a circuit configuration changing unit changing circuit configuration information and causing the signal processing order determining unit to re-execute path routing to determine a signal processing order for the changed circuit configuration information during a period from the end of the software signal processing in the data unit to the beginning of the subsequent data unit.

Type: Grant

Filed: March 21, 2006

Date of Patent: October 8, 2013

Assignee: Sony Corporation

Inventor: Kosei Yamashita
System and method for performing predicated selection of an output register

Patent number: 8555036

Abstract: A system includes a processor having an instruction register for storing an instruction having a predefined opcode, a predicate register for storing a predicate condition to select an output register for a result of the instruction, a first output register, and a second output register. The processor further includes processor circuitry operable to execute the instruction to produce a result, and processor circuitry operable to store the result of the instruction in the first output register if the predicate condition to select the output is true, and to store the result of the instruction in the second output register if the predicate condition to select the output is false. A single instruction is used to produce the result, and to store the result of the instruction.

Type: Grant

Filed: May 17, 2010

Date of Patent: October 8, 2013

Assignee: NVIDIA Corporation

Inventors: Timo Aila, Samuli Laine
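
The predicated output selection in the abstract above amounts to choosing a destination, not an operation. A minimal sketch, assuming a register file modeled as a dictionary; `exec_predicated` and its parameter names are hypothetical:

```python
def exec_predicated(op, a, b, predicate, regs, out_true, out_false):
    """Compute op(a, b) once and write it to one of two output registers."""
    result = op(a, b)
    destination = out_true if predicate else out_false
    regs[destination] = result
    return regs
```

The operation executes unconditionally; the predicate only steers where the result lands, and the register that is not selected keeps its previous value.
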
Enforced unitasking in multitasking systems

Patent number: 8554856

Abstract: A computer system includes one or more devices that are capable of multitasking (performing at least two tasks in parallel or substantially in parallel). In response to detecting that one of the devices is performing a first one of the tasks, the system prevents the devices from performing at least one of the tasks other than the first task (such as all of the tasks other than the first task). In response to detecting that one of the devices is performing a second one of the tasks, the system prevents the devices from performing at least one of the tasks other than the second task (such as all of the tasks other than the second task).

Type: Grant

Filed: November 8, 2011

Date of Patent: October 8, 2013

Assignee: Yagi Corp.

Inventor: Robert Plotkin
Hardware Managed Ordered Circuit

Publication number: 20130262834

Abstract: A system and method is provided for improving efficiency, power, and bandwidth consumption in parallel processing. Rather than requiring memory polling to ensure ordered execution of processes or threads, the techniques disclosed herein provide a system and method to allow any process or thread to run out of order as long as needed, but ensure ordered execution of multiple ordered instructions when needed. These operations are handled efficiently in hardware, but are flexible enough to be implemented in all manner of programming models.

Type: Application

Filed: March 29, 2012

Publication date: October 3, 2013

Inventors: Laurent Lefebvre, Michael Mantor
CODE GENERATION METHOD AND INFORMATION PROCESSING APPARATUS

Publication number: 20130262835

Abstract: An information processing apparatus generates first and second operation trees representing a dependency relationship among the instructions included in a first code, and computes first and second operation sequences from the first and second operation trees. Then, the information processing apparatus computes the longest ones of operation subsequences common to the first and second operation sequences, evaluates, for each longest operation subsequence, the utilization of computing resources used for executing the combinations of instructions of the first and second operation trees corresponding to the operations included in the longest operation subsequence, and selects a combination pattern of instructions indicated by any one of the longest operation subsequences on the basis of the evaluation results.

Type: Application

Filed: March 12, 2013

Publication date: October 3, 2013

Applicant: FUJITSU LIMITED

Inventors: Takashi ARAKAWA, Shuichi Chiba, Toshihiro Konda
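
The abstract above relies on finding the longest subsequences common to two operation sequences. The standard dynamic-programming approach to one such subsequence can be sketched as below. This is a textbook algorithm offered as an assumption about how that step might look, not the applicant's method, and it returns only one longest subsequence whereas the abstract evaluates each of them.

```python
def longest_common_subsequence(a, b):
    """Return one longest common subsequence of two operation sequences."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of a[:i] and b[:j].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    # Walk back through the table to recover the subsequence itself.
    out = []
    i, j = m, n
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]
```
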
Instruction Scheduling for Reducing Register Usage

Publication number: 20130262832

Abstract: A method, computer program product, and system are provided for scheduling a plurality of instructions in a computing system. For example, the method can generate a plurality of instruction lineages, in which the plurality of instruction lineages is assigned to one or more registers. Each of the plurality of instruction lineages has at least one node representative of an instruction from the plurality of instructions. The method can also determine a node order based on respective priority values associated with each of the nodes. Further, the method can include scheduling the plurality of instructions based on the node order and the one or more registers assigned to the plurality of instruction lineages.

Type: Application

Filed: March 30, 2012

Publication date: October 3, 2013

Applicant: Advanced Micro Devices, Inc.

Inventors: Gang CHEN, Srinivasa B. Yadavalli
SINGLE CYCLE COMPARE AND SELECT OPERATIONS

Publication number: 20130262819

Abstract: An apparatus includes a processor to determine an extremum among a series of values that are successively provided to a first register and a second register. The processor is configured to execute a single cycle search instruction, including compare a value in the first register with a value in a first accumulator, and store an extremum of the two values in the first accumulator; and compare a value in the second register with a value in a second accumulator, and store an extremum of the two values in the second accumulator. The processor is configured to execute a single cycle select instruction, including compare the value in the first accumulator with the value in the second accumulator, and store an extremum of the two values in the first accumulator, the extremum stored in the first accumulator representing the extremum of the series of numbers.

Type: Application

Filed: April 2, 2012

Publication date: October 3, 2013

Inventors: Srinivasan Iyer, Carsten Aagaard Pedersen
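
The two-accumulator search described in the abstract above can be modeled in software as follows. This sketch assumes the extremum is a maximum and that values arrive two at a time; `find_maximum` and the padding of an odd-length series with negative infinity are illustrative choices, not details from the application.

```python
NEG_INF = float("-inf")


def find_maximum(values):
    """Find the maximum with two accumulators, then one final select."""
    acc0 = acc1 = NEG_INF
    for i in range(0, len(values), 2):
        reg0 = values[i]
        # Pad with -inf when the series has an odd number of values.
        reg1 = values[i + 1] if i + 1 < len(values) else NEG_INF
        # "Search" step: each register is compared with its own accumulator.
        acc0 = max(acc0, reg0)
        acc1 = max(acc1, reg1)
    # "Select" step: the two accumulators are compared with each other.
    return max(acc0, acc1)
```

Each loop iteration corresponds to one search instruction in the abstract, and the final comparison corresponds to the select instruction.
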
Configurable processing apparatus and system thereof

Patent number: 8549258

Abstract: A configurable processing apparatus includes a plurality of processing units, at least an instruction synchronization control circuit, and at least a configuration memory. Each processing unit has a stall-output signal generating circuit to output a stall-output signal, wherein the stall-output signal indicates that an unexpected stall has occurred in the processing unit. The processing unit has a stall-in signal, and an external circuit of the processing unit can control whether the processing unit is stalled according to the stall-in signal. The instruction synchronization control circuit generates the stall-in signals to the processing units in response to a content stored in the configuration memory and the stall-output signals of the processing units, so as to determine operation modes and instruction synchronization of the processing units.

Type: Grant

Filed: February 7, 2010

Date of Patent: October 1, 2013

Assignee: Industrial Technology Research Institute

Inventors: Tzu-Fang Lee, Chien-Hong Lin, Jing-Shan Liang, Chi-Lung Wang
Facilitating transport mode input/output operations between a channel subsystem and input/output devices

Patent number: 8549185

Abstract: A computer program product is provided for performing an input/output (I/O) processing operation at a host computer system. The computer program product is configured to perform: obtaining a transport command word (TCW) at a channel subsystem for an I/O operation, the TCW including an address of a transport command control block (TCCB) having a transport command area (TCA) configured to hold a first plurality of device command words (DCW) and control data associated with respective DCWs, the first plurality of DCWs including a transfer TCA extension (TTE) DCW that specifies a TCA extension, the TCA extension configured to hold one or more DCWs and control data associated with respective DCWs; gathering the TCCB from one or more locations specified in the TCCB address and transferring the TCCB to the control unit; gathering the TCA extension specified by the TTE DCW; and transferring the TCA extension to the control unit.

Type: Grant

Filed: June 30, 2011

Date of Patent: October 1, 2013

Assignee: International Business Machines Corporation

Inventors: Susan K. Candelaria, Scott M. Carlson, Daniel F. Casper, John R. Flanagan, Roger G. Hathorn, Matthew J. Kalos, Louis W. Ricci, Dale F. Riedy, Cynthia Sittmann
APPARATUS AND METHOD FOR PROCESSING INVALID OPERATION IN PROLOGUE OR EPILOGUE OF LOOP

Publication number: 20130254517

Abstract: An apparatus for processing an invalid operation in a prologue and/or an epilogue of a loop includes a register file including a first region for storing a data validity value indicating whether data is valid or invalid, and a second region for storing the data; and a functional unit configured to determine whether an operation is valid or invalid based on a value of a first region of each of one or more input sources received from the register file, and output a destination including a value based on the value of the first region of each of the input sources.

Type: Application

Filed: March 15, 2013

Publication date: September 26, 2013

Applicants: Seoul National University R&DB Foundation, Samsung Electronics Co., Ltd.

Inventors: Seong-Hun Jeong, Bernhard Egger, Won-Sub Kim
Use of modes for computer cluster management

Patent number: 8544031

Abstract: A system, method and computer program product for managing a plurality of applications in a computer cluster. Each application is able to run on a particular node in the cluster. In one embodiment, associations are maintained among a plurality of modes and the plurality of applications, with each application being associated with at least one mode. Responsive to designation of at least one mode as active for the cluster, each application that is associated with an active mode is flagged as eligible for activation, each inactive application that is not associated with any active mode is flagged as ineligible for activation, and each active application that is not associated with any active mode is flagged as ineligible for activation and inactivated. Flagging as eligible, flagging as ineligible and flagging as ineligible and inactivating may be performed in any order, and inactivating is sequenced according to dependencies among the applications.

Type: Grant

Filed: February 22, 2012

Date of Patent: September 24, 2013

Assignee: International Business Machines Corporation

Inventor: Michael P. Clarke
RUN-TIME INSTRUMENTATION REPORTING

Publication number: 20130246755

Abstract: Embodiments of the invention relate to run-time instrumentation reporting. An instruction stream is executed by a processor. Run-time instrumentation information of the executing instruction stream is captured by the processor. Run-time instrumentation records are created based on the captured run-time instrumentation information. A run-time instrumentation sample point of the executing instruction stream on the processor is detected. A reporting group is stored in a run-time instrumentation program buffer. The storing is based on the detecting and the storing includes: determining a current address of the run-time instrumentation program buffer, the determining based on instruction accessible run-time instrumentation controls; and storing the reporting group into the run-time instrumentation program buffer based on an origin address and the current address of the run-time instrumentation program buffer, the reporting group including the created run-time instrumentation records.

Type: Application

Filed: March 16, 2012

Publication date: September 19, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Mark S. Farrell, Charles W. Gainey, JR., Marcel Mitran, Chung-Lung K. Shum, Brian L. Smith
RUN-TIME INSTRUMENTATION INDIRECT SAMPLING BY ADDRESS

Publication number: 20130246754

Abstract: Embodiments of the invention relate to implementing run-time instrumentation indirect sampling by address. An aspect of the invention includes reading sample-point addresses from a sample-point address array, and comparing, by a processor, the sample-point addresses to an address associated with an instruction from an instruction stream executing on the processor. A sample point is recognized upon execution of the instruction associated with the address matching one of the sample-point addresses. Run-time instrumentation information is obtained from the sample point. The run-time instrumentation information is stored in a run-time instrumentation program buffer as a reporting group.

Type: Application

Filed: March 16, 2012

Publication date: September 19, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Jonathan D. Bradbury, Charles W. Gainey, JR., Michael K. Gschwind
VECTOR FIND ELEMENT NOT EQUAL INSTRUCTION

Publication number: 20130246751

Abstract: Processing of character data is facilitated. A Find Element Not Equal instruction is provided that compares data of multiple vectors for inequality and provides an indication of inequality, if inequality exists. An index associated with the unequal element is stored in a target vector register. Further, the same instruction, the Find Element Not Equal instruction, also searches a selected vector for null elements, also referred to as zero elements. A result of the instruction is dependent on whether the null search is provided, or just the compare.

Type: Application

Filed: March 15, 2012

Publication date: September 19, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Jonathan D. Bradbury, Michael K. Gschwind, Eric M. Schwarz, Timothy J. Slegel
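
The search described in the abstract above can be sketched as a scalar loop. The function name, the `zero_search` parameter, and the convention of returning the vector length when nothing is found are assumptions made for illustration; the application's operand encoding and result format are not modeled.

```python
def find_element_not_equal(a, b, zero_search=False):
    """Return the index of the first unequal element of two vectors.

    With zero_search enabled, a zero (null) element in the first vector
    also ends the search. Returns len(a) when no such element exists.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of elements")
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y or (zero_search and x == 0):
            return i
    return len(a)
```

Enabling the zero search is what lets a single pass compare two null-terminated strings and stop at the terminator.
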
HARDWARE PROTOCOL STACK

Publication number: 20130246756

Abstract: Disclosed is a hardware protocol stack, where header information of analysis-subjected protocol is stored in a register unit, comparison is made whether information recorded in the header of inputted frame mutually matches header information stored in the register unit, and data is extracted as a result of the comparison.

Type: Application

Filed: March 11, 2013

Publication date: September 19, 2013

Applicant: LSIS CO., LTD.

Inventors: Soo Gang LEE, Dae Hyun KWON
VECTOR FIND ELEMENT EQUAL INSTRUCTION

Publication number: 20130246752

Abstract: Processing of character data is facilitated. A Find Element Equal instruction is provided that compares data of multiple vectors for equality and provides an indication of equality, if equality exists. An index associated with the equal element is stored in a target vector register. Further, the same instruction, the Find Element Equal instruction, also searches a selected vector for null elements, also referred to as zero elements. A result of the instruction is dependent on whether the null search is provided, or just the compare.

Type: Application

Filed: March 15, 2012

Publication date: September 19, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Jonathan D. Bradbury, Michael K. Gschwind, Eric M. Schwarz, Timothy J. Slegel
VECTOR STRING RANGE COMPARE

Publication number: 20130246753

Abstract: Processing of character data is facilitated. A Vector String Range Compare instruction is provided that compares each element of a vector with a range of values based on a set of controls to determine if there is a match. An index associated with the matched element or a mask representing the matched element is stored in a target vector register. Further, the same instruction, the Vector String Range Compare instruction, also searches a selected vector for null elements, also referred to as zero elements.

Type: Application

Filed: March 15, 2012

Publication date: September 19, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Jonathan D. Bradbury, Eric M. Schwarz, Timothy J. Slegel
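
The range comparison in the abstract above can be sketched with inclusive bounds. Representing the controls as `(low, high)` pairs is a simplifying assumption, as are the function name and the choice to return both the first matching index and the per-element mask.

```python
def string_range_compare(vec, ranges, zero_search=False):
    """Compare each element with a set of inclusive (low, high) ranges.

    Returns (index, mask): mask[i] is True when element i falls in any range,
    and index is the first position that matched or, with zero_search, held
    a zero element. index equals len(vec) when nothing is found.
    """
    mask = [any(low <= e <= high for low, high in ranges) for e in vec]
    for i, e in enumerate(vec):
        if mask[i] or (zero_search and e == 0):
            return i, mask
    return len(vec), mask
```

For instance, the single range from `ord("0")` to `ord("9")` locates the first decimal digit in a vector of character codes.
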
Adaptively handling remote atomic execution based upon contention prediction

Patent number: 8533436

Abstract: In one embodiment, a method includes receiving an instruction for decoding in a processor core and dynamically handling the instruction with one of multiple behaviors based on whether contention is predicted. If no contention is predicted, the instruction is executed in the core, and if contention is predicted data associated with the instruction is marshaled and sent to a selected remote agent for execution. Other embodiments are described and claimed.

Type: Grant

Filed: June 26, 2009

Date of Patent: September 10, 2013

Assignee: Intel Corporation

Inventors: Joshua B. Fryman, Edward T. Grochowski, Toni Juan, Andrew Thomas Forsyth, John Mejia, Ramacharan Sundararaman, Eric Sprangle, Roger Espasa, Ravi Rajwar
Guaranteed prefetch instruction

Patent number: 8533437

Abstract: A microprocessor includes a cache memory, an instruction set having first and second prefetch instructions each configured to instruct the microprocessor to prefetch a cache line of data from a system memory into the cache memory, and a memory subsystem configured to execute the first and second prefetch instructions. For the first prefetch instruction the memory subsystem is configured to forego prefetching the cache line of data from the system memory into the cache memory in response to a predetermined set of conditions. For the second prefetch instruction the memory subsystem is configured to complete prefetching the cache line of data from the system memory into the cache memory in response to the predetermined set of conditions.

Type: Grant

Filed: May 17, 2010

Date of Patent: September 10, 2013

Assignee: VIA Technologies, Inc.

Inventors: G. Glenn Henry, Colin Eddy, Rodney E. Hooker
Method and system of scheduling out-of-order operations without the requirement to execute compare, ready and pick logic in a single cycle

Patent number: 8533721

Abstract: A method and system to schedule out of order operations without the requirement to execute compare, ready and pick logic in a single cycle. A lazy out-of-order scheduler splits each scheduling loop into two consecutive cycles. The scheduling loop includes a compare stage, a ready stage and a pick stage. The compare stage and the ready stage are executed in a first of the two consecutive cycles and the pick stage is executed in a second of the two consecutive cycles. By splitting each scheduling loop into two consecutive cycles, selecting the oldest operation by default and checking the readiness of the oldest operation, it relieves the system of timing requirements and avoids the need for power hungry logic. Every execution of an operation does not appear as one extra cycle longer and the lazy out-of-order scheduler retains most of the performance of a full out-of-order scheduler.

Type: Grant

Filed: March 26, 2010

Date of Patent: September 10, 2013

Assignee: Intel Corporation

Inventors: Stephen J. Robinson, Deepak Limaye
Unpacking Packed Data In Multiple Lanes

Publication number: 20130232321

Abstract: Receiving an instruction indicating first and second operands. Each of the operands having packed data elements that correspond in respective positions. A first subset of the data elements of the first operand and a first subset of the data elements of the second operand each corresponding to a first lane. A second subset of the data elements of the first operand and a second subset of the data elements of the second operand each corresponding to a second lane. Storing result, in response to instruction, including: (1) in first lane, only lowest order data elements from first subset of first operand interleaved with corresponding lowest order data elements from first subset of second operand; and (2) in second lane, only highest order data elements from second subset of first operand interleaved with corresponding highest order data elements from second subset of second operand.

Type: Application

Filed: March 15, 2013

Publication date: September 5, 2013

Inventors: Asaf Hargil, Doron Orenstein
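As a rough illustration of the two-lane unpack described in the abstract above, the following sketch models packed operands as Python lists, with index 0 as the lowest-order element. The function name `unpack_lanes` and the `lane_size` parameter are hypothetical, and the model assumes exactly two lanes; it is not taken from the application itself.

```python
def unpack_lanes(a, b, lane_size):
    """Sketch of a two-lane unpack (hypothetical helper, list-based model).

    Lane 0 of the result interleaves the lowest-order half of lane 0 of a and b.
    Lane 1 of the result interleaves the highest-order half of lane 1 of a and b.
    """
    if len(a) != len(b) or len(a) != 2 * lane_size or lane_size % 2 != 0:
        raise ValueError("operands must each hold two lanes of even size")
    half = lane_size // 2
    result = []
    # Lane 0: only the lowest-order elements of the first lane, interleaved.
    for i in range(half):
        result += [a[i], b[i]]
    # Lane 1: only the highest-order elements of the second lane, interleaved.
    for i in range(lane_size + half, 2 * lane_size):
        result += [a[i], b[i]]
    return result
```

With two lanes of four elements each, the first lane of the result draws on elements 0 and 1 of each operand, and the second lane draws on elements 6 and 7.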
System for selectively synchronizing high-assurance software tasks on multiple processors at a software routine level

Patent number: 8527741

Abstract: A task matching circuit for synchronizing software on a plurality of processors is disclosed. The task matching circuit includes first and second inputs, an analysis sub-circuit, and an output. The first input is from a first processor configured to receive a first software routine identifier. The second input is from a second processor configured to receive a second software routine identifier. The analysis sub-circuit determines if the first software routine identifier corresponds with the second software routine identifier. The output is coupled to at least one of the first or second processors and indicates when the first and second software routine identifiers do not correspond. One of the first and second processors is delayed until the first and second software routine identifiers correspond.

Type: Grant

Filed: July 3, 2006

Date of Patent: September 3, 2013

Assignee: ViaSat, Inc.

Inventors: Albert J. Bourdon, Gary G. Christensen, Michael J. Godfrey
Providing thread fairness by biasing selection away from a stalling thread using a stall-cycle counter in a hyper-threaded microprocessor

Patent number: 8521993

Abstract: A method and apparatus for providing fairness in a multi-processing element environment is herein described. Mask elements are utilized to associate portions of a reservation station with each processing element, while still allowing common access to another portion of reservation station entries. Additionally, bias logic biases selection of processing elements in a pipeline away from a processing element associated with a blocking stall to provide fair utilization of the pipeline.

Type: Grant

Filed: April 9, 2007

Date of Patent: August 27, 2013

Assignee: Intel Corporation

Inventors: Morris Marden, Matthew Merten, Alexandre Farcy, Avinash Sodani, James Hadley, Ilhyun Kim
Executing a prefetching policy responsive to entry into an execution phase of an application

Patent number: 8516226

Abstract: A method and system for flexible prefetching of data and/or instructions for applications are described. A prefetching mechanism monitors program instructions and tag information associated with the instructions. The tag information is used to determine when a prefetch operation is desirable. The prefetching mechanism then requests data and/or instructions. Furthermore, the prefetching mechanism determines when entry into a different execution phase of an application program occurs, and executes a different prefetching policy based on the application's program instructions and tag information for that execution phase as well as profile information from previous executions of the application in that execution phase.

Type: Grant

Filed: January 23, 2006

Date of Patent: August 20, 2013

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: Jean-Francois Collard, Norman Paul Jouppi
DATA PROCESSING SYSTEM WITH LATENCY TOLERANCE EXECUTION

Publication number: 20130212358

Abstract: A data processing system comprises a processor unit that includes an instruction decode/issue unit including a re-order buffer having entries that include an execution queue tag that indicates an execution queue location of an instruction to which a re-order buffer entry is assigned, a result valid indicator to indicate that a corresponding instruction has executed with a status bit valid result, and a forward indicator to indicate that the status bit can be forwarded to an execution queue of an instruction pointed to that is waiting to receive the status bit.

Type: Application

Filed: February 15, 2012

Publication date: August 15, 2013

Inventors: Thang M. Tran, Trinh Huy Nguyen
INSTRUCTION AND LOGIC FOR PROCESSING TEXT STRINGS

Publication number: 20130212361

Abstract: Method, apparatus, and program means for performing a string comparison operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources store a result of a comparison between each data element of a first and second operand corresponding to a first and second text string, respectively.

Type: Application

Filed: March 15, 2013

Publication date: August 15, 2013

Inventors: Michael A. Julier, Jeffrey D. Gray, Srinivas Chennupaty, Sean P. Mirkes, Mark P. Seconi
Floating Point Constant Generation Instruction

Publication number: 20130212357

Abstract: Systems and methods for generating a floating point constant value from an instruction are disclosed. A first field of the instruction is decoded as a sign bit of the floating point constant value. A second field of the instruction is decoded to correspond to an exponent value of the floating point constant value. A third field of the instruction is decoded to correspond to the significand of the floating point constant value. The first field, the second field, and the third field are combined to form the floating point constant value. The exponent value may include a bias, and a bias constant may be added to the exponent value to compensate for the bias. The third field may comprise the most significant bits of the significand. Optionally, the second field and the third field may be shifted by first and second shift values respectively before they are combined to form the floating point constant value.

Type: Application

Filed: February 9, 2012

Publication date: August 15, 2013

Applicant: QUALCOMM INCORPORATED

Inventors: Erich James Plondke, Lucian Codrescu, Charles Joseph Tabony, Swaminathan Balasubramanian
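The field-combining step in the abstract above can be sketched as follows. This is a minimal model under stated assumptions: the target is an IEEE 754 single-precision value, the exponent field holds an unbiased exponent to which a bias of 127 is added, and the significand field is 4 bits wide. The function name `make_float32` and all field widths are hypothetical and do not come from the application.

```python
import struct

def make_float32(sign, exp_field, sig_field, sig_bits=4, bias=127):
    """Build a float from sign, exponent and significand fields (illustrative only)."""
    biased = exp_field + bias  # add the bias constant to the decoded exponent
    if sign not in (0, 1):
        raise ValueError("sign must be 0 or 1")
    if not 0 < biased < 255:
        raise ValueError("exponent out of range for a normal float32")
    if not 0 <= sig_field < (1 << sig_bits):
        raise ValueError("significand field too wide")
    # The significand field supplies the most significant fraction bits,
    # so shift it to the top of the 23-bit fraction.
    bits = (sign << 31) | (biased << 23) | (sig_field << (23 - sig_bits))
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

For example, under these assumptions a significand field of `0b1000` contributes 0.5 to the fraction, so `make_float32(1, 1, 0b1000)` encodes -1.5 × 2.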
System and method for limiting the impact of stragglers in large-scale parallel data processing

Patent number: 8510538

Abstract: A large-scale data processing system and method including a plurality of processes, wherein a master process assigns input data blocks to respective map processes and partitions of intermediate data are assigned to respective reduce processes. In each of the plurality of map processes, an application-independent map program retrieves a sequence of input data blocks assigned thereto by the master process, applies an application-specific map function to each input data block in the sequence to produce the intermediate data, and stores the intermediate data in high speed memory of the interconnected processors. Each of the plurality of reduce processes receives a respective partition of the intermediate data from the high speed memory of the interconnected processors while the map processes continue to process input data blocks, and an application-specific reduce function is applied to the respective partition of the intermediate data to produce output values.

Type: Grant

Filed: April 13, 2010

Date of Patent: August 13, 2013

Assignee: Google Inc.

Inventors: Grzegorz Malewicz, Marian Dvorsky, Christopher B. Colohan, Derek P. Thomson, Joshua Louis Levenberg
INSTRUCTION SET ARCHITECTURE-BASED INTER-SEQUENCER COMMUNICATIONS WITH A HETEROGENEOUS RESOURCE

Publication number: 20130205122

Abstract: In one embodiment, the present invention includes a method for directly communicating between an accelerator and an instruction sequencer coupled thereto, where the accelerator is a heterogeneous resource with respect to the instruction sequencer. An interface may be used to provide the communication between these resources. Via such a communication mechanism a user-level application may directly communicate with the accelerator without operating system support. Further, the instruction sequencer and the accelerator may perform operations in parallel. Other embodiments are described and claimed.

Type: Application

Filed: March 8, 2013

Publication date: August 8, 2013

Inventors: Hong WANG, John SHEN, Hong JIANG, Richard HANKINS, Per HAMMARLUND, Dion RODGERS, Gautham CHINYA, Baiju PATEL, Shiv KAUSHIK, Bryant BIGBEE, Gad SHEAFFER, Yoav Talgam, Yuval YOSEF, James P. HELD
PROCESSOR PERFORMANCE IMPROVEMENT FOR INSTRUCTION SEQUENCES THAT INCLUDE BARRIER INSTRUCTIONS

Publication number: 20130205120

Abstract: A technique for processing an instruction sequence that includes a barrier instruction, a load instruction preceding the barrier instruction, and a subsequent memory access instruction following the barrier instruction includes determining that the load instruction is resolved based upon receipt of an earliest of a good combined response for a read operation corresponding to the load instruction and data for the load instruction. The technique also includes if execution of the subsequent memory access instruction is not initiated prior to completion of the barrier instruction, initiating in response to determining the barrier instruction completed, execution of the subsequent memory access instruction. The technique further includes if execution of the subsequent memory access instruction is initiated prior to completion of the barrier instruction, discontinuing in response to determining the barrier instruction completed, tracking of the subsequent memory access instruction with respect to invalidation.

Type: Application

Filed: February 8, 2012

Publication date: August 8, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Guy L Guthrie, William J. Starke, Derek E Williams
PROCESSOR PERFORMANCE IMPROVEMENT FOR INSTRUCTION SEQUENCES THAT INCLUDE BARRIER INSTRUCTIONS

Publication number: 20130205121

Abstract: A technique for processing an instruction sequence that includes a barrier instruction, a load instruction preceding the barrier instruction, and a subsequent memory access instruction following the barrier instruction includes determining, by a processor core, that the load instruction is resolved based upon receipt by the processor core of an earliest of a good combined response for a read operation corresponding to the load instruction and data for the load instruction. The technique also includes if execution of the subsequent memory access instruction is not initiated prior to completion of the barrier instruction, initiating by the processor core, in response to determining the barrier instruction completed, execution of the subsequent memory access instruction.

Type: Application

Filed: November 28, 2012

Publication date: August 8, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: International Business Machines Corporation
Placing a group work item into every prioritized work queue of multiple parallel processing units based on preferred placement of the work queues

Patent number: 8505015

Abstract: A “group work sorting” technique is used in a parallel computing system that executes multiple items of work across multiple parallel processing units, where each parallel processing unit processes one or more of the work items according to their positions in a prioritized work queue that corresponds to the parallel processing unit. When implementing the technique, one or more of the parallel processing units receives a new work item to be placed into a first work queue that corresponds to the parallel processing unit and receives data that indicates where one or more other parallel processing units would prefer to place the new work item in the prioritized work queues that correspond to the other parallel processing units. The parallel processing unit uses the received data as a guide in placing the new work item into the first work queue.

Type: Grant

Filed: October 29, 2009

Date of Patent: August 6, 2013

Assignee: Teradata US, Inc.

Inventor: Curtis Stehley
Managing multiple threads in a single pipeline

Patent number: 8504804

Abstract: In one embodiment, the present invention includes a method for determining if an instruction of a first thread dispatched from a first queue associated with the first thread is stalled in a pipestage of a pipeline, and if so, dispatching an instruction of a second thread from a second queue associated with the second thread to the pipeline if the second thread is not stalled. Other embodiments are described and claimed.

Type: Grant

Filed: September 13, 2012

Date of Patent: August 6, 2013

Assignee: Intel Corporation

Inventors: Matthew Merten, Avinash Sodani, James Hadley, Alexandre Farcy, Iredamola Olopade
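The dispatch decision described in the abstract above can be sketched in a few lines. This is a simplified two-thread model, not the patented pipeline: the function name `dispatch`, the per-thread `deque` queues, and the boolean `stalled` list are all hypothetical stand-ins.

```python
from collections import deque

def dispatch(queues, stalled, current):
    """Pick the next instruction for a two-thread pipeline (illustrative only).

    queues:  list of two deques, one per thread
    stalled: list of two booleans, True if that thread is stalled in a pipestage
    current: index (0 or 1) of the thread that would normally dispatch next
    Returns (thread_id, instruction), or None if nothing can be dispatched.
    """
    if not stalled[current] and queues[current]:
        return current, queues[current].popleft()
    other = 1 - current
    # The current thread is stalled (or empty): fall back to the other
    # thread's queue, but only if that thread is not stalled itself.
    if not stalled[other] and queues[other]:
        return other, queues[other].popleft()
    return None
```

When both threads are stalled, the sketch dispatches nothing, which corresponds to a pipeline bubble in this simplified model.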
SYSTEMS AND METHODS THAT FACILITATE MANAGEMENT OF ADD-ON INSTRUCTION GENERATION, SELECTION, AND/OR MONITORING DURING EXECUTION

Publication number: 20130198493

Abstract: The subject invention relates to systems and methods that facilitate display, selection, and management of context associated with execution of add-on instructions. The systems and methods track add-on instruction calls and provide a user with call and data context, wherein the user can select a particular add-on instruction context from a plurality of contexts in order to observe values and/or edit parameters associated with the add-on instruction. The add-on instruction context can include information such as instances of data for particular lines of execution, the add-on instruction called, a caller of the instruction, a location of the instruction call, references to complex data types and objects, etc. The systems and methods further provide a technique for automatic routine selection based on the add-on instruction state information such that the add-on instruction executed corresponds to a current state.

Type: Application

Filed: January 7, 2013

Publication date: August 1, 2013

Applicant: Rockwell Automation Technologies, Inc.

Inventor: Rockwell Automation Technologies, Inc.
Systems and Methods for Dynamic Scaling in a Data Decoding System

Publication number: 20130191618

Abstract: Various embodiments of the present invention provide systems and methods for data processing using variable scaling.

Type: Application

Filed: March 8, 2013

Publication date: July 25, 2013

Applicant: LSI Corporation

Inventor: LSI Corporation
Simultaneous execution resumption of multiple processor cores after core state information dump to facilitate debugging via multi-core processor simulator using the state information

Patent number: 8495344

Abstract: A multi-core microprocessor includes first and second processing cores and a bus coupling the first and second processing cores. The bus conveys messages between the first and second processing cores. The cores are configured such that: the first core stops executing user instructions and interrupts the second core via the bus, in response to detecting a predetermined event; the second core stops executing user instructions, in response to being interrupted by the first core; each core outputs its state after it stops executing user instructions; and each core waits to begin fetching and executing user instructions until it receives a notification from the other core via the bus that the other core is ready to begin fetching and executing user instructions. In one embodiment, the predetermined event comprises detecting that the first core has retired a predetermined number of instructions. In one embodiment, microcode waits for the notification.

Type: Grant

Filed: March 29, 2010

Date of Patent: July 23, 2013

Assignee: VIA Technologies, Inc.

Inventors: G. Glenn Henry, Jui-Shuan Chen
Rapid polymer sequencer

Patent number: 8494782

Abstract: Method and system for rapid and accurate determination of each of a sequence of unknown polymer components, such as nucleic acid components. A self-assembling monolayer of a selected substance is optionally provided on an interior surface of a pipette tip, and the interior surface is immersed in a selected liquid. A selected electrical field is impressed in a longitudinal direction, or in a transverse direction, in the tip region, a polymer sequence is passed through the tip region, and a change in an electrical current signal is measured as each polymer component passes through the tip region. Each of the measured changes in electrical current signals is compared with a database of reference electrical change signals, with each reference signal corresponding to an identified polymer component, to identify the unknown polymer component with a reference polymer component.

Type: Grant

Filed: April 21, 2011

Date of Patent: July 23, 2013

Assignee: The United States of America as Represented by the Administrator of the National Aeronautics & Space Administration (NASA)

Inventors: Viktor Stolc, Matthew W Brock
Apparatus and method for detection and correction of denormal speculative floating point operand

Patent number: 8495343

Abstract: A microprocessor includes a plurality of execution units configured to receive instructions and operands thereof and to execute the instructions. An instruction scheduler issues the instructions to the execution units and selects sources of the instruction operands. At least one of the execution units detects one of the operands of one of the instructions is a denormal operand, generates an indication that the instruction needs to be replayed in response to detecting the denormal operand, and provides the denormal operand to the instruction scheduler in response to detecting the denormal operand, rather than normalizing the denormal operand. The instruction scheduler normalizes the denormal operand, in response to the indication, and causes the normalized operand, rather than the denormal operand, to be provided to the execution unit when the instruction is replayed.

Type: Grant

Filed: June 4, 2010

Date of Patent: July 23, 2013

Assignee: VIA Technologies, Inc.

Inventors: G. Glenn Henry, Gerard M. Col, Timothy A. Elliott, Rodney E. Hooker, Terry Parks
Processor having execution core sections operating at different clock rates

Patent number: RE44494

Abstract: A processor including a first execution core section clocked to perform execution operations at a first clock frequency, and a second execution core section clocked to perform execution operations at a second clock frequency which is different than the first clock frequency. The second execution core section runs faster and includes a data cache and critical ALU functions, while the first execution core section includes latency-tolerant functions such as instruction fetch and decode units and non-critical ALU functions. The processor may further include an I/O ring which may be still slower than the first execution core section. Optionally, the first execution core section may include a third execution core section whose clock rate is between that of the first and second execution core sections. Clock multipliers/dividers may be used between the various sections to derive their clocks from a single source, such as the I/O clock.

Type: Grant

Filed: November 24, 2004

Date of Patent: September 10, 2013

Assignee: Intel Corporation

Inventors: David J. Sager, Thomas D. Fletcher, Glenn J. Hinton, Michael D. Upton
