Architecture-Based Instruction Processing Patents (Class 712/200)
-
Patent number: 8675002
Abstract: A method for providing two or more processors access to a single command buffer is provided. The method includes receiving instructions in the command buffer from a central processor, at least one of the instructions being designated for a particular one of the two or more processors. The method also includes sending the at least one instruction to only the particular processor.
Type: Grant
Filed: June 9, 2010
Date of Patent: March 18, 2014
Assignee: ATI Technologies ULC
Inventors: Joseph Andonieh, Arshad Rahman
-
Publication number: 20140068227
Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor mask extraction from a general purpose register in response to a single mask extraction from a general purpose register instruction that includes a source general purpose register operand, a destination writemask register operand, an immediate value, and an opcode are described.
Type: Application
Filed: December 22, 2011
Publication date: March 6, 2014
Inventors: Bret L. Toll, Robert Valentine, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney
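As a rough software model of what such an instruction does, the immediate can be read as a bit offset selecting where in the general purpose register the writemask bits begin. The offset interpretation and the fixed mask width below are assumptions for this sketch, not the instruction's documented semantics:

```python
def extract_mask(gpr_value, imm, mask_width=8):
    """Extract a mask_width-bit writemask from a general purpose register,
    starting at the bit offset given by the immediate (illustrative model)."""
    return (gpr_value >> imm) & ((1 << mask_width) - 1)

# Pull the upper nibble-aligned bits of the register into a mask.
mask = extract_mask(0b1011_0110, 4)
```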
-
Publication number: 20140047213
Abstract: A system and method for implementing memory overlays for portable pointer variables. The method includes providing a program executable by a heterogeneous processing system comprising a plurality of processors running a plurality of instruction set architectures (ISAs). The method also includes providing a plurality of processor specific functions associated with a function pointer in the program. The method includes executing the program by a first processor. The method includes dereferencing the function pointer by mapping the function pointer to a corresponding processor specific function based on which processor in the plurality of processors is executing the program.
Type: Application
Filed: August 8, 2012
Publication date: February 13, 2014
Applicant: NVIDIA CORPORATION
Inventor: Olivier Giroux
-
Patent number: 8638805
Abstract: Described embodiments provide for restructuring a scheduling hierarchy of a network processor having a plurality of processing modules and a shared memory. The scheduling hierarchy schedules packets for transmission. The network processor generates tasks corresponding to each received packet associated with a data flow. A traffic manager receives tasks provided by one of the processing modules and determines a queue of the scheduling hierarchy corresponding to the task. The queue has a parent scheduler at each of one or more next levels of the scheduling hierarchy up to a root scheduler, forming a branch of the hierarchy. The traffic manager determines if the queue and one or more of the parent schedulers of the branch should be restructured. If so, the traffic manager drops subsequently received tasks for the branch, drains all tasks of the branch, and removes the corresponding nodes of the branch from the scheduling hierarchy.
Type: Grant
Filed: September 30, 2011
Date of Patent: January 28, 2014
Assignee: LSI Corporation
Inventors: Balakrishnan Sundararaman, Shashank Nemawarkar, David Sonnier, Shailendra Aulakh, Allen Vestal
-
Patent number: 8635656
Abstract: A real-time video transmission system includes a wireless video input device, a wireless data transmission interface and a computer. The computer includes a virtual camera module, a data management module and an application module. The data management module repeatedly accesses a register. The virtual camera module writes a received real-time video request into the register. When the data management module obtains the real-time video request from the register during accessing of the same, the wireless video input device is driven to film and send back a real-time video stream. The data management module writes the received real-time video stream into the register. When the received real-time video stream is obtained from the register during accessing of the same, the virtual camera module transmits the real-time video stream to the application module.
Type: Grant
Filed: October 7, 2011
Date of Patent: January 21, 2014
Assignee: Chicony Electronics Co., Ltd.
Inventors: Wei-Cheng Huang, Mei-Yi Tsai, Chien-Yu Chen
-
Publication number: 20140019719
Abstract: Methods of bit manipulation within a computer processor are disclosed. Improved flexibility in bit manipulation proves helpful in computing elementary functions critical to the performance of many programs and for other applications. In one embodiment, a unit of input data is shifted/rotated and multiple non-contiguous bit fields from the unit of input data are inserted in an output register. In another embodiment, one of two units of input data is optionally shifted or rotated, the two units of input data are partitioned into a plurality of bit fields, bitwise operations are performed on each bit field, and pairs of bit fields are combined with either an AND or an OR bitwise operation. Embodiments are also disclosed to simultaneously perform these processes on multiple units and pairs of units of input data in a Single Instruction, Multiple Data processing environment capable of performing logical operations on floating point data.
Type: Application
Filed: July 11, 2012
Publication date: January 16, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Christopher K. Anand, Simon C. Broadhead, Robert F. Enenkel
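The first embodiment above (rotate a unit of input data, then deposit several non-contiguous bit fields into an output register) can be sketched in software. The field-triple encoding is an assumption chosen for readability, not the patented instruction format:

```python
def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n &= 31
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def insert_fields(src, dst, fields):
    """Insert non-contiguous bit fields from src into dst.

    fields is a list of (src_pos, dst_pos, width) triples; each field is
    extracted from src and deposited into dst at dst_pos."""
    for src_pos, dst_pos, width in fields:
        mask = (1 << width) - 1
        bits = (src >> src_pos) & mask
        dst = (dst & ~(mask << dst_pos)) | (bits << dst_pos)
    return dst & 0xFFFFFFFF

# Rotate the input first, then scatter two byte-wide fields into the output.
rotated = rotl32(0x12345678, 8)
out = insert_fields(rotated, 0, [(0, 0, 8), (16, 24, 8)])
```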
-
Publication number: 20140019718
Abstract: Embodiments of computer-implemented methods, systems, computing devices, and computer-readable media are described herein for vectorized searching for a pattern P within a set of data T, the pattern P having a length m. In various embodiments, the vectorized search may include a shift of a sliding window into T by a distance d that is greater than m on determination, based on one or more ordered vectorized comparisons of portions of P and T, that no potential match of P is found within the sliding window. In various embodiments, d and m may be positive integers. In various embodiments, the one or more ordered vectorized comparisons may include one or more single instruction multiple data ("SIMD") instructions supported by the processor.Type: Application
Filed: July 10, 2012
Publication date: January 16, 2014
Inventor: Shihjong J. Kuo
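The key idea, skipping by a distance d larger than the pattern length m when a whole window is ruled out, can be modeled in plain Python. The SIMD comparison is stood in for by a membership test over a block, and the block width of 16 is an arbitrary choice for this sketch:

```python
def block_search(pattern, text, block=16):
    """Find pattern in text by scanning block-wide windows, skipping a
    whole block (d = block > m) when the pattern's first byte is absent
    from it; a scalar stand-in for the vectorized comparisons."""
    m, n = len(pattern), len(text)
    i = 0
    while i <= n - m:
        window = text[i:i + block]
        if pattern[0] not in window:
            i += block  # no potential match anywhere in the window
            continue
        # Verify each candidate position inside the block.
        for j in range(len(window)):
            if window[j] == pattern[0] and text[i + j:i + j + m] == pattern:
                return i + j
        i += block
    return -1
```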
-
Publication number: 20130332703
Abstract: A method of sharing a plurality of registers in a shared register pool among a plurality of microprocessor threads begins with a determination that a first instruction to be executed by a microprocessor in a first microprocessor thread requires a first logical register. Next a determination is made that a second instruction to be executed by the microprocessor in a second microprocessor thread requires a second logical register. A first physical register in the shared register pool is allocated to the first microprocessor thread for execution of the first instruction and the first logical register is mapped to the first physical register. A second physical register in the shared register pool is allocated to the second microprocessor thread for execution of the second instruction. Finally, the second logical register is mapped to the second physical register.
Type: Application
Filed: June 8, 2012
Publication date: December 12, 2013
Applicant: MIPS Technologies, Inc.
Inventor: Ilie Garbacea
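The mapping described above is essentially a rename table keyed by (thread, logical register) over one shared pool of physical registers. A minimal sketch, with the class and allocation policy invented for illustration:

```python
class SharedRegisterPool:
    """Map (thread, logical register) pairs onto a shared pool of physical
    registers, so two threads naming the same logical register receive
    distinct physical registers (simplified model of the abstract above)."""

    def __init__(self, num_physical):
        self.free = list(range(num_physical))
        self.rename = {}  # (thread, logical) -> physical

    def allocate(self, thread, logical):
        key = (thread, logical)
        if key in self.rename:          # already mapped for this thread
            return self.rename[key]
        phys = self.free.pop(0)         # take the next free physical register
        self.rename[key] = phys
        return phys

pool = SharedRegisterPool(4)
p0 = pool.allocate(0, "r1")  # thread 0, logical r1
p1 = pool.allocate(1, "r1")  # thread 1's r1 gets a different physical register
```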
-
Patent number: 8578387
Abstract: An embodiment of a computing system is configured to process data using a multithreaded SIMD architecture that includes heterogeneous processing engines to execute a program. The program is constructed of various program instructions. A first type of the program instructions can only be executed by a first type of processing engine and a third type of program instructions can only be executed by a second type of processing engine. A second type of program instructions can be executed by the first and the second type of processing engines. An assignment unit may be configured to dynamically determine which of the two processing engines executes any program instructions of the second type in order to balance the workload between the heterogeneous processing engines.
Type: Grant
Filed: July 31, 2007
Date of Patent: November 5, 2013
Assignee: Nvidia Corporation
Inventors: Peter C. Mills, Stuart F. Oberman, John Erik Lindholm, Samuel Liu
-
Patent number: 8572586
Abstract: Mechanisms for optimizing scalar code executed on a single instruction multiple data (SIMD) engine are provided. Placement of vector operation-splat operations may be determined based on an identification of scalar and SIMD operations in an original code representation. The original code representation may be modified to insert the vector operation-splat operations based on the determined placement of vector operation-splat operations to generate a first modified code representation. Placement of separate splat operations may be determined based on identification of scalar and SIMD operations in the first modified code representation. The first modified code representation may be modified to insert or delete separate splat operations based on the determined placement of the separate splat operations to generate a second modified code representation. SIMD code may be output based on the second modified code representation for execution by the SIMD engine.
Type: Grant
Filed: July 23, 2012
Date of Patent: October 29, 2013
Assignee: International Business Machines Corporation
Inventors: Alexandre E. Eichenberger, Michael K. Gschwind, John A. Gunnels
-
Patent number: 8555033
Abstract: A method, an apparatus, and computer instructions are provided for extending operations of an application in a data processing system. A primary operation is executed. All extended operations of the primary operation are cached and pre and post operation identifiers are identified. For each pre operation identifier, a pre operation instance is created and executed. For each post operation identifier, a post operation instance is created and executed.
Type: Grant
Filed: July 20, 2011
Date of Patent: October 8, 2013
Assignee: International Business Machines Corporation
Inventors: Daniel C. Berg, Charles D. Bridgham, Derek F. Holt, Ritchard L. Schacher, Jason A. Sholl
-
Patent number: 8554856
Abstract: A computer system includes one or more devices that are capable of multitasking (performing at least two tasks in parallel or substantially in parallel). In response to detecting that one of the devices is performing a first one of the tasks, the system prevents the devices from performing at least one of the tasks other than the first task (such as all of the tasks other than the first task). In response to detecting that one of the devices is performing a second one of the tasks, the system prevents the devices from performing at least one of the tasks other than the second task (such as all of the tasks other than the second task).
Type: Grant
Filed: November 8, 2011
Date of Patent: October 8, 2013
Assignee: Yagi Corp.
Inventor: Robert Plotkin
-
Publication number: 20130262820
Abstract: Systems and methods for event logging in a just-in-time static translation system are disclosed. One method includes executing a workload in a computing system having a native instruction set architecture, the workload stored in one or more banks of non-native instructions. At least a portion of the workload is further included in one or more banks of native instructions and executing the workload comprises executing at least part of the workload from the one or more banks of native instructions. The method also includes determining an amount of time during execution of the workload in which the execution of the workload occurs from the one or more banks of native instructions. The method includes generating a log including performance statistics generated during execution of the workload, the performance statistics including the amount of time.
Type: Application
Filed: March 28, 2012
Publication date: October 3, 2013
Inventors: Michael J. Rieschl, Thomas L. Nowatzki, James F. Merten, Nathan Zimmer
-
Publication number: 20130227250
Abstract: Some example embodiments include an apparatus for comparing a first operand to a second operand. The apparatus includes a SIMD accelerator configured to compare first multiple parts (e.g., bytes) of the first operand to second multiple parts (e.g., bytes) of the second operand. The SIMD accelerator includes ones' complement subtraction logic and twos' complement logic configured to perform logic operations on the multiple parts of the first operand and the multiple parts of the second operand to generate a group of carry out and propagate data across bits of the multiple parts. At least a portion of the group of carry out and propagate data is reused in the group of logic operations.
Type: Application
Filed: February 24, 2012
Publication date: August 29, 2013
Applicant: International Business Machines Corporation
Inventors: Wilhelm Haller, Ulrich Krauch, Kurt Lind, Friedrich Schroeder, Alexander Woerner
-
Publication number: 20130219150
Abstract: A method for implementing a hardware design that includes using a computer for receiving structured data that includes a representation of a basic hardware structure and a complex hardware structure that includes the basic hardware structure, parsing the structured data and generating, based on a result of the parsing, commands of a hardware design environment.
Type: Application
Filed: February 19, 2013
Publication date: August 22, 2013
Applicant: International Business Machines Corporation
Inventor: International Business Machines Corporation
-
Patent number: 8516496
Abstract: An electronic device comprising decode logic that decodes instructions and a stack coupled to the decode logic. A group of instructions causes the decode logic to push onto the stack, after halting processing of a first thread at a switch point and prior to processing a second thread, a minimum amount of information needed to resume execution of the first thread at the switch point and not information not needed to resume execution of the first thread at the switch point.
Type: Grant
Filed: July 21, 2005
Date of Patent: August 20, 2013
Assignee: Texas Instruments Incorporated
Inventors: Gilbert Cabillic, Gerard Chauvel
-
Patent number: 8504801
Abstract: A semiconductor device correctly switches endian modes regardless of the current endian mode of an interface. The semiconductor device includes a switching circuit and a first register. The switching circuit switches an interface to be used in big endian or little endian mode. The first register holds control data of the switching circuit. The switching circuit sets the interface in little endian mode when first predetermined control information is supplied to the first register, and sets the interface in big endian mode when second predetermined control information is supplied to the first register. The control information can be correctly inputted without being influenced by the endian setting status.
Type: Grant
Filed: October 28, 2012
Date of Patent: August 6, 2013
Assignee: Renesas Electronics Corporation
Inventors: Goro Sakamaki, Yuri Azuma
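The effect of the register-controlled mode switch, the same 32-bit value presented in opposite byte orders on the interface, can be illustrated with a small Python sketch (the function and its boolean control flag are inventions for this example, standing in for the control information written to the first register):

```python
import struct

def reencode(value, big_endian):
    """Serialize a 32-bit value in the selected endian mode, mirroring
    the switching circuit's choice of byte order on the interface."""
    fmt = ">I" if big_endian else "<I"
    return struct.pack(fmt, value)

big = reencode(0x12345678, big_endian=True)
little = reencode(0x12345678, big_endian=False)  # same value, bytes reversed
```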
-
Publication number: 20130198489
Abstract: Stream applications may inefficiently use the hardware resources that execute the processing elements of the data stream. For example, a compute node may host four processing elements and execute each using a CPU. However, other CPUs on the compute node may sit idle. To take advantage of these available hardware resources, a stream programmer may identify one or more processing elements that may be cloned. The cloned processing elements may be used to generate a different execution path that is parallel to the execution path that includes the original processing elements. Because the cloned processing elements contain the same operators as the original processing elements, the data stream that was previously flowing through only the original processing element may be split and sent through both the original and cloned processing elements. In this manner, the parallel execution path may use underutilized hardware resources to increase the throughput of the data stream.
Type: Application
Filed: December 10, 2012
Publication date: August 1, 2013
Applicant: International Business Machines Corporation
Inventor: International Business Machines Corporation
-
Publication number: 20130191614
Abstract: In one embodiment, the present invention includes a method for receiving incoming data in a processor and performing a checksum operation on the incoming data in the processor pursuant to a user-level instruction for the checksum operation. For example, a cyclic redundancy checksum may be computed in the processor itself responsive to the user-level instruction. Other embodiments are described and claimed.
Type: Application
Filed: March 12, 2013
Publication date: July 25, 2013
Inventors: Steven King, Frank Berry, Michael Kounavis
-
Patent number: 8484441
Abstract: A computer processor with control and data processing capabilities comprises a decode unit for decoding instructions. A data processing facility comprises a first data execution path including fixed operators and a second data execution path including at least configurable operators, the configurable operators having a plurality of predefined configurations, at least some of which are selectable by means of an opcode portion of a data processing instruction. The decode unit is operable to detect whether a data processing instruction defines a fixed data processing operation or a configurable data processing operation, said decode unit causing the computer system to supply data for processing to said first data execution path when a fixed data processing instruction is detected and to said configurable data execution path when a configurable data processing instruction is detected.
Type: Grant
Filed: March 31, 2004
Date of Patent: July 9, 2013
Assignee: Icera Inc.
Inventor: Simon Knowles
-
Publication number: 20130159667
Abstract: A computer has a memory adapted to store a first plurality of instructions encoded with a first vector size and a second plurality of instructions encoded with a second vector size. An execution unit executes the first plurality of instructions and the second plurality of instructions by processing vector units in a uniform manner regardless of vector size.
Type: Application
Filed: December 16, 2011
Publication date: June 20, 2013
Applicant: MIPS Technologies, Inc.
Inventor: Ilie Garbacea
-
Publication number: 20130159668
Abstract: A circuit arrangement, method, and program product for substituting a plurality of scalar instructions in an instruction stream with a functionally equivalent vector instruction for execution by a vector execution unit. Predecode logic is coupled to an instruction buffer which stores instructions in an instruction stream to be executed by the vector execution unit. The predecode logic analyzes the instructions passing through the instruction buffer to identify a plurality of scalar instructions that may be replaced by a vector instruction in the instruction stream. The predecode logic may generate the functionally equivalent vector instruction based on the plurality of scalar instructions, and the functionally equivalent vector instruction may be substituted into the instruction stream, such that the vector execution unit executes the vector instruction in lieu of the plurality of scalar instructions.
Type: Application
Filed: December 20, 2011
Publication date: June 20, 2013
Applicant: International Business Machines Corporation
Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
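The substitution itself is a peephole rewrite over the instruction stream: spot a run of equivalent scalar operations and emit one vector operation in their place. A toy model, with the tuple-based instruction encoding and the fusion rule invented for illustration:

```python
def fuse_scalar_adds(stream):
    """Replace runs of two or more adjacent scalar adds with a single
    vector add, a simplified model of the predecode substitution above."""
    out, run = [], []
    for ins in stream + [None]:  # None flushes the final run
        if ins is not None and ins[0] == "add":
            run.append(ins)
            continue
        if len(run) >= 2:
            out.append(("vadd", [reg for _, reg in run]))
        else:
            out.extend(run)      # a lone scalar add stays scalar
        run = []
        if ins is not None:
            out.append(ins)
    return out

fused = fuse_scalar_adds([("add", "r0"), ("add", "r1"), ("mul", "r2")])
```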
-
Publication number: 20130159665
Abstract: A data processing element includes an input unit configured to provide instructions for scalar, vector and array processing, and a scalar processing unit configured to provide a scalar pipeline datapath for processing a scalar quantity. Additionally, the data processing element includes a vector processing unit coupled to the scalar processing unit and configured to provide a vector pipeline datapath employing a vector register for processing a one-dimensional vector quantity. The data processing element further includes an array processing unit coupled to the vector processing unit and configured to provide an array pipeline datapath employing a parallel processing structure for processing a two-dimensional vector quantity. A method of operating a data processing element and a MIMO receiver employing a data processing element are also provided.
Type: Application
Filed: December 15, 2011
Publication date: June 20, 2013
Applicant: VeriSilicon Holdings Co., Ltd.
Inventor: Asheesh Kashyap
-
Publication number: 20130145121
Abstract: A stream application may allocate processing elements to one or more compute nodes (or hosts) to achieve a desired optimization goal. Each optimization mode may define processing element selection criteria and/or host selection criteria. When allocating a processing element to a host, a scheduler may place each processing element individually. Accordingly, the scheduler may use the processing element selection criteria for selecting which processing element in the stream application to allocate next. The scheduler may then determine, based on one or more constraints, which host the processing element can be placed on. If the scheduler determines that multiple hosts are suitable candidates for the processing element, it may use the host selection criteria to pick one of the candidate hosts that further optimize the stream application to meet the desired goal.
Type: Application
Filed: December 11, 2012
Publication date: June 6, 2013
Applicant: International Business Machines Corporation
Inventor: International Business Machines Corporation
-
Publication number: 20130138922
Abstract: Systems and methods are disclosed for enhancing the throughput of a processor by minimizing the number of transfers of data associated with data transfer between a register file and a memory stack. The register file used by a processor running an application is partitioned into a number of blocks. A subset of the blocks of the register file is defined in an application binary interface enabling the subset to be pre-allocated and exposed to the application binary interface. Optionally, blocks other than the subset are not exposed to the application binary interface so that the data relating to application function switch or a context switch is not transferred between the unexposed blocks and a memory stack.
Type: Application
Filed: November 29, 2011
Publication date: May 30, 2013
Applicant: International Business Machines Corporation
Inventors: Revital Eres, Amit Golander, Nadav Levison, Sagi Manole, Ayal Zaks
-
Patent number: 8452944
Abstract: An information processing apparatus includes: a first pipeline having first nodes, and moving data held in each first node to a first node located in a first direction; a second pipeline having second nodes corresponding to the first nodes, and moving data held in each second node to a second node located in a second direction that is opposite to the first direction; a first comparison unit arranged to compare data held in a node of interest with data held in a second node corresponding to the node of interest, where the node of interest is one of the first nodes; and a second comparison unit arranged to compare the data held in the node of interest with data held in a second node located one node on an upstream or downstream side of the second node corresponding to the node of interest.
Type: Grant
Filed: April 27, 2010
Date of Patent: May 28, 2013
Assignee: Canon Kabushiki Kaisha
Inventor: Tadayuki Ito
-
Patent number: 8423605
Abstract: Provided is a parallel distributed processing method executed by a computer system comprising a parallel-distributed-processing control server, a plurality of extraction processing servers and a plurality of aggregation processing servers. The managed data includes at least first and second data items, the plurality of data items each including a value. The method includes a step of extracting data from one of the plurality of chunks according to a value in the second data item, to thereby group the data, a step of merging groups having the same value in the second data item based on an order of a value in the first data item of data contained in a group among groups, and a step of processing data in a group obtained through the merging by focusing on the order of the value in the first data item.
Type: Grant
Filed: March 17, 2010
Date of Patent: April 16, 2013
Assignee: Hitachi, Ltd.
Inventor: Ryo Kawai
-
Publication number: 20130091339
Abstract: An apparatus and method for creation of reordered vectors from sequential input data for block based decimation, filtering, interpolation and matrix transposition using a memory circuit for a Single Instruction, Multiple Data (SIMD) Digital Signal Processor (DSP). This memory circuit includes a two-dimensional storage array, a rotate-and-distribute unit, a read-controller and a write-controller, to map input vectors containing sequential data elements in columns of the two-dimensional array and extract reordered target vectors from this array. The data elements and memory configuration are received from the SIMD DSP.
Type: Application
Filed: October 5, 2011
Publication date: April 11, 2013
Applicant: ST-Ericsson SA
Inventors: David Van Kampen, Kees Van Berkel, Sven Goossens, Wim Kloosterhuis, Claudiu Zissulescu-Ianculescu
-
Patent number: 8417961
Abstract: Techniques relating to a processor including instruction support for implementing a cyclic redundancy check (CRC) operation. The processor may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit configured to receive instructions that include a first instance of a cyclic redundancy check (CRC) instruction defined within the ISA, where the first instance of the CRC instruction is executable by the cryptographic unit to perform a first CRC operation on a set of data that produces a checksum value. In one embodiment, the cryptographic unit is configured to generate the checksum value using a generator polynomial of 0x11EDC6F41.
Type: Grant
Filed: March 16, 2010
Date of Patent: April 9, 2013
Assignee: Oracle International Corporation
Inventors: Christopher H. Olson, Gregory F. Grohoski, Lawrence A. Spracklen
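The generator polynomial 0x11EDC6F41 is the CRC-32C (Castagnoli) polynomial, the same family used by iSCSI and by the x86 CRC32 instruction. What the patent implements in hardware can be checked against a software reference; the sketch below is the common bit-reflected formulation (using the reversed polynomial 0x82F63B78), not the patented circuit:

```python
def crc32c(data, crc=0):
    """Bitwise reflected CRC-32C over data, the checksum generated by the
    polynomial 0x11EDC6F41 cited in the abstract (software reference)."""
    crc ^= 0xFFFFFFFF                       # standard initial value
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 * (crc & 1))
    return crc ^ 0xFFFFFFFF                 # standard final XOR

checksum = crc32c(b"123456789")  # standard CRC-32C check value
```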
-
Publication number: 20130067199
Abstract: A microprocessor capable of running both x86 instruction set architecture (ISA) machine language programs and Advanced RISC Machines (ARM) ISA machine language programs. The microprocessor includes a mode indicator that indicates whether the microprocessor is currently fetching instructions of an x86 ISA or ARM ISA machine language program. The microprocessor also includes a plurality of model-specific registers (MSRs) that control aspects of the operation of the microprocessor. When the mode indicator indicates the microprocessor is currently fetching x86 ISA machine language program instructions, each of the plurality of MSRs is accessible via an x86 ISA RDMSR/WRMSR instruction that specifies an address of the MSR. When the mode indicator indicates the microprocessor is currently fetching ARM ISA machine language program instructions, each of the plurality of MSRs is accessible via an ARM ISA MRRC/MCRR instruction that specifies the address of the MSR.
Type: Application
Filed: March 6, 2012
Publication date: March 14, 2013
Applicant: VIA Technologies, Inc.
Inventors: G. Glenn Henry, Terry Parks, Rodney E. Hooker
-
Patent number: 8397233
Abstract: A device includes an input processing unit and an output processing unit. The input processing unit dispatches first data to one of a group of processing engines, records an identity of the one processing engine in a location in a first memory, reserves one or more corresponding locations in a second memory, causes the first data to be processed by the one processing engine, and stores the processed first data in one of the locations in the second memory. The output processing unit receives second data, assigns an entry address corresponding to a location in an output memory to the second data, transfers the second data and the entry address to one of a group of second processing engines, causes the second data to be processed by the second processing engine, and stores the processed second data to the location in the output memory.
Type: Grant
Filed: May 23, 2007
Date of Patent: March 12, 2013
Assignee: Juniper Networks, Inc.
Inventors: Raymond Marcelino Manese Lim, Stefan Dyckerhoff, Jeffrey Glenn Libby, Teshager Tesfaye
-
Publication number: 20130013894
Abstract: A RISC data processor in which the number of flags generated by each instruction is increased so that a decrease of flag-generating instructions exceeds an increase of flag-using instructions in quantity, thereby achieving the decrease in instructions. An instruction for generating flags according to operands' data sizes is defined, and an instruction set handled by the RISC data processor includes an instruction capable of executing an operation on operands in more than one data size. An identical operation process is conducted on the small-size operand and on low-order bits of the large-size operand, and flags are generated capable of coping with the respective data sizes regardless of the data size of each operand subjected to the operation.Thus, a reduction in instruction code space of the RISC data processor can be achieved.
Type: Application
Filed: September 6, 2012
Publication date: January 10, 2013
Inventor: Fumio Arakawa
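Size-aware flag generation means the same arithmetic result yields different condition flags depending on the operand width under which it is viewed. A small sketch of that idea (the flag names Z and N and the dictionary shape are choices for this example, not the patent's encoding):

```python
def flags_for(result, size_bits):
    """Compute zero (Z) and negative (N) flags for a result as seen at a
    given operand width, echoing size-aware flag generation."""
    mask = (1 << size_bits) - 1
    val = result & mask
    return {"Z": val == 0, "N": bool(val >> (size_bits - 1))}

# The same value produces different flags at 8-bit and 16-bit widths:
f8 = flags_for(0x1100, 8)    # low 8 bits are zero, so Z is set
f16 = flags_for(0x1100, 16)  # nonzero at 16 bits, sign bit clear
```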
-
Patent number: 8347066
Abstract: Replay instruction morphing. One disclosed apparatus includes an execution unit to execute an instruction. A replay system replays an altered instruction if the execution unit executes the instruction erroneously.
Type: Grant
Filed: February 28, 2005
Date of Patent: January 1, 2013
Assignee: Intel Corporation
Inventors: Douglas M. Carmean, David J. Sager, Thomas F. Toll, Karol F. Menezes
-
Patent number: 8341635
Abstract: A hardware wake-and-go mechanism is provided for a data processing system. The wake-and-go mechanism looks ahead in a thread for programming idioms that indicate that the thread is waiting for an event. The wake-and-go mechanism performs a look-ahead polling operation for each of the programming idioms. If each of the look-ahead polling operations fails, then the wake-and-go mechanism updates a wake-and-go array with a target address associated with the event for each recognized programming idiom.
Type: Grant
Filed: February 1, 2008
Date of Patent: December 25, 2012
Assignee: International Business Machines Corporation
Inventors: Ravi K. Arimilli, Satya P. Sharma, Randal C. Swanberg
-
Patent number: 8341436
Abstract: Power-state transitioning arrangements are implemented using a variety of methods. Using one such method, a power-state transitioning circuit arrangement is implemented having a processing circuit that does not include an arithmetic logic unit. A power-state transition script including instructions from an instruction set is stored in a memory circuit. The processing circuit implements the power-state transition script to facilitate a change in a power-state of another processor circuit.
Type: Grant
Filed: October 24, 2008
Date of Patent: December 25, 2012
Assignee: ST-Ericsson SA
Inventor: Greg Ehmann
-
Publication number: 20120311303
Abstract: A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.
Type: Application
Filed: August 13, 2012
Publication date: December 6, 2012
Applicant: MicroUnity Systems Engineering, Inc.
Inventors: Craig Hansen, John Moussouris, Alexia Massalin
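The core trick, a general purpose register holding a descriptor (address plus size/shape) of a wider operand in memory rather than the operand itself, can be modeled simply. The descriptor bit layout below is purely an assumption for this sketch; the patent does not commit to this encoding:

```python
def wide_operand(memory, descriptor):
    """Fetch an operand wider than the register: the register value is a
    descriptor packing a base address (high bits) and a log2 size (low
    byte), a model of the indirection described above."""
    base = descriptor >> 8            # assumed: high bits hold the address
    size = 1 << (descriptor & 0xFF)   # assumed: low byte holds log2(words)
    return memory[base:base + size]

mem = list(range(64))
op = wide_operand(mem, (16 << 8) | 3)  # base 16, 2**3 = 8 words
```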
-
Publication number: 20120311302
Abstract: A parallel operation processing apparatus and method using a Single Instruction Multiple Data (SIMD) processor are provided. The parallel operation processing apparatus may combine input data of source nodes in a current column with input data of source nodes in a previous column, and may store the combined input data.
Type: Application
Filed: February 2, 2012
Publication date: December 6, 2012
Inventors: Ho Yang, Hyun Seok Lee
-
Patent number: 8316217
Abstract: A semiconductor device correctly switches endian modes regardless of the current endian mode of an interface. The semiconductor device includes a switching circuit and a first register. The switching circuit switches an interface to be used in big endian or little endian mode. The first register holds control data of the switching circuit. The switching circuit sets the interface in little endian mode when first predetermined control information is supplied to the first register, and sets the interface in big endian mode when second predetermined control information is supplied to the first register. The control information can be input correctly without being influenced by the current endian setting.
Type: Grant
Filed: December 16, 2011
Date of Patent: November 20, 2012
Assignee: Renesas Electronics Corporation
Inventors: Goro Sakamaki, Yuri Azuma
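One way a control word can survive the current endian setting is to make it a byte-order palindrome, so a byte-swapped write decodes to the same value. The sketch below assumes that scheme; the control values and power-on default are hypothetical, not taken from the patent.

```python
# Illustrative model of an endian-switching register. The control words
# are byte-order palindromes, so they decode identically whether or not
# the interface byte-swaps them on the way in.

LITTLE_CTRL = b"\x01\x00\x00\x01"   # hypothetical "first" control value
BIG_CTRL    = b"\x02\x00\x00\x02"   # hypothetical "second" control value

class EndianSwitch:
    def __init__(self):
        self.mode = "big"           # assumed power-on default

    def write_control(self, word: bytes):
        if word == LITTLE_CTRL:
            self.mode = "little"
        elif word == BIG_CTRL:
            self.mode = "big"

sw = EndianSwitch()
sw.write_control(LITTLE_CTRL[::-1])  # even byte-swapped, still decodes
```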
-
Publication number: 20120290816
Abstract: Mechanisms for optimizing scalar code executed on a single instruction multiple data (SIMD) engine are provided. Placement of vector operation-splat operations may be determined based on an identification of scalar and SIMD operations in an original code representation. The original code representation may be modified to insert the vector operation-splat operations based on the determined placement of vector operation-splat operations to generate a first modified code representation. Placement of separate splat operations may be determined based on identification of scalar and SIMD operations in the first modified code representation. The first modified code representation may be modified to insert or delete separate splat operations based on the determined placement of the separate splat operations to generate a second modified code representation. SIMD code may be output based on the second modified code representation for execution by the SIMD engine.
Type: Application
Filed: July 23, 2012
Publication date: November 15, 2012
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Alexandre E. Eichenberger, Michael K. Gschwind, John A. Gunnels
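A splat replicates a scalar into every SIMD lane so it can combine with a vector operand; that is the operation whose placement the abstract above is optimizing. A minimal pure-Python sketch, with the lane count assumed:

```python
# "Splat" a scalar across SIMD lanes, then use it in a vector operation.
LANES = 4

def splat(scalar, lanes=LANES):
    # replicate the scalar into every lane of a vector register
    return [scalar] * lanes

def simd_add(a, b):
    # lane-wise add of two vectors
    return [x + y for x, y in zip(a, b)]

vec = [1, 2, 3, 4]
s = 10                          # value produced by scalar code
result = simd_add(vec, splat(s))
```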
-
Publication number: 20120260068
Abstract: An ISA-defined instruction includes an immediate field with first and second portions specifying first and second values, which instructs the microprocessor to perform an operation using a constant value as one of its source operands. The constant value is the first value rotated/shifted by a number of bits based on the second value. An instruction translator translates the instruction into one or more microinstructions. An execution pipeline executes the microinstructions generated by the instruction translator. The instruction translator, rather than the execution pipeline, generates the constant value for the execution pipeline as a source operand of at least one of the microinstructions for execution by the execution pipeline.
Type: Application
Filed: March 9, 2012
Publication date: October 11, 2012
Applicant: VIA TECHNOLOGIES, INC.
Inventors: G. Glenn Henry, Terry Parks, Rodney E. Hooker
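A sketch of the constant generation the translator performs: an 8-bit base value rotated right by twice a 4-bit rotate count. The field widths here are assumptions echoing ARM's modified-immediate encoding, not the patent's actual format.

```python
# Decode a two-part immediate into a 32-bit constant: the first portion
# is the base value, the second portion scales the rotate amount.

def rotate_right_32(value, amount):
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF

def decode_immediate(imm12):
    rot = (imm12 >> 8) & 0xF       # second portion: rotate count
    val = imm12 & 0xFF             # first portion: base value
    return rotate_right_32(val, 2 * rot)

const = decode_immediate(0x2FF)    # 0xFF rotated right by 4 bits
```

Because the translator computes `const` once at translate time, the execution pipeline receives a ready-made source operand instead of an extra rotate micro-op.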
-
Publication number: 20120260067
Abstract: A microprocessor includes a hardware instruction translator that translates x86 ISA and ARM ISA machine language program instructions into microinstructions, which are encoded in a distinct manner from the x86 and ARM instructions. An execution pipeline executes the microinstructions to generate x86/ARM-defined results. The microinstructions are distinct from the results generated by the execution of the microinstructions by the execution pipeline. The translator directly provides the microinstructions to the execution pipeline for execution. Each time the microprocessor performs one of the x86 ISA and ARM ISA instructions, the translator translates it into the microinstructions. An indicator indicates either x86 or ARM as a boot ISA. After reset, the microprocessor initializes its architectural state, fetches its first instructions from a reset address, and translates them all as defined by the boot ISA. An instruction cache caches the x86 and ARM instructions and provides them to the translator.
Type: Application
Filed: September 1, 2011
Publication date: October 11, 2012
Applicant: VIA Technologies, Inc.
Inventors: G. Glenn Henry, Terry Parks, Rodney E. Hooker
-
Publication number: 20120210099
Abstract: During operation, a processor generates a result vector. In particular, the processor records a value from an element at a key element position in an input vector into a base value. Next, for each active element in the result vector to the right of the key element position, the processor generates the result vector by setting the element in the result vector equal to a result of performing a unary operation on the base value a number of times equal to a number of relevant elements. The number of relevant elements is determined from the key element position to and including a predetermined element in the result vector, where the predetermined element in the result vector may be one of: a first element to the left of the element in the result vector; or the element in the result vector.
Type: Application
Filed: April 26, 2012
Publication date: August 16, 2012
Applicant: APPLE INC.
Inventor: Jeffry E. Gonion
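A hedged sketch of the running unary operation described above, assuming every element right of the key position is active, choosing increment as the unary operation, and counting relevant elements up to and including the current element (one of the two variants the abstract allows):

```python
# Each element right of the key position receives the base value with the
# unary op applied once more than its left neighbor (a "running" op).

def running_unary(vec, key, op):
    out = list(vec)
    acc = vec[key]                 # base value from the key position
    for i in range(key + 1, len(vec)):
        acc = op(acc)              # one more application per element
        out[i] = acc
    return out

r = running_unary([5, 0, 0, 0], 0, lambda x: x + 1)
```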
-
Patent number: 8239847
Abstract: General-purpose distributed data-parallel computing using high-level computing languages is described. Data parallel portions of a sequential program written in a high-level language are automatically translated into a distributed execution plan. Map and reduction computations are automatically added to the plan. Patterns in the sequential program can be automatically identified to trigger map and reduction processing. Direct invocation of map and reduction processing is also provided. One or more portions of the reduce computation are pushed to the map stage and dynamic aggregation is inserted when possible. The system automatically identifies opportunities for partial reductions and aggregation, but also provides a set of extensions in a high-level computing language for the generation and optimization of the distributed execution plan. The extensions include annotations to declare functions suitable for these optimizations.
Type: Grant
Filed: March 18, 2009
Date of Patent: August 7, 2012
Assignee: Microsoft Corporation
Inventors: Yuan Yu, Pradeep Kumar Gunda, Michael A. Isard
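A toy sketch of the key optimization in the abstract, pushing part of the reduction into the map stage as a partial (local) aggregation before the shuffle; the partitions and data are invented for illustration.

```python
# Partial reduction at the map stage: each partition is summed locally,
# so only one value per partition crosses the network to the reducer.

from functools import reduce

def map_stage(partition):
    return sum(partition)          # local partial aggregation

def reduce_stage(partials):
    return reduce(lambda a, b: a + b, partials, 0)

partitions = [[1, 2, 3], [4, 5], [6]]
partials = [map_stage(p) for p in partitions]
total = reduce_stage(partials)
```

This rewrite is safe only because addition is associative, which is exactly the kind of property the patent's annotations let a programmer declare.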
-
Publication number: 20120198208
Abstract: Various embodiments are disclosed that share a ROM pull-down logic circuit among multiple ports of a processing core. The processing core may include an execution unit (EU) having an array of read only memory (ROM) pull-down logic storing math functions. The ROM pull-down logic circuit may implement single instruction, multiple data (SIMD) operations. The ROM pull-down logic circuit may be operatively coupled with each of the multiple ports in a multi-port function sharing arrangement. Sharing the ROM pull-down logic circuit reduces the need to duplicate logic and may yield savings in both chip area and power.
Type: Application
Filed: December 28, 2011
Publication date: August 2, 2012
Inventors: Satish K. Damaraju, Subramaniam Maiyuran
-
Patent number: 8230425
Abstract: Methods and arrangements of assigning tasks to processors are discussed. Embodiments include transformations, code, state machines or other logic to detect an attempt to execute an instruction of a task on a processor not supporting the instruction (non-supporting processor). The method may involve selecting a processor supporting the instruction (supporting physical processor). In many embodiments, the method may include storing data about the attempt to execute the instruction and, based upon the data, making another assignment of the task to a physical processor supporting the instruction. In some embodiments, the method may include representing the instruction set of a virtual processor as the union of the instruction sets of the physical processors comprising the virtual processor and assigning a task to the virtual processor based upon the representing.
Type: Grant
Filed: July 30, 2007
Date of Patent: July 24, 2012
Assignee: International Business Machines Corporation
Inventors: Manish Ahuja, Nathan Fontenot, Jacob L. Moilanen, Joel H. Schopp, Michael T. Strosaker
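The union-of-instruction-sets idea can be sketched directly with Python sets; the class and instruction names below are hypothetical stand-ins for the patent's physical and virtual processors.

```python
# Virtual processor ISA = union of its physical processors' ISAs; on a
# fault, reassign the task to a physical processor supporting the opcode.

class PhysicalProcessor:
    def __init__(self, name, isa):
        self.name, self.isa = name, set(isa)

def virtual_isa(processors):
    out = set()
    for p in processors:
        out |= p.isa               # union of the physical ISAs
    return out

def reassign(processors, instruction):
    for p in processors:
        if instruction in p.isa:
            return p.name          # supporting physical processor
    raise RuntimeError("no processor supports " + instruction)

cpus = [PhysicalProcessor("p0", {"add", "mul"}),
        PhysicalProcessor("p1", {"add", "vec_mul"})]
target = reassign(cpus, "vec_mul")
```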
-
Publication number: 20120179895
Abstract: Method and apparatus for fast decoding of microinstructions are disclosed. An integrated circuit is disclosed wherein microinstructions are queued for execution in an execution unit having multiple pipelines where each pipeline is configured to execute a set of supported microinstructions. The execution unit receives microinstruction data including an operation code (opcode) or a complex opcode. The execution unit executes the microinstruction multiple times wherein the microinstruction is executed at least once to get an address value and at least once to get a result of an operation. The execution unit processes complex opcodes by utilizing both a load/store support and a simple opcode support by splitting the complex opcode into load/store and simple opcode components and creating an internal source/destination between the two components.
Type: Application
Filed: January 12, 2011
Publication date: July 12, 2012
Applicant: ADVANCED MICRO DEVICES, INC.
Inventors: Ganesh Venkataramanan, Emil Talpes
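A minimal sketch of the splitting described above: a complex load-op opcode becomes a load component and a simple-opcode component joined by an internal temporary. The micro-op tuple encoding and register names are invented.

```python
# Split a complex (load + operate) opcode into two components linked by
# an internal source/destination temporary "t0".

def split_complex(op, dest, src_reg, mem_addr):
    tmp = "t0"                                # internal source/destination
    return [("load", tmp, mem_addr),          # pass 1: address/load value
            (op, dest, src_reg, tmp)]         # pass 2: operation result

uops = split_complex("add", "r1", "r2", 0x40)
```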
-
Patent number: 8214623
Abstract: A method, an apparatus, and computer instructions are provided for extending operations of an application in a data processing system. A primary operation is executed. All extended operations of the primary operation are cached and pre and post operation identifiers are identified. For each pre operation identifier, a pre operation instance is created and executed. For each post operation identifier, a post operation instance is created and executed.
Type: Grant
Filed: August 1, 2008
Date of Patent: July 3, 2012
Assignee: International Business Machines Corporation
Inventors: Daniel Christopher Berg, Charles Dyer Bridgham, Derek Francis Holt, Ritchard Leonard Schacher, Jason Ashley Sholl
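The pre/post extension pattern can be sketched as a small hook registry: registered pre-operations run before the primary operation and post-operations after it. The registry layout and function names are assumptions, not the patent's API.

```python
# Run registered pre-operation instances, then the primary operation,
# then registered post-operation instances, logging each result.

pre_ops, post_ops = [], []
log = []

def register(pre=None, post=None):
    if pre:
        pre_ops.append(pre)
    if post:
        post_ops.append(post)

def run_primary(primary):
    for op in pre_ops:             # pre operation instances
        log.append(op())
    log.append(primary())          # the primary operation itself
    for op in post_ops:            # post operation instances
        log.append(op())

register(pre=lambda: "pre", post=lambda: "post")
run_primary(lambda: "primary")
```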
-
Publication number: 20120166765
Abstract: While fetching the instructions from a loop in program code, a processor calculates a number of times that a backward-branching instruction at the end of the loop will actually be taken when the fetched instructions are executed. Upon determining that the backward-branching instruction has been predicted taken more than the number of times that the branch instruction will actually be taken, the processor immediately commences a mispredict operation for the branch instruction, which comprises: (1) flushing fetched instructions from the loop that will not be executed from the processor, and (2) commencing fetching instructions from an instruction following the branch instruction.
Type: Application
Filed: March 7, 2012
Publication date: June 28, 2012
Applicant: APPLE INC.
Inventor: Jeffry E. Gonion
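A simplified sketch of the early-flush decision: compare the loop's actual trip count (computed at fetch time) with the predictor's iteration count and flush the over-fetched tail. One tag per predicted iteration is an assumption made to keep the model small.

```python
# If the predictor fetched more iterations than the loop will execute,
# split the fetched stream into the kept prefix and the flushed tail.

def fetch_loop(trip_count, predicted_iterations, fetched):
    if predicted_iterations > trip_count:     # over-predicted: mispredict
        kept = fetched[:trip_count]
        flushed = fetched[trip_count:]        # iterations that won't run
        return kept, flushed
    return fetched, []

kept, flushed = fetch_loop(3, 5, ["i0", "i1", "i2", "i3", "i4"])
```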
-
Publication number: 20120144162
Abstract: A runtime system implemented in accordance with the present invention provides an application platform for parallel-processing computer systems. Such a runtime system enables users to leverage the computational power of parallel-processing computer systems to accelerate/optimize numeric and array-intensive computations in their application programs. This enables greatly increased performance of high-performance computing (HPC) applications.
Type: Application
Filed: February 9, 2012
Publication date: June 7, 2012
Inventors: Matthew N. Papakipos, Brian K. Grant, Morgan S. McGuire, Christopher G. Demetriou
-
Publication number: 20120137108
Abstract: The present disclosure relates to placing a Boolean Processor on a chip with memory to eliminate memory latency issues in computing systems. An asynchronous implementation of a Boolean Processor Switched Memory can theoretically operate at terahertz speed and vastly improve the rate at which computationally relevant data is fed to a microprocessor or microcontroller. Boolean Processor Enhanced Memories hold the promise of increasing memory throughput by several orders of magnitude and shifting the burden of "catching up" to microprocessors and microcontrollers.
Type: Application
Filed: May 24, 2011
Publication date: May 31, 2012
Inventor: Kenneth Elmon Koch, III