Architecture-Based Instruction Processing Patents (Class 712/200)
  • Patent number: 8675002
    Abstract: A method for providing two or more processors access to a single command buffer is provided. The method includes receiving instructions in the command buffer from a central processor, at least one of the instructions being designated for a particular one of the two or more processors. The method also includes sending the at least one instruction to only the particular processor.
    Type: Grant
    Filed: June 9, 2010
    Date of Patent: March 18, 2014
    Assignee: ATI Technologies, ULC
    Inventors: Joseph Andonieh, Arshad Rahman
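    A minimal C sketch of the routing idea in the abstract above: each command in the shared buffer carries a target-processor field, and a dispatcher forwards a command only to the processor it names. The structure layout and names are illustrative assumptions, not taken from the patent.

      #include <stddef.h>
      #include <stdint.h>
      #include <stdio.h>

      #define NUM_PROCESSORS 2

      /* One entry in the shared command buffer: opaque payload plus the id
       * of the processor the command is designated for (hypothetical layout). */
      struct command {
          uint32_t target;
          uint32_t payload;
      };

      /* Per-processor handler; here it only reports what it received. */
      static void deliver(uint32_t proc, const struct command *c) {
          printf("processor %u executes command 0x%08x\n", proc, c->payload);
      }

      /* Walk the shared buffer and send each command only to its target. */
      static void dispatch(const struct command *buf, size_t n) {
          for (size_t i = 0; i < n; i++)
              if (buf[i].target < NUM_PROCESSORS)
                  deliver(buf[i].target, &buf[i]);
      }

      int main(void) {
          struct command buf[] = { {0, 0x10}, {1, 0x20}, {0, 0x30} };
          dispatch(buf, sizeof buf / sizeof buf[0]);
          return 0;
      }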
  • Publication number: 20140068227
    Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor mask extraction from a general purpose register in response to a single mask extraction from a general purpose register instruction that includes a source general purpose register operand, a destination writemask register operand, an immediate value, and an opcode are described.
    Type: Application
    Filed: December 22, 2011
    Publication date: March 6, 2014
    Inventors: Bret L. Toll, Robert Valentine, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney
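    A minimal C model of the kind of operation described above, under the assumption (not stated in the publication) that the immediate selects which 16-bit lane of the 64-bit general purpose register is copied into the writemask; the real instruction's encoding and semantics are those defined in the publication, not this sketch.

      #include <stdint.h>
      #include <stdio.h>

      /* Extract a 16-bit writemask from a 64-bit GPR; the immediate is
       * treated here as the index of a 16-bit lane (assumed convention). */
      static uint16_t extract_mask(uint64_t gpr, unsigned imm)
      {
          unsigned lane = imm & 0x3;              /* four 16-bit lanes in 64 bits */
          return (uint16_t)(gpr >> (lane * 16));
      }

      int main(void)
      {
          uint64_t gpr = 0xF0F0AAAA5555FFFFull;
          printf("k = 0x%04x\n", extract_mask(gpr, 1));   /* lane 1 -> 0x5555 */
          return 0;
      }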
  • Publication number: 20140047213
    Abstract: A system and method for implementing memory overlays for portable pointer variables. The method includes providing a program executable by a heterogeneous processing system comprising a plurality of processors running a plurality of instruction set architectures (ISAs). The method also includes providing a plurality of processor specific functions associated with a function pointer in the program. The method includes executing the program by a first processor. The method includes dereferencing the function pointer by mapping the function pointer to a corresponding processor specific function based on which processor in the plurality of processors is executing the program.
    Type: Application
    Filed: August 8, 2012
    Publication date: February 13, 2014
    Applicant: NVIDIA CORPORATION
    Inventor: Olivier Giroux
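    A minimal C sketch of the dereference step described above: a "portable" function pointer is treated as an index into a per-ISA table of processor-specific implementations, and the table column for the processor that is currently executing is consulted at call time. The names and table layout are illustrative assumptions.

      #include <stdio.h>

      enum isa { ISA_CPU = 0, ISA_GPU = 1, NUM_ISAS = 2 };

      /* Processor-specific implementations of the same logical function. */
      static int add_cpu(int a, int b) { return a + b; }   /* "CPU" version */
      static int add_gpu(int a, int b) { return a + b; }   /* "GPU" version */

      typedef int (*fn_t)(int, int);

      /* One row per portable function handle, one column per ISA. */
      static fn_t fn_table[][NUM_ISAS] = {
          { add_cpu, add_gpu },                             /* handle 0 */
      };

      /* Dereference a portable handle based on which processor is running. */
      static fn_t resolve(int handle, enum isa current) {
          return fn_table[handle][current];
      }

      int main(void) {
          enum isa running = ISA_CPU;    /* in a real system, discovered at run time */
          fn_t f = resolve(0, running);
          printf("%d\n", f(2, 3));       /* 5 */
          return 0;
      }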
  • Patent number: 8638805
    Abstract: Described embodiments provide for restructuring a scheduling hierarchy of a network processor having a plurality of processing modules and a shared memory. The scheduling hierarchy schedules packets for transmission. The network processor generates tasks corresponding to each received packet associated with a data flow. A traffic manager receives tasks provided by one of the processing modules and determines a queue of the scheduling hierarchy corresponding to the task. The queue has a parent scheduler at each of one or more next levels of the scheduling hierarchy up to a root scheduler, forming a branch of the hierarchy. The traffic manager determines if the queue and one or more of the parent schedulers of the branch should be restructured. If so, the traffic manager drops subsequently received tasks for the branch, drains all tasks of the branch, and removes the corresponding nodes of the branch from the scheduling hierarchy.
    Type: Grant
    Filed: September 30, 2011
    Date of Patent: January 28, 2014
    Assignee: LSI Corporation
    Inventors: Balakrishnan Sundararaman, Shashank Nemawarkar, David Sonnier, Shailendra Aulakh, Allen Vestal
  • Patent number: 8635656
    Abstract: A real-time video transmission system includes a wireless video input device, a wireless data transmission interface and a computer. The computer includes a virtual camera module, a data management module and an application module. The data management module repeatedly accesses a register. The virtual camera module writes a received real-time video request into the register. When the data management module obtains the real-time video request from the register during accessing of the same, the wireless video input device is driven to film and send back a real-time video stream. The data management module writes the received real-time video stream into the register. When the received real-time video stream is obtained from the register during accessing of the same, the virtual camera module transmits the real-time video stream to the application program module.
    Type: Grant
    Filed: October 7, 2011
    Date of Patent: January 21, 2014
    Assignee: Chicony Electronics Co., Ltd.
    Inventors: Wei-Cheng Huang, Mei-Yi Tsai, Chien-Yu Chen
  • Publication number: 20140019719
    Abstract: Methods of bit manipulation within a computer processor are disclosed. Improved flexibility in bit manipulation proves helpful in computing elementary functions critical to the performance of many programs and for other applications. In one embodiment, a unit of input data is shifted/rotated and multiple non-contiguous bit fields from the unit of input data are inserted in an output register. In another embodiment, one of two units of input data is optionally shifted or rotated, the two units of input data are partitioned into a plurality of bit fields, bitwise operations are performed on each bit field, and pairs of bit fields are combined with either an AND or an OR bitwise operation. Embodiments are also disclosed to simultaneously perform these processes on multiple units and pairs of units of input data in a Single Instruction, Multiple Data processing environment capable of performing logical operations on floating point data.
    Type: Application
    Filed: July 11, 2012
    Publication date: January 16, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Christopher K. Anand, Simon C. Broadhead, Robert F. Enenkel
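    A minimal C sketch of the first embodiment described above: the input word is rotated, and two non-contiguous bit fields taken from it are inserted into an output word. The particular field positions and widths are made up for the example.

      #include <stdint.h>
      #include <stdio.h>

      static uint32_t rotl32(uint32_t x, unsigned r) {
          r &= 31;
          return r ? (x << r) | (x >> (32 - r)) : x;
      }

      /* Insert `width` bits of `src` (starting at src_pos) into `dst` at dst_pos. */
      static uint32_t insert_field(uint32_t dst, uint32_t src,
                                   unsigned src_pos, unsigned dst_pos, unsigned width) {
          uint32_t mask = (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
          uint32_t bits = (src >> src_pos) & mask;
          return (dst & ~(mask << dst_pos)) | (bits << dst_pos);
      }

      int main(void) {
          uint32_t in  = 0x12345678u;
          uint32_t rot = rotl32(in, 8);               /* 0x34567812 */
          uint32_t out = 0;
          out = insert_field(out, rot, 0, 0, 4);      /* low nibble of rotated input */
          out = insert_field(out, rot, 16, 24, 8);    /* second, non-contiguous field */
          printf("0x%08x\n", out);                    /* 0x56000002 */
          return 0;
      }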
  • Publication number: 20140019718
    Abstract: Embodiments of computer-implemented methods, systems, computing devices, and computer-readable media are described herein for vectorized searching for a pattern P within a set of data T, the pattern P having a length m. In various embodiments, the vectorized search may include a shift of a sliding window into T by a distance d that is greater than m on determination, based on one or more ordered vectorized comparisons of portions of P and T, that no potential match of P is found within the sliding window. In various embodiments, d and m may be positive integers. In various embodiments, the one or more ordered vectorized comparisons may include one or more single instruction multiple data (“SIMD”) instructions supported by the processor.
    Type: Application
    Filed: July 10, 2012
    Publication date: January 16, 2014
    Inventor: Shihjong J. Kuo
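    A minimal sketch of the general idea using SSE2 intrinsics (an assumed instruction set; the publication only requires SIMD comparisons supported by the processor): 16 bytes of the text are compared against the first byte of the pattern in one instruction, and if none match, the sliding window can move past all 16 positions at once, a distance that can exceed the pattern length m.

      #include <emmintrin.h>   /* SSE2 intrinsics */
      #include <stdio.h>
      #include <string.h>

      /* Return the index of the first occurrence of p (length m) in t (length n),
       * or -1 if p does not occur.  Uses gcc/clang __builtin_ctz. */
      static long find(const char *t, long n, const char *p, long m) {
          if (m <= 0 || m > n) return m == 0 ? 0 : -1;
          __m128i first = _mm_set1_epi8(p[0]);
          long i = 0;
          while (i + 16 <= n - m + 1) {
              __m128i block = _mm_loadu_si128((const __m128i *)(t + i));
              int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(block, first));
              if (mask == 0) { i += 16; continue; }   /* no candidate: big skip */
              while (mask) {                          /* verify each candidate */
                  int off = __builtin_ctz((unsigned)mask);
                  if (memcmp(t + i + off, p, (size_t)m) == 0) return i + off;
                  mask &= mask - 1;
              }
              i += 16;
          }
          for (; i <= n - m; i++)                     /* scalar tail */
              if (memcmp(t + i, p, (size_t)m) == 0) return i;
          return -1;
      }

      int main(void) {
          const char *t = "the quick brown fox jumps over the lazy dog";
          printf("%ld\n", find(t, (long)strlen(t), "lazy", 4));   /* 35 */
          return 0;
      }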
  • Publication number: 20130332703
    Abstract: A method of sharing a plurality of registers in a shared register pool among a plurality of microprocessor threads begins with a determination that a first instruction to be executed by a microprocessor in a first microprocessor thread requires a first logical register. Next a determination is made that a second instruction to be executed by the microprocessor in a second microprocessor thread requires a second logical register. A first physical register in the shared register pool is allocated to the first microprocessor thread for execution of the first instruction and the first logical register is mapped to the first physical register. A second physical register in the shared register pool is allocated to the second microprocessor thread for execution of the second instruction. Finally, the second logical register is mapped to the second physical register.
    Type: Application
    Filed: June 8, 2012
    Publication date: December 12, 2013
    Applicant: MIPS Technologies, Inc.
    Inventor: Ilie GARBACEA
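    A minimal C model of the allocation and mapping steps in the abstract above: a free list of physical registers is shared by both threads, and each thread keeps its own logical-to-physical map. The pool size and names are illustrative.

      #include <stdio.h>

      #define NUM_PHYS   8    /* physical registers in the shared pool        */
      #define NUM_LOG    4    /* architectural (logical) registers per thread */
      #define NUM_THREAD 2

      static int free_list[NUM_PHYS];         /* 1 = physical register is free */
      static int map[NUM_THREAD][NUM_LOG];    /* logical -> physical, -1 = none */

      /* Allocate any free physical register from the shared pool, or -1. */
      static int alloc_phys(void) {
          for (int p = 0; p < NUM_PHYS; p++)
              if (free_list[p]) { free_list[p] = 0; return p; }
          return -1;
      }

      /* Map `logical` for `thread` onto a freshly allocated physical register. */
      static int rename_reg(int thread, int logical) {
          int p = alloc_phys();
          if (p >= 0) map[thread][logical] = p;
          return p;
      }

      int main(void) {
          for (int p = 0; p < NUM_PHYS; p++) free_list[p] = 1;
          for (int t = 0; t < NUM_THREAD; t++)
              for (int l = 0; l < NUM_LOG; l++) map[t][l] = -1;

          /* Both threads request logical r1; each gets a distinct physical
           * register out of the one shared pool. */
          printf("thread 0, r1 -> p%d\n", rename_reg(0, 1));
          printf("thread 1, r1 -> p%d\n", rename_reg(1, 1));
          return 0;
      }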
  • Patent number: 8578387
    Abstract: An embodiment of a computing system is configured to process data using a multithreaded SIMD architecture that includes heterogeneous processing engines to execute a program. The program is constructed of various program instructions. A first type of the program instructions can only be executed by a first type of processing engine and a third type of program instructions can only be executed by a second type of processing engine. A second type of program instructions can be executed by the first and the second type of processing engines. An assignment unit may be configured to dynamically determine which of the two processing engines executes any program instructions of the second type in order to balance the workload between the heterogeneous processing engines.
    Type: Grant
    Filed: July 31, 2007
    Date of Patent: November 5, 2013
    Assignee: Nvidia Corporation
    Inventors: Peter C. Mills, Stuart F. Oberman, John Erik Lindholm, Samuel Liu
  • Patent number: 8572586
    Abstract: Mechanisms for optimizing scalar code executed on a single instruction multiple data (SIMD) engine are provided. Placement of vector operation-splat operations may be determined based on an identification of scalar and SIMD operations in an original code representation. The original code representation may be modified to insert the vector operation-splat operations based on the determined placement of vector operation-splat operations to generate a first modified code representation. Placement of separate splat operations may be determined based on identification of scalar and SIMD operations in the first modified code representation. The first modified code representation may be modified to insert or delete separate splat operations based on the determined placement of the separate splat operations to generate a second modified code representation. SIMD code may be output based on the second modified code representation for execution by the SIMD engine.
    Type: Grant
    Filed: July 23, 2012
    Date of Patent: October 29, 2013
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Michael K. Gschwind, John A. Gunnels
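    The "splat" the abstract refers to is a broadcast of a scalar into every lane of a vector so that scalar and SIMD operands can be combined; where such broadcasts are placed (for example, hoisted out of loops) is what the described optimization decides. A minimal sketch of a splat with SSE intrinsics (an assumed target; the patent covers SIMD engines generally):

      #include <xmmintrin.h>   /* SSE intrinsics */
      #include <stdio.h>

      int main(void) {
          float scalar = 2.5f;
          float data[4] = {1.0f, 2.0f, 3.0f, 4.0f}, out[4];

          __m128 splat = _mm_set1_ps(scalar);          /* broadcast scalar to all lanes */
          __m128 v     = _mm_loadu_ps(data);
          _mm_storeu_ps(out, _mm_mul_ps(v, splat));    /* vector times splatted scalar */

          printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);
          return 0;
      }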
  • Patent number: 8555033
    Abstract: A method, an apparatus, and computer instructions are provided for extending operations of an application in a data processing system. A primary operation is executed. All extended operations of the primary operation are cached and pre and post operation identifiers are identified. For each pre operation identifier, a pre operation instance is created and executed. For each post operation identifier, a post operation instance is created and executed.
    Type: Grant
    Filed: July 20, 2011
    Date of Patent: October 8, 2013
    Assignee: International Business Machines Corporation
    Inventors: Daniel C. Berg, Charles D. Bridgham, Derek F. Holt, Ritchard L. Schacher, Jason A. Sholl
  • Patent number: 8554856
    Abstract: A computer system includes one or more devices that are capable of multitasking (performing at least two tasks in parallel or substantially in parallel). In response to detecting that one of the devices is performing a first one of the tasks, the system prevents the devices from performing at least one of the tasks other than the first task (such as all of the tasks other than the first task). In response to detecting that one of the devices is performing a second one of the tasks, the system prevents the devices from performing at least one of the tasks other than the second task (such as all of the tasks other than the second task).
    Type: Grant
    Filed: November 8, 2011
    Date of Patent: October 8, 2013
    Assignee: Yagi Corp.
    Inventor: Robert Plotkin
  • Publication number: 20130262820
    Abstract: Systems and methods for event logging in a just-in-time static translation system are disclosed. One method includes executing a workload in a computing system having a native instruction set architecture, the workload stored in one or more banks of non-native instructions. At least a portion of the workload is further included in one or more banks of native instructions and executing the workload comprises executing at least part of the workload from the one or more banks of native instructions. The method also includes determining an amount of time during execution of the workload in which the execution of the workload occurs from the one or more banks of native instructions. The method includes generating a log including performance statistics generated during execution of the workload, the performance statistics including the amount of time.
    Type: Application
    Filed: March 28, 2012
    Publication date: October 3, 2013
    Inventors: Michael J. Rieschl, Thomas L. Nowatzki, James F. Merten, Nathan Zimmer
  • Publication number: 20130227250
    Abstract: Some example embodiments include an apparatus for comparing a first operand to a second operand. The apparatus includes a SIMD accelerator configured to compare first multiple parts (e.g., bytes) of the first operand to second multiple parts (e.g., bytes) of the second operand. The SIMD accelerator includes a ones' complement subtraction logic and a two's complement logic configured to perform logic operations on the multiple parts of the first operand and the multiple parts of the second operand to generate a group of carry out and propagate data across bits of the multiple parts. At least a portion of the group of carry out and propagate data is reused in the group of logic operations.
    Type: Application
    Filed: February 24, 2012
    Publication date: August 29, 2013
    Applicant: International Business Machines Corporation
    Inventors: Wilhelm Haller, Ulrich Krauch, Kurt Lind, Friedrich Schroeder, Alexander Woerner
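    A minimal C sketch of one way the comparison could work, under the assumption that the subtraction a - b is formed as a + ~b + 1: the per-bit generate/propagate terms yield the carry-out (unsigned a >= b), and the same propagate vector also answers equality, which is the kind of reuse of carry and propagate data the abstract mentions.

      #include <stdint.h>
      #include <stdio.h>

      /* Compare two bytes via a + ~b + 1 computed through explicit
       * generate/propagate terms.  Carry-out == 1 means a >= b (unsigned);
       * the propagate vector being all ones means a == b. */
      static void compare_byte(uint8_t a, uint8_t b, int *ge, int *eq) {
          uint8_t y = (uint8_t)~b;
          uint8_t g = a & y;            /* generate */
          uint8_t p = a ^ y;            /* propagate (= ~(a ^ b)) */
          int carry = 1;                /* the +1 of the two's-complement negation */
          for (int i = 0; i < 8; i++)
              carry = ((g >> i) & 1) | (((p >> i) & 1) & carry);
          *ge = carry;
          *eq = (p == 0xFF);            /* reuse of the propagate data */
      }

      int main(void) {
          int ge, eq;
          compare_byte(7, 5, &ge, &eq); printf("7 vs 5: ge=%d eq=%d\n", ge, eq);
          compare_byte(5, 5, &ge, &eq); printf("5 vs 5: ge=%d eq=%d\n", ge, eq);
          compare_byte(3, 5, &ge, &eq); printf("3 vs 5: ge=%d eq=%d\n", ge, eq);
          return 0;
      }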
  • Publication number: 20130219150
    Abstract: A method for implementing a hardware design that includes using a computer for receiving structured data that includes a representation of a basic hardware structure and a complex hardware structure that includes the basic hardware structure, parsing the structured data and generating, based on a result of the parsing, commands of a hardware design environment.
    Type: Application
    Filed: February 19, 2013
    Publication date: August 22, 2013
    Applicant: International Business Machines Corporation
    Inventor: International Business Machines Corporation
  • Patent number: 8516496
    Abstract: An electronic device comprising decode logic that decodes instructions and a stack coupled to the decode logic. A group of instructions causes the decode logic to push onto the stack, after halting processing of a first thread at a switch point and prior to processing a second thread, a minimum amount of information needed to resume execution of the first thread at the switch point and not information not needed to resume execution of the first thread at the switch point.
    Type: Grant
    Filed: July 21, 2005
    Date of Patent: August 20, 2013
    Assignee: Texas Instruments Incorporated
    Inventors: Gilbert Cabillic, Gerard Chauvel
  • Patent number: 8504801
    Abstract: A semiconductor device correctly switches endian modes regardless of the current endian mode of an interface. The semiconductor device includes a switching circuit and a first register. The switching circuit switches an interface to be used in big endian or little endian mode. The first register holds control data of the switching circuit. The switching circuit sets the interface in little endian mode when first predetermined control information is supplied to the first register, and sets the interface in big endian mode when second predetermined control information is supplied to the first register. The control information can be correctly inputted without being influenced by the endian setting status.
    Type: Grant
    Filed: October 28, 2012
    Date of Patent: August 6, 2013
    Assignee: Renesas Electronics Corporation
    Inventors: Goro Sakamaki, Yuri Azuma
  • Publication number: 20130198489
    Abstract: Stream applications may inefficiently use the hardware resources that execute the processing elements of the data stream. For example, a compute node may host four processing elements and execute each using a CPU. However, other CPUs on the compute node may sit idle. To take advantage of these available hardware resources, a stream programmer may identify one or more processing elements that may be cloned. The cloned processing elements may be used to generate a different execution path that is parallel to the execution path that includes the original processing elements. Because the cloned processing elements contain the same operators as the original processing elements, the data stream that was previously flowing through only the original processing element may be split and sent through both the original and cloned processing elements. In this manner, the parallel execution path may use underutilized hardware resources to increase the throughput of the data stream.
    Type: Application
    Filed: December 10, 2012
    Publication date: August 1, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: International Business Machines Corporation
  • Publication number: 20130191614
    Abstract: In one embodiment, the present invention includes a method for receiving incoming data in a processor and performing a checksum operation on the incoming data in the processor pursuant to a user-level instruction for the checksum operation. For example, a cyclic redundancy checksum may be computed in the processor itself responsive to the user-level instruction. Other embodiments are described and claimed.
    Type: Application
    Filed: March 12, 2013
    Publication date: July 25, 2013
    Inventors: STEVEN KING, FRANK BERRY, MICHAEL KOUNAVIS
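    The abstract describes computing a checksum such as a CRC directly in the processor in response to a user-level instruction. On x86 parts a comparable facility is the SSE4.2 CRC32 instruction, which compilers expose as intrinsics; a minimal sketch using it (assuming an SSE4.2-capable CPU and a gcc/clang build with -msse4.2), not necessarily the exact instruction claimed:

      #include <nmmintrin.h>   /* SSE4.2: _mm_crc32_u8 */
      #include <stdint.h>
      #include <stdio.h>
      #include <string.h>

      /* Accumulate a CRC-32C over a buffer one byte at a time using the
       * user-level CRC instruction. */
      static uint32_t crc32c(uint32_t crc, const void *buf, size_t len) {
          const uint8_t *p = buf;
          for (size_t i = 0; i < len; i++)
              crc = _mm_crc32_u8(crc, p[i]);
          return crc;
      }

      int main(void) {
          const char *msg = "123456789";   /* conventional CRC test string */
          printf("0x%08x\n", crc32c(0xFFFFFFFFu, msg, strlen(msg)) ^ 0xFFFFFFFFu);
          return 0;
      }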
  • Patent number: 8484441
    Abstract: A computer processor with control and data processing capabilities comprises a decode unit for decoding instructions. A data processing facility comprises a first data execution path including fixed operators and a second data execution path including at least configurable operators, the configurable operators having a plurality of predefined configurations, at least some of which are selectable by means of an opcode portion of a data processing instruction. The decode unit is operable to detect whether a data processing instruction defines a fixed data processing operation or a configurable data processing operation, said decode unit causing the computer system to supply data for processing to said first data execution path when a fixed data processing instruction is detected and to said configurable data execution path when a configurable data processing instruction is detected.
    Type: Grant
    Filed: March 31, 2004
    Date of Patent: July 9, 2013
    Assignee: Icera Inc.
    Inventor: Simon Knowles
  • Publication number: 20130159667
    Abstract: A computer has a memory adapted to store a first plurality of instructions encoded with a first vector size and a second plurality of instructions encoded with a second vector size. An execution unit executes the first plurality of instructions and the second plurality of instructions by processing vector units in a uniform manner regardless of vector size.
    Type: Application
    Filed: December 16, 2011
    Publication date: June 20, 2013
    Applicant: MIPS TECHNOLOGIES, INC.
    Inventor: Ilie Garbacea
  • Publication number: 20130159668
    Abstract: A circuit arrangement, method, and program product for substituting a plurality of scalar instructions in an instruction stream with a functionally equivalent vector instruction for execution by a vector execution unit. Predecode logic is coupled to an instruction buffer which stores instructions in an instruction stream to be executed by the vector execution unit. The predecode logic analyzes the instructions passing through the instruction buffer to identify a plurality of scalar instructions that may be replaced by a vector instruction in the instruction stream. The predecode logic may generate the functionally equivalent vector instruction based on the plurality of scalar instructions, and the functionally equivalent vector instruction may be substituted into the instruction stream, such that the vector execution unit executes the vector instruction in lieu of the plurality of scalar instructions.
    Type: Application
    Filed: December 20, 2011
    Publication date: June 20, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Publication number: 20130159665
    Abstract: A data processing element includes an input unit configured to provide instructions for scalar, vector and array processing, and a scalar processing unit configured to provide a scalar pipeline datapath for processing a scalar quantity. Additionally, the data processing element includes a vector processing unit coupled to the scalar processing unit and configured to provide a vector pipeline datapath employing a vector register for processing a one-dimensional vector quantity. The data processing element further includes an array processing unit coupled to the vector processing unit and configured to provide an array pipeline datapath employing a parallel processing structure for processing a two-dimensional vector quantity. A method of operating a data processing element and a MIMO receiver employing a data processing element are also provided.
    Type: Application
    Filed: December 15, 2011
    Publication date: June 20, 2013
    Applicant: Verisilicon Holdings Co., Ltd.
    Inventor: Asheesh Kashyap
  • Publication number: 20130145121
    Abstract: A stream application may allocate processing elements to one or more compute nodes (or hosts) to achieve a desired optimization goal. Each optimization mode may define processing element selection criteria and/or host selection criteria. When allocating a processing element to a host, a scheduler may place each processing element individually. Accordingly, the scheduler may use the processing element selection criteria for selecting which processing element in the stream application to allocate next. The scheduler may then determine, based on one or more constraints, which host the processing element can be placed on. If the scheduler determines that multiple hosts are suitable candidates for the processing element, it may use the host selection criteria to pick one of the candidate hosts that further optimize the stream application to meet the desired goal.
    Type: Application
    Filed: December 11, 2012
    Publication date: June 6, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: INTERNATIONAL BUSINESS MACHINES CORPORATION
  • Publication number: 20130138922
    Abstract: Systems and methods are disclosed for enhancing the throughput of a processor by minimizing the number of transfers of data associated with data transfer between a register file and a memory stack. The register file used by a processor running an application is partitioned into a number of blocks. A subset of the blocks of the register file is defined in an application binary interface enabling the subset to be pre-allocated and exposed to the application binary interface. Optionally, blocks other than the subset are not exposed to the application binary interface so that the data relating to application function switch or a context switch is not transferred between the unexposed blocks and a memory stack.
    Type: Application
    Filed: November 29, 2011
    Publication date: May 30, 2013
    Applicant: International Business Machines Corporation
    Inventors: Revital Eres, Amit Golander, Nadav Levison, Sagi Manole, Ayal Zaks
  • Patent number: 8452944
    Abstract: An information processing apparatus includes: a first pipeline having first nodes, and moving data held in each first node to a first node located in a first direction; a second pipeline having second nodes corresponding to the first nodes, and moving data held in each second node to a second node located in a second direction that is opposite to the first direction; a first comparison unit arranged to compare data held in a node of interest with data held in a second node corresponding to the node of interest, where the node of interest is one of the first nodes; and a second comparison unit arranged to compare the data held in the node of interest with data held in a second node located one node on an upstream or downstream side of the second node corresponding to the node of interest.
    Type: Grant
    Filed: April 27, 2010
    Date of Patent: May 28, 2013
    Assignee: Canon Kabushiki Kaisha
    Inventor: Tadayuki Ito
  • Patent number: 8423605
    Abstract: Provided is a parallel distributed processing method executed by a computer system comprising a parallel-distributed-processing control server, a plurality of extraction processing servers and a plurality of aggregation processing servers. The managed data includes at least a first and a second data item, each of the data items including a value. The method includes a step of extracting data from one of the plurality of chunks according to a value in the second data item, to thereby group the data, a step of merging groups having the same value in the second data item based on an order of a value in the first data item of data contained in a group among groups, and a step of processing data in a group obtained through the merging by focusing on the order of the value in the first data item.
    Type: Grant
    Filed: March 17, 2010
    Date of Patent: April 16, 2013
    Assignee: Hitachi, Ltd.
    Inventor: Ryo Kawai
  • Publication number: 20130091339
    Abstract: An apparatus and method for creation of reordered vectors from sequential input data for block based decimation, filtering, interpolation and matrix transposition using a memory circuit for a Single Instruction, Multiple Data (SIMD) Digital Signal Processor (DSP). This memory circuit includes a two-dimensional storage array, a rotate-and-distribute unit, a read controller and a write controller, to map input vectors containing sequential data elements in columns of the two-dimensional array and extract reordered target vectors from this array. The data elements and memory configuration are received from the SIMD DSP.
    Type: Application
    Filed: October 5, 2011
    Publication date: April 11, 2013
    Applicant: ST-Ericsson SA
    Inventors: David Van Kampen, Kees Van Berkel, Sven Goossens, Wim Kloosterhuis, Claudiu Zissulescu-Ianculescu
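    A minimal C sketch of the reordering principle: sequential input vectors are written into columns of a small two-dimensional array, and reading rows back out yields de-interleaved (decimated) target vectors, one of the uses the abstract names. The array dimensions are illustrative.

      #include <stdio.h>

      #define ROWS 4   /* elements per input vector      */
      #define COLS 4   /* number of input vectors stored */

      int main(void) {
          /* Sequential input 0..15 arriving as four 4-element vectors. */
          int in[COLS][ROWS], array[ROWS][COLS];
          for (int v = 0; v < COLS; v++)
              for (int e = 0; e < ROWS; e++)
                  in[v][e] = v * ROWS + e;

          /* Write each input vector into a column of the 2-D storage array. */
          for (int v = 0; v < COLS; v++)
              for (int e = 0; e < ROWS; e++)
                  array[e][v] = in[v][e];

          /* Reading rows gives decimated vectors: row r holds every 4th
           * sample starting at r (0 4 8 12, then 1 5 9 13, ...). */
          for (int r = 0; r < ROWS; r++) {
              for (int c = 0; c < COLS; c++) printf("%d ", array[r][c]);
              printf("\n");
          }
          return 0;
      }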
  • Patent number: 8417961
    Abstract: Techniques relating to a processor including instruction support for implementing a cyclic redundancy check (CRC) operation. The processor may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit configured to receive instructions that include a first instance of a cyclic redundancy check (CRC) instruction defined within the ISA, where the first instance of the CRC instruction is executable by the cryptographic unit to perform a first CRC operation on a set of data that produces a checksum value. In one embodiment, the cryptographic unit is configured to generate the checksum value using a generator polynomial of 0x11EDC6F41.
    Type: Grant
    Filed: March 16, 2010
    Date of Patent: April 9, 2013
    Assignee: Oracle International Corporation
    Inventors: Christopher H. Olson, Gregory F. Grohoski, Lawrence A. Spracklen
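    The generator polynomial 0x11EDC6F41 is the 33-bit CRC-32C (Castagnoli) polynomial. A minimal bit-serial C sketch of what dividing a message by it means, ignoring the bit-reflection and initial/final-XOR conventions a production implementation would also apply:

      #include <stdint.h>
      #include <stdio.h>

      #define POLY 0x1EDC6F41u   /* low 32 bits of the generator 0x11EDC6F41 */

      /* Plain MSB-first polynomial division: shift each message bit into the
       * remainder and XOR in the generator whenever a 1 falls off the top. */
      static uint32_t crc_bitwise(const uint8_t *p, size_t len) {
          uint32_t rem = 0;
          for (size_t i = 0; i < len; i++) {
              rem ^= (uint32_t)p[i] << 24;
              for (int b = 0; b < 8; b++)
                  rem = (rem & 0x80000000u) ? (rem << 1) ^ POLY : (rem << 1);
          }
          return rem;
      }

      int main(void) {
          uint8_t msg[] = {'a', 'b', 'c'};
          printf("0x%08x\n", crc_bitwise(msg, sizeof msg));
          return 0;
      }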
  • Publication number: 20130067199
    Abstract: A microprocessor capable of running both x86 instruction set architecture (ISA) machine language programs and Advanced RISC Machines (ARM) ISA machine language programs. The microprocessor includes a mode indicator that indicates whether the microprocessor is currently fetching instructions of an x86 ISA or ARM ISA machine language program. The microprocessor also includes a plurality of model-specific registers (MSRs) that control aspects of the operation of the microprocessor. When the mode indicator indicates the microprocessor is currently fetching x86 ISA machine language program instructions, each of the plurality of MSRs is accessible via an x86 ISA RDMSR/WRMSR instruction that specifies an address of the MSR. When the mode indicator indicates the microprocessor is currently fetching ARM ISA machine language program instructions, each of the plurality of MSRs is accessible via an ARM ISA MRRC/MCRR instruction that specifies the address of the MSR.
    Type: Application
    Filed: March 6, 2012
    Publication date: March 14, 2013
    Applicant: VIA TECHNOLOGIES, INC.
    Inventors: G. Glenn Henry, Terry Parks, Rodney E. Hooker
  • Patent number: 8397233
    Abstract: A device includes an input processing unit and an output processing unit. The input processing unit dispatches first data to one of a group of processing engines, records an identity of the one processing engine in a location in a first memory, reserves one or more corresponding locations in a second memory, causes the first data to be processed by the one processing engine, and stores the processed first data in one of the locations in the second memory. The output processing unit receives second data, assigns an entry address corresponding to a location in an output memory to the second data, transfers the second data and the entry address to one of a group of second processing engines, causes the second data to be processed by the second processing engine, and stores the processed second data to the location in the output memory.
    Type: Grant
    Filed: May 23, 2007
    Date of Patent: March 12, 2013
    Assignee: Juniper Networks, Inc.
    Inventors: Raymond Marcelino Manese Lim, Stefan Dyckerhoff, Jeffrey Glenn Libby, Teshager Tesfaye
  • Publication number: 20130013894
    Abstract: A RISC data processor in which the number of flags generated by each instruction is increased so that a decrease of flag-generating instructions exceeds an increase of flag-using instructions in quantity, thereby achieving the decrease in instructions. An instruction for generating flags according to operands' data sizes is defined, and an instruction set handled by the RISC data processor includes an instruction capable of executing an operation on operands in more than one data size. An identical operation process is conducted on the small-size operand and on low-order bits of the large-size operand, and flags are generated capable of coping with the respective data sizes regardless of the data size of each operand subjected to the operation. Thus, a reduction in instruction code space of the RISC data processor can be achieved.
    Type: Application
    Filed: September 6, 2012
    Publication date: January 10, 2013
    Inventor: Fumio ARAKAWA
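    A minimal C model of the flag rule described above: the same addition serves both operand sizes, and zero/negative/carry flags are derived for the full width and for the low-order byte alike. The specific flag set is illustrative.

      #include <stdint.h>
      #include <stdio.h>

      struct flags { int zero, negative, carry; };

      /* Derive flags for a + b when the result is viewed at a given width. */
      static struct flags flags_for_width(uint32_t a, uint32_t b, int width) {
          struct flags f;
          if (width == 8) {
              uint8_t r = (uint8_t)(a + b);
              f.zero     = (r == 0);
              f.negative = (r >> 7) & 1;
              f.carry    = ((a & 0xFF) + (b & 0xFF)) > 0xFF;
          } else {
              uint32_t r = a + b;
              f.zero     = (r == 0);
              f.negative = (r >> 31) & 1;
              f.carry    = r < a;      /* unsigned overflow of the 32-bit add */
          }
          return f;
      }

      int main(void) {
          uint32_t a = 0x000000FF, b = 0x00000001;
          struct flags f8  = flags_for_width(a, b, 8);
          struct flags f32 = flags_for_width(a, b, 32);
          printf("8-bit : Z=%d N=%d C=%d\n", f8.zero,  f8.negative,  f8.carry);
          printf("32-bit: Z=%d N=%d C=%d\n", f32.zero, f32.negative, f32.carry);
          return 0;
      }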
  • Patent number: 8347066
    Abstract: Replay instruction morphing. One disclosed apparatus includes an execution unit to execute an instruction. A replay system replays an altered instruction if the execution unit executes the instruction erroneously.
    Type: Grant
    Filed: February 28, 2005
    Date of Patent: January 1, 2013
    Assignee: Intel Corporation
    Inventors: Douglas M. Carmean, David J. Sager, Thomas F. Toll, Karol F. Menezes
  • Patent number: 8341635
    Abstract: A hardware wake-and-go mechanism is provided for a data processing system. The wake-and-go mechanism looks ahead in a thread for programming idioms that indicate that the thread is waiting for an event. The wake-and-go mechanism performs a look-ahead polling operation for each of the programming idioms. If each of the look-ahead polling operations fails, then the wake-and-go mechanism updates a wake-and-go array with a target address associated with the event for each recognized programming idiom.
    Type: Grant
    Filed: February 1, 2008
    Date of Patent: December 25, 2012
    Assignee: International Business Machines Corporation
    Inventors: Ravi K. Arimilli, Satya P. Sharma, Randal C. Swanberg
  • Patent number: 8341436
    Abstract: Power-state transitioning arrangements are implemented using a variety of methods. Using one such method, a power-state transitioning circuit arrangement is implemented having a processing circuit that does not include an arithmetic logic unit. A power-state transition script including instructions from an instruction set is stored in a memory circuit. The processing circuit implements the power-state transition script to facilitate a change in a power-state of another processor circuit.
    Type: Grant
    Filed: October 24, 2008
    Date of Patent: December 25, 2012
    Assignee: ST-Ericsson SA
    Inventor: Greg Ehmann
  • Publication number: 20120311303
    Abstract: A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.
    Type: Application
    Filed: August 13, 2012
    Publication date: December 6, 2012
    Applicant: MicroUnity Systems Engineering, Inc.
    Inventors: Craig Hansen, John Moussouris, Alexia Massalin
  • Publication number: 20120311302
    Abstract: A parallel operation processing apparatus and method using a Single Instruction Multiple Data (SIMD) processor are provided. The parallel operation processing apparatus may combine input data of source nodes in a current column with input data of source nodes in a previous column, and may store the combined input data.
    Type: Application
    Filed: February 2, 2012
    Publication date: December 6, 2012
    Inventors: Ho Yang, Hyun Seok Lee
  • Patent number: 8316217
    Abstract: A semiconductor device correctly switches endian modes regardless of the current endian mode of an interface. The semiconductor device includes a switching circuit and a first register. The switching circuit switches an interface to be used in big endian or little endian mode. The first register holds control data of the switching circuit. The switching circuit sets the interface in little endian mode when first predetermined control information is supplied to the first register, and sets the interface in big endian mode when second predetermined control information is supplied to the first register. The control information can be correctly inputted without being influenced by the endian setting status.
    Type: Grant
    Filed: December 16, 2011
    Date of Patent: November 20, 2012
    Assignee: Renesas Electronics Corporation
    Inventors: Goro Sakamaki, Yuri Azuma
  • Publication number: 20120290816
    Abstract: Mechanisms for optimizing scalar code executed on a single instruction multiple data (SIMD) engine are provided. Placement of vector operation-splat operations may be determined based on an identification of scalar and SIMD operations in an original code representation. The original code representation may be modified to insert the vector operation-splat operations based on the determined placement of vector operation-splat operations to generate a first modified code representation. Placement of separate splat operations may be determined based on identification of scalar and SIMD operations in the first modified code representation. The first modified code representation may be modified to insert or delete separate splat operations based on the determined placement of the separate splat operations to generate a second modified code representation. SIMD code may be output based on the second modified code representation for execution by the SIMD engine.
    Type: Application
    Filed: July 23, 2012
    Publication date: November 15, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Alexandre E. Eichenberger, Michael K. Gschwind, John A. Gunnels
  • Publication number: 20120260068
    Abstract: An ISA-defined instruction includes an immediate field having a first and second portions specifying first and second values, which instructs the microprocessor to perform an operation using a constant value as one of its source operands. The constant value is the first value rotated/shifted by a number of bits based on the second value. An instruction translator translates the instruction into one or more microinstructions. An execution pipeline executes the microinstructions generated by the instruction translator. The instruction translator, rather than the execution pipeline, generates the constant value for the execution pipeline as a source operand of at least one of the microinstructions for execution by the execution pipeline.
    Type: Application
    Filed: March 9, 2012
    Publication date: October 11, 2012
    Applicant: VIA TECHNOLOGIES, INC.
    Inventors: G. Glenn Henry, Terry Parks, Rodney E. Hooker
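    The immediate format described resembles the classic ARM "modified immediate," in which an 8-bit value is rotated right by twice a 4-bit field. A minimal C sketch of how a translator could expand such an immediate into the full constant before execution (the 8-bit/4-bit split is an assumption for the example):

      #include <stdint.h>
      #include <stdio.h>

      /* Expand a 12-bit immediate: low 8 bits are the value, high 4 bits give
       * a rotate-right amount of 2*rot (the ARM-style convention assumed here). */
      static uint32_t expand_imm(uint16_t imm12) {
          uint32_t value = imm12 & 0xFF;
          unsigned rot = ((imm12 >> 8) & 0xF) * 2;
          return rot ? (value >> rot) | (value << (32 - rot)) : value;
      }

      int main(void) {
          /* value 0xFF rotated right by 2*4 = 8 bits -> 0xFF000000 */
          printf("0x%08x\n", expand_imm((uint16_t)((0x4 << 8) | 0xFF)));
          return 0;
      }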
  • Publication number: 20120260067
    Abstract: A microprocessor includes a hardware instruction translator that translates x86 ISA and ARM ISA machine language program instructions into microinstructions, which are encoded in a distinct manner from the x86 and ARM instructions. An execution pipeline executes the microinstructions to generate x86/ARM-defined results. The microinstructions are distinct from the results generated by the execution of the microinstructions by the execution pipeline. The translator directly provides the microinstructions to the execution pipeline for execution. Each time the microprocessor performs one of the x86 ISA and ARM ISA instructions, the translator translates it into the microinstructions. An indicator indicates either x86 or ARM as a boot ISA. After reset, the microprocessor initializes its architectural state, fetches its first instructions from a reset address, and translates them all as defined by the boot ISA. An instruction cache caches the x86 and ARM instructions and provides them to the translator.
    Type: Application
    Filed: September 1, 2011
    Publication date: October 11, 2012
    Applicant: VIA Technologies, Inc.
    Inventors: G. Glenn Henry, Terry Parks, Rodney E. Hooker
  • Publication number: 20120210099
    Abstract: During operation, a processor generates a result vector. In particular, the processor records a value from an element at a key element position in an input vector into a base value. Next, for each active element in the result vector to the right of the key element position, the processor generates a result vector by setting the element in the result vector equal to a result of performing a unary operation on the base value a number of times equal to a number of relevant elements. The number of relevant elements is determined from the key element position to and including a predetermined element in the result vector, where the predetermined element in the result vector may be one of: a first element to the left of the element in the result vector; or the element in the result vector.
    Type: Application
    Filed: April 26, 2012
    Publication date: August 16, 2012
    Applicant: APPLE INC.
    Inventor: Jeffry E. Gonion
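    A minimal C model of the recurrence the abstract describes, taking increment as the unary operation and treating every element to the right of the key position as active: each result element equals the base value with the operation applied once per relevant element counted so far. The choice of increment and the all-active simplification are assumptions made for the example.

      #include <stdio.h>

      #define N 8

      /* result[key] starts from the base value copied out of the input; each
       * later element applies the unary op (+1 here) one more time. */
      static void running_unary(const int *in, int *out, int key) {
          int value = in[key];                /* base value from the key element */
          for (int i = 0; i < N; i++) {
              if (i < key) { out[i] = in[i]; continue; }   /* left of key: untouched */
              out[i] = value;
              value = value + 1;              /* unary op applied once more */
          }
      }

      int main(void) {
          int in[N] = {9, 9, 5, 0, 0, 0, 0, 0}, out[N];
          running_unary(in, out, 2);
          for (int i = 0; i < N; i++) printf("%d ", out[i]);   /* 9 9 5 6 7 8 9 10 */
          printf("\n");
          return 0;
      }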
  • Patent number: 8239847
    Abstract: General-purpose distributed data-parallel computing using high-level computing languages is described. Data parallel portions of a sequential program written in a high-level language are automatically translated into a distributed execution plan. Map and reduction computations are automatically added to the plan. Patterns in the sequential program can be automatically identified to trigger map and reduction processing. Direct invocation of map and reduction processing is also provided. One or more portions of the reduce computation are pushed to the map stage and dynamic aggregation is inserted when possible. The system automatically identifies opportunities for partial reductions and aggregation, but also provides a set of extensions in a high-level computing language for the generation and optimization of the distributed execution plan. The extensions include annotations to declare functions suitable for these optimizations.
    Type: Grant
    Filed: March 18, 2009
    Date of Patent: August 7, 2012
    Assignee: Microsoft Corporation
    Inventors: Yuan Yu, Pradeep Kumar Gunda, Michael A Isard
  • Publication number: 20120198208
    Abstract: Various embodiments may be disclosed that may share a ROM pull down logic circuit among multiple ports of a processing core. The processing core may include an execution unit (EU) having an array of read only memory (ROM) pull down logic storing math functions. The ROM pull down logic circuit may implement single instruction, multiple data (SIMD) operations. The ROM pull down logic circuit may be operatively coupled with each of the multiple ports in a multi-port function sharing arrangement. Sharing the ROM pull down logic circuit reduces the need to duplicate logic and may result in a savings of chip area as well as a savings of power.
    Type: Application
    Filed: December 28, 2011
    Publication date: August 2, 2012
    Inventors: Satish K. Damaraju, Subramaniam Maiyuran
  • Patent number: 8230425
    Abstract: Methods and arrangements of assigning tasks to processors are discussed. Embodiments include transformations, code, state machines or other logic to detect an attempt to execute an instruction of a task on a processor not supporting the instruction (non-supporting processor). The method may involve selecting a processor supporting the instruction (supporting physical processor). In many embodiments, the method may include storing data about the attempt to execute the instruction and, based upon the data, making another assignment of the task to a physical processor supporting the instruction. In some embodiments, the method may include representing the instruction set of a virtual processor as the union of the instruction sets of the physical processors comprising the virtual processor and assigning a task to the virtual processor based upon the representing.
    Type: Grant
    Filed: July 30, 2007
    Date of Patent: July 24, 2012
    Assignee: International Business Machines Corporation
    Inventors: Manish Ahuja, Nathan Fontenot, Jacob L. Moilanen, Joel H. Schopp, Michael T. Strosaker
  • Publication number: 20120179895
    Abstract: Method and apparatus for fast decoding of microinstructions are disclosed. An integrated circuit is disclosed wherein microinstructions are queued for execution in an execution unit having multiple pipelines where each pipeline is configured to execute a set of supported microinstructions. The execution unit receives microinstruction data including an operation code (opcode) or a complex opcode. The execution unit executes the microinstruction multiple times wherein the microinstruction is executed at least once to get an address value and at least once to get a result of an operation. The execution unit processes complex opcodes by utilizing both a load/store support and a simple opcode support by splitting the complex opcode into load/store and simple opcode components and creating an internal source/destination between the two components.
    Type: Application
    Filed: January 12, 2011
    Publication date: July 12, 2012
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventors: Ganesh Venkataramanan, Emil Talpes
  • Patent number: 8214623
    Abstract: A method, an apparatus, and computer instructions are provided for extending operations of an application in a data processing system. A primary operation is executed. All extended operations of the primary operation are cached and pre and post operation identifiers are identified. For each pre operation identifier, a pre operation instance is created and executed. For each post operation identifier, a post operation instance is created and executed.
    Type: Grant
    Filed: August 1, 2008
    Date of Patent: July 3, 2012
    Assignee: International Business Machines Corporation
    Inventors: Daniel Christopher Berg, Charles Dyer Bridgham, Derek Francis Holt, Ritchard Leonard Schacher, Jason Ashley Sholl
  • Publication number: 20120166765
    Abstract: While fetching the instructions from a loop in program code, a processor calculates a number of times that a backward-branching instruction at the end of the loop will actually be taken when the fetched instructions are executed. Upon determining that the backward-branching instruction has been predicted taken more than the number of times that the branch instruction will actually be taken, the processor immediately commences a mispredict operation for the branch instruction, which comprises: (1) flushing fetched instructions from the loop that will not be executed from the processor, and (2) commencing fetching instructions from an instruction following the branch instruction.
    Type: Application
    Filed: March 7, 2012
    Publication date: June 28, 2012
    Applicant: APPLE INC.
    Inventor: Jeffry E. Gonion
  • Publication number: 20120144162
    Abstract: A runtime system implemented in accordance with the present invention provides an application platform for parallel-processing computer systems. Such a runtime system enables users to leverage the computational power of parallel-processing computer systems to accelerate/optimize numeric and array-intensive computations in their application programs. This enables greatly increased performance of high-performance computing (HPC) applications.
    Type: Application
    Filed: February 9, 2012
    Publication date: June 7, 2012
    Inventors: Matthew N. Papakipos, Brian K. Grant, Morgan S. McGuire, Christopher G. Demetriou
  • Publication number: 20120137108
    Abstract: The present disclosure relates to placing a Boolean Processor on a chip with memory to eliminate memory latency issues in computing systems. An asynchronous implementation of a Boolean Processor Switched Memory can theoretically operate at terahertz speed and vastly improve the rate at which computationally relevant data is fed to a microprocessor or microcontroller. Boolean Processor Enhanced Memories hold the promise of increasing memory throughput by several orders of magnitude and shifting the burden of “catching up” to microprocessors and microcontrollers.
    Type: Application
    Filed: May 24, 2011
    Publication date: May 31, 2012
    Inventor: Kenneth Elmon KOCH, III