Architecture-Based Instruction Processing Patents (Class 712/200)
-
Patent number: 8675002
Abstract: A method for providing two or more processors access to a single command buffer is provided. The method includes receiving instructions in the command buffer from a central processor, at least one of the instructions being designated for a particular one of the two or more processors. The method also includes sending the at least one instruction to only the particular processor.
Type: Grant
Filed: June 9, 2010
Date of Patent: March 18, 2014
Assignee: ATI Technologies ULC
Inventors: Joseph Andonieh, Arshad Rahman
-
Publication number: 20140068227
Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor mask extraction from a general purpose register in response to a single mask extraction from a general purpose register instruction that includes a source general purpose register operand, a destination writemask register operand, an immediate value, and an opcode are described.
Type: Application
Filed: December 22, 2011
Publication date: March 6, 2014
Inventors: Bret L. Toll, Robert Valentine, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney
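As a rough software model of what such an instruction does, the immediate can be read as a bit offset selecting where in the general purpose register the writemask bits begin. The offset interpretation and the fixed mask width below are assumptions for this sketch, not the instruction's documented semantics:

```python
def extract_mask(gpr_value, imm, mask_width=8):
    """Extract a mask_width-bit writemask from a general purpose register,
    starting at the bit offset given by the immediate (illustrative model)."""
    return (gpr_value >> imm) & ((1 << mask_width) - 1)

# Pull the upper nibble-aligned bits of the register into a mask.
mask = extract_mask(0b1011_0110, 4)
```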
-
Publication number: 20140047213
Abstract: A system and method for implementing memory overlays for portable pointer variables. The method includes providing a program executable by a heterogeneous processing system comprising a plurality of processors running a plurality of instruction set architectures (ISAs). The method also includes providing a plurality of processor specific functions associated with a function pointer in the program. The method includes executing the program by a first processor. The method includes dereferencing the function pointer by mapping the function pointer to a corresponding processor specific function based on which processor in the plurality of processors is executing the program.
Type: Application
Filed: August 8, 2012
Publication date: February 13, 2014
Applicant: NVIDIA CORPORATION
Inventor: Olivier Giroux
-
Patent number: 8638805
Abstract: Described embodiments provide for restructuring a scheduling hierarchy of a network processor having a plurality of processing modules and a shared memory. The scheduling hierarchy schedules packets for transmission. The network processor generates tasks corresponding to each received packet associated with a data flow. A traffic manager receives tasks provided by one of the processing modules and determines a queue of the scheduling hierarchy corresponding to the task. The queue has a parent scheduler at each of one or more next levels of the scheduling hierarchy up to a root scheduler, forming a branch of the hierarchy. The traffic manager determines if the queue and one or more of the parent schedulers of the branch should be restructured. If so, the traffic manager drops subsequently received tasks for the branch, drains all tasks of the branch, and removes the corresponding nodes of the branch from the scheduling hierarchy.
Type: Grant
Filed: September 30, 2011
Date of Patent: January 28, 2014
Assignee: LSI Corporation
Inventors: Balakrishnan Sundararaman, Shashank Nemawarkar, David Sonnier, Shailendra Aulakh, Allen Vestal
-
Patent number: 8635656
Abstract: A real-time video transmission system includes a wireless video input device, a wireless data transmission interface and a computer. The computer includes a virtual camera module, a data management module and an application module. The data management module repeatedly accesses a register. The virtual camera module writes a received real-time video request into the register. When the data management module obtains the real-time video request from the register during accessing of the same, the wireless video input device is driven to film and send back a real-time video stream. The data management module writes the received real-time video stream into the register. When the received real-time video stream is obtained from the register during accessing of the same, the virtual camera module transmits the real-time video stream to the application module.
Type: Grant
Filed: October 7, 2011
Date of Patent: January 21, 2014
Assignee: Chicony Electronics Co., Ltd.
Inventors: Wei-Cheng Huang, Mei-Yi Tsai, Chien-Yu Chen
-
Publication number: 20140019719
Abstract: Methods of bit manipulation within a computer processor are disclosed. Improved flexibility in bit manipulation proves helpful in computing elementary functions critical to the performance of many programs and for other applications. In one embodiment, a unit of input data is shifted/rotated and multiple non-contiguous bit fields from the unit of input data are inserted in an output register. In another embodiment, one of two units of input data is optionally shifted or rotated, the two units of input data are partitioned into a plurality of bit fields, bitwise operations are performed on each bit field, and pairs of bit fields are combined with either an AND or an OR bitwise operation. Embodiments are also disclosed to simultaneously perform these processes on multiple units and pairs of units of input data in a Single Instruction, Multiple Data processing environment capable of performing logical operations on floating point data.
Type: Application
Filed: July 11, 2012
Publication date: January 16, 2014
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Christopher K. Anand, Simon C. Broadhead, Robert F. Enenkel
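The first embodiment above (rotate a unit of input data, then deposit several non-contiguous bit fields into an output register) can be sketched in software. The field-triple encoding is an assumption chosen for readability, not the patented instruction format:

```python
def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    n &= 31
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def insert_fields(src, dst, fields):
    """Insert non-contiguous bit fields from src into dst.

    fields is a list of (src_pos, dst_pos, width) triples; each field is
    extracted from src and deposited into dst at dst_pos."""
    for src_pos, dst_pos, width in fields:
        mask = (1 << width) - 1
        bits = (src >> src_pos) & mask
        dst = (dst & ~(mask << dst_pos)) | (bits << dst_pos)
    return dst & 0xFFFFFFFF

# Rotate the input first, then scatter two byte-wide fields into the output.
rotated = rotl32(0x12345678, 8)
out = insert_fields(rotated, 0, [(0, 0, 8), (16, 24, 8)])
```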
-
Publication number: 20140019718
Abstract: Embodiments of computer-implemented methods, systems, computing devices, and computer-readable media are described herein for vectorized searching for a pattern P within a set of data T, the pattern P having a length m. In various embodiments, the vectorized search may include a shift of a sliding window into T by a distance d that is greater than m on determination, based on one or more ordered vectorized comparisons of portions of P and T, that no potential match of P is found within the sliding window. In various embodiments, d and m may be positive integers. In various embodiments, the one or more ordered vectorized comparisons may include one or more single instruction multiple data ("SIMD") instructions supported by the processor.Type: Application
Filed: July 10, 2012
Publication date: January 16, 2014
Inventor: Shihjong J. Kuo
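The key idea, skipping by a distance d larger than the pattern length m when a whole window is ruled out, can be modeled in plain Python. The SIMD comparison is stood in for by a membership test over a block, and the block width of 16 is an arbitrary choice for this sketch:

```python
def block_search(pattern, text, block=16):
    """Find pattern in text by scanning block-wide windows, skipping a
    whole block (d = block > m) when the pattern's first byte is absent
    from it; a scalar stand-in for the vectorized comparisons."""
    m, n = len(pattern), len(text)
    i = 0
    while i <= n - m:
        window = text[i:i + block]
        if pattern[0] not in window:
            i += block  # no potential match anywhere in the window
            continue
        # Verify each candidate position inside the block.
        for j in range(len(window)):
            if window[j] == pattern[0] and text[i + j:i + j + m] == pattern:
                return i + j
        i += block
    return -1
```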
-
Publication number: 20130332703
Abstract: A method of sharing a plurality of registers in a shared register pool among a plurality of microprocessor threads begins with a determination that a first instruction to be executed by a microprocessor in a first microprocessor thread requires a first logical register. Next a determination is made that a second instruction to be executed by the microprocessor in a second microprocessor thread requires a second logical register. A first physical register in the shared register pool is allocated to the first microprocessor thread for execution of the first instruction and the first logical register is mapped to the first physical register. A second physical register in the shared register pool is allocated to the second microprocessor thread for execution of the second instruction. Finally, the second logical register is mapped to the second physical register.
Type: Application
Filed: June 8, 2012
Publication date: December 12, 2013
Applicant: MIPS Technologies, Inc.
Inventor: Ilie Garbacea
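The mapping described above is essentially a rename table keyed by (thread, logical register) over one shared pool of physical registers. A minimal sketch, with the class and allocation policy invented for illustration:

```python
class SharedRegisterPool:
    """Map (thread, logical register) pairs onto a shared pool of physical
    registers, so two threads naming the same logical register receive
    distinct physical registers (simplified model of the abstract above)."""

    def __init__(self, num_physical):
        self.free = list(range(num_physical))
        self.rename = {}  # (thread, logical) -> physical

    def allocate(self, thread, logical):
        key = (thread, logical)
        if key in self.rename:          # already mapped for this thread
            return self.rename[key]
        phys = self.free.pop(0)         # take the next free physical register
        self.rename[key] = phys
        return phys

pool = SharedRegisterPool(4)
p0 = pool.allocate(0, "r1")  # thread 0, logical r1
p1 = pool.allocate(1, "r1")  # thread 1's r1 gets a different physical register
```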
-
Patent number: 8578387
Abstract: An embodiment of a computing system is configured to process data using a multithreaded SIMD architecture that includes heterogeneous processing engines to execute a program. The program is constructed of various program instructions. A first type of the program instructions can only be executed by a first type of processing engine and a third type of program instructions can only be executed by a second type of processing engine. A second type of program instructions can be executed by the first and the second type of processing engines. An assignment unit may be configured to dynamically determine which of the two processing engines executes any program instructions of the second type in order to balance the workload between the heterogeneous processing engines.
Type: Grant
Filed: July 31, 2007
Date of Patent: November 5, 2013
Assignee: Nvidia Corporation
Inventors: Peter C. Mills, Stuart F. Oberman, John Erik Lindholm, Samuel Liu
-
Patent number: 8572586
Abstract: Mechanisms for optimizing scalar code executed on a single instruction multiple data (SIMD) engine are provided. Placement of vector operation-splat operations may be determined based on an identification of scalar and SIMD operations in an original code representation. The original code representation may be modified to insert the vector operation-splat operations based on the determined placement of vector operation-splat operations to generate a first modified code representation. Placement of separate splat operations may be determined based on identification of scalar and SIMD operations in the first modified code representation. The first modified code representation may be modified to insert or delete separate splat operations based on the determined placement of the separate splat operations to generate a second modified code representation. SIMD code may be output based on the second modified code representation for execution by the SIMD engine.
Type: Grant
Filed: July 23, 2012
Date of Patent: October 29, 2013
Assignee: International Business Machines Corporation
Inventors: Alexandre E. Eichenberger, Michael K. Gschwind, John A. Gunnels
-
Patent number: 8555033
Abstract: A method, an apparatus, and computer instructions are provided for extending operations of an application in a data processing system. A primary operation is executed. All extended operations of the primary operation are cached and pre and post operation identifiers are identified. For each pre operation identifier, a pre operation instance is created and executed. For each post operation identifier, a post operation instance is created and executed.
Type: Grant
Filed: July 20, 2011
Date of Patent: October 8, 2013
Assignee: International Business Machines Corporation
Inventors: Daniel C. Berg, Charles D. Bridgham, Derek F. Holt, Ritchard L. Schacher, Jason A. Sholl
-
Patent number: 8554856
Abstract: A computer system includes one or more devices that are capable of multitasking (performing at least two tasks in parallel or substantially in parallel). In response to detecting that one of the devices is performing a first one of the tasks, the system prevents the devices from performing at least one of the tasks other than the first task (such as all of the tasks other than the first task). In response to detecting that one of the devices is performing a second one of the tasks, the system prevents the devices from performing at least one of the tasks other than the second task (such as all of the tasks other than the second task).
Type: Grant
Filed: November 8, 2011
Date of Patent: October 8, 2013
Assignee: Yagi Corp.
Inventor: Robert Plotkin
-
Publication number: 20130262820
Abstract: Systems and methods for event logging in a just-in-time static translation system are disclosed. One method includes executing a workload in a computing system having a native instruction set architecture, the workload stored in one or more banks of non-native instructions. At least a portion of the workload is further included in one or more banks of native instructions and executing the workload comprises executing at least part of the workload from the one or more banks of native instructions. The method also includes determining an amount of time during execution of the workload in which the execution of the workload occurs from the one or more banks of native instructions. The method includes generating a log including performance statistics generated during execution of the workload, the performance statistics including the amount of time.
Type: Application
Filed: March 28, 2012
Publication date: October 3, 2013
Inventors: Michael J. Rieschl, Thomas L. Nowatzki, James F. Merten, Nathan Zimmer
-
Publication number: 20130227250
Abstract: Some example embodiments include an apparatus for comparing a first operand to a second operand. The apparatus includes a SIMD accelerator configured to compare first multiple parts (e.g., bytes) of the first operand to second multiple parts (e.g., bytes) of the second operand. The SIMD accelerator includes ones' complement subtraction logic and twos' complement logic configured to perform logic operations on the multiple parts of the first operand and the multiple parts of the second operand to generate a group of carry out and propagate data across bits of the multiple parts. At least a portion of the group of carry out and propagate data is reused in the group of logic operations.
Type: Application
Filed: February 24, 2012
Publication date: August 29, 2013
Applicant: International Business Machines Corporation
Inventors: Wilhelm Haller, Ulrich Krauch, Kurt Lind, Friedrich Schroeder, Alexander Woerner
-
Publication number: 20130219150
Abstract: A method for implementing a hardware design that includes using a computer for receiving structured data that includes a representation of a basic hardware structure and a complex hardware structure that includes the basic hardware structure, parsing the structured data and generating, based on a result of the parsing, commands of a hardware design environment.
Type: Application
Filed: February 19, 2013
Publication date: August 22, 2013
Applicant: International Business Machines Corporation
Inventor: International Business Machines Corporation
-
Patent number: 8516496
Abstract: An electronic device comprising decode logic that decodes instructions and a stack coupled to the decode logic. A group of instructions causes the decode logic to push onto the stack, after halting processing of a first thread at a switch point and prior to processing a second thread, a minimum amount of information needed to resume execution of the first thread at the switch point and not information not needed to resume execution of the first thread at the switch point.
Type: Grant
Filed: July 21, 2005
Date of Patent: August 20, 2013
Assignee: Texas Instruments Incorporated
Inventors: Gilbert Cabillic, Gerard Chauvel
-
Patent number: 8504801
Abstract: A semiconductor device correctly switches endian modes regardless of the current endian mode of an interface. The semiconductor device includes a switching circuit and a first register. The switching circuit switches an interface to be used in big endian or little endian mode. The first register holds control data of the switching circuit. The switching circuit sets the interface in little endian mode when first predetermined control information is supplied to the first register, and sets the interface in big endian mode when second predetermined control information is supplied to the first register. The control information can be correctly inputted without being influenced by the endian setting status.
Type: Grant
Filed: October 28, 2012
Date of Patent: August 6, 2013
Assignee: Renesas Electronics Corporation
Inventors: Goro Sakamaki, Yuri Azuma
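The effect of the register-controlled mode switch, the same 32-bit value presented in opposite byte orders on the interface, can be illustrated with a small Python sketch (the function and its boolean control flag are inventions for this example, standing in for the control information written to the first register):

```python
import struct

def reencode(value, big_endian):
    """Serialize a 32-bit value in the selected endian mode, mirroring
    the switching circuit's choice of byte order on the interface."""
    fmt = ">I" if big_endian else "<I"
    return struct.pack(fmt, value)

big = reencode(0x12345678, big_endian=True)
little = reencode(0x12345678, big_endian=False)  # same value, bytes reversed
```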
-
Publication number: 20130198489
Abstract: Stream applications may inefficiently use the hardware resources that execute the processing elements of the data stream. For example, a compute node may host four processing elements and execute each using a CPU. However, other CPUs on the compute node may sit idle. To take advantage of these available hardware resources, a stream programmer may identify one or more processing elements that may be cloned. The cloned processing elements may be used to generate a different execution path that is parallel to the execution path that includes the original processing elements. Because the cloned processing elements contain the same operators as the original processing elements, the data stream that was previously flowing through only the original processing element may be split and sent through both the original and cloned processing elements. In this manner, the parallel execution path may use underutilized hardware resources to increase the throughput of the data stream.
Type: Application
Filed: December 10, 2012
Publication date: August 1, 2013
Applicant: International Business Machines Corporation
Inventor: International Business Machines Corporation
-
Publication number: 20130191614
Abstract: In one embodiment, the present invention includes a method for receiving incoming data in a processor and performing a checksum operation on the incoming data in the processor pursuant to a user-level instruction for the checksum operation. For example, a cyclic redundancy checksum may be computed in the processor itself responsive to the user-level instruction. Other embodiments are described and claimed.
Type: Application
Filed: March 12, 2013
Publication date: July 25, 2013
Inventors: Steven King, Frank Berry, Michael Kounavis
-
Patent number: 8484441
Abstract: A computer processor with control and data processing capabilities comprises a decode unit for decoding instructions. A data processing facility comprises a first data execution path including fixed operators and a second data execution path including at least configurable operators, the configurable operators having a plurality of predefined configurations, at least some of which are selectable by means of an opcode portion of a data processing instruction. The decode unit is operable to detect whether a data processing instruction defines a fixed data processing operation or a configurable data processing operation, said decode unit causing the computer system to supply data for processing to said first data execution path when a fixed data processing instruction is detected and to said configurable data execution path when a configurable data processing instruction is detected.
Type: Grant
Filed: March 31, 2004
Date of Patent: July 9, 2013
Assignee: Icera Inc.
Inventor: Simon Knowles
-
Publication number: 20130159667
Abstract: A computer has a memory adapted to store a first plurality of instructions encoded with a first vector size and a second plurality of instructions encoded with a second vector size. An execution unit executes the first plurality of instructions and the second plurality of instructions by processing vector units in a uniform manner regardless of vector size.
Type: Application
Filed: December 16, 2011
Publication date: June 20, 2013
Applicant: MIPS Technologies, Inc.
Inventor: Ilie Garbacea
-
Publication number: 20130159668
Abstract: A circuit arrangement, method, and program product for substituting a plurality of scalar instructions in an instruction stream with a functionally equivalent vector instruction for execution by a vector execution unit. Predecode logic is coupled to an instruction buffer which stores instructions in an instruction stream to be executed by the vector execution unit. The predecode logic analyzes the instructions passing through the instruction buffer to identify a plurality of scalar instructions that may be replaced by a vector instruction in the instruction stream. The predecode logic may generate the functionally equivalent vector instruction based on the plurality of scalar instructions, and the functionally equivalent vector instruction may be substituted into the instruction stream, such that the vector execution unit executes the vector instruction in lieu of the plurality of scalar instructions.
Type: Application
Filed: December 20, 2011
Publication date: June 20, 2013
Applicant: International Business Machines Corporation
Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
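The substitution itself is a peephole rewrite over the instruction stream: spot a run of equivalent scalar operations and emit one vector operation in their place. A toy model, with the tuple-based instruction encoding and the fusion rule invented for illustration:

```python
def fuse_scalar_adds(stream):
    """Replace runs of two or more adjacent scalar adds with a single
    vector add, a simplified model of the predecode substitution above."""
    out, run = [], []
    for ins in stream + [None]:  # None flushes the final run
        if ins is not None and ins[0] == "add":
            run.append(ins)
            continue
        if len(run) >= 2:
            out.append(("vadd", [reg for _, reg in run]))
        else:
            out.extend(run)      # a lone scalar add stays scalar
        run = []
        if ins is not None:
            out.append(ins)
    return out

fused = fuse_scalar_adds([("add", "r0"), ("add", "r1"), ("mul", "r2")])
```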
-
Publication number: 20130159665
Abstract: A data processing element includes an input unit configured to provide instructions for scalar, vector and array processing, and a scalar processing unit configured to provide a scalar pipeline datapath for processing a scalar quantity. Additionally, the data processing element includes a vector processing unit coupled to the scalar processing unit and configured to provide a vector pipeline datapath employing a vector register for processing a one-dimensional vector quantity. The data processing element further includes an array processing unit coupled to the vector processing unit and configured to provide an array pipeline datapath employing a parallel processing structure for processing a two-dimensional vector quantity. A method of operating a data processing element and a MIMO receiver employing a data processing element are also provided.
Type: Application
Filed: December 15, 2011
Publication date: June 20, 2013
Applicant: VeriSilicon Holdings Co., Ltd.
Inventor: Asheesh Kashyap
-
Publication number: 20130145121
Abstract: A stream application may allocate processing elements to one or more compute nodes (or hosts) to achieve a desired optimization goal. Each optimization mode may define processing element selection criteria and/or host selection criteria. When allocating a processing element to a host, a scheduler may place each processing element individually. Accordingly, the scheduler may use the processing element selection criteria for selecting which processing element in the stream application to allocate next. The scheduler may then determine, based on one or more constraints, which host the processing element can be placed on. If the scheduler determines that multiple hosts are suitable candidates for the processing element, it may use the host selection criteria to pick one of the candidate hosts that further optimize the stream application to meet the desired goal.
Type: Application
Filed: December 11, 2012
Publication date: June 6, 2013
Applicant: International Business Machines Corporation
Inventor: International Business Machines Corporation
-
Publication number: 20130138922
Abstract: Systems and methods are disclosed for enhancing the throughput of a processor by minimizing the number of transfers of data associated with data transfer between a register file and a memory stack. The register file used by a processor running an application is partitioned into a number of blocks. A subset of the blocks of the register file is defined in an application binary interface enabling the subset to be pre-allocated and exposed to the application binary interface. Optionally, blocks other than the subset are not exposed to the application binary interface so that the data relating to application function switch or a context switch is not transferred between the unexposed blocks and a memory stack.
Type: Application
Filed: November 29, 2011
Publication date: May 30, 2013
Applicant: International Business Machines Corporation
Inventors: Revital Eres, Amit Golander, Nadav Levison, Sagi Manole, Ayal Zaks
-
Patent number: 8452944
Abstract: An information processing apparatus includes: a first pipeline having first nodes, and moving data held in each first node to a first node located in a first direction; a second pipeline having second nodes corresponding to the first nodes, and moving data held in each second node to a second node located in a second direction that is opposite to the first direction; a first comparison unit arranged to compare data held in a node of interest with data held in a second node corresponding to the node of interest, where the node of interest is one of the first nodes; and a second comparison unit arranged to compare the data held in the node of interest with data held in a second node located one node on an upstream or downstream side of the second node corresponding to the node of interest.
Type: Grant
Filed: April 27, 2010
Date of Patent: May 28, 2013
Assignee: Canon Kabushiki Kaisha
Inventor: Tadayuki Ito
-
Patent number: 8423605
Abstract: Provided is a parallel distributed processing method executed by a computer system comprising a parallel-distributed-processing control server, a plurality of extraction processing servers and a plurality of aggregation processing servers. The managed data includes at least first and second data items, the plurality of data items each including a value. The method includes a step of extracting data from one of the plurality of chunks according to a value in the second data item, to thereby group the data, a step of merging groups having the same value in the second data item based on an order of a value in the first data item of data contained in a group among groups, and a step of processing data in a group obtained through the merging by focusing on the order of the value in the first data item.
Type: Grant
Filed: March 17, 2010
Date of Patent: April 16, 2013
Assignee: Hitachi, Ltd.
Inventor: Ryo Kawai
-
Publication number: 20130091339
Abstract: An apparatus and method for creation of reordered vectors from sequential input data for block based decimation, filtering, interpolation and matrix transposition using a memory circuit for a Single Instruction, Multiple Data (SIMD) Digital Signal Processor (DSP). This memory circuit includes a two-dimensional storage array, a rotate-and-distribute unit, a read-controller and a write-controller, to map input vectors containing sequential data elements in columns of the two-dimensional array and extract reordered target vectors from this array. The data elements and memory configuration are received from the SIMD DSP.
Type: Application
Filed: October 5, 2011
Publication date: April 11, 2013
Applicant: ST-Ericsson SA
Inventors: David Van Kampen, Kees Van Berkel, Sven Goossens, Wim Kloosterhuis, Claudiu Zissulescu-Ianculescu
-
Patent number: 8417961
Abstract: Techniques relating to a processor including instruction support for implementing a cyclic redundancy check (CRC) operation. The processor may issue, for execution, programmer-selectable instructions from a defined instruction set architecture (ISA). The processor may include a cryptographic unit configured to receive instructions that include a first instance of a cyclic redundancy check (CRC) instruction defined within the ISA, where the first instance of the CRC instruction is executable by the cryptographic unit to perform a first CRC operation on a set of data that produces a checksum value. In one embodiment, the cryptographic unit is configured to generate the checksum value using a generator polynomial of 0x11EDC6F41.
Type: Grant
Filed: March 16, 2010
Date of Patent: April 9, 2013
Assignee: Oracle International Corporation
Inventors: Christopher H. Olson, Gregory F. Grohoski, Lawrence A. Spracklen
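The generator polynomial 0x11EDC6F41 is the CRC-32C (Castagnoli) polynomial, the same family used by iSCSI and by the x86 CRC32 instruction. What the patent implements in hardware can be checked against a software reference; the sketch below is the common bit-reflected formulation (using the reversed polynomial 0x82F63B78), not the patented circuit:

```python
def crc32c(data, crc=0):
    """Bitwise reflected CRC-32C over data, the checksum generated by the
    polynomial 0x11EDC6F41 cited in the abstract (software reference)."""
    crc ^= 0xFFFFFFFF                       # standard initial value
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 * (crc & 1))
    return crc ^ 0xFFFFFFFF                 # standard final XOR

checksum = crc32c(b"123456789")  # standard CRC-32C check value
```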
-
Publication number: 20130067199
Abstract: A microprocessor capable of running both x86 instruction set architecture (ISA) machine language programs and Advanced RISC Machines (ARM) ISA machine language programs. The microprocessor includes a mode indicator that indicates whether the microprocessor is currently fetching instructions of an x86 ISA or ARM ISA machine language program. The microprocessor also includes a plurality of model-specific registers (MSRs) that control aspects of the operation of the microprocessor. When the mode indicator indicates the microprocessor is currently fetching x86 ISA machine language program instructions, each of the plurality of MSRs is accessible via an x86 ISA RDMSR/WRMSR instruction that specifies an address of the MSR. When the mode indicator indicates the microprocessor is currently fetching ARM ISA machine language program instructions, each of the plurality of MSRs is accessible via an ARM ISA MRRC/MCRR instruction that specifies the address of the MSR.
Type: Application
Filed: March 6, 2012
Publication date: March 14, 2013
Applicant: VIA Technologies, Inc.
Inventors: G. Glenn Henry, Terry Parks, Rodney E. Hooker
-
Patent number: 8397233
Abstract: A device includes an input processing unit and an output processing unit. The input processing unit dispatches first data to one of a group of processing engines, records an identity of the one processing engine in a location in a first memory, reserves one or more corresponding locations in a second memory, causes the first data to be processed by the one processing engine, and stores the processed first data in one of the locations in the second memory. The output processing unit receives second data, assigns an entry address corresponding to a location in an output memory to the second data, transfers the second data and the entry address to one of a group of second processing engines, causes the second data to be processed by the second processing engine, and stores the processed second data to the location in the output memory.
Type: Grant
Filed: May 23, 2007
Date of Patent: March 12, 2013
Assignee: Juniper Networks, Inc.
Inventors: Raymond Marcelino Manese Lim, Stefan Dyckerhoff, Jeffrey Glenn Libby, Teshager Tesfaye
-
Publication number: 20130013894
Abstract: A RISC data processor in which the number of flags generated by each instruction is increased so that a decrease of flag-generating instructions exceeds an increase of flag-using instructions in quantity, thereby achieving the decrease in instructions. An instruction for generating flags according to operands' data sizes is defined, and an instruction set handled by the RISC data processor includes an instruction capable of executing an operation on operands in more than one data size. An identical operation process is conducted on the small-size operand and on low-order bits of the large-size operand, and flags are generated capable of coping with the respective data sizes regardless of the data size of each operand subjected to the operation.Thus, a reduction in instruction code space of the RISC data processor can be achieved.
Type: Application
Filed: September 6, 2012
Publication date: January 10, 2013
Inventor: Fumio Arakawa
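Size-aware flag generation means the same arithmetic result yields different condition flags depending on the operand width under which it is viewed. A small sketch of that idea (the flag names Z and N and the dictionary shape are choices for this example, not the patent's encoding):

```python
def flags_for(result, size_bits):
    """Compute zero (Z) and negative (N) flags for a result as seen at a
    given operand width, echoing size-aware flag generation."""
    mask = (1 << size_bits) - 1
    val = result & mask
    return {"Z": val == 0, "N": bool(val >> (size_bits - 1))}

# The same value produces different flags at 8-bit and 16-bit widths:
f8 = flags_for(0x1100, 8)    # low 8 bits are zero, so Z is set
f16 = flags_for(0x1100, 16)  # nonzero at 16 bits, sign bit clear
```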
-
Patent number: 8347066
Abstract: Replay instruction morphing. One disclosed apparatus includes an execution unit to execute an instruction. A replay system replays an altered instruction if the execution unit executes the instruction erroneously.
Type: Grant
Filed: February 28, 2005
Date of Patent: January 1, 2013
Assignee: Intel Corporation
Inventors: Douglas M. Carmean, David J. Sager, Thomas F. Toll, Karol F. Menezes
-
Patent number: 8341635
Abstract: A hardware wake-and-go mechanism is provided for a data processing system. The wake-and-go mechanism looks ahead in a thread for programming idioms that indicate that the thread is waiting for an event. The wake-and-go mechanism performs a look-ahead polling operation for each of the programming idioms. If each of the look-ahead polling operations fails, then the wake-and-go mechanism updates a wake-and-go array with a target address associated with the event for each recognized programming idiom.
Type: Grant
Filed: February 1, 2008
Date of Patent: December 25, 2012
Assignee: International Business Machines Corporation
Inventors: Ravi K. Arimilli, Satya P. Sharma, Randal C. Swanberg
-
Patent number: 8341436
Abstract: Power-state transitioning arrangements are implemented using a variety of methods. Using one such method, a power-state transitioning circuit arrangement is implemented having a processing circuit that does not include an arithmetic logic unit. A power-state transition script including instructions from an instruction set is stored in a memory circuit. The processing circuit implements the power-state transition script to facilitate a change in a power-state of another processor circuit.
Type: Grant
Filed: October 24, 2008
Date of Patent: December 25, 2012
Assignee: ST-Ericsson SA
Inventor: Greg Ehmann
-
Publication number: 20120311303
Abstract: A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.
Type: Application
Filed: August 13, 2012
Publication date: December 6, 2012
Applicant: MicroUnity Systems Engineering, Inc.
Inventors: Craig Hansen, John Moussouris, Alexia Massalin
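The core trick, a general purpose register holding a descriptor (address plus size/shape) of a wider operand in memory rather than the operand itself, can be modeled simply. The descriptor bit layout below is purely an assumption for this sketch; the patent does not commit to this encoding:

```python
def wide_operand(memory, descriptor):
    """Fetch an operand wider than the register: the register value is a
    descriptor packing a base address (high bits) and a log2 size (low
    byte), a model of the indirection described above."""
    base = descriptor >> 8            # assumed: high bits hold the address
    size = 1 << (descriptor & 0xFF)   # assumed: low byte holds log2(words)
    return memory[base:base + size]

mem = list(range(64))
op = wide_operand(mem, (16 << 8) | 3)  # base 16, 2**3 = 8 words
```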
-
Publication number: 20120311302
Abstract: A parallel operation processing apparatus and method using a Single Instruction Multiple Data (SIMD) processor are provided. The parallel operation processing apparatus may combine input data of source nodes in a current column with input data of source nodes in a previous column, and may store the combined input data.
Type: Application
Filed: February 2, 2012
Publication date: December 6, 2012
Inventors: Ho Yang, Hyun Seok Lee
-
Patent number: 8316217
Abstract: A semiconductor device correctly switches endian modes regardless of the current endian mode of an interface. The semiconductor device includes a switching circuit and a first register. The switching circuit switches an interface to be used in big endian or little endian mode. The first register holds control data of the switching circuit. The switching circuit sets the interface in little endian mode when first predetermined control information is supplied to the first register, and sets the interface in big endian mode when second predetermined control information is supplied to the first register. The control information can be input correctly without being influenced by the current endian setting.
Type: Grant
Filed: December 16, 2011
Date of Patent: November 20, 2012
Assignee: Renesas Electronics Corporation
Inventors: Goro Sakamaki, Yuri Azuma
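One way a control word can survive the current endian setting is to make it a byte-order palindrome, so a byte-swapped write decodes to the same value. The sketch below assumes that scheme; the control values and power-on default are hypothetical, not taken from the patent.

```python
# Illustrative model of an endian-switching register. The control words
# are byte-order palindromes, so they decode identically whether or not
# the interface byte-swaps them on the way in.

LITTLE_CTRL = b"\x01\x00\x00\x01"   # hypothetical "first" control value
BIG_CTRL    = b"\x02\x00\x00\x02"   # hypothetical "second" control value

class EndianSwitch:
    def __init__(self):
        self.mode = "big"           # assumed power-on default

    def write_control(self, word: bytes):
        if word == LITTLE_CTRL:
            self.mode = "little"
        elif word == BIG_CTRL:
            self.mode = "big"

sw = EndianSwitch()
sw.write_control(LITTLE_CTRL[::-1])  # even byte-swapped, still decodes
```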
-
Publication number: 20120290816
Abstract: Mechanisms for optimizing scalar code executed on a single instruction multiple data (SIMD) engine are provided. Placement of vector operation-splat operations may be determined based on an identification of scalar and SIMD operations in an original code representation. The original code representation may be modified to insert the vector operation-splat operations based on the determined placement of vector operation-splat operations to generate a first modified code representation. Placement of separate splat operations may be determined based on identification of scalar and SIMD operations in the first modified code representation. The first modified code representation may be modified to insert or delete separate splat operations based on the determined placement of the separate splat operations to generate a second modified code representation. SIMD code may be output based on the second modified code representation for execution by the SIMD engine.
Type: Application
Filed: July 23, 2012
Publication date: November 15, 2012
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Alexandre E. Eichenberger, Michael K. Gschwind, John A. Gunnels
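A splat replicates a scalar into every SIMD lane so it can combine with a vector operand; that is the operation whose placement the abstract above is optimizing. A minimal pure-Python sketch, with the lane count assumed:

```python
# "Splat" a scalar across SIMD lanes, then use it in a vector operation.
LANES = 4

def splat(scalar, lanes=LANES):
    # replicate the scalar into every lane of a vector register
    return [scalar] * lanes

def simd_add(a, b):
    # lane-wise add of two vectors
    return [x + y for x, y in zip(a, b)]

vec = [1, 2, 3, 4]
s = 10                          # value produced by scalar code
result = simd_add(vec, splat(s))
```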
-
Publication number: 20120260068
Abstract: An ISA-defined instruction includes an immediate field with first and second portions specifying first and second values, which instructs the microprocessor to perform an operation using a constant value as one of its source operands. The constant value is the first value rotated/shifted by a number of bits based on the second value. An instruction translator translates the instruction into one or more microinstructions. An execution pipeline executes the microinstructions generated by the instruction translator. The instruction translator, rather than the execution pipeline, generates the constant value for the execution pipeline as a source operand of at least one of the microinstructions for execution by the execution pipeline.
Type: Application
Filed: March 9, 2012
Publication date: October 11, 2012
Applicant: VIA TECHNOLOGIES, INC.
Inventors: G. Glenn Henry, Terry Parks, Rodney E. Hooker
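A sketch of the constant generation the translator performs: an 8-bit base value rotated right by twice a 4-bit rotate count. The field widths here are assumptions echoing ARM's modified-immediate encoding, not the patent's actual format.

```python
# Decode a two-part immediate into a 32-bit constant: the first portion
# is the base value, the second portion scales the rotate amount.

def rotate_right_32(value, amount):
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF

def decode_immediate(imm12):
    rot = (imm12 >> 8) & 0xF       # second portion: rotate count
    val = imm12 & 0xFF             # first portion: base value
    return rotate_right_32(val, 2 * rot)

const = decode_immediate(0x2FF)    # 0xFF rotated right by 4 bits
```

Because the translator computes `const` once at translate time, the execution pipeline receives a ready-made source operand instead of an extra rotate micro-op.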
-
Publication number: 20120260067
Abstract: A microprocessor includes a hardware instruction translator that translates x86 ISA and ARM ISA machine language program instructions into microinstructions, which are encoded in a distinct manner from the x86 and ARM instructions. An execution pipeline executes the microinstructions to generate x86/ARM-defined results. The microinstructions are distinct from the results generated by the execution of the microinstructions by the execution pipeline. The translator directly provides the microinstructions to the execution pipeline for execution. Each time the microprocessor performs one of the x86 ISA and ARM ISA instructions, the translator translates it into the microinstructions. An indicator indicates either x86 or ARM as a boot ISA. After reset, the microprocessor initializes its architectural state, fetches its first instructions from a reset address, and translates them all as defined by the boot ISA. An instruction cache caches the x86 and ARM instructions and provides them to the translator.
Type: Application
Filed: September 1, 2011
Publication date: October 11, 2012
Applicant: VIA Technologies, Inc.
Inventors: G. Glenn Henry, Terry Parks, Rodney E. Hooker
-
Publication number: 20120210099
Abstract: During operation, a processor generates a result vector. In particular, the processor records a value from an element at a key element position in an input vector into a base value. Next, for each active element in the result vector to the right of the key element position, the processor generates the result vector by setting the element in the result vector equal to a result of performing a unary operation on the base value a number of times equal to a number of relevant elements. The number of relevant elements is determined from the key element position to and including a predetermined element in the result vector, where the predetermined element in the result vector may be one of: a first element to the left of the element in the result vector; or the element in the result vector.
Type: Application
Filed: April 26, 2012
Publication date: August 16, 2012
Applicant: APPLE INC.
Inventor: Jeffry E. Gonion
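A hedged sketch of the running unary operation described above, assuming every element right of the key position is active, choosing increment as the unary operation, and counting relevant elements up to and including the current element (one of the two variants the abstract allows):

```python
# Each element right of the key position receives the base value with the
# unary op applied once more than its left neighbor (a "running" op).

def running_unary(vec, key, op):
    out = list(vec)
    acc = vec[key]                 # base value from the key position
    for i in range(key + 1, len(vec)):
        acc = op(acc)              # one more application per element
        out[i] = acc
    return out

r = running_unary([5, 0, 0, 0], 0, lambda x: x + 1)
```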
-
Patent number: 8239847
Abstract: General-purpose distributed data-parallel computing using high-level computing languages is described. Data parallel portions of a sequential program written in a high-level language are automatically translated into a distributed execution plan. Map and reduction computations are automatically added to the plan. Patterns in the sequential program can be automatically identified to trigger map and reduction processing. Direct invocation of map and reduction processing is also provided. One or more portions of the reduce computation are pushed to the map stage and dynamic aggregation is inserted when possible. The system automatically identifies opportunities for partial reductions and aggregation, but also provides a set of extensions in a high-level computing language for the generation and optimization of the distributed execution plan. The extensions include annotations to declare functions suitable for these optimizations.
Type: Grant
Filed: March 18, 2009
Date of Patent: August 7, 2012
Assignee: Microsoft Corporation
Inventors: Yuan Yu, Pradeep Kumar Gunda, Michael A. Isard
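A toy sketch of the key optimization in the abstract, pushing part of the reduction into the map stage as a partial (local) aggregation before the shuffle; the partitions and data are invented for illustration.

```python
# Partial reduction at the map stage: each partition is summed locally,
# so only one value per partition crosses the network to the reducer.

from functools import reduce

def map_stage(partition):
    return sum(partition)          # local partial aggregation

def reduce_stage(partials):
    return reduce(lambda a, b: a + b, partials, 0)

partitions = [[1, 2, 3], [4, 5], [6]]
partials = [map_stage(p) for p in partitions]
total = reduce_stage(partials)
```

This rewrite is safe only because addition is associative, which is exactly the kind of property the patent's annotations let a programmer declare.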
-
Publication number: 20120198208
Abstract: Various embodiments are disclosed that share a ROM pull-down logic circuit among multiple ports of a processing core. The processing core may include an execution unit (EU) having an array of read only memory (ROM) pull-down logic storing math functions. The ROM pull-down logic circuit may implement single instruction, multiple data (SIMD) operations. The ROM pull-down logic circuit may be operatively coupled with each of the multiple ports in a multi-port function sharing arrangement. Sharing the ROM pull-down logic circuit reduces the need to duplicate logic and may yield savings in both chip area and power.
Type: Application
Filed: December 28, 2011
Publication date: August 2, 2012
Inventors: Satish K. Damaraju, Subramaniam Maiyuran
-
Patent number: 8230425
Abstract: Methods and arrangements of assigning tasks to processors are discussed. Embodiments include transformations, code, state machines or other logic to detect an attempt to execute an instruction of a task on a processor not supporting the instruction (non-supporting processor). The method may involve selecting a processor supporting the instruction (supporting physical processor). In many embodiments, the method may include storing data about the attempt to execute the instruction and, based upon the data, making another assignment of the task to a physical processor supporting the instruction. In some embodiments, the method may include representing the instruction set of a virtual processor as the union of the instruction sets of the physical processors comprising the virtual processor and assigning a task to the virtual processor based upon the representing.
Type: Grant
Filed: July 30, 2007
Date of Patent: July 24, 2012
Assignee: International Business Machines Corporation
Inventors: Manish Ahuja, Nathan Fontenot, Jacob L. Moilanen, Joel H. Schopp, Michael T. Strosaker
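The union-of-instruction-sets idea can be sketched directly with Python sets; the class and instruction names below are hypothetical stand-ins for the patent's physical and virtual processors.

```python
# Virtual processor ISA = union of its physical processors' ISAs; on a
# fault, reassign the task to a physical processor supporting the opcode.

class PhysicalProcessor:
    def __init__(self, name, isa):
        self.name, self.isa = name, set(isa)

def virtual_isa(processors):
    out = set()
    for p in processors:
        out |= p.isa               # union of the physical ISAs
    return out

def reassign(processors, instruction):
    for p in processors:
        if instruction in p.isa:
            return p.name          # supporting physical processor
    raise RuntimeError("no processor supports " + instruction)

cpus = [PhysicalProcessor("p0", {"add", "mul"}),
        PhysicalProcessor("p1", {"add", "vec_mul"})]
target = reassign(cpus, "vec_mul")
```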
-
Publication number: 20120179895
Abstract: Method and apparatus for fast decoding of microinstructions are disclosed. An integrated circuit is disclosed wherein microinstructions are queued for execution in an execution unit having multiple pipelines where each pipeline is configured to execute a set of supported microinstructions. The execution unit receives microinstruction data including an operation code (opcode) or a complex opcode. The execution unit executes the microinstruction multiple times wherein the microinstruction is executed at least once to get an address value and at least once to get a result of an operation. The execution unit processes complex opcodes by utilizing both a load/store support and a simple opcode support by splitting the complex opcode into load/store and simple opcode components and creating an internal source/destination between the two components.
Type: Application
Filed: January 12, 2011
Publication date: July 12, 2012
Applicant: ADVANCED MICRO DEVICES, INC.
Inventors: Ganesh Venkataramanan, Emil Talpes
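A minimal sketch of the splitting described above: a complex load-op opcode becomes a load component and a simple-opcode component joined by an internal temporary. The micro-op tuple encoding and register names are invented.

```python
# Split a complex (load + operate) opcode into two components linked by
# an internal source/destination temporary "t0".

def split_complex(op, dest, src_reg, mem_addr):
    tmp = "t0"                                # internal source/destination
    return [("load", tmp, mem_addr),          # pass 1: address/load value
            (op, dest, src_reg, tmp)]         # pass 2: operation result

uops = split_complex("add", "r1", "r2", 0x40)
```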
-
Patent number: 8214623
Abstract: A method, an apparatus, and computer instructions are provided for extending operations of an application in a data processing system. A primary operation is executed. All extended operations of the primary operation are cached and pre and post operation identifiers are identified. For each pre operation identifier, a pre operation instance is created and executed. For each post operation identifier, a post operation instance is created and executed.
Type: Grant
Filed: August 1, 2008
Date of Patent: July 3, 2012
Assignee: International Business Machines Corporation
Inventors: Daniel Christopher Berg, Charles Dyer Bridgham, Derek Francis Holt, Ritchard Leonard Schacher, Jason Ashley Sholl
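The pre/post extension pattern can be sketched as a small hook registry: registered pre-operations run before the primary operation and post-operations after it. The registry layout and function names are assumptions, not the patent's API.

```python
# Run registered pre-operation instances, then the primary operation,
# then registered post-operation instances, logging each result.

pre_ops, post_ops = [], []
log = []

def register(pre=None, post=None):
    if pre:
        pre_ops.append(pre)
    if post:
        post_ops.append(post)

def run_primary(primary):
    for op in pre_ops:             # pre operation instances
        log.append(op())
    log.append(primary())          # the primary operation itself
    for op in post_ops:            # post operation instances
        log.append(op())

register(pre=lambda: "pre", post=lambda: "post")
run_primary(lambda: "primary")
```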
-
Publication number: 20120166765
Abstract: While fetching the instructions from a loop in program code, a processor calculates a number of times that a backward-branching instruction at the end of the loop will actually be taken when the fetched instructions are executed. Upon determining that the backward-branching instruction has been predicted taken more than the number of times that the branch instruction will actually be taken, the processor immediately commences a mispredict operation for the branch instruction, which comprises: (1) flushing fetched instructions from the loop that will not be executed from the processor, and (2) commencing fetching instructions from an instruction following the branch instruction.
Type: Application
Filed: March 7, 2012
Publication date: June 28, 2012
Applicant: APPLE INC.
Inventor: Jeffry E. Gonion
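A simplified sketch of the early-flush decision: compare the loop's actual trip count (computed at fetch time) with the predictor's iteration count and flush the over-fetched tail. One tag per predicted iteration is an assumption made to keep the model small.

```python
# If the predictor fetched more iterations than the loop will execute,
# split the fetched stream into the kept prefix and the flushed tail.

def fetch_loop(trip_count, predicted_iterations, fetched):
    if predicted_iterations > trip_count:     # over-predicted: mispredict
        kept = fetched[:trip_count]
        flushed = fetched[trip_count:]        # iterations that won't run
        return kept, flushed
    return fetched, []

kept, flushed = fetch_loop(3, 5, ["i0", "i1", "i2", "i3", "i4"])
```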
-
Publication number: 20120144162
Abstract: A runtime system implemented in accordance with the present invention provides an application platform for parallel-processing computer systems. Such a runtime system enables users to leverage the computational power of parallel-processing computer systems to accelerate/optimize numeric and array-intensive computations in their application programs. This enables greatly increased performance of high-performance computing (HPC) applications.
Type: Application
Filed: February 9, 2012
Publication date: June 7, 2012
Inventors: Matthew N. Papakipos, Brian K. Grant, Morgan S. McGuire, Christopher G. Demetriou
-
Publication number: 20120137108
Abstract: The present disclosure relates to placing a Boolean Processor on a chip with memory to eliminate memory latency issues in computing systems. An asynchronous implementation of a Boolean Processor Switched Memory can theoretically operate at terahertz speed and vastly improve the rate at which computationally relevant data is fed to a microprocessor or microcontroller. Boolean Processor Enhanced Memories hold the promise of increasing memory throughput by several orders of magnitude and shifting the burden of "catching up" to microprocessors and microcontrollers.
Type: Application
Filed: May 24, 2011
Publication date: May 31, 2012
Inventor: Kenneth Elmon Koch, III