Scalar/vector Processor Interface Patents (Class 712/3)
-
Publication number: 20140244967Abstract: Techniques are provided for executing a vector alignment instruction. A scalar register file in a first processor is configured to share one or more register values with a second processor, the one or more register values accessed from the scalar register file according to an Rt address specified, in a vector alignment instruction, wherein a start location is determined from one of the shared register values. An alignment circuit in the second processor is configured to align data identified between the start location within a beginning Vu register of a vector register file (VRF) and an end location of a last Vu register of the VRF according to the vector alignment instruction. A store circuit is configured to select the aligned data from the alignment circuit and store the aligned data in the vector register file according to an alignment store address specified by the vector alignment instruction.Type: ApplicationFiled: February 26, 2013Publication date: August 28, 2014Applicant: Qualcomm IncorporatedInventors: Ajay A. Ingle, Marc M. Hoffman, Jose Fridman, Lucian Codrescu
-
Publication number: 20140244968Abstract: A system implementing a method for generating code for execution based on a SIMT model with parallel units of threads is provided. The system identifies a loop within a program that includes vector processing. The system generates instructions for a thread that include an instruction to set a predicate based on whether the thread of a parallel unit corresponds to a vector element. The system also generates instructions to perform the vector processing via scalar operations predicated on the predicate. As a result, the system generates instructions to perform the vector processing but to avoid branch divergence within the parallel unit of threads that would be needed to check whether a thread corresponds to a vector element.Type: ApplicationFiled: February 28, 2013Publication date: August 28, 2014Inventor: Cray Inc.
-
Publication number: 20140201450Abstract: There is provided a system and method for optimizing matrix and vector calculations in instruction limited algorithms that perform EOS calculations. The method includes dividing each matrix associated with an EOS stability equation and an EOS phase split equation into a number of tiles, wherein the tile size is heterogeneous or homogenous. Each vector associated with the EOS stability equation and the EOS phase split equation may be divided into a number of strips. The tiles and strips may be stored in main memory, cache, or registers, and the matrix and vector operations associated with successive substitutions and Newton iterations may be performed in parallel using the tiles and strips.Type: ApplicationFiled: July 23, 2012Publication date: July 17, 2014Inventor: Kjetil B. Haugen
-
Publication number: 20140189287Abstract: In an embodiment, the present invention is directed to a processor including a decode logic to receive a multi-dimensional loop counter update instruction and to decode the multi-dimensional loop counter update instruction into at least one decoded instruction, and an execution logic to execute the at least one decoded instruction to update at least one loop counter value of a first operand associated with the multi-dimensional loop counter update instruction by a first amount. Methods to collapse loops using such instructions are also disclosed. Other embodiments are described and claimed.Type: ApplicationFiled: December 27, 2012Publication date: July 3, 2014Inventors: Mikhail Plotnikov, Andrey Naraikin, Elmoustapha Ould-Ahmed-Vall
-
Patent number: 8738891Abstract: A method for implementing command acceleration. The method includes receiving a first set of instructions from a first processor, wherein the first set of instructions are formatted in accordance with a microarchitecture of the first processor. The first set of instructions are translated into a second set of instructions, wherein the second set of instructions are formatted in accordance with a microarchitecture of a second processor. The second set instructions are then transmitted to the second processor for execution by the second processor.Type: GrantFiled: November 4, 2005Date of Patent: May 27, 2014Assignee: NVIDIA CorporationInventors: Ashish Karandikar, Shirish Gadre, Amir H. Salek
-
Patent number: 8698817Abstract: A video processor for executing video processing operations. The video processor includes a host interface for implementing communication between the video processor and a host CPU. A memory interface is included for implementing communication between the video processor and a frame buffer memory. A scalar execution unit is coupled to the host interface and the memory interface and is configured to execute scalar video processing operations. A vector execution unit is coupled to the host interface and the memory interface and is configured to execute vector video processing operations.Type: GrantFiled: November 4, 2005Date of Patent: April 15, 2014Assignee: Nvidia CorporationInventors: Shirish Gadre, Ashish Karandikar, Stephen D. Lew, Christopher T. Cheng
-
Patent number: 8687008Abstract: A latency tolerant system for executing video processing operations. The system includes a host interface for implementing communication between the video processor and a host CPU, a scalar execution unit coupled to the host interface and configured to execute scalar video processing operations, and a vector execution unit coupled to the host interface and configured to execute vector video processing operations. A command FIFO is included for enabling the vector execution unit to operate on a demand driven basis by accessing the memory command FIFO. A memory interface is included for implementing communication between the video processor and a frame buffer memory. A DMA engine is built into the memory interface for implementing DMA transfers between a plurality of different memory locations and for loading the command FIFO with data and instructions for the vector execution unit.Type: GrantFiled: November 4, 2005Date of Patent: April 1, 2014Assignee: NVIDIA CorporationInventors: Ashish Karandikar, Shirish Gadre, Stephen D. Lew
-
Patent number: 8683178Abstract: The described embodiments provide a processor that executes vector instructions. In the described embodiments, the processor initializes an architectural fault-status register (FSR) and a shadow copy of the architectural FSR by setting each of N bit positions in the architectural FSR and the shadow copy of the architectural FSR to a first predetermined value. The processor then executes a first first-faulting or non-faulting (FF/NF) vector instruction. While executing the first vector instruction, the processor also executes one or more subsequent FF/NF instructions. In these embodiments, when executing the first vector instruction and the subsequent vector instructions, the processor updates one or more bit positions in the shadow copy of the architectural FSR to a second predetermined value upon encountering a fault condition.Type: GrantFiled: April 20, 2011Date of Patent: March 25, 2014Assignee: Apple Inc.Inventor: Jeffry E. Gonion
-
Patent number: 8683184Abstract: A method for implementing multi context execution on a video processor having a scalar execution unit and a vector execution unit. The method includes allocating a first task to a vector execution unit and allocating a second task to the vector execution unit. The first task is from a first context in the second task is from a second context. The method further includes interleaving a plurality of work packages comprising the first task and the second task to generate a combined work package stream. The combined work package stream is subsequently executed on the vector execution unit.Type: GrantFiled: November 4, 2005Date of Patent: March 25, 2014Assignee: Nvidia CorporationInventors: Stephen D. Lew, Ashish Karandikar, Shirish Gadre, Franciscus W. Sijstermans
-
Patent number: 8649508Abstract: A system and method for implementing the Elliptic Curve scalar multiplication method in cryptography, where the Double Base Number System is expressed in decreasing order of exponents and further on using it to determine Elliptic curve scalar multiplication over a finite elliptic curve.Type: GrantFiled: September 29, 2008Date of Patent: February 11, 2014Assignee: Tata Consultancy Services Ltd.Inventor: Natarajan Vijayarangan
-
Publication number: 20140040594Abstract: A programmable device suitable for software defined radio terminal is disclosed. In one aspect, the device includes a scalar cluster providing a scalar data path and a scalar register file and arranged for executing scalar instructions. The device may further include at least two interconnected vector clusters connected with the scalar cluster. Each of the at least two vector clusters provides a vector data path and a vector register file and is arranged for executing at least one vector instruction different from vector instructions performed by any other vector cluster of the at least two vector clusters.Type: ApplicationFiled: October 2, 2013Publication date: February 6, 2014Applicants: Samsung Electronics, IMECInventors: Bruno Bougard, Thomas Schuster
-
Patent number: 8627042Abstract: Mechanisms for performing data parallel function calls in code during runtime are provided. These mechanisms may operate to execute, in the processor, a portion of code having a data parallel function call to a target portion of code. The mechanisms may further operate to determine, at runtime by the processor, whether the target portion of code is a data parallel portion of code or a scalar portion of code and determine whether the calling code is data parallel code or scalar code. Moreover, the mechanisms may operate to execute the target portion of code based on the determination of whether the target portion of code is a data parallel portion of code or a scalar portion of code, and the determination of whether the calling code is data parallel code or scalar code.Type: GrantFiled: December 30, 2009Date of Patent: January 7, 2014Assignee: International Business Machines CorporationInventors: Alexandre E. Eichenberger, Brian K. Flachs, Charles R. Johns, Mark R. Nutter
-
Patent number: 8627043Abstract: Mechanisms for performing data parallel function calls in code during runtime are provided. These mechanisms may operate to execute, in the processor, a portion of code having a data parallel function call to a target portion of code. The mechanisms may further operate to determine, at runtime by the processor, whether the target portion of code is a data parallel portion of code or a scalar portion of code and determine whether the calling code is data parallel code or scalar code. Moreover, the mechanisms may operate to execute the target portion of code based on the determination of whether the target portion of code is a data parallel portion of code or a scalar portion of code, and the determination of whether the calling code is data parallel code or scalar code.Type: GrantFiled: March 26, 2012Date of Patent: January 7, 2014Assignee: International Business Machines CorporationInventors: Alexandre E. Eichenberger, Brian K. Flachs, Charles R. Johns, Mark R. Nutter
-
Publication number: 20140006748Abstract: A reconfigurable vector processor is described that allows the size of its vector units to be changed in order to process vectors of different sizes. The reconfigurable vector processor comprises a plurality of processor units. Each of the processor units comprises a control unit for decoding instructions and generating control signals, a scalar unit for processing instructions on scalar data, and a vector unit for processing instructions on vector data under control of control signals. The reconfigurable vector processor architecture also comprises a vector control selector for selectively providing control signals generated by one processor unit of the plurality of processor units to the vector unit of a different processor unit of the plurality of processor units.Type: ApplicationFiled: January 25, 2011Publication date: January 2, 2014Applicant: COGNIVUE CORPORATIONInventors: Malcolm Stewart, Ali Osman Ors, Daniel Laroche
-
Patent number: 8549256Abstract: Methods and apparatus relating to a tightly coupled scalar and Boolean processor are described. In an embodiment, a Boolean unit may include a result vector subunit. The result vector subunit may be controlled by an instruction flow that is managed by a scalar unit. Other embodiments are also disclosed.Type: GrantFiled: January 15, 2007Date of Patent: October 1, 2013Assignee: Intel CorporationInventor: Charles Narad
-
Patent number: 8510534Abstract: A scalar/vector processor includes a plurality of functional units (252, 260, 262, 264, 266, 268, 270). At least one of the functional units includes a vector section (210) for operating on at least one vector and a scalar section (220) for operating on at least one scalar. The vector section and scalar section of the functional unit co-operate by the scalar section being arranged to provide and/or consume at least one scalar required by and/or supplied by the vector section of the functional unit.Type: GrantFiled: May 22, 2003Date of Patent: August 13, 2013Assignee: ST-Ericsson SAInventors: Cornelis Hermanus Van Berkel, Patrick Peter Elizabeth Meuwissen, Nur Engin
-
Publication number: 20130185539Abstract: A processor includes a scalar processor core and a vector coprocessor core coupled to the scalar processor core. The scalar processor core is configured to retrieve an instruction stream from program storage, and pass vector instructions in the instruction stream to the vector coprocessor core. The vector coprocessor core includes a register file, a plurality of execution units, and a table lookup unit. The register file includes a plurality of registers. The execution units are arranged in parallel to process a plurality of data values. The execution units are coupled to the register file. The table lookup unit is coupled to the register file in parallel with the execution units. The table lookup unit is configured to retrieve table values from one or more lookup tables stored in memory by executing table lookup vector instructions in a table lookup loop.Type: ApplicationFiled: July 13, 2012Publication date: July 18, 2013Applicant: TEXAS INSTRUMENTS INCORPORATEDInventors: Ching-Yu HUNG, Shinri INAMORI, Jagadeesh SANKARAN, Peter CHANG
-
Publication number: 20130185538Abstract: A processor includes a scalar processor core and a vector coprocessor core coupled to the scalar processor core. The scalar processor core is configured to retrieve an instruction stream from program storage. The instruction stream includes scalar instructions executable by the scalar processor core and vector instructions executable by the vector coprocessor core. The scalar processor core is configured to pass the vector instructions to the vector coprocessor core. The vector coprocessor core configured to process a plurality of data values in parallel while executing each vector instruction passed by the scalar processor core. The vector coprocessor core includes a plurality of processing paths arranged in parallel to process the data values. Each of the processing paths includes an execution unit. Each of the execution units is configured to communicate a result of processing to each other of the execution units.Type: ApplicationFiled: July 13, 2012Publication date: July 18, 2013Applicant: TEXAS INSTRUMENTS INCORPORATEDInventors: Ching-Yu Hung, Shinri Inamori, Jagadeesh Sankaran, Peter Chang
-
Publication number: 20130173884Abstract: A programmable device suitable for software defined radio terminal is disclosed. In one aspect, the device includes a scalar cluster providing a scalar data path and a scalar register file and arranged for executing scalar instructions. The device may further include at least two interconnected vector clusters connected with the scalar cluster. Each of the at least two vector clusters provides a vector data path and a vector register file and is arranged for executing at least one vector instruction different from vector instructions performed by any other vector cluster of the at least two vector clusters.Type: ApplicationFiled: December 7, 2012Publication date: July 4, 2013Applicants: Samsung Electronics, ImecInventors: Imec, Samsung Electronics
-
Publication number: 20130159665Abstract: A data processing element includes an input unit configured to provide instructions for scalar, vector and array processing, and a scalar processing unit configured to provide a scalar pipeline datapath for processing a scalar quantity. Additionally, the data processing element includes a vector processing unit coupled to the scalar processing unit and configured to provide a vector pipeline datapath employing a vector register for processing a one-dimensional vector quantity. The data processing element further includes an array processing unit coupled to the vector processing unit and configured to provide an array pipeline datapath employing a parallel processing structure for processing a two-dimensional vector quantity. A method of operating a data processing element and a MIMO receiver employing a data processing element are also provided.Type: ApplicationFiled: December 15, 2011Publication date: June 20, 2013Applicant: Verisilicon Holdings Co., Ltd.Inventor: Asheesh Kashyap
-
Patent number: 8370817Abstract: A mechanism is provided for optimizing scalar code executed on a single instruction multiple data (SIMD) engine by aligning the slots of SIMD registers. With the mechanism, a compiler is provided that parses source code and, for each statement in the program, generates an expression tree. The compiler inspects all storage inputs to scalar operations in the expression tree to determine their alignment in the SIMD registers. This alignment is propagated up the expression tree from the leaves. When the alignments of two operands in the expression tree are the same, the resulting alignment is the shared value. When the alignments of two operands in the expression tree are different, one operand is shifted. For shifted operands, a shift operation is inserted in the expression tree. The executable code is then generated for the expression tree and shifts are inserted where indicated.Type: GrantFiled: May 27, 2008Date of Patent: February 5, 2013Assignee: International Business Machines CorporationInventors: Alexandre E. Eichenberger, John Kevin Patrick O'Brien
-
Publication number: 20130024652Abstract: Various methods and systems are provided for processing units that may be scaled. In one embodiment, a processing unit includes a plurality of scalar processing units and a vector processing unit in communication with each of the plurality of scalar processing units. The vector processing unit is configured to coordinate execution of instructions received from the plurality of scalar processing units. In another embodiment, a scalar instruction packet including a pre-fix instruction and a vector instruction packet including a vector instruction is obtained. Execution of the vector instruction may be modified by the pre-fix instruction in a processing unit including a vector processing unit. In another embodiment, a scalar instruction packet including a plurality of partitions is obtained. The location of the partitions is determined based upon a partition indicator included in the scalar instruction packet and a scalar instruction included in a partition is executed by a processing unit.Type: ApplicationFiled: September 20, 2011Publication date: January 24, 2013Applicant: BROADCOM CORPORATIONInventors: Neil Bailey, Eben Upton
-
Patent number: 8296457Abstract: Methods, apparatus, and products are disclosed for providing nearest neighbor point-to-point communications among compute nodes of an operational group in a global combining network of a parallel computer, each compute node connected to each adjacent compute node in the global combining network through a link, that include: identifying each link in the global combining network for each compute node of the operational group; designating one of a plurality of point-to-point class routing identifiers for each link such that no compute node in the operational group is connected to two adjacent compute nodes in the operational group with links designated for the same class routing identifiers; and configuring each compute node of the operational group for point-to-point communications with each adjacent compute node in the global combining network through the link between that compute node and that adjacent compute node using that link's designated class routing identifier.Type: GrantFiled: August 2, 2007Date of Patent: October 23, 2012Assignee: International Business Machines CorporationInventors: Charles J. Archer, Ahmad A. Faraj, Todd A. Inglett, Joseph D. Ratterman
-
Publication number: 20120260061Abstract: A data processing apparatus having processing circuitry, a scalar register bank and a vector register bank, including decoding circuitry arranged to decode a sequence of instructions to generate control signals for the processing circuitry. The decoding circuitry is responsive to a decode modifier instruction within the sequence of instructions to alter decoding of a subsequent scalar instruction in the sequence by mapping at least one scalar operand specified by the subsequent scalar instruction to at least one vector operand in the vector register bank, and, in dependence on the scalar operation specified by the subsequent scalar instruction, determining a vector operation to be performed on at least a subset of the operand elements within the at least the one vector operand. Such an approach enables a wide variety of vector operations to be specified without the need to individually define separate vector instructions for those vector operations.Type: ApplicationFiled: April 4, 2012Publication date: October 11, 2012Applicant: ARM LIMITEDInventor: Alastair David Reid
-
Patent number: 8280826Abstract: In one embodiment, the present invention includes a method for identifying a deformable object of a scene of a computer game that is visible by an artificial intelligence (AI) character of the game, requesting a speculative physics simulation associated with the deformable object to determine a result of an action to the deformable object by the AI character, and selecting an action to be performed by the AI character, where the selection is based at least in part on the speculative physics simulation. Other embodiments are described and claimed.Type: GrantFiled: October 10, 2011Date of Patent: October 2, 2012Assignee: Intel CorporationInventors: David Putzolu, Aaron Kunze, Teresa Morrison
-
Patent number: 8203567Abstract: A graphics processing method and apparatus described herein is capable of converting graphics processing of a window system into a vector-based application program interface (API) format usable in the GPU and performing the converted graphics processing in the GPU. For example, the vector-based API may be based on an OpenVG standard or an EGL standard.Type: GrantFiled: July 3, 2009Date of Patent: June 19, 2012Assignee: Samsung Electronics Co., Ltd.Inventors: Dong-kyun Jeong, Soo-chan Lim, Na-min Kim
-
Patent number: 8190830Abstract: A memory controller may execute instructions instead of sending the instructions to a processor for execution. To maintain synchronization between the memory controller and the processor, the memory controller may queue a null instruction in the memory controller for each non-filler instruction sent to the processor and may send a filler instruction to the processor for each non-null instruction to be executed by the memory controller.Type: GrantFiled: March 10, 2006Date of Patent: May 29, 2012Assignee: Intel CorporationInventor: Gurumurthy Rajaram
-
Patent number: 8190854Abstract: A method of processing data is disclosed that includes performing a fetch of a plurality of instructions from a memory unit. The method also includes grouping the plurality of instructions into packets of instructions of different types for parallel execution by a plurality of instruction execution units. The packets of instructions include a first instruction and a second instruction. The method includes using a combined scalar and vector condition code register to execute the first instruction for a compare operation and the second instruction for a conditional operation using the combined scalar and vector condition code register. The method also includes when the compare operation is a scalar compare operation, receiving a scalar compare instruction for the scalar compare operation at an instruction executing unit and storing results of the scalar compare operation in the combined scalar and vector condition code register.Type: GrantFiled: January 20, 2010Date of Patent: May 29, 2012Assignee: QUALCOMM IncorporatedInventors: Lucian Codrescu, Erich J. Plondke, Taylor Simpson
-
Publication number: 20120124332Abstract: A vector processing circuit includes a vector register file including a plurality of array elements, a command issuance control circuit, and a plurality of pipeline arithmetic units. Each pipeline arithmetic unit performs arithmetic processing of data stored in the array elements indicated as a source by one command in parts through a plurality of cycles and stores the result in the array elements indicated as a destination by the one command through a plurality of cycles. When data word length of a preceding command is longer than that of a subsequent command, the command issuance control circuit changes data sizes of the array elements in accordance with data word length of the command and determines whether there is register interference between the array element to be processed at a non-head cycle of the preceding command, and the array element to be processed at a head cycle of the subsequent command.Type: ApplicationFiled: October 24, 2011Publication date: May 17, 2012Applicant: FUJITSU LIMITEDInventors: GE Yi, Yoshimasa Takebe, Hiromasa Takahashi
-
Patent number: 8090928Abstract: In one embodiment of the present invention, a processor includes a scalar computation unit; a vector co-processor coupled to the scalar computation unit; and one or more function-specific engines coupled to the scalar computation unit, where the engines are adapted to minimize data exchange penalties by processing small in-out bit slices.Type: GrantFiled: June 28, 2002Date of Patent: January 3, 2012Assignee: Intellectual Ventures I LLCInventors: Dominik J. Schmidt, Robert Warren Sherburne, Jr.
-
Publication number: 20110307684Abstract: An image processing system including a vector processor and a memory adapted for attaching to the vector processor. The memory is adapted to store multiple image frames. The vector processor includes an address generator operatively attached to the memory to access the memory. The address generator is adapted for calculating addresses of the memory over the multiple image frames. The addresses may be calculated over the image frames based upon an image parameter. The image parameter may specify which of the image frames are processed simultaneously. A scalar processor may be attached to the vector processor. The scalar processor provides the image parameter(s) to the address generator for address calculation over the multiple image frames. An input register may be attached to the vector processor. The input register may be adapted to receive a very long instruction word (VLIW) instruction.Type: ApplicationFiled: June 10, 2010Publication date: December 15, 2011Inventors: Yosef Kreinin, Gil Dogon, Emmanuel Sixsou, Yosi Arbeli, Mois Navon, Roman Sajman
-
Patent number: 8078804Abstract: A data cache memory coupled to a processor including processor clusters are adapted to operate simultaneously on scalar and vectorial data by providing data locations in the data cache memory for storing data for processing. The data locations are accessed either in a scalar mode or in a vectorial mode. This is done by explicitly mapping the data locations that are scalar and the data locations that are vectorial.Type: GrantFiled: June 26, 2007Date of Patent: December 13, 2011Assignees: STMicroelectronics S.r.l., STMicroelectronics N.V.Inventors: Francesco Pappalardo, Giuseppe Notarangelo, Elena Salurso, Elio Guidetti
-
Patent number: 8069124Abstract: In one embodiment, the present invention includes a method for identifying a deformable object of a scene of a computer game that is visible by an artificial intelligence (AI) character of the game, requesting a speculative physics simulation associated with the deformable object to determine a result of an action to the deformable object by the AI character, and selecting an action to be performed by the AI character, where the selection is based at least in part on the speculative physics simulation. Other embodiments are described and claimed.Type: GrantFiled: March 26, 2008Date of Patent: November 29, 2011Assignee: Intel CorporationInventors: David Putzolu, Aaron Kunze, Teresa Morrison
-
Publication number: 20110161623Abstract: Mechanisms for performing data parallel function calls in code during runtime are provided. These mechanisms may operate to execute, in the processor, a portion of code having a data parallel function call to a target portion of code. The mechanisms may further operate to determine, at runtime by the processor, whether the target portion of code is a data parallel portion of code or a scalar portion of code and determine whether the calling code is data parallel code or scalar code. Moreover, the mechanisms may operate to execute the target portion of code based on the determination of whether the target portion of code is a data parallel portion of code or a scalar portion of code, and the determination of whether the calling code is data parallel code or scalar code.Type: ApplicationFiled: December 30, 2009Publication date: June 30, 2011Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Alexandre E. Eichenberger, Brian K. Flachs, Charles R. Johns, Mark R. Nutter
-
Patent number: 7971197Abstract: A digital computer system automatically creates an Instruction Set Architecture (ISA) that potentially exploits VLIW instructions, vector operations, fused operations, and specialized operations with the goal of increasing the performance of a set of applications while keeping hardware cost below a designer specified limit, or with the goal of minimizing hardware cost given a required level of performance.Type: GrantFiled: August 18, 2005Date of Patent: June 28, 2011Assignee: Tensilica, Inc.Inventors: David William Goodwin, Dror Maydan, Ding-Kai Chen, Darin Stamenov Petkov, Steven Weng-Kiang Tjiang, Peng Tu, Christopher Rowen
-
Publication number: 20110145543Abstract: A processing unit executes a vector width instruction in a program and the processing unit obtains and supplies the width of an appropriate vector register that will be used to process variable vector processing instructions. Then, when the processing unit executes variable vector processing instructions in the program, the processing unit processes the variable vector processing instructions using the appropriate vector register with the instructions having the same width as the appropriate vector register. The width that the processing unit obtains may be less than an actual width of the appropriate vector register and may set by the processing unit. In this way, many different vector widths can be supported using a single set of instructions for vector processing. New instructions are not required if vector widths are changed and processing units having vector registers of differing widths do not require different code.Type: ApplicationFiled: December 15, 2009Publication date: June 16, 2011Applicant: Sun Microsystems, Inc.Inventor: Peter Carl Damron
-
Patent number: 7908460Abstract: A method and apparatus for obtaining a scalar value from a vector register for use in a mixed vector and scalar instruction, including providing a vector in a vector register file, and embedding a location identifier of the scalar value within the vector in the bits defining the mixed vector and scalar instruction. The scalar value can be used directly from the vector register without the need to load the scalar to a scalar register prior to executing the instruction. The scalar location identifier may be embedded in the secondary op code of the instruction, or the instruction may have dedicated bits for providing the location of the scalar within the vector.Type: GrantFiled: May 3, 2010Date of Patent: March 15, 2011Assignee: Nintendo Co., Ltd.Inventors: Yu-Chung C. Liao, Peter A. Sandon, Howard Cheng, Timothy J. Van Hook
-
Publication number: 20110055517Abstract: A structure (and method) including a plurality of coprocessing units and a controller that selectively loads data for processing on the plurality of coprocessing units, using a compound loading instruction. The compound loading instruction includes a plurality of low-level software instructions that preliminarily processes input data in a manner predetermined to simulate an effect of a single hardware loading instruction that would provide optimal loading of complex matrix data by loading input data in accordance with the effect of multiplying i·i=?1.Type: ApplicationFiled: August 26, 2009Publication date: March 3, 2011Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Alexandre E. Eichenberger, Michael Karl Gschwind, John A. Gunnels, Fred Gehrung Gustavson, Brett Olsson
-
Patent number: 7895411Abstract: One embodiment of the invention sets forth a hardware-based physics processing unit (PPU) having unique architecture designed to efficiently generate physics data. The PPU includes a PPU control engine (PCE), a data movement engine and a floating point engine (FPE). The PCE manages the overall operation of the PPU by allocating memory resources and transmitting graphics processing commands to the FPE and data movement commands to the DME. The FPE includes multiple vector processors that operate in parallel and perform floating point operations on data received from a host unit to generate physics simulation data. The DME facilitates the transmission of data between the host unit and the FPE by performs data movement operations between memories internal and external to the PPU.Type: GrantFiled: November 19, 2003Date of Patent: February 22, 2011Assignee: NVIDIA CorporationInventors: Monier Maher, Otto A. Schmid, Curtis Davis, Manju Hegde, Jean Pierre Bordes
-
Patent number: 7877582Abstract: A single register file may be addressed using both scalar and SIMD instructions. That is, subsets of registers within a multi-addressable register file according to the illustrative embodiments, are addressable with different instruction forms, e.g., scalar instructions, SIMD instructions, etc., while the entire set of registers may be addressed with yet another form of instructions, referred to herein as Vector-Scalar Extension (VSX) instructions. The operation set that may be performed on the entire set of registers using the VSX instruction form is substantially similar to that of the operation sets of the subsets of registers. Such an arrangement allows legacy instructions to access subsets of registers within the multi-addressable register file while new instructions, i.e. the VSX instructions, may access the entire range of registers within the multi-addressable register file.Type: GrantFiled: January 31, 2008Date of Patent: January 25, 2011Assignee: International Business Machines CorporationInventors: Michael K. Gschwind, Brett Olsson
-
Publication number: 20110010522Abstract: A multiprocessor computer system includes a plurality of processor nodes coupled by a direct processor interconnect network, and a plurality of processor nodes coupled by an indirect processor interconnect network. A bridge directly couples the direct processor interconnect network and the indirect processor interconnect network.Type: ApplicationFiled: June 11, 2010Publication date: January 13, 2011Applicant: Cray Inc.Inventors: Dennis C. Abts, Peter M. Klausler, James Nowicki
-
Publication number: 20100312988Abstract: A data processing apparatus and method and provided for handling vector instructions. The data processing apparatus has a register data store with a plurality of registers arranged to store data elements. A vector processing unit is then used to execute a sequence of vector instructions, with the vector processing unit having a plurality of lanes of parallel processing and having access to the register data store in order to read data elements from, and write data elements to, the register data store during the execution of the sequence of vector instructions. A skip indication storage maintains a skip indicator for each of the lanes of parallel processing. The vector processing unit is responsive to a vector skip instruction to perform an update operation to set within the skip indication storage the skip indicator for a determined one or more lanes.Type: ApplicationFiled: January 19, 2010Publication date: December 9, 2010Applicant: ARM LIMITEDInventors: Andreas BJĂ–RKLUND, Erik Persson, Ola Hugosson
-
Patent number: 7831804Abstract: A processor architecture includes a number of processing elements for treating input signals. The architecture is organized according to a matrix including rows and columns, the columns of which each include at least one microprocessor block having a computational part and a set of associated processing elements that are able to receive the same input signals. The number of associated processing elements is selectively variable in the direction of the column so as to exploit the parallelism of said signals. Additionally the processor architecture of the present invention enable dynamic switching between instruction parallelism and data parallel processing typical of vectorial functionality. The architecture can be scaled in various dimensions in an optimal configuration for the algorithm to be executed.Type: GrantFiled: May 30, 2008Date of Patent: November 9, 2010Assignee: ST Microelectronics S.R.L.Inventors: Francesco Pappalardo, Giuseppe Notarangelo, Elio Guidetti
-
Publication number: 20100235607Abstract: A processor includes a setting register in which a mode is set, a general-purpose register including a preferred slot used during scalar computing and a slot not used during the scalar computing, a selector configured to select and output data of a register designated by a mode set in the setting register during the scalar computing, and a computing unit configured to execute the scalar computing using the preferred slot of the general-purpose register and store computing result data of the scalar computing in the preferred slot of the general-purpose register. The data of the register output from the selector is stored in the slot of the general-purpose register.Type: ApplicationFiled: March 2, 2010Publication date: September 16, 2010Applicant: Kabushiki Kaisha ToshibaInventors: Hiroaki Sugita, Seiji Maeda, Tatsuya Mizutani
-
Publication number: 20100217954Abstract: A method and apparatus for obtaining a scalar value from a vector register for use in a mixed vector and scalar instruction, including providing a vector in a vector register file, and embedding a location identifier of the scalar value within the vector in the bits defining the mixed vector and scalar instruction. The scalar value can be used directly from the vector register without the need to load the scalar to a scalar register prior to executing the instruction. The scalar location identifier may be embedded in the secondary op code of the instruction, or the instruction may have dedicated bits for providing the location of the scalar within the vector.Type: ApplicationFiled: May 3, 2010Publication date: August 26, 2010Applicant: Nintendo Co., Ltd.,Inventors: Yu-Chung C. Liao, Peter A. Sandon, Howard Cheng, Timothy J.. Van Hook
-
Publication number: 20100186006Abstract: A programmable device suitable for software defined radio terminal is disclosed. In one aspect, the device includes a scalar cluster providing a scalar data path and a scalar register file and arranged for executing scalar instructions. The device may further include at least two interconnected vector clusters connected with the scalar cluster. Each of the at least two vector clusters provides a vector data path and a vector register file and is arranged for executing at least one vector instruction different from vector instructions performed by any other vector cluster of the at least two vector clusters.Type: ApplicationFiled: December 17, 2009Publication date: July 22, 2010Applicants: IMEC, Samsung ElectronicsInventors: Bruno Bougard, Thomas Schuster
-
Patent number: 7739479Abstract: A method of providing physics data within a game program or simulation using a hardware-based physics processing unit having unique architecture designed to efficiently calculate physics related data.Type: GrantFiled: November 19, 2003Date of Patent: June 15, 2010Assignee: NVIDIA CorporationInventors: Jean Pierre Bordes, Curtis Davis, Monier Maher, Manju Hegde, Otto A. Schmid
-
Patent number: 7739480Abstract: A method and apparatus for obtaining a scalar value from a vector register for use in a mixed vector and scalar instruction, including providing a vector in a vector register file, and embedding a location identifier of the scalar value within the vector in the bits defining the mixed vector and scalar instruction. The scalar value can be used directly from the vector register without the need to load the scalar to a scalar register prior to executing the instruction. The scalar location identifier may be embedded in the secondary op code of the instruction, or the instruction may have dedicated bits for providing the location of the scalar within the vector.Type: GrantFiled: January 11, 2005Date of Patent: June 15, 2010Assignee: Nintendo Co., Ltd.Inventors: Yu-Chung C. Liao, Peter A. Sandon, Howard Cheng, Timothy J. Van Hook
-
Publication number: 20100118852Abstract: A method of processing data is disclosed that includes performing a fetch of a plurality of instructions from a memory unit. The method also includes grouping the plurality of instructions into packets of instructions of different types for parallel execution by a plurality of instruction execution units. The packets of instructions include a first instruction and a second instruction. The method includes using a combined scalar and vector condition code register to execute the first instruction for a compare operation and the second instruction for a conditional operation using the combined scalar and vector condition code register. The method also includes when the compare operation is a scalar compare operation, receiving a scalar compare instruction for the scalar compare operation at an instruction executing unit and storing results of the scalar compare operation in the combined scalar and vector condition code register.Type: ApplicationFiled: January 20, 2010Publication date: May 13, 2010Applicant: QUALCOMM INCORPORATEDInventors: Lucian Codrescu, Erich J. Plondke, Taylor Simpson
-
Publication number: 20100115228Abstract: A multiprocessor computer system has a plurality of first processors having a first addressable memory space, and a plurality of second processors having a second addressable memory space. The second addressable memory space is of a different size than the first addressable memory space, and the first addressable memory space and second addressable memory space comprise a part of the same common address space.Type: ApplicationFiled: October 31, 2008Publication date: May 6, 2010Applicant: CRAY INC.Inventors: Michael Parker, Timothy J. Johnson, Laurence S. Kaplan, Steven L. Scott, Robert Alverson, Skef Iterum