Scalar/vector Processor Interface Patents (Class 712/3)
  • Publication number: 20140244967
    Abstract: Techniques are provided for executing a vector alignment instruction. A scalar register file in a first processor is configured to share one or more register values with a second processor, the one or more register values accessed from the scalar register file according to an Rt address specified in a vector alignment instruction, wherein a start location is determined from one of the shared register values. An alignment circuit in the second processor is configured to align data identified between the start location within a beginning Vu register of a vector register file (VRF) and an end location of a last Vu register of the VRF according to the vector alignment instruction. A store circuit is configured to select the aligned data from the alignment circuit and store the aligned data in the vector register file according to an alignment store address specified by the vector alignment instruction.
    Type: Application
    Filed: February 26, 2013
    Publication date: August 28, 2014
    Applicant: Qualcomm Incorporated
    Inventors: Ajay A. Ingle, Marc M. Hoffman, Jose Fridman, Lucian Codrescu
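    Illustrative sketch (not part of the patent record): one plausible reading of the alignment step in the abstract above is a sliding window over two adjacent vector registers, with the window start taken from a shared scalar register. The names `valign`, `vu`, `vv`, and `rt`, and the choice to reduce the scalar modulo the register width, are assumptions made for illustration only.

```python
def valign(vu, vv, rt):
    """Model a vector alignment: take len(vu) elements from vu followed by vv.

    vu, vv -- equal-length sequences standing in for two vector registers
    rt     -- integer standing in for the shared scalar register value
    """
    if not vu or len(vu) != len(vv):
        raise ValueError("registers must be non-empty and the same width")
    width = len(vu)
    # Assumption: only the low-order part of the scalar selects the start.
    start = rt % width
    window = list(vu) + list(vv)
    # The aligned result spans from `start` in the first register
    # into the second register, and is exactly one register wide.
    return window[start:start + width]


if __name__ == "__main__":
    print(valign([0, 1, 2, 3], [4, 5, 6, 7], 1))
```

    In this model a start of 0 simply returns the first register, and larger starts pull progressively more elements from the second one.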
  • Publication number: 20140244968
    Abstract: A system implementing a method for generating code for execution based on a SIMT model with parallel units of threads is provided. The system identifies a loop within a program that includes vector processing. The system generates instructions for a thread that include an instruction to set a predicate based on whether the thread of a parallel unit corresponds to a vector element. The system also generates instructions to perform the vector processing via scalar operations predicated on the predicate. As a result, the system generates instructions to perform the vector processing but to avoid branch divergence within the parallel unit of threads that would be needed to check whether a thread corresponds to a vector element.
    Type: Application
    Filed: February 28, 2013
    Publication date: August 28, 2014
    Applicant: Cray Inc.
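    Illustrative sketch (not part of the patent record): the predication idea in the abstract above can be simulated in plain Python. Every lane of a parallel unit runs the same instruction sequence, and a per-lane predicate decides whether the result is kept. The function name `simt_vector_add`, the `unit_width` parameter, and the clamped index used for inactive lanes are hypothetical choices for this sketch, not details from the application.

```python
def simt_vector_add(a, b, unit_width=4):
    """Add two equal-length lists using fixed-width units of simulated threads."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    n = len(a)
    out = [0] * n
    num_units = (n + unit_width - 1) // unit_width
    for unit in range(num_units):
        for lane in range(unit_width):
            i = unit * unit_width + lane
            # Set the predicate: does this thread correspond to a vector element?
            pred = i < n
            # Inactive lanes still execute the scalar operation, on a safe index,
            # so all lanes follow one instruction path.
            safe = i if pred else 0
            value = a[safe] + b[safe]
            # This `if` models a predicated store in the simulation,
            # not a divergent branch in the generated code.
            if pred:
                out[i] = value
    return out


if __name__ == "__main__":
    print(simt_vector_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))
```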
  • Publication number: 20140201450
    Abstract: There is provided a system and method for optimizing matrix and vector calculations in instruction limited algorithms that perform EOS calculations. The method includes dividing each matrix associated with an EOS stability equation and an EOS phase split equation into a number of tiles, wherein the tile size is heterogeneous or homogenous. Each vector associated with the EOS stability equation and the EOS phase split equation may be divided into a number of strips. The tiles and strips may be stored in main memory, cache, or registers, and the matrix and vector operations associated with successive substitutions and Newton iterations may be performed in parallel using the tiles and strips.
    Type: Application
    Filed: July 23, 2012
    Publication date: July 17, 2014
    Inventor: Kjetil B. Haugen
  • Publication number: 20140189287
    Abstract: In an embodiment, the present invention is directed to a processor including a decode logic to receive a multi-dimensional loop counter update instruction and to decode the multi-dimensional loop counter update instruction into at least one decoded instruction, and an execution logic to execute the at least one decoded instruction to update at least one loop counter value of a first operand associated with the multi-dimensional loop counter update instruction by a first amount. Methods to collapse loops using such instructions are also disclosed. Other embodiments are described and claimed.
    Type: Application
    Filed: December 27, 2012
    Publication date: July 3, 2014
    Inventors: Mikhail Plotnikov, Andrey Naraikin, Elmoustapha Ould-Ahmed-Vall
  • Patent number: 8738891
    Abstract: A method for implementing command acceleration. The method includes receiving a first set of instructions from a first processor, wherein the first set of instructions are formatted in accordance with a microarchitecture of the first processor. The first set of instructions are translated into a second set of instructions, wherein the second set of instructions are formatted in accordance with a microarchitecture of a second processor. The second set of instructions are then transmitted to the second processor for execution by the second processor.
    Type: Grant
    Filed: November 4, 2005
    Date of Patent: May 27, 2014
    Assignee: NVIDIA Corporation
    Inventors: Ashish Karandikar, Shirish Gadre, Amir H. Salek
  • Patent number: 8698817
    Abstract: A video processor for executing video processing operations. The video processor includes a host interface for implementing communication between the video processor and a host CPU. A memory interface is included for implementing communication between the video processor and a frame buffer memory. A scalar execution unit is coupled to the host interface and the memory interface and is configured to execute scalar video processing operations. A vector execution unit is coupled to the host interface and the memory interface and is configured to execute vector video processing operations.
    Type: Grant
    Filed: November 4, 2005
    Date of Patent: April 15, 2014
    Assignee: Nvidia Corporation
    Inventors: Shirish Gadre, Ashish Karandikar, Stephen D. Lew, Christopher T. Cheng
  • Patent number: 8687008
    Abstract: A latency tolerant system for executing video processing operations. The system includes a host interface for implementing communication between the video processor and a host CPU, a scalar execution unit coupled to the host interface and configured to execute scalar video processing operations, and a vector execution unit coupled to the host interface and configured to execute vector video processing operations. A command FIFO is included for enabling the vector execution unit to operate on a demand driven basis by accessing the memory command FIFO. A memory interface is included for implementing communication between the video processor and a frame buffer memory. A DMA engine is built into the memory interface for implementing DMA transfers between a plurality of different memory locations and for loading the command FIFO with data and instructions for the vector execution unit.
    Type: Grant
    Filed: November 4, 2005
    Date of Patent: April 1, 2014
    Assignee: NVIDIA Corporation
    Inventors: Ashish Karandikar, Shirish Gadre, Stephen D. Lew
  • Patent number: 8683178
    Abstract: The described embodiments provide a processor that executes vector instructions. In the described embodiments, the processor initializes an architectural fault-status register (FSR) and a shadow copy of the architectural FSR by setting each of N bit positions in the architectural FSR and the shadow copy of the architectural FSR to a first predetermined value. The processor then executes a first first-faulting or non-faulting (FF/NF) vector instruction. While executing the first vector instruction, the processor also executes one or more subsequent FF/NF instructions. In these embodiments, when executing the first vector instruction and the subsequent vector instructions, the processor updates one or more bit positions in the shadow copy of the architectural FSR to a second predetermined value upon encountering a fault condition.
    Type: Grant
    Filed: April 20, 2011
    Date of Patent: March 25, 2014
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Patent number: 8683184
    Abstract: A method for implementing multi context execution on a video processor having a scalar execution unit and a vector execution unit. The method includes allocating a first task to a vector execution unit and allocating a second task to the vector execution unit. The first task is from a first context and the second task is from a second context. The method further includes interleaving a plurality of work packages comprising the first task and the second task to generate a combined work package stream. The combined work package stream is subsequently executed on the vector execution unit.
    Type: Grant
    Filed: November 4, 2005
    Date of Patent: March 25, 2014
    Assignee: Nvidia Corporation
    Inventors: Stephen D. Lew, Ashish Karandikar, Shirish Gadre, Franciscus W. Sijstermans
  • Patent number: 8649508
    Abstract: A system and method for implementing the Elliptic Curve scalar multiplication method in cryptography, where the Double Base Number System is expressed in decreasing order of exponents and further on using it to determine Elliptic curve scalar multiplication over a finite elliptic curve.
    Type: Grant
    Filed: September 29, 2008
    Date of Patent: February 11, 2014
    Assignee: Tata Consultancy Services Ltd.
    Inventor: Natarajan Vijayarangan
  • Publication number: 20140040594
    Abstract: A programmable device suitable for software defined radio terminal is disclosed. In one aspect, the device includes a scalar cluster providing a scalar data path and a scalar register file and arranged for executing scalar instructions. The device may further include at least two interconnected vector clusters connected with the scalar cluster. Each of the at least two vector clusters provides a vector data path and a vector register file and is arranged for executing at least one vector instruction different from vector instructions performed by any other vector cluster of the at least two vector clusters.
    Type: Application
    Filed: October 2, 2013
    Publication date: February 6, 2014
    Applicants: Samsung Electronics, IMEC
    Inventors: Bruno Bougard, Thomas Schuster
  • Patent number: 8627042
    Abstract: Mechanisms for performing data parallel function calls in code during runtime are provided. These mechanisms may operate to execute, in the processor, a portion of code having a data parallel function call to a target portion of code. The mechanisms may further operate to determine, at runtime by the processor, whether the target portion of code is a data parallel portion of code or a scalar portion of code and determine whether the calling code is data parallel code or scalar code. Moreover, the mechanisms may operate to execute the target portion of code based on the determination of whether the target portion of code is a data parallel portion of code or a scalar portion of code, and the determination of whether the calling code is data parallel code or scalar code.
    Type: Grant
    Filed: December 30, 2009
    Date of Patent: January 7, 2014
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Brian K. Flachs, Charles R. Johns, Mark R. Nutter
  • Patent number: 8627043
    Abstract: Mechanisms for performing data parallel function calls in code during runtime are provided. These mechanisms may operate to execute, in the processor, a portion of code having a data parallel function call to a target portion of code. The mechanisms may further operate to determine, at runtime by the processor, whether the target portion of code is a data parallel portion of code or a scalar portion of code and determine whether the calling code is data parallel code or scalar code. Moreover, the mechanisms may operate to execute the target portion of code based on the determination of whether the target portion of code is a data parallel portion of code or a scalar portion of code, and the determination of whether the calling code is data parallel code or scalar code.
    Type: Grant
    Filed: March 26, 2012
    Date of Patent: January 7, 2014
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Brian K. Flachs, Charles R. Johns, Mark R. Nutter
  • Publication number: 20140006748
    Abstract: A reconfigurable vector processor is described that allows the size of its vector units to be changed in order to process vectors of different sizes. The reconfigurable vector processor comprises a plurality of processor units. Each of the processor units comprises a control unit for decoding instructions and generating control signals, a scalar unit for processing instructions on scalar data, and a vector unit for processing instructions on vector data under control of control signals. The reconfigurable vector processor architecture also comprises a vector control selector for selectively providing control signals generated by one processor unit of the plurality of processor units to the vector unit of a different processor unit of the plurality of processor units.
    Type: Application
    Filed: January 25, 2011
    Publication date: January 2, 2014
    Applicant: COGNIVUE CORPORATION
    Inventors: Malcolm Stewart, Ali Osman Ors, Daniel Laroche
  • Patent number: 8549256
    Abstract: Methods and apparatus relating to a tightly coupled scalar and Boolean processor are described. In an embodiment, a Boolean unit may include a result vector subunit. The result vector subunit may be controlled by an instruction flow that is managed by a scalar unit. Other embodiments are also disclosed.
    Type: Grant
    Filed: January 15, 2007
    Date of Patent: October 1, 2013
    Assignee: Intel Corporation
    Inventor: Charles Narad
  • Patent number: 8510534
    Abstract: A scalar/vector processor includes a plurality of functional units (252, 260, 262, 264, 266, 268, 270). At least one of the functional units includes a vector section (210) for operating on at least one vector and a scalar section (220) for operating on at least one scalar. The vector section and scalar section of the functional unit co-operate by the scalar section being arranged to provide and/or consume at least one scalar required by and/or supplied by the vector section of the functional unit.
    Type: Grant
    Filed: May 22, 2003
    Date of Patent: August 13, 2013
    Assignee: ST-Ericsson SA
    Inventors: Cornelis Hermanus Van Berkel, Patrick Peter Elizabeth Meuwissen, Nur Engin
  • Publication number: 20130185539
    Abstract: A processor includes a scalar processor core and a vector coprocessor core coupled to the scalar processor core. The scalar processor core is configured to retrieve an instruction stream from program storage, and pass vector instructions in the instruction stream to the vector coprocessor core. The vector coprocessor core includes a register file, a plurality of execution units, and a table lookup unit. The register file includes a plurality of registers. The execution units are arranged in parallel to process a plurality of data values. The execution units are coupled to the register file. The table lookup unit is coupled to the register file in parallel with the execution units. The table lookup unit is configured to retrieve table values from one or more lookup tables stored in memory by executing table lookup vector instructions in a table lookup loop.
    Type: Application
    Filed: July 13, 2012
    Publication date: July 18, 2013
    Applicant: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Ching-Yu Hung, Shinri Inamori, Jagadeesh Sankaran, Peter Chang
  • Publication number: 20130185538
    Abstract: A processor includes a scalar processor core and a vector coprocessor core coupled to the scalar processor core. The scalar processor core is configured to retrieve an instruction stream from program storage. The instruction stream includes scalar instructions executable by the scalar processor core and vector instructions executable by the vector coprocessor core. The scalar processor core is configured to pass the vector instructions to the vector coprocessor core. The vector coprocessor core is configured to process a plurality of data values in parallel while executing each vector instruction passed by the scalar processor core. The vector coprocessor core includes a plurality of processing paths arranged in parallel to process the data values. Each of the processing paths includes an execution unit. Each of the execution units is configured to communicate a result of processing to each other of the execution units.
    Type: Application
    Filed: July 13, 2012
    Publication date: July 18, 2013
    Applicant: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Ching-Yu Hung, Shinri Inamori, Jagadeesh Sankaran, Peter Chang
  • Publication number: 20130173884
    Abstract: A programmable device suitable for software defined radio terminal is disclosed. In one aspect, the device includes a scalar cluster providing a scalar data path and a scalar register file and arranged for executing scalar instructions. The device may further include at least two interconnected vector clusters connected with the scalar cluster. Each of the at least two vector clusters provides a vector data path and a vector register file and is arranged for executing at least one vector instruction different from vector instructions performed by any other vector cluster of the at least two vector clusters.
    Type: Application
    Filed: December 7, 2012
    Publication date: July 4, 2013
    Applicants: Samsung Electronics, Imec
    Inventors: Imec, Samsung Electronics
  • Publication number: 20130159665
    Abstract: A data processing element includes an input unit configured to provide instructions for scalar, vector and array processing, and a scalar processing unit configured to provide a scalar pipeline datapath for processing a scalar quantity. Additionally, the data processing element includes a vector processing unit coupled to the scalar processing unit and configured to provide a vector pipeline datapath employing a vector register for processing a one-dimensional vector quantity. The data processing element further includes an array processing unit coupled to the vector processing unit and configured to provide an array pipeline datapath employing a parallel processing structure for processing a two-dimensional vector quantity. A method of operating a data processing element and a MIMO receiver employing a data processing element are also provided.
    Type: Application
    Filed: December 15, 2011
    Publication date: June 20, 2013
    Applicant: Verisilicon Holdings Co., Ltd.
    Inventor: Asheesh Kashyap
  • Patent number: 8370817
    Abstract: A mechanism is provided for optimizing scalar code executed on a single instruction multiple data (SIMD) engine by aligning the slots of SIMD registers. With the mechanism, a compiler is provided that parses source code and, for each statement in the program, generates an expression tree. The compiler inspects all storage inputs to scalar operations in the expression tree to determine their alignment in the SIMD registers. This alignment is propagated up the expression tree from the leaves. When the alignments of two operands in the expression tree are the same, the resulting alignment is the shared value. When the alignments of two operands in the expression tree are different, one operand is shifted. For shifted operands, a shift operation is inserted in the expression tree. The executable code is then generated for the expression tree and shifts are inserted where indicated.
    Type: Grant
    Filed: May 27, 2008
    Date of Patent: February 5, 2013
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, John Kevin Patrick O'Brien
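    Illustrative sketch (not part of the patent record): the bottom-up alignment propagation described in the abstract above can be modeled on a tiny expression tree. The classes `Leaf`, `Op`, and `Shift`, the function `propagate`, and the policy of always shifting the right operand to the left operand's slot are assumptions for this sketch. The abstract only states that one operand is shifted.

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    name: str
    slot: int          # SIMD register slot where the scalar input sits


@dataclass
class Op:
    op: str
    left: object
    right: object


@dataclass
class Shift:
    operand: object
    to_slot: int       # slot the operand is moved to


def propagate(node):
    """Return (rewritten_node, slot), inserting Shift nodes where slots differ."""
    if isinstance(node, Leaf):
        return node, node.slot
    left, left_slot = propagate(node.left)
    right, right_slot = propagate(node.right)
    if left_slot != right_slot:
        # Assumed policy: move the right operand into the left operand's slot.
        right = Shift(right, left_slot)
    # Matching (or reconciled) operands yield the shared alignment.
    return Op(node.op, left, right), left_slot


if __name__ == "__main__":
    tree = Op("+", Leaf("a", 0), Op("*", Leaf("b", 2), Leaf("c", 2)))
    print(propagate(tree))
```

    Here `b * c` needs no shift because both inputs share slot 2, while the product is shifted once to meet `a` in slot 0.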
  • Publication number: 20130024652
    Abstract: Various methods and systems are provided for processing units that may be scaled. In one embodiment, a processing unit includes a plurality of scalar processing units and a vector processing unit in communication with each of the plurality of scalar processing units. The vector processing unit is configured to coordinate execution of instructions received from the plurality of scalar processing units. In another embodiment, a scalar instruction packet including a pre-fix instruction and a vector instruction packet including a vector instruction is obtained. Execution of the vector instruction may be modified by the pre-fix instruction in a processing unit including a vector processing unit. In another embodiment, a scalar instruction packet including a plurality of partitions is obtained. The location of the partitions is determined based upon a partition indicator included in the scalar instruction packet and a scalar instruction included in a partition is executed by a processing unit.
    Type: Application
    Filed: September 20, 2011
    Publication date: January 24, 2013
    Applicant: BROADCOM CORPORATION
    Inventors: Neil Bailey, Eben Upton
  • Patent number: 8296457
    Abstract: Methods, apparatus, and products are disclosed for providing nearest neighbor point-to-point communications among compute nodes of an operational group in a global combining network of a parallel computer, each compute node connected to each adjacent compute node in the global combining network through a link, that include: identifying each link in the global combining network for each compute node of the operational group; designating one of a plurality of point-to-point class routing identifiers for each link such that no compute node in the operational group is connected to two adjacent compute nodes in the operational group with links designated for the same class routing identifiers; and configuring each compute node of the operational group for point-to-point communications with each adjacent compute node in the global combining network through the link between that compute node and that adjacent compute node using that link's designated class routing identifier.
    Type: Grant
    Filed: August 2, 2007
    Date of Patent: October 23, 2012
    Assignee: International Business Machines Corporation
    Inventors: Charles J. Archer, Ahmad A. Faraj, Todd A. Inglett, Joseph D. Ratterman
  • Publication number: 20120260061
    Abstract: A data processing apparatus having processing circuitry, a scalar register bank and a vector register bank, including decoding circuitry arranged to decode a sequence of instructions to generate control signals for the processing circuitry. The decoding circuitry is responsive to a decode modifier instruction within the sequence of instructions to alter decoding of a subsequent scalar instruction in the sequence by mapping at least one scalar operand specified by the subsequent scalar instruction to at least one vector operand in the vector register bank, and, in dependence on the scalar operation specified by the subsequent scalar instruction, determining a vector operation to be performed on at least a subset of the operand elements within the at least the one vector operand. Such an approach enables a wide variety of vector operations to be specified without the need to individually define separate vector instructions for those vector operations.
    Type: Application
    Filed: April 4, 2012
    Publication date: October 11, 2012
    Applicant: ARM LIMITED
    Inventor: Alastair David Reid
  • Patent number: 8280826
    Abstract: In one embodiment, the present invention includes a method for identifying a deformable object of a scene of a computer game that is visible by an artificial intelligence (AI) character of the game, requesting a speculative physics simulation associated with the deformable object to determine a result of an action to the deformable object by the AI character, and selecting an action to be performed by the AI character, where the selection is based at least in part on the speculative physics simulation. Other embodiments are described and claimed.
    Type: Grant
    Filed: October 10, 2011
    Date of Patent: October 2, 2012
    Assignee: Intel Corporation
    Inventors: David Putzolu, Aaron Kunze, Teresa Morrison
  • Patent number: 8203567
    Abstract: A graphics processing method and apparatus described herein is capable of converting graphics processing of a window system into a vector-based application program interface (API) format usable in the GPU and performing the converted graphics processing in the GPU. For example, the vector-based API may be based on an OpenVG standard or an EGL standard.
    Type: Grant
    Filed: July 3, 2009
    Date of Patent: June 19, 2012
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Dong-kyun Jeong, Soo-chan Lim, Na-min Kim
  • Patent number: 8190830
    Abstract: A memory controller may execute instructions instead of sending the instructions to a processor for execution. To maintain synchronization between the memory controller and the processor, the memory controller may queue a null instruction in the memory controller for each non-filler instruction sent to the processor and may send a filler instruction to the processor for each non-null instruction to be executed by the memory controller.
    Type: Grant
    Filed: March 10, 2006
    Date of Patent: May 29, 2012
    Assignee: Intel Corporation
    Inventor: Gurumurthy Rajaram
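    Illustrative sketch (not part of the patent record): the synchronization scheme in the abstract above can be modeled as two queues kept in lockstep. Each instruction goes to one side, and a placeholder goes to the other. The function `split_stream`, the `runs_on_controller` callback, and the `NULL` and `FILLER` markers are hypothetical names chosen for this sketch.

```python
NULL = "NULL"        # placeholder queued in the memory controller
FILLER = "FILLER"    # placeholder sent to the processor


def split_stream(instructions, runs_on_controller):
    """Split one instruction stream into two equal-length, aligned queues."""
    controller_queue = []
    processor_queue = []
    for ins in instructions:
        if runs_on_controller(ins):
            # The controller executes it; the processor gets a filler.
            controller_queue.append(ins)
            processor_queue.append(FILLER)
        else:
            # The processor executes it; the controller queues a null.
            controller_queue.append(NULL)
            processor_queue.append(ins)
    return controller_queue, processor_queue


if __name__ == "__main__":
    stream = ["add", "memcpy", "mul", "memset"]
    print(split_stream(stream, lambda ins: ins.startswith("mem")))
```

    Because both queues always have the same length, position k in one queue corresponds to position k in the other, which is what keeps the two sides in step in this model.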
  • Patent number: 8190854
    Abstract: A method of processing data is disclosed that includes performing a fetch of a plurality of instructions from a memory unit. The method also includes grouping the plurality of instructions into packets of instructions of different types for parallel execution by a plurality of instruction execution units. The packets of instructions include a first instruction and a second instruction. The method includes using a combined scalar and vector condition code register to execute the first instruction for a compare operation and the second instruction for a conditional operation using the combined scalar and vector condition code register. The method also includes when the compare operation is a scalar compare operation, receiving a scalar compare instruction for the scalar compare operation at an instruction executing unit and storing results of the scalar compare operation in the combined scalar and vector condition code register.
    Type: Grant
    Filed: January 20, 2010
    Date of Patent: May 29, 2012
    Assignee: QUALCOMM Incorporated
    Inventors: Lucian Codrescu, Erich J. Plondke, Taylor Simpson
  • Publication number: 20120124332
    Abstract: A vector processing circuit includes a vector register file including a plurality of array elements, a command issuance control circuit, and a plurality of pipeline arithmetic units. Each pipeline arithmetic unit performs arithmetic processing of data stored in the array elements indicated as a source by one command in parts through a plurality of cycles and stores the result in the array elements indicated as a destination by the one command through a plurality of cycles. When data word length of a preceding command is longer than that of a subsequent command, the command issuance control circuit changes data sizes of the array elements in accordance with data word length of the command and determines whether there is register interference between the array element to be processed at a non-head cycle of the preceding command, and the array element to be processed at a head cycle of the subsequent command.
    Type: Application
    Filed: October 24, 2011
    Publication date: May 17, 2012
    Applicant: FUJITSU LIMITED
    Inventors: GE Yi, Yoshimasa Takebe, Hiromasa Takahashi
  • Patent number: 8090928
    Abstract: In one embodiment of the present invention, a processor includes a scalar computation unit; a vector co-processor coupled to the scalar computation unit; and one or more function-specific engines coupled to the scalar computation unit, where the engines are adapted to minimize data exchange penalties by processing small in-out bit slices.
    Type: Grant
    Filed: June 28, 2002
    Date of Patent: January 3, 2012
    Assignee: Intellectual Ventures I LLC
    Inventors: Dominik J. Schmidt, Robert Warren Sherburne, Jr.
  • Publication number: 20110307684
    Abstract: An image processing system including a vector processor and a memory adapted for attaching to the vector processor. The memory is adapted to store multiple image frames. The vector processor includes an address generator operatively attached to the memory to access the memory. The address generator is adapted for calculating addresses of the memory over the multiple image frames. The addresses may be calculated over the image frames based upon an image parameter. The image parameter may specify which of the image frames are processed simultaneously. A scalar processor may be attached to the vector processor. The scalar processor provides the image parameter(s) to the address generator for address calculation over the multiple image frames. An input register may be attached to the vector processor. The input register may be adapted to receive a very long instruction word (VLIW) instruction.
    Type: Application
    Filed: June 10, 2010
    Publication date: December 15, 2011
    Inventors: Yosef Kreinin, Gil Dogon, Emmanuel Sixsou, Yosi Arbeli, Mois Navon, Roman Sajman
  • Patent number: 8078804
    Abstract: A data cache memory coupled to a processor including processor clusters is adapted to operate simultaneously on scalar and vectorial data by providing data locations in the data cache memory for storing data for processing. The data locations are accessed either in a scalar mode or in a vectorial mode. This is done by explicitly mapping the data locations that are scalar and the data locations that are vectorial.
    Type: Grant
    Filed: June 26, 2007
    Date of Patent: December 13, 2011
    Assignees: STMicroelectronics S.r.l., STMicroelectronics N.V.
    Inventors: Francesco Pappalardo, Giuseppe Notarangelo, Elena Salurso, Elio Guidetti
  • Patent number: 8069124
    Abstract: In one embodiment, the present invention includes a method for identifying a deformable object of a scene of a computer game that is visible by an artificial intelligence (AI) character of the game, requesting a speculative physics simulation associated with the deformable object to determine a result of an action to the deformable object by the AI character, and selecting an action to be performed by the AI character, where the selection is based at least in part on the speculative physics simulation. Other embodiments are described and claimed.
    Type: Grant
    Filed: March 26, 2008
    Date of Patent: November 29, 2011
    Assignee: Intel Corporation
    Inventors: David Putzolu, Aaron Kunze, Teresa Morrison
  • Publication number: 20110161623
    Abstract: Mechanisms for performing data parallel function calls in code during runtime are provided. These mechanisms may operate to execute, in the processor, a portion of code having a data parallel function call to a target portion of code. The mechanisms may further operate to determine, at runtime by the processor, whether the target portion of code is a data parallel portion of code or a scalar portion of code and determine whether the calling code is data parallel code or scalar code. Moreover, the mechanisms may operate to execute the target portion of code based on the determination of whether the target portion of code is a data parallel portion of code or a scalar portion of code, and the determination of whether the calling code is data parallel code or scalar code.
    Type: Application
    Filed: December 30, 2009
    Publication date: June 30, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Alexandre E. Eichenberger, Brian K. Flachs, Charles R. Johns, Mark R. Nutter
  • Patent number: 7971197
    Abstract: A digital computer system automatically creates an Instruction Set Architecture (ISA) that potentially exploits VLIW instructions, vector operations, fused operations, and specialized operations with the goal of increasing the performance of a set of applications while keeping hardware cost below a designer specified limit, or with the goal of minimizing hardware cost given a required level of performance.
    Type: Grant
    Filed: August 18, 2005
    Date of Patent: June 28, 2011
    Assignee: Tensilica, Inc.
    Inventors: David William Goodwin, Dror Maydan, Ding-Kai Chen, Darin Stamenov Petkov, Steven Weng-Kiang Tjiang, Peng Tu, Christopher Rowen
  • Publication number: 20110145543
    Abstract: A processing unit executes a vector width instruction in a program and the processing unit obtains and supplies the width of an appropriate vector register that will be used to process variable vector processing instructions. Then, when the processing unit executes variable vector processing instructions in the program, the processing unit processes the variable vector processing instructions using the appropriate vector register with the instructions having the same width as the appropriate vector register. The width that the processing unit obtains may be less than an actual width of the appropriate vector register and may be set by the processing unit. In this way, many different vector widths can be supported using a single set of instructions for vector processing. New instructions are not required if vector widths are changed and processing units having vector registers of differing widths do not require different code.
    Type: Application
    Filed: December 15, 2009
    Publication date: June 16, 2011
    Applicant: Sun Microsystems, Inc.
    Inventor: Peter Carl Damron
  • Patent number: 7908460
    Abstract: A method and apparatus for obtaining a scalar value from a vector register for use in a mixed vector and scalar instruction, including providing a vector in a vector register file, and embedding a location identifier of the scalar value within the vector in the bits defining the mixed vector and scalar instruction. The scalar value can be used directly from the vector register without the need to load the scalar to a scalar register prior to executing the instruction. The scalar location identifier may be embedded in the secondary op code of the instruction, or the instruction may have dedicated bits for providing the location of the scalar within the vector.
    Type: Grant
    Filed: May 3, 2010
    Date of Patent: March 15, 2011
    Assignee: Nintendo Co., Ltd.
    Inventors: Yu-Chung C. Liao, Peter A. Sandon, Howard Cheng, Timothy J. Van Hook
  • Publication number: 20110055517
    Abstract: A structure (and method) including a plurality of coprocessing units and a controller that selectively loads data for processing on the plurality of coprocessing units, using a compound loading instruction. The compound loading instruction includes a plurality of low-level software instructions that preliminarily processes input data in a manner predetermined to simulate an effect of a single hardware loading instruction that would provide optimal loading of complex matrix data by loading input data in accordance with the effect of multiplying i·i=−1.
    Type: Application
    Filed: August 26, 2009
    Publication date: March 3, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Alexandre E. Eichenberger, Michael Karl Gschwind, John A. Gunnels, Fred Gehrung Gustavson, Brett Olsson
  • Patent number: 7895411
    Abstract: One embodiment of the invention sets forth a hardware-based physics processing unit (PPU) having unique architecture designed to efficiently generate physics data. The PPU includes a PPU control engine (PCE), a data movement engine (DME) and a floating point engine (FPE). The PCE manages the overall operation of the PPU by allocating memory resources and transmitting graphics processing commands to the FPE and data movement commands to the DME. The FPE includes multiple vector processors that operate in parallel and perform floating point operations on data received from a host unit to generate physics simulation data. The DME facilitates the transmission of data between the host unit and the FPE by performing data movement operations between memories internal and external to the PPU.
    Type: Grant
    Filed: November 19, 2003
    Date of Patent: February 22, 2011
    Assignee: NVIDIA Corporation
    Inventors: Monier Maher, Otto A. Schmid, Curtis Davis, Manju Hegde, Jean Pierre Bordes
  • Patent number: 7877582
    Abstract: A single register file may be addressed using both scalar and SIMD instructions. That is, subsets of registers within a multi-addressable register file according to the illustrative embodiments, are addressable with different instruction forms, e.g., scalar instructions, SIMD instructions, etc., while the entire set of registers may be addressed with yet another form of instructions, referred to herein as Vector-Scalar Extension (VSX) instructions. The operation set that may be performed on the entire set of registers using the VSX instruction form is substantially similar to that of the operation sets of the subsets of registers. Such an arrangement allows legacy instructions to access subsets of registers within the multi-addressable register file while new instructions, i.e. the VSX instructions, may access the entire range of registers within the multi-addressable register file.
    Type: Grant
    Filed: January 31, 2008
    Date of Patent: January 25, 2011
    Assignee: International Business Machines Corporation
    Inventors: Michael K. Gschwind, Brett Olsson
  • Publication number: 20110010522
    Abstract: A multiprocessor computer system includes a plurality of processor nodes coupled by a direct processor interconnect network, and a plurality of processor nodes coupled by an indirect processor interconnect network. A bridge directly couples the direct processor interconnect network and the indirect processor interconnect network.
    Type: Application
    Filed: June 11, 2010
    Publication date: January 13, 2011
    Applicant: Cray Inc.
    Inventors: Dennis C. Abts, Peter M. Klausler, James Nowicki
  • Publication number: 20100312988
    Abstract: A data processing apparatus and method are provided for handling vector instructions. The data processing apparatus has a register data store with a plurality of registers arranged to store data elements. A vector processing unit is then used to execute a sequence of vector instructions, with the vector processing unit having a plurality of lanes of parallel processing and having access to the register data store in order to read data elements from, and write data elements to, the register data store during the execution of the sequence of vector instructions. A skip indication storage maintains a skip indicator for each of the lanes of parallel processing. The vector processing unit is responsive to a vector skip instruction to perform an update operation to set within the skip indication storage the skip indicator for a determined one or more lanes.
    Type: Application
    Filed: January 19, 2010
    Publication date: December 9, 2010
    Applicant: ARM LIMITED
    Inventors: Andreas BJÖRKLUND, Erik Persson, Ola Hugosson
  • Patent number: 7831804
    Abstract: A processor architecture includes a number of processing elements for treating input signals. The architecture is organized according to a matrix including rows and columns, the columns of which each include at least one microprocessor block having a computational part and a set of associated processing elements that are able to receive the same input signals. The number of associated processing elements is selectively variable in the direction of the column so as to exploit the parallelism of said signals. Additionally, the processor architecture of the present invention enables dynamic switching between instruction parallelism and data parallel processing typical of vectorial functionality. The architecture can be scaled in various dimensions in an optimal configuration for the algorithm to be executed.
    Type: Grant
    Filed: May 30, 2008
    Date of Patent: November 9, 2010
    Assignee: ST Microelectronics S.R.L.
    Inventors: Francesco Pappalardo, Giuseppe Notarangelo, Elio Guidetti
  • Publication number: 20100235607
    Abstract: A processor includes a setting register in which a mode is set, a general-purpose register including a preferred slot used during scalar computing and a slot not used during the scalar computing, a selector configured to select and output data of a register designated by a mode set in the setting register during the scalar computing, and a computing unit configured to execute the scalar computing using the preferred slot of the general-purpose register and store computing result data of the scalar computing in the preferred slot of the general-purpose register. The data of the register output from the selector is stored in the slot of the general-purpose register.
    Type: Application
    Filed: March 2, 2010
    Publication date: September 16, 2010
    Applicant: Kabushiki Kaisha Toshiba
    Inventors: Hiroaki Sugita, Seiji Maeda, Tatsuya Mizutani
  • Publication number: 20100217954
    Abstract: A method and apparatus for obtaining a scalar value from a vector register for use in a mixed vector and scalar instruction, including providing a vector in a vector register file, and embedding a location identifier of the scalar value within the vector in the bits defining the mixed vector and scalar instruction. The scalar value can be used directly from the vector register without the need to load the scalar to a scalar register prior to executing the instruction. The scalar location identifier may be embedded in the secondary op code of the instruction, or the instruction may have dedicated bits for providing the location of the scalar within the vector.
    Type: Application
    Filed: May 3, 2010
    Publication date: August 26, 2010
    Applicant: Nintendo Co., Ltd.
    Inventors: Yu-Chung C. Liao, Peter A. Sandon, Howard Cheng, Timothy J. Van Hook
  • Publication number: 20100186006
    Abstract: A programmable device suitable for a software defined radio terminal is disclosed. In one aspect, the device includes a scalar cluster providing a scalar data path and a scalar register file and arranged for executing scalar instructions. The device may further include at least two interconnected vector clusters connected with the scalar cluster. Each of the at least two vector clusters provides a vector data path and a vector register file and is arranged for executing at least one vector instruction different from vector instructions performed by any other vector cluster of the at least two vector clusters.
    Type: Application
    Filed: December 17, 2009
    Publication date: July 22, 2010
    Applicants: IMEC, Samsung Electronics
    Inventors: Bruno Bougard, Thomas Schuster
  • Patent number: 7739479
    Abstract: A method of providing physics data within a game program or simulation using a hardware-based physics processing unit having unique architecture designed to efficiently calculate physics related data.
    Type: Grant
    Filed: November 19, 2003
    Date of Patent: June 15, 2010
    Assignee: NVIDIA Corporation
    Inventors: Jean Pierre Bordes, Curtis Davis, Monier Maher, Manju Hegde, Otto A. Schmid
  • Patent number: 7739480
    Abstract: A method and apparatus for obtaining a scalar value from a vector register for use in a mixed vector and scalar instruction, including providing a vector in a vector register file, and embedding a location identifier of the scalar value within the vector in the bits defining the mixed vector and scalar instruction. The scalar value can be used directly from the vector register without the need to load the scalar to a scalar register prior to executing the instruction. The scalar location identifier may be embedded in the secondary op code of the instruction, or the instruction may have dedicated bits for providing the location of the scalar within the vector.
    Type: Grant
    Filed: January 11, 2005
    Date of Patent: June 15, 2010
    Assignee: Nintendo Co., Ltd.
    Inventors: Yu-Chung C. Liao, Peter A. Sandon, Howard Cheng, Timothy J. Van Hook
  • Publication number: 20100118852
    Abstract: A method of processing data is disclosed that includes performing a fetch of a plurality of instructions from a memory unit. The method also includes grouping the plurality of instructions into packets of instructions of different types for parallel execution by a plurality of instruction execution units. The packets of instructions include a first instruction and a second instruction. The method includes using a combined scalar and vector condition code register to execute the first instruction for a compare operation and the second instruction for a conditional operation using the combined scalar and vector condition code register. The method also includes when the compare operation is a scalar compare operation, receiving a scalar compare instruction for the scalar compare operation at an instruction executing unit and storing results of the scalar compare operation in the combined scalar and vector condition code register.
    Type: Application
    Filed: January 20, 2010
    Publication date: May 13, 2010
    Applicant: QUALCOMM INCORPORATED
    Inventors: Lucian Codrescu, Erich J. Plondke, Taylor Simpson
  • Publication number: 20100115228
    Abstract: A multiprocessor computer system has a plurality of first processors having a first addressable memory space, and a plurality of second processors having a second addressable memory space. The second addressable memory space is of a different size than the first addressable memory space, and the first addressable memory space and second addressable memory space comprise a part of the same common address space.
    Type: Application
    Filed: October 31, 2008
    Publication date: May 6, 2010
    Applicant: CRAY INC.
    Inventors: Michael Parker, Timothy J. Johnson, Laurence S. Kaplan, Steven L. Scott, Robert Alverson, Skef Iterum