Distributing Of Vector Data To Vector Registers Patents (Class 712/4)
  • Patent number: 8175853
    Abstract: Systems and methods for combined matrix-vector and matrix-transpose vector multiply for block sparse matrices. Exemplary embodiments include a method of updating a simulation of physical objects in an interactive computer, including generating a set of representations of objects in the interactive computer environment, partitioning the set of representations into a plurality of subsets such that objects in any given set interact only with other objects in that set, generating a vector b describing an expected position of each object at the end of a time interval h, applying a biconjugate gradient algorithm to solve A*Δv=b for the vector Δv of position and velocity changes to be applied to each object wherein the q=Ap and qt=A^T(pt) calculations are combined so that A only has to be read once, integrating the updated motion vectors to determine a next state of the simulated objects, and converting the simulated objects to a visual.
    Type: Grant
    Filed: March 28, 2008
    Date of Patent: May 8, 2012
    Assignee: International Business Machines Corporation
    Inventor: Karen A. Magerlein
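The combined multiply described in this abstract can be sketched as follows. This is a minimal illustration on a small dense matrix stored as nested lists, which is an assumption made for brevity (the patent concerns block sparse matrices); the function name `combined_matvec` is hypothetical.

```python
def combined_matvec(A, p, pt):
    """Compute q = A p and qt = A^T pt in one pass over the entries of A.

    Dense nested lists are an illustrative assumption; the source
    describes block sparse matrices.
    """
    n_rows, n_cols = len(A), len(A[0])
    q = [0.0] * n_rows
    qt = [0.0] * n_cols
    for i in range(n_rows):
        row = A[i]                  # each row of A is read exactly once
        for j in range(n_cols):
            a = row[j]
            q[i] += a * p[j]        # contribution to A p
            qt[j] += a * pt[i]      # contribution to A^T pt
    return q, qt
```

Each entry of A is loaded once and used for both products, which is the point of combining the two calculations.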
  • Publication number: 20120084498
    Abstract: Described embodiments provide a method of controlling processing flow in a network processor having one or more processing modules. A given one of the processing modules loads a script into a compute engine. The script includes instructions for the compute engine. The given one of the processing modules loads a register file into the compute engine. The register file includes operands for the instructions of the loaded script. A tracking vector of the compute engine is initialized to a default value, and the compute engine executes the instructions of the loaded script based on the operands of the loaded register file. The compute engine updates corresponding portions of the register file with updated data corresponding to the executed script. The tracking vector tracks the updated portions of the register file. The compute engine provides the tracking vector and the updated register file to the given one of the processing modules.
    Type: Application
    Filed: December 9, 2011
    Publication date: April 5, 2012
    Inventors: David Sonnier, Chris Randall Stone, Charles Edwards Peet, JR.
  • Publication number: 20120060016
    Abstract: Mechanisms for performing a scattered load operation are provided. With these mechanisms, a gather instruction is received in a logic unit of a processor, the gather instruction specifying a plurality of addresses in a memory from which data is to be loaded into a target vector register of the processor. A plurality of separate load instructions for loading the data from the plurality of addresses in the memory are automatically generated within the logic unit. The plurality of separate load instructions are sent, from the logic unit, to one or more load/store units of the processor. The data corresponding to the plurality of addresses is gathered in a buffer of the processor. The logic unit then writes data stored in the buffer to the target vector register.
    Type: Application
    Filed: September 7, 2010
    Publication date: March 8, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Alexandre E. Eichenberger, Michael K. Gschwind, Valentina Salapura
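A software model of the gather flow in this abstract might look like the sketch below. Memory is modeled as a Python list and the names `gather_load`, `buffer`, and `target_register` are hypothetical; the hardware details of the logic unit and load/store units are not represented.

```python
def gather_load(memory, addresses):
    """Model a gather: one separate load per address, collected in a
    buffer, then written as a whole to the target vector register."""
    buffer = []
    for addr in addresses:
        buffer.append(memory[addr])   # one generated load per address
    target_register = list(buffer)    # single write from buffer to register
    return target_register
```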
  • Publication number: 20120060015
    Abstract: Mechanisms for performing a scattered load operation are provided. With these mechanisms, an extended address is received in a cache memory of a processor. The extended address has a plurality of data element address portions that specify a plurality of data elements to be accessed using the single extended address. Each of the plurality of data element address portions is provided to corresponding data element selector logic units of the cache memory. Each data element selector logic unit in the cache memory selects a corresponding data element from a cache line buffer based on a corresponding data element address portion provided to the data element selector logic unit. Each data element selector logic unit outputs the corresponding data element for use by the processor.
    Type: Application
    Filed: September 7, 2010
    Publication date: March 8, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Alexandre E. Eichenberger, Michael K. Gschwind, Valentina Salapura
  • Patent number: 8108652
    Abstract: The claimed invention is an efficient and high-performance vector processor. Through minimizing the use of multiple banks of memory and/or multi-ported memory blocks to reduce implementation cost, vector memory 450 provides abundant memory bandwidth and enables sustained low-delay memory operations for a large number of SIMD (Single Instruction, Multiple Data) or vector operators simultaneously.
    Type: Grant
    Filed: September 10, 2008
    Date of Patent: January 31, 2012
    Inventor: Ronald Chi-Chun Hui
  • Patent number: 8073881
    Abstract: Multiple computers in a cluster maintain respective sets of identifiers of neighbor computers in the cluster for each of multiple named resources. A combination of the respective sets of identifiers defines a respective tree formed by the respective sets of identifiers for a respective named resource in the set of named resources. Upon origination and detection of a request at a given computer in the cluster, the given computer forwards the request over a network to successive computers in the hierarchical tree leading to the computers relevant in handling the request based on use of identifiers of neighbor computers. Thus, a combination of identifiers of neighbor computers identifies potential paths to related computers in the tree.
    Type: Grant
    Filed: August 3, 2009
    Date of Patent: December 6, 2011
    Assignee: Sanbolic, Inc.
    Inventor: Ivan I. Georgiev
  • Publication number: 20110276782
    Abstract: The described embodiments provide a processor for generating a result vector with subtracted or mathematically divided values from a first input vector. During operation, the processor receives the first input vector, a second input vector, and a control vector, and optionally receives a predicate vector. The processor then records a value from an element at a key element position in the second input vector into a base value. Next, the processor generates a result vector.
    Type: Application
    Filed: July 22, 2011
    Publication date: November 10, 2011
    Applicant: APPLE INC.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff
  • Patent number: 7984273
    Abstract: A system and method for assigning values to elements in a first register, where each data field in a first register corresponds to a data element to be written into a second register, and where for each data field in the first register, a first value may indicate that the corresponding data element has not been written into the second register and a second value indicates that the corresponding data element has been written into the second register, reading the values of each of the data fields in the first register, and for each data field in the first register having the first value, gathering the corresponding data element and writing the corresponding data element into the second register, and changing the value of the data field in the first register from the first value to the second value. Other embodiments are described and claimed.
    Type: Grant
    Filed: December 31, 2007
    Date of Patent: July 19, 2011
    Assignee: Intel Corporation
    Inventors: Eric Sprangle, Anwar Rohillah, Robert Cavin, Tom Forsyth, Michael Abrash
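The mask-tracked gather in this abstract can be modeled roughly as below. The encoding is an assumption: 1 stands for the "first value" (element not yet written) and 0 for the "second value" (element written). The function name and list-based registers are hypothetical.

```python
def gather_with_mask(memory, indices, mask, dest):
    """Gather only the elements whose mask field is still pending.

    Assumed encoding: mask[i] == 1 means element i has not been written
    to dest yet; mask[i] == 0 means it has.
    """
    for i, pending in enumerate(mask):
        if pending:
            dest[i] = memory[indices[i]]  # gather the element
            mask[i] = 0                   # record that it is now written
    return dest, mask
```

Because completed elements are marked in the mask, the operation can be re-run after an interruption without repeating loads that already finished.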
  • Publication number: 20110161624
    Abstract: Mechanisms are provided for performing a floating point collect and operate for a summation across a vector for a dot product operation. A routing network placed before the single instruction multiple data (SIMD) unit allows the SIMD unit to perform a summation across a vector with a single stage of adders. The routing network routes the vector elements to the adders in a first cycle. The SIMD unit stores the results of the adders into a results vector register. The routing network routes the summation results from the results vector register to the adders in a second cycle. The SIMD unit then stores the results from the second cycle in the results vector register.
    Type: Application
    Filed: December 29, 2009
    Publication date: June 30, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Brian K. Flachs, Seiji Maeda, Steven Osman
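The two-cycle summation can be illustrated with a small sketch. The 4-element vector width and the particular pairing of elements are assumptions chosen for illustration; `two_cycle_dot` is a hypothetical name.

```python
def two_cycle_dot(a, b):
    """Dot product of two 4-element vectors using one stage of adders twice."""
    products = [x * y for x, y in zip(a, b)]
    # Cycle 1: the routing network feeds element pairs to the adders,
    # and the partial sums are stored in the results register.
    results = [products[0] + products[1], products[2] + products[3]]
    # Cycle 2: the partial sums are routed back to the same adder stage.
    results[0] = results[0] + results[1]
    return results[0]
```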
  • Patent number: 7971036
    Abstract: A multi-node video signal processor (VSPN) is described that tightly couples multiple multi-cycle state machines (hardware assist units) to each processor and each memory in each node of an N node scalable array processor. VSPN memory hardware assist instructions are used to initiate multi-cycle state machine functions, to pass parameters to the multi-cycle state machines, to fetch operands from a node's memory, and to control the transfer of results from the multi-cycle state machines.
    Type: Grant
    Filed: April 18, 2007
    Date of Patent: June 28, 2011
    Assignee: Altera Corp.
    Inventors: Gerald George Pechanek, Mihailo M. Stojancic
  • Patent number: 7949853
    Abstract: A processor for processing matrix data. The processor includes M independent vector register files which are adapted to collectively store a matrix of L data elements. Each data element has B binary bits. The matrix has N rows and M columns, and L=N*M. Each column has K subcolumns. N≥2, M≥2, K≥2, and B≥1. Each row and each subcolumn is addressable. The processor does not duplicatively store the L data elements. The matrix includes a set of arrays such that each array is a row or subcolumn of the matrix. The processor may execute an instruction that performs an operation on a first array of the set of arrays, such that the operation is performed with selectivity with respect to the data elements of the first array.
    Type: Grant
    Filed: December 5, 2007
    Date of Patent: May 24, 2011
    Assignee: International Business Machines Corporation
    Inventors: Peter A. Sandon, R. Michael P. West
  • Publication number: 20110087859
    Abstract: The present invention provides efficient transfer of misaligned vector elements between a vector register file and data memory in a single clock cycle. One vector register of N elements can be loaded from memory with any memory element address alignment during a single clock cycle of the processor. Also, a partial segment of vector register elements can be loaded into a vector register in a single clock cycle with any element alignment from data memory. The present invention comprises properly partitioned multiple multi-port data memory modules in conjunction with a crossbar and address generation circuit. A preferred embodiment of the present invention uses a dual-issue processor containing both a RISC-type scalar processor and a vector/SIMD processor, whereby one scalar and one SIMD instruction are executed every clock cycle, and the RISC processor handles program flow control and also loading and storing of vector registers.
    Type: Application
    Filed: February 3, 2003
    Publication date: April 14, 2011
    Inventor: Tibet Mimar
  • Publication number: 20110072236
    Abstract: The present invention relates to an efficient implementation of color space conversion in a SIMD processor as part of converting output of video decompression to interface to a display unit.
    Type: Application
    Filed: September 20, 2009
    Publication date: March 24, 2011
    Inventor: Tibet Mimar
  • Patent number: 7908460
    Abstract: A method and apparatus for obtaining a scalar value from a vector register for use in a mixed vector and scalar instruction, including providing a vector in a vector register file, and embedding a location identifier of the scalar value within the vector in the bits defining the mixed vector and scalar instruction. The scalar value can be used directly from the vector register without the need to load the scalar to a scalar register prior to executing the instruction. The scalar location identifier may be embedded in the secondary op code of the instruction, or the instruction may have dedicated bits for providing the location of the scalar within the vector.
    Type: Grant
    Filed: May 3, 2010
    Date of Patent: March 15, 2011
    Assignee: Nintendo Co., Ltd.
    Inventors: Yu-Chung C. Liao, Peter A. Sandon, Howard Cheng, Timothy J. Van Hook
  • Patent number: 7895379
    Abstract: Control logic of a node controller receives an input vector and produces an output vector. The control logic includes a plurality of tied control store entries including hard-coded logic to identify unique values of the input vector and to produce the output vector from a hard-coded output vector when the input vector is identified and when the tied control store is enabled. The control logic also includes a plurality of spare control store entries including programmable logic configurable to identify values of the input vector and to produce the output vector from a programmable output vector when the input vector is identified and when the spare control store is enabled. One of the spare control store entries that is configured to identify a value of the input vector that none of the tied control store entries that are enabled by the entry-enables register are configured to identify is enabled.
    Type: Grant
    Filed: December 23, 2008
    Date of Patent: February 22, 2011
    Assignee: Unisys Corporation
    Inventors: Ross M. Weber, David R. Spatafore
  • Publication number: 20110040951
    Abstract: Various embodiments for deduplicated data processing rate control using at least one processor device in a computing environment are provided. A plurality of workers is configured for parallel processing of deduplicated data entities in a plurality of chunks. The deduplicated data processing rate is regulated using a rate control mechanism. The rate control mechanism incorporates a debt/credit algorithm specifying which of the plurality of workers processing the deduplicated data entities must wait for each of a plurality of calculated required sleep times. The rate control mechanism is adapted to limit a data flow rate based on a penalty acquired during a last processing of one of the plurality of chunks in a retroactive manner, and further adapted to operate on at least one vector representation of at least one limit specification to accommodate a variety of available dimensions corresponding to the at least one limit specification.
    Type: Application
    Filed: August 11, 2009
    Publication date: February 17, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Shay H. AKIRAV, Ron ASHER, Yariv BACHAR, Lior KLIPPER, Oded SONIN
  • Publication number: 20100332792
    Abstract: Systems and methods for improved vector data processing based on separately processing elements of a vector in multiple simultaneously executing vector element processing units are disclosed. One embodiment of the present invention is a vector processing system including a plurality of vector element processing units and a routing infrastructure. The routing infrastructure is configured to route each element of a received vector to a respective one of the vector element processing units. The received vector may be from a memory which is coupled to the vector element processing units by the routing infrastructure. Each vector element processing unit is configured to simultaneously process two or more elements, wherein each of the two or more elements is from a separate vector. Embodiments of the present invention also provide for forwarding of data and results of computation between vector element processing units.
    Type: Application
    Filed: June 30, 2009
    Publication date: December 30, 2010
    Applicant: Advanced Micro Devices, Inc.
    Inventor: Daniel B. CLIFTON
  • Publication number: 20100274988
    Abstract: In addition to the usual modes of SIMD processor operation, where corresponding elements of two source vector registers are used as input pairs to be operated upon by the execution unit, or where one element of a source vector register is broadcast for use across the elements of another source vector register, the new system provides several other modes of operation for the elements of one or two source vector registers. Improving upon the time-costly moving of elements for an operation such as DCT, the present invention defines a more general set of modes of vector operations. In one embodiment, these new modes of operation use a third vector register to define how each element of one or both source vector registers is mapped, in order to pair these mapped elements as inputs to a vector execution unit.
    Type: Application
    Filed: February 3, 2003
    Publication date: October 28, 2010
    Inventor: Tibet Mimar
  • Patent number: 7818539
    Abstract: A processor implements conditional vector operations in which, for example, an input vector containing multiple operands to be used in conditional operations is divided into two or more output vectors based on a condition vector. Each output vector can then be processed at full processor efficiency without cycles wasted due to branch latency. Data to be processed are divided into two groups based on whether or not they satisfy a given condition by e.g., steering each to one of the two index vectors. Once the data have been segregated in this way, subsequent processing can be performed without conditional operations, processor cycles wasted due to branch latency, incorrect speculation or execution of unnecessary instructions due to predication. Other examples of conditional operations include combining one or more input vectors into a single output vector based on a condition vector, conditional vector switching, conditional vector combining, and conditional vector load balancing.
    Type: Grant
    Filed: August 28, 2006
    Date of Patent: October 19, 2010
    Assignees: The Board of Trustees of the Leland Stanford Junior University, The Massachusetts Institute of Technology
    Inventors: Scott Rixner, John D. Owens, Ujval J. Kapasi, William J. Dally
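The idea of steering data into two groups by a condition vector can be sketched as follows. This is a scalar model with hypothetical names; it only shows the segregation step, after which each group can be processed without branches.

```python
def split_by_condition(cond):
    """Steer each element index to one of two index vectors,
    depending on the corresponding entry of the condition vector."""
    true_idx = [i for i, c in enumerate(cond) if c]
    false_idx = [i for i, c in enumerate(cond) if not c]
    return true_idx, false_idx
```

For example, the elements selected by `true_idx` could then be run through one kernel and those selected by `false_idx` through another, each at full width.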
  • Patent number: 7809931
    Abstract: A vector permutation system (100) for a single-instruction multiple-data microprocessor has a set of vector registers (110) which feed vectors to permutation logic (120) and then to a negate block (130) where they are permuted and selectively negated according to control parameters received from a selected one of a set of control registers (140). A control arrangement (145, 150) selects which control register is to provide the control parameters. In this way no separate permutation instructions are necessary or need to be executed, and no permutation parameters need to be stored in the vector registers (110). This leads to higher performance, a smaller vector register file and hence a smaller size of the microprocessor and better program code density.
    Type: Grant
    Filed: October 6, 2003
    Date of Patent: October 5, 2010
    Assignee: Freescale Semiconductor, Inc.
    Inventor: Martin Raubuch
  • Patent number: 7793073
    Abstract: A method and apparatus to correctly compute a vector-gather, vector-operate (e.g., vector add), and vector-scatter sequence, particularly when elements of the vector may be redundantly presented, as with indirectly addressed vector operations. For an add operation, one vector register is loaded with the “add-in” values, and another vector register is loaded with address values of “add to” elements to be gathered from memory into a third vector register. If the vector of address values has a plurality of elements that point to the same memory address, the algorithm should add all the “add in” values from elements corresponding to the elements having the duplicated addresses. An indirectly addressed load performs the “gather” operation to load the “add to” values. A vector add operation then adds corresponding elements from the “add in” vector to the “add to” vector. An indirectly addressed store then performs the “scatter” operation to store the results.
    Type: Grant
    Filed: June 29, 2007
    Date of Patent: September 7, 2010
    Assignee: Cray Inc.
    Inventor: James R. Kohn
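The hazard this abstract addresses can be shown with a scalar model. The first function below follows the gather, add, scatter sequence literally and loses updates when addresses repeat; the second is a plain reference computation of the intended result, not the patent's vector mechanism. All names are hypothetical.

```python
def naive_gather_add_scatter(memory, addrs, add_in):
    """Literal gather / add / scatter; duplicate addresses overwrite each other."""
    mem = list(memory)
    gathered = [mem[a] for a in addrs]                  # indirectly addressed load
    summed = [g + x for g, x in zip(gathered, add_in)]  # vector add
    for a, s in zip(addrs, summed):                     # indirectly addressed store
        mem[a] = s
    return mem

def reference_gather_add_scatter(memory, addrs, add_in):
    """Intended result: every add-in value is accumulated at its address."""
    mem = list(memory)
    for a, x in zip(addrs, add_in):
        mem[a] += x
    return mem
```

With `memory = [10, 20, 30]`, `addrs = [0, 2, 0]`, and `add_in = [1, 2, 3]`, the naive version leaves 13 at address 0, while the intended value is 14.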
  • Patent number: 7793072
    Abstract: A microprocessor including an execution unit enabled to execute an asymmetric instruction, where the asymmetric instruction includes a set of operand fields and an operation code (opcode). The execution unit is configured to interpret the opcode to perform a first operation on a first set of data indicated by the set of operand fields and to perform a second operation on a second set of data indicated by the set of operand fields, wherein the set of operand fields indicate different sets of data with respect to the first and second operations and further wherein the first and second operations are mathematically different.
    Type: Grant
    Filed: October 31, 2003
    Date of Patent: September 7, 2010
    Assignee: International Business Machines Corporation
    Inventor: Kenneth Dockser
  • Publication number: 20100223444
    Abstract: A method and a device having a plurality of bit operations capability, the device includes: a first and a second registers and an instruction fetch circuit, and an arithmetic logic unit adapted to: calculate, during a first clock cycle, a position value representative of a position, within a first information vector, of a first bit of information that has a first value; and to multiply the position value by a multiplication factor to provide a first result and to alter the value of the first bit to a second value to provide an updated information vector, during the first clock cycle.
    Type: Application
    Filed: August 18, 2006
    Publication date: September 2, 2010
    Applicant: Freescale Semiconductor, Inc.
    Inventors: Eran Glickman, Evgeni Ginzburg, Noam Sheffer
  • Patent number: 7788471
    Abstract: A system and method for performing vector arithmetic is disclosed. The method includes loading two operand vectors, each composed of a number of vector elements, into two storage locations. A selected arithmetic operation is performed on the operand vectors to produce a result vector having the number of vector elements. Each vector element of the result vector is associated with an arithmetic logic cell that has a first input that can receive any vector element from the first vector and a second input that can receive any vector element from the second vector. Accordingly each vector element of the result vector is a function of any two individual vector elements of the operand vectors. By applying the operand vector elements to the appropriate arithmetic logic cells, and by selecting the appropriate arithmetic operation, complex vector operations can be performed efficiently.
    Type: Grant
    Filed: September 18, 2006
    Date of Patent: August 31, 2010
    Assignee: Freescale Semiconductor, Inc.
    Inventor: Chengke Sheng
  • Patent number: 7743231
    Abstract: Provided are a method, information processing system, and computer readable medium for identifying active bits in a vector. The method comprises receiving a pointer associated with a vector of bits. The pointer is associated with a current bit within the vector of bits. The vector of bits is grouped into groups of a mathematical power of two, which is any non-negative integer power of two. One or more current groups are determined which are the groups of the mathematical power of two comprising the current bit. The one or more current groups of the power of two are analyzed. A largest group of the power of two is identified in the one or more current groups comprising all empty bits. The pointer is set to point to a bit following a last bit in the identified largest group of the power of two comprising all empty bits.
    Type: Grant
    Filed: February 27, 2007
    Date of Patent: June 22, 2010
    Assignee: International Business Machines Corporation
    Inventors: Scot H. Rider, Todd A. Strader
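One way to read the pointer-advance step is sketched below. It assumes the groups are aligned blocks of size 1, 2, 4, and so on, and that an "empty" bit is 0; both are interpretations rather than details stated in the abstract. The name `skip_empty` is hypothetical.

```python
def skip_empty(bits, pos):
    """Advance pos past the largest aligned power-of-two group of
    all-zero bits that contains bits[pos].

    Returns pos unchanged if bits[pos] is itself set.
    """
    n = len(bits)
    best_end = None
    size = 1
    while size <= n:
        start = (pos // size) * size   # aligned group containing pos
        end = start + size
        if end > n or any(bits[start:end]):
            break                      # larger groups contain this one, so stop
        best_end = end
        size *= 2
    return pos if best_end is None else best_end
```

Calling it repeatedly walks the pointer to the next set bit in large strides when long runs of empty bits are present.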
  • Patent number: 7739480
    Abstract: A method and apparatus for obtaining a scalar value from a vector register for use in a mixed vector and scalar instruction, including providing a vector in a vector register file, and embedding a location identifier of the scalar value within the vector in the bits defining the mixed vector and scalar instruction. The scalar value can be used directly from the vector register without the need to load the scalar to a scalar register prior to executing the instruction. The scalar location identifier may be embedded in the secondary op code of the instruction, or the instruction may have dedicated bits for providing the location of the scalar within the vector.
    Type: Grant
    Filed: January 11, 2005
    Date of Patent: June 15, 2010
    Assignee: Nintendo Co., Ltd.
    Inventors: Yu-Chung C. Liao, Peter A. Sandon, Howard Cheng, Timothy J. Van Hook
  • Patent number: 7725678
    Abstract: A method for generating a permutation index vector includes receiving a condition vector and performing an index generation function using the condition vector in order to generate the permutation index vector. An index vector generation circuit is also disclosed.
    Type: Grant
    Filed: February 17, 2005
    Date of Patent: May 25, 2010
    Assignee: Texas Instruments Incorporated
    Inventor: Steven Krueger
  • Publication number: 20100115232
    Abstract: A vector processor or vector processing computer has a first vector register operable to store two or more vector elements that together comprise a single first large integer and a second vector register operable to store two or more vector elements that together comprise a single second large integer. An adder having a carry-in bit is operable to add the large integer in the first vector register to the large integer in the second vector register by using the carry-in bit to add sequential elements of the vector registers.
    Type: Application
    Filed: October 31, 2008
    Publication date: May 6, 2010
    Inventors: Timothy J. Johnson, Eric P. Lundberg, Michael Parker, Gregory J. Faanes
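Adding large integers spread across vector elements with a carry chain can be modeled as below. The 64-bit element width and little-endian element order are assumptions for illustration; `add_large` is a hypothetical name.

```python
def add_large(a, b, bits=64):
    """Add two large integers stored as equal-length lists of fixed-width
    elements (least significant element first), propagating a carry bit."""
    mask = (1 << bits) - 1
    carry = 0
    out = []
    for x, y in zip(a, b):
        s = x + y + carry      # carry-in from the previous element
        out.append(s & mask)   # keep the low `bits` bits in this element
        carry = s >> bits      # carry-out feeds the next element
    return out, carry
```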
  • Publication number: 20100106939
    Abstract: A method for transferring data from a general purpose register to a vector register, the method including splatting a byte of data directly from a general purpose register (GPR) to a vector register (VR) by means of vector permute instructions, and splatting another byte of data from the GPR to the VR and vectorially combining the data in the VR.
    Type: Application
    Filed: October 27, 2008
    Publication date: April 29, 2010
    Inventors: Daniel Citron, Ayal Zaks
  • Publication number: 20100095087
    Abstract: Mechanisms are provided for dynamic data driven alignment and data formatting in a floating point SIMD architecture. At least two operand inputs are input to a permute unit of a processor. Each operand input contains at least one floating point value upon which a permute operation is to be performed by the permute unit. A control vector input, having a plurality of floating point values that together constitute the control vector input, is input to the permute unit of the processor for controlling the permute operation of the permute unit. The permute unit performs a permute operation on the at least two operand inputs according to a permutation pattern specified by the plurality of floating point values that constitute the control vector input. Moreover, a result output of the permute operation is output from the permute unit to a result vector register of the processor.
    Type: Application
    Filed: October 14, 2008
    Publication date: April 15, 2010
    Applicant: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Bruce M. Fleischer, Michael K. Gschwind
  • Publication number: 20090276606
    Abstract: The present invention provides histogram calculation for images and video applications using a SIMD and VLIW processor with vector Look-Up Table (LUT) operations. This provides a speed up of histogram calculation by a factor of N times over a scalar processor where the SIMD processor could perform N LUT operations per instruction. Histogram operation is partitioned into a vector LUT operation, followed by vector increment, vector LUT update, and at the end by reduction of vector histogram components. The present invention could be used for intensity, RGBA, YUV, and other type of multi-component images.
    Type: Application
    Filed: March 12, 2009
    Publication date: November 5, 2009
    Inventor: TIBET MIMAR
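The four stages named in this abstract (vector LUT read, vector increment, vector LUT update, reduction) can be sketched in scalar Python. Giving each of the N lanes its own table is an assumption made so that lanes never collide on the same bin; the lane count, bin count, and function name are also illustrative.

```python
def simd_histogram(pixels, n_lanes=4, n_bins=256):
    """Histogram computed lane by lane, then reduced across lanes."""
    tables = [[0] * n_bins for _ in range(n_lanes)]  # one table per lane (assumed)
    for base in range(0, len(pixels), n_lanes):
        chunk = pixels[base:base + n_lanes]
        counts = [tables[lane][v] for lane, v in enumerate(chunk)]  # vector LUT read
        counts = [c + 1 for c in counts]                            # vector increment
        for lane, v in enumerate(chunk):                            # vector LUT update
            tables[lane][v] = counts[lane]
    # Reduction of the per-lane histogram components.
    return [sum(t[b] for t in tables) for b in range(n_bins)]
```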
  • Patent number: 7610469
    Abstract: A vector transfer unit for handling transfers of vector data between a memory and a data processor in a computer system. Vector data transfer instructions are posted to an instruction queue in the vector transfer unit. Program instructions for performing a burst transfer include determining the starting address of the vector data to be transferred, the ending address of the vector data to be transferred, and whether the ending address of the vector data to be transferred is within the same virtual memory page as the starting address. The ending address of the vector data to be transferred is determined based on the number of data elements to be transferred, the stride of the vector data to be transferred, and the width of the vector data elements to be transferred. When the amount of data to be transferred is divisible by a factor of two, the multiplication of the stride and width of the data elements is carried out by shifting.
    Type: Grant
    Filed: March 29, 2004
    Date of Patent: October 27, 2009
    Assignee: NEC Electronics America, Inc.
    Inventor: Ahmad R. Ansari
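The ending-address and page check can be sketched as follows. Several details are assumptions not given in the abstract: the stride is counted in elements, the ending address is taken as the last byte of the last element, and the page size is 4096 bytes. The name `burst_range` is hypothetical.

```python
def burst_range(start, n_elems, stride, width, page_size=4096):
    """Return (ending address, whether start and end share a page).

    Assumes stride is in elements, width is in bytes, and the ending
    address is the last byte of the last element.
    """
    step = stride * width              # a shift in hardware when a power of two
    end = start + (n_elems - 1) * step + width - 1
    same_page = (start // page_size) == (end // page_size)
    return end, same_page
```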
  • Publication number: 20090259823
    Abstract: A circuit and design structure for a streaming digital data filter embodied in a machine readable medium, the design structure including: a data processing unit and a pointer processing unit, the data processing unit and the pointer unit connected to a control logic; the pointer processing unit consisting of n serially connected pointer processing stages from a first to a last pointer processing stage, each pointer processing stage except for the first and last processing stages of the pointer processing unit including a pointer register and a multiplexer, wherein n is a positive integer greater than 2; the data processing unit consisting of n serially connected data processing stages from a first data processing stage to a last data processing stage, each data processing stage including a multiplexer, a data register and a comparator; and one or more filter output stages connected to the data processing unit.
    Type: Application
    Filed: April 10, 2008
    Publication date: October 15, 2009
    Inventors: Timothy M. Platt, Richard Jean-Luc St-Pierre
  • Publication number: 20090259822
    Abstract: A method of filtering streaming digital data in real time. The method including: (a) initializing and storing a set of m data elements and an associated set of m pointer data from 1 to m in sequence, m an integer greater than 2; (b) receiving in real time a first or next data element of a digital data stream of sequential data elements; (c) simultaneously with (b), replacing a stored data element associated with the pointer datum m with the received data element, changing the pointer datum of m to 1, and incrementing the value of all other pointer data by 1; (d) simultaneously with (b) sorting in order from a low to high all stored data elements; (e) simultaneously with (b), maintaining the association of pointer datum and data elements; (f) simultaneously with (b), filtering all stored data elements; and (g) repeating (b) through (f) multiple times.
    Type: Application
    Filed: April 10, 2008
    Publication date: October 15, 2009
    Inventors: Timothy M. Platt, Richard Jean-Luc St-Pierre
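    Illustrative sketch (not part of the patent record): steps (a) through (g) of the abstract above can be modeled sequentially in software, although the claimed method performs them simultaneously in hardware. The class name, the use of Python's sorted(), and the choice of a median as the "filtering" step are assumptions made only for this example.

```python
class StreamingRankFilter:
    """Software model of the pointer-aged, sorted window in the abstract.

    Each cell is a (value, pointer) pair. Pointer 1 marks the newest
    element and pointer m marks the oldest (an interpretation of the
    abstract, not a statement of the claimed circuit).
    """

    def __init__(self, initial):
        self.m = len(initial)
        if self.m < 3:
            raise ValueError("m must be an integer greater than 2")
        # Step (a): store m elements with pointers 1..m in sequence.
        self.cells = sorted((v, p) for p, v in enumerate(initial, start=1))

    def push(self, value):
        updated = []
        for v, p in self.cells:
            if p == self.m:
                # Step (c): the oldest element is replaced, pointer becomes 1.
                updated.append((value, 1))
            else:
                # Step (c): every other pointer is incremented by 1.
                updated.append((v, p + 1))
        # Steps (d) and (e): sort by value; each pointer travels with its value.
        self.cells = sorted(updated)
        # Step (f): hypothetical filter, here the median of the window.
        return self.cells[self.m // 2][0]


if __name__ == "__main__":
    f = StreamingRankFilter([0, 0, 0])
    print([f.push(x) for x in [5, 9, 7, 100]])
```

    The outlier 100 never reaches the output in this run, which is the usual motivation for a rank-order filter over a sliding window.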
  • Publication number: 20090249026
    Abstract: In one embodiment, a processor may include a vector unit to perform operations on multiple data elements responsive to a single instruction, and a control unit coupled to the vector unit to provide the data elements to the vector unit, where the control unit is to enable an atomic vector operation to be performed on at least some of the data elements responsive to a first vector instruction to be executed under a first mask and a second vector instruction to be executed under a second mask. Other embodiments are described and claimed.
    Type: Application
    Filed: March 28, 2008
    Publication date: October 1, 2009
    Inventors: Mikhail Smelyanskiy, Sanjeev Kumar, Daehyun Kim, Jatin Chhugani, Changkyu Kim, Christopher J. Hughes, Victor W. Lee, Anthony D. Nguyen, Yen-Kuang Chen
  • Publication number: 20090172348
    Abstract: A computer processor includes control logic for executing LoadUnpack and PackStore instructions. In one embodiment, the processor includes a vector register and a mask register. In response to a PackStore instruction with an argument specifying a memory location, a circuit in the processor copies unmasked vector elements from the vector register to consecutive memory locations, starting at the specified memory location, without copying masked vector elements. In response to a LoadUnpack instruction, the circuit copies data items from consecutive memory locations, starting at an identified memory location, into unmasked vector elements of the vector register, without copying data to masked vector elements. Other embodiments are described and claimed.
    Type: Application
    Filed: December 26, 2007
    Publication date: July 2, 2009
    Inventor: Robert Cavin
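    Illustrative sketch (not part of the patent record): the PackStore and LoadUnpack behavior described in the abstract above can be modeled with plain lists. The function names, the list-based "memory", and the convention that True in `selected` means an unmasked (copied) element are assumptions for this example only.

```python
def pack_store(vector, selected, memory, start):
    """Copy unmasked elements to consecutive memory slots from `start`.

    `selected[i]` is True when element i is unmasked (assumed convention).
    Returns the number of elements written.
    """
    addr = start
    for value, enabled in zip(vector, selected):
        if enabled:
            memory[addr] = value
            addr += 1
    return addr - start


def load_unpack(vector, selected, memory, start):
    """Fill unmasked elements from consecutive memory slots from `start`.

    Masked elements keep their previous contents. Returns a new list.
    """
    result = list(vector)
    addr = start
    for i, enabled in enumerate(selected):
        if enabled:
            result[i] = memory[addr]
            addr += 1
    return result


if __name__ == "__main__":
    mem = [0] * 8
    mask = [True, False, True, True]
    written = pack_store([10, 20, 30, 40], mask, mem, 2)
    print(written, mem)
    print(load_unpack([-1, -1, -1, -1], mask, mem, 2))
```

    The two operations are inverses over the unmasked lanes: packing and then unpacking under the same mask restores those lanes and leaves the masked ones untouched.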
  • Publication number: 20090172350
    Abstract: A processor using a vertically configured non-volatile memory array that can retain values through a power failure is disclosed. The processor may include a register block configured to store and retrieve one or more values, the register block being a vertically configured non-volatile memory array, an arithmetic block configured to perform an arithmetic operation on the one or more values, and a control block configured to control the register block, the arithmetic block, and a memory block. The vertically configured non-volatile memory array may include a plurality of two-terminal memory elements. The two-terminal memory elements may be resistivity-sensitive and store data in the absence of power. The two-terminal memory elements store data as a plurality of conductivity profiles that can be non-destructively read by applying a read voltage across the terminals of the memory element and data can be written by applying a write voltage across the terminals.
    Type: Application
    Filed: December 28, 2007
    Publication date: July 2, 2009
    Applicant: UNITY SEMICONDUCTOR CORPORATION
    Inventor: Robert Norman
  • Publication number: 20090172349
    Abstract: A computer processor includes a decoder for decoding machine instructions and an execution unit for executing those instructions. The decoder and the execution unit are capable of decoding and executing vector instructions that include one or more format conversion indicators. For instance, the processor may be capable of executing a vector-load-convert-and-write (VLoadConWr) instruction that provides for loading data from memory to a vector register. The VLoadConWr instruction may include a format conversion indicator to indicate that the data from memory should be converted from a first format to a second format before the data is loaded into the vector register. Other embodiments are described and claimed.
    Type: Application
    Filed: December 26, 2007
    Publication date: July 2, 2009
    Inventors: Eric Sprangle, Robert D. Cavin, Anwar Rohillah, Douglas M. Carmean
  • Patent number: 7548248
    Abstract: Methods and apparatuses for blending two images using vector table look up operations. In one aspect of the invention, a method to blend two images includes: loading a vector of keys into a vector register; converting the vector of keys into a first vector of blending factors for the first image and a second vector of blending factors for the second image using a plurality of look up tables; and computing an image attribute for the blended image using the blending factors.
    Type: Grant
    Filed: June 7, 2007
    Date of Patent: June 16, 2009
    Assignee: Apple Inc.
    Inventors: Steven Todd Weybrew, David Ligon, Ronald Gerard Langhi
  • Publication number: 20090150648
    Abstract: Embodiments of the invention generally relate to the field of image processing, and more specifically to instructions and hardware for supporting image processing. An integrated processing unit configured to process vector instructions and vector permute instructions is provided. A vector permute instruction may be issued to the integrated processing unit to set controls of one or more multiplexers so that the multiplexers rearrange the results of a subsequent vector instruction.
    Type: Application
    Filed: December 6, 2007
    Publication date: June 11, 2009
    Inventor: Eric Oliver Mejdrich
  • Publication number: 20090150649
    Abstract: An apparatus for storing X-bit digitized data, the register file comprising: a plurality of registers each register configured for storing X bits, wherein each register is partitioned into Y sub-registers such that each sub-register stores at least X/Y bits, and wherein at least one extra X/Y-bit sub-register is incorporated in each register to provide redundancy in the number of sub-registers for a total of at least Y+1 sub-registers per register, so that if a first sub-register in a first register includes faulty bits, data destined for storage in the first sub-register is stored in a second sub-register, in the first register, that does not include faulty bits.
    Type: Application
    Filed: December 10, 2007
    Publication date: June 11, 2009
    Inventors: Jaume Abella, Javier Carretero Casado, Pedro Chaparro Monferrer, Xavier Vera
  • Publication number: 20090113168
    Abstract: A method for managing multiple values assigned to a variable during various stages of a software pipelined process executed in a computing environment. The method comprises allocating two or more slots in a vector register to two or more values associated with said variable during two or more stages of a pipeline process; and rotating values in each slot responsive to an instruction.
    Type: Application
    Filed: October 24, 2007
    Publication date: April 30, 2009
    Inventor: Ayal Zaks
  • Publication number: 20090106526
    Abstract: Embodiments of the invention are generally related to image processing, and more specifically to register files for supporting image processing. An integrated register file is also provided for storing vector and scalar data. Therefore, the transfer of data to memory to exchange data between independent vector and scalar units is obviated.
    Type: Application
    Filed: October 22, 2007
    Publication date: April 23, 2009
    Inventors: David Arnold Luick, Eric Oliver Mejdrich
  • Publication number: 20090100247
    Abstract: A processor in a data processing system executes a permutation instruction which identifies a first source register, at least one other source register, and a destination register. The first source register stores at least one in-range index value for the at least one other source register and at least one out-of-range index value for the at least one other source register. The at least one other source register stores a plurality of vector element values, wherein each in-range index value indicates which vector element value of the at least one other source register is to be stored into a corresponding vector element of the destination register. Each out-of-range index value is used to indicate which one of at least two predetermined constant values is to be stored into a corresponding vector element of the destination register. Partial table lookups using a permutation instruction shorten the time required to retrieve data.
    Type: Application
    Filed: October 12, 2007
    Publication date: April 16, 2009
    Inventors: William C. Moyer, Imran Ahmed, Dan E. Tamir
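    Illustrative sketch (not part of the patent record): the permutation described in the abstract above selects a table element for each in-range index and a predetermined constant for each out-of-range index. The function name, the two constants, and the rule that the low bit of an out-of-range index picks the constant are assumptions; the abstract does not say how the constant is chosen.

```python
def permute_with_constants(indices, table, constants=(0, -1)):
    """Model of a permutation with constant fill for out-of-range indices.

    In-range indices select an element of `table`. Out-of-range indices
    select one of `constants`; here the index modulo the number of
    constants decides which one (a hypothetical encoding).
    """
    result = []
    for idx in indices:
        if 0 <= idx < len(table):
            result.append(table[idx])
        else:
            result.append(constants[idx % len(constants)])
    return result


if __name__ == "__main__":
    print(permute_with_constants([2, 0, 8, 9], [5, 6, 7, 8]))
```

    A table larger than one register could then be searched piecewise: each partial lookup fills the lanes whose index falls in its slice and marks the rest with a constant, which is one reading of the "partial table lookups" the abstract mentions.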
  • Patent number: 7519797
    Abstract: An event occurring in a graphics pipeline is detected and counted at the location of its occurrence using an event detector and a local counter. The event count maintained by the local counter is reported asynchronously to a global counter. The global counter is configured to be of higher precision than the local counter and is positioned at a place that is convenient for reporting the events, e.g., at the end of the graphics pipeline.
    Type: Grant
    Filed: November 2, 2006
    Date of Patent: April 14, 2009
    Assignee: NVIDIA Corporation
    Inventors: Gregory J. Stiehl, David L. Anderson, Cass W. Everitt, Mark J. French, Steven E. Molnar
  • Patent number: 7516299
    Abstract: A method for transferring data from a general purpose register (GPR) to a vector register (VR), the method including vectorially combining data in the VR from the GPR, by executing instructions of a PowerPC Instruction Set Architecture (ISA), the step of combining including splatting a low nibble from the GPR into a low nibble in each element of a first VR by executing two “load vector for shift left” (lvsl) or “load vector for shift right” (lvsr) and one “vector subtract unsigned byte modulo” (vsububm), shifting a high nibble of the GPR into a low nibble of the GPR, splatting the low nibble of the GPR into a low nibble in each element of a second VR by re-executing the two lvsl or lvsr and one vsububm instructions, shifting the low nibble of the second VR into a high nibble of the second VR and combining both first and second VRs into one VR.
    Type: Grant
    Filed: August 29, 2005
    Date of Patent: April 7, 2009
    Assignee: International Business Machines Corporation
    Inventors: Daniel Citron, Ayal Zaks
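    Illustrative sketch (not part of the patent record): the data flow of the abstract above, modeled on one byte of the GPR. This example does not emulate the lvsl, lvsr, or vsububm instructions; it only shows how two nibble splats recombine into a full byte per element. The function name, the 16-lane default, and the use of bitwise OR for the final "combining" step are assumptions.

```python
def splat_byte(gpr_value, lanes=16):
    """Replicate the low byte of `gpr_value` into every lane via nibbles."""
    # Splat the low nibble of the GPR into each element of a first VR.
    low = gpr_value & 0x0F
    first_vr = [low] * lanes

    # Shift the high nibble of the GPR down, then splat it into a second VR.
    high = (gpr_value >> 4) & 0x0F
    second_vr = [high] * lanes

    # Shift each low nibble of the second VR into the high-nibble position.
    second_vr = [(n << 4) & 0xF0 for n in second_vr]

    # Combine both VRs into one (bitwise OR is assumed here).
    return [a | b for a, b in zip(first_vr, second_vr)]


if __name__ == "__main__":
    print([hex(b) for b in splat_byte(0xAB, lanes=4)])
```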
  • Publication number: 20090077345
    Abstract: A data processing system includes a plurality of general purpose registers, and processor circuitry for executing one or more instructions, including a vector dot product instruction for simultaneously performing at least two dot products. The vector dot product instruction identifies a first and second source register, each for storing a plurality of vector elements, where a first dot product is to be performed between a first subset of vector elements of the first source register and a first subset of vector elements of the second source register, and a second dot product is to be performed between a second subset of vector elements of the first source register and a second subset of vector elements of the second source register. The first and second subsets of the second source register are different and at least two vector elements of the first and second subsets of the second source register overlap.
    Type: Application
    Filed: September 13, 2007
    Publication date: March 19, 2009
    Inventor: William C. Moyer
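    Illustrative sketch (not part of the patent record): one way to read the abstract above is as two dot products that share elements of the second source register, as in a sliding-window (FIR-style) computation. The function name, the window width, the one-element offset, and the reuse of the same elements of the first source for both products are assumptions for this example.

```python
def dual_dot_product(a, b, width=3):
    """Compute two dot products over overlapping windows of `b`.

    The first product uses b[0:width]; the second uses b[1:width+1].
    With width >= 3 the two windows of `b` differ but share at least
    two elements, matching the overlap condition in the abstract.
    """
    if width < 3 or len(a) < width or len(b) < width + 1:
        raise ValueError("need width >= 3, len(a) >= width, len(b) >= width + 1")
    first = sum(x * y for x, y in zip(a[:width], b[:width]))
    second = sum(x * y for x, y in zip(a[:width], b[1:width + 1]))
    return first, second


if __name__ == "__main__":
    print(dual_dot_product([1, 2, 3], [4, 5, 6, 7]))
```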
  • Patent number: 7496731
    Abstract: A method for processing matrix data. The processor includes M independent vector register files which are adapted to collectively store a matrix of L data elements. Each data element has B binary bits. The matrix has N rows and M columns, and L=N*M. Each column has K subcolumns. N≥2, M≥2, K≥2, and B≥1. Each row and each subcolumn is addressable. The processor does not duplicatively store the L data elements. The matrix includes a set of arrays such that each array is a row or subcolumn of the matrix. The processor may execute an instruction that performs an operation on a first array of the set of arrays, such that the operation is performed with selectivity with respect to the data elements of the first array.
    Type: Grant
    Filed: September 6, 2007
    Date of Patent: February 24, 2009
    Assignee: International Business Machines Corporation
    Inventors: Peter A. Sandon, R. Michael P. West
  • Patent number: 7467287
    Abstract: Methods and apparatuses for performing vector table look-up using multiple look-up tables. In one aspect of the invention, a method for execution by a microprocessor in response to receiving a single instruction includes: receiving a plurality of numbers; partitioning look-up memory into a plurality of look-up tables; looking up simultaneously a plurality of elements from the plurality of look-up tables. Each of the plurality of elements is in one of the plurality of look-up tables and is pointed to by one of the plurality of numbers. The above operations are performed in response to the microprocessor receiving the single instruction.
    Type: Grant
    Filed: December 31, 2001
    Date of Patent: December 16, 2008
    Assignee: Apple Inc.
    Inventors: Joseph P. Bratt, Sushma Shrikant Trivedi
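    Illustrative sketch (not part of the patent record): the multi-table look-up in the abstract above, modeled sequentially. The hardware performs the look-ups simultaneously in response to a single instruction; this example only shows the partitioning and indexing. The function name, the equal-sized partitions, and the pairing of lane i with table i are assumptions.

```python
def vector_table_lookup(memory, indices):
    """Partition `memory` into len(indices) tables and look up one element each.

    Lane i reads entry indices[i] of table i (assumed pairing).
    """
    table_count = len(indices)
    if table_count == 0 or len(memory) % table_count != 0:
        raise ValueError("memory must split evenly into one table per index")
    table_size = len(memory) // table_count
    # Partition the look-up memory into equal-sized tables.
    tables = [
        memory[i * table_size:(i + 1) * table_size]
        for i in range(table_count)
    ]
    # One look-up per lane; done in parallel in hardware, in a loop here.
    return [tables[lane][idx] for lane, idx in enumerate(indices)]


if __name__ == "__main__":
    print(vector_table_lookup(list(range(100, 108)), [3, 0]))
```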
  • Publication number: 20080301401
    Abstract: A processor includes: a plurality of registers; an instruction readout circuit configured to read out an instruction from a memory; an instruction generation circuit configured to generate instructions for saving data into a predetermined storage area, for the respective registers, if the instruction read out by the instruction readout circuit is an instruction causing the data stored in each of the plurality of registers to be saved; and an instruction execution circuit configured to execute the instruction read out from the memory and the instructions generated by the instruction generation circuit.
    Type: Application
    Filed: May 28, 2008
    Publication date: December 4, 2008
    Applicants: Sanyo Electric Co., Ltd., Sanyo Semiconductor Co., Ltd.
    Inventors: Iwao Honda, Shinya Kishida