Distributing Of Vector Data To Vector Registers Patents (Class 712/4)
-
Patent number: 8255884Abstract: Mechanisms for optimizing scalar code executed on a single instruction multiple data (SIMD) engine are provided. Placement of vector operation-splat operations may be determined based on an identification of scalar and SIMD operations in an original code representation. The original code representation may be modified to insert the vector operation-splat operations based on the determined placement of vector operation-splat operations to generate a first modified code representation. Placement of separate splat operations may be determined based on identification of scalar and SIMD operations in the first modified code representation. The first modified code representation may be modified to insert or delete separate splat operations based on the determined placement of the separate splat operations to generate a second modified code representation. SIMD code may be output based on the second modified code representation for execution by the SIMD engine.Type: GrantFiled: June 6, 2008Date of Patent: August 28, 2012Assignee: International Business Machines CorporationInventors: Alexandre E. Eichenberger, Michael K. Gschwind, John A. Gunnels
-
Publication number: 20120185670Abstract: A processing core implemented on a semiconductor chip is described. The processing core includes logic circuitry to identify whether vector instructions and integer scalar instructions are to be executed with two registers or three registers, where, in the case of two registers input operand information is destroyed in one of two registers, and, in the case of three registers input operand is not destroyed. The processing core also includes steering circuitry coupled to the logic circuitry. The steering circuitry is to control first data paths between scalar integer execution units and a scalar integer register bank such that two registers are accessed from the scalar register bank if two register execution is identified for the scalar integer instructions or three registers are accessed from the scalar integer register bank if three register execution is identified for the scalar integer instructions.Type: ApplicationFiled: January 14, 2011Publication date: July 19, 2012Inventors: Bret L. Toll, Robert Valentine, Maxim Locktyukhin, Elmoustapha Ould-Ahmed-Vall
-
Patent number: 8175853Abstract: Systems and methods for combined matrix-vector and matrix-transpose vector multiply for block sparse matrices. Exemplary embodiments include a method of updating a simulation of physical objects in an interactive computer, including generating a set of representations of objects in the interactive computer environment, partitioning the set of representations into a plurality of subsets such that objects in any given set interact only with other objects in that set, generating a vector b describing an expected position of each object at the end of a time interval h, applying a biconjugate gradient algorithm to solve A*?v=b for the vector ?v of position and velocity changes to be applied to each object wherein the q=Ap and qt=AT(pt) calculations are combined so that A only has to be read once, integrating the updated motion vectors to determine a next state of the simulated objects, and converting the simulated objects to a visual.Type: GrantFiled: March 28, 2008Date of Patent: May 8, 2012Assignee: International Business Machines CorporationInventor: Karen A. Magerlein
-
Publication number: 20120084498Abstract: Described embodiments provide a method of controlling processing flow in a network processor having one or more processing modules. A given one of the processing modules loads a script into a compute engine. The script includes instructions for the compute engine. The given one of the processing modules loads a register file into the compute engine. The register file includes operands for the instructions of the loaded script. A tracking vector of the compute engine is initialized to a default value, and the compute engine executes the instructions of the loaded script based on the operands of the loaded register file. The compute engine updates corresponding portions of the register file with updated data corresponding to the executed script. The tracking vector tracks the updated portions of the register file. The compute engine provides the tracking vector and the updated register file to the given one of the processing modules.Type: ApplicationFiled: December 9, 2011Publication date: April 5, 2012Inventors: David Sonnier, Chris Randall Stone, Charles Edwards Peet, JR.
-
Publication number: 20120060016Abstract: Mechanisms for performing a scattered load operation are provided. With these mechanisms, a gather instruction is receive in a logic unit of a processor, the gather instruction specifying a plurality of addresses in a memory from which data is to be loaded into a target vector register of the processor. A plurality of separate load instructions for loading the data from the plurality of addresses in the memory are automatically generated within the logic unit. The plurality of separate load instructions are sent, from the logic unit, to one or more load/store units of the processor. The data corresponding to the plurality of addresses is gathered in a buffer of the processor. The logic unit then writes data stored in the buffer to the target vector register.Type: ApplicationFiled: September 7, 2010Publication date: March 8, 2012Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Alexandre E. Eichenberger, Michael K. Gschwind, Valentina Salapura
-
Publication number: 20120060015Abstract: Mechanisms for performing a scattered load operation are provided. With these mechanisms, an extended address is received in a cache memory of a processor. The extended address has a plurality of data element address portions that specify a plurality of data elements to be accessed using the single extended address. Each of the plurality of data element address portions is provided to corresponding data element selector logic units of the cache memory. Each data element selector logic unit in the cache memory selects a corresponding data element from a cache line buffer based on a corresponding data element address portion provided to the data element selector logic unit. Each data element selector logic unit outputs the corresponding data element for use by the processor.Type: ApplicationFiled: September 7, 2010Publication date: March 8, 2012Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Alexandre E. Eichenberger, Michael K. Gschwind, Valentina Salapura
-
Patent number: 8108652Abstract: The claimed invention is an efficient and high-performance vector processor. Through minimizing the use of multiple banks of memory and/or multi-ported memory blocks to reduce implementation cost, vector memory 450 provides abundant memory bandwidth and enables sustained low-delay memory operations for a large number of SIMD (Single Instruction, Multiple Data) or vector operators simultaneously.Type: GrantFiled: September 10, 2008Date of Patent: January 31, 2012Inventor: Ronald Chi-Chun Hui
-
Patent number: 8073881Abstract: Multiple computers in a cluster maintain respective sets of identifiers of neighbor computers in the cluster for each of multiple named resource. A combination of the respective sets of identifiers define a respective tree formed by the respective sets of identifiers for a respective named resource in the set of named resources. Upon origination and detection of a request at a given computer in the cluster, a given computer forwards the request from the given computer over a network to successive computers in the hierarchical tree leading to the computers relevant in handling the request based on use of identifiers of neighbor computers. Thus, a combination of identifiers of neighbor computers identify potential paths to related computers in the tree.Type: GrantFiled: August 3, 2009Date of Patent: December 6, 2011Assignee: Sanbolic, Inc.Inventor: Ivan I. Georgiev
-
Publication number: 20110276782Abstract: The described embodiments provide a processor for generating a result vector with subtracted or mathematically divided values from a first input vector. During operation, the processor receives the first input vector, a second input vector, and a control vector, and optionally receives a predicate vector. The processor then records a value from an element at a key element position in the second input vector into a base value. Next, the processor generates a result vector.Type: ApplicationFiled: July 22, 2011Publication date: November 10, 2011Applicant: APPLE INC.Inventors: Jeffry E. Gonion, Keith E. Diefendorff
-
Patent number: 7984273Abstract: A system and method for assigning values to elements in a first register, where each data field in a first register corresponds to a data element to be written into a second register, and where for each data field in the first register, a first value may indicate that the corresponding data element has not been written into the second register and a second value indicates that the corresponding data element has been written into the second register, reading the values of each of the data fields in the first register, and for each data field in the first register having the first value, gathering the corresponding data element and writing the corresponding data element into the second register, and changing the value of the data field in the first register from the first value to the second value. Other embodiments are described and claimed.Type: GrantFiled: December 31, 2007Date of Patent: July 19, 2011Assignees: Intel CorporationInventors: Eric Sprangle, Anwar Rohillah, Robert Cavin, Tom Forsyth, Michael Abrash
-
Publication number: 20110161624Abstract: Mechanisms are provided for performing a floating point collect and operate for a summation across a vector for a dot product operation. A routing network placed before the single instruction multiple data (SIMD) unit allows the SIMD unit to perform a summation across a vector with a single stage of adders. The routing network routes the vector elements to the adders in a first cycle. The SIMD unit stores the results of the adders into a results vector register. The routing network routes the summation results from the results vector register to the adders in a second cycle. The SIMD unit then stores the results from the second cycle in the results vector register.Type: ApplicationFiled: December 29, 2009Publication date: June 30, 2011Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Brian K. Flachs, Seiji Maeda, Steven Osman
-
Patent number: 7971036Abstract: A multi-node video signal processor (VSPN) is describes that tightly couples multiple multi-cycle state machines (hardware assist units) to each processor and each memory in each node of an N node scalable array processor. VSPN memory hardware assist instructions are used to initiate multi-cycle state machine functions, to pass parameters to the multi-cycle state machines, to fetch operands from a node's memory, and to control the transfer of results from the multi-cycle state machines.Type: GrantFiled: April 18, 2007Date of Patent: June 28, 2011Assignee: Altera Corp.Inventors: Gerald George Pechanek, Mihailo M. Stojancic
-
Patent number: 7949853Abstract: A processor for processing matrix data. The processor includes M independent vector register files which are adapted to collectively store a matrix of L data elements. Each data element has B binary bits. The matrix has N rows and M columns, and L=N*M. Each column has K subcolumns. N?2, M?2, K?2, and B?1. Each row and each subcolumn is addressable. The processor does not duplicatively store the L data elements. The matrix includes a set of arrays such that each array is a row or subcolumn of the matrix. The processor may execute an instruction that performs an operation on a first array of the set of arrays, such that the operation is performed with selectivity with respect to the data elements of the first array.Type: GrantFiled: December 5, 2007Date of Patent: May 24, 2011Assignee: International Business Machines CorporationInventors: Peter A. Sandon, R. Michael P. West
-
Publication number: 20110087859Abstract: The present invention provides efficient transfer of misaligned vector elements between a vector register file and data memory in a single clock cycle. One vector register of N elements can be loaded from memory with any memory element address alignment during a single clock cycle of the processor. Also, a partial segment of vector register elements can be loaded into a vector register in a single clock cycle with any element alignment from data memory. The present invention comprises properly partitioned multiple multi-port data memory modules in conjunction with a crossbar and address generation circuit. A preferred embodiment of the present invention uses a dual-issue processor containing both a RISC-type scalar processor and a vector/SIMD processor, whereby one scalar and one SIMD instruction are executed every clock cycle, and the RISC processor handles program flow control and also loading and storing of vector registers.Type: ApplicationFiled: February 3, 2003Publication date: April 14, 2011Inventor: Tibet Mimar
-
Publication number: 20110072236Abstract: The present invention relates to an efficient implementation of color space conversion in a SIMD processor as part of converting output of video decompression to interface to a display unit.Type: ApplicationFiled: September 20, 2009Publication date: March 24, 2011Inventor: Tibet Mimar
-
Patent number: 7908460Abstract: A method and apparatus for obtaining a scalar value from a vector register for use in a mixed vector and scalar instruction, including providing a vector in a vector register file, and embedding a location identifier of the scalar value within the vector in the bits defining the mixed vector and scalar instruction. The scalar value can be used directly from the vector register without the need to load the scalar to a scalar register prior to executing the instruction. The scalar location identifier may be embedded in the secondary op code of the instruction, or the instruction may have dedicated bits for providing the location of the scalar within the vector.Type: GrantFiled: May 3, 2010Date of Patent: March 15, 2011Assignee: Nintendo Co., Ltd.Inventors: Yu-Chung C. Liao, Peter A. Sandon, Howard Cheng, Timothy J. Van Hook
-
Patent number: 7895379Abstract: Control logic of a node controller receives an input vector and produces an output vector. The control logic includes a plurality of tied control store entries including hard-coded logic to identify unique values of the input vector and to produce the output vector from a hard-coded output vector when the input vector is identified and when the tied control store is enabled. The control logic also includes a plurality of spare control store entries including programmable logic configurable to identify values of the input vector and to produce the output vector from a programmable output vector when the input vector is identified and when the spare control store is enabled. One of the spare control store entries that is configured to identify a value of the input vector that none of the tied control store entries that are enabled by the entry-enables register are configured to identify is enabled.Type: GrantFiled: December 23, 2008Date of Patent: February 22, 2011Assignee: Unisys CorporationInventors: Ross M. Weber, David R. Spatafore
-
Publication number: 20110040951Abstract: Various embodiments for deduplicated data processing rate control using at least one processor device in a computing environment are provided. A plurality of workers is configured for parallel processing of deduplicated data entities in a plurality of chunks. The deduplicated data processing rate is regulated using a rate control mechanism. The rate control mechanism incorporates a debt/credit algorithm specifying which of the plurality of workers processing the deduplicated data entities must wait for each of a plurality of calculated required sleep times. The rate control mechanism is adapted to limit a data flow rate based on a penalty acquired during a last processing of one of the plurality of chunks in a retroactive manner, and further adapted to operate on at least one vector representation of at least one limit specification to accommodate a variety of available dimensions corresponding to the at least one limit specification.Type: ApplicationFiled: August 11, 2009Publication date: February 17, 2011Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Shay H. AKIRAV, Ron ASHER, Yariv BACHAR, Lior KLIPPER, Oded SONIN
-
Publication number: 20100332792Abstract: Systems and methods for improved vector data processing based on separately processing elements of a vector in multiple simultaneously executing vector element processing units are disclosed. One embodiment of the present invention is a vector processing system including a plurality of vector element processing units and a routing infrastructure. The routing infrastructure is configured to route each element of a received vector to a respective one of the vector element processing units. The received vector may be from a memory which is coupled to the vector element processing units by the routing infrastructure. Each vector element processing unit is configured to simultaneously process two or more elements, wherein each of the two or more elements is from a separate vector. Embodiments of the present invention also provide for forwarding of data and results of computation between vector element processing units.Type: ApplicationFiled: June 30, 2009Publication date: December 30, 2010Applicant: Advanced Micro Devices, Inc.Inventor: Daniel B. CLIFTON
-
Publication number: 20100274988Abstract: In addition to the usual modes of SIMD processor operation, where corresponding elements of two source vector registers are used as input pairs to be operated upon by the execution unit, or where one element of a source vector register is broadcast for use across the elements of another source vector register, the new system provides several other modes of operation for the elements of one or two source vector registers. Improving upon the time-costly moving of elements for an operation such as DCT, the present invention defines a more general set of modes of vector operations. In one embodiment, these new modes of operation use a third vector register to define how each element of one or both source vector registers are mapped, in order to pair these mapped elements as inputs to a vector execution unit.Type: ApplicationFiled: February 3, 2003Publication date: October 28, 2010Inventor: Tibet Mimar
-
Patent number: 7818539Abstract: A processor implements conditional vector operations in which, for example, an input vector containing multiple operands to be used in conditional operations is divided into two or more output vectors based on a condition vector. Each output vector can then be processed at full processor efficiency without cycles wasted due to branch latency. Data to be processed are divided into two groups based on whether or not they satisfy a given condition by e.g., steering each to one of the two index vectors. Once the data have been segregated in this way, subsequent processing can be performed without conditional operations, processor cycles wasted due to branch latency, incorrect speculation or execution of unnecessary instructions due to predication. Other examples of conditional operations include combining one or more input vectors into a single output vector based on a condition vector, conditional vector switching, conditional vector combining, and conditional vector load balancing.Type: GrantFiled: August 28, 2006Date of Patent: October 19, 2010Assignees: The Board of Trustees of the Leland Stanford Junior University, The Massachusetts Institute of TechnologyInventors: Scott Rixner, John D. Owens, Ujval J. Kapasi, William J. Dally
-
Patent number: 7809931Abstract: A vector permutation system (100) for a single-instruction multiple-data microprocessor has a set of vector registers (110) which feed vectors to permutation logic (120) and then to a negate block (130) where they are permuted and selectively negated according to control parameters received from a selected one of a set of control registers (140). A control arrangement (145, 150) selects which control register is to provide the control parameters. In this way no separate permutation instructions are necessary or need to be executed, and no permutation parameters need to be stored in the vector registers (10). This leads to higher performance, a smaller vector registers file and hence a smaller size of the microprocessor and better program code density.Type: GrantFiled: October 6, 2003Date of Patent: October 5, 2010Assignee: Freescale Semiconductor, Inc.Inventor: Martin Raubuch
-
Patent number: 7793073Abstract: A method and apparatus to correctly compute a vector-gather, vector-operate (e.g., vector add), and vector-scatter sequence, particularly when elements of the vector may be redundantly presented, as with indirectly addressed vector operations. For an add operation, one vector register is loaded with the “add-in” values, and another vector register is loaded with address values of “add to” elements to be gathered from memory into a third vector register. If the vector of address values has a plurality of elements that point to the same memory address, the algorithm should add all the “add in” values from elements corresponding to the elements having the duplicated addresses. An indirectly addressed load performs the “gather” operation to load the “add to” values. A vector add operation then adds corresponding elements from the “add in” vector to the “add to” vector. An indirectly addressed store then performs the “scatter” operation to store the results.Type: GrantFiled: June 29, 2007Date of Patent: September 7, 2010Assignee: Cray Inc.Inventor: James R. Kohn
-
Patent number: 7793072Abstract: A microprocessor including an execution unit enabled to execute an asymmetric instruction, where the asymmetric instruction includes a set of operand fields and an operation code (opcode). The execution unit is configured to interpret the opcode to perform a first operation on a first set of data indicated by the set of operand fields and to perform a second operation on a second set of data indicated by the set of operand fields, wherein the set of operand fields indicate different sets of data with respect to the first and second operations and further wherein the first and second operations are mathematically different.Type: GrantFiled: October 31, 2003Date of Patent: September 7, 2010Assignee: International Business Machines CorporationInventor: Kenneth Dockser
-
Publication number: 20100223444Abstract: A method and a device having a plurality of bit operations capability, the device includes: a first and a second registers and an instruction fetch circuit, and an arithmetic logic unit adapted to: calculate, during a first clock cycle, a position value representative of a position, within a first information vector, of a first bit of information that has a first value; and to multiply the position value by a multiplication factor to provide a first result and to alter the value of the first bit to a second value to provide an updated information vector, during the first clock cycle.Type: ApplicationFiled: August 18, 2006Publication date: September 2, 2010Applicant: Freescale Semiconductor, Inc.Inventors: Eran Glickman, Evgeni Ginzburg, Noam Sheffer
-
Patent number: 7788471Abstract: A system and method for performing vector arithmetic is disclosed. The method includes loading two operand vectors, each composed of a number of vector elements, into two storage locations. A selected arithmetic operation is performed on the operand vectors to produce a result vector having the number of vector elements. Each vector element of the result vector is associated with an arithmetic logic cell that has a first input that can receive any vector element from the first vector and a second input that can receive any vector element from the second vector. Accordingly each vector element of the result vector is a function of any two individual vector elements of the operand vectors. By applying the operand vector elements to the appropriate arithmetic logic cells, and by selecting the appropriate arithmetic operation, complex vector operations can be performed efficiently.Type: GrantFiled: September 18, 2006Date of Patent: August 31, 2010Assignee: Freescale Semiconductor, Inc.Inventor: Chengke Sheng
-
Patent number: 7743231Abstract: Provided are a method, information processing system, and computer readable medium for identifying active bits in a vector. The method comprises receiving a pointer associated with a vector of bits. The pointer is associated with a current bit within the vector of bits. The vector of bits if grouped into groups of a mathematical power of two, which is any non-negative integer powers of two. One or more current groups are determined which are the groups of the mathematical power of two comprising the current bit. The one or more current groups of the power of two are analyzed. A largest group of the power of two is identified in the one or more current groups comprising all empty bits. The pointer is set to point to a bit following a last bit in the identified largest group of the power of two comprising all empty bits.Type: GrantFiled: February 27, 2007Date of Patent: June 22, 2010Assignee: International Business Machines CorporationInventors: Scot H. Rider, Todd A. Strader
-
Patent number: 7739480Abstract: A method and apparatus for obtaining a scalar value from a vector register for use in a mixed vector and scalar instruction, including providing a vector in a vector register file, and embedding a location identifier of the scalar value within the vector in the bits defining the mixed vector and scalar instruction. The scalar value can be used directly from the vector register without the need to load the scalar to a scalar register prior to executing the instruction. The scalar location identifier may be embedded in the secondary op code of the instruction, or the instruction may have dedicated bits for providing the location of the scalar within the vector.Type: GrantFiled: January 11, 2005Date of Patent: June 15, 2010Assignee: Nintendo Co., Ltd.Inventors: Yu-Chung C. Liao, Peter A. Sandon, Howard Cheng, Timothy J. Van Hook
-
Patent number: 7725678Abstract: A method for generating a permutation index vector includes receiving a condition vector and performing an index generation function using the condition vector in order to generate the permutation index vector. An index vector generation circuit is also disclosed.Type: GrantFiled: February 17, 2005Date of Patent: May 25, 2010Assignee: Texas Instruments IncorporatedInventor: Steven Krueger
-
Publication number: 20100115232Abstract: A vector processor or vector processing computer has a first vector register operable to store two or more vector elements that together comprise a single first large integer and a second vector register operable to store two or more vector elements that together comprise a single second large integer. An adder having a carry-in bit is operable to add the large integer in the first vector register to the large integer in the second vector register by using the carry-in bit to add sequential elements of the vector registers.Type: ApplicationFiled: October 31, 2008Publication date: May 6, 2010Inventors: Timothy J. Johnson, Eric P. Lundberg, Michael Parker, Gregory J. Faanes
-
Publication number: 20100106939Abstract: A method for transferring data from a general purpose register to a vector register, the method including splatting a byte of data directly from a general purpose register (GPR) to a vector register (VR) by means of vector permute instructions, and splatting another byte of data from the GPR to the VR and vectorially combining the data in the VR.Type: ApplicationFiled: October 27, 2008Publication date: April 29, 2010Inventors: Daniel Citron, Ayal Zaks
-
Publication number: 20100095087Abstract: Mechanisms are provided for dynamic data driven alignment and data formatting in a floating point SIMD architecture. At least two operand inputs are input to a permute unit of a processor. Each operand input contains at least one floating point value upon which a permute operation is to be performed by the permute unit. A control vector input, having a plurality of floating point values that together constitute the control vector input, is input to the permute unit of the processor for controlling the permute operation of the permute unit. The permute unit performs a permute operation on the at least two operand inputs according to a permutation pattern specified by the plurality of floating point values that constitute the control vector input. Moreover, a result output of the permute operation is output from the permute unit to a result vector register of the processor.Type: ApplicationFiled: October 14, 2008Publication date: April 15, 2010Applicant: International Business Machines CorporationInventors: Alexandre E. Eichenberger, Bruce M. Fleischer, Michael K. Gschwind
-
Publication number: 20090276606Abstract: The present invention provides histogram calculation for images and video applications using a SIMD and VLIW processor with vector Look-Up Table (LUT) operations. This provides a speed up of histogram calculation by a factor of N times over a scalar processor where the SIMD processor could perform N LUT operations per instruction. Histogram operation is partitioned into a vector LUT operation, followed by vector increment, vector LUT update, and at the end by reduction of vector histogram components. The present invention could be used for intensity, RGBA, YUV, and other type of multi-component images.Type: ApplicationFiled: March 12, 2009Publication date: November 5, 2009Inventor: TIBET MIMAR
-
Patent number: 7610469Abstract: A vector transfer unit for handling transfers of vector data between a memory and a data processor in a computer system. Vector data transfer instructions are posted to an instruction queue in the vector transfer unit. Program instructions for performing a burst transfer include determining the starting address of the vector data to be transferred, the ending address of the vector data to be transferred, and whether the ending address of the vector data to be transferred is within the same virtual memory page as the starting address. The ending address of the vector data to be transferred is determined based on the number of data elements to be transferred, the stride of the vector data to be transferred, and the width of the vector data elements to be transferred. When the amount of data to be transferred is divisible by a factor of two, the multiplication of the stride and width of the data elements is carried out by shifting.Type: GrantFiled: March 29, 2004Date of Patent: October 27, 2009Assignee: NEC Electronics America, Inc.Inventor: Ahmad R. Ansari
-
Publication number: 20090259822Abstract: A method of filtering streaming digital data in real time. The method including: (a) initializing and storing a set of m data elements and an associated set of m pointer data from 1 to m in sequence, m an integer greater than 2; (b) receiving in real time a first or next data element of a digital data stream of sequential data elements; (c) simultaneously with (b), replacing a stored data element associated with the pointer datum m with the received data element, changing the pointer datum of m to 1, and incrementing the value of all other pointer data by 1; (d) simultaneously with (b) sorting in order from a low to high all stored data elements; (e) simultaneously with (b), maintaining the association of pointer datum and data elements; (f) simultaneously with (b), filtering all stored data elements; and (g) repeating (b) through (f) multiple times.Type: ApplicationFiled: April 10, 2008Publication date: October 15, 2009Inventors: Timothy M. Platt, Richard Jean-Luc St-Pierre
-
Publication number: 20090259823Abstract: A circuit and design structure for a streaming digital data filter embodied in a machine readable medium, the design structure including: a data processing unit and a pointer processing unit, the data processing unit and the pointer unit connected to a control logic; the pointer processing unit consisting of n serially connected pointer processing stages from a first to a last pointer processing stage, each pointer processing stage except for the first and last processing stages of the pointer processing unit including a pointer register and a multiplexer, wherein n is a positive integer greater than 2; the data processing unit consisting of n serially connected data processing stages from a first data processing stage to a last data processing stage, each data processing stage including a multiplexer, a data register and a comparator; and one or more filter output stages connected to the data processing unit.Type: ApplicationFiled: April 10, 2008Publication date: October 15, 2009Inventors: Timothy M. Platt, Richard Jean-Luc St-Pierre
-
Publication number: 20090249026Abstract: In one embodiment, a processor may include a vector unit to perform operations on multiple data elements responsive to a single instruction, and a control unit coupled to the vector unit to provide the data elements to the vector unit, where the control unit is to enable an atomic vector operation to be performed on at least some of the data elements responsive to a first vector instruction to be executed under a first mask and a second vector instruction to be executed under a second mask. Other embodiments are described and claimed.Type: ApplicationFiled: March 28, 2008Publication date: October 1, 2009Inventors: Mikhail Smelyanskiy, Sanjeev Kumar, Daehyun Kim, Jatin Chhugani, Changkyu Kim, Christopher J. Hughes, Victor W. Lee, Anthony D. Nguyen, Yen-Kuang Chen
-
Publication number: 20090172349Abstract: A computer processor includes a decoder for decoding machine instructions and an execution unit for executing those instructions. The decoder and the execution unit are capable of decoding and executing vector instructions that include one or more format conversion indicators. For instance, the processor may be capable of executing a vector-load-convert-and-write (VLoadConWr) instruction that provides for loading data from memory to a vector register. The VLoadConWr instruction may include a format conversion indicator to indicate that the data from memory should be converted from a first format to a second format before the data is loaded into the vector register. Other embodiments are described and claimed.Type: ApplicationFiled: December 26, 2007Publication date: July 2, 2009Inventors: Eric Sprangle, Robert D. Cavin, Anwar Rohillah, Douglas M. Carmean
-
Publication number: 20090172348Abstract: A computer processor includes control logic for executing LoadUnpack and PackStore instructions. In one embodiment, the processor includes a vector register and a mask register. In response to a PackStore instruction with an argument specifying a memory location, a circuit in the processor copies unmasked vector elements from the vector register to consecutive memory locations, starting at the specified memory location, without copying masked vector elements. In response to a LoadUnpack instruction, the circuit copies data items from consecutive memory locations, starting at an identified memory location, into unmasked vector elements of the vector register, without copying data to masked vector elements. Other embodiments are described and claimed.Type: ApplicationFiled: December 26, 2007Publication date: July 2, 2009Inventor: Robert Cavin
-
Publication number: 20090172350Abstract: A processor using a vertically configured non-volatile memory array that can retain values through a power failure is disclosed. The processor may include a register block configured to store and retrieve one or more values, the register block being a vertically configured non-volatile memory array, an arithmetic block configured to perform an arithmetic operation on the one or more values, and a control block configured to control the register block, the arithmetic block, and a memory block. The vertically configured non-volatile memory array may include a plurality of two-terminal memory elements. The two-terminal memory elements may be resistivity-sensitive and store data in the absence of power. The two-terminal memory elements store data as plurality of conductivity profiles that can be non-destructively read by applying a read voltage across the terminals of the memory element and data can be written by applying a write voltage across the terminals.Type: ApplicationFiled: December 28, 2007Publication date: July 2, 2009Applicant: UNITY SEMICONDUCTOR CORPORATIONInventor: Robert Norman
-
Patent number: 7548248Abstract: Methods and apparatuses for blending two images using vector table look up operations. In one aspect of the invention, a method to blend two images includes: loading a vector of keys into a vector register; converting the vector of keys into a first vector of blending factors for the first image and a second vector of blending factors for the second image using a plurality of look up tables; and computing an image attribute for the blended image using the blending factors.Type: GrantFiled: June 7, 2007Date of Patent: June 16, 2009Assignee: Apple Inc.Inventors: Steven Todd Weybrew, David Ligon, Ronald Gerard Langhi
-
Publication number: 20090150648Abstract: Embodiments of the invention generally relate to the field of image processing, and more specifically to instructions and hardware for supporting image processing. An integrated processing unit configured to process vector instructions and vector permute instructions is provided. A vector permute instruction may be issued to the integrated processing unit to set controls of one or more multiplexers so that the multiplexers rearrange the results of a subsequent vector instruction.Type: ApplicationFiled: December 6, 2007Publication date: June 11, 2009Inventor: Eric Oliver Mejdrich
-
Publication number: 20090150649Abstract: An apparatus for storing X-bit digitized data, the register file comprising: a plurality of registers each register configured for storing X bits, wherein each register is partitioned into Y sub-registers such that each sub-register stores at least X/Y bits, and wherein at least one extra X/Y-bit sub-register is incorporated in each register to provide redundancy in the number of sub-registers for a total of at least Y+1 sub-registers per register, so that if a first sub-register in a first register includes faulty bits, data destined for storage in the first sub-register is stored in a second sub-register, in the first register, that does not include faulty bits.Type: ApplicationFiled: December 10, 2007Publication date: June 11, 2009Inventors: Jaume Abella, Javier Carretero Casado, Pedro Chaparro Monferrer, Xavier Vera
-
Publication number: 20090113168Abstract: A method for managing multiple values assigned to a variable during various stages of a software pipelined process executed in a computing environment. The method comprises allocating two or more slots in a vector register to two or more values associated with said variable during two or more stages of a pipeline process; and rotating values in each slot responsive to an instruction.Type: ApplicationFiled: October 24, 2007Publication date: April 30, 2009Inventor: Ayal Zaks
-
Publication number: 20090106526Abstract: Embodiments of the invention are generally related to image processing, and more specifically to register files for supporting image processing. An integrated register file is also provided for storing vector and scalar data. Therefore, the transfer of data to memory to exchange data between independent vector and scalar units is obviated.Type: ApplicationFiled: October 22, 2007Publication date: April 23, 2009Inventors: David Arnold Luick, Eric Oliver Mejdrieh
-
Publication number: 20090100247Abstract: A processor in a data processing system executes a permutation instruction which identifies a first source register, at least one other source register, and a destination register. The first source register stores at least one in-range index value for the at least one other source register and at least one out-of-range index value for the at least one other source register. The at least one other source register stores a plurality of vector element values, wherein each in-range index value indicates which vector element value of the at least one other source register is to be stored into a corresponding vector element of the destination register. Each out-of-range index value is used to indicate which one of at least two predetermined constant values is to be stored into a corresponding vector element of the destination register. Partial table lookups using a permutation instruction shortens the time required to retrieve data.Type: ApplicationFiled: October 12, 2007Publication date: April 16, 2009Inventors: William C. Moyer, Imran Ahmed, Dan E. Tamir
-
Patent number: 7519797Abstract: An event occurring in a graphics pipeline is detected and counted at the location of its occurrence using an event detector and a local counter. The event count maintained by the local counter is reported asynchronously to a global counter. The global counter is configured to be of higher precision than the local counter and is positioned at a place that is convenient for reporting the events, e.g., at the end of the graphics pipeline.Type: GrantFiled: November 2, 2006Date of Patent: April 14, 2009Assignee: NIVIDIA CorporationInventors: Gregory J. Stiehl, David L. Anderson, Cass W. Everitt, Mark J. French, Steven E. Molnar
-
Patent number: 7516299Abstract: A method for transferring data from a general purpose register (GPR) to a vector register (VR), the method including vectorially combining data in the VR from the GPR, by executing instructions of a PowerPC Instruction Set Architecture (ISA), the step of combining including splatting a low nibble from the GPR into a low nibble in each element of a first VR by executing two “load vector for shift left” (lvsl) or “load vector for shift right” (lvsr) and one “vector subtract unsigned byte modulo” (vsububm), shifting a high nibble of the GPR into a low nibble the GPR, splatting the low nibble of the GPR into a low nibble in each element of a second VR by re-executing the two lvsl or lvsr and one vsububm instructions, shifting the low nibble of the second VR into a high nibble of the second VR and combining both first and second VRs into one VR.Type: GrantFiled: August 29, 2005Date of Patent: April 7, 2009Assignee: International Business Machines CorporationInventors: Daniel Citron, Ayal Zaks
-
Publication number: 20090077345Abstract: A data processing system includes a plurality of general purpose registers, and processor circuitry for executing one or more instructions, including a vector dot product instruction for simultaneously performing at least two dot products. The vector dot product instruction identifies a first and second source register, each for storing a plurality of vector elements, where a first dot product is to be performed between a first subset of vector elements of the first source register and a first subset of vector elements of the second source register, and a second dot product is to be performed between a second subset of vector elements of the first source register and a second subset of vector elements of the second source register. The first and second subsets of the second source register are different and at least two vector elements of the first and second subsets of the second source register overlap.Type: ApplicationFiled: September 13, 2007Publication date: March 19, 2009Inventor: William C. Moyer
-
Patent number: 7496731Abstract: A method for processing matrix data. The processor includes M independent vector register files which are adapted to collectively store a matrix of L data elements. Each data element has B binary bits. The matrix has N rows and M columns, and L=N*M. Each column has K subcolumns. N?2, M?2, K?2, and B?1. Each row and each subcolumn is addressable. The processor does not duplicatively store the L data elements. The matrix includes a set of arrays such that each array is a row or subcolumn of the matrix. The processor may execute an instruction that performs an operation on a first array of the set of arrays, such that the operation is performed with selectivity with respect to the data elements of the first array.Type: GrantFiled: September 6, 2007Date of Patent: February 24, 2009Assignee: International Business Machines CorporationInventors: Peter A. Sandon, R. Michael P. West