Distributing Of Vector Data To Vector Registers Patents (Class 712/4)

Masking to control an access to data in vector register (Class 712/5)

Optimized scalar promotion with load and splat SIMD instructions

Patent number: 8255884

Abstract: Mechanisms for optimizing scalar code executed on a single instruction multiple data (SIMD) engine are provided. Placement of vector operation-splat operations may be determined based on an identification of scalar and SIMD operations in an original code representation. The original code representation may be modified to insert the vector operation-splat operations based on the determined placement of vector operation-splat operations to generate a first modified code representation. Placement of separate splat operations may be determined based on identification of scalar and SIMD operations in the first modified code representation. The first modified code representation may be modified to insert or delete separate splat operations based on the determined placement of the separate splat operations to generate a second modified code representation. SIMD code may be output based on the second modified code representation for execution by the SIMD engine.

Type: Grant

Filed: June 6, 2008

Date of Patent: August 28, 2012

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Michael K. Gschwind, John A. Gunnels
SCALAR INTEGER INSTRUCTIONS CAPABLE OF EXECUTION WITH THREE REGISTERS

Publication number: 20120185670

Abstract: A processing core implemented on a semiconductor chip is described. The processing core includes logic circuitry to identify whether vector instructions and integer scalar instructions are to be executed with two registers or three registers, where, in the case of two registers input operand information is destroyed in one of two registers, and, in the case of three registers input operand is not destroyed. The processing core also includes steering circuitry coupled to the logic circuitry. The steering circuitry is to control first data paths between scalar integer execution units and a scalar integer register bank such that two registers are accessed from the scalar register bank if two register execution is identified for the scalar integer instructions or three registers are accessed from the scalar integer register bank if three register execution is identified for the scalar integer instructions.

Type: Application

Filed: January 14, 2011

Publication date: July 19, 2012

Inventors: Bret L. Toll, Robert Valentine, Maxim Locktyukhin, Elmoustapha Ould-Ahmed-Vall
Systems and methods for a combined matrix-vector and matrix transpose vector multiply for a block-sparse matrix

Patent number: 8175853

Abstract: Systems and methods for combined matrix-vector and matrix-transpose vector multiply for block sparse matrices. Exemplary embodiments include a method of updating a simulation of physical objects in an interactive computer environment, including generating a set of representations of objects in the interactive computer environment, partitioning the set of representations into a plurality of subsets such that objects in any given set interact only with other objects in that set, generating a vector b describing an expected position of each object at the end of a time interval h, applying a biconjugate gradient algorithm to solve A*Δv=b for the vector Δv of position and velocity changes to be applied to each object wherein the q=Ap and qt=A^T(pt) calculations are combined so that A only has to be read once, integrating the updated motion vectors to determine a next state of the simulated objects, and converting the simulated objects to a visual.

Type: Grant

Filed: March 28, 2008

Date of Patent: May 8, 2012

Assignee: International Business Machines Corporation

Inventor: Karen A. Magerlein
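
The combined multiply named in the abstract above can be illustrated with a minimal sketch. It assumes a small dense matrix stored as a list of rows, not the block-sparse layout the patent targets, and the function name `combined_multiply` is hypothetical. The only point shown is that each entry of A is read once and feeds both q = A p and qt = A^T pt.

```python
def combined_multiply(A, p, pt):
    """Return (q, qt) with q = A p and qt = A^T pt.

    Assumption: A is a dense list of rows. Each A[i][j] is read
    exactly once and contributes to both products.
    """
    n_rows, n_cols = len(A), len(A[0])
    q = [0.0] * n_rows
    qt = [0.0] * n_cols
    for i in range(n_rows):
        row = A[i]
        for j in range(n_cols):
            a = row[j]            # single read of A[i][j]
            q[i] += a * p[j]      # contribution to A p
            qt[j] += a * pt[i]    # contribution to A^T pt
    return q, qt
```

Calling `combined_multiply([[1, 2], [3, 4]], [1, 1], [1, 0])` walks the matrix once and returns both products, where two separate multiplies would walk it twice.
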
TRACKING WRITTEN ADDRESSES OF A SHARED MEMORY OF A MULTI-CORE PROCESSOR

Publication number: 20120084498

Abstract: Described embodiments provide a method of controlling processing flow in a network processor having one or more processing modules. A given one of the processing modules loads a script into a compute engine. The script includes instructions for the compute engine. The given one of the processing modules loads a register file into the compute engine. The register file includes operands for the instructions of the loaded script. A tracking vector of the compute engine is initialized to a default value, and the compute engine executes the instructions of the loaded script based on the operands of the loaded register file. The compute engine updates corresponding portions of the register file with updated data corresponding to the executed script. The tracking vector tracks the updated portions of the register file. The compute engine provides the tracking vector and the updated register file to the given one of the processing modules.

Type: Application

Filed: December 9, 2011

Publication date: April 5, 2012

Inventors: David Sonnier, Chris Randall Stone, Charles Edwards Peet, Jr.
Vector Loads from Scattered Memory Locations

Publication number: 20120060016

Abstract: Mechanisms for performing a scattered load operation are provided. With these mechanisms, a gather instruction is received in a logic unit of a processor, the gather instruction specifying a plurality of addresses in a memory from which data is to be loaded into a target vector register of the processor. A plurality of separate load instructions for loading the data from the plurality of addresses in the memory are automatically generated within the logic unit. The plurality of separate load instructions are sent, from the logic unit, to one or more load/store units of the processor. The data corresponding to the plurality of addresses is gathered in a buffer of the processor. The logic unit then writes data stored in the buffer to the target vector register.

Type: Application

Filed: September 7, 2010

Publication date: March 8, 2012

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Alexandre E. Eichenberger, Michael K. Gschwind, Valentina Salapura
Vector Loads with Multiple Vector Elements from a Same Cache Line in a Scattered Load Operation

Publication number: 20120060015

Abstract: Mechanisms for performing a scattered load operation are provided. With these mechanisms, an extended address is received in a cache memory of a processor. The extended address has a plurality of data element address portions that specify a plurality of data elements to be accessed using the single extended address. Each of the plurality of data element address portions is provided to corresponding data element selector logic units of the cache memory. Each data element selector logic unit in the cache memory selects a corresponding data element from a cache line buffer based on a corresponding data element address portion provided to the data element selector logic unit. Each data element selector logic unit outputs the corresponding data element for use by the processor.

Type: Application

Filed: September 7, 2010

Publication date: March 8, 2012

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Alexandre E. Eichenberger, Michael K. Gschwind, Valentina Salapura
Vector processing with high execution throughput

Patent number: 8108652

Abstract: The claimed invention is an efficient and high-performance vector processor. Through minimizing the use of multiple banks of memory and/or multi-ported memory blocks to reduce implementation cost, vector memory 450 provides abundant memory bandwidth and enables sustained low-delay memory operations for a large number of SIMD (Single Instruction, Multiple Data) or vector operators simultaneously.

Type: Grant

Filed: September 10, 2008

Date of Patent: January 31, 2012

Inventor: Ronald Chi-Chun Hui
Methods and apparatus facilitating access to storage among multiple computers

Patent number: 8073881

Abstract: Multiple computers in a cluster maintain respective sets of identifiers of neighbor computers in the cluster for each of multiple named resources. A combination of the respective sets of identifiers defines a respective tree formed by the respective sets of identifiers for a respective named resource in the set of named resources. Upon origination and detection of a request at a given computer in the cluster, the given computer forwards the request over a network to successive computers in the hierarchical tree leading to the computers relevant in handling the request, based on use of identifiers of neighbor computers. Thus, a combination of identifiers of neighbor computers identifies potential paths to related computers in the tree.

Type: Grant

Filed: August 3, 2009

Date of Patent: December 6, 2011

Assignee: Sanbolic, Inc.

Inventor: Ivan I. Georgiev
RUNNING SUBTRACT AND RUNNING DIVIDE INSTRUCTIONS FOR PROCESSING VECTORS

Publication number: 20110276782

Abstract: The described embodiments provide a processor for generating a result vector with subtracted or mathematically divided values from a first input vector. During operation, the processor receives the first input vector, a second input vector, and a control vector, and optionally receives a predicate vector. The processor then records a value from an element at a key element position in the second input vector into a base value. Next, the processor generates a result vector.

Type: Application

Filed: July 22, 2011

Publication date: November 10, 2011

Applicant: APPLE INC.

Inventors: Jeffry E. Gonion, Keith E. Diefendorff
System and method for using a mask register to track progress of gathering elements from memory

Patent number: 7984273

Abstract: A system and method for assigning values to elements in a first register, where each data field in a first register corresponds to a data element to be written into a second register, and where for each data field in the first register, a first value may indicate that the corresponding data element has not been written into the second register and a second value indicates that the corresponding data element has been written into the second register, reading the values of each of the data fields in the first register, and for each data field in the first register having the first value, gathering the corresponding data element and writing the corresponding data element into the second register, and changing the value of the data field in the first register from the first value to the second value. Other embodiments are described and claimed.

Type: Grant

Filed: December 31, 2007

Date of Patent: July 19, 2011

Assignees: Intel Corporation

Inventors: Eric Sprangle, Anwar Rohillah, Robert Cavin, Tom Forsyth, Michael Abrash
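
A minimal sketch of the mask-tracked gather described in the abstract above. The encoding (1 = element still pending, 0 = element written), the function name `masked_gather`, and the `fault_at` parameter that simulates an interruption are all assumptions made for illustration. The abstract only says that a first and a second value distinguish the two states.

```python
def masked_gather(memory, indices, mask, dest, fault_at=None):
    """Gather memory[indices[i]] into dest[i] for every pending element.

    Assumption: mask[i] == 1 means element i has not been written yet.
    fault_at is a hypothetical index that interrupts the gather, so the
    sketch can show that the mask records progress.
    Returns True when all pending elements were gathered.
    """
    for i, pending in enumerate(mask):
        if not pending:
            continue                  # already gathered earlier
        if fault_at is not None and indices[i] == fault_at:
            return False              # interrupted; mask keeps progress
        dest[i] = memory[indices[i]]  # gather one element
        mask[i] = 0                   # mark it as written
    return True
```

After an interrupted call, calling the function again with the same `mask` and `dest` gathers only the elements that are still pending.
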
Floating Point Collect and Operate

Publication number: 20110161624

Abstract: Mechanisms are provided for performing a floating point collect and operate for a summation across a vector for a dot product operation. A routing network placed before the single instruction multiple data (SIMD) unit allows the SIMD unit to perform a summation across a vector with a single stage of adders. The routing network routes the vector elements to the adders in a first cycle. The SIMD unit stores the results of the adders into a results vector register. The routing network routes the summation results from the results vector register to the adders in a second cycle. The SIMD unit then stores the results from the second cycle in the results vector register.

Type: Application

Filed: December 29, 2009

Publication date: June 30, 2011

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Brian K. Flachs, Seiji Maeda, Steven Osman
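
The two-cycle summation in the abstract above can be sketched as follows. The four-element vector width and the names `adder_stage`, `sum_across`, and `dot` are assumptions. The sketch models the routing network as the choice of which pairs are fed to one reusable stage of adders in each cycle.

```python
def adder_stage(pairs):
    """One stage of adders: each adder sums the pair routed to it."""
    return [a + b for a, b in pairs]


def sum_across(v):
    """Sum a 4-element vector by reusing one adder stage over two cycles."""
    assert len(v) == 4, "sketch assumes a 4-element vector"
    # Cycle 1: route the vector elements to the adders, store partial sums.
    results = adder_stage([(v[0], v[1]), (v[2], v[3])])
    # Cycle 2: route the stored partial sums back through the same adders.
    results = adder_stage([(results[0], results[1])])
    return results[0]


def dot(x, y):
    """Dot product: element-wise multiply, then sum across the vector."""
    products = [a * b for a, b in zip(x, y)]
    return sum_across(products)
```
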
Methods and apparatus for attaching application specific functions within an array processor

Patent number: 7971036

Abstract: A multi-node video signal processor (VSPN) is described that tightly couples multiple multi-cycle state machines (hardware assist units) to each processor and each memory in each node of an N node scalable array processor. VSPN memory hardware assist instructions are used to initiate multi-cycle state machine functions, to pass parameters to the multi-cycle state machines, to fetch operands from a node's memory, and to control the transfer of results from the multi-cycle state machines.

Type: Grant

Filed: April 18, 2007

Date of Patent: June 28, 2011

Assignee: Altera Corp.

Inventors: Gerald George Pechanek, Mihailo M. Stojancic
Two dimensional addressing of a matrix-vector register array

Patent number: 7949853

Abstract: A processor for processing matrix data. The processor includes M independent vector register files which are adapted to collectively store a matrix of L data elements. Each data element has B binary bits. The matrix has N rows and M columns, and L=N*M. Each column has K subcolumns. N≥2, M≥2, K≥2, and B≥1. Each row and each subcolumn is addressable. The processor does not duplicatively store the L data elements. The matrix includes a set of arrays such that each array is a row or subcolumn of the matrix. The processor may execute an instruction that performs an operation on a first array of the set of arrays, such that the operation is performed with selectivity with respect to the data elements of the first array.

Type: Grant

Filed: December 5, 2007

Date of Patent: May 24, 2011

Assignee: International Business Machines Corporation

Inventors: Peter A. Sandon, R. Michael P. West
SYSTEM CYCLE LOADING AND STORING OF MISALIGNED VECTOR ELEMENTS IN A SIMD PROCESSOR

Publication number: 20110087859

Abstract: The present invention provides efficient transfer of misaligned vector elements between a vector register file and data memory in a single clock cycle. One vector register of N elements can be loaded from memory with any memory element address alignment during a single clock cycle of the processor. Also, a partial segment of vector register elements can be loaded into a vector register in a single clock cycle with any element alignment from data memory. The present invention comprises properly partitioned multiple multi-port data memory modules in conjunction with a crossbar and address generation circuit. A preferred embodiment of the present invention uses a dual-issue processor containing both a RISC-type scalar processor and a vector/SIMD processor, whereby one scalar and one SIMD instruction are executed every clock cycle, and the RISC processor handles program flow control and also loading and storing of vector registers.

Type: Application

Filed: February 3, 2003

Publication date: April 14, 2011

Inventor: Tibet Mimar
Method for efficient and parallel color space conversion in a programmable processor

Publication number: 20110072236

Abstract: The present invention relates to an efficient implementation of color space conversion in a SIMD processor as part of converting output of video decompression to interface to a display unit.

Type: Application

Filed: September 20, 2009

Publication date: March 24, 2011

Inventor: Tibet Mimar
Method and apparatus for obtaining a scalar value directly from a vector register

Patent number: 7908460

Abstract: A method and apparatus for obtaining a scalar value from a vector register for use in a mixed vector and scalar instruction, including providing a vector in a vector register file, and embedding a location identifier of the scalar value within the vector in the bits defining the mixed vector and scalar instruction. The scalar value can be used directly from the vector register without the need to load the scalar to a scalar register prior to executing the instruction. The scalar location identifier may be embedded in the secondary op code of the instruction, or the instruction may have dedicated bits for providing the location of the scalar within the vector.

Type: Grant

Filed: May 3, 2010

Date of Patent: March 15, 2011

Assignee: Nintendo Co., Ltd.

Inventors: Yu-Chung C. Liao, Peter A. Sandon, Howard Cheng, Timothy J. Van Hook
Logic controller having hard-coded control logic and programmable override control store entries

Patent number: 7895379

Abstract: Control logic of a node controller receives an input vector and produces an output vector. The control logic includes a plurality of tied control store entries including hard-coded logic to identify unique values of the input vector and to produce the output vector from a hard-coded output vector when the input vector is identified and when the tied control store is enabled. The control logic also includes a plurality of spare control store entries including programmable logic configurable to identify values of the input vector and to produce the output vector from a programmable output vector when the input vector is identified and when the spare control store is enabled. One of the spare control store entries that is configured to identify a value of the input vector that none of the tied control store entries that are enabled by the entry-enables register are configured to identify is enabled.

Type: Grant

Filed: December 23, 2008

Date of Patent: February 22, 2011

Assignee: Unisys Corporation

Inventors: Ross M. Weber, David R. Spatafore
DEDUPLICATED DATA PROCESSING RATE CONTROL

Publication number: 20110040951

Abstract: Various embodiments for deduplicated data processing rate control using at least one processor device in a computing environment are provided. A plurality of workers is configured for parallel processing of deduplicated data entities in a plurality of chunks. The deduplicated data processing rate is regulated using a rate control mechanism. The rate control mechanism incorporates a debt/credit algorithm specifying which of the plurality of workers processing the deduplicated data entities must wait for each of a plurality of calculated required sleep times. The rate control mechanism is adapted to limit a data flow rate based on a penalty acquired during a last processing of one of the plurality of chunks in a retroactive manner, and further adapted to operate on at least one vector representation of at least one limit specification to accommodate a variety of available dimensions corresponding to the at least one limit specification.

Type: Application

Filed: August 11, 2009

Publication date: February 17, 2011

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Shay H. AKIRAV, Ron ASHER, Yariv BACHAR, Lior KLIPPER, Oded SONIN
Integrated Vector-Scalar Processor

Publication number: 20100332792

Abstract: Systems and methods for improved vector data processing based on separately processing elements of a vector in multiple simultaneously executing vector element processing units are disclosed. One embodiment of the present invention is a vector processing system including a plurality of vector element processing units and a routing infrastructure. The routing infrastructure is configured to route each element of a received vector to a respective one of the vector element processing units. The received vector may be from a memory which is coupled to the vector element processing units by the routing infrastructure. Each vector element processing unit is configured to simultaneously process two or more elements, wherein each of the two or more elements is from a separate vector. Embodiments of the present invention also provide for forwarding of data and results of computation between vector element processing units.

Type: Application

Filed: June 30, 2009

Publication date: December 30, 2010

Applicant: Advanced Micro Devices, Inc.

Inventor: Daniel B. CLIFTON
Flexible vector modes of operation for SIMD processor

Publication number: 20100274988

Abstract: In addition to the usual modes of SIMD processor operation, where corresponding elements of two source vector registers are used as input pairs to be operated upon by the execution unit, or where one element of a source vector register is broadcast for use across the elements of another source vector register, the new system provides several other modes of operation for the elements of one or two source vector registers. Improving upon the time-costly moving of elements for an operation such as DCT, the present invention defines a more general set of modes of vector operations. In one embodiment, these new modes of operation use a third vector register to define how each element of one or both source vector registers are mapped, in order to pair these mapped elements as inputs to a vector execution unit.

Type: Application

Filed: February 3, 2003

Publication date: October 28, 2010

Inventor: Tibet Mimar
System and method for performing efficient conditional vector operations for data parallel architectures involving both input and conditional vector values

Patent number: 7818539

Abstract: A processor implements conditional vector operations in which, for example, an input vector containing multiple operands to be used in conditional operations is divided into two or more output vectors based on a condition vector. Each output vector can then be processed at full processor efficiency without cycles wasted due to branch latency. Data to be processed are divided into two groups based on whether or not they satisfy a given condition by e.g., steering each to one of the two index vectors. Once the data have been segregated in this way, subsequent processing can be performed without conditional operations, processor cycles wasted due to branch latency, incorrect speculation or execution of unnecessary instructions due to predication. Other examples of conditional operations include combining one or more input vectors into a single output vector based on a condition vector, conditional vector switching, conditional vector combining, and conditional vector load balancing.

Type: Grant

Filed: August 28, 2006

Date of Patent: October 19, 2010

Assignees: The Board of Trustees of the Leland Stanford Junior University, The Massachusetts Institute of Technology

Inventors: Scott Rixner, John D. Owens, Ujval J. Kapasi, William J. Dally
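
A minimal sketch of the split step described in the abstract above: an input vector is divided into two output vectors according to a condition vector, so that later processing of each output needs no per-element branch. The function name `split_by_condition` is hypothetical, and Python lists stand in for vector streams.

```python
def split_by_condition(values, condition):
    """Divide values into two vectors according to a condition vector.

    Elements whose condition is true go to the first output, the rest
    to the second. Relative order is preserved in both outputs.
    """
    true_out, false_out = [], []
    for v, c in zip(values, condition):
        (true_out if c else false_out).append(v)
    return true_out, false_out
```

Each returned vector can then be processed with a single uniform operation, for example one kernel applied to all positive values and a different one to all non-positive values.
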
Arrangement, system and method for vector permutation in single-instruction multiple-data microprocessors

Patent number: 7809931

Abstract: A vector permutation system (100) for a single-instruction multiple-data microprocessor has a set of vector registers (110) which feed vectors to permutation logic (120) and then to a negate block (130) where they are permuted and selectively negated according to control parameters received from a selected one of a set of control registers (140). A control arrangement (145, 150) selects which control register is to provide the control parameters. In this way no separate permutation instructions are necessary or need to be executed, and no permutation parameters need to be stored in the vector registers (110). This leads to higher performance, a smaller vector register file and hence a smaller size of the microprocessor and better program code density.

Type: Grant

Filed: October 6, 2003

Date of Patent: October 5, 2010

Assignee: Freescale Semiconductor, Inc.

Inventor: Martin Raubuch
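
The permute-then-negate data path in the abstract above can be sketched as a single function. The control format used here, one `(source_index, negate)` pair per output lane, is an assumption for illustration. The abstract does not specify how the control parameters are encoded.

```python
def permute_negate(vector, control):
    """Permute and selectively negate a vector.

    Assumption: control holds one (source_index, negate) pair per
    output lane, standing in for the contents of a control register.
    """
    return [-vector[src] if neg else vector[src] for src, neg in control]


# Hypothetical control setting: swap the two lanes and negate the first
# output, which maps (re, im) to (-im, re), i.e. multiplication by i.
MULTIPLY_BY_I = [(1, True), (0, False)]
```

With this setting, `permute_negate([3, 4], MULTIPLY_BY_I)` yields `[-4, 3]` without any separate permutation step on the data.
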
Method and apparatus for indirectly addressed vector load-add-store across multi-processors

Patent number: 7793073

Abstract: A method and apparatus to correctly compute a vector-gather, vector-operate (e.g., vector add), and vector-scatter sequence, particularly when elements of the vector may be redundantly presented, as with indirectly addressed vector operations. For an add operation, one vector register is loaded with the “add-in” values, and another vector register is loaded with address values of “add to” elements to be gathered from memory into a third vector register. If the vector of address values has a plurality of elements that point to the same memory address, the algorithm should add all the “add in” values from elements corresponding to the elements having the duplicated addresses. An indirectly addressed load performs the “gather” operation to load the “add to” values. A vector add operation then adds corresponding elements from the “add in” vector to the “add to” vector. An indirectly addressed store then performs the “scatter” operation to store the results.

Type: Grant

Filed: June 29, 2007

Date of Patent: September 7, 2010

Assignee: Cray Inc.

Inventor: James R. Kohn
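
A short sketch of the problem the abstract above addresses. A plain gather, vector add, scatter sequence loses contributions when the address vector contains duplicates, because every duplicate lane gathers the same old value and the last store wins. The function names are hypothetical, and `reference_add` only states the expected result. It does not reproduce the patent's mechanism, which the abstract does not detail.

```python
def naive_gather_add_scatter(memory, addresses, add_in):
    """Gather, add, scatter with no handling of duplicate addresses."""
    gathered = [memory[a] for a in addresses]            # gather "add to"
    summed = [g + x for g, x in zip(gathered, add_in)]   # vector add
    for a, s in zip(addresses, summed):                  # scatter
        memory[a] = s                                    # last store wins


def reference_add(memory, addresses, add_in):
    """Expected semantics: every "add in" value reaches its address."""
    for a, x in zip(addresses, add_in):
        memory[a] += x
```

With addresses `[1, 1, 2]` and add-in values `[10, 20, 5]` applied to zeroed memory, the naive sequence leaves 20 at address 1, while the expected result is 30.
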
Vector execution unit to process a vector instruction by executing a first operation on a first set of operands and a second operation on a second set of operands

Patent number: 7793072

Abstract: A microprocessor including an execution unit enabled to execute an asymmetric instruction, where the asymmetric instruction includes a set of operand fields and an operation code (opcode). The execution unit is configured to interpret the opcode to perform a first operation on a first set of data indicated by the set of operand fields and to perform a second operation on a second set of data indicated by the set of operand fields, wherein the set of operand fields indicate different sets of data with respect to the first and second operations and further wherein the first and second operations are mathematically different.

Type: Grant

Filed: October 31, 2003

Date of Patent: September 7, 2010

Assignee: International Business Machines Corporation

Inventor: Kenneth Dockser
METHOD FOR PERFORMING PLURALITY OF BIT OPERATIONS AND A DEVICE HAVING PLURALITY OF BIT OPERATIONS CAPABILITIES

Publication number: 20100223444

Abstract: A method and a device having a plurality of bit operations capability. The device includes first and second registers, an instruction fetch circuit, and an arithmetic logic unit adapted to: calculate, during a first clock cycle, a position value representative of a position, within a first information vector, of a first bit of information that has a first value; and to multiply the position value by a multiplication factor to provide a first result and to alter the value of the first bit to a second value to provide an updated information vector, during the first clock cycle.

Type: Application

Filed: August 18, 2006

Publication date: September 2, 2010

Applicant: Freescale Semiconductor, Inc.

Inventors: Eran Glickman, Evgeni Ginzburg, Noam Sheffer
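
The combined operation in the abstract above (locate a bit, scale its position, and flip it) can be sketched in software. The sketch assumes that the "first bit" is the least significant set bit, that the first value is 1 and the second value is 0, and that the information vector is held in a Python integer. The name `find_scale_and_clear` is hypothetical.

```python
def find_scale_and_clear(info_vector, factor):
    """Return (position * factor, updated_vector).

    Assumptions: the bit of interest is the lowest set bit, and
    "altering" it means clearing it. Returns (None, info_vector)
    when no bit is set.
    """
    if info_vector == 0:
        return None, info_vector
    lowest = info_vector & -info_vector        # isolate lowest set bit
    position = lowest.bit_length() - 1         # its bit index
    result = position * factor                 # scaled position
    updated = info_vector & (info_vector - 1)  # clear that bit
    return result, updated
```

A typical use would be walking a bitmap of pending entries, where the scaled position serves as a byte offset into a table of fixed-size records.
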
Data processor and methods thereof

Patent number: 7788471

Abstract: A system and method for performing vector arithmetic is disclosed. The method includes loading two operand vectors, each composed of a number of vector elements, into two storage locations. A selected arithmetic operation is performed on the operand vectors to produce a result vector having the number of vector elements. Each vector element of the result vector is associated with an arithmetic logic cell that has a first input that can receive any vector element from the first vector and a second input that can receive any vector element from the second vector. Accordingly each vector element of the result vector is a function of any two individual vector elements of the operand vectors. By applying the operand vector elements to the appropriate arithmetic logic cells, and by selecting the appropriate arithmetic operation, complex vector operations can be performed efficiently.

Type: Grant

Filed: September 18, 2006

Date of Patent: August 31, 2010

Assignee: Freescale Semiconductor, Inc.

Inventor: Chengke Sheng
Fast sparse list walker

Patent number: 7743231

Abstract: Provided are a method, information processing system, and computer readable medium for identifying active bits in a vector. The method comprises receiving a pointer associated with a vector of bits. The pointer is associated with a current bit within the vector of bits. The vector of bits is grouped into groups of a mathematical power of two, which is any non-negative integer power of two. One or more current groups are determined which are the groups of the mathematical power of two comprising the current bit. The one or more current groups of the power of two are analyzed. A largest group of the power of two is identified in the one or more current groups comprising all empty bits. The pointer is set to point to a bit following a last bit in the identified largest group of the power of two comprising all empty bits.

Type: Grant

Filed: February 27, 2007

Date of Patent: June 22, 2010

Assignee: International Business Machines Corporation

Inventors: Scot H. Rider, Todd A. Strader
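
The skipping rule in the abstract above can be sketched as a software loop. The sketch assumes groups are aligned to their own size, and it uses `any()` over a slice where hardware would presumably consult a precomputed summary of each group. The name `next_active` is hypothetical.

```python
def next_active(bits, pointer):
    """Return the index of the next active bit at or after pointer.

    Assumption: groups are aligned power-of-two ranges that contain
    the current bit. Returns None if no active bit remains.
    """
    n = len(bits)
    while pointer < n:
        if bits[pointer]:
            return pointer
        size = 1
        best_end = pointer + 1
        # Grow the aligned group while it stays entirely empty. Aligned
        # groups are nested, so the first non-empty one ends the search.
        while size <= n:
            start = pointer - (pointer % size)
            end = start + size
            if end > n or any(bits[start:end]):
                break
            best_end = end
            size *= 2
        pointer = best_end  # jump past the largest all-empty group
    return None
```

On a 16-bit vector with active bits at positions 5 and 12, a walk that starts at 0 jumps to 4 (skipping the empty group 0..3), then steps to 5, instead of testing every bit in turn.
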
Method and apparatus for obtaining a scalar value directly from a vector register

Patent number: 7739480

Abstract: A method and apparatus for obtaining a scalar value from a vector register for use in a mixed vector and scalar instruction, including providing a vector in a vector register file, and embedding a location identifier of the scalar value within the vector in the bits defining the mixed vector and scalar instruction. The scalar value can be used directly from the vector register without the need to load the scalar to a scalar register prior to executing the instruction. The scalar location identifier may be embedded in the secondary op code of the instruction, or the instruction may have dedicated bits for providing the location of the scalar within the vector.

Type: Grant

Filed: January 11, 2005

Date of Patent: June 15, 2010

Assignee: Nintendo Co., Ltd.

Inventors: Yu-Chung C. Liao, Peter A. Sandon, Howard Cheng, Timothy J. Van Hook
Method and apparatus for producing an index vector for use in performing a vector permute operation

Patent number: 7725678

Abstract: A method for generating a permutation index vector includes receiving a condition vector and performing an index generation function using the condition vector in order to generate the permutation index vector. An index vector generation circuit is also disclosed.

Type: Grant

Filed: February 17, 2005

Date of Patent: May 25, 2010

Assignee: Texas Instruments Incorporated

Inventor: Steven Krueger
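
The abstract above does not say which index generation function is used. As one plausible illustration, the sketch below builds a compaction index: positions whose condition is true come first, followed by the remaining positions. Both function names are hypothetical.

```python
def index_vector(condition):
    """Build a permutation index vector from a condition vector.

    Assumption: the index generation function is a compaction, with
    true positions first and false positions after, order preserved.
    """
    selected = [i for i, c in enumerate(condition) if c]
    rest = [i for i, c in enumerate(condition) if not c]
    return selected + rest


def permute(vector, index):
    """Vector permute: output lane k takes vector[index[k]]."""
    return [vector[i] for i in index]
```

Because the index vector is an ordinary vector, it can be computed once from the condition and reused to permute several data vectors the same way.
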
LARGE INTEGER SUPPORT IN VECTOR OPERATIONS

Publication number: 20100115232

Abstract: A vector processor or vector processing computer has a first vector register operable to store two or more vector elements that together comprise a single first large integer and a second vector register operable to store two or more vector elements that together comprise a single second large integer. An adder having a carry-in bit is operable to add the large integer in the first vector register to the large integer in the second vector register by using the carry-in bit to add sequential elements of the vector registers.

Type: Application

Filed: October 31, 2008

Publication date: May 6, 2010

Inventors: Timothy J. Johnson, Eric P. Lundberg, Michael Parker, Gregory J. Faanes
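
A minimal sketch of the carry-chained addition described in the abstract above. It assumes 64-bit vector elements stored least significant element first. Both choices, and the name `add_large`, are assumptions, since the abstract does not fix an element width or ordering.

```python
LIMB_BITS = 64  # assumed width of one vector element


def add_large(a_limbs, b_limbs):
    """Add two large integers held as equal-length vectors of elements.

    Assumption: element 0 is the least significant. The carry out of
    each element addition is the carry-in of the next.
    Returns (result_limbs, final_carry).
    """
    mask = (1 << LIMB_BITS) - 1
    carry = 0
    out = []
    for a, b in zip(a_limbs, b_limbs):
        total = a + b + carry
        out.append(total & mask)     # low bits stay in this element
        carry = total >> LIMB_BITS   # carry-in for the next element
    return out, carry
```
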
TRANSFERRING DATA FROM INTEGER TO VECTOR REGISTERS

Publication number: 20100106939

Abstract: A method for transferring data from a general purpose register to a vector register, the method including splatting a byte of data directly from a general purpose register (GPR) to a vector register (VR) by means of vector permute instructions, and splatting another byte of data from the GPR to the VR and vectorially combining the data in the VR.

Type: Application

Filed: October 27, 2008

Publication date: April 29, 2010

Inventors: Daniel Citron, Ayal Zaks
Dynamic Data Driven Alignment and Data Formatting in a Floating-Point SIMD Architecture

Publication number: 20100095087

Abstract: Mechanisms are provided for dynamic data driven alignment and data formatting in a floating point SIMD architecture. At least two operand inputs are input to a permute unit of a processor. Each operand input contains at least one floating point value upon which a permute operation is to be performed by the permute unit. A control vector input, having a plurality of floating point values that together constitute the control vector input, is input to the permute unit of the processor for controlling the permute operation of the permute unit. The permute unit performs a permute operation on the at least two operand inputs according to a permutation pattern specified by the plurality of floating point values that constitute the control vector input. Moreover, a result output of the permute operation is output from the permute unit to a result vector register of the processor.

Type: Application

Filed: October 14, 2008

Publication date: April 15, 2010

Applicant: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Bruce M. Fleischer, Michael K. Gschwind
METHOD AND SYSTEM FOR PARALLEL HISTOGRAM CALCULATION IN A SIMD AND VLIW PROCESSOR

Publication number: 20090276606

Abstract: The present invention provides histogram calculation for images and video applications using a SIMD and VLIW processor with vector Look-Up Table (LUT) operations. This provides a speed up of histogram calculation by a factor of N times over a scalar processor where the SIMD processor could perform N LUT operations per instruction. Histogram operation is partitioned into a vector LUT operation, followed by vector increment, vector LUT update, and at the end by reduction of vector histogram components. The present invention could be used for intensity, RGBA, YUV, and other types of multi-component images.
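
A minimal software sketch of the partitioning named in this abstract (LUT read, increment, LUT update, final reduction) might look like the following. Keeping one private table per lane is an assumption of this sketch, chosen so that two lanes never update the same entry; the abstract does not state how conflicts are handled, and the name `simd_histogram` is hypothetical.

```python
def simd_histogram(pixels, bins, lanes):
    """Histogram `pixels` by processing `lanes` values per simulated vector step."""
    # One private histogram table per lane (assumption of this sketch).
    lane_tables = [[0] * bins for _ in range(lanes)]
    for start in range(0, len(pixels), lanes):
        chunk = pixels[start:start + lanes]
        for lane, value in enumerate(chunk):
            count = lane_tables[lane][value]   # vector LUT read
            count += 1                         # vector increment
            lane_tables[lane][value] = count   # vector LUT update
    # Reduction of the per-lane histogram components into one histogram.
    return [sum(table[b] for table in lane_tables) for b in range(bins)]
```

The inner loop over lanes stands in for what a SIMD processor would do in a single instruction per step.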

Type: Application

Filed: March 12, 2009

Publication date: November 5, 2009

Inventor: TIBET MIMAR
Vector transfer system for packing dis-contiguous vector elements together into a single bus transfer

Patent number: 7610469

Abstract: A vector transfer unit for handling transfers of vector data between a memory and a data processor in a computer system. Vector data transfer instructions are posted to an instruction queue in the vector transfer unit. Program instructions for performing a burst transfer include determining the starting address of the vector data to be transferred, the ending address of the vector data to be transferred, and whether the ending address of the vector data to be transferred is within the same virtual memory page as the starting address. The ending address of the vector data to be transferred is determined based on the number of data elements to be transferred, the stride of the vector data to be transferred, and the width of the vector data elements to be transferred. When the amount of data to be transferred is divisible by a factor of two, the multiplication of the stride and width of the data elements is carried out by shifting.
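
The address calculation outlined in this abstract can be illustrated roughly as below. The abstract does not give the exact formula, so several points are assumptions of this sketch: the stride is counted in elements, the ending address is the last byte of the last element, the page size is 4096 bytes, and the shift is applied when the element width is a power of two. All function names are hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def transfer_end_address(start, count, stride, width):
    """Return the address of the last byte touched by a strided transfer."""
    if count < 1 or width < 1:
        raise ValueError("count and width must be positive")
    if (width & (width - 1)) == 0:
        # Width is a power of two: multiply stride by width with a shift.
        step = stride << (width.bit_length() - 1)
    else:
        step = stride * width
    return start + (count - 1) * step + width - 1


def same_page(start, end, page_size=PAGE_SIZE):
    """True if both addresses fall within the same virtual memory page."""
    return start // page_size == end // page_size
```

A burst that starts at `0x1FF0` with four 4-byte elements at a stride of 2 ends at `0x200B`, so under these assumptions it crosses a page boundary.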

Type: Grant

Filed: March 29, 2004

Date of Patent: October 27, 2009

Assignee: NEC Electronics America, Inc.

Inventor: Ahmad R. Ansari
STREAMING DIGITAL DATA FILTER

Publication number: 20090259822

Abstract: A method of filtering streaming digital data in real time. The method including: (a) initializing and storing a set of m data elements and an associated set of m pointer data from 1 to m in sequence, m an integer greater than 2; (b) receiving in real time a first or next data element of a digital data stream of sequential data elements; (c) simultaneously with (b), replacing a stored data element associated with the pointer datum m with the received data element, changing the pointer datum of m to 1, and incrementing the value of all other pointer data by 1; (d) simultaneously with (b) sorting in order from a low to high all stored data elements; (e) simultaneously with (b), maintaining the association of pointer datum and data elements; (f) simultaneously with (b), filtering all stored data elements; and (g) repeating (b) through (f) multiple times.
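
Steps (a) through (g) can be modeled sequentially as in the sketch below. The abstract describes these steps as happening simultaneously in real time; this model runs them one after another for clarity. Using the median as the filter output is an assumption, since the abstract only says the stored elements are filtered, and the names `make_filter`, `step`, and `median` are hypothetical.

```python
def make_filter(m, initial=0):
    """Step (a): store m data elements with pointer data 1..m in sequence."""
    if m <= 2:
        raise ValueError("m must be an integer greater than 2")
    return [(initial, pointer) for pointer in range(1, m + 1)]


def step(state, sample):
    """Steps (b) to (e): take one sample, replace the oldest element, re-sort."""
    m = len(state)
    updated = []
    for value, pointer in state:
        if pointer == m:
            updated.append((sample, 1))           # replace element with pointer m
        else:
            updated.append((value, pointer + 1))  # age every other element
    # Sort by value from low to high; each pointer stays paired with its value.
    updated.sort(key=lambda pair: pair[0])
    return updated


def median(state):
    """Step (f), assuming a median filter: return the middle stored value."""
    return state[len(state) // 2][0]
```

Feeding the samples 5, 1, 9 into a three-element filter leaves the values 1, 5, 9 stored, so the median is 5; a further sample of 2 displaces the 5 (the oldest element) and the median becomes 2.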

Type: Application

Filed: April 10, 2008

Publication date: October 15, 2009

Inventors: Timothy M. Platt, Richard Jean-Luc St-Pierre
CIRCUIT AND DESIGN STRUCTURE FOR A STREAMING DIGITAL DATA FILTER

Publication number: 20090259823

Abstract: A circuit and design structure for a streaming digital data filter embodied in a machine readable medium, the design structure including: a data processing unit and a pointer processing unit, the data processing unit and the pointer unit connected to a control logic; the pointer processing unit consisting of n serially connected pointer processing stages from a first to a last pointer processing stage, each pointer processing stage except for the first and last processing stages of the pointer processing unit including a pointer register and a multiplexer, wherein n is a positive integer greater than 2; the data processing unit consisting of n serially connected data processing stages from a first data processing stage to a last data processing stage, each data processing stage including a multiplexer, a data register and a comparator; and one or more filter output stages connected to the data processing unit.

Type: Application

Filed: April 10, 2008

Publication date: October 15, 2009

Inventors: Timothy M. Platt, Richard Jean-Luc St-Pierre
Vector instructions to enable efficient synchronization and parallel reduction operations

Publication number: 20090249026

Abstract: In one embodiment, a processor may include a vector unit to perform operations on multiple data elements responsive to a single instruction, and a control unit coupled to the vector unit to provide the data elements to the vector unit, where the control unit is to enable an atomic vector operation to be performed on at least some of the data elements responsive to a first vector instruction to be executed under a first mask and a second vector instruction to be executed under a second mask. Other embodiments are described and claimed.

Type: Application

Filed: March 28, 2008

Publication date: October 1, 2009

Inventors: Mikhail Smelyanskiy, Sanjeev Kumar, Daehyun Kim, Jatin Chhugani, Changkyu Kim, Christopher J. Hughes, Victor W. Lee, Anthony D. Nguyen, Yen-Kuang Chen
METHODS, APPARATUS, AND INSTRUCTIONS FOR CONVERTING VECTOR DATA

Publication number: 20090172349

Abstract: A computer processor includes a decoder for decoding machine instructions and an execution unit for executing those instructions. The decoder and the execution unit are capable of decoding and executing vector instructions that include one or more format conversion indicators. For instance, the processor may be capable of executing a vector-load-convert-and-write (VLoadConWr) instruction that provides for loading data from memory to a vector register. The VLoadConWr instruction may include a format conversion indicator to indicate that the data from memory should be converted from a first format to a second format before the data is loaded into the vector register. Other embodiments are described and claimed.

Type: Application

Filed: December 26, 2007

Publication date: July 2, 2009

Inventors: Eric Sprangle, Robert D. Cavin, Anwar Rohillah, Douglas M. Carmean
METHODS, APPARATUS, AND INSTRUCTIONS FOR PROCESSING VECTOR DATA

Publication number: 20090172348

Abstract: A computer processor includes control logic for executing LoadUnpack and PackStore instructions. In one embodiment, the processor includes a vector register and a mask register. In response to a PackStore instruction with an argument specifying a memory location, a circuit in the processor copies unmasked vector elements from the vector register to consecutive memory locations, starting at the specified memory location, without copying masked vector elements. In response to a LoadUnpack instruction, the circuit copies data items from consecutive memory locations, starting at an identified memory location, into unmasked vector elements of the vector register, without copying data to masked vector elements. Other embodiments are described and claimed.
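
The behavior of the two instructions described in this abstract can be modeled as below. This is a sketch, not the processor's actual semantics: memory is a plain list, the mask is a list of booleans in which `True` marks an unmasked (participating) element, and the function names and returned next address are assumptions made for illustration.

```python
def pack_store(vector, unmasked, memory, address):
    """Copy unmasked elements of `vector` to consecutive memory locations."""
    for value, is_unmasked in zip(vector, unmasked):
        if is_unmasked:
            memory[address] = value
            address += 1  # only unmasked elements consume a memory slot
    return address  # next free location


def load_unpack(vector, unmasked, memory, address):
    """Copy consecutive memory items into the unmasked elements of `vector`."""
    for i, is_unmasked in enumerate(unmasked):
        if is_unmasked:
            vector[i] = memory[address]
            address += 1  # masked elements are left untouched
    return address  # next unread location
```

Packing `[10, 20, 30, 40]` under the mask `[True, False, True, True]` writes `10, 30, 40` to three adjacent locations; unpacking reverses the process by scattering adjacent memory items into the unmasked positions.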

Type: Application

Filed: December 26, 2007

Publication date: July 2, 2009

Inventor: Robert Cavin
Non-volatile processor register

Publication number: 20090172350

Abstract: A processor using a vertically configured non-volatile memory array that can retain values through a power failure is disclosed. The processor may include a register block configured to store and retrieve one or more values, the register block being a vertically configured non-volatile memory array, an arithmetic block configured to perform an arithmetic operation on the one or more values, and a control block configured to control the register block, the arithmetic block, and a memory block. The vertically configured non-volatile memory array may include a plurality of two-terminal memory elements. The two-terminal memory elements may be resistivity-sensitive and store data in the absence of power. The two-terminal memory elements store data as a plurality of conductivity profiles that can be non-destructively read by applying a read voltage across the terminals of the memory element and data can be written by applying a write voltage across the terminals.

Type: Application

Filed: December 28, 2007

Publication date: July 2, 2009

Applicant: UNITY SEMICONDUCTOR CORPORATION

Inventor: Robert Norman
Method and apparatus for image blending

Patent number: 7548248

Abstract: Methods and apparatuses for blending two images using vector table look up operations. In one aspect of the invention, a method to blend two images includes: loading a vector of keys into a vector register; converting the vector of keys into a first vector of blending factors for the first image and a second vector of blending factors for the second image using a plurality of look up tables; and computing an image attribute for the blended image using the blending factors.

Type: Grant

Filed: June 7, 2007

Date of Patent: June 16, 2009

Assignee: Apple Inc.

Inventors: Steven Todd Weybrew, David Ligon, Ronald Gerard Langhi
Vector Permute and Vector Register File Write Mask Instruction Variant State Extension for RISC Length Vector Instructions

Publication number: 20090150648

Abstract: Embodiments of the invention generally relate to the field of image processing, and more specifically to instructions and hardware for supporting image processing. An integrated processing unit configured to process vector instructions and vector permute instructions is provided. A vector permute instruction may be issued to the integrated processing unit to set controls of one or more multiplexers so that the multiplexers rearrange the results of a subsequent vector instruction.

Type: Application

Filed: December 6, 2007

Publication date: June 11, 2009

Inventor: Eric Oliver Mejdrich
CAPACITY REGISTER FILE

Publication number: 20090150649

Abstract: An apparatus for storing X-bit digitized data, the register file comprising: a plurality of registers each register configured for storing X bits, wherein each register is partitioned into Y sub-registers such that each sub-register stores at least X/Y bits, and wherein at least one extra X/Y-bit sub-register is incorporated in each register to provide redundancy in the number of sub-registers for a total of at least Y+1 sub-registers per register, so that if a first sub-register in a first register includes faulty bits, data destined for storage in the first sub-register is stored in a second sub-register, in the first register, that does not include faulty bits.

Type: Application

Filed: December 10, 2007

Publication date: June 11, 2009

Inventors: Jaume Abella, Javier Carretero Casado, Pedro Chaparro Monferrer, Xavier Vera
Software Pipelining Using One or More Vector Registers

Publication number: 20090113168

Abstract: A method for managing multiple values assigned to a variable during various stages of a software pipelined process executed in a computing environment. The method comprises allocating two or more slots in a vector register to two or more values associated with said variable during two or more stages of a pipeline process; and rotating values in each slot responsive to an instruction.

Type: Application

Filed: October 24, 2007

Publication date: April 30, 2009

Inventor: Ayal Zaks
Scalar Float Register Overlay on Vector Register File for Efficient Register Allocation and Scalar Float and Vector Register Sharing

Publication number: 20090106526

Abstract: Embodiments of the invention are generally related to image processing, and more specifically to register files for supporting image processing. An integrated register file is also provided for storing vector and scalar data. Therefore, the transfer of data to memory to exchange data between independent vector and scalar units is obviated.

Type: Application

Filed: October 22, 2007

Publication date: April 23, 2009

Inventors: David Arnold Luick, Eric Oliver Mejdrich
SIMD PERMUTATIONS WITH EXTENDED RANGE IN A DATA PROCESSOR

Publication number: 20090100247

Abstract: A processor in a data processing system executes a permutation instruction which identifies a first source register, at least one other source register, and a destination register. The first source register stores at least one in-range index value for the at least one other source register and at least one out-of-range index value for the at least one other source register. The at least one other source register stores a plurality of vector element values, wherein each in-range index value indicates which vector element value of the at least one other source register is to be stored into a corresponding vector element of the destination register. Each out-of-range index value is used to indicate which one of at least two predetermined constant values is to be stored into a corresponding vector element of the destination register. Partial table lookups using a permutation instruction shortens the time required to retrieve data.
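
The in-range and out-of-range index handling can be sketched as follows. The abstract does not say which constants are used or how an out-of-range index chooses between them; the constants `0` and `0xFF` and the selection by the index's low-order bits are assumptions of this sketch, as is the name `permute_extended`.

```python
def permute_extended(indices, source, constants=(0, 0xFF)):
    """Permute `source` by `indices`, mapping out-of-range indices to constants."""
    n = len(source)
    result = []
    for index in indices:
        if 0 <= index < n:
            # In-range index: select the corresponding source element.
            result.append(source[index])
        else:
            # Out-of-range index: pick a predetermined constant (assumed rule).
            result.append(constants[index % len(constants)])
    return result
```

With a four-element source, indices 8 and 9 fall outside the table, so under the assumed rule they produce `0` and `0xFF` respectively. This is what lets a single permute act as a partial table lookup that also fills in fixed values.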

Type: Application

Filed: October 12, 2007

Publication date: April 16, 2009

Inventors: William C. Moyer, Imran Ahmed, Dan E. Tamir
Hierarchical multi-precision pipeline counters

Patent number: 7519797

Abstract: An event occurring in a graphics pipeline is detected and counted at the location of its occurrence using an event detector and a local counter. The event count maintained by the local counter is reported asynchronously to a global counter. The global counter is configured to be of higher precision than the local counter and is positioned at a place that is convenient for reporting the events, e.g., at the end of the graphics pipeline.

Type: Grant

Filed: November 2, 2006

Date of Patent: April 14, 2009

Assignee: NVIDIA Corporation

Inventors: Gregory J. Stiehl, David L. Anderson, Cass W. Everitt, Mark J. French, Steven E. Molnar
Splat copying GPR data to vector register elements by executing lvsr or lvsl and vector subtract instructions

Patent number: 7516299

Abstract: A method for transferring data from a general purpose register (GPR) to a vector register (VR), the method including vectorially combining data in the VR from the GPR, by executing instructions of a PowerPC Instruction Set Architecture (ISA), the step of combining including splatting a low nibble from the GPR into a low nibble in each element of a first VR by executing two “load vector for shift left” (lvsl) or “load vector for shift right” (lvsr) and one “vector subtract unsigned byte modulo” (vsububm), shifting a high nibble of the GPR into a low nibble of the GPR, splatting the low nibble of the GPR into a low nibble in each element of a second VR by re-executing the two lvsl or lvsr and one vsububm instructions, shifting the low nibble of the second VR into a high nibble of the second VR and combining both first and second VRs into one VR.
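
The nibble-by-nibble splat in this abstract can be traced with a small Python model of the two instructions. The model assumes the usual `lvsl` behavior (byte i of the result is the low four bits of the address plus i) and uses only the `lvsl` variant; combining the two vectors with a bitwise OR is an assumption, since the abstract only says they are combined. The helper names are hypothetical.

```python
def lvsl(address):
    """Model of lvsl: 16 bytes holding sh, sh+1, ..., sh+15, sh = address & 0xF."""
    sh = address & 0xF
    return [sh + i for i in range(16)]


def vsububm(a, b):
    """Model of vsububm: byte-wise subtraction modulo 256."""
    return [(x - y) & 0xFF for x, y in zip(a, b)]


def splat_low_nibble(gpr):
    """Two lvsl and one vsububm leave the GPR's low nibble in every byte."""
    return vsububm(lvsl(gpr), lvsl(0))


def splat_byte(gpr):
    """Splat the low byte of `gpr` into all 16 byte elements of a vector."""
    first = splat_low_nibble(gpr)                 # low nibble in each element
    second = splat_low_nibble(gpr >> 4)           # high nibble moved down, then splatted
    second = [(x << 4) & 0xFF for x in second]    # shift it back up to the high nibble
    return [hi | lo for hi, lo in zip(second, first)]  # combine (OR is assumed)
```

For a GPR holding `0xA7`, the first vector is sixteen copies of `0x07`, the second becomes sixteen copies of `0xA0`, and the combined vector is sixteen copies of `0xA7`.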

Type: Grant

Filed: August 29, 2005

Date of Patent: April 7, 2009

Assignee: International Business Machines Corporation

Inventors: Daniel Citron, Ayal Zaks
SIMD DOT PRODUCT OPERATIONS WITH OVERLAPPED OPERANDS

Publication number: 20090077345

Abstract: A data processing system includes a plurality of general purpose registers, and processor circuitry for executing one or more instructions, including a vector dot product instruction for simultaneously performing at least two dot products. The vector dot product instruction identifies a first and second source register, each for storing a plurality of vector elements, where a first dot product is to be performed between a first subset of vector elements of the first source register and a first subset of vector elements of the second source register, and a second dot product is to be performed between a second subset of vector elements of the first source register and a second subset of vector elements of the second source register. The first and second subsets of the second source register are different and at least two vector elements of the first and second subsets of the second source register overlap.
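
One way to picture the overlapped operand subsets is the sketch below. The subset length of four and the one-element offset between the two subsets of the second register are assumptions chosen for illustration (they give an overlap of three elements, satisfying the "at least two" condition); the abstract does not fix these values, and the function name is hypothetical.

```python
def overlapped_dot_products(first, second, length=4, offset=1):
    """Compute two dot products whose second-register subsets overlap."""
    if len(first) < 2 * length or len(second) < offset + length:
        raise ValueError("source registers are too short for the chosen subsets")

    def dot(xs, ys):
        return sum(x * y for x, y in zip(xs, ys))

    # First dot product: first subsets of both source registers.
    d0 = dot(first[0:length], second[0:length])
    # Second dot product: second subset of `first`, shifted subset of `second`.
    d1 = dot(first[length:2 * length], second[offset:offset + length])
    return d0, d1
```

Sliding the window over the second register by one element in this way resembles computing two adjacent outputs of a filter in a single instruction.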

Type: Application

Filed: September 13, 2007

Publication date: March 19, 2009

Inventor: William C. Moyer
Two dimensional addressing of a matrix-vector register array

Patent number: 7496731

Abstract: A method for processing matrix data. The processor includes M independent vector register files which are adapted to collectively store a matrix of L data elements. Each data element has B binary bits. The matrix has N rows and M columns, and L=N*M. Each column has K subcolumns. N≥2, M≥2, K≥2, and B≥1. Each row and each subcolumn is addressable. The processor does not duplicatively store the L data elements. The matrix includes a set of arrays such that each array is a row or subcolumn of the matrix. The processor may execute an instruction that performs an operation on a first array of the set of arrays, such that the operation is performed with selectivity with respect to the data elements of the first array.

Type: Grant

Filed: September 6, 2007

Date of Patent: February 24, 2009

Assignee: International Business Machines Corporation

Inventors: Peter A. Sandon, R. Michael P. West
