Vector Processor Operation Patents (Class 712/7)

PROCESSING VECTORS USING WRAPPING BOOLEAN INSTRUCTIONS IN THE MACROSCALAR ARCHITECTURE

Publication number: 20130024656

Abstract: Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a Boolean operation on another input vector dependent upon the input vector and the control vector.

Type: Application

Filed: September 27, 2012

Publication date: January 24, 2013

Applicant: APPLE INC.

Inventor: Apple Inc.
VECTOR OPERATIONS FOR COMPRESSING SELECTED VECTOR ELEMENTS

Publication number: 20130024654

Abstract: A processor, method, and medium for using vector operations to compress selected elements of a vector. An input vector is compared to a criteria vector, and then a subset of the plurality of elements of the input vector are selected based on the comparison. A permutation vector is generated based on the locations of the selected elements and then the permutation vector is used to permute the selected elements of the input vector to an output vector. The selected elements of the input vector are stored in contiguous locations in the leftmost elements of the output vector. Then, the output vector is stored to memory and a pointer to the memory location is incremented by the number of selected elements.

Type: Application

Filed: July 20, 2011

Publication date: January 24, 2013

Inventor: Darryl J. Gove
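
The compress flow in the abstract for publication 20130024654 (compare, select, build a permutation vector, permute to the leftmost lanes, store, advance the pointer) can be modelled in scalar Python. This is only a sketch under stated assumptions: the name `compress_store`, the greater-than comparison, the list used as memory, and the zero fill of unused lanes are hypothetical choices, not details from the application.

```python
def compress_store(input_vec, criteria_vec, memory, ptr):
    """Scalar model of one compress-and-store step (hypothetical helper).

    Assumes the comparison is "input element > criteria element".
    Returns the advanced pointer.
    """
    # Compare the input vector to the criteria vector, lane by lane.
    selected = [x > c for x, c in zip(input_vec, criteria_vec)]
    # Permutation vector: source lane of each selected element, in lane order.
    perm = [lane for lane, keep in enumerate(selected) if keep]
    # Permute the selected elements into the leftmost lanes of the output.
    # Unused lanes are zero-filled here, which is an assumption.
    output = [input_vec[lane] for lane in perm]
    output += [0] * (len(input_vec) - len(perm))
    # Store the full output vector, then advance the pointer by the number
    # of selected elements so the next store overwrites the padding.
    memory[ptr:ptr + len(output)] = output
    return ptr + len(perm)


memory = [0] * 16
ptr = 0
ptr = compress_store([5, 1, 9, 3], [4, 4, 4, 4], memory, ptr)  # keeps 5, 9
ptr = compress_store([2, 8, 7, 0], [4, 4, 4, 4], memory, ptr)  # keeps 8, 7
print(ptr, memory[:4])  # prints: 4 [5, 9, 8, 7]
```

Advancing the pointer by the selected count, not by the vector length, is what lets consecutive stores pack their results contiguously.
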
Pipelined multiple operand minimum and maximum function

Patent number: 8356160

Abstract: Embodiments of the invention provide methods and apparatus for executing a multiple operand minimum or maximum instruction. Executing the multiple operand minimum or maximum instruction comprises transferring more than two operands to one or more processing lanes of a vector unit. A first compare operation may be performed in at least one processing lane of the vector unit to determine a greater or smaller of a first operand and a second operand. The greater (or smaller) operand may be transferred to a dot product unit, wherein, in a second compare operation, the transferred operand is compared to at least a third operand to determine one of the greater and smaller of the more than two operands.

Type: Grant

Filed: January 15, 2008

Date of Patent: January 15, 2013

Assignee: International Business Machines Corporation

Inventors: Adam J. Muff, Matthew R. Tubbs
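
The two-stage comparison described in the abstract for patent 8356160 can be sketched as follows. The function name `multi_operand_extreme` and its `find_max` flag are hypothetical. The sketch mirrors only the order of the compares and says nothing about the pipeline timing or the hardware units involved.

```python
def multi_operand_extreme(operands, find_max=True):
    """Return the maximum (or minimum) of more than two operands.

    Hypothetical model: the first compare stands in for the compare done
    in a processing lane, the later compares for the stage that the
    abstract places in the dot product unit.
    """
    if len(operands) < 3:
        raise ValueError("expected more than two operands")
    pick = max if find_max else min
    # First compare: operand 0 against operand 1.
    winner = pick(operands[0], operands[1])
    # Second compare stage: the winner against the remaining operand(s).
    for operand in operands[2:]:
        winner = pick(winner, operand)
    return winner
```
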
Vector processor with plural arithmetic units for processing a vector data string divided into plural register banks accessed by read pointers starting at different positions

Patent number: 8316215

Abstract: It is an object to speed up a vector store instruction on a memory that is divided into banks as setting a plurality of elements as a unit while minimizing an increase in physical quantity. A vector processing apparatus has a plurality of register banks and processes a data string including a plurality of data elements retained in the plurality of register banks, wherein: the plurality of register banks each have a read pointer 113 that points to a read position for reading the data elements; and the start position of the read pointer 113 is changed from one register bank to another. For example, consecutive numbers assigned to the register banks may be used as the read start positions of the respective register banks.

Type: Grant

Filed: March 7, 2008

Date of Patent: November 20, 2012

Assignee: NEC Corporation

Inventor: Noritaka Hoshi
Implementing vector memory operations

Patent number: 8316216

Abstract: In one embodiment, the present invention includes an apparatus having a register file to store vector data, an address generator coupled to the register file to generate addresses for a vector memory operation, and a controller to generate an output slice from one or more slices each including multiple addresses, where the output slice includes addresses each corresponding to a separately addressable portion of a memory. Other embodiments are described and claimed.

Type: Grant

Filed: October 21, 2009

Date of Patent: November 20, 2012

Assignee: Intel Corporation

Inventors: Roger Espasa, Joel Emer, Geoff Lowney, Roger Gramunt, Santiago Galan, Toni Juan, Jesus Corbal, Federico Ardanaz, Isaac Hernandez
Method and system for interprocedural prefetching

Patent number: 8312442

Abstract: A computing system has an amount of shared cache, and performs runtime automatic parallelization wherein when a parallelized loop is encountered, a main thread shares the workload with at least one other non-main thread. A method for providing interprocedural prefetching includes compiling source code to produce compiled code having a main thread including a parallelized loop. Prior to the parallelized loop in the main thread, the main thread includes prefetching instructions for the at least one other non-main thread that shares the workload of the parallelized loop. As a result, the main thread prefetches data into the shared cache for use by the at least one other non-main thread.

Type: Grant

Filed: December 10, 2008

Date of Patent: November 13, 2012

Assignee: Oracle America, Inc.

Inventors: Yonghong Song, Spiros Kalogeropulos, Partha P. Tirumalai
Vector Slot Processor Execution Unit for High Speed Streaming Inputs

Publication number: 20120284487

Abstract: A vector slot processor that is capable of supporting multiple signal processing operations for multiple demodulation standards is provided. The vector slot processor includes a plurality of micro execution slot (MES) that performs the multiple signal processing operations on the high speed streaming inputs. Each of the MES includes one or more n-way signal registers that receive the high speed streaming inputs, one or more n-way coefficient registers that store filter coefficients for the multiple signal processing, and one or more n-way Multiply and Accumulate (MAC) units that receive the high speed streaming inputs from the one or more n-way signal registers and filter coefficients from one or more n-way coefficient registers. The one or more n-way MAC units perform a vertical MAC operation and a horizontal multiply and add operation on the high speed streaming inputs.

Type: Application

Filed: May 2, 2012

Publication date: November 8, 2012

Applicant: Saankhya Labs Private Limited

Inventors: Anindya SAHA, Gururaj PADAKI, Santosh BILLAVA, Rakesh A. JOSHI
Device and method for finding extreme values in a data block

Patent number: 8296548

Abstract: A method for locating an extreme value data chunk within a data block, the method includes: fetching, by a processor, an instruction; fetching, in response to a content of the instruction, a data unit that comprises multiple data chunks; selectively masking the fetched data chunks in response to a value of a mask; comparing, by a hardware accelerator, between values of valid data chunks to provide an extreme value data chunk; wherein valid data chunks include un-masked data chunks that belong to the data block; updating the value of the mask and jumping to the stage of fetching a new data unit, until the whole data block is fetched.

Type: Grant

Filed: January 18, 2006

Date of Patent: October 23, 2012

Assignee: Freescale Semiconductor, Inc.

Inventors: Moti Dvir, Evgeni Ginzburg, Adi Katz
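
The fetch, mask, compare, and update loop in the abstract for patent 8296548 can be illustrated with a small software model. The sketch assumes that data units are aligned groups of `unit_size` chunks, that the mask hides chunks outside the block boundaries, and that the extreme value sought is the maximum. The name `find_max_in_block` is hypothetical, and the hardware accelerator is replaced by Python's `max`.

```python
def find_max_in_block(memory, start, length, unit_size=4):
    """Return the largest chunk in memory[start:start + length].

    Data is fetched in aligned units of unit_size chunks; a mask hides
    chunks that fall outside the block.
    """
    if length <= 0 or start < 0 or start + length > len(memory):
        raise ValueError("block must be non-empty and inside memory")
    end = start + length
    addr = (start // unit_size) * unit_size  # aligned address of first unit
    best = None
    while addr < end:
        unit = memory[addr:addr + unit_size]  # fetch one data unit
        # Mask for this unit: True only for chunks that belong to the block.
        mask = [start <= addr + i < end for i in range(len(unit))]
        valid = [chunk for chunk, keep in zip(unit, mask) if keep]
        candidate = max(valid)  # compare the valid (un-masked) chunks
        best = candidate if best is None else max(best, candidate)
        addr += unit_size  # move to the next unit; the mask is recomputed
    return best
```

For example, with `memory = [99, 3, 7, 1, 8, 2, 99, 99]`, `start = 1` and `length = 5`, the values 99 lie outside the block and are masked, so the function returns 8.
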
Area efficient selector circuit

Patent number: 8264391

Abstract: A signal converting system has a multi-segment digital to analog converter coupled to an error shaping loop. A control value is received at a vector processor that indicates a number N of elements that are to be selected from a vector having M elements. The elements of the vector are sorted into a bitonic sequence and separated into a larger value group and a smaller value group using a bitonic split. Only the larger value group is sorted into an ordered sequence with repeated bitonic splits when the control value is less than M/2, and N largest elements are selected from the ordered sequence. Only the smaller value group is sorted into an ordered sequence with repeated bitonic splits when the control value is greater than M/2, and N-M/2 largest elements are selected from the ordered sequence.

Type: Grant

Filed: October 12, 2010

Date of Patent: September 11, 2012

Assignee: Texas Instruments Incorporated

Inventor: Yanto Suryono
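
The selection scheme in the abstract for patent 8264391 relies on a property of the bitonic split: after one split, every element of the larger group is at least as large as every element of the smaller group, so only one group needs further sorting. The sketch below illustrates that idea. It assumes M is even, uses Python's `sorted` as a stand-in for the repeated bitonic splits, and the names `bitonic_split` and `select_n_largest` are hypothetical.

```python
def bitonic_split(seq):
    """Split a bitonic sequence into (larger group, smaller group)."""
    half = len(seq) // 2
    larger = [max(seq[i], seq[i + half]) for i in range(half)]
    smaller = [min(seq[i], seq[i + half]) for i in range(half)]
    return larger, smaller


def select_n_largest(vec, n):
    """Select the n largest of the M elements of vec (M assumed even)."""
    m = len(vec)
    if m % 2 or not 0 <= n <= m:
        raise ValueError("need an even-length vector and 0 <= n <= M")
    half = m // 2
    # Form a bitonic sequence: ascending first half, descending second half.
    bitonic = sorted(vec[:half]) + sorted(vec[half:], reverse=True)
    larger, smaller = bitonic_split(bitonic)
    if n <= half:
        # Only the larger group is ordered; take its n largest elements.
        return sorted(larger, reverse=True)[:n]
    # All of the larger group is selected; only the smaller group is
    # ordered, and its n - M/2 largest elements complete the selection.
    return larger + sorted(smaller, reverse=True)[:n - half]
```

For `vec = [7, 2, 9, 4, 1, 8, 3, 6]` the split yields the larger group `[8, 6, 7, 9]` and the smaller group `[2, 4, 3, 1]`, so `select_n_largest(vec, 3)` returns `[9, 8, 7]`.
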
CONFIGURABLE VECTOR LENGTH COMPUTER PROCESSOR

Publication number: 20120221830

Abstract: A processor core, comprises one or more vector units operable to change between a fine-grained vector mode having a shorter maximum vector length and a coarse-grained vector mode having a longer maximum vector length. Changing vector modes comprises halting all instruction stream execution in the core, flushing one or more registers in a register space, reconfiguring one or more vector registers in the register space, and restarting instruction execution in the core.

Type: Application

Filed: February 29, 2012

Publication date: August 30, 2012

Applicant: CRAY INC.

Inventors: Gregory J. Faanes, Eric P. Lundberg, Abdulla Bataineh, Timothy J. Johnson, Michael Parker, James Robert Kohn, Steven L. Scott, Robert Alverson
APPARATUS AND METHOD OF SINGLE-INSTRUCTION, MULTIPLE-DATA VECTOR OPERATION MASKING

Publication number: 20120216011

Abstract: An apparatus, method, and medium for performing a vector operation on portions of one or more source vector registers. A vector unit performs an operation on the source vector registers and only stores results in the target vector register for elements which are selected by the vector operation mask. The vector operation mask can be read by the vector unit or loaded into the vector unit for each instruction cycle. The vector operation mask allows the vector unit to be used with partially filled source vector registers and eliminates the need for scalar operations to be performed on vector data.

Type: Application

Filed: February 18, 2011

Publication date: August 23, 2012

Inventors: Darryl Gove, David Weaver
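
The masked write-back described in the abstract for publication 20120216011 can be modelled per lane. This is a minimal sketch: addition is an arbitrary choice of operation, and the name `masked_vector_add` and the list-based registers are hypothetical.

```python
def masked_vector_add(src_a, src_b, target, mask):
    """Add src_a and src_b lane by lane, writing only the selected lanes.

    Lanes whose mask bit is False keep the previous target value.
    """
    return [a + b if keep else old
            for a, b, old, keep in zip(src_a, src_b, target, mask)]


# Partially filled source registers: only the first three lanes hold data.
target = masked_vector_add([1, 2, 3, 0], [10, 20, 30, 0],
                           [-1, -1, -1, -1], [True, True, True, False])
print(target)  # prints: [11, 22, 33, -1]
```

Because the last lane is masked off, no separate scalar pass is needed to handle a vector that is only partly full.
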
SHARING A FAULT-STATUS REGISTER WHEN PROCESSING VECTOR INSTRUCTIONS

Publication number: 20120192005

Abstract: The described embodiments provide a processor that executes vector instructions. In the described embodiments, the processor initializes an architectural fault-status register (FSR) and a shadow copy of the architectural FSR by setting each of N bit positions in the architectural FSR and the shadow copy of the architectural FSR to a first predetermined value. The processor then executes a first first-faulting or non-faulting (FF/NF) vector instruction. While executing the first vector instruction, the processor also executes one or more subsequent FF/NF instructions. In these embodiments, when executing the first vector instruction and the subsequent vector instructions, the processor updates one or more bit positions in the shadow copy of the architectural FSR to a second predetermined value upon encountering a fault condition.

Type: Application

Filed: April 20, 2011

Publication date: July 26, 2012

Applicant: APPLE INC.

Inventor: Jeffry E. Gonion
VECTOR CONFLICT INSTRUCTIONS

Publication number: 20120166761

Abstract: A processing core implemented on a semiconductor chip is described having first execution unit logic circuitry that includes first comparison circuitry to compare each element in a first input vector against every element of a second input vector. The processing core also has second execution logic circuitry that includes second comparison circuitry to compare a first input value against every data element of an input vector.

Type: Application

Filed: December 22, 2010

Publication date: June 28, 2012

Inventors: Christopher J. Hughes, Mark J. Charney, Yen-Kuang Chen, Jesus Corbal, Andrew T. Forsyth, Milind B. Girkar, Jonathan C. Hall, Hideki Ido, Robert Valentine, Jeffrey Wiedemeier
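
The two comparisons named in the abstract for publication 20120166761 (every element of one vector against every element of another, and one value against every element of a vector) can be sketched as below. The equality test and the bitmask result format are assumptions made for illustration, as are the names `compare_all_pairs` and `compare_scalar`.

```python
def compare_scalar(value, vec):
    """Return a bitmask with bit j set when value equals vec[j]."""
    bits = 0
    for j, element in enumerate(vec):
        if value == element:
            bits |= 1 << j
    return bits


def compare_all_pairs(vec_a, vec_b):
    """For each element of vec_a, return its match bitmask against vec_b."""
    return [compare_scalar(a, vec_b) for a in vec_a]


print(compare_all_pairs([3, 5, 3], [3, 3, 7]))  # prints: [3, 0, 3]
print(compare_scalar(7, [3, 3, 7]))             # prints: 4
```
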
Method and apparatus for executing program code

Patent number: 8209525

Abstract: The described embodiments provide a system that executes program code. While executing program code, the processor encounters at least one vector instruction and at least one vector-control instruction. The vector instruction includes a set of elements, wherein each element is used to perform an operation for a corresponding iteration of a loop in the program code. The vector-control instruction identifies elements in the vector instruction that may be operated on in parallel without causing an error due to a runtime data dependency between the iterations of the loop. The processor then executes the loop by repeatedly executing the vector-control instruction to identify a next group of elements that can be operated on in the vector instruction and selectively executing the vector instruction to perform the operation for the next group of elements in the vector instruction, until the operation has been performed for all elements of the vector instruction.

Type: Grant

Filed: April 7, 2009

Date of Patent: June 26, 2012

Assignee: Apple Inc.

Inventors: Jeffry E. Gonion, Keith E. Diefendorff, Jr.
SYSTEM, DEVICE, AND METHOD FOR ON-THE-FLY PERMUTATIONS OF VECTOR MEMORIES FOR EXECUTING INTRA-VECTOR OPERATIONS

Publication number: 20120131308

Abstract: A device, system, and method for processing program instructions, for example, to execute intra vector operations. A fetch unit may receive a program instruction defining different operations on data elements stored at the same vector memory address. A processor may include different types of execution units each executing a different one of a predetermined plurality of elemental instructions. Each program instruction may be a combination of one or more of the elemental instructions. The processor may receive a vector of data elements stored non-consecutively at the same vector memory address to be processed by a same one of the elemental instructions and a vector of configuration values independently associated with executing the same elemental instruction on the non-consecutive data elements. At least two configuration values may be different to implement different operations by executing the same elemental instruction using the different configuration values on the vector of non-consecutive data elements.

Type: Application

Filed: November 18, 2010

Publication date: May 24, 2012

Inventors: Yaakov Dekter, Michael Boukaya, Shai Shpigelblat, Moshe Steinberg
VECTOR PROCESSING CIRCUIT, COMMAND ISSUANCE CONTROL METHOD, AND PROCESSOR SYSTEM

Publication number: 20120124332

Abstract: A vector processing circuit includes a vector register file including a plurality of array elements, a command issuance control circuit, and a plurality of pipeline arithmetic units. Each pipeline arithmetic unit performs arithmetic processing of data stored in the array elements indicated as a source by one command in parts through a plurality of cycles and stores the result in the array elements indicated as a destination by the one command through a plurality of cycles. When data word length of a preceding command is longer than that of a subsequent command, the command issuance control circuit changes data sizes of the array elements in accordance with data word length of the command and determines whether there is register interference between the array element to be processed at a non-head cycle of the preceding command, and the array element to be processed at a head cycle of the subsequent command.

Type: Application

Filed: October 24, 2011

Publication date: May 17, 2012

Applicant: FUJITSU LIMITED

Inventors: GE Yi, Yoshimasa Takebe, Hiromasa Takahashi
Scalar precision float implementation on the “W” lane of vector unit

Patent number: 8169439

Abstract: Embodiments of the invention are generally related to image processing, and more specifically to vector units for supporting image processing. A combined vector/scalar unit is provided wherein one or more processing lanes of the vector unit are used for performing scalar operations. An integrated register file is also provided for storing vector and scalar data. Therefore, the transfer of data to memory to exchange data between independent vector and scalar units is obviated and a significant amount of chip area is saved.

Type: Grant

Filed: October 23, 2007

Date of Patent: May 1, 2012

Assignee: International Business Machines Corporation

Inventors: David Arnold Luick, Eric Oliver Mejdrich, Adam James Muff
STALL PROPAGATION IN A PROCESSING SYSTEM WITH INTERSPERSED PROCESSORS AND COMMUNICATION ELEMENTS

Publication number: 20120102299

Abstract: A processing system includes processors and dynamically configurable communication elements (DCCs) coupled together in an interspersed arrangement. A source device may transfer a data item through an intermediate subset of the DCCs to a destination device. The source and destination devices may each correspond to different processors, DCCs, or input/output devices, or mixed combinations of these. In response to detecting a stall after the source device begins transfer of the data item to the destination device and prior to receipt of all of the data item at the destination device, a stalling device is operable to propagate stalling information through one or more of the intermediate subset towards the source device. In response to receiving the stalling information, at least one of the intermediate subset is operable to buffer all or part of the data item.

Type: Application

Filed: December 30, 2011

Publication date: April 26, 2012

Inventors: Michael B. Doerr, William H. Hallidy, David A. Gibson, Craig M. Chase
Area Efficient Selector Circuit

Publication number: 20120086591

Abstract: A signal converting system has a multi-segment digital to analog converter coupled to an error shaping loop. A control value is received at a vector processor that indicates a number N of elements that are to be selected from a vector having M elements. The elements of the vector are sorted into a bitonic sequence and separated into a larger value group and a smaller value group using a bitonic split. Only the larger value group is sorted into an ordered sequence with repeated bitonic splits when the control value is less than M/2, and N largest elements are selected from the ordered sequence. Only the smaller value group is sorted into an ordered sequence with repeated bitonic splits when the control value is greater than M/2, and N-M/2 largest elements are selected from the ordered sequence.

Type: Application

Filed: October 12, 2010

Publication date: April 12, 2012

Inventor: Yanto Suryono
Method and apparatus for data stream alignment support

Patent number: 8156310

Abstract: One embodiment of the present method and apparatus for data stream alignment support includes retrieving a first input from a first register file, retrieving a second input from a second register file, the second register file being dedicated to a stream shift unit and performing the stream shift instruction in accordance with the first input, the second input and a third input.

Type: Grant

Filed: September 11, 2006

Date of Patent: April 10, 2012

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Michael Karl Gschwind, John-David Wellman, Peng Wu
VECTOR LOGICAL REDUCTION OPERATION IMPLEMENTED ON A SEMICONDUCTOR CHIP

Publication number: 20120079233

Abstract: A semiconductor processor is described. The semiconductor processor includes logic circuitry to perform a logical reduction instruction. The logic circuitry has swizzle circuitry to swizzle a vector's elements so as to form a swizzle vector. The logic circuitry also has vector logic circuitry to perform a vector logic operation on said vector and said swizzle vector.

Type: Application

Filed: September 24, 2010

Publication date: March 29, 2012

Inventors: Jeff Wiedemeier, Sridhan Samudrala, Roger Golliver
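
The abstract for publication 20120079233 describes a logical reduction built from two steps: swizzle the vector, then apply a vector logic operation to the vector and its swizzled copy. One way to turn that into a full reduction is to repeat the pair with shrinking rotation distances. The sketch below assumes a power-of-two vector length and uses rotation as the swizzle pattern. Both are illustrative choices, and the names `swizzle_rotate` and `logical_reduce` are hypothetical.

```python
import operator


def swizzle_rotate(vec, shift):
    """Return vec rotated left by shift lanes (the assumed swizzle)."""
    return vec[shift:] + vec[:shift]


def logical_reduce(vec, op):
    """Reduce vec with an associative, commutative bitwise op.

    Uses log2(len(vec)) rounds of swizzle plus lane-wise op; afterwards
    every lane holds the full reduction, and lane 0 is returned.
    """
    n = len(vec)
    if n == 0 or n & (n - 1):
        raise ValueError("vector length must be a power of two")
    shift = n // 2
    while shift >= 1:
        swizzled = swizzle_rotate(vec, shift)
        vec = [op(x, y) for x, y in zip(vec, swizzled)]
        shift //= 2
    return vec[0]


print(logical_reduce([0b0001, 0b0010, 0b0100, 0b1000], operator.or_))  # prints: 15
```

The same function handles AND and XOR reductions by passing `operator.and_` or `operator.xor`.
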
SIMD processor performing fractional multiply operation with saturation history data processing to generate condition code flags

Patent number: 8131981

Abstract: A data processing system, apparatus and method for performing fractional multiply operations is disclosed. The system includes a memory that stores instructions for SIMD operations and a processing core. The processing core includes registers that store operands for the fractional multiply operations. A coprocessor included in the processing core performs the fractional multiply operations on the operands and stores the result in a destination register that is also included in the processing core.

Type: Grant

Filed: August 12, 2009

Date of Patent: March 6, 2012

Assignee: Marvell International Ltd.

Inventors: Nigel C. Paver, Bradley C. Aldrich
Check-hazard instructions for processing vectors

Patent number: 8131979

Abstract: The described embodiments provide a system that determines data dependencies between two vector memory operations or two memory operations that use vectors of memory addresses. During operation, the system receives a first input vector and a second input vector. The first input vector includes a number of elements containing memory addresses for a first memory operation, while the second input vector includes a number of elements containing memory addresses for a second memory operation, wherein the first memory operation occurs before the second memory operation in program order. The system then determines elements in the first and second input vectors where the memory addresses indicate that a dependency exists between the memory operations. The system next generates a result vector, wherein the result vector indicates the elements where dependencies exist between the memory operations.

Type: Grant

Filed: April 7, 2009

Date of Patent: March 6, 2012

Assignee: Apple Inc.

Inventors: Jeffry E. Gonion, Keith E. Diefendorff, Jr.
PARALLEL COMPARISON/SELECTION OPERATION APPARATUS, PROCESSOR, AND PARALLEL COMPARISON/SELECTION OPERATION METHOD

Publication number: 20120023308

Abstract: Provided is a parallel comparison/selection operation apparatus which efficiently executes a search for a maximum value or a search for a minimum value with an index. The parallel comparison/selection operation apparatus includes a vector comparison/selection unit 242 that compares each element included in vector data 1 and vector data 2 for each corresponding element using the vector data 1 and the vector data 2, selects one element of the vector data 1 and the vector data 2 based on the comparison result, and generates vector data 3 including the selected element, and an index vector selection unit 243 that selects one element of an index vector 1 and an index vector 2 based on the comparison result vector using the index vector 1 of the vector data 1, the index vector 2 of the vector data 2, and the comparison result vector to generate and output an index vector 3 including the selected element.

Type: Application

Filed: January 25, 2010

Publication date: January 26, 2012

Applicants: RENESAS ELECTRONICS CORPORATION, NEC CORPORATION

Inventors: Takahiro Kumura, Hideki Matsuyama
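
The data path in the abstract for publication 20120023308 (compare two data vectors, select the winning elements, and select the matching index elements with the same comparison result) can be modelled as follows. This sketch covers the maximum search only. The names `compare_select_max` and `argmax_chunked`, the tie-breaking toward the first vector, and the scalar final step are assumptions.

```python
def compare_select_max(data1, index1, data2, index2):
    """Lane-wise select of the larger element and of its index."""
    # Comparison result vector: True where data1 wins (ties go to data1).
    cmp_result = [a >= b for a, b in zip(data1, data2)]
    data3 = [a if c else b for a, b, c in zip(data1, data2, cmp_result)]
    index3 = [i if c else j for i, j, c in zip(index1, index2, cmp_result)]
    return data3, index3


def argmax_chunked(values, width=4):
    """Find (max value, its index); len(values) must be a multiple of width."""
    if not values or len(values) % width:
        raise ValueError("length must be a positive multiple of width")
    best, best_idx = values[:width], list(range(width))
    for base in range(width, len(values), width):
        chunk = values[base:base + width]
        chunk_idx = list(range(base, base + width))
        best, best_idx = compare_select_max(best, best_idx, chunk, chunk_idx)
    # Final cross-lane step, done in scalar code in this sketch.
    lane = max(range(width), key=lambda k: best[k])
    return best[lane], best_idx[lane]


print(argmax_chunked([3, 9, 2, 7, 8, 1, 11, 7]))  # prints: (11, 6)
```
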
Information handling system including a processor with a bifurcated issue queue

Patent number: 8103852

Abstract: An information handling system includes a processor with a bifurcated unified issue queue that may perform unified issue queue VSU store instruction dependency operations. The bifurcated unified issue queue BUIQ maintains VSU store instructions in the form of internal operations data. The BUIQ includes a unified issue queue UIQ 0 and a unified issue queue UIQ 1. The BUIQ may manage a particular VSU store instruction from one UIQ to determine data dependencies and employ the other UIQ to determine address dependencies of that particular VSU store instruction. The UIQs employ a dependency matrix including a dependency array. The dependency array data maintains both data and address dependency information. The particular VSU store instruction issues to execution units such as VSUs for data dependency information and load store units (LSUs) for address dependency information. A particular VSU store instruction may execute to provide data dependency information independent of address dependency information.

Type: Grant

Filed: December 22, 2008

Date of Patent: January 24, 2012

Assignee: International Business Machines Corporation

Inventors: James Wilson Bishop, Mary Douglass Brown, William Elton Burky, Todd Alan Venton
Multi-channel timing recovery system

Patent number: 8094768

Abstract: The present invention discloses a novel multi-channel timing recovery scheme that utilizes a shared CORDIC to accurately compute the phase for each tone. Then a hardware-based linear combiner module is used to reconstruct the best phase estimate from multiple phase measurements. The firmware monitors the noise variance for the pilot tones and determines the corresponding weight for each tone to ensure that the minimum phase jitter noise is achieved through the linear combiner. Then a hardware-based second-order timing recovery control loop generates the frequency reference signal for VCXO or DCXO. A single sequentially controlled multiplier is used for all multiplications in the control loop.

Type: Grant

Filed: December 21, 2006

Date of Patent: January 10, 2012

Assignee: Triductor Technology (Suzhou) Inc.

Inventor: Yaolong Tan
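
The abstract for patent 8094768 says the firmware derives a weight per pilot tone from its noise variance so that the linear combiner achieves minimum phase jitter, but it does not give the weighting rule. A standard choice for combining independent, unbiased measurements with minimum variance is inverse-variance weighting. The sketch below uses that rule as an assumption. The function names are hypothetical, and phase wrapping is ignored.

```python
def inverse_variance_weights(noise_variances):
    """Return normalised weights proportional to 1 / variance."""
    inverse = [1.0 / v for v in noise_variances]
    total = sum(inverse)
    return [w / total for w in inverse]


def combine_phases(phases, weights):
    """Linear combiner: weighted sum of per-tone phase measurements."""
    return sum(p * w for p, w in zip(phases, weights))


weights = inverse_variance_weights([1.0, 1.0, 2.0])
estimate = combine_phases([0.10, 0.12, 0.20], weights)
```

With variances 1.0, 1.0 and 2.0 the weights come out at about 0.4, 0.4 and 0.2, so the noisiest tone contributes least to the combined phase estimate.
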
VARIABLE WIDTH VECTOR INSTRUCTION PROCESSOR

Publication number: 20110320765

Abstract: A computer processor, method, and computer program product for executing vector processing instructions on a variable width vector register file. An example embodiment is a computer processor that includes an instruction execution unit coupled to a variable width vector register file which contains a number of vector registers, the width of the vector registers is changeable during operation of the computer processor.

Type: Application

Filed: June 28, 2010

Publication date: December 29, 2011

Applicant: International Business Machines Corporation

Inventors: Tejas Karkhanis, Jose E. Moreira, Valentina Salapura
METHOD FOR VECTOR PROCESSING

Publication number: 20110314254

Abstract: The present application relates to a method for processing data in a vector processor. The present application relates also to a vector processor for performing said method and a cellular communication device comprising said vector processor. The method for processing data in a vector processor comprises executing segmented operations on a segment of a vector for generating results, collecting the results of the segmented operations, and delivering the results in a result vector in such a way that subsequent operations remain processing in vector mode.

Type: Application

Filed: May 29, 2009

Publication date: December 22, 2011

Applicant: NXP B.V.

Inventors: Mahima Smriti, Jean-Paul Charles Francois Hubert Smeets, Willem Egbert Hendrik Kloosterhuis
Multithreaded processor with multiple concurrent pipelines per thread

Patent number: 8074051

Abstract: A multithreaded processor comprises a plurality of hardware thread units, an instruction decoder coupled to the thread units for decoding instructions received therefrom, and a plurality of execution units for executing the decoded instructions. The multithreaded processor is configured for controlling an instruction issuance sequence for threads associated with respective ones of the hardware thread units. On a given processor clock cycle, only a designated one of the threads is permitted to issue one or more instructions, but the designated thread that is permitted to issue instructions varies over a plurality of clock cycles in accordance with the instruction issuance sequence. The instructions are pipelined in a manner which permits at least a given one of the threads to support multiple concurrent instruction pipelines.

Type: Grant

Filed: April 1, 2005

Date of Patent: December 6, 2011

Assignee: Aspen Acquisition Corporation

Inventors: Erdem Hokenek, Mayan Moudgill, Michael J. Schulte, C. John Glossner
Macroscalar processor architecture

Patent number: 8065502

Abstract: A macroscalar processor architecture is described herein. In one embodiment, a processor receives instructions of a program loop having a vector block and a sequence block intended to be executed after the vector block, where the processor includes multiple slices and each of the slices is capable of executing an instruction of an iteration of the program loop substantially in parallel. For each iteration of the program loop, the processor executes an instruction of the sequence block using one of the slices while executing instructions of the vector block using a remainder of the slices substantially in parallel. Other methods and apparatuses are also described.

Type: Grant

Filed: November 6, 2009

Date of Patent: November 22, 2011

Assignee: Apple Inc.

Inventor: Jeffry E. Gonion
RECONFIGURABLE PROCESSOR AND RECONFIGURABLE PROCESSING METHOD

Publication number: 20110219207

Abstract: A reconfigurable processor for efficiently performing a vector operation, and a method of controlling the reconfigurable processor are provided. The reconfigurable processor designates at least one of a plurality of processing elements as a vector lane based on vector lane configuration information, and allocates a vector operation to the designated vector lane.

Type: Application

Filed: January 10, 2011

Publication date: September 8, 2011

Applicant: Samsung Electronics Co., Ltd.

Inventors: Dong-Kwan Suh, Hyeong-Seok Yu, Suk-Jin Kim
Automatic instruction set architecture generation

Patent number: 7971197

Abstract: A digital computer system automatically creates an Instruction Set Architecture (ISA) that potentially exploits VLIW instructions, vector operations, fused operations, and specialized operations with the goal of increasing the performance of a set of applications while keeping hardware cost below a designer specified limit, or with the goal of minimizing hardware cost given a required level of performance.

Type: Grant

Filed: August 18, 2005

Date of Patent: June 28, 2011

Assignee: Tensilica, Inc.

Inventors: David William Goodwin, Dror Maydan, Ding-Kai Chen, Darin Stamenov Petkov, Steven Weng-Kiang Tjiang, Peng Tu, Christopher Rowen
MULTI-STAGE RECONFIGURATION DEVICE AND RECONFIGURATION METHOD, LOGIC CIRCUIT CORRECTION DEVICE, AND RECONFIGURABLE MULTI-STAGE LOGIC CIRCUIT

Publication number: 20110153980

Abstract: To provide a device to reconfigure multi-level logic networks, which enable logic modification and reconfiguration of a multi-level logic network with small circuit area and low-power dissipation in a simple manner. For example, in the case of reconfiguring a multi-level logic network following logic modification for deleting an output vector F(b) of an objective logic function F(X) corresponding to an input vector b, unmodified pq elements are selected one by one from the nearest pq element EG to an output side. At this time, among output values of pq elements closer to an input side than selected pq elements, output values corresponding to the input vector, which equal an output value corresponding to any input variable X other than the input vector b are considered modified and thus not selected. Then, a selected output value corresponding to the input vector b is rewritten to an “invalid value”.

Type: Application

Filed: March 2, 2007

Publication date: June 23, 2011

Applicant: KYUSHU INSTITUTE OF TECHNOLOGY

Inventors: Tsutomu Sasao, Kazuto Ishida
GENERATE PREDICATES INSTRUCTION FOR PROCESSING VECTORS

Publication number: 20110113217

Abstract: The described embodiments include a processor that executes a vector instruction. The processor starts by receiving a first input vector, a second input vector, and optionally receiving a predicate vector (each of which includes N elements) as inputs. The processor then executes the vector instruction. Executing the vector instruction causes the processor to generate a result vector. When generating the result vector, if the predicate vector was received, for each element of the result vector for which the corresponding element of the predicate vector is active, otherwise, for each element of the result vector, the processor determines elements that are to be set in the result vector based on values in elements in the first input vector and the second input vector. The processor then sets the determined elements of the result vector to a first predetermined value.

Type: Application

Filed: January 13, 2011

Publication date: May 12, 2011

Applicant: APPLE INC.

Inventors: Jeffry E. Gonion, Keith E. Diefendorff
Method of operation for parallel LCP solver

Patent number: 7937359

Abstract: A method of operating a Linear Complementarity Problem (LCP) solver is disclosed, where the LCP solver is characterized by multiple execution units operating in parallel to implement a competent computational method adapted to resolve physics-based LCPs in real-time.

Type: Grant

Filed: April 27, 2009

Date of Patent: May 3, 2011

Assignee: NVIDIA Corporation

Inventors: Lihua Zhang, Richard Tonge, Dilip Sequeira, Monier Maher
METHOD AND STRUCTURE OF USING SIMD VECTOR ARCHITECTURES TO IMPLEMENT MATRIX MULTIPLICATION

Publication number: 20110055517

Abstract: A structure (and method) including a plurality of coprocessing units and a controller that selectively loads data for processing on the plurality of coprocessing units, using a compound loading instruction. The compound loading instruction includes a plurality of low-level software instructions that preliminarily processes input data in a manner predetermined to simulate an effect of a single hardware loading instruction that would provide optimal loading of complex matrix data by loading input data in accordance with the effect of multiplying i·i=−1.

Type: Application

Filed: August 26, 2009

Publication date: March 3, 2011

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Alexandre E. Eichenberger, Michael Karl Gschwind, John A. Gunnels, Fred Gehrung Gustavson, Brett Olsson
Logic controller having hard-coded control logic and programmable override control store entries

Patent number: 7895379

Abstract: Control logic of a node controller receives an input vector and produces an output vector. The control logic includes a plurality of tied control store entries including hard-coded logic to identify unique values of the input vector and to produce the output vector from a hard-coded output vector when the input vector is identified and when the tied control store is enabled. The control logic also includes a plurality of spare control store entries including programmable logic configurable to identify values of the input vector and to produce the output vector from a programmable output vector when the input vector is identified and when the spare control store is enabled. One of the spare control store entries that is configured to identify a value of the input vector that none of the tied control store entries that are enabled by the entry-enables register are configured to identify is enabled.

Type: Grant

Filed: December 23, 2008

Date of Patent: February 22, 2011

Assignee: Unisys Corporation

Inventors: Ross M. Weber, David R. Spatafore
SELECT FIRST AND SELECT LAST INSTRUCTIONS FOR PROCESSING VECTORS

Publication number: 20110035568

Abstract: The described embodiments include a processor that executes a vector instruction. The processor starts by receiving a vector instruction that uses a first input vector, a second input vector, and a control vector, and optionally a predicate vector as inputs, wherein each of the vectors includes N elements. The processor then executes the vector instruction. In the described embodiments, when executing the vector instruction, the processor determines a key element position. If the predicate vector is received, the key element position is a predetermined active element position in the predicate vector, otherwise, the key element position is in a predetermined element position. The processor then uses the key element position to copy a result value into a result variable.

Type: Application

Filed: October 19, 2010

Publication date: February 10, 2011

Applicant: APPLE INC.

Inventors: Jeffry E. Gonion, Keith E. Diefendorff
Work-efficient parallel prefix sum algorithm for graphics processing units

Patent number: 7877573

Abstract: One embodiment of the present invention sets forth a technique for computing a parallel prefix sum using one or more cooperative thread arrays (CTA) within a graphics processing unit. The prefix sum input list is partitioned and distributed to each CTA. Within each CTA, the input list is further partitioned for processing by individual threads in a way that avoids access conflicts to memory. Each list partition within the CTA is assigned to one of a plurality of concurrent threads, which executes a prefix sum operation on the partition. The final values of the prefix sum operations form a list that is then subjected to a second prefix sum operation. Each element of the second prefix sum operation is added to each element of the subsequent partition, completing the prefix sum operation within the CTA. This technique may be extended to prefix sum operations that span two or more CTAs.

Type: Grant

Filed: August 8, 2007

Date of Patent: January 25, 2011

Assignee: NVIDIA Corporation

Inventor: Scott M. Le Grand
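
The partitioned prefix sum outlined in the abstract for patent 7877573 can be illustrated with a short sequential sketch. This is not the patented GPU implementation: the function name `partitioned_prefix_sum` and the `num_partitions` parameter are hypothetical, the partitions are processed one after another rather than by concurrent threads, and memory-conflict avoidance is not modeled. It only shows the three steps the abstract names: a prefix sum per partition, a second prefix sum over the partitions' final values, and the addition of those totals to each subsequent partition.

```python
from itertools import accumulate


def partitioned_prefix_sum(values, num_partitions):
    """Inclusive prefix sum computed partition by partition (sequential sketch)."""
    if not values:
        return []
    # Split the input into roughly equal partitions (ceiling division).
    size = -(-len(values) // num_partitions)
    parts = [values[i:i + size] for i in range(0, len(values), size)]

    # Step 1: each partition computes its own local prefix sum.
    # On a GPU each partition would be handled by a separate thread.
    local = [list(accumulate(p)) for p in parts]

    # Step 2: the final value of each local prefix sum forms a new list,
    # which is itself subjected to a second prefix sum.
    totals = list(accumulate(p[-1] for p in local))

    # Step 3: add the running total of all preceding partitions to every
    # element of the subsequent partition.
    result = list(local[0])
    for k in range(1, len(local)):
        result.extend(x + totals[k - 1] for x in local[k])
    return result


print(partitioned_prefix_sum([1, 2, 3, 4, 5, 6, 7, 8], 4))
```

With eight inputs and four partitions, each partition holds two elements, and the output matches an ordinary inclusive prefix sum of the whole list.
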
Method and system for efficient matrix multiplication in a SIMD processor architecture

Patent number: 7873812

Abstract: The new system provides for efficient implementation of matrix multiplication in a SIMD processor. The new system provides ability to map any element of a source vector register to be paired with any element of a second source vector register for vector operations, and specifically vector multiply and vector-multiply-accumulate operations to implement a variety of matrix multiplications without the additional permute or data re-ordering instructions. Operations such as DCT and Color-space transformations for video processing could be very efficiently implemented using this system.

Type: Grant

Filed: April 5, 2004

Date of Patent: January 18, 2011

Inventor: Tibet Mimar
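
The element-mapping idea in the abstract for patent 7873812 can be sketched in a few lines. The names `vmac_mapped` and `matvec`, and the `b_map` index list, are assumptions made for illustration and do not come from the patent. The sketch models a vector multiply-accumulate in which each lane may pair its element of the first source with any element of the second source, and then uses that operation for a matrix-vector product without a separate permute step.

```python
def vmac_mapped(acc, a, b, b_map):
    """Multiply-accumulate where lane i pairs a[i] with b[b_map[i]].

    b_map is a hypothetical per-lane index list standing in for the
    element-mapping capability the abstract describes.
    """
    return [acc[i] + a[i] * b[b_map[i]] for i in range(len(acc))]


def matvec(columns, v):
    """Matrix-vector product built from mapped multiply-accumulates.

    columns[j] is column j of the matrix. Each step pairs every lane of
    that column with the single element v[j], so no separate broadcast
    or permute operation is needed.
    """
    acc = [0] * len(columns[0])
    for j, col in enumerate(columns):
        acc = vmac_mapped(acc, col, v, [j] * len(acc))
    return acc


# Matrix [[1, 2], [3, 4]] stored by columns, multiplied by vector [5, 6].
print(matvec([[1, 3], [2, 4]], [5, 6]))
```

A full matrix-matrix product would repeat `matvec` for each column of the right-hand matrix.
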
Repartitioning parallel SVM computations using dynamic timeout

Patent number: 7865898

Abstract: A system that reduces execution time of a parallel SVM application. During operation, the system partitions an input data set into chunks of data. Next, the system distributes the partitioned chunks of data across a plurality of available computing nodes and executes the parallel SVM application on the chunks of data in parallel across the plurality of available computing nodes. The system then determines if a first timeout period has been exceeded before all of the plurality of available computing nodes have finished processing their respective chunks of data. If so, the system (1) repartitions the input data set into different chunks of data; (2) redistributes the repartitioned chunks of data across some or all of the plurality of available computing nodes; and (3) executes the parallel SVM application on the repartitioned chunks of data in parallel across some or all of the available computing nodes.

Type: Grant

Filed: January 27, 2006

Date of Patent: January 4, 2011

Assignee: Oracle America, Inc.

Inventors: Kalyanaraman Vaidyanathan, Kenny C. Gross
SYSTEM AND METHOD FOR MANAGING PROCESSOR-IN-MEMORY (PIM) OPERATIONS

Publication number: 20100318764

Abstract: A system and method of compiling program code, wherein the program code includes an operation on an array of data elements stored in memory of a computer system. The program code is scanned for operations that are vectorizable. The vectorizable operations are examined to determine whether they should be executed at least in part in a vector atomic memory operation (AMO) functional unit attached to memory. If so, the compiled code includes vector AMO instructions.

Type: Application

Filed: June 12, 2009

Publication date: December 16, 2010

Applicant: Cray Inc.

Inventor: Terry D. Greyzck
Data processing apparatus and method for handling vector instructions

Publication number: 20100312988

Abstract: A data processing apparatus and method are provided for handling vector instructions. The data processing apparatus has a register data store with a plurality of registers arranged to store data elements. A vector processing unit is then used to execute a sequence of vector instructions, with the vector processing unit having a plurality of lanes of parallel processing and having access to the register data store in order to read data elements from, and write data elements to, the register data store during the execution of the sequence of vector instructions. A skip indication storage maintains a skip indicator for each of the lanes of parallel processing. The vector processing unit is responsive to a vector skip instruction to perform an update operation to set within the skip indication storage the skip indicator for a determined one or more lanes.

Type: Application

Filed: January 19, 2010

Publication date: December 9, 2010

Applicant: ARM LIMITED

Inventors: Andreas Björklund, Erik Persson, Ola Hugosson
Data processing apparatus and method for performing a predetermined rearrangement operation

Publication number: 20100313060

Abstract: A data processing apparatus and method are provided for performing a predetermined rearrangement operation. The data processing apparatus comprises a vector register bank having a plurality of vector registers, with each vector register comprising a plurality of storage cells such that the plurality of vector registers provide a matrix of storage cells. Each storage cell is arranged to store a data element. A vector processing unit is provided for executing a sequence of vector instructions in order to apply operations to the data elements held in the vector register bank. Responsive to a vector matrix rearrangement instruction specifying a predetermined rearrangement operation to be performed on the data elements in the matrix of storage cells, the vector processing unit is arranged to issue a set rearrangement enable signal to the vector register bank.

Type: Application

Filed: January 19, 2010

Publication date: December 9, 2010

Applicant: ARM LIMITED

Inventors: Andreas Björklund, Erik Persson, Ola Hugosson
Runtime optimization of distributed execution graph

Patent number: 7844959

Abstract: A general purpose high-performance distributed execution engine for coarse-grained data-parallel applications is proposed that allows developers to easily create large-scale distributed applications without requiring them to master concurrency techniques beyond being able to draw a graph of the data-dependencies of their algorithms. Based on the graph, a job manager intelligently distributes the work load so that the resources of the execution engine are used efficiently. During runtime, the job manager (or other entity) can automatically modify the graph to improve efficiency. The modifications are based on runtime information, topology of the distributed execution engine, and/or the distributed application represented by the graph.

Type: Grant

Filed: September 29, 2006

Date of Patent: November 30, 2010

Assignee: Microsoft Corporation

Inventor: Michael A. Isard
Arrangement, system and method for vector permutation in single-instruction multiple-data microprocessors

Patent number: 7809931

Abstract: A vector permutation system (100) for a single-instruction multiple-data microprocessor has a set of vector registers (110) which feed vectors to permutation logic (120) and then to a negate block (130) where they are permuted and selectively negated according to control parameters received from a selected one of a set of control registers (140). A control arrangement (145, 150) selects which control register is to provide the control parameters. In this way no separate permutation instructions are necessary or need to be executed, and no permutation parameters need to be stored in the vector registers (110). This leads to higher performance, a smaller vector registers file and hence a smaller size of the microprocessor and better program code density.

Type: Grant

Filed: October 6, 2003

Date of Patent: October 5, 2010

Assignee: Freescale Semiconductor, Inc.

Inventor: Martin Raubuch
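
The permute-then-negate data path in the abstract for patent 7809931 can be modeled as a small software sketch. The layout of the control parameters here, a list of `(source_index, negate)` pairs per control register, is an assumption made for illustration, as are the names `permute_and_negate`, `CONTROL_REGISTERS`, and `execute`. The abstract does not specify how the parameters are encoded.

```python
def permute_and_negate(vector, control):
    """Permute a vector and selectively negate its elements.

    control holds one (source_index, negate) pair per output element;
    this encoding is hypothetical.
    """
    return [-vector[src] if negate else vector[src] for src, negate in control]


# A hypothetical set of control registers. Selecting one of them supplies
# the parameters, so the data vector itself carries no permutation data.
CONTROL_REGISTERS = [
    [(0, False), (1, False), (2, False), (3, False)],  # identity
    [(1, True), (0, False), (3, True), (2, False)],    # swap pairs, negate first of each
]


def execute(vector, selected):
    """Apply the parameters from the selected control register."""
    return permute_and_negate(vector, CONTROL_REGISTERS[selected])


print(execute([1, 2, 3, 4], 0))
print(execute([1, 2, 3, 4], 1))
```

Keeping the parameters in dedicated control registers, as the abstract notes, means no vector register has to be spent on holding permutation indices.
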
Processing unit incorporating vectorizable execution unit

Patent number: 7809925

Abstract: A vectorizable execution unit is capable of being operated in a plurality of modes, with the processing lanes in the vectorizable execution unit grouped into different combinations of logical execution units in different modes. By doing so, processing lanes can be selectively grouped together to operate as different types of vector execution units and/or scalar execution units, and if desired, dynamically switched during runtime to process various types of instruction streams in a manner that is best suited for each type of instruction stream. As a consequence, a single vectorizable execution unit may be configurable, e.g., via software control, to operate either as a vector execution unit or a plurality of scalar execution units.

Type: Grant

Filed: December 7, 2007

Date of Patent: October 5, 2010

Assignee: International Business Machines Corporation

Inventors: Eric Oliver Mejdrich, Adam James Muff, Matthew Ray Tubbs
Method and arrangement for bringing together data on parallel data paths

Patent number: 7779229

Abstract: A processor arrangement having a strip structure for parallel data processing is configured so that local data from the individual processing units or strips is brought together in a rapid manner. Input data, intermediate data and/or output data from various processing units are linked together in an operation which is at least partially combinatory. The data linking operation is not clock controlled. The linking of the local data from various strips in this manner reduces delays in parallel data processing in the processor arrangement. The combinatory data linking operation can provide an overall data linking outcome within an individual clock cycle.

Type: Grant

Filed: February 12, 2003

Date of Patent: August 17, 2010

Assignee: NXP B.V.

Inventor: Wolfram Drescher
Method for providing physics simulation data

Patent number: 7739479

Abstract: A method of providing physics data within a game program or simulation using a hardware-based physics processing unit having unique architecture designed to efficiently calculate physics related data.

Type: Grant

Filed: November 19, 2003

Date of Patent: June 15, 2010

Assignee: NVIDIA Corporation

Inventors: Jean Pierre Bordes, Curtis Davis, Monier Maher, Manju Hegde, Otto A. Schmid
On demand software contract modification and termination in running component assemblies

Patent number: 7735090

Abstract: A method, apparatus and article of manufacture to dynamically modify, terminate, or replace software components and connections (i.e., contracts) between components in a running assembly. Information about the component and contracts between components in a running assembly is used to determine an allowable sequence of management commands to transition the assembly of components from a current state to a specified goal state. At the same time, other components may continue to perform an operational workflow.

Type: Grant

Filed: December 15, 2005

Date of Patent: June 8, 2010

Assignee: International Business Machines Corporation

Inventors: James E. Carey, Scott N. Gerard
