Pipeline Patents (Class 708/521)

Power reduction in processor pipeline by detecting zeros

Patent number: 10901492

Abstract: Techniques are described for power reduction in a computer processor based on detection of whether data destined for input to an arithmetic logic unit (ALU) has a particular value. The data is written to a register prior to performing an arithmetic or logical operation using the data as an operand. Depending on a timing of when the data is supplied to the register, the determination is made before or after the data is written to the register, and a memory associated with the register is updated with a result of the determination. Contents of the memory are used to make a decision whether to allow the ALU to perform the arithmetic or logical operation. The memory can be implemented as a non-architectural register.

Type: Grant

Filed: March 29, 2019

Date of Patent: January 26, 2021

Assignee: Amazon Technologies, Inc.

Inventors: Nafea Bshara, Ron Diamant, Randy Renfu Huang, Ali Ghassan Saidi
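
Illustrative sketch for patent 10901492, not taken from the patent itself: the abstract describes recording, in a memory associated with a register, whether the register's data has a particular value, and consulting that record before allowing the ALU to operate. The Python model below assumes the particular value is zero (as the title suggests) and that the operation is a multiply. The class name, the four-register file, and the `alu_activations` counter are hypothetical and exist only to make the effect visible.

```python
from dataclasses import dataclass, field


@dataclass
class ZeroSkippingMultiplier:
    """Toy register file with one side flag per register (hypothetical model)."""
    regs: list = field(default_factory=lambda: [0] * 4)
    # Side memory standing in for the non-architectural register of the abstract.
    is_zero: list = field(default_factory=lambda: [True] * 4)
    alu_activations: int = 0

    def write(self, idx, value):
        # The zero check happens as the data is written, and the result
        # is stored next to the register.
        self.regs[idx] = value
        self.is_zero[idx] = (value == 0)

    def multiply(self, a, b):
        # Only the flags are consulted to decide whether the ALU must run.
        if self.is_zero[a] or self.is_zero[b]:
            return 0  # product is already known, so the ALU stays idle
        self.alu_activations += 1
        return self.regs[a] * self.regs[b]
```

In this model a multiply with a zero operand returns immediately and leaves `alu_activations` unchanged, which is the software analogue of the power saving the abstract describes.
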
Encrypted and authenticated data frame

Patent number: 10182039

Abstract: At a source network device, data is compiled into a plurality of data blocks for transmission in a data frame over a network to a destination network device. The plurality of data blocks are arranged into a plurality of data block groups such that each data block group comprises a predetermined number of data blocks. Encryption information is generated for each of the plurality of data block groups. The encryption information identifies an encryption key for each of the plurality of data block groups. Overhead data configured to allow the destination network device to align and decode the data frame is generated. The data frame is transmitted from the source network device to the destination network device such that the encryption information for each of the plurality of data block groups is transmitted consecutively with a respective data block group, and a portion of the overhead data is transmitted prior to each consecutive transmission of encryption information with a data block group.

Type: Grant

Filed: February 4, 2016

Date of Patent: January 15, 2019

Assignee: Cisco Technology, Inc.

Inventors: Gilberto Loprieno, Davide Codella
Barrel with substantially constant torque

Patent number: 9671754

Abstract: A barrel for a timepiece. The barrel has a device for limiting the number of running rotations of the barrel, wherein this device still allows any number of winding rotations of the barrel.

Type: Grant

Filed: March 2, 2016

Date of Patent: June 6, 2017

Assignee: Glashütter Uhrenbetrieb GmbH

Inventors: Patrick Streubel, Silko Goldmann
Processing fixed and variable length numbers

Patent number: 9507564

Abstract: Embodiments of a processor are disclosed for performing arithmetic operations on variable-length and fixed-length machine independent numbers. The processor may include a floating point unit, and a logic circuit. The number unit may be configured to receive an operation, and first and second operands. Each of the first and second operands may include a sign byte, and multiple mantissa bytes, and may be processed in response to a determination that the operands are fixed-length numbers. The logic circuit may be further configured to perform the received operation on the processed first and second operands.

Type: Grant

Filed: April 14, 2014

Date of Patent: November 29, 2016

Assignee: Oracle International Corporation

Inventors: Jeffrey S Brooks, Christopher H Olson, Eugene Karichkin
Optimized 3D lighting computations using a logarithmic number system

Patent number: 9304739

Abstract: Embodiments of the present invention set forth a technique for optimizing the performance and efficiency of complex, software-based computations, such as lighting computations. Data entering a graphics application programming interface (API) in a conventional arithmetic representation, such as floating-point or fixed-point, is converted to an internal logarithmic representation for greater computational efficiency. Lighting computations are then performed using logarithmic space arithmetic routines that, on average, execute more efficiently than similar routines performed in a native floating-point format. The lighting computation results, represented as logarithmic space numbers, are converted back to floating-point numbers before being transmitted to a graphics processing unit (GPU) for further processing. Because of efficiencies of logarithmic space arithmetic, performance improvements may be realized relative to prior art approaches to performing software-based floating-point operations.

Type: Grant

Filed: December 11, 2006

Date of Patent: April 5, 2016

Assignee: NVIDIA Corporation

Inventor: Norbert Juffa
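
Illustrative sketch for patent 9304739, not taken from the patent itself: in a logarithmic number system, multiplication becomes addition of logarithms and exponentiation becomes a single multiplication, which is the kind of saving the abstract refers to. The encoding below, a (sign, log2 of magnitude) pair, and all function names are assumptions made for this example. Zero and log-domain addition are deliberately left out.

```python
import math


def to_lns(x):
    """Encode a nonzero real as (sign, log2|x|). Zero is not handled here."""
    if x == 0:
        raise ValueError("zero needs a special encoding, omitted in this sketch")
    return (1 if x > 0 else -1, math.log2(abs(x)))


def from_lns(v):
    """Convert a (sign, log2|x|) pair back to a float."""
    sign, log_mag = v
    return sign * 2.0 ** log_mag


def lns_mul(a, b):
    # A product in linear space is a sum in log space.
    return (a[0] * b[0], a[1] + b[1])


def lns_pow(a, exponent):
    # A power in linear space is a product in log space (positive base only).
    if a[0] < 0:
        raise ValueError("negative base not supported in this sketch")
    return (1, a[1] * exponent)
```

A specular-style term such as a cosine raised to a shininess exponent would use `lns_pow`, costing one multiplication instead of a floating-point `pow`.
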
Apparatus implementing instructions that impose pipeline interdependencies

Patent number: 9183611

Abstract: Techniques are disclosed relating to implementation of gradient-type graphics instructions. In one embodiment, an apparatus includes first and second execution pipelines and a register file. In this embodiment, the register file is coupled to the first and second execution pipelines and configured to store operands for the first and second execution pipelines. In this embodiment, the apparatus is configured to determine that a graphics instruction imposes a dependency between the first and second pipeline. In response to this determination, the apparatus is configured to read a plurality of operands from the register file including an operand assigned to the second execution pipeline and to select the operand assigned to the second execution pipeline as an input operand for the first execution pipeline. The apparatus may be configured such that operands assigned to the second execution pipeline are accessible by the first execution pipeline only via the register file and not from other locations.

Type: Grant

Filed: July 3, 2013

Date of Patent: November 10, 2015

Assignee: Apple Inc.

Inventors: Andrew M. Havlir, Terence M. Potter
ARITHMETIC PROCESSING DEVICE

Publication number: 20140317164

Abstract: An arithmetic processing device includes: an arithmetic unit configured to execute an arithmetic operation; and a stream engine configured to execute stream processing, wherein a data bus of the arithmetic unit and a data bus of the stream engine are tightly coupled with each other.

Type: Application

Filed: February 21, 2014

Publication date: October 23, 2014

Applicant: FUJITSU LIMITED

Inventors: Kazuhiro Yoshimura, Yi GE, Kazuo HORIO
Systems and methods for reducing logic switching noise in parallel pipelined hardware

Patent number: 8739101

Abstract: A method of configuring a hardware design for a pipelined parallel stream processor includes obtaining a scheduled graph representing a processing operation in the time domain as a function of clock cycles. The graph includes a data path to be implemented in hardware as part of the stream processor, an input, an output, and parallel branches to enable data values to be streamed therethrough from the input to the output as a function of increasing clock cycle. The data path is partitioned into a plurality of discrete regions, each region operating on a different clock phase and having discrete control logic elements. Phase transition registers to align data separated by a boundary between regions having different clock phases are introduced into the data path at the boundary. The graph and control logic elements define a hardware design for the pipelined parallel stream processor.

Type: Grant

Filed: November 21, 2012

Date of Patent: May 27, 2014

Assignee: Maxeler Technologies Ltd.

Inventor: Robert Gwilym Dimond
Method for enhancing the computation of CSS and accuracy of computing hardware and to promote the computation speed

Patent number: 8719303

Abstract: This invention proposes a new algorithm. By multiplying with the proposed weight coefficients, CSP and CSS can be computed without computing the mean(s) of the data. After the proposed weight coefficients are factorized, they yield a new recursive, real-time updatable computation method. To test the accuracy of the invention, the StRD data were tested separately using SAS ver 9.0, SPSS ver 15.0 and EXCEL 2007 for comparison. The results showed that the accuracy of the proposed invention exceeds that of SAS ver 9.0, SPSS ver 15.0 and EXCEL 2007. Besides being accurate, the new algorithm is also computationally efficient.

Type: Grant

Filed: December 24, 2008

Date of Patent: May 6, 2014

Inventors: Juei-Chao Chen, Kuo-Hung Lo, Tien-Lung Sun
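
Illustrative sketch for patent 8719303: the abstract does not give the proposed weight coefficients, so they are not reproduced here. As a stand-in, the Python function below uses the well-known Welford-style one-pass update to show what a recursive, updatable computation of a corrected sum of squares (assumed here to be what CSS denotes) can look like. Unlike the patented method, this substitute does track a running mean.

```python
def corrected_sum_of_squares(values):
    """One-pass (Welford-style) corrected sum of squares: sum((x - mean)**2)."""
    n = 0
    mean = 0.0
    css = 0.0
    for x in values:
        n += 1
        delta = x - mean           # deviation from the mean before the update
        mean += delta / n          # running mean after including x
        css += delta * (x - mean)  # uses the deviations before and after
    return css
```

Because each step needs only `n`, `mean`, and `css`, the state can be kept between calls and updated as new observations arrive.
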
System and method of bypassing unrounded results in a multiply-add pipeline unit

Patent number: 8671129

Abstract: A processing unit, system, and method for performing a multiply operation in a multiply-add pipeline. To reduce the pipeline latency, the unrounded result of a multiply-add operation is bypassed to the inputs of the multiply-add pipeline for use in a subsequent operation. If it is determined that rounding is required for the prior operation, then the rounding will occur during the subsequent operation. During the subsequent operation, a Booth encoder not utilized by the multiply operation will output a rounding correction factor as a selection input to a Booth multiplexer not utilized by the multiply operation. When the Booth multiplexer receives the rounding correction factor, the Booth multiplexer will output a rounding correction value to a carry save adder (CSA) tree, and the CSA tree will generate the correct sum from the rounding correction value and the other partial products.

Type: Grant

Filed: March 8, 2011

Date of Patent: March 11, 2014

Assignee: Oracle International Corporation

Inventors: Jeffrey S. Brooks, Christopher H. Olson
Method and apparatus for QR-factorizing matrix on a multiprocessor system

Patent number: 8543626

Abstract: A method and apparatus for QR-factorizing matrix on a multiprocessor system, wherein the multiprocessor system comprises at least one core processor and a plurality of accelerators, comprises the steps of: iteratively factorizing each panel in the matrix until the whole matrix is factorized; wherein in each iteration, the method comprises: partitioning an unprocessed matrix part in the matrix into a plurality of blocks according to a predetermined block size; partitioning a current processed panel in the unprocessed matrix part into at least two sub panels, wherein the current processed panel is composed of a plurality of blocks; and performing QR factorization one by one on the at least two sub panels with the plurality of accelerators, and updating the data of the sub panel(s) on which no QR factorization has been performed among the at least two sub panels by using the factorization result.

Type: Grant

Filed: July 27, 2012

Date of Patent: September 24, 2013

Assignee: International Business Machines Corporation

Inventors: Hui Li, Bai Ling Wang
Logic structures and methods supporting pipelined multi-operand adders

Patent number: 8510357

Abstract: Circuitry for adding together three long numbers may include the formation of redundant form sum bit signals and redundant form carry bit signals. These signals may be finally combined in a ripple carry adder chain that produces sum bit output signals and ripple carry bit signals. Both a ripple carry bit signal and a redundant form carry bit signal must be passed from the circuitry performing each place of the addition to the circuitry performing the next-more-significant place of the addition. Various techniques are disclosed for facilitating subdividing long chains of such circuitry, as well as possibly including (between such subdivisions) “pipeline” registers for both ripple and redundant form carry bit signals.

Type: Grant

Filed: January 31, 2013

Date of Patent: August 13, 2013

Assignee: Altera Corporation

Inventor: Martin Langhammer
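
Illustrative sketch for patent 8510357, not taken from the patent itself: the abstract describes reducing three operands to redundant-form sum and carry bits and then combining those in a ripple-carry chain. The Python function below models that two-step structure bit by bit. The function name and the `width` parameter are assumptions, and the pipeline registers mentioned in the abstract are not modeled.

```python
def carry_save_add3(a, b, c, width=16):
    """Add three non-negative integers modulo 2**width via carry-save form."""
    mask = (1 << width) - 1
    # Redundant form: per-bit sum and per-bit majority (carry), no propagation yet.
    s = (a ^ b ^ c) & mask
    carry = ((a & b) | (a & c) | (b & c)) << 1
    # Ripple-carry chain combining the redundant-form sum and carry vectors.
    result = 0
    ripple = 0
    for i in range(width):
        x = (s >> i) & 1
        y = (carry >> i) & 1
        result |= (x ^ y ^ ripple) << i
        ripple = (x & y) | (x & ripple) | (y & ripple)
    return result
```

Each bit position receives both a redundant-form carry (from the previous position's majority function) and a ripple carry, matching the abstract's remark that both signals pass to the next-more-significant place.
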
EFFICIENT FIR FILTERS

Publication number: 20130191431

Abstract: A processor for calculating a convolution of a first input sequence of numbers with a second input sequence of numbers to generate an output sequence is provided. The processor includes multipliers, each multiplying two real numbers to generate an output; multiplexers to direct the numbers in the first and second input sequences or parts of the numbers to the multipliers; and control circuitry to control the multiplexers to direct the first and second input sequences of numbers to the multipliers dependent on whether the numbers are complex or real. An accumulator adds partial products from multiplications performed by the multipliers to calculate the convolution.

Type: Application

Filed: January 19, 2012

Publication date: July 25, 2013

Inventors: Srinivasan Iyer, Carsten Aagaard Pedersen
Matrix decomposition in an integrated circuit device

Patent number: 8396914

Abstract: Circuitry speeds up the Cholesky decomposition of a matrix. The circuitry can be provided in a fixed logic device, or can be configured into a programmable integrated circuit device such as a programmable logic device. The circuitry implements the following equation: $l_{ij} = \dfrac{a_{ij} - \langle L_i, L_j \rangle}{\sqrt{a_{jj} - \langle L_j, L_j \rangle}}$. When any $l_{ij}$ term is calculated this way, the latency in calculating the $l_{jj}$ term in the denominator has little or no effect on the $l_{ij}$ term calculation. And if the calculations are properly pipelined, once the pipeline is filled, a new term can be output on each clock cycle or every few clock cycles.

Type: Grant

Filed: September 11, 2009

Date of Patent: March 12, 2013

Assignee: Altera Corporation

Inventor: Martin Langhammer
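
Illustrative sketch for patent 8396914, not taken from the patent itself: the Python function below computes a lower-triangular Cholesky factor with the standard column-by-column recurrence, in which each off-diagonal term is a dot-product-corrected matrix entry divided by the diagonal term of its column. It is a plain software rendering under that assumption and does not model the pipelined circuitry or its latency hiding.

```python
import math


def cholesky_lower(a):
    """Return lower-triangular L with L * L^T == a (a symmetric positive definite)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal term: l_jj = sqrt(a_jj - <L_j, L_j>)
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(d)
        # Terms below the diagonal: l_ij = (a_ij - <L_i, L_j>) / l_jj
        for i in range(j + 1, n):
            dot = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = (a[i][j] - dot) / l[j][j]
    return l
```
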
Logic structures and methods supporting pipelined multi-operand adders

Patent number: 8392488

Abstract: Circuitry for adding together three long numbers may include the formation of redundant form sum bit signals and redundant form carry bit signals. These signals may be finally combined in a ripple carry adder chain that produces sum bit output signals and ripple carry bit signals. Both a ripple carry bit signal and a redundant form carry bit signal must be passed from the circuitry performing each place of the addition to the circuitry performing the next-more-significant place of the addition. Various techniques are disclosed for facilitating subdividing long chains of such circuitry, as well as possibly including (between such subdivisions) “pipeline” registers for both ripple and redundant form carry bit signals.

Type: Grant

Filed: September 11, 2009

Date of Patent: March 5, 2013

Assignee: Altera Corporation

Inventor: Martin Langhammer
Method and apparatus for QR-factorizing matrix on multiprocessor system

Patent number: 8296350

Abstract: The present invention provides a method and apparatus for QR-factorizing matrix on a multiprocessor system, wherein the multiprocessor system comprises at least one core processor and a plurality of accelerators, the method comprises the steps of: iteratively factorizing each panel in the matrix until the whole matrix is factorized; wherein in each iteration, the method comprises: partitioning an unprocessed matrix part in the matrix into a plurality of blocks according to a predetermined block size; partitioning a current processed panel in the unprocessed matrix part into at least two sub panels, wherein the current processed panel is composed of a plurality of blocks; and performing QR factorization one by one on the at least two sub panels with the plurality of accelerators, and updating the data of the sub panel(s) on which no QR factorization has been performed among the at least two sub panels by using the factorization result.

Type: Grant

Filed: March 12, 2009

Date of Patent: October 23, 2012

Assignee: International Business Machines Corporation

Inventors: Hui Li, Bai Ling Wang
SYSTEM AND METHOD OF BYPASSING UNROUNDED RESULTS IN A MULTIPLY-ADD PIPELINE UNIT

Publication number: 20120233234

Abstract: A processing unit, system, and method for performing a multiply operation in a multiply-add pipeline. To reduce the pipeline latency, the unrounded result of a multiply-add operation is bypassed to the inputs of the multiply-add pipeline for use in a subsequent operation. If it is determined that rounding is required for the prior operation, then the rounding will occur during the subsequent operation. During the subsequent operation, a Booth encoder not utilized by the multiply operation will output a rounding correction factor as a selection input to a Booth multiplexer not utilized by the multiply operation. When the Booth multiplexer receives the rounding correction factor, the Booth multiplexer will output a rounding correction value to a carry save adder (CSA) tree, and the CSA tree will generate the correct sum from the rounding correction value and the other partial products.

Type: Application

Filed: March 8, 2011

Publication date: September 13, 2012

Inventors: Jeffrey S. Brooks, Christopher H. Olson
Processor Pipeline which Implements Fused and Unfused Multiply-Add Instructions

Publication number: 20120221614

Abstract: Implementing an unfused multiply-add instruction within a fused multiply-add pipeline. The system may include an aligner having an input for receiving an addition term, a multiplier tree having two inputs for receiving a first value and a second value for multiplication, and a first carry save adder (CSA), wherein the first CSA may receive partial products from the multiplier tree and an aligned addition term from the aligner. The system may include a fused/unfused multiply add (FUMA) block which may receive the first partial product, the second partial product, and the aligned addition term, wherein the first partial product and the second partial product are not truncated. The FUMA block may perform an unfused multiply add operation or a fused multiply add operation using the first partial product, the second partial product, and the aligned addition term, e.g., depending on an opcode or mode bit.

Type: Application

Filed: May 11, 2012

Publication date: August 30, 2012

Inventors: Jeffrey S. Brooks, Christopher H. Olson
Power supply control method and circuit in communication equipment

Patent number: 8161308

Abstract: A circuit includes: an input buffer for storing input data; a plurality of processing sections connected in series including a head processing section and a tail-end processing section to sequentially process the input data; and a power supply controller for controlling power supply to each of the plurality of processing sections depending on a lapse of time during which no input data is stored in the input buffer.

Type: Grant

Filed: March 27, 2009

Date of Patent: April 17, 2012

Assignee: NEC Corporation

Inventor: Hidenori Hisamatsu
High-speed precoders for communication systems

Patent number: 7769099

Abstract: The invention relates to techniques for implementing high-speed precoders, such as Tomlinson-Harashima (TH) precoders. In one aspect of the invention, look-ahead techniques are utilized to pipeline a TH precoder, resulting in a high-speed TH precoder. These techniques may be applied to pipeline various types of TH precoders, such as Finite Impulse Response (FIR) precoders and Infinite Impulse Response (IIR) precoders. In another aspect of the invention, parallel processing of multiple non-pipelined TH precoders results in a high-speed parallel TH precoder design. Utilization of high-speed TH precoders may enable network providers to, for example, operate 10 Gigabit Ethernet with copper cable rather than fiber optic cable.

Type: Grant

Filed: September 13, 2005

Date of Patent: August 3, 2010

Assignee: Leanics Corporation

Inventors: Keshab K. Parhi, Yongru Gu
Technique for implementing a security algorithm

Patent number: 7747020

Abstract: Performing a hash algorithm in a processor architecture to alleviate performance bottlenecks and improve overall algorithm performance. In one embodiment of the invention, the hash algorithm is pipelined within the processor architecture.

Type: Grant

Filed: December 4, 2003

Date of Patent: June 29, 2010

Assignee: Intel Corporation

Inventor: Wajdi K. Feghali
Methods and apparatus for efficient complex long multiplication and covariance matrix implementation

Publication number: 20100121899

Abstract: Efficient computation of complex long multiplication results and an efficient calculation of a covariance matrix are described. A parallel array VLIW digital signal processor is employed along with specialized complex long multiplication instructions and communication operations between the processing elements which are overlapped with computation to provide very high performance operation. Successive iterations of a loop of tightly packed VLIWs may be used allowing the complex multiplication pipeline hardware to be efficiently used.

Type: Application

Filed: January 19, 2010

Publication date: May 13, 2010

Applicant: Altera Corporation

Inventors: Gerald G. Pechanek, Ricardo Rodriguez, Matthew Plonski, David Strube, Kevin Coopman
Apparatus and method for cryptographic key expansion

Patent number: 7711955

Abstract: An apparatus and method for cryptographic key expansion. According to a first embodiment, a cryptographic unit may include key storage configured to store an expanded set of cipher keys for a cipher algorithm, and a key expansion pipeline comprising a plurality of pipeline stages. During a key expansion mode of operation, each pipeline stage may be configured to perform a corresponding step of generating a member of the expanded set of cipher keys according to a key expansion algorithm. During a cipher mode of operation, a portion of the key expansion pipeline may be configured to perform a step of the cipher algorithm.

Type: Grant

Filed: September 13, 2004

Date of Patent: May 4, 2010

Assignee: Oracle America, Inc.

Inventors: Christopher H. Olson, Leonard D. Rarick, Gregory F. Grohoski
Method for enhancing the computation of CSS and accuracy of computing hardware and to promote the computation speed

Publication number: 20090172064

Abstract: This invention proposes a new algorithm. By multiplying with the proposed weight coefficients, CSP and CSS can be computed without computing the mean(s) of the data. After the proposed weight coefficients are factorized, they yield a new recursive, real-time updatable computation method. To test the accuracy of the invention, the StRD data were tested separately using SAS ver 9.0, SPSS ver 15.0 and EXCEL 2007 for comparison. The results showed that the accuracy of the proposed invention exceeds that of SAS ver 9.0, SPSS ver 15.0 and EXCEL 2007. Besides being accurate, the new algorithm is also computationally efficient.

Type: Application

Filed: December 24, 2008

Publication date: July 2, 2009

Inventor: Juei-chao Chen
Method for Performing Decimal Division

Publication number: 20090132628

Abstract: A method for performing decimal division including receiving a scaled divisor and a scaled dividend into input registers. A subset of multiples of the scaled divisor is stored in a plurality of multiples registers. Quotient digits are calculated in response to the scaled divisor and the scaled dividend. Each quotient digit is calculated in three clock cycles by a pipeline mechanism. The calculating includes selecting a new quotient digit, and calculating a new remainder. Input to the calculating a new remainder includes data from one or more of the multiples registers.

Type: Application

Filed: January 23, 2009

Publication date: May 21, 2009

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Steven R. Carlough, Paulomi Kadakia, Wen H. Li, Eric M. Schwarz
METHOD FOR IMPLEMENTING MONTGOMERY MODULAR MULTIPLICATION AND DEVICE THEREFORE

Publication number: 20090037508

Abstract: Device for implementing modular multiplication, characterized in that it comprises at least one computation cell comprising a multiplier-adder comprising p pipelined logic-register pairs, receiving several digits to be added together and multiplied, at least two outputs corresponding to the low order and to the high order, an adder receiving the two outputs of the multiplier-adder, the number p being chosen in such a way that the maximum frequency of the multiplier-adder is greater than or equal to the maximum frequency of the adder.

Type: Application

Filed: March 31, 2006

Publication date: February 5, 2009

Inventors: Florent Bernard, Alain Sauzet, Eric Garrido
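
Illustrative sketch for publication 20090037508, not taken from the application itself: the Python functions below show textbook Montgomery modular multiplication on whole integers, with R chosen as a power of two so that reduction needs only masking and shifting. The application describes a digit-serial hardware cell with a pipelined multiplier-adder, which this sketch does not model. All function names are assumptions, and `pow(n, -1, r)` requires Python 3.8 or later.

```python
def montgomery_setup(n, k):
    """For odd n < 2**k, return (R, n') with R = 2**k and n * n' == -1 (mod R)."""
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r
    return r, n_prime


def mont_mul(a_bar, b_bar, n, k, n_prime):
    """Montgomery product: a_bar * b_bar * R^-1 mod n, for inputs below n."""
    r_mask = (1 << k) - 1
    t = a_bar * b_bar
    m = ((t & r_mask) * n_prime) & r_mask  # makes t + m*n divisible by R
    u = (t + m * n) >> k                   # exact division by R
    return u - n if u >= n else u


def mod_mul(a, b, n):
    """Compute (a * b) mod n for odd n by going through Montgomery form."""
    k = n.bit_length()
    r, n_prime = montgomery_setup(n, k)
    a_bar = (a * r) % n                          # convert to Montgomery form
    b_bar = (b * r) % n
    p_bar = mont_mul(a_bar, b_bar, n, k, n_prime)
    return mont_mul(p_bar, 1, n, k, n_prime)     # convert back
```

The conversions in `mod_mul` cost more than the multiplication they wrap, so in practice the Montgomery form pays off when many multiplications share one modulus, as in modular exponentiation.
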
Method of packet encryption that allows for pipelining

Patent number: 7406595

Abstract: A method of packet encryption and decryption that allows for pipelining. The first step is to identify the packets in a message to be encrypted. Then, a unique number is assigned to each packet. A value R is acquired. Then, a first register is initialized. An initialization vector IV is generated. Then, the first register is stepped a user-definable number of times. Then, a packet is selected. R and the unique number are combined. Then, a second register is initialized. A checksum is generated. Then, the packet is divided into blocks. A block is selected. Then, the checksum is combined with the block and designated the checksum. The block is encrypted. Then, the first and second registers are stepped. These steps are repeated for each block. Then, the checksum is encrypted. After the blocks are encrypted, the unique number, IV, the ciphertext of each block, and the encrypted checksum are transmitted. If there are any other packets to encrypt then the steps are repeated.

Type: Grant

Filed: May 5, 2004

Date of Patent: July 29, 2008

Assignee: The United States of America as represented by the Director, National Security Agency

Inventors: Vincent Michael Boyle, Jr., Christopher Mark Salter
Digital image compositing using a programmable graphics processor

Patent number: 7274369

Abstract: Digital image compositing using a programmable graphics processor is described. The programmable graphics processor supports high-precision data formats and can be programmed to complete a plurality of compositing operations in a single pass through a fragment processing pipeline within the programmable graphics processor. Source images for one or more compositing operations are stored in graphics memory, and a resulting composited image is output or stored in graphics memory. More-complex compositing operations, such as blur, warping, morphing, and the like, can be completed in multiple passes through the fragment processing pipeline. A composited image produced during a pass through the fragment processing pipeline is stored in graphics memory and is available as a source image for a subsequent pass.

Type: Grant

Filed: June 9, 2005

Date of Patent: September 25, 2007

Assignee: NVIDIA Corporation

Inventors: Rui M. Bastos, Daniel Elliott Wexler, Larry Gritz, Jonathan Rice, Harold Robert Feldman Zatz, Matthew N. Papakipos, David Kirk
Conditional vector arithmetic method and conditional vector arithmetic unit

Patent number: 7062633

Abstract: A state flag detection means 150 decides whether first source data from the memory 101 is data that is to be subjected to arithmetic, and the result of the decision is retained as a state flag. A condition decision means 109 decides whether the state flag satisfies a condition for performing the arithmetic. A control means 110 controls whether an ALU 100 performs the arithmetic on the basis of this condition satisfaction/dissatisfaction information.

Type: Grant

Filed: December 15, 1999

Date of Patent: June 13, 2006

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Mana Hamada, Shunichi Kuromaru, Tomonori Yonezawa, Tsuyoshi Nakamura
Arithmetic logic unit over finite field GF(2^m)

Publication number: 20040158598

Abstract: Disclosed herein is an arithmetic logic unit over a finite field GF(2^m). Arithmetic logic units consistent with the present invention are disclosed as implemented using a division algorithm based on a binary greatest common divisor algorithm and a Most Significant Bit-first multiplication algorithm. The arithmetic logic unit can perform both a multiplication and a division using shared logic. Since the arithmetic logic unit has no limitations in the selection of an irreducible polynomial, and it is very regular and easily formed as a module, the arithmetic logic unit of the present invention has high expansibility and flexibility with respect to the size m of a field. Further, since the arithmetic logic unit of the present invention can perform a multiplication and a division using shared logic, it is very suitable to implement an encryption system for application products requiring a small size, such as smart cards or wireless communication devices.

Type: Application

Filed: February 3, 2004

Publication date: August 12, 2004

Inventors: Chun Pyo Hong, ChangHoon Kim
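
Illustrative sketch for publication 20040158598, not taken from the application itself: the Python function below shows a Most Significant Bit-first shift-and-add multiplication in a binary extension field, with field elements held as integers whose bits are polynomial coefficients. The function name and argument layout are assumptions. The binary-GCD division and the shared-logic hardware structure described in the abstract are not modeled.

```python
def gf2m_mul(a, b, poly, m):
    """Multiply a and b in GF(2^m), scanning b from its most significant bit.

    `poly` is the irreducible polynomial including its x^m term,
    and a, b must both be below 2**m.
    """
    result = 0
    for i in range(m - 1, -1, -1):
        result <<= 1                # multiply the running result by x
        if (result >> m) & 1:
            result ^= poly          # reduce modulo the irreducible polynomial
        if (b >> i) & 1:
            result ^= a             # add a (addition in GF(2^m) is XOR)
    return result
```

With the AES field polynomial `0x11B` and `m = 8`, `gf2m_mul(0x57, 0x83, 0x11B, 8)` gives `0xC1`, which matches the worked example in FIPS 197. Any other irreducible polynomial can be passed in the same way.
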
Microprocessor with instruction for saturating and packing data

Patent number: 6748521

Abstract: A data processing system is provided with a digital signal processor which has an instruction for saturating multiple fields of a selected set of source operands and storing the separate saturated results in a selected destination register. A first 32-bit operand (600) and a second 32-bit operand (602) are treated as four 16-bit fields and the sixteen bits in each field are saturated separately. Multi-field saturation circuitry is operable to treat a source operand as a number of fields, such that a multi-field saturated (610) result is produced that includes a number of saturated results each corresponding to each field. One instruction is provided which treats an operand pair as having two packed fields, and another instruction is provided that treats the operand pair as having four packed fields. Saturation circuitry is operable to selectively treat a field as either a signed value or an unsigned value.

Type: Grant

Filed: October 31, 2000

Date of Patent: June 8, 2004

Assignee: Texas Instruments Incorporated

Inventor: David Hoyle
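
Illustrative sketch for patent 6748521, not taken from the patent itself: the Python functions below show the general idea of saturating several fields independently and packing the results into one destination word. The field widths, the packing order (first field most significant), and the function names are assumptions for illustration and are not claimed to match the patented instruction encodings.

```python
def saturate(value, bits, signed=True):
    """Clamp value to the range representable in `bits` bits."""
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    return max(lo, min(hi, value))


def saturate_and_pack(fields, bits, signed=True):
    """Saturate each field separately and pack them into one integer word."""
    mask = (1 << bits) - 1
    word = 0
    for value in fields:
        # Two's-complement encode the clamped value and append it to the word.
        word = (word << bits) | (saturate(value, bits, signed) & mask)
    return word
```

For example, `saturate_and_pack([70000, -5], 16)` clamps 70000 to 0x7FFF, keeps -5 as 0xFFFB, and yields 0x7FFFFFFB.
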
Processor and image processing device

Patent number: 6671708

Abstract: An image processing apparatus according to the present invention comprises a general arithmetic circuit 101 comprising a program control circuit 103, a first address generator 104, a first data memory 105, a first pipeline operation circuit 106, a second address generator 113, a second data memory 114 and a second pipeline operation circuit 112, and a dedicated arithmetic circuit 102 comprising a control circuit 115, a first dedicated pipeline operation circuit 107, a second dedicated pipeline operation circuit 108, . . . , an N-th dedicated pipeline operation circuit 110, as shown in FIG. 1. The arithmetic unit having the above-described structure, for example, can realize an arithmetic unit which can be applied to various applications. Further, considering the age of IP (Intellectual Property) which will come in the future, the arithmetic unit can exhibit the flexibility toward the applications.

Type: Grant

Filed: August 31, 2000

Date of Patent: December 30, 2003

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Shunichi Kuromaru, Mana Hamada, Tomonori Yonezawa, Masatoshi Matsuo, Tsuyoshi Nakamura, Masahiro Oohashi
High performance pipelined data path for a media processor

Patent number: 6282556

Abstract: A pipelined data path architecture for use, in one embodiment, in a multimedia processor. The data path architecture requires a maximum of two execution pipestages to perform all instructions including wide data format multiply instructions and specially adapted multimedia instructions, such as the sum of absolute differences (SABD) instruction and other multiply with add (MADD) instructions. The data path architecture includes two wide data format input registers that feed four partitioned 32×32 multiplier circuits. Within two pipestages, the multiply circuit can perform one 128×128 multiply operation, four 32×32 multiply operations, eight 16×16 multiply operations or sixteen 8×8 multiply operations in parallel. The multiply circuit contains a compressor tree which generates a 256-bit sum and a 256-bit carry vector. These vectors are supplied to four 64-bit carry propagate adder circuits which generate the multiply results.

Type: Grant

Filed: November 30, 1999

Date of Patent: August 28, 2001

Assignees: Sony Corporation of Japan, Sony Electronics, Inc.

Inventors: Farzad Chehrazi, Vojin G. Oklobdzija
Video signal processor with triple port memory

Patent number: 6052705

Abstract: A digital video signal processor using parallel processing includes an input serial-access memory having memory cells in which data is inputted into successive ones of the memory cells in response to a programmed-controlled pointer and a three or more port data memory unit for writing-in data read out from the serial-access memory. An arithmetic logic unit responds to stored-program control to read out data from the data memory, perform a program-prescribed arithmetic operation, and write the result of the arithmetic operation back to the data memory. An output serial-access memory is controlled so that the arithmetic result will be outputted under program control in a sequential manner. Operation of the interconnected components is effected by a stored-program control unit connected to the input serial-access memory, the data memory, the arithmetic logic unit, and the output serial-access memory.

Type: Grant

Filed: August 23, 1996

Date of Patent: April 18, 2000

Assignee: Sony Corporation

Inventors: Seiichiro Iwase, Masuyoshi Kurokawa, Takao Yamazaki, Mitsuharu Ohki
Pipeline stop circuit for external memory access

Patent number: 6049816

Abstract: A pipeline stop circuit for external memory access that keeps pipeline operation effective by temporarily stopping the running pipeline, when an external memory or a slow internal memory is accessed, until data are ready in the accessed memory.

Type: Grant

Filed: December 30, 1997

Date of Patent: April 11, 2000

Assignee: LG Electronics, Inc.

Inventors: Bong-Kyun Kim, Jin-Hyeock Im
Elastic self-timed interface for data flow elements embodied as selective bypass of stages in an asynchronous microprocessor pipeline

Patent number: 5964866

Abstract: The invention relates to a processor having a data flow unit for processing data in a plurality of steps. In one version, the data flow unit includes a plurality of consecutive stages which include logic for performing steps of the data processing, the stages being coupled together by a data path, at least one stage being coupled to a transceiver which causes data to be provided to the stage for processing or to bypass the stage unprocessed in response to a stage enable signal; a synchronizer which receives processed data from the stages and causes the processed data to be provided to external logic in synchronization with a clock signal.

Type: Grant

Filed: October 24, 1996

Date of Patent: October 12, 1999

Assignee: International Business Machines Corporation

Inventors: Christopher McCall Durham, Peter Juergen Klim