Pipeline Patents (Class 708/521)
  • Patent number: 10901492
    Abstract: Techniques are described for power reduction in a computer processor based on detection of whether data destined for input to an arithmetic logic unit (ALU) has a particular value. The data is written to a register prior to performing an arithmetic or logical operation using the data as an operand. Depending on a timing of when the data is supplied to the register, the determination is made before or after the data is written to the register, and a memory associated with the register is updated with a result of the determination. Contents of the memory are used to make a decision whether to allow the ALU to perform the arithmetic or logical operation. The memory can be implemented as a non-architectural register.
    Type: Grant
    Filed: March 29, 2019
    Date of Patent: January 26, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Nafea Bshara, Ron Diamant, Randy Renfu Huang, Ali Ghassan Saidi
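
The gating idea in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the "particular value" is taken to be zero, the gated operation is a multiply, and the class name `GatedAlu` and its counters are hypothetical names introduced here, not terms from the patent.

```python
class GatedAlu:
    """Toy model: a side memory holds one 'is zero' flag per register."""

    def __init__(self, num_regs=4):
        self.regs = [0] * num_regs
        # Side memory (assumed non-architectural): one flag per register.
        self.is_zero = [True] * num_regs
        self.executed = 0  # multiplies actually performed
        self.skipped = 0   # multiplies gated off

    def write(self, idx, value):
        # The zero check is done as part of the register write.
        self.regs[idx] = value
        self.is_zero[idx] = (value == 0)

    def multiply(self, a, b):
        # Consult the flags first; skip the multiplier if either operand is zero.
        if self.is_zero[a] or self.is_zero[b]:
            self.skipped += 1
            return 0
        self.executed += 1
        return self.regs[a] * self.regs[b]


alu = GatedAlu()
alu.write(0, 7)
alu.write(1, 0)
alu.write(2, 3)
gated = alu.multiply(0, 1)   # operand 1 is zero, so the multiplier is bypassed
normal = alu.multiply(0, 2)  # both operands non-zero, so the multiply runs
```

In hardware the saving would come from not toggling the multiplier inputs; the counters here only stand in for that effect.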
  • Patent number: 10182039
    Abstract: At a source network device, data is compiled into a plurality of data blocks for transmission in a data frame over a network to a destination network device. The plurality of data blocks are arranged into a plurality of data block groups such that each data block group comprises a predetermined number of data blocks. Encryption information is generated for each of the plurality of data block groups. The encryption information identifies an encryption key for each of the plurality of data block groups. Overhead data configured to allow the destination network device to align and decode the data frame is generated. The data frame is transmitted from the source network device to the destination network device such that the encryption information for each of the plurality of data block groups is transmitted consecutively with a respective data block group, and a portion of the overhead data is transmitted prior to each consecutive transmission of encryption information with a data block group.
    Type: Grant
    Filed: February 4, 2016
    Date of Patent: January 15, 2019
    Assignee: Cisco Technology, Inc.
    Inventors: Gilberto Loprieno, Davide Codella
  • Patent number: 9671754
    Abstract: A barrel for a timepiece. The barrel has a device for limiting the number of running rotations of the barrel, wherein this device still allows any number of winding rotations of the barrel.
    Type: Grant
    Filed: March 2, 2016
    Date of Patent: June 6, 2017
    Assignee: Glashütter Uhrenbetrieb GmbH
    Inventors: Patrick Streubel, Silko Goldmann
  • Patent number: 9507564
    Abstract: Embodiments of a processor are disclosed for performing arithmetic operations on variable-length and fixed-length machine independent numbers. The processor may include a floating point unit, and a logic circuit. The number unit may be configured to receive an operation, and first and second operands. Each of the first and second operands may include a sign byte, and multiple mantissa bytes, and may be processed in response to a determination that the operands are fixed-length numbers. The logic circuit may be further configured to perform the received operation on the processed first and second operands.
    Type: Grant
    Filed: April 14, 2014
    Date of Patent: November 29, 2016
    Assignee: Oracle International Corporation
    Inventors: Jeffrey S Brooks, Christopher H Olson, Eugene Karichkin
  • Patent number: 9304739
    Abstract: Embodiments of the present invention set forth a technique for optimizing the performance and efficiency of complex, software-based computations, such as lighting computations. Data entering a graphics application programming interface (API) in a conventional arithmetic representation, such as floating-point or fixed-point, is converted to an internal logarithmic representation for greater computational efficiency. Lighting computations are then performed using logarithmic space arithmetic routines that, on average, execute more efficiently than similar routines performed in a native floating-point format. The lighting computation results, represented as logarithmic space numbers, are converted back to floating-point numbers before being transmitted to a graphics processing unit (GPU) for further processing. Because of efficiencies of logarithmic space arithmetic, performance improvements may be realized relative to prior art approaches to performing software-based floating-point operations.
    Type: Grant
    Filed: December 11, 2006
    Date of Patent: April 5, 2016
    Assignee: NVIDIA Corporation
    Inventor: Norbert Juffa
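
To make the logarithmic-space idea concrete, the following sketch evaluates a specular-style term `ks * n_dot_h ** shininess` by converting to base-2 logarithms, where a multiply becomes an add and a power becomes a multiply. It is an illustration only: the function names, the choice of base 2, and the restriction to strictly positive inputs are assumptions made here, and the patent's actual routines and number format are not reproduced.

```python
import math


def to_log(x):
    # Assumption: only strictly positive values are handled in this sketch.
    if x <= 0.0:
        raise ValueError("this sketch only handles strictly positive values")
    return math.log2(x)


def from_log(lx):
    # Convert a log-space value back to an ordinary float.
    return 2.0 ** lx


def log_mul(la, lb):
    # Multiplication in linear space is addition in log space.
    return la + lb


def log_pow(la, exponent):
    # Raising to a power in linear space is a multiply in log space.
    return la * exponent


def specular(ks, n_dot_h, shininess):
    # Convert inputs once, do the arithmetic in log space, convert back once.
    l_ks = to_log(ks)
    l_ndh = to_log(n_dot_h)
    return from_log(log_mul(l_ks, log_pow(l_ndh, shininess)))


spec = specular(0.5, 0.9, 32)
```

A real implementation would also need a convention for zero and negative values, which this sketch deliberately leaves out.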
  • Patent number: 9183611
    Abstract: Techniques are disclosed relating to implementation of gradient-type graphics instructions. In one embodiment, an apparatus includes first and second execution pipelines and a register file. In this embodiment, the register file is coupled to the first and second execution pipelines and configured to store operands for the first and second execution pipelines. In this embodiment, the apparatus is configured to determine that a graphics instruction imposes a dependency between the first and second pipeline. In response to this determination, the apparatus is configured to read a plurality of operands from the register file including an operand assigned to the second execution pipeline and to select the operand assigned to the second execution pipeline as an input operand for the first execution pipeline. The apparatus may be configured such that operands assigned to the second execution pipeline are accessible by the first execution pipeline only via the register file and not from other locations.
    Type: Grant
    Filed: July 3, 2013
    Date of Patent: November 10, 2015
    Assignee: Apple Inc.
    Inventors: Andrew M. Havlir, Terence M. Potter
  • Publication number: 20140317164
    Abstract: An arithmetic processing device includes: an arithmetic unit configured to execute an arithmetic operation; and a stream engine configured to execute stream processing, wherein a data bus of the arithmetic unit and a data bus of the stream engine are tightly coupled with each other.
    Type: Application
    Filed: February 21, 2014
    Publication date: October 23, 2014
    Applicant: FUJITSU LIMITED
    Inventors: Kazuhiro Yoshimura, Yi GE, Kazuo HORIO
  • Patent number: 8739101
    Abstract: A method of configuring a hardware design for a pipelined parallel stream processor includes obtaining a scheduled graph representing a processing operation in the time domain as a function of clock cycles. The graph includes a data path to be implemented in hardware as part of the stream processor, an input, an output, and parallel branches to enable data values to be streamed therethrough from the input to the output as a function of increasing clock cycle. The data path is partitioned into a plurality of discrete regions, each region operating on a different clock phase and having discrete control logic elements. Phase transition registers to align data separated by a boundary between regions having different clock phases are introduced into the data path at the boundary. The graph and control logic elements define a hardware design for the pipelined parallel stream processor.
    Type: Grant
    Filed: November 21, 2012
    Date of Patent: May 27, 2014
    Assignee: Maxeler Technologies Ltd.
    Inventor: Robert Gwilym Dimond
  • Patent number: 8719303
    Abstract: This invention proposes a new algorithm. By multiplying by the proposed weight coefficients, CSP and CSS can be computed without computing the mean(s) of the data. After the proposed weight coefficients are factorized, they yield a new recursive computation method that can be updated in real time. To test the accuracy of the invention, the StRD data were tested separately using SAS ver 9.0, SPSS ver 15.0 and EXCEL 2007 for comparison. The results showed that the accuracy of the proposed invention exceeds that of SAS ver 9.0, SPSS ver 15.0 and EXCEL 2007. Besides being accurate, the new algorithm is also computationally efficient.
    Type: Grant
    Filed: December 24, 2008
    Date of Patent: May 6, 2014
    Inventors: Juei-Chao Chen, Kuo-Hung Lo, Tien-Lung Sun
  • Patent number: 8671129
    Abstract: A processing unit, system, and method for performing a multiply operation in a multiply-add pipeline. To reduce the pipeline latency, the unrounded result of a multiply-add operation is bypassed to the inputs of the multiply-add pipeline for use in a subsequent operation. If it is determined that rounding is required for the prior operation, then the rounding will occur during the subsequent operation. During the subsequent operation, a Booth encoder not utilized by the multiply operation will output a rounding correction factor as a selection input to a Booth multiplexer not utilized by the multiply operation. When the Booth multiplexer receives the rounding correction factor, the Booth multiplexer will output a rounding correction value to a carry save adder (CSA) tree, and the CSA tree will generate the correct sum from the rounding correction value and the other partial products.
    Type: Grant
    Filed: March 8, 2011
    Date of Patent: March 11, 2014
    Assignee: Oracle International Corporation
    Inventors: Jeffrey S. Brooks, Christopher H. Olson
  • Patent number: 8543626
    Abstract: A method and apparatus for QR-factorizing matrix on a multiprocessor system, wherein the multiprocessor system comprises at least one core processor and a plurality of accelerators, comprises the steps of: iteratively factorizing each panel in the matrix until the whole matrix is factorized; wherein in each iteration, the method comprises: partitioning an unprocessed matrix part in the matrix into a plurality of blocks according to a predetermined block size; partitioning a current processed panel in the unprocessed matrix part into at least two sub panels, wherein the current processed panel is composed of a plurality of blocks; and performing QR factorization one by one on the at least two sub panels with the plurality of accelerators, and updating the data of the sub panel(s) on which no QR factorization has been performed among the at least two sub panels by using the factorization result.
    Type: Grant
    Filed: July 27, 2012
    Date of Patent: September 24, 2013
    Assignee: International Business Machines Corporation
    Inventors: Hui Li, Bai Ling Wang
  • Patent number: 8510357
    Abstract: Circuitry for adding together three long numbers may include the formation of redundant form sum bit signals and redundant form carry bit signals. These signals may be finally combined in a ripple carry adder chain that produces sum bit output signals and ripple carry bit signals. Both a ripple carry bit signal and a redundant form carry bit signal must be passed from the circuitry performing each place of the addition to the circuitry performing the next-more-significant place of the addition. Various techniques are disclosed for facilitating subdividing long chains of such circuitry, as well as possibly including (between such subdivisions) “pipeline” registers for both ripple and redundant form carry bit signals.
    Type: Grant
    Filed: January 31, 2013
    Date of Patent: August 13, 2013
    Assignee: Altera Corporation
    Inventor: Martin Langhammer
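
The two-stage structure described in this abstract (a redundant sum/carry form followed by a ripple-carry chain) can be modeled bit by bit. The sketch below assumes non-negative integer operands and a 64-bit operand width; the function names are hypothetical, and the subdivision and pipeline-register techniques the patent discloses are not modeled.

```python
def carry_save(a, b, c):
    """3:2 compression: reduce three numbers to a sum word and a carry word."""
    sum_bits = a ^ b ^ c                              # per-bit sum, no carry propagation
    carry_bits = ((a & b) | (a & c) | (b & c)) << 1   # per-bit majority, shifted up one place
    return sum_bits, carry_bits


def ripple_add(x, y, width):
    """Add two numbers one bit position at a time, passing a carry upward."""
    result, carry = 0, 0
    for i in range(width):
        xb = (x >> i) & 1
        yb = (y >> i) & 1
        result |= (xb ^ yb ^ carry) << i
        carry = (xb & yb) | (xb & carry) | (yb & carry)
    return result | (carry << width)  # keep the final carry-out


def add3(a, b, c, width=64):
    """Add three width-bit numbers: redundant form first, then one ripple chain."""
    sum_bits, carry_bits = carry_save(a, b, c)
    # Two extra bit positions cover the growth from adding three operands.
    return ripple_add(sum_bits, carry_bits, width + 2)


total = add3(123456789, 987654321, 555)
```

Only the final `ripple_add` has a carry chain; the `carry_save` step works independently on every bit position.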
  • Publication number: 20130191431
    Abstract: A processor for calculating a convolution of a first input sequence of numbers with a second input sequence of numbers to generate an output sequence is provided. The processor includes multipliers, each multiplying two real numbers to generate an output; multiplexers to direct the numbers in the first and second input sequences or parts of the numbers to the multipliers; and control circuitry to control the multiplexers to direct the first and second input sequences of numbers to the multipliers dependent on whether the numbers are complex or real. An accumulator adds partial products from multiplications performed by the multipliers to calculate the convolution.
    Type: Application
    Filed: January 19, 2012
    Publication date: July 25, 2013
    Inventors: Srinivasan Iyer, Carsten Aagaard Pedersen
  • Patent number: 8396914
    Abstract: Circuitry speeds up the Cholesky decomposition of a matrix. The circuitry can be provided in a fixed logic device, or can be configured into a programmable integrated circuit device such as a programmable logic device. The circuitry implements the following equation: $l_{ij} = \frac{a_{ij} - \langle L_i, L_j \rangle}{\sqrt{a_{jj} - \langle L_j, L_j \rangle}}$. When any $l_{ij}$ term is calculated this way, the latency in calculating the $l_{jj}$ term in the denominator has little or no effect on the $l_{ij}$ term calculation. And if the calculations are properly pipelined, once the pipeline is filled, a new term can be output on each clock cycle or every few clock cycles.
    Type: Grant
    Filed: September 11, 2009
    Date of Patent: March 12, 2013
    Assignee: Altera Corporation
    Inventor: Martin Langhammer
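
As a software reference for the recurrence in this abstract, the sketch below computes each entry of the lower-triangular factor as an entry of A minus an inner product of two partial rows of L, divided by the square root of the matching diagonal expression. It assumes a symmetric positive-definite input given as a list of lists; the function name is hypothetical, and nothing here models the pipelining or latency hiding the patent is actually about.

```python
import math


def cholesky_lower(a):
    """Return L (lower triangular) such that A = L * L^T, column by column."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # <L_j, L_j>: inner product of the already computed part of row j with itself.
        inner_jj = sum(l[j][k] * l[j][k] for k in range(j))
        denom = math.sqrt(a[j][j] - inner_jj)
        for i in range(j, n):
            # <L_i, L_j>: inner product of the already computed parts of rows i and j.
            inner_ij = sum(l[i][k] * l[j][k] for k in range(j))
            # For i == j this reduces to the square root itself.
            l[i][j] = (a[i][j] - inner_ij) / denom
    return l


A = [[4.0, 12.0, -16.0],
     [12.0, 37.0, -43.0],
     [-16.0, -43.0, 98.0]]
L = cholesky_lower(A)
```

Writing the denominator in terms of entries of A and inner products, rather than as a previously stored diagonal term, is what lets the numerator and denominator be evaluated side by side.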
  • Patent number: 8392488
    Abstract: Circuitry for adding together three long numbers may include the formation of redundant form sum bit signals and redundant form carry bit signals. These signals may be finally combined in a ripple carry adder chain that produces sum bit output signals and ripple carry bit signals. Both a ripple carry bit signal and a redundant form carry bit signal must be passed from the circuitry performing each place of the addition to the circuitry performing the next-more-significant place of the addition. Various techniques are disclosed for facilitating subdividing long chains of such circuitry, as well as possibly including (between such subdivisions) “pipeline” registers for both ripple and redundant form carry bit signals.
    Type: Grant
    Filed: September 11, 2009
    Date of Patent: March 5, 2013
    Assignee: Altera Corporation
    Inventor: Martin Langhammer
  • Patent number: 8296350
    Abstract: The present invention provides a method and apparatus for QR-factorizing matrix on a multiprocessor system, wherein the multiprocessor system comprises at least one core processor and a plurality of accelerators, the method comprises the steps of: iteratively factorizing each panel in the matrix until the whole matrix is factorized; wherein in each iteration, the method comprises: partitioning an unprocessed matrix part in the matrix into a plurality of blocks according to a predetermined block size; partitioning a current processed panel in the unprocessed matrix part into at least two sub panels, wherein the current processed panel is composed of a plurality of blocks; and performing QR factorization one by one on the at least two sub panels with the plurality of accelerators, and updating the data of the sub panel(s) on which no QR factorization has been performed among the at least two sub panels by using the factorization result.
    Type: Grant
    Filed: March 12, 2009
    Date of Patent: October 23, 2012
    Assignee: International Business Machines Corporation
    Inventors: Hui Li, Bai Ling Wang
  • Publication number: 20120233234
    Abstract: A processing unit, system, and method for performing a multiply operation in a multiply-add pipeline. To reduce the pipeline latency, the unrounded result of a multiply-add operation is bypassed to the inputs of the multiply-add pipeline for use in a subsequent operation. If it is determined that rounding is required for the prior operation, then the rounding will occur during the subsequent operation. During the subsequent operation, a Booth encoder not utilized by the multiply operation will output a rounding correction factor as a selection input to a Booth multiplexer not utilized by the multiply operation. When the Booth multiplexer receives the rounding correction factor, the Booth multiplexer will output a rounding correction value to a carry save adder (CSA) tree, and the CSA tree will generate the correct sum from the rounding correction value and the other partial products.
    Type: Application
    Filed: March 8, 2011
    Publication date: September 13, 2012
    Inventors: Jeffrey S. Brooks, Christopher H. Olson
  • Publication number: 20120221614
    Abstract: Implementing an unfused multiply-add instruction within a fused multiply-add pipeline. The system may include an aligner having an input for receiving an addition term, a multiplier tree having two inputs for receiving a first value and a second value for multiplication, and a first carry save adder (CSA), wherein the first CSA may receive partial products from the multiplier tree and an aligned addition term from the aligner. The system may include a fused/unfused multiply add (FUMA) block which may receive the first partial product, the second partial product, and the aligned addition term, wherein the first partial product and the second partial product are not truncated. The FUMA block may perform an unfused multiply add operation or a fused multiply add operation using the first partial product, the second partial product, and the aligned addition term, e.g., depending on an opcode or mode bit.
    Type: Application
    Filed: May 11, 2012
    Publication date: August 30, 2012
    Inventors: Jeffrey S. Brooks, Christopher H. Olson
  • Patent number: 8161308
    Abstract: A circuit includes: an input buffer for storing input data; a plurality of processing sections connected in series including a head processing section and a tail-end processing section to sequentially process the input data; and a power supply controller for controlling power supply to each of the plurality of processing sections depending on a lapse of time during which no input data is stored in the input buffer.
    Type: Grant
    Filed: March 27, 2009
    Date of Patent: April 17, 2012
    Assignee: NEC Corporation
    Inventor: Hidenori Hisamatsu
  • Patent number: 7769099
    Abstract: The invention relates to techniques for implementing high-speed precoders, such as Tomlinson-Harashima (TH) precoders. In one aspect of the invention, look-ahead techniques are utilized to pipeline a TH precoder, resulting in a high-speed TH precoder. These techniques may be applied to pipeline various types of TH precoders, such as Finite Impulse Response (FIR) precoders and Infinite Impulse Response (IIR) precoders. In another aspect of the invention, parallel processing of multiple non-pipelined TH precoders results in a high-speed parallel TH precoder design. Utilization of high-speed TH precoders may enable network providers to, for example, operate 10 Gigabit Ethernet with copper cable rather than fiber optic cable.
    Type: Grant
    Filed: September 13, 2005
    Date of Patent: August 3, 2010
    Assignee: Leanics Corporation
    Inventors: Keshab K. Parhi, Yongru Gu
  • Patent number: 7747020
    Abstract: Performing a hash algorithm in a processor architecture to alleviate performance bottlenecks and improve overall algorithm performance. In one embodiment of the invention, the hash algorithm is pipelined within the processor architecture.
    Type: Grant
    Filed: December 4, 2003
    Date of Patent: June 29, 2010
    Assignee: Intel Corporation
    Inventor: Wajdi K. Feghali
  • Publication number: 20100121899
    Abstract: Efficient computation of complex long multiplication results and an efficient calculation of a covariance matrix are described. A parallel array VLIW digital signal processor is employed along with specialized complex long multiplication instructions and communication operations between the processing elements which are overlapped with computation to provide very high performance operation. Successive iterations of a loop of tightly packed VLIWs may be used allowing the complex multiplication pipeline hardware to be efficiently used.
    Type: Application
    Filed: January 19, 2010
    Publication date: May 13, 2010
    Applicant: Altera Corporation
    Inventors: Gerald G. Pechanek, Ricardo Rodriguez, Matthew Plonski, David Strube, Kevin Coopman
  • Patent number: 7711955
    Abstract: An apparatus and method for cryptographic key expansion. According to a first embodiment, a cryptographic unit may include key storage configured to store an expanded set of cipher keys for a cipher algorithm, and a key expansion pipeline comprising a plurality of pipeline stages. During a key expansion mode of operation, each pipeline stage may be configured to perform a corresponding step of generating a member of the expanded set of cipher keys according to a key expansion algorithm. During a cipher mode of operation, a portion of the key expansion pipeline may be configured to perform a step of the cipher algorithm.
    Type: Grant
    Filed: September 13, 2004
    Date of Patent: May 4, 2010
    Assignee: Oracle America, Inc.
    Inventors: Christopher H. Olson, Leonard D. Rarick, Gregory F. Grohoski
  • Publication number: 20090172064
    Abstract: This invention proposes a new algorithm. By multiplying by the proposed weight coefficients, CSP and CSS can be computed without computing the mean(s) of the data. After the proposed weight coefficients are factorized, they yield a new recursive computation method that can be updated in real time. To test the accuracy of the invention, the StRD data were tested separately using SAS ver 9.0, SPSS ver 15.0 and EXCEL 2007 for comparison. The results showed that the accuracy of the proposed invention exceeds that of SAS ver 9.0, SPSS ver 15.0 and EXCEL 2007. Besides being accurate, the new algorithm is also computationally efficient.
    Type: Application
    Filed: December 24, 2008
    Publication date: July 2, 2009
    Inventor: Juei-chao Chen
  • Publication number: 20090132628
    Abstract: A method for performing decimal division including receiving a scaled divisor and a scaled dividend into input registers. A subset of multiples of the scaled divisor is stored in a plurality of multiples registers. Quotient digits are calculated in response to the scaled divisor and the scaled dividend. Each quotient digit is calculated in three clock cycles by a pipeline mechanism. The calculating includes selecting a new quotient digit, and calculating a new remainder. Input to the calculating a new remainder includes data from one or more of the multiples registers.
    Type: Application
    Filed: January 23, 2009
    Publication date: May 21, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Steven R. Carlough, Paulomi Kadakia, Wen H. Li, Eric M. Schwarz
  • Publication number: 20090037508
    Abstract: Device for implementing modular multiplication, characterized in that it comprises at least one computation cell comprising a multiplier-adder comprising p pipelined logic-register pairs, receiving several digits to be added together and multiplied, at least two outputs corresponding to the low order and to the high order, an adder receiving the two outputs of the multiplier-adder, the number p being chosen in such a way that the maximum frequency of the multiplier-adder is greater than or equal to the maximum frequency of the adder.
    Type: Application
    Filed: March 31, 2006
    Publication date: February 5, 2009
    Inventors: Florent Bernard, Alain Sauzet, Eric Garrido
  • Patent number: 7406595
    Abstract: A method of packet encryption and decryption that allows for pipelining. The first step is to identify the packets in a message to be encrypted. Then, a unique number is assigned to each packet. A value R is acquired. Then, a first register is initialized. An initialization vector IV is generated. Then, the first register is stepped a user-definable number of times. Then, a packet is selected. R and the unique number are combined. Then, a second register is initialized. A checksum is generated. Then, the packet is divided into blocks. A block is selected. Then, the checksum is combined with the block and designated the checksum. The block is encrypted. Then, the first and second registers are stepped. These steps are repeated for each block. Then, the checksum is encrypted. After the blocks are encrypted, the unique number, IV, the ciphertext of each block, and the encrypted checksum are transmitted. If there are any other packets to encrypt then the steps are repeated.
    Type: Grant
    Filed: May 5, 2004
    Date of Patent: July 29, 2008
    Assignee: The United States of America as represented by the Director, National Security Agency
    Inventors: Vincent Michael Boyle, Jr., Christopher Mark Salter
  • Patent number: 7274369
    Abstract: Digital Image compositing using a programmable graphics processor is described. The programmable graphics processor supports high-precision data formats and can be programmed to complete a plurality of compositing operations in a single pass through a fragment processing pipeline within the programmable graphics processor. Source images for one or more compositing operations are stored in graphics memory, and a resulting composited image is output or stored in graphics memory. More-complex compositing operations, such as blur, warping, morphing, and the like, can be completed in multiple passes through the fragment processing pipeline. A composited image produced during a pass through the fragment processing pipeline is stored in graphics memory and is available as a source image for a subsequent pass.
    Type: Grant
    Filed: June 9, 2005
    Date of Patent: September 25, 2007
    Assignee: NVIDIA Corporation
    Inventors: Rui M. Bastos, Daniel Elliott Wexler, Larry Gritz, Jonathan Rice, Harold Robert Feldman Zatz, Matthew N. Papakipos, David Kirk
  • Patent number: 7062633
    Abstract: A state flag detection means 150 decides whether first source data from the memory 101 is data to be subjected to arithmetic, and the result of the decision is retained as a state flag. A condition decision means 109 decides whether the state flag satisfies a condition for performing the arithmetic. A control means 110 controls whether an ALU 100 should perform the arithmetic on the basis of the condition satisfaction/dissatisfaction information.
    Type: Grant
    Filed: December 15, 1999
    Date of Patent: June 13, 2006
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Mana Hamada, Shunichi Kuromaru, Tomonori Yonezawa, Tsuyoshi Nakamura
  • Publication number: 20040158598
    Abstract: Disclosed herein is an arithmetic logic unit over a finite field GF(2^m). Arithmetic logic units consistent with the present invention are disclosed as implemented using a division algorithm based on a binary greatest common divisor algorithm and a Most Significant Bit-first multiplication algorithm. The arithmetic logic unit can perform both a multiplication and a division using shared logic. Since the arithmetic logic unit has no limitations in the selection of an irreducible polynomial, and it is very regular and easily formed as a module, the arithmetic logic unit of the present invention has high expansibility and flexibility with respect to the size m of a field. Further, since the arithmetic logic unit of the present invention can perform a multiplication and a division using shared logic, it is very suitable to implement an encryption system for application products requiring a small size, such as smart cards or wireless communication devices.
    Type: Application
    Filed: February 3, 2004
    Publication date: August 12, 2004
    Inventors: Chun Pyo Hong, ChangHoon Kim
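
A Most Significant Bit-first multiplication over GF(2^m) can be sketched in software as a shift, a conditional reduction, and a conditional add per multiplier bit. The sketch assumes a polynomial-basis representation in which field elements are integers below 2^m and the irreducible polynomial `poly` includes its x^m term; the function name and the AES field used in the example are choices made here for illustration, and the division half of the disclosed unit (based on a binary GCD algorithm) is not shown.

```python
def gf2m_mul_msb_first(a, b, poly, m):
    """Multiply a and b in GF(2^m), scanning the bits of b from the top down."""
    result = 0
    for i in reversed(range(m)):
        result <<= 1                  # multiply the running result by x
        if (result >> m) & 1:
            result ^= poly            # reduce modulo the irreducible polynomial
        if (b >> i) & 1:
            result ^= a               # add (XOR) the multiplicand for a set bit
    return result


# Example field: GF(2^8) with x^8 + x^4 + x^3 + x + 1 (0x11B), as used by AES.
product = gf2m_mul_msb_first(0x57, 0x83, 0x11B, 8)
```

Because `poly` and `m` are plain parameters, the same loop works for any irreducible polynomial and field size, which mirrors the flexibility the abstract emphasizes.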
  • Patent number: 6748521
    Abstract: A data processing system is provided with a digital signal processor which has an instruction for saturating multiple fields of a selected set of source operands and storing the separate saturated results in a selected destination register. A first 32-bit operand (600) and a second 32-bit operand (602) are treated as four 16-bit fields and the sixteen bits in each field are saturated separately. Multi-field saturation circuitry is operable to treat a source operand as a number of fields, such that a multi-field saturated (610) result is produced that includes a number of saturated results, each corresponding to one field. One instruction is provided which treats an operand pair as having two packed fields, and another instruction is provided that treats the operand pair as having four packed fields. Saturation circuitry is operable to selectively treat a field as either a signed value or an unsigned value.
    Type: Grant
    Filed: October 31, 2000
    Date of Patent: June 8, 2004
    Assignee: Texas Instruments Incorporated
    Inventor: David Hoyle
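
The four-field case can be illustrated as follows. This is a sketch under assumptions the abstract does not state: each 16-bit field is interpreted as a signed value, each is clamped to the unsigned 8-bit range, and the four results are packed into one 32-bit word with the fields of the first operand in the upper bytes. The function names are hypothetical and do not correspond to actual instruction mnemonics.

```python
def saturate(value, lo, hi):
    """Clamp value into the inclusive range [lo, hi]."""
    return max(lo, min(hi, value))


def signed_fields16(word):
    """Split a 32-bit word into two signed 16-bit fields (high field first)."""
    fields = []
    for shift in (16, 0):
        raw = (word >> shift) & 0xFFFF
        # Two's-complement interpretation of the 16-bit field.
        fields.append(raw - 0x10000 if raw & 0x8000 else raw)
    return fields


def pack_saturate_u8(op1, op2):
    """Saturate four signed 16-bit fields to unsigned 8 bits and pack them."""
    result = 0
    for field in signed_fields16(op1) + signed_fields16(op2):
        result = (result << 8) | saturate(field, 0, 255)
    return result


# Fields: 0x0123 (291) -> 255, 0xFFFF (-1) -> 0, 0x0042 (66) -> 66, 0x00FF (255) -> 255
packed = pack_saturate_u8(0x0123FFFF, 0x004200FF)
```

Each field is clamped independently, so an out-of-range value in one field never disturbs its neighbours in the packed result.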
  • Patent number: 6671708
    Abstract: An image processing apparatus according to the present invention comprises a general arithmetic circuit 101 comprising a program control circuit 103, a first address generator 104, a first data memory 105, a first pipeline operation circuit 106, a second address generator 113, a second data memory 114 and a second pipeline operation circuit 112, and a dedicated arithmetic circuit 102 comprising a control circuit 115, a first dedicated pipeline operation circuit 107, a second dedicated pipeline operation circuit 108, . . . , an N-th dedicated pipeline operation circuit 110, as shown in FIG. 1. The arithmetic unit having the above-described structure, for example, can realize an arithmetic unit which can be applied to various applications. Further, considering the age of IP (Intellectual Property) which will come in the future, the arithmetic unit can exhibit the flexibility toward the applications.
    Type: Grant
    Filed: August 31, 2000
    Date of Patent: December 30, 2003
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Shunichi Kuromaru, Mana Hamada, Tomonori Yonezawa, Masatoshi Matsuo, Tsuyoshi Nakamura, Masahiro Oohashi
  • Patent number: 6282556
    Abstract: A pipelined data path architecture for use, in one embodiment, in a multimedia processor. The data path architecture requires a maximum of two execution pipestages to perform all instructions including wide data format multiply instructions and specially adapted multimedia instructions, such as the sum of absolute differences (SABD) instruction and other multiply with add (MADD) instructions. The data path architecture includes two wide data format input registers that feed four partitioned 32×32 multiplier circuits. Within two pipestages, the multiply circuit can perform one 128×128 multiply operation, four 32×32 multiply operations, eight 16×16 multiply operations or sixteen 8×8 multiply operations in parallel. The multiply circuit contains a compressor tree which generates a 256-bit sum and a 256-bit carry vector. These vectors are supplied to four 64-bit carry propagate adder circuits which generate the multiply results.
    Type: Grant
    Filed: November 30, 1999
    Date of Patent: August 28, 2001
    Assignees: Sony Corporation of Japan, Sony Electronics, Inc.
    Inventors: Farzad Chehrazi, Vojin G. Oklobdzija
  • Patent number: 6052705
    Abstract: A digital video signal processor using parallel processing includes an input serial-access memory having memory cells in which data is inputted into successive ones of the memory cells in response to a programmed-controlled pointer and a three or more port data memory unit for writing-in data read out from the serial-access memory. An arithmetic logic unit responds to stored-program control to read out data from the data memory, perform a program-prescribed arithmetic operation, and write the result of the arithmetic operation back to the data memory. An output serial-access memory is controlled so that the arithmetic result will be outputted under program control in a sequential manner. Operation of the interconnected components is effected by a stored-program control unit connected to the input serial-access memory, the data memory, the arithmetic logic unit, and the output serial-access memory.
    Type: Grant
    Filed: August 23, 1996
    Date of Patent: April 18, 2000
    Assignee: Sony Corporation
    Inventors: Seiichiro Iwase, Masuyoshi Kurokawa, Takao Yamazaki, Mitsuharu Ohki
  • Patent number: 6049816
    Abstract: A pipeline stop circuit for an external memory access which is capable of effectively performing a pipeline operation by temporarily stopping a pipeline operation, which is being operated, until data are prepared in the memory accessed, when accessing an external memory or a slow internal memory.
    Type: Grant
    Filed: December 30, 1997
    Date of Patent: April 11, 2000
    Assignee: LG Electronics, Inc.
    Inventors: Bong-Kyun Kim, Jin-Hyeock Im
  • Patent number: 5964866
    Abstract: The invention relates to a processor having a data flow unit for processing data in a plurality of steps. In one version, the data flow unit includes a plurality of consecutive stages which include logic for performing steps of the data processing, the stages being coupled together by a data path, at least one stage being coupled to a transceiver which causes data to be provided to the stage for processing or to bypass the stage unprocessed in response to a stage enable signal; a synchronizer which receives processed data from the stages and causes the processed data to be provided to external logic in synchronization with a clock signal.
    Type: Grant
    Filed: October 24, 1996
    Date of Patent: October 12, 1999
    Assignee: International Business Machines Corporation
    Inventors: Christopher McCall Durham, Peter Juergen Klim