Matrix Array Patents (Class 708/514)
-
Patent number: 12175253
Abstract: According to one embodiment, a calculating device includes a first memory, a second memory, a third memory, a first arithmetic module, a second arithmetic module, a first conductive line electrically connecting a first output terminal of the first memory and a first input terminal of the first arithmetic module, a second conductive line electrically connecting a second output terminal of the first memory and a first input terminal of the second arithmetic module, a third conductive line electrically connecting a first output terminal of the second memory and a second input terminal of the second arithmetic module, a fourth conductive line electrically connecting a first output terminal of the third memory and a third input terminal of the second arithmetic module, and a fifth conductive line electrically connecting a first output terminal of the second arithmetic module and a second input terminal of the first arithmetic module.
Type: Grant
Filed: March 21, 2023
Date of Patent: December 24, 2024
Assignee: Kabushiki Kaisha Toshiba
Inventors: Kosuke Tatsumura, Hayato Goto
-
Patent number: 12068745
Abstract: Various implementations described herein are directed to a device having a scan chain that receives a multi-bit input, provides a multi-bit output, and provides a multi-bit multiplexer output based on the multi-bit input and the multi-bit output. The device may have an error-bit generator that receives the multi-bit multiplexer output, receives a portion of the multi-bit input, receives a portion of the multi-bit output, and provides an error-bit output based on the multi-bit multiplexer output, the portion of the multi-bit input, and the portion of the multi-bit output.
Type: Grant
Filed: August 31, 2021
Date of Patent: August 20, 2024
Assignee: Arm Limited
Inventors: Anil Kumar Baratam, Yves Thomas Laplanche
-
Patent number: 11989258
Abstract: Methods, systems, and apparatus for performing a matrix multiplication using a hardware circuit are described. An example method begins by obtaining an input activation value and a weight input value in a first floating point format. The input activation value and the weight input value are multiplied to generate a product value in a second floating point format that has higher precision than the first floating point format. A partial sum value is obtained in a third floating point format that has a higher precision than the first floating point format. The partial sum value and the product value are combined to generate an updated partial sum value that has the third floating point format.
Type: Grant
Filed: November 9, 2020
Date of Patent: May 21, 2024
Assignee: Google LLC
Inventors: Andrew Everett Phelps, Norman Paul Jouppi
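The multiply-accumulate step in this abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented circuit: it assumes bfloat16 as the first floating point format and IEEE binary32 as both the product and partial-sum formats (the abstract names none of them), and the helpers `to_bfloat16`, `to_float32`, `mac` and `dot` are hypothetical. Python floats are binary64, so binary32 rounding is emulated with the standard `struct` module.

```python
import struct


def to_float32(x: float) -> float:
    """Round a binary64 Python float to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bfloat16(x: float) -> float:
    """Round to bfloat16 (the upper 16 bits of binary32), ties to even.

    x is first rounded to binary32; NaN and infinity handling is omitted.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 16) & 1  # lowest kept bit, used to break ties toward even
    bits = (bits + 0x7FFF + lsb) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def mac(activation: float, weight: float, partial_sum: float) -> float:
    """One step: low-precision inputs, binary32 product and accumulation."""
    a = to_bfloat16(activation)
    w = to_bfloat16(weight)
    product = to_float32(a * w)  # "second format" (assumed binary32)
    return to_float32(partial_sum + product)  # "third format" (assumed binary32)


def dot(activations, weights):
    acc = 0.0
    for x, w in zip(activations, weights):
        acc = mac(x, w, acc)
    return acc
```

Because bfloat16 carries an 8-bit significand, the product of two bfloat16 inputs fits exactly in binary32's 24-bit significand (barring overflow or underflow), so in this sketch precision is lost only when inputs are rounded and when the partial sum is accumulated.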
-
Patent number: 11983530
Abstract: Systems and methods described herein may relate to providing a dynamically configurable circuitry able to process data associated with a variety of matrix dimensions using one or more complex number operations, one or more real number operations, or both. Configurations may be applied to the configurable circuitry to program the configurable circuitry for a next operation. The configurable circuitry may process data according to a variety of operations based at least in part on operation of a repeated processing element coupled in a compute network of processing elements.
Type: Grant
Filed: March 27, 2020
Date of Patent: May 14, 2024
Assignee: Intel Corporation
Inventors: Sumeet Singh Nagi, Farhana Sheikh, Scott Jeremy Weber, Uneeb Yaqub Rathore
-
Patent number: 11741345
Abstract: Provided are systems, methods, and integrated circuits for a neural network processing system. In various implementations, the system can include a first array of processing engines coupled to a first set of memory banks and a second array of processing engines coupled to a second set of memory banks. The first and second sets of memory banks store all the weight values for a neural network, where the weight values are stored before any input data is received. Upon receiving input data, the system performs a task defined for the neural network. Performing the task can include computing an intermediate result using the first array of processing engines, copying the intermediate result to the second set of memory banks, and computing a final result using the second array of processing engines, where the final result corresponds to an outcome of performing the task.
Type: Grant
Filed: September 25, 2020
Date of Patent: August 29, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Randy Huang, Ron Diamant
-
Patent number: 11704124
Abstract: Disclosed embodiments relate to executing a vector multiplication instruction. In one example, a processor includes fetch circuitry to fetch the vector multiplication instruction having fields for an opcode, first and second source identifiers, and a destination identifier; decode circuitry to decode the fetched instruction; and execution circuitry to execute the decoded instruction on each of a plurality of corresponding pairs of fixed-sized elements of the identified first and second sources to generate a double-sized product of each pair of fixed-sized elements, the double-sized product being represented by at least twice the number of bits of the fixed size, and to generate an unsigned fixed-sized result by rounding the most significant fixed-sized portion of the double-sized product to fit into the identified destination.
Type: Grant
Filed: January 11, 2022
Date of Patent: July 18, 2023
Assignee: Intel Corporation
Inventors: Venkateswara R. Madduri, Carl Murray, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Robert Valentine, Jesus Corbal
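The per-element arithmetic described above can be sketched as follows. This is a minimal model, assuming unsigned 16-bit source elements and round-half-up on the discarded low half; the element width, the signedness of the sources and the rounding mode are assumptions, and `mul_round_high` is a hypothetical name. Python lists stand in for vector register lanes.

```python
def mul_round_high(a_lanes, b_lanes, width=16):
    """Multiply lane pairs and keep the rounded most significant half.

    Each product has 2 * width bits; adding half of the kept portion's
    least significant bit before shifting implements round-half-up.
    """
    if len(a_lanes) != len(b_lanes):
        raise ValueError("sources must have the same number of lanes")
    limit = (1 << width) - 1
    half = 1 << (width - 1)
    result = []
    for a, b in zip(a_lanes, b_lanes):
        if not (0 <= a <= limit and 0 <= b <= limit):
            raise ValueError("lane value does not fit in the element width")
        product = a * b  # double-sized product
        result.append((product + half) >> width)
    return result
```

For unsigned inputs the rounded high half never exceeds `2**width - 2`, so this sketch needs no saturation: `mul_round_high([0xFFFF, 0x8000], [0xFFFF, 0x8000])` returns `[0xFFFE, 0x4000]`.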
-
Patent number: 11120101
Abstract: The present disclosure advantageously provides a system and method for efficiently multiplying matrices with elements that have a value of 0. A bitmap is generated for each matrix. Each bitmap includes a bit position for each matrix element. The value of each bit is set to 0 when the value of the corresponding matrix element is 0, and to 1 when the value of the corresponding matrix element is not 0. Each matrix is compressed into a compressed matrix, which will have fewer elements with a value of 0 than the original matrix. Each bitmap is then adjusted based on the corresponding compressed matrix. The compressed matrices are then multiplied to generate an output matrix. For each element i,j in the output matrix, a dot product of the ith row of the first compressed matrix and the jth column of the second compressed matrix is calculated based on the bitmaps.
Type: Grant
Filed: September 27, 2019
Date of Patent: September 14, 2021
Assignee: Arm Limited
Inventors: Zhi-Gang Liu, Matthew Mattina, Paul Nicholas Whatmough
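The bitmap idea in this abstract can be sketched in Python as a simplified model: each row of the first matrix and each column of the second is stored as a bitmap plus its list of non-zero values, and a dot product visits only positions where both bitmaps have a 1. This sketch does not model the abstract's bitmap-adjustment step, and the function names are hypothetical.

```python
def compress(vector):
    """Return (bitmap, non-zero values); bit k is set when vector[k] != 0."""
    bitmap, values = 0, []
    for k, v in enumerate(vector):
        if v != 0:
            bitmap |= 1 << k
            values.append(v)
    return bitmap, values


def popcount(x):
    return bin(x).count("1")


def sparse_dot(a, b):
    """Dot product of two compressed vectors, touching only shared non-zeros."""
    (bm_a, vals_a), (bm_b, vals_b) = a, b
    both = bm_a & bm_b  # positions where both operands are non-zero
    total = 0
    while both:
        low = both & -both  # isolate the lowest set bit
        below = low - 1     # mask of all lower bit positions
        # The number of set bits below a position is that element's index
        # in the compressed value list.
        total += vals_a[popcount(bm_a & below)] * vals_b[popcount(bm_b & below)]
        both ^= low
    return total


def sparse_matmul(A, B):
    rows = [compress(row) for row in A]
    cols = [compress([B[r][c] for r in range(len(B))]) for c in range(len(B[0]))]
    return [[sparse_dot(r, c) for c in cols] for r in rows]
```

For example, `sparse_matmul([[1, 0, 2], [0, 0, 3]], [[4, 0], [0, 5], [6, 0]])` performs only three multiplications and returns `[[16, 0], [18, 0]]`.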
-
Patent number: 11010130
Abstract: The present invention discloses a floating point processor prototype of multi-channel data. Building the architecture comprises the following steps: arranging structured data, semi-structured data and unstructured data into a three-way array; decomposing the three-way array into a matrix pattern of a second-order tensor by using higher-order singular value decomposition; and converting the matrix pattern into a sparse domain to conduct block floating point quantization. A floating point processor prototype of multi-channel data is built.
Type: Grant
Filed: August 9, 2017
Date of Patent: May 18, 2021
Assignee: Shanghai DataCenter Science Co., LTD
Inventors: Jun Zhang, Ke Xu, Xiaofeng Chen
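Only the final step above, block floating point quantization, lends itself to a short sketch; the higher-order SVD and sparse-domain conversion are not modeled. This minimal version assumes one shared exponent per block, taken from the largest magnitude, and 8-bit signed integer mantissas rounded to nearest (Python's `round` breaks ties to even). The function names and the mantissa width are hypothetical.

```python
import math


def bfp_quantize(block, mantissa_bits=8):
    """Return (shared_exponent, integer mantissas) for one block of floats."""
    max_abs = max(abs(v) for v in block)
    if max_abs == 0:
        return 0, [0] * len(block)
    _, exp = math.frexp(max_abs)  # max_abs = f * 2**exp with 0.5 <= f < 1
    scale = 2.0 ** (mantissa_bits - 1 - exp)
    limit = 2 ** (mantissa_bits - 1) - 1
    # Clamp in case rounding pushes the largest value one step past the range.
    mantissas = [max(-limit, min(limit, round(v * scale))) for v in block]
    return exp, mantissas


def bfp_dequantize(exp, mantissas, mantissa_bits=8):
    scale = 2.0 ** (exp - (mantissa_bits - 1))
    return [m * scale for m in mantissas]
```

Every element in a block shares one exponent, so small values in a block dominated by a large one lose relative precision; the absolute error per element is at most half a mantissa step unless clamping occurs.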
-
Patent number: 10970201
Abstract: A system, apparatus and method for utilizing a transpose function to generate a two-dimensional array from three-dimensional input data. The use of the transpose function reduces redundant elements in the resultant two-dimensional array thereby increasing efficiency and decreasing power consumption.
Type: Grant
Filed: October 24, 2018
Date of Patent: April 6, 2021
Assignee: Arm Limited
Inventor: Paul Nicholas Whatmough
-
Patent number: 10831862
Abstract: Methods, systems, and apparatus for performing a matrix multiplication using a hardware circuit are described. An example method begins by obtaining an input activation value and a weight input value in a first floating point format. The input activation value and the weight input value are multiplied to generate a product value in a second floating point format that has higher precision than the first floating point format. A partial sum value is obtained in a third floating point format that has a higher precision than the first floating point format. The partial sum value and the product value are combined to generate an updated partial sum value that has the third floating point format.
Type: Grant
Filed: March 20, 2020
Date of Patent: November 10, 2020
Assignee: Google LLC
Inventors: Andrew Everett Phelps, Norman Paul Jouppi
-
Patent number: 10803379
Abstract: Provided are systems, methods, and integrated circuits for a neural network processing system. In various implementations, the system can include a first array of processing engines coupled to a first set of memory banks and a second array of processing engines coupled to a second set of memory banks. The first and second sets of memory banks store all the weight values for a neural network, where the weight values are stored before any input data is received. Upon receiving input data, the system performs a task defined for the neural network. Performing the task can include computing an intermediate result using the first array of processing engines, copying the intermediate result to the second set of memory banks, and computing a final result using the second array of processing engines, where the final result corresponds to an outcome of performing the task.
Type: Grant
Filed: December 12, 2017
Date of Patent: October 13, 2020
Assignee: Amazon Technologies, Inc.
Inventors: Randy Huang, Ron Diamant
-
Patent number: 10769238
Abstract: Techniques facilitating matrix multiplication on a systolic array are provided. A computer-implemented method can comprise populating, by a system operatively coupled to a processor, respective first registers of one or more processing elements of a systolic array structure with respective input data bits of a first data matrix. The one or more processing elements can comprise a first processing element that comprises a first input data bit of the first data matrix and a first activation bit of a second data matrix. The method can also include determining, by the system, at the first processing element, a first partial sum of a third data matrix. Further, the method can include streaming, by the system, the first partial sum of the third data matrix from the first processing element.
Type: Grant
Filed: September 19, 2019
Date of Patent: September 8, 2020
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Chia-Yu Chen, Jungwook Choi, Kailash Gopalakrishnan, Victor Han, Vijayalakshmi Srinivasan, Jintao Zhang
-
Patent number: 10768899
Abstract: A configurable circuit configurable according to the data width of elements of a matrix is described that includes a memory array, logic to write a matrix to the memory array having elements with a data width which can be specified using configuration data, logic for a transpose read of the matrix as-written and logic for normal read of the matrix as-written. The memory array includes first and second read ports operable in parallel. Transpose read logic and normal read logic can be coupled to the first and second read ports, respectively, allowing transpose and normal read of a matrix simultaneously.
Type: Grant
Filed: January 29, 2019
Date of Patent: September 8, 2020
Assignee: SambaNova Systems, Inc.
Inventors: David Alan Koeplinger, Raghu Prabhakar, Ram Sivaramakrishnan, David Brian Jackson, Mark Luttrell
-
Patent number: 10621269
Abstract: Methods, systems, and apparatus for performing a matrix multiplication using a hardware circuit are described. An example method begins by obtaining an input activation value and a weight input value in a first floating point format. The input activation value and the weight input value are multiplied to generate a product value in a second floating point format that has higher precision than the first floating point format. A partial sum value is obtained in a third floating point format that has a higher precision than the first floating point format. The partial sum value and the product value are combined to generate an updated partial sum value that has the third floating point format.
Type: Grant
Filed: May 17, 2018
Date of Patent: April 14, 2020
Assignee: Google LLC
Inventors: Andrew Everett Phelps, Norman Paul Jouppi
-
Patent number: 10452744
Abstract: Techniques related to memory management for sparse matrix multiplication are disclosed. Computing device(s) may perform a method for multiplying a row of a first sparse matrix with a second sparse matrix to generate a product matrix row. A compressed representation of the second sparse matrix is stored in main memory. The compressed representation comprises a values array that stores non-zero value(s). Tile(s) corresponding to row(s) of second sparse matrix are loaded into scratchpad memory. The tile(s) comprise set(s) of non-zero value(s) of the values array. A particular partition of an uncompressed representation of the product matrix row is generated in the scratchpad memory. The particular partition corresponds to a partition of the second sparse matrix comprising non-zero value(s) included in the tile(s). When a particular tile is determined to comprise non-zero value(s) that are required to generate the particular partition, the particular tile is loaded into the scratchpad memory.
Type: Grant
Filed: March 27, 2017
Date of Patent: October 22, 2019
Assignee: Oracle International Corporation
Inventors: Sandeep R. Agrawal, Sam Idicula, Nipun Agarwal
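A row-by-row sparse product of this kind can be sketched with the second matrix in CSR form (row pointer, column index and value arrays). In this minimal model the output row is built one column partition at a time in a dense Python list that stands in for the scratchpad, and for each partition only the slice of each needed row of the second matrix that falls in that partition is read. CSR, the sorted column indices, the partition width and the function name are assumptions; transfers between main memory and scratchpad are not modeled.

```python
from bisect import bisect_left


def csr_row_times_csr(a_cols, a_vals, b_indptr, b_indices, b_data,
                      n_cols, partition_width):
    """Multiply one sparse row (a_cols, a_vals) by a CSR matrix.

    Returns the non-zero entries of the product row as (column, value).
    """
    result = []
    for start in range(0, n_cols, partition_width):
        stop = min(start + partition_width, n_cols)
        acc = [0.0] * (stop - start)  # dense partition of the output row
        for k, a_val in zip(a_cols, a_vals):
            lo, hi = b_indptr[k], b_indptr[k + 1]
            # Column indices within a CSR row are assumed sorted, so bisect
            # finds the part of row k that lies inside this partition.
            first = bisect_left(b_indices, start, lo, hi)
            last = bisect_left(b_indices, stop, lo, hi)
            for p in range(first, last):  # empty when row k has no entries here
                acc[b_indices[p] - start] += a_val * b_data[p]
        result.extend((start + j, v) for j, v in enumerate(acc) if v != 0)
    return result
```

Keeping each partition dense makes accumulation a direct index update, while the partition width bounds how much scratchpad the uncompressed output needs at any one time.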
-
Patent number: 10191739
Abstract: A microprocessor unit (MPU) connected to external sensors is provided with an interface unit that acquires detection information from the external sensors and a digital signal processor (DSP) that estimates the state of a target object on the basis of the detection information acquired by the interface unit and generates state information. The DSP is provided with SIMD-type arithmetic processing circuitry that processes multiple data items with a single instruction, and with single-precision floating point computing units. The interface unit outputs the state information generated by the DSP to an externally provided main processor. As a result, power consumption can be reduced.
Type: Grant
Filed: September 22, 2016
Date of Patent: January 29, 2019
Assignee: MEGACHIPS CORPORATION
Inventors: Mahito Matsumoto, Tomoshige Kato, Takehiro Yoshimura, Takio Yamaoka, Yusuke Sasaki, Shingo Hamaguchi
-
Patent number: 9619152
Abstract: Techniques for transforming character delimited values are presented herein. An input module may be configured to read a set of character delimited values. A generation module may be configured to generate, in real-time, a synchronization block for the set of values that includes a nibble for each value in the set of values. The nibbles may represent either a byte size of the associated value or may be a flag representing a predetermined value. An output module may be configured to sequentially output the synchronization block and the set of values to a binary data output stream for output in a device dependent byte order according to the respective byte sizes of the values in the set of values.
Type: Grant
Filed: December 19, 2014
Date of Patent: April 11, 2017
Assignee: eBay Inc.
Inventors: Gang Ye, Thennarasu Ponnusamy, Belinda Liu, Enlin Wang, Mallikarjun Bhaigond, Amit Desai, Xin Zhuang, Preeta Joshi, Hong-Yen Nguyen
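The synchronization-block layout described above can be sketched as an encoder and matching decoder. Several details are assumptions, not taken from the abstract: values are non-negative integers, a nibble of 0 is the flag for the predetermined value 0 (so no payload bytes follow), nibbles are packed high-first two per byte, the value count is tracked outside the stream, and the "device dependent byte order" is a `byteorder` parameter. `encode_values` and `decode_values` are hypothetical names.

```python
def encode_values(text, delimiter=",", byteorder="little"):
    """Encode delimited integers as a sync block followed by the payload."""
    values = [int(field) for field in text.split(delimiter)]
    nibbles = []
    payload = bytearray()
    for value in values:
        if value < 0:
            raise ValueError("this sketch handles non-negative integers only")
        if value == 0:
            nibbles.append(0)  # flag nibble: predetermined value, no payload bytes
            continue
        size = (value.bit_length() + 7) // 8
        if size > 15:
            raise ValueError("byte size does not fit in a nibble")
        nibbles.append(size)
        payload += value.to_bytes(size, byteorder)
    if len(nibbles) % 2:
        nibbles.append(0)  # pad the last sync byte; the decoder ignores it
    sync = bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))
    return sync + bytes(payload), len(values)


def decode_values(blob, count, byteorder="little"):
    sync_len = (count + 1) // 2
    nibbles = []
    for byte in blob[:sync_len]:
        nibbles += [byte >> 4, byte & 0x0F]
    values, pos = [], sync_len
    for size in nibbles[:count]:
        # A zero-length slice decodes to 0, matching the flag convention.
        values.append(int.from_bytes(blob[pos:pos + size], byteorder))
        pos += size
    return values
```

Because all size information sits in the leading block, a reader can locate every value with one pass over the nibbles instead of scanning for delimiters.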
-
Patent number: 9329936
Abstract: A system, processor and method increase computational reliability by using underutilized portions of a data path with a SuperFMA ALU. The method reuses underutilized hardware to implement spatial redundancy, using detection during the dispatch stage to determine whether the operation may be executed by redundant hardware in the ALU. During execution, if it is determined that the conditions required by the redundant execution modes exist, the SuperFMA ALU performs the operation with redundant execution and compares the results for a match in order to generate a computational result. Increasing computational reliability through redundant execution is advantageous because the hardware cost of adding support for redundant execution is low, and the implementation complexity is minimal because existing hardware is reused.
Type: Grant
Filed: December 31, 2012
Date of Patent: May 3, 2016
Assignee: Intel Corporation
Inventor: Brian J. Hickman
-
Patent number: 9098460
Abstract: A matrix calculation system for calculating funny matrix multiplication (FMM) of a matrix A and a matrix B, including: sequentially calculating a permutation of indices {ai} in which values are arranged in a non-decreasing order with respect to each i-th row where i=1 to the number of rows of the matrix A; storing, as C[i, j], a value greater than any expected matrix value, with respect to each j-th column where j=1 to the number of columns of the matrix A in the i-th row; sequentially calculating a permutation of indices {bj} in which values are arranged in a non-decreasing order with respect to each j-th column where j=1 to the number of columns of the matrix B; and setting the values of C[i, j], which are i and j components of the matrix C.
Type: Grant
Filed: August 22, 2012
Date of Patent: August 4, 2015
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Hiroki Yanagisawa
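Funny matrix multiplication is the min-plus product, C[i][j] = min over k of (A[i][k] + B[k][j]). The sketch below shows why sorting helps: it visits each row of A in non-decreasing order and stops early once A[i][k] plus the smallest entry of column j of B can no longer beat the current C[i][j], which starts at infinity. This is a simplification of the abstract's procedure (which also sorts the columns of B), and the function names are hypothetical.

```python
import math


def min_plus_naive(A, B):
    n, m, p = len(A), len(B), len(B[0])
    return [[min(A[i][k] + B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def min_plus_sorted(A, B):
    n, m, p = len(A), len(B), len(B[0])
    col_min = [min(B[k][j] for k in range(m)) for j in range(p)]
    C = [[math.inf] * p for _ in range(n)]  # "greater than any expected value"
    for i in range(n):
        order = sorted(range(m), key=lambda k: A[i][k])  # non-decreasing A[i][k]
        for j in range(p):
            best = math.inf
            for k in order:
                # Every remaining k has A[i][k] at least this large, so no
                # later candidate can fall below this lower bound.
                if A[i][k] + col_min[j] >= best:
                    break
                best = min(best, A[i][k] + B[k][j])
            C[i][j] = best
    return C
```

Both functions return the same matrix; the sorted version can skip most of the inner loop when each row of A has a few small entries.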
-
Patent number: 8943119
Abstract: A system and a method are configured to improve the performance of general-purpose processors by implementing a functional unit that computes the product of a matrix operand with a vector operand, producing a vector result. The functional unit fully utilizes the entire resources of a 128-bit by 128-bit multiplier regardless of the operand size, as the number of elements of the matrix and vector operands increases as operand size is reduced. The unit performs both fixed-point and floating-point multiplications and additions with the highest-possible intermediate accuracy with modest resources.
Type: Grant
Filed: May 2, 2012
Date of Patent: January 27, 2015
Assignee: Microunity Systems Engineering, Inc.
Inventors: Craig Hansen, Bruce Bateman, John Moussouris
-
Patent number: 8782115
Abstract: A matrix decomposition circuit is described. In one implementation, the matrix decomposition circuit includes a processing element to process a plurality of processing cells and a scheduler coupled to the processing element, where the scheduler instructs the processing element to process only required processing cells of the plurality of processing cells. In one specific implementation, the required processing cells are processing cells with non-zero inputs.
Type: Grant
Filed: April 18, 2008
Date of Patent: July 15, 2014
Assignee: Altera Corporation
Inventor: Kulwinder Dhanoa
-
Patent number: 8649508
Abstract: A system and method for implementing elliptic curve scalar multiplication in cryptography, in which the scalar is expressed in the Double Base Number System with exponents in decreasing order, and this representation is then used to compute the elliptic curve scalar multiplication over a finite elliptic curve.
Type: Grant
Filed: September 29, 2008
Date of Patent: February 11, 2014
Assignee: Tata Consultancy Services Ltd.
Inventor: Natarajan Vijayarangan
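The idea can be sketched in two parts: a greedy decomposition of a scalar n into terms 2^a·3^b whose exponents never increase, and a Horner-style evaluation that turns those terms into doublings, triplings and additions. The greedy rule is an assumption (the abstract does not say how the expansion is found), the function names are hypothetical, and to stay short the group operations are passed in as functions; the test uses plain integers under addition as a stand-in for elliptic curve points, not real curve arithmetic.

```python
def dbns_greedy(n):
    """Write n >= 1 as a sum of 2**a * 3**b with non-increasing a and b."""
    terms = []
    a_max = n.bit_length()
    b_max = 0
    while 3 ** (b_max + 1) <= n:
        b_max += 1
    while n > 0:
        best, best_ab = 0, None
        for b in range(b_max + 1):
            p3 = 3 ** b
            if p3 > n:
                break
            a = min(a_max, (n // p3).bit_length() - 1)  # largest a with 2**a * p3 <= n
            if (1 << a) * p3 > best:
                best, best_ab = (1 << a) * p3, (a, b)
        terms.append(best_ab)
        n -= best
        a_max, b_max = best_ab  # later exponents may not exceed earlier ones
    return terms


def scalar_multiply(n, point, add, double, triple):
    """Compute n*point from the decreasing-exponent expansion (Horner's rule)."""
    terms = dbns_greedy(n)
    q = point
    for (a1, b1), (a2, b2) in zip(terms, terms[1:]):
        for _ in range(a1 - a2):
            q = double(q)
        for _ in range(b1 - b2):
            q = triple(q)
        q = add(q, point)
    a_last, b_last = terms[-1]
    for _ in range(a_last):
        q = double(q)
    for _ in range(b_last):
        q = triple(q)
    return q
```

For example, `dbns_greedy(100)` returns `[(5, 1), (2, 0)]`, that is 100 = 2^5·3 + 2^2, so `scalar_multiply` needs only one point addition. Decreasing exponents are what let the evaluation run as a single chain of doublings and triplings.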
-
Patent number: 8620976
Abstract: A machine-implemented method for computerized digital signal processing including obtaining a digital signal from data storage or from conversion of an analog signal, and determining, from the digital signal, one or more measuring matrices. Each measuring matrix has a plurality of cells, and each cell has an amplitude corresponding to the signal energy in a frequency bin for a time slice. Cells in each measuring matrix having maximum amplitudes along a time slice and/or frequency bin are identified as maximum cells. Maxima that coincide in time and frequency are identified, and a correlated maxima matrix, called a "Precision Measuring Matrix", is constructed showing the coinciding maxima; adjacent marked maxima are then linked into partial chains.
Type: Grant
Filed: May 11, 2011
Date of Patent: December 31, 2013
Assignee: Paul Reed Smith Guitars Limited Partnership
Inventors: Paul Reed Smith, Frederick M. Slay, Ernestine M. Smith
-
Patent number: 8612502
Abstract: Systems and methodologies are described that facilitate equalization of received signals in a wireless communication environment. Multiple transmit and/or receive antennas can be employed, utilizing MIMO technology to enhance performance. A single tile of transmitted data, including a set of modulation symbols, can be received at multiple receive antennas, resulting in multiple tiles of received modulation symbols. Corresponding modulation symbols from multiple received tiles can be processed as a function of channel and interference estimates to generate a single equalized modulation symbol. Typically, the equalization process is computationally expensive. However, the channels are highly correlated. This correlation is reflected in the channel estimates and can be utilized to reduce complex equalization operations. In particular, a subset of the equalizers can be generated based upon the equalizer function and the remainder can be generated using interpolation. In addition, the equalizer function itself can be simplified.
Type: Grant
Filed: March 20, 2008
Date of Patent: December 17, 2013
Assignee: QUALCOMM Incorporated
Inventors: Petru Cristian Budianu, Hermanth Sampath, Alexei Gorokhov, Dhananjay A. Gore
-
Patent number: 8554820
Abstract: A block matrix multiplication mechanism is provided for reversing the visitation order of blocks at corner turns when performing a block matrix multiplication operation in a data processing system. By reversing the visitation order, the mechanism eliminates a block load at the corner turns. In accordance with the illustrative embodiment, such a reversed corner turn is referred to as a "bounce" corner turn and results in a serpentine processing order of the matrix blocks. The mechanism allows the data processing system to perform a block matrix multiplication operation with a maximum of three block transfers per time step. Therefore, the mechanism reduces the maximum required transfer throughput and increases performance. In addition, the mechanism reduces the number of multi-buffered local store buffers.
Type: Grant
Filed: April 20, 2012
Date of Patent: October 8, 2013
Assignee: International Business Machines Corporation
Inventors: Daniel A. Brokenshire, John A. Gunnels, Michael D. Kistler
-
Patent number: 8533251
Abstract: A block matrix multiplication mechanism is provided for reversing the visitation order of blocks at corner turns when performing a block matrix multiplication operation in a data processing system. By reversing the visitation order, the mechanism eliminates a block load at the corner turns. In accordance with the illustrative embodiment, such a reversed corner turn is referred to as a "bounce" corner turn and results in a serpentine processing order of the matrix blocks. The mechanism allows the data processing system to perform a block matrix multiplication operation with a maximum of three block transfers per time step. Therefore, the mechanism reduces the maximum required transfer throughput and increases performance. In addition, the mechanism reduces the number of multi-buffered local store buffers.
Type: Grant
Filed: May 23, 2008
Date of Patent: September 10, 2013
Assignee: International Business Machines Corporation
Inventors: Daniel A. Brokenshire, John A. Gunnels, Michael D. Kistler
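The saving from reversing the visitation order can be checked by counting block loads. This minimal model assumes the update C[i][j] += A[i][k]·B[k][j] over an N×N grid of blocks, one resident block per input operand, and a load whenever the next required block differs from the resident one; C-block traffic and the patent's multi-buffering are not modeled, and the function names are hypothetical.

```python
def block_schedule(n_blocks, serpentine):
    """List (i, j, k) steps for C[i][j] += A[i][k] @ B[k][j]."""
    schedule = []
    reverse = False
    for i in range(n_blocks):
        for j in range(n_blocks):
            ks = range(n_blocks - 1, -1, -1) if reverse else range(n_blocks)
            schedule.extend((i, j, k) for k in ks)
            if serpentine:
                reverse = not reverse  # "bounce": walk k backwards next time
    return schedule


def count_loads(schedule):
    """Count A and B block loads with one resident buffer per operand."""
    loads, a_buf, b_buf = 0, None, None
    for i, j, k in schedule:
        if (i, k) != a_buf:
            loads += 1
            a_buf = (i, k)
        if (k, j) != b_buf:
            loads += 1
            b_buf = (k, j)
    return loads
```

For a 3×3 grid of blocks, the plain order needs 54 loads and the serpentine order 48: at each of the six turns between C blocks in the same block row, the last A block is reused as the first one of the next C block.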
-
Patent number: 8521799
Abstract: Disclosed are a row-vector norm comparison method and a row-vector norm comparison apparatus for an inverse matrix. A row-vector norm comparison apparatus includes: an input matrix processing module that receives and combines constituent elements of a matrix; a cofactor operation module that multiplexes the combination result of the constituent elements to calculate factors constituting an adjoint matrix; a square calculation module that squares the calculated factors; a summation module that selects a predetermined number of factors among the squared factors and sums the selected factors to calculate the norms of row vectors in an inverse matrix; and a norm comparison module that outputs a comparison result of the calculated norms of the row vectors.
Type: Grant
Filed: June 30, 2008
Date of Patent: August 27, 2013
Assignees: Samsung Electronics Co., Ltd., Electronics and Telecommunications Research Institute
Inventors: Young Ha Lee, Seung Jae Bahng, Youn-Ok Park
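The reason an adjoint (adjugate) matrix suffices here is that inv(A) = adj(A) / det(A), so every row norm of the inverse equals the corresponding row norm of adj(A) divided by the same factor |det(A)|. Comparing squared row norms of adj(A) therefore gives the same ordering with no division and no square root. The sketch below assumes a 3×3 integer matrix (the abstract fixes no size) and uses hypothetical function names.

```python
def cofactor_matrix_3x3(m):
    """Signed 2x2 minors of a 3x3 matrix."""
    def minor(r, c):
        rows = [i for i in range(3) if i != r]
        cols = [j for j in range(3) if j != c]
        (a, b), (d, e) = ([m[i][j] for j in cols] for i in rows)
        return a * e - b * d
    return [[(-1) ** (r + c) * minor(r, c) for c in range(3)] for r in range(3)]


def adjugate_row_norms_sq(m):
    """Squared row norms of adj(m); each equals det(m)**2 times the squared
    row norm of inv(m), so both give the same ordering."""
    cof = cofactor_matrix_3x3(m)
    adj = [[cof[c][r] for c in range(3)] for r in range(3)]  # transpose of cofactors
    return [sum(x * x for x in row) for row in adj]


def largest_inverse_row(m):
    """Index of the row of inv(m) with the largest norm."""
    norms = adjugate_row_norms_sq(m)
    return max(range(3), key=norms.__getitem__)
```

For `diag(1, 2, 4)` the squared adjugate row norms are `[64, 16, 4]`; dividing by det² = 64 gives `[1, 0.25, 0.0625]`, the squared row norms of `diag(1, 0.5, 0.25)`.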
-
Patent number: 8473533
Abstract: A system, computer-readable storage medium, and method directly solves non-linear systems that have the HB Jacobian as the coefficient matrix. The direct solve method can be used to efficiently simulate non-linear circuits in RF or microwave applications. Additionally, the direct solve method can be applied to Fourier envelope applications. Furthermore, the direct solve method can be used together with preconditioners to provide a more efficient iterative solve technique.
Type: Grant
Filed: June 17, 2010
Date of Patent: June 25, 2013
Assignee: Berkeley Design Automation, Inc.
Inventors: Amit Mehrotra, Abhishek Somani
-
Patent number: 8412757
Abstract: Non-negative matrix factorization (NMF) is combined with identification of a maximum margin classifier by minimizing a cost function that contains a generative component and a discriminative component. The relative weighting between the generative component and the discriminative component is adjusted during subsequent iterations such that initially, when confidence is low, the generative model is favored. As the iterations proceed, confidence increases and the weight of the discriminative component is steadily increased until it equals that of the generative component. Preferably, the cost function to be minimized is: $\min_{F,G \ge 0} \|X - FG\|^2 + \gamma\left(\|w\|^2 + C\sum_{i=1}^{n} L(y_i, w \cdot g_i + b)\right)$.
Type: Grant
Filed: December 9, 2009
Date of Patent: April 2, 2013
Assignee: Seiko Epson Corporation
Inventors: Mithun Das Gupta, Jing Xiao
-
Publication number: 20130073599
Abstract: Hardware for performing sequences of arithmetic operations. The hardware comprises a scheduler operable to generate a schedule of instructions from a bitmap denoting whether an entry in a matrix is zero or not. An arithmetic circuit is provided which is configured to perform arithmetic operations on the matrix in accordance with the schedule.
Type: Application
Filed: January 7, 2011
Publication date: March 21, 2013
Applicant: LINEAR ALGEBRA TECHNOLOGIES, LIMITED
Inventor: David Maloney
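A software analogue of the bitmap-driven scheduler can be sketched in a few lines. It assumes the arithmetic operation is a matrix-vector product: the schedule is derived from the bitmap alone and lists one multiply-accumulate per non-zero entry, so zero entries never reach the arithmetic stage. The function names are hypothetical.

```python
def build_schedule(bitmap):
    """bitmap[r][c] is True when matrix entry (r, c) is non-zero."""
    return [(r, c) for r, row in enumerate(bitmap) for c, nz in enumerate(row) if nz]


def run_schedule(schedule, matrix, vector):
    """Execute one multiply-accumulate per scheduled (row, column) pair."""
    out = [0] * len(matrix)
    for r, c in schedule:
        out[r] += matrix[r][c] * vector[c]
    return out
```

Separating schedule generation from execution means the schedule can be built once per sparsity pattern and reused for every vector multiplied by a matrix with that pattern.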
-
Patent number: 8341204
Abstract: A data processor is provided whose level of operation parallelism, and thereby operation processing capability, is enhanced by composing floating-point inner product execution units to be compatible with single instruction multiple data (SIMD). An operating system is to be implemented that can significantly enhance the level of operation parallelism per instruction while maintaining the efficiency of the floating-point length-4 vector inner product execution units. The floating-point length-4 vector inner product execution units are defined in the minimum width (32 bits for single precision) even where an extensive operating system becomes available, and the inner product execution units are composed to be compatible with SIMD. The mutually augmenting effects of the inner product execution units and the SIMD-compatible composition enhance the level of operation parallelism dramatically.
Type: Grant
Filed: July 2, 2009
Date of Patent: December 25, 2012
Assignee: Renesas Electronics Corporation
Inventors: Fumio Arakawa, Tetsuya Yamada
-
Patent number: 8285773
Abstract: A signal separating device includes an iterative estimator, a repeating calculator, a result output unit, and a repetition controller. The repeating calculator repeatedly causes the iterative estimator to iteratively perform independent component analysis on an observed signal matrix, and to further perform independent component analysis on the source signal matrix obtained as a result. The result output unit outputs the product of the respective mixing matrices obtained during each repetition as a mixing matrix with respect to the observed signal matrix, while also outputting the source signal matrix obtained during the final repetition as a source signal matrix with respect to the observed signal matrix. The repetition controller causes the repeating calculator to repeat the calculation control until all mixing matrices and all source signal matrices satisfy a convergence condition. The iterative estimator may perform a fixed number of iterations, or perform iterations until convergence is obtained.
Type: Grant
Filed: April 27, 2007
Date of Patent: October 9, 2012
Assignee: Riken
Inventors: Andrzej Cichocki, Rafal Zdunek, Shunichi Amari, Gen Hori, Ken Umeno
-
Patent number: 8195735
Abstract: The present invention provides a system and method for improving the performance of general-purpose processors by implementing a functional unit that computes the product of a matrix operand with a vector operand, producing a vector result. The functional unit fully utilizes the entire resources of a 128-bit by 128-bit multiplier regardless of the operand size, as the number of elements of the matrix and vector operands increases as operand size is reduced. The unit performs both fixed-point and floating-point multiplications and additions with the highest-possible intermediate accuracy with modest resources.
Type: Grant
Filed: December 9, 2008
Date of Patent: June 5, 2012
Assignee: Microunity Systems Engineering, Inc.
Inventors: Craig Hansen, Bruce Bateman, John Moussouris
-
Patent number: 8055607
Abstract: A system and method for autonomic problem determination. Events and problems associated with the events are received from a computing resource and are expressed as entries in an event-problem matrix. Expert knowledge is expressed as entries in one or more multi-level structure dictionaries. The system and method enable dynamic interaction between the events in the matrix and the current dictionaries, whose entries are updated continuously to maximize correlation among the events and problems. The index of each term in the dictionary is used to calculate the weight of each event in the matrix, so that events frequently associated with a specific problem are given a higher weight in the matrix. Using singular value decomposition (SVD), the weighted events enable an accelerated and accurate convergence to a set of specific associated problems.
Type: Grant
Filed: March 3, 2008
Date of Patent: November 8, 2011
Assignee: International Business Machines Corporation
Inventors: Hoi Y. Chan, Thomas Y. Kwok
-
Patent number: 7933404
Abstract: Techniques are disclosed to enable efficient implementation of secure hash functions and/or stream ciphers. More specifically, a family of graphs is described that has relatively large girth, large claw, and/or rapid mixing properties. The graphs are suitable for construction of cryptographic primitives such as collision resistant hash functions and stream ciphers, which allow efficient software implementation.
Type: Grant
Filed: October 16, 2007
Date of Patent: April 26, 2011
Assignee: Microsoft Corporation
Inventors: Ramarathnam Venkatesan, Matthew Cary
-
Publication number: 20100138468
Abstract: Methods and apparatus are provided for a digital signal processor having an instruction set with one or more non-linear complex functions. A method is provided for a processor. One or more non-linear complex software instructions are obtained from a program. The non-linear complex software instructions have at least one complex number as an input. One or more non-linear complex functions are applied from a predefined instruction set to the at least one complex number. An output is generated that comprises one complex number or two real numbers. A functional unit can implement the one or more non-linear complex functions. In one embodiment, a vector-based digital signal processor is disclosed that processes a complex vector comprising a plurality of complex numbers. The processor can process the plurality of complex numbers in parallel.
Type: Application
Filed: November 28, 2008
Publication date: June 3, 2010
Inventors: Kameran Azadet, Jian-Guo Chen, Samer Hijazi, Joseph Williams
-
Patent number: 7567996Abstract: A data processor is made possible whose level of operation parallelism is enhanced by composing floating-point inner product execution units to be compatible with single instruction multiple data (SIMD), thereby enhancing its operation processing capability. An operating system that can significantly enhance the level of operation parallelism per instruction while maintaining the efficiency of the floating-point length-4 vector inner product execution units is to be implemented. The floating-point length-4 vector inner product execution units are defined in the minimum width (32 bits for single precision) even where an extensive operating system becomes available, and are composed to be compatible with SIMD. The mutually augmenting effects of the inner product execution units and the SIMD-compatible composition enhance the level of operation parallelism dramatically.Type: GrantFiled: August 29, 2005Date of Patent: July 28, 2009Assignee: Renesas Technology Corp.Inventors: Fumio Arakawa, Tetsuya Yamada
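Here is a minimal sketch of how the two ideas combine, under two assumptions:

- A "length-4 inner product unit" computes a·b for two 4-element vectors.
- "SIMD composition" means several such units work on independent operand pairs under one instruction.

With those assumptions, four units applied to a 4x4 matrix and a 4-vector produce the whole matrix-vector product in one step. The function names are illustrative and do not come from the patent.

```python
from typing import List, Sequence

Vec4 = Sequence[float]


def inner4(a: Vec4, b: Vec4) -> float:
    """One length-4 inner product unit: a0*b0 + a1*b1 + a2*b2 + a3*b3."""
    assert len(a) == 4 and len(b) == 4
    return sum(x * y for x, y in zip(a, b))


def simd_inner4(vecs_a: List[Vec4], vecs_b: List[Vec4]) -> List[float]:
    """Several inner product units issued by one SIMD-style instruction.

    Each lane works on its own pair of 4-element vectors; hardware would
    run the lanes in parallel, here they are evaluated in a loop.
    """
    return [inner4(a, b) for a, b in zip(vecs_a, vecs_b)]


def matvec4(m: List[Vec4], v: Vec4) -> List[float]:
    """4x4 matrix times 4-vector: one lane per matrix row."""
    return simd_inner4(m, [v] * 4)
```

For example, `matvec4([[1, 0, 0, 0], [0, 2, 0, 0], [0, 0, 3, 0], [1, 1, 1, 1]], [1, 2, 3, 4])` gives `[1, 4, 9, 10]`.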
-
Publication number: 20090094309Abstract: The present invention provides a system and method for improving the performance of general-purpose processors by implementing a functional unit that computes the product of a matrix operand with a vector operand, producing a vector result. The functional unit fully utilizes the entire resources of a 128b by 128b multiplier regardless of the operand size, because the number of elements in the matrix and vector operands increases as the operand size is reduced. The unit performs both fixed-point and floating-point multiplications and additions with the highest possible intermediate accuracy using modest resources.Type: ApplicationFiled: December 9, 2008Publication date: April 9, 2009Applicant: MICROUNITY SYSTEMS ENGINEERING, INC.Inventors: Craig HANSEN, Bruce Bateman, John Moussouris
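The "regardless of the operand size" claim can be checked with simple arithmetic. Assume the following (the abstract does not state these shapes):

- A w-bit element size packs 128/w elements into the vector operand.
- The matrix is square, (128/w) x (128/w).
- Each matrix element takes part in one w x w multiplication.

The multiplier-array usage is then (128/w)^2 x w^2 = 128^2 bits for every w. The short check below confirms this.

```python
MULT_BITS = 128  # each side of the 128b x 128b multiplier array


def operand_shape(width: int) -> tuple:
    """(vector length, matrix rows, matrix cols) for `width`-bit elements.

    Assumes the vector fills one 128-bit operand and the matrix is square,
    so the result vector also fits in 128 bits.
    """
    n = MULT_BITS // width
    return n, n, n


def multiplier_bits_used(width: int) -> int:
    """Partial-product bits consumed by one matrix-vector product.

    Each matrix element takes part in one width x width multiplication.
    """
    _, rows, cols = operand_shape(width)
    return rows * cols * width * width


if __name__ == "__main__":
    for w in (8, 16, 32, 64):
        print(w, operand_shape(w), multiplier_bits_used(w))
```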
-
Publication number: 20080104161Abstract: An integrated transformation apparatus is provided. The apparatus includes a first multiplexer, a second multiplexer, and a transformation unit. The first multiplexer retrieves point data from columns or rows of a multi-dimensional matrix and input data. The second multiplexer retrieves transformation coefficients corresponding to the point data. The transformation unit transforms data blocks of the multi-dimensional matrix to a plurality of sub data blocks according to the input data, the point data, and the transformation coefficients.Type: ApplicationFiled: August 24, 2007Publication date: May 1, 2008Applicant: INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTEInventors: Yi-Jung Wang, Guo-Zua Wu, Chih-Chi Chang, Oscal Tzyh Chiang Chen
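A common way to transform a 2-D data block with a single shared 1-D unit is the separable approach. The unit first processes each row, then each column, while a coefficient set is selected alongside the data.

The sketch assumes a 4-point Walsh-Hadamard matrix as the coefficient set, purely for illustration. The abstract does not say which transforms the apparatus supports. The sketch also does not model the split into sub data blocks.

```python
from typing import List

Matrix = List[List[int]]

# Assumed 4-point coefficient set (Walsh-Hadamard); any 1-D transform matrix works.
H4: Matrix = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def transform_1d(points: List[int], coeffs: Matrix) -> List[int]:
    """Shared 1-D unit: multiply one row or column of points by the coefficients."""
    return [sum(c * x for c, x in zip(row, points)) for row in coeffs]


def transform_2d(block: Matrix, coeffs: Matrix) -> Matrix:
    """Separable 2-D transform: rows first, then columns, reusing transform_1d."""
    rows = [transform_1d(r, coeffs) for r in block]             # row pass
    cols = [transform_1d(list(c), coeffs) for c in zip(*rows)]  # column pass
    return [list(r) for r in zip(*cols)]                        # back to row order
```

A constant block maps to a single non-zero (DC) coefficient. Because this H4 is symmetric and H4 x H4 = 4I, applying `transform_2d` twice scales the block by 16.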
-
Patent number: 7363200Abstract: A matrix includes samples associated with a first signal and samples associated with a second signal. The second signal includes a first portion associated with the first signal and a second portion associated with at least one disturbance, such as white noise or colored noise. A projection of the matrix is produced using canonical QR-decomposition. Canonical QR-decomposition of the matrix produces an orthogonal matrix and an upper triangular matrix, where each value in the diagonal of the upper triangular matrix is greater than or equal to zero. The projection at least substantially separates the first portion of the second signal from the second portion of the second signal.Type: GrantFiled: February 5, 2004Date of Patent: April 22, 2008Assignee: Honeywell International Inc.Inventor: Joseph Z. Lu
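The "canonical" QR decomposition described here has an orthogonal Q and an upper triangular R with a non-negative diagonal. Gram-Schmidt orthogonalization produces this form directly, because each diagonal entry of R is a vector norm.

The pure-Python sketch below uses modified Gram-Schmidt and covers only the factorization of a full-column-rank matrix. How the patent builds the signal matrix and forms the projection is not reproduced.

```python
import math
from typing import List

Matrix = List[List[float]]


def canonical_qr(a: Matrix):
    """QR factorization by modified Gram-Schmidt.

    a is m x n (list of rows) with full column rank, m >= n.
    Returns (q, r): q is m x n with orthonormal columns, r is n x n upper
    triangular, and every r[k][k] >= 0 because it is a column norm.
    """
    m, n = len(a), len(a[0])
    v = [[a[i][j] for i in range(m)] for j in range(n)]  # columns of a
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for k in range(n):
        r[k][k] = math.sqrt(sum(x * x for x in v[k]))
        if r[k][k] == 0.0:
            raise ValueError("matrix is rank deficient")
        qk = [x / r[k][k] for x in v[k]]
        q_cols.append(qk)
        # Remove the qk component from every remaining column.
        for j in range(k + 1, n):
            r[k][j] = sum(qk[i] * v[j][i] for i in range(m))
            v[j] = [v[j][i] - r[k][j] * qk[i] for i in range(m)]
    q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return q, r
```

For `[[-1.0, 0.0], [0.0, -2.0]]`, this returns Q = -I and R = diag(1, 2). A factorization that allows negative diagonal entries could instead return Q = I and R = diag(-1, -2).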
-
Patent number: 7137005Abstract: A method of introducing an imperceptible signal (watermark) into digital media data is disclosed. The method is based on representing the source digital data using a special matrix, inserting a digital watermark into the special matrix to obtain a watermarked matrix, and generating the watermarked data from the source data and the watermarked matrix. In addition, the watermark is detected in the watermarked data by calculating the special matrix from the watermarked data.Type: GrantFiled: March 27, 2002Date of Patent: November 14, 2006Assignee: LG Electronics Inc.Inventors: Mikhail Anatolyevich Sall, Alexander Leonidovich Mayboroda, Viktor Vikrorovich Redkov, Anatoly Igorevich Tikhotsky
-
Patent number: 7089159Abstract: A matrix reordering method reorders the elements of a coefficient matrix created from the coefficients of linear simultaneous equations whose solutions are to be produced by parallel processing on the processors of a computer in accordance with Gaussian elimination. Here, degrees corresponding to the numbers of non-zero elements are calculated for all pivots included in the coefficient matrix. Then, a first pivot whose degree is under a threshold (mindeg+?) is selected from among the pivots of the coefficient matrix, and a second pivot whose critical path length is minimum is also selected from among the pivots of the coefficient matrix. Elements are exchanged between the first pivot and the second pivot to complete reordering with respect to the first pivot. In addition, non-zero elements newly produced by the Gaussian elimination of the first pivot are added to the coefficient matrix.Type: GrantFiled: April 2, 2001Date of Patent: August 8, 2006Assignee: NEC Electronics CorporationInventor: Koutaro Hachiya
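The degree-threshold step can be sketched on a symmetric sparsity pattern stored as adjacency sets. The sketch makes two assumptions:

- A pivot's degree is the number of off-diagonal non-zeros in its row.
- The threshold is the minimum degree plus a slack parameter, here called `alpha`. This is a hypothetical stand-in for the unreadable symbol after "mindeg+".

The critical-path criterion and the element exchange are not modeled. The sketch only lists candidate pivots and records the fill-in that eliminating a pivot creates.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]


def degrees(g: Graph) -> Dict[int, int]:
    """Degree of each remaining pivot = number of off-diagonal non-zeros."""
    return {p: len(nbrs) for p, nbrs in g.items()}


def candidates(g: Graph, alpha: int) -> List[int]:
    """Pivots whose degree does not exceed mindeg + alpha.

    The abstract says "under a threshold"; <= is used here so that
    alpha = 0 still admits the minimum-degree pivots.
    """
    deg = degrees(g)
    threshold = min(deg.values()) + alpha
    return sorted(p for p, d in deg.items() if d <= threshold)


def eliminate(g: Graph, pivot: int) -> Set[frozenset]:
    """Eliminate one pivot in place and return the fill-in edges it creates.

    Gaussian elimination of a pivot couples all of its neighbours with each
    other; every pair that was not already coupled becomes a new non-zero.
    """
    nbrs = g.pop(pivot)
    for n in nbrs:
        g[n].discard(pivot)
    fill = set()
    for a in nbrs:
        for b in nbrs:
            if a < b and b not in g[a]:
                g[a].add(b)
                g[b].add(a)
                fill.add(frozenset((a, b)))
    return fill
```

On a star-shaped pattern, the leaves are the low-degree candidates. Eliminating a leaf creates no fill-in, while eliminating the hub couples every pair of leaves. This is why low-degree pivots are preferred.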
-
Patent number: 7028066Abstract: A data processor is made possible whose level of operation parallelism is enhanced by composing floating-point inner product execution units to be compatible with single instruction multiple data (SIMD), thereby enhancing its operation processing capability. An operating system that can significantly enhance the level of operation parallelism per instruction while maintaining the efficiency of the floating-point length-4 vector inner product execution units is to be implemented. The floating-point length-4 vector inner product execution units are defined in the minimum width (32 bits for single precision) even where an extensive operating system becomes available, and are composed to be compatible with SIMD. The mutually augmenting effects of the inner product execution units and the SIMD-compatible composition enhance the level of operation parallelism dramatically.Type: GrantFiled: March 8, 2001Date of Patent: April 11, 2006Assignee: Renesas Technology Corp.Inventors: Fumio Arakawa, Tetsuya Yamada
-
Patent number: 7010760Abstract: An autofill algorithm provides tools for defining and automatically executing batch-based procedures in an adaptive hierarchical workflow environment, and may be suitable for a wide variety of applications, including laboratory procedure planning, execution, and documentation, as well as driving robotic apparatus.Type: GrantFiled: March 11, 2004Date of Patent: March 7, 2006Assignee: Teranode CorporationInventors: Lawrence F. Arnstein, Zheng Li, John M. Hill, Michael R. Kellen, Christophe Poulain, Neil A. Fanger, Kuang Chen
-
Patent number: 6820074Abstract: A method and software are disclosed for processing data values of a data array at equally spaced locations in two dimensions, where the desired data values are nulls in the data array. The method and software first search for linear ranges of contiguous nulls and then perform incidental interpolation of all points in each such range.Type: GrantFiled: July 7, 1999Date of Patent: November 16, 2004Assignee: Landmark Graphics CorporationInventor: Anne L. Simpson
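Below is one plausible reading of the two steps, covering the row direction only:

1. Find linear runs of contiguous nulls.
2. Fill every point in each run.

The sketch uses `None` as the null marker and plain linear interpolation between the two values that bound a run. Both choices are assumptions; the abstract does not give the interpolation formula. Runs touching the edge of a row are left unfilled.

```python
from typing import List, Optional, Tuple

Row = List[Optional[float]]


def null_runs(row: Row) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, of contiguous None runs."""
    runs, i = [], 0
    while i < len(row):
        if row[i] is None:
            j = i
            while j < len(row) and row[j] is None:
                j += 1
            runs.append((i, j))
            i = j
        else:
            i += 1
    return runs


def fill_row(row: Row) -> Row:
    """Linearly interpolate every null run bounded by values on both sides."""
    out = list(row)
    for start, end in null_runs(row):
        if start == 0 or end == len(row):
            continue  # no value on one side: leave the run as nulls
        left, right = row[start - 1], row[end]
        span = end - start + 1  # grid steps from the left value to the right value
        for k in range(start, end):
            t = (k - start + 1) / span
            out[k] = left + t * (right - left)
    return out


def fill_grid(grid: List[Row]) -> List[Row]:
    """Apply the row pass to a 2-D array of equally spaced samples."""
    return [fill_row(r) for r in grid]
```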
-
Patent number: 6662125Abstract: An electromagnetic wave analyzer and program which can handle non-uniform cells with smaller computation errors. A given computational domain is divided into a plurality of cells for the purpose of finite difference approximation. For each space point, a cell size identification unit identifies the uniformity of surrounding cells. When the surrounding cells are identified as being uniform in size, a first calculation unit calculates electromagnetic field components at that space point with a first calculation method. When the surrounding cells are identified as being non-uniform in size, a second calculation unit calculates the same with a second calculation method which has smaller computational errors than the first calculation method. A data output unit then outputs the calculated electromagnetic field values.Type: GrantFiled: December 20, 2001Date of Patent: December 9, 2003Assignee: Fujitsu LimitedInventor: Takefumi Namiki
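The abstract does not specify either calculation method. As a one-dimensional stand-in, the sketch below chooses between two second-difference formulas, depending on whether the two neighbouring cells are equal in size:

- The standard uniform three-point formula.
- The general non-uniform three-point formula.

Applied across unequal cells, the uniform formula gives a wrong result even for a quadratic, whereas the non-uniform formula is exact for quadratics. All function names are hypothetical, and no electromagnetic field update is modeled.

```python
from typing import Callable


def is_uniform(h_left: float, h_right: float, rel_tol: float = 1e-12) -> bool:
    """Cell-size identification: are the two neighbouring cells equal?"""
    return abs(h_left - h_right) <= rel_tol * max(h_left, h_right)


def second_diff_uniform(fm: float, f0: float, fp: float, h: float) -> float:
    """First method: standard central difference, valid for equal cells."""
    return (fp - 2.0 * f0 + fm) / (h * h)


def second_diff_nonuniform(fm: float, f0: float, fp: float,
                           h1: float, h2: float) -> float:
    """Second method: three-point formula for a left cell h1 and right cell h2."""
    return 2.0 * (fp / (h2 * (h1 + h2))
                  - f0 / (h1 * h2)
                  + fm / (h1 * (h1 + h2)))


def second_derivative(f: Callable[[float], float], x: float,
                      h1: float, h2: float) -> float:
    """Pick the formula according to the sizes of the surrounding cells."""
    fm, f0, fp = f(x - h1), f(x), f(x + h2)
    if is_uniform(h1, h2):
        return second_diff_uniform(fm, f0, fp, h1)
    return second_diff_nonuniform(fm, f0, fp, h1, h2)
```

For f(x) = x^2 at x = 0 with cells of size 1 and 2, the non-uniform formula returns 2. The uniform formula with the mean spacing 1.5 returns about 2.22.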
-
Publication number: 20010051969Abstract: A multimedia execution unit configured to perform vectored floating point and integer instructions. The execution unit may include an add/subtract pipeline having far and close data paths. The far path is configured to handle effective addition operations and effective subtraction operations for operands having an absolute exponent difference greater than one. The close path is configured to handle effective subtraction operations for operands having an absolute exponent difference less than or equal to one. The close path is configured to generate two output values, wherein one output value is the first input operand plus an inverted version of the second input operand, while the second output value is equal to the first output value plus one. Selection of the first or second output value in the close path effectuates the round-to-nearest operation for the output of the adder.Type: ApplicationFiled: February 6, 2001Publication date: December 13, 2001Inventors: Stuart F. Oberman, Norbert Juffa, Fred Weber, Krishnan Ramani, Ravi Krishna Cherukuri
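The two close-path outputs follow from two's complement arithmetic: A - B = A + ~B + 1. Forming A + ~B and A + ~B + 1 in parallel therefore lets the rounding step select a result without a second carry-propagate addition.

The sketch models this on fixed-width integer significands, together with the path-selection rule stated in the abstract. The 24-bit width and the function names are assumptions. The logic that decides which output rounding selects is not modeled.

```python
WIDTH = 24  # assumed significand width (single precision incl. hidden bit)
MASK = (1 << WIDTH) - 1


def choose_path(effective_subtract: bool, exp_a: int, exp_b: int) -> str:
    """Path selection as described in the abstract."""
    if effective_subtract and abs(exp_a - exp_b) <= 1:
        return "close"
    return "far"  # all effective additions, and subtractions with |diff| > 1


def close_path_outputs(a: int, b: int) -> tuple:
    """Return (A + ~B, A + ~B + 1) modulo 2**WIDTH.

    The second value is the exact two's complement difference A - B and the
    first is one less; hardware can form both at once (e.g. with a compound
    adder) so that rounding only has to select one of them.
    """
    not_b = ~b & MASK
    first = (a + not_b) & MASK
    second = (a + not_b + 1) & MASK
    return first, second
```

For example, `close_path_outputs(100, 30)` returns `(69, 70)`.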
-
Publication number: 20010021941Abstract: A data processor is made possible whose level of operation parallelism is enhanced by composing floating-point inner product execution units to be compatible with SIMD, thereby enhancing its operation processing capability. An operating system that can significantly enhance the level of operation parallelism per instruction while maintaining the efficiency of the floating-point length-4 vector inner product execution units is to be implemented. The floating-point length-4 vector inner product execution units are defined in the minimum width (32 bits for single precision) even where an extensive operating system becomes available, and are composed to be compatible with SIMD. The mutually augmenting effects of the inner product execution units and the SIMD-compatible composition enhance the level of operation parallelism dramatically.Type: ApplicationFiled: March 8, 2001Publication date: September 13, 2001Inventors: Fumio Arakawa, Tetsuya Yamada
-
Publication number: 20010011291Abstract: A data processor includes an arithmetic portion incorporated in a floating-point unit. The arithmetic portion includes a plurality of multipliers, each of which is supplied with mantissa parts of floating-point numbers from a different group of data input signal lines and multiplies the supplied mantissa parts together; an aligner that receives the outputs of the multipliers and performs an alignment shift; an exponent processing portion that generates, on the basis of the exponent parts of the floating-point numbers, the alignment shift amount for the aligner and an exponent before normalization; and a multi-input adder. This arrangement reduces the scale of the circuit and performs inner product operations and the like on floating-point numbers at high speed and with high accuracy.Type: ApplicationFiled: March 19, 2001Publication date: August 2, 2001Inventors: Fumio Arakawa, Norio Nakagawa, Tetsuya Yamada, Yonetaro Totsuka
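The described dataflow can be modeled with integer mantissas, where a value is m x 2^e:

1. Multiply the mantissas in parallel and add the exponents.
2. Align every product to the largest exponent.
3. Sum all aligned products in one multi-input adder.
4. Adjust the result once at the end.

The number of guard bits kept by the aligner (`GUARD`) and the final truncation are assumptions. Signs, rounding, and special values are ignored.

```python
from typing import List, Tuple

Fp = Tuple[int, int]  # (mantissa, exponent): value = mantissa * 2**exponent
GUARD = 8  # assumed extra low-order bits kept by the aligner


def inner_product(xs: List[Fp], ys: List[Fp], mant_bits: int = 24) -> Fp:
    # 1. Multipliers: mantissas multiply, exponents add (parallel in hardware).
    prods = [(mx * my, ex + ey) for (mx, ex), (my, ey) in zip(xs, ys)]
    # 2. Exponent processing: common exponent before normalization.
    e_max = max(e for _, e in prods)
    base = e_max - GUARD
    # 3. Aligner: bring every product to exponent `base`;
    #    right shifts drop bits that fall below the guard bits.
    aligned = []
    for m, e in prods:
        shift = e - base
        aligned.append(m << shift if shift >= 0 else m >> -shift)
    # 4. Multi-input adder: one sum over all aligned products.
    total = sum(aligned)
    # 5. Final adjustment: truncate to at most `mant_bits` significant bits.
    exp = base
    while total >= (1 << mant_bits):
        total >>= 1
        exp += 1
    return total, exp
```

For example, (3 x 2^0)(2 x 2^0) + (1 x 2^2)(5 x 2^-1) = 6 + 10 = 16. The sketch returns `(2048, -7)`, and 2048 x 2^-7 = 16.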
-
Patent number: 6188240Abstract: A programmable function block comprises a core logic circuit having a first argument input group consisting of first through fourth argument input terminals, a second argument input group consisting of first through fourth argument input terminals, first through third configuration input terminals, a core logic carry output terminal, a core logic carry generation output terminal, a core logic carry propagation output terminal, a ripple-core logic carry input terminal, and a sum output terminal. An input block, connected to interconnection wires and to the first and second argument input groups, includes eight input selection units for selecting, as eight input selected signals, eight signals from among the signals on the interconnection wires, a fixed logic value of “1”, and a fixed logic value of “0”. First through third memory circuits, connected to the first through third configuration input terminals, respectively, each store a one-bit logic value as the first through third stored logic values.Type: GrantFiled: June 4, 1999Date of Patent: February 13, 2001Assignees: NEC Corporation, Real World Computing PartnershipInventor: Shogo Nakaya
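This core logic resembles the usual 4-bit adder slice used for carry lookahead. It has two 4-bit argument groups, a ripple carry input, a sum output, a carry output, and group carry-generate and carry-propagate outputs.

The sketch models only that adder function, assuming the configuration inputs select it. The abstract does not say what the three configuration bits select, so they are left out.

```python
from typing import Tuple


def adder_slice4(a: int, b: int, cin: int) -> Tuple[int, int, int, int]:
    """4-bit adder slice: returns (sum, carry_out, group_generate, group_propagate).

    group_generate = 1 if the slice produces a carry on its own;
    group_propagate = 1 if an incoming carry would pass straight through.
    """
    assert 0 <= a < 16 and 0 <= b < 16 and cin in (0, 1)
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]  # bit generate
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(4)]  # bit propagate
    s, c = 0, cin
    for i in range(4):  # ripple the carry through the four bit positions
        s |= (p[i] ^ c) << i
        c = g[i] | (p[i] & c)
    group_g = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
               | (p[3] & p[2] & p[1] & g[0]))
    group_p = p[0] & p[1] & p[2] & p[3]
    return s, c, group_g, group_p
```

The group outputs satisfy carry_out = group_generate OR (group_propagate AND carry_in). A lookahead unit can therefore compute the carries of several slices without waiting for the ripple.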