Multiple Parallel Operations Patents (Class 708/524)
  • Patent number: 11947462
    Abstract: Techniques are disclosed relating to cache footprint management. In some embodiments, execution circuitry is configured to perform operations for instructions from multiple threads in parallel. Cache circuitry may store information operated on by threads executed by the execution circuitry. Scheduling circuitry may arbitrate among threads to schedule threads for execution by the execution circuitry. Tracking circuitry may determine one or more performance metrics for the cache circuitry. Control circuitry may, based on the one or more performance metrics meeting a threshold, reduce a limit on a number of threads considered for arbitration by the scheduling circuitry, to control a footprint of information stored by the cache circuitry. Disclosed techniques may advantageously reduce or avoid cache thrashing for certain processor workloads.
    Type: Grant
    Filed: March 3, 2022
    Date of Patent: April 2, 2024
    Assignee: Apple Inc.
    Inventors: Yoong Chert Foo, Terence M. Potter, Donald R. DeSota, Benjiman L. Goodman, Aroun Demeure, Cheng Li, Winnie W. Yeung
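    A minimal sketch of the control policy described above, in Python. The metric, threshold value, and function name are hypothetical; the abstract only specifies that a cache performance metric meeting a threshold reduces the limit on threads considered for arbitration:

```python
def adjust_thread_limit(cache_miss_rate, thread_limit,
                        threshold=0.3, min_threads=1, max_threads=16):
    """Hypothetical throttling policy: when a cache performance metric (here a
    miss rate) meets the threshold, lower the number of threads the scheduler
    may arbitrate among, shrinking the cache footprint; otherwise relax it."""
    if cache_miss_rate >= threshold:
        return max(min_threads, thread_limit - 1)
    return min(max_threads, thread_limit + 1)

limit = 16
for rate in (0.1, 0.5, 0.6, 0.2):
    limit = adjust_thread_limit(rate, limit)
    print(rate, "->", limit)        # 0.1 -> 16, 0.5 -> 15, 0.6 -> 14, 0.2 -> 15
```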
  • Patent number: 11886534
    Abstract: The present invention discloses a filtering method and system for parallel computing results. The input value at the first valid position (fvp) of each fragment is generated simultaneously, and the output result corresponding to each fragment's first-valid-position input value is computed simultaneously using that fragment's own fvp. Based on the output result at the fvp of the first fragment, the parallel computing results are filtered by selecting the output results of the second through S-th fragments in sequence, finally yielding correct parallel computing results. By adopting this parallel filtering manner, the original serial filtering computation is changed to parallel computation over S fragments, so the computing time is only one S-th of the original, thereby improving computing efficiency and satisfying the timing requirements of parallel computation.
    Type: Grant
    Filed: September 29, 2019
    Date of Patent: January 30, 2024
    Assignee: Inspur Electronic Information Industry Co., Ltd.
    Inventors: Hongzhi Shi, Haiwei Liu, Jian Zhao
  • Patent number: 11782498
    Abstract: An electronic device includes a coding module that determines whether a parameter of an artificial neural network is an outlier, depending on a value of the parameter and compresses the parameter by truncating a first bit of the parameter when the parameter is a non-outlier and truncating a second bit of the parameter when the parameter is the outlier, and a decoding module that decodes a compressed parameter.
    Type: Grant
    Filed: February 17, 2020
    Date of Patent: October 10, 2023
    Assignee: University-Industry Cooperation Group of Kyung Hee University
    Inventors: Ik Joon Chang, Ho Nguyen Dong, Minhson Le
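    A loose illustration of the outlier-aware truncation idea, assuming (purely for illustration) that outliers and non-outliers lose different numbers of low-order bits; the threshold, bit counts, and names below are not from the patent:

```python
def compress_param(value, outlier_threshold=64, inlier_drop=2, outlier_drop=1):
    """Classify the parameter by magnitude, then truncate low-order bits;
    an outlier is truncated less aggressively than a non-outlier (assumed
    interpretation, with made-up threshold and bit counts)."""
    is_outlier = abs(value) >= outlier_threshold
    drop = outlier_drop if is_outlier else inlier_drop
    return value >> drop, drop

def decompress_param(truncated, drop):
    return truncated << drop        # lossy: the truncated low bits stay zero

for v in (7, 300):
    t, d = compress_param(v)
    print(v, "->", decompress_param(t, d))   # 7 -> 4, 300 -> 300
```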
  • Patent number: 11620169
    Abstract: When communicating through shared memory, a producer thread generates a value that is written to a location in a shared memory. The value is read from the shared memory by a consumer thread. The challenge is to ensure that the consumer thread reads the location only after the value is written and is thereby synchronized. When a memory location is written by a producer thread, a flag that is simultaneously stored in the memory location along with the value is toggled. The consumer thread tracks information to determine whether the flag stored in the location indicates whether the producer has written the value to the location. The flag is read and written simultaneously with reading and writing the location in memory, thereby eliminating the need for a memory fence. After all of the consumer threads read the value, the location may be reused to write additional value(s) and simultaneously toggle the flag.
    Type: Grant
    Filed: March 13, 2020
    Date of Patent: April 4, 2023
    Assignee: NVIDIA Corporation
    Inventor: Vasily Volkov
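    A software sketch of the value-plus-flag idea: a value and a 1-bit phase flag are packed into one shared word so that a single write publishes both, and the consumer spins until the flag matches the phase it expects. The acknowledgment counter used here to model slot reuse is an addition of this sketch, not part of the patent:

```python
import threading

SLOT = [0]          # simulated shared word: (value << 1) | phase_flag
ACK  = [0]          # how many values the consumer has finished reading

def producer(values):
    phase = 1                             # toggles on each reuse of the slot
    for i, v in enumerate(values):
        while ACK[0] < i:                 # slot still in use by the consumer
            pass
        SLOT[0] = (v << 1) | phase        # value and flag written together
        phase ^= 1

def consumer(n, out):
    phase = 1
    for i in range(n):
        while (SLOT[0] & 1) != phase:     # wait for the producer's toggle
            pass
        out.append(SLOT[0] >> 1)
        ACK[0] = i + 1
        phase ^= 1

out = []
t = threading.Thread(target=consumer, args=(3, out))
t.start()
producer([10, 20, 30])
t.join()
print(out)   # [10, 20, 30]
```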
  • Patent number: 11539812
    Abstract: Provided are a method and device for generating a notification and a method and device for subscribing a notification message. The method for generating a notification (300) includes: receiving a subscription request (S310); creating a first subscription resource according to the subscription request, the first subscription resource including a plurality of first event notification criteria and a second event notification criterion (S320); receiving a plurality of first events generated according to the plurality of first event notification criteria (S330); determining whether the plurality of first events satisfy the second event notification criterion (S340); and generating a notification in a case where the plurality of first events satisfies the second event notification criterion, the notification indicating a second event (S350).
    Type: Grant
    Filed: November 14, 2017
    Date of Patent: December 27, 2022
    Assignee: BOE TECHNOLOGY GROUP CO., LTD.
    Inventors: Zhenpeng Guo, Junjie Zhao
  • Patent number: 11392725
    Abstract: Provided are a security processor for performing a remainder operation by using a random number and an operating method of the security processor. The security processor includes a random number generator configured to generate a first random number, and a modular calculator configured to generate a first random operand based on first data and the first random number and to generate output data through a remainder operation on the first random operand, wherein a result value of the remainder operation on the first data is identical to a result value of the remainder operation on the first random operand.
    Type: Grant
    Filed: August 9, 2019
    Date of Patent: July 19, 2022
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Jae-hyeok Kim, Jong-hoon Shin, Ji-su Kang, Hyun-il Kim, Hye-soo Lee, Hong-mook Choi
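    One standard blinding construction with the stated property, shown as a sketch: adding a random multiple of the modulus to the data yields a random operand whose remainder equals that of the original data. The patent's actual random-operand construction may differ:

```python
import secrets

def blinded_mod(x: int, m: int) -> int:
    """Randomized remainder computation: the value actually reduced differs on
    every call, but the result equals x % m (standard blinding sketch)."""
    r = secrets.randbelow(1 << 64)       # first random number
    random_operand = x + r * m           # first random operand
    return random_operand % m            # same remainder as x % m

x, m = 123456789, 97
assert blinded_mod(x, m) == x % m
print(blinded_mod(x, m), x % m)          # 39 39
```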
  • Patent number: 11275559
    Abstract: Certain aspects of the present disclosure are directed to methods and apparatus for circular floating point addition. An example method generally includes obtaining a first floating point number represented by a first significand and a first exponent, obtaining a second floating point number represented by a second significand and second exponent, and adding the first floating point number and the second floating point number using a circular accumulator device.
    Type: Grant
    Filed: July 2, 2020
    Date of Patent: March 15, 2022
    Assignee: Qualcomm Incorporated
    Inventor: Aaron Douglass Lamb
  • Patent number: 11216250
    Abstract: A method includes providing a set of one or more computational units implemented in a set of one or more field programmable gate array (FPGA) devices, where the set of one or more computational units is configured to generate a plurality of output values based on one or more input values. The method further includes, for each computational unit of the set of computational units, performing a first calculation in the computational unit using a first number representation, where a first output of the plurality of output values is based on the first calculation, determining a second number representation based on the first output value, and performing a second calculation in the computational unit using the second number representation, where a second output of the plurality of output values is based on the second calculation.
    Type: Grant
    Filed: December 6, 2017
    Date of Patent: January 4, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Nicholas P. Malaya, Elliot H. Mednick
  • Patent number: 11157275
    Abstract: The present disclosure relates to systems and methods that provide a reconfigurable cryptographic coprocessor. An example system includes an instruction memory configured to provide ARX instructions and mode control instructions. The system also includes an adjustable-width arithmetic logic unit, an adjustable-width rotator, and a coefficient memory. A bit width of the adjustable-width arithmetic logic unit and a bit width of the adjustable-width rotator are adjusted according to the mode control instructions. The coefficient memory is configured to provide variable-width words to the arithmetic logic unit and the rotator. The arithmetic logic unit and the rotator are configured to carry out the ARX instructions on the provided variable-width words. The systems and methods described herein could accelerate various applications, such as deep learning, by assigning one or more of the disclosed reconfigurable coprocessors to work as a central computation unit in a neural network.
    Type: Grant
    Filed: July 3, 2018
    Date of Patent: October 26, 2021
    Assignees: The Board of Trustees of the University of Illinois, University of Virginia Patent Foundation
    Inventors: Mohamed E Aly, Wen-Mei W. Hwu, Kevin Skadron
  • Patent number: 11145338
    Abstract: A semiconductor memory device includes a storage, a buffer, and a control logic. The storage stores a first algorithm data. The buffer stores a second algorithm data that is at least partially different from the first algorithm data. The control logic is configured to selectively receive the first algorithm data and the second algorithm data.
    Type: Grant
    Filed: April 28, 2020
    Date of Patent: October 12, 2021
    Assignee: SK hynix Inc.
    Inventors: Geonu Kim, Yong Soon Park, Won Sun Park
  • Patent number: 11048661
    Abstract: A dataflow accelerator including a control/command core, a scratchpad and a coarse grain reconfigurable array (CGRA) according to an exemplary embodiment is disclosed. The scratchpad may include a write controller to transmit data to an input vector port interface and to receive data from the input vector port interface. The CGRA may receive data from the input vector port interface and includes a plurality of interconnects and a plurality of functional units.
    Type: Grant
    Filed: April 15, 2019
    Date of Patent: June 29, 2021
    Assignee: SIMPLE MACHINES INC.
    Inventors: Karthikeyan Sankaralingam, Anthony Nowatzki, Vinay Gangadhar, Preyas Shah, Newsha Ardalani
  • Patent number: 10970042
    Abstract: An integrated circuit with specialized processing blocks is provided. A specialized processing block may be optimized for machine learning algorithms and may include a multiplier data path that feeds an adder data path. The multiplier data path may be decomposed into multiple partial product generators, multiple compressors, and multiple carry-propagate adders of a first precision. Results from the carry-propagate adders may be added using a floating-point adder of the first precision. Results from the floating-point adder may be optionally cast to a second precision that is higher or more accurate than the first precision. The adder data path may include an adder of the second precision that combines the results from the floating-point adder with zero, with a general-purpose input, or with other dot product terms. Operated in this way, the specialized processing block provides a technical improvement of greatly increasing the functional density for implementing machine learning algorithms.
    Type: Grant
    Filed: September 27, 2018
    Date of Patent: April 6, 2021
    Assignee: Intel Corporation
    Inventors: Martin Langhammer, Dongdong Chen, Kevin Hurd
  • Patent number: 10915319
    Abstract: An image processor is described. The image processor includes a two dimensional shift register array that couples certain ones of its array locations to support execution of a shift instruction. The shift instruction is to include mask information. The mask information is to specify which of the array locations are to be written to with information being shifted. The two dimensional shift register array includes masking logic circuitry to write the information being shifted into specified ones of the array locations in accordance with the mask information.
    Type: Grant
    Filed: May 15, 2017
    Date of Patent: February 9, 2021
    Assignee: Google LLC
    Inventor: Albert Meixner
  • Patent number: 10705845
    Abstract: A processor includes a core to execute an instruction for conversion between an element array and a packed bit array. The core includes logic to identify one or more bit-field lengths to be used by the packed bit array, identify a width of elements of the element array, and simultaneously for elements of the element array and for bit-fields of the packed bit array, convert between the element array and the packed bit array based upon the bit-field length and the width of elements of the element array.
    Type: Grant
    Filed: June 18, 2018
    Date of Patent: July 7, 2020
    Assignee: Intel IP Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Thomas Willhalm, Robert Valentine
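    A serial software sketch of the element-array/packed-bit-array conversion; the instruction performs this for all elements and bit-fields simultaneously, and the function names and 3-bit field width below are illustrative:

```python
def pack_bits(elements, field_bits):
    """Pack the low `field_bits` bits of each element into one integer."""
    packed, mask = 0, (1 << field_bits) - 1
    for i, e in enumerate(elements):
        packed |= (e & mask) << (i * field_bits)
    return packed

def unpack_bits(packed, field_bits, count):
    """Inverse conversion: extract `count` bit-fields back into elements."""
    mask = (1 << field_bits) - 1
    return [(packed >> (i * field_bits)) & mask for i in range(count)]

vals = [5, 2, 7, 1]                       # e.g. wide elements, 3-bit fields
p = pack_bits(vals, 3)
print(bin(p), unpack_bits(p, 3, 4))       # 0b1111010101 [5, 2, 7, 1]
```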
  • Patent number: 10656951
    Abstract: A processing element is implemented in a stage of a pipeline and configured to execute an instruction. A first array of multiplexers is to provide information associated with the instruction to the processing element in response to the instruction being in a first set of instructions. A second array of multiplexers is to provide information associated with the instruction to the first processing element in response to the instruction being in a second set of instructions. A control unit is to gate at least one of power or a clock signal provided to the first array of multiplexers in response to the instruction being in the second set.
    Type: Grant
    Filed: October 20, 2017
    Date of Patent: May 19, 2020
    Assignees: ADVANCED MICRO DEVICES, INC., ADVANCED MICRO DEVICES (SHANGHAI) CO., LTD.
    Inventors: Jiasheng Chen, YunXiao Zou, Bin He, Angel E. Socarras, QingCheng Wang, Wei Yuan, Michael Mantor
  • Patent number: 10649830
    Abstract: It is determined whether the arithmetic operation function of a device to be inspected is normal. An MCU 13 to be inspected acquires a constant to be used for an arithmetic problem from a power source IC 12 on the inspection side. The MCU 13 sequentially selects a plurality of arithmetic problems and carries out an arithmetic operation using the acquired constant according to the selected arithmetic problem. A monitoring circuit 23 of the power source IC 12 receives the result of the arithmetic operation for the arithmetic problem from the MCU 13. The monitoring circuit 23 compares the received arithmetic operation result with the result for the same arithmetic problem calculated on the monitoring circuit 23 side. Based on the comparison result, the monitoring circuit 23 determines whether the arithmetic operation function of the MCU 13 works normally.
    Type: Grant
    Filed: February 20, 2018
    Date of Patent: May 12, 2020
    Assignee: RENESAS ELECTRONICS CORPORATION
    Inventor: Seiichi Kousokabe
  • Patent number: 10613987
    Abstract: In some embodiments, a system includes an execution unit, a register file, an operand cache, and a predication control circuit. Operands identified by an instruction may be stored in the operand cache. One or more entries of the operand cache that store the operands may be marked as dirty. The predication control circuit may identify an instruction as having an unresolved predication state. Subsequent to initiating execution of the instruction, the predication control circuit may receive results of the at least one unresolved conditional instruction. In response to the results indicating the instruction has a known-to-execute predication state, the predication control circuit may initiate writing, in the operand cache, results of executing the instruction. In response to the results indicating the instruction has a known-not-to-execute predication state, the predication control circuit may prevent the results of executing the instruction from being written in the operand cache.
    Type: Grant
    Filed: September 23, 2016
    Date of Patent: April 7, 2020
    Assignee: Apple Inc.
    Inventors: Andrew M. Havlir, Terence M. Potter
  • Patent number: 10606559
    Abstract: In an example, an apparatus comprises a plurality of execution units and logic, at least partially including hardware logic, to gate at least one of a multiply unit or an accumulate unit in response to an input of value zero. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: June 12, 2019
    Date of Patent: March 31, 2020
    Assignee: INTEL CORPORATION
    Inventors: Yaniv Fais, Tomer Bar-On, Jacob Subag, Jeremie Dreyfuss, Lev Faivishevsky, Michael Behar, Amit Bleiweiss, Guy Jacob, Gal Leibovich, Itamar Ben-Ari, Galina Ryvchin, Eyal Yaacoby
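    The gating condition itself is simple; a sketch of the equivalent software behavior (in hardware the multiply and accumulate units would be clock- or power-gated rather than branched around):

```python
def mac(acc, a, b):
    """Skip the multiply/accumulate work when either input is zero, since the
    accumulator value cannot change; models the zero-input gating condition."""
    if a == 0 or b == 0:
        return acc                  # gated: no multiply, no accumulate
    return acc + a * b

print(mac(10, 0, 5), mac(10, 3, 5))   # 10 25
```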
  • Patent number: 10530560
    Abstract: In an embodiment, an integrated circuit (IC) device is disclosed. In the embodiment, the IC device includes an Ethernet frame processor, at least one Ethernet port coupled to the Ethernet frame processor, and a hardware synchronization circuit coupled to the Ethernet frame processor and to the at least one Ethernet port, the hardware synchronization circuit including a controller, a local clock, a media-independent peripheral coupled to the controller, and a media-dependent peripheral coupled to the media-independent peripheral, wherein power can be provided to the hardware synchronization circuit independent of the Ethernet frame processor.
    Type: Grant
    Filed: June 20, 2016
    Date of Patent: January 7, 2020
    Assignee: NXP B.V.
    Inventors: Hubertus Gerardus Hendrikus Vermeulen, Nicola Concer
  • Patent number: 10474458
    Abstract: One embodiment provides for a machine-learning hardware accelerator comprising a compute unit having an adder and a multiplier that are shared between an integer datapath and a floating-point datapath, the upper bits of input operands to the multiplier to be gated during floating-point operation.
    Type: Grant
    Filed: October 18, 2017
    Date of Patent: November 12, 2019
    Assignee: Intel Corporation
    Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
  • Patent number: 10416997
    Abstract: A method and apparatus for including in processor instructions for performing logical-comparison and branch support operations on packed or unpacked data. In one embodiment, instruction decode logic decodes instructions for an execution unit to operate on packed data elements including logical comparisons. A register file including 128-bit packed data registers stores packed single-precision floating point (SPFP) and packed integer data elements. The logical comparisons may include comparison of SPFP data elements and comparison of integer data elements and setting at least one bit to indicate the results. Based on these comparisons, branch support actions are taken. Such branch support actions may include setting the at least one bit, which in turn may be utilized by a branching unit in response to a branch instruction. Alternatively, the branch support actions may include branching to an indicated target code location.
    Type: Grant
    Filed: October 18, 2018
    Date of Patent: September 17, 2019
    Assignee: Intel Corporation
    Inventors: Rajiv Kapoor, Ronen Zohar, Mark J. Buxton, Zeev Sperber, Koby Gottlieb
  • Patent number: 10157164
    Abstract: Aspects disclosed herein relate to aggregating functionality of computer machine instructions to generate additional computer machine instructions and including the additional computer machine instructions in an instruction set architecture (ISA). An exemplary method includes selecting at least first and second computer machine instructions from an instruction set, aggregating functionality of the first and second computer machine instructions to generate a third computer machine instruction, and adding the third computer machine instruction to the instruction set.
    Type: Grant
    Filed: September 20, 2016
    Date of Patent: December 18, 2018
    Assignee: QUALCOMM Incorporated
    Inventors: Sangyeol Kang, Ovidiu Cristian Miclea, Stephen Michael Verrall
  • Patent number: 10127015
    Abstract: An instruction to perform a multiply and shift operation is executed. The executing includes multiplying a first value and a second value obtained by the instruction to obtain a product. The product is shifted in a specified direction by a user-defined selected amount to provide a result, and the result is placed in a selected location. The result is to be used in processing within the computing environment.
    Type: Grant
    Filed: September 30, 2016
    Date of Patent: November 13, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan D. Bradbury, Steven R. Carlough, Reid T. Copeland, Silvia Melitta Mueller
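    A sketch of the fused multiply-and-shift semantics, with a fixed-point scaling example; the function name and Q-format usage are illustrative, not from the patent:

```python
def multiply_shift(a, b, shift, direction="right"):
    """Form the full product, then shift it by a user-selected amount in the
    specified direction; commonly used for fixed-point rescaling."""
    product = a * b
    return product >> shift if direction == "right" else product << shift

# Q1.15 fixed-point example: 0.5 * 0.25 = 0.125
half, quarter = 1 << 14, 1 << 13
print(multiply_shift(half, quarter, 15))   # 4096 == 0.125 in Q1.15
```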
  • Patent number: 10025557
    Abstract: An 8×8 binary digital multiplier reduces the height of partial product columns to be no more than 7 bits high. The six 7-bit high middle columns are each input to a (7:3) counter. An ascending triangle compressor operates on the lesser significant bit columns. A descending triangle compressor operates on the greater significant bit columns. The counter and compressor outputs are combined for a final stage of compression, followed by partial product addition.
    Type: Grant
    Filed: December 5, 2015
    Date of Patent: July 17, 2018
    Assignee: Firefly DSP LLC
    Inventors: Craig Franklin, David Cureton Baker
  • Patent number: 10007518
    Abstract: The number of registers required is reduced by overlapping scalar and vector registers. This also allows increased compiler flexibility when mixing scalar and vector instructions. Local register read ports are minimized by restricting read access. Dedicated predicate registers reduce the requirements for general registers, and allow reduction of critical timing paths by allowing the predicate registers to be placed next to the predicate unit.
    Type: Grant
    Filed: July 9, 2014
    Date of Patent: June 26, 2018
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Timothy David Anderson, Duc Quang Bui, Mel Alan Phipps, Todd T. Hahn, Joseph Zbiciak
  • Patent number: 9882739
    Abstract: A method for disabling or removing a Legacy loop detect circuit to prevent the circuit from erroneously detecting a legacy loop during an IEEE-1394 serial bus initialization. The method includes providing a programmable code to the Legacy loop detect circuit for increasing a reset count to a value greater than three (3), thus reducing the probability of an erroneous disconnect of a Beta node connection. This method provides for more robust Beta loop node operation during high-frequency bus resets.
    Type: Grant
    Filed: August 30, 2016
    Date of Patent: January 30, 2018
    Assignee: DAP Holding B.U.
    Inventor: Richard Mourn
  • Patent number: 9846579
    Abstract: Techniques are disclosed relating to comparison circuitry. In some embodiments, compare circuitry is configured to generate comparison results for sets of inputs in both one or more integer formats and one or more floating-point formats. In some embodiments, the compare circuitry includes padding circuitry configured to add one or more bits to each of first and second input values to generate first and second padded values. In some embodiments, the compare circuitry also includes integer subtraction circuitry configured to subtract the first padded value from the second padded value to generate a subtraction result. In some embodiments, the compare circuitry includes output logic configured to generate the comparison result based on the subtraction result. In various embodiments, using at least a portion of the same circuitry (e.g., the subtractor) for both integer and floating-point comparisons may reduce processor area.
    Type: Grant
    Filed: June 13, 2016
    Date of Patent: December 19, 2017
    Assignee: Apple Inc.
    Inventors: Liang-Kai Wang, Terence M. Potter, Andrew M. Havlir
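    Why one integer subtractor can order both formats can be illustrated with the classic sign-flip mapping shown below; this is a well-known software analogue, not the patent's padding scheme:

```python
import struct

def float_key(x: float) -> int:
    """Map a 32-bit float's bit pattern to an unsigned integer whose ordering
    matches the float ordering, so a single integer subtract handles floats."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits ^ 0xFFFFFFFF if bits & 0x80000000 else bits | 0x80000000

def compare(a: float, b: float) -> int:
    d = float_key(a) - float_key(b)       # one integer subtraction
    return (d > 0) - (d < 0)              # -1, 0, or 1

print(compare(-1.5, 2.0), compare(2.0, 2.0), compare(3.25, 0.5))   # -1 0 1
```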
  • Patent number: 9710318
    Abstract: An electronic circuit includes a microcontroller processor (410), a peripheral (420) coupled with the processor, an endian circuit (470) coupled with the processor and the peripheral to selectively provide different endianness modes of operation, and a detection circuit (140) to detect a failure to select a given endianness, whereby an inadvertent switch of endianness due to faults is avoided. Other circuits, devices, systems, methods of operation and processes of manufacture are also disclosed.
    Type: Grant
    Filed: January 22, 2015
    Date of Patent: July 18, 2017
    Assignee: Texas Instruments Incorporated
    Inventors: Yanyang Xiao, Alexandre Pierre Palus, Karl Friedrich Greb, Kevin Patrick Lavery, Paul Krause
  • Patent number: 9678716
    Abstract: An apparatus comprises processing circuitry for performing an absolute difference operation that generates an absolute difference value in response to a first operand and a second operand. The processing circuitry supports variable data element sizes for data elements of the first and second operands and the absolute difference value. Each data element of the absolute difference value represents an absolute difference between corresponding data elements of the first and second operands. The processing circuitry has an adding stage for performing at least one addition to generate at least one intermediate value and an inverting stage for inverting selected bits of each intermediate value. Control circuitry generates control information based on the current data element size and status information generated in the adding stage, to identify the selected bits to be inverted in the inverting stage to convert each intermediate value into a corresponding portion of the absolute difference value.
    Type: Grant
    Filed: December 22, 2014
    Date of Patent: June 13, 2017
    Assignee: ARM Limited
    Inventors: Neil Burgess, David Raymond Lutz
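    A sketch of the add-then-invert identity behind this kind of absolute-difference circuit, for one 8-bit element; real hardware folds the "+1" case into its carry handling rather than performing a second addition:

```python
def absdiff(a: int, b: int, bits: int = 8) -> int:
    """Compute t = a + ~b with one addition; the carry out says whether a > b.
    If a > b the absolute difference is t + 1, otherwise it is the bitwise
    inverse of t.  Illustrates the identity only, not the patented circuit."""
    mask = (1 << bits) - 1
    t = (a + ((~b) & mask)) & mask          # t == (a - b - 1) mod 2**bits
    carry = (a + ((~b) & mask)) >> bits     # 1 iff a > b
    return (t + 1) & mask if carry else (~t) & mask

for a, b in [(9, 3), (3, 9), (200, 200), (0, 255)]:
    assert absdiff(a, b) == abs(a - b)
print("identity holds for all tested pairs")
```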
  • Patent number: 9602302
    Abstract: A method for disabling or removing a Legacy loop detect circuit to prevent the circuit from erroneously detecting a legacy loop during an IEEE-1394 serial bus initialization. The method includes providing a programmable code to the Legacy loop detect circuit for increasing a reset count to a value greater than three (3), thus reducing the probability of an erroneous disconnect of a Beta node connection. This method provides for more robust Beta loop node operation during high-frequency bus resets.
    Type: Grant
    Filed: June 17, 2014
    Date of Patent: March 21, 2017
    Assignee: DAP Holding B.U.
    Inventor: Richard Mourn
  • Patent number: 9477628
    Abstract: A collective communication apparatus and method for parallel computing systems. For example, one embodiment of an apparatus comprises a plurality of processor elements (PEs); collective interconnect logic to dynamically form a virtual collective interconnect (VCI) between the PEs at runtime without global communication among all of the PEs, the VCI defining a logical topology between the PEs in which each PE is directly communicatively coupled to only a subset of the remaining PEs; and execution logic to execute collective operations across the PEs, wherein one or more of the PEs receive first results from a first portion of the subset of the remaining PEs, perform a portion of the collective operations, and provide second results to a second portion of the subset of the remaining PEs.
    Type: Grant
    Filed: September 28, 2013
    Date of Patent: October 25, 2016
    Assignee: Intel Corporation
    Inventors: Allan D. Knies, David Pardo Keppel, Dong Hyuk Woo, Joshua B. Fryman
  • Patent number: 9411726
    Abstract: An embodiment includes a system comprising a first memory; a plurality of first circuits, wherein each first circuit is coupled to the first memory and includes a second circuit configured to generate a first output value in response to an input value received from the first memory, and an accumulator configured to receive the first output value and generate a second output value; and a controller coupled to the first memory and the first circuits, and configured to determine the input values to be transmitted from the first memory to the first circuits.
    Type: Grant
    Filed: May 14, 2015
    Date of Patent: August 9, 2016
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Ilia Ovsiannikov, Zhengping Ji, Yibing Michelle Wang, Hongyu Wang
  • Patent number: 9311273
    Abstract: A segment including a set of blocks necessary to calculate blocks having internal states and blocks having no outputs is extracted by tracing from blocks for use in calculating inputs into the blocks having internal states and from the blocks having no outputs in the reverse direction of dependence. To newly extract segments in which blocks contained in the extracted segments are removed, a set of nodes to be temporarily removed is determined on the basis of parallelism. Segments executable independently of other segments are extracted by tracing from nodes whose child nodes are lost by removal of the nodes in the upstream direction. Segments are divided into upstream segments representing the newly extracted segments and downstream segments representing nodes temporarily removed. Upstream and downstream segments are merged so as to reduce overlapping blocks between segments such that the number of segments is reduced to the number of parallel executions.
    Type: Grant
    Filed: July 26, 2013
    Date of Patent: April 12, 2016
    Assignee: International Business Machines Corporation
    Inventors: Shuhichi Shimizu, Takeo Yoshizawa
  • Patent number: 9218317
    Abstract: A segment including a set of blocks necessary to calculate blocks having internal states and blocks having no outputs is extracted by tracing from blocks for use in calculating inputs into the blocks having internal states and from the blocks having no outputs in the reverse direction of dependence. To newly extract segments in which blocks contained in the extracted segments are removed, a set of nodes to be temporarily removed is determined on the basis of parallelism. Segments executable independently of other segments are extracted by tracing from nodes whose child nodes are lost by removal of the nodes in the upstream direction. Segments are divided into upstream segments representing the newly extracted segments and downstream segments representing nodes temporarily removed. Upstream and downstream segments are merged so as to reduce overlapping blocks between segments such that the number of segments is reduced to the number of parallel executions.
    Type: Grant
    Filed: August 21, 2013
    Date of Patent: December 22, 2015
    Assignee: International Business Machines Corporation
    Inventors: Shuhichi Shimizu, Takeo Yoshizawa
  • Patent number: 9207941
    Abstract: Systems, methods, and apparatuses for calculating a square of a data value of a first source operand, a square of a data value of a second source operand, and a multiplication of the data of the first and second operands only using one multiplication are described.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: December 8, 2015
    Assignee: Intel Corporation
    Inventors: Ilya Albrekht, Elmoustapha Ould-Ahmed-Vall
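    One way a single multiplication can yield both squares and the cross product is the padding-and-squaring identity sketched below; this is illustrative arithmetic only, not necessarily the patented datapath:

```python
def square_square_product(a: int, b: int, bits: int = 16):
    """Concatenate a and b with enough zero padding and square once:
    (a*2**k + b)**2 = a*a*2**(2k) + 2*a*b*2**k + b*b, and with k = 2*bits + 1
    the three terms land in disjoint bit fields of the single product."""
    k = 2 * bits + 1                       # room for b*b and for the 2*a*b term
    x = (a << k) | b                       # pack the two operands
    sq = x * x                             # the single multiplication
    bb = sq & ((1 << k) - 1)
    ab2 = (sq >> k) & ((1 << k) - 1)
    aa = sq >> (2 * k)
    return aa, ab2 >> 1, bb                # a*a, a*b, b*b

a, b = 1234, 567
assert square_square_product(a, b) == (a * a, a * b, b * b)
print(square_square_product(a, b))         # (1522756, 699678, 321489)
```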
  • Patent number: 9104510
    Abstract: Arithmetic units and methods for floating point processing are provided. In exemplary embodiments, data paths to and from multiple multipliers and adders are flexibly combined through crossbars and alignment units to allow a wide range of mathematical operations, including affine and SIMD operations. The micro-architecture for a high-performance flexible vector floating point arithmetic unit is provided, which can perform a single-cycle throughput complex multiply-and-accumulate operation, as well as a Fast Fourier Transform (radix-2 decimation-in-time) Butterfly operation.
    Type: Grant
    Filed: April 30, 2010
    Date of Patent: August 11, 2015
    Assignee: Audience, Inc.
    Inventors: Leonardo Rub, Dana Massie, Samuel Dicker
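    For reference, the radix-2 decimation-in-time butterfly mentioned in the abstract reduces to one complex multiply feeding a complex add and a complex subtract (software sketch only):

```python
import cmath

def butterfly(a: complex, b: complex, w: complex):
    """Radix-2 DIT FFT butterfly: t = b*w, outputs a + t and a - t."""
    t = b * w
    return a + t, a - t

w = cmath.exp(-2j * cmath.pi / 8)          # twiddle factor W_8^1
print(butterfly(1 + 0j, 0 + 1j, w))
```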
  • Patent number: 8793299
    Abstract: A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data store the result of performing multiply-add operations on data elements in the first and second packed data.
    Type: Grant
    Filed: March 13, 2013
    Date of Patent: July 29, 2014
    Assignee: Intel Corporation
    Inventors: Alexander Peleg, Millind Mittal, Larry M. Mennemeier, Benny Eitan, Carole Dulong, Eiichi Kowashi, Wolf C. Witt
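    A software sketch of packed multiply-add semantics in the style described here (analogous to x86 PMADDWD, used only as a familiar reference): corresponding elements are multiplied and adjacent products are summed into wider result elements:

```python
def packed_multiply_add(src1, src2):
    """Multiply corresponding elements, then add each adjacent pair of
    products to form a wider result element."""
    products = [a * b for a, b in zip(src1, src2)]
    return [products[i] + products[i + 1] for i in range(0, len(products), 2)]

# Four 16-bit elements per operand -> two 32-bit results
print(packed_multiply_add([1, 2, 3, 4], [5, 6, 7, 8]))   # [17, 53]
```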
  • Patent number: 8756267
    Abstract: According to some embodiments, a device is configured to perform a dual multiply-accumulate operation. In one embodiment, the device includes a functional unit configured to calculate, in parallel, a first multiplication product of a first coefficient and a first sample; and a second multiplication product of the first coefficient and a second sample. The first sample is an (n)th sample and the second sample is an (n+2)th sample in a plurality of sequential samples. The functional unit outputs and stores the first multiplication product and the second multiplication product in different storage locations in at least one storage device.
    Type: Grant
    Filed: October 31, 2011
    Date of Patent: June 17, 2014
    Assignee: Marvell International Ltd.
    Inventors: Bradley Aldrich, Nigel C. Paver, William T. Maghielse
  • Patent number: 8745119
    Abstract: A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data store the result of performing multiply-add operations on data elements in the first and second packed data.
    Type: Grant
    Filed: March 13, 2013
    Date of Patent: June 3, 2014
    Assignee: Intel Corporation
    Inventors: Alexander Peleg, Millind Mittal, Larry M. Mennemeier, Benny Eitan, Carole Dulong, Eiichi Kowashi, Wolf C. Witt
  • Patent number: 8725787
    Abstract: A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data store the result of performing multiply-add operations on data elements in the first and second packed data.
    Type: Grant
    Filed: April 26, 2012
    Date of Patent: May 13, 2014
    Assignee: Intel Corporation
    Inventors: Alexander D. Peleg, Millind Mittal, Larry M. Mennemeier, Benny Eitan, Carole Dulong, Eiichi Kowashi, Wolf Witt
  • Publication number: 20140095571
    Abstract: A processor or other device, such as a programmable and/or massively parallel processor or other device, includes processing elements designed to perform arithmetic operations (possibly but not necessarily including, for example, one or more of addition, multiplication, subtraction, and division) on numerical values of low precision but high dynamic range (“LPHDR arithmetic”). Such a processor or other device may, for example, be implemented on a single chip. Whether or not implemented on a single chip, the number of LPHDR arithmetic elements in the processor or other device in certain embodiments of the present invention significantly exceeds (e.g., by at least 20 more than three times) the number of arithmetic elements, if any, in the processor or other device which are designed to perform high dynamic range arithmetic of traditional precision (such as 32 bit or 64 bit floating point arithmetic).
    Type: Application
    Filed: March 25, 2013
    Publication date: April 3, 2014
    Inventor: Joseph Bates
  • Patent number: 8687008
    Abstract: A latency tolerant system for executing video processing operations. The system includes a host interface for implementing communication between the video processor and a host CPU, a scalar execution unit coupled to the host interface and configured to execute scalar video processing operations, and a vector execution unit coupled to the host interface and configured to execute vector video processing operations. A command FIFO is included for enabling the vector execution unit to operate on a demand driven basis by accessing the memory command FIFO. A memory interface is included for implementing communication between the video processor and a frame buffer memory. A DMA engine is built into the memory interface for implementing DMA transfers between a plurality of different memory locations and for loading the command FIFO with data and instructions for the vector execution unit.
    Type: Grant
    Filed: November 4, 2005
    Date of Patent: April 1, 2014
    Assignee: NVIDIA Corporation
    Inventors: Ashish Karandikar, Shirish Gadre, Stephen D. Lew
  • Patent number: 8626814
    Abstract: A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data store the result of performing multiply-add operations on data elements in the first and second packed data.
    Type: Grant
    Filed: July 1, 2011
    Date of Patent: January 7, 2014
    Assignee: Intel Corporation
    Inventors: Alexander Peleg, Millind Mittal, Larry M. Mennemeier, Benny Eitan, Carole Dulong, Eiichi Kowashi, Wolf C. Witt
  • Patent number: 8626815
    Abstract: In a matrix multiplication in which each element of the resultant matrix is the dot product of a row of a first matrix and a column of a second matrix, each row and column can be broken into manageable blocks, with each block loaded in turn to compute a smaller dot product, and then the results can be added together to obtain the desired row-column dot product. The earliest results for each dot product are saved for a number of clock cycles equal to the number of portions into which each row or column is divided. The results are then added to provide an element of the resultant matrix. To avoid repeated loading and unloading of the same data, all multiplications involving a particular row-block can be performed upon loading that row-block, with the results cached until other multiplications for the resultant elements that use the cached results are complete.
    Type: Grant
    Filed: March 3, 2009
    Date of Patent: January 7, 2014
    Assignee: Altera Corporation
    Inventor: Martin Langhammer
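    A sketch of the row/column blocking described above: each row-column dot product is computed as a sum of per-block partial dot products (the block size and names are illustrative):

```python
def blocked_dot(row, col, block):
    """Split the row and column into `block`-sized chunks, compute a partial
    dot product per chunk, then add the partial results together."""
    partials = [
        sum(r * c for r, c in zip(row[i:i + block], col[i:i + block]))
        for i in range(0, len(row), block)
    ]
    return sum(partials)

row = [1, 2, 3, 4, 5, 6]
col = [6, 5, 4, 3, 2, 1]
print(blocked_dot(row, col, block=2))   # 56, same as the unblocked dot product
```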
  • Patent number: 8619312
    Abstract: An image processing apparatus includes: a receiving unit that receives job data of plural pages; plural RIP processors that interpret and expand the job data into raster images; and an allocating unit that allocates the plural pages of the job data to the plural RIP processors for RIP processing. The allocating unit divides the job data based on a predetermined data size regardless of page breaks, allocates the job data that is to be RIP processed to the plural RIP processors, and sends job data corresponding to and after the pages of the data that is to be RIP processed to the plural RIP processors. When the head part of the job data allocated for RIP processing by the allocating unit falls in the middle of a page, the plural RIP processors RIP process the job data from the beginning of the next page.
    Type: Grant
    Filed: September 9, 2010
    Date of Patent: December 31, 2013
    Assignee: Fuji Xerox Co., Ltd.
    Inventor: Takuya Mizuguchi
  • Patent number: 8620985
    Abstract: A method is disclosed that includes computing, using at least one uniformly fine-grain data parallel computing unit, a mean-square error regression within a regression clustering algorithm. The mean-square error regression is represented in the form of at least one summation of a vector-vector multiplication. A computer program product and a computer system are also disclosed.
    Type: Grant
    Filed: October 14, 2010
    Date of Patent: December 31, 2013
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Bin Zhang, Ren Wu, Meichun Hsu
  • Patent number: 8615541
    Abstract: The invention set forth herein describes a mechanism for efficiently performing extended precision operations on multi-word source operands. Corresponding data words of the source operands are processed together via each instruction of a cascading sequence of instructions. State information generated when each instruction is processed is stored in condition code flags. The state information is optionally used in the processing of subsequent instructions in the sequence and/or accumulated with previously set state information.
    Type: Grant
    Filed: September 23, 2010
    Date of Patent: December 24, 2013
    Assignee: NVIDIA Corporation
    Inventors: Richard Craig Johnson, John R. Nickolls
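    The cascading pattern can be sketched as a word-wise add in which each step's carry (the state held in condition-code flags) feeds the next instruction in the sequence:

```python
def extended_add(a_words, b_words, word_bits=32):
    """Process corresponding words least-significant first; the carry produced
    by each step is consumed by the next, as the condition-code flags would be."""
    mask, carry, out = (1 << word_bits) - 1, 0, []
    for a, b in zip(a_words, b_words):      # low word first
        s = a + b + carry
        out.append(s & mask)
        carry = s >> word_bits
    return out, carry

# 64-bit add expressed as two 32-bit words (low word first)
print(extended_add([0xFFFFFFFF, 0x00000001], [0x00000001, 0x00000002]))
# ([0, 4], 0)  ->  0x00000004_00000000
```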
  • Patent number: 8589464
    Abstract: An arithmetic logic unit is provided. The arithmetic logic unit preferably includes a minimum of routing delays. An arithmetic logic unit according to the invention preferably receives a plurality of operands from a plurality of operand registers, performs an arithmetic operation on the operands, obtains a result of the arithmetic operation, and transmits the result to a result register. The arithmetic logic unit includes a signal propagation path that includes no greater than two routing paths that connect non-immediately adjacent logic elements.
    Type: Grant
    Filed: May 8, 2008
    Date of Patent: November 19, 2013
    Assignee: Altera Corporation
    Inventor: Paul J. Metzgen
  • Patent number: 8589465
    Abstract: Digital signal processing (“DSP”) block circuitry on an integrated circuit (“IC”) is adapted for use (e.g., in multiple instances of the DSP block circuitry on the IC) for implementing finite-impulse-response (“FIR”) digital filters in systolic form. Each DSP block may include (1) first and second multiplier circuitry and (2) adder circuitry for adding (a) outputs of the multipliers and (b) signals chained in from a first other instance of the DSP block circuitry. Systolic delay circuitry is provided for either the outputs of the first multiplier (upstream from the adder) or at least one of the sets of inputs to the first multiplier. Additional systolic delay circuitry is provided for outputs of the adder, which are chained out to a second other instance of the DSP block circuitry.
    Type: Grant
    Filed: May 8, 2013
    Date of Patent: November 19, 2013
    Assignee: Altera Corporation
    Inventors: Suleyman Sirri Demirsoy, Hyun Yi
  • Publication number: 20130191431
    Abstract: A processor for calculating a convolution of a first input sequence of numbers with a second input sequence of numbers to generate an output sequence is provided. The processor includes multipliers, each multiplying two real numbers to generate an output; multiplexers to direct the numbers in the first and second input sequences or parts of the numbers to the multipliers; and control circuitry to control the multiplexers to direct the first and second input sequences of numbers to the multipliers dependent on whether the numbers are complex or real. An accumulator adds partial products from multiplications performed by the multipliers to calculate the convolution.
    Type: Application
    Filed: January 19, 2012
    Publication date: July 25, 2013
    Inventors: Srinivasan Iyer, Carsten Aagaard Pedersen
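    A direct-form sketch of the real-valued case of the convolution being computed, showing the partial products that the accumulator sums; the complex/real multiplexing the abstract describes is omitted:

```python
def convolve(x, h):
    """Direct-form convolution: each output sample accumulates the partial
    products x[k] * h[n - k]."""
    y = [0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(x)):
            if 0 <= n - k < len(h):
                y[n] += x[k] * h[n - k]
    return y

print(convolve([1, 2, 3], [1, 1]))   # [1, 3, 5, 3]
```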