Shifting Patents (Class 708/209)

Patent number: 10409592
Abstract: An apparatus has processing circuitry comprising an L×M multiplier array. An instruction decoder associated with the processing circuitry supports a multiply-and-accumulate-product (MAP) instruction for generating at least one result element corresponding to a sum of respective E×F products of E-bit and F-bit portions of J-bit and K-bit operands respectively, where 1<E<J≤L and 1<F<K≤M. In response to the MAP instruction, the instruction decoder controls the processing circuitry to rearrange F-bit portions of the second K-bit operand to form a transformed K-bit operand, and to control the L×M multiplier array in dependence on the first J-bit operand and the transformed K-bit operand to add the respective E×F products using a subset of the adders used for accumulating partial products for a conventional multiplication.
Type: Grant
Filed: April 24, 2017
Date of Patent: September 10, 2019
Assignee: ARM Limited
Inventors: Neil Burgess, David Raymond Lutz, Javier Diaz Bruguera
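
As a rough illustration of the arithmetic result the MAP abstract describes (not of the multiplier-array circuitry itself), the following Python sketch splits two operands into equal-count portions and sums the pairwise products. The function names, the default widths, and the least-significant-first pairing of portions are assumptions made for this example only.

```python
def split_portions(value, total_bits, portion_bits):
    """Split an unsigned value into fixed-width portions, least significant first."""
    mask = (1 << portion_bits) - 1
    return [(value >> offset) & mask for offset in range(0, total_bits, portion_bits)]


def map_reference(a, b, j_bits=16, k_bits=16, e_bits=8, f_bits=8):
    """Sum of products of corresponding E-bit portions of a and F-bit portions of b.

    Hypothetical reference model: it only reproduces the numeric result,
    not the operand rearrangement or adder reuse described in the abstract.
    """
    a_parts = split_portions(a, j_bits, e_bits)
    b_parts = split_portions(b, k_bits, f_bits)
    return sum(x * y for x, y in zip(a_parts, b_parts))
```

For example, with 16-bit operands and 8-bit portions, `map_reference(0x0203, 0x0405)` pairs 3 with 5 and 2 with 4.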

Patent number: 10366741
Abstract: Circuitry comprises: a set of bit processing circuitries to apply two or more successive instances of bitwise processing to an ordered bit array; each bit processing circuitry for a given bit position within the ordered bit array comprising: bit shifting circuitry to selectively apply a bit shift of a respective input bit to a next bit processing circuitry in a first direction relative to the ordered bit array, in response to an active state of a bit shift control signal, the bit shifting circuitry not applying the bit shift in response to an inactive state of the bit shift control signal; and bit shift control circuitry to selectively allow or inhibit a bit shifting operation in response to one or more inhibit control signals; in which: the bit shift control circuitry is configured to selectively propagate an output inhibit control signal, indicating that a bit shifting operation should be inhibited, as an inhibit control signal to bit processing circuitry applying a next instance of the bitwise processing a
Type: Grant
Filed: September 21, 2017
Date of Patent: July 30, 2019
Assignee: ARM Limited
Inventors: Neil Burgess, Nigel John Stephens, Lee Evan Eisen, Jaime Ferragut Martinez-Vara De Rey

Patent number: 10318298
Abstract: An apparatus and method for performing left-shifting operations on packed quadword data.
Type: Grant
Filed: September 29, 2017
Date of Patent: June 11, 2019
Assignee: Intel Corporation
Inventors: Venkateswara Madduri, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Mark Charney, Jesus Corbal

Patent number: 10296333
Abstract: Vector single instruction multiple data (SIMD) shift and rotate instructions are provided specifying: a destination vector register comprising fields to store vector elements, a first vector register, a vector element size, and a second vector register. Vector data fields of a first element size are duplicated. Duplicate vector data fields are stored as corresponding data fields of twice the first element size. Control logic receives an element size for performing a SIMD shift or rotation operation. Through selectors corresponding to a vector element, portions are selected from the duplicated data fields, the selectors corresponding to any particular vector element select all portions similarly from the duplicated data fields for that particular vector element responsive to the first element size, but selectors corresponding to any particular vector element select at least two portions from the duplicated data fields differently for that particular vector element responsive to a second element size.
Type: Grant
Filed: December 27, 2016
Date of Patent: May 21, 2019
Assignee: Intel Corporation
Inventors: Asaf Rubinstein, Tom Aviram

Patent number: 10289382
Abstract: An apparatus for mathematical manipulation is described allowing the selective combination of shifters to shift binary numbers of various widths. Selective combination allows on-the-fly adjustment of shifters from independent to coordinated shifting operations. Selective combination allows adjustable hardware-based shifting while saving space and resources. Multiple eight-bit shifters can be configured for a variety of operand widths, such as a 32-bit width, a 24-bit width, a 16-bit width, or an eight-bit width. Multiplexers route the appropriate input data to the appropriate shifters. Bidirectional shifting is configured through a selector tree, including both shift left and shift right operations. Opcodes configure the shifters for the desired type of shift and a shifted result is generated.
Type: Grant
Filed: March 30, 2018
Date of Patent: May 14, 2019
Assignee: Wave Computing, Inc.
Inventor: Samit Chaudhuri

Patent number: 10216705
Abstract: A circuit comprises an input register configured to receive an input vector of elements, a control register configured to receive a control vector of elements, wherein each element of the control vector corresponds to a respective element of the input vector, and wherein each element specifies a permutation of a corresponding element of the input vector, and a permute execution circuit configured to generate an output vector of elements corresponding to a permutation of the input vector. Generating each element of the output vector comprises accessing, at the input register, a particular element of the input vector, accessing, at the control register, a particular element of the control vector corresponding to the particular element of the input vector, and outputting the particular element of the input vector as an element at a particular position of the output vector that is selected based on the particular element of the control vector.
Type: Grant
Filed: April 30, 2018
Date of Patent: February 26, 2019
Assignee: Google LLC
Inventors: Dong Hyuk Woo, Gregory Michael Thorson, Andrew Everett Phelps, Olivier Temam, Jonathan Ross, Christopher Aaron Clark
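
The permutation described in this abstract, where each control element selects the output position of its corresponding input element, can be sketched in a few lines of Python. This is only a software model under the assumption that the control vector holds a valid permutation of output indices; the function name `permute` is hypothetical.

```python
def permute(input_vector, control_vector):
    """Place input_vector[i] at output position control_vector[i].

    Assumes control_vector is a permutation of range(len(input_vector));
    the abstract does not say how invalid control values are handled.
    """
    if len(input_vector) != len(control_vector):
        raise ValueError("input and control vectors must have the same length")
    output = [None] * len(input_vector)
    for element, position in zip(input_vector, control_vector):
        output[position] = element
    return output
```

With input `['a', 'b', 'c', 'd']` and control `[2, 0, 3, 1]`, element `'a'` lands at position 2, `'b'` at position 0, and so on.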

Patent number: 10162633
Abstract: An apparatus has processing circuitry comprising multiplier circuitry for performing multiplication on a pair of input operands. In response to a shift instruction specifying at least one shift amount and a source operand comprising at least one data element, the source operand and a shift operand determined in dependence on the shift amount are provided as input operands to the multiplier circuitry and the multiplier circuitry is controlled to perform at least one multiplication which is equivalent to shifting a corresponding data element of the source operand by a number of bits specified by a corresponding shift amount to generate a shift result value.
Type: Grant
Filed: April 24, 2017
Date of Patent: December 25, 2018
Assignee: ARM Limited
Inventors: François Christopher Jacques Botman, Thomas Christopher Grocutt
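
The equivalence this abstract relies on is that shifting by s bits matches multiplying by a power of two. The Python sketch below illustrates that identity for a fixed-width element. Obtaining a right shift from the upper half of a double-width product is an assumption of this example, not something the abstract states, and both function names are hypothetical.

```python
def shift_left_via_multiply(value, shift, width=32):
    """Left shift computed as a multiplication by the shift operand 2**shift."""
    mask = (1 << width) - 1
    shift_operand = 1 << shift
    return ((value & mask) * shift_operand) & mask


def shift_right_via_multiply(value, shift, width=32):
    """Logical right shift taken from the upper half of value * 2**(width - shift).

    Assumed scheme for illustration: the double-width product is kept and
    its high `width` bits are returned.
    """
    mask = (1 << width) - 1
    product = (value & mask) * (1 << (width - shift))
    return product >> width
```

Both functions agree with the ordinary `<<` and `>>` operators for shift amounts from 0 to `width - 1`.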

Patent number: 10126976
Abstract: An address and a data size are provided to a rotator. The rotator stores, based on the address and the data size, a data element in a location having a defined number of positions. The data element includes one or more data units and the one or more data units are aligned correctly in one or more positions of the location based on a predefined position in the location to receive a selected data unit of the one or more data units. The rotator replicates a value of a chosen data unit of the one or more data units to one or more other positions of the location.
Type: Grant
Filed: February 17, 2017
Date of Patent: November 13, 2018
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Michael K. Gschwind

Patent number: 10108397
Abstract: Embodiments of the inventive concept include a fast close path solution and circuit of a three path fused multiply-adder circuit. The fast close path circuit can include one or more compressors that can receive multiple operands and produce a result sum and a result carry. The close path circuit can include one or more leading zero anticipators (LZAs). The one or more LZAs can receive and process the result sum and the result carry. The close path circuit can include one or more adders. The one or more adders can receive and add the result sum and the result carry in parallel with the one or more LZAs processing the result sum and the result carry. Since the close path is the critical timing path, by performing the addition operations in parallel with the LZA and/or priority encode (PENC) operations, the logic depth and latency of the close path are reduced.
Type: Grant
Filed: February 1, 2016
Date of Patent: October 23, 2018
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Ashraf Ahmed

Patent number: 10013258
Abstract: Embodiments are directed to a method of adjusting an index, wherein the index identifies a location of an element within an array. The method includes executing, by a computer, a single instruction that adjusts a first parameter of the index to match a parameter of an array address. The single instruction further adjusts a second parameter of the index to match a parameter of the array element. The adjustment of the first parameter includes a sign extension.
Type: Grant
Filed: September 29, 2014
Date of Patent: July 3, 2018
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Michael K. Gschwind

Patent number: 9959247
Abstract: A circuit comprises an input register configured to receive an input vector of elements, a control register configured to receive a control vector of elements, wherein each element of the control vector corresponds to a respective element of the input vector, and wherein each element specifies a permutation of a corresponding element of the input vector, and a permute execution circuit configured to generate an output vector of elements corresponding to a permutation of the input vector. Generating each element of the output vector comprises accessing, at the input register, a particular element of the input vector, accessing, at the control register, a particular element of the control vector corresponding to the particular element of the input vector, and outputting the particular element of the input vector as an element at a particular position of the output vector that is selected based on the particular element of the control vector.
Type: Grant
Filed: April 25, 2017
Date of Patent: May 1, 2018
Assignee: Google LLC
Inventors: Dong Hyuk Woo, Gregory Michael Thorson, Andrew Everett Phelps, Olivier Temam, Jonathan Ross, Christopher Aaron Clark

Patent number: 9959094
Abstract: An arithmetic apparatus comprises a plurality of cascade-connected arithmetic units. Each of the plurality of arithmetic units comprises: a calculator configured to operate in one of a rotation mode of performing a rotation calculation, and a vectoring mode of calculating a rotation angle; and a holding unit configured to hold rotational direction information output from the calculator in the vectoring mode. In addition, when operating in the rotation mode, the calculator performs the rotation calculation on data input from an arithmetic unit in a preceding stage, based on the rotational direction information held in the holding unit.
Type: Grant
Filed: June 3, 2015
Date of Patent: May 1, 2018
Assignee: CANON KABUSHIKI KAISHA
Inventors: Tadayoshi Nakayama, Koki Mitsunami

Patent number: 9933996
Abstract: An apparatus for mathematical manipulation is described allowing the selective combination of shifters to shift binary numbers of various widths. Selective combination allows on-the-fly adjustment of shifters from independent to coordinated shifting operations. Selective combination allows adjustable hardware-based shifting while saving space and resources. Multiple eight-bit shifters can be configured for a variety of operand widths, such as a 32-bit width, a 24-bit width, a 16-bit width, or an eight-bit width. Multiplexers route the appropriate input data to the appropriate shifters. Opcodes configure the shifters for the desired type of shift and a shifted result is generated.
Type: Grant
Filed: December 20, 2013
Date of Patent: April 3, 2018
Assignee: Wave Computing, Inc.
Inventor: Samit Chaudhuri

Patent number: 9933998
Abstract: In a novel computation device, a plurality of partial product generators is communicatively coupled to a binary number multiplier. The binary number is partitioned in the computation device into non-overlapping subsets of binary bits and each subset is coupled to one of the plurality of partial product generators. Each partial product generator, upon receiving a subset of binary bits representing a number, generates a multiplication product of the number and a predetermined constant. The multiplication products from all partial product generators are summed to generate the final product between the predetermined constant and the binary number. The partial product generators are constructed from logic gates and wires connecting the logic gates, including an AND gate. The partial product generators are free of memory elements.
Type: Grant
Filed: February 6, 2017
Date of Patent: April 3, 2018
Inventors: Kuo-Tseng Tseng, Parkson Wong
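
The partitioned constant multiplication in this abstract can be modelled in software as follows. This is a sketch, assuming 4-bit subsets of a 16-bit input and assuming each partial product is weighted by its subset's bit position before summation (the abstract does not spell out the weighting); `multiply_by_constant` is a hypothetical name.

```python
def multiply_by_constant(number, constant, total_bits=16, subset_bits=4):
    """Multiply by a fixed constant using per-subset partial products.

    Each loop iteration stands in for one partial product generator, which
    in hardware would be memory-free combinational logic for a fixed constant.
    """
    mask = (1 << subset_bits) - 1
    total = 0
    for offset in range(0, total_bits, subset_bits):
        subset = (number >> offset) & mask   # non-overlapping slice of the input
        partial = subset * constant          # partial product for this slice
        total += partial << offset           # assumed positional weighting
    return total
```

For any input below `2**total_bits`, the result matches a direct multiplication, for instance `multiply_by_constant(0x1234, 37) == 0x1234 * 37`.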

Patent number: 9904511
Abstract: An improved shifter design for high-speed data processors is described. The shifter may include a first stage, in which the input bits are shifted by increments of N bits where N>1, followed by a second stage, in which all bits are shifted by a residual amount. A pre-shift may be removed from an input to the shifter and replaced by a shift adder at the second stage to further increase the speed of the shifter.
Type: Grant
Filed: February 6, 2015
Date of Patent: February 27, 2018
Assignee: Cavium, Inc.
Inventors: Nitin Mohan, Ilan Pragaspathy
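
The coarse-then-residual structure described here can be sketched in Python as a two-step left shift. The values N = 8 and a 64-bit width are assumptions chosen for illustration, the function name is hypothetical, and the pre-shift and shift-adder optimization mentioned in the abstract is not modelled.

```python
def two_stage_shift_left(value, amount, n=8, width=64):
    """Left shift split into a coarse stage (multiples of n) and a residual stage."""
    mask = (1 << width) - 1
    coarse_steps, residual = divmod(amount, n)
    stage_one = (value << (coarse_steps * n)) & mask   # first stage: increments of n bits
    return (stage_one << residual) & mask              # second stage: remaining 0..n-1 bits
```

A shift by 19 with `n=8` is therefore carried out as a 16-bit coarse shift followed by a 3-bit residual shift.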

Patent number: 9841979
Abstract: A method and corresponding apparatus for processing a shuffle instruction are provided. Shuffle units are configured in a hierarchical structure, and each of the shuffle units generates a shuffled data element array by performing shuffling on an input data element array. In the hierarchical structure, which includes an upper shuffle unit and a lower shuffle unit, the shuffled data element array output from the lower shuffle unit is input to the upper shuffle unit as a portion of the input data element array for the upper shuffle unit.
Type: Grant
Filed: July 14, 2014
Date of Patent: December 12, 2017
Assignee: Samsung Electronics Co., Ltd.
Inventors: Keshava Prasad, Navneet Basutkar, Young Hwan Park, Ho Yang, Yeon Bok Lee

Patent number: 9805819
Abstract: Described herein are techniques, systems, and circuits for addressing image data according to blocks. For example, in some cases, the address space may be divided into high order address bits and low order address bits. In these cases, an address circuit may twist an address space by shifting the high order bits and low order bits of an address in a rightward direction, shifting the low order bits of the address in a leftward direction, and shifting the high order bits and the low order bits of the address in the leftward direction. The circuit may modify the address value and untwist the address space. For example, the untwisting may include shifting the high order bits and the low order bits of an address in the rightward direction, shifting the low order bits of the address in the rightward direction, and shifting the high order bits and the low order bits of the address in the leftward direction.
Type: Grant
Filed: October 14, 2016
Date of Patent: October 31, 2017
Assignee: Amazon Technologies, Inc.
Inventor: Carl Ryan Kelso

Patent number: 9762365
Abstract: A radio communication terminal device and a radio transmission method are provided which can improve reception performance of a CQI and a reference signal. A phase table storage unit stores a phase table which correlates the amount of cyclic shift with complex coefficients {w1, w2} by which the reference signal is to be multiplied. A complex coefficient multiplication unit reads out, from the phase table storage unit, a complex coefficient corresponding to the amount of cyclic shift indicated by resource allocation information, and multiplies the reference signal by the read-out complex coefficient so as to change the phase relationship between the reference signals in a slot.
Type: Grant
Filed: December 12, 2016
Date of Patent: September 12, 2017
Assignee: Sun Patent Trust
Inventors: Tomofumi Takata, Daichi Imamura, Seigo Nakao, Sadaki Futagi, Takashi Iwai, Yoshihiko Ogawa

Patent number: 9665346
Abstract: Mechanisms are provided for performing a floating point arithmetic operation in a data processing system. A plurality of floating point operands of the floating point arithmetic operation are received and bits in a mantissa of at least one floating point operand of the plurality of floating point operands are shifted. One or more bits of the mantissa that are shifted outside a range of bits of the mantissa of at least one floating point operand are stored and a vector value is generated based on the stored one or more bits of the mantissa that are shifted outside of the range of bits of the mantissa of the at least one floating point operand. A resultant value is generated for the floating point arithmetic operation based on the vector value and the plurality of floating point operands.
Type: Grant
Filed: October 28, 2014
Date of Patent: May 30, 2017
Assignee: International Business Machines Corporation
Inventors: John B. Carter, Bruce G. Mealey, Karthick Rajamani, Eric E. Retter, Jeffrey A. Stuecheli

Patent number: 9658986
Abstract: A data processing apparatus includes a computing unit that performs a matrix computation between data streams whose unit data is of a matrix format; a determining unit that, for each matrix obtained by the matrix computation by the computing unit, determines, based on the value of each element included in the matrix, an exponent value for expressing each element included in the matrix as a floating decimal point value; a converting unit that converts the value of each element into a significand value of the element, according to the exponent value determined by the determining unit; and an output unit that correlates and outputs the exponent value and each matrix after conversion in which the value of each element in the matrix has been converted by the converting unit.
Type: Grant
Filed: January 29, 2014
Date of Patent: May 23, 2017
Assignee: FUJITSU LIMITED
Inventors: Yi Ge, Noboru Kobayashi, Hiroshi Hatano, Yasuhiro Oyama
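
One way to picture the per-matrix exponent described above is a shared-exponent (block floating point) conversion. The Python sketch below assumes integer matrix elements, an 8-bit signed significand, and an exponent chosen from the largest magnitude in the matrix; none of these specifics come from the abstract, and `to_shared_exponent` is a hypothetical name.

```python
def to_shared_exponent(matrix, significand_bits=8):
    """Return (exponent, significands) with one exponent shared by the whole matrix.

    Assumed rule: pick the smallest exponent such that the largest magnitude,
    shifted right by that exponent, fits in a signed significand.
    """
    largest = max(abs(x) for row in matrix for x in row)
    # Bits beyond the signed significand's magnitude range determine the exponent.
    exponent = max(0, largest.bit_length() - (significand_bits - 1))
    # Python's >> on negative integers rounds toward negative infinity.
    significands = [[x >> exponent for x in row] for row in matrix]
    return exponent, significands
```

Each original element is then approximated by `significand * 2**exponent`, so the exponent and the converted matrix are output together, as the abstract describes.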

Patent number: 9600194
Abstract: An address and a data size are provided to a rotator. The rotator stores, based on the address and the data size, a data element in a location having a defined number of positions. The data element includes one or more data units and the one or more data units are aligned correctly in one or more positions of the location based on a predefined position in the location to receive a selected data unit of the one or more data units. The rotator replicates a value of a chosen data unit of the one or more data units to one or more other positions of the location.
Type: Grant
Filed: November 25, 2015
Date of Patent: March 21, 2017
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Michael K. Gschwind

Patent number: 9554376
Abstract: A radio communication terminal device and a radio transmission method are provided which can improve reception performance of a CQI and a reference signal. A phase table storage unit stores a phase table which correlates the amount of cyclic shift with complex coefficients {w1, w2} by which the reference signal is to be multiplied. A complex coefficient multiplication unit reads out, from the phase table storage unit, a complex coefficient corresponding to the amount of cyclic shift indicated by resource allocation information, and multiplies the reference signal by the read-out complex coefficient so as to change the phase relationship between the reference signals in a slot.
Type: Grant
Filed: March 1, 2016
Date of Patent: January 24, 2017
Assignee: Sun Patent Trust
Inventors: Tomofumi Takata, Daichi Imamura, Seigo Nakao, Sadaki Futagi, Takashi Iwai, Yoshihiko Ogawa

Patent number: 9537510
Abstract: A variable shifter includes: a plurality of shifters that cyclically shift input data having a plurality of bits or cyclically shifted data; and a control unit that selects a shift amount for each of the plurality of shifters in accordance with a predetermined cyclic shift amount. The number of types of the predetermined cyclic shift amount is smaller than the number of bits in the input data, each shifter selects one of a plurality of shift amounts in accordance with the predetermined cyclic shift amount, and the plurality of shift amounts have a combination of shift amounts that differ from one shifter to another.
Type: Grant
Filed: February 2, 2015
Date of Patent: January 3, 2017
Assignee: Panasonic Intellectual Property Management Co., Ltd.
Inventors: Hiroyuki Motozuka, Hiroyuki Yoshikawa
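
A cascade of cyclic shifters with a small menu of per-stage amounts can be modelled as below. The stage options in `STAGE_OPTIONS`, the 16-bit example width, and the brute-force selection in `select_stage_amounts` are all assumptions for illustration; the abstract does not give concrete shift amounts or a selection method.

```python
from itertools import product

# Hypothetical per-stage shift options; each stage offers a different set.
STAGE_OPTIONS = [(0, 3), (0, 5), (0, 8)]


def rotate_left(bits, amount):
    """Cyclically shift a list of bits left by `amount` positions."""
    amount %= len(bits)
    return bits[amount:] + bits[:amount]


def select_stage_amounts(target, width):
    """Control unit model: pick one option per stage so the total equals the target."""
    for combo in product(*STAGE_OPTIONS):
        if sum(combo) % width == target % width:
            return combo
    raise ValueError(f"cyclic shift amount {target} is not supported")


def variable_cyclic_shift(bits, target):
    """Apply the selected shift amount of every stage in sequence."""
    data = bits
    for amount in select_stage_amounts(target, len(bits)):
        data = rotate_left(data, amount)
    return data
```

With these options and 16-bit data, only the shift amounts 0, 3, 5, 8, 11, and 13 are reachable, which is fewer than the number of bits, in line with the restriction stated in the abstract.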

Patent number: 9529591
Abstract: Vector single instruction multiple data (SIMD) shift and rotate instructions are provided specifying: a destination vector register comprising fields to store vector elements, a first vector register, a vector element size, and a second vector register. Vector data fields of a first element size are duplicated. Duplicate vector data fields are stored as corresponding data fields of twice the first element size. Control logic receives an element size for performing a SIMD shift or rotation operation. Through selectors corresponding to a vector element, portions are selected from the duplicated data fields, the selectors corresponding to any particular vector element select all portions similarly from the duplicated data fields for that particular vector element responsive to the first element size, but selectors corresponding to any particular vector element select at least two portions from the duplicated data fields differently for that particular vector element responsive to a second element size.
Type: Grant
Filed: December 30, 2011
Date of Patent: December 27, 2016
Assignee: Intel Corporation
Inventors: Asaf Rubinstein, Tom Aviram

Patent number: 9519483
Abstract: A method and apparatus are described for generating flags in response to processing data during an execution pipeline cycle of a processor. The processor may include a multiplexer configured to generate valid bits for received data according to a designated data size, and a logic unit configured to control the generation of flags based on a shift or rotate operation command, the designated data size and information indicating how many bytes and bits to rotate or shift the data by. A carry flag may be used to extend the amount of bits supported by shift and rotate operations. A sign flag may be used to indicate whether a result is a positive or negative number. An overflow flag may be used to indicate that a data overflow exists, whereby there are not a sufficient number of bits to store the data.
Type: Grant
Filed: December 22, 2011
Date of Patent: December 13, 2016
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Srikanth Arekapudi, Saurabh Gupta

Patent number: 9478312
Abstract: Described herein are techniques, systems, and circuits for addressing image data according to blocks. For example, in some cases, the address space may be divided into high order address bits and low order address bits. In these cases, an address circuit may twist an address space by shifting the high order bits and low order bits of an address in a rightward direction, shifting the low order bits of the address in a leftward direction, and shifting the high order bits and the low order bits of the address in the leftward direction. The circuit may modify the address value and untwist the address space. For example, the untwisting may include shifting the high order bits and the low order bits of an address in the rightward direction, shifting the low order bits of the address in the rightward direction, and shifting the high order bits and the low order bits of the address in the leftward direction.
Type: Grant
Filed: December 23, 2014
Date of Patent: October 25, 2016
Assignee: Amazon Technologies, Inc.
Inventor: Carl Ryan Kelso

Patent number: 9398617
Abstract: Embodiments of the present invention relate to a method and an apparatus for processing random access in a wireless communication network, and a processing method of a user equipment and an apparatus. The method for processing random access in the communication network includes: the base station receives a first Zadoff-Chu sequence and a second Zadoff-Chu sequence that are sent by a user equipment, a du of the first Zadoff-Chu sequence is smaller than a du of the second Zadoff-Chu sequence; the base station estimates an error range for a round trip delay RTD of the user equipment according to the first Zadoff-Chu sequence, estimates, according to the second Zadoff-Chu sequence, the RTD within the error range for the RTD or a frequency offset of an uplink signal of the user equipment. The problem that the user equipment with a frequency offset accesses a network is solved.
Type: Grant
Filed: July 8, 2014
Date of Patent: July 19, 2016
Assignee: Huawei Technologies Co., Ltd.
Inventors: Changyu Guo, Li Wan, Chunhui Le, Jing Li, Yan Liu

Patent number: 9317283
Abstract: A processor may generate a result vector when executing a RunningShiftForDivide1P or RunningShiftForDivide2P instruction. For example, upon executing a RunningShiftForDivide1P/2P instruction, the processor may receive a first input vector and a second input vector. The processor then may record a base value from an element at a key element position in the first input vector. Next, when generating the result vector, for each active element in the result vector to the right of the key element position, the processor may generate a shifted base value using shift values from the second input vector. The processor then may correct the shifted base value when a predetermined condition is met. Next, the processor may set the element of the result vector equal to the shifted base value.
Type: Grant
Filed: December 17, 2012
Date of Patent: April 19, 2016
Assignee: Apple Inc.
Inventor: Jeffry E. Gonion

Patent number: 9318813
Abstract: A QRD processor for computing input signals in a receiver for wireless communication relies upon a combination of multidimensional Givens Rotations, Householder Reflections and conventional two-dimensional (2D) Givens Rotations, for computing the QRD of matrices. The proposed technique integrates the benefits of multidimensional annihilation capability of Householder reflections plus the low-complexity nature of the conventional 2D Givens rotations. Such integration increases throughput and reduces the hardware complexity, by first decreasing the number of rotation operations required and then by enabling their parallel execution. A pipelined architecture is presented (290) that uses unrolled pipelined CORDIC processors (245a to 245d) iteratively to improve throughput and resource utilization, while reducing the gate count.
Type: Grant
Filed: May 24, 2010
Date of Patent: April 19, 2016
Assignee: MaxLinear, Inc.
Inventors: Dimpesh Patel, Glenn Gulak, Mahdi Shabany

Patent number: 9189237
Abstract: Method, apparatus, and program means for performing bitstream buffer manipulation with a SIMD merge instruction. The method of one embodiment comprises determining whether any unprocessed data bits for a partial variable length symbol exist in a first data block. A shift merge operation is performed to merge the unprocessed data bits from the first data block with a second data block. A merged data block is formed. A merged variable length symbol comprised of the unprocessed data bits and a plurality of data bits from the second data block is extracted from the merged data block.
Type: Grant
Filed: December 27, 2012
Date of Patent: November 17, 2015
Assignee: Intel Corporation
Inventors: Yen-Kuang Chen, William W. Macy, Jr., Matthew Holliman, Eric L. Debes, Minerva M. Yeung, Huy V. Nguyen, Julien Sebot
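
The shift merge step in this abstract can be pictured with plain Python integers standing in for data blocks. The sketch assumes an MSB-first bitstream, 32-bit blocks, and that the unprocessed bits are the low-order bits of the first block; the function names are hypothetical, and the fixed-width SIMD registers of the actual instruction are not modelled.

```python
def shift_merge(first_block, unprocessed_bits, second_block, block_bits=32):
    """Concatenate the leftover bits of the first block with the second block.

    Returns the merged value and its length in bits.
    """
    leftover = first_block & ((1 << unprocessed_bits) - 1)
    merged = (leftover << block_bits) | second_block
    return merged, unprocessed_bits + block_bits


def extract_symbol(merged, merged_bits, symbol_bits):
    """Take the next `symbol_bits` from the top of the merged block (MSB first)."""
    remaining_bits = merged_bits - symbol_bits
    symbol = merged >> remaining_bits
    remainder = merged & ((1 << remaining_bits) - 1)
    return symbol, remainder, remaining_bits
```

If 8 bits (`0x12`) remain unprocessed in the first block and the second block is `0x34567890`, the merged block is `0x1234567890`, from which a 12-bit symbol `0x123` spanning both blocks can be extracted.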

Patent number: 9189238
Abstract: Method, apparatus, and program means for performing bitstream buffer manipulation with a SIMD merge instruction. The method of one embodiment comprises determining whether any unprocessed data bits for a partial variable length symbol exist in a first data block. A shift merge operation is performed to merge the unprocessed data bits from the first data block with a second data block. A merged data block is formed. A merged variable length symbol comprised of the unprocessed data bits and a plurality of data bits from the second data block is extracted from the merged data block.
Type: Grant
Filed: January 29, 2013
Date of Patent: November 17, 2015
Assignee: Intel Corporation
Inventors: Yen-Kuang Chen, William W. Macy, Jr., Matthew Holliman, Eric L. Debes, Minerva M. Yeung, Huy V. Nguyen, Julien Sebot

Patent number: 9182987
Abstract: Method, apparatus, and program means for performing bitstream buffer manipulation with a SIMD merge instruction. The method of one embodiment comprises determining whether any unprocessed data bits for a partial variable length symbol exist in a first data block. A shift merge operation is performed to merge the unprocessed data bits from the first data block with a second data block. A merged data block is formed. A merged variable length symbol comprised of the unprocessed data bits and a plurality of data bits from the second data block is extracted from the merged data block.
Type: Grant
Filed: January 29, 2013
Date of Patent: November 10, 2015
Assignee: Intel Corporation
Inventors: Yen-Kuang Chen, William W. Macy, Jr., Matthew Holliman, Eric L. Debes, Minerva M. Yeung, Huy V. Nguyen, Julien Sebot

Patent number: 9182985
Abstract: Method, apparatus, and program means for performing bitstream buffer manipulation with a SIMD merge instruction. The method of one embodiment comprises determining whether any unprocessed data bits for a partial variable length symbol exist in a first data block. A shift merge operation is performed to merge the unprocessed data bits from the first data block with a second data block. A merged data block is formed. A merged variable length symbol comprised of the unprocessed data bits and a plurality of data bits from the second data block is extracted from the merged data block.
Type: Grant
Filed: November 5, 2012
Date of Patent: November 10, 2015
Assignee: Intel Corporation
Inventors: Yen-Kuang Chen, William W. Macy, Jr., Matthew Holliman, Eric L. Debes, Minerva M. Yeung, Huy V. Nguyen, Julien Sebot

Patent number: 9182988
Abstract: Method, apparatus, and program means for performing bitstream buffer manipulation with a SIMD merge instruction. The method of one embodiment comprises determining whether any unprocessed data bits for a partial variable length symbol exist in a first data block. A shift merge operation is performed to merge the unprocessed data bits from the first data block with a second data block. A merged data block is formed. A merged variable length symbol comprised of the unprocessed data bits and a plurality of data bits from the second data block is extracted from the merged data block.
Type: Grant
Filed: March 7, 2013
Date of Patent: November 10, 2015
Assignee: Intel Corporation
Inventors: Yen-Kuang Chen, William W. Macy, Jr., Matthew Holliman, Eric L. Debes, Minerva M. Yeung, Huy V. Nguyen, Julien Sebot

Patent number: 9170815
Abstract: Method, apparatus, and program means for performing bitstream buffer manipulation with a SIMD merge instruction. The method of one embodiment comprises determining whether any unprocessed data bits for a partial variable length symbol exist in a first data block. A shift merge operation is performed to merge the unprocessed data bits from the first data block with a second data block. A merged data block is formed. A merged variable length symbol comprised of the unprocessed data bits and a plurality of data bits from the second data block is extracted from the merged data block.
Type: Grant
Filed: January 29, 2013
Date of Patent: October 27, 2015
Assignee: Intel Corporation
Inventors: Yen-Kuang Chen, William W. Macy, Jr., Matthew Holliman, Eric L. Debes, Minerva M. Yeung, Huy V. Nguyen, Julien Sebot

Patent number: 9170814
Abstract: Method, apparatus, and program means for performing bitstream buffer manipulation with a SIMD merge instruction. The method of one embodiment comprises determining whether any unprocessed data bits for a partial variable length symbol exist in a first data block. A shift merge operation is performed to merge the unprocessed data bits from the first data block with a second data block. A merged data block is formed. A merged variable length symbol comprised of the unprocessed data bits and a plurality of data bits from the second data block is extracted from the merged data block.
Type: Grant
Filed: November 5, 2012
Date of Patent: October 27, 2015
Assignee: Intel Corporation
Inventors: Yen-Kuang Chen, William W. Macy, Matthew Holliman, Eric L. Debes, Minerva M. Yeung, Huy V. Nguyen, Julien Sebot

Patent number: 9152420
Abstract: Method, apparatus, and program means for performing bitstream buffer manipulation with a SIMD merge instruction. The method of one embodiment comprises determining whether any unprocessed data bits for a partial variable length symbol exist in a first data block. A shift merge operation is performed to merge the unprocessed data bits from the first data block with a second data block. A merged data block is formed. A merged variable length symbol comprised of the unprocessed data bits and a plurality of data bits from the second data block is extracted from the merged data block.
Type: Grant
Filed: January 29, 2013
Date of Patent: October 6, 2015
Assignee: Intel Corporation
Inventors: Yen-Kuang Chen, William W. Macy, Jr., Matthew Holliman, Eric L. Debes, Minerva M. Yeung
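
The shift merge step described in the SIMD merge abstracts above can be modeled in software. The following is a minimal Python sketch, not the patented instruction: it assumes 64-bit data blocks, an MSB-first bitstream in which the unprocessed bits sit in the low-order end of the first block, and the hypothetical names `shift_merge` and `extract_symbol`.

```python
BLOCK_BITS = 64                 # assumed block width; the abstracts do not fix one
MASK = (1 << BLOCK_BITS) - 1


def shift_merge(first: int, second: int, leftover: int) -> int:
    """Merge the low `leftover` unprocessed bits of `first` with the
    leading bits of `second` into a single BLOCK_BITS-wide block."""
    if not 0 <= leftover <= BLOCK_BITS:
        raise ValueError("leftover must be between 0 and BLOCK_BITS")
    # Concatenate the two blocks, then keep the BLOCK_BITS-wide window
    # that starts at the first unprocessed bit of `first`.
    combined = ((first & MASK) << BLOCK_BITS) | (second & MASK)
    return (combined >> leftover) & MASK


def extract_symbol(merged: int, length: int) -> int:
    """Read a `length`-bit symbol from the top of the merged block."""
    if not 1 <= length <= BLOCK_BITS:
        raise ValueError("length must be between 1 and BLOCK_BITS")
    return (merged >> (BLOCK_BITS - length)) & ((1 << length) - 1)
```

With 8 unprocessed bits `0xBC` left in the first block, the merged block starts with those bits followed by the leading 56 bits of the second block, so a symbol that straddles the block boundary can be read in one step.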

Patent number: 9134953
Abstract: Microprocessor shifter circuits utilizing butterfly and inverse butterfly circuits, and control circuits therefor, are provided. The same shifter circuits can also perform complex bit manipulations at high speeds, including butterfly and inverse butterfly operations, parallel extract and deposit operations, group operations, mix operations, permutation operations, as well as instructions executed by existing microprocessors, including shift right, shift left, rotate, extract, deposit and multimedia mix operations. The shifter circuits can be provided in various combinations to provide microprocessor functional units which perform a plurality of bit manipulation operations.
Type: Grant
Filed: October 9, 2012
Date of Patent: September 15, 2015
Assignee: Teleputers, LLC
Inventors: Ruby B. Lee, Yedidya Hilewitz
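
To make the parallel extract and deposit operations named in this abstract concrete, here is a bit-serial reference model in Python. It only illustrates the results such operations produce; it does not model the butterfly or inverse butterfly circuits, and the function names `parallel_extract` and `parallel_deposit` are assumptions chosen for this sketch.

```python
def parallel_extract(value: int, mask: int) -> int:
    """Gather the bits of `value` selected by `mask` into the
    low-order bits of the result, preserving their order."""
    result = 0
    out_pos = 0
    while mask:
        lowest = mask & -mask        # isolate the lowest set mask bit
        if value & lowest:
            result |= 1 << out_pos
        out_pos += 1
        mask &= mask - 1             # clear that mask bit
    return result


def parallel_deposit(value: int, mask: int) -> int:
    """Scatter the low-order bits of `value` to the positions
    selected by `mask`, preserving their order."""
    result = 0
    in_pos = 0
    while mask:
        lowest = mask & -mask
        if (value >> in_pos) & 1:
            result |= lowest
        in_pos += 1
        mask &= mask - 1
    return result
```

Both functions expect a non-negative `mask`. Depositing an extracted field back through the same mask restores the selected bits, which is a convenient sanity check for any faster implementation.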

Publication number: 20150149518
Abstract: A barrel shifter uses a sign magnitude to 2's complement converter to generate decoder signals for its cascaded multiplexer selectors. The sign input receives the shift direction and the magnitude input receives the shift amount. The sign magnitude to 2's complement converter computes an output result as a 2's complement of the shift amount using the shift direction as a sign input, assigns a first portion (most significant bit half) of the output result to a first decoder signal, and assigns a second portion (least significant bit half) of the output result to a second decoder signal. This encoding scheme allows the decoder circuits to be relatively simple, for example, 3-to-8 decoders for an implementation adapted to shift a 64-bit operand value rather than the 4-to-9 decoder required in a conventional barrel shifter, leading to faster operation, less area, and reduced power consumption.
Type: Application
Filed: January 31, 2015
Publication date: May 28, 2015
Applicant: International Business Machines Corporation
Inventor: Takeo Yasuda

Publication number: 20150134713
Abstract: Examples of the present disclosure provide apparatuses and methods for performing division operations in a memory. An example apparatus comprises a first address space comprising a first number of memory cells coupled to a sense line and to a first number of select lines wherein the first address space stores a dividend value. A second address space comprises a second number of memory cells coupled to the sense line and to a second number of select lines wherein the second address space stores a divisor value. A third address space comprises a third number of memory cells coupled to the sense line and to a third number of select lines wherein the third address space stores a remainder value. Sensing circuitry can be configured to receive the dividend value and the divisor value, divide the dividend value by the divisor value, and store a remainder result in the third number of memory cells.
Type: Application
Filed: November 8, 2013
Publication date: May 14, 2015
Applicant: Micron Technology, Inc.
Inventor: Kyle B. Wheeler

Patent number: 9021000
Abstract: A barrel shifter uses a sign magnitude to 2's complement converter to generate decoder signals for its cascaded multiplexer selectors. The sign input receives the shift direction and the magnitude input receives the shift amount. The sign magnitude to 2's complement converter computes an output result as a 2's complement of the shift amount using the shift direction as a sign input, assigns a first portion (most significant bit half) of the output result to a first decoder signal, and assigns a second portion (least significant bit half) of the output result to a second decoder signal. This encoding scheme allows the decoder circuits to be relatively simple, for example, 3-to-8 decoders for an implementation adapted to shift a 64-bit operand value rather than the 4-to-9 decoder required in a conventional barrel shifter, leading to faster operation, less area, and reduced power consumption.
Type: Grant
Filed: June 29, 2012
Date of Patent: April 28, 2015
Assignee: International Business Machines Corporation
Inventor: Takeo Yasuda

Patent number: 9015216
Abstract: In one embodiment, a rotator, a mask generator, and circuitry configured to mask the rotated operand output by the rotator with the output mask generated by the mask generator perform a shift operation. The rotator is configured to rotate the input operand by the shift count. The mask generator is configured to generate an output mask by decoding a most significant bit (MSB) field of the shift count to generate a first mask, decoding a least significant bit (LSB) field of the shift count to generate a second mask, logically ANDing the bits of the second mask with the corresponding bit of the first mask and logically ORing the result with an adjacent bit of the first mask that is selected responsive to the shift direction.
Type: Grant
Filed: September 14, 2011
Date of Patent: April 21, 2015
Assignee: Apple Inc.
Inventor: Honkai Tam
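
The core idea in this abstract, building a shift out of a rotate followed by a mask, can be sketched in a few lines of Python. This is an illustration under assumptions: a 32-bit operand width, masks computed directly rather than decoded from MSB and LSB fields of the shift count as the abstract describes, and the hypothetical names `rotate_left` and `shift_via_rotate`.

```python
WIDTH = 32                       # assumed operand width
ALL_ONES = (1 << WIDTH) - 1


def rotate_left(value: int, count: int) -> int:
    """Rotate a WIDTH-bit value left by `count` positions."""
    value &= ALL_ONES
    count %= WIDTH
    return ((value << count) | (value >> (WIDTH - count))) & ALL_ONES


def shift_via_rotate(value: int, count: int, left: bool) -> int:
    """Logical shift implemented as rotate-then-mask."""
    if not 0 <= count < WIDTH:
        raise ValueError("count must be in [0, WIDTH)")
    if left:
        rotated = rotate_left(value, count)
        mask = (ALL_ONES << count) & ALL_ONES   # clear the wrapped-in low bits
    else:
        rotated = rotate_left(value, WIDTH - count)  # same as rotate right
        mask = ALL_ONES >> count                # clear the wrapped-in high bits
    return rotated & mask
```

The mask removes exactly the bits that wrapped around during the rotation, so the same rotator can serve both shift directions.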

Publication number: 20150100612
Abstract: A method and apparatus for processing numeric calculation are provided. The method includes determining a shift bit and an index bit that falls within an index range of a lookup table from among bits representing a divisor scaled up by an offset, obtaining a replacement value corresponding to an index value of the determined index bit by using the lookup table, multiplying a dividend scaled up by the offset by the obtained replacement value, and outputting a value corresponding to a division operation by correcting a scale of a result of the multiplication using a right shift operation.
Type: Application
Filed: July 11, 2014
Publication date: April 9, 2015
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Jonghun LEE, Youngsu MOON, Junguk CHO, Yongmin TAI, Dohyung KIM, Sihwa LEE
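
A rough software analogue of this lookup-table division is shown below. It is a sketch under stated assumptions, not the claimed method: the table size (`INDEX_BITS`), the reciprocal precision (`RECIP_BITS`), and the function name `approx_divide` are all hypothetical, and the offset scaling mentioned in the abstract is omitted for brevity.

```python
INDEX_BITS = 8     # assumed: the table is indexed by the top 8 divisor bits
RECIP_BITS = 16    # assumed: fixed-point precision of stored reciprocals

# TABLE[i] approximates 2**RECIP_BITS / i; entry 0 is never used.
TABLE = [0] + [round((1 << RECIP_BITS) / i) for i in range(1, 1 << INDEX_BITS)]


def approx_divide(dividend: int, divisor: int) -> int:
    """Approximate dividend // divisor with one table lookup,
    one multiplication, and one right shift."""
    if dividend < 0 or divisor <= 0:
        raise ValueError("expects dividend >= 0 and divisor > 0")
    # Shift amount that brings the divisor into the table's index range.
    shift = max(divisor.bit_length() - INDEX_BITS, 0)
    index = divisor >> shift
    reciprocal = TABLE[index]
    # Multiply by the stored reciprocal, then correct the scale.
    return (dividend * reciprocal) >> (RECIP_BITS + shift)
```

Because the divisor is truncated to its leading bits and the reciprocal is rounded, the result is approximate; with these parameters the error stays within roughly one percent of the true quotient plus one unit.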

Patent number: 9002915
Abstract: A circuit for shifting bussed data includes a first column of shift blocks, a compare block, and a second column of multiplexer blocks. The first column shifts the bussed data by a number of bits specified by first bits of a shift control input. The compare block determines the value of a second bit of the shift control input and creates an output reflecting that value. The second column has a control input coupled to the output of the compare block, shifts the data by one byte when the second bit of the shift control input has a first value, and does not shift the data when the second bit has a second value. The shift, compare, and multiplexer blocks can be substantially similar logic blocks programmable to perform any of these functions, can include N-bit data inputs and outputs, and can operate on the bussed data as an N-bit bus.
Type: Grant
Filed: April 2, 2009
Date of Patent: April 7, 2015
Assignee: Xilinx, Inc.
Inventors: Steven P. Young, Brian C. Gaide

Publication number: 20150095388
Abstract: Field programmable gate arrays (FPGA) contain, in addition to random logic, other components, such as processing units, multiply-accumulate (MAC) units, analog circuits, and other elements, configurable with respect to the random logic, to enhance the capabilities of the FPGA. A circuit for a field configurable MAC unit is provided to allow various configurations of ADD, SUBTRACT, MULTIPLY and SHIFT functions. Optionally, registered input and registered output support a multi-cycle path. A configuration of a constant facilitates the configuration of the circuit to perform infinite impulse response (IIR) and finite impulse response (FIR) functions in hardware.
Type: Application
Filed: December 10, 2013
Publication date: April 2, 2015
Applicant: Scaleo Chip
Inventors: Loic Vezier, Farid Tahiri

Publication number: 20150088947
Abstract: A method is described that involves executing a first instruction with a functional unit. The first instruction is a multiply-add instruction. The method further includes executing a second instruction with the functional unit. The second instruction is a round instruction.
Type: Application
Filed: December 3, 2014
Publication date: March 26, 2015
Applicant: INTEL CORPORATION
Inventors: Cristina S. Anderson, Zeev Sperber, Simon Rubanovich, Benny Eitan, Amit Gradstein

Publication number: 20150067010
Abstract: An integrated circuit is provided that performs floating-point addition or subtraction operations involving at least three floating-point numbers. The floating-point numbers are pre-processed by dynamically extending the number of mantissa bits, determining the floating-point number with the biggest exponent, and shifting the mantissa of the other floating-point numbers to the right. Each extended mantissa has at least twice the number of bits of the mantissa entering the floating-point operation. The exact bit extension is dependent on the number of floating-point numbers to be added. The mantissas of all floating-point numbers with an exponent smaller than the biggest exponent are shifted to the right. The number of right shift bits is dependent on the difference between the biggest exponent and the respective floating-point exponent.
Type: Application
Filed: September 5, 2013
Publication date: March 5, 2015
Applicant: Altera Corporation
Inventor: Tomasz Czajkowski
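
The alignment step in this abstract (extend each mantissa, find the biggest exponent, right-shift the rest by the exponent difference) can be illustrated with integer arithmetic. The sketch below is an assumption-laden model rather than the circuit: operands are hypothetical `(mantissa, exponent)` pairs with value `mantissa * 2**exponent`, `EXT_BITS` is a fixed extension rather than one sized from the operand count, and `add_many` is an invented name.

```python
MANT_BITS = 24            # assumed mantissa width
EXT_BITS = MANT_BITS      # assumed extension: doubles the mantissa width


def add_many(operands):
    """Sum (mantissa, exponent) pairs by aligning every mantissa
    to the largest exponent.

    Returns (total, exponent) with value == total * 2**exponent."""
    max_exp = max(exp for _, exp in operands)
    total = 0
    for mantissa, exp in operands:
        sign = -1 if mantissa < 0 else 1
        # Extend the mantissa, then shift right by the exponent difference.
        # Bits shifted out below the extension are truncated.
        aligned = (abs(mantissa) << EXT_BITS) >> (max_exp - exp)
        total += sign * aligned
    return total, max_exp - EXT_BITS
```

The extension bits keep low-order bits of the smaller operands from being lost immediately during alignment. In hardware the accumulator would also need extra headroom for carries, which Python integers provide implicitly.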

Patent number: 8972469
Abstract: A system and method for efficiently rotating data in a processor for multiple operand sizes. A processor comprises a rotator configured to support multiple operand sizes. The rotator receives a rotate amount and an input operand with a size less than a maximum operand size supported by the processor. The rotator generates a mask with a same size as the received input operand. The mask comprises a number of asserted most-significant bits equal to the rotate amount. The remaining bits in the mask are deasserted. For a given rotation result bit position with an associated asserted mask bit, the rotator selects a value in the input operand at a bit position with a distance from the given result bit position equal to the rotate amount plus a difference between the maximum operand size supported by the processor and the input operand size.
Type: Grant
Filed: June 30, 2011
Date of Patent: March 3, 2015
Assignee: Apple Inc.
Inventors: Fang Liu, Honkai (John) Tam

Publication number: 20150058389
Abstract: Techniques are disclosed relating to performing extended multiplies without a carry flag. In one embodiment, an apparatus includes a multiply unit configured to perform multiplications of operands having a particular width. In this embodiment, the apparatus also includes multiple storage elements configured to store operands for the multiply unit. In this embodiment, each of the storage elements is configured to provide a portion of a stored operand that is less than an entirety of the stored operand in response to a control signal from the apparatus. In one embodiment, the apparatus is configured to perform a multiplication of given first and second operands having a width greater than the particular width by performing a sequence of multiply operations using the multiply unit, using portions of the stored operands and without using a carry flag between any of the sequence of multiply operations.
Type: Application
Filed: August 20, 2013
Publication date: February 26, 2015
Applicant: Apple Inc.
Inventors: James S. Blomgren, Terence M. Potter
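
One common way to multiply wide operands from operand portions without a carry flag is to split each operand into halves so that every partial sum fits in an ordinary register, carrying overflow forward as plain data. The Python sketch below shows that general technique for a 32x32-bit product built from 16-bit portions; it is not necessarily the sequence the application claims, and `mul32_wide` is a hypothetical name.

```python
HALF = 16
HALF_MASK = (1 << HALF) - 1


def mul32_wide(a: int, b: int):
    """Multiply two 32-bit values using only 16x16-bit products.

    Returns (high, low), the two 32-bit words of the 64-bit product."""
    a_lo, a_hi = a & HALF_MASK, (a >> HALF) & HALF_MASK
    b_lo, b_hi = b & HALF_MASK, (b >> HALF) & HALF_MASK

    ll = a_lo * b_lo
    lh = a_lo * b_hi
    hl = a_hi * b_lo
    hh = a_hi * b_hi

    # Middle column: three 16-bit terms, so the sum stays far below
    # 2**32 and its overflow is just the upper bits of `mid`.
    mid = (ll >> HALF) + (lh & HALF_MASK) + (hl & HALF_MASK)

    low = (ll & HALF_MASK) | ((mid & HALF_MASK) << HALF)
    high = hh + (lh >> HALF) + (hl >> HALF) + (mid >> HALF)
    return high, low
```

Each intermediate value fits in 32 bits, so no addition ever needs a separate carry bit; what would have been a carry is simply the upper part of `mid`.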

Publication number: 20150039662
Abstract: A fused floating-point multiply-add element includes a multiplier that generates a product, and a shifter that shifts an addend within a narrow range. Interpreting logic analyzes the magnitude of the addend relative to the product and then causes logic arrays to position the shifted addend within the left, center, or right portions of a composite register depending on the magnitude of the addend relative to the product. The interpreting logic also forces other portions of the composite register to zero. When the addend is zero, the interpreting logic forces all portions of the composite register to zero. Final combining logic then adds the contents of the composite register to the product.
Type: Application
Filed: August 5, 2013
Publication date: February 5, 2015
Applicant: NVIDIA CORPORATION
Inventors: Srinivasan IYER, David Conrad TANNENBAUM, Stuart F. OBERMAN, Ming (Michael) Y. SIU