Shifting Patents (Class 708/209)

Patent number: 11494331
Abstract: A representative reconfigurable processing circuit and a reconfigurable arithmetic circuit are disclosed, each of which may include input reordering queues; a multiplier shifter and combiner network coupled to the input reordering queues; an accumulator circuit; and a control logic circuit, along with a processor and various interconnection networks. A representative reconfigurable arithmetic circuit has a plurality of operating modes, such as floating point and integer arithmetic modes, logical manipulation modes, Boolean logic, shift, rotate, conditional operations, and format conversion, and is configurable for a wide variety of multiplication modes. Dedicated routing connecting multiplier adder trees allows multiple reconfigurable arithmetic circuits to be reconfigurably combined, in pair or quad configurations, for larger adders, complex multiplies and general sum of products use, for example.
Type: Grant
Filed: September 9, 2020
Date of Patent: November 8, 2022
Assignee: Cornami, Inc.
Inventors: Paul L. Master, Steven K. Knapp, Raymond J. Andraka, Alexei Beliaev, Martin A. Franz, Rene Meessen, Frederick Curtis Furtek

Patent number: 11288040
Abstract: Systems, apparatuses and methods may provide for technology that conducts a first alignment between a plurality of floating-point numbers based on a first subset of exponent bits. The technology may also conduct, at least partially in parallel with the first alignment, a second alignment between the plurality of floating-point numbers based on a second subset of exponent bits, where the first subset of exponent bits are LSBs and the second subset of exponent bits are MSBs. In one example, the technology adds the aligned plurality of floating-point numbers to one another. With regard to the second alignment, the technology may also identify individual exponents of a plurality of floating-point numbers, identify a maximum exponent across the individual exponents, and conduct a subtraction of the individual exponents from the maximum exponent, where the subtraction is conducted from MSB to LSB.
Type: Grant
Filed: June 7, 2019
Date of Patent: March 29, 2022
Assignee: Intel Corporation
Inventors: Himanshu Kaul, Mark Anders
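
The two-step alignment described in this abstract can be illustrated with a minimal Python sketch. It is not the patented circuit: the function name `align_and_add`, the `low_bits` parameter, and the representation of each number as a non-negative integer `(exponent, mantissa)` pair are assumptions made for illustration only. The fine shift uses only each number's own exponent LSBs, while the coarse shift uses the difference between the maximum exponent MSBs and each number's exponent MSBs.

```python
def align_and_add(values, low_bits=2):
    """Add numbers given as (exponent, mantissa) pairs, value = mantissa * 2**exponent.

    Returns (base_exponent, total) such that the sum is total * 2**base_exponent.
    Assumes non-negative integer exponents and mantissas (illustrative only).
    """
    low_mask = (1 << low_bits) - 1
    # Input to the second alignment: the maximum over the exponent MSBs.
    max_high = max(exp >> low_bits for exp, _ in values)
    total = 0
    for exp, mant in values:
        # First alignment: depends only on this number's own exponent LSBs.
        pre_aligned = mant << (exp & low_mask)
        # Second alignment: coarse right shift by the MSB difference.
        coarse = (max_high - (exp >> low_bits)) << low_bits
        total += pre_aligned >> coarse
    return max_high << low_bits, total


base, total = align_and_add([(5, 3), (3, 4), (6, 1)])
print(base, total)  # the sum equals total * 2**base
```

Because the fine shift needs no knowledge of the maximum exponent, it could in principle run while the maximum is still being computed, which is one plausible reading of "at least partially in parallel". Bits shifted out on the right by the coarse shift are simply dropped in this sketch.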

Patent number: 11244718
Abstract: A system comprises: a three-dimensional flash memory comprising a plurality of cells; and a controller coupled to the three-dimensional flash memory, configured to: select a block of cells in the three-dimensional flash memory; perform a matrix multiplication on the matrix stored in the block of cells, including performing a vector multiplication in a single sensing step; and output a matrix multiplication result. A matrix is stored in the block of cells.
Type: Grant
Filed: September 8, 2020
Date of Patent: February 8, 2022
Inventors: Fei Xue, Dimin Niu, Shuangchen Li, Hongzhong Zheng

Patent number: 11209563
Abstract: A data optimization method and an integral prestack depth migration method are provided, including acquiring a target matrix to be optimized; generating a first sequence according to the target matrix; rarefying the first sequence according to a preset grid density to obtain a value position of each element of a second sequence, and working out a value of each element of the second sequence on the basis of the principle of least squares; performing interpolation on the second sequence to obtain a third sequence; calculating a target matrix corresponding to the third sequence; calculating an error between the target matrix to be optimized and the target matrix corresponding to the third sequence; recording, when the error is less than the first error threshold, the target matrix corresponding to the above second sequence as an optimized target matrix of the target matrix to be optimized.
Type: Grant
Filed: August 30, 2019
Date of Patent: December 28, 2021
Assignee: INSTITUTE OF GEOLOGY AND GEOPHYSICS
Inventors: Linong Liu, Hongwei Gao, Wei Liu, Jianfeng Zhang

Patent number: 11106438
Abstract: Various embodiments are generally directed to optimizing dataflow in automated transformation frameworks (e.g., compiler, runtime, etc.) for spatial architectures (e.g., Configurable Spatial Accelerator) that translate high-level user code into forms that use “streams” (e.g., Latency Insensitive Channels, line buffers) to reduce overhead, eliminate or improve the efficiency of redundant memory accesses, and improve overall throughput.
Type: Grant
Filed: March 27, 2020
Date of Patent: August 31, 2021
Assignee: INTEL CORPORATION
Inventors: Dounia Khaldi, Rakesh Krishnaiyer, Rajiv Deodhar, Daniel Woodworth, Joshua Cranmer, Kent Glossop

Patent number: 11086624
Abstract: In a processor supporting execution of a plurality of functions of an instruction, an instruction blocking value is set for blocking one or more of the plurality of functions, such that an attempt to execute one of the blocked functions will result in a program exception and the instruction will not execute; however, the same instruction will be able to execute any of the functions that are not blocked functions.
Type: Grant
Filed: August 13, 2019
Date of Patent: August 10, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Dan Greiner, Damian Osisek, Timothy Slegel, Lisa Cranton Heller

Patent number: 11029921
Abstract: Performing processing using hardware counters in a computer system includes storing, in association with greatest common divisor (GCD) processing of the system, a first variable in a first redundant binary representation and a second variable in a second redundant binary representation. Each such redundant binary representation includes a respective sum term and a respective carry term, and a numerical value being represented by a redundant binary representation is equal to a sum of the sum and carry terms of the redundant binary representation. The process performs redundant arithmetic operations of the GCD processing on the first and second variables using hardware counter(s), of the computer system, that take input values in redundant binary representation form and provide output values in redundant binary representation form. The process uses output of the redundant arithmetic operations of the GCD processing to obtain an output GCD of integer inputs to the GCD processing.
Type: Grant
Filed: February 14, 2019
Date of Patent: June 8, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Eric M. Schwarz, Silvia M. Mueller, Ulrich Mayer

Patent number: 11029958
Abstract: Systems, methods, and apparatuses relating to configurable operand size operation circuitry in an operation configurable spatial accelerator are described.
Type: Grant
Filed: December 28, 2019
Date of Patent: June 8, 2021
Assignee: Intel Corporation
Inventors: Chuanjun Zhang, Kermin E. Chofleming

Patent number: 11010159
Abstract: Apparatus comprises counter and bit-shift circuitry to provide a succession of processing stages each comprising a count operation stage and a corresponding bit-shift stage, each processing stage operating with respect to a set of contiguous n-bit groups of bit positions, where n is 1 for a first processing stage and n doubles from one processing stage in the succession of processing stages to a next processing stage in the succession of processing stages; each count operation stage being configured to generate, for a first set of alternate instances of the n-bit groups of bit positions, count values indicating a respective number of bits of a predetermined bit value in a mask data word; and each bit-shift stage being configured to generate a bit-shifted data word by bit-shifting bits of a data word to be processed, for a second set of alternate instances of the n-bit groups of bit positions complementary to the first set, by respective numbers of bit positions dependent upon the count values generated by the
Type: Grant
Filed: August 31, 2018
Date of Patent: May 18, 2021
Assignee: ARM LIMITED
Inventors: Xiaoyang Shen, Cedric Denis Robert Airaud, Luca Nassi, Damien Robin Martin

Patent number: 11002836
Abstract: A time of flight sensor device is capable of generating accurate propagation time information for emitted light pulses using a small number of measurement cycles by using multiple measuring capacitors to capture more return pulse information per pulse period. To mitigate the effects of mismatched measuring capacitors and reading paths, embodiments of the time of flight sensor device perform multiple measuring sequences per measurement operation, permutating the roles of the measuring capacitors for each of the measuring sequences. The data collected by the measuring capacitors for the multiple measuring sequences is then aggregated and used to compute the propagation time and corresponding distance. This technique yields accurate measurements despite mismatches between reading paths and measuring capacitors without the need to implement pixel-level calibration and compensation, thereby saving calibration time, memory space, and computing time.
Type: Grant
Filed: May 14, 2018
Date of Patent: May 11, 2021
Assignee: Rockwell Automation Technologies, Inc.
Inventor: Frederic Boutaud

Patent number: 10931497
Abstract: A user equipment comprises receiving circuitry configured to receive bitmap information indicating time domain positions, within a measurement window, of synchronization signal block(s) (SSB(s)) used for an intra- and/or an inter-frequency measurement, the SSB(s) comprising at least a primary synchronization signal (PSS), a secondary synchronization signal (SSS), and a physical broadcast channel (PBCH), wherein the bitmap information comprises a bit string, and different lengths of the bit string are defined for different frequency bands.
Type: Grant
Filed: May 3, 2018
Date of Patent: February 23, 2021
Assignees: SHARP KABUSHIKI KAISHA, FG Innovation Company Limited
Inventors: Jia Sheng, Tatsushi Aiba, Toshizo Nogami

Patent number: 10877729
Abstract: Systems and methods that provide reconfigurable shifter configurations supporting multiple instruction, multiple data (MIMD) are described. Shifters implemented according to embodiments support multiple data shifts with respect to an instance of data shifting, wherein multiple individual different data shifts are implemented at a time in parallel. Reconfigurable segmented scalable shifters of embodiments, in addition to being reconfigurable for scalability in supporting data shifting with respect to various bit lengths of data, are configured to support data shifting of differing bit lengths in parallel. The data shifters of embodiments implement segmentation for facilitating data shifting with respect to differing bit lengths. Different data shift commands may be provided with respect to each such segment, thereby facilitating multiple data shifts in parallel with respect to various bit lengths of data.
Type: Grant
Filed: January 31, 2019
Date of Patent: December 29, 2020
Assignee: Hong Kong Applied Science and Technology Research Institute Co., Ltd.
Inventors: Hing-Mo Lam, Man-Wai Kwan, Ching-Hong Leung, Kong-Chau Tsang
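
As a rough software analogy for the segmented shifting described in this abstract, the sketch below splits one word into segments of differing widths and applies a different shift command to each. The function name `mimd_shift`, the low-to-high ordering of `widths`, and the convention that positive amounts shift left and negative amounts shift right are all assumptions for illustration, not details taken from the patent.

```python
def mimd_shift(word, widths, shifts):
    """Shift each segment of `word` independently.

    widths: segment widths in bits, listed from the least significant segment up.
    shifts: one shift per segment; positive = left, negative = logical right.
    Bits shifted out of a segment are discarded and never spill into a neighbor.
    """
    out, base = 0, 0
    for width, shift in zip(widths, shifts):
        mask = (1 << width) - 1
        seg = (word >> base) & mask
        if shift >= 0:
            seg = (seg << shift) & mask
        else:
            seg >>= -shift
        out |= seg << base
        base += width
    return out


# Two 8-bit segments and one 16-bit segment, each with its own shift command.
print(hex(mimd_shift(0x12345678, [8, 8, 16], [4, -4, 8])))
```

A hardware shifter would perform the per-segment shifts at the same time; the loop here only models the result.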

Patent number: 10867580
Abstract: A data segmenter is configured to determine indices using numbers of most significant bits (MSBs) of fractional values of floating-point representations of component values of an input color that are selected based on exponent values of the floating-point representations. The component values are defined according to a source gamut. The data segmenter is also configured to determine offsets associated with the indices using subsets of the fractional values. An interpolator is configured to map the input color to an output color defined according to a destination gamut based on a location in a three-dimensional (3D) look up table (LUT) indicated by the indices and offsets.
Type: Grant
Filed: January 24, 2019
Date of Patent: December 15, 2020
Assignee: ATI TECHNOLOGIES ULC
Inventor: Yuxin Chen

Patent number: 10869336
Abstract: Random access channel access and validity procedures are disclosed. In one aspect, the medium access control (MAC) layer indicates multiple random access occasions (ROs) to user equipments (UEs) for random access transmissions. In such aspect, random access failure would only be declared if listen before talk (LBT) procedures for the random access transmission fail on all of the ROs indicated by the MAC layer. Similarly, in additional aspects, a UE will not apply a backoff value for any LBT failures for random access attempts that occur within an LBT time window. In further aspects, a UE may determine the validity of ROs that overlap with a discovery reference signal measurement timing configuration (DMTC) window. In such aspects, the UE may not use overlapping ROs or may determine a portion of the DMTC window that is not used for base station transmissions and declare the overlapping ROs with the unused portion valid.
Type: Grant
Filed: February 13, 2020
Date of Patent: December 15, 2020
Assignee: QUALCOMM Incorporated
Inventors: Pravjyot Singh Deogun, Xiaoxia Zhang, Ozcan Ozturk, Jing Sun, Kapil Bhattad, Ananta Narayanan Thyagarajan

Patent number: 10817802
Abstract: An architecture and associated techniques of an apparatus for hardware accelerated machine learning are disclosed. The architecture features multiple memory banks storing tensor data. The tensor data may be concurrently fetched by a number of execution units working in parallel. Each operational unit supports an instruction set specific to certain primitive operations for machine learning. An instruction decoder is employed to decode a machine learning instruction and reveal one or more of the primitive operations to be performed by the execution units, as well as the memory addresses of the operands of the primitive operations as stored in the memory banks. The primitive operations, upon being performed or executed by the execution units, may generate some output that can be saved into the memory banks. The fetching of the operands and the saving of the output may involve permutation and duplication of the data elements involved.
Type: Grant
Filed: May 5, 2017
Date of Patent: October 27, 2020
Assignee: Intel Corporation
Inventors: Jeremy Bruestle, Choong Ng

Patent number: 10691410
Abstract: A method including receiving, by a processor, a computing instruction for a neural network, wherein the computing instruction for the neural network includes a computing rule for the neural network and a connection weight of the neural network, and the connection weight is a power of 2; and inputting, for a multiplication operation in the computing rule for the neural network, a source operand corresponding to the multiplication operation to a shift register, and performing a shift operation based on a connection weight corresponding to the multiplication operation, wherein the shift register outputs a target result operand as a result of the multiplication operation. The neural network uses a shift operation, and a neural network computing speed is increased.
Type: Grant
Filed: July 18, 2018
Date of Patent: June 23, 2020
Assignee: Alibaba Group Holding Limited
Inventors: Cong Leng, Hao Li, Zesheng Dou, Shenghuo Zhu, Rong Jin
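
The idea of replacing a multiplication by a power-of-2 weight with a shift can be shown in a few lines of Python. This is a minimal sketch under assumptions: weights are stored as their base-2 exponents, activations are integers, and the names `shift_multiply` and `shift_dot` are hypothetical helpers rather than anything named in the patent.

```python
def shift_multiply(x, weight_log2):
    """Multiply integer x by 2**weight_log2 using only a shift."""
    if weight_log2 >= 0:
        return x << weight_log2
    # Negative exponents divide; Python's >> floors, as an arithmetic shift would.
    return x >> -weight_log2


def shift_dot(inputs, weight_log2s):
    """Dot product of integer inputs with power-of-2 weights, with no multiplies."""
    return sum(shift_multiply(x, k) for x, k in zip(inputs, weight_log2s))


print(shift_multiply(5, 3))              # 5 * 8
print(shift_dot([3, 5, 2], [1, 0, 2]))   # 3*2 + 5*1 + 2*4
```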

Patent number: 10656914
Abstract: Instructions for 32-bit arithmetic support using 16-bit multiply and 32-bit addition without a barrel shifter. Illustrative instructions include operations that include receiving a first 32-bit operand, receiving a second 32-bit operand, shifting the second 32-bit operand right 16 or 15 bits to obtain a shifted second 32-bit operand, and adding the shifted second 32-bit operand and the first 32-bit operand to generate a 32-bit sum.
Type: Grant
Filed: August 20, 2019
Date of Patent: May 19, 2020
Assignee: TEXAS INSTRUMENTS INCORPORATED
Inventors: Srinivas Lingam, Seok-Jun Lee, Manish Goel
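
The shift-then-add operation in this abstract can be modeled as below. The sketch assumes a logical (zero-filling) right shift and wraparound on overflow, neither of which the abstract specifies; `add_shifted` and `MASK32` are illustrative names.

```python
MASK32 = 0xFFFFFFFF


def add_shifted(a, b, shift=16):
    """Return (a + (b >> shift)) as a 32-bit value, with shift fixed at 15 or 16.

    Assumption: the shift is logical and the sum wraps modulo 2**32.
    """
    if shift not in (15, 16):
        raise ValueError("only fixed shifts of 15 or 16 bits are modeled")
    return ((a & MASK32) + ((b & MASK32) >> shift)) & MASK32


print(hex(add_shifted(0x00010000, 0xABCD0000)))  # upper half of b added into a
```

Restricting the shift to two fixed amounts is what lets such an instruction avoid a general barrel shifter: each amount is just a fixed wiring of the second operand into the adder.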

Patent number: 10606587
Abstract: The present disclosure includes apparatuses and methods related to microcode instructions. One example apparatus comprises a memory storing a set of microcode instructions. Each microcode instruction of the set can comprise a first field comprising a number of control data units, and a second field comprising a number of type select data units. Each microcode instruction of the set can have a particular instruction type defined by a value of the number of type select data units, and particular functions corresponding to the number of control data units are variable based on the particular instruction type.
Type: Grant
Filed: August 24, 2016
Date of Patent: March 31, 2020
Assignee: Micron Technology, Inc.
Inventors: Shawn Rosti, Timothy P. Finkbeiner

Patent number: 10592583
Abstract: A circuit comprises an input register configured to receive an input vector of elements, a control register configured to receive a control vector of elements, wherein each element of the control vector corresponds to a respective element of the input vector, and wherein each element specifies a permutation of a corresponding element of the input vector, and a permute execution circuit configured to generate an output vector of elements corresponding to a permutation of the input vector. Generating each element of the output vector comprises accessing, at the input register, a particular element of the input vector, accessing, at the control register, a particular element of the control vector corresponding to the particular element of the input vector, and outputting the particular element of the input vector as an element at a particular position of the output vector that is selected based on the particular element of the control vector.
Type: Grant
Filed: February 25, 2019
Date of Patent: March 17, 2020
Assignee: Google LLC
Inventors: Dong Hyuk Woo, Gregory Michael Thorson, Andrew Everett Phelps, Olivier Temam, Jonathan Ross, Christopher Aaron Clark
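
A simple way to picture the permute operation in this abstract is a scatter: each input element is written to the output position named by its control element. The sketch below assumes the control vector holds destination indices that form a valid permutation; `permute` is a hypothetical helper name, and the real circuit may encode its control elements differently.

```python
def permute(input_vec, control_vec):
    """Place input_vec[i] at output position control_vec[i].

    Assumes control_vec is a permutation of range(len(input_vec)).
    """
    if sorted(control_vec) != list(range(len(input_vec))):
        raise ValueError("control vector must be a permutation of the indices")
    output = [None] * len(input_vec)
    for element, position in zip(input_vec, control_vec):
        output[position] = element
    return output


print(permute(["a", "b", "c", "d"], [2, 0, 3, 1]))
```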

Patent number: 10592247
Abstract: An arithmetic circuit comprises first to Nth, N being an integer equal to or larger than two, element circuits respectively including: input circuits which input first operand data and second operand data; and element data selectors which select operand data of any one of the element circuits on the basis of a request element signal; and a data bus which supplies the operand data from the input circuits to the element data selectors. When a control signal is in a first state, the element data selectors select, on the basis of the request element signal included in the second operand data, the first operand data of any of the element circuits and output the first operand data.
Type: Grant
Filed: August 24, 2015
Date of Patent: March 17, 2020
Assignee: FUJITSU LIMITED
Inventor: Tomonori Tanaka

Patent number: 10579379
Abstract: Receiving an instruction indicating a source operand and a destination operand. Storing a result in the destination operand in response to the instruction. The result operand may have: (1) first range of bits having a first end explicitly specified by the instruction in which each bit is identical in value to a bit of the source operand in a corresponding position; and (2) second range of bits that all have a same value regardless of values of bits of the source operand in corresponding positions. Execution of instruction may complete without moving the first range of the result relative to the bits of identical value in the corresponding positions of the source operand, regardless of the location of the first range of bits in the result. Execution units to execute such instructions, computer systems having processors to execute such instructions, and machine-readable medium storing such an instruction are also disclosed.
Type: Grant
Filed: December 12, 2014
Date of Patent: March 3, 2020
Assignee: INTEL CORPORATION
Inventors: Maxim Loktyukhin, Eric W Mahurin, Bret L Toll, Martin G Dixon, Sean P Mirkes, David L Kreitzer, Elmoustapha Ould-Ahmed-Vall, Vinodh Gopal
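
One possible reading of this abstract is a "keep a range of bits in place, force the rest to one value" operation. The sketch below models the simplest such case, assuming the kept range starts at bit 0, its end is the specified index, and the remaining bits are zeroed. The function name `keep_low_bits` and these choices are assumptions for illustration only.

```python
def keep_low_bits(src, end, width=64):
    """Copy bits [0, end) of src unchanged and in place; zero all higher bits.

    No shift is applied: every kept bit stays at its original position.
    """
    end = max(0, min(end, width))
    return src & ((1 << end) - 1)


print(bin(keep_low_bits(0b11011011, 4)))  # low four bits survive, in place
```

The point of contrast is with a shift-based extract, where the selected field is moved down to bit 0; here the field never moves.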

Patent number: 10579380
Abstract: Receiving an instruction indicating a source operand and a destination operand. Storing a result in the destination operand in response to the instruction. The result operand may have: (1) first range of bits having a first end explicitly specified by the instruction in which each bit is identical in value to a bit of the source operand in a corresponding position; and (2) second range of bits that all have a same value regardless of values of bits of the source operand in corresponding positions. Execution of instruction may complete without moving the first range of the result relative to the bits of identical value in the corresponding positions of the source operand, regardless of the location of the first range of bits in the result. Execution units to execute such instructions, computer systems having processors to execute such instructions, and machine-readable medium storing such an instruction are also disclosed.
Type: Grant
Filed: December 12, 2014
Date of Patent: March 3, 2020
Assignee: INTEL CORPORATION
Inventors: Maxim Loktyukhin, Eric W Mahurin, Bret L Toll, Martin G Dixon, Sean P Mirkes, David L Kreitzer, Elmoustapha Ould-Ahmed-Vall, Vinodh Gopal

Patent number: 10579334
Abstract: A system for block floating point computation in a neural network receives a plurality of floating point numbers. An exponent value for an exponent portion of each floating point number of the plurality of floating point numbers is identified and mantissa portions of the floating point numbers are grouped. A shared exponent value of the grouped mantissa portions is selected according to the identified exponent values and then removed from the grouped mantissa portions to define multi-tiered shared exponent block floating point numbers. One or more dot product operations are performed on the grouped mantissa portions of the multi-tiered shared exponent block floating point numbers to obtain individual results. The individual results are shifted to generate a final dot product value, which is used to implement the neural network. The shared exponent block floating point computations reduce processing time with less reduction in system accuracy.
Type: Grant
Filed: May 8, 2018
Date of Patent: March 3, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventors: Daniel Lo, Eric Sen Chung
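
The shared-exponent idea can be sketched with integers. The example below uses a single tier of sharing (the abstract describes multi-tiered sharing, which is not modeled here), picks the largest exponent in a group as the shared exponent, and represents each number as an `(exponent, mantissa)` pair. The names `to_block` and `block_dot` are hypothetical.

```python
def to_block(values):
    """Convert (exponent, mantissa) pairs into (shared_exponent, mantissas).

    Each mantissa is shifted right so that all share the largest exponent.
    Low-order bits shifted out are lost, which is the accuracy cost of sharing.
    """
    shared = max(exp for exp, _ in values)
    return shared, [mant >> (shared - exp) for exp, mant in values]


def block_dot(block_a, block_b):
    """Integer dot product on the mantissas; exponents are handled once at the end."""
    exp_a, mants_a = block_a
    exp_b, mants_b = block_b
    acc = sum(x * y for x, y in zip(mants_a, mants_b))
    return exp_a + exp_b, acc  # result value = acc * 2**(exp_a + exp_b)


a = to_block([(2, 8), (0, 12)])
b = to_block([(1, 4), (1, 6)])
exp, acc = block_dot(a, b)
print(exp, acc, acc << exp)  # the final shift scales the integer result
```

The inner loop contains only integer multiplies and adds; the single shift at the end corresponds to the step where the abstract says the individual results are shifted to produce the final dot product.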

Patent number: 10572222
Abstract: Integrated circuits with specialized processing blocks are provided. The specialized processing blocks may include floating-point multiplier circuits that can be configured to support variable precision. A multiplier circuit may include a first carry-propagate adder (CPA), a second carry-propagate adder (CPA), and an associated rounding circuit. The first CPA may be wide enough to handle the required precision of the mantissa. In a bridged mode, the first CPA may borrow an additional bit from the second CPA while the rounding circuit will monitor the appropriate bits to select the proper multiplier output. A parallel prefix tree operable in a non-bridged mode or the bridged mode may be used to compute multiple multiplier outputs. The multiplier circuit may also include exponent and exception handling circuitry using various masks corresponding to the desired precision width.
Type: Grant
Filed: June 25, 2019
Date of Patent: February 25, 2020
Assignee: Altera Corporation
Inventor: Martin Langhammer

Patent number: 10524292
Abstract: Embodiments provide a method for generating a random access channel ZC sequence, and an apparatus. A method for generating a random access channel ZC sequence includes: generating, by a base station, notification signaling, where the notification signaling instructs user equipment UE to generate a random access ZC sequence by using a second restricted set in a random access set; and sending, by the base station, the notification signaling to the UE, so that the UE generates the random access ZC sequence by using the second restricted set, where the random access set includes an unrestricted set, a first restricted set, and the second restricted set; and the second restricted set is a random access set that the UE needs to use when a Doppler frequency shift of the UE is greater than or equal to a first predetermined value.
Type: Grant
Filed: December 2, 2016
Date of Patent: December 31, 2019
Assignee: HUAWEI TECHNOLOGIES CO., LTD.
Inventors: Qiang Wu, Zhiheng Guo, Jianqin Liu, Jianghua Liu, Leiming Zhang

Patent number: 10512107
Abstract: Provided is a terminal, for example, user equipment (UE), that includes a processor, performs a random access (RA) procedure with a base station, for example, an eNodeB (E-UTRAN Node B, also known as Evolved Node B), and is at least temporarily embodied by the processor. The terminal may include a generator configured to generate a preamble sequence using a first sequence corresponding to a first root index based on a preamble index that is randomly selected, and a determiner configured to determine a second root index using the preamble index as an input value of a root index function. Further, the generator may be configured to generate a tag sequence using a second sequence corresponding to the second root index based on a tag index that is randomly selected.
Type: Grant
Filed: December 12, 2016
Date of Patent: December 17, 2019
Assignee: KOREA ADVANCED INSTITUTE OF SCIENCE AND TECHNOLOGY
Inventors: Dan Keun Sung, Hong Shik Park, Han Seung Jang

Patent number: 10503474
Abstract: Instructions for 32-bit arithmetic support using 16-bit multiply and 32-bit addition without a barrel shifter. Illustrative instructions include operations that include receiving a first 32-bit operand, receiving a second 32-bit operand, shifting the second 32-bit operand right 16 or 15 bits to obtain a shifted second 32-bit operand, and adding the shifted second 32-bit operand and the first 32-bit operand to generate a 32-bit sum.
Type: Grant
Filed: December 31, 2015
Date of Patent: December 10, 2019
Assignee: TEXAS INSTRUMENTS INCORPORATED
Inventors: Srinivas Lingam, Seok-Jun Lee, Manish Goel

Patent number: 10496403
Abstract: An apparatus and method for performing right-shifting operations on packed quadword data.
Type: Grant
Filed: December 21, 2017
Date of Patent: December 3, 2019
Assignee: Intel Corporation
Inventors: Venkateswara Madduri, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Mark Charney

Patent number: 10481910
Abstract: An apparatus and method for performing right-shifting operations on packed quadword data.
Type: Grant
Filed: September 29, 2017
Date of Patent: November 19, 2019
Assignee: Intel Corporation
Inventors: Venkateswara Madduri, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Mark Charney, Robert Valentine, Binwei Yang

Patent number: 10459961
Abstract: A system for segmenting an input data stream using vector processing, comprising a processor adapted to repeat the following steps throughout an input data stream to create a segmented data stream consisting of a plurality of segments: apply a rolling sequence over a sequence of consecutive data items of an input data stream, the rolling sequence includes a subset of consecutive data items of the sequence, calculate concurrently a plurality of partial hash values each by one of a plurality of processing pipelines of the processor, each for a respective one of a plurality of partial rolling sequences each including evenly spaced data items of the subset, determine compliance of each of the plurality of partial hash values with one or more respective partial segmentation criteria and designate the sequence as a variable size segment when at least some of the partial hash values comply with the respective partial segmentation criteria.
Type: Grant
Filed: August 2, 2017
Date of Patent: October 29, 2019
Assignee: Huawei Technologies Co., Ltd.
Inventors: Yehonatan David, Yair Toaff, Michael Hirsch
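
The segmentation scheme in this abstract resembles content-defined chunking with the window hash split across interleaved lanes. The sketch below is a deliberately simplified illustration: the hash is a plain byte sum, every window is recomputed rather than rolled incrementally, a cut requires all lanes to match, and the names and parameters (`partial_hashes`, `segment`, `window`, `lanes`, `mask`, `min_size`) are assumptions rather than details from the patent.

```python
def partial_hashes(window, lanes):
    """One toy hash per lane; lane k sees items k, k+lanes, k+2*lanes, ..."""
    return [sum(window[lane::lanes]) & 0xFF for lane in range(lanes)]


def segment(data, window=8, lanes=2, mask=0x03, min_size=8):
    """Split data into variable-size segments at content-dependent cut points."""
    segments, start = [], 0
    for end in range(window, len(data) + 1):
        if end - start < min_size:
            continue  # enforce a minimum segment length
        hashes = partial_hashes(data[end - window:end], lanes)
        # Cut when every partial hash meets its criterion (low bits all zero).
        if all(h & mask == 0 for h in hashes):
            segments.append(data[start:end])
            start = end
    if start < len(data):
        segments.append(data[start:])  # trailing remainder
    return segments


data = bytes(range(64)) * 2
parts = segment(data)
print(len(parts), [len(p) for p in parts])
```

Because each lane touches only evenly spaced items, the lanes are independent of one another, which is what would allow separate vector pipelines to compute them concurrently.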

Patent number: 10409592
Abstract: An apparatus has processing circuitry comprising an L×M multiplier array. An instruction decoder associated with the processing circuitry supports a multiply-and-accumulate-product (MAP) instruction for generating at least one result element corresponding to a sum of respective E×F products of E-bit and F-bit portions of J-bit and K-bit operands respectively, where 1<E<J≤L and 1<F<K≤M. In response to the MAP instruction, the instruction decoder controls the processing circuitry to rearrange F-bit portions of the second K-bit operand to form a transformed K-bit operand, and to control the L×M multiplier array in dependence on the first J-bit operand and the transformed K-bit operand to add the respective E×F products using a subset of the adders used for accumulating partial products for a conventional multiplication.
Type: Grant
Filed: April 24, 2017
Date of Patent: September 10, 2019
Assignee: ARM Limited
Inventors: Neil Burgess, David Raymond Lutz, Javier Diaz Bruguera
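
The arithmetic result of a MAP-style operation, a sum of products of corresponding operand portions, can be written down directly. The sketch below assumes equal portion sizes (E = F = 8) within 16-bit operands and unsigned values; `map_sum` is a hypothetical name, and the code models only the result, not the multiplier-array wiring.

```python
def map_sum(a, b, elem_bits=8, total_bits=16):
    """Sum of products of corresponding elem_bits-wide portions of a and b."""
    mask = (1 << elem_bits) - 1
    return sum(
        ((a >> pos) & mask) * ((b >> pos) & mask)
        for pos in range(0, total_bits, elem_bits)
    )


# Portions of a are (3, 2) and of b are (5, 4): 3*5 + 2*4.
print(map_sum(0x0302, 0x0504))
```

A possible intuition for the operand rearrangement mentioned in the abstract: in an ordinary 16×16 multiply of `a` by a byte-swapped `b`, the partial products `a_hi*b_hi` and `a_lo*b_lo` both carry weight 2^8, so they already line up in the same adder columns. This is offered as an interpretation, not as a description of the patented circuit.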

Patent number: 10366741
Abstract: Circuitry comprises: a set of bit processing circuitries to apply two or more successive instances of bitwise processing to an ordered bit array; each bit processing circuitry for a given bit position within the ordered bit array comprising: bit shifting circuitry to selectively apply a bit shift of a respective input bit to a next bit processing circuitry in a first direction relative to the ordered bit array, in response to an active state of a bit shift control signal, the bit shifting circuitry not applying the bit shift in response to an inactive state of the bit shift control signal; and bit shift control circuitry to selectively allow or inhibit a bit shifting operation in response to one or more inhibit control signals; in which: the bit shift control circuitry is configured to selectively propagate an output inhibit control signal, indicating that a bit shifting operation should be inhibited, as an inhibit control signal to bit processing circuitry applying a next instance of the bitwise processing a
Type: Grant
Filed: September 21, 2017
Date of Patent: July 30, 2019
Assignee: ARM Limited
Inventors: Neil Burgess, Nigel John Stephens, Lee Evan Eisen, Jaime Ferragut Martinez-Vara De Rey

Patent number: 10318298
Abstract: An apparatus and method for performing left-shifting operations on packed quadword data.
Type: Grant
Filed: September 29, 2017
Date of Patent: June 11, 2019
Assignee: Intel Corporation
Inventors: Venkateswara Madduri, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Mark Charney, Jesus Corbal

Patent number: 10296333
Abstract: Vector single instruction multiple data (SIMD) shift and rotate instructions are provided specifying: a destination vector register comprising fields to store vector elements, a first vector register, a vector element size, and a second vector register. Vector data fields of a first element size are duplicated. Duplicate vector data fields are stored as corresponding data fields of twice the first element size. Control logic receives an element size for performing a SIMD shift or rotation operation. Through selectors corresponding to a vector element, portions are selected from the duplicated data fields, the selectors corresponding to any particular vector element select all portions similarly from the duplicated data fields for that particular vector element responsive to the first element size, but selectors corresponding to any particular vector element select at least two portions from the duplicated data fields differently for that particular vector element responsive to a second element size.
Type: Grant
Filed: December 27, 2016
Date of Patent: May 21, 2019
Assignee: Intel Corporation
Inventors: Asaf Rubinstein, Tom Aviram

Patent number: 10289382
Abstract: An apparatus for mathematical manipulation is described allowing the selective combination of shifters to shift binary numbers of various widths. Selective combination allows on-the-fly adjustment of shifters from independent to coordinated shifting operations. Selective combination allows adjustable hardware-based shifting while saving space and resources. Multiple eight-bit shifters can be configured for a variety of operand widths, such as a 32-bit width, a 24-bit width, a 16-bit width, or an eight-bit width. Multiplexers route the appropriate input data to the appropriate shifters. Bidirectional shifting is configured through a selector tree, including both shift left and shift right operations. Opcodes configure the shifters for the desired type of shift and a shifted result is generated.
Type: Grant
Filed: March 30, 2018
Date of Patent: May 14, 2019
Assignee: Wave Computing, Inc.
Inventor: Samit Chaudhuri
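
To illustrate how byte-wide shifters might cooperate on a wider operand, the sketch below builds a left shift of a multi-byte word from per-byte lanes. Each output lane combines bits from one source byte and its lower neighbor, loosely mirroring the role the abstract gives to multiplexers that route input data to the shifters. The function name `shift_left_from_bytes` and the lane model are assumptions, not the patented design.

```python
def shift_left_from_bytes(word, shift, num_bytes=4):
    """Left-shift a num_bytes-wide word using only 8-bit lane operations."""
    byte_steps, bit_steps = divmod(shift, 8)
    lanes = [(word >> (8 * i)) & 0xFF for i in range(num_bytes)]
    out = 0
    for i in range(num_bytes):
        src = i - byte_steps  # whole-byte routing (the multiplexer's job)
        cur = lanes[src] if 0 <= src < num_bytes else 0
        prev = lanes[src - 1] if 0 <= src - 1 < num_bytes else 0
        # Each lane merges its own shifted byte with the bits spilling
        # over from the next lower byte.
        lane = ((cur << bit_steps) | (prev >> (8 - bit_steps))) & 0xFF
        out |= lane << (8 * i)
    return out


print(hex(shift_left_from_bytes(0x000000FF, 4)))
print(hex(shift_left_from_bytes(0x00ABCDEF, 12, num_bytes=3)))  # 24-bit operand
```

Changing `num_bytes` models the different operand widths mentioned in the abstract (8, 16, 24, or 32 bits).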

Patent number: 10216705
Abstract: A circuit comprises an input register configured to receive an input vector of elements, a control register configured to receive a control vector of elements, wherein each element of the control vector corresponds to a respective element of the input vector, and wherein each element specifies a permutation of a corresponding element of the input vector, and a permute execution circuit configured to generate an output vector of elements corresponding to a permutation of the input vector. Generating each element of the output vector comprises accessing, at the input register, a particular element of the input vector, accessing, at the control register, a particular element of the control vector corresponding to the particular element of the input vector, and outputting the particular element of the input vector as an element at a particular position of the output vector that is selected based on the particular element of the control vector.
Type: Grant
Filed: April 30, 2018
Date of Patent: February 26, 2019
Assignee: Google LLC
Inventors: Dong Hyuk Woo, Gregory Michael Thorson, Andrew Everett Phelps, Olivier Temam, Jonathan Ross, Christopher Aaron Clark
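
A minimal software model of the permute step described in this abstract is sketched here. It assumes that each control element is simply the destination index of the matching input element; the abstract only says the position is "selected based on" the control element, so the real encoding may differ. The function name `permute` is hypothetical.

```python
def permute(input_vector, control_vector):
    """Scatter each input element to the output position named by its control element."""
    if len(input_vector) != len(control_vector):
        raise ValueError("input and control vectors must have the same length")
    output = [None] * len(input_vector)
    for element, position in zip(input_vector, control_vector):
        # Assumption: the control element is used directly as the output index.
        output[position] = element
    return output


print(permute(["a", "b", "c", "d"], [2, 0, 3, 1]))  # ['b', 'd', 'a', 'c']
```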

Patent number: 10162633
Abstract: An apparatus has processing circuitry comprising multiplier circuitry for performing multiplication on a pair of input operands. In response to a shift instruction specifying at least one shift amount and a source operand comprising at least one data element, the source operand and a shift operand determined in dependence on the shift amount are provided as input operands to the multiplier circuitry and the multiplier circuitry is controlled to perform at least one multiplication which is equivalent to shifting a corresponding data element of the source operand by a number of bits specified by a corresponding shift amount to generate a shift result value.
Type: Grant
Filed: April 24, 2017
Date of Patent: December 25, 2018
Assignee: ARM Limited
Inventors: François Christopher Jacques Botman, Thomas Christopher Grocutt
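
The equivalence between shifting and multiplying that this abstract relies on can be checked with a short sketch. A left shift by s is a multiplication by 2**s truncated to the element width. For the right shift, the sketch assumes one common approach, multiplying by 2**(width - s) and keeping the upper half of the double-width product; the abstract does not say how the shift operand is formed, so that part is an assumption, as are the function names.

```python
def shl_via_mul(x: int, s: int, width: int = 32) -> int:
    """Left shift by s (0 <= s < width) expressed as a multiplication."""
    mask = (1 << width) - 1
    shift_operand = 1 << s  # 2**s, derived from the shift amount
    return ((x & mask) * shift_operand) & mask


def shr_via_mul(x: int, s: int, width: int = 32) -> int:
    """Logical right shift by s (0 <= s <= width) expressed as a multiplication."""
    mask = (1 << width) - 1
    shift_operand = 1 << (width - s)  # 2**(width - s)
    product = (x & mask) * shift_operand  # double-width product
    return (product >> width) & mask  # keep the upper half


print(hex(shl_via_mul(0x80000001, 1)))  # 0x2
print(hex(shr_via_mul(0xF0, 4)))        # 0xf
```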

Patent number: 10126976
Abstract: An address and a data size are provided to a rotator. The rotator stores, based on the address and the data size, a data element in a location having a defined number of positions. The data element includes one or more data units and the one or more data units are aligned correctly in one or more positions of the location based on a predefined position in the location to receive a selected data unit of the one or more data units. The rotator replicates a value of a chosen data unit of the one or more data units to one or more other positions of the location.
Type: Grant
Filed: February 17, 2017
Date of Patent: November 13, 2018
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Michael K. Gschwind

Patent number: 10108397
Abstract: Embodiments of the inventive concept include a fast close path solution and circuit of a three path fused multiply-adder circuit. The fast close path circuit can include one or more compressors that can receive multiple operands and produce a result sum and a result carry. The close path circuit can include one or more leading zero anticipators (LZAs). The one or more LZAs can receive and process the result sum and the result carry. The close path circuit can include one or more adders. The one or more adders can receive and add the result sum and the result carry in parallel with the one or more LZAs processing the result sum and the result carry. Since the close path is the critical timing path, by performing the addition operations in parallel with the LZA and/or priority encode (PENC) operations, the logic depth and latency of the close path are reduced.
Type: Grant
Filed: February 1, 2016
Date of Patent: October 23, 2018
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Ashraf Ahmed

Patent number: 10013258
Abstract: Embodiments are directed to a method of adjusting an index, wherein the index identifies a location of an element within an array. The method includes executing, by a computer, a single instruction that adjusts a first parameter of the index to match a parameter of an array address. The single instruction further adjusts a second parameter of the index to match a parameter of the array element. The adjustment of the first parameter includes a sign extension.
Type: Grant
Filed: September 29, 2014
Date of Patent: July 3, 2018
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Michael K. Gschwind

Patent number: 9959094
Abstract: An arithmetic apparatus comprises a plurality of cascade-connected arithmetic units. Each of the plurality of arithmetic units comprises: a calculator configured to operate in one of a rotation mode of performing a rotation calculation, and a vectoring mode of calculating a rotation angle; and a holding unit configured to hold rotational direction information output from the calculator in the vectoring mode. In addition, when operating in the rotation mode, the calculator performs the rotation calculation on data input from an arithmetic unit in a preceding stage, based on the rotational direction information held in the holding unit.
Type: Grant
Filed: June 3, 2015
Date of Patent: May 1, 2018
Assignee: CANON KABUSHIKI KAISHA
Inventors: Tadayoshi Nakayama, Koki Mitsunami
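
The rotation and vectoring modes in this abstract resemble the two modes of a CORDIC-style iteration, so a floating-point sketch of that general technique is given here. It is an assumption that the patent uses this exact recurrence. In the sketch, each loop step stands in for one cascaded arithmetic unit, the `directions` list plays the role of the holding units, and the input to `vectoring` is assumed to have a positive x component. All names are hypothetical.

```python
def vectoring(x: float, y: float, stages: int = 16):
    """Rotate (x, y) onto the x axis, recording the direction chosen at each stage."""
    directions = []
    for i in range(stages):
        d = -1 if y >= 0 else 1  # always rotate toward the x axis
        x, y = x - d * y * 2.0**-i, y + d * x * 2.0**-i
        directions.append(d)  # "held" for later use in rotation mode
    return x, y, directions


def rotation(x: float, y: float, directions):
    """Apply the rotation described by previously held directions to another vector."""
    for i, d in enumerate(directions):
        x, y = x - d * y * 2.0**-i, y + d * x * 2.0**-i
    return x, y


xv, yv, held = vectoring(3.0, 4.0)
print(abs(yv) < 1e-3)  # True: the vector now lies close to the x axis
```

Both modes scale the vector length by a fixed gain of about 1.6468 for 16 stages; a real design would compensate for it, which this sketch leaves out.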

Patent number: 9959247
Abstract: A circuit comprises an input register configured to receive an input vector of elements, a control register configured to receive a control vector of elements, wherein each element of the control vector corresponds to a respective element of the input vector, and wherein each element specifies a permutation of a corresponding element of the input vector, and a permute execution circuit configured to generate an output vector of elements corresponding to a permutation of the input vector. Generating each element of the output vector comprises accessing, at the input register, a particular element of the input vector, accessing, at the control register, a particular element of the control vector corresponding to the particular element of the input vector, and outputting the particular element of the input vector as an element at a particular position of the output vector that is selected based on the particular element of the control vector.
Type: Grant
Filed: April 25, 2017
Date of Patent: May 1, 2018
Assignee: Google LLC
Inventors: Dong Hyuk Woo, Gregory Michael Thorson, Andrew Everett Phelps, Olivier Temam, Jonathan Ross, Christopher Aaron Clark

Patent number: 9933998
Abstract: In a novel computation device, a plurality of partial product generators is communicatively coupled to a binary number multiplier. The binary number is partitioned in the computation device into non-overlapping subsets of binary bits and each subset is coupled to one of the plurality of partial product generators. Each partial product generator, upon receiving a subset of binary bits representing a number, generates a multiplication product of the number and a predetermined constant. The multiplication products from all partial product generators are summed to generate the final product between the predetermined constant and the binary number. The partial product generators are constructed from logic gates, including an AND gate, and wires connecting the logic gates. The partial product generators are free of memory elements.
Type: Grant
Filed: February 6, 2017
Date of Patent: April 3, 2018
Inventors: KuoTseng Tseng, Parkson Wong
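
The partition-and-sum scheme in this abstract can be modeled arithmetically as follows. The sketch splits the input into non-overlapping 4-bit subsets, forms one partial product per subset, and sums them. Two points are assumptions rather than details from the abstract: the chunk width, and the choice to apply each subset's positional weight at the summing step. The plain multiplication inside the loop stands in for a gate-level, memory-free generator.

```python
def constant_multiply(x: int, constant: int, total_bits: int = 16, chunk_bits: int = 4) -> int:
    """Multiply x (0 <= x < 2**total_bits) by a fixed constant via per-chunk partial products."""
    chunk_mask = (1 << chunk_bits) - 1
    total = 0
    for offset in range(0, total_bits, chunk_bits):
        chunk = (x >> offset) & chunk_mask  # one non-overlapping subset of bits
        partial = chunk * constant          # stands in for one partial product generator
        total += partial << offset          # weight the partial product by chunk position
    return total


print(constant_multiply(0xBEEF, 2023) == 0xBEEF * 2023)  # True
```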

Patent number: 9933996
Abstract: An apparatus for mathematical manipulation is described allowing the selective combination of shifters to shift binary numbers of various widths. Selective combination allows on-the-fly adjustment of shifters from independent to coordinated shifting operations. Selective combination allows adjustable hardware-based shifting while saving space and resources. Multiple eight-bit shifters can be configured for a variety of operand widths, such as a 32-bit width, a 24-bit width, a 16-bit width, or an eight-bit width. Multiplexers route the appropriate input data to the appropriate shifters. Opcodes configure the shifters for the desired type of shift and a shifted result is generated.
Type: Grant
Filed: December 20, 2013
Date of Patent: April 3, 2018
Assignee: Wave Computing, Inc.
Inventor: Samit Chaudhuri

Patent number: 9904511
Abstract: An improved shifter design for high-speed data processors is described. The shifter may include a first stage, in which the input bits are shifted by increments of N bits where N>1, followed by a second stage, in which all bits are shifted by a residual amount. A pre-shift may be removed from an input to the shifter and replaced by a shift adder at the second stage to further increase the speed of the shifter.
Type: Grant
Filed: February 6, 2015
Date of Patent: February 27, 2018
Assignee: Cavium, Inc.
Inventors: Nitin Mohan, Ilan Pragaspathy
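
The two-stage structure in this abstract can be sketched by splitting the shift amount into a coarse part (a multiple of N) and a residual part. The sketch assumes N = 8 and a 64-bit datapath purely for illustration, and it omits the pre-shift and shift-adder optimization mentioned in the abstract. The function name is hypothetical.

```python
def two_stage_shift_left(x: int, amount: int, width: int = 64, n: int = 8) -> int:
    """Left shift performed as a coarse stage (multiples of n) plus a residual stage."""
    mask = (1 << width) - 1
    coarse_steps, residual = divmod(amount, n)
    stage1 = (x << (coarse_steps * n)) & mask  # first stage: increments of n bits
    return (stage1 << residual) & mask         # second stage: remaining 0..n-1 bits


print(hex(two_stage_shift_left(0x1, 19)))  # 0x80000
```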

Patent number: 9841979
Abstract: A method and corresponding apparatus for processing a shuffle instruction are provided. Shuffle units are configured in a hierarchical structure, and each of the shuffle units generates a shuffled data element array by performing shuffling on an input data element array. In the hierarchical structure, which includes an upper shuffle unit and a lower shuffle unit, the shuffled data element array output from the lower shuffle unit is input to the upper shuffle unit as a portion of the input data element array for the upper shuffle unit.
Type: Grant
Filed: July 14, 2014
Date of Patent: December 12, 2017
Assignee: Samsung Electronics Co., Ltd.
Inventors: Keshava Prasad, Navneet Basutkar, Young Hwan Park, Ho Yang, Yeon Bok Lee

Patent number: 9805819
Abstract: Described herein are techniques, systems, and circuits for addressing image data according to blocks. For example, in some cases, the address space may be divided into high order address bits and low order address bits. In these cases, an address circuit may twist an address space by shifting the high order bits and low order bits of an address in a rightward direction, shifting the low order bits of the address in a leftward direction, and shifting the high order bits and the low order bits of the address in the leftward direction. The circuit may modify the address value and untwist the address space. For example, the untwisting may include shifting the high order bits and the low order bits of an address in the rightward direction, shifting the low order bits of the address in the rightward direction, and shifting the high order bits and the low order bits of the address in the leftward direction.
Type: Grant
Filed: October 14, 2016
Date of Patent: October 31, 2017
Assignee: Amazon Technologies, Inc.
Inventor: Carl Ryan Kelso

Patent number: 9762365
Abstract: A radio communication terminal device and a radio transmission method are provided that can improve reception performance of a CQI and a reference signal. A phase table storage unit stores a phase table that correlates the amount of cyclic shift with the complex coefficients {w1, w2} by which the reference signal is multiplied. A complex coefficient multiplication unit reads, from the phase table storage unit, the complex coefficient corresponding to the amount of cyclic shift indicated by resource allocation information and multiplies the reference signal by that coefficient so as to change the phase relationship between the reference signals in a slot.
Type: Grant
Filed: December 12, 2016
Date of Patent: September 12, 2017
Assignee: Sun Patent Trust
Inventors: Tomofumi Takata, Daichi Imamura, Seigo Nakao, Sadaki Futagi, Takashi Iwai, Yoshihiko Ogawa

Patent number: 9665346
Abstract: Mechanisms are provided for performing a floating point arithmetic operation in a data processing system. A plurality of floating point operands of the floating point arithmetic operation are received and bits in a mantissa of at least one floating point operand of the plurality of floating point operands are shifted. One or more bits of the mantissa that are shifted outside a range of bits of the mantissa of at least one floating point operand are stored and a vector value is generated based on the stored one or more bits of the mantissa that are shifted outside of the range of bits of the mantissa of the at least one floating point operand. A resultant value is generated for the floating point arithmetic operation based on the vector value and the plurality of floating point operands.
Type: Grant
Filed: October 28, 2014
Date of Patent: May 30, 2017
Assignee: International Business Machines Corporation
Inventors: John B. Carter, Bruce G. Mealey, Karthick Rajamani, Eric E. Retter, Jeffrey A. Stuecheli

Patent number: 9658986
Abstract: A data processing apparatus includes a computing unit that performs a matrix computation between data streams whose unit data is of a matrix format; a determining unit that, for each matrix obtained by the matrix computation by the computing unit, determines, based on the value of each element included in the matrix, an exponent value for expressing each element included in the matrix as a floating decimal point value; a converting unit that converts the value of each element into a significand value of the element, according to the exponent value determined by the determining unit; and an output unit that correlates and outputs the exponent value and each matrix after conversion in which the value of each element in the matrix has been converted by the converting unit.
Type: Grant
Filed: January 29, 2014
Date of Patent: May 23, 2017
Assignee: FUJITSU LIMITED
Inventors: Yi Ge, Noboru Kobayashi, Hiroshi Hatano, Yasuhiro Oyama
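
The per-matrix exponent described in this abstract resembles a block floating-point format, so a small sketch of that general idea follows. It assumes the exponent is chosen from the largest magnitude in the matrix and that significands are signed 8-bit integers truncated toward zero; the abstract does not specify either choice, and the function name is hypothetical.

```python
import math


def to_block_float(matrix, significand_bits: int = 8):
    """Return (shared_exponent, integer significands) for one matrix."""
    largest = max(abs(v) for row in matrix for v in row)
    if largest == 0:
        return 0, [[0 for _ in row] for row in matrix]
    # frexp writes largest as m * 2**e with 0.5 <= m < 1.
    _, e = math.frexp(largest)
    # Pick the exponent so the largest element fits in the signed significand.
    exponent = e - (significand_bits - 1)
    # Scale by 2**-exponent and truncate toward zero.
    significands = [[int(math.ldexp(v, -exponent)) for v in row] for row in matrix]
    return exponent, significands


exponent, sig = to_block_float([[1.0, -0.5], [0.25, 3.0]])
print(exponent, sig)  # -5 [[32, -16], [8, 96]]
```

Each original value is approximately `significand * 2**exponent`, so the exponent is stored once per matrix and only the small integers are stored per element.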