Matrix Array Patents (Class 708/520)

Patent number: 11687616
Abstract: An arithmetic processing apparatus includes a memory and a processor. The processor is coupled to the memory and configured to determine an individual not to be evolved to an individual of a second generation from among a plurality of individuals in a first generation based on a predetermined reference for calculation completion of fitness calculation for each of the plurality of individuals, the second generation being a generation next to the first generation, and determine to cause the determined individual to evolve to an individual of a generation next or subsequent to the second generation.
Type: Grant
Filed: November 6, 2020
Date of Patent: June 27, 2023
Assignee: FUJITSU LIMITED
Inventors: Yukito Tsunoda, Teruo Ishihara

Patent number: 11609762
Abstract: Embodiments detailed herein relate to systems and methods to load a tile register pair. In one example, a processor includes: decode circuitry to decode a load matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded load matrix pair instruction to load every element of left and right tiles of the identified destination matrix from corresponding element positions of left and right tiles of the identified source matrix, respectively, wherein the executing operates on one row of the identified destination matrix at a time, starting with the first row.
Type: Grant
Filed: August 10, 2021
Date of Patent: March 21, 2023
Assignee: Intel Corporation
Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman

Patent number: 11556852
Abstract: A computer-implemented method for determining a set of target items to be annotated for training a machine learning application. The method comprises providing a training data set with a set of data samples and an autoencoder with a classifier. The autoencoder comprises an embedding model that maps the set of data samples to a set of compressed feature vectors. The set of compressed feature vectors define a compressed feature matrix. Further provided are: a definition of a graph associated to the compressed feature matrix, applying a clustering algorithm to identify node clusters of the graph and applying a centrality algorithm to identify central nodes of the node clusters, retrieving from an annotator node labels for the central nodes, propagating the annotated node labels to other nodes of the graph and performing a training of the embedding model and the classifier with the annotated and the propagated node labels.
Type: Grant
Filed: March 6, 2020
Date of Patent: January 17, 2023
Assignee: International Business Machines Corporation
Inventors: Peter Willem Jan Staar, Michele Dolfi, Christoph Auer, Leonidas Georgopoulos, Ralf Kaestner, Alexander Velizhev, Dal Noguer Hidalgo, Rita Kuznetsova, Konstantinos Bekas

Patent number: 11550872
Abstract: Quantum computing systems and methods are provided. In one example, a quantum computing system includes a quantum system having one or more quantum system qubits and one or more ancilla qubits. The quantum computing system includes one or more quantum gates implemented by the quantum computing system. The quantum gate(s) are operable to configure the one or more ancilla qubits into a known state. The quantum computing system includes a quantum measurement circuit operable to perform a plurality of measurements on the one or more quantum system qubits using the one or more ancilla qubits. The quantum computing system includes one or more processors operable to determine a reduced density matrix for a subset of the quantum system based on a set of the plurality of measurements that include a number of repeated measurements performed using the quantum measurement circuit.
Type: Grant
Filed: October 15, 2020
Date of Patent: January 10, 2023
Assignee: GOOGLE LLC
Inventor: Zhang Jiang

Patent number: 11520854
Abstract: A first group of elements is elementwise multiplied with a second group of elements using a plurality of multipliers belonging to a matrix multiplication hardware unit. Results of the plurality of multipliers are added together using a hierarchical tree of adders belonging to the matrix multiplication hardware unit and a final result of the hierarchical tree of adders or any of a plurality of intermediate results of the hierarchical tree of adders is selectively provided for use in determining an output result matrix.
Type: Grant
Filed: October 29, 2019
Date of Patent: December 6, 2022
Assignee: Meta Platforms, Inc.
Inventors: Yuchen Hao, Krishnakumar Narayanan Nair, Ehsan Khish Ardestani Zadeh, Rakesh Komuravelli, Abdulkadir Utku Diril, Thomas Mark Ulrich
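
The multiply-then-reduce flow in this abstract can be illustrated in software. The following is a minimal sketch, not the patented hardware: the function name `multiply_and_reduce`, the `level` parameter, and the zero-padding of odd-length levels are all assumptions made for illustration.

```python
def multiply_and_reduce(a, b, level=None):
    """Elementwise multiply two groups, then sum through a binary adder tree.

    level=None returns the final sum as a one-element list.
    level=k returns the partial sums available after k adder levels
    (a hypothetical way of modelling "intermediate results").
    """
    if len(a) != len(b):
        raise ValueError("groups must have the same length")
    values = [x * y for x, y in zip(a, b)]  # one multiplier per element pair
    depth = 0
    while len(values) > 1 and (level is None or depth < level):
        if len(values) % 2:
            values.append(0)  # assumption: pad an odd level with zero
        # one level of the tree: add neighbouring pairs
        values = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
        depth += 1
    return values
```

Calling it with `level=1` on two four-element groups yields two partial sums, which is one way an intermediate tree result could feed two separate output elements instead of a single dot product.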

Patent number: 11520855
Abstract: A computer-implemented method is presented for performing matrix sketching by employing an analog crossbar architecture. The method includes low rank updating a first matrix for a first period of time, copying the first matrix into a dynamic correction computing device, switching to a second matrix to low rank update the second matrix for a second period of time, as the second matrix is low rank updated, feeding the first matrix with first stochastic pulses to reset the first matrix back to a first matrix symmetry point, copying the second matrix into the dynamic correction computing device, switching back to the first matrix to low rank update the first matrix for a third period of time, and as the first matrix is low rank updated, feeding the second matrix with second stochastic pulses to reset the second matrix back to a second matrix symmetry point.
Type: Grant
Filed: May 15, 2020
Date of Patent: December 6, 2022
Assignees: INTERNATIONAL BUSINESS MACHINES CORPORATION, RAMOT AT TEL AVIV UNIVERSITY, LTD.
Inventors: Lior Horesh, Oguzhan Murat Onen, Haim Avron, Tayfun Gokmen, Vasileios Kalantzis, Shashanka Ubaru

Patent number: 11442709
Abstract: A method for compiling and executing a nested loop includes initializing a nested loop controller with an outer loop count value and an inner loop count value. The nested loop controller includes a predicate FIFO. The method also includes coalescing the nested loop and, during execution of the coalesced nested loop, causing the nested loop controller to populate the predicate FIFO and executing a get predicate instruction having an offset value, where the get predicate returns a value from the predicate FIFO specified by the offset value. The method further includes predicating an outer loop instruction on the returned value from the predicate FIFO.
Type: Grant
Filed: August 3, 2020
Date of Patent: September 13, 2022
Assignee: Texas Instruments Incorporated
Inventors: Kai Chirca, Timothy D. Anderson, Todd T. Hahn, Alan L. Davis

Patent number: 11435941
Abstract: In one example, an apparatus comprises: a memory array having an array of memory elements arranged in rows and columns, each memory element being configured to store a data element; and a memory access circuit configured to: perform a row write operation to store a first group of data elements at a first row of the array of memory elements; perform a column read operation at a first column of the array of memory elements to obtain a second group of data elements; and perform a column write operation to store a third group of data elements at the first column of the array of memory elements to replace the second group of data elements.
Type: Grant
Filed: June 24, 2020
Date of Patent: September 6, 2022
Assignee: Amazon Technologies, Inc.
Inventors: Kun Xu, Paul Gilbert Meyer, Ron Diamant
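
The three access operations named in this abstract can be modelled in a few lines. This is a minimal sketch under stated assumptions: the class `MemoryArray` and its method names are hypothetical, and the model ignores timing, ports, and element width.

```python
class MemoryArray:
    """Software model of a row/column-addressable array (illustrative only)."""

    def __init__(self, rows, cols):
        self.cells = [[0] * cols for _ in range(rows)]

    def write_row(self, r, data):
        """Row write: store one group of elements along row r."""
        if len(data) != len(self.cells[r]):
            raise ValueError("row length mismatch")
        self.cells[r] = list(data)

    def read_column(self, c):
        """Column read: gather one element from every row at column c."""
        return [row[c] for row in self.cells]

    def write_column(self, c, data):
        """Column write: replace the elements stored at column c."""
        if len(data) != len(self.cells):
            raise ValueError("column length mismatch")
        for row, value in zip(self.cells, data):
            row[c] = value
```

Writing data by rows and reading it back by columns returns the elements in transposed order, which is one use such an access pattern lends itself to; the abstract itself does not state the intended application.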

Patent number: 11409840
Abstract: An array processor includes processor element arrays distributed in rows and columns. The processor element arrays perform operations on parameter values. The array processor also includes memory interfaces that are dynamically mapped to mutually exclusive subsets of the rows and columns of the processor element arrays based on dimensions of matrices that provide the parameter values to the processor element arrays. In some cases, the processor element arrays are vector arithmetic logic unit (ALU) processors and the memory interfaces are direct memory access (DMA) engines. The rows of the processor element arrays in the subsets are mutually exclusive to the rows in the other subsets and the columns of the processor element arrays in the subsets are mutually exclusive to the columns in the other subsets. The matrices can be symmetric or asymmetric, e.g., one of the matrices can be a vector having a single column.
Type: Grant
Filed: September 25, 2020
Date of Patent: August 9, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Sateesh Lagudu, Allen H. Rush, Michael Mantor, Arun Vaidyanathan Ananthanarayan, Prasad Nagabhushanamgari

Patent number: 11410070
Abstract: A quantum computing device comprises at least one quantum register including a plurality of logical qubits. A compression engine is coupled to each logical qubit of the plurality of logical qubits. Each compression engine is configured to compress syndrome data. A decompression engine is coupled to each compression engine. Each decompression engine is configured to receive compressed syndrome data, decompress the received compressed syndrome data, and route the decompressed syndrome data to a decoder block.
Type: Grant
Filed: November 18, 2019
Date of Patent: August 9, 2022
Assignee: Microsoft Technology Licensing, LLC
Inventors: Poulami Das, Nicolas Guillaume Delfosse, Christopher Anand Pattison, Srilatha Manne, Douglas Carmean, Krysta Marie Svore, Helmut Gottfried Katzgraber

Patent number: 11392379
Abstract: Disclosed embodiments relate to executing a vector multiplication instruction. In one example, a processor includes fetch circuitry to fetch the vector multiplication instruction having fields for an opcode, first and second source identifiers, and a destination identifier, decode circuitry to decode the fetched instruction, execution circuitry to, on each of a plurality of corresponding pairs of fixed-sized elements of the identified first and second sources, execute the decoded instruction to generate a double-sized product of each pair of fixed-sized elements, the double-sized product being represented by at least twice a number of bits of the fixed size, and generate a signed fixed-sized result by rounding the most significant fixed-sized portion of the double-sized product to fit into the identified destination.
Type: Grant
Filed: September 27, 2017
Date of Patent: July 19, 2022
Assignee: Intel Corporation
Inventors: Venkateswara R. Madduri, Carl Murray, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Robert Valentine, Jesus Corbal
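
The "double-sized product, rounded high half" arithmetic in this abstract can be sketched as follows. This is an illustration under assumptions, not the instruction's defined semantics: the names `mul_high_rounded` and `vector_mul_high`, the 16-bit default width, and the round-half-up rule (add half a unit, then shift) are all choices made here.

```python
def mul_high_rounded(x, y, bits=16):
    """Multiply two signed `bits`-wide integers and keep the rounded high half.

    The full product needs up to 2 * bits bits. Adding half of the discarded
    range before the arithmetic shift rounds to nearest (ties toward +inf);
    that rounding mode is an assumption of this sketch.
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not (lo <= x <= hi and lo <= y <= hi):
        raise ValueError("operand does not fit the element size")
    product = x * y                       # double-sized intermediate
    return (product + (1 << (bits - 1))) >> bits


def vector_mul_high(src1, src2, bits=16):
    """Apply the operation to each corresponding pair of elements."""
    return [mul_high_rounded(a, b, bits) for a, b in zip(src1, src2)]
```

For 16-bit operands the largest product is 2^30, so the rounded high half always fits in 16 signed bits and no saturation step is needed in this particular formulation.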

Patent number: 11392849
Abstract: Systems and methods that facilitate motion formalism utilizing quantum computing, to compute matrix operators in terms of commutators between qubit operators and measurements on the quantum hardware, wherein the commutators are computed utilizing symbolic calculus. Embodiments reduce computational cost of generalized eigenvalue synthesis relying on symbolic calculus and parallelization. Embodiments disclosed herein can also develop estimators of excited-states properties, considering constants of motion (e.g. spin) and non-constants of motions (e.g. dipoles, density matrices).
Type: Grant
Filed: September 18, 2020
Date of Patent: July 19, 2022
Assignees: INTERNATIONAL BUSINESS MACHINES CORPORATION, JSR CORPORATION
Inventors: Mario Motta, Pauline Ollitrault, Stephen Wood, Panagiotis Barkoutsos, Joseph Latone, Ivano Tavernelli, Gavin Jones, Edward Pyzer-Knapp, Yuya Onishi

Patent number: 11379185
Abstract: A matrix multiplication device and an operation method thereof are provided. The matrix multiplication device includes a plurality of unit circuits. Each of the unit circuits includes a multiplying-adding circuit, a first register, and a second register. A first input terminal and a second input terminal of the multiplying-adding circuit are respectively coupled to a corresponding first input line and a corresponding second input line. An input terminal and an output terminal of the first register are respectively coupled to an output terminal and a third input terminal of the multiplying-adding circuit. The second register is coupled to the first register to receive and temporarily store a multiplication accumulation result. The second registers of the unit circuits output the multiplication accumulation results in a column direction in a first output mode, and output the multiplication accumulation results in a row direction in a second output mode.
Type: Grant
Filed: September 21, 2020
Date of Patent: July 5, 2022
Assignee: NEUCHIPS CORPORATION
Inventors: Jian-Wen Chen, Chiung-Liang Lin

Patent number: 11334355
Abstract: Technology for providing data to a processing unit is disclosed. A computer processor may be divided into a master processing unit and consumer processing units. The master processing unit at least partially decodes a machine instruction and determines whether data is needed to execute the machine instruction. The master processing unit sends a request to memory for the data. The request may indicate that the data is to be sent from the memory to a consumer processing unit. The data sent by the memory in response to the request may be stored in local read storage that is close to the consumer processing unit for fast access. The master processing unit may also provide the machine instruction to the consumer processing unit. The consumer processing unit may access the data from the local read storage and execute the machine instruction based on the accessed data.
Type: Grant
Filed: May 4, 2017
Date of Patent: May 17, 2022
Assignee: Futurewei Technologies, Inc.
Inventors: Alan Gatherer, Sushma Wokhlu, Peter Yan, Ywhpyng Harn, Ashish Rai Shrivastava, Tong Sun, Lee Dobson McFearin

Patent number: 11294986
Abstract: Techniques regarding an iterative energy-scaled variational quantum eigensolver process are provided. For example, one or more embodiments described herein can comprise a system, which can comprise a memory that can store computer executable components. The system can also comprise a processor, operably coupled to the memory, and that can execute the computer executable components stored in the memory. The computer executable components can comprise a readout component that determines a ground state energy value of a quantum Hamiltonian by employing a variational quantum eigensolver (VQE) algorithm, wherein the VQE algorithm utilizes a symmetry that emerges at an energy scale of the quantum Hamiltonian.
Type: Grant
Filed: November 22, 2019
Date of Patent: April 5, 2022
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Antonio Mezzacapo, Richard Chen, Marco Pistoia

Patent number: 11269973
Abstract: Repeating patterns are identified in a matrix. Based on the identification of the repeating patterns, instructions are generated, which are executable by processing cores of a dot product engine to allocate analog multiplication crossbars of the dot product engine to perform multiplication of the matrix with a vector.
Type: Grant
Filed: April 28, 2020
Date of Patent: March 8, 2022
Assignee: Hewlett Packard Enterprise Development LP
Inventors: Mashood Abdulla Kodavanji, Soumitra Chatterjee, Chinmay Ghosh, Mohan Parthasarathy

Patent number: 11263018
Abstract: A vector processor is disclosed. The vector processor includes a plurality of register files provided to each of a plurality of single instruction multiple data (SIMD) lanes, storing each of a plurality of pieces of data, and respectively outputting input data to be used in a current cycle among the plurality of pieces of data, a shuffle unit for receiving a plurality of pieces of input data outputted from the plurality of register files, and performing shuffling such that the received plurality of pieces of input data respectively correspond to the plurality of SIMD lanes and outputting the same; and a command execution unit for performing a parallel operation by receiving input data outputted from the shuffle unit.
Type: Grant
Filed: October 23, 2017
Date of Patent: March 1, 2022
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Kiseok Kwon, Jaeun Park, Dongkwan Suh, Kangjin Yoon

Patent number: 11263512
Abstract: A novel and useful neural network (NN) processing core adapted to implement artificial neural networks (ANNs) and incorporating strictly separate control and data planes. The NN processor is constructed from self-contained computational units organized in a hierarchical architecture. The homogeneity enables simpler management and control of similar computational units, aggregated in multiple levels of hierarchy. Computational units are designed with as little overhead as possible, where additional features and capabilities are aggregated at higher levels in the hierarchy. On-chip memory provides storage for content inherently required for basic operation at a particular hierarchy and is coupled with the computational resources in an optimal ratio. Lean control provides just enough signaling to manage only the operations required at a particular hierarchical level. Dynamic resource assignment agility is provided which can be adjusted as required depending on resource availability and capacity of the device.
Type: Grant
Filed: April 3, 2018
Date of Patent: March 1, 2022
Inventors: Avi Baum, Or Danon, Hadar Zeitlin, Daniel Ciubotariu, Rami Feig

Patent number: 11256508
Abstract: Software instructions are executed on a processor within a computer system to configure a streaming engine with stream parameters to define a multidimensional array. The stream parameters define a size for each dimension of the multidimensional array, a null vector count (N), and a selected dimension. Data is fetched from a memory coupled to the streaming engine responsive to the stream parameters. A stream of vectors is formed for the multidimensional array responsive to the stream parameters from the data fetched from memory. N null stream vectors are inserted into the stream of vectors for the selected dimension without fetching respective null data from the memory.
Type: Grant
Filed: May 23, 2019
Date of Patent: February 22, 2022
Assignee: Texas Instruments Incorporated
Inventors: Asheesh Bhardwaj, William Franklin Leven, Son Hung Tran, Timothy David Anderson

Patent number: 11249761
Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instructions, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing nonzero-valued elements together and storing the matrix position of each nonzero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
Type: Grant
Filed: July 20, 2020
Date of Patent: February 15, 2022
Assignee: Intel Corporation
Inventors: Dan Baum, Michael Espig, James Guilford, Wajdi K. Feghali, Raanan Sade, Christopher J. Hughes, Robert Valentine, Bret Toll, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Vinodh Gopal, Ronen Zohar, Alexander F. Heinecke
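
The first compression option in this abstract (pack the nonzero elements and record their positions in a header) can be sketched in software. The dictionary layout, the flat row-major position encoding, and the names `compress` and `decompress` are assumptions for illustration; the instruction's actual header format is not given in the abstract.

```python
def compress(matrix):
    """Pack nonzero elements together and record their positions in a header."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    header, values = [], []
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            if value != 0:
                header.append(r * cols + c)  # flat row-major position (assumed)
                values.append(value)
    return {"shape": (rows, cols), "header": header, "values": values}


def decompress(packed):
    """Rebuild the dense matrix from the header and the packed values."""
    rows, cols = packed["shape"]
    matrix = [[0] * cols for _ in range(rows)]
    for position, value in zip(packed["header"], packed["values"]):
        matrix[position // cols][position % cols] = value
    return matrix
```

The scheme pays off only when the matrix is sparse enough that the header plus the packed values take less space than the dense matrix.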

Patent number: 11249759
Abstract: Software instructions are executed on a processor within a computer system to configure a streaming engine with stream parameters to define a multidimensional array. The stream parameters define a size for each dimension of the multidimensional array and a specified width for two selected dimensions of the array. Data is fetched from a memory coupled to the streaming engine responsive to the stream parameters. A stream of vectors is formed for the multidimensional array responsive to the stream parameters from the data fetched from memory. When either selected dimension in the stream of vectors exceeds a respective specified width, the streaming engine inserts null elements into each portion of a respective vector for the selected dimension that exceeds the specified width in the stream of vectors. Stream vectors that are completely null are formed by the streaming engine without accessing the system memory for respective data.
Type: Grant
Filed: May 23, 2019
Date of Patent: February 15, 2022
Assignee: Texas Instruments Incorporated
Inventors: William Franklin Leven, Asheesh Bhardwaj, Son Hung Tran, Timothy David Anderson

Patent number: 11232175
Abstract: Implementations of the present disclosure relate to a method, system and program product for determining a causality between a plurality of variables.
Type: Grant
Filed: March 28, 2019
Date of Patent: January 25, 2022
Assignee: NEC CORPORATION
Inventors: Lu Feng, Chunchen Liu, Wenjuan Wei

Patent number: 11231929
Abstract: Software instructions are executed on a processor within a computer system to configure a streaming engine with stream parameters to define a multidimensional array. The stream parameters define a size for each dimension of the multidimensional array and a specified width for a selected dimension of the array. Data is fetched from a memory coupled to the streaming engine responsive to the stream parameters. A stream of vectors is formed for the multidimensional array responsive to the stream parameters from the data fetched from memory. When the selected dimension in the stream of vectors exceeds the specified width, the streaming engine inserts null elements into each portion of a respective vector for the selected dimension that exceeds the specified width in the stream of vectors. Stream vectors that are completely null are formed by the streaming engine without accessing the system memory for respective data.
Type: Grant
Filed: May 23, 2019
Date of Patent: January 25, 2022
Assignee: Texas Instruments Incorporated
Inventors: Son Hung Tran, Shyam Jagannathan, Timothy David Anderson

Patent number: 11216533
Abstract: A grouping means 11 that extracts basis vectors from a set of basis vectors for a lattice having a predetermined relationship with a matrix used to generate a public key, and that groups the basis vectors such that a predetermined condition is satisfied. A sampling means 12 that samples, for at least one group, the same number of arbitrary values as the number of a plurality of basis vectors included in that group, in parallel for the individual basis vectors, onto a lattice constituted by the plurality of basis vectors, the arbitrary values serving as random numbers following a discrete Gaussian distribution. The predetermined condition is that each of the basis vectors included in a group is orthogonal to the other basis vectors included in the same group and is also orthogonal to Gram-Schmidt basis vectors, which are vectors obtained by orthogonalizing the other basis vectors by Gram-Schmidt orthogonalization.
Type: Grant
Filed: May 12, 2017
Date of Patent: January 4, 2022
Assignee: NEC CORPORATION
Inventors: Yuki Tanaka, Kazuhiko Minematsu

Patent number: 11188328
Abstract: Aspects include a compute array of a processor with mixed-precision numerical linear algebra support. A first precision and a first shape of a first input matrix and a second precision and a second shape of a second input matrix to the compute array are determined. A number of rank updates of a result matrix to store in an accumulator register having a predetermined size are determined, where the number of rank updates is based on the first precision and the first shape of the first input matrix, the second precision and the second shape of the second input matrix, and the predetermined size of the accumulator register. A plurality of linear algebra operations is repeated in parallel within the compute array to update the result matrix in the accumulator register based on the first input matrix, the second input matrix, and the number of rank updates.
Type: Grant
Filed: December 12, 2019
Date of Patent: November 30, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Jose E. Moreira, Brett Olsson, Brian W. Thompto, Silvia Melitta Mueller, Andreas Wagner

Patent number: 11182458
Abstract: Embodiments of the present invention are directed to a new instruction set extension and a method for providing 3D lane predication for matrix operations. In a non-limiting embodiment of the invention, a first input matrix having m rows and k columns and a second input matrix having k rows and n columns are received by a compute array of a processor. A three-dimensional predicate mask having an M-bit row mask, an N-bit column mask, and a K-bit rank mask is generated. A result matrix of up to m rows, up to n columns, and up to k rank updates is determined based on the first input matrix, the second input matrix, and the predicate mask.
Type: Grant
Filed: December 12, 2019
Date of Patent: November 23, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Brett Olsson, Brian W. Thompto, Jose E. Moreira, Silvia Melitta Mueller, Andreas Wagner
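
One way to read the three-part predicate mask in this abstract is as three independent enable vectors over the rows, columns, and rank updates of a matrix product. The sketch below follows that reading; the function name `masked_matmul` and the choice to leave disabled result elements at zero are assumptions, not details from the patent.

```python
def masked_matmul(a, b, row_mask, col_mask, rank_mask):
    """Multiply an m-by-k matrix by a k-by-n matrix under three predicate masks.

    row_mask (m entries) and col_mask (n entries) select which result
    elements are computed; rank_mask (k entries) selects which rank-1
    updates contribute. Disabled elements stay zero (assumed behaviour).
    """
    m, k, n = len(a), len(b), len(b[0])
    result = [[0] * n for _ in range(m)]
    for i in range(m):
        if not row_mask[i]:
            continue
        for j in range(n):
            if not col_mask[j]:
                continue
            result[i][j] = sum(
                a[i][p] * b[p][j] for p in range(k) if rank_mask[p]
            )
    return result
```

With every mask bit set, the function reduces to an ordinary matrix product; clearing bits trims the computation to a smaller effective m, n, or k without reshaping the inputs.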

Patent number: 11182126
Abstract: Computationally efficient mixed precision floating point waveform generation takes advantage of the high-speed generation of waveforms with single-precision floating point numbers while reducing the generally unacceptable loss of precision of pure single-precision floating point to generate any waveform that repeats in 2π. This approach computes a reference phase in double precision as the modulus of the phase with 2π and then computes offsets to that value in single precision. The double precision reference phase is recomputed as needed depending on how quickly the phase grows and how large a machine epsilon is desired.
Type: Grant
Filed: June 25, 2019
Date of Patent: November 23, 2021
Assignee: Raytheon Company
Inventors: Ender Barillas, Brian Filarsky
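
The reference-plus-offset idea in this abstract can be imitated in plain Python. This is a rough sketch under assumptions: single precision is emulated by round-tripping values through a 4-byte float with `struct`, the reference phase is refreshed at a fixed interval (`refresh`, a hypothetical parameter), and a sine wave stands in for "any waveform that repeats in 2π".

```python
import math
import struct


def to_f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sine_wave(phase_step, count, refresh=1024):
    """Generate sin(i * phase_step) with mostly single-precision arithmetic.

    Every `refresh` samples the reference phase is recomputed in double
    precision and reduced modulo 2*pi; samples in between add a small
    single-precision offset to that reference.
    """
    two_pi = 2.0 * math.pi
    step32 = to_f32(phase_step)
    samples = []
    for start in range(0, count, refresh):
        # Double-precision reference phase, wrapped into one period.
        reference = to_f32(math.fmod(start * phase_step, two_pi))
        for i in range(min(refresh, count - start)):
            offset = to_f32(i * step32)  # small single-precision offset
            samples.append(math.sin(to_f32(reference + offset)))
    return samples
```

A smaller `refresh` keeps the single-precision offsets small, and therefore more accurate, at the cost of more double-precision work; that trade-off mirrors the abstract's remark that the reference is recomputed as needed.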

Patent number: 11169800
Abstract: An embodiment of the invention is a processor including execution circuitry to calculate, in response to a decoded instruction, a result of a complex multiplication of a first complex number and a second complex number. The calculation includes a first operation to calculate a first term of a real component of the result and a first term of the imaginary component of the result. The calculation also includes a second operation to calculate a second term of the real component of the result and a second term of the imaginary component of the result. The processor also includes a decoder, a first source register, and a second source register. The decoder is to decode an instruction to generate the decoded instruction. The first source register is to provide the first complex number and the second source register is to provide the second complex number.
Type: Grant
Filed: October 18, 2019
Date of Patent: November 9, 2021
Assignee: Intel Corporation
Inventors: Robert Valentine, Mark Charney, Raanan Sade, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Roman S. Dubtsov
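
The two-operation split described in this abstract follows from the identity (a + bi)(c + di) = (ac - bd) + (ad + bc)i. The sketch below shows one possible split; which term each operation produces is an assumption of this example, and the function names are hypothetical.

```python
def first_operation(x, y):
    """First terms: a*c for the real part, a*d for the imaginary part."""
    (a, _b), (c, d) = x, y
    return a * c, a * d


def second_operation(x, y):
    """Second terms: -b*d for the real part, b*c for the imaginary part."""
    (_a, b), (c, d) = x, y
    return -b * d, b * c


def complex_multiply(x, y):
    """Combine both operations into the full product (real, imaginary)."""
    real1, imag1 = first_operation(x, y)
    real2, imag2 = second_operation(x, y)
    return real1 + real2, imag1 + imag2
```

Each operation uses only one component of the first operand, so both can be built from the same multiply-accumulate datapath applied twice.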

Patent number: 11144615
Abstract: Embodiments relate to a denominator circuit that determines the number of valid elements of a data surface covered by a kernel depending on various locations of the kernel relative to the data surface. The denominator circuit includes a first circuit and a second circuit that have the same structure. The first circuit receives numbers representing different horizontal locations of a reference point in the kernel and generates a first matrix with first output elements corresponding to the different horizontal locations. The second circuit receives numbers representing different vertical locations of a reference point in the kernel and generates a second matrix with second output elements corresponding to the different vertical locations. A matrix multiplication of the first matrix and the second matrix is performed to obtain an array of valid elements covered by the kernel.
Type: Grant
Filed: April 14, 2020
Date of Patent: October 12, 2021
Assignee: APPLE INC.
Inventors: Yiu Chun Tse, Ji Liang Song, Ponan Kuo
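
The separable counting in this abstract can be sketched as an outer product: the number of valid elements under a rectangular kernel is the horizontal overlap times the vertical overlap. The sketch assumes the reference point is the kernel's top-left corner and that positions may lie partly outside the data surface; the function names are hypothetical.

```python
def valid_counts(size, kernel, starts):
    """Overlap, in elements, between [start, start + kernel) and [0, size)."""
    return [max(0, min(s + kernel, size) - max(s, 0)) for s in starts]


def valid_element_array(width, height, kernel_w, kernel_h, xs, ys):
    """Count valid elements under the kernel for every (y, x) placement.

    The same routine produces the horizontal and vertical counts, echoing
    the abstract's two identically structured circuits; multiplying the
    column of vertical counts by the row of horizontal counts gives the
    full array.
    """
    horizontal = valid_counts(width, kernel_w, xs)   # 1-by-len(xs) row
    vertical = valid_counts(height, kernel_h, ys)    # len(ys)-by-1 column
    return [[v * h for h in horizontal] for v in vertical]
```

Such counts are the denominators needed for, say, average pooling near the border of the surface, where part of the kernel falls on padding.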

Patent number: 11113028
Abstract: An apparatus and method are provided for performing an index operation. The apparatus has vector processing circuitry to perform an index operation in each of a plurality of lanes of parallel processing. The index operation requires an index value opm to be multiplied by a multiplier value e to produce a multiplication result. The number of lanes of parallel processing is dependent on a specified element size, and the multiplier value is different, but known, for each lane of parallel processing. The vector processing circuitry comprises mapping circuitry to perform, within each lane, mapping operations on the index value opm in order to generate a plurality of intermediate input values. The plurality of intermediate input values are such that the addition of the plurality of intermediate input values produces the multiplication result. Within each lane the mapping operations are determined by the multiplier value used for that lane.
Type: Grant
Filed: July 25, 2019
Date of Patent: September 7, 2021
Assignee: Arm Limited
Inventors: Xiaoyang Shen, David Raymond Lutz, Cédric Denis Robert Airaud
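
Because the multiplier e is known per lane, the product opm * e can be formed by adding a few mapped copies of opm rather than by a general multiplier. The sketch below uses left shifts as the mapping and the lane number as the multiplier; both are assumptions for illustration, and the patent may use a different mapping.

```python
def intermediate_values(opm, e):
    """Map opm to shifted copies whose sum equals opm * e (e >= 0).

    One copy is produced per set bit of the multiplier, so the mapping
    depends only on e, which is fixed for a given lane.
    """
    return [opm << bit for bit in range(e.bit_length()) if (e >> bit) & 1]


def index_vector(opm, lanes):
    """Per-lane result opm * lane, formed by addition only."""
    return [sum(intermediate_values(opm, lane)) for lane in range(lanes)]
```

For a multiplier of 6 (binary 110) the mapping yields opm * 2 and opm * 4, and adding them gives opm * 6.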

Patent number: 11100192
Abstract: Aspects for vector operations in neural network are described herein. The aspects may include a vector caching unit configured to store a first vector and a second vector, wherein the first vector includes one or more first elements and the second vector includes one or more second elements. The aspects may further include one or more adders and a combiner. The one or more adders may be configured to respectively add each of the first elements to a corresponding one of the second elements to generate one or more addition results. The combiner may be configured to combine the one or more addition results into an output vector.
Type: Grant
Filed: October 26, 2018
Date of Patent: August 24, 2021
Assignee: Cambricon Technologies Corporation Limited
Inventors: Jinhua Tao, Tian Zhi, Shaoli Liu, Tianshi Chen, Yunji Chen

Patent number: 11099844
Abstract: Performing n-dimensional stencil processing may include providing a memory unit organized in memory banks for storing elements of an n-D matrix, processing the matrix using a stencil vector unit in a first processing direction of the matrix tile-wise. Data elements of the matrix can be equally distributed over the memory banks, and the number of memory banks can be equal to the number of data elements processable by the stencil vector unit in parallel, which is equal to the number of data elements in a width direction of one of the tiles. Additionally, the boundary elements can be grouped in the width direction of the tiles into an n-D submatrix, and the n-D submatrix can be processed equally to the processing of the n-D matrix orthogonal to the first processing direction.
Type: Grant
Filed: May 16, 2019
Date of Patent: August 24, 2021
Assignee: International Business Machines Corporation
Inventor: Jan Van Lunteren

Patent number: 11093582
Abstract: A method for calculating axis deviation of rotor assembly based on end face runout measurement comprises three parts: calculation of three contact points, a triangle judgment criterion and a homogeneous coordinate transformation algorithm of a deviation matrix. Based on the measured end face runout data in production practice, the method realizes the prediction of axis deviation before assembly, improves the concentricity of rotors after assembly, also greatly increases the one-time acceptance rate of assembly and has important practical guiding significance for axis prediction as well as assembly phase adjustment and optimization in the assembly process of aero-engine rotor pieces.
Type: Grant
Filed: September 12, 2018
Date of Patent: August 17, 2021
Assignee: Dalian University of Technology
Inventors: Qingchao Sun, Xin Liu, Yichao Gao, Yunlong Wang

Patent number: 11093243
Abstract: Vector interleaving techniques in a data processing apparatus are disclosed, comprising apparatuses, instructions, methods of operating the apparatuses, and simulator implementations. A vector interleaving instruction specifies a first source register, second source register, and destination register. A first set of input data items is retrieved from the first source register and a second set of input data items from the second source register. A data processing operation is performed on selected input data item pairs taken from the first and second set of input data items to generate a set of result data items, which are stored as a result data vector in the destination register. First source register dependent result data items are stored in a first set of alternating positions in the destination data vector and second source register dependent result data items are stored in a second set of alternating positions in the destination data vector.
Type: Grant
Filed: July 2, 2018
Date of Patent: August 17, 2021
Assignee: ARM Limited
Inventors: Mbou Eyole, Nigel John Stephens

Patent number: 11086625
Abstract: In an embodiment, a processor supports one or more compression assist instructions which may be employed in compression software to improve the performance of the processor when performing compression/decompression. That is, the compression/decompression task may be performed more rapidly and consume less power when the compression assist instructions are employed than when they are not. In some cases, the cost of a more effective, more complex compression algorithm may be reduced to the cost of a less effective, less complex compression algorithm.
Type: Grant
Filed: September 10, 2019
Date of Patent: August 10, 2021
Assignee: Apple Inc.
Inventors: Eric Bainville, Ali Sazegari

Patent number: 11030095
Abstract: A processing system includes a central processing unit (CPU) and a graphics processing unit (GPU) that has a plurality of compute units. The GPU receives an image from the CPU and determines a total result area in a virtual matrix-multiplication space of a virtual matrix-multiplication output matrix based on convolutional parameters associated with the image in an image space. The GPU partitions the total result area of the virtual matrix-multiplication output matrix into a plurality of virtual segments. The GPU allocates convolution operations to the plurality of compute units based on each virtual segment of the plurality of virtual segments.
Type: Grant
Filed: December 10, 2018
Date of Patent: June 8, 2021
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Swapnil Sakharshete, Samuel Lawrence Wasmundt

Patent number: 11023242
Abstract: A method and apparatus of asynchronous scheduling in a graphics device includes sending one or more instructions from an instruction scheduler to one or more instruction first-in/first-out (FIFO) devices. An instruction in the one or more FIFO devices is selected for execution by a single-instruction/multiple-data (SIMD) pipeline unit. It is determined whether all operands for the selected instruction are available for execution of the instruction, and if all the operands are available, the selected instruction is executed on the SIMD pipeline unit. The self-timed arithmetic pipeline unit (SIMD pipeline unit) is effectively encapsulated in a synchronous (e.g., clocked by a global clock) scheduler and register file environment.
Type: Grant
Filed: January 27, 2017
Date of Patent: June 1, 2021
Assignees: ATI TECHNOLOGIES ULC, ADVANCED MICRO DEVICES, INC.
Inventors: John Kalamatianos, Greg Sadowski, Syed Zohaib M. Gilani

Patent number: 11017290
Abstract: A signal processing module comprises at least one operational unit incorporating computation units, input and output interfaces able to be linked to a bus and a memory storing data destined for the computation units, the memory being organized so that each data word is stored column-wise over several addresses according to an order dependent on the application, a column having a width of one bit, the words being transferred in series to the computation units.
Type: Grant
Filed: November 27, 2014
Date of Patent: May 25, 2021
Assignee: COMMISSARIAT A L'ENERGIE ATOMIQUE ET AUX ENERGIES ALTERNATIVES
Inventors: Marc Duranton, Jean-Marc Philippe

Patent number: 10996944
Abstract: A processing device can establish a machine learning model to produce software dependency recommendations. The model can be periodically retrained to update its knowledge of available dependencies. The software dependencies can be incorporated into software by developers who receive the selection or automatically by an intelligent software development platform. A processing device can train the model by assembling sparse user data based on feedback corresponding to software dependencies to produce a vector of preferences for each user. The processing device can also generate a latent vector of attributes for each software dependency. The processing device can then apply matrix factorization to the vectors to produce a behavior matrix that is used to train the machine learning model.
Type: Grant
Filed: August 6, 2019
Date of Patent: May 4, 2021
Assignee: Red Hat, Inc.
Inventors: Avishkar Gupta, Aagam Shah, Sarah Masud

Patent number: 10986014
Abstract: A monitoring system detects a deviation in a monitoring metric of a system component of a remote management system that remotely manages image forming apparatuses. When the monitoring system detects a deviation in online device count greater than or equal to a deviation threshold and makes a determination that there is a correlation between the deviations in monitoring metrics of multiple system components as detected, the monitoring system sends a failure report indicating that a failure is in the remote management system.
Type: Grant
Filed: June 5, 2020
Date of Patent: April 20, 2021
Assignee: KYOCERA DOCUMENT SOLUTIONS INC.
Inventors: Dukil Park, Kazuki Nishikai, Koki Nakajima, Yasuo Nakashima, Satoshi Goshima, Yuichi Obayashi, Takeshi Nakamura

Patent number: 10915318
Abstract: A vector processing unit is described, and includes processor units that each include multiple processing resources. The processor units are each configured to perform arithmetic operations associated with vectorized computations. The vector processing unit includes a vector memory in data communication with each of the processor units and their respective processing resources. The vector memory includes memory banks configured to store data used by each of the processor units to perform the arithmetic operations. The processor units and the vector memory are tightly coupled within an area of the vector processing unit such that data communications are exchanged at a high bandwidth based on the placement of respective processor units relative to one another, and based on the placement of the vector memory relative to each processor unit.
Type: Grant
Filed: March 4, 2019
Date of Patent: February 9, 2021
Assignee: Google LLC
Inventors: William Lacy, Gregory Michael Thorson, Christopher Aaron Clark, Norman Paul Jouppi, Thomas Norrie, Andrew Everett Phelps

Patent number: 10896039
Abstract: In one embodiment, a matrix operation may be performed on one or more matrix operands. For example, matrix data may be received from a multidimensional memory, wherein the matrix data is associated with the one or more matrix operands. The one or more matrix operands may be extracted from the matrix data. A matrix routine associated with the matrix operation may be identified. The matrix routine may be executed on a matrix processor using the one or more matrix operands. A result of the matrix operation may be obtained based on the matrix routine executed by the matrix processor.
Type: Grant
Filed: January 31, 2019
Date of Patent: January 19, 2021
Assignee: Intel Corporation
Inventors: Tony L. Werner, Aravind Kalaiah, Vijay Korthikanti, Horace Lau

Patent number: 10897605
Abstract: Apparatuses, systems, and methods related to an image processor formed in an array of memory cells are described. An image processor as described herein is configured to reduce complexity and power consumption and/or increase data access bandwidth by performing image processing in the array of memory cells relative to image processing by a host processor external to the memory array. For instance, one apparatus described herein includes sensor circuitry configured to provide an input vector, as a plurality of bits that corresponds to a plurality of color components for an image pixel, and an image processor formed in an array of memory cells. The image processor is coupled to the sensor circuitry to receive the plurality of bits of the input vector. The image processor is configured to perform a color correction operation in the array by performing matrix multiplication on the input vector and a parameter matrix to determine an output vector that is color corrected.
Type: Grant
Filed: August 26, 2019
Date of Patent: January 19, 2021
Assignee: Micron Technology, Inc.
Inventors: Fa-Long Luo, Jaime C. Cummins, Tamara Schmitz
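
The color correction step in the abstract above is a matrix-vector product between a parameter matrix and a pixel's color components. The following is only a minimal sketch of that arithmetic, assuming an RGB pixel and a 3x3 parameter matrix; the function name `color_correct` and the matrix values are hypothetical, and the in-memory-array execution the patent describes is not modeled.

```python
def color_correct(pixel, matrix):
    """Multiply a parameter matrix by a pixel's color components.

    pixel  : list of color components, e.g. [R, G, B]
    matrix : list of rows, each with one coefficient per component
    Returns the corrected components as a new list.
    """
    return [sum(coeff * comp for coeff, comp in zip(row, pixel)) for row in matrix]


# Hypothetical parameter matrix (illustrative values only). Each row sums
# to 1, so a neutral gray pixel is left unchanged while colors are pushed
# apart from one another.
EXAMPLE_MATRIX = [
    [1.5, -0.25, -0.25],
    [-0.25, 1.5, -0.25],
    [-0.25, -0.25, 1.5],
]

corrected = color_correct([200, 100, 50], EXAMPLE_MATRIX)
```

A real pipeline would usually also clamp the output to the valid component range; that step is omitted here because the abstract does not mention it.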

Patent number: 10872130
Abstract: Based on a Modified Gram-Schmidt (MGS) algorithm, QR decomposition techniques are optimized for parallel structures that provide arithmetic-logic unit (ALU) to ALU connectivity. The techniques utilize a different loop organization, but the dependent functional sequences of the algorithm are unchanged, thereby reducing the likelihood of affecting error analysis and/or numerical stability. Some integrated circuit devices (e.g., FPGA) may implement hard floating-point (HFP) circuitry, such as a digital signal processing (DSP) block, distributed memories, and/or flexible internal connectivity, which can support the discussed high performance matrix arithmetic.
Type: Grant
Filed: August 31, 2017
Date of Patent: December 22, 2020
Assignee: Intel Corporation
Inventor: Martin Langhammer
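
For readers unfamiliar with the baseline algorithm named in the abstract above, here is a minimal sketch of the textbook Modified Gram-Schmidt QR decomposition in plain Python. It uses the standard loop order, not the reorganized loop structure the patent claims, and the function name `mgs_qr` is hypothetical. It assumes the input has full column rank.

```python
import math


def mgs_qr(a):
    """Textbook Modified Gram-Schmidt QR of an m x n matrix (list of rows).

    Returns (q, r) where q is m x n with orthonormal columns and r is
    n x n upper triangular, so that a = q * r up to rounding.
    """
    m, n = len(a), len(a[0])
    # Work column by column, since MGS orthogonalizes one column at a time.
    v = [[float(a[i][j]) for i in range(m)] for j in range(n)]
    q_cols = [[0.0] * m for _ in range(n)]
    r = [[0.0] * n for _ in range(n)]
    for k in range(n):
        r[k][k] = math.sqrt(sum(x * x for x in v[k]))
        q_cols[k] = [x / r[k][k] for x in v[k]]
        # The "modified" step: remove the q_k component from every
        # remaining column right away, using the already-updated columns.
        for j in range(k + 1, n):
            r[k][j] = sum(qi * vi for qi, vi in zip(q_cols[k], v[j]))
            v[j] = [vi - r[k][j] * qi for qi, vi in zip(q_cols[k], v[j])]
    # Convert Q back to a list of rows to match the input layout.
    q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return q, r


q, r = mgs_qr([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
```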

Patent number: 10832799
Abstract: Methods, systems and apparatus for detecting patterns in constituents of at least one biological organism are disclosed. In accordance with one method, clusters of the constituents are determined (208) by selecting (210) different subsets of at least one of genes or proteins and identifying (212) the clusters from biological data corresponding to the selected subsets. Here, membership values for the constituents, indicating membership within the clusters, are calculated for use as a basis of an additional cluster determination process (208) to obtain final clusters of constituents. By underpinning the preliminary clustering on different subsets of biological data and formulating the higher-level clustering on the basis of the membership values, the embodiments can enable an evaluation of a large variety of biological data in a practical, accurate and highly efficient manner.
Type: Grant
Filed: August 12, 2016
Date of Patent: November 10, 2020
Assignee: Koninklijke Philips N.V.
Inventors: Konstantin Volyanskyy, Nevenka Dimitrova

Patent number: 10762163
Abstract: In embodiments of probabilistic matrix factorization for automated machine learning, a computing system memory maintains different workflows that each include preprocessing steps for a machine learning model, the machine learning model, and one or more parameters for the machine learning model. The computing system memory additionally maintains different data sets, upon which the different workflows can be trained and tested. A matrix is generated from the different workflows and different data sets, where cells of the matrix are populated with performance metrics that each indicate a measure of performance for a workflow applied to a data set. A low-rank decomposition of the matrix with populated performance metrics is then determined. Based on the low-rank decomposition, an optimum workflow for a new data set can be determined. The optimum workflow can be one of the different workflows or a hybrid of at least two of the different workflows.
Type: Grant
Filed: December 5, 2016
Date of Patent: September 1, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventor: Nicolo Fusi

Patent number: 10762164
Abstract: A computing device and related products are provided. The computing device is configured to perform machine learning calculations. The computing device includes an operation unit, a controller unit, and a storage unit. The storage unit includes a data input/output (I/O) unit, a register, and a cache. The technical solution provided by the present disclosure has the advantages of fast calculation speed and energy saving.
Type: Grant
Filed: July 19, 2018
Date of Patent: September 1, 2020
Assignee: Cambricon Technologies Corporation Limited
Inventors: Tianshi Chen, Xiao Zhang, Shaoli Liu, Yunji Chen

Patent number: 10755426
Abstract: An electronic device comprises circuitry implementing a depth map enhancer. The depth map enhancer obtains an initial depth map corresponding to a scene and an image of the scene. The depth map enhancer generates a refined depth map corresponding to the scene using an optimizer, the initial depth map and the image. The refined depth map includes estimated depth indicators corresponding to at least a first depth-information region, identified based at least in part on a first criterion, of the initial depth map. Input based on the refined depth map is provided to an image processing application.
Type: Grant
Filed: May 23, 2018
Date of Patent: August 25, 2020
Assignee: Apple Inc.
Inventors: Mark Norman Lester Jouppi, Michael Wish Tao, Eric Bujold, Stephane Simon Rene Ben Soussan, Volker Roelke, Geoffrey T. Anneheim, Julio Cesar Hernandez Zaragoza, Florian Ciurea

Patent number: 10747846
Abstract: Matrix processing includes: initializing a current matrix based at least in part on an original matrix; iteratively determining a matrix property using a plurality of iteration cycles, including, in an iteration cycle: partitioning the current matrix to obtain a plurality of partitions, wherein the plurality of partitions includes a submatrix; modifying the submatrix based at least in part on other partitions of the plurality of partitions to provide a current matrix for a next iteration; and continuing to iterate until a condition is met. Matrix processing further includes obtaining the matrix property from an iteration result; and outputting the matrix property.
Type: Grant
Filed: September 25, 2019
Date of Patent: August 18, 2020
Assignee: Cyber Atomics, Inc.
Inventor: Roy Batruni
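
The abstract above does not say which matrix property is computed or how the submatrix is modified. One well-known computation with the same overall shape is finding a determinant by repeated Schur complements, sketched below purely as an illustration. The choice of the determinant, the 1x1 leading block, and the name `det_by_schur` are all assumptions, not details taken from the patent.

```python
def det_by_schur(matrix):
    """Determinant of a square matrix via repeated Schur complements.

    Each cycle partitions the current matrix into a leading scalar, the
    rest of its row and column, and the trailing submatrix; the submatrix
    is then updated from the other partitions and becomes the current
    matrix for the next cycle. Iteration stops when nothing is left.
    """
    current = [[float(x) for x in row] for row in matrix]
    det = 1.0
    while current:
        pivot = current[0][0]
        if pivot == 0.0:
            # Assumption: no row exchanges, to keep the sketch short.
            raise ZeroDivisionError("zero pivot; this sketch does no pivoting")
        det *= pivot
        top = current[0][1:]                  # rest of the first row
        left = [row[0] for row in current[1:]]  # rest of the first column
        sub = [row[1:] for row in current[1:]]  # trailing submatrix
        # Schur complement: sub - left * top / pivot
        current = [
            [sub[i][j] - left[i] * top[j] / pivot for j in range(len(top))]
            for i in range(len(left))
        ]
    return det


example = det_by_schur([[4, 3], [6, 3]])  # 4*3 - 3*6 = -6
```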

Patent number: 10743026
Abstract: A video encoding method, a video encoding apparatus, a video decoding method, and a video decoding apparatus are provided. The video encoding method includes producing a fast transform matrix based on a transform matrix which is used for frequency transformation on a block which has a predetermined size; producing a transformed block by transforming the block having the predetermined size by using the fast transform matrix; and performing scaling with respect to the transformed block in order to correct a difference between the transform matrix used for the frequency transformation and the fast transform matrix.
Type: Grant
Filed: September 5, 2019
Date of Patent: August 11, 2020
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Yoonmi Hong, Woojin Han, Minsu Cheon, Jianle Chen