Patents by Inventor G. Glenn Henry

G. Glenn Henry has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10438115
    Abstract: A neural network unit convolves an H×W×C input with F R×S×C filters to generate F Q×P outputs. N processing units (PU) each have a register receiving a respective word of an N-word row of a second memory and a multiplexed-register selectively receiving a respective word of an N-word row of a first memory or a word rotated from an adjacent PU's multiplexed-register. H first memory rows hold input blocks of B words each holding channels of respective 2-dimensional input row slices. R×S×C second memory rows hold filter blocks of B words each holding P copies of a filter weight. B is the smallest factor of N greater than W. The PU blocks multiply-accumulate input blocks and filter blocks in column-channel-row order; they read a row of input blocks and rotate it around the N PUs while performing multiply-accumulate operations, so each PU block receives each input block before reading another row.
    Type: Grant
    Filed: December 1, 2016
    Date of Patent: October 8, 2019
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: G. Glenn Henry, Kim C. Houck
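
As a reading aid, here is a minimal software sketch of the convolution shape described in the abstract above: an H×W×C input and F R×S×C filters produce F Q×P outputs, and B is chosen as the smallest factor of N greater than W. The "valid"-style output sizes (Q = H−R+1, P = W−S+1), the value of N, and all names are assumptions for illustration, not the patent's hardware.

```python
def smallest_factor_greater_than(n, w):
    """B in the abstract: the smallest factor of n strictly greater than w."""
    return min(f for f in range(w + 1, n + 1) if n % f == 0)

def conv3d(inp, filters):
    """inp: H x W x C nested lists; filters: F x R x S x C. Returns F x Q x P."""
    H, W, C = len(inp), len(inp[0]), len(inp[0][0])
    F, R, S = len(filters), len(filters[0]), len(filters[0][0])
    Q, P = H - R + 1, W - S + 1              # assumed "valid" convolution
    out = [[[0.0] * P for _ in range(Q)] for _ in range(F)]
    for f in range(F):
        for q in range(Q):
            for p in range(P):
                acc = 0.0
                for r in range(R):
                    for s in range(S):
                        for c in range(C):
                            acc += inp[q + r][p + s][c] * filters[f][r][s][c]
                out[f][q][p] = acc
    return out

N = 4096                                      # assumed number of processing units
print(smallest_factor_greater_than(N, 14))    # W = 14 -> B = 16

inp = [[[1.0], [2.0], [3.0]],
       [[4.0], [5.0], [6.0]],
       [[7.0], [8.0], [9.0]]]                 # H=3, W=3, C=1
filt = [[[[1.0], [0.0]], [[0.0], [1.0]]]]     # F=1, R=2, S=2, C=1
print(conv3d(inp, filt))                      # [[[6.0, 8.0], [12.0, 14.0]]]
```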
  • Patent number: 10430706
    Abstract: A processor comprising a plurality of processing cores, a last level cache memory (LLC) shared by the plurality of processing cores, and a neural network unit (NNU) comprising an array of neural processing units (NPU) and a memory array. The LLC comprises a plurality of slices. To transition from a first mode in which the memory array operates to store neural network weights read by the plurality of NPUs to a second mode in which the memory array operates as a slice of the LLC in addition to the plurality of slices, the processor write-back-invalidates the LLC and updates a hashing algorithm to include the memory array as a slice of the LLC in addition to the plurality of slices. To transition from the second mode to the first mode, the processor write-back-invalidates the LLC and updates the hashing algorithm to exclude the memory array from the LLC.
    Type: Grant
    Filed: December 1, 2016
    Date of Patent: October 1, 2019
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: G. Glenn Henry, Douglas R. Reed
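
A hedged sketch of the two mode transitions in the abstract above: the memory array is added to or removed from the set of LLC slices only after a write-back-invalidate, by changing the hashing algorithm. The modulo hash, slice count, and class names are stand-ins, not details from the patent.

```python
class SimpleLLC:
    def __init__(self, num_core_slices):
        self.num_core_slices = num_core_slices
        self.extra_slice_enabled = False   # NNU memory array used as a slice?
        self.lines = {}                    # address -> cached data

    def slice_for(self, addr):
        total = self.num_core_slices + (1 if self.extra_slice_enabled else 0)
        return addr % total                # stand-in hashing algorithm

    def write_back_invalidate(self):
        # Model: dirty lines would be written back to memory here; then all
        # lines are invalidated so the new hash sees an empty cache.
        self.lines.clear()

    def enter_llc_slice_mode(self):        # weight-RAM mode -> extra-slice mode
        self.write_back_invalidate()
        self.extra_slice_enabled = True    # hash now includes the memory array

    def enter_weight_ram_mode(self):       # extra-slice mode -> weight-RAM mode
        self.write_back_invalidate()
        self.extra_slice_enabled = False   # hash now excludes the memory array

llc = SimpleLLC(num_core_slices=4)
llc.enter_llc_slice_mode()
print(llc.slice_for(0x1234))               # addresses now distribute over 5 slices
```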
  • Patent number: 10423216
    Abstract: A processor includes first and second processing cores configured to support first and second respective subsets of features of its instruction set architecture (ISA) feature set. The first subset is less than all the features of the ISA feature set. The first and second subsets are different, but their union is all the features of the ISA feature set. The first core detects that a thread, while being executed by the first core rather than by the second core, has attempted to employ a feature not in the first subset and, in response, indicates a switch from the first core to the second core to execute the thread. The unsupported feature may be an unsupported instruction or operating mode. A switch may also be made if the lower performance/power core is being over-utilized or the higher performance/power core is being under-utilized.
    Type: Grant
    Filed: November 12, 2013
    Date of Patent: September 24, 2019
    Assignee: VIA TECHNOLOGIES, INC.
    Inventors: Rodney E. Hooker, Terry Parks, G. Glenn Henry
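
An illustrative-only sketch of the switching decision in the abstract above. The feature names, the utilization threshold, and the core labels are assumptions; only the rule that a thread migrates when it touches a feature outside the first core's subset (or when utilization warrants it) comes from the abstract.

```python
LOW_POWER_CORE_FEATURES = {"base", "simd128"}                       # first subset (assumed)
HIGH_PERF_CORE_FEATURES = {"base", "simd128", "simd256", "fp_transcendental"}
# The union of the two subsets is the full ISA feature set.

def should_switch_to_high_perf(needed_feature, low_core_utilization):
    # Switch when the thread touches a feature the low-power core lacks,
    # or when the low-power core is over-utilized (threshold assumed).
    return (needed_feature not in LOW_POWER_CORE_FEATURES
            or low_core_utilization > 0.9)

print(should_switch_to_high_perf("simd256", 0.2))   # True: unsupported feature
print(should_switch_to_high_perf("base", 0.95))     # True: over-utilization
```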
  • Patent number: 10423876
    Abstract: A processor comprises a neural network unit (NNU) and a processing complex (PC) comprising a processing core and cache memory. The NNU comprises neural processing units (NPU), cache control logic (CCL) and a memory array (MA). To transition from a first mode in which the MA operates to hold neural network weights for the array of NPUs to a second mode in which the MA and CCL operate as a victim cache, the CCL begins to cache evicted cache lines into the MA in response to eviction requests and begins to provide to the PC lines that hit in the MA in response to load requests. To transition from the second mode to the first mode, the CCL invalidates all lines of the MA, ceases to cache evicted lines into the MA in response to eviction requests, and ceases to provide to the PC lines in response to load requests.
    Type: Grant
    Filed: December 1, 2016
    Date of Patent: September 24, 2019
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: G. Glenn Henry, Douglas R. Reed
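
A loose software model of the victim-cache mode described above: in the second mode the memory array caches evicted lines and serves hits on loads, and leaving that mode invalidates all of its lines. Structure and names are assumptions, not the patent's implementation.

```python
class VictimCacheMA:
    def __init__(self):
        self.victim_mode = False
        self.lines = {}                  # address -> data (the memory array)

    def to_victim_mode(self):
        self.victim_mode = True          # start caching evictions / serving hits

    def to_weight_ram_mode(self):
        self.lines.clear()               # invalidate all lines of the MA
        self.victim_mode = False         # stop caching evictions / serving hits

    def on_evict(self, addr, data):      # eviction request from the cache
        if self.victim_mode:
            self.lines[addr] = data

    def on_load(self, addr):             # load request from the processing complex
        if self.victim_mode and addr in self.lines:
            return self.lines.pop(addr)  # hit: provide the line
        return None                      # miss: serviced elsewhere

vc = VictimCacheMA()
vc.to_victim_mode()
vc.on_evict(0x40, b"line")
print(vc.on_load(0x40))
```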
  • Patent number: 10417560
    Abstract: A neural network unit convolves an H×W×C input with F R×S×C filters to generate F Q×P outputs. N processing units (PU) each have a register receiving a memory word and a multiplexed-register selectively receiving a memory word or a word rotated from an adjacent PU's multiplexed-register. The N PUs are logically partitioned as G blocks each of B PUs. The PUs convolve in a column-channel-row order. For each filter column: the N registers read a memory row, each PU multiplies the register and the multiplexed-register to generate a product to accumulate, and the multiplexed-registers are rotated by one; then the multiplexed-registers are rotated to align the input blocks with the adjacent PU block. This is performed for each channel. For each filter row, the N multiplexed-registers read a memory row for the multiply-accumulations; F column-channel-row sums are generated and written to the memory, and then all steps are performed for each output row.
    Type: Grant
    Filed: December 1, 2016
    Date of Patent: September 17, 2019
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: G. Glenn Henry, Kim C. Houck
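
A rough sketch of the column-channel-row accumulation order named in the abstract above, written as plain loops in place of the PU array and rotating registers (the block-alignment rotations are omitted). The "valid" output size is an assumption.

```python
def convolve_column_channel_row(inp, filt):
    """inp: H x W x C, filt: R x S x C -> one Q x P output channel."""
    H, W, C = len(inp), len(inp[0]), len(inp[0][0])
    R, S = len(filt), len(filt[0])
    Q, P = H - R + 1, W - S + 1              # assumed "valid" output size
    out = [[0.0] * P for _ in range(Q)]
    for q in range(Q):                       # for each output row
        for r in range(R):                   # for each filter row
            for c in range(C):               # for each channel
                for s in range(S):           # for each filter column
                    for p in range(P):       # each PU accumulates, then the
                        out[q][p] += (       # input "rotates" by one column
                            inp[q + r][p + s][c] * filt[r][s][c])
    return out

inp = [[[1.0]] * 3 for _ in range(3)]        # 3 x 3 x 1 input of ones
print(convolve_column_channel_row(inp, [[[1.0], [1.0]], [[1.0], [1.0]]]))
```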
  • Patent number: 10409347
    Abstract: A multi-core microprocessor is organized into a plurality of resource-associated domains including core domains, group domains, and a global domain. Each domain relates to either local resources, group resources, or global resources that are respectively used by a single core, a group of cores, or all the cores. Each core has its own independently settable target operating state selected from a plurality of possible target operating states that designate configurations for the local resources, group resources, and global resources. Each core is provided with coordination logic configured to implement or request implementation of the core's target operating state, but only to the extent that implementation of the target operating state would not reduce performance of any other core below its own target operating state.
    Type: Grant
    Filed: November 15, 2018
    Date of Patent: September 10, 2019
    Assignee: VIA TECHNOLOGIES, INC.
    Inventors: G. Glenn Henry, Darius D. Gaskins
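
A simplified sketch of the coordination rule described above, for a single shared (group or global) resource: a core's requested setting is only honored to the extent that no other core falls below its own target. The numeric states (higher = more performance) and names are assumptions.

```python
def implement_shared_state(requesting_core, target, core_targets):
    """Record the core's new target and return the setting actually applied
    to the shared resource: it may only drop as far as the most demanding
    core's target allows, so no core falls below its own target."""
    core_targets[requesting_core] = target
    return max(core_targets.values())

targets = {"core0": 3, "core1": 1}
print(implement_shared_state("core1", 0, targets))   # stays at 3 for core0's sake
```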
  • Patent number: 10409767
    Abstract: An array of N processing units (PU) each has: an accumulator; an arithmetic unit that performs an operation on first, second and third inputs to generate a result to store in the accumulator, where the first input receives the accumulator output; a weight input received by the second input to the arithmetic unit; and a multiplexed register having first and second data inputs, an output received by the third input to the arithmetic unit, and a control input that controls the data input selection. The multiplexed register output is also received by an adjacent PU's multiplexed register second data input. The N PUs' multiplexed registers collectively operate as an N-word rotater when the control input specifies the second data input. Respective first and second memories hold W rows of N weight words and D rows of N data words, and provide the N weight and data words to the corresponding weight inputs and multiplexed register first data inputs. A sequencer controls the multiplexed registers and the memories.
    Type: Grant
    Filed: April 5, 2016
    Date of Patent: September 10, 2019
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: G. Glenn Henry, Terry Parks
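
A minimal software model of the rotate-and-accumulate behavior described above: each step reads a new weight row from the first memory while the single data row rotates by one word among the PUs, so every accumulator eventually sees every data word. This is an illustrative sketch, not the hardware datapath.

```python
def rotate_mac(weight_rows, data_row):
    """weight_rows: list of N-word weight rows (one per step);
    data_row: one N-word data row that rotates between steps."""
    N = len(data_row)
    acc = [0.0] * N
    mux_reg = list(data_row)                 # loaded once from the data memory
    for row in weight_rows:                  # one weight row per step
        for i in range(N):
            acc[i] += row[i] * mux_reg[i]
        mux_reg = mux_reg[1:] + mux_reg[:1]  # N-word rotater: shift by one
    return acc

print(rotate_mac([[1, 1, 1], [2, 2, 2]], [10.0, 20.0, 30.0]))
```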
  • Patent number: 10395165
    Abstract: N processing units (PU) each have an arithmetic unit (AU) that performs an operation on first, second and third inputs to generate a result to store in an accumulator having an output provided to the first input. A weight input is received by the AU second input. A multiplexed register has first, second, third and fourth data inputs and an output received by the third AU input. A first memory provides N weight words to the N weight inputs. A second memory provides N data words to the multiplexed register first data inputs. The multiplexed register output is also received by the second, third, and fourth data input of the multiplexed register one, 2^J, and 2^K PUs away, respectively. The N multiplexed registers collectively operate as an N-word rotater that rotates by one, 2^J, or 2^K words when the control input specifies the second, third, or fourth data input, respectively.
    Type: Grant
    Filed: December 1, 2016
    Date of Patent: August 27, 2019
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: G. Glenn Henry, Kim C. Houck
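
A small sketch of the three rotation distances described above; the concrete values of J and K are assumptions.

```python
def rotate(words, distance):
    """Rotate an N-word row by the selected distance (1, 2**J, or 2**K)."""
    return words[distance:] + words[:distance]

words = list(range(16))
J, K = 2, 3                    # assumed values
print(rotate(words, 1))        # rotate by one word
print(rotate(words, 2 ** J))   # rotate by 2^J = 4 words
print(rotate(words, 2 ** K))   # rotate by 2^K = 8 words
```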
  • Patent number: 10394562
    Abstract: A microprocessor performs an If-Then (IT) instruction and an associated IT block by extracting condition information from the IT instruction and, for each instruction of the IT block: determining a respective condition for the instruction using the extracted condition information, translating the instruction into a microinstruction, and conditionally executing the microinstruction based on the respective condition. For the first instruction, the translating comprises fusing the IT instruction with the first IT block instruction. A hardware instruction translation unit performs the extracting, determining and translating. Execution units conditionally execute the microinstructions. The hardware instruction translation unit and execution units are distinct hardware elements and are coupled together.
    Type: Grant
    Filed: October 10, 2017
    Date of Patent: August 27, 2019
    Assignee: VIA TECHNOLOGIES, INC.
    Inventors: G. Glenn Henry, Terry Parks
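
A hedged sketch of deriving per-instruction conditions from an ARM-style IT (If-Then) instruction, which is what the abstract refers to. The then/else pattern is taken from the mnemonic string rather than the encoded mask bits, the condition table is partial, and the fusion of the IT instruction with the first block instruction is not modeled.

```python
def it_block_conditions(mnemonic, cond):
    """mnemonic: 'IT' followed by optional 'T'/'E' letters for the 2nd-4th
    block instructions (e.g. 'ITTE'). cond: the IT condition, e.g. 'EQ'.
    Returns the condition under which each IT-block instruction executes."""
    inverse = {"EQ": "NE", "NE": "EQ", "GT": "LE", "LE": "GT",
               "GE": "LT", "LT": "GE"}          # partial table, assumed
    conds = [cond]                              # first instruction: cond itself
    for te in mnemonic[2:]:                     # remaining 'T'/'E' slots
        conds.append(cond if te == "T" else inverse[cond])
    return conds

print(it_block_conditions("ITTE", "EQ"))   # ['EQ', 'EQ', 'NE']
```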
  • Patent number: 10387366
    Abstract: A neural network unit includes first and second memories that hold rows of respective N weight and data words and provides a row of them to N corresponding neural processing units (NPU), respectively. The N NPUs each have an accumulator and an arithmetic unit that performs a series of multiply operations on pairs of weight words and data words received from the first and second memories to generate a series of products. The arithmetic unit also performs a series of addition operations on the series of products to accumulate an accumulated value in the accumulator. Activation function units (AFU) are each shared by a corresponding plurality of the N NPUs. Each AFU, in a sequential fashion with respect to each NPU of the corresponding plurality of the N NPUs, receives the accumulated value from the NPU and performs an activation function on the accumulated value to generate a result.
    Type: Grant
    Filed: April 5, 2016
    Date of Patent: August 20, 2019
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: G. Glenn Henry, Terry Parks
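
A simplified model of the sharing described above: one activation function unit serves a group of NPUs, visiting them sequentially. The group size, the ReLU activation, and the inputs are assumptions.

```python
def npu_accumulate(weights, data):
    """One NPU: a series of multiplies accumulated into its accumulator."""
    acc = 0.0
    for w, d in zip(weights, data):
        acc += w * d
    return acc

def shared_afu(accumulators, group_size):
    """One AFU per group of NPUs, applied sequentially within the group."""
    results = []
    for g in range(0, len(accumulators), group_size):
        for acc in accumulators[g:g + group_size]:   # sequential per group
            results.append(max(0.0, acc))            # assumed activation (ReLU)
    return results

accs = [npu_accumulate([1, -2], [3, 4]), npu_accumulate([2, 1], [3, 4])]
print(shared_afu(accs, group_size=2))   # [0.0, 10.0]
```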
  • Patent number: 10380064
    Abstract: A neural network unit including a register programmable with a representation of a reciprocal value of a divisor and a plurality of neural processing units (NPU). Each NPU has an ALU, an accumulator, and a reciprocal multiplier unit. The ALU performs arithmetic and logical operations on a sequence of operands to generate a sequence of results and accumulates the sequence of results as an accumulated value into the accumulator. The reciprocal multiplier unit receives the representation of the reciprocal value and the accumulated value and in response generates a result that is the quotient of the accumulated value and the divisor.
    Type: Grant
    Filed: April 5, 2016
    Date of Patent: August 13, 2019
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: G. Glenn Henry, Terry Parks
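
A worked sketch of the reciprocal-multiplier idea above: dividing an accumulated value by a divisor by multiplying with a programmed representation of its reciprocal. The 16-bit fixed-point format is an assumption.

```python
FRAC_BITS = 16

def encode_reciprocal(divisor):
    """Program the register with a fixed-point representation of 1/divisor."""
    return round((1 << FRAC_BITS) / divisor)

def reciprocal_multiply(accumulated, reciprocal):
    """Quotient = accumulated * (1/divisor), computed with a multiply."""
    return (accumulated * reciprocal) >> FRAC_BITS

recip = encode_reciprocal(9)            # e.g. averaging over a 3x3 window
print(reciprocal_multiply(4500, recip)) # ~ 4500 / 9 = 500
```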
  • Patent number: 10380481
    Abstract: An output buffer holds N words arranged as N/J mutually exclusive output buffer word groups (OBWG) of J words each. N processing units (PU) are arranged as N/J mutually exclusive PU groups. Each PU group has an associated OBWG. Each PU includes an accumulator and an arithmetic unit that performs operations on inputs, which include the accumulator output, to generate a first result for accumulation into the accumulator. Activation function units selectively perform an activation function on the accumulator outputs to generate results for provision to the N output buffer words. For each PU group, four of the J PUs and at least one of the activation function units compute an input gate, a forget gate, an output gate and a candidate state of a Long Short Term Memory (LSTM) cell, respectively, for writing to respective first, second, third and fourth words of the associated OBWG.
    Type: Grant
    Filed: April 5, 2016
    Date of Patent: August 13, 2019
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: G. Glenn Henry, Terry Parks, Kyle T. O'Brien
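
A minimal reference for the four per-cell quantities the abstract assigns to the four PUs of a group: the input, forget, and output gates and the candidate state of an LSTM cell. The sigmoid/tanh choice and weight layout follow the standard LSTM formulation, which is assumed here.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell_step(x, h_prev, c_prev, W):
    """W maps gate name -> (weight for x, weight for h_prev, bias)."""
    def gate(name, act):
        wx, wh, b = W[name]
        return act(wx * x + wh * h_prev + b)
    i = gate("input",     sigmoid)        # written to word 1 of the OBWG
    f = gate("forget",    sigmoid)        # word 2
    o = gate("output",    sigmoid)        # word 3
    g = gate("candidate", math.tanh)      # word 4
    c = f * c_prev + i * g                # new cell state
    h = o * math.tanh(c)                  # new hidden state
    return h, c

W = {k: (0.5, 0.25, 0.0) for k in ("input", "forget", "output", "candidate")}
print(lstm_cell_step(1.0, 0.0, 0.0, W))
```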
  • Patent number: 10366050
    Abstract: A neural network unit (NNU) includes N neural processing units (NPU). Each NPU has an arithmetic unit and an accumulator. First and second multiplexed registers of the N NPUs collectively selectively operate as respective first and second N-word rotaters. First and second memories respectively hold rows of N weight/data words and provide the N weight/data words of a row to corresponding ones of the N NPUs. The NPUs selectively perform: multiply-accumulate operations on rows of N weight words and on a row of N data words, using the second N-word rotater; convolution operations on rows of N weight words, using the first N-word rotater, and on rows of N data words, the rows of weight words being a data matrix, and the rows of data words being elements of a convolution kernel; and pooling operations on rows of N weight words, using the first N-word rotater.
    Type: Grant
    Filed: April 5, 2016
    Date of Patent: July 30, 2019
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: G. Glenn Henry, Terry Parks
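
A small sketch of the pooling mode mentioned above (the multiply-accumulate and convolution modes are illustrated at earlier entries). Max pooling and the window width are assumptions; the abstract does not specify the pooling operation.

```python
def max_pool_rows(rows, window):
    """rows: list of equal-length word rows; window: pooling width."""
    pooled = []
    for row in rows:
        pooled.append([max(row[i:i + window])
                       for i in range(0, len(row), window)])
    return pooled

print(max_pool_rows([[1, 5, 2, 4], [7, 0, 3, 3]], window=2))
```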
  • Patent number: 10353862
    Abstract: A neural network unit includes a random bit source that generates random bits and a plurality of neural processing units (NPU). Each NPU includes an accumulator into which the NPU accumulates a plurality of products as an accumulated value and a rounder that receives the random bits from the random bit source and stochastically rounds the accumulated value based on a random bit received from the random bit source.
    Type: Grant
    Filed: April 5, 2016
    Date of Patent: July 16, 2019
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: G. Glenn Henry, Terry Parks
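
A worked sketch of stochastic rounding driven by random bits, as described above: the accumulated value is rounded up with probability equal to the fraction being discarded. Bit widths are assumptions.

```python
import random

def stochastic_round(acc, drop_bits, rand_bits):
    """Round 'acc' down by 'drop_bits' bits, rounding up with probability
    equal to the discarded fraction. 'rand_bits' supplies the randomness."""
    fraction = acc & ((1 << drop_bits) - 1)      # bits being discarded
    truncated = acc >> drop_bits
    return truncated + (1 if rand_bits < fraction else 0)

acc = 0b1010_1101                                # example accumulator value
rounded = [stochastic_round(acc, 4, random.getrandbits(4)) for _ in range(8)]
print(rounded)   # mix of 10 and 11; unbiased toward the true value 10.8125
```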
  • Patent number: 10353861
    Abstract: Functional units of a processor fetch and decode architectural instructions of an architectural program. The architectural instructions are of an architectural instruction set of the processor. An execution unit includes first and second memories, a register and processing units. The first memory holds data in rows with addresses. The second memory holds non-architectural instructions of a non-architectural program. The architectural and non-architectural instruction sets are distinct. The processing units execute the non-architectural program instructions to read data from the first memory, perform operations on the data read from the first memory to generate results, and to write the results to the first memory. The register holds information that indicates progress made by the non-architectural program during execution. The first memory is also readable and writable by the architectural program. The architectural program uses the information to decide where in the first memory to read/write data.
    Type: Grant
    Filed: April 5, 2016
    Date of Patent: July 16, 2019
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: G. Glenn Henry, Terry Parks
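
A loose sketch of the producer/consumer interplay described above: the architectural program reads a progress register to learn how far the non-architectural program has gotten and reads only completed rows of the first memory. All names and the polling scheme are assumptions.

```python
class ExecutionUnitModel:
    def __init__(self, rows):
        self.first_memory = [[0.0] * 4 for _ in range(rows)]  # data rows
        self.progress = 0        # register: rows the non-architectural program produced

    def nn_program_step(self):   # stand-in for non-architectural execution
        row = self.progress
        self.first_memory[row] = [x + 1.0 for x in self.first_memory[row]]
        self.progress += 1

def architectural_program(unit):
    unit.nn_program_step()
    unit.nn_program_step()
    done = unit.progress             # read the progress register
    return unit.first_memory[:done]  # safe to read only completed rows

print(architectural_program(ExecutionUnitModel(rows=4)))
```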
  • Patent number: 10353860
    Abstract: A neural network unit. A register holds an indicator that specifies narrow and wide configurations. A first memory holds rows of 2N/N narrow/wide weight words in the narrow/wide configuration. A second memory holds rows of 2N/N narrow/wide data words in the narrow/wide configuration. An array of neural processing units (NPU) is configured as 2N/N narrow/wide NPUs and to receive the 2N/N narrow/wide weight words of rows from the first memory and to receive the 2N/N narrow/wide data words of rows from the second memory in the narrow/wide configuration. In the narrow configuration, the 2N NPUs perform narrow arithmetic operations on the 2N narrow weight words and the 2N narrow data words received from the first and second memories. In the wide configuration, the N NPUs perform wide arithmetic operations on the N wide weight words and the N wide data words received from the first and second memories.
    Type: Grant
    Filed: April 5, 2016
    Date of Patent: July 16, 2019
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: G. Glenn Henry, Terry Parks
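
A sketch of the two configurations described above: the same memory row is interpreted either as 2N narrow words or N wide words. The 8-bit/16-bit widths and little-endian pairing are assumptions; the abstract says only narrow and wide.

```python
def words_from_row(row_bytes, wide):
    if wide:
        # N wide words: pair up bytes (little-endian pairing assumed)
        return [row_bytes[i] | (row_bytes[i + 1] << 8)
                for i in range(0, len(row_bytes), 2)]
    # 2N narrow words: one byte per word
    return list(row_bytes)

row = bytes([1, 2, 3, 4])
print(words_from_row(row, wide=False))  # 4 narrow words
print(words_from_row(row, wide=True))   # 2 wide words
```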
  • Patent number: 10346351
    Abstract: An output buffer holds N words arranged as N/J mutually exclusive output buffer word groups (OBWG) of J words each. N processing units (PU) are arranged as N/J mutually exclusive PU groups each having an associated OBWG. Each PU has an accumulator, arithmetic unit, and first and second multiplexed registers each having at least J+1 inputs. A first input receives a memory operand and the other J inputs receive the J words of the associated OBWG. Each accumulator provides its output to a respective OBWG. Each arithmetic unit performs an operation on the first and second multiplexed register outputs and accumulator output to generate a result for accumulation into the accumulator. A mask input to the output buffer controls which words, if any, of the N words retain their current value or are updated with their respective accumulator output. Each PU group operates as a recurrent neural network LSTM cell.
    Type: Grant
    Filed: April 5, 2016
    Date of Patent: July 9, 2019
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: G. Glenn Henry, Terry Parks, Kyle T. O'Brien
  • Patent number: 10346350
    Abstract: A processor includes an architectural register file loadable with micro-operations by architectural instructions of an architectural instruction set of the processor and an execution unit that executes instructions. The instructions are either architectural instructions or microinstructions into which architectural instructions are translated. The execution unit includes a decoder that decodes the instructions into micro-operations, a mode indicator that indicates one of first and second modes, a pipeline of stages to which are provided micro-operations that control circuits of the stages of the pipeline, and a multiplexer. The multiplexer selects for provision to the pipeline a micro-operation received from the decoder when the mode indicator indicates the first mode and selects for provision to the pipeline a micro-operation received from the architectural register file when the mode indicator indicates the second mode.
    Type: Grant
    Filed: April 5, 2016
    Date of Patent: July 9, 2019
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: G. Glenn Henry, Terry Parks
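
A small sketch of the selection described above: in the first mode the pipeline consumes micro-operations from the decoder, and in the second mode it consumes micro-operations previously loaded into the architectural register file. The micro-operation format and names are assumptions.

```python
def next_micro_op(mode, decoder_out, arch_reg_file, index):
    if mode == "first":
        return decoder_out            # micro-op decoded from the instruction
    return arch_reg_file[index]       # micro-op pre-loaded by architectural instructions

arch_regs = ["uop_add", "uop_mul", "uop_store"]
print(next_micro_op("first", "uop_decoded_load", arch_regs, 0))
print(next_micro_op("second", "uop_decoded_load", arch_regs, 1))
```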
  • Patent number: 10282348
    Abstract: An output buffer holds N words arranged as N/J mutually exclusive output buffer word groups (OBWG) of J words each. N processing units (PU) are arranged as N/J mutually exclusive PU groups each having an associated OBWG. Each PU has an accumulator, an arithmetic unit, and first and second multiplexed registers each having at least J+1 inputs and an output. A first input receives a memory operand and the other J inputs receive the J words of the associated OBWG. Each accumulator provides its output to a respective output buffer word. Each arithmetic unit performs an operation on the first and second multiplexed register outputs and the accumulator output to generate a result for accumulation into the accumulator. A mask input to the output buffer controls which words, if any, of the N words retain their current value or are updated with their respective accumulator output.
    Type: Grant
    Filed: April 5, 2016
    Date of Patent: May 7, 2019
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: G. Glenn Henry, Terry Parks, Kyle T. O'Brien
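
A sketch of the mask-controlled update described above: each mask bit decides whether the corresponding output buffer word keeps its value or takes its accumulator's output. The bit ordering is an assumption.

```python
def update_output_buffer(buffer, accumulators, mask):
    """mask bit i set -> word i takes accumulators[i]; clear -> word i kept."""
    return [acc if (mask >> i) & 1 else old
            for i, (old, acc) in enumerate(zip(buffer, accumulators))]

buf  = [10, 20, 30, 40]
accs = [11, 22, 33, 44]
print(update_output_buffer(buf, accs, mask=0b0101))  # [11, 20, 33, 40]
```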
  • Patent number: 10275393
    Abstract: A neural network unit configurable to first/second/third configurations has N narrow and N wide accumulators, multipliers and adders. Each multiplier performs a narrow/wide multiply on first and second narrow/wide inputs to generate a narrow/wide product. A first adder input receives the corresponding narrow/wide accumulator's output, and a third input receives the widened narrow product of the corresponding narrow multiplier in the third configuration. In the first configuration, each narrow/wide adder performs a narrow/wide addition on the first and second inputs to generate a narrow/wide sum for storage into the corresponding narrow/wide accumulator. In the second configuration, each wide adder performs a wide addition on the first and a second input to generate a wide sum for storage into the corresponding wide accumulator. In the third configuration, each wide adder performs a wide addition on the first, second and third inputs to generate a wide sum for storage into the corresponding wide accumulator.
    Type: Grant
    Filed: April 5, 2016
    Date of Patent: April 30, 2019
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: G. Glenn Henry, Terry Parks
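
A rough sketch of the third configuration described above: the widened narrow product is added, together with the wide adder's other inputs, into the wide accumulator. The 8-bit narrow width, sign extension, and the meaning of the second input are assumptions.

```python
def widen(narrow_value, narrow_bits=8):
    """Sign-extend a narrow two's-complement value to a wider width."""
    sign = 1 << (narrow_bits - 1)
    return (narrow_value ^ sign) - sign

def third_config_add(wide_acc, wide_product, narrow_product):
    # Wide adder: first input = accumulator output, second = wide product
    # (assumed), third = widened narrow product from the narrow multiplier.
    return wide_acc + wide_product + widen(narrow_product)

print(third_config_add(1000, 200, 0xF0))   # 0xF0 widens to -16 -> 1184
```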