Patents by Inventor Terry Parks

Terry Parks has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20170102940
    Abstract: An array of N processing units (PU) each has: an accumulator; an arithmetic unit that performs an operation on first, second and third inputs to generate a result to store in the accumulator, where the first input receives the accumulator output and the second input receives a weight word; and a multiplexed register that has first and second data inputs, an output received by the third input to the arithmetic unit, and a control input that controls the data input selection. The multiplexed register output is also received by an adjacent PU's multiplexed register second data input. The N PUs' multiplexed registers collectively operate as an N-word rotater when the control input specifies the second data input. Respective first and second memories hold W and D rows of N weight and data words, respectively, and provide the N weight and data words to the corresponding weight inputs and multiplexed register first data inputs of the N PUs.
    Type: Application
    Filed: April 5, 2016
    Publication date: April 13, 2017
    Inventors: G. GLENN HENRY, TERRY PARKS
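    The rotate-and-accumulate scheme in this abstract can be modeled in software. The sketch below is an illustration only, not the patented hardware: `rotating_mac` and its argument names are hypothetical, and each loop iteration stands in for one clock of the N-PU array, with the data words rotating one position between weight rows.

```python
# Hypothetical software model of the N-word rotater described above:
# each of N processing units multiplies its current data word by a
# weight word and adds the product to its accumulator; between steps,
# the multiplexed registers pass each data word to the adjacent PU.

def rotating_mac(weight_rows, data_row):
    """weight_rows: W rows, each a list of N weight words.
    data_row: one row of N data words, loaded once and then rotated."""
    n = len(data_row)
    acc = [0] * n
    data = list(data_row)
    for row in weight_rows:
        for pu in range(n):
            acc[pu] += row[pu] * data[pu]   # multiply-accumulate
        data = data[1:] + data[:1]          # mux selects neighbor: rotate
    return acc
```

    One row of data thus feeds all N accumulators without being re-read from memory, which is the point of the rotater.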
  • Publication number: 20170103301
    Abstract: A processor has an instruction fetch unit that fetches ISA instructions from memory and execution units that perform operations on instruction operands to generate results according to the processor's ISA. A hardware neural network unit (NNU) execution unit performs computations associated with artificial neural networks (ANN). The NNU has an array of ALUs, a first memory that holds data words associated with ANN neuron outputs, and a second memory that holds weight words associated with connections between ANN neurons. Each ALU multiplies a portion of the data words by a portion of the weight words to generate products and accumulates the products in an accumulator as an accumulated value. Activation function units normalize the accumulated values to generate outputs associated with ANN neurons. The ISA includes at least one instruction that instructs the processor to write data words and the weight words to the respective first and second memories.
    Type: Application
    Filed: April 5, 2016
    Publication date: April 13, 2017
    Inventors: G. GLENN HENRY, TERRY PARKS
  • Publication number: 20170102941
    Abstract: An output buffer holds N words arranged as N/J mutually exclusive output buffer word groups (OBWG) of J words each. N processing units (PU) are arranged as N/J mutually exclusive PU groups each having an associated OBWG. Each PU has an accumulator, an arithmetic unit, and first and second multiplexed registers each having at least J+1 inputs and an output. A first input receives a memory operand and the other J inputs receive the J words of the associated OBWG. Each accumulator provides its output to a respective output buffer word. Each arithmetic unit performs an operation on the first and second multiplexed register outputs and the accumulator output to generate a result for accumulation into the accumulator. A mask input to the output buffer controls which words, if any, of the N words retain their current value or are updated with their respective accumulator output.
    Type: Application
    Filed: April 5, 2016
    Publication date: April 13, 2017
    Inventors: G. GLENN HENRY, TERRY PARKS, KYLE T. O'BRIEN
  • Publication number: 20170103302
    Abstract: A neural network unit. A register holds an indicator that specifies narrow and wide configurations. A first memory holds rows of 2N/N narrow/wide weight words in the narrow/wide configuration. A second memory holds rows of 2N/N narrow/wide data words in the narrow/wide configuration. An array of neural processing units (NPU) is configured as 2N/N narrow/wide NPUs and to receive the 2N/N narrow/wide weight words of rows from the first memory and to receive the 2N/N narrow/wide data words of rows from the second memory in the narrow/wide configuration. In the narrow configuration, the 2N NPUs perform narrow arithmetic operations on the 2N narrow weight words and the 2N narrow data words received from the first and second memories. In the wide configuration, the N NPUs perform wide arithmetic operations on the N wide weight words and the N wide data words received from the first and second memories.
    Type: Application
    Filed: April 5, 2016
    Publication date: April 13, 2017
    Inventors: G. GLENN HENRY, TERRY PARKS
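    The narrow/wide idea above can be sketched as reinterpreting the same memory row under a mode indicator. This is a hypothetical model only (byte widths, little-endian pairing, and the function names are assumptions, not the patent's encoding):

```python
# Hypothetical sketch of the narrow/wide configuration: one row of raw
# bytes is read either as 2N narrow (one-byte) operands or as N wide
# (two-byte) operands, chosen by the configuration indicator.

def unpack(row_bytes, mode):
    """row_bytes: bytes of one memory row. mode: 'narrow' or 'wide'."""
    if mode == "narrow":
        return list(row_bytes)                      # 2N one-byte words
    # wide: pair adjacent bytes (little-endian) into N two-byte words
    return [row_bytes[i] | (row_bytes[i + 1] << 8)
            for i in range(0, len(row_bytes), 2)]

def configured_mac(weights, data, mode):
    """Elementwise products of one weight row and one data row."""
    w, d = unpack(weights, mode), unpack(data, mode)
    return [wi * di for wi, di in zip(w, d)]
```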
  • Publication number: 20170103320
    Abstract: A neural network unit includes first and second memories that hold rows of respective N weight and data words and provides a row of them to N corresponding neural processing units (NPU), respectively. The N NPUs each have an accumulator and an arithmetic unit that performs a series of multiply operations on pairs of weight words and data words received from the first and second memories to generate a series of products. The arithmetic unit also performs a series of addition operations on the series of products to accumulate an accumulated value in the accumulator. Activation function units (AFU) are each shared by a corresponding plurality of the N NPUs. Each AFU, in a sequential fashion with respect to each NPU of the corresponding plurality of the N NPUs, receives the accumulated value from the NPU and performs an activation function on the accumulated value to generate a result.
    Type: Application
    Filed: April 5, 2016
    Publication date: April 13, 2017
    Inventors: G. GLENN HENRY, TERRY PARKS
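    The AFU-sharing arrangement can be modeled as each activation function unit walking its group of accumulators in turn. A minimal sketch, assuming N is a multiple of J and using tanh as a stand-in activation function (the patent does not specify one here):

```python
# Hypothetical model of AFU sharing: one activation function unit
# serves J NPUs sequentially, so N accumulated values need only N/J
# AFUs. Each inner-loop iteration stands in for one AFU time slot.
import math

def apply_shared_afus(accumulators, j, afu=math.tanh):
    """accumulators: N accumulated values; j: NPUs per shared AFU.
    Assumes len(accumulators) is a multiple of j."""
    results = [0.0] * len(accumulators)
    for group_start in range(0, len(accumulators), j):
        # one AFU handles this group, one accumulator per step
        for k in range(group_start, group_start + j):
            results[k] = afu(accumulators[k])
    return results
```

    The trade-off the abstract describes is area for latency: the results arrive over J steps per group instead of one.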
  • Publication number: 20170103041
    Abstract: Functional units of a processor fetch and decode architectural instructions of an architectural program. The architectural instructions are of an architectural instruction set of the processor. An execution unit includes first and second memories, a register and processing units. The first memory holds data in rows with addresses. The second memory holds non-architectural instructions of a non-architectural program. The architectural and non-architectural instruction sets are distinct. The processing units execute the non-architectural program instructions to read data from the first memory, perform operations on the data read from the first memory to generate results, and to write the results to the first memory. The register holds information that indicates progress made by the non-architectural program during execution. The first memory is also readable and writable by the architectural program. The architectural program uses the information to decide where in the first memory to read/write data.
    Type: Application
    Filed: April 5, 2016
    Publication date: April 13, 2017
    Inventors: G. GLENN HENRY, TERRY PARKS
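    The progress register described above can be modeled as a counter the non-architectural program advances and the architectural program polls. The class and method names below are hypothetical, for illustration only:

```python
# Hypothetical model of the progress register: the non-architectural
# program fills rows of the first memory and advances a progress
# counter; the architectural program reads that counter to decide
# which rows of the first memory are safe to read.

class ExecutionUnit:
    def __init__(self, num_rows):
        self.memory = [None] * num_rows   # first memory, rows of data
        self.progress = 0                 # register visible to both programs

    def run_nonarch_step(self, result):
        """One non-architectural instruction writes a result row."""
        self.memory[self.progress] = result
        self.progress += 1

    def arch_read_ready(self):
        """Architectural program reads every row completed so far."""
        return self.memory[:self.progress]
```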
  • Publication number: 20170103307
    Abstract: A processor includes a front-end portion that issues instructions to execution units that execute the issued instructions. A hardware neural network unit (NNU) execution unit includes a first memory that holds data words associated with artificial neural networks (ANN) neuron outputs, a second memory that holds weight words associated with connections between ANN neurons, and a third memory that holds a program comprising NNU instructions that are distinct, with respect to their instruction set, from the instructions issued to the NNU by the front-end portion of the processor. The program performs ANN-associated computations on the data and weight words. A first instruction instructs the NNU to transfer NNU instructions of the program from architectural general purpose registers to the third memory. A second instruction instructs the NNU to invoke the program stored in the third memory.
    Type: Application
    Filed: April 5, 2016
    Publication date: April 13, 2017
    Inventors: G. GLENN HENRY, TERRY PARKS
  • Publication number: 20170103319
    Abstract: A neural network unit includes a programmable indicator, a first memory that holds first operands, a second memory that holds second operands, neural processing units (NPU), and activation units. Each NPU has an accumulator and an arithmetic unit that performs a series of multiply operations on pairs of the first and second operands received from the first and second memories to generate a series of products, and a series of addition operations on the series of products to accumulate an accumulated value in the accumulator. The activation units perform activation functions on the accumulated values in the accumulators to generate results. When the indicator specifies the first action, the neural network unit writes to the first memory the results generated by the activation units. When the indicator specifies the second action, the neural network unit writes to the first memory the accumulated values in the accumulators.
    Type: Application
    Filed: April 5, 2016
    Publication date: April 13, 2017
    Inventors: G. GLENN HENRY, TERRY PARKS
  • Publication number: 20170102945
    Abstract: A processor includes an architectural register file loadable with micro-operations by architectural instructions of an architectural instruction set of the processor and an execution unit that executes instructions. The instructions are either architectural instructions or microinstructions into which architectural instructions are translated. The execution unit includes a decoder that decodes the instructions into micro-operations, a mode indicator that indicates one of first and second modes, a pipeline of stages to which are provided micro-operations that control circuits of the stages of the pipeline, and a multiplexer. The multiplexer selects for provision to the pipeline a micro-operation received from the decoder when the mode indicator indicates the first mode and selects for provision to the pipeline a micro-operation received from the architectural register file when the mode indicator indicates the second mode.
    Type: Application
    Filed: April 5, 2016
    Publication date: April 13, 2017
    Inventors: G. GLENN HENRY, TERRY PARKS
  • Publication number: 20170102921
    Abstract: An apparatus includes a plurality of arithmetic logic units each having an accumulator and an integer arithmetic unit that receives and performs integer arithmetic operations on integer inputs and accumulates integer results of a series of the integer arithmetic operations into the accumulator as an integer accumulated value. A register is programmable with an indication of a number of fractional bits of the integer accumulated values and an indication of a number of fractional bits of integer outputs. A first bit width of the accumulator is greater than twice a second bit width of the integer outputs. A plurality of adjustment units scale and saturate the first bit width integer accumulated values to generate the second bit width integer outputs based on the indications of the number of fractional bits of the integer accumulated values and outputs programmed into the register.
    Type: Application
    Filed: April 5, 2016
    Publication date: April 13, 2017
    Inventors: G. GLENN HENRY, TERRY PARKS
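    The scale-and-saturate step the adjustment units perform can be sketched in fixed-point terms. A minimal illustration, assuming round-to-nearest and that the accumulator has at least as many fractional bits as the output (the abstract does not fix the rounding mode, so that part is an assumption):

```python
# Hypothetical sketch of an adjustment unit: a wide integer accumulated
# value with acc_frac fractional bits is shifted down to out_frac
# fractional bits, rounded, and saturated to the narrower output width.

def adjust(acc, acc_frac, out_frac, out_bits):
    """Assumes acc_frac >= out_frac (no left shift needed)."""
    shift = acc_frac - out_frac          # fractional bits to discard
    if shift > 0:
        acc += 1 << (shift - 1)          # round to nearest
        acc >>= shift                    # arithmetic right shift
    lo, hi = -(1 << (out_bits - 1)), (1 << (out_bits - 1)) - 1
    return max(lo, min(hi, acc))         # saturate to signed output range
```

    The wide accumulator (more than twice the output width per the abstract) is what lets many products be summed before this single lossy step.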
  • Publication number: 20170103306
    Abstract: An array of N processing units (PU) each has: an accumulator; an arithmetic unit that performs an operation on first, second and third inputs to generate a result to store in the accumulator, where the first input receives the accumulator output and the second input receives a weight word; and a multiplexed register that has first and second data inputs, an output received by the third input to the arithmetic unit, and a control input that controls the data input selection. The multiplexed register output is also received by an adjacent PU's multiplexed register second data input. The N PUs' multiplexed registers collectively operate as an N-word rotater when the control input specifies the second data input. Respective first and second memories hold W and D rows of N weight and data words, respectively, and provide the N weight and data words to the corresponding weight inputs and multiplexed register first data inputs. A sequencer controls the multiplexed registers and memories.
    Type: Application
    Filed: April 5, 2016
    Publication date: April 13, 2017
    Inventors: G. GLENN HENRY, TERRY PARKS
  • Publication number: 20170103300
    Abstract: A neural network unit configurable to first/second/third configurations has N narrow and N wide accumulators, multipliers and adders. Each multiplier performs a narrow/wide multiply on first and second narrow/wide inputs to generate a narrow/wide product. A first adder input receives the corresponding narrow/wide accumulator's output, and a third input receives a widened version of the corresponding narrow multiplier's narrow product in the third configuration. In the first configuration, each narrow/wide adder performs a narrow/wide addition on the first and second inputs to generate a narrow/wide sum for storage into the corresponding narrow/wide accumulator. In the second configuration, each wide adder performs a wide addition on the first and a second input to generate a wide sum for storage into the corresponding wide accumulator. In the third configuration, each wide adder performs a wide addition on the first, second and third inputs to generate a wide sum for storage into the corresponding wide accumulator.
    Type: Application
    Filed: April 5, 2016
    Publication date: April 13, 2017
    Inventors: G. GLENN HENRY, TERRY PARKS
  • Publication number: 20170103305
    Abstract: An output buffer holds N words arranged as N/J mutually exclusive output buffer word groups (OBWG) of J words each. N processing units (PU) are arranged as N/J mutually exclusive PU groups. Each PU group has an associated OBWG. Each PU includes an accumulator and an arithmetic unit that performs operations on inputs, which include the accumulator output, to generate a first result for accumulation into the accumulator. Activation function units selectively perform an activation function on the accumulator outputs to generate results for provision to the N output buffer words. For each PU group, four of the J PUs and at least one of the activation function units compute an input gate, a forget gate, an output gate and a candidate state of a Long Short Term Memory (LSTM) cell, respectively, for writing to respective first, second, third and fourth words of the associated OBWG.
    Type: Application
    Filed: April 5, 2016
    Publication date: April 13, 2017
    Inventors: G. GLENN HENRY, TERRY PARKS, KYLE T. O'BRIEN
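    The four quantities a PU group computes map onto the standard LSTM cell equations. As an illustration only (the function names are hypothetical, and the sigmoid/tanh pairing is the conventional LSTM formulation rather than anything the abstract specifies):

```python
# Hypothetical model of one PU group producing the four words of its
# output buffer word group: input gate, forget gate, output gate, and
# candidate state, from the four PUs' accumulated values.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_obwg(acc_i, acc_f, acc_o, acc_c):
    """Accumulator values of the group's four PUs -> one OBWG."""
    return [sigmoid(acc_i),    # input gate
            sigmoid(acc_f),    # forget gate
            sigmoid(acc_o),    # output gate
            math.tanh(acc_c)]  # candidate state

def lstm_step(obwg, prev_cell):
    """Combine one OBWG with the previous cell state."""
    i, f, o, c = obwg
    cell = f * prev_cell + i * c         # new cell state
    return cell, o * math.tanh(cell)     # (cell state, hidden output)
```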
  • Publication number: 20170103304
    Abstract: A neural network unit includes a register programmable with a control value, a plurality of neural processing units (NPU), and a plurality of activation function units (AFU). Each NPU includes an arithmetic logic unit (ALU) that performs arithmetic and logical operations on a sequence of operands to generate a sequence of results and an accumulator into which the ALU accumulates the sequence of results as an accumulated value. Each AFU includes a first module that performs a first function on the accumulated value to generate a first output, a second module that performs a second function on the accumulated value to generate a second output, the first function is distinct from the second function, and a multiplexer that receives the first and second outputs and selects one of the two outputs based on the control value programmed into the register.
    Type: Application
    Filed: April 5, 2016
    Publication date: April 13, 2017
    Inventors: G. GLENN HENRY, TERRY PARKS
  • Publication number: 20170103310
    Abstract: A neural network unit (NNU) includes N neural processing units (NPU). Each NPU has an arithmetic unit and an accumulator. First and second multiplexed registers of the N NPUs collectively selectively operate as respective first and second N-word rotaters. First and second memories respectively hold rows of N weight/data words and provide the N weight/data words of a row to corresponding ones of the N NPUs. The NPUs selectively perform: multiply-accumulate operations on rows of N weight words and on a row of N data words, using the second N-word rotater; convolution operations on rows of N weight words, using the first N-word rotater, and on rows of N data words, the rows of weight words being a data matrix, and the rows of data words being elements of a convolution kernel; and pooling operations on rows of N weight words, using the first N-word rotater.
    Type: Application
    Filed: April 5, 2016
    Publication date: April 13, 2017
    Inventors: G. GLENN HENRY, TERRY PARKS
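    The pooling case above can also be expressed with the rotate scheme: each NPU keeps a running maximum while the row rotates past it. A hypothetical sketch (the function name and circular handling at the row's end are assumptions for illustration):

```python
# Hypothetical model of pooling via the first N-word rotater: each NPU
# computes the max of `window` consecutive words by rotating the row
# one position per step and keeping a running maximum.

def rotate_max_pool(row, window):
    acc = list(row)
    words = list(row)
    for _ in range(window - 1):
        words = words[1:] + words[:1]            # one-word rotate
        acc = [max(a, w) for a, w in zip(acc, words)]
    return acc  # acc[i] = max of row[i : i + window], wrapping circularly
```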
  • Publication number: 20170103312
    Abstract: An output buffer holds N words arranged as N/J mutually exclusive output buffer word groups (OBWG) of J words each. N processing units (PU) are arranged as N/J mutually exclusive PU groups each having an associated OBWG. Each PU has an accumulator, arithmetic unit, and first and second multiplexed registers each having at least J+1 inputs. A first input receives a memory operand and the other J inputs receive the J words of the associated OBWG. Each accumulator provides its output to a respective OBWG. Each arithmetic unit performs an operation on the first and second multiplexed register outputs and accumulator output to generate a result for accumulation into the accumulator. A mask input to the output buffer controls which words, if any, of the N words retain their current value or are updated with their respective accumulator output. Each PU group operates as a recurrent neural network LSTM cell.
    Type: Application
    Filed: April 5, 2016
    Publication date: April 13, 2017
    Inventors: G. GLENN HENRY, TERRY PARKS, KYLE T. O'BRIEN
  • Patent number: 9588572
    Abstract: A microprocessor includes a plurality of processing cores, each comprising a respective interrupt request input, and a control unit configured to receive a respective synchronization request from each of the plurality of processing cores. The control unit is configured to generate an interrupt request to all of the plurality of processing cores on their respective interrupt request inputs in response to detecting that the control unit has received the respective synchronization request from all of the plurality of processing cores.
    Type: Grant
    Filed: May 19, 2014
    Date of Patent: March 7, 2017
    Assignee: VIA TECHNOLOGIES, INC.
    Inventors: G. Glenn Henry, Terry Parks
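    This synchronization mechanism is essentially a hardware barrier that fires an interrupt on release. A minimal software model, with a hypothetical `ControlUnit` class standing in for the hardware:

```python
# Hypothetical model of the synchronization mechanism: the control unit
# collects one sync request per core and, once all requests have
# arrived, asserts the interrupt request input of every core at once.

class ControlUnit:
    def __init__(self, num_cores):
        self.pending = set(range(num_cores))   # cores yet to request
        self.interrupt = [False] * num_cores   # per-core interrupt lines

    def sync_request(self, core_id):
        self.pending.discard(core_id)
        if not self.pending:                   # all cores have requested
            self.interrupt = [True] * len(self.interrupt)
```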
  • Patent number: 9588845
    Abstract: A processor includes a storage configured to receive a snapshot of a state of the processor prior to performing a set of computations in an approximating manner. The processor also includes an indicator that indicates an amount of error accumulated while the set of computations is performed in the approximating manner. When the processor detects that the amount of error accumulated has exceeded an error bound, the processor is configured to restore the state of the processor to the snapshot from the storage.
    Type: Grant
    Filed: October 23, 2014
    Date of Patent: March 7, 2017
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: G. Glenn Henry, Terry Parks, Rodney E. Hooker
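    The snapshot-and-rollback flow above can be sketched abstractly. This is an illustration only: the function shape, the list-of-steps representation, and the error estimates are hypothetical stand-ins for the processor state and its accumulated-error indicator:

```python
# Hypothetical model of error-bounded approximation: snapshot the state,
# run approximate steps while accumulating an error estimate, and roll
# back to the snapshot once the accumulated error exceeds the bound.

def run_approx(state, steps, error_bound):
    """steps: (result, error_estimate) pairs from approximate ops.
    Returns (final state, rolled_back flag)."""
    snapshot = list(state)               # snapshot before approximating
    err = 0.0
    for result, step_err in steps:
        state = state + [result]         # approximate computation result
        err += step_err                  # error accumulated so far
        if err > error_bound:
            return snapshot, True        # bound exceeded: restore snapshot
    return state, False
```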
  • Publication number: 20170003707
    Abstract: A microprocessor includes a plurality of cores, a shared cache memory, and a control unit that individually puts each core to sleep by stopping its clock signal. Each core executes a sleep instruction and responsively makes a respective request of the control unit to put the core to sleep, which the control unit responsively does, and detects when all the cores have made the respective request and responsively wakes up only the last requesting core. The last core then writes back and invalidates the shared cache memory, indicates that it has been invalidated, and makes a request to the control unit to put the last core back to sleep. The control unit puts the last core back to sleep and continuously keeps the other cores asleep while the last core writes back and invalidates the shared cache memory, indicates the shared cache memory was invalidated, and is put back to sleep.
    Type: Application
    Filed: September 14, 2016
    Publication date: January 5, 2017
    Inventors: G. GLENN HENRY, TERRY PARKS, BRENT BEAN, STEPHAN GASKINS
  • Publication number: 20160380649
    Abstract: A hardware data compressor that compresses an input block of characters by replacing strings of characters in the input block with back pointers to matching strings earlier in the input block. A hash table is used in searching for the matching strings in the input block. A plurality of hash index generators each employs a different hashing algorithm on an initial portion of the strings of characters to be replaced to generate a respective index. The hardware data compressor also includes an indication of a type of the input block of characters. A selector selects the index generated by one of the plurality of hash index generators to index into the hash table based on the type of the input block.
    Type: Application
    Filed: September 7, 2016
    Publication date: December 29, 2016
    Inventors: G. GLENN HENRY, TERRY PARKS
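    The type-selected hashing idea can be sketched in a few lines. Everything below is hypothetical: the two hash algorithms, the three-byte initial portion, the table size, and the "text"/"binary" block types are placeholders, not the patent's actual generators:

```python
# Hypothetical sketch of the multi-hash scheme: several hash index
# generators, each using a different algorithm, hash the initial bytes
# of a candidate string; the input block's type selects which generated
# index is used to probe the hash table of earlier string positions.

TABLE_BITS = 12  # assumed hash table size of 2**12 entries

def hash_text(b0, b1, b2):
    return ((b0 << 8) ^ (b1 << 4) ^ b2) & ((1 << TABLE_BITS) - 1)

def hash_binary(b0, b1, b2):
    return ((b0 * 0x9E3779B1) ^ (b1 << 7) ^ b2) & ((1 << TABLE_BITS) - 1)

GENERATORS = {"text": hash_text, "binary": hash_binary}

def hash_index(block_type, prefix):
    """Select the generator for this block type and hash the initial
    three-byte portion of the string being matched."""
    return GENERATORS[block_type](*prefix[:3])
```

    Tuning the hash to the block type matters because text and binary inputs distribute byte values very differently, and a poorly matched hash clusters candidate strings into few buckets.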