Patents by Inventor David John Simpson
David John Simpson has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20260056737
Abstract: A matrix multiply engine can include a first operand buffer and a second operand buffer, each of which can store multiple operand elements arranged in rows and columns. A cell array can be formed of cells, where each cell includes a memory and accumulator circuitry to receive operand elements column-wise from each of the first operand buffer and the second operand buffer, to compute a dot product of the received operand elements, and to accumulate the dot product into a corresponding tile state element in the memory. Matrix elements of the operand matrices to be multiplied can be loaded row-wise into rows of the operand buffers and read column-wise into the cells. The number of elements for which a dot product is computed can be selected depending on operand element width.
Type: Application
Filed: January 31, 2025
Publication date: February 26, 2026
Applicant: SiFive, Inc.
Inventors: David John Simpson, Krste Asanovic, Andrew Waterman, Michael Todd Ruff
-
Publication number: 20260056736
Abstract: A matrix multiply engine can include a first operand buffer and a second operand buffer, each of which can store multiple operand elements arranged in rows and columns. A cell array can be formed of cells, where each cell includes a memory and accumulator circuitry to receive operand elements column-wise from each of the first operand buffer and the second operand buffer, to compute a dot product of the received operand elements, and to accumulate the dot product into a corresponding tile state element in the memory. Matrix elements of the operand matrices to be multiplied can be loaded row-wise into rows of the operand buffers and read column-wise into the cells. The number of elements for which a dot product is computed can be selected depending on operand element width.
Type: Application
Filed: August 23, 2024
Publication date: February 26, 2026
Applicant: SiFive, Inc.
Inventors: David John Simpson, Krste Asanovic, Andrew Waterman, Michael Todd Ruff
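The cell-array dataflow in the two SiFive abstracts can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the claimed hardware. It assumes a fixed 32-bit lane per cell per step (`LANE_BITS`, hypothetical), so the dot-product length is `LANE_BITS // element_bits`. It also assumes that the second operand is stored transposed, which lets both buffers be sliced the same way. Cell (i, j) owns one tile-state accumulator. On each step it reads a column slice from row i of the first buffer and row j of the second, and adds their dot product to its tile state.

```python
# Hypothetical software model of a matrix-multiply cell array.
LANE_BITS = 32  # assumed per-cell datapath width per step (not from the source)


def elements_per_dot(element_bits: int) -> int:
    """Narrower operands pack more elements into one lane, so the
    dot-product length grows as element width shrinks (assumption)."""
    if LANE_BITS % element_bits:
        raise ValueError("element width must divide the lane width")
    return LANE_BITS // element_bits


def cell_array_matmul(a, b_t, element_bits):
    """Compute C = A @ B, where a is m x k and b_t is B transposed (n x k).

    Each matrix row sits in one row of an operand buffer. Every step reads a
    d-wide column slice from both buffers, and each cell (i, j) accumulates
    a d-element dot product into its tile-state element."""
    m, n, k = len(a), len(b_t), len(a[0])
    d = elements_per_dot(element_bits)
    tile = [[0] * n for _ in range(m)]  # one tile-state element per cell
    for col in range(0, k, d):          # step across buffer columns
        for i in range(m):
            a_slice = a[i][col:col + d]
            for j in range(n):
                b_slice = b_t[j][col:col + d]
                tile[i][j] += sum(x * y for x, y in zip(a_slice, b_slice))
    return tile
```

For example, with 8-bit operands each cell consumes four element pairs per step, and with 16-bit operands it consumes two. The final tile state is the same in both cases, but the number of steps differs.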
-
Publication number: 20250139196
Abstract: Techniques for data manipulation using integer matrix multiplication using pipelining are disclosed. A first integer matrix with dimensions m×k and a second integer matrix with dimensions k×n are obtained for matrix multiplication within a processor. The first and second integer matrices employ a two's complement variable radix point data representation. The first and second integer matrices are distilled into (j×j) submatrices. A first variable radix point format and an initial value for an accumulator register are configured dynamically. A first variable radix point format is configured dynamically for the first integer matrix and a second variable radix point format is configured dynamically for the second integer matrix. Multiply-accumulate operations are executed in a pipelined fashion on the (j×j) submatrices of the first integer matrix and the second integer matrix, where a third variable radix point format is configured for the result.
Type: Application
Filed: December 31, 2024
Publication date: May 1, 2025
Inventor: David John Simpson
-
Patent number: 12223011
Abstract: Techniques for data manipulation using integer matrix multiplication using pipelining are disclosed. A first integer matrix with dimensions m×k and a second integer matrix with dimensions k×n are obtained for matrix multiplication within a processor. The first and second integer matrices employ a two's complement variable radix point data representation. The first and second integer matrices are distilled into (j×j) submatrices. A first variable radix point format and an initial value for an accumulator register are configured dynamically. A first variable radix point format is configured dynamically for the first integer matrix and a second variable radix point format is configured dynamically for the second integer matrix. Multiply-accumulate operations are executed in a pipelined fashion on the (j×j) submatrices of the first integer matrix and the second integer matrix, where a third variable radix point format is configured for the result.
Type: Grant
Filed: November 27, 2023
Date of Patent: February 11, 2025
Assignee: MIPS Holding, Inc.
Inventor: David John Simpson
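The variable-radix-point idea in this family of filings can be shown with ordinary fixed-point arithmetic. This is a minimal sketch under stated assumptions, not the patented method. It assumes that a "radix point format" is simply a count of fraction bits: `fa` for the first matrix, `fb` for the second, and `fc` for the result. A raw product therefore carries `fa + fb` fraction bits, and the accumulator's initial value is expressed in that same format. The loops block the work into j×j submatrices, but they do not model the hardware pipeline. All function names are hypothetical.

```python
def to_fixed(x: float, frac_bits: int) -> int:
    """Encode a real value as an integer with frac_bits bits right of the radix point."""
    return round(x * (1 << frac_bits))


def wrap(v: int, bits: int) -> int:
    """Reinterpret the low `bits` bits of v as a two's complement integer."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >> (bits - 1) else v


def blocked_fixed_matmul(a, b, fa, fb, fc, j, acc_init=0, out_bits=32):
    """Multiply m x k matrix a (fa fraction bits) by k x n matrix b (fb fraction
    bits) in j x j blocks and return results with fc fraction bits.

    acc_init is the accumulator's initial value with fa + fb fraction bits."""
    m, k, n = len(a), len(b), len(b[0])
    shift = fa + fb - fc
    if shift < 0:
        raise ValueError("this sketch assumes fc <= fa + fb")
    acc = [[acc_init] * n for _ in range(m)]
    for i0 in range(0, m, j):              # block row
        for n0 in range(0, n, j):          # block column
            for k0 in range(0, k, j):      # block along the shared dimension
                for i in range(i0, min(i0 + j, m)):
                    for col in range(n0, min(n0 + j, n)):
                        for kk in range(k0, min(k0 + j, k)):
                            acc[i][col] += a[i][kk] * b[kk][col]
    # Arithmetic right shift moves the radix point to fc (rounding toward
    # negative infinity). The result is then wrapped to an out_bits
    # two's complement word.
    return [[wrap(v >> shift, out_bits) for v in row] for row in acc]
```

Because blocking only reorders the additions and integer addition is exact, any block size `j` produces the same result. A hardware design would choose `j` to match its MAC array.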
-
Publication number: 20250036413
Abstract: A system may include a processor having a pipeline, a plurality of counters, and trigger circuitry. The plurality of counters may count events associated with processing instructions in the pipeline. Counters of the plurality of counters may count different events. The trigger circuitry may trigger a performance measurement for a first instruction after counters of the plurality of counters meet predefined values. Triggering the performance measurement may cause the plurality of counters to reset and then count events associated with processing the first instruction. In some implementations, the trigger circuitry may trigger the performance measurement based on an AND selection and/or an OR selection of multiple counters of the plurality of counters meeting predefined values.
Type: Application
Filed: July 24, 2023
Publication date: January 30, 2025
Inventor: David John Simpson
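The trigger behavior described in this abstract can be modeled in a few lines of software. This is a minimal sketch, not the claimed circuitry. The class name, the event names, the `"AND"`/`"OR"` mode strings, and the choice to read "meet" as "greater than or equal to" are all assumptions. Once the selected counters reach their thresholds, the next instruction to issue is picked for measurement. The counters then reset and go on counting only the events that follow.

```python
from dataclasses import dataclass, field


@dataclass
class TriggerCircuit:
    """Hypothetical model of performance-measurement trigger logic."""
    thresholds: dict[str, int]  # event name -> predefined value to meet
    mode: str = "AND"           # "AND": all selected counters; "OR": any one
    counters: dict[str, int] = field(default_factory=dict)
    armed: bool = True          # trigger fires at most once in this sketch

    def record(self, event: str) -> None:
        """Count one occurrence of an event observed in the pipeline."""
        self.counters[event] = self.counters.get(event, 0) + 1

    def condition_met(self) -> bool:
        hits = [self.counters.get(e, 0) >= v for e, v in self.thresholds.items()]
        return all(hits) if self.mode == "AND" else any(hits)

    def on_instruction_issue(self) -> bool:
        """Call as each instruction enters the pipeline. Returns True if this
        instruction is selected for the performance measurement."""
        if self.armed and self.condition_met():
            self.counters.clear()  # reset, then count this instruction's events
            self.armed = False
            return True
        return False
```

In `"AND"` mode every listed counter must reach its value before the trigger fires. In `"OR"` mode a single counter is enough, which corresponds to the AND/OR selection mentioned in the abstract.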
-
Publication number: 20240211397
Abstract: Techniques for data manipulation using processor cluster address generation are disclosed. One or more processor clusters capable of executing software-initiated work requests are accessed. A plurality of dimensions from a tensor is flattened into a single dimension. A work request address field is parsed, where the address field contains unique address space descriptors for each of the plurality of dimensions, along with a common address space descriptor. A direct memory access (DMA) engine coupled to the one or more processor clusters is configured. Addresses are generated based on the unique address space descriptors and the common address space descriptor. The plurality of dimensions can be summed to generate a single address. Memory is accessed using two or more of the addresses that were generated. The addresses are used to enable DMA access.
Type: Application
Filed: February 12, 2024
Publication date: June 27, 2024
Inventors: David John Simpson, Stephen Curtis Johnson, Richard Douglas Trauben
-
Patent number: 11934308
Abstract: Techniques for data manipulation using processor cluster address generation are disclosed. One or more processor clusters capable of executing software-initiated work requests are accessed. A plurality of dimensions from a tensor is flattened into a single dimension. A work request address field is parsed, where the address field contains unique address space descriptors for each of the plurality of dimensions, along with a common address space descriptor. A direct memory access (DMA) engine coupled to the one or more processor clusters is configured. Addresses are generated based on the unique address space descriptors and the common address space descriptor. The plurality of dimensions can be summed to generate a single address. Memory is accessed using two or more of the addresses that were generated. The addresses are used to enable DMA access.
Type: Grant
Filed: September 29, 2020
Date of Patent: March 19, 2024
Inventors: David John Simpson, Stephen Curtis Johnson, Richard Douglas Trauben
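Multi-dimensional DMA address generation of the kind described here is commonly expressed as a base address plus a sum of per-dimension offsets. This sketch follows that general technique and is not the patented descriptor format. It assumes that each per-dimension ("unique") descriptor is just an element count and a byte stride, and that the common descriptor is just a base address. It also shows one plausible form of flattening: adjacent dimensions are merged whenever the outer stride equals the inner extent in bytes. All names are hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class DimDescriptor:      # hypothetical per-dimension descriptor
    count: int            # number of indices along this dimension
    stride: int           # byte distance between consecutive indices


@dataclass(frozen=True)
class CommonDescriptor:   # hypothetical descriptor shared by all dimensions
    base: int             # base byte address of the tensor


def generate_addresses(dims, common):
    """Yield one address per tensor element, outermost dimension first.
    Each address is the base plus the summed per-dimension offsets."""
    for idx in product(*(range(d.count) for d in dims)):
        yield common.base + sum(i * d.stride for i, d in zip(idx, dims))


def flatten(dims):
    """Merge adjacent dimensions whose layout is contiguous (outer stride
    equals inner count * inner stride) so fewer descriptors are needed."""
    merged = [dims[-1]]
    for d in reversed(dims[:-1]):
        inner = merged[0]
        if d.stride == inner.count * inner.stride:
            merged[0] = DimDescriptor(d.count * inner.count, inner.stride)
        else:
            merged.insert(0, d)
    return merged
```

Consider a row-major 2×3 tensor of 4-byte elements at `0x1000`. Its descriptors `[DimDescriptor(2, 12), DimDescriptor(3, 4)]` flatten to a single `DimDescriptor(6, 4)`, and both forms produce the same six addresses. A padded layout, where the outer stride exceeds the inner extent, keeps its separate dimensions.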
-
Publication number: 20240061704
Abstract: Techniques for data manipulation using processor graph execution using interrupt conservation are disclosed. Processing elements are configured to implement a data flow graph. The processing elements comprise a multilayer graph execution engine. A data engine is loaded with computational parameters for the multilayer graph execution engine. The data engine is coupled to the multilayer graph execution engine, and the computational parameters supply layer-by-layer execution data to the multilayer graph execution engine for data flow graph execution. A first command FIFO is used for loading the data engine with computational parameters, and a second command FIFO is used for loading the multilayer graph execution engine with layer definition data. An input image is provided for a first layer of the multilayer graph execution engine. The data flow graph is executed using the input image and the computational parameters.
Type: Application
Filed: October 31, 2023
Publication date: February 22, 2024
Inventor: David John Simpson
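The two-FIFO arrangement in this abstract can be modeled in software. This is a minimal sketch under loud assumptions, not the patented engine. Each layer definition is reduced to an operation name looked up in a hypothetical `OPS` table, and each layer takes one entry of computational parameters. The sketch interprets "interrupt conservation" as raising a single completion interrupt for the whole graph instead of one per layer. That interpretation is an assumption, not something the abstract states.

```python
from collections import deque

# Hypothetical layer operations; each takes an activation vector and parameters.
OPS = {
    "scale": lambda x, p: [v * p for v in x],
    "bias": lambda x, p: [v + p for v in x],
    "relu": lambda x, p: [max(0, v) for v in x],
}


def run_graph(param_fifo: deque, layer_fifo: deque, image):
    """Execute layers in FIFO order. The first FIFO feeds the data engine's
    computational parameters, and the second feeds layer definitions.
    Returns the final activations and the number of interrupts raised."""
    x = image                           # input image feeds the first layer
    while layer_fifo:
        layer = layer_fifo.popleft()    # layer definition data
        params = param_fifo.popleft()   # matching computational parameters
        x = OPS[layer](x, params)       # layer-by-layer execution, no interrupt
    interrupts = 1                      # one completion interrupt for the graph
    return x, interrupts
```

Keeping parameters and layer definitions in separate queues lets the host fill both ahead of time. The engine can then run the whole graph without waiting for the host between layers, which is the kind of saving the single interrupt in this sketch represents.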
-
Patent number: 11880426
Abstract: Techniques for data manipulation using integer matrix multiplication using pipelining are disclosed. A first integer matrix with dimensions m×k and a second integer matrix with dimensions k×n are obtained for matrix multiplication within a processor. The first and second integer matrices employ a two's complement variable radix point data representation. The first and second integer matrices are distilled into (j×j) submatrices. A first variable radix point format and an initial value for an accumulator register are configured dynamically. A first variable radix point format is configured dynamically for the first integer matrix and a second variable radix point format is configured dynamically for the second integer matrix. Multiply-accumulate operations are executed in a pipelined fashion on the (j×j) submatrices of the first integer matrix and the second integer matrix, where a third variable radix point format is configured for the result.
Type: Grant
Filed: July 31, 2022
Date of Patent: January 23, 2024
Inventor: David John Simpson
-
Patent number: 11836518
Abstract: Techniques for data manipulation using processor graph execution using interrupt conservation are disclosed. Processing elements are configured to implement a data flow graph. The processing elements comprise a multilayer graph execution engine. A data engine is loaded with computational parameters for the multilayer graph execution engine. The data engine is coupled to the multilayer graph execution engine, and the computational parameters supply layer-by-layer execution data to the multilayer graph execution engine for data flow graph execution. A first command FIFO is used for loading the data engine with computational parameters, and a second command FIFO is used for loading the multilayer graph execution engine with layer definition data. An input image is provided for a first layer of the multilayer graph execution engine. The data flow graph is executed using the input image and the computational parameters.
Type: Grant
Filed: December 10, 2021
Date of Patent: December 5, 2023
Inventor: David John Simpson
-
Publication number: 20220366010
Abstract: Techniques for data manipulation using integer matrix multiplication using pipelining are disclosed. A first integer matrix with dimensions m×k and a second integer matrix with dimensions k×n are obtained for matrix multiplication within a processor. The first and second integer matrices employ a two's complement variable radix point data representation. The first and second integer matrices are distilled into (j×j) submatrices. A first variable radix point format and an initial value for an accumulator register are configured dynamically. A first variable radix point format is configured dynamically for the first integer matrix and a second variable radix point format is configured dynamically for the second integer matrix. Multiply-accumulate operations are executed in a pipelined fashion on the (j×j) submatrices of the first integer matrix and the second integer matrix, where a third variable radix point format is configured for the result.
Type: Application
Filed: July 31, 2022
Publication date: November 17, 2022
Inventor: David John Simpson
-
Patent number: 11481472
Abstract: Techniques for data manipulation using integer matrix multiplication using pipelining are disclosed. A first integer matrix with dimensions m×k and a second integer matrix with dimensions k×n are obtained for matrix multiplication within a processor. The first and second integer matrices employ a two's complement variable radix point data representation. The first and second integer matrices are distilled into (j×j) submatrices. A first variable radix point format and an initial value for an accumulator register are configured dynamically. A first variable radix point format is configured dynamically for the first integer matrix and a second variable radix point format is configured dynamically for the second integer matrix. Multiply-accumulate operations are executed in a pipelined fashion on the (j×j) submatrices of the first integer matrix and the second integer matrix, where a third variable radix point format is configured for the result.
Type: Grant
Filed: July 30, 2020
Date of Patent: October 25, 2022
Inventor: David John Simpson
-
Publication number: 20220197692
Abstract: Techniques for data manipulation using processor graph execution using interrupt conservation are disclosed. Processing elements are configured to implement a data flow graph. The processing elements comprise a multilayer graph execution engine. A data engine is loaded with computational parameters for the multilayer graph execution engine. The data engine is coupled to the multilayer graph execution engine, and the computational parameters supply layer-by-layer execution data to the multilayer graph execution engine for data flow graph execution. A first command FIFO is used for loading the data engine with computational parameters, and a second command FIFO is used for loading the multilayer graph execution engine with layer definition data. An input image is provided for a first layer of the multilayer graph execution engine. The data flow graph is executed using the input image and the computational parameters.
Type: Application
Filed: December 10, 2021
Publication date: June 23, 2022
Inventor: David John Simpson
-
Patent number: 11227030
Abstract: Techniques for data manipulation using a matrix multiplication engine using pipelining are disclosed. A first and a second matrix are obtained for matrix multiplication. A first matrix multiply-accumulate (MAC) unit is configured, where a first matrix element and a second matrix element are presented to the MAC unit on a first cycle. A second MAC unit is configured in pipelined fashion, where the first element of the first matrix and a second element of the second matrix are presented to the second MAC unit on a second cycle, and where a second element of the first matrix and the first element of the second matrix are presented to the first MAC unit on the second cycle. Additional MAC units are further configured within the processor in pipelined fashion. Multiply-accumulate operations are executed in pipelined fashion on each of n MAC units over additional k sets of m cycles.
Type: Grant
Filed: March 31, 2020
Date of Patent: January 18, 2022
Assignee: Wave Computing, Inc.
Inventor: David John Simpson
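The skewed MAC schedule in this abstract can be reproduced with a cycle-level model. This is a minimal sketch based on one reading of the text, not the patented engine. It assumes that MAC unit u owns column u of the result and starts u cycles after unit 0. Within each of the k sets of m cycles, the column k-index stays fixed while the first matrix's rows stream past. Under this reading, cycle 2 sends `A[0][0]` and `B[0][1]` to the second unit and `A[1][0]` and `B[0][0]` to the first unit, which matches the abstract. The function name and the `trace` tuple layout `(cycle, unit, row, k)` are hypothetical.

```python
def pipelined_mac_matmul(a, b, trace=None):
    """Cycle-level model of n skewed MAC units computing C = A @ B.

    a is m x k and b is k x n. MAC unit u accumulates column u of C and
    starts u cycles after unit 0. Returns (C, total_cycles)."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    total_cycles = k * m + (n - 1)    # k sets of m cycles, plus pipeline skew
    for t in range(total_cycles):
        for u in range(n):            # every MAC unit may fire each cycle
            s = t - u                 # position in unit u's skewed input stream
            if 0 <= s < k * m:
                kk, r = divmod(s, m)  # k-set index and row within the set
                c[r][u] += a[r][kk] * b[kk][u]
                if trace is not None:
                    trace.append((t, u, r, kk))
    return c, total_cycles
```

In this model, the skew lets every unit share the same stream of first-matrix elements, delayed by one cycle per unit. The cost is `n - 1` extra cycles to fill and drain the pipeline.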
-
Patent number: 10997102
Abstract: Techniques for data manipulation using processor cluster address generation are disclosed. One or more processor clusters capable of executing software-initiated work requests are accessed. A direct memory access (DMA) engine, coupled to the one or more processor clusters, is configured, wherein the DMA engine employs address generation across a plurality of tensor dimensions. A work request address field is parsed, where the address field contains unique address space descriptors for each of the plurality of dimensions, along with a common address space descriptor. DMA addresses are generated based on the unique address space descriptors and the common address space descriptor. Memory using two or more of the DMA addresses that were generated is accessed, where the two or more DMA addresses enable processing within the one or more processor clusters.
Type: Grant
Filed: August 12, 2020
Date of Patent: May 4, 2021
Assignee: Wave Computing, Inc.
Inventors: David John Simpson, Richard Douglas Trauben, Stephen Curtis Johnson
-
Publication number: 20210011849
Abstract: Techniques for data manipulation using processor cluster address generation are disclosed. One or more processor clusters capable of executing software-initiated work requests are accessed. A plurality of dimensions from a tensor is flattened into a single dimension. A work request address field is parsed, where the address field contains unique address space descriptors for each of the plurality of dimensions, along with a common address space descriptor. A direct memory access (DMA) engine coupled to the one or more processor clusters is configured. Addresses are generated based on the unique address space descriptors and the common address space descriptor. The plurality of dimensions can be summed to generate a single address. Memory is accessed using two or more of the addresses that were generated. The addresses are used to enable DMA access.
Type: Application
Filed: September 29, 2020
Publication date: January 14, 2021
Inventors: David John Simpson, Stephen Curtis Johnson, Richard Douglas Trauben
-
Publication number: 20200387564
Abstract: Techniques for data manipulation using integer matrix multiplication using pipelining are disclosed. A first integer matrix with dimensions m×k and a second integer matrix with dimensions k×n are obtained for matrix multiplication within a processor. The first and second integer matrices employ a two's complement variable radix point data representation. The first and second integer matrices are distilled into (j×j) submatrices. A first variable radix point format and an initial value for an accumulator register are configured dynamically. A first variable radix point format is configured dynamically for the first integer matrix and a second variable radix point format is configured dynamically for the second integer matrix. Multiply-accumulate operations are executed in a pipelined fashion on the (j×j) submatrices of the first integer matrix and the second integer matrix, where a third variable radix point format is configured for the result.
Type: Application
Filed: July 30, 2020
Publication date: December 10, 2020
Inventor: David John Simpson
-
Publication number: 20200371978
Abstract: Techniques for data manipulation using processor cluster address generation are disclosed. One or more processor clusters capable of executing software-initiated work requests are accessed. A direct memory access (DMA) engine, coupled to the one or more processor clusters, is configured, wherein the DMA engine employs address generation across a plurality of tensor dimensions. A work request address field is parsed, where the address field contains unique address space descriptors for each of the plurality of dimensions, along with a common address space descriptor. DMA addresses are generated based on the unique address space descriptors and the common address space descriptor. Memory using two or more of the DMA addresses that were generated is accessed, where the two or more DMA addresses enable processing within the one or more processor clusters.
Type: Application
Filed: August 12, 2020
Publication date: November 26, 2020
Inventors: David John Simpson, Richard Douglas Trauben, Stephen Curtis Johnson
-
Publication number: 20200311183
Abstract: Techniques for data manipulation using a matrix multiplication engine using pipelining are disclosed. A first and a second matrix are obtained for matrix multiplication. A first matrix multiply-accumulate (MAC) unit is configured, where a first matrix element and a second matrix element are presented to the MAC unit on a first cycle. A second MAC unit is configured in pipelined fashion, where the first element of the first matrix and a second element of the second matrix are presented to the second MAC unit on a second cycle, and where a second element of the first matrix and the first element of the second matrix are presented to the first MAC unit on the second cycle. Additional MAC units are further configured within the processor in pipelined fashion. Multiply-accumulate operations are executed in pipelined fashion on each of n MAC units over additional k sets of m cycles.
Type: Application
Filed: March 31, 2020
Publication date: October 1, 2020
Inventor: David John Simpson
-
Patent number: 8166350
Abstract: A computer-readable medium is configured to receive a report processing request at a hierarchical report processor. The hierarchical report processor includes a parent process and at least one child process executing on a single processing unit, and is configured to process the report processing request as a task on the single processing unit.
Type: Grant
Filed: March 5, 2010
Date of Patent: April 24, 2012
Assignee: Business Objects Software Ltd.
Inventors: David John Simpson, Philipp Ziegler