Patents by Inventor Christopher L. Mills

Christopher L. Mills has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11580353
    Abstract: Embodiments relate to a neural engine circuit that includes an input buffer circuit, a kernel extract circuit, and a multiply-accumulator (MAC) circuit. The MAC circuit receives input data from the input buffer circuit and a kernel coefficient from the kernel extract circuit. The MAC circuit contains several multiply-add (MAD) circuits and accumulators used to perform neural network operations on the received input data and kernel coefficients. The MAD circuits are configured to support fixed-point precision (e.g., INT8) and floating-point precision (e.g., FP16) operands. In floating-point mode, each MAD circuit multiplies the mantissa bits of the input data and kernel coefficients and adds their exponent bits to determine a binary point for alignment. In fixed-point mode, the input data and kernel coefficients are multiplied directly. In both modes, the output data is stored in an accumulator and may be fed back as accumulated values for further multiply-add operations in subsequent processing cycles.
    Type: Grant
    Filed: May 4, 2018
    Date of Patent: February 14, 2023
    Assignee: Apple Inc.
    Inventor: Christopher L. Mills
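To make the two arithmetic modes concrete, here is a minimal Python sketch of a dual-mode multiply-add in the spirit of the abstract above. The field handling and accumulation interface are assumptions for illustration, not the patent's circuit design.

```python
def mad_fixed(acc: int, x: int, w: int) -> int:
    """Fixed-point mode: multiply two integer (e.g., INT8) operands and accumulate."""
    return acc + x * w

def mad_float(acc: float, x_mant: int, x_exp: int, w_mant: int, w_exp: int) -> float:
    """Floating-point mode: multiply mantissa bits, add exponent bits to
    locate the binary point, then align the product and accumulate."""
    mant = x_mant * w_mant            # integer multiply of the mantissas
    exp = x_exp + w_exp               # exponent add places the binary point
    return acc + mant * 2.0 ** exp    # align to the accumulator and add

# Example: (3 * 2**-1) * (5 * 2**-2) accumulated into 0.0 gives 1.875.
print(mad_fixed(0, 3, 5))             # -> 15
print(mad_float(0.0, 3, -1, 5, -2))   # -> 1.875
```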
  • Publication number: 20230018248
    Abstract: Embodiments of the present disclosure relate to a neural engine of a neural processor circuit having multiple multiply-add circuits and an accumulator circuit coupled to the multiply-add circuits. The multiply-add circuits perform multiply-add operations of a three-dimensional convolution on a work unit of input data using a kernel to generate at least a portion of output data in a processing cycle. The accumulator circuit includes multiple batches of accumulators. Each batch of accumulators receives and stores, after the processing cycle, the portion of the output data for each output depth plane of multiple output depth planes. A corresponding batch of accumulators stores, after the processing cycle, the portion of the output data for a subset of the output channels and for each output depth plane.
    Type: Application
    Filed: September 14, 2022
    Publication date: January 19, 2023
    Inventors: Christopher L. Mills, Sung Hee Park
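A minimal NumPy sketch of the accumulation pattern this abstract describes: one batch of accumulators per output depth plane, each holding partial 3-D convolution results for a subset of output channels. The shapes and the random stand-in for multiply-add outputs are illustrative assumptions.

```python
import numpy as np

D_OUT, C_SUB, H, W = 4, 8, 16, 16      # depth planes, channel subset, spatial dims
acc = np.zeros((D_OUT, C_SUB, H, W))   # one batch of accumulators per depth plane

def store_after_cycle(portion: np.ndarray, d: int) -> None:
    """Accumulate one processing cycle's portion of output data for plane d."""
    acc[d] += portion

for d in range(D_OUT):                 # each output depth plane
    for kd in range(3):                # kernel depth taps of the 3-D convolution
        portion = np.random.rand(C_SUB, H, W)  # stand-in for multiply-add results
        store_after_cycle(portion, d)
```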
  • Patent number: 11537864
    Abstract: Embodiments relate to a neural processor that includes one or more neural engine circuits and planar engine circuits. The neural engine circuits can perform convolution operations of input data with one or more kernels to generate outputs. The planar engine circuit is coupled to the plurality of neural engine circuits. A planar engine circuit can be configured to operate in multiple modes. In a reduction mode, the planar engine circuit may process values arranged in one or more dimensions of the input to generate a reduced value. The reduced values across multiple inputs may be accumulated. The planar engine circuit may program a filter circuit as a reduction tree to gradually reduce the data into a reduced value. The reduction operation reduces the size of one or more dimensions of a tensor.
    Type: Grant
    Filed: November 26, 2019
    Date of Patent: December 27, 2022
    Assignee: Apple Inc.
    Inventors: Christopher L. Mills, Kenneth W. Waters, Youchang Kim
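As a sketch of the reduction mode, the snippet below builds a pairwise reduction tree that gradually reduces a list of values to a single reduced value, then accumulates reduced values across multiple inputs. The choice of addition as the operator is an assumption.

```python
from operator import add
from typing import Callable, Sequence

def reduce_tree(values: Sequence[float],
                op: Callable[[float, float], float] = add) -> float:
    """Reduce values level by level, as a hardware reduction tree would."""
    level = list(values)
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:             # odd element passes through to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]

total = 0.0
for tensor in ([1.0, 2.0, 3.0, 4.0], [5.0, 6.0]):
    total += reduce_tree(tensor)       # reduced values accumulate across inputs
print(total)                           # -> 21.0
```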
  • Patent number: 11537838
    Abstract: Embodiments relate to a neural processor circuit with a scalable architecture for instantiating one or more neural networks. The neural processor circuit includes a data buffer coupled to a memory external to the neural processor circuit, and a plurality of neural engine circuits. To execute tasks that instantiate the neural networks, each neural engine circuit generates output data using input data and kernel coefficients. A neural processor circuit may include multiple neural engine circuits that are selectively activated or deactivated according to configuration data of the tasks. Furthermore, an electronic device may include multiple neural processor circuits that are selectively activated or deactivated to execute the tasks.
    Type: Grant
    Filed: May 4, 2018
    Date of Patent: December 27, 2022
    Assignee: Apple Inc.
    Inventors: Erik K. Norden, Liran Fishel, Sung Hee Park, Jaewon Shin, Christopher L. Mills, Seungjin Lee, Fernando A. Mujica
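A minimal sketch of the selective activation the abstract mentions, with a hypothetical bit-mask configuration format and a toy per-engine convolution standing in for the real datapath.

```python
ENGINES = range(8)                       # hypothetical eight-engine processor

def convolve(x, k):                      # toy stand-in for an engine's MAC array
    return sum(xi * ki for xi, ki in zip(x, k))

def run_task(config: dict, input_data, kernels):
    """Run a task only on the engines its configuration data activates."""
    active = [e for e in ENGINES if config["engine_mask"] & (1 << e)]
    return {e: convolve(input_data, kernels[e]) for e in active}

# Engines 0 and 2 active; the rest stay deactivated for this task.
print(run_task({"engine_mask": 0b101}, [1, 2, 3], {0: [1, 1, 1], 2: [2, 0, 1]}))
# -> {0: 6, 2: 5}
```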
  • Publication number: 20220398440
    Abstract: Embodiments of the present disclosure relate to circular buffers in a neural processor circuit. The neural processor circuit includes multiple neural engine circuits and a data processor circuit coupled to at least one of the neural engine circuits. The at least one neural engine circuit performs at least convolution operations. The data processor circuit includes a circular buffer, and a flow control circuit coupled to the circular buffer. The flow control circuit generates at least one addressing parameter that defines wrapping of data in the circular buffer. The circular buffer controls data flow in the neural processor circuit by storing first data associated with the at least one neural engine circuit so that the first data is wrapped around in the circular buffer. An addressing layout of the first data wrapped around in the circular buffer is defined by the at least one addressing parameter.
    Type: Application
    Filed: June 10, 2021
    Publication date: December 15, 2022
    Inventor: Christopher L. Mills
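A minimal Python sketch of the wrapped addressing described above: the buffer size acts as the addressing parameter that defines where data wraps, and the full/empty checks stand in for the flow-control circuit.

```python
class CircularBuffer:
    def __init__(self, size: int):
        self.size = size                 # addressing parameter defining the wrap
        self.data = [None] * size
        self.head = 0                    # next slot to read
        self.tail = 0                    # next slot to write
        self.count = 0

    def push(self, value) -> None:
        assert self.count < self.size, "producer must stall: buffer full"
        self.data[self.tail] = value
        self.tail = (self.tail + 1) % self.size   # wrap around
        self.count += 1

    def pop(self):
        assert self.count > 0, "consumer must stall: buffer empty"
        value = self.data[self.head]
        self.head = (self.head + 1) % self.size   # wrap around
        self.count -= 1
        return value
```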
  • Patent number: 11513799
    Abstract: Embodiments of the present disclosure relate to chained buffers in a neural processor circuit. The neural processor circuit includes multiple neural engines, a planar engine, a buffer memory, and a flow control circuit. At least one neural engine operates as a first producer of first data or a first consumer of second data. The planar engine operates as a second consumer receiving the first data from the first producer or a second producer sending the second data to the first consumer. Data flow between the at least one neural engine and the planar engine is controlled using at least a subset of buffers in the buffer memory operating as at least one chained buffer that chains flow of the first data and the second data between the at least one neural engine and the planar engine.
    Type: Grant
    Filed: November 4, 2019
    Date of Patent: November 29, 2022
    Assignee: Apple Inc.
    Inventor: Christopher L. Mills
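The sketch below illustrates the chaining idea: a neural-engine stage produces first data into one buffer, and a planar-engine stage consumes it and produces second data into a chained buffer, with occupancy gating each stage. The stage bodies (a doubling and an increment) are arbitrary stand-ins.

```python
from collections import deque

first_data = deque(maxlen=4)    # neural engine (producer) -> planar engine (consumer)
second_data = deque(maxlen=4)   # planar engine (producer) -> downstream consumer

def neural_engine_step(tile):
    first_data.append(tile * 2)           # e.g., a convolution result

def planar_engine_step():
    if first_data:                        # flow control: run only when data is ready
        second_data.append(first_data.popleft() + 1)  # e.g., an elementwise op

for t in range(4):                        # producer and consumer proceed in lockstep
    neural_engine_step(t)
    planar_engine_step()
print(list(second_data))                  # -> [1, 3, 5, 7]
```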
  • Patent number: 11487846
    Abstract: Embodiments relate to a neural processor circuit including a plurality of neural engine circuits, a data buffer, and a kernel fetcher circuit. At least one of the neural engine circuits is configured to receive matrix elements of a matrix as at least a portion of the input data from the data buffer over multiple processing cycles. The at least one neural engine circuit further receives vector elements of a vector from the kernel fetcher circuit, each of the vector elements being extracted and provided as a corresponding kernel to the at least one neural engine circuit in each of the processing cycles. The at least one neural engine circuit performs multiplication between the matrix and the vector as a convolution operation to produce at least one output channel of the output data.
    Type: Grant
    Filed: May 4, 2018
    Date of Patent: November 1, 2022
    Assignee: Apple Inc.
    Inventors: Christopher L. Mills, Erik K. Norden, Sung Hee Park
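A minimal NumPy sketch of the mapping described above: one vector element arrives per processing cycle as a 1×1 kernel and is multiplied against a column of matrix elements, accumulating into the output channel. Shapes are illustrative.

```python
import numpy as np

M = np.arange(12).reshape(3, 4)      # matrix streamed in as input data
v = np.array([1.0, 2.0, 0.5, -1.0])  # vector elements, one kernel per cycle

acc = np.zeros(3)                    # one accumulator per output row
for cycle in range(M.shape[1]):      # one vector element per processing cycle
    acc += M[:, cycle] * v[cycle]    # multiply-add, like a 1x1 convolution

assert np.allclose(acc, M @ v)       # equals the ordinary matrix-vector product
```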
  • Patent number: 11475283
    Abstract: Embodiments of the present disclosure relate to a neural engine of a neural processor circuit having multiple multiply-add circuits and an accumulator circuit coupled to the multiply-add circuits. The multiply-add circuits perform multiply-add operations of a three-dimensional convolution on a work unit of input data using a kernel to generate at least a portion of output data in a processing cycle. The accumulator circuit includes multiple batches of accumulators. Each batch of accumulators receives and stores, after the processing cycle, the portion of the output data for each output depth plane of multiple output depth planes. A corresponding batch of accumulators stores, after the processing cycle, the portion of the output data for a subset of the output channels and for each output depth plane.
    Type: Grant
    Filed: October 24, 2019
    Date of Patent: October 18, 2022
    Assignee: Apple Inc.
    Inventors: Christopher L. Mills, Sung Hee Park
  • Publication number: 20220237439
    Abstract: A neural processor includes neural engines for performing convolution operations on input data corresponding to one or more tasks to generate output data. The neural processor circuit also includes a data processor circuit that is coupled to one or more neural engines. The data processor circuit receives the output data from the neural engine and generates a branching command from the output data. The neural processor circuit further includes a task manager that is coupled to the data processor circuit. The task manager receives the branching command from the data processor circuit. The task manager enqueues one of two or more segment branches according to the received branching command. The two or more segment branches are subsequent to a pre-branch task segment that includes the pre-branch task. The task manager transmits a task from the selected one of the segment branches to the data processor circuit to perform the task.
    Type: Application
    Filed: January 22, 2021
    Publication date: July 28, 2022
    Inventors: Kenneth W. Waters, Christopher L. Mills
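A minimal sketch of the branch-enqueue behavior described in the abstract; the queue layout, the branch encoding, and the way the branching command is derived from output data are assumptions for illustration.

```python
from collections import deque

task_queue = deque(["pre_branch_task"])           # pre-branch task segment
segment_branches = {0: ["task_a1", "task_a2"],    # two or more segment branches
                    1: ["task_b1"]}

def on_branching_command(branch_id: int) -> None:
    """Enqueue the selected segment branch after the pre-branch segment."""
    task_queue.extend(segment_branches[branch_id])

# The data processor derives the command from output data; here we fake it.
branching_command = 1 if sum([3, -5]) < 0 else 0
on_branching_command(branching_command)
print(list(task_queue))   # -> ['pre_branch_task', 'task_b1']
```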
  • Publication number: 20220237438
    Abstract: A neural processor includes neural engines for performing convolution operations on input data corresponding to one or more tasks to generate output data. The neural processor also includes a data processor circuit coupled to external system memory. The data processor circuit includes a buffer for storing the output data from the neural engines. The neural processor further includes a task manager coupled to the data processor circuit. The task manager receives a context-switch task. The context-switch task specifies a switch of the data processor circuit from handling an outgoing task to an incoming task. The task manager sends configuration data of the context-switch task to cause the data processor circuit to transmit the output data corresponding to the outgoing task from the buffer to the external system memory. The data processor circuit also fetches data corresponding to the incoming task from the external system memory to the buffer.
    Type: Application
    Filed: January 22, 2021
    Publication date: July 28, 2022
    Inventors: Christopher L. Mills, Kenneth W. Waters
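The sketch below illustrates the context switch the abstract describes: the outgoing task's buffered output data is spilled to external system memory, and the incoming task's data is fetched back into the buffer. All names are illustrative assumptions.

```python
system_memory = {}                     # stand-in for external DRAM
buffer = {"task": "outgoing", "data": [1, 2, 3]}

def context_switch(incoming: str) -> None:
    # Spill: save the outgoing task's buffered output data to system memory.
    system_memory[buffer["task"]] = buffer["data"]
    # Fetch: restore (or freshly load) the incoming task's data into the buffer.
    buffer["task"] = incoming
    buffer["data"] = system_memory.get(incoming, [])

context_switch("incoming")
print(system_memory)   # -> {'outgoing': [1, 2, 3]}
print(buffer)          # -> {'task': 'incoming', 'data': []}
```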
  • Publication number: 20220222510
    Abstract: Embodiments relate to a neural engine circuit of a neural network processor circuit that performs a convolution operation on input data in a first mode and a parallel sorting operation on input data in a second mode. The neural engine circuit includes a plurality of operation circuits and an accumulator circuit coupled to the plurality of operation circuits. The plurality of operation circuits receives input data. In the first mode, the plurality of operation circuits performs multiply-add operations of a convolution on the input data using a kernel. In the second mode, the plurality of operation circuits performs a portion of a parallel sorting operation on the input data. In the first mode, the accumulator circuit receives and stores first results of the multiply-add operations. In the second mode, the accumulator circuit receives and stores second results of the parallel sorting operation.
    Type: Application
    Filed: January 13, 2021
    Publication date: July 14, 2022
    Inventors: Christopher L. Mills, Sung Hee Park
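A minimal sketch of the two modes: the same bank of operation circuits either multiply-accumulates (convolution mode) or performs the compare-exchange steps of a sorting network (sort mode). Odd-even transposition sort stands in for the unspecified parallel sorting operation.

```python
def step(values, mode, kernel=None, phase=0):
    if mode == "conv":                       # multiply-add on the input data
        return sum(v * k for v, k in zip(values, kernel))
    out = list(values)                       # sort mode: parallel compare-exchange
    for i in range(phase % 2, len(out) - 1, 2):
        if out[i] > out[i + 1]:
            out[i], out[i + 1] = out[i + 1], out[i]
    return out

data = [4, 1, 3, 2]
for phase in range(len(data)):               # n phases fully sort n values
    data = step(data, "sort", phase=phase)
print(data)                                       # -> [1, 2, 3, 4]
print(step([1, 2, 3], "conv", kernel=[1, 0, 2]))  # -> 7
```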
  • Publication number: 20220222509
    Abstract: A neural processor includes one or more neural engine circuits for performing convolution operations on input data corresponding to one or more tasks to generate output data. The neural engine circuits process the input data having a power-of-two (P2) shape. The neural processor circuit also includes a data processor circuit. The data processor circuit fetches source data having a non-power-of-two (NP2) shape. The source data may correspond to data of a machine learning model. The data processor circuit also reshapes the source data to generate reshaped source data with the P2 shape. The data processor circuit further sends the reshaped source data to the one or more neural engine circuits as the input data for performing convolution operations. In some cases, the data processor circuit may also perform padding on the source data before the source data is reshaped to the P2 shape.
    Type: Application
    Filed: January 13, 2021
    Publication date: July 14, 2022
    Inventor: Christopher L. Mills
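A minimal NumPy sketch of one plausible reading of the reshaping above: the innermost non-power-of-two (NP2) dimension is zero-padded up to the next power of two (P2) before the data is sent to the engines. The axis choice and zero padding are assumptions.

```python
import numpy as np

def next_pow2(n: int) -> int:
    return 1 << (n - 1).bit_length()

def reshape_to_p2(source: np.ndarray) -> np.ndarray:
    np2 = source.shape[-1]                # NP2 extent of the innermost dimension
    p2 = next_pow2(np2)
    pad = [(0, 0)] * (source.ndim - 1) + [(0, p2 - np2)]
    return np.pad(source, pad)            # zero-pad the NP2 shape up to P2

src = np.ones((2, 5))                     # NP2 innermost dimension (5)
print(reshape_to_p2(src).shape)           # -> (2, 8)
```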
  • Publication number: 20220156575
    Abstract: Embodiments of the present disclosure relate to a tensor access operation circuit in a neural processor circuit. The neural processor circuit further includes a data processor circuit and at least one neural engine circuit. The tensor access operation circuit indirectly accesses at least a region of a source tensor having a rank in a system memory, and maps one or more source components of the source tensor into an input tensor having another rank. The data processor circuit stores an output version of the input tensor obtained from the tensor access operation circuit and sends the output version of the input tensor as multiple units of input data to the at least one neural engine circuit. The at least one neural engine circuit performs at least convolution operations on the units of input data and at least one kernel to generate output data.
    Type: Application
    Filed: November 19, 2020
    Publication date: May 19, 2022
    Inventor: Christopher L. Mills
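The snippet below sketches the rank-changing access in NumPy terms: a region of a rank-3 source tensor is mapped into a rank-2 input tensor. The particular mapping (flattening two axes of a sliced region) is an illustrative assumption.

```python
import numpy as np

source = np.arange(2 * 3 * 4).reshape(2, 3, 4)   # rank-3 source tensor in memory
region = source[:, 1:3, :]                        # indirectly accessed region

# Map source components into an input tensor of another rank (rank 2).
input_tensor = region.reshape(-1, region.shape[-1])
print(input_tensor.shape)                         # -> (4, 4)
```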
  • Publication number: 20220138553
    Abstract: Embodiments of the present disclosure relate to a texture unit circuit in a neural processor circuit. The neural processor circuit includes a tensor access operation circuit with the texture unit circuit, a data processor circuit, and at least one neural engine circuit. The texture unit circuit fetches a source tensor from a system memory by referencing an index tensor in the system memory representing indexing information into the source tensor. The data processor circuit stores an output version of the source tensor obtained from the tensor access operation circuit and sends the output version of the source tensor as multiple units of input data to the at least one neural engine circuit. The at least one neural engine circuit performs at least convolution operations on the units of input data and at least one kernel to generate output data.
    Type: Application
    Filed: October 30, 2020
    Publication date: May 5, 2022
    Inventor: Christopher L. Mills
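A minimal NumPy sketch of the texture-style fetch: an index tensor carries the indexing information into the source tensor, and a gather produces the data handed onward as input to the engines. Shapes are illustrative.

```python
import numpy as np

source = np.array([10.0, 20.0, 30.0, 40.0])   # source tensor in system memory
index = np.array([[3, 0], [1, 1]])            # index tensor referencing source

gathered = source[index]                      # fetch source via the index tensor
print(gathered)                               # -> [[40. 10.]
                                              #     [20. 20.]]
```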
  • Publication number: 20220019875
    Abstract: Embodiments relate to a neural processor circuit that includes a kernel access circuit and multiple neural engine circuits. The kernel access circuit reads compressed kernel data from memory external to the neural processor circuit. Each neural engine circuit receives compressed kernel data from the kernel access circuit. Each neural engine circuit includes a kernel extract circuit and a kernel multiply-add (MAD) circuit. The kernel extract circuit extracts uncompressed kernel data from the compressed kernel data. The kernel MAD circuit receives the uncompressed kernel data from the kernel extract circuit and performs neural network operations on a portion of input data using the uncompressed kernel data.
    Type: Application
    Filed: September 13, 2021
    Publication date: January 20, 2022
    Inventors: Liran Fishel, Sung Hee Park, Christopher L. Mills
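The abstract does not specify the compression format, so the sketch below uses zero run-length encoding (a common choice for sparse kernels) as a stand-in for the kernel extract step.

```python
def extract_kernel(compressed):
    """Expand (is_zero_run, value) pairs into uncompressed kernel coefficients."""
    kernel = []
    for is_zero_run, value in compressed:
        if is_zero_run:
            kernel.extend([0] * value)     # 'value' counts zeros in the run
        else:
            kernel.append(value)           # literal kernel coefficient
    return kernel

print(extract_kernel([(False, 7), (True, 3), (False, -2)]))
# -> [7, 0, 0, 0, -2]
```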
  • Patent number: 11200490
    Abstract: Embodiments relate to a neural processor circuit including neural engines, a buffer, and a kernel access circuit. The neural engines perform convolution operations on input data and kernel data to generate output data. The buffer is between the neural engines and a memory external to the neural processor circuit. The buffer stores input data for sending to the neural engines and output data received from the neural engines. The kernel access circuit receives one or more kernels from the memory external to the neural processor circuit. The neural processor circuit operates in one of multiple modes, at least one of which divides a convolution operation into multiple independent convolution operations for execution by the neural engines.
    Type: Grant
    Filed: May 4, 2018
    Date of Patent: December 14, 2021
    Assignee: Apple Inc.
    Inventors: Sung Hee Park, Seungjin Lee, Christopher L. Mills
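A minimal NumPy sketch of one such mode: a multi-channel convolution is divided into independent per-channel-group convolutions, one per engine, whose results are then combined. The grouping is an illustrative assumption.

```python
import numpy as np

x = np.random.rand(8, 32)                 # 8 input channels, 32 samples each
k = np.random.rand(8, 3)                  # one 3-tap kernel per channel

def engine_conv(xg, kg):                  # one engine's independent convolution
    return sum(np.convolve(xc, kc, mode="valid") for xc, kc in zip(xg, kg))

# Divide across two engines by channel group, then combine the results.
y = engine_conv(x[:4], k[:4]) + engine_conv(x[4:], k[4:])
assert np.allclose(y, engine_conv(x, k))  # matches the undivided convolution
```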
  • Publication number: 20210319290
    Abstract: A neural processor includes one or more neural engine circuits and a planar engine circuit. The neural engine circuits can perform convolution operations of first input data with one or more kernels to generate a first output. The planar engine circuit receives second input data that corresponds to a version of the first input data. The planar engine circuit also receives third input data that includes fourth input data and fifth input data stored together in a dimension of the third input data. The planar engine circuit performs a first elementwise operation between a version of the second input data and a version of the fourth input data to generate intermediate data. The planar engine circuit performs a second elementwise operation between the intermediate data and a version of the fifth input data to generate a second output.
    Type: Application
    Filed: April 9, 2020
    Publication date: October 14, 2021
    Inventors: Christopher L. Mills, Kenneth W. Waters, Youchang Kim
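A minimal NumPy sketch of the chained elementwise flow: the third input packs two operands along one dimension, the first elementwise operation yields the intermediate data, and the second yields the output. Multiply-then-add is an assumed pairing of operations.

```python
import numpy as np

second = np.array([1.0, 2.0, 3.0])            # version of the first input data
third = np.stack([[0.5, 0.5, 0.5],            # fourth input data } packed in
                  [10.0, 20.0, 30.0]])        # fifth input data  } one dimension
fourth, fifth = third[0], third[1]

intermediate = second * fourth                # first elementwise operation
output = intermediate + fifth                 # second elementwise operation
print(output)                                 # -> [10.5 21.  31.5]
```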
  • Patent number: 11120327
    Abstract: Embodiments relate to a neural processor circuit that includes a kernel access circuit and multiple neural engine circuits. The kernel access circuit reads compressed kernel data from memory external to the neural processor circuit. Each neural engine circuit receives compressed kernel data from the kernel access circuit. Each neural engine circuit includes a kernel extract circuit and a kernel multiply-add (MAD) circuit. The kernel extract circuit extracts uncompressed kernel data from the compressed kernel data. The kernel MAD circuit receives the uncompressed kernel data from the kernel extract circuit and performs neural network operations on a portion of input data using the uncompressed kernel data.
    Type: Grant
    Filed: May 4, 2018
    Date of Patent: September 14, 2021
    Assignee: Apple Inc.
    Inventors: Liran Fishel, Sung Hee Park, Christopher L. Mills
  • Publication number: 20210271958
    Abstract: Embodiments relate to a neural processor circuit including one or more planar engine circuits that perform non-convolution operations in parallel with convolution operations performed by one or more neural engine circuits. The neural engine circuits perform the convolution operations on neural input data corresponding to one or more neural engine tasks to generate neural output data. The planar engine circuits perform non-convolution operations on planar input data corresponding to one or more planar engine tasks to generate planar output data. A data processor circuit in the neural processor circuit addresses data dependency between the one or more neural engine tasks and the one or more planar engine tasks by controlling reading of the neural output data as the planar input data by the planar engine circuits or reading of the planar output data as the neural input data by the neural engine circuits.
    Type: Application
    Filed: March 2, 2020
    Publication date: September 2, 2021
    Inventors: Christopher L. Mills, Kenneth W. Waters
  • Publication number: 20210241079
    Abstract: Embodiments relate to a neural processor that includes one or more neural engine circuits and planar engine circuits. The neural engine circuits can perform convolution operations of input data with one or more kernels to generate outputs. The planar engine circuit is coupled to the plurality of neural engine circuits. A planar engine circuit can be configured to operate in multiple modes. In an elementwise mode, the planar engine circuit may combine two tensors by performing operations element by element. The planar engine circuit may support elementwise operations for two tensors of different sizes and ranks. The planar engine circuit may perform a broadcasting operation that duplicates one or more values across one or more channels so that the smaller tensor matches the size of the larger tensor.
    Type: Application
    Filed: February 4, 2020
    Publication date: August 5, 2021
    Inventors: Christopher L. Mills, Kenneth W. Waters, Youchang Kim
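A minimal NumPy sketch of the broadcast behavior described above: the smaller tensor's values are duplicated across a channel dimension so that two tensors of different sizes can be combined element by element.

```python
import numpy as np

large = np.arange(6.0).reshape(2, 3)    # e.g., 2 channels x 3 elements
small = np.array([[10.0, 20.0, 30.0]])  # 1 channel, to be broadcast

dup = np.broadcast_to(small, large.shape)  # duplicate values across channels
print(large + dup)                         # elementwise add after broadcasting
# -> [[10. 21. 32.]
#     [13. 24. 35.]]
```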