Patents by Inventor Christopher L. Mills

Christopher L. Mills has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11972348
    Abstract: Embodiments of the present disclosure relate to a texture unit circuit in a neural processor circuit. The neural processor circuit includes a tensor access operation circuit with the texture unit circuit, a data processor circuit, and at least one neural engine circuit. The texture unit circuit fetches a source tensor from a system memory by referencing an index tensor in the system memory representing indexing information into the source tensor. The data processor circuit stores an output version of the source tensor obtained from the tensor access operation circuit and sends the output version of the source tensor as multiple units of input data to the at least one neural engine circuit. The at least one neural engine circuit performs at least convolution operations on the units of input data and at least one kernel to generate output data.
    Type: Grant
    Filed: October 30, 2020
    Date of Patent: April 30, 2024
    Assignee: Apple Inc.
    Inventor: Christopher L. Mills
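The indexed fetch described in the abstract above is essentially a gather operation. Below is a minimal NumPy sketch of that pattern; the shapes and names are made up for illustration and do not reflect the texture unit circuit's actual datapath.

```python
import numpy as np

# Gather sketch: the index tensor names which rows of the source tensor
# to fetch. Shapes are illustrative assumptions, not from the patent.
source = np.arange(24, dtype=np.float32).reshape(6, 4)  # source tensor in "system memory"
index = np.array([[0, 2], [5, 1]])                      # index tensor into source rows

fetched = source[index]   # indexed fetch: shape (2, 2, 4)
print(fetched.shape)
```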
  • Patent number: 11934941
    Abstract: A neural processor circuit includes one or more planar engine circuits that perform non-convolution operations in parallel with convolution operations performed by one or more neural engine circuits. The neural engine circuits perform the convolution operations on neural input data corresponding to one or more neural engine tasks to generate neural output data. The planar engine circuits perform non-convolution operations on planar input data corresponding to one or more planar engine tasks to generate planar output data. A data processor circuit in the neural processor circuit addresses data dependency between the one or more neural engine tasks and the one or more planar engine tasks by controlling reading of the neural output data as the planar input data by the planar engine circuits or reading of the planar output data as the neural input data by the neural engine circuits.
    Type: Grant
    Filed: November 17, 2022
    Date of Patent: March 19, 2024
    Assignee: Apple Inc.
    Inventors: Christopher L. Mills, Kenneth W. Waters
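The dependency handling described in the abstract above can be pictured as a gatekeeper between producer and consumer tasks. The toy Python model below illustrates that gating under assumed names; the task bodies are stand-ins, not the real engine operations.

```python
# Toy model of the data processor's dependency control: a planar task may
# read a neural task's output only once it has been committed (and vice
# versa). All names here are hypothetical.
buffers = {}   # tensor name -> produced data
ready = set()  # tensor names whose producer task has finished

def run_neural_task(out_name, data):
    buffers[out_name] = [x * 2 for x in data]  # stand-in for convolution
    ready.add(out_name)                        # output now safe to read

def run_planar_task(in_name):
    if in_name not in ready:
        raise RuntimeError(f"{in_name} not produced yet; task must wait")
    return [x + 1 for x in buffers[in_name]]   # stand-in for a non-convolution op

run_neural_task("act0", [1, 2, 3])
print(run_planar_task("act0"))  # [3, 5, 7]
```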
  • Publication number: 20240037399
    Abstract: Embodiments of the present disclosure relate to a texture unit circuit in a neural processor circuit. The neural processor circuit includes a tensor access operation circuit with the texture unit circuit, a data processor circuit, and at least one neural engine circuit. The texture unit circuit fetches a source tensor from a system memory by referencing an index tensor in the system memory representing indexing information into the source tensor. The data processor circuit stores an output version of the source tensor obtained from the tensor access operation circuit and sends the output version of the source tensor as multiple units of input data to the at least one neural engine circuit. The at least one neural engine circuit performs at least convolution operations on the units of input data and at least one kernel to generate output data.
    Type: Application
    Filed: October 10, 2023
    Publication date: February 1, 2024
    Applicant: Apple Inc.
    Inventor: Christopher L. Mills
  • Publication number: 20240028894
    Abstract: Embodiments of the present disclosure relate to splitting input data into smaller units for loading into a data buffer and neural engines in a neural processor circuit for performing neural network operations. Input data of a large size is split into slices, and each slice is further split into tiles. Each tile is uploaded from an external source to a data buffer inside the neural processor circuit but outside the neural engines, and is then split into work units sized for storing in an input buffer circuit inside each neural engine. The input data stored in the data buffer and the input buffer circuit is reused by the neural engines to reduce re-fetching of input data. The splitting of the input data is performed at various components of the neural processor circuit under the management of rasterizers provided in these components.
    Type: Application
    Filed: July 27, 2023
    Publication date: January 25, 2024
    Applicant: Apple Inc.
    Inventor: Christopher L. Mills
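The slice/tile/work-unit hierarchy in the abstract above amounts to recursive chunking of a large tensor. The sketch below illustrates that hierarchy with invented sizes; the real sizes depend on the data buffer and engine input buffer capacities, which the abstract does not specify.

```python
import numpy as np

# Slice -> tile -> work-unit splitting with illustrative chunk sizes.
def split(arr, size, axis):
    """Split arr along axis into chunks of at most `size` elements."""
    return np.array_split(arr, range(size, arr.shape[axis], size), axis=axis)

input_data = np.zeros((64, 64), dtype=np.float32)  # large input in external memory
for slc in split(input_data, 32, axis=0):          # slices
    for tile in split(slc, 16, axis=1):            # tiles fit the data buffer
        work_units = split(tile, 8, axis=0)        # work units fit an engine's input buffer
print(len(work_units))  # 4 work units in the last tile
```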
  • Patent number: 11880757
    Abstract: Embodiments relate to a neural engine circuit that includes an input buffer circuit, a kernel extract circuit, and a multiply-accumulator (MAC) circuit. The MAC circuit receives input data from the input buffer circuit and a kernel coefficient from the kernel extract circuit. The MAC circuit contains several multiply-add (MAD) circuits and accumulators used to perform neural networking operations on the received input data and kernel coefficients. MAD circuits are configured to support fixed-point precision (e.g., INT8) and floating-point precision (FP16) of operands. In floating-point mode, each MAD circuit multiplies the integer bits of input data and kernel coefficients and adds their exponent bits to determine a binary point for alignment. In fixed-point mode, input data and kernel coefficients are multiplied. In both operation modes, the output data is stored in an accumulator, and may be sent back as accumulated values for further multiply-add operations in subsequent processing cycles.
    Type: Grant
    Filed: January 11, 2023
    Date of Patent: January 23, 2024
    Assignee: Apple Inc.
    Inventor: Christopher L. Mills
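A rough software model of the dual-mode multiply-add (MAD) stage described above follows. In hardware, the floating-point path multiplies mantissas and adds exponents to align the binary point; plain float16 arithmetic stands in for that step here, so this is an illustration of the two modes, not the circuit's datapath.

```python
import numpy as np

# Dual-mode MAD sketch: one stage serving INT8 fixed-point and FP16
# floating-point operands. The function and mode names are assumptions.
def mad(x, w, acc, mode="int8"):
    if mode == "int8":
        return acc + np.int32(x) * np.int32(w)  # widen to avoid overflow
    # FP16 path: float16 multiply-then-accumulate models the hardware's
    # mantissa multiply and exponent-based alignment.
    return np.float32(acc) + np.float32(np.float16(x)) * np.float32(np.float16(w))

print(mad(np.int8(3), np.int8(-5), np.int32(0)))                # -15
print(mad(np.float16(1.5), np.float16(2.0), 0.0, mode="fp16"))  # 3.0
```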
  • Patent number: 11853868
    Abstract: Embodiments of the present disclosure relate to a neural engine of a neural processor circuit having multiple multiply-add circuits and an accumulator circuit coupled to the multiply-add circuits. The multiply-add circuits perform multiply-add operations of a three dimensional convolution on a work unit of input data using a kernel to generate at least a portion of output data in a processing cycle. The accumulator circuit includes multiple batches of accumulators. Each batch of accumulators receives and stores, after the processing cycle, the portion of the output data for each output depth plane of multiple output depth planes. A corresponding batch of accumulators stores, after the processing cycle, the portion of the output data for a subset of the output channels and for each output depth plane.
    Type: Grant
    Filed: September 14, 2022
    Date of Patent: December 26, 2023
    Assignee: Apple Inc.
    Inventors: Christopher L. Mills, Sung Hee Park
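The accumulator organization in the abstract above can be modeled as a small grid of running sums, one batch per output depth plane. The counts in this sketch are invented for illustration.

```python
import numpy as np

# Toy model of batched accumulators: one batch per output depth plane, each
# holding partial sums for a subset of output channels. Sizes are made up.
DEPTH_PLANES, CHANNELS_PER_BATCH = 4, 8
accumulators = np.zeros((DEPTH_PLANES, CHANNELS_PER_BATCH), dtype=np.float32)

def commit_cycle(partial_output):
    """Add one processing cycle's partial 3-D convolution output."""
    accumulators[:] += partial_output  # each batch stores its plane's portion

commit_cycle(np.ones((DEPTH_PLANES, CHANNELS_PER_BATCH), dtype=np.float32))
print(accumulators.sum())  # 32.0 after one cycle
```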
  • Patent number: 11783174
    Abstract: Embodiments of the present disclosure relate to splitting input data into smaller units for loading into a data buffer and neural engines in a neural processor circuit for performing neural network operations. Input data of a large size is split into slices, and each slice is further split into tiles. Each tile is uploaded from an external source to a data buffer inside the neural processor circuit but outside the neural engines, and is then split into work units sized for storing in an input buffer circuit inside each neural engine. The input data stored in the data buffer and the input buffer circuit is reused by the neural engines to reduce re-fetching of input data. The splitting of the input data is performed at various components of the neural processor circuit under the management of rasterizers provided in these components.
    Type: Grant
    Filed: May 4, 2018
    Date of Patent: October 10, 2023
    Assignee: Apple Inc.
    Inventor: Christopher L. Mills
  • Publication number: 20230289291
    Abstract: A neural processor may include a system memory access circuit coupled to a system memory. The system memory access circuit is configured to fetch, from the system memory, first input data of a first task associated with a neural network. The neural processor may also include neural engines coupled to the system memory access circuit. The neural engines are configured to perform convolution operations on the first input data in a first set of operating cycles. The neural processor may further include a cache access circuit coupled to a cache. The cache access circuit is configured to instruct the cache to prefetch from the system memory, during the first set of operating cycles corresponding to the first task, second input data of a second task of the neural network. The second task is scheduled for processing in a second set of operating cycles after the first set of operating cycles.
    Type: Application
    Filed: March 10, 2022
    Publication date: September 14, 2023
    Inventors: Seungjin Lee, Jaewon Shin, Christopher L. Mills
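The prefetch scheme above overlaps the next task's memory traffic with the current task's compute. The sketch below illustrates that overlap with a background worker; fetch() and compute() are hypothetical stand-ins, not the neural processor's real interface.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Task-level prefetch sketch: while task N computes, a cache access path
# pulls task N+1's input so it is resident before it is needed.
def fetch(task):
    time.sleep(0.01)  # simulate a system-memory read into the cache
    return f"inputs-for-{task}"

def compute(task, inputs):
    return f"{task} computed with {inputs}"

with ThreadPoolExecutor(max_workers=1) as cache:
    current, inputs = "task0", fetch("task0")     # demand fetch for the first task
    for nxt in ("task1", "task2"):
        prefetch = cache.submit(fetch, nxt)       # overlaps the current compute
        print(compute(current, inputs))
        current, inputs = nxt, prefetch.result()  # prefetched data now resident
    print(compute(current, inputs))
```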
  • Publication number: 20230229902
    Abstract: Embodiments relate to a neural engine circuit of a neural network processor circuit that performs a parallel sorting operation on input data. The neural engine circuit includes operation circuits and an accumulator circuit coupled to the outputs of the operation circuits. Each of the operation circuits operates in parallel and is configured to compare a field of a first record of a first set of records and a corresponding field of a second record of a second set of records to generate a comparison result on values in the field and the corresponding field. The accumulator circuit includes a record store storing records that are involved in the parallel sorting operation and a sideband register that stores the comparison results generated by the operation circuits.
    Type: Application
    Filed: January 19, 2022
    Publication date: July 20, 2023
    Inventor: Christopher L. Mills
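The parallel comparison above maps naturally onto a compare-exchange stage of a sorting network. The vectorized sketch below shows one such stage over a key field; the toy values, the choice of key field, and the "sideband" array are illustrative assumptions.

```python
import numpy as np

# One parallel compare-exchange stage: every operation circuit compares one
# record pair's key field at once; the comparison bits form a "sideband"
# that drives the exchange.
keys_a = np.array([3, 9, 1, 7])          # key field of the first record set
keys_b = np.array([5, 2, 8, 4])          # key field of the second record set

sideband = keys_a > keys_b               # all comparisons happen in parallel
lo = np.where(sideband, keys_b, keys_a)  # per-pair minimum
hi = np.where(sideband, keys_a, keys_b)  # per-pair maximum
print(sideband, lo, hi)
```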
  • Publication number: 20230206051
    Abstract: Embodiments relate to a neural processor that includes one or more neural engine circuits and planar engine circuits. The neural engine circuits can perform convolution operations of input data with one or more kernels to generate outputs. The planar engine circuit is coupled to the plurality of neural engine circuits. A planar engine circuit can be configured to operate in multiple modes. In an elementwise mode, the planar engine circuit may combine two tensors by performing operations element by element. The planar engine circuit may support elementwise operations on two tensors of different sizes and ranks. The planar engine circuit may perform a broadcasting operation that duplicates one or more values across one or more channels so that the smaller tensor matches the size of the larger tensor.
    Type: Application
    Filed: March 10, 2023
    Publication date: June 29, 2023
    Inventors: Christopher L. Mills, Kenneth W. Waters, Youchang Kim
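The broadcasting behavior described above matches the familiar array-broadcasting idiom, illustrated below with invented shapes.

```python
import numpy as np

# Broadcasting sketch for the elementwise mode: a small per-channel tensor
# is duplicated across the spatial dimensions of a larger tensor so the two
# can be combined element by element. Shapes are illustrative.
feature_map = np.random.rand(8, 4, 4).astype(np.float32)       # (channels, h, w)
per_channel = np.arange(8, dtype=np.float32).reshape(8, 1, 1)  # smaller tensor

combined = feature_map + per_channel  # values duplicated across h and w
print(combined.shape)                 # (8, 4, 4)
```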
  • Publication number: 20230169308
    Abstract: Embodiments relate to a neural engine circuit that includes an input buffer circuit, a kernel extract circuit, and a multiply-accumulator (MAC) circuit. The MAC circuit receives input data from the input buffer circuit and a kernel coefficient from the kernel extract circuit. The MAC circuit contains several multiply-add (MAD) circuits and accumulators used to perform neural networking operations on the received input data and kernel coefficients. MAD circuits are configured to support fixed-point precision (e.g., INT8) and floating-point precision (FP16) of operands. In floating-point mode, each MAD circuit multiplies the integer bits of input data and kernel coefficients and adds their exponent bits to determine a binary point for alignment. In fixed-point mode, input data and kernel coefficients are multiplied. In both operation modes, the output data is stored in an accumulator, and may be sent back as accumulated values for further multiply-add operations in subsequent processing cycles.
    Type: Application
    Filed: January 11, 2023
    Publication date: June 1, 2023
    Inventor: Christopher L. Mills
  • Publication number: 20230169316
    Abstract: Embodiments of the present disclosure relate to indexing in a neural processor circuit. The neural processor circuit includes multiple neural engine circuits and a data processor circuit directly coupled to at least one of the neural engine circuits. The at least one neural engine circuit performs a convolution operation on input data to generate output data. The data processor circuit includes a buffer memory and an indexing circuit coupled to the buffer memory. The buffer memory stores an index tensor and the output data as a source tensor. The indexing circuit fetches a portion of the source tensor from the buffer memory by referencing the index tensor representing indexing information into the portion of the source tensor.
    Type: Application
    Filed: November 30, 2021
    Publication date: June 1, 2023
    Inventor: Christopher L. Mills
  • Publication number: 20230128047
    Abstract: Embodiments of the present disclosure relate to binary comparison operations (e.g., Boolean operations) and reduction operations in a neural processor circuit to enable implementation of conditional operations without software control. The neural processor circuit includes a neural engine circuit and a planar engine circuit coupled to the neural engine circuit. The neural engine circuit performs a convolution operation to generate output data. The planar engine circuit includes a binary comparator circuit and a filter circuit coupled to the binary comparator circuit. The binary comparator circuit performs a binary comparison operation on a tensor from the output data to generate a conditional tensor. The filter circuit performs a reduction operation for each patch of the conditional tensor to generate a respective reduced value of multiple reduced values associated with a corresponding channel of multiple channels of the conditional tensor.
    Type: Application
    Filed: October 25, 2021
    Publication date: April 27, 2023
    Inventor: Christopher L. Mills
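The compare-then-reduce pipeline above can be sketched in a few array operations. The threshold, patch size, and the "any" reduction below are illustrative choices, not values from the patent.

```python
import numpy as np

# Compare-then-reduce sketch: a binary comparison turns the convolution
# output into a conditional (boolean) tensor, then each non-overlapping
# 2x2 patch is reduced to one value per channel.
output = np.random.rand(2, 4, 4)              # (channels, h, w)
conditional = output > 0.5                    # binary comparison -> conditional tensor

patches = conditional.reshape(2, 2, 2, 2, 2)  # (c, patch_rows, 2, patch_cols, 2)
reduced = patches.any(axis=(2, 4))            # one reduced value per patch, per channel
print(reduced.shape)                          # (2, 2, 2)
```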
  • Publication number: 20230121448
    Abstract: Embodiments of the present disclosure relate to a reduction operation in a neural processor circuit where results of the reduction operation are retained for multiple post-processing operations. The neural processor circuit includes neural engine circuits and a planar engine circuit coupled to the neural engine circuits. At least one neural engine circuit performs a convolution operation to generate output data. The planar engine circuit includes a filter circuit and a line buffer coupled to the filter circuit. The filter circuit performs a reduction operation for each patch of a tensor from the output data to generate a respective reduced value associated with a corresponding channel of the tensor. The line buffer stores reduced values each being associated with a respective channel of the tensor. The line buffer retains the reduced values for a defined number of operating cycles as indicated by a refresh flag defining resetting of the line buffer.
    Type: Application
    Filed: October 19, 2021
    Publication date: April 20, 2023
    Inventors: Christopher L. Mills, Youchang Kim
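The retention behavior above can be modeled as a buffer that persists across cycles until a refresh flag clears it. The max reduction and cycle protocol in this sketch are illustrative assumptions.

```python
import numpy as np

# Retained-reduction sketch: per-channel reduced values persist in a line
# buffer across operating cycles until a refresh flag resets it.
line_buffer = None

def reduce_cycle(tensor, refresh):
    global line_buffer
    reduced = tensor.max(axis=(1, 2))  # one reduced value per channel
    if refresh or line_buffer is None:
        line_buffer = reduced          # refresh flag: reset the buffer
    else:
        line_buffer = np.maximum(line_buffer, reduced)  # retain across cycles
    return line_buffer

for cycle in range(3):
    reduce_cycle(np.random.rand(4, 8, 8), refresh=(cycle == 0))
print(line_buffer.shape)  # (4,) -- one retained value per channel
```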
  • Patent number: 11630991
    Abstract: Embodiments relate to a neural processor that includes one or more neural engine circuits and planar engine circuits. The neural engine circuits can perform convolution operations of input data with one or more kernels to generate outputs. The planar engine circuit is coupled to the plurality of neural engine circuits. A planar engine circuit can be configured to operate in multiple modes. In an elementwise mode, the planar engine circuit may combine two tensors by performing operations element by element. The planar engine circuit may support elementwise operations on two tensors of different sizes and ranks. The planar engine circuit may perform a broadcasting operation that duplicates one or more values across one or more channels so that the smaller tensor matches the size of the larger tensor.
    Type: Grant
    Filed: February 4, 2020
    Date of Patent: April 18, 2023
    Assignee: Apple Inc.
    Inventors: Christopher L. Mills, Kenneth W. Waters, Youchang Kim
  • Publication number: 20230099652
    Abstract: Embodiments relate to a neural processor circuit with scalable architecture for instantiating one or more neural networks. The neural processor circuit includes a data buffer coupled to a memory external to the neural processor circuit, and a plurality of neural engine circuits. To execute tasks that instantiate the neural networks, each neural engine circuit generates output data using input data and kernel coefficients. A neural processor circuit may include multiple neural engine circuits that are selectively activated or deactivated according to configuration data of the tasks. Furthermore, an electronic device may include multiple neural processor circuits that are selectively activated or deactivated to execute the tasks.
    Type: Application
    Filed: November 21, 2022
    Publication date: March 30, 2023
    Inventors: Erik Norden, Liran Fishel, Sung Hee Park, Jaewon Shin, Christopher L. Mills, Seungjin Lee, Fernando A. Mujica
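The selective activation described above amounts to config-driven gating of compute units. The toy model below uses a hypothetical bitmask task format and engine count to show the idea.

```python
# Config-driven scaling sketch: each task's configuration data selects
# which engine circuits participate; the rest stay deactivated. The mask
# format and engine count are hypothetical.
NUM_ENGINES = 8

def run_task(task):
    active = [e for e in range(NUM_ENGINES) if task["engine_mask"] & (1 << e)]
    # Each active engine would generate output data from input data and
    # kernel coefficients; inactive engines are skipped entirely.
    return active

print(run_task({"name": "conv1", "engine_mask": 0b00001111}))  # engines 0-3 only
```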
  • Publication number: 20230081023
    Abstract: Embodiments relate to a neural processor circuit including one or more planar engine circuits that perform non-convolution operations in parallel with convolution operations performed by one or more neural engine circuits. The neural engine circuits perform the convolution operations on neural input data corresponding to one or more neural engine tasks to generate neural output data. The planar engine circuits perform non-convolution operations on planar input data corresponding to one or more planar engine tasks to generate planar output data. A data processor circuit in the neural processor circuit addresses data dependency between the one or more neural engine tasks and the one or more planar engine tasks by controlling reading of the neural output data as the planar input data by the planar engine circuits or reading of the planar output data as the neural input data by the neural engine circuits.
    Type: Application
    Filed: November 17, 2022
    Publication date: March 16, 2023
    Inventors: Christopher L. Mills, Kenneth W. Waters
  • Patent number: 11604975
    Abstract: A neural processor includes one or more neural engine circuits and a planar engine circuit. The neural engine circuits can perform convolution operations of first input data with one or more kernels to generate a first output. The planar engine circuit receives second input data that corresponds to a version of the first input data. The planar engine circuit also receives third input data that includes fourth input data and fifth input data stored together in a dimension of third input data. The planar engine circuit performs a first elementwise operation between a version of the second input data and a version of the fourth input data to generate intermediate data. The planar engine circuit performs a second elementwise operation between the intermediate data and a version of the fifth input data to generate a second output.
    Type: Grant
    Filed: April 9, 2020
    Date of Patent: March 14, 2023
    Assignee: Apple Inc.
    Inventors: Christopher L. Mills, Kenneth W. Waters, Youchang Kim
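One common reading of the fused third input above is a scale and a bias packed along one dimension, enabling y = x * a + b as two back-to-back elementwise operations. The sketch below uses that reading as an illustrative assumption; the claim is not limited to it.

```python
import numpy as np

# Fused elementwise pair sketch: the "third" input packs two operands
# ("fourth" and "fifth") along one dimension; two chained elementwise
# operations consume them. Shapes and the scale/bias interpretation are
# illustrative assumptions.
second = np.random.rand(8, 4, 4).astype(np.float32)  # version of the first input
third = np.stack([np.full((8, 1, 1), 2.0),           # fourth input: scale
                  np.full((8, 1, 1), 0.5)])          # fifth input: bias

intermediate = second * third[0]  # first elementwise operation
result = intermediate + third[1]  # second elementwise operation
print(result.shape)               # (8, 4, 4)
```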
  • Patent number: 11599780
    Abstract: A neural processor circuit includes one or more planar engine circuits that perform non-convolution operations in parallel with convolution operations performed by one or more neural engine circuits. The neural engine circuits perform the convolution operations on neural input data corresponding to one or more neural engine tasks to generate neural output data. The planar engine circuits perform non-convolution operations on planar input data corresponding to one or more planar engine tasks to generate planar output data. A data processor circuit in the neural processor circuit addresses data dependency between the one or more neural engine tasks and the one or more planar engine tasks by controlling reading of the neural output data as the planar input data by the planar engine circuits or reading of the planar output data as the neural input data by the neural engine circuits.
    Type: Grant
    Filed: March 2, 2020
    Date of Patent: March 7, 2023
    Assignee: Apple Inc.
    Inventors: Christopher L. Mills, Kenneth W. Waters
  • Patent number: 11580353
    Abstract: Embodiments relate to a neural engine circuit that includes an input buffer circuit, a kernel extract circuit, and a multiply-accumulator (MAC) circuit. The MAC circuit receives input data from the input buffer circuit and a kernel coefficient from the kernel extract circuit. The MAC circuit contains several multiply-add (MAD) circuits and accumulators used to perform neural networking operations on the received input data and kernel coefficients. MAD circuits are configured to support fixed-point precision (e.g., INT8) and floating-point precision (FP16) of operands. In floating-point mode, each MAD circuit multiplies the integer bits of input data and kernel coefficients and adds their exponent bits to determine a binary point for alignment. In fixed-point mode, input data and kernel coefficients are multiplied. In both operation modes, the output data is stored in an accumulator, and may be sent back as accumulated values for further multiply-add operations in subsequent processing cycles.
    Type: Grant
    Filed: May 4, 2018
    Date of Patent: February 14, 2023
    Assignee: Apple Inc.
    Inventor: Christopher L. Mills