IMAGE STREAM PIPELINE CONTROLLER FOR DEPLOYING IMAGE PRIMITIVES TO A COMPUTATION FABRIC
According to some embodiments, an image pipeline controller may determine an image stream having a plurality of image primitives to be executed. Each image primitive may be, for example, associated with an image algorithm and a set of primitive attributes. The image pipeline controller may then automatically deploy the set of image primitives to an image computation fabric based at least in part on primitive attributes.
The present application is a continuation of U.S. patent application Ser. No. 13/994,013, filed Jun. 13, 2013, which is a National Stage Entry of International PCT Application No. PCT/US2011/067487, filed Dec. 28, 2011, both of which are incorporated herein by reference.
TECHNICAL FIELD
Many devices include one or more image sensors and/or image displays, and an image processing unit may facilitate the processing of data coming from the sensor, being provided to the display, and/or otherwise being utilized by applications running on the device. For example, a smart phone might include a number of different cameras and a touch screen. The image processing unit may include an image computation fabric having a number of different components to process image information.
BACKGROUND
In some cases, the image processing unit may execute a series of image primitives to create output image data (e.g., to be sent to a touch screen) based on input image data (e.g., received from a smart phone's camera). The image primitives may be associated with an image primitive library and might include, for example, sensor primitives, calibration primitives, optics primitives, etc.
Typically, an application executing in connection with the image processing unit determines which image primitives will be executed by the various components of the image computation fabric. For example, the application might determine that a filter primitive will be executed by fixed function hardware. Such an approach, however, can have several disadvantages. For example, the application might be unaware that another application is also attempting to use the same fixed function hardware. As a result, an application may “stall” or need to wait until the fixed function hardware becomes free, and the performance of the system may be degraded.
Moreover, the substantial number and relative complexity of image primitives (and the fact that they may operate differently in connection with different components of different image execution fabrics) may result in substantial software development costs and inhibit innovation for application software developers (who may be forced to create customized software for each new platform).
The device 100 illustrated in
All systems and processes discussed herein may be embodied in program code stored on one or more non-transitory computer-readable media. Such media may include, for example, solid state Random Access Memory (“RAM”) or Read Only Memory (“ROM”) storage units. Embodiments are therefore not limited to any specific combination of hardware and software.
The image processing unit 200 may execute a series of image primitives 220 to create output image data (e.g., to be sent to a touch screen) based on input image data (e.g., received from a smart phone's camera). The image primitives 220 are associated with an image primitive library stored in an image primitive database 260 and might include, for example, sensor primitives, calibration primitives, optics primitives, lighting primitives, depth primitives, segmentation primitives, color primitives, filter primitives, and/or three dimensional depth primitives.
The set of image primitives 220 executed on the stream of image information may represent a set of resources used by an application to process the image data. For example, an imaging application might require a small set of image primitives 220 to implement specific high level algorithms, such as face recognition, gesture recognition, etc. That is, the image primitives 220 may be used together to process image data and achieve higher level goals. The image primitives 220 may represent building blocks for larger algorithms, and may be resources which must be managed and made available to multiple simultaneous imaging and visual computing applications.
A set of image primitives 220 may be associated with many different types of image algorithms, such as those associated with pixel correction, artifact removal, histogram information, scaling functions, face recognition, visual object recognition, visual scene analysis, machine vision, gesture recognition, and/or depth map calculations. Moreover, different types of image primitives 220 might be associated with, by way of example only, camera sensor format processing (Bayer Red Green Blue (RGB), Aptina™ RGB, Kodak™ RGBW, etc.), camera sensor dimensions (1080p, etc.), camera sensor frame rates, calibrations (Auto White Balance, Auto Shutter, Auto Focus, etc.), dead pixel detection and correction, lighting controls, optics controls, three dimensional depth sensor controls (structured light, stereo triangulation, etc.), color conversion (RGB, YUV, HSV, etc.), Look-Up Table (LUT) processing and value substitution, Boolean operations, segmenting an image into various component parts (foreground, background, objects, etc.), filters (sharpen, blur, median, etc.), edge detection (Sobel, Roberts, Prewitt, etc.), point operations (Pixel Math, etc.), and/or domain processing (Fourier, Haar, Karhunen-Loeve, Slant Transform, etc.).
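By way of illustration only (this sketch is not part of the original disclosure; the primitive names and categories are hypothetical), an image primitive library of the kind described above might be modeled as a simple registry mapping each primitive to an algorithm category and a default implementation type:

```python
# Illustrative sketch only; primitive names and categories are hypothetical.
PRIMITIVE_LIBRARY = {
    "bayer_demosaic":     {"category": "sensor_format",     "impl": "fixed_function"},
    "auto_white_balance": {"category": "calibration",       "impl": "fixed_function"},
    "dead_pixel_fix":     {"category": "correction",        "impl": "fixed_function"},
    "sobel_edge":         {"category": "edge_detection",    "impl": "software"},
    "haar_transform":     {"category": "domain_processing", "impl": "software"},
}

def primitives_in_category(category):
    """Return the names of library primitives belonging to one category."""
    return [name for name, info in PRIMITIVE_LIBRARY.items()
            if info["category"] == category]

print(primitives_in_category("calibration"))  # ['auto_white_balance']
```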
Typically, an application executing in connection with the image processing unit 200 determines which image primitives 220 will be executed by the various components 212, 214, 216, 218 of the image computation fabric 210. For example, the application might determine that a filter primitive will be executed by the fixed function hardware 212. Such an approach, however, can have several disadvantages. For example, the application might be unaware that another application is also attempting to use the fixed function hardware 212. As a result, an application may “stall” or need to wait until the fixed function hardware becomes free, and the performance of the system may be degraded.
Moreover, the substantial number and relative complexity of image primitives 220 (and the fact that they may operate differently in connection with different components of different image execution fabrics 210) may result in substantial software development costs and inhibit innovation for application software developers (who may be forced to create customized software for each new platform).
Thus, embodiments provided herein may provide for improved deployment of image primitives to a computation fabric. In particular,
The image pipeline controller 330 may deploy image primitives 320 (e.g., to various components of the image computation fabric 310) in a number of different ways. For example,
At 402, an image pipeline controller may determine an image stream having a plurality of image primitives to be executed, each image primitive being associated with an image algorithm and a set of primitive attributes. The image stream might be, for example, received from a video camera. At 404, the image pipeline controller may automatically deploy the set of image primitives to an image computation fabric based at least in part on primitive attributes.
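By way of illustration only (a minimal sketch, not the claimed method itself; the class names and attribute fields below are hypothetical), the operations at 402 and 404 might be approximated as a loop that reads each primitive's attributes and records a deployment target:

```python
from dataclasses import dataclass, field

@dataclass
class ImagePrimitive:
    name: str                                        # identifies the image algorithm
    attributes: dict = field(default_factory=dict)   # primitive attributes

@dataclass
class ImageStream:
    primitives: list                                 # ordered ImagePrimitive objects (step 402)

def deploy_stream(stream, free_components):
    """Step 404: map each primitive to a fabric component based on its attributes."""
    placements = {}
    for prim in stream.primitives:
        preferred = prim.attributes.get("preferred_component", "software_proxy")
        # Fall back to a software proxy when the preferred component is busy.
        target = preferred if preferred in free_components else "software_proxy"
        placements[prim.name] = target
    return placements

stream = ImageStream([ImagePrimitive("blur", {"preferred_component": "fixed_function"})])
print(deploy_stream(stream, free_components={"fixed_function"}))  # {'blur': 'fixed_function'}
```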
As used herein, a primitive “attribute” may be any information that describes aspects of the operation or execution of an image primitive. One skilled in the art will recognize that a wide range of attributes may be assigned to each primitive or group of primitives within a segment; thus, the attributes listed herein serve to illustrate the concepts of this invention and do not limit its applicability to incorporate other useful attributes besides those listed.
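For illustration only (the field names below are hypothetical and neither exhaustive nor limiting), a primitive attribute record might be represented as a small mapping the run-time framework can query:

```python
# Hypothetical attribute record for one primitive; other useful attributes
# could be added without changing the scheme.
convolution_attrs = {
    "has_fixed_function": True,    # a hardware implementation exists
    "has_software_proxy": True,    # a software emulation also exists
    "tileable": True,              # the image may be split into tiles
    "out_of_order_ok": False,      # must run in pipeline order
    "power_budget_mw": 250,        # requested power envelope
    "priority": 2,                 # relative pipeline priority
}

def allows_out_of_order(attrs):
    """Example query a pipeline controller might make against the attributes."""
    return bool(attrs.get("out_of_order_ok", False))

print(allows_out_of_order(convolution_attrs))  # False
```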
For example,
The image primitives may be stored within a primitive attribute database or other data structure and used by a compiler or translator that is accessed by a pipeline controller to interpret the attributes and execute primitives in accordance with the attributes. For example,
According to some embodiments, the image pipeline controller 630 and/or primitive attribute database 640 (or other data structure) may, at run time, read the primitive attributes of each image primitive 620 to determine the best way to run a workload within a given image computation fabric 610. For example, an image primitive 620 may be available both in fixed function hardware 612 and as a software proxy, as defined in the primitive attributes, in which case an application might choose which type should be executed to achieve a performance versus wattage target.
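By way of illustration only (a sketch with hypothetical names, not the disclosed controller itself), the choice between a fixed function unit and its software proxy might be driven by a performance-versus-wattage target read from the attributes:

```python
def choose_implementation(attrs, target="performance", fixed_function_busy=False):
    """Pick an implementation for one primitive.

    attrs is a hypothetical attribute mapping; target expresses whether the
    application currently favors performance or low wattage.
    """
    if attrs.get("has_fixed_function") and not fixed_function_busy:
        # Fixed function hardware is assumed to win on performance per watt,
        # so prefer it whenever it is available.
        return "fixed_function"
    if attrs.get("has_software_proxy"):
        # Running the proxy keeps the pipeline moving, at a higher power cost.
        return "software_proxy" if target == "performance" else "wait_for_hardware"
    return "software_only"

print(choose_implementation({"has_fixed_function": True, "has_software_proxy": True},
                            target="performance", fixed_function_busy=True))
# 'software_proxy'
```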
According to some embodiments, when a software application has not specified how to use an image primitive 620 via a primitive attribute, the image pipeline controller 630 and/or primitive attribute database 640 (or other data structure) may be used by the various components comprising the run-time framework to automatically attempt to optimize performance. According to some embodiments, the run-time framework may automatically attempt to optimize performance of primitives across a compute fabric according to a priori defined attributes of each primitive, where primitives may be grouped into segments which may be executed in-order or out-of-order according to their attributes. Moreover, as described with respect to
For example,
The sequencer component 734 may execute a sequencing algorithm to order the image primitives 720 within the image stream for an in-order image primitive execution in a pipeline sequence. According to some embodiments, the image primitives 720 may be associated with an original order, and the execution of the image primitives 720 may be performed for at least some of the image primitives 720 in an order different than the original order for an “out-of-order” primitive execution in a pipeline sequence. For example, at run time the sequencer component 734 may order the image primitives 720 to execute efficiently within the image computation fabric 710. In particular, portions of an image stream may allow out-of-order image primitive execution (and may have no dependencies), and such image primitives 720 may be candidates for parallel execution across the components of the image computation fabric 710.
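By way of illustration only (not the disclosed sequencing algorithm; the dependency representation is hypothetical), a sequencer might topologically order primitives so that independent primitives become candidates for parallel, out-of-order dispatch:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical stream: each primitive lists the primitives it depends on.
dependencies = {
    "demosaic": set(),
    "white_balance": {"demosaic"},
    "denoise": {"demosaic"},
    "sharpen": {"denoise"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()
while sorter.is_active():
    ready = list(sorter.get_ready())
    # Every primitive in 'ready' has no unmet dependencies, so these could be
    # dispatched out-of-order or in parallel across the computation fabric.
    print("dispatch batch:", ready)
    sorter.done(*ready)
```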
A resource manager and run time resource lock mechanism may be responsible for determining the availability of assets or components of the image computation fabric 710, locking assets for exclusive use by a pipeline or application, monitoring asset states, and/or freeing assets for use by other pipelines or applications. Such an approach may permit, for example, multiple simultaneous applications to use the components of the image computation fabric 710. For example,
According to some embodiments, a tile processor 836 in the image pipeline controller 830 may determine whether a tile subset of image data is to be deployed to the image computation fabric 810 based at least in part on a primitive attribute in the primitive attribute database 840. For example, a primitive attribute might indicate that a convolution image primitive in an image stream 820 can be divided into tiles that can be separately processed by components of the image computation fabric 810 (e.g., to allow for more efficient execution). That is, at run time the tile processor 836 may manage dividing an image stream 820 being sent through the pipeline into tiled regions when possible and/or specified by an application. The tiling technique may let an image be processed in smaller tiles that fit inside a cache line, enabling swap-free access to the data with few or no page faults. This may speed up performance as compared to processing each image primitive over an entire image sequentially.
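For illustration only (the tile size and helper below are hypothetical), dividing an image into cache-friendly tiles might look like the following, where each tile can then be handed to a different fabric component:

```python
def tile_regions(width, height, tile_w=64, tile_h=64):
    """Yield (x, y, w, h) rectangles covering a width-by-height image.

    Tile dimensions are chosen so a tile's pixel data can stay resident in
    cache, which is the goal described above; 64x64 is only an assumed example.
    """
    for y in range(0, height, tile_h):
        for x in range(0, width, tile_w):
            yield (x, y, min(tile_w, width - x), min(tile_h, height - y))

# A 1920x1080 frame splits into 30 x 17 = 510 tiles under these assumptions.
print(sum(1 for _ in tile_regions(1920, 1080)))  # 510
```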
According to some embodiments, a load distributor and balancer 838 in the image pipeline controller 830 may execute a load-balancing algorithm between image primitives in different image streams 820. For example, at run time the load distributor and balancer 838 may let multiple applications simultaneously use available assets in the image computation fabric 810, and a stream multiplexer may manage resource locks and resource contention issues. The load distributor and balancer 838 may also execute a workload distribution algorithm to select an image processing component to receive one of the image primitives in the image streams 820. The selection may be based on a power and performance policy, resource reservation priorities, pipeline priorities, and/or resource availability arbitration priorities. According to some embodiments, a workload distribution algorithm may reduce stalls and/or optimize for power or performance associated with execution of the image primitives in the image computation fabric 810. Thus, the load distributor and balancer 838 may spread the workload across available resources in the image computation fabric 810, to parallelize workload execution when possible. According to some embodiments, information in the primitive attribute database 840 may provide guidance for the load distributor and balancer 838.
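By way of illustration only (a sketch with hypothetical queue state, not the disclosed distribution algorithm), a load distributor might pick the least-loaded eligible component for each primitive, weighting the choice by a simple power/performance policy:

```python
def pick_component(eligible, queue_depth, policy="performance"):
    """Choose a fabric component for one primitive.

    eligible    -- component names the primitive's attributes allow
    queue_depth -- mapping of component name to pending work items
    policy      -- hypothetical knob: favor speed or favor low power
    """
    if policy == "low_power":
        # Assume fixed function units are the most power efficient choice.
        for name in eligible:
            if name.startswith("fixed_function"):
                return name
    # Otherwise spread work: take the eligible component with the shortest queue.
    return min(eligible, key=lambda name: queue_depth.get(name, 0))

depths = {"fixed_function_0": 3, "gpu_0": 1, "cpu_0": 5}
print(pick_component(["fixed_function_0", "gpu_0", "cpu_0"], depths))  # gpu_0
```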
For example, a workload distribution algorithm might select one of the fixed function hardware image processing unit 812 or a “software emulation” or proxy of the fixed function hardware image processing unit 812 based on primitive attributes and/or an image processing component status (e.g., when the fixed function hardware image processing unit 812 is in use by another application, the load distributor and balancer 838 might select to use a software proxy of that component instead).
Note that
The image streams 920 are composed of sequences of image primitives. According to some embodiments, a subset of the image primitives within a stream is associated with an image stream “segment.” For example,
The image streams 1010, 1020 of
According to some embodiments, the image stream segments may be associated with one or more image stream attributes for workload distribution, stall reduction, power optimization, performance optimization, load balancing, and/or a sequencing algorithm. Thus, a pipeline or image stream may be composed of segments, where segments are composed of sets of image primitives. Moreover, sets of primitives may be combinations of fixed function hardware primitives, software proxy emulations of the fixed function hardware that can be used when the fixed function hardware is busy, and/or “software only” primitives. Further, segments might be executed either in-order or out-of-order. According to some embodiments, image primitives, segments, and/or entire pipelines may have policy attributes such as priority, power/performance budget, memory size requests, and memory bandwidth requests. Note that a programmable segment could be provided such that it is associated with an arbitrary set of image primitives and/or an arbitrary image primitive order (e.g., to allow a customer to program an area image processing function).
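For illustration only (the field names and values are hypothetical and not limiting), a segment's policy attributes of the kind listed above might be captured as a small configuration record that the pipeline controller consults when deploying the segment:

```python
from dataclasses import dataclass

@dataclass
class SegmentPolicy:
    # Hypothetical policy attributes assigned to one image stream segment.
    priority: int = 1                   # relative pipeline priority
    in_order: bool = True               # False allows out-of-order execution
    power_budget_mw: int = 500          # requested power/performance budget
    memory_request_mb: int = 16         # requested working-set size
    bandwidth_request_mbps: int = 200   # requested memory bandwidth

face_detect_segment = SegmentPolicy(priority=3, in_order=False, power_budget_mw=800)
print(face_detect_segment.in_order)  # False: eligible for parallel dispatch
```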
Thus, a segment of an image stream may be assigned various attributes to control its execution during run time. For example,
These attributes may be used by an image pipeline controller when deploying the segment to an image computation fabric. For example,
Embodiments described herein may provide a standard software API across different execution components and/or visual computing assets associated with perceptual computing software and fixed function hardware, camera pipelines, and assets to help provide an improved user experience and performance versus wattage advantages.
According to some embodiments, a run-time framework may automatically attempt to facilitate or optimize performance of primitives across a compute fabric according to a priori defined attributes of each primitive. Moreover, according to some embodiments, primitives may be grouped into segments which might be executed in-order or out-of-order according to their attributes. Segments may be chained together into a pipeline, and the run-time framework may attempt to facilitate or optimize the workload according to the available compute resources as per the attributes defined for each primitive or segment. According to some embodiments, the facilitation or optimization might include support for multiple simultaneous applications to share the compute fabric, interleaving for resource sharing and usage by different applications, resource locking and sharing mechanisms for primitives in a compute fabric, and adjusting the behavior of the computing primitive assets, such as by adjusting a clock frequency, voltage, bus speed, processor speed, processor time slice size for threads, device and thread priorities, bus arbitration priorities, memory tile sizes, cache behavior, memory behavior, primitive implementation method (software or fixed function hardware), etc.
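By way of illustration only (the knob names and values are hypothetical; real platforms expose such controls through their own drivers), adjusting compute-asset behavior according to a policy might reduce to deriving a set of settings from the high-level target:

```python
def knobs_for_policy(policy):
    """Translate a high-level policy into hypothetical per-asset settings."""
    if policy == "performance":
        return {"clock_mhz": 800, "voltage_mv": 1100, "thread_priority": "high",
                "tile_size": 64, "implementation": "fixed_function"}
    if policy == "low_power":
        return {"clock_mhz": 400, "voltage_mv": 850, "thread_priority": "normal",
                "tile_size": 128, "implementation": "fixed_function"}
    return {"clock_mhz": 600, "voltage_mv": 950, "thread_priority": "normal",
            "tile_size": 64, "implementation": "software_proxy"}

print(knobs_for_policy("low_power")["clock_mhz"])  # 400
```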
The following illustrate various additional embodiments and do not constitute a definition of all possible embodiments, and those skilled in the art will understand that the present invention is applicable to many other embodiments. Further, although the following embodiments are briefly described for clarity, those skilled in the art will understand how to make any changes, if necessary, to the above-described apparatus and methods to accommodate these and other embodiments and applications.
Although embodiments have been described with respect to particular types of image sensors and displays, note that embodiments may be associated with other types of sensors and displays. For example, three dimensional cameras and/or displays may be supported by any of the embodiments described herein. Moreover, while embodiments have been illustrated using particular ways of processing image information, note that embodiments might instead be associated with any other sorts of image primitives and/or algorithms.
Embodiments have been described herein solely for the purpose of illustration. Persons skilled in the art will recognize from this description that embodiments are not limited to those described, but may be practiced with modifications and alterations limited only by the spirit and scope of the appended claims.
Claims
1-29. (canceled)
30. A method, comprising:
- generating, with a run-time framework, a plurality of image primitives grouped into image segments, wherein at least one of the image primitives is to create output image data;
- wherein at least one image primitive is associated with an image primitive library, the image primitives to be at least one of: (i) a histogram primitive, (ii) a scaling primitive, or (iii) a machine vision primitive; and
- deploying, by the run-time framework, the plurality of image primitives to hardware for execution.
31. The method of claim 30, wherein image segments are grouped for at least one of in-order execution or out-of-order execution.
32. The method of claim 30, wherein image segments are grouped in the run-time framework for in-order execution and out-of-order execution, wherein a first run-time framework is executed in order and a second run-time framework is executed out of order.
33. The method of claim 30, wherein the output image data is output to a display.
34. The method of claim 30, wherein the image primitives generate output image data from image input received from an image sensor.
35. The method of claim 30, wherein the generation of the run-time framework is powered by a battery power source.
36. The method of claim 30, wherein the run-time framework comprises at least one of: (i) a hardware run-time framework, (ii) a software run-time framework, or (iii) a combination of hardware and software run-time framework components.
37. The method of claim 30, wherein image primitives associated with a plurality of image segments are deployed.
38. The method of claim 30, wherein at least one of the segments is executed by an operating system, and information about the segments is associated with an application programming interface.
39. The method of claim 30, wherein the hardware for execution comprises at least one of a system on a chip, a computation fabric, a processing unit, or a fixed function hardware image processing unit.
40. The method of claim 30, further comprising executing a sequencing algorithm to order the image primitives within an image segment for an in-order image primitive execution in the run-time framework.
41. The method of claim 30, wherein the image primitives comprise an original order and said executing is performed for at least some of the image primitives in an order different than the original order for an out-of-order image primitive execution in the run-time framework.
42. The method of claim 30, wherein the output image data is produced by the image primitives before any reader of that data accesses it.
43. The method of claim 30, wherein at least one image segment comprises at least one of: (i) pixel correction, (ii) artifact removal, (iii) histogram information, (iv) a scaling function, (v) face recognition, (vi) visual object recognition, (vii) visual scene analysis, (viii) machine vision, (ix) gesture recognition, or (x) depth map calculation.
44. A non-transitory computer-readable storage on at least one medium having stored thereon instructions that when executed:
- generate, with a run-time framework, a plurality of image primitives grouped into image segments, wherein at least one of the image primitives is to create output image data;
- wherein at least one image primitive is associated with an image primitive library, the image primitives to be at least one of: (i) a histogram primitive, (ii) a scaling primitive, or (iii) a machine vision primitive; and
- deploy, by the run-time framework, the plurality of image primitives to hardware for execution.
45. The non-transitory computer-readable storage of claim 44, wherein image segments are grouped for at least one of in-order execution and out-of-order execution.
46. The non-transitory computer-readable storage of claim 44, wherein image segments are grouped in the run-time framework for in-order execution and out-of-order execution, wherein a first run-time framework is executed in order and a second run-time framework is executed out of order.
47. The non-transitory computer-readable storage of claim 44, wherein the output image data is output to a display.
48. The non-transitory computer-readable storage of claim 44, wherein the image primitives generate output image data from input received from a camera.
49. The non-transitory computer-readable storage of claim 44, wherein generation of the run-time framework is powered by a battery power source.
50. The non-transitory computer-readable storage of claim 44, wherein the run-time framework comprises at least one of: (i) a hardware run-time framework, (ii) a software run-time framework, or (iii) a combination of hardware and software run-time framework components.
51. The non-transitory computer-readable storage of claim 44, wherein image primitives associated with a plurality of image segments are deployed.
52. The non-transitory computer-readable storage of claim 44, wherein at least one of the segments is executed by an operating system, and information about the segments is associated with an application programming interface.
53. The non-transitory computer-readable storage of claim 44, wherein the hardware for execution comprises at least one of a system on a chip, a computation fabric, a processing unit, or a fixed function hardware image processing unit.
54. The non-transitory computer-readable storage of claim 44, further comprising executing a sequencing algorithm to order the image primitives within an image segment for an in-order image primitive execution in the run-time framework.
55. The non-transitory computer-readable storage of claim 44, wherein the image primitives comprise an original order and said executing is performed for at least some of the image primitives in an order different than the original order for an out-of-order image primitive execution in the run-time framework.
56. The non-transitory computer-readable storage of claim 44, wherein the output image data is produced by the image primitives before any reader of that data accesses it.
57. The non-transitory computer-readable storage of claim 44, wherein at least one image segment comprises at least one of: (i) pixel correction, (ii) artifact removal, (iii) histogram information, (iv) a scaling function, (v) face recognition, (vi) visual object recognition, (vii) visual scene analysis, (viii) machine vision, (ix) gesture recognition, or (x) depth map calculation.
58. A system, comprising:
- a processor;
- an image sensor to create input image data; and
- a memory to store instructions that when executed by the processor: generate, with a run-time framework, a plurality of image primitives grouped into image segments, wherein at least one of the image primitives is to create output image data in response to the input image data, wherein at least one image primitive is associated with an image primitive library, the image primitives to be at least one of: (i) a histogram primitive, (ii) a scaling primitive, or (iii) a machine vision primitive; and deploy, by the run-time framework, the plurality of image primitives to hardware for execution.
59. The system of claim 58, wherein image segments are grouped for at least one of in-order execution and out-of-order execution.
60. The system of claim 58, wherein image segments are grouped in the run-time framework for in-order execution and out-of-order execution, wherein a first run-time framework is executed in order and a second run-time framework is executed out of order.
61. The system of claim 58, wherein the image primitives generate output image data from input received from a camera.
62. The system of claim 58, wherein generation of the run-time framework is powered by a battery power source.
63. The system of claim 58, wherein the run-time framework comprises at least one of: (i) a hardware run-time framework, (ii) a software run-time framework, or (iii) a combination of hardware and software run-time framework components.
64. The system of claim 58, wherein image primitives associated with a plurality of image segments are deployed.
65. The system of claim 58, wherein at least one of the segments is executed by an operating system, and information about the segments is associated with an application programming interface.
66. The system of claim 58, wherein the hardware for execution comprises at least one of a system on a chip, a computation fabric, a processing unit, or a fixed function hardware image processing unit.
67. The system of claim 58, further comprising executing a sequencing algorithm to order the image primitives within an image segment for an in-order image primitive execution in the run-time framework.
68. The system of claim 58, wherein the image primitives comprise an original order and said executing is performed for at least some of the image primitives in an order different than the original order for an out-of-order image primitive execution in the run-time framework.
69. The system of claim 58, wherein the output image data is produced by the image primitives before any reader of that data accesses it.
70. The system of claim 58, wherein at least one image segment comprises at least one of: (i) pixel correction, (ii) artifact removal, (iii) histogram information, (iv) a scaling function, (v) face recognition, (vi) visual object recognition, (vii) visual scene analysis, (viii) machine vision, (ix) gesture recognition, or (x) depth map calculation.
71. A system, comprising:
- a means to process information;
- a means for creating input image data; and
- a means to store instructions that when executed by the means to process information: generate a plurality of image primitives grouped into image segments to create output image data in response to the input image data, wherein at least one image primitive is associated with an image primitive library, the image primitives to be at least one of: (i) a histogram primitive, (ii) a scaling primitive, or (iii) a machine vision primitive; and deploy the plurality of image primitives to means for execution.
72. The system of claim 71, wherein image segments are grouped for at least one of in-order execution and out-of-order execution.
73. The system of claim 71, wherein image segments are grouped for in-order execution and out-of-order execution, wherein a first means of execution executes image segments in order and a second means of execution executes image segments out-of-order.
74. The system of claim 71, wherein the image primitives generate output image data from input received from a means for capturing input.
75. The system of claim 71, wherein generation of means of execution for image segments is powered by a means of supplying power.
76. The system of claim 71, wherein at least one of the segments is executed by an operating system, and information about the segments is associated with an application programming interface.
77. The system of claim 71, wherein the means for execution comprises at least one of a system on a chip, a computation fabric, a processing unit, or a fixed function hardware image processing unit.
78. The system of claim 71, further comprising executing a sequencing algorithm to order the image primitives within an image segment for an in-order image primitive execution in means of run-time execution.
79. The system of claim 71, wherein the image primitives comprise an original order and said executing is performed for at least some of the image primitives in an order different than the original order for an out-of-order image primitive execution in means of run-time execution.
80. The system of claim 71, wherein the output image data is produced by the image primitives before any reader of that data accesses it.
81. The system of claim 71, wherein at least one image segment comprises at least one of: (i) pixel correction, (ii) artifact removal, (iii) histogram information, (iv) a scaling function, (v) face recognition, (vi) visual object recognition, (vii) visual scene analysis, (viii) machine vision, (ix) gesture recognition, or (x) depth map calculation.
Type: Application
Filed: Mar 23, 2016
Publication Date: Dec 15, 2016
Applicant: Intel Corporation (Santa Clara, CA)
Inventors: Scott A. Krig (Folsom, CA), Stewart N. Taylor (Los Altos, CA)
Application Number: 15/078,682