SYSTEMS AND METHODS FOR SMART CAPTURE TO PROVIDE INPUT AND ACTION SUGGESTIONS
Example systems and methods provide input suggestions to a user to improve user experience on user devices. The input suggestions can be information filled in from another app on the device into the present app being used by the user, information for performing a search (without the user having to copy-paste data or enter the data manually), responses to a message/notification received by the user, information/content/data to be shared between apps (without switching between apps), and emojis/GIFs that can be used by the user. The method includes analyzing one or more contents of one or more screens displayed on a device, generating at least one of a logical tree structure and a data mashup model of the one or more analyzed contents for each screen, and providing a recommendation to a user. The recommendation can be a connected action or an input suggestion.
This application is a continuation application, claiming priority under 35 U.S.C. § 365(c), of International Application No. PCT/KR2022/011893, filed on Aug. 10, 2022, which is based on and claims the benefit of an Indian Provisional Specification patent application number 202141037550, filed on Aug. 18, 2021, in the Indian Intellectual Property Office, of an Indian Provisional Specification patent application number 202141018905, filed on Aug. 23, 2021, in the Indian Intellectual Property Office, and of an Indian Complete Specification patent application number 202141018905, filed on Apr. 23, 2022, in the Indian Intellectual Property Office, the disclosure of each of which is incorporated by reference herein in its entirety.
BACKGROUND

Field

The disclosure relates to improving a user experience on user devices and, more particularly, to improving the user experience on user devices by providing input suggestions to the user and by connecting actions using the consolidation of content across the device.
Description of Related Art

Currently, a user can provide inputs using a keyboard (which can be a physical keyboard, a digital keyboard, a virtual keyboard, or the like) to various sources displayed on the user device. For example, the user of the user device can provide contact information such as a home address, phone number, email address, or bank details for multiple applications including various pages/forms. Further, the user can experience difficulty in everyday situations like filling in data entry fields, searching for information, and responding to messages. The user of the user device must copy and paste the contents or manually provide the data by referring to the contents received from other applications or pages received on the user device. The user must manually type and retype the same thing while searching or browsing across different related pages/forms included in the applications. Thus, it is difficult for the user to share relevant information or type the same information multiple times on the different pages/forms on the user device. Further, users must constantly switch between the applications/pages/forms to search for and share the same set of information multiple times.
Further, pages/forms/applications with different layouts and displays can make reviewing screen content to derive meaningful information quite challenging. In some cases, difficulties can arise from retrieving information from input fields in forms/pages that are specified to be non-editable. Text-based or view-based techniques may not work in cases where the screen includes mainly images. For example, on social media platforms, various fields such as text, images, and the like are not accessible and the contents cannot be captured, and thus it can be difficult to use such information for purposes such as Visual Question Answering (VQA) for Bixby® Commands, etc. Existing solutions, based on image analysis, use complex architectures.
Also, user devices receive multiple sources of information in the form of notifications, images, user-created data, and copied data. However, these multiple sources have not been consolidated to provide actions that permit users to accomplish tasks quickly. When multiple options are available (apps, app actions), no mechanism exists to suggest the best possible option. Further, the user's response has not been considered when providing the actions based on previously chosen options.
The actions suggested in the existing methods are based on selected content only, and an action suggestion model is trained on a remote server and pushed to the device for predicting actions. The suggestions in the prior art are generated based on search records and whitelisted websites, which does not provide dynamism in the actions. In the existing methods, the suggestions include only matching entities that are discoverable in the current screen and an entity-action pair that was mapped previously. Conventional systems contemplate looking for other similar entities (e.g., from one restaurant name to another restaurant name). Also, conventional systems transmit the actions to a secondary device through sharing, but do not consider the other device's data to modify the actions.
The issues present in the conventional mechanisms may include screenshot/image-based boundary extraction of screen content without any intelligence based on screen type. The existing methods are not suitable for use on the user device at runtime for continuously engaged screen content, such as an ongoing conversation. The inference time is 3.98 seconds on a central processing unit (CPU). The conventional systems can scale to only a limited number of content types on the user device/server, and the size of the model increases with the increase in the number of classes on the server/user device.
Also, the image analysis-based approach merely classifies a view or content type and is unable to understand the relationship between the contents on the screen. Existing methods are not suitable for use on the device at runtime for continuously engaged screen content, such as an ongoing conversation. In terms of field classification, the existing methods cannot classify the field type and can only detect whether the input field takes sensitive data. The average time for analysis per application is 5.7 seconds, which is suitable only for offline analysis of layouts for a given application and is not feasible for commercialization.
The existing prior art takes as input the application screen/response context (the layout shown on the screen) and provides as output an augmented application screen/response context (interact-able hyperlinks). The conventional systems perform intent association and analyze all available/accessible contextual inputs, including named entities (songs, movies, etc.) and phrases (times/dates, translated phrases). These conventional systems mainly analyze what is visible on the screen, for example, named entities and text displayed on the screen, but cannot provide a structured interpretation of the screen content. For example, for conversation screen understanding: what are the sent messages, what are the received messages, which message is of high priority, which message should the user respond to, etc.
Other conventional systems take as input the screen displayed on the device and provide output in the form of natural voice interaction on the user screen. These systems include a context orchestrator that analyzes the screen (the UI elements that are displayed on the screen) and builds a knowledge graph. This includes understanding of the text and images displayed on the screen, their positions, the relations between the contents, etc. These conventional systems do not provide a structured interpretation of the screen content that is visible on the screen (for example, conversation screen understanding). Also, they cannot classify fields displayed on the screen. For example, there will be multiple input fields in various applications (name, bank account number, address field, etc.); without classifying these fields, input suggestions cannot be provided.
Another conventional system models personal entities on a mobile device using embeddings and provides recommendations based on the personal entities on the screen; it identifies a personal entity on the screen and builds/updates a personal knowledge base. The personal entity modelling and the personal knowledge base are then used to provide recommendations, which include personalized assistance to a user: determining completions for input, identifying clusters or groups of similar personal entities (e.g., sports groups), suggesting advertisements, etc.
In
The issues in the
As illustrated in
Issues arise from the conventional approach in
As illustrated in
Thus, it is desired to address the above-mentioned disadvantages or other shortcomings or at least provide a useful alternative.
SUMMARY

Embodiments of the disclosure provide methods, systems, and a user device for smart capture to provide input and action suggestions.
Various example embodiments provide methods, a user device, and systems for improving a user experience on user devices by providing input suggestions to a user, wherein the input suggestions can fetch information from previously or recently accessed forms/pages/applications into the current form/page/application being used by the user. In an embodiment, the input suggestions are information for performing a search, or responses to a message/notification received by the user, without the user having to copy-paste or manually provide the data. Also, information/content/data to be shared between various forms/pages/applications can be provided without switching between various media contents. Further, suggestions can be provided to the user by means of emojis/GIFs and the like that can be used by the user, and so on.
Various example embodiments provide methods, a user device, and systems for providing connected actions by consolidation of content that the user device receives (e.g., notifications, screen data, clipboard, selected text, or the like), core app data (e.g., messages, notes, media, contacts, etc.), application activity (app actions/usage), device data, nearby devices, and user context through data mashup.
Various example embodiments relate to methods and systems for improving the user experience on user devices by providing input suggestions to the user, wherein the input suggestions can be determined based on deep screen capture.
Various example embodiments provide that, after classifying related classes (e.g., dates, contacts, accounts, etc.) by analyzing various sources such as messages, images, files, and notifications, an operation that can be linked (predicted as the user's next operation) is suggested to the user.
Various example embodiments identify duplicate data received from multiple sources, such as messages, images, files, and notifications, and construct a single entry for connected actions.
Various example embodiments provide methods and systems for connecting actions using the consolidation of content across the device.
In an embodiment, a method for providing at least one recommendation includes: collecting, by a user device, at least one data from a plurality of sources on the user device; feeding, by the user device, the collected data to a data mashup model; identifying, by the user device, a plurality of types of the data using the data mashup model; determining, by the user device, one or more relationships among the types of the data using the data mashup model; predicting, by the user device, one or more possible actions to be performed by a user as an outcome of the determined relationships using the data mashup model; and providing, by the user device, a suggestion to the user to pursue the one or more actions from the prediction.
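By way of a non-limiting illustration, the collect-identify-relate-predict flow of this embodiment may be sketched as follows. The class name, the regular-expression entity patterns, and the type-to-action mapping are all hypothetical simplifications for illustration only; a data mashup model as disclosed would typically be learned rather than hand-written.

```python
import re
from dataclasses import dataclass

@dataclass
class CapturedItem:
    source: str            # e.g. "notification", "clipboard", "message"
    text: str
    entity_type: str = ""  # filled in by identify_types below

# Hypothetical entity patterns standing in for the learned data mashup model.
PATTERNS = {
    "date":    re.compile(r"\b\d{1,2} (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\b"),
    "account": re.compile(r"\b\d{10,16}\b"),
    "email":   re.compile(r"\b[\w.]+@[\w.]+\b"),
}

def identify_types(items):
    # Step corresponding to "identifying a plurality of types of the data".
    for item in items:
        for name, pattern in PATTERNS.items():
            if pattern.search(item.text):
                item.entity_type = name
                break
    return items

def relate(items):
    # Step corresponding to "determining one or more relationships":
    # items sharing an entity type are grouped into one relationship.
    groups = {}
    for item in items:
        if item.entity_type:
            groups.setdefault(item.entity_type, []).append(item)
    return groups

def predict_actions(groups):
    # Step corresponding to "predicting one or more possible actions".
    action_for = {"date": "add to calendar",
                  "account": "fill account field",
                  "email": "compose email"}
    return [(t, action_for[t]) for t in groups if t in action_for]
```

For example, a message containing an account number and a notification containing a date would yield the suggestions "fill account field" and "add to calendar" respectively.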
In an embodiment, a method includes: analyzing, by a user device, at least one content displayed on one or more screens of the user device; generating, by the user device, at least one logical tree structure from the analyzed at least one content; detecting, by the user device, relationships and co-references between the analyzed contents by resolving anaphors and antecedents based on the at least one logical tree structure of the analyzed at least one content; detecting, by the user device, anaphors displayed on the screen of the user device; resolving, by the user device, the detected anaphors with antecedents on the screen; fetching, by the user device, candidate contents to be suggested from a knowledge base; and providing, by the user device, as a recommendation, the fetched contents for at least one input by the user of the user device.
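As a non-limiting illustration of resolving an anaphor to its antecedent on a screen, the sketch below resolves a pronoun in an ordered list of on-screen messages to the most recent preceding entity mention. This is a deliberately naive recency heuristic, and the pronoun set and entity patterns are hypothetical; the disclosure does not prescribe this particular resolution strategy.

```python
import re

# Hypothetical, minimal pronoun inventory for illustration.
PRONOUNS = {"it", "there", "that", "this"}

def resolve_anaphors(messages, entity_patterns):
    """Resolve pronouns in ordered screen messages to the most recent
    preceding entity mention (a naive recency heuristic)."""
    resolved = []
    last_entity = None
    for message in messages:
        tokens = set(re.findall(r"[a-z']+", message.lower()))
        # Check for an anaphor before updating the antecedent candidate,
        # since a pronoun normally refers back to an earlier mention.
        if PRONOUNS & tokens and last_entity:
            resolved.append((message, last_entity))
        for pattern in entity_patterns:
            match = re.search(pattern, message)
            if match:
                last_entity = match.group(0)  # new antecedent candidate
    return resolved
```

For instance, in the conversation ["Let's meet at Cafe Aroma", "See you there at 5"], the word "there" resolves to "Cafe Aroma", which can then be fetched as a candidate content (e.g., for a navigation or search input field).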
In an embodiment, a method includes: analyzing, by a user device, contents of one or more screens displayed on the user device; generating, by the user device, at least one logical tree structure of the analyzed contents for each screen; classifying, by the user device, an interest portion of the screen from the at least one logical tree structure; detecting and classifying, by the user device, at least one input field requiring user input in a screen displayed on the device; fetching, by the user device, candidate contents to fill the detected input field from the logical tree structure, based on the detected interest portion of the screen; and providing, by the user device, as a recommendation, the fetched contents for the input by the user.
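The logical tree structure and the fetching of candidate contents from it can be sketched as follows. The node schema (kind/text/children) and the choice of "message" nodes as the interest portion are hypothetical simplifications for illustration; the disclosure does not fix a concrete tree schema.

```python
class ScreenNode:
    """One node of a segmented screen tree (illustrative schema only)."""
    def __init__(self, kind, text="", children=None):
        self.kind = kind          # e.g. "screen", "message", "input_field"
        self.text = text
        self.children = children or []

def find_nodes(node, kind, out=None):
    # Depth-first traversal of the logical tree, collecting nodes of a kind.
    if out is None:
        out = []
    if node.kind == kind:
        out.append(node)
    for child in node.children:
        find_nodes(child, kind, out)
    return out

def suggest_for_fields(tree):
    # Pair each detected input field with candidate contents taken from the
    # interest portion of the tree (here, simply all message nodes).
    candidates = [n.text for n in find_nodes(tree, "message")]
    return [(field, candidates) for field in find_nodes(tree, "input_field")]
```

For example, a screen tree containing a message node "221B Baker Street" and an address input field would yield that street address as a fill candidate for the field.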
These and other aspects of the embodiments herein will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. It should be understood, however, that the following descriptions, while indicating at least one embodiment and numerous specific details thereof, are given by way of illustration and not of limitation. Many changes and modifications may be made within the scope of the embodiments herein without departing from the spirit thereof, and the embodiments herein include all such modifications.
The above and other aspects, features and advantages of certain embodiments of the present disclosure will be more apparent from the following detailed description, taken in conjunction with the accompanying drawings, in which:
The embodiments herein and the various features and advantageous details thereof are explained more fully with reference to the non-limiting embodiments that are illustrated in the accompanying drawings and detailed in the following description. Descriptions of well-known components and processing techniques are omitted so as to not unnecessarily obscure the embodiments herein. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein can be practiced and to further enable those of skill in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the embodiments herein.
The terms “user device” and “electronic device” are used interchangeably in this disclosure.
The various embodiments of the disclosure provide input suggestions or recommendations to a user to improve a user experience on user devices. Various example embodiments are described with reference to the drawings, and more particularly to
Various embodiments provide methods and systems for providing recommendations for output by a user device based on the analyzed contents on the screen of the user device. The contents can be captured from a plurality of sources displayed on the screen, wherein the plurality of sources may include, but are not limited to, social media applications such as Whatsapp®, Facebook®, notification screen data, clipboard, text selection from the web browser, device data, nearby devices, and user context. Various embodiments generate a logical tree structure of the analyzed contents for each application. Various embodiments can detect and classify one or more input fields requiring user input in an application currently being displayed on the user device. Various embodiments can automatically fetch from the logical tree structure candidate contents to be filled in the detected input fields. Therefore, the various embodiments can provide recommendations of the fetched contents for the input to the user.
The various embodiments can improve a user experience on user devices by providing input suggestions to the user, wherein the input suggestions can, for example, be information filled from another app on the user device into a present app being used by the user, information for performing a search (without the user having to copy-paste data or enter the data manually), responses to a message/notification received by the user, information/content/data to be shared between apps (without switching between apps), emojis/GIFs that can be used by the user, and so on. Various embodiments provide methods and systems for improving a user experience on user devices by providing input suggestions to the user, wherein the input suggestions can be determined based on deep screen capture. The input suggestions can be based on information extracted from image/text/data/content. The input suggestions can be based on information from at least one previous screen accessed/viewed by the user. The input suggestions can be based on information from incoming messages. The input suggestions can be based on recently accessed content/data.
The communication network 106 may include at least one of, but is not limited to, a wired network, a value-added network, a wireless network, a satellite network, or a combination thereof. Examples of the wired network may include, but are not limited to, a Local Area Network (LAN), a Wide Area Network (WAN), an Ethernet, and so on. Examples of the wireless network include, but are not limited to, a cellular network, a wireless LAN (Wi-Fi), BLUETOOTH™, Bluetooth low energy, Zigbee, Wi-Fi Direct (WFD), Ultra-wideband (UWB), infrared data association (IrDA), near field communication (NFC), and so on. In an embodiment, the user device 102, and databases 212 may be connected with each other directly and/or indirectly (for example, via direct communication, via an access point, and so on). In an embodiment, the user device 102, and the databases may be connected with each other via a relay, a hub, and a gateway. User device 102 and the databases may be connected to each other in any of various manners (including those described above) and may be connected to each other in two or more of various manners (including those described above) at the same time.
The user device 102 may, for example, be a device that enables user(s) to analyze contents on the user device 102. The contents can, for example, be from various sources of the user device 102, which may include, but are not limited to, notifications, screen data, clipboard, text selection, core application data such as messages, notes, media, and contacts, application activity, device data, nearby devices, and the like. The user device 102 can intelligently provide actions for a given input by constructing associated actions with reasoning by finding related received contents across the device.
The user device 102 can dynamically suggest future actions by considering the next set of actions/things performed by the user after consuming suggestions. The user device 102 can provide suggestions in an application by finding related data from other applications and can modify suggestions based on important events occurring at that time. The suggestions are provided using contents of similar applications, such as orders made, viewed content, and activities done at a location. Based on the proposed method, connected actions are derived from one device's data to other connected devices' data based on the user context.
The memory 202 may include, for example, at least one type of storage medium, from among a flash memory type storage medium, a hard disk type storage medium, a multi-media card micro type storage medium, a card type memory (for example, an SD or an XD memory), random-access memory (RAM), static RAM (SRAM), read-only memory (ROM), electrically erasable programmable ROM (EEPROM), programmable ROM (PROM), a magnetic memory, a magnetic disk, or an optical disk.
The memory 202 can store various sources displayed on a screen of the user device 102 which may include, but are not limited to, notification(s), screen data, media, social media application(s), contacts, clipboard(s), text selection(s), notes, media, device data, nearby device(s), and the like. The memory 202 can store an interest region of a user in performing various actions.
For example, if a user receives an email invitation, the invitation can be registered in a schedule. The registering of the invitation is an example of a “connected action.” When a user receives a message to confirm a flight reservation, the flight reservation can be connected to a restaurant near the destination location. The connecting of the flight reservation to a restaurant is an example of a second connected action. When an address mentioned in a received message is linked to the map application, the linking of the address information of the message is a further example of a connected action.
After analyzing and storing account information received by text message, when a text input field requires account information, a suggestion to enter the stored account may be provided [Input Suggestion]. For example, if a search for a movie is conducted on a particular social media platform and that social media platform does not include any reviews of the searched-for movie, the search may be automatically conducted on another social media platform by linking to that platform [Input Suggestion].
The memory 202 may also include a management module to manage contents for providing suggestions to the user. Embodiments herein may refer to a controller 214 and the management module interchangeably, wherein both the terms refer to the controller 214.
The memory 202 may also store a learning module 308 (see
Examples of the neural network of the recommendation module 312 may be, but are not limited to, an Artificial Intelligence (AI) model, a multi-class Support Vector Machine (SVM) model, a Convolutional Neural Network (CNN) model, a deep neural network (DNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), generative adversarial networks (GAN), a regression-based neural network, a deep reinforcement model (with ReLU activation), a deep Q-network, and so on. The neural network may include a plurality of nodes, which may be arranged in layers. Examples of the layers may be, but are not limited to, a convolutional layer, an activation layer, an average pool layer, a max pool layer, a concatenation layer, a dropout layer, a fully connected layer, a softmax layer, and so on. Each layer has a plurality of weight values and performs a layer operation through calculation with the output of a previous layer and an operation on the plurality of weights/coefficients. A topology of the layers of the neural network may vary based on the type of the respective network. In an example, the neural network may include an input layer, an output layer, and a hidden layer. The input layer receives a layer input and forwards the received layer input to the hidden layer. The hidden layer transforms the layer input received from the input layer into a representation, which may be used for generating the output in the output layer. The hidden layers extract useful/low-level features from the input, introduce non-linearity in the network, and reduce the feature dimension to make the features invariant to scale and translation. The nodes of the layers may be fully connected via edges to the nodes in adjacent layers.
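The propagation just described, in which each layer combines the previous layer's output with its weights, hidden layers apply a non-linearity (here ReLU), and the output layer produces normalized class scores (here softmax), can be written out in miniature. The pure-Python sketch below is illustrative only and not the implementation of the recommendation module 312; a real model would use an optimized framework.

```python
import math

def relu(vector):
    # Non-linearity applied by hidden layers.
    return [max(0.0, x) for x in vector]

def softmax(vector):
    # Numerically stable softmax for the output layer.
    m = max(vector)
    exps = [math.exp(x - m) for x in vector]
    total = sum(exps)
    return [e / total for e in exps]

def dense(x, weights, biases):
    # One layer operation: weighted sum of the previous layer's output.
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def forward(x, layers):
    # layers: list of (weights, biases); hidden layers use ReLU and the
    # final layer uses softmax, matching the description above.
    for i, (weights, biases) in enumerate(layers):
        x = dense(x, weights, biases)
        x = softmax(x) if i == len(layers) - 1 else relu(x)
    return x
```

The softmax output sums to one, so the node with the largest value can be read directly as the predicted class.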
The input received at the nodes of the input layer may be propagated to the nodes of the output layer via an activation function that calculates the states of the nodes of each successive layer in the network based on coefficients/weights respectively associated with each of the edges connecting the layers.
The recommendation module 312 (see
Here, being provided through learning means that, by applying a learning method to a plurality of learning data, a predefined operating rule or a neural network of a desired characteristic is made for the recommendation module 312. Functions of the neural network of the recommendation module 312 may be performed in the user device 102 itself, in which the learning according to an embodiment is performed, and/or may be implemented through a separate server/system (e.g., the server 110).
Returning to
The wired communicator may enable the user device 102 to communicate with the other devices using communication methods such as, but not limited to, wired LAN, Ethernet, and so on. The short-range communicator may enable the user device 102 to communicate with the other devices using communication methods such as, but not limited to, Bluetooth low energy (BLE), near field communication (NFC), WLAN (or Wi-Fi), Zigbee, infrared data association (IrDA), Wi-Fi Direct (WFD), UWB communication, Ant+ (interoperable wireless transfer capability) communication, shared wireless access protocol (SWAP), wireless broadband internet (Wibro), wireless gigabit alliance (WiGig), and so on. The mobile communicator may transmit/receive wireless signals with at least one of a base station, an external terminal, or a server on a mobile communication network/cellular network. For example, the wireless signal may include a speech call signal, a video telephone call signal, or various types of data, according to transmitting/receiving of text/multimedia messages. The broadcasting receiver may receive a broadcasting signal and/or broadcasting-related information from the outside through broadcasting channels. The broadcasting channels may include satellite channels and ground wave channels. In an embodiment, the electronic device 102 may or may not include the broadcasting receiver.
The input unit 206 (e.g., including input circuitry) may be configured to enable the user to interact with the user device 102. The input unit 206 may, for example, include a capturing unit configured to capture media contents such as notification(s), messages, clipboard contents, notes, contacts, device data and the like received by the user device 102. The capturing unit/input unit referred to herein can be any kind of device used to capture inputs (video input, image input, or any media input) from the various sources of the device.
The input unit 206 can be any kind of device used to capture media. The input unit 206 can be, but is not limited to, a digital camera, a media capturing device, a web camera, single-lens reflex (SLR) cameras, digital SLR (DSLR) cameras, mirrorless cameras, compact cameras, video recorders, digital video recorders, and the like. The media referred to herein can be, but is not limited to, video, images, and the like.
The output unit 208 (e.g., including output circuitry) may be configured to provide recommended suggestions to the user based on contents previously received by the user device 102.
For instance, an on-device AI method provides connected actions by consolidation of the content that the user receives, including device framework data (notifications, screen data, clipboard, selected text), core app data (messages, notes, media, contacts, etc.), app activity (app actions/usage), device data, nearby devices, and user context, through data mashup.
Dynamism in providing connected actions is achieved through user behavioral patterns/reasoning, i.e., the next set of things the user does after consuming/acting upon suggestions. Similar application data/usage is also considered while constructing the suggestion (user behavioral reasoning and similar application data).
An intelligent mechanism can identify duplicate data received from multiple sources, such as messages, images, files, and notifications, and construct a single entry for connected actions (multi-modal data similarity to single connected actions). The controller 214 may include one or a plurality of processors. The one or the plurality of processors may be a general-purpose processor, such as a central processing unit (CPU), an application processor (AP), or the like, a graphics-only processing unit such as a graphics processing unit (GPU) or a visual processing unit (VPU), and/or an Artificial Intelligence (AI)-dedicated processor such as a neural processing unit (NPU).
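As a non-limiting illustration of constructing a single entry from duplicates, the sketch below collapses the same content received from different sources by hashing a normalized form of the text. Hashing normalized text is an illustrative stand-in for the multi-modal data similarity described above; a real mechanism would also compare images, files, and other modalities.

```python
import hashlib
import re

def normalize(text):
    # Crude canonical form: lowercase, collapse punctuation and whitespace.
    return re.sub(r"[\W_]+", " ", text.lower()).strip()

def consolidate(entries):
    """Collapse the same content received from multiple sources into a
    single entry for connected actions (illustrative sketch only)."""
    merged = {}
    for source, text in entries:
        key = hashlib.sha1(normalize(text).encode()).hexdigest()
        if key in merged:
            merged[key]["sources"].append(source)  # duplicate: merge sources
        else:
            merged[key] = {"text": text, "sources": [source]}
    return list(merged.values())
```

For example, an event invitation arriving both as a message and as a notification collapses to one entry, so only one connected action (e.g., add to calendar) is propagated.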
Content capturing module 310 can capture the contents of the screen by a deep screen capture mechanism using a screen understanding framework that includes screen understanding and field classification on the screen. Screen understanding refers to in-depth understanding of the screen content for various screen types (conversation screen understanding, media screen understanding, etc.).
Classification module 306 can perform field classification by understanding screen content: TF-IDF based extraction by mapping the views using N-ary depth-first traversal, with field classification as a system component to enable system-wide use cases.
The Screen-Field Matrix (SFM) is a sparse matrix that stores the composite weight for each input field in each screen. The matrix column titles represent tokens (words) in each screen, the matrix row titles represent input fields, and the SFM stores the weight of each input field across different screens.
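Since most field/token pairs never co-occur, the SFM can be held as a dictionary keyed by (field, token), so absent pairs cost no memory. The class below is an illustrative sketch of that sparse representation; the method names are hypothetical.

```python
class ScreenFieldMatrix:
    """Sparse Screen-Field Matrix stored as {(field, token): weight}.
    Rows correspond to input fields, columns to screen tokens."""
    def __init__(self):
        self.weights = {}

    def update(self, field, token, weight):
        # Set the composite weight for one field/token cell.
        self.weights[(field, token)] = weight

    def score(self, field, screen_tokens):
        # Composite weight of an input field given the tokens on a screen;
        # missing cells contribute 0.0, as in any sparse matrix.
        return sum(self.weights.get((field, t), 0.0) for t in screen_tokens)
```

Scoring every known field against a new screen's tokens and taking the maximum gives a simple way to rank candidate field classifications.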
Extraction module 304 can resolve co-references within a screen and extract relationships (screen-based co-reference resolution after structured interpretation). Extraction module 304 can extract the interest region on the screen upon structured interpretation of the screen content and extract content on the screen based on the screen's identified interest region.
The learning module 308 of the neural network can be executed by the controller 214 to obtain the input from the capturing unit of the user device 102. The learning module 308 can be provided with the suggestion of a user's choice of consuming actions or inputs to the user device 102. The learning module 308 can be continuously provided with the user's choice/decision on consuming contents.
Recommendation module 312 can provide suggestions to recommend input to the users and action suggestions which can be performed by the user based on contents received or consumed by the user. The recommendation module 312 can fetch the most recently used contents based on the identified context and region of interest extracted by the device.
At step 806, the method includes identifying a plurality of types of the data using the data mashup model.
At step 808, the method includes determining one or more relationships among the types of the data using the data mashup model.
At step 810, the method includes predicting, by the user device 102, one or more possible actions to be performed by a user as an outcome of the relationships determined using the data mashup model.
At step 812, the method includes providing a suggestion to the user to pursue the one or more actions from the prediction.
Referring to
Referring to
Referring to
Referring to
As illustrated in
Therefore, by making use of a similar application's contents, such as orders made, viewed content, and activities done at a location, actions derived from one device's data are propagated to other connected devices based on the context the user is in, and actions are suggested with the best possible application through content parsing and previous data received from the applications.
Therefore, an intelligent system can differentiate the same data received from multiple sources (e.g., an invitation for an event received through a message, an image card, files, a notification, etc.) for uniqueness detection, which avoids duplicate action propagation.
As shown in the
As illustrated in
As illustrated in
As illustrated, content input management in applications includes: analyzing contents of one or more screens displayed from time to time on the device; generating a logical tree structure of the analyzed contents for each screen (Segmented Screen Tree); detecting and classifying one or more input fields requiring user input in a screen currently being displayed on the device (screen-based TF-IDF for field classification); automatically fetching, from the logical tree structure, candidate contents to fill the detected input field; and providing, as a recommendation, the fetched contents for input by the user.
As illustrated in FIG. 11B, a relationship is extracted for providing input suggestions. The deep screen capture captures the source screen and interprets the structured input by parsing notifications received and sent by the user. Further, screen-based co-reference resolution can extract relationships between different screen contents and provide an input suggestion (content) based on content extraction.
The destination screen can be a food delivery application or any other application involving the location to deliver. The field classification can classify the search field from the previous screen and provide input suggestion(s) to the destination screen.
As illustrated, field classification comprises explicit autofill hints; view heuristics such as view hints, resource ID name, text, content description, and web view; keyword heuristics such as a keyword DB and screen-based TF-IDF; and view heuristics such as parent and sibling nodes, activity component name, and screen title.
As illustrated, field classification with screen-based TF-IDF can be performed by receiving a source screen having input field(s); identifying information from the screen based on the input field (e.g., view hierarchy, hints, resource IDs, content description, HTML attributes, activity and component name, screen title, parent and sibling nodes, etc.); retrieving tags from the screen information; preparing a term and field list; dynamically preparing a sparse screen-based TF-IDF (term frequency-inverse document frequency) document-term matrix between input fields and tags; associating and updating weights iteratively for each term and field; and classifying fields on the source screen based on the screen-based TF-IDF.
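The TF-IDF step above can be sketched as follows. Each input field on the screen contributes a "document" of tags drawn from its hints, resource IDs, and neighboring nodes, and a field is labeled with the field type whose characteristic keyword scores highest. The tag lists, the keyword table, and the single-pass (non-iterative) weighting are simplifying assumptions for the example.

```python
import math
from collections import Counter

# Hypothetical mapping from field type to its characteristic keyword.
FIELD_KEYWORDS = {"address": "address", "phone": "phone", "email": "email"}

def tfidf(fields):
    """fields: {field_id: [tag, ...]} -> {field_id: {term: weight}}.
    Standard smoothed TF-IDF over the per-field tag documents."""
    n = len(fields)
    df = Counter()  # document frequency of each tag
    for tags in fields.values():
        df.update(set(tags))
    weights = {}
    for fid, tags in fields.items():
        tf = Counter(tags)
        weights[fid] = {
            t: (c / len(tags)) * math.log((1 + n) / (1 + df[t]))
            for t, c in tf.items()
        }
    return weights

def classify(fields):
    """Label each field with the type whose keyword has the top weight."""
    w = tfidf(fields)
    return {
        fid: max(FIELD_KEYWORDS,
                 key=lambda k: terms.get(FIELD_KEYWORDS[k], 0.0))
        for fid, terms in w.items()
    }

screen = {
    "f1": ["edit", "address", "delivery", "address"],
    "f2": ["edit", "phone", "number"],
    "f3": ["edit", "email", "id"],
}
print(classify(screen))  # {'f1': 'address', 'f2': 'phone', 'f3': 'email'}
```

Note how the tag "edit", which appears for every field, gets an IDF of zero and therefore cannot dominate the classification; that is the point of the inverse-document-frequency term.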
As illustrated, link creation determines, for a given input, whether it has any context associated with it, followed by any user-actionable details. Context resolution is responsible for identifying two different saved or newly received input texts as the same kind of actionable information. Link weight calculation is responsible for computing the various possible link weights based on the contexts mapped between the two different data items.
Weighted dynamic action resolution (WDAR) operates on the calculated weights, which admit various possibilities; WDAR maps to the final suggestion through context association between the data nodes.
A DNN-based machine learning model is used to extract entities from a given received input.
The maximum pooling layer performs a pooling operation that calculates the maximum, or largest, value in each patch of each feature map. The results are downsampled, or pooled, feature maps that highlight the most present feature in each patch, rather than the average presence of the feature as in average pooling. A concatenation layer takes inputs and concatenates them along a specified dimension. The softmax layer performs a mathematical function that converts a vector of numbers into a vector of probabilities, where the probability of each value is proportional to the relative scale of each value in the vector.
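The three layers described above can be illustrated with minimal pure-Python versions, assuming one-dimensional feature maps for brevity (real CNN layers operate on multi-dimensional tensors):

```python
import math

def max_pool_1d(feature_map, size=2):
    """Keep the largest value in each non-overlapping patch
    (average pooling would keep the mean instead)."""
    return [max(feature_map[i:i + size])
            for i in range(0, len(feature_map) - size + 1, size)]

def concatenate(*vectors):
    """Join the inputs along a single dimension."""
    out = []
    for v in vectors:
        out.extend(v)
    return out

def softmax(logits):
    """Convert a vector of numbers into probabilities proportional to
    the exponential of each value (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

pooled = max_pool_1d([1.0, 3.0, 2.0, 5.0])   # -> [3.0, 5.0]
probs = softmax(concatenate(pooled, [0.0]))
print(pooled, [round(p, 3) for p in probs])
```

The softmax output always sums to one, which is what makes it usable as a probability distribution over the extracted entity classes.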
In an example, the user device 102 receives the message and image including the address, phone number, and email ID, and generates the knowledge mesh for the address, phone number, and email ID using an ontology inference rules behaviour learning technique. Based on the knowledge mesh, the user device 102 provides the suggestion (e.g., Navigate, Save Contact, Send Message, or Send Email) to the user.
The user's next actions after clicking on suggested actions will be used to improve future actions. Based on the user's interaction with a given suggestion, user behavior is learned from parameters such as user action consumption (clicked), swiped away, or removed, which, along with the current context, are fed into a Bayesian model to learn the behavior.
When similar input text is received on the device, the reasoning engine will determine, based on previous history, whether providing the action will be useful, and the suggestion is made accordingly.
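The behavior-learning loop in the two paragraphs above can be sketched as a Beta-Bernoulli model per (context, action) pair, updated when the user clicks, swipes away, or removes a suggestion. The Beta(1, 1) prior and the 0.5 usefulness threshold are illustrative assumptions, not parameters from the disclosure.

```python
from collections import defaultdict

class SuggestionReasoner:
    """Learns, per (context, action), whether showing the suggestion
    tends to be consumed, via simple Bayesian counting."""
    def __init__(self):
        # Beta(1, 1) prior: (successes, failures) per (context, action)
        self.counts = defaultdict(lambda: [1, 1])

    def observe(self, context, action, consumed):
        """consumed=True for a click; False for swiped-away/removed."""
        s, f = self.counts[(context, action)]
        self.counts[(context, action)] = [s + consumed, f + (not consumed)]

    def is_useful(self, context, action, threshold=0.5):
        """Compare the posterior mean of usefulness to a threshold."""
        s, f = self.counts[(context, action)]
        return s / (s + f) > threshold

r = SuggestionReasoner()
for _ in range(3):
    r.observe("home", "Navigate", consumed=True)
r.observe("home", "Navigate", consumed=False)
print(r.is_useful("home", "Navigate"))  # 4/(4+2) > 0.5 -> True
```

The counting form keeps the model cheap enough to run on-device, and the prior prevents a single dismissal from suppressing a suggestion forever.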
Based on the proposed method, the action suggestion will be shown based on the location. Consider an example in which the user of the user device 102 orders a food item in a first food delivery application. Over a period of time, various data (e.g., cuisine name, restaurant name, application type, or the like) is stored in the first food delivery application. The various data is used for action suggestions in the applications. Further, in the messaging application, the user's friends write: "John: When you are in Hyderabad, don't forget to try out Paradise Biryani"; "Joseph: Bring these items for me from Film City when you are visiting that place". The various data from the first food delivery application, along with the type of the application and the data from the messaging application, are considered, so that when the user of the user device 102 launches a similar application (e.g., a second food delivery application) in the specified location (i.e., Film City at Hyderabad), similar actions (e.g., Paradise Biryani) will be suggested to the user. This improves the user experience.
In an example, the user of the user device 102 receives a delivery message, and the user goes to pick up the parcel, for which a PIN shall be shown. Based on the proposed method, the content/action will be routed to the smart watch, as the user may be carrying only the watch and not the smart phone.
Based on the behavioural understanding of the user, the actions will be shown to the user by consolidating the data in the user device 102. Consider an example in which the user downloads a particular type of file and, over a period of time, after downloading, chooses to open or share the file using various applications. Hence, after incoming file content is downloaded and analyzed, the respective actions (e.g., read file or share file) are suggested to the user.
Therefore, deep screen capturing can be provided as an on-device AI method for providing connected actions by consolidation of the content the user receives, including device framework data (notifications, screen data, clipboard, selected text), core app data (messages, notes, media, contacts, etc.), app activity (app actions/usage), device data, nearby devices, and the user context, through data mashup.
Dynamism in providing connected actions is achieved through the user's behavioral pattern/reasoning, i.e., the next set of things the user does after consuming or acting upon suggestions. Similar application data/usage is considered while constructing the suggestion (user behavioural reasoning and similar application data). An intelligent mechanism can identify duplicate data received from multiple sources, such as messages, images, files, and notifications, and construct a single entry for connected actions (multi-modal data similarity to single connected actions).
Hence, methods and systems provide input suggestions using deep screen capture, in which one screen understanding framework includes screen understanding, field classification on the screen, and screen intelligence (interest and relationship), along with input suggestions such as fill suggest, search suggest, response suggest, share suggest, and emoji suggest.
In an embodiment, the system and method provide structured interpretation of the screen content using a segmented screen tree. Screen understanding provides in-depth understanding of the screen content for various screen types (conversation screen understanding, media screen understanding, etc.).
In an embodiment, the system and method provide field classification with screen-based TF-IDF: field classification by understanding screen content, TF-IDF based extraction by mapping the views using N-ary depth-first traversal, and field classification as a system component to enable system-wide use cases.
In an embodiment, the system and method provide relationship extraction among the content on the screen and provide input suggestions, resolving co-references within the screen and extracting relationships (screen-based co-reference resolution, after structured interpretation).
In an embodiment, the system and method provide interest extraction on the screen and provide input suggestions, extracting an interest region on the screen upon structured interpretation of the screen content, and extracting content on the screen based on the screen's identified interest region.
Similarly, the user of the user device 102 checks multiple applications for prices when buying products, or checks different food applications for better options to eat. Showing better results in such cases improves the user experience.
The foregoing description of the specific embodiments will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation. Therefore, while the embodiments herein have been described in terms of at least one embodiment, those skilled in the art will recognize that the embodiments herein can be practiced with modification within the spirit and scope of the embodiments as described herein.
Claims
1. A method for providing at least one recommendation, the method comprising:
- collecting, by a user device, at least one data from a plurality of sources on the user device;
- feeding, by the user device, the collected data to a data mashup model;
- identifying, by the user device, a plurality of types of the data using the data mashup model;
- determining, by the user device, one or more relationships among the types of the data using the data mashup model;
- predicting, by the user device, one or more possible recommendations to be performed by a user as an outcome of the determined relationships using the data mashup model; and
- providing, by the user device, a suggestion to the user to pursue the one or more recommendations from the prediction.
2. A method for providing at least one recommendation, the method comprising:
- analyzing, by a user device, at least one content captured from a plurality of sources displayed on a screen of the user device;
- generating, by the user device, at least one logical tree structure based on the at least one analyzed content;
- detecting, by the user device, at least one input field requiring at least one user input in the plurality of sources displayed on the user device, wherein the at least one input field is classified based on at least one input type;
- fetching, by the user device, at least one candidate content from the logical tree structure, wherein the at least one candidate content is based on the at least one detected input field; and
- recommending, by the user device, the at least one fetched candidate content to a user of the user device.
3. The method as claimed in claim 2, wherein the at least one recommendation is generated by analyzing at least one content captured from a plurality of sources displayed on the user device, wherein the at least one recommendation is suggested based on at least one action performed by user of the user device, wherein the at least one logical tree structure is generated by determining one or more relationships among types of the data using a data mashup model, wherein at least one input field requiring at least one user input in the plurality of sources displayed on the user device is detected based on the outcome of the determined relationships using the data mashup model.
4. The method as claimed in claim 3, wherein the at least one recommendation is suggested by using previously generated at least one action of the user and analyzing at least one content captured on the user device.
5. The method as claimed in claim 2, wherein generating the at least one logical tree structure based on the at least one content comprises receiving at least one screen of the user device, retrieving at least one content capture event, dynamically creating a segmented screen tree, identifying based on screen type or categories, dynamically traversing the segmented screen tree using an associated identifier, and providing a structural interpretation of the screen content.
6. The method as claimed in claim 2, wherein the at least one input field is classified by identifying information from at least one input type of at least one screen of the user device, retrieving tags, and preparing at least one term and at least one field list.
7. The method as claimed in claim 6, wherein classifying the at least one input field is based on dynamically preparing a screen-field matrix, and associating and updating weights for at least one term and at least one field list.
8. The method as claimed in claim 7, wherein classifying the at least one input field on the at least one screen is based on the screen-field matrix.
9. The method as claimed in claim 2, wherein the at least one candidate content is recommended based on extracting a relationship and at least one interest on at least one screen of the user device.
10. The method as claimed in claim 9, wherein extracting the relationship and at least one interest on at least one screen is based on resolving co-references within at least one screen, extracting an interest region of at least one screen associated with the structural interpretation of at least one screen of the user device.
11. The method as claimed in claim 9, wherein extracting the relationship on at least one screen of the user device is based on identifying at least one interest region of at least one screen of the user device.
12. A user device for providing at least one recommendation, the user device comprising:
- a memory;
- a controller;
- a hardware processor, wherein the hardware processor is configured to:
- analyze at least one content captured from a plurality of sources displayed on a screen of the user device;
- generate at least one logical tree structure based on the at least one analyzed content;
- detect at least one input field requiring at least one user input in the plurality of sources displayed on the user device, wherein the at least one input field is classified based on at least one input type;
- fetch at least one candidate content from at least one logical tree structure, wherein the at least one candidate content is based on the at least one detected input field and
- recommend at least one fetched candidate content to a user of the user device.
13. The user device as claimed in claim 12, wherein the at least one recommendation is generated by analyzing at least one content captured from a plurality of sources displayed on the user device, wherein the at least one recommendation is suggested based on at least one action performed by the user.
14. The user device as claimed in claim 13, wherein the at least one recommendation is suggested by utilizing previously generated at least one action performed by the user and analyzing at least one content captured on the user device.
15. The user device as claimed in claim 12, wherein the generating of the at least one logical tree structure based on the at least one content comprises receiving at least one screen of the user device, retrieving at least one content capture event, dynamically creating a segmented screen tree, identifying based on screen type or categories, dynamically traversing the segmented screen tree using an associated identifier, and providing a structural interpretation of the screen content.
Type: Application
Filed: Oct 6, 2022
Publication Date: Apr 6, 2023
Inventors: Naresh PURRE (Bengaluru), Sriram Shashank (Bengaluru), Sri Lakshmi Punuru (Bengaluru), Barath Raj Kandur Raja (Bengaluru), Vanraj Vala (Bengaluru), Aayush Yadav (Bengaluru), Aditi Anil Kagane (Bengaluru, IN), Sudeep Kumar Kodali (Bengaluru, IN), Rishabh Kumar (Bengaluru), Srinivasa Rao Siddi (Bengaluru), Manjunath Bhimappa Ujjinakoppa (Bengaluru), Mansoor Variyathpara Mohammed (Bengaluru), Hemant Tiwari (Bengaluru), Dwaraka Bhamidipati Sreevatsa (Bengaluru), Ankita Bhardwaj (Bengaluru), Vipin Rao (Bengaluru), Likhith Amarvaj (Bengaluru), Vibhav Agarwal (Bengaluru), Yashwant Singh Saini (Bengaluru), Himanshu Arora (Bengaluru), Muthu Kumaran (Bengaluru), Seungseok Kang (Suwon-si), Sanguk Jeon (Suwon-si), Jaehoon Park (Suwon-si), Pilsik Choi (Suwon-si), Hojun Jaygarl (Suwon-si), Shweta Ratanpura (Bengaluru), Mritunjai Chandra (Bengaluru)
Application Number: 17/961,315