Abstract: Methods and systems for building a universal, always-on, multimodal identification system. A universal representation is learned from a dataset that is agnostic to the targeted user or object. This representation can be used to execute one or more tasks on data with one or more signal modalities, and it incorporates fusion of the modality signals at various levels. The universal representation is combined with a second-stage, task-specific representation that is learned on the device from the particular user's data, without sending that data to the cloud.
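The two-stage split can be sketched in Python under loudly stated assumptions. The "universal" encoder below is a toy whose weights are seeded random numbers, standing in for a model pretrained on a user-agnostic dataset. The names `universal_embed` and `OnDeviceIdentifier`, the two-modality (audio, image) input, the element-wise-product fusion, and the nearest-centroid second stage are all hypothetical illustrations, not the claimed method. Stage 1 stays frozen. Stage 2 is fitted only from enrollment samples held in the device's own memory.

```python
import math
import random

# Hypothetical frozen "universal" encoder weights. In the described system these
# would come from pretraining on a user-agnostic dataset; here they are seeded
# random projections so the sketch is self-contained.
_RNG = random.Random(0)
DIM_IN, DIM_OUT = 4, 3
W_AUDIO = [[_RNG.gauss(0, 1) for _ in range(DIM_IN)] for _ in range(DIM_OUT)]
W_IMAGE = [[_RNG.gauss(0, 1) for _ in range(DIM_IN)] for _ in range(DIM_OUT)]


def _project(w, x):
    """Matrix-vector product followed by tanh (a toy per-modality encoder)."""
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in w]


def universal_embed(audio, image):
    """Stage 1 (frozen): combine modalities at two levels.

    - modality level: each signal gets its own embedding
    - fusion level: an element-wise product mixes the two embeddings
    The concatenation of all three is the universal representation.
    """
    a = _project(W_AUDIO, audio)
    v = _project(W_IMAGE, image)
    fused = [ai * vi for ai, vi in zip(a, v)]
    return a + v + fused


def _cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


class OnDeviceIdentifier:
    """Stage 2 (learned locally): a nearest-centroid head over frozen embeddings.

    Enrollment embeddings are reduced to per-label centroids kept in this
    object; no network calls are made. Nearest-centroid matching is an assumed
    stand-in for whatever task-specific model a device would actually train.
    """

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # minimum similarity to accept a match
        self.centroids = {}         # label -> mean universal embedding

    def enroll(self, label, samples):
        """Learn a label from (audio, image) samples captured on the device."""
        embs = [universal_embed(a, i) for a, i in samples]
        n = len(embs)
        self.centroids[label] = [sum(col) / n for col in zip(*embs)]

    def identify(self, audio, image):
        """Return the closest enrolled label, or None if below the threshold."""
        emb = universal_embed(audio, image)
        best, best_score = None, -1.0
        for label, centroid in self.centroids.items():
            score = _cosine(emb, centroid)
            if score > best_score:
                best, best_score = label, score
        return best if best_score >= self.threshold else None


if __name__ == "__main__":
    device = OnDeviceIdentifier(threshold=0.8)
    sample = ([1.0, 0.2, -0.5, 0.3], [0.4, -1.0, 0.7, 0.1])
    device.enroll("alice", [sample])
    print(device.identify(*sample))  # alice
```

A nearest-centroid head is chosen here because it can be fitted from a handful of enrollment samples without gradient steps, which keeps the on-device stage cheap; a real system might instead fine-tune a small task-specific layer on top of the frozen representation.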