After ChatGPT, Microsoft working on AI model that takes images as cues
New Delhi, March 3 (IANS) As the war over artificial intelligence (AI) chatbots heats up, Microsoft has unveiled Kosmos-1, a new AI model that can respond to visual cues or images as well as to text prompts or messages.
The multimodal large language model (MLLM) can help in an array of new tasks, including image captioning, visual question answering and more.
Kosmos-1 could pave the way for the next stage beyond ChatGPT's text prompts.
"A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence. In this work, we introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context and follow instructions," said Microsoft's AI researchers in a paper.
The paper suggests that multimodal perception, or knowledge acquisition and grounding in the real world, is needed to move beyond ChatGPT-like capabilities to artificial general intelligence (AGI), reports ZDNet.
"More importantly, unlocking multimodal input greatly widens the applications of language models to more high-value areas, such as multimodal machine learning, document intelligence, and robotics," the paper read.
The goal is to align perception with LLMs, so that the models are able to see and talk.
Experimental results showed that Kosmos-1 achieves impressive performance on language understanding and generation, and even when it is fed document images directly.
It also showed good results in perception-language tasks, including multimodal dialogue, image captioning, visual question answering, and vision tasks, such as image recognition with descriptions (specifying classification via text instructions).
"We also show that MLLMs can benefit from cross-modal transfer, i.e., transfer knowledge from language to multimodal, and from multimodal to language. In addition, we introduce a dataset of Raven IQ test, which diagnoses the nonverbal reasoning capability of MLLMs," said the team.
IANS