Gemini 1.5 and Google’s Nature | Stratechery by Ben Thompson

Stratechery · Intermediate ·🧠 Large Language Models ·2y ago

Key Takeaways

The video discusses Gemini 1.5, Google's latest language model, and its capabilities, including multimodal processing and a 1 million token context window, as well as Google's infrastructure and business model shift.

Full Transcript

"Gemini 1.5 and Google's Nature" was published on Wednesday, April 10th, 2024.

It was impossible to miss the leading message at yesterday's Google Next keynote: Google has the best infrastructure for AI. This was CEO Sundar Pichai in his video greeting:

"I want to highlight just a few reasons Google Cloud is showing so much progress. One is our deep investments in AI. We have known for a while that AI will transform every industry and company, including our own. That's why we've been building AI infrastructure for over a decade, including TPUs, now in their fifth generation. These advancements have helped customers train and serve cutting-edge language models. These investments put us at the forefront of the AI platform shift."

Google Cloud CEO Thomas Kurian made the priority clear as well:

"Today we're going to focus on how Google is helping leading companies transform their operations and become digital and AI leaders, which is the new way to cloud. We have many important advances, starting with our infrastructure."

What was most interesting about the keynote, though, is what that infrastructure makes possible and, by extension, what that says about Google's ability to compete.

Grounding

One of the most surprising things about large language models (LLMs) is how much they know. From the very beginning, though, hallucinations have been a concern. Hallucinations are, of course, part of what makes LLMs so impressive: a computer is actually being creative. It's also a feature that isn't particularly impressive to the enterprise customers that this keynote was directed at.

To that end, Kurian, shortly after going over Google's infrastructure advantages, talked about grounding, both in terms of the company's Gemini model broadly and for enterprise use cases specifically, in the context of Google's Vertex AI model management service:

"To augment models, Vertex AI provides managed tools to connect your model to enterprise applications and databases using extensions and function calling. Vertex also provides retrieval augmented generation, combining the strengths of retrieval and generative models to provide high-quality, personalized answers and recommendations. Vertex can augment models with up-to-date knowledge from the web and from your organization, combining generative AI with your enterprise truth.

"Today we have a really important announcement: you can now ground with Google Search, perhaps the world's most trusted source of factual information, with a deep understanding of the world's knowledge. Grounding Gemini's responses with Google Search improves response quality and significantly reduces hallucination. Second, we're also making it easy to ground your models with data from your enterprise databases and applications, and any database anywhere. Once you've chosen the right model, tuned it, and connected it with your enterprise truth, Vertex's MLOps can help you manage and monitor models."

A retrieval augmented generation (RAG) implementation using Google is an obvious win and mirrors ChatGPT's integration with Bing, or Microsoft Copilot in Bing: the LLM provides answers when it can and searches the web for things it doesn't know, a particularly useful feature if you're looking for more recent information.

A more impressive demonstration of grounding, though, was in the context of integrating Gemini with Google's BigQuery data warehouse and Looker business intelligence platform. In this demo, the worker gets an alert that a particular product is selling out. Using generative AI, the worker can see sales trends, find some lookalike models, and create a plan of action for dealing with declining inventory for delivery to her team.

What is notable is not the demo specifics, which are unapologetically made up for Cymbal, Google's demo brand. Rather, note the role of the LLM: it is not providing information or taking specific actions, but rather serving as a much more accessible natural language interface to surface and collect data that would otherwise take considerably more expertise and time. In other words, it is trustworthy because it is grounded through the integration Google is promising with its other enterprise data services.

Gemini 1.5

At the same time, that last section didn't really follow on from the introduction. Yes, those LLMs leveraging Google or BigQuery are running on Google's infrastructure, but other companies or startups can build something similar. This is where the rest of Pichai's introduction comes in:

"We also continue to build capable AI models to make products like Search, Maps, and Android radically more helpful. In December, we took our next big step with Gemini, our largest and most capable model yet. We've been bringing it to our products and to enterprises and developers through our APIs. We've already introduced our next generation, Gemini 1.5 Pro. It's been in private preview in Vertex AI. 1.5 Pro shows dramatically enhanced performance and includes a breakthrough in long-context understanding. That means it can run 1 million tokens of information consistently, opening up new possibilities for enterprises to create, discover, and build using AI. There's also Gemini's multimodal capabilities, which can process audio, video, text, code, and more. With these two advances, enterprises can do things today that just weren't possible with AI before."

Google hasn't said how Gemini 1.5 was made, but clearly the company has overcome the key limitation of traditional transformers: memory requirements increase quadratically with context length. One promising approach is Ring Attention with Blockwise Transformers, which breaks long contexts into pieces to be computed individually, even as the various devices computing those pieces simultaneously communicate to make sense of the context as a whole. In this case, memory requirements scale linearly with context length and can be extended by simply adding more devices to the ring topology.

This is where Google's infrastructure comes in. The company not only has a massive fleet of TPUs, but has also been developing those TPUs to run in parallel at every level of the stack, from chip to cluster to even data centers (this latter requirement is more pertinent for training than inference). If there is a solution that calls for scale, Google is the best place to provide it, and it seems the company has done just that with Gemini 1.5.

Demos

To that end, and per Pichai's closing line, almost all the other demos in the keynote were implicitly leveraging Gemini 1.5's context window. In a Gemini for Workspace demo, the worker evaluated two statements of work against each other and against the company's compliance document:

"Google Drive is AI-ready without any additional pre-work, and each of these documents is over 70 pages. It would have taken me hours to review these docs, but instead Gemini is going to help me find a clean answer to save me a ton of time. But before I proceed with this vendor, I need to ensure that no compliance issues exist, and I'm going to be honest, I have not memorized every rule in our compliance rulebook, because it is over 100 pages. I would have to need to scour the 80 pages of this proposal and compare it manually with 100 pages of the rulebook. So instead, in the side panel, I ask, 'Does this offer comply', oops, 'with the following', and I'm going to at-mention our compliance rulebook, hit enter, and see what Gemini has to say. Okay, so interesting: Gemini has found an issue, because the supplier does not list their security certifications. Because Gemini is grounded in my company's data, with source citations to specific files, I can trust this response and start to troubleshoot before selecting a vendor."

The key distinction between this demo and the last one is that quote at the beginning: a large context window just works in a far greater number of use cases, without any fiddly RAG implementations or special connections to external data stores. Just upload the files you need to analyze and you're off.

In a Creative Agent with Imagen demo, the worker was seeking to create marketing images and storyboards for an outdoor product:

"The creative agent can analyze previous campaigns to understand our unique brand style and apply it to new ideas. In this case, the creative agent has analyzed over 3,000 brand images, descriptions, videos, and documents of other products that we have in our catalog, contained within Google Drive, to create this summary. The creative agent was able to use Gemini Pro's 1 million token context window and its ability to reason across text, images, and video to generate this summary. Next..."

This was, to be fair, one of the weaker demos: the brand summary and marketing campaign weren't that impressive, and the idea of creating a podcast with synthetic voices is technically impressive and also something that will never be listened to. That, though, is impressive in its own right. As I noted in an update when Gemini 1.5 was first announced, "a massively larger context window makes it possible to do silly stuff," and silly stuff often turns into serious capabilities.

In a Gemini Code Assist demo (formerly Duet AI for Developers), a developer new to a job and the codebase was tasked with making a change to a site's homepage:

"And for the developers out there, you know that this means we're going to need to add padding in the homepage, modify some views, make sure that the configs are changed for our microservices, and typically it would take me a week or two to even just get familiarized with our company's codebase, which has over 100,000 lines of code across 11 services. But now, with Gemini Code Assist, as a new engineer on the team, I can be more productive than ever and can accomplish all of this work in just a matter of minutes. This is because Gemini's code transformations with full codebase awareness allows us to easily reason through our entire codebase, and in comparison, other models out there can't handle anything beyond 12 to 15,000 lines of code, and even then they struggle to get it right. Gemini inside of Code Assist is so intelligent that we can just give it our business requirements, including the visual design. Gemini Code Assist doesn't just suggest code edits; it provides clear recommendations and makes sure that all of these recommendations are aligned with Cymbal Outfitters' security and compliance requirements. So let's recap: behind the scenes, Gemini has analyzed my entire codebase in GitLab, it's implemented a new feature, and has ensured that all of the code generated is compatible with my company's standards and requirements."

Again, leave aside the impossibility of this demo: the key takeaway is the capabilities unlocked when the model is able to have all the context around a problem while working. This is only possible with, and here the name is appropriate, a long context window, and that is ultimately enabled by Google's infrastructure.

Google's Nature

In case it isn't clear, I think that this keynote was by far the most impressive presentation Google has made in the AI era, not least because the company knows exactly what its advantages are. Several years ago, I wrote an article called "Microsoft's Monopoly Hangover" that discussed the company's then-ongoing transition away from Windows as the center of its strategy. The central conceit was a comparison to Lou Gerstner's 1990s transformation of IBM:

"The great thing about a monopoly is that a company can do anything, because there is no competition. The bad thing is that when the monopoly is finished, the company is still capable of doing anything at a mediocre level, but nothing at a high one, because it has become fat and lazy. To put it another way, for a former monopoly, big is the only truly differentiated asset."

My argument was that business models could be changed: IBM did it, and Microsoft was in the process of doing so when I wrote that. Moreover, Gerstner had shown that culture could be changed as well, and Nadella did just that at Microsoft. What couldn't be changed was nature. IBM was a company predicated on breadth, not specialization; that's why Gerstner was right to not break apart the company but to instead deliver internet solutions to enterprises. Similarly, Microsoft was a company predicated on integration around Windows; the company's shift to services centered on Teams as Microsoft's operating system in the cloud was true to the company's nature.

Google is facing many of the same challenges after its decades-long dominance of the open web. All of the products shown yesterday rely on a different business model than advertising, and to properly execute and deliver on them will require a cultural shift to supporting customers instead of tolerating them. What hasn't changed, because it is the company's nature and thus cannot, is the reliance on scale and an overwhelming infrastructure advantage. That, more than anything, is what defines Google, and it was encouraging to see that so explicitly put forward as an advantage.

For more analysis like this, please like and subscribe, visit stratechery.com, and listen to the Sharp Tech podcast. Also check out the Asianometry channel on YouTube to learn more about the technology changing our world.
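As a rough illustration of the memory-scaling point made in the transcript (quadratic growth for traditional transformers versus splitting the context into blocks spread across a ring of devices), here is a back-of-envelope sketch. It assumes a naive implementation that materializes the full score matrix for a single attention head in 16-bit floats, and a blockwise scheme in which each device holds only one block-by-block tile at a time. The function names, the 2-byte element size, and the 4,096-token block length are illustrative assumptions; none of this describes how Gemini 1.5 or any specific Ring Attention implementation actually works.

```python
# Back-of-envelope arithmetic only; all constants below are assumptions.
BYTES_PER_SCORE = 2  # assumed 16-bit floats


def naive_score_bytes(context_len: int) -> int:
    """Bytes needed to hold a full context_len x context_len score matrix (one head)."""
    return context_len * context_len * BYTES_PER_SCORE


def blockwise_tile_bytes(block_len: int) -> int:
    """Bytes for the single block_len x block_len score tile a device holds at once."""
    return block_len * block_len * BYTES_PER_SCORE


def devices_needed(context_len: int, block_len: int) -> int:
    """Devices in the ring if each device owns one block of the sequence."""
    return -(-context_len // block_len)  # ceiling division


if __name__ == "__main__":
    block = 4096  # hypothetical per-device block length
    for n in (10_000, 100_000, 1_000_000):
        print(
            f"context={n:>9,}  "
            f"naive={naive_score_bytes(n):>17,} B  "
            f"devices={devices_needed(n, block):>4}  "
            f"per-device tile={blockwise_tile_bytes(block):,} B"
        )
```

Under these assumptions, doubling the context quadruples the naive score matrix, while in the blockwise case the per-device tile stays fixed and only the number of devices in the ring grows with the context.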

Original Description

Read the Article: https://stratechery.com/2024/gemini-1-5-and-googles-nature/
Links:
Stratechery: https://stratechery.com
Sign up for Stratechery Plus: https://stratechery.com/stratechery-plus
Sharp Tech website: https://sharptech.fm

The video discusses Gemini 1.5, Google's latest language model, and its capabilities, including multimodal processing and a 1 million token context window. It also covers Google's infrastructure and business model shift, highlighting the importance of scale and infrastructure in AI development.

Key Takeaways
  1. Building capable AI models
  2. Introducing Gemini 1.5 Pro
  3. Enhancing performance with Long context understanding
  4. Processing audio, video, text, code, and more with multimodal capabilities
  5. Running 1 million tokens of information consistently
  6. Adding padding in the homepage
  7. Modifying some views
  8. Making sure that the configs are changed for our microservices
  9. Giving it our business requirements including the visual design
💡 Google's infrastructure and scale are key advantages in AI development. Its shift to a business model other than advertising requires a cultural change toward supporting customers, while its reliance on scale remains the company's nature.
