The Intelligent Interface: Sam Whitmore & Jason Yuan of New Computer

AI Engineer · Intermediate ·🚀 Entrepreneurship & Startups ·2y ago

Key Takeaways

The video discusses the evolution of intelligent interfaces, specifically how AI-driven systems like ChatGPT have improved human-computer interaction, and explores the potential of multimodal input and output methods, including voice, gesture, and visual UI, to create more immersive and interactive experiences. It highlights the use of LLMs, pose detection, and other technologies to enable more natural and adaptive interactions.

Full Transcript

[Music]

Hi everybody, thanks for having us here today. We're super excited to be here. I'm Sam, and I'm one of the co-founders of New Computer.

And I'm Jason, the other co-founder, and we're really excited that we are starting today by letting you all see our pores up close, which is amazing. So, you know, when Sam and I started New Computer, we did so because we believed that for so long we've taken certain metaphors and abstractions and tools for granted, and for the first time in what feels like 40 years we can finally change all of that. We can start thinking from first principles about what our relationship, not only with computing but with intelligence, period, should look like in the future.

So what do we mean by intelligence? Because, you know, sometimes I'm on the internet and I wonder if it even exists. Well, one way to think about intelligence is the ability to take in lots of information, of different types, in different volumes, from different sources (visualized as dots here), and find ways to make sense of it all: find ways to reason, find ways to find meaning. As human beings, as carbon-based life forms, we do this through a process where first we use our senses to perceive the world around us, then we process that information in our heads, and then, given what we think, we choose a reaction. If we're lucky, we are blessed with at least five senses, six when I've had four margaritas. But as humans we are inherently capable of processing all of this at the same time, and that actually is how our short-term memory gets to work. Taking all this context and information, we then get to form what's called a theory of mind: what is going on, how is the world relating to me right now, what should I be doing about it? So we sense, we think, and then we react.

And how do we react? Well, there are a lot of ways right now, but if we take it way back to the Stone Age and think real simple, a lot of how people used to react and communicate was just unintelligible grunts. Then one day that evolved into language as we know it, and to this day that's still something we rely on to communicate and react to the world around us. That's also how a lot of us think. So we have language. But the language of communication is so much broader than just language. We're standing here on stage right now. I'm making eye contact with some of you (nice shirt), I'm making gestures, I'm wearing these ridiculous gloves, I'm looking at Sam, I'm looking at things, I'm pointing at things, and I can hear laughter, or I can hear people, you know, thinking. I'm taking in lots of information at once, and right now I'm sensing, thinking, and reacting.

So this year (well, last year, technically) we saw a really amazing thing happen, with the advent of ChatGPT, I would say, where we saw the beginnings of a computer starting to approximate that same loop: input was coming in in the form of language, there was some reasoning process, however that actually works, and then the output also felt like language coming back to us. This was very inspiring to me and Jason, and we've been spending a lot of time this past year thinking about what's next and how this gets to feel even more natural for people interacting with computers specifically. So today we wanted to take you on a tour of a few demos: one which you can do with a computer right now, and then a few which rely on futuristic, or next-generation, hardware which may be available soon. Knowing that you're all engineers, we know this will get the sparks flowing, the ideas flowing, for how you might use some of these things that are coming out soon, or things that exist today, to build things that feel more natural. So I'll start by getting to a demo, and I will say this is a live audiovisual demo, so I am foolish enough to make that choice. We will see how it goes.

Before we show any demos, it's prudent to point out that none of these represent the product we are building. They are simply, yes, pieces, stories of inspiration.

So the point of this first demo is to imagine... we have a lot of things where we're saying, "Okay, is text the right input? Is audio the right input?" And we've been thinking it's not if those are the right things, but when. In this case you'll see some measurements happening on the left here. What's actually happening is that this has access to my camera, and it's taking real-time pose measurements of where I am relative to the screen. So it knows I'm at the keyboard, basically, because it's making that assessment, and you can see the reasoning on the side here, where it's saying, "User is close to screen, will use keyboard input. User is facing screen, will use text output." We're using an LLM to actually make that choice as it goes to the response.

So let's try something else, and again, demo gods, be nice, because this may not work at all. But if I now walk away and it doesn't detect me anymore, it should actually start listening to me. Hello, can you hear me? Are you going to respond? I think that's a no. It might not respond, but basically what we are attempting to build here is this: if I want to talk to the computer in a really natural way, then if I'm there next to the keyboard, it should not be paying attention to my voice or any ambient sounds, and if I walk away from the keyboard, I might want to have a conversation with it, like, walk around the room. It is listening. It seems to have decided not to actually talk back, but... oh, it's talking.

"Is there something you need help... Sounds like an interesting project, Samantha. How is your talk going so far?"

Yay! [Music] [Applause] Yes, you can see it paid attention, and it decided to ignore me for a while. But anyway, this is just a toy demo. You can see here how it's working behind the scenes: it's trying to decide if I'm close to the keyboard, facing the screen, or not facing the screen, and using all of that as inputs to decide whether it should talk to me or just display the text on the interface. Cool.

So the reason why we think this is interesting is that people are naturally sensitive to other people, and we think computers, instead of asking people to adapt to computers ("come up to me and type," and whatever), should find ways to adapt to the circumstances and context of people.

Exactly. So again, in this case it's adapting to where I am by using the pose detection, and whether or not I'm actually in the process of talking to it, to decide to update its own world state. It uses an LLM to actually do that, and then uses the LLM to respond using the knowledge of that world state. This is a really simple and, as you can see, kind of hacky demo, and it is something you could build today. In theory, you could imagine how this could be a really cool native way to interact with an LLM on your computer, where you don't have to worry about the input at all. So again, the takeaways are: consider explicit inputs (what I'm typing, what I'm saying) along with implicit ones (where I am). There are other things you could do with that, like tone and emotion detection. You could plug in a whole bunch of different signals that you want to extract from that.

And you can even imagine, if I'm in the frame with Sam, and the agent knows Sam, and she had recently been complaining about me, it should probably not bring that up until I leave the thing.

Yeah. And, as we mentioned, that's using it as a reasoning engine. And then, next one. Cool. And yeah, then we're adapting. So we want to get to the futuristic stuff. Jason has been spending a lot of time imagining this, so he's going to walk you through a few things that might exist shortly, in the near future, when new hardware comes out.

So, in the future, we still think the sense, think, react loop will take place. To preface all of this: these are my personal speculative views, not representative of anything that I think might actually happen, and this is a very conservative view of the next 1 to 12 months, maybe. So it's not a true future-future, AGI, God-worshipping type situation.

Let's start with what I call a social interface. We're all really excited about, you know, certain headsets being released at certain points, and one thing that I think is interesting about some headsets is that they have sensors, and they have hand tracking and eye tracking. Just like how I'm being expressive right now, maybe there comes a day where I can be like that with a computer that sort of lives with me. So here I am in my apartment, minding my own business, and my ex decides to FaceTime me. And now I've declined the call. Historically, with deterministic interfaces, I would have had to find the hang-up button, or go, "Hey Alexa, decline call," thinking in commands, thinking in computer-speak. But as a person I can be like, "Off," you know, I can be like, "I'm busy," I can be like, "I'm sick." All this stuff the computer should be able to interpret for me and send, what's his name again, Toxic Trashiest, whatever, on his merry way.

So explicit social gestures can be a great way to determine user intent, like the way I just showed now. But we should also consider interpreting implicit gestures: whether I give a really fast gesture versus a slow gesture, my mood, my tone, how far away I am. We should also be conscious of social and cultural norms. Different gestures mean different things in different societies, and as you scale your application or hardware to different locales, this is something that you should pay attention to.

Now I want to move on to talk about what I call new physics, and this part is super fun. This demo is based on a little [unclear] on iPad, which, you know, has over five daily active users in the world. It's very popular. And here I'm imagining, okay, Midjourney: if I was the founder of Midjourney, I would be putting all my resources into making some sort of Midjourney canvas app for iPad. In this one I've asked Midjourney to create Balenciaga Naruto, which, now I'm realizing, kind of looks like me.

So let's think about the iPad. It's this big slab that you can touch and fiddle with, right? So what do I want to do? Okay, I want to edit this photo, but first I need to make space. How do I do that? Well, very easy: you can just zoom out, and now you have extra space. Very obvious, we do this all the time. I kind of think my cat would look really good in that outfit, so I want to find a way to do that here. Let me just ask AI real quick: "Hey, random AI, send me pictures of my cat." And, you know, the AI knows me and has context, and gives me pictures of my cat. And then what do I do here? Well, why can't we just take one of the photos and blend it with the other?

The metaphor you're seeing here is that, as you work with these photos, they start glowing when you pick them up. And what does light... you guys know the Pink Floyd Dark Side of the Moon album cover? We're really familiar with the idea that light can split into different colors and concentrate back into one form, and we're leaning into that metaphor here implicitly. And so it's now created something that looks 50% human, 50% cat, 100% cringe. I don't really like this. How do we remix this? What is a gesture, what is the thing we do in real life, that's remixing? For me it's a margarita, and for Sam it's her morning [unclear]: we shake a blender bottle. So why can't we work with intelligent materials the same way that we work with real materials, and just blend it out? This is totally doable right now. David, why aren't you building this? If you don't build this, I'm going to build this. It's fine.

So what we're trying to say here is: think about familiar, universal metaphors like physics, like light, like metaballs, like squishiness, like fog, whatever. If you're designing an iPhone, you have to be very cognizant of the qualities of aluminium and titanium. But generative intelligence is a probabilistic material that's more fluid: maybe it's fog, maybe it's mercury. For this reason, maybe metaphors that are really rigid, like wood or paper or metal, aren't the right metaphors to use for some of these experiences.

So finally, we want to walk you through an experience that's inherently mixed-modal and mixed-reality. Let's imagine for a second there's a piece of hardware coming out that's a wearable, that has a camera on it, has a microphone, and can maybe project things. I don't know if such a thing will ever exist, but let's imagine for a second it does. I'm browsing this book, this Beyoncé tour book, and I see these images that I find really inspiring. What I'm trying to do here is: what if I could just point at something on my desk and say, "This is cool," and have the device pick up on that and indicate that it's heard me, and that it's going to do something, by projection-mapping this sort of feedback? This demo doesn't really have sound, but the way this would work is, ideally, a combination of voice and gesture at the same time.

Obviously this gesture is really easy to make mistakes with, so anytime you work with probabilistic materials you want to provide a graceful way out. In this case I've accidentally tapped this photo. Why can't I just flick it away like dust and be like, "That's wrong"? I don't want to press an undo button, I don't want to press Command-Z, I just want to flick it away, really leaning into the physics of it.

So now that I've found two pieces, I'm kind of like, okay, I want to send this to two of my friends. There was a friend who I said I would do Halloween with, but I can't remember their name. What do I do here? I should ask AI. I should be like, "Who is that friend I said I'd spend Halloween with?" And you'll notice here that we're imagining projection-mapped UI pieces that can work with the context of the world you're in right now, such that you don't have to go fish out a phone or use cumbersome voice commands. It's just all sort of naturally melding with the world.

And crucially, one point we want to make is: voice in doesn't need to mean voice out, gesture in doesn't need to mean gesture out, and visual UI in does not need to mean visual UI out. We can mix these modalities in real time for whatever makes sense in whatever context you're in. So given that interactions that require multiple simultaneous inputs are now possible, it's our job as designers and developers to think on behalf of the user, think about what the appropriate output is given the current context, and be smart about it.

Yeah. So again, the takeaways, as we mentioned: it's this idea that we have a lot of sensors and contextual modalities available to us as ingredients even today. There will be more tomorrow, as you saw with these upcoming potential hardware releases. But even now, with a laptop, with things like typing speed, with things like the tone of voice, there are a lot of ways that you could gather context and extract signals from it. You could choose to process it in a variety of different ways, and all of that can now be passed to an LLM and used in a reasoning layer, which decides both how to respond in words and how to present that information. So basically everything can now be an input, and your output could be everywhere and have every format.

At the same time, one might say "everything, everywhere, all at once." Well, you want to be intentional with it. You know, if someone wants to generate a photo on their Apple Watch, you're like, why? Like, no, use your freaking phone, Jesus. Anyway, the last thing we'll say is that probabilistic interfaces are hard because they have lots of different outputs, so a really great way to ground these interfaces is to lean into familiar metaphors, whether they are from nature, from physics, or even from human-made tools and materials, like buttons, for now. And social norms are also a material that we work with, right? So your banking AI agent probably shouldn't be able to have a deep philosophical chat with you. That just socially doesn't make sense.

Exactly. But on the same note, we've related all these interfaces to what humans perceive and experience now. What might a truly intelligent interface look like in the future? If we think where we are right now is skeuomorphism, what is the abstraction layer above that? That's kind of for us to figure out. So with that, yeah, I think that's all. Thank you.

[Applause]
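
The first demo's loop (pose signals update a world state, and the world state drives the choice of input and output modality) can be sketched roughly as follows. This is a minimal illustration and not the speakers' implementation: the `WorldState` fields, the `NEAR_THRESHOLD_M` cutoff, and every function name are assumptions made for the example. In the talk an LLM makes the modality decision, so the rule-based `choose_modalities` below is only a stand-in, and `build_prompt` shows one hypothetical way the same world state could be handed to a model.

```python
from dataclasses import dataclass
from typing import Dict, List

NEAR_THRESHOLD_M = 1.0  # assumed cutoff for "close to screen"


@dataclass(frozen=True)
class WorldState:
    """Implicit signals from a pose detector (hypothetical fields)."""
    user_detected: bool
    distance_m: float      # estimated distance from the screen, in metres
    facing_screen: bool


def describe(state: WorldState) -> List[str]:
    """Turn raw signals into short statements like those shown in the demo."""
    if not state.user_detected:
        return ["user is not detected"]
    near = state.distance_m <= NEAR_THRESHOLD_M
    return [
        "user is close to screen" if near else "user is away from screen",
        "user is facing screen" if state.facing_screen else "user is not facing screen",
    ]


def choose_modalities(state: WorldState) -> Dict[str, str]:
    """Rule-based stand-in for the decision the demo delegates to an LLM."""
    near = state.user_detected and state.distance_m <= NEAR_THRESHOLD_M
    facing = state.user_detected and state.facing_screen
    return {
        "input": "keyboard" if near else "voice",
        "output": "text" if facing else "speech",
    }


def build_prompt(state: WorldState, user_message: str) -> str:
    """Hypothetical prompt that passes the world state to a model."""
    facts = "; ".join(describe(state))
    return (
        f"World state: {facts}.\n"
        f"User said: {user_message}\n"
        "Decide whether to reply as on-screen text or speech, then reply."
    )


if __name__ == "__main__":
    at_desk = WorldState(user_detected=True, distance_m=0.5, facing_screen=True)
    print(choose_modalities(at_desk))
    print(build_prompt(at_desk, "hello"))
```

Keeping the world state as a small, explicit structure means further implicit signals mentioned in the talk, such as tone or typing speed, could be added as extra fields without changing how the decision step is called.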

Original Description

ChatGPT was a turning point for consumer adoption of AI due to its easy-to-use interface. Just by changing some elements of design, interaction, and behavior, an existing model suddenly 'clicked' in terms of its utility for everyday people. What might be the next leap forward for making AI-driven applications even more accessible & intuitive? Join Sam & Jason as they showcase various demos of novel interaction & behavior paradigms for AI-driven applications.

Recorded live in San Francisco at the AI Engineer Summit 2023. See the full schedule of talks at https://ai.engineer/summit/schedule & join us at the AI Engineer World's Fair in 2024! Get your tickets today at https://ai.engineer/worlds-fair

About Samantha Whitmore
Former Head of Engineering at Kensho, a startup which used early NLP techniques to organize information for financial clients, including Goldman Sachs, BAML, and JPMC. Kensho was acquired in 2018 by S&P Global for $550mm, at the time the largest AI acquisition in history. Subsequently was Head of Engineering at Maximus, a startup that partnered with IMAX to build video super-resolution software. Recently was one of the early core contributors to LangChain (pioneered the implementation of Memory).

About Jason Yuan
Former member of Apple Design Team where he worked on the future of computing and artificial intelligence. Founder and co-inventor of MakeSpace (now known as Sprout), a multi-player-first video conferencing platform. Creator of mercuryos.com and helped pioneer ideas in generative interfaces. Worked on projects with culture makers like Blackpink, Chanel, Vogue, Jackson Wang, The MET Gala, Nike, Christina Aguilera, FKA Twigs and The Weeknd.


The video discusses the evolution of intelligent interfaces and explores the potential of multimodal input and output methods to create more immersive and interactive experiences. It highlights the use of LLMs and other technologies to enable more natural and adaptive interactions. By understanding the concepts and techniques presented in the video, viewers can build intelligent interfaces using LLMs and design adaptive AI systems.

Key Takeaways
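  1. Use an LLM as a reasoning layer that decides both what to respond and how to present it
  2. Adapt to the user's context and circumstances using pose detection and other sensors
  3. Combine explicit inputs (typing, speech) with implicit signals (position, tone, typing speed)
  4. Treat generated content as a fluid material, for example blending a photo of a cat with a photo of a person by dragging them together
  5. Combine voice and gesture input to control a device, and give the user a graceful way to undo mistakes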
💡 Probabilistic interfaces can be grounded by leaning into familiar metaphors from nature, physics, or human-made tools and materials.
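
Takeaway 5, together with the talk's advice that probabilistic input needs "a graceful way out", can be sketched as a small event collector. This is an illustrative sketch only: the `Event` and `Collector` types, the event kinds `"voice"`, `"point"`, and `"flick"`, and the `FUSION_WINDOW_S` value are all hypothetical names chosen for the example, not anything shown in the talk. A pointing gesture counts as a selection only when a voice event arrives within the time window, and a flick removes the most recent selection.

```python
from dataclasses import dataclass, field
from typing import List, Optional

FUSION_WINDOW_S = 1.5  # assumed: max gap between voice and gesture, in seconds
KINDS = {"voice", "point", "flick"}


@dataclass(frozen=True)
class Event:
    kind: str     # "voice", "point", or "flick"
    value: str    # transcript text, or the name of the pointed-at item
    t: float      # timestamp in seconds


@dataclass
class Collector:
    selected: List[str] = field(default_factory=list)
    pending: Optional[Event] = None  # last unpaired voice or point event

    def feed(self, event: Event) -> None:
        if event.kind not in KINDS:
            raise ValueError(f"unknown event kind: {event.kind}")

        # A flick is the "graceful way out": drop the latest selection.
        if event.kind == "flick":
            if self.selected:
                self.selected.pop()
            return

        other = self.pending
        paired = (
            other is not None
            and other.kind != event.kind
            and abs(event.t - other.t) <= FUSION_WINDOW_S
        )
        if paired:
            # One event is the point and the other is the voice; keep the item.
            point = event if event.kind == "point" else other
            self.selected.append(point.value)
            self.pending = None
        else:
            # No partner yet (or too far apart): wait for the next event.
            self.pending = event


if __name__ == "__main__":
    c = Collector()
    c.feed(Event("point", "photo_a", 0.0))
    c.feed(Event("voice", "this is cool", 0.4))
    c.feed(Event("voice", "this too", 9.5))
    c.feed(Event("point", "photo_c", 10.0))
    c.feed(Event("flick", "", 11.0))
    print(c.selected)
```

Requiring two modalities to agree before acting is one simple way to reduce accidental selections, and the flick keeps the correction in the same physical vocabulary as the selection itself.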
