Large Language Models with Don Rosenthal

Smart Products

Treść dostarczona przez Himakara Pieris. Cała zawartość podcastów, w tym odcinki, grafika i opisy podcastów, jest przesyłana i udostępniana bezpośrednio przez Himakara Pieris lub jego partnera na platformie podcastów. Jeśli uważasz, że ktoś wykorzystuje Twoje dzieło chronione prawem autorskim bez Twojej zgody, możesz postępować zgodnie z procedurą opisaną tutaj https://pl.player.fm/legal.

12M ago 49:57

MP3•Źródło odcinka

I'm excited to share this episode with Don Rosenthal. Don is a seasoned product leader with extensive experience in AI and large language models. He has led product teams at Google AI research, Facebook Applied AI, and Uber's Self-Driving Technology division. During this conversation, Don shared his insights on the anatomy of an LLM, ways to incorporate LLMs into products, risk mitigation strategies, and taking on LLM-powered projects.

Links

Don on LinkedIn

Attention is all you need

The Illustrated Transformer

Transcript

[00:00:00] Don Rosenthal: please, please, please do go out and do come up with unique and exciting, uh, important new applications, build stuff that solves important problems we couldn't even try to address previously. I just want you to be sure that, uh, you're going into this with your eyes open and that you've prepared your stakeholders properly.

[00:00:21] Don Rosenthal: There, um, there are a lot of successful applications that have been built with these LLMs and a lot of the pioneers have discovered all the pitfalls and where all the dragons hide so that we can We can avoid them.

[00:00:35] Himakara Pieris: I'm, Himakara Pieris. You're listening to smart products. A show where we, recognize, celebrate and learn from industry leaders who are solving real world problems. Using AI.

[00:00:46]

[00:00:47] Himakara Pieris: , today we are going to talk about large language models. And I can't think of a better person to have this conversation with than Don Rosenthal. So Don has [00:01:00] spent most of his career in AI.

[00:01:02] Himakara Pieris: He started out as a developer. building ground support systems for the Hubble telescope, um, including being part of the team that built the first air ground system ever deployed for a NASA mission. He then went on to build and manage NASA's first AI applications group, where his team flew in, flew the first two AI systems in space.

[00:01:22] Himakara Pieris: And he worked on prototype architectures for autonomous Mars rovers, done then commercialized. Uh, the air technology from Hubble Telescope in two of his air companies that he founded. He was the group product manager for autonomy at Uber 80 G. Uber's autonomous vehicle spin off in Pittsburgh. He was the PM for face recognition at Facebook.

[00:01:43] Himakara Pieris: And most recently, Don was the group product manager for conversational at a I research

[00:01:50] Himakara Pieris: done. Welcome to the smart production.

[00:01:53] Don Rosenthal: Thank you very much. I'm really, really excited to be here. You might. Thank you for inviting me.[00:02:00]

[00:02:01] Himakara Pieris: So let's start with the basics. What is an LLM?

[00:02:05] Don Rosenthal: Good place to start. Um, let me start out by saying that, uh, LLMs have finally solved, and I don't think that's really an exaggeration.

[00:02:14] Don Rosenthal: They finally solved one of the longstanding foundational problems of natural language understanding. Understanding the user's intent. Um. What do I mean by that? Um, uh, any one of us who's used the recommender system for movies, TV, music, which pretty much all of us, um, we know how frustrating it can be to try to get the system to understand what we're, we're looking for.

[00:02:40] Don Rosenthal: These systems have all trained us to dumb down our queries. Uh, in order to have any chance of a successful retrieval, you can't talk to in the way you would to a friend or or to any other person. You can't, for example, say, Hey, um, I like all kinds of music. Uh, the genre is not [00:03:00] important, jazz, pop, classical, rock, even opera, as long as it's got a strong goosebump factor, put together a playlist for me with that kind of vibe for the next 30 minutes while I do chores.

[00:03:13] Don Rosenthal: But you can, in fact, say that to, uh, something that's got a large language model in it, like chat gbt. And go ahead and try it. When I did, I even asked it if it understood what I meant by goosebump factor, assuming I'd have to explain it, but it said, Sure, I know what it is and gave me a perfectly reasonable explanation and definition of it.

[00:03:36] Don Rosenthal: So... Why and how is it able to do that? Um, we can get into the technology a little bit later, but for the 3, 000 foot level to start with, the point is that through an absolutely enormous amount of training, um, these systems have internally created a highly nuanced model of language. Which they can [00:04:00] then use for the semantic understanding of language that is input to it, as well as to craft highly nuanced and natural sounding language responses.

[00:04:09] Don Rosenthal: Um, and it's important to, to, uh, to underscore that these are the two things that large language models do really well. Um, semantic understanding of language and its input to it, and Uh, highly nuanced and natural sounding land, which responses and yes, they hallucinate and they make up stuff out of thin air.

[00:04:30] Don Rosenthal: But the interesting thing is that they always seem to hallucinate within the correct context of your query. So, you know, if you ask them about strawberries, it might make stuff up about strawberries, but it's not going to make stuff up about fire engines. And as for the highly nuanced Natural sounding responses.

[00:04:53] Don Rosenthal: Um, just, uh, remember, for example, the response to the, uh, the query of generating instructions for [00:05:00] removing a peanut butter sandwich from a VCR written in the style of the ST James Bible, which kind of broke the Internet last November.

[00:05:10] Himakara Pieris: Take us inside an LLM. Um, what makes this technology so transformative, if you will?

[00:05:17] Don Rosenthal: Um, I'm not going to go into the, the technical details of how they work, but, um, it'd be great to be able to cover. Why they're so important and what has enabled them to become the agent of change in LLP to become so transformative. Um, and if you are interested in, in more details, the original paper from 2017 is attention is all you need.

[00:05:42] Don Rosenthal: It's all over the internet. You can find it easily. I'd also recommend, um, the Illustrated Transformer by Jay Alamar, A L A M m a R, who is well known for his, uh, incredible capability of helping you to easily understand complicated, [00:06:00] uh, concepts. And if you'd rather watch a video than, than read an explanation to check out his video.

[00:06:06] Don Rosenthal: The narrated transformer anyway, six transformers were able to. Help us leapfrog into the current generation of NLP tools. It's kind of important to first explain the state of the art just prior to their introduction, if that's okay. So, at that time, uh, NLP, the NLP world was using a set of technologies which were, uh, grouped together under the subfield of recurrent neural networks.

[00:06:34] Don Rosenthal: Um, not a very descriptive name, um, But the TLDR is that these technologies took the input sequence, any type of sequence, but let's say with language, so sequence of words in the sentence, um, and, uh, the RRN took the sequence of words, fed them in, in order, one at a time, the, quick, brown, fox, etc. [00:07:00] But they included a really novel component, which enabled feedback connections that allowed them to inject information from previous time time steps.

[00:07:09] Don Rosenthal: And this is what enabled them to capture contextual dependencies between words in a sentence instead of just having a look at one particular word. But so when quick was input, you get some feedback from the When brown was input, some feedback from the quick problem with this was, I mean, it was, it worked well for the time, but the problem was that the farther along in the sentence you got, the weaker the feedback was from the early previous steps.

[00:07:39] Don Rosenthal: So by the time you got to the end of the input sequence, um, the system may have been left with so little signal from the initial inputs. that they had very little effect on the evaluation of the sequence. So, uh, put that all together. Words that were closer to each other affected each other more than words that were farther apart in [00:08:00] trying to understand what the sentence meant.

[00:08:02] Don Rosenthal: And obviously that's a problem because language isn't constructed that way. Um, it also meant that sequences could only be evaluated. Sequentially, one at a time, and that made RNN processing really low. So the two stripes against RNNs, although they were really valuable for the time, was that they focused more on words that happened to be closer together in a sentence, and that they only processed sequentially, uh, one word at a time.

[00:08:31] Don Rosenthal: Um, so then along came transformers with a new idea, which was, let's present all of the words in the sequence to the transformer at once, all at the same time. Thank you. And this lets the system evaluate the connections between each word and every other word, um, regardless of where they show up in the sentence, um, and it can do this to figure out which words should pay particular attention to which other words.

[00:08:58] Don Rosenthal: And that's the intention part [00:09:00] of Attention is all you need. So no longer the words have to be close to each other to capture the contextual relevance between them. But it also meant, and this was the other key, uh, improvement. You could now evaluate all of the words in the input in parallel instead of analyzing one word at a time in order.

[00:09:18] Don Rosenthal: And I'm being a little bit hand wavy and imprecise. But, um, I'm trying to give you the intuition about how these work rather than, than teach you how to build one. Um, but at this point now, we could analyze semantic information equally between all combinations of words, no matter where they appeared in this, in this sequence, and we could do this in parallel.

[00:09:39] Don Rosenthal: So. NLP is solved, right? Um, unfortunately, not so fast. Uh, transformers, yes, they could analyze the semantic connections between all pairs of words, and yes, you could do this, you could do a lot of the work in parallel, um, but if you look a little closer, you see [00:10:00] that we've actually created a lot of extra computing for ourselves.

[00:10:04] Don Rosenthal: Transformers evaluate semantic connections between every word in the sequence with every other word in the sequence. Which means that as the sequence goes longer, uh, the number of pairs that you've got to analyze not only grows, but it grows incredibly quickly. Um, uh, a sense of two words. It's one communication path between one and two.

[00:10:26] Don Rosenthal: Uh, three words, three communications. One and two, two and three, and one and three. Uh, you get to 10 words, it's 45 communication paths. And, um, as the group size grows, The speed at which the number of communication pads grows, uh, um, that accelerates. And if you get to 10 24 people, the number of paths is, uh, over half 1,000,523, uh, 776 to be precise.

[00:10:57] Don Rosenthal: And I know that's correct 'cause I asked Chad g p t [00:11:00] to calculate it for me. So, um, So when you input, uh, a large document, your resume, uh, for example, you're inputting a very large sequence, which then requires a lot of computation. Um, and, um, you probably run into the term context window and that roughly maps to the size of the, the input, um, uh, uh, uh, size, the input length.

[00:11:25] Don Rosenthal: And um, now you kind of understand how, even though we can parallelize it, um, Give one core of each GPU each word, uh, to do it to evaluate in parallel with other words. Um, uh, even if you could do this, it requires, uh, a lot of GPUs. A lot of G TPUs to, to enable the parallel analysis. Uh, and, um, while the attention mechanism, um, has enabled some incredible advances in N L P, you never get something, uh, for nothing.

[00:11:59] Don Rosenthal: So [00:12:00] we've taken a big, taken a big step. We've got new problems to solve, but, um, we've getting, we've gotten a, we've gotten a lot of value out of this new, new approach.

[00:12:10] Himakara Pieris: , so we talked about what makes. LLMs to transformative. And we are seeing a lot of LLMs coming to market. Some of them are from big tech companies like OpenAI, Azure, from Microsoft.

[00:12:22] Himakara Pieris: And then we have Titan from Amazon and Palm from Google. Some of them are open source, Eleuther, Falcon, et cetera. And we're also seeing large language models coming to market from. Not so big tech companies and also not so very tech companies like, um, from the not very big company standpoint, Databricks, Dali, Dialpad, GPT from Dialpad and Bloomberg GPT where they built something from ground up.

[00:12:48] Himakara Pieris: So could you talk us through what it takes to build an LLM and also who should consider building an LLM on their own?

[00:12:58] Don Rosenthal: That's a critically important [00:13:00] question. I'm really glad you asked this. To train up an LLM from scratch requires a lot of compute resources. Which means you need a lot of money. Um, Microsoft invested 12 billion or more in open AI.

[00:13:16] Don Rosenthal: I'm pretty sure that they're not hiring 12 billion worth of AI researchers. I'm guessing that much of the money is covering the ongoing costs of compute resources. And in addition, you need a lot of training data and that represents a lot of time you've got to do a lot of, of, of training, even though you don't have to do with what you did in the old days of discriminative AI, you don't have to label it.

[00:13:45] Don Rosenthal: It's still, if you're taking the full, full corpus of the Internet and trading on that, it's going to take a lot of time to do that. So, uh, if you have one takeaway from this conversation today, I hope it's that, um, you should [00:14:00] lead the development of LLMs, uh, to the well heeled, uh, big tech companies who have a lot of money and a lot of people, and many of them also design and build and own their own compute resources.

[00:14:14] Don Rosenthal: Or leave it to the open source community, which at least Has a lot of people, uh, it's really hard to overestimate the amount of work, uh, the cost and the time required to develop these on your own. Plus, once you've done the initial training of the model, you then need a very large user community to evaluate what's being generated.

[00:14:35] Don Rosenthal: So people can say, Oh, we like the way it responds here, but we don't like the way it responds here. So those preferences can be fed back into the training of the models. Uh, and it creates, um, the naturalness of the responses today. That all has to be done manually, which again takes a lot of money and a lot of time.[00:15:00]

[00:15:00] Don Rosenthal: That makes, um, that makes a lot of sense. So the alternate approach to building your own LLM, um, is the closest alternate is fine tuning your own. And matter of fact, DALI and, and, um, Dialpad GPD seems to be fine tuned large language models. What are your thoughts on fine tuning? Great. Another really, really important question.

[00:15:23] Don Rosenthal: Um, first off, it's a, it's a term that's used very imprecisely these days. So think of fine tuning in terms of transfer learning. If you have trained a system, how to process English, it should be easier to train it, to process a second related language. You don't have to start from scratch. If you've trained a model, not allowed to language model necessary, but if you train a model to play one card game, It already understands what face cards are, what suits are, etc.

[00:15:51] Don Rosenthal: It's easier to train it on, um, a second game. But, um, if there's a second thing that I, that I hope you take away [00:16:00] from this conversation, it should be to, uh, uh, try to avoid fine tuning as well. Uh, the process of fine tuning is exactly the same process as that of the initial training. But at a somewhat smaller scale, you're retraining a previously trained model.

[00:16:17] Don Rosenthal: Um, at a smaller scale, yes, but when your initial size is enormous, this doesn't necessarily get you out of the problem of needing a lot of time. And money, um, remember that what fine tuning actually creates is a new variant, a new variation of the original next word, but it's basically a continuation of the, of the model training.

[00:16:41] Don Rosenthal: And in addition, there's some potential downsides, such as there's a thing called catastrophic forgetting, which we've known about in deep learning since the early days. And this is where the changes made in the weights of a neural network, stable neural network that's been trained when you teach it, [00:17:00] when you train it on a new task, can affect its ability to perform on tasks it was previously trained on.

[00:17:09] Don Rosenthal: But I also want to underscore this is not the city that fine tuning. is a bad, uh, idea for every situation. For example, a really reasonable, uh, use case for fine tuning an LLM is if you'll be operating in a domain where there is a very specialized vocabulary and a vocabulary that's reasonably stable, such as, uh, medicine, uh, or pharmacy.

[00:17:36] Don Rosenthal: So for many practical reasons, A lot of the companies, especially small to medium sized companies, are now going to build their own LLM and fine tune their own LLM. So what are some other mechanisms available? for these product teams to incorporate LLM features into their products. Great, great, great, great.

[00:17:56] Don Rosenthal: So, um, many of the capabilities that you might think [00:18:00] require fine tuning, uh, can be addressed by using other techniques, um, in systems. external to the LLM. So instead of trying to teach one monolithic model to do everything, um, connected to models, uh, out or, or other systems, even conventional systems, external to the LLMs.

[00:18:24] Don Rosenthal: One of those, which is getting a lot of attention these days is something called retrieval augmented, uh, generation. And. I'm happy to go into that particular example. If that makes sense. Yeah, that would make sense. Great. Great. Um, so, uh, let's work through an example. Um, let's say you're a law firm and you've got a huge amount of unstructured data.

[00:18:47] Don Rosenthal: You've got All of the contracts, all the litigation, all the presentations, you've got emails, you've got slack conversations, all of that, that have been generated over the life of the company. Um, [00:19:00] and you can't put these in a, an SQL database. They're, they're unstructured. So you'd like to be able to refer back to these specific documents that they're really useful.

[00:19:09] Don Rosenthal: Um, but you'd like to be able to do it without having to manually sort through. Mountains of documents, either physical documents or documents stored in the cloud, you want to be able to automate the retrieval of the appropriate documents in response to a query. So, um, let's say at first you think, ah, I'm going to fine tune my LLM on all of these documents, and that will eventually encode all of this information that all the information that's in the documents into the weights Of your neural net and the weights of your LLM, and that would allow you to use one single end to end model to solve the problem.

[00:19:45] Don Rosenthal: An interesting idea, but there are a few issues. Money and time, like we talked about before. Uh, hallucinations, which we can talk about later. But since you're working with one monolithic model, these are hallucinations that can't be caught [00:20:00] prior to sending the reply to the user. But most importantly, um, the thing that's really gonna, uh, hang you up here is that you'll need to constantly continue to fine tune your LLM as new documents are generated.

[00:20:16] Don Rosenthal: Because if the information is not encoded into the weights of the network, it has no idea that it never exists. I mean, I'm sure those of you who have used, uh, I've played around with, with, uh, chatbots and stuff. Now, um, you'll ask it a question about, uh, uh, you know, what's the, what was the score in the last, uh, Niners game?

[00:20:37] Don Rosenthal: It'll have no idea and it'll answer something like, you know, my training ended in. September of 21. I can't tell you anything about that that information. So, um, this particular use case, um, um, is really well suited for retrieval of automated augmented generation. Um, [00:21:00] the way this works. And again, I'll start at a high level.

[00:21:04] Don Rosenthal: We can get into more details. You feed your documents into an embedding engine, um, and there are plenty out there. Uh, there's good ones that are even open source like Pinecone. That encodes the semantic information of each document into an embedding, sometimes called a vector, which can then be stored in a vector database.

[00:21:24] Don Rosenthal: And these vectors incorporate, incorporate learned semantic features. Which enabled the ability to compare the semantics similar similarities of two documents with a simple arithmetic comparison. You remember back when we were talking about, uh, attention and you were comparing, um, semantic correlations between two words in a sentence.

[00:21:48] Don Rosenthal: This is, uh, this is something something akin to that. It's not exactly the same thing, but you can think of it in in the same way. Um, so you can ask, for example, uh, what VC or [00:22:00] startup contracts have we written for seed round funding using cryptocurrency and which currencies were used in each of them. And, um, you give that query to the LLM, it analysis, it analyzes it, uh, to find the important components in it.

[00:22:16] Don Rosenthal: funding, contracts, cryptocurrency, etc. It won't be just keywords, but, um, let's, let's just give you an idea of what it's talking about. And it, um, it generates an embedding of the analysis of the query with the same Embedding generator that's been used on your documents by, um, using one of these, um, uh, um, systems that allow you to build pipelines, uh, of, of systems.

[00:22:47] Don Rosenthal: Things can, an LLM can call an external system, like the embedding generator. And then once you've got the embedding, you can feed it into the vector database. And then that gets compared with the encoded query. [00:23:00] Uh, and it finds the documents that are. Uh, related to that query semantically, and that retrieval system grabs the those documents, the ones that are most closely related to the query, it sends that back to the LLM, which generates a response to the user and maybe attaches the docs or the links to them.

[00:23:23] Don Rosenthal: And. Important point here. If tomorrow a new funding document for crypto seed rounds is generated instead of having to re fine tune the model with a new document, you just feed the new document into the embedding generator, store it in the vector database, and you're, you're completely, uh, up to date. And as I said, there are pipeline, Pipeline platforms like lang chain that make it easy to connect together the external systems, uh, and the LLM.

[00:23:56] Don Rosenthal: I went through that kind of, um, quickly, but I hope that that was at least a [00:24:00] good introduction, uh, to using external systems, uh, as opposed to fine tuning.

[00:24:06] Himakara Pieris: Yeah, absolutely. Um, so the gist of that is using, um, written work meditation, you have the capability to tap into data that's, that's not in the weights.

[00:24:18] Himakara Pieris: Of the model. And this is a great way to tap into your internal systems or other parts of the ecosystem within your application and open up capabilities to your users. And it's a much easier, much faster thing to do compared to fine tuning, etc. And also, it's a way to keep your information current without having to fine tune every time.

[00:24:38] Don Rosenthal: Exactly, exactly. And a rule of thumb is remember what LLMs are good for. They're really good at understanding the query of the user. And they're really good at, um, generating highly new, nuanced, natural sounding language. All the other stuff in between, see if you can ship that out to external systems like, uh, SQL databases, vector [00:25:00] databases, math lab, uh, you know, uh, search engines, whatever, whatever is needed in your particular use case.

[00:25:08] Himakara Pieris: Great. So we talked about what LLMs are, what's inside an LLM. ways to incorporate LLM capabilities in your product. Let's talk about risks and challenges because if you are to go and pitch using these capabilities in your product, I think it's very important to understand what you're getting yourself into, right?

[00:25:29] Himakara Pieris: Exactly. Let's talk a bit about what kind of risks are there, what kind of challenges are there, and how you can plan on mitigating them as well.

[00:25:36] Don Rosenthal: Okay, so this is my personal analysis. Your mileage may vary, but there's a lot of other folks in the industry, um, that think along the same lines. Um, and, um, there are, in my mind, there are four key challenges.

[00:25:53] Don Rosenthal: Uh, for those of us who want to incorporate LLMs into their products, um, um, and as Hema [00:26:00] said at the beginning, this is an amazing time to be an AI PM because you can really quickly and really easily prototype systems, uh, that demonstrate their potential use, um, to your end users, to your company, et cetera.

[00:26:14] Don Rosenthal: But the real challenge is the next step after prototyping. getting them ready for real world use and then scaling. So let me just start by listing these top four challenges. The first one is factuality and groundedness, um, sometimes called hallucination. Second is the high cost of inference servability.

[00:26:37] Don Rosenthal: It costs a lot to train these models. It also costs a lot to run data through them and get an answer. Um, uh, the third one is implementing guardrails against inappropriate content, which we sometimes refer to as content moderation. And the fourth one, um, is, is that this, these systems [00:27:00] represent. a major shift in product development, uh, from a series of short, plannable, progressive steps to loops of experimental, uh, experimentation, trial and error, et cetera, because these systems are non deterministic.

[00:27:17] Don Rosenthal: Um, and it's critically important to be aware of these issues and go into a project. With your eyes open, um, and have buy-in from all, uh, of the stakeholders because these challenges are very likely to manifest as, uh, increased costs, uh, and increased time to market compared to what your stakeholders are used to.

[00:27:40] Don Rosenthal: So make sure these are discussed and that you derive the consensus from the, from the very beginning. So if it's okay, I'm going to start with the brave new world of non deterministic systems because folks are likely likely familiar with the other three. Uh, but this one is the least talked [00:28:00] about and the least acknowledged, and maybe the first time some folks have, have heard, I've heard it, uh, brought up.

[00:28:08] Don Rosenthal: Um, so, um, uh, so the first thing to come to grips with is that, um, productizing large language models does not neatly fit into our, our tried and true software engineering practices. Uh, LLMs are non deterministic. Instead of being able to predict the expected output for given out input, um, uh, uh, and even, um, you know, the, the, the exact same inputs can even produce, uh, different outputs.

[00:28:42] Don Rosenthal: Uh, you, you, um, this is something that we're not, we're not used to in developing products. And, um, in addition to their, the outputs not being predictable, the evaluation of these systems is more art than science. So, Um, [00:29:00] there are academic benchmarks, and the academic benchmarks work really well for academic pace papers where you're trying to prove progress against the last published state of the art architecture.

[00:29:12] Don Rosenthal: We beat this by, you know, X amount of X percent, but these academic benchmarks don't add very much for Evaluating user experience, um, which is something that's critical for us as product managers. Um, and evaluating the user experience these days, um, typically involves subjective human rating, which is lengthy and expensive.

[00:29:39] Don Rosenthal: So it can be hard to one measure progress on your project and also to objectively. Define success. Um, again, that was kind of high level. Let me go into a little bit more detail. Um, we're used to being able to, uh, as product managers uncover a significant business [00:30:00] problem, generate the requirements, then developers code up the solution and Q.

[00:30:05] Don Rosenthal: A. Test it to uncover any bugs. Developers Fix the bugs, you QA again, rinse and repeat, and finally you get to a point where for any given input, you can predict what the output will be, and if the output isn't that, it's a bug, um, and you can generate regression tests to ensure that any new changes, uh, haven't broken old capabilities, and the entire modern discipline of software engineering Is built around this type of sequence, but with deep neural nets in general, um, and especially with the enormous models like LLMs, this goes almost completely out of the window.

[00:30:48] Don Rosenthal: Um, so let's look at a completely new environment in the lab where, um, the LLMs are being developed research. And again, this is true for any deep network, but let's stick with these [00:31:00] really, really enormous ones. Researchers laboriously train these models over and over and over again. Each time of the random starting point and they keep looking for generating one that they That they like depending on what particular metrics they're using at that point You start doing experiments and it's really Empirical it's like a biologist With a finding a discovering a new organism, you poke it, you prod it to see what it does, and it doesn't do and how it behaves.

[00:31:34] Don Rosenthal: Um, I'm not exaggerating here. This is really the way you get to know your new model. And in addition, you are likely to discover, um, what are called emergent behaviors. These are capabilities that through experimentation, you discover even though you never explicitly trained the model to do that. Okay, then [00:32:00] you step back and assess what you've got and see how useful it is for the things you hoped it would be useful for and specifying what on effect unexpected things it could be good for that you hadn't anticipated.

[00:32:14] Don Rosenthal: And there's no debugging in the classic sense, uh, because there's no human readable code, right? This is all machine learning. This is all, um, uh, generated, uh, on its own. And, uh, the deep dark secrets are hidden in the weights and biases of the model. It's all about things that the system is, has learned from analyzing, uh, the training data.

[00:32:39] Don Rosenthal: Um, and forget regression testing because at least in the classic sense. Because these models are not not deterministic for any. Input. You cannot predict the output. So the best you could do is evaluate if the output is reasonable, and that takes a lot of time and more [00:33:00] money than the stakeholders might have expected.

[00:33:03] Don Rosenthal: So even though you're hopefully not the ones building these models from scratch, when you're working on the addition of any LLM to a product, or even just evaluating which one out of all the choices we have these days will be the best for your application, These are likely the things that you have to be prepared to deal with, and I want to make a really important important point.

[00:33:25] Don Rosenthal: Now, after painting that, uh, picture of doom and gloom, please don't misunderstand. I'm not trying to scare you away from using this technology. You can build. Really, really important, impressive products with them. There will likely be very few opportunities in your professional lifetime, uh, to apply such transformational technologies.

[00:33:48] Don Rosenthal: Uh, thank you again for the, for the pun, um, but

[00:33:52] Don Rosenthal: please, please, please do go out and do come up with unique and exciting, uh, important new applications, build stuff [00:34:00] that solves important problems we couldn't even try to address previously. I just want you to be sure that, uh, you're going into this with your eyes open and that you've prepared your stakeholders properly.

[00:34:13] Don Rosenthal: There, um, there are a lot of successful applications that have been built with these LLMs and a lot of the pioneers have discovered all the pitfalls and where all the dragons hide so that we can We can avoid them.

[00:34:31] Himakara Pieris: Very good. I think it's it seems to be also a function of picking the use case as well. Correct. You know, you have in some use cases, you have human surprises who could be built into the loop and you know, those would be the low hanging fruit. Could you talk a bit about how to think about picking the right use cases as well?

[00:34:50] Don Rosenthal: Yeah, that's a really important, important point. So all of the low hanging fruit, um, yeah. Uh, the things that you could build just with one single [00:35:00] monolithic, uh, uh, LLM, those have all been been gobbled up, um, and we're looking at at more, uh, uh, we're looking at more complicated problems now and more interesting problems, more valuable problems to solve.

[00:35:15] Don Rosenthal: Um, and one good way to evaluate the use case that you're looking at as was just mentioned is if your use case. Already has, um, built in human, uh, supervision or evaluation or, uh, for example, why was, why was copywriting one of the first, um, one of the first, uh, low hanging fruit use cases that was gobbled up?

[00:35:42] Don Rosenthal: Because built into the workflow of copywriting is a human editor who always reviews, corrects, sends back to the junior copywriter. Um, uh, you know, all of the copy that's been generated. [00:36:00] Before it goes out for publication, um, uh, the, um, uh, the senior editor comes up with a, with something that needs to be written, sends it off to a copywriter, the copywriter comes up with a version, the editor marks it up, sends it back, they iterate, eventually they get one that is, uh, yeah, that passes, uh, muster for the, uh, The editor, and, um, it goes to, to publication.

[00:36:29] Don Rosenthal: Um, if you think in terms of that, you'll understand that, okay, you don't need to get rid of the junior copywriters. They can use these LLMs to help them, but if you have an application where there is something Um, like a senior editor already in the workflow. Um, this is a really good, uh, application to start with.

[00:36:54] Don Rosenthal: You've got somebody there that can check for hallucinations, for inappropriate, uh, [00:37:00] uh, content. Et cetera, et cetera. Great. So you are a product leader with decades of experience in AI. Um, in addition to picking the right use case, what other advice would you have for PMs who are interested in getting into AI?

[00:37:16] Don Rosenthal: Um, well, uh, This may not be for for everybody, but I'm I'm a PM that really likes to get his hands dirty. I like to stay really technical. I don't have to be good enough to code these things, but I want to understand in some depth the, um, the technical details. That's important for a couple of reasons. Um, one of them is Uh, you want to be able to, um, properly represent what can be built in this to that can be built with this technology to your end users.

[00:37:51] Don Rosenthal: So you're not just making up stuff and you want to be able to have discussions with your technical teams. Um, in a way that makes [00:38:00] sense for them. It doesn't waste their time when you get new ideas. Uh, Hey, we might be able to use it. Uh, use LLMs for this particular application. Um, here's some more details.

[00:38:13] Don Rosenthal: Uh, how does it Give it a sanity test to check. Um, uh, does this make sense? Is the technology here already? Is there anything that we need actually, um, uh, fundamental research? Are there any new ideas that we have to develop in order to make this possible? Or is this something that with, uh, with some hard Working and good, good, uh, attention to detail we can we can build great.

[00:38:43] Himakara Pieris: I want to transition and take some questions from our audience. So there is a question about. Um, LLMs in healthcare domain specifically, are there any specific use cases in healthcare where LLMs could be used effectively, especially in healthcare diagnostics? [00:39:00] I love this question. Uh, my second company was in, uh, in, in healthcare.

[00:39:06] Don Rosenthal: Um, and yes, absolutely. Um, uh, you know, um, medicine is fundamentally a, uh, a multimodal field. You have images, you have Um, uh, you have, uh, text, et cetera. Let's stay on the text for, um, the certain for the time being. So, for example, um, we've we've seen an example that I that I discussed already that when you have a lot of unstructured text data.

[00:39:38] Don Rosenthal: This is where things like LLMs and the stuff of processors shine. Do we have that in, uh, in, in medicine? Um, absolutely. Uh, every patient's chart is now in an ERM, an electronic, uh, EMR, electronic medical records system. Most of [00:40:00] the good information has been written into the comments, um, and there's no way to automatically retrieve it.

[00:40:07] Don Rosenthal: Just like, um, there was no way to automatically retrieve, uh, the legal documents. If it's possible to get access to that and store that in, um, a vector database. Um, so that you could when you're working with a particular patient on a particular problem, you could retrieve that information for them. Um, that would be really, really useful because a lot of that information is just stuck in some text field in some database and unusable.

[00:40:39] Don Rosenthal: Uh, not accessible to the to the medical medical staff and the clinical staff. Um, it's that's also would be extremely useful if you're doing longitudinal, longitudinal studies. Um, or if you've got, uh, um, a rash of new cases that all seem to be, uh, [00:41:00] related somehow, but you don't know what is the common factor for it.

[00:41:04] Don Rosenthal: You can get the data, uh, if you have, um, uh, uh, uh, stored in embeddings, um, all of this, uh, comment data, uh, for each of the patients, you can search through that and try and, uh, you know, some common threads. And again, this is all, uh, text data. This is all language data. So this is a really good application for it.

[00:41:30] Don Rosenthal: If you're talking about specifically, um, uh, helping, uh, to diagnose based on x rays, there are good systems in computer vision that can do that. And they generate data, but they may be a bit difficult to use. And it would be really nice if they had, uh, conversational interfaces with them. So that the radiologists could talk to their data or could talk about the data in the same way they talk to their colleagues about it, instead of [00:42:00] having to learn SQL or or come up with, uh, you know, understand the limits of some other, uh, interface they have for retrieving, uh, this data.

[00:42:09] Don Rosenthal: Um, that's a really short answer. I would love to, uh, talk more about this if you're interested. But yes, I think, uh, the medical field is a really, really good opportunity for LLMs.

[00:42:24] Himakara Pieris: And we have another great question about QA. So since there wouldn't be any deterministic output, does it mean there would be no QA involved with LLMs?

[00:42:34] Don Rosenthal: Well, I, I hope not. Um, what it means is that it's gotta be a different type of QA. Uh, we've got, uh, people that are trained up and are really amazing. Um, uh, uh, For, um, products, software products that that were were, uh, that go through Q. A. Today. Um, they may or may not be the people that, um, we would use for this type of Q.

[00:42:58] Don Rosenthal: A. But remember, there are [00:43:00] always going to be systems outside of the L. L. M. S. That will be coded up by your, uh, by your, um, uh, colleagues in whatever, in C and Python, whatever, uh, that will still have to go through standard QA, um, the, um, the problem, it's, it's still open questions about the right ways to do QA.

[00:43:24] Don Rosenthal: For the output of L. L. N. S. Uh, open a I has, uh, uh, been using, uh, reinforcement learning from human feedback. Um, it's been around for a while, but they're showing how valuable it is. But that is basically getting a lot of human Raiders. reading through the outputs and saying this one's good, this one's bad, or, or, or, um, marking them, uh, grading them on a particular scale, then that information goes back to reinforce the training of the model, um, to try and, and have it, um, tend [00:44:00] more toward the, uh, the types of, uh, outputs that, um, that got good, good scores.

[00:44:12] Himakara Pieris: We have another question here about, um, data sourcing. Um, what are the best ways to get data to tune LLMs for SMS phishing attacks into that one? But, uh, it's actually, I think the larger question of how to source data for LLMs is, I think, particularly an interesting one. So, um, the. Go on a short side trip.

[00:44:36] Don Rosenthal: This is a really great question. I don't know enough about that field to really give you a hard and fast answer. But, um, the interesting thing about working with data for these systems. Is that unlike in discriminative models, like for computer vision classification, for example, where you have to generate a bunch of data, you have to clean it up.

[00:44:58] Don Rosenthal: Then you have to [00:45:00] label it. So the, um, the, the model can learn whether it's doing a good job classifying it or not, uh, language data for LLMs, uh, is self supervised. What does that mean? Um, you grab, uh, data. Um, you grab text data from the Internet, from Wikipedia, whatever. And you give it to it, uh, one, one word at a time, for example.

[00:45:28] Don Rosenthal: Um, uh, you give it the first word in a sentence, and it's trying to, going to try and predict what the second one is. It's not going to do a very good job, but that's okay. You go on to the next one. Um, what's the, what's the third word? What's the fourth word? And little by little. After, um, an astonishing amount of training.

[00:45:48] Don Rosenthal: It's able to home in on the connections between words, even if those words are not ones that are right next to it, and make a good prediction about a reasonable, um, [00:46:00] a reasonable, uh, um, prediction for the next word. You can also do this for full sentences, give it a paragraph with a sentence missing, try and fill in the, the, the, um, the sentence, et cetera, et cetera.

[00:46:14] Don Rosenthal: You still need a lot of data. Um, and you still need to clean that data, but there are really good sources for it. And you don't have to spend the time and money to label it. So, uh, if you're in the medical field, for example, um, you can, if you get the right to use them, you can go, you can use medical texts, um, to, uh, fine tune.

[00:46:41] Don Rosenthal: Uh, for the particular vocabulary, uh, for medicine and use it by, um, in the, in the cell, in the self supervised, uh, uh, ways of, uh, training.

[00:46:55] Himakara Pieris: Great. Uh, we are a few minutes over, so let's do like one final question before we [00:47:00] wrap things up. Um, so the last question is the biggest challenge for me to advocate for AI in our product is quality, more specifically uncertainty about.

[00:47:08] Himakara Pieris: Um, quality of outputs. What are some ways to address quality and predictability concerns with LLM?

[00:47:16] Don Rosenthal: Okay. Um, if I understand the question correctly, it's, it's a really good question. We know we have these, these problems. Um, we know we have, uh, we want to solve this really important problem. How do we get it to how we get it to market today?

[00:47:32] Don Rosenthal: Today? It's a lot of manual effort. Um, uh, you can, um, find, uh, workflows where there are people in the workflow to prevent, uh, inappropriate content from getting out and such, or to catch hallucinations. You'd like to be able to, uh, generate, um, text where you don't have to worry about that. That's currently beyond the state of the art, [00:48:00] but there's a ton of really good research going on.

[00:48:03] Don Rosenthal: Um, for example, uh, just yesterday, um, from, uh, DeepMind, um, there was a paper about, um, a, uh, a new model. That could be used to evaluate the quality of language generated from an LLM, and it showed that in many cases, not all cases yet, but in many cases, it was as it generated, uh, its evaluations were as good or better.

[00:48:34] Don Rosenthal: Then what people came up with, um, and, um, haven't read the whole paper. So don't ask me how they how they decided that if you're doing human evaluation, who evaluates whether it's better than human evaluation. I'll read the paper. I'll give you an answer. Um, but, um, for now, it's a lot of manual work.

[00:48:53] Don Rosenthal: There's a lot of really, really important research being done. Keep your fingers crossed. Going forward, there'll either be new [00:49:00] architectures or new models that'll help us get out of this manual mode. Great question, though.

[00:49:05] Himakara Pieris: Thank you, Don. Thank you so much for coming on the pod today.

[00:49:08] Himakara Pieris: We'll share a recording of the video and also links to the things that Don mentioned in this. And a big thank you to also everyone who signed up and shared this with the network. If you found this interesting, please go to www. smartproducts. show to listen to other episodes as well. We'll publish this one there.

[00:49:28] Himakara Pieris: Um, there as well. It's available on Apple Podcasts, Google, Spotify, and wherever you listen to your podcasts. Don, thanks again, and uh, thank you everyone.

[00:49:38] (Outro)

[00:49:38]

[00:49:41] Himakara Pieris: Smart products is brought to you by hydra.ai. Hydra helps product teams explore how they can introduce AI powered features to their products and deliver unique customer value. Learn more at www.hydra.ai.

15 odcinków

#Tech #Entrepreneur #Business #Himakara Pieris #Artificial Intelligence #Machine Learning #Product Management

Large Language Models with Don Rosenthal

Smart Products

published 12M ago

Udostępnij