In this episode, we sit down with Santu Karmaker and Dongji Feng, creators of the TELeR Taxonomy. This pivotal framework is revolutionizing the way computer scientists benchmark complex tasks in Large Language Models (LLMs).
Our conversation goes beyond the academic realm as we explore how the TELeR Taxonomy is not just a theoretical construct but a practical tool. I’ve personally leveraged this taxonomy to significantly enhance the prompts I’m crafting. Whether you’re a researcher, developer, or just LLM-curious, this episode promises to offer valuable perspectives and strategies.
As many of you know, I’ve frequently referenced the TELeR Taxonomy in past discussions. Now, I’m thrilled to offer you a firsthand account from the creators themselves. Prepare to be enlightened by their journey and vision for the future of LLMs.
Access the TELeR Taxonomy paper here: https://arxiv.org/abs/2305.11430
Santu Karmaker (LinkedIn): https://www.linkedin.com/in/shubhra-kanti-karmaker-676893a4/
Dongji Feng (LinkedIn): https://www.linkedin.com/in/dongjifeng/
Santu Karmaker (website): https://karmake2.github.io/
Dongji Feng (website): https://dzf0023.github.io/
Santu Karmaker (Twitter): https://twitter.com/karmake2
[00:00:00] Sean Ammirati: In today’s episode of Agile Giants, I am thrilled to have the two brilliant minds behind a paper that, if you’ve done any Gen AI workshops with me, you’ve actually heard me talk about: “TELeR: A General Taxonomy of LLM Prompts for Benchmarking Complex Tasks.” Joining us today are Santu and Dongji, both esteemed researchers with a wealth of experience in the field of computer science.
[00:00:34] Sean Ammirati:
Santu is an assistant professor at Auburn, with a keen interest in the intersection of natural language processing and information retrieval. He also runs the BDI lab there. Alongside Santu, Dongji is an assistant professor at Gustavus Adolphus College, and he completed his Ph.D. in the BDI lab under Santu’s guidance.
[00:00:55] Sean Ammirati:
The paper we’re going to talk about introduces the taxonomy. Again, it’s a framework designed to tackle the challenges that come with benchmarking complex tasks with LLMs. But I’ve also encouraged a lot of people who do GenAI workshops with me to think about this as a way to just think about the inputs that they’re giving to these LLMs.
[00:01:20] Sean Ammirati:
Have they actually given the GenAI systems enough information to give them good responses back? It’s been a really helpful framework for them. So, really excited for you to meet the minds behind this paper. I really hope you enjoy this week’s conversation on Agile Giants.
[00:01:40] Sean Ammirati:
So Santu, Dongji, thanks so much for joining me today. I’m going to ask each of you questions, but feel free to choose which of you kind of fields first and which of you kind of jumps in. But I really want this to be a conversation. I’m so excited to have you today. Maybe as a way to get started, if you could give a brief overview of the paper itself and some of the main objectives behind it.
[00:02:02] Santu Karmaker:
Thanks, Sean. I’ll start. So the goal of this paper is to enable more systematic comparison across large language models when we study them in the context of complex tasks. So what I really mean is: the definition that we propose in the paper for a complex task is that it has to be a combination of multiple subtasks.
[00:02:32] Santu Karmaker:
Now, there is some subjectivity to this definition, but the point is, for example, if you are just doing sentiment classification — whether there is a positive or negative sentiment in a particular statement — you could argue that this is a simple task by saying, “Oh, I’m just looking for one label.” But one could also argue, “This is a complex task, given that you first need to look at what entities are associated with a particular sentence and what kind of actions are being taken.” You need to look at all those cases, and you could say, “Based on this, if this happens, then I want this to be classified as that.” So the definition of a complex task is, to some degree, subjective.
[00:03:32] Santu Karmaker:
But what we assume for the definition of a complex task is basically that you have multiple subtasks as part of your task definition, and you are using large language models, which are essentially really powerful text generation tools these days. But then comes this question: because a complex task can be expressed in many, many different ways, how can we define a taxonomy that people in general can follow — a general guideline for referring to the particular kinds of prompts or tasks you are defining and giving to large language models — which will allow better-informed and fair comparison across large language models?
[00:04:07] Santu Karmaker:
So that’s the goal, which both Dongji and I saw as an opportunity, because large language models are so popular and there was not yet a well-defined taxonomy for these kinds of studies. So that’s the motivation.
[00:04:23] Dongji Feng:
I completely agree with Dr. Santu. We have seen large language models achieve huge success in different natural language processing tasks, like text summarization and text generation, in their traditional settings.
[00:04:36] Dongji Feng:
But after checking the literature, we didn’t find a benchmark for instructing large language models on the ill-defined complex tasks that Dr. Santu just introduced with our definition of a complex task. So to address this issue, we propose this taxonomy that can be used to design prompts.
[00:05:01] Dongji Feng:
And this taxonomy can also help NLP researchers study and report their performance with different large language models and draw more accurate conclusions. So that is our motivation.
[00:05:15] Sean Ammirati:
Given the audience here, I think they’re going to be more familiar and more comfortable thinking about it in that second bucket, right? How do I think about this complex task that I’m trying to accomplish? And just to take the air out of the balloon: for the people who are listening to this and trying to use LLMs for complex professional tasks, this is a good framework for all of the work that you as a knowledge worker are thinking about collaborating with a large language model to achieve.
[00:05:51] Sean Ammirati:
But just to frame the benchmarking part of this for a minute, and to help make this real: as we’re recording, just this week Google came out with a new LLM. And people get really excited about it. And it’s like, well, how do I compare the Google LLM versus the OpenAI LLM versus Facebook’s open-source LLM?
[00:06:12] Sean Ammirati:
Right. And this provides a way to compare the interaction across those different LLMs in a benchmark. So that’s the research side, if that’s fair — and you’re nodding your head, so I’m going to go ahead and assume that’s a fair way to think about it. I think the thing that’s interesting to me is this paper has been really well received because it helps professionals and it helps knowledge workers.
[00:06:41] Sean Ammirati:
I’m curious, how did the idea come up to start working on this taxonomy? I don’t know which one of you is better to start this, but I’d be curious about the origin story.
[00:06:52] Santu Karmaker:
Yeah, so I can definitely start on this because I clearly remember the day when we said, like, okay, we want to do something like this.
[00:07:01] Santu Karmaker:
So here is a task that we wanted to do, and it was related to understanding news media bias and the way they report an event. Real-world events are reported by humans, right? And we have the very popular news media across the US, and they have different political associations.
[00:07:22] Santu Karmaker:
So we were trying to analyze the way different news media with different political biases actually report a particular event. So now you can imagine a particular event happening, and Fox News is reporting that news, and CNN is also reporting that news. And you can easily see, the facts will be there, but then they will have their own interpretation — their own set of preferences put into the story, right?
[00:07:50] Santu Karmaker:
So what we wanted the LLM to do is compare these two narratives, and see what is common between them to better understand what the related facts are, right? Because that is what they are both reporting. And then it’s possible that there is some bias or preference or expression style that CNN is using that Fox News is not, right?
[00:08:11] Santu Karmaker:
As you see, this is a complex task — it makes sense as a complex task because now we are asking the LLM to do multiple things based on these two narratives. We want a multiperspective summary of the two narratives, but we have specific subtasks that we want the LLM to do.
[00:08:33] Santu Karmaker:
For example, we want to find out what common information is present in both CNN and Fox News. We want to see what unique information or perspective is put in by each of Fox and CNN, right? And we also want to see if anything is conflicting. When people are reporting, they sometimes present information in different ways that seem to conflict with each other.
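The subtasks Santu is enumerating can be sketched as one structured prompt. This is only an illustration in Python — the wording and the `build_comparison_prompt` helper are ours, not the paper’s actual prompts:

```python
def build_comparison_prompt(narrative_a: str, narrative_b: str) -> str:
    """Compose a single multi-subtask prompt asking an LLM to compare
    two news narratives along the three constraints described above."""
    return (
        "Compare the two news narratives below and write a summary that:\n"
        "1. lists the information common to both narratives,\n"
        "2. lists the unique information or perspective each one adds,\n"
        "3. lists any points where the narratives conflict.\n\n"
        f"Narrative A:\n{narrative_a}\n\n"
        f"Narrative B:\n{narrative_b}\n"
    )

prompt = build_comparison_prompt("Outlet A's report of the event...",
                                 "Outlet B's report of the event...")
```

A single-turn prompt like this packs all three subtasks into one shot; the multi-turn alternative discussed later in the conversation would issue them one at a time.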
[00:09:01] Santu Karmaker:
Now, we want LLMs to look at these two narratives and generate a summary covering these three different constraints: the common information, the unique information, and the conflicting parts of that information, right? And then we said, okay, let’s just try out the popular LLMs.
[00:09:21] Santu Karmaker:
So there was GPT, of course, and then there was Google Bard. And then there was the BLOOM model. There was Llama from Facebook. And we said, okay, we want to do this study systematically. And we immediately realized that the same prompt does not work the same way for different LLMs.
[00:09:44] Santu Karmaker:
Now, maybe prompt A works better for ChatGPT, but prompt B works better for BLOOM, right? And then how do I make a comparison? It’s not fair that we give a prompt that is not optimized for BLOOM but is optimized for ChatGPT, and then say, okay, ChatGPT is better, right? So, in order to establish a fair benchmark and a systematic study — as academicians, that’s what we do, right?
[00:10:12] Santu Karmaker:
There was no taxonomy present for, okay, what are the many different ways you could prompt? Because we have so many different complex tasks, and we could prompt each complex task in so many different ways. And the LLM’s output is a little different every time. Now, how do I establish a common benchmark that would make sense?
[00:10:36] Santu Karmaker:
Rather, one that doesn’t require me to go through the entire set of prompts, which is a very time-consuming process, right? And then often the data from a study is not public, and the prompts that were used are not reported, right? So how do I establish a common benchmark that still gives us some idea of the level of detail and the properties of a prompt that was fed to multiple LLMs?
[00:11:03] Santu Karmaker:
And that’s where we said, okay, we don’t find anything in the literature. That’s bad news, but it’s good news as well, because now we can propose one and claim: here is the first taxonomy of its kind, and it’s a very general one. It applies to complex tasks in general, but we could also think about extensions of this taxonomy to specific use cases.
[00:11:33] Santu Karmaker:
We have a very general framework, and that was the challenging part: how do I even come up with a general taxonomy that can cover all LLMs and all kinds of prompting? Yep, that’s my side — but do you remember anything else from when we were jotting this down, Dongji?
[00:11:56] Dongji Feng:
This originally came from another project where we were focusing on designing different prompts to evaluate different large language models for that particular narrative summarization task. So initially I was just creating different prompts with different levels of detail.
[00:12:12] Dongji Feng:
Because the intuition is, if you put more details into the prompt, that should enhance the effectiveness, right? That was just iteration. The concept of the taxonomy was not considered at that point. So we raised a question: why don’t we categorize and cluster similar prompts together?
[00:12:38] Dongji Feng:
And if the prompts differ in their level of detail, can we really claim the comparison is fair? To give you an illustration, imagine you are riding a bike and I am driving a car. Because my car is faster than your bike, does that mean a car is better than a bike? Maybe a car is faster than a bike, but that is not a fair comparison, right? We want to compare the Porsche with, maybe, the Tesla, and compare one bike with a different bike — compare things on the same terms, right? So that is the intuition.
[00:13:15] Sean Ammirati:
Yeah, this is great.
[00:13:17] Sean Ammirati:
You’re actually getting to some of my questions already with your answers, which is awesome. But, Dongji, you started to go down this path already with level of detail, right? That’s one of the components of the taxonomy, but it’s probably helpful at this point to quickly walk through: what are the components of the TELeR taxonomy?
[00:13:39] Dongji Feng:
Basically, all the details are in the paper, but I can quickly go through them. This is a categorization of large language model prompts for complex tasks along the following four dimensions. The first one is the turn: basically, whether you interact with the large language model in only a single turn or in multiple turns. That is the first dimension.
[00:13:58] Dongji Feng:
The second one is the expression style. That is based on how you express the task: whether you are raising a question to the large language model, or you are giving a direct instruction to the large language model.
[00:14:20] Dongji Feng:
The third one is the role. That is based on whether you are providing a role for the large language model specifically. One interesting thing: recently, people found ChatGPT getting lazy, and some researchers were saying, “Oh, because it is December, ChatGPT is trying to act like students who are heading into winter break.”
[00:14:45] Dongji Feng:
So that is one effect of the role you give to ChatGPT or large language models. The last part is the level of detail: the degree of detail that we provide in the prompt. So that’s turn, expression, level of detail, and role for the TELeR taxonomy.
[00:15:10] Santu Karmaker:
That’s right. Yes. I’d like to add some examples for those four dimensions. The first one, what we call turn: a complex task has many, many subtasks and many requirements. We could mention all the subtasks in a single prompt, in a single shot, right?
[00:15:32] Santu Karmaker:
And give it to the LLM and hope that it will do all of it. Or we could do one subtask at a time, and this kind of resembles a chain-of-thought process, right? You give one subtask to a student, they go through it, and then you ask for something again.
[00:15:53] Santu Karmaker:
Whatever you have done so far also plays a part when the LLM is trying to give you a response, if you do it in a multi-turn fashion — because now there is a chain-of-thought process, and whatever your previous response was is now part of deciding what you are going to do next, which doesn’t happen with a single turn.
[00:16:11] Santu Karmaker:
Then the second part of our taxonomy is the expression style. Because we are using LLMs to do some task, we can ask them to do that task in primarily two different ways. One is an instruction, a command, right? So you can say, okay, “Please write a summary based on the following text,” right?
[00:16:37] Santu Karmaker:
Or you could ask it in a question style. It won’t always make a huge difference, but sometimes it does. You could ask, “Can you write a summary of the following text?” And based on the way you are asking, the LLM’s response will also be different, because they have been heavily tuned on instruction-tuning data, right?
[00:16:56] Santu Karmaker:
So the third dimension is the level of detail, and that’s where things get the most interesting, right? We have different levels, from 0 to 6 — seven levels of the taxonomy in the revised version — where you incrementally increase the complexity of the details of the prompt. There are specific level definitions; if you are more interested, you can go into the paper.
[00:17:30] Santu Karmaker:
And the fourth dimension is the role. It often helps. For example, in ChatGPT, when you are using their API, you can define a role. For example: maybe you are a medical assistant trying to serve some patients. In that case, ChatGPT will try to prioritize knowledge from the medical literature rather than social media, because you want to be more factual
[00:17:50] Santu Karmaker:
when you are dealing with a medical application, and so on, right? And maybe if you define ChatGPT to be a writer, then it’s fine for ChatGPT to be more, let’s say, imaginative, right? More crazy. And you can define those roles to set parameters — fine-tuned or empirically set inside ChatGPT — so that it behaves in many, many different ways.
[00:18:14] Santu Karmaker:
And that’s what gives you some control over how you want the large language models to behave. We understand this is not the end — this is just the beginning, and the taxonomy has to evolve — but we have to start somewhere that is general enough. And that’s where we have this taxonomy at this stage. But yeah, we are happy that people are already finding it useful.
[00:18:42] Santu Karmaker:
And then we can see many, many different extensions of this taxonomy to specific particular domains.
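As a rough sketch of how the level-of-detail dimension plays out, here is one task expressed at increasing levels. The example prompts and the level annotations are our paraphrase of the taxonomy, not text from the paper:

```python
# Hypothetical prompts for one task at increasing TELeR levels of detail.
# Level annotations paraphrase the taxonomy; higher levels (4-6) add
# evaluation criteria, requests for justification, and retrieved context.
levels = {
    0: "",  # level 0: no directive at all
    1: "Summarize the two articles.",  # level 1: one-sentence directive
    2: ("Summarize the two articles, covering their common facts, "
        "their unique perspectives, and any conflicts."),  # level 2: paragraph-style directive
    3: ("Summarize the two articles. "
        "1) List facts present in both. "
        "2) List perspectives unique to each. "
        "3) List conflicting claims."),  # level 3: subtasks spelled out as a list
}

for level, prompt in levels.items():
    print(f"Level {level}: {prompt!r}")
```

Sean’s advice below — if level one or two isn’t getting results, move down the taxonomy — amounts to stepping through a ladder like this.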
[00:18:49] Sean Ammirati:
So I think this is great. We will include a link in the show notes to the paper, and I would encourage all of you — a theme that’s been coming up all season is, when you have a job or something you need to accomplish, think about how you can augment some of that work by going to ChatGPT, Google, whatever your favorite LLM is. I would encourage you to actually take the level-of-detail diagram that’s in that paper, print it out, and just put it on your desk.
[00:19:22] Sean Ammirati:
Because it is a really helpful way to think about: what are the things that may be missing in this instruction that I’m giving the LLM, to make it better? There are these different levels of detail that you can go through. Level zero is basically no detail — you’re almost never going to do that. But you’ll often find, if I’m hovering at a level one or a level two, I’m not getting the results I want.
[00:19:42] Sean Ammirati:
Well, maybe some of these other things further down the taxonomy will be helpful. I think you did a really nice job there, giving some examples. And I think the conversation so far has made it pretty obvious why this is useful for researchers. I know the BDI lab, a lot like the work we do at CSL at Carnegie Mellon, does a lot of work with industry partners as well.
[00:20:07] Sean Ammirati:
I thought it’d be interesting to talk about why you think this taxonomy is valuable from your industry interactions, in addition to the research side that we’ve already spoken about. I don’t know which of you wants to take that, but how do you think industry should be thinking about the work that you’re doing?
[00:20:26] Santu Karmaker:
Yeah, I can take a shot at it. So if you think about the future workforce and the way AI is getting democratized — in just a couple of months, right? The progress is so rapid. And everybody is thinking, okay, given this speed of technology, what will the future workplace look like? And my take on that is basically that the future is definitely going to use a lot of generative AI.
[00:21:04] Santu Karmaker:
And it is definitely going to be some version of conversational AI, because that gives you the most flexibility.
[00:21:26] Santu Karmaker:
At the very beginning of computer science, the communication was like, okay, you have to encode everything in zeros and ones.
[00:22:00] Santu Karmaker:
Other than that, the computer doesn’t understand it. We have evolved from that. Then we had C programming. Then we evolved from that; we had Python, object-oriented programming, Java, and then we had SQL. We made more progress and went more natural over time. And now the movement is more towards complete natural language.
[00:22:22] Santu Karmaker:
LLMs are a big step towards that, but what natural language is optimal for what kind of prompts and what kind of tasks is still not settled, right?
[00:23:34] Dongji Feng:
I think there are another two benefits for individual users, such as engineers or researchers. The first one is that you can increase the reproducibility of your prompts. And the second is that you can improve communication when you are using a prompt and you want to explain it to other engineers. Reproducibility is very important, because if you just change your prompt a little bit, that will impact the performance of the large language model.
[00:25:01] Dongji Feng:
So our four defined dimensions are just like attributes. Let’s say you create a class in Python or Java; you put attributes into that class, right? If you change an attribute, that will change the behavior of the class. It’s the same thing here: if we highlight these different attributes, people will know that changing one of them will impact the performance — and that lets you evaluate performance in a better way and also explain it further.
[00:25:33] Dongji Feng:
So yeah, those are the benefits for individual users. And company-wide, it is better for teams to discuss with each other and reach a final conclusion about different prompts.
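Dongji’s class-attribute analogy can be made literal: a prompt’s place in the taxonomy is four attributes, and two runs are comparable only when those attributes match. A minimal sketch (the class and field names are our illustration, not from the paper):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TelerPrompt:
    """A prompt annotated with its four TELeR dimensions."""
    text: str
    turns: str       # "single" or "multi"
    expression: str  # "question" or "instruction"
    level: int       # level of detail
    role: bool       # whether a system role is specified

a = TelerPrompt("Summarize the text.", "single", "instruction", 1, False)
b = TelerPrompt("Summarize the text.", "single", "instruction", 2, False)

# Same text fed to two models is still not a like-for-like benchmark
# unless all four dimension attributes agree:
comparable = (a.turns, a.expression, a.level, a.role) == (b.turns, b.expression, b.level, b.role)
```

Reporting these four attributes alongside results is exactly the reproducibility benefit Dongji describes.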
[00:25:47] Sean Ammirati:
Yeah, that’s a good analogy there. I want to finish by talking about where you see the taxonomy going. But before we get to that, let’s step up a level and talk a little bit about the lab this originated within, right? So maybe you could talk just a minute about what the BDI lab is, and then some of the other projects you have going on there.
[00:26:12] Santu Karmaker:
All right. So if you think about the field of artificial intelligence and how it evolved — that has something to do with the naming.
[00:26:24] Santu Karmaker:
So, AI has been there for a long time. And traditionally, interestingly, the AI community was not the core computing community. Today you might think, oh, the AI community must be a sub-community within computer science. That’s not true: the artificial intelligence community used to be very distinguished and separate from the computer science community. But then something magical happened over time.
[00:26:56] Santu Karmaker:
People realized that in order to do intelligence faster, you need computers — and computing capability improved exponentially over time. With that, people could see that we could use this increasing computational power to build the powerful models that had been proposed by the community.
[00:27:18] Santu Karmaker:
But still, at that point, people were not really seeing the impact of AI, because we didn’t have the data we really needed to train all those AI models and see their power. Now, the kinds of techniques that you see today in ChatGPT, PaLM, Google Bard, Llama —
[00:27:42] Santu Karmaker:
it’s not like they’re proposing a completely novel method or architecture. It’s just the scale of the data and the models. So, the time I was forming the BDI lab was around 2020. At that point, there were many architectures already proposed by AI researchers, right?
[00:28:06] Santu Karmaker:
Neural networks, attention models, transformers — everything was already there. But the scaling-up you see in ChatGPT was not there yet. People knew that the more data we get, and the more massive the architectures become, the more we see an increase in performance. That trend was there. So people correlated big data
[00:28:31] Santu Karmaker:
with the amount of actionable intelligence you can get. So there is a difference between an AI model and actionable intelligence, right? AI models are great, and they have this learning capability. But in order to really have actionable intelligence that you can leverage and make decisions on, you need big data.
[00:28:50] Santu Karmaker:
And that’s why I tried to combine big data and intelligence into a single thing. That was my research agenda when I started my lab, and since then that’s what we have been trying to do.
[00:29:02] Sean Ammirati:
Just because we’ve been saying BDI the whole time — what does BDI stand for? I think this is a good point to actually give that.
[00:29:10] Santu Karmaker:
Yes. So BDI stands for Big Data Intelligence. It’s basically the actionable intelligence you can derive from big data.
[00:29:19] Sean Ammirati:
Okay. And so now give us a couple of these projects that you’re working on, right?
[00:29:26] Santu Karmaker:
So one of the projects I already mentioned to you: understanding multiperspective narratives, right?
[00:29:33] Santu Karmaker:
Because this is how the TELeR taxonomy actually started. And this multiperspective narrative understanding has been of interest to the DoD and the intelligence community, because they often deal with intelligence gathering, and there are multiple different versions of narratives coming from all over the world, from all different agents and all different sources, right?
[00:29:55] Santu Karmaker:
Then how do they make sense of that massive amount of information, which arrives within a very short amount of time, and convert it into actionable items? Because if you are a human and I give you, say, 100 years, you can probably read all of it and make sense of it.
[00:30:13] Santu Karmaker:
But then the problem is no longer relevant, because we need to take action in real time. So that’s where this multiperspective narrative understanding can help users see different perspectives of the same event, expressed in different narratives, and make sense of it.
[00:30:31] Santu Karmaker:
That is one of the major projects we are leading in our lab. The other big project we are running is what I call the Virtual Interactive Data Scientist — we call it VIDS. Think of it like this: let’s say you are talking to an LLM, right?
[00:30:52] Santu Karmaker:
Or you are talking to a more product-level assistant like Siri, Alexa, or Google Home. And you want to say, okay, hey Siri, what is the traffic going to be in the city of Atlanta in the next couple of hours? Based on that, you could try to predict which route you should take — because right now, if you look at Google Maps, they do some projections, right?
[00:31:19] Santu Karmaker:
But now think about your company. Let’s say you are an airline, and you have data on your flight delays and everything, right? You don’t want to share that with ChatGPT and give it to OpenAI if you are concerned that it contains sensitive data.
[00:31:42] Santu Karmaker:
You cannot just give it out. And then, even with today’s large language models, there is no way you can frame a data science problem without really knowing how Python works, or these machine learning platforms that people use for deep learning today, right? I’m pretty sure some of the audience listening here don’t know how to frame a TensorFlow pipeline, right?
[00:32:09] Santu Karmaker:
We don’t expect that — that’s a safe assumption. I want to make this distinction between AI enthusiasts and AI experts, because AI experts are building those models and know how they work. Enthusiasts
[00:32:26] Santu Karmaker:
don’t have to be experts there. I don’t need to teach them neural networks; they are experts in their own domains. But I want them to be able to benefit from this massive power of big data and AI. So how do we do that? Through a conversational agent. We are introducing this idea of conversational data science, where you talk to Siri or Alexa like they are your personal data scientist.
[00:32:55] Santu Karmaker:
Now, think about this. It is not going to be as accurate and expert as an experienced data scientist, but it’s free. It has no privacy risk, because we are going to give you the conversational agent on your local machine. You can take it, have a conversation with that agent, and in natural language you can express: okay, this is my dataset.
[00:33:21] Santu Karmaker:
This is what my dataset looks like — and it stays on your local machine, private and secure. And then you can express: I want to predict how many flights are going to be delayed next week, so that I can take proper actions, and maybe which airports are going to face those difficulties, so that we can better serve the customers.
[00:33:38] Santu Karmaker:
Right? And then, based on our framework, our conversational agent will try to understand what the user’s goal is and what the dataset looks like, and try to create the training and testing datasets, the objective functions — all the technicalities that would normally be handled by a human data scientist.
[00:34:01] Santu Karmaker:
And once you can create those, you can always hand them to existing automatic machine learning pipelines to actually execute those things, build a model, and give you some predictions. So what we’re trying to do is provide an abstraction between existing AutoML tools — built on TensorFlow, PyTorch, and so on — and the humans, via our virtual interactive data scientist.
[00:34:25] Santu Karmaker:
It’s actually a virtual data scientist that is having a conversation with you to understand your data science needs, convert them into a concrete machine learning task and fit them to the auto ML pipeline that you can execute without even knowing how to do machine learning. That’s
[00:34:41] Sean Ammirati:
That’s fascinating. And I was going to ask you for use cases, but you did a nice job of weaving the use cases into that answer.
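The abstraction layer Santu describes, a natural-language goal turned into a concrete task specification that an AutoML backend could consume, might be sketched very roughly as below. Everything here, the field names, the keyword heuristics, the `goal_to_task_spec` function, is an illustrative assumption, not the BDI lab’s actual system.

```python
# Hypothetical sketch: convert a natural-language data science goal into a
# concrete ML task spec that an AutoML pipeline could consume downstream.
# The heuristics are deliberately naive stand-ins for the real
# conversational understanding Santu describes.

def goal_to_task_spec(goal: str, dataset_columns: list[str]) -> dict:
    """Guess the task type and prediction target from the stated goal."""
    lowered = goal.lower()
    # "how many" suggests predicting a quantity, i.e. regression.
    task = "regression" if "how many" in lowered else "classification"
    # Pick the first dataset column mentioned in the goal as the target,
    # falling back to the last column if none is mentioned.
    target = next((c for c in dataset_columns if c.lower() in lowered),
                  dataset_columns[-1])
    return {"task": task, "target": target, "objective": "minimize_error"}

spec = goal_to_task_spec(
    "I want to predict how many flights will be delayed next week",
    ["airport", "date", "delayed"],
)
print(spec)  # {'task': 'regression', 'target': 'delayed', 'objective': 'minimize_error'}
```

In the architecture Santu sketches, a spec like this would then be handed to an existing AutoML pipeline to actually build and evaluate a model.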
[00:34:47] Sean Ammirati:
So that’s great. Let’s come back to the taxonomy, though. As we move to wrap up, where do you see the TELeR taxonomy going from here?
[00:34:56] Santu Karmaker:
I think answering this calls for imagination rather than really laying out what the direction should be. As someone who initiated this, I want it to be used by practitioners, developers, and researchers as they see fit.
[00:35:21] Santu Karmaker:
So I would hate to be the one who proposes the next extension of it. Rather, what I would like to see is people probing the limitations of this taxonomy. For example, once you take things to a real domain, you will see: okay, I have this use case, which is not covered in the TELeR taxonomy.
[00:35:47] Santu Karmaker:
For example, when we devised our taxonomy, if you remember, our original paper had six levels, from zero to five. Now, if you go to our arXiv version, we have an additional level: retrieval-augmented prompting. What it basically means is: let’s say you are prompting ChatGPT about something, and maybe that idea or knowledge is so new that ChatGPT has not been trained on it yet.
[00:36:13] Santu Karmaker:
So it doesn’t have a clue what you are talking about. But if you give it some relevant content and make it part of the prompt, then maybe ChatGPT, based on the embeddings, whatever numbers it has learned from big data, can make more sense of it and serve you better. Now, this application came from another domain that I was trying to apply the taxonomy to and see what happens.
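To make the retrieval-augmented prompting Santu describes concrete, here is a minimal sketch: fetch passages relevant to the question and prepend them to the prompt, so the model can ground its answer in content it was never trained on. The word-overlap retriever, toy corpus, and prompt template are all illustrative assumptions; a real system would use embedding-based search.

```python
# Minimal retrieval-augmented prompting sketch (illustrative only):
# rank passages by word overlap with the query, then prepend the top
# hits to the prompt that would be sent to an LLM.

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k passages sharing the most words with the query."""
    q_words = set(query.lower().split())
    ranked = sorted(corpus,
                    key=lambda p: len(q_words & set(p.lower().split())),
                    reverse=True)
    return ranked[:k]

def build_rag_prompt(query: str, corpus: list[str]) -> str:
    """Compose the augmented prompt: retrieved context first, then the question."""
    context = "\n".join(f"- {p}" for p in retrieve(query, corpus))
    return f"Use the context below to answer.\nContext:\n{context}\nQuestion: {query}"

# Toy corpus standing in for documents the model has never seen.
corpus = [
    "TELeR defines prompt levels by detail, from minimal to fully specified.",
    "Flight delays spiked at hub airports last week.",
    "The lab studies conversational data science.",
]
prompt = build_rag_prompt("What prompt levels does TELeR define?", corpus)
print(prompt)
```

The composed `prompt` is what would actually be sent to the LLM, with the relevant passage supplying the knowledge the model lacks.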
[00:36:35] Santu Karmaker:
And I realized: oh, retrieval-augmented prompting is a thing, and we don’t have that in our taxonomy, so let’s try to fit it into our taxonomy, right? So I see the future of the TELeR taxonomy like this: apply it in as many different ways as possible, find the corner cases that the taxonomy doesn’t handle, and fit them in to make it more general and more robust, so that it becomes a common platform that everybody knows about, so that you don’t have to explain every time
[00:37:09] Santu Karmaker:
the exact details of the prompt. You can just say: okay, I used TELeR taxonomy level four, with the system role defined, in a question style. And that should be enough for people to relate to: oh, you used that kind, okay, makes sense. So that’s where I see the future: people should be able to relate their prompts to a particular level
[00:37:32] Santu Karmaker:
and type of prompt from the taxonomy, so that it becomes a standard practice across researchers, practitioners, and developers. But you might have a different view, I don’t know, right?
[00:37:51] Dongji Feng:
I mean, soon after we submitted this version to arXiv, we saw many computer science bloggers share our paper on Twitter and other social media.
[00:38:05] Dongji Feng:
Actually, I also found a Chinese blogger who shares the latest AI techniques and posted our paper on Weibo, which is a Chinese social media platform, much like Twitter. So the idea is, we hope more and more people will use it, try to find its bugs, and fix them together with us, just like how people try to find bugs in ChatGPT. It’s like our open-source project.
[00:38:33] Sean Ammirati:
Fantastic. Well, guys, this has been an amazing conversation. For everybody listening, I would encourage you to follow Santu and Dongji on social media; we’ll make sure to include their socials in the show notes as well.