
Content provided by Massive Studios. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Massive Studios or its podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://pl.player.fm/legal.

Synthetic Data for AI

25:58

Kalyan Veeramachaneni (@kveeramac, CEO/Founder @DataCebo) discusses the generation and value proposition of synthetic data for GenAI.
SHOW: 813
CLOUD NEWS OF THE WEEK -
http://bit.ly/cloudcast-cnotw
NEW TO CLOUD? CHECK OUT OUR OTHER PODCAST -
"CLOUDCAST BASICS"
SHOW NOTES:

Topic 1 - Our topic for today is synthetic data. While the concept of, and need for, synthetic data have been around for a long time, it isn’t a topic that typically comes to the forefront, and it’s one we haven’t covered until today. Today is a bit of crossing the streams between developers, test data, and using GenAI to generate that data. For this, we’re joined by Kalyan, CEO and Co-Founder of DataCebo. Welcome to the show!
Topic 2 - First, for those not familiar, what is synthetic data? What is the use case and need? What problem is it solving today?
Topic 2a - Hopefully, listeners are already making the connection to the advantages GenAI brings to synthetic data, but take us through your original concept at MIT and the history of the Synthetic Data Vault (SDV).
Topic 3 - We recently did a show on the security and privacy of training LLMs, where we covered the need to mask PII in training data for compliance. I can also see bias issues coming into play, or cases where the training data doesn’t exist in the real world (weather models, for example). What are some of the use cases you’ve seen that require synthetic datasets? Are there certain industries (healthcare, financial services, etc.) that benefit?
Topic 4 - You were designing this based on GenAI before GenAI was “cool”. How has the rise of LLMs impacted this space?
Topic 5 - If I understand this correctly, organizations would point generative AI at a problem by describing the dataset they need; the model would then evaluate the available data and create a high-quality synthetic, or “fake”, dataset. How would the organization verify the quality of that dataset? How would it validate that a synthetic dataset is as good as the original data?
Topic 6 - Let’s talk about resources for a bit. When I think of GenAI and training, I think of large amounts of hardware, in particular GPUs, which may have limited availability. Is that true here? Also, does this run on-prem, in the cloud, or both?
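Topic 5 asks how an organization can check that a synthetic dataset is as good as the original. One simple, widely used fidelity check is to compare each column’s distribution in the real and synthetic data. The sketch below is an illustration under stated assumptions, not DataCebo’s or SDV’s method: it assumes a single numeric column, uses a naive normal-distribution fit as a hypothetical stand-in for a real synthesizer, and scores fidelity with the two-sample Kolmogorov-Smirnov statistic (SciPy provides the same test as `scipy.stats.ks_2samp`). The names `fit_and_sample` and `ks_statistic` are hypothetical.

```python
import random
import statistics
from bisect import bisect_right


def fit_and_sample(real, n, seed=0):
    """Hypothetical toy synthesizer: fit a normal distribution to one numeric
    column and draw n synthetic values from it. Real synthesizers model many
    columns (and their relationships) at once; this is only an illustration."""
    mu = statistics.fmean(real)
    sigma = statistics.stdev(real)  # sample standard deviation, needs >= 2 values
    rng = random.Random(seed)       # seeded so runs are reproducible
    return [rng.gauss(mu, sigma) for _ in range(n)]


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the empirical CDFs of a and b (0.0 = identical, 1.0 = no overlap)."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    # The empirical CDFs only change at observed values, so checking
    # every observed value is enough to find the maximum gap.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)  # share of a <= x
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)  # share of b <= x
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical "real" column, e.g. transaction amounts.
    real = [rng.gauss(50.0, 10.0) for _ in range(1000)]
    synthetic = fit_and_sample(real, 1000, seed=1)
    # A deliberately poor "synthetic" column, shifted away from the real data.
    shifted = [x + 25.0 for x in real]
    print("real vs synthetic:", round(ks_statistic(real, synthetic), 3))
    print("real vs shifted:  ", round(ks_statistic(real, shifted), 3))
```

The first score should come out close to 0 and the second much larger. A low score on individual columns is only a partial check: it says nothing about whether relationships between columns, or privacy, are preserved, which a fuller evaluation would also need to examine.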
FEEDBACK?

Chapters

1. Synthetic Data for AI (00:00:00)

2. [Ad] Out-of-the-box insights from digital leaders (00:11:55)

3. (Cont.) Synthetic Data for AI (00:12:33)

912 episodes

The Cloudcast
