In the generative AI boom, data is the new oil. So why can’t you sell your own?
From big tech companies to startups, AI makers are licensing e-books, images, videos, audio, and more from data brokers, all in an effort to train more powerful (and legally easier to defend) artificial intelligence products. Shutterstock has struck deals with Meta, Google, Amazon and Apple to provide millions of images for model training, while OpenAI has signed deals with several news organizations to train its models on news archives.
In many cases, the individual creators and owners of this data don’t see a dime change hands. A startup called Vana hopes to change that.
Anna Kazlauskas and Art Abal co-founded Vana in 2021 after meeting in a course at the MIT Media Lab focused on building technology for emerging markets. Before founding Vana, Kazlauskas studied computer science and economics at MIT, eventually leaving to launch Iambiq, a fintech automation startup out of Y Combinator. Abal, a corporate lawyer by training and education, worked at The Cadmus Group, a Boston-based consulting firm, before heading up impact sourcing at data annotation company Appen.
With Vana, Kazlauskas and Abal set out to build a platform that lets users “pool” their data, including chats, voice recordings and photos, into datasets that can then be used to train generative AI models. They also hope to create more personalized experiences by fine-tuning public models on this data, such as daily motivational voicemails based on your health goals, or art-generating apps that understand your style preferences.
“Vana’s infrastructure actually creates a treasure trove of user-owned data,” Kazlauskas told TechCrunch. “It does this by allowing users to aggregate their personal data in a non-custodial way…Vana allows users to own AI models and use their data in AI applications.”
Here’s how Vana markets its platform and API to developers:
The Vana API connects users’ personal data across platforms… to allow you to personalize your applications. Your applications can instantly access your users’ personalized AI models or underlying data, simplifying onboarding and eliminating compute cost concerns… We believe users should be able to bring their personal data to you from walled gardens like Instagram, Facebook and Google applications so you can create amazing, personalized experiences from the first time a user interacts with your consumer AI application.
Creating an account with Vana is fairly simple. After confirming your email, you can attach data to your digital avatar (such as selfies, self-descriptions, and audio recordings) and explore applications built with the Vana platform and datasets. Application choices range from ChatGPT-style chatbots and interactive storybooks to Hinge profile generators.
Now, you might ask, in this age of heightened data privacy awareness and ransomware attacks, why would anyone volunteer their personal information to an anonymous startup, let alone a venture-backed one? (Vana has raised $20 million to date from Paradigm, Polychain Capital, and other backers.) Can any profit-driven company really be trusted not to misuse or mishandle any monetizable data in its possession?
In response to the question, Kazlauskas emphasized that the whole point of Vana is to allow users to “take back control of their data,” noting that Vana users can choose to self-host their data rather than store it on Vana’s servers, and can control how their data is shared with applications and developers. She also argues that because Vana makes money by charging users a monthly subscription fee (starting at $3.99) and levying “data transaction” fees on developers (for example, for transferring datasets for AI model training), the company has no incentive to exploit users and the vast amounts of personal data they bring with them.
“We want to create models that are owned and managed by all users who contribute data, and allow users to bring their data and models into any application,” Kazlauskas said.
Now, although Vana isn’t selling users’ data to companies for generative AI model training (or so it claims), it wants to allow users to do this themselves if they choose, starting with their Reddit posts.
This month, Vana launched what it calls the Reddit Data DAO (decentralized autonomous organization), a program that pools Reddit data from multiple users (including their karma and post history) and lets them collectively decide how the combined data is used. After connecting a Reddit account, submitting a data request to Reddit, and uploading the data to the DAO, users gain the right to vote alongside other DAO members on decisions such as licensing the combined data to generative AI companies for a share of the profits.
This comes in response to Reddit’s recent moves to commercialize data on its platform.
Reddit previously did not restrict access to posts and communities for generative AI training purposes. But late last year, ahead of its initial public offering, it changed course. Since the policy change, Reddit has received more than $203 million in licensing fees from companies including Google.
“The broad idea [with the DAO is] to free user data from the major platforms that seek to hoard and monetize it,” Kazlauskas said. “This is a first, and it is part of our push to help people pool their data into user-owned datasets for training AI models.”
Unsurprisingly, Reddit (which is not working with Vana in any official capacity) is not happy with the DAO.
Reddit banned Vana from the subreddit dedicated to discussing the DAO. A Reddit spokesperson accused Vana of “exploiting” its data export system, which is designed to comply with data privacy regulations such as GDPR and the California Consumer Privacy Act.
“Our data arrangements allow us to put guardrails around such entities, even on public information,” the spokesperson told TechCrunch. “Reddit does not share non-public personal data with commercial businesses, and when Reddit users request that we export their data, they receive it from us in compliance with applicable law. There are clear terms between Reddit and the reviewed organizations.”
But does Reddit really have reason to worry?
Kazlauskas expects the DAO to grow to the point where it affects how much Reddit can charge customers for its data. Even assuming that happens, it is still a long way off. The DAO has just over 141,000 members, a small fraction (roughly 0.2%) of Reddit’s 73 million-strong user base. And some of those members may be bots or duplicate accounts.
Then there is the question of how to fairly distribute the payments the DAO might receive from data buyers.
Currently, the DAO awards “tokens” (cryptocurrency) to users in line with their Reddit karma. But karma may not be the best measure of the quality of a user's contributions to a dataset, especially in smaller Reddit communities where there are fewer opportunities to earn it.
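To make the concern concrete, here is a minimal Python sketch of a karma-proportional payout. It is an illustration only: the article does not say how the DAO actually maps karma to tokens, so the strictly proportional rule, the function name `allocate_tokens`, and the sample users and figures are all assumptions.

```python
def allocate_tokens(karma_by_user: dict[str, int], total_tokens: int) -> dict[str, int]:
    """Split total_tokens across users in proportion to their karma.

    Hypothetical rule for illustration; not Vana's documented formula.
    """
    # Treat negative karma as zero so nobody receives a negative share.
    weights = {user: max(karma, 0) for user, karma in karma_by_user.items()}
    total_weight = sum(weights.values())
    if total_weight == 0:
        return {user: 0 for user in weights}

    # Give each user the rounded-down proportional share.
    shares = {
        user: total_tokens * weight // total_weight
        for user, weight in weights.items()
    }

    # Hand out tokens lost to rounding, largest remainder first.
    leftover = total_tokens - sum(shares.values())
    by_remainder = sorted(
        weights,
        key=lambda user: (total_tokens * weights[user]) % total_weight,
        reverse=True,
    )
    for user in by_remainder[:leftover]:
        shares[user] += 1
    return shares


# Made-up members: "carol" posts in a small community and has little karma.
example = allocate_tokens({"alice": 9000, "bob": 900, "carol": 100}, 1000)
print(example)  # {'alice': 900, 'bob': 90, 'carol': 10}
```

Under this assumed rule, a member of a small community ends up with 1% of the payout no matter how useful their posts are as training data, which is the fairness problem described above.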
Kazlauskas floated the idea that members of a DAO could choose to share their cross-platform and demographic data, making the DAO potentially more valuable and incentivizing signups. But it also requires users to trust Vana more to handle their sensitive data responsibly.
Personally, I don’t think Vana’s DAO will reach critical mass. There are too many obstacles in the way. I do think, however, that this will not be the last grassroots attempt to exert control over the data that is increasingly used to train generative AI models.
Startups like Spawning are looking at ways to let creators set rules for how their data is used for training, while vendors like Getty Images, Shutterstock and Adobe continue to experiment with compensation schemes. But no one has cracked the code yet. Can it even be cracked? Given the cutthroat nature of the generative AI industry, it is certainly a tall order. But maybe someone will find a way, or policymakers will force one.