Dataloop lands funding to expand its suite of data annotation tools.


Data annotation, the process of adding labels to images, text, audio and other sample data, is typically a key step in developing AI systems. Most systems learn to make predictions by pairing labels with specific data samples, such as the label “bear” with a photo of a black bear. A system trained on many labeled examples of different types of contracts, for example, will eventually learn to distinguish between those contracts and even recognize contracts it has never seen before.
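The label-to-sample pairing described above can be sketched in a few lines. This is a toy illustration only (not Dataloop's code): a trivial word-frequency classifier trained on hypothetical (text, label) pairs, which then classifies a sample it has never seen.

```python
# Toy sketch of supervised learning from labeled pairs: the model is built
# purely from human-supplied (sample, label) annotations.
from collections import Counter, defaultdict

def train(labeled_pairs):
    """Count word frequencies per label from (text, label) training pairs."""
    model = defaultdict(Counter)
    for text, label in labeled_pairs:
        model[label].update(text.lower().split())
    return model

def predict(model, text):
    """Score an unseen sample by word overlap with each label's vocabulary."""
    words = text.lower().split()
    return max(model, key=lambda label: sum(model[label][w] for w in words))

# Hypothetical labeled dataset (the annotation step the article describes).
training_data = [
    ("employment contract salary terms", "contract"),
    ("lease agreement tenant clause", "contract"),
    ("black bear in the forest", "wildlife"),
    ("grizzly bear photo", "wildlife"),
]
model = train(training_data)
print(predict(model, "new lease contract clause"))  # → contract
```

Real systems use far richer models, but the dependency is the same: prediction quality is bounded by the quality of the human labels.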

The problem is, annotation has historically been a manual and labor-intensive process assigned to workers on platforms like Amazon’s Mechanical Turk. But with growing interest in AI — and the data used to train that AI — an entire industry has emerged around annotation and labeling tools.

Dataloop, one of the many startups vying to establish themselves in the early-stage market, today announced that it has raised $33 million in a Series B round led by Nokia Growth Partners (NGP) Capital and Alpha Wave Global. Dataloop develops software and services to automate aspects of data preparation, aiming to shave time off the AI system development process.

“I worked at Intel for over 13 years, and that’s where I met Avi Yashar, Dataloop’s co-founder and CPO,” Shlomo said. “With Avi, I left Intel and founded Dataloop. [Buschi], our CBO, joined us as the third co-founder after he held executive positions at technology companies and led go-to-market at business- and venture-backed startups.”

Dataloop originally focused on data annotation for computer vision and video analytics. But in recent years, the company has added new tools for text, audio, form and document data and allowed customers to integrate custom data applications built in-house.

One of the recent additions to the Dataloop platform is a set of data management dashboards for unstructured data. (In contrast to structured data, which is arranged in a standard format, unstructured data isn’t organized according to a common model or schema.) The dashboards provide tools for publishing data and retrieving metadata, as well as a query language for querying datasets and viewing data samples.
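The article doesn't show Dataloop's actual query syntax, so the following is a generic, hypothetical sketch of what querying unstructured-data samples by metadata looks like: samples carry metadata records, and a query filters them by field values.

```python
# Hypothetical illustration (not Dataloop's API): unstructured samples with
# metadata records, filtered by simple key=value queries.
samples = [
    {"id": 1, "type": "image", "labels": ["bear"], "width": 1920},
    {"id": 2, "type": "audio", "labels": [], "duration_s": 42.0},
    {"id": 3, "type": "image", "labels": ["contract"], "width": 640},
]

def query(dataset, **filters):
    """Return samples whose metadata matches every key=value filter."""
    return [s for s in dataset if all(s.get(k) == v for k, v in filters.items())]

images = query(samples, type="image")
print([s["id"] for s in images])  # → [1, 3]
```

Real dataset query languages add range predicates, label filters and pagination on top of this basic filtering idea.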

Image Credits: Dataloop

“All AI models learn from humans through the process of data labeling. The labeling process is essentially a cognitive encoding process in which a person teaches the machine the rules using positive and negative data examples,” Shlomo said. “The main goal of every AI application is to use the customer data to create a ‘data flywheel effect’: a better product leads to more users, which in turn leads to more data, which in turn leads to a better product.”

Dataloop competes with heavyweights in the data annotation and labeling space, including Scale AI, which has raised more than $600 million in venture capital. Labelbox is another major competitor that recently secured more than $110 million in a financing round led by SoftBank. Beyond the startup realm, tech giants including Google, Amazon, Snowflake and Microsoft offer their own data annotation services.

Dataloop must be doing something right. Shlomo said the company currently has “hundreds” of customers in retail, agriculture, robotics, autonomous vehicles and construction, though he declined to disclose revenue figures.

An open question is whether the Dataloop platform can solve some of the major challenges in data labeling today. Last year, a paper out of MIT found that data labels are often highly inconsistent, which can compromise the accuracy of AI systems. A growing body of academic research also suggests that annotators introduce their own biases when they label data; for example, some annotators are more likely to mark phrases in African American English (a dialect spoken primarily by Black Americans) as toxic than their general American English equivalents. These biases often manifest in unfortunate ways; consider moderation algorithms that are more likely to ban Black users than white users.
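Label inconsistency of the kind the MIT paper describes is commonly measured with inter-annotator agreement statistics. As a minimal sketch (the annotator labels below are made up for illustration), Cohen's kappa compares how often two annotators agree against how often they would agree by chance:

```python
# Sketch: Cohen's kappa, a standard inter-annotator agreement statistic.
# Kappa near 1.0 means consistent labels; near 0 means chance-level agreement.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Agreement between two annotators, corrected for chance agreement."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Two hypothetical annotators labeling the same five comments.
annotator_1 = ["toxic", "ok", "ok", "toxic", "ok"]
annotator_2 = ["toxic", "ok", "toxic", "toxic", "ok"]
print(round(cohens_kappa(annotator_1, annotator_2), 2))  # → 0.62
```

Low kappa across a labeled dataset is a warning sign that the labels, and any model trained on them, may be unreliable.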

Data annotators are also often underpaid. Annotators who contributed captions to ImageNet, one of the best-known open source computer vision datasets, reportedly earned an average of $2 an hour.

Shlomo argues that it's up to the organizations using Dataloop's tools to effect change, not necessarily Dataloop itself.

“We see the low pay for annotators as a market failure. Data annotation shares many qualities with software development, one of which is the impact talent has on productivity,” Shlomo said. “[As for bias,] bias in AI starts with the questions the AI developer chooses to ask and the guidelines they provide to labeling companies. We call it ‘priority bias.’ For example, you cannot detect skin-color bias unless you ask for skin color in your labeling process. The underlying bias issue is one that industry and regulators need to address; technology alone will not solve the problem.”

To date, Dataloop, which has 60 employees, has raised $50 million in venture capital. The company plans to increase its workforce to 80 employees by the end of the year.


