Crowdsourcing Feature Discovery

Mar 18 2014, 1:16pm CDT


By Ben Lorica

Data scientists were among the earliest and most enthusiastic users of crowdsourcing services. Lukas Biewald noted in a recent talk that one of the reasons he started CrowdFlower was that, as a data scientist, he grew frustrated with having to create training sets for many of the problems he faced. More recently, companies have been experimenting with active learning (humans(1) take care of uncertain cases, models handle the routine ones). Along those lines, Adam Marcus described in detail how Locu uses crowdsourcing services to perform structured extraction (converting semi-structured and unstructured data into structured data).
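The split between uncertain and routine cases can be sketched in a few lines of Python. This is a minimal illustration, not how CrowdFlower or Locu implement it: the function name `route_by_confidence`, the 0.8 threshold, and the sample scores are all assumptions made up for the example.

```python
from typing import List, Tuple


def route_by_confidence(
    scored_items: List[Tuple[str, float]],
    threshold: float = 0.8,  # assumed cutoff; a real system would tune this
) -> Tuple[List[Tuple[str, bool]], List[str]]:
    """Split items into a model-labeled list and a human-review queue.

    scored_items holds (item_id, p_positive) pairs, where p_positive is the
    model's estimated probability that the item belongs to the positive class.
    """
    auto_labeled = []  # (item_id, predicted_label) pairs the model handles
    needs_human = []   # item_ids forwarded to crowd workers
    for item_id, p_positive in scored_items:
        # Confidence is the probability of whichever class the model favors.
        confidence = max(p_positive, 1.0 - p_positive)
        if confidence >= threshold:
            auto_labeled.append((item_id, p_positive >= 0.5))
        else:
            needs_human.append(item_id)
    return auto_labeled, needs_human


# Hypothetical model scores for four items.
scores = [("a", 0.97), ("b", 0.55), ("c", 0.08), ("d", 0.70)]
auto_labeled, needs_human = route_by_confidence(scores, threshold=0.8)
print(auto_labeled)  # [('a', True), ('c', False)]
print(needs_human)   # ['b', 'd']
```

In an active learning loop, the labels that come back from the human queue would typically be added to the training set, so the model gradually needs less help with similar cases.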

Another area where crowdsourcing is popping up is feature engineering and feature discovery. Experienced data scientists will attest that generating features is as important as (if not more important than) the choice of algorithm. Startup CrowdAnalytix uses public/open data sets to help companies enhance their analytic models. The company has access to several thousand data scientists spread across 50 countries and counts a major social network among its customers. Its current focus is on providing “enterprise risk quantification services to Fortune 1000 companies.”

CrowdAnalytix breaks projects into two phases: feature engineering and modeling. During the feature engineering phase, data scientists are presented with a problem (the dependent variable(s) to be predicted) and are asked to propose features (predictors), with brief explanations of why they might prove useful. A panel of judges evaluates(2) features based on the accompanying evidence and explanations. Typically 100+ teams enter this phase of the project, and 30+ teams propose reasonable features.

The modeling phase is a traditional machine-learning competition (entries compete on standard quantitative metrics), using data sets that incorporate features culled from the earlier phase. More than algorithms(3), companies gain access to models that incorporate ideas generated by teams of data scientists. CrowdAnalytix enriches data sets with features proposed by these teams, surfacing (potentially unconventional) ideas that may prove useful for its customers' models.
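A toy example may help show why an added feature can matter more than the algorithm. The sketch below is not CrowdAnalytix's method: the six rows of loan data are invented, the "model" is a deliberately simple one-threshold rule (a decision stump), and accuracy is measured on the same rows used to pick the threshold. The proposed feature is the ratio of debt to income.

```python
from typing import Sequence


def best_stump_accuracy(values: Sequence[float], labels: Sequence[bool]) -> float:
    """Best accuracy of a single-threshold rule on one feature.

    Tries every observed value as a threshold, in both directions
    (predict True when value >= t, or predict True when value < t).
    """
    n = len(values)
    best = 0.0
    for t in sorted(set(values)):
        correct = sum((v >= t) == y for v, y in zip(values, labels))
        # n - correct is the accuracy count of the flipped rule.
        best = max(best, correct / n, (n - correct) / n)
    return best


# Invented toy rows: (debt, income, defaulted). Not real data.
rows = [
    (10.0, 20.0, True),
    (40.0, 50.0, True),
    (90.0, 100.0, True),
    (10.0, 50.0, False),
    (40.0, 100.0, False),
    (20.0, 90.0, False),
]

debt = [r[0] for r in rows]
income = [r[1] for r in rows]
defaulted = [r[2] for r in rows]

# The engineered feature a team might propose: debt relative to income.
debt_to_income = [d / i for d, i in zip(debt, income)]

debt_acc = best_stump_accuracy(debt, defaulted)
income_acc = best_stump_accuracy(income, defaulted)
ratio_acc = best_stump_accuracy(debt_to_income, defaulted)

print(f"debt alone:     {debt_acc:.2f}")
print(f"income alone:   {income_acc:.2f}")
print(f"debt-to-income: {ratio_acc:.2f}")
```

On these rows, neither raw column separates the two classes with a single threshold, while the ratio does. The algorithm is identical in all three runs; only the input feature changes.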


(1) The key question that I pointed out in my earlier post was: can this approach scale? Panos Ipeirotis recently noted: “… Google Books and ReCAPTCHA project are really testing the scalability limits of this approach.”
(2) Judging is subjective, and is based on the “explanation and rationale” that accompany each feature.
(3) In the end, many teams who enter machine-learning competitions coalesce around a few algorithms (Random Forest is a favorite). Winners tend to distinguish themselves through feature engineering.

This post originally appeared on O’Reilly Data (“Crowdsourcing Feature discovery”). It’s republished with permission.

Source: Forbes

 


