Crowdsourcing Feature Discovery

Mar 18 2014, 1:16pm CDT | in News | Technology News

By Ben Lorica

Data scientists were among the earliest and most enthusiastic users of crowdsourcing services. Lukas Biewald noted in a recent talk that one of the reasons he started CrowdFlower was that, as a data scientist, he got frustrated with having to create training sets for many of the problems he faced. More recently, companies have been experimenting with active learning (humans(1) take care of uncertain cases, models handle the routine ones). Along those lines, Adam Marcus described in detail how Locu uses crowdsourcing services to perform structured extraction (converting semi-structured or unstructured data into structured data).
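The active-learning split mentioned above (models handle routine cases, humans handle uncertain ones) can be sketched as a simple routing rule. This is a minimal illustration, not CrowdFlower's or Locu's actual method: it assumes the model reports a confidence score for each prediction and uses a hypothetical fixed threshold of 0.8. The `Prediction` class and `route` function are invented for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Prediction:
    item_id: str
    label: str
    confidence: float  # model's probability for its predicted label, in [0, 1]


def route(predictions: List[Prediction], threshold: float = 0.8) -> Tuple[List[Prediction], List[Prediction]]:
    """Split predictions into ones the model keeps and ones sent to humans.

    Hypothetical rule: anything below `threshold` counts as uncertain
    and goes to the human (crowd) review queue.
    """
    automatic: List[Prediction] = []
    for_humans: List[Prediction] = []
    for p in predictions:
        if p.confidence >= threshold:
            automatic.append(p)
        else:
            for_humans.append(p)
    return automatic, for_humans


if __name__ == "__main__":
    # Made-up predictions for illustration only.
    preds = [
        Prediction("a", "positive", 0.97),
        Prediction("b", "negative", 0.55),
        Prediction("c", "positive", 0.91),
        Prediction("d", "negative", 0.62),
    ]
    auto, humans = route(preds)
    print("model handles:", [p.item_id for p in auto])
    print("send to crowd:", [p.item_id for p in humans])
```

In a full active-learning loop, the labels humans supply for the uncertain items would typically be fed back into training, so the model improves where it was least sure.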

Another area where crowdsourcing is popping up is feature engineering and feature discovery. Experienced data scientists will attest that generating features is as important as (if not more important than) the choice of algorithm. Startup CrowdAnalytix uses public/open data sets to help companies enhance their analytic models. The company has access to several thousand data scientists spread across 50 countries and counts a major social network among its customers. Its current focus is on providing “enterprise risk quantification services to Fortune 1000 companies.”

CrowdAnalytix breaks projects into two phases: feature engineering and modeling. During the feature engineering phase, data scientists are presented with a problem (the dependent variable(s) to be predicted) and are asked to propose features (predictors), along with brief explanations of why they might prove useful. A panel of judges evaluates(2) features based on the accompanying evidence and explanations. Typically 100+ teams enter this phase of the project, and 30+ teams propose reasonable features.
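A feature proposal in this first phase pairs a derived predictor with the rationale judges read. The sketch below is one hypothetical way to represent proposals and to enrich a data set with the ones a panel accepts. The `FeatureProposal` class, the `enrich` helper, the credit-risk column names, and the sample rows are all invented for illustration, and the judges' subjective decision is simply passed in rather than modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, float]


@dataclass
class FeatureProposal:
    name: str
    rationale: str                   # the explanation judges evaluate
    compute: Callable[[Row], float]  # derives the feature from a raw row


def enrich(rows: List[Row], accepted: List[FeatureProposal]) -> List[Row]:
    """Return copies of `rows` with one new column per accepted proposal."""
    enriched = []
    for row in rows:
        new_row = dict(row)  # copy so the raw data set is left untouched
        for proposal in accepted:
            new_row[proposal.name] = proposal.compute(row)
        enriched.append(new_row)
    return enriched


# Hypothetical proposals for a made-up credit-risk problem.
proposals = [
    FeatureProposal(
        name="debt_to_income",
        rationale="Borrowers with high debt relative to income may default more often.",
        compute=lambda r: r["debt"] / r["income"],
    ),
    FeatureProposal(
        name="late_payment_rate",
        rationale="Past payment behavior may predict future behavior.",
        compute=lambda r: r["late_payments"] / max(r["total_payments"], 1.0),
    ),
]

# Judging is subjective; here the panel's verdict is just given as input.
accepted_names = {"debt_to_income"}
accepted = [p for p in proposals if p.name in accepted_names]

rows = [
    {"income": 50000.0, "debt": 10000.0, "late_payments": 2.0, "total_payments": 24.0},
    {"income": 80000.0, "debt": 60000.0, "late_payments": 0.0, "total_payments": 36.0},
]

if __name__ == "__main__":
    for r in enrich(rows, accepted):
        print(r)
```

Keeping the rationale next to the computation mirrors the process described above, where features are judged on their explanations before any modeling begins.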

The modeling phase is a traditional machine-learning competition (entries compete on standard quantitative metrics), using data sets that incorporate features culled from the earlier phase. Beyond algorithms(3), companies gain access to models that incorporate ideas generated by teams of data scientists. By enriching data sets with features proposed by these teams, CrowdAnalytix surfaces (potentially unconventional) ideas that may prove useful for its clients' models.
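Scoring in the modeling phase can be pictured as a leaderboard over a hidden holdout set. This minimal sketch assumes binary labels and accuracy as the metric; the team names and predictions are made up, and real competitions may use other metrics (AUC, log loss, RMSE) depending on the problem.

```python
from typing import Dict, List, Tuple


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    """Fraction of holdout examples predicted correctly."""
    if len(y_true) != len(y_pred):
        raise ValueError("prediction count does not match holdout size")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def leaderboard(y_true: List[int], submissions: Dict[str, List[int]]) -> List[Tuple[str, float]]:
    """Score every submission on the hidden labels and rank best-first."""
    scores = [(team, accuracy(y_true, preds)) for team, preds in submissions.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    holdout_labels = [1, 0, 0, 1, 1, 0, 1, 0]  # hidden from entrants
    submissions = {  # hypothetical entries
        "team_a": [1, 0, 1, 1, 0, 0, 1, 1],
        "team_b": [1, 0, 0, 1, 1, 0, 0, 0],
    }
    for rank, (team, score) in enumerate(leaderboard(holdout_labels, submissions), start=1):
        print(f"{rank}. {team}: {score:.3f}")
```

Because every entry is scored on the same held-out labels with the same metric, differences in rank come from the models and the features they use, which is where the enriched data sets from the first phase can pay off.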


(1) The key question I raised in my earlier post was: can this approach scale? Panos Ipeirotis recently noted: “… Google Books and ReCAPTCHA project are really testing the scalability limits of this approach.”
(2) Judging is subjective and is based on the “explanation and rationale” that accompany each feature.
(3) In the end, many teams who enter machine-learning competitions coalesce around a few algorithms (Random Forest is a favorite). Winners tend to distinguish themselves through feature engineering.

This post originally appeared on O’Reilly Data (“Crowdsourcing Feature Discovery”). It’s republished with permission.

Source: Forbes

 
