In this work, we address these two challenges by proposing a novel data sampling methodology: relying on the wisdom of crowdsourced experts. That is, instead of processing all the tweets posted in the Twitter network, we rely only on tweets from a comparatively small set of expert users. Since the tweets posted by these expert users constitute only a small fraction of all the tweets in the network, using this expert tweet stream (or expert sample) helps overcome the scalability issues of real-time data processing. Comparing the expert sample to a widely used sampling methodology (namely, random sampling) reveals that expert sampling has numerous potential advantages for data mining and content retrieval tasks such as content search, real-time event detection, and product sentiment analysis.
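The contrast between the two sampling strategies can be illustrated with a minimal sketch. It assumes a hypothetical `Tweet` record and a precomputed set of expert user IDs; how experts are identified is not shown, and the function names `expert_sample` and `random_sample` are illustrative only, not the implementation used in this work.

```python
import random
from typing import Iterable, Iterator, NamedTuple, Set


class Tweet(NamedTuple):
    """Hypothetical tweet record; real tweets carry many more fields."""
    user_id: int
    text: str


def expert_sample(stream: Iterable[Tweet], expert_ids: Set[int]) -> Iterator[Tweet]:
    """Keep only tweets whose author is in the (assumed, precomputed) expert set."""
    for tweet in stream:
        if tweet.user_id in expert_ids:
            yield tweet


def random_sample(stream: Iterable[Tweet], rate: float, seed: int = 0) -> Iterator[Tweet]:
    """Keep each tweet independently with probability `rate`, regardless of author."""
    rng = random.Random(seed)
    for tweet in stream:
        if rng.random() < rate:
            yield tweet


# Toy stream: user 1 is treated as the only expert (an assumption for this example).
stream = [Tweet(1, "a"), Tweet(2, "b"), Tweet(1, "c"), Tweet(3, "d")]
expert_tweets = list(expert_sample(stream, {1}))
random_tweets = list(random_sample(stream, rate=0.5))
```

Both functions process the stream one tweet at a time, so either sample can be drawn in a single pass. The difference is the selection criterion: expert sampling selects by author, whereas random sampling ignores who posted the tweet.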
To show the utility of the expert stream for content-centric applications, we compare Twitter search functionality implemented over the whole Twitter stream (or crowd stream) to the same functionality implemented over the expert stream only. Surprisingly, despite being two orders of magnitude smaller, the expert stream captures most of the relevant information posted by the whole Twitter crowd. Moreover, search results from the expert stream are of significantly better quality and contain far fewer spam posts than the crowd results. Our findings add another dimension to the longstanding crowds vs. experts debate by concluding that the wisdom of experts is better than the wisdom of crowds in the context of certain content-centric applications. These findings have important implications for the design of future content retrieval systems.