Bid Optimizing and Inventory Scoring in Targeted Online Advertising

Selected as the 2012 winner of the best paper award in the industry and government track by ACM SIGKDD

PDF: http://m6d.com/blog/wp-content/uploads/2012/08/Inventory-Targeting-for-Bid-Optimization.pdf

In “Bid Optimizing and Inventory Scoring in Targeted Online Advertising,” Claudia Perlich and team developed methods for measuring the impact of publisher inventory on ad effectiveness after controlling for the specific user and the advertiser in question. These methods can be applied in general to assess inventory quality, and specifically to define bid optimization strategies that maximize ad effectiveness. After deploying these methods in actual campaigns, m6d realized a better than 20 percent increase in ad effectiveness without incurring a cost increase. Looking to the future, Perlich demonstrated the potential of this research to expand the scope of bid optimization by bringing together, in real time, additional information from the cookie, such as age and activity level, and combining the performance-centric optimization approach with other tuning parameters.
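One way to picture how an inventory score could feed a bid is sketched below. This is only an illustration under stated assumptions, not the method from the paper: it assumes two hypothetical model outputs, a predicted conversion probability that knows the publisher inventory and one that does not, and it scales a base bid by their ratio. The names `inventory_ratio`, `scaled_bid`, and `max_multiplier` are invented for this example.

```python
def inventory_ratio(p_with_inventory: float, p_user_only: float) -> float:
    """Ratio of conversion probability with vs. without inventory information.

    Both inputs are hypothetical model outputs for the same user and advertiser.
    A ratio above 1 suggests the inventory helps; below 1 suggests it hurts.
    """
    if p_user_only <= 0:
        raise ValueError("p_user_only must be positive")
    if p_with_inventory < 0:
        raise ValueError("p_with_inventory must be non-negative")
    return p_with_inventory / p_user_only


def scaled_bid(base_bid: float, p_with_inventory: float, p_user_only: float,
               max_multiplier: float = 3.0) -> float:
    """Scale the base bid by the inventory ratio, capped to limit overbidding.

    The cap is an assumption of this sketch, not something stated in the source.
    """
    ratio = inventory_ratio(p_with_inventory, p_user_only)
    return base_bid * min(ratio, max_multiplier)


# Illustrative numbers only: inventory doubles the predicted conversion rate.
print(scaled_bid(base_bid=2.0, p_with_inventory=0.002, p_user_only=0.001))
```

The design point is that the user and advertiser effects cancel in the ratio, so the multiplier reflects the inventory alone.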




Approximate Frequency Counts over Data Streams

PDF: http://www.vldb.org/conf/2002/S10P03.pdf

A joint paper by one of the authors of Stanford’s original PageRank paper and a Google Labs researcher on the hot topic of data stream analytics, it presents algorithms for computing frequency counts exceeding a user-specified threshold. The paper deftly combines theory, algorithms, and experiments, introducing novel algorithms for sticky sampling and lossy counting, with many applications in streaming big data analysis, data mining, and web-server log analytics. A beautifully written paper, it has garnered a truly amazing number of citations over the last decade.
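To make the lossy counting idea concrete, here is a minimal Python sketch. It follows the general scheme as commonly described (fixed-width buckets, a per-item count plus a maximum possible undercount, and pruning at bucket boundaries), but the class name `LossyCounter` and its methods are chosen for this example, and it is not tuned or validated against the paper's experiments.

```python
import math


class LossyCounter:
    """Approximate frequency counts over a stream with error at most epsilon * n."""

    def __init__(self, epsilon: float):
        if not 0 < epsilon < 1:
            raise ValueError("epsilon must be in (0, 1)")
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # items per bucket
        self.n = 0                           # items seen so far
        self.entries = {}                    # item -> [count, max_undercount]

    def add(self, item) -> None:
        self.n += 1
        bucket = math.ceil(self.n / self.width)  # id of the current bucket
        if item in self.entries:
            self.entries[item][0] += 1
        else:
            # The item may have been seen and pruned in earlier buckets,
            # so it could be undercounted by up to bucket - 1.
            self.entries[item] = [1, bucket - 1]
        if self.n % self.width == 0:
            # Bucket boundary: drop items that cannot be frequent so far.
            self.entries = {
                key: value for key, value in self.entries.items()
                if value[0] + value[1] > bucket
            }

    def frequent(self, support: float) -> dict:
        """Items whose stored count is at least (support - epsilon) * n."""
        threshold = (support - self.epsilon) * self.n
        return {
            key: value[0] for key, value in self.entries.items()
            if value[0] >= threshold
        }


# Illustrative stream: two heavy items plus many one-off items.
counter = LossyCounter(epsilon=0.1)
for i in range(100):
    if i % 2 == 0:
        counter.add("a")
    elif i % 10 in (1, 3, 5):
        counter.add("b")
    else:
        counter.add(f"rare{i}")
print(counter.frequent(support=0.2))
```

Memory stays small because rare items are pruned at each bucket boundary, while any item whose true frequency is at least `support * n` is always reported.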


Eduardo J. Ruiz, Vagelis Hristidis, Carlos Castillo, Aristides Gionis, Alejandro Jaimes. Correlating Financial Time Series with Micro-Blogging Activity. ACM International Conference on Web Search and Data Mining

Paper PDF:

"We study the problem of correlating micro-blogging activity with stock-market events, defined as changes in the price and traded volume of stocks. Specifically, we collect messages related to a number of companies, and we search for correlations between stock-market events for those companies and features extracted from the micro-blogging messages. … Features in the first group measure the overall activity in the micro-blogging platform, such as number of posts, number of re-posts, and so on. Features in the second group measure properties of an induced interaction graph, for instance, the number of connected components, statistics on the degree distribution, and other graph-based properties. We present detailed experimental results measuring the correlation of the stock market events with these features, using Twitter as a data source. Our results show that the most correlated features are the number of connected components and the number of nodes of the interaction graph. The correlation is stronger with the traded volume than with the price of the stock. However, by using a simulator we show that even relatively small correlations between price and micro-blogging features can be exploited to drive a stock trading strategy that outperforms other baseline strategies."
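As a rough illustration of what correlating a micro-blogging feature with traded volume can look like, the sketch below computes a Pearson correlation between two daily series, optionally shifting the market series by a lag. Pearson correlation is assumed here as a common choice and is not necessarily the exact measure the authors use; the function names and the toy numbers are invented for this example.

```python
import math


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def lagged_correlation(feature, target, lag: int = 0) -> float:
    """Correlate feature[t] with target[t + lag] for a non-negative lag in days."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(feature, target)
    return pearson(feature[:-lag], target[lag:])


# Toy series, made up for illustration: daily connected-component counts
# of an interaction graph and the traded volume one day later.
components = [12, 15, 11, 20, 18, 25]
volume = [100, 130, 150, 120, 200, 185]
print(lagged_correlation(components, volume, lag=1))
```

Scanning several lags this way gives a quick view of whether the feature leads or trails the market series.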



Phrase based document classification for web scale, by Ron Bekkerman (LinkedIn) and Matan Gavish (Stanford University)

PDF: http://www.cs.umass.edu/~ronb/papers/kdd2011.pdf



Yahoo LDA: Seriously fast Latent Dirichlet Allocation (LDA) implementation on Hadoop


GitHub code:

Source: blog.smola.org


Blog: http://www.daltonclark.com/blog/2011/03/30/cloudera-hadoop-amazon-howto/

Source: daltonclark.com


Paper: http://www.umiacs.umd.edu/~jimmylin/publications/Lin_etal_KDD2011.pdf

Source: umiacs.umd.edu


The most publicized Hadoop paper

SIGMOD 2011 Paper: Apache Hadoop Goes Realtime at Facebook

Paper: http://borthakur.com/ftp/RealtimeHadoopSigmod2011.pdf

Presentation: http://borthakur.com/ftp/SIGMODRealtimeHadoopPresentation.pdf

Source: nosql.mypopescu.com

SVD Feature Reduction for a Recommendation-System in Ruby

Lecture Notes: Advanced Natural Language Processing, Jerry Zhu, Spring 2010

Video: Map-reduce in R with Amazon EMR

Distributed Computing: Paxos consensus made simple

Ebook: Mining of Massive Datasets

Mapreduce & Hadoop Algorithms in Academic Papers

Amazon Elastic MapReduce : Bootstrap Actions for Hadoop