Advanced Analytics with Spark
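Big data has been one of the hottest topics in technology and its applications in recent years,
and Spark is the most active technology in the big data field. In China, most study of Spark has
focused on its principles and source code, while many customers complain that Spark applications
are hard to put into production. A major reason is that Spark is relatively new and many of its
applications are still at the exploratory stage. Cloudera, a global leader in big data, has
accumulated a great deal of valuable experience with Spark applications while providing customers
worldwide with a top-quality big data platform. All four authors of this book are data scientists
at Cloudera who have long provided professional data analysis services to customers. The
publication of this book should therefore do much to help Spark data analysis projects reach
production.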
2018-06-11
A Large-Scale Analysis of Query Logs for Assessing Personalization Opportunities
A recommender systems paper on large-scale data analysis.
Query logs, the patterns of activity left by millions of users, contain
a wealth of information that can be mined to aid personalization.
We perform a large-scale study of Yahoo! search engine
logs, tracking 1.35 million browser-cookies over a period of
6 months. We define metrics to address questions such as 1) How
much history is available?, 2) How do users’ topical interests vary,
as reflected by their queries?, and 3) What can we learn from user
clicks? We find that there is significantly more expected history for
the user of a randomly picked query than for a randomly picked
user. We show that users exhibit consistent topical interests that
vary between users. We also see that user clicks indicate a variety
of special interests. Our findings shed light on user activity and can
inform future personalization efforts.
2018-06-11