puffsun - CSDN Blog

This blog is no longer updated; please go to http://codethoughts.info

This blog is no longer updated. New posts will be published at http://codethoughts.info; subscriptions are welcome. ...

2015-04-04 11:43:48 231

[Original] Basic Data Structures and Algorithms (14): Directed Graphs

In directed graphs, edges are one-way: the pair of vertices that defines each edge is an ordered pair that specifies a one-way adjacency. Many applications (for example, graphs that represent the ...

2013-12-15 22:40:30 872

[Original] Basic Data Structures and Algorithms (13): Undirected Graphs (2)

Design pattern for graph processing. Since we consider a large number of graph-processing algorithms, our initial design goal is to decouple our implementations from the graph representation. T...

2013-12-13 22:51:29 248

[Original] Basic Data Structures and Algorithms (13): Undirected Graphs

A graph is a set of vertices and a collection of edges that each connect a pair of vertices. Vertex names are not important to the definition, but we need a way to refer to vertices. By convention, ...

2013-12-13 20:15:59 327

[Original] Ruby array slicing - weird behavior

If you play around with array slicing in irb, it behaves as shown below: irb(main):027:0> a = [1,2,3] => [1, 2, 3] irb(main):028:0> a[2,1] => [3] irb(main):029:0> a[4,1] ...

2013-12-12 09:42:43 94
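
The excerpt above is cut off before the interesting case, so here is a minimal sketch of how `Array#[start, length]` treats a start index inside, exactly at, and past the end of the array. The variable names are only for illustration and are not taken from the post.

```ruby
# Illustrative sketch of Array#[start, length] near the end of an array.
a = [1, 2, 3]

inside = a[2, 1]  # start is a valid index
at_end = a[3, 1]  # start equals a.length
past   = a[4, 1]  # start is greater than a.length

p inside  # [3]
p at_end  # []
p past    # nil
```

A start index equal to the array length still counts as a valid position (the empty slice at the end), which is why it yields `[]` rather than `nil`.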

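Demonstrating the Usage of Enumerable#inject in Ruby

Enumerable#inject is a concise yet powerful API in Ruby's core library. After reading a short piece of code today, I became very interested in this API, so I looked up some material and summarized its usage. The code is as follows: def text_at(*args) args.inject(@feed) { |s, r| s.send(:at, r) }.inner_text end What this code does is: extract X...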

2013-12-06 17:36:50 224
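
As a rough illustration of the pattern in the `text_at` snippet above, the sketch below uses `inject` with an initial value in two ways: summing numbers, and walking a path where the accumulator is the current node. The `doc` hash and `path` array are hypothetical stand-ins, not the feed object from the post.

```ruby
# inject folds a collection into one value by threading an accumulator
# through the block; the argument to inject is the initial accumulator.
total = [1, 2, 3, 4].inject(0) { |sum, n| sum + n }

# Same shape as text_at: start from a root object and apply one step per element.
# `doc` and `path` are hypothetical sample data for this sketch.
doc = { "feed" => { "entry" => { "title" => "hello" } } }
path = ["feed", "entry", "title"]
title = path.inject(doc) { |node, key| node.fetch(key) }

puts total  # 10
puts title  # hello
```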

[Original] Trapped by String#split of Ruby

Today I was trapped by some rather weird behavior of Ruby's String#split. Here's an example: def parse_inline_styles(text) segments = text.split(%r{(</?.*?>)}).reject {|x| x.empty?} segments...

2013-12-05 18:33:21 76
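
The excerpt does not say which behavior caused the trouble, so the following is only a sketch of two documented properties of `String#split` that the snippet above relies on: a capture group in the pattern keeps the delimiters in the result, and a match at the very start of the string produces a leading empty string. The sample strings are made up for illustration.

```ruby
# Sketch: String#split with and without a capture group in the pattern.
TAG = %r{(</?.*?>)}        # same pattern as in the excerpt (with capture group)
TAG_NO_CAPTURE = %r{</?.*?>}

with_capture    = "a<b>c</b>".split(TAG)             # delimiters are kept
without_capture = "a<b>c</b>".split(TAG_NO_CAPTURE)  # delimiters are dropped
leading_tag     = "<b>x</b>".split(TAG)              # leading empty string appears

p with_capture     # ["a", "<b>", "c", "</b>"]
p without_capture  # ["a", "c"]
p leading_tag      # ["", "<b>", "x", "</b>"]
```

The leading empty string in the last case is one plausible reason for the `reject { |x| x.empty? }` call in the excerpt.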

[Original] Basic Data Structures and Algorithms (12): Hash table

Search algorithms that use hashing consist of two separate parts. The first part is to compute a hash function that transforms the search key into an array index. Ideally, different keys would map...

2013-12-02 22:06:01 196

[Original] Basic Data Structures and Algorithms (11): Red-black binary search tree

The insertion algorithm for 2-3 trees just described is not difficult to understand; now, we will see that it is also not difficult to implement. We will consider a simple representation known as...

2013-12-01 12:12:35 228

[Original] Basic Data Structures and Algorithms (10): 2-3 search tree

Binary search trees work well for a wide variety of applications, but they have poor worst-case performance. Now we introduce a type of binary search tree where costs are guaranteed to be logarit...

2013-11-30 11:07:02 352

[Original] Basic Data Structures and Algorithms (9): Binary Search Tree

A binary search tree (BST) is a binary tree where each node has a Comparable key (and an associated value) and satisfies the restriction that the key in any node is larger than the keys in all no...

2013-11-28 22:39:46 203

[Original] Basic Data Structures and Algorithms (8): Binary search

Binary search needs an ordered array so that it can use array indexing to dramatically reduce the number of compares required for each search, using the classic and venerable binary search algorithm...

2013-11-28 21:21:41 184

[Original] Basic Data Structures and Algorithms (7): Priority queue & Heap sort

Some important applications of priority queues include simulation systems, where the keys correspond to event times, to be processed in chronological order; job scheduling, where the keys correspond...

2013-11-27 19:47:59 334

[Original] Basic Data Structures and Algorithms (6): Quick sort

Quick sort is probably used more widely than any other sorting algorithm. It is popular because it is not difficult to implement, works well for a variety of different kinds of input data, and is substantially faster...

2013-11-21 19:33:14 211

[Original] Basic Data Structures and Algorithms (5): Merge sort

One of mergesort’s most attractive properties is that it guarantees to sort any array of N items in time proportional to N * log N. Its prime disadvantage is that it uses extra space proportional...

2013-11-20 21:44:40 217

[Original] Basic Data Structures and Algorithms (4): Shell sort

Shellsort is a simple extension of insertion sort that gains speed by allowing exchanges of array entries that are far apart, to produce partially sorted arrays that can be efficiently sorted, ev...

2013-11-20 19:11:30 161

[Original] Comparing two sorting algorithms

Generally we compare algorithms by: ■ Implementing and debugging them ■ Analyzing their basic properties ■ Formulating a hypothesis about comparative performance ■ Running experiments to validate...

2013-11-19 21:16:59 108

[Original] Basic Data Structures and Algorithms (3): Insertion Sort

As in selection sort, the items to the left of the current index are in sorted order during the sort, but they are not in their final position, as they may have to be moved to make room for smaller ...

2013-11-19 21:06:47 158

[Original] Basic Data Structures and Algorithms (2): Selection sort

One of the simplest sorting algorithms works as follows: First, find the smallest item in the array and exchange it with the first entry (itself if the first entry is already the smallest). Then,...

2013-11-19 20:57:06 130
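
The selection sort excerpt above breaks off after its first step. As a minimal sketch, assuming the standard algorithm (repeat the find-the-smallest-and-exchange step on the remaining entries), it could look like this. The method name and the choice to sort a copy are illustrative and not taken from the post.

```ruby
# Selection sort sketch: for each position i, find the smallest remaining
# item and exchange it with the entry at i.
def selection_sort(items)
  a = items.dup  # work on a copy so the caller's array is untouched
  (0...a.length).each do |i|
    min = i
    ((i + 1)...a.length).each { |j| min = j if a[j] < a[min] }
    a[i], a[min] = a[min], a[i]  # exchanges with itself if already smallest
  end
  a
end

sorted = selection_sort([5, 2, 9, 1, 5])
p sorted  # [1, 2, 5, 5, 9]
```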

[Original] Basic Data Structures and Algorithms (1): UnionFind

The problem that we consider is not a toy problem; it is a fundamental computational task, and the solution that we develop is of use in a variety of applications, from percolation in physical ch...

2013-11-19 20:47:04 112

[Original] Availability and Reliability with HBase

Availability: Availability in the context of HBase can be defined as the ability of the system to handle failures. The most common failures cause one or more nodes in the HBase cluster to fall off t...

2013-08-25 10:53:19 92

[Original] Failed to Run Pig Script with Macro

Pig version: [root@n8 examples]# pig -version Apache Pig version 0.11.0-cdh4.3.0 (rexported) compiled May 27 2013, 20:48:21 Hadoop version: [root@n8 examples]# hadoop version Hadoop 2.0.0-cd...

2013-08-16 19:44:29 147

[Original] Solution to Hive Thrift Client Hang without Any Return

Env: Cloudera Manager 4.6.1 with CDH4.3; Hadoop 2.0.0-CDH4.3; Hive 0.10.0-CDH4.3; CentOS 6.4 X86_64. Hive started successfully: [root@n8 hive]# netstat -anlp | grep 10000 tcp 0 0 0.0.0.0:...

2013-08-12 19:38:33 101

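[Original] How to Create Hive Data Files

While learning Hive, a problem I often run into is the lack of suitable data files. For example, while reading Programming Hive I was frustrated that no sample data is provided for the Employees table. By default Hive uses '\001' (Ctrl+A) as the field separator, '\002' (Ctrl+B) as the collection item separator, and '\003' as the key/value separator for Map types. ...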

2013-08-10 12:05:04 130
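
As a minimal sketch of the default delimiters described in the excerpt above, the snippet below builds one text row with '\001' between fields, '\002' between collection items (including map entries), and '\003' between a map key and its value. The column layout (name, salary, subordinates, deductions) and the sample values are assumptions made for illustration and do not come from the post.

```ruby
# Hive default text delimiters, as listed in the excerpt.
FIELD = "\x01"  # Ctrl+A, between fields
ITEM  = "\x02"  # Ctrl+B, between collection items and map entries
KV    = "\x03"  # between a map key and its value

# Build one row; the four columns are a hypothetical table layout.
def hive_row(name, salary, subordinates, deductions)
  [
    name,
    salary.to_s,
    subordinates.join(ITEM),
    deductions.map { |k, v| "#{k}#{KV}#{v}" }.join(ITEM)
  ].join(FIELD)
end

row = hive_row("John Doe", 100000.0,
               ["Mary Smith", "Todd Jones"],
               { "Federal Taxes" => 0.2, "Insurance" => 0.1 })
p row.split(FIELD).length  # 4
```

Writing such rows with `File.write` (one row per line) gives a file that a table using Hive's default delimited text format could load.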

[Original] Hive - Index Creation Fails, Cause Still Unknown

Environment: Cloudera Hive 0.10-CDH4. The Hive installation on my machine contains the following table: hive (human_resources)> describe formatted employees; OK col_name data_type comment # col_name data_type comment ...

2013-08-10 00:08:46 950

[Original] Cascading Terminology and Concepts

Cascading is a data processing API and processing query planner used for defining, sharing, and executing data-processing workflows on a single computing node or distributed computing cluster. On a ...

2013-08-02 23:17:37 112 1

[Original] Cascading Kick Start: Word Counting

If you know Hadoop, you have undoubtedly seen WordCount before; it serves as a hello world for Hadoop apps. This simple program provides a great test case for parallel processing: It req...

2013-07-31 19:36:29 87

[Original] Joins with Apache Crunch

Apache Crunch is a Java library for creating MapReduce pipelines that is based on Google's FlumeJava library. Like other high-level tools for creating MapReduce jobs, such as Apache Hive, Apache Pig...

2013-07-30 19:46:21 96

[Original] Getting Started with Apache Crunch

The Apache Crunch Java library provides a framework for writing, testing, and running MapReduce pipelines. Its goal is to make pipelines that are composed of many user-defined functions simple to wr...

2013-07-29 23:10:34 94

[Original] Accelerating Comparison by Providing RawComparator

When a job is in the sorting or merging phase, Hadoop leverages the RawComparator for the map output key to compare keys. Built-in Writable classes such as IntWritable have byte-level implementations that are...

2013-07-27 21:25:07 76

[Original] MapReduce Algorithm - Secondary Sort

Secondary sort is used to allow some records to arrive at a reducer ahead of other records. It requires an understanding of both data arrangement and data flow (partitioning, sorting and gro...

2013-07-25 19:34:46 144

[Original] MapReduce Algorithm - Semi-joins

In the relational world, a semi-join can be defined as a join between two tables that returns rows from the first table where one or more matches are found in the second table. The difference between a semi-jo...

2013-07-25 18:15:04 78

[Original] MapReduce Algorithm - Another Way to Do Map-side Join

Map-side join is also known as replicated join, and gets its name from the fact that the smallest of the datasets is replicated to all the map hosts. You can find an implementation in Hadoop in Action...

2013-07-25 17:51:08 127

[Original] Homework - HBase Shell, Java Client and MapReduce Job

Env: Single node with CentOS 6.2 x86_64, 2 processors, 4 GB memory; CDH4.3 with Cloudera Manager 4.5; HBase 0.94.6-cdh4.3.0. HBase shell exercise: [root@n8 ~]# hbase shel...

2013-07-21 23:36:22 131

[Original] Running MapReduce Job with HBase

Generally there are three different ways of interacting with HBase from a MapReduce application. HBase can be used as a data source at the beginning of a job, as a data sink at the end of a job or as ...

2013-07-21 01:50:23 90

[Original] Adding HBase Library into Java Classpath

Suppose you write some Java code that operates on HBase via the HBase Java client interface, then compile and package the Java source code into a jar called examples.jar. In a Hadoop cluster you can use "hbase c...

2013-07-20 14:17:36 83

[Original] Moving Data in/out of Hadoop Filesystem

Hadoop has a number of built-in mechanisms that can facilitate ingress and egress operations, to name a few: Embedded NameNode HTTP server; WebHDFS and Hadoop interfaces; HBase built-in API, be sp...

2013-07-18 23:11:51 81

[Original] Enabling Oozie Web Console in CDH3, CDH4 with/without Cloudera Manager

To enable Oozie's web console, you must download and add the ExtJS library to the Oozie server. If you have not already done this, proceed as follows. If you use CDH3, you must do the following: Download th...

2013-07-16 23:36:37 74

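[Original] Specifying the Flume Log Classification Level

When receiving syslog-format logs over UDP or TCP, for example: flume dump 'syslogUdp(5140)' This command receives logs over UDP on port 5140. Suppose you now want to test from the command line whether logs are received successfully: echo '<37>Hello from cmd.' | nc -u localhost 5140 Be sure to prefix the test text with <37>, which classifies the log message; otherwise flum...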

2013-07-16 08:41:14 756
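
For readers wondering what the `<37>` prefix in the excerpt above encodes: in the syslog format the number is a priority value, computed as facility * 8 + severity. The sketch below decodes it; the method name `decode_pri` is made up for this illustration.

```ruby
# Decode a syslog priority (PRI) value: PRI = facility * 8 + severity.
def decode_pri(pri)
  facility, severity = pri.divmod(8)
  { facility: facility, severity: severity }
end

decoded = decode_pri(37)
puts "facility=#{decoded[:facility]} severity=#{decoded[:severity]}"
# facility=4 severity=5
```

In the standard syslog numbering, facility 4 is security/authorization and severity 5 is notice.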

[Original] PageRank Algorithm in MapReduce

Chapter 5 of Data-Intensive Text Processing with MapReduce introduces how to implement the PageRank algorithm in MapReduce. I am not going to say more about PageRank itself here; please refe...

2013-07-14 12:12:29 139

CSS3.The.Missing.Manual.3rd.Edition

CSS3 lets you create professional-looking websites, but learning its finer points can be tricky—even for seasoned web developers. This Missing Manual shows you how to take your HTML and CSS skills to the next level, with valuable tips, tricks, and step-by-step instructions.

2014-09-26

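Ruby on Rails Tutorial, Chinese Edition, 2nd Edition

The Chinese translation of Ruby on Rails Tutorial, 2nd edition, one of the essential tutorials for getting started with and advancing in Rails. Although the book uses Rails 3, which is some way behind the latest Rails release, it is not outdated in practice: from a company's point of view, upgrading a major Rails version involves many considerations, so most companies are probably still on Rails 3 or even Rails 2. The book therefore remains a worthwhile reference for learning Rails.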

2014-04-20
