Original: kudu - impala
The number of partitions should equal the number of cores in the cluster. Kudu optimizes SQL predicates that use =, <=, <, >, >=, BETWEEN, or IN, but not !=, LIKE, or any other predicate type.
2018-08-12 15:54:49 252
Repost: cassandra, hbase and mongodb
Cassandra: AP system, weak consistency, write-heavy, high availability, good for online use. HBase: CP system, good support for batch analytics, not typical for online use. MongoDB,...
2018-07-27 11:36:58 242
Translation: Java offheap memory
- MappedByteBuffer: public void copyFile(String filename, String srcpath, String destpath) throws IOException { File source = new File(srcpath+"/"+filename); File dest = new File(destpath+"/...
2018-07-18 09:11:15 325
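The excerpt for "Java offheap memory" is cut off before the copy logic, so the following is only a minimal sketch of how a file copy through a `MappedByteBuffer` can look, not the post's own code. The class name `MappedCopySketch` and the `Path`-based signature are assumptions made for illustration; it also assumes the source file fits in a single mapping (under 2 GB).

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not taken from the original post.
final class MappedCopySketch {

    // Copies src to dest by mapping the source file into off-heap memory.
    // Assumes the file is smaller than 2 GB, the limit of one mapping.
    static void copyFile(Path src, Path dest) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dest,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            // The mapped region lives outside the Java heap.
            MappedByteBuffer buffer =
                    in.map(FileChannel.MapMode.READ_ONLY, 0, in.size());
            // write() may consume only part of the buffer, so loop until drained.
            while (buffer.hasRemaining()) {
                out.write(buffer);
            }
        }
    }
}
```

Mapping avoids copying file contents through an intermediate heap array; the operating system pages the data in on demand.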
Original: kafka
- use cases: log collection, message system, user activity, stream processing, event source - design: kafka broker leader, multiple brokers contend to become leader by creating an ephemeral node in zookeeper. only on...
2018-07-16 17:40:38 204
Repost: JVM troubleshooting
- JPS, TOP and JSTACK: jps finds Java process info such as class name, parameters of main, JVM arguments, and pid (jps -m -l); top finds the most CPU-bound thread (top -Hp pid); jstack dumps thread stacks, jstac...
2018-07-10 18:01:33 212
Original: review list
devops; springboot and microservices; gmlparser, design pattern; persistable queue, java volatile, atomicity and concurrency, mockito; volatile is not atomic, happens-before, happen-after for memory visibilit...
2018-06-25 11:58:10 305
Translation: Software Design
- Design Principles: Open-closed, open for extension, closed for modification; Liskov substitution, any subclass can stand in wherever its base class is used; Demeter, the principle of least knowledge; interface segregation, p...
2018-06-01 12:19:49 330
Original: submit spark code to yarn
- configure spark to submit code to remote yarn: val sparkConf = new SparkConf().setAppName(s"Bulk Import $manualNbr").setMaster("yarn").set("spark.submit.deployMode", "client") // ...
2018-05-27 16:11:37 244
Repost: compile spark source code
Change the Scala version to the one installed on your machine: ./dev/change-scala-version.sh <version>. Shut down zinc: ./build/zinc-<version>/bin/zinc -shutdown. Compile Spark: ./build/mvn -Pyarn ...
2018-05-25 18:12:01 184
Translation: LSM Log-Structured Merge-Tree
- Sequential access is better than random access -> WAL, append updates to a log - Memstore in memory for quick lookup -> the Memstore flushes data to a store file when it reaches a threshold - Merge multiple...
2018-05-12 19:43:51 136
Translation: B tree vs B+ tree
- B tree (key+data in every node), O(log_d(n)); d is the degree of the tree; h is the height of the tree, h <= log_d((n+1)/2); a non-leaf node has n-1 keys and n pointers, d<=n<=2d; all leaves are at the same height; nodes...
2018-05-12 19:03:07 252
Translation: HBase MapReduce
- Data Locality, block placement policy: the first copy is written to the data node where the region server runs. - TableInputFormat: divides the table at region boundaries by start row and end row; static class ...
2018-05-12 15:54:01 123
Translation: HBase Filters, Counters & Coprocessors
- Filter -> FilterBase. setFilter(filter) method on Get and Scan - CompareFilter: operator + comparator, matched data is kept; CompareFilter(CompareOp valueCompareOp, WritableByteArrayComparable valu...
2018-05-12 12:02:29 126
Translation: HBase Region Split
- Split Policy (ConstantSizeRegionSplitPolicy, IncreasingToUpperBoundRegionSplitPolicy, SteppingSplitPolicy) - Split Point: the first row of the center block of the biggest file of the store - Split Workflo...
2018-05-09 17:55:27 150
Translation: HBase Concept
- Data Model: sparse, distributed, persistent multidimensional sorted map; (row:string, column:string, time:int64) -> string // both key and value are uninterpreted bytes; Row: single row read and update i...
2018-05-08 21:23:51 140
Translation: Java GC
young generation and old generation; 1 eden and 2 survivor spaces. minor GC: mark and copy, from eden and one survivor to the other survivor. full GC: mark, sweep and compact generations. both will stop th...
2018-05-07 17:55:39 139
Translation: bloom filter
- space-efficient lookup for a fixed number of static elements. - answers either "may have" or "definitely does not have". n: number of elements; k: number of hash functions, k = (m/n)*ln2; m: number of bits, >= n*lg(1/E)*lg(e); E: expec...
2018-05-07 13:07:58 105
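The sizing formulas in the "bloom filter" entry can be sketched in code. This is a minimal illustration, not the post's implementation: the class name `BloomFilterSketch`, the `falsePositiveRate` parameter (standing in for E), and the double-hashing scheme used to derive the k bit positions are all assumptions.

```java
import java.util.BitSet;

// Hypothetical sketch; names and hashing scheme are assumptions.
final class BloomFilterSketch {
    private final BitSet bits;
    private final int m; // number of bits
    private final int k; // number of hash functions

    BloomFilterSketch(int expectedElements, double falsePositiveRate) {
        this.m = (int) optimalBits(expectedElements, falsePositiveRate);
        this.k = optimalHashes(expectedElements, m);
        this.bits = new BitSet(m);
    }

    // m = ceil(n * log2(1/E) * log2(e))
    static long optimalBits(long n, double falsePositiveRate) {
        double log2OfInverseRate = Math.log(1.0 / falsePositiveRate) / Math.log(2);
        double log2OfEuler = 1.0 / Math.log(2);
        return (long) Math.ceil(n * log2OfInverseRate * log2OfEuler);
    }

    // k = round((m / n) * ln 2), never less than 1
    static int optimalHashes(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    // Derives the i-th bit index from two base hashes (double hashing).
    private int index(Object item, int i) {
        int h1 = item.hashCode();
        int h2 = (h1 >>> 16) | 1; // forced odd so the step is never zero
        return Math.floorMod(h1 + i * h2, m);
    }

    void add(Object item) {
        for (int i = 0; i < k; i++) {
            bits.set(index(item, i));
        }
    }

    // false means "definitely absent"; true means "may be present".
    boolean mightContain(Object item) {
        for (int i = 0; i < k; i++) {
            if (!bits.get(index(item, i))) {
                return false;
            }
        }
        return true;
    }
}
```

For n = 1000 and E = 0.01 the formulas give about 9.6 bits per element and 7 hash functions. A production filter would use stronger, independent hash functions than `hashCode()`.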
Translation: spark - Running on Cluster
- package spark app (maven): <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-shade-plugin</artifactId> <version>2.3</version>
2018-05-05 09:21:14 186
Translation: spark - Tuning and Debugging Spark
- submit application (the SparkConf object cannot be changed after SparkContext creation); method 1: bin/spark-submit \ --class com.example.MyApp \ --master local[4] \ --name "My Spark App" \ --conf spark.ui.port=...
2018-05-04 18:42:03 137
Translation: spark - Advanced Spark Programming
- Accumulator: val blankLines = new LongAccumulator; sc.register(blankLines); use an accumulator inside a transformation only for debugging, because speculative tasks make its value inaccurate. But in action, the accum...
2018-05-03 20:04:33 198
Translation: spark - Loading and Saving Data
- File Formats: Text File: sc.textFile, load a text file; sc.wholeTextFiles, load multiple files (filename, entire content) under the specified dir; JSON: sc.textFile.map to JSON object (people.add(mapper.readValue...
2018-05-03 18:03:54 120
Translation: scala notes (7) - Advanced Type and Implicit
- advanced types; singleton type: def setTitle(title: String): this.type = { ...; this } // for subtypes; def set(obj: Title.type): this.type = { useNextArgAs = obj; this } // take object as parameter, no ...
2018-04-29 22:48:12 116
Translation: scala notes (6) - Annotation, Future and Type Parameter
- Annotation: class MyContainer[@specialized T]; def country: String @Localized; @Test(timeout = 0, expected = classOf[org.junit.Test.None]) def testSomeFeature() { ... }; Java annotation can be mixed with Sc...
2018-04-27 15:34:52 113
Translation: spark - Pair RDD (Key/Value Pairs)
- Create a Pair RDD from a regular RDD by calling the map function: val pairs = lines.map(x => (x.split(" ")(0), x)); transformations on a Pair RDD (data: {(1,2),(3,4),(3,6)}): reduceByKey => {(1,2), (3,10)}; grou...
2018-04-27 10:24:24 342
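The `reduceByKey` result quoted in the "Pair RDD" entry can be reproduced without Spark. The sketch below only models the semantics (values that share a key are merged with a binary function, here addition) using plain Java collections; `ReduceByKeySketch` is a hypothetical name and nothing in it is Spark API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical model of reduceByKey semantics; not Spark code.
final class ReduceByKeySketch {

    // Each int[] is a (key, value) pair; values with equal keys are summed.
    static Map<Integer, Integer> reduceByKey(List<int[]> pairs) {
        return pairs.stream().collect(Collectors.toMap(
                pair -> pair[0],   // key
                pair -> pair[1],   // value
                Integer::sum,      // merge function applied per key
                TreeMap::new));    // sorted map for stable printing
    }

    public static void main(String[] args) {
        List<int[]> data = Arrays.asList(
                new int[] {1, 2}, new int[] {3, 4}, new int[] {3, 6});
        System.out.println(reduceByKey(data)); // prints {1=2, 3=10}
    }
}
```

In Spark the same merge runs per partition first and then across partitions, which is why the function must be associative.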
Translation: scala notes (5) - pattern and case class
- Pattern and Case Class: ch match { case _ if Character.isDigit(ch) => ... case '+' => ... case _ => ... }; prefix match { case "0" | "0x" | "0X" => ... }; case variable should be lowercase....
2018-04-26 12:08:49 92
Translation: scala notes (4) - collection
- Collection: Array is the equivalent of a Java array; its values are mutable but its size is not. sequence: Vector is the immutable equivalent of ArrayBuffer, an indexed sequence with fast random acces...
2018-04-25 18:04:21 109
Translation: scala notes (3) - Files & Regular Expression, Trait, Operation and Function
- Files & Regular Expressions: read from file, url and string, remember to close the source; val source = Source.fromFile("myfile.txt", "UTF-8"); val source1 = Source.fromURL("http://horstmann.com", "UTF-8...
2018-04-25 11:14:26 113
Translation: scala notes (2) - Class, Object, Package & Import and Inheritance
- Class: class Counter { private var value = 0 // You must initialize the field, otherwise the class is abstract. def increment() { value += 1 } // Methods are public by default def current() ...
2018-04-24 19:05:24 128
Translation: scala notes (1) - Basic, Control & Function, Array and Map & Tuple
- Basics: val greeting: String = null; val xmax, ymax = 100 // both are set; String -> StringOps // intersect, sorted...; Int -> RichInt // 1.to(10); primitive -> Rich*; BigInt & BigDecimal // * can be...
2018-04-24 12:02:34 108
Translation: Programming with RDD
- Passing functions to Spark (be careful with references to the containing object, which needs to be serializable): class SearchFunctions(val query: String) { def isMatch(s: String): Boolean = { s.contains(...
2018-04-23 18:51:18 80
Translation: scala type parameters
- type bounds: class Pair[T <: Comparable[T]](val first: T, val second: T) { def smaller = if (first.compareTo(second) < 0) first else second // compareTo }; class Pair[T](val first: T, val seco
2018-04-23 18:23:37 643
Repost: MapReduce Features
- Counters (values are definitive only once the job has successfully completed): Task Counters; Filesystem Counters; Job Counters (only in application master. doesn't need to send across network, mainly about t...
2018-04-22 19:52:21 73
Translation: MapReduce Types and Formats
- types: map: (K1, V1) → list(K2, V2); combiner: (K2, list(V2)) → list(K2, V2); reduce: (K2, list(V2)) → list(K3, V3) - partition (HashPartitioner): public abstract class Partitioner<KEY, VALUE> { public ...
2018-04-21 19:45:47 82
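The "MapReduce Types and Formats" entry names HashPartitioner but the excerpt stops before its body. As a rough sketch, the partition rule it applies can be written in plain Java with no Hadoop dependency; `HashPartitionSketch` is a hypothetical stand-in, and only the formula mirrors Hadoop's `HashPartitioner.getPartition`.

```java
// Hypothetical stand-in for the partitioning rule; not a Hadoop class.
final class HashPartitionSketch {

    // Clears the sign bit so the result is non-negative, then takes the
    // remainder to pick one of numReduceTasks partitions.
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // Integer.hashCode() is the value itself, so 7 % 4 selects partition 3.
        System.out.println(getPartition(7, 4)); // prints 3
    }
}
```

Masking with `Integer.MAX_VALUE` matters because `hashCode()` may be negative, and a negative remainder would not be a valid partition index. Equal keys always land in the same partition, which is what lets one reducer see every value for a key.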
Translation: MapReduce Workflow
check output folder; calculate splits; the application master gets progress and completion reports from tasks. it also requests containers for map tasks and reduce tasks. it starts containers through the nodemanage...
2018-04-21 16:13:32 287
Translation: MapReduce Application
- Configuration: conf.addDefaultResource, conf.addResource, configuration overridden <property><name>fs.defaultFS</name><value>file:/// or hdfs://namenode</value></pr...
2018-04-21 11:22:59 225
Translation: Hadoop I/O
- checksum: CRC-32C, for every 512 bytes; write: the last datanode of the pipeline verifies the checksum; read: block verification on client read; RawLocalFileSystem, to disable checksum - compression, (default is ...
2018-04-20 15:11:40 88
Translation: YARN (Yet Another Resource Negotiator) - Cluster Manager
- what is yarn - Yarn application run - Resource requests: all requests up front (Spark) or dynamic requests (MapReduce: map task requests are up front, but reduce task requests are dynamic) - application lifes...
2018-04-19 17:24:24 197
Translation: HDFS
- suitable for: very large files (terabytes, petabytes); write once and read many times; handling node failure without noticeable interruption - not suitable for applications with: low-latency data access, HBa...
2018-04-19 14:51:12 225
Original: Map
HashMap: get O(1), containsKey O(1), next O(h/n). Mapping the key to an array index brings lookup complexity to O(1) (constant time). resize when the number of entries exceeds the threshold (= table size * load fact...
2018-04-19 13:52:36 156 1
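The two ideas in the "Map" entry, mapping a key to an array index and resizing at a threshold, can be sketched as follows. `HashMapSketch` is a hypothetical class; the bit-spreading step follows what OpenJDK 8 and later do in `java.util.HashMap`, and the table capacity is assumed to be a power of two.

```java
// Hypothetical illustration of HashMap indexing and resize threshold.
final class HashMapSketch {

    // Folds the high 16 bits into the low 16 so that small tables still
    // see entropy from the whole hash code.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a power-of-two capacity: a mask instead of a modulo.
    static int indexFor(Object key, int capacity) {
        return (capacity - 1) & spread(key);
    }

    // The table is resized once the entry count exceeds this value.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f)); // prints 12
        System.out.println(indexFor("a", 16));    // prints 1
    }
}
```

With the default capacity of 16 and load factor of 0.75, the table doubles when the 13th entry is added. Keeping the load factor below 1 keeps buckets short, which is what makes `get` and `containsKey` constant time on average.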