chenlongzhen_tech - CSDN Blog

Original: [Notice] This account is no longer updated; please follow http://blog.csdn.net/tech_chenlongzhen


2019-07-26 14:25:44 372

Original: [Notice] This account is no longer updated; please follow http://blog.csdn.net/tech_chenlongzhen


2018-01-19 14:27:42 874

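Original: ESL 7 Model Assessment and Selection

Bias, variance and model complexity. The loss function between predicted and true values usually takes one of two forms (squared error or absolute error). Test error, also called generalization error, is the prediction error over an independent test sample. Model evaluation and selection have two goals. Model selection: compare the performance of several models and choose the best one. Model assessment: having chosen a final model, estimate its generalization error on new data (the test set). The data are usually split into a training set, a validation set and a test set. Bias-variance decomposition: for k-near…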

2017-12-19 20:29:51 1148
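
For reference, here is the decomposition the excerpt leads into, written out under the usual assumptions: Y = f(X) + ε with E(ε) = 0 and Var(ε) = σ_ε², squared-error loss, and (for the kNN line) the training inputs held fixed so that only the responses are random. Here x_(ℓ) denotes the ℓ-th nearest training point to x_0.

```latex
\begin{aligned}
\operatorname{Err}(x_0) &= E\big[(Y - \hat f(x_0))^2 \mid X = x_0\big] \\
&= \sigma_\varepsilon^2 + \big[E\hat f(x_0) - f(x_0)\big]^2 + E\big[\hat f(x_0) - E\hat f(x_0)\big]^2 \\
&= \sigma_\varepsilon^2 + \operatorname{Bias}^2\big(\hat f(x_0)\big) + \operatorname{Var}\big(\hat f(x_0)\big), \\[4pt]
\operatorname{Err}_{k\text{NN}}(x_0) &= \sigma_\varepsilon^2 + \Big[f(x_0) - \frac{1}{k}\sum_{\ell=1}^{k} f\big(x_{(\ell)}\big)\Big]^2 + \frac{\sigma_\varepsilon^2}{k}.
\end{aligned}
```

In the kNN case the variance term σ_ε²/k shrinks as k grows, while the bias term typically grows because the k neighbors drift farther from x_0.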

Original: nn pic model preprocess note

Inception preprocessing: https://github.com/fchollet/keras/blob/master/keras/applications/inception_v3.py
def preprocess_input(x):
    x /= 255.
    x -= 0.5
    x *= 2.
    return x
VGG preprocessing: if len(x.shape) == 3:…

2017-04-13 16:41:51 10694 1
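
A small NumPy sketch of the two preprocessing styles. The Inception function repeats the arithmetic shown above, mapping [0, 255] to [-1, 1]. The VGG function is an assumption based on the Caffe-style convention used by the Keras VGG weights: flip RGB to BGR and subtract the per-channel ImageNet mean, with no scaling. The function names and the channels-last RGB input layout are my own choices.

```python
import numpy as np

def preprocess_inception(x):
    """Scale pixel values from [0, 255] to [-1, 1]."""
    x = np.asarray(x, dtype=np.float32) / 255.0  # new array, caller's data untouched
    x -= 0.5
    x *= 2.0
    return x

# Per-channel ImageNet means in BGR order (Caffe-style VGG convention; assumed here).
VGG_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def preprocess_vgg(x):
    """RGB -> BGR on the last axis, then subtract the per-channel mean (no scaling)."""
    x = np.asarray(x, dtype=np.float32)
    x = x[..., ::-1]  # reverse the channel axis: RGB -> BGR
    return x - VGG_MEAN_BGR
```

Unlike the in-place Keras snippet, both functions return a new array, so the caller's image is not modified.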

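Original: temp

Taking CNNs as the example: CNN structure; TensorFlow implementation of AlexNet classification; Keras code for VGG classification. Main structure: data input layer (Input layer); convolution layer (CONV layer); ReLU activation layer (ReLU layer); pooling layer (Pooling layer); fully connected layer (FC layer); Batch Normalization layer (optional). ReLU (training curve on CIFAR-10): take the neuron's outp…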

2017-04-10 15:40:22 726

Original: Introduction to deep learning

Introduction to deep learning. What is deep learning: multiple layers of nonlinear processing units; the supervised or unsupervised learning of feature representations in each layer, with the layers forming a hierarchy from low-level to high-…

2017-04-10 15:17:15 825

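Original: R-CNN notes

Pipeline. R-CNN: 1. image -> selective search -> proposals. Classic method: Deformable Parts Model. Classic techniques: 1. applying SGD; 2. NMS (non-maximum suppression) for post-processing at test time; 3. mining hard examples (examples with probability close to 1 are treated as hard examples and removed, then 0…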

2017-04-10 14:56:30 1006
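
The NMS step mentioned above can be sketched as the standard greedy algorithm. This is a generic version, not the R-CNN authors' code: boxes are assumed to be (x1, y1, x2, y2) with continuous coordinates, and the default IoU threshold of 0.5 is just an illustrative choice.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return the indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)          # highest-scoring box still in play
        keep.append(best)
        # drop every remaining box that overlaps the kept one too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

For example, two heavily overlapping boxes (IoU about 0.68) collapse to the higher-scoring one at threshold 0.5, while a distant third box survives.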

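Original: Introduction to TensorFlow

Computation model: first build the whole computation graph; the graph can then be optimized and scheduled in a distributed way. Layer-based models: each layer's computation has a fixed forward/backward implementation, and the target GPU must be specified manually. Concepts: tensors represent data; graphs represent computation tasks; graphs are executed in the context of a session; variables maintain state; feed and fetch let you assign values to, or fetch data from, any operation. numpy vs tensorflow; TensorFlow computation graph; TensorFlow 一…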

2017-04-10 14:29:22 436

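Original: tf-idf doc

tf-idf is used to extract keywords from each user's Weibo posts, which serve as the data basis for tagging users. For the tf-idf principle, see Baidu Baike. Project workflow: the implementation has three main steps: 1. Traverse all new id_post files (user id and Weibo content) in the data folder created after the timestamp of the last run, compute tf and idf, and store them separately under the local data/tf and idf directories (one tf file and one idf file per id_post file). 2. Merge all idf files into a final id…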

2016-05-16 14:01:58 753
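
A minimal in-memory sketch of the tf and idf steps above. It assumes each user's posts are already segmented into a token list (Chinese Weibo text would need a word segmenter first), uses tf = count / document length and idf = log(N / df), and leaves out the per-file storage and merging of the real workflow. The function names are hypothetical.

```python
import math
from collections import Counter

def tf(tokens):
    """Term frequency: count of each term divided by the document length."""
    counts = Counter(tokens)
    n = len(tokens)
    return {t: c / n for t, c in counts.items()}

def idf(docs):
    """Inverse document frequency: log(N / df), df = number of docs containing the term."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))       # count each term once per document
    return {t: math.log(n_docs / d) for t, d in df.items()}

def top_keywords(docs, k=3):
    """Return the k highest tf-idf terms for each document (ties broken alphabetically)."""
    idf_table = idf(docs)
    result = []
    for tokens in docs:
        scores = {t: w * idf_table[t] for t, w in tf(tokens).items()}
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:k])
    return result
```

A term that appears in every document gets idf = log(1) = 0, so very common words never become tags.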

Original: split

#include <string>
#include <iostream>
#include <fstream>
#include <vector>
#include <sstream>
using namespace std;
//void split(string& s, string& delim, vector< string >* ret)
//{
// size_t last = 0;

2016-05-10 14:50:45 656

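Original: knn python

Introduction to KNN (from Baidu Baike and MLAPP). The k-nearest-neighbor (kNN, k-NearestNeighbor) classification algorithm is one of the simplest classification methods in data mining. "k nearest neighbors" means the k closest neighbors: each sample can be represented by its k nearest neighbors. The core idea of kNN is that if most of a sample's k nearest neighbors in feature space belong to a certain class, then the sample also belongs to that class and shares the characteristics of that class's samples. This method…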

2016-04-23 15:20:17 896
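
The majority-vote idea described above fits in a few lines of plain Python. This sketch assumes Euclidean distance and numeric feature tuples, and it breaks distance ties by label order. It is not the code from the original post. `math.dist` needs Python 3.8+.

```python
import math
from collections import Counter

def knn_classify(query, train_x, train_y, k=3):
    """Majority vote among the k training points closest (Euclidean) to query."""
    dists = sorted(
        (math.dist(query, x), y) for x, y in zip(train_x, train_y)
    )
    votes = Counter(y for _, y in dists[:k])   # labels of the k nearest neighbors
    return votes.most_common(1)[0][0]
```

This brute-force version sorts all n distances for every query. That is fine for small data, but a k-d tree or similar index is the usual choice when the training set is large.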

Original: Logistic regression derivation

References: PRML; derivation of Logistic Regression (LR). Code: Python logistic regression code.

2016-04-22 11:00:14 691
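
The result of the PRML-style derivation is that the gradient of the cross-entropy error is ∇E(w) = Σ_n (y_n − t_n) φ_n, where y_n = σ(wᵀφ_n). Below is a plain-Python batch gradient-descent sketch built on that gradient. The learning rate, epoch count and function names are illustrative assumptions, not the linked code.

```python
import math

def sigmoid(z):
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(xs, ts, lr=0.1, epochs=1000):
    """Batch gradient descent on the cross-entropy error.
    xs: feature lists, ts: 0/1 targets. A bias feature 1.0 is prepended to each x."""
    phis = [[1.0] + list(x) for x in xs]
    w = [0.0] * len(phis[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for phi, t in zip(phis, ts):
            y = sigmoid(sum(wi * pi for wi, pi in zip(w, phi)))
            for j, pj in enumerate(phi):
                grad[j] += (y - t) * pj          # dE/dw = sum_n (y_n - t_n) * phi_n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def predict_proba(w, x):
    """P(t = 1 | x) under weights w (w[0] is the bias)."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
```

On linearly separable data the weights keep growing slowly because the error has no finite minimizer, which is why a fixed epoch budget (or regularization) is used here.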

Original: xgboost note

Parameter log:
param = {'bst:max_depth':3, 'bst:subsample':0.5, 'bst:min_child_weight':1,'bst:eta':0.3, 'silent':1,'objective':'binary:logistic'}
param['nthread'] = 250 iter: auc: 0.661716221418
param = {'bst:max…

2016-04-13 11:25:53 1083

Reposted: Upgrading gcc to 4.8 on CentOS

http://www.mudbest.com/centos%E5%8D%87%E7%BA%A7gcc4-4-7%E5%8D%87%E7%BA%A7gcc4-8%E6%89%8B%E8%AE%B0/

2016-04-09 22:51:57 464

Reposted: ImportError: /lib64/libc.so.6: version `GLIBC_2.14' not found

ImportError: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /root/.pyenv/versions/miniconda2-latest/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so)http://www.cnblogs

2016-04-09 22:45:52 10155

Original: pandas data-processing example

API: pandas api
import pandas as pd
import os
import numpy as np
def add_prop(group):
    births = group.births.astype(float)
    group['prop'] = births / births.sum()
    return group
def top1000(gro…

2016-03-30 21:50:08 4237
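
A vectorized sketch of the same two steps, written with `groupby().transform` and `groupby().head` instead of `groupby().apply`. It assumes a baby-names style table with `name`, `sex`, `births` and `year` columns grouped by (year, sex). Only `births` appears in the excerpt, so the other column names and the grouping keys are assumptions.

```python
import pandas as pd

def add_prop(df):
    """Add a 'prop' column: each row's share of births within its (year, sex) group."""
    totals = df.groupby(["year", "sex"])["births"].transform("sum")
    return df.assign(prop=df["births"].astype(float) / totals)

def top_n(df, n=1000):
    """Keep the n rows with the most births in each (year, sex) group."""
    return (df.sort_values("births", ascending=False)
              .groupby(["year", "sex"])
              .head(n))
```

`transform("sum")` broadcasts each group total back to the original rows, so no per-group Python function has to run. This also sidesteps the group-key index changes that `apply` picked up in newer pandas versions.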

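Original: Linux user management

*The following is collected from the web. User-management configuration files: user info: /etc/passwd; password info: /etc/shadow; group info: /etc/group; user settings files: /etc/login.defs, /etc/default/useradd; template files for new users: /etc/skel; login messages: /etc/motd, /etc/issue…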

2016-03-25 18:36:53 473

Original: rpm | yum package management | installing from source packages

rpm: install: rpm -ivh package; remove: rpm -e package; mount the CD-ROM: mount /dev/cdrom /mnt/cdrom; query: rpm -q sudo, rpm -qa | grep samba; other install settings: --excludedocs, --prefix, --test (just for testing, not really…

2016-03-24 10:41:59 596

Original: coursera scala week one

http://blog.csdn.net/unhappypeople/article/details/17199951
Pascal's triangle:
def pascal(c: Int, r: Int): Int = {
  if (c == 0 || c == r || r == 0) 1
  else pascal(c - 1, r - 1) + pascal(c, r - 1)
}
Parentheses balancing:
def balance(…

2016-03-20 14:56:50 478
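
Python ports of the two exercises, for comparison. `pascal` mirrors the Scala recursion above. The `balance` body is my own counting version, since the excerpt cuts off before the original: it tracks how many parentheses are open and fails as soon as a `)` appears without a matching `(`. I used a loop rather than recursion because CPython has no tail-call elimination.

```python
def pascal(c, r):
    """Element at column c, row r of Pascal's triangle (both 0-based)."""
    if c == 0 or c == r or r == 0:
        return 1
    return pascal(c - 1, r - 1) + pascal(c, r - 1)

def balance(chars):
    """True if every '(' in chars is closed by a later ')'."""
    depth = 0
    for ch in chars:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:          # a ')' with no open '(' before it
                return False
    return depth == 0
```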

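Original: Framework

My current design is as follows (the parts marked in red are not yet implemented or still have problems): split the ML framework into these modules: 1. getdata; 2. featureEngineering; 3. train and evaluation; 4. push para. During experiments, different features may get different customized processing so that we can check whether it helps. For example, a URL such as www.baidu.com should be turned into a categorical label, e.g. 1 if it is Baidu…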

2016-03-15 14:06:15 479

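Original: Recursion and currying in Scala

Take summing f(a) for a = 1, 2, 3, …, n as an example.
1. Linear recursion:
def sumFactorials(f: Int => Int, a: Int, b: Int): Int = {
  if (a > b) 0 else f(a) + sumFactorials(f, a + 1, b)
}
2. Tail recursion:
def sumFacorials(f: Int => Int, a: Int, b: Int): Int = {
  def loo…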

2016-03-13 20:42:02 755
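
For comparison, here are the three shapes (linear recursion, tail recursion, currying) sketched in Python. The function names are hypothetical. Because CPython does not eliminate tail calls, the tail-recursive version is written as the equivalent accumulator loop, which is what Scala compiles an `@tailrec` function into.

```python
from typing import Callable

def sum_linear(f: Callable[[int], int], a: int, b: int) -> int:
    """Linear recursion: the addition waits for the recursive call to return."""
    return 0 if a > b else f(a) + sum_linear(f, a + 1, b)

def sum_tail(f: Callable[[int], int], a: int, b: int) -> int:
    """Tail-recursive shape (accumulator) written as a loop."""
    acc = 0
    while a <= b:
        acc, a = acc + f(a), a + 1
    return acc

def sum_curried(f: Callable[[int], int]) -> Callable[[int, int], int]:
    """Curried form: sum_curried(f)(a, b), like Scala's def sum(f: Int => Int)(a: Int, b: Int)."""
    def sum_f(a: int, b: int) -> int:
        return 0 if a > b else f(a) + sum_f(a + 1, b)
    return sum_f
```

For example, `sum_curried(lambda x: x * x)` returns a reusable "sum of squares" function that only needs the range.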

Original: spark.driver.maxResultSize || java.lang.OutOfMemoryError

16/03/11 12:05:56 ERROR TaskSetManager: Total size of serialized results of 4 tasks (1800.7 MB) is bigger than spark.driver.maxResultSize (1024.0 MB) java.lang.OutOfMemoryError: Direct buffer memory.s

2016-03-11 16:46:05 3221

Original: Scala demo for deleting HDFS files

def delete(master: String, path: String): Unit = {
  println("Begin delete!--" + master + path)
  val output = new org.apache.hadoop.fs.Path(master + path)
  val hdfs = org.apache.hadoop.fs.FileSys…

2016-03-11 10:19:05 4673

Original: Task not serializable

# Task not serializable
Problem: searching Google, I found that you cannot put an RDD map inside another class when that class is not serializable. Link: http://stackoverflow.com/questions/29295838/org-apache-spark-sparkexception-task-not-serializable
My fix: take the map in this class…

2016-03-08 21:28:37 687

Original: Binary search

#include <iostream>
#include <vector>
using namespace std;
int main() {
    vector<int> nums={1,2,3,4,5,6,7};
    int find_num = 9;
    auto beg = nums.begin(), end = nums.end();
    auto mid = beg + (e…

2016-02-15 17:30:55 292
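
The same search over a half-open range [lo, hi), sketched in Python with the same data and target as the C++ excerpt. Returning -1 for a missing value is my own convention.

```python
def binary_search(nums, target):
    """Return the index of target in the sorted list nums, or -1 if absent."""
    lo, hi = 0, len(nums)            # search the half-open range [lo, hi)
    while lo < hi:
        mid = lo + (hi - lo) // 2    # midpoint of the current range
        if nums[mid] == target:
            return mid
        if nums[mid] < target:
            lo = mid + 1             # target can only be to the right
        else:
            hi = mid                 # target can only be to the left
    return -1

nums = [1, 2, 3, 4, 5, 6, 7]
print(binary_search(nums, 9))       # -1: 9 is not in the list
```

The standard library's `bisect.bisect_left` performs the same halving if you only need the insertion point.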

Original: Notes on Learning Python

Regex example: match a string that starts with Hello and ends with world, saving whatever lies between in a group.
import re
match = re.match('Hello[ \t]*(.*)world', 'Hello python world')
match.group(1)
'python '
match = re.match('/(.*)/(.*)/(.*)', '/usr/home/lumberjack')
mat…

2016-01-07 12:08:13 834
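
A runnable version of both examples. The greedy `(.*)` also captures the space before `world`. The second Hello pattern is my own addition: it uses a lazy group followed by `[ \t]*` to trim that trailing space.

```python
import re

m = re.match(r"Hello[ \t]*(.*)world", "Hello python world")
greedy = m.group(1)            # the trailing space is captured too

m = re.match(r"Hello[ \t]*(.*?)[ \t]*world", "Hello python world")
trimmed = m.group(1)           # lazy group, whitespace before 'world' left outside

m = re.match(r"/(.*)/(.*)/(.*)", "/usr/home/lumberjack")
parts = m.groups()             # one string per group

print(repr(greedy), repr(trimmed), parts)
```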

Original: numpy notes

Basics: storing different types
# define a new dtype
from numpy import *
t = dtype([('name',str_,40),('numitems',int32),('price',float32)])
In [9]: t
Out[9]: dtype([('name', 'S40'), ('numitems', '<i4'), ('price', '<f4')])
item…

2016-01-05 17:16:28 747
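
The `'S40'` output above comes from Python 2, where `str_` is a byte string. Under Python 3, `np.str_` is Unicode, so the same dtype shows `'<U40'` and uses 4 bytes per character. A small sketch; the record values are made-up sample data.

```python
import numpy as np

# Structured dtype: a 40-character string, a 32-bit int and a 32-bit float per record.
t = np.dtype([("name", np.str_, 40), ("numitems", np.int32), ("price", np.float32)])

items = np.array([("apple", 42, 3.14), ("pear", 13, 2.72)], dtype=t)

print(t)               # name field is '<U40' under Python 3
print(t.itemsize)      # 40 * 4 + 4 + 4 = 168 bytes per record
print(items["numitems"])  # one field across all records
```

Use `np.bytes_` instead of `np.str_` if you want the 1-byte-per-character `'S40'` layout from the original note.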

Original: python argparse template

# Parse arguments
parser = argparse.ArgumentParser(description = "auto change CTR segmentation and winprice ")
parser.add_argument('-r','--target_ctr', action = 'store', type = float,…

2015-12-10 17:34:14 468
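
A self-contained version of the template that you can actually run. The `-r/--target_ctr` option matches the excerpt. The default value, the help texts and the extra `--input` and `--verbose` options are hypothetical, added only to show other common patterns.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(
        description="auto change CTR segmentation and winprice")
    # mirrors the option in the template; the default is a hypothetical value
    parser.add_argument("-r", "--target_ctr", action="store", type=float,
                        default=0.01, help="target click-through rate")
    # hypothetical extra options
    parser.add_argument("-i", "--input", default="input.txt", help="input file path")
    parser.add_argument("-v", "--verbose", action="store_true", help="print debug output")
    return parser
```

In a script, call `args = build_parser().parse_args()` under `if __name__ == "__main__":`. Passing an explicit list, as in `parse_args(["-r", "0.05"])`, is handy for testing.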

Original: python configParser template

if __name__ == "__main__":
    """ Part 0 - Read Parameters"""
    if len(sys.argv) != 4:
        print("Usage : %s feature_config input_file output_file" % __file__)
        sys.exit(-1)
    config = Co…

2015-12-10 11:14:10 390
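
A sketch of reading the feature_config file with Python 3's `configparser` module (it was called `ConfigParser` in Python 2). The section names, option names and sample contents are hypothetical. The template's real config layout is not shown in the excerpt.

```python
import configparser

SAMPLE = """
[feature]
columns = age, gender, city
min_count = 5

[output]
delimiter = ,
"""

def read_feature_config(text):
    """Parse an INI-style feature config into plain Python values."""
    config = configparser.ConfigParser()
    config.read_string(text)          # use config.read(path) for a file on disk
    columns = [c.strip() for c in config.get("feature", "columns").split(",")]
    min_count = config.getint("feature", "min_count")
    delimiter = config.get("output", "delimiter")
    return columns, min_count, delimiter
```

In the template this would be `config.read(sys.argv[1])`, since the first argument is feature_config.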

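Reposted: Python multithreading/multiprocessing

Multithreading example
#coding=utf-8
'''
Multithreading = using one CPU.
Process: its pid is the unique identifier; use kill to kill the process.
Main thread: creating a process also creates a thread, called the main thread; a process has exactly one main thread.
Multithreading in Python is not true multithreading: because of the global interpreter lock (GIL), only one thread executes Python bytecode at any given moment.
a b c
'''
import threading
import time
def test(p):…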

2015-12-05 12:22:36 662
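
Threads still help with I/O-bound work despite the GIL, because blocking calls such as `time.sleep` release it. In this sketch the 0.2 s "I/O" is simulated with sleep and the helper names are hypothetical: three threads finish in roughly 0.2 s instead of 0.6 s. For CPU-bound work, `multiprocessing` is the usual way around the GIL.

```python
import threading
import time

def io_task(name, results, lock):
    """Simulated I/O: sleep releases the GIL, so the threads overlap."""
    time.sleep(0.2)
    with lock:                     # protect the shared list
        results.append(name)

def run_threads(names):
    results, lock = [], threading.Lock()
    threads = [threading.Thread(target=io_task, args=(n, results, lock)) for n in names]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()                   # main thread waits for every worker
    return sorted(results), time.perf_counter() - start
```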

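Reposted: ipythonnotebook + spark

Reference: http://blog.jobbole.com/86232/
Testing Spark's Python support: run run-tests under the Spark home directory.
Using IPython Notebook with Spark: while searching for useful Spark tips, I found several articles about configuring IPython notebook for PySpark. IPython notebook is an essential tool for data scientists to present scientific and theoretical work interactively; it integrates text and Pyt…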

2015-11-20 09:51:21 5137

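Original: Installing HADOOP + R + RHIPE

Hadoop is a very popular big-data processing platform, and R is a powerful tool for statistical analysis and data mining, but R falls short at big-data processing; the available solutions include parallel computing, RHadoop and RHIPE. Here I try installing RHIPE.
Installation environment (component: version): CentOS (64-bit): 6.5; Java JDK: 1.6.0_45; R: 3.1.2; Rhipe: 0.73; Google protocol buffer…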

2015-11-19 11:59:52 3118 2

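Original: Learning shell scripting: sed

sed -i: replace and save the result back to the file.
's/pattern/replace_string/' replaces only the first match on each line
's/pattern/replace_string/g' replaces all matches
's/pattern/replace_string/Ng' replaces from the Nth match onward
# example: replace text in place
[clz@localhost shell_learn]$ sed -i 's/cecho.sh/xxxxxxxxxxxxxx/' file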

2015-11-17 14:28:00 462
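
To make the three flag forms concrete, here is a small Python emulation of sed's `s` command applied to a single line. It is a hypothetical helper, not part of sed. It uses Python regex and replacement syntax (for example `\g<0>` rather than sed's `&`) and handles only the three forms listed above.

```python
import re

def sed_substitute(line, pattern, replacement, flag=""):
    """Emulate s/pattern/replacement/FLAG on one line.
    ""   -> replace only the first match
    "g"  -> replace every match
    "Ng" -> keep the first N-1 matches, replace the N-th and all later ones
    """
    if flag == "":
        return re.sub(pattern, replacement, line, count=1)
    if flag == "g":
        return re.sub(pattern, replacement, line)
    n = int(flag[:-1])                 # "3g" -> 3
    seen = 0
    def repl(match):
        nonlocal seen
        seen += 1
        return match.group(0) if seen < n else match.expand(replacement)
    return re.sub(pattern, repl, line)
```

For example, `sed_substitute("a a a a", "a", "b", "3g")` gives `"a a b b"`, matching GNU sed's `echo "a a a a" | sed 's/a/b/3g'`.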

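Reposted: Learning shell scripting: regular expressions

Regular expressions (grep options):
-c count the matching lines
-o print only the matched strings
-E extended regular expressions (same as egrep)
-v print the lines that do not match
-l when searching several files, print the names of the files that contain a match
-L the opposite of -l
-n show the line number of each match
-R, -r search recursively
-i ignore case in the pattern
-q quiet: grep returns 0 if the match succeeds
To match all the words in a given text, you can use the following regular expression:
# word matching: ? marks an optional space, [a-zA-Z] word (…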

2015-11-17 14:27:23 505

Reposted: Learning shell scripting: awk

Format: awk ' BEGIN{ print "start" } pattern { commands } END{ print "end" }' file
Special variables:
[clz@localhost ~]$ echo -e "line1 f2 f3\nline2 f4 f5\nline3 f6 f7" | awk '{print "Line no:"NR,"No of fields:"NF , "$0="$0…

2015-11-17 13:38:15 464

Reposted: Learning shell scripting 2

Automating interactive input
[cloudera@quickstart shell]$ head input.data
1
hello
# shell
[cloudera@quickstart shell]$ cat vim interactive.sh
cat: vim: No such file or directory
#!/bin/bash
#Filename": interactive.sh
read -…

2015-11-16 14:30:12 756

Original: PLA code

Perceptron, following the Machine Learning Foundations course
#!/usr/bin/env python2.7
# encoding=utf-8
import numpy as np
import random,os
def verify(weight,array_x,array_y):
    '''
    verify prediction
    :param weight: iterated weight
    :param a…

2015-11-07 17:54:27 472
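
A dependency-free sketch of the naive cyclic PLA from the course. For every misclassified point, update w ← w + yₙxₙ (with x₀ = 1 for the bias), and stop after a full pass with no mistakes. Labels are assumed to be ±1, and sign(0) is treated as -1. The `max_iter` cap and the helper names are my own. `count_errors` is not the original `verify` function.

```python
def sign(v):
    return 1 if v > 0 else -1

def pla(xs, ys, max_iter=1000):
    """Naive cyclic PLA. xs: feature tuples, ys: labels in {+1, -1}.
    Returns weights with w[0] as the bias."""
    w = [0.0] * (len(xs[0]) + 1)
    for _ in range(max_iter):
        mistake = False
        for x, y in zip(xs, ys):
            z = [1.0] + list(x)                             # prepend x0 = 1
            if sign(sum(wi * zi for wi, zi in zip(w, z))) != y:
                w = [wi + y * zi for wi, zi in zip(w, z)]   # w <- w + y_n * x_n
                mistake = True
        if not mistake:
            break                  # a clean pass: the data are separated
    return w

def count_errors(w, xs, ys):
    """Number of points misclassified by w."""
    return sum(sign(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) != y
               for x, y in zip(xs, ys))
```

PLA halts only on linearly separable data. For noisy data, the `max_iter` cap (or the pocket variant) is needed.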

Reposted: pyenv: running multiple Python versions side by side

http://seisman.info/python-pyenv.html

2015-10-27 17:26:44 442

xgboost boostedTree (Tianqi Chen)

R in a Nutshell (English edition)

mapreduce cookbook, 2nd edition, complete, with bookmarks, not scanned
