phoebus_si - CSDN Blog

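Original: Results still differ on every run after setting all seeds: a PyTorch reproducibility problem

I searched everywhere I could and read many proposed fixes, but none of them solved the situation I ran into. It took me a long time to find the cause, so I am recording it here in case someone else hits the same problem by accident and cannot reproduce their results. First, the scenario and the problem: my code sets every seed, yet I still could not reproduce my earlier results. def setup_seed(seed): torch.manual_seed(seed) os.environ['PYTHONH...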

2020-05-08 00:35:21 11002 1
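A minimal sketch of a seed-setting helper in the spirit of the `setup_seed` function the post above begins to show. The post's code is cut off, so everything here is an assumption rather than the author's version: the `PYTHONHASHSEED` variable, the NumPy seeding, and the cuDNN flags are commonly used settings, and NumPy and PyTorch are seeded only if they happen to be installed. It does not reproduce the specific cause the post goes on to describe.

```python
import os
import random


def setup_seed(seed):
    """Seed the common sources of randomness (assumed helper, not the post's code)."""
    # Note: setting PYTHONHASHSEED here only affects child processes;
    # the current interpreter's hash seed is fixed at startup.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
        # Ask cuDNN for deterministic kernels and disable autotuning.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass  # PyTorch not installed; nothing to seed
```

Calling `setup_seed(42)` once at the top of a script, before any model or data loader is built, is the usual pattern. Even with all of this in place some GPU operations can remain nondeterministic, which is consistent with the post's point that seeds alone may not be enough.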

Original: gensim.models.keyedvectors: two common errors when loading word vectors with gensim (errors from KeyedVectors.load_word2vec_format)

Error 1: EOFError: unexpected end of input; is count incorrect or file otherwise damaged? Error 2: ValueError: invalid vector on line 0 (is this really the text format?) When loading pretrained word vectors with gensim, you can see that the top of the word-vector file is w...

2020-05-05 02:59:52 9318 4
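The post above is truncated before it explains its fix, so the following is only a sketch of one common cause of these errors: the word2vec text format begins with a header line giving the vocabulary size and the vector dimension, and a missing or inconsistent header can make loading fail. The helper name `add_word2vec_header` is hypothetical, and the header check is a simple heuristic that assumes words contain no spaces.

```python
def add_word2vec_header(lines):
    """Return the vector lines preceded by a correct '<count> <dim>' header.

    An existing header (two integers on the first line) is dropped and
    recomputed from the body, so a wrong count is repaired as well.
    """
    rows = [ln.rstrip("\n") for ln in lines if ln.strip()]
    if not rows:
        return []
    first = rows[0].split()
    has_header = len(first) == 2 and all(tok.isdigit() for tok in first)
    body = rows[1:] if has_header else rows
    # Each body line is: word v1 v2 ... v_dim
    dim = len(body[0].split()) - 1 if body else 0
    return [f"{len(body)} {dim}"] + body
```

For example, `add_word2vec_header(["cat 0.1 0.2 0.3\n", "dog 0.4 0.5 0.6\n"])` yields a first line of `2 3`. Writing the result back to a file and loading that file is the intended use.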

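Original: Linux: show the disk usage of subfolders in M/G units and check the remaining disk space

du -h <folder> shows the disk space used by the folders one level down (including the size of each of their subfolders). df -h <folder> shows the usage of the disk that holds the given folder. Output: df -h shows the usage of all disks. Output: ...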

2020-04-07 01:04:59 638

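Original: Speed up pip install with the Tsinghua mirror (pip install of gensim is too slow)

Command: pip install -i https://pypi.tuna.tsinghua.edu.cn/simple gensim. Many other packages can be installed through the Tsinghua mirror in the same way.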

2020-04-04 16:16:01 1542

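Translation: Zero-Shot Learning - The Good, the Bad and the Ugly, full translation (best read alongside the original)

Zero-Shot Learning - The Good, the Bad and the Ugly. Original paper: https://arxiv.org/abs/1703.04394. Best read alongside the original; please point out any problems with the translation. Zero-Shot Learning: The Good, the Bad and the Ugly. Abstract: Due to the importance of zero-shot learning, the number of recently proposed methods has substantially...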

2020-03-26 21:25:40 2324 1

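Translation: An Empirical Study and Analysis of Generalized Zero-Shot Learning for Object Recognition in the Wild

Full translation of An Empirical Study and Analysis of Generalized Zero-Shot Learning for Object Recognition in the Wild; core terms are kept in English. Please point out anything that is translated poorly :) Abstract: Zero-shot learning (ZSL) methods have, under the unrealistic assumption that test data come only from unseen classes...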

2020-03-20 00:23:08 1004

Original: ValueError: Tried to convert 'g' to a tensor and failed. Error: None values not supported (error when running multi-GPU code)

def average_gradients(tower_grads): average_grads=[] for grad_and_vars in zip(*tower_grads): grads=[] for g, _ in grad_and_vars: expend_g=tf.expand_dims(g,0) ...

2020-02-03 22:57:34 7521 4

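Original: Finding the eigenvalues and eigenvectors of a random variable X from its covariance matrix (PCA projection matrix)

I had forgotten everything about finding the eigenvalues and eigenvectors of X from its covariance matrix and about the PCA projection matrix, and when I looked it up I did not have the patience for other people's long explanations, so I only noted briefly how to compute the eigenvalues and eigenvectors from the covariance matrix. Definitions: 1. The eigenvalues λ are the values for which det(λE - A) = 0, where E is the identity matrix. 2. Here A is simply the covariance matrix; "the eigenvalues obtained from the covariance matrix of X are the same as those obtained from X". 3. There is no need to find X itself at any point; after obtaining...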

2020-01-04 22:48:47 7161 1
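As a worked illustration of the post above, here is a small sketch for the 2x2 case, where the characteristic equation det(λE - A) = 0 has a closed-form solution. The function name and the example covariance matrix are invented for illustration; for larger matrices a numerical routine would be used instead.

```python
import math


def eig_sym_2x2(a, b, d):
    """Eigenpairs of the symmetric matrix [[a, b], [b, d]], largest eigenvalue first.

    Solves det(lambda*E - A) = 0, which for a 2x2 symmetric matrix gives
    lambda = (a + d)/2 +/- sqrt(((a - d)/2)^2 + b^2).
    """
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    lams = (mean + radius, mean - radius)
    if b == 0:
        # Already diagonal: the eigenvectors are the coordinate axes.
        vecs = ((1.0, 0.0), (0.0, 1.0)) if a >= d else ((0.0, 1.0), (1.0, 0.0))
        return list(zip(lams, vecs))
    pairs = []
    for lam in lams:
        # (A - lam*E) v = 0 is satisfied by v = (b, lam - a).
        vx, vy = b, lam - a
        norm = math.hypot(vx, vy)
        pairs.append((lam, (vx / norm, vy / norm)))
    return pairs


# Example covariance matrix [[2, 1], [1, 2]] (illustrative values).
pairs = eig_sym_2x2(2.0, 1.0, 2.0)
```

For this matrix the eigenvalues are 3 and 1. In PCA, the unit eigenvector paired with the largest eigenvalue is the first projection direction, so a 2-to-1 projection matrix here would consist of that single vector.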

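Original: How to prove the optimality of tree-search A* when the heuristic h(n) is admissible

Idea: it suffices to show that an ancestor node n of the optimal solution A is expanded before the suboptimal solution B. For the concepts behind tree-search A*, see https://blog.csdn.net/zhulichen/article/details/78786493. The detailed proof is shown in the figure...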

2020-01-01 21:10:43 3288 1
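The figure from the post above is not available here, so the following is a sketch of the standard textbook argument rather than the author's own derivation. It assumes the usual notation: g(n) is the path cost to n, h(n) the heuristic, h*(n) the true cost from n to the nearest goal, f(n) = g(n) + h(n), and C* the cost of the optimal solution A. It also assumes h is nonnegative, so that admissibility gives h = 0 at goal nodes.

```latex
\begin{align*}
f(B) &= g(B) + h(B) = g(B) > C^*
  && \text{($B$ is a suboptimal goal, so $h(B) = 0$)} \\
f(n) &= g(n) + h(n) \le g(n) + h^*(n) = C^*
  && \text{($n$ lies on the optimal path; $h(n) \le h^*(n)$)} \\
\Rightarrow\quad f(n) &\le C^* < f(B)
\end{align*}
```

Because A* always expands the frontier node with the smallest f, and some ancestor n of A is on the frontier whenever B is, n is expanded before B. Repeating the argument along the optimal path shows that A is returned before B can be selected.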

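Original: Linux: forgot to start a job in the background and cannot stop it; move a foreground job to the background and keep it running (nohup, setsid, &, disown)

Link to this post: https://blog.csdn.net/weixin_40015791/article/details/103621927 Today I started a job on the server but forgot to put it in the background. It had been running for seven or eight hours and I did not want to close the terminal, because closing it (or losing the network connection) would kill the job; I had not used nohup or & when launching it. What to do in this case: better late than never. Make the process that was started without nohup or setsid ignore the HUP signal, and move it to the background. Just...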

2019-12-19 20:33:28 1095

Original: tf.cond error: Initializer for variable is from inside control-flow construct such as a loop or condition

Full error message: ValueError: Initializer for variable lambda_5/cond/mrcnn_mask_conv1/kernel/ is from inside a control-flow construct, such as a loop or conditional. When creating a variable inside a loop or con...

2019-11-16 23:45:55 1184 1

Original: TypeError: Cannot interpret feed_dict key as Tensor: Can not convert a int into a Tensor

Error message: TypeError: Cannot interpret feed_dict key as Tensor: Can not convert a int into a Tensor. Code that raised it: [s, t, c] = sess.run([ lstm.s, lstm.t, lstm.c], feed_dict={lstm.i...

2019-07-29 23:35:44 4209 1

Original: InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder_11' with dtype float

Error message: InvalidArgumentError: You must feed a value for placeholder tensor 'Placeholder_11' with dtype double and shape [3,300]. Code that raised it: [s, t, c] = sess.run([ lstm.s, lstm.t, lstm.c],...

2019-07-29 23:31:08 2000

Original: Could not install packages due to an EnvironmentError: how to fix environment errors during pip install

Error details: Found existing installation: setuptools 18.5 Uninstalling setuptools-18.5: Could not install packages due to an EnvironmentError: [('/System/Library/Frameworks/Python.framework/Versions/2....

2019-06-19 19:05:53 2956

Original: Cannot uninstall 'numpy'. It is a distutils installed project and thus we cannot accurately determin

Running pip2 install tensorflow to install TensorFlow for Python 2.7 fails with the following error: Installing collected packages: wrapt, tensorflow-estimator, numpy, six, keras-preprocessing, absl-py, astor, funcsigs, mock, backports...

2019-06-19 18:52:39 5198

Original: error: could not create 'xxxxxx': Permission denied

The fix first: grant permissions on the target folder with sudo chmod -R 777 xxxxxx, then enter your password. I ran into this problem while reinstalling Python 2.7; the error output from pip install python2 was: creating build/temp.macosx-10.14-intel-2.7 creating build/temp.macosx-10.1...

2019-06-19 18:29:12 8412

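Original: How to fine-tune the BERT language model for Chinese sentiment classification, and how to use the fine-tuned model

To fine-tune BERT for a Chinese sentiment classification task you need the open-source BERT code, then chinese_L-12_H-768_A-12 downloaded from BERT's open-source releases, and finally Chinese sentiment data in the format (class id\tsentence). If you do not have the BERT code or the Chinese sentiment data, you can download them from my shared resources. Once you have all three, follow the steps below to fine-tune the model and then use it. In run_classifier.py, find proces...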

2019-05-21 16:58:47 7290 15

Original: Fixing zsh: command not found: activate/conda and similar errors

These commands worked before zsh was installed and stopped working afterwards. A single command fixes it: source ~/.bash_profile

2019-05-21 16:24:47 4207

Original: sort: Illegal byte sequence error: an encoding problem (Bash shell, Mac)

# Command: cat dvd_positive_1000.txt | sort -R > shuffle_dvd_positive.txt # Error: sort: Illegal byte sequence. Fix 1: enter LANG=C in the terminal, then run the command. If that does not work, try fix 2: enter LC_ALL=C in the terminal, then run the command...

2019-05-18 22:45:33 4343

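Original: Chinese text garbled in the Mac terminal; Chinese garbled when opening CSV files in Vim

The fix first: open the terminal and enter the following commands in order:
echo "export LANGUAGE=en_US.UTF-8
export LANG=en_US.UTF-8
export LC_ALL=en_US.UTF-8">>~/.bash_profile
then source ~/.bash_profile. This also resolves Perl's warnings: perl: w...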

2019-05-18 16:57:17 1005

Original: UnicodeDecodeError when reading a file: 'utf-8' codec can't decode byte 0xbd in position 1326: invalid start byte

file = open(open_file, 'r', encoding='utf-8', errors='ignore') The fix is shown above.

2019-05-18 12:34:35 3604 1
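To make the effect of `errors='ignore'` in the post above concrete, here is a small self-contained sketch. The helper name `read_text` and the sample bytes are made up for illustration; the byte 0xbd is used because it is not a valid start byte in UTF-8, matching the error in the title.

```python
import os
import tempfile


def read_text(path, errors="strict"):
    """Read a file as UTF-8 using the given error handler."""
    with open(path, "r", encoding="utf-8", errors=errors) as f:
        return f.read()


# Build a small file containing 0xbd, which cannot start a UTF-8 sequence.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"ab\xbdcd")

try:
    read_text(sample_path)  # default handler raises on the bad byte
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

ignored = read_text(sample_path, errors="ignore")    # bad byte is dropped
replaced = read_text(sample_path, errors="replace")  # bad byte becomes U+FFFD
os.remove(sample_path)
```

Note that `errors='ignore'` silently discards data. If the file was actually written in a different encoding, opening it with that encoding is likely the better fix, and `errors='replace'` at least leaves a visible marker where bytes were lost.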

Original: Keras Bi-LSTM error AttributeError: 'Tensor' object has no attribute 'get_config'; how to implement a bidirectional LSTM in Keras

A unidirectional LSTM can be built with the following two lines: x = Embedding(max_features, embedding_dims, input_length=maxlen)(input_layer) lstm_layer=LSTM(128)(x). To replace the LSTM with a Bi-LSTM, you can simply use the following import: from keras.layers import Bidir...

2019-05-16 14:04:20 2972 2

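Original: Showing recall in Keras (classification metrics can't handle a mix of multi-label-indicator targets) with model.predict

My program originally used model.evaluate to get the loss and accuracy: score, acc = model.evaluate(X_test, y_test, batch_size=batch_size). Later I wanted to add recall, but after a long search I could not find a way to make model.evaluate return recall. So I switched to another function: y_pred=model.predict(X_test, batch_size=...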

2019-05-15 19:54:42 47143 23
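The post above is cut off before its solution, so this is only a sketch of one common way around the error in the title, not necessarily the author's. It assumes the error appears because one-hot targets and per-class probabilities from `model.predict` are passed to a metric function directly; converting both to class indices first avoids the mix. The helper names and sample values are invented, and plain lists stand in for arrays.

```python
def argmax_rows(rows):
    """Index of the largest value in each row (one-hot or probabilities)."""
    return [max(range(len(r)), key=r.__getitem__) for r in rows]


def recall_per_class(y_true, y_pred, num_classes):
    """Recall for each class: true positives / (true positives + false negatives)."""
    recalls = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return recalls


# Stand-ins for y_test (one-hot) and the output of model.predict (probabilities).
y_test_onehot = [[1, 0], [0, 1], [0, 1], [1, 0]]
y_prob = [[0.9, 0.1], [0.4, 0.6], [0.7, 0.3], [0.8, 0.2]]

recalls = recall_per_class(argmax_rows(y_test_onehot), argmax_rows(y_prob), 2)
```

With these sample values class 0 has recall 1.0 and class 1 has recall 0.5. The same index conversion can be applied before calling a library metric function instead of the hand-written one.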

Original: tf.reshape() and tf.tail(): Contents: [Dimension(None), 1]. Consider casting elements to a supported types

How to fix the errors "Cannot convert unknown dimension (None)" and "Contents: [Dimension(None), 1]. Consider casting elements to a supported types", which occur in TensorFlow/Keras because batch_size varies and is initialized to None when the graph is built. While building a graph in Keras I wanted to use tf.t...

2019-05-04 13:07:41 3627

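Original: Keras: how to add a trainable variable as a weight and use it to weight an existing layer

This small problem stumped me for a whole day, so after fixing it I decided to write it down. The post follows the order in which I noticed, located, and then solved the problem; if anyone knows a better solution, please point it out. If you want to get started right away, skip to the code at the end. My project needs a trainable vector of shape (1, hidden_num) that multiplies a (batch_size, hidden_num) matrix element-wise along each dimension. In code, the goal is the following operation: semantic...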

2019-05-04 12:50:40 14219 9

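Original: A VAE handwriting recognition project (with detailed comments): understanding the Variational Autoencoder (VAE) through a small project

Project and code source: https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/variational_autoencoder.py Before reading the code, it helps to get a basic idea of VAEs; I recommend this Zhihu article: https://zhuanlan.zhihu.com/p/55557709 There is also...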

2019-05-01 03:23:21 2329 2

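Original: A ready-to-run small GAN project (with detailed comments): understanding how GANs work through code

First read the notes below on where this small GAN project comes from and what it does, then read the code; you can copy it and run it directly. The Chinese comments are ones I added while working through it, and should be friendly to readers who do not know GANs well but want to get hands-on; if any comment is off, please point it out. This small project helps readers unfamiliar with GANs get started quickly and understand how they work. #-*-coding= utf-8 -*- import tensorflow as tf import numpy as np #i...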

2019-05-01 02:39:42 2896 1

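Original: Python: two approaches, recursion and dynamic programming (DP), to the longest palindromic substring problem

Problem: given a string s, find the longest palindromic substring in s. You may assume the maximum length of s is 1000. Example 1: Input: "babad" Output: "bab" Note: "aba" is also a valid answer. Example 2: Input: "cbbd" Output: "bb". First solution: recursion, with high time complexity. class Solution(object): def longestPalindrome...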

2019-04-25 03:24:05 913 1
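The post's own code is truncated, so the following is an independent sketch of the dynamic programming approach it mentions, not the author's implementation. The table entry `dp[i][j]` records whether `s[i..j]` is a palindrome, and substrings are processed in order of increasing length so that the inner substring has always been decided first. It runs in O(n^2) time and space.

```python
def longest_palindrome(s):
    """Return the first longest palindromic substring of s, using an O(n^2) DP table."""
    n = len(s)
    if n < 2:
        return s
    # dp[i][j] is True when s[i..j] (inclusive) reads the same both ways.
    dp = [[False] * n for _ in range(n)]
    for i in range(n):
        dp[i][i] = True
    start, best_len = 0, 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            # The ends must match and the inside must already be a palindrome.
            if s[i] == s[j] and (length == 2 or dp[i + 1][j - 1]):
                dp[i][j] = True
                if length > best_len:
                    start, best_len = i, length
    return s[start:start + best_len]
```

On the two examples from the problem statement this returns "bab" for "babad" and "bb" for "cbbd". When several palindromes share the maximum length, the leftmost one is returned.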

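Original: How to feed sentence vectors from pretrained BERT (obtainable in two lines of code) into a Keras classification model as part of the input

How to feed sentence vectors from pretrained BERT (obtainable in two lines of code) into a Keras classification model as part of the input, in three steps. Step 1: download a pretrained BERT model and install bert-as-service. 1. First download BERT: git clone https://github.com/google-research/bert.git 2. Then download a pretrained BERT model. I am working on a Chinese classification task, so...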

2019-04-25 02:43:35 3542 2

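Original: InvalidArgumentError: seq_lens() > input.dims()[[Node: hidden/bidirectional_rnn/bw/ReverseSequence

Error message: A Google search turned up no matching solution, so I read the bidirectional_rnn source code and found that the problem comes from the tf.reverse_sequence function. https://www.tensorflow.org/api_docs/python/tf/reverse_sequence The main cause of the error is that the following condition was not satisfied when reversing seq: The elements of seq...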

2019-03-18 16:45:15 1634 2

Original: Fix for tf.contrib.rnn.LSTMCell(self.u) error: module "tf.contrib.rnn" has no attribute 'LSTMCell'

Problem: cell_fw = tf.contrib.rnn.LSTMCell(self.u) raises module "tf.contrib.rnn" has no attribute 'LSTMCell'. Fix: update TensorFlow to the latest version. How to update: first activate the tensorflow environment with activate tensorflow, then for CPU use the command: pip install...

2019-02-23 22:59:24 1572 2

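Original: tf.nn.embedding_lookup(...,...,,max_norm=1) error: module has no attribute 'max_norm'

Problem: tf.nn.embedding_lookup(...,...,,max_norm=1) raises module "tf.nn.embedding_lookup" has no attribute 'max_norm'. Fix: update TensorFlow to the latest version. How I got there: when I saw this problem I first looked up the API of tf.nn.embedding_lookup() and found that the official documentation does indeed have...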

2019-02-23 22:57:36 375

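Original: git push error: error: src refspec xxxxx does not match any. Cause and fix

First, how I worked through it: I wanted to push to bugfix/V1120, a branch of master. When I ran git push origin bugfix/V1120, it failed with error: src refspec xxxxx does not match any. My first thought was that I had mistyped the branch name, but after checking repeatedly the error was still there...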

2018-11-20 20:12:33 5521 1

Original: How to fix Trailing whitespace (trailing-whitespace) in pylint?

There are extra spaces or tabs at the end of the line; delete them.

2018-11-07 19:59:21 20597
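When many lines are affected, deleting the trailing spaces and tabs by hand gets tedious. A minimal sketch of doing it in bulk follows; the function name is made up for illustration, and most editors offer an equivalent built-in option.

```python
def strip_trailing_whitespace(text):
    """Remove spaces and tabs at the end of every line, keeping line breaks."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))
```

Leading indentation is left untouched, so the cleaned text can be written straight back to the source file.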

Original: How to use pylint to automatically check Python files against Google style?

1. pip install pylint 2. Copy the configuration file from https://github.com/vinitkumar/googlecl/blob/master/googlecl-pylint.rc to any path PATH; you can also copy the following directly: [MASTER] # Specify a configuration file. #rcfile= # Python code to execute,...

2018-11-07 17:03:29 722

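Original: Perl project: randomly generate idioms and display them crossed in a grid (with selectable difficulty, shown on a web page with the correct answers)

I. Design and algorithm description. 1. Basic flow. A. There are three difficulty levels; each differs in the size of the idiom pool and in how idioms are placed. Level 1: small idiom pool, low randomness, all idioms in forward order (left to right and top to bottom). Level 2: medium idiom pool, medium randomness, reverse order preferred, falling back to forward order when no reverse placement fits. Level 3: largest idiom pool, high randomness, reverse order preferred (as above), and some idioms overlap. B. Generate random Chinese characters: the number of idioms to read is determined by the difficulty level. The idioms...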

2018-10-17 00:12:58 1009

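Original: Perl: a human-versus-computer idiom chain game with a scoring system

Screenshot of a run: Format of the idiom file that is read: The game works as follows: the computer gives an idiom, the player continues the chain with one, the computer continues again, and so on. The detailed scoring rules are: you have three chances in total; each wrong answer costs one chance, but the system then offers up to ten hint idioms to choose from so the game can continue; each idiom's score is determined by its difficulty coefficient, that is, how commonly it is used and how many idioms can follow it; answering correctly within 15 seconds earns the corresponding score; beyond 15 seconds the score is weighted by a power function, so the longer you take, the lower the score; this chain game...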

2018-03-04 23:01:43 525

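CAS Advanced Artificial Intelligence: summary of symbolism topics.pdf

A summary, with proofs, of all the key exam topics in Prof. Luo Ping's symbolism part of the Advanced Artificial Intelligence course at the University of Chinese Academy of Sciences. It follows a complete line of reasoning, covers every exam topic, and expands on the proofs, such as the completeness of the resolution principle.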

2020-01-04

ag_news_csv.tgz

496,835 news articles from more than 2,000 news sources in the 4 largest categories of the AG news corpus; the dataset uses only the title and description fields. Each class has 30,000 training samples and 1,900 test samples.

README: AG's News Topic Classification Dataset. Version 3, Updated 09/09/2015.

ORIGIN

AG is a collection of more than 1 million news articles. News articles have been gathered from more than 2000 news sources by ComeToMyHead in more than 1 year of activity. ComeToMyHead is an academic news search engine which has been running since July, 2004. The dataset is provided by the academic community for research purposes in data mining (clustering, classification, etc), information retrieval (ranking, search, etc), xml, data compression, data streaming, and any other non-commercial activity. For more information, please refer to the link http://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html .

The AG's news topic classification dataset is constructed by Xiang Zhang from the dataset above. It is used as a text classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).

DESCRIPTION

The AG's news topic classification dataset is constructed by choosing 4 largest classes from the original corpus. Each class contains 30,000 training samples and 1,900 testing samples. The total number of training samples is 120,000 and testing 7,600.

The file classes.txt contains a list of classes corresponding to each label.

The files train.csv and test.csv contain all the training samples as comma-separated values. There are 3 columns in them, corresponding to class index (1 to 4), title and description. The title and description are escaped using double quotes ("), and any internal double quote is escaped by 2 double quotes (""). New lines are escaped by a backslash followed with an "n" character, that is "\n".

2019-05-28
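The file format described in the README above maps directly onto Python's standard `csv` module, which already understands doubled double quotes. Here is a minimal sketch of a reader; the function name and the sample row are invented for illustration and are not taken from the dataset.

```python
import csv
import io


def parse_ag_news(fileobj):
    """Parse rows of (class index, title, description) from an AG News CSV stream."""
    rows = []
    for class_index, title, description in csv.reader(fileobj):
        rows.append((
            int(class_index),
            # The README says new lines are stored as a backslash followed by "n".
            title.replace("\\n", "\n"),
            description.replace("\\n", "\n"),
        ))
    return rows


# A made-up row in the documented format (not real dataset content).
sample = '"3","Title with ""quotes""","First line\\nSecond line"\n'
rows = parse_ag_news(io.StringIO(sample))
```

For real use, the stream would be an opened train.csv or test.csv, for example `open("train.csv", newline="", encoding="utf-8")`, where the encoding is an assumption.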

ag_news dataset

2019-05-28

fine_tuning_data.zip: Chinese emotion data ready for fine-tuning with BERT

For detailed usage, see my blog post: https://blog.csdn.net/weixin_40015791/article/details/90410083. A brief walkthrough follows. In run_classifier.py from the open-source BERT code, find:

    processors = {
        "cola": ColaProcessor,
        "mnli": MnliProcessor,
        "mrpc": MrpcProcessor,
        "xnli": XnliProcessor,
        "intentdetection": IntentDetectionProcessor,
        "emotion": EmotionProcessor,  # add this line
    }

Then add a class to the same file:

    class EmotionProcessor(DataProcessor):
        """Processor for the emotion data set."""

        def get_train_examples(self, data_dir):
            """See base class."""
            # The file name must match the training set file in the data folder.
            return self._create_examples(
                self._read_tsv(os.path.join(data_dir, "fine_tuning_train_data.tsv")), "train")

        def get_dev_examples(self, data_dir):
            """See base class."""
            return self._create_examples(
                self._read_tsv(os.path.join(data_dir, "fine_tuning_val_data.tsv")), "dev")

        def get_test_examples(self, data_dir):
            """See base class."""
            return self._create_examples(
                self._read_tsv(os.path.join(data_dir, "fine_tuning_test_data.tsv")), "test")

        def get_labels(self):
            """See base class."""
            # Seven classes, labelled 0 to 6.
            return ["0", "1", "2", "3", "4", "5", "6"]

        def _create_examples(self, lines, set_type):
            """Creates examples for the training and dev sets."""
            examples = []
            for (i, line) in enumerate(lines):
                if i == 0:
                    continue
                guid = "%s-%s" % (set_type, i)
                if set_type == "test":
                    label = "0"
                    text_a = tokenization.convert_to_unicode(line[0])
                else:
                    label = tokenization.convert_to_unicode(line[0])
                    text_a = tokenization.convert_to_unicode(line[1])
                examples.append(
                    InputExample(guid=guid, text_a=text_a, text_b=None, label=label))
            return examples

Finally, run it directly with the following command:

    # data: the folder, at the same level, into which the data was unpacked
    # chinese_L-12_H-768_A-12: the original BERT model to fine-tune for Chinese data
    # output: the folder where the generated files are written
    python run_classifier.py \
      --task_name=emotion \
      --do_train=true \
      --do_eval=true \
      --data_dir=data \
      --vocab_file=chinese_L-12_H-768_A-12/vocab.txt \
      --bert_config_file=chinese_L-12_H-768_A-12/bert_config.json \
      --init_checkpoint=chinese_L-12_H-768_A-12/bert_model.ckpt \
      --max_seq_length=128 \
      --train_batch_size=32 \
      --learning_rate=2e-5 \
      --num_train_epochs=3.0 \
      --output_dir=output

Training takes about 9 hours. Afterwards the output folder contains three files whose suffixes are index, meta, and 00000-of-00001; rename them to bert_model.ckpt.index, bert_model.ckpt.meta, and bert_model.ckpt.data-00000-of-00001 respectively. Then copy vocab.txt and bert_config.json from chinese_L-12_H-768_A-12 into the same folder, so that it ends up with 5 files. After that, use it just as you would chinese_L-12_H-768_A-12, with the folder name changed to your own:

    bert-serving-start -model_dir output -num_worker=3

This serves the fine-tuned general language model.

2019-05-21

From NLPCC2013, parsed into txt files: imbalanced Chinese sentiment analysis with 7 emotion classes.zip

From NLPCC2013. After parsing, each line is emotion\tsentence. There are seven emotion classes with an imbalanced distribution. After splitting into training, test, and validation sets, the counts are:

    emotion     <emotion>_data.txt   <emotion>_test.txt   <emotion>_val.txt   ratio
    anger       1488                 186                  186                 8:1:1
    disgust     2459                 307                  307                 8:1:1
    fear        201                  50                   50                  4:1:1
    happiness   2298                 287                  287                 8:1:1
    like        3286                 410                  410                 8:1:1
    sadness     1917                 239                  239                 8:1:1
    surprise    626                  78                   78                  8:1:1

2019-05-17

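Multi-Domain Sentiment Dataset: English sentiment analysis data with positive and negative labels, semantic_data.zip

The Multi-Domain Sentiment Dataset parsed into txt files, keeping only the text and its label. Binary classification into positive and negative. It covers four domains (dvd, kitchen, books, electronics), each with 1000 positive and 1000 negative examples. Each line is label\tSentence. For details see https://www.cs.jhu.edu/~mdredze/publications/sentiment_acl07.pdf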

2019-05-17

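CapsuleNetWork: understanding capsule networks (Dynamic Routing Between Capsules) through a TensorFlow reimplementation

Understanding capsule networks (Dynamic Routing Between Capsules) through a TensorFlow reimplementation. Paper: https://arxiv.org/abs/1710.09829 TensorFlow reimplementation: https://github.com/naturomics/CapsNet-Tensorflow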

2018-10-17
