Stanford Sentiment Treebank
Text
License: GNU GPL v3

Overview

This dataset includes:

  1. original_rt_snippets.txt contains 10,605 processed snippets from the original pool of Rotten Tomatoes HTML files. Note that some snippets may contain multiple sentences.
  2. dictionary.txt contains all phrases and their IDs, separated by a vertical bar (|).
  3. sentiment_labels.txt contains all phrase IDs and the corresponding sentiment labels, separated by a vertical bar (|).
  4. SOStr.txt and STree.txt encode the structure of the parse trees; STree.txt encodes the trees in a parent-pointer format. Each line corresponds to one sentence in datasetSentences.txt. If you are not familiar with this format, the Matlab code accompanying the paper shows how to read it.
  5. datasetSentences.txt contains the sentence index followed by the sentence string, separated by a tab. These are the sentences of the train/dev/test sets.
  6. datasetSplit.txt contains the sentence index (corresponding to the index in datasetSentences.txt) followed by the set label, separated by a comma: 1 = train, 2 = test, 3 = dev.
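
Joining dictionary.txt and sentiment_labels.txt on the phrase ID gives each phrase its sentiment value. The minimal Python sketch below assumes (beyond what this page states) that each dictionary line has the form `phrase|id`, that sentiment_labels.txt may begin with a header row (skipped when its first field is not an integer), and that sentiment values are floats in [0, 1]. The hypothetical helper `to_five_classes` uses cut-offs at 0.2, 0.4, 0.6 and 0.8, which are commonly used to recover five classes from these labels but are not given on this page; check them against the dataset's own README. All function names are illustrative.

```python
from typing import Dict, Iterable


def parse_dictionary(lines: Iterable[str]) -> Dict[str, int]:
    """Map each phrase to its phrase ID from lines of the (assumed) form 'phrase|id'."""
    phrase_to_id: Dict[str, int] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        # Split on the last '|' so the ID is always the final field.
        phrase, phrase_id = line.rsplit("|", 1)
        phrase_to_id[phrase] = int(phrase_id)
    return phrase_to_id


def parse_sentiment_labels(lines: Iterable[str]) -> Dict[int, float]:
    """Map each phrase ID to its sentiment value from lines of the form 'id|value'."""
    id_to_value: Dict[int, float] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        phrase_id, value = line.split("|")
        # Assumption: a header row may be present; skip it if the ID is not an integer.
        if not phrase_id.isdigit():
            continue
        id_to_value[int(phrase_id)] = float(value)
    return id_to_value


def to_five_classes(value: float) -> int:
    """Bucket a value in [0, 1] into classes 0..4 (assumed cut-offs 0.2/0.4/0.6/0.8)."""
    for label, upper in enumerate((0.2, 0.4, 0.6, 0.8)):
        if value <= upper:
            return label
    return 4


if __name__ == "__main__":
    # Hypothetical sample lines, for illustration only.
    phrases = parse_dictionary(["good|1\n", "not good|2\n"])
    values = parse_sentiment_labels(["phrase ids|sentiment values\n", "1|0.83\n", "2|0.28\n"])
    for phrase, phrase_id in phrases.items():
        print(phrase, values[phrase_id], to_five_classes(values[phrase_id]))
```

To read the real files, pass open file objects such as `open("dictionary.txt", encoding="utf-8")`; the file encoding is not stated here, so adjust it if decoding fails.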
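
The parent-pointer encoding in STree.txt can be decoded by walking upward from each leaf. This Python sketch rests on assumptions not stated on this page: each STree.txt line is a `|`-separated list in which entry i is the 1-based index of node i's parent, the root's parent is 0, nodes 1 to k are the leaves in sentence order, and the matching SOStr.txt line lists those k tokens separated by `|`. The function names are hypothetical. The result is the token span covered by every node, which can then be looked up as a phrase in dictionary.txt (exact string matching is not guaranteed).

```python
from typing import Dict, List


def parse_parent_pointers(stree_line: str) -> List[int]:
    """Parse 'p1|p2|...' into 1-based parent indices; 0 marks the root (assumed)."""
    return [int(p) for p in stree_line.strip().split("|")]


def node_phrases(stree_line: str, sostr_line: str) -> Dict[int, str]:
    """Return the space-joined token span covered by every node, keyed by 1-based node ID."""
    parents = parse_parent_pointers(stree_line)
    tokens = sostr_line.strip().split("|")
    spans: Dict[int, List[int]] = {node: [] for node in range(1, len(parents) + 1)}
    # Assumption: nodes 1..len(tokens) are the leaves, in sentence order.
    for leaf in range(1, len(tokens) + 1):
        node = leaf
        # Add this token's position to the leaf and to every ancestor up to the root.
        while node != 0:
            spans[node].append(leaf - 1)
            node = parents[node - 1]
    return {
        node: " ".join(tokens[i] for i in sorted(positions))
        for node, positions in spans.items()
    }


if __name__ == "__main__":
    # Hypothetical tree for "a very good film": (a ((very good) film)).
    for node, phrase in node_phrases("7|5|5|6|6|7|0", "a|very|good|film").items():
        print(node, phrase)
```

Walking up from every leaf costs time proportional to the number of tokens times the tree depth, which is negligible for sentence-length trees and avoids building explicit child lists.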
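
Assigning sentences to their train/dev/test sets is a join of datasetSentences.txt and datasetSplit.txt on the sentence index. The sketch below takes the label mapping 1 = train, 2 = test, 3 = dev from the description above and additionally assumes that either file may start with a header row, which it skips when the first field is not an integer. Function names are hypothetical.

```python
from typing import Dict, Iterable, List

# Split labels as given in the dataset description: 1 = train, 2 = test, 3 = dev.
SPLIT_NAMES = {1: "train", 2: "test", 3: "dev"}


def parse_sentences(lines: Iterable[str]) -> Dict[int, str]:
    """Map sentence index to sentence from 'index<TAB>sentence' lines."""
    sentences: Dict[int, str] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        index, sentence = line.split("\t", 1)
        if not index.isdigit():  # skip a header row, if present (assumed)
            continue
        sentences[int(index)] = sentence
    return sentences


def parse_split(lines: Iterable[str]) -> Dict[int, str]:
    """Map sentence index to split name from 'index,label' lines."""
    splits: Dict[int, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        index, label = line.split(",")
        if not index.isdigit():  # skip a header row, if present (assumed)
            continue
        splits[int(index)] = SPLIT_NAMES[int(label)]
    return splits


def group_by_split(sentences: Dict[int, str], splits: Dict[int, str]) -> Dict[str, List[str]]:
    """Group sentences into train/test/dev lists, ordered by sentence index."""
    grouped: Dict[str, List[str]] = {name: [] for name in SPLIT_NAMES.values()}
    for index in sorted(sentences):
        grouped[splits[index]].append(sentences[index])
    return grouped
```

Because each line of STree.txt and SOStr.txt corresponds to one sentence in datasetSentences.txt, the same position can be used to fetch a sentence's tree, assuming sentence indices start at 1 and have no gaps.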

Citation

Please use the following citation when referencing the dataset:

@incollection{SocherEtAl2013:RNTN,
title = {{Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank}},
author = {Richard Socher and Alex Perelygin and Jean Wu and Jason Chuang and Christopher Manning and Andrew Ng and Christopher Potts},
booktitle = {{EMNLP}},
year = {2013}
}

License

This dataset is licensed under the GNU General Public License v3 or later.

Data Summary

Data format: Text
Data volume: --
File size: 11.38 MB
Publisher: Stanford
Stanford University, officially Leland Stanford Junior University, is a private research university in Stanford, California.