Stanford Sentiment Treebank
Text
NLP
License: Unknown

Overview

This dataset includes:

  1. original_rt_snippets.txt contains 10,605 processed snippets from the original pool of Rotten Tomatoes HTML files. Note that some snippets may contain multiple sentences.
  2. dictionary.txt contains all phrases and their IDs, separated by a vertical line (|).
  3. sentiment_labels.txt contains all phrase IDs and the corresponding sentiment labels, separated by a vertical line (|).
  4. SOStr.txt and STree.txt encode the structure of the parse trees; STree.txt encodes the trees in a parent-pointer format. Each line in these files corresponds to one sentence in datasetSentences.txt. The MATLAB code released with the paper shows how to read this format if you are not familiar with it.
  5. datasetSentences.txt contains the sentence index followed by the sentence string, separated by a tab. These are the sentences of the train/dev/test sets.
  6. datasetSplit.txt contains the sentence index (matching the index in datasetSentences.txt) followed by the set label, separated by a comma: 1 = train, 2 = test, 3 = dev.
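Items 2 and 3 are linked by phrase ID, so they can be joined into a single phrase-to-sentiment table. The sketch below is one way to do that, assuming that dictionary.txt lines look like `phrase|id`, that sentiment_labels.txt lines look like `id|value` with a floating-point value, and that a line whose first field is not a number is a header to skip; these layout details are assumptions, not stated above. The helper `to_five_classes` is hypothetical and uses the five-way cutoffs 0.2, 0.4, 0.6 and 0.8 that are commonly applied to this dataset but are not specified on this page.

```python
from typing import Dict, Iterable


def parse_dictionary(lines: Iterable[str]) -> Dict[str, int]:
    """Map phrase -> phrase ID from lines assumed to look like 'phrase|id'."""
    phrase_to_id: Dict[str, int] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        # Split on the LAST '|' so a phrase that itself contains '|' stays intact.
        phrase, _, phrase_id = line.rpartition("|")
        phrase_to_id[phrase] = int(phrase_id)
    return phrase_to_id


def parse_sentiment_labels(lines: Iterable[str]) -> Dict[int, float]:
    """Map phrase ID -> sentiment value from lines assumed to look like 'id|value'."""
    id_to_value: Dict[int, float] = {}
    for line in lines:
        phrase_id, sep, value = line.strip().partition("|")
        # Assumption: blank lines and a non-numeric first field (a header) are skipped.
        if not sep or not phrase_id.isdigit():
            continue
        id_to_value[int(phrase_id)] = float(value)
    return id_to_value


def phrase_sentiments(dictionary_lines: Iterable[str],
                      label_lines: Iterable[str]) -> Dict[str, float]:
    """Join dictionary.txt and sentiment_labels.txt on phrase ID."""
    phrase_to_id = parse_dictionary(dictionary_lines)
    id_to_value = parse_sentiment_labels(label_lines)
    return {phrase: id_to_value[pid]
            for phrase, pid in phrase_to_id.items() if pid in id_to_value}


def to_five_classes(value: float) -> int:
    """Hypothetical helper: bucket a value in [0, 1] into classes 0..4.

    Assumed cutoffs: [0, 0.2], (0.2, 0.4], (0.4, 0.6], (0.6, 0.8], (0.8, 1].
    """
    for label, upper in enumerate((0.2, 0.4, 0.6, 0.8)):
        if value <= upper:
            return label
    return 4
```

To run it on the real files, pass the open file objects (for example from `open("dictionary.txt", encoding="utf-8")`); the UTF-8 encoding is itself an assumption and may need adjusting. Splitting dictionary lines on the last `|` rather than the first keeps any phrase that happens to contain a vertical bar intact.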
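The parent-pointer format in item 4 can be decoded with a few lines of code. This sketch assumes that fields in both SOStr.txt and STree.txt are separated by `|`, that a SOStr.txt line holds the tokens of one sentence, that the i-th STree.txt entry is the 1-based parent of node i with 0 marking the root, and that nodes 1 to n are the n tokens in sentence order. None of these details is stated on this page, so check them against the MATLAB code mentioned above; the function names are hypothetical.

```python
from typing import Dict, List


def parse_pipe_line(line: str) -> List[str]:
    """Split one SOStr.txt or STree.txt line on '|' (assumed separator)."""
    return line.rstrip("\r\n").split("|")


def node_phrases(tokens: List[str], parents: List[int]) -> Dict[int, str]:
    """Map every 1-based node ID to the phrase it spans.

    Assumptions (not stated in the dataset description):
      * parents[i - 1] is the parent of node i, and 0 marks the root;
      * nodes 1..len(tokens) are the leaves, in sentence order.
    """
    leaves_under: Dict[int, List[int]] = {i: [] for i in range(1, len(parents) + 1)}
    # Walk each leaf up to the root, recording it at every ancestor.
    # Leaves are visited left to right, so each node's list stays in sentence order.
    for leaf in range(1, len(tokens) + 1):
        node = leaf
        while node != 0:
            leaves_under[node].append(leaf)
            node = parents[node - 1]
    return {node: " ".join(tokens[i - 1] for i in leaves)
            for node, leaves in leaves_under.items()}


def root_of(parents: List[int]) -> int:
    """Return the 1-based ID of the node whose parent pointer is 0."""
    return parents.index(0) + 1
```

For example, a three-token sentence `not|very|good` with the parent line `5|4|4|5|0` gives node 4 the phrase "very good" and the root (node 5) the phrase "not very good". Walking each leaf upward avoids recursion and keeps every node's tokens in order without having to sort children.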
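Items 5 and 6 share the sentence index, so the train/dev/test sentences can be recovered with a simple join. The sketch below uses the split codes listed in item 6; treating a line with a non-numeric first field as a header to skip, and the function names themselves, are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Split codes as listed in item 6 of the file description.
SPLIT_NAMES = {1: "train", 2: "test", 3: "dev"}


def parse_sentences(lines: Iterable[str]) -> Dict[int, str]:
    """Read 'index<TAB>sentence' lines into {index: sentence}."""
    sentences: Dict[int, str] = {}
    for line in lines:
        index, sep, sentence = line.rstrip("\r\n").partition("\t")
        if not sep or not index.isdigit():  # assumed header or blank line
            continue
        sentences[int(index)] = sentence
    return sentences


def parse_splits(lines: Iterable[str]) -> Dict[int, str]:
    """Read 'index,label' lines into {index: 'train' | 'test' | 'dev'}."""
    splits: Dict[int, str] = {}
    for line in lines:
        index, sep, label = line.strip().partition(",")
        if not sep or not index.isdigit():  # assumed header or blank line
            continue
        splits[int(index)] = SPLIT_NAMES[int(label)]
    return splits


def sentences_by_split(sentence_lines: Iterable[str],
                       split_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group sentences into train/dev/test by joining on the sentence index."""
    sentences = parse_sentences(sentence_lines)
    grouped: Dict[str, List[str]] = defaultdict(list)
    for index, split in sorted(parse_splits(split_lines).items()):
        if index in sentences:
            grouped[split].append(sentences[index])
    return dict(grouped)
```

Sorting by index keeps each split's sentences in the same order as datasetSentences.txt.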

Citation

Please use the following citation when referencing the dataset:

@incollection{SocherEtAl2013:RNTN,
title = {{Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank}},
author = {Richard Socher and Alex Perelygin and Jean Wu and Jason Chuang and Christopher Manning
and Andrew Ng and Christopher Potts},
booktitle = {{EMNLP}},
year = {2013}
}

Data Summary

Data format: Text
Data volume: --
File size: 11.38 MB
Publisher: Stanford
Stanford University, officially Leland Stanford Junior University, is a private research university in Stanford, California.