
Dataset

Introduction

REASONER is an explainable recommendation dataset. It contains ground truths for multiple explanation purposes, for example, enhancing recommendation persuasiveness, informativeness and satisfaction. The ground-truth annotators are exactly the people who produced the user-item interactions, and they select from multi-modal explanation candidates. The dataset can be used for explainable recommendation, unbiased recommendation, psychology-informed recommendation and so on. Please see our paper for more details.

The dataset contains the following files.

 REASONER-Dataset
│── dataset
│ ├── interaction.csv
│ ├── user.csv
│ ├── video.csv
│ ├── bigfive.csv
│ ├── tag_map.csv
│ ├── video_map.csv
│── preview
│── README.md

How to Obtain the Dataset

You can directly download the REASONER dataset through the following three links:

  • Google Drive

  • Baidu Netdisk

  • OneDrive

Data description

1. interaction.csv

This file contains the users' annotation records on the videos, with the following fields:

| Field Name | Description | Type | Example |
| --- | --- | --- | --- |
| user_id | ID of the user | int64 | 0 |
| video_id | ID of the viewed video | int64 | 3650 |
| like | Whether the user likes the video: 0 means no, 1 means yes | int64 | 0 |
| persuasiveness_tag | The tags the user selected for the question "Which tags are the reasons that you would like to watch this video?" before watching the video | list | [4728,2216,2523] |
| rating | User rating for the video, in the range 1.0~5.0 | float64 | 3.0 |
| review | User review for the video | str | This animation is very interesting, my friends and I like it very much. |
| informativeness_tag | The tags the user selected for the question "Which features are most informative for this video?" after watching the video | list | [2738,1216,2223] |
| satisfaction_tag | The tags the user selected for the question "Which features are you most satisfied with?" after watching the video | list | [738,3226,1323] |
| watch_again | If the system showed only the satisfaction_tag to the user, whether she would like to watch this video: 0 means no, 1 means yes | int64 | 0 |

Note that if the user likes the video (like is 1), the watch_again field has no meaning and is set to 0.
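The two points above (list-valued tag fields and the watch_again caveat) can be handled as in the following minimal sketch. It uses a small in-memory sample instead of the real file, and it assumes that the list fields are stored as text such as "[4728,2216,2523]"; check this against the actual interaction.csv before relying on it.

```python
import ast
import io

import pandas as pd

# Toy stand-in for interaction.csv (tab-separated, like the files in this dataset).
# Assumption: list-valued fields are stored as text such as "[4728,2216,2523]".
sample = io.StringIO(
    "user_id\tvideo_id\tlike\tpersuasiveness_tag\twatch_again\n"
    "0\t3650\t0\t[4728,2216,2523]\t1\n"
    "1\t12\t1\t[738,3226]\t0\n"
)
interaction_df = pd.read_csv(sample, sep="\t", header=0)

# Convert the textual lists into real Python lists of ints.
interaction_df["persuasiveness_tag"] = interaction_df["persuasiveness_tag"].apply(
    ast.literal_eval
)

# watch_again is only meaningful for videos the user did not like (like == 0).
valid_watch_again = interaction_df.loc[interaction_df["like"] == 0, "watch_again"]

print(interaction_df["persuasiveness_tag"].iloc[0])
print(valid_watch_again.tolist())
```

Filtering on like before using watch_again avoids treating the placeholder 0 of liked videos as a real "no" answer.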

2. user.csv

This file contains user profiles.

| Field Name | Description | Type | Example |
| --- | --- | --- | --- |
| user_id | ID of the user | int64 | 1005 |
| age | User age (indicated by ID) | int64 | 3 |
| gender | User gender: 0 means female, 1 means male | int64 | 0 |
| education | User education level (indicated by ID) | int64 | 3 |
| career | User occupation (indicated by ID) | int64 | 20 |
| income | User income (indicated by ID) | int64 | 3 |
| address | User address (indicated by ID) | int64 | 23 |
| hobby | User hobbies | str | drawing and soccer. |

3. video.csv

This file contains video information.

| Field Name | Description | Type | Example |
| --- | --- | --- | --- |
| video_id | ID of the video | int64 | 1 |
| title | Title of the video | str | Take it once a day to prevent depression. |
| info | Introduction of the video | str | Just like it, once a day |
| tags | IDs of the video tags | list | [112,33,1233] |
| duration | Duration of the video in seconds | int64 | 120 |
| category | Category of the video (indicated by ID) | int64 | 3 |

4. bigfive.csv

We administered the Big Five personality test based on the CBF-PI-15 to the annotators. Their responses to the 15 questions, together with a user_id column, are stored in bigfive.csv. The CBF-PI-15 uses a six-point Likert scale, scored as follows:

  • 0: Completely Not Applicable
  • 1: Mostly Not Applicable
  • 2: Somewhat Not Applicable
  • 3: Somewhat Applicable
  • 4: Mostly Applicable
  • 5: Completely Applicable

In this scale, items 2 and 5 (Q2 and Q5) are reverse-scored. The dimensions and their corresponding items are as follows:

  • Neuroticism Dimension (Items 7, 11, and 12)
  • Conscientiousness Dimension (Items 6, 8, and 15)
  • Agreeableness Dimension (Items 1, 9, and 13)
  • Openness Dimension (Items 3, 4, and 10)
  • Extraversion Dimension (Items 2, 5, and 14)
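The scoring scheme above can be sketched as follows. This is an illustration under stated assumptions, not the authors' scoring script: it assumes that bigfive.csv stores raw (not yet reversed) answers in columns Q1 to Q15, that reverse-scoring mirrors an answer on the 0-5 scale (5 minus the answer), and that a dimension score is the mean of its three items. The function name score_bigfive is hypothetical.

```python
import pandas as pd

# Item-to-dimension mapping copied from the list above.
DIMENSIONS = {
    "neuroticism": [7, 11, 12],
    "conscientiousness": [6, 8, 15],
    "agreeableness": [1, 9, 13],
    "openness": [3, 4, 10],
    "extraversion": [2, 5, 14],
}
REVERSED_ITEMS = [2, 5]
MAX_SCORE = 5  # answers range from 0 to 5


def score_bigfive(bigfive_df):
    """Return one row per user with a mean score for each dimension."""
    answers = bigfive_df.copy()
    for item in REVERSED_ITEMS:
        # Assumption: reverse-scoring mirrors the answer on the 0-5 scale.
        answers[f"Q{item}"] = MAX_SCORE - answers[f"Q{item}"]
    scores = pd.DataFrame({"user_id": answers["user_id"]})
    for name, items in DIMENSIONS.items():
        # Assumption: a dimension score is the mean of its three items.
        scores[name] = answers[[f"Q{i}" for i in items]].mean(axis=1)
    return scores


# Toy example: one hypothetical user who answered 4 on every item.
toy = pd.DataFrame([{"user_id": 0, **{f"Q{i}": 4 for i in range(1, 16)}}])
scores = score_bigfive(toy)
print(scores)
```

For this toy user, Q2 and Q5 become 1 after reversal, so the extraversion mean is lower than the other four dimensions even though every raw answer was identical.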

The questions are described as follows:

| Question | Description |
| --- | --- |
| Q1 | I think most people are basically well-intentioned |
| Q2 | I get bored with crowded parties |
| Q3 | I'm a person who takes risks and breaks the rules |
| Q4 | I like adventure |
| Q5 | I try to avoid crowded parties and noisy environments |
| Q6 | I like to plan things out at the beginning |
| Q7 | I worry about things that don't matter |
| Q8 | I work or study hard |
| Q9 | Although there are some liars in society, I think most people are still credible |
| Q10 | I have a spirit of adventure that no one else has |
| Q11 | I often feel uneasy |
| Q12 | I'm always worried that something bad is going to happen |
| Q13 | Although there are some dark things in human society (such as war, crime, fraud), I still believe that human nature is generally good |
| Q14 | I enjoy going to social and entertainment gatherings |
| Q15 | It is one of my characteristics to pay attention to logic and order in doing things |

We refer readers to [1] and [2] for more details about the Big Five Personality Test.

[1] https://www.xinlixue.cn/web/xinliliangbiao/rengeliangbiao/2020-04-01/849.html

[2] Zhang, X., Wang, M.-C., Luo, J., He, L. The development and psychometric evaluation of a very short version of the Chinese Big Five personality inventory. PLoS ONE.

5. tag_map.csv

This file maps each tag ID to its tag content. In addition to the regular tags, we add 7 tags that every video contains: "preview 1", "preview 2", "preview 3", "preview 4", "preview 5", "title" and "content".

| Field Name | Description | Type | Example |
| --- | --- | --- | --- |
| tag_id | ID of the tag | int64 | 1409 |
| tag_content | The content corresponding to the tag | str | cute baby |
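As an illustration of how this mapping can be used, the sketch below turns a list of tag IDs (for example from persuasiveness_tag) into readable tag contents. The sample rows are a toy stand-in for tag_map.csv: only "cute baby" comes from the table above, the other entry is made up, and decode_tags is a hypothetical helper name.

```python
import io

import pandas as pd

# Toy stand-in for tag_map.csv; the second row is invented for illustration.
sample = io.StringIO("tag_id\ttag_content\n1409\tcute baby\n33\tcooking\n")
tag_map_df = pd.read_csv(sample, sep="\t", header=0)

# Build a plain dict for fast ID -> content lookups.
tag_lookup = dict(
    zip(tag_map_df["tag_id"].tolist(), tag_map_df["tag_content"].tolist())
)


def decode_tags(tag_ids):
    """Replace each tag ID by its content, marking IDs missing from the map."""
    return [tag_lookup.get(tag_id, f"<unknown:{tag_id}>") for tag_id in tag_ids]


decoded = decode_tags([1409, 33, 7])
print(decoded)
```

Unknown IDs are kept visible rather than dropped, which makes mismatches between the interaction records and the tag map easier to spot.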

6. video_map.csv

This file maps each video ID to its folder name in preview.

| Field Name | Description | Type | Example |
| --- | --- | --- | --- |
| video_id | ID of the video | int64 | 1 |
| folder_name | The folder name corresponding to the video | str | 83062078 |

7. preview

Each video contains 5 image previews.

The mapping relationship between the folder name and the video ID is in video_map.csv.
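A possible way to locate the previews of a video is sketched below. It assumes that each video's images sit directly in preview/<folder_name>/ under the dataset root; the image file names are not documented here, so the sketch simply lists whatever files the folder contains. The helper names and the default root path are hypothetical.

```python
import io
from pathlib import Path

import pandas as pd

# Toy stand-in for video_map.csv, using the example row from the table above.
sample = io.StringIO("video_id\tfolder_name\n1\t83062078\n")
# Read folder_name as text so the name is not altered by numeric parsing.
video_map_df = pd.read_csv(sample, sep="\t", header=0, dtype={"folder_name": str})
folder_of = dict(
    zip(video_map_df["video_id"].tolist(), video_map_df["folder_name"].tolist())
)


def preview_dir(video_id, root="REASONER-Dataset/preview"):
    """Return the (assumed) preview folder of a video."""
    return Path(root) / folder_of[video_id]


def preview_images(video_id, root="REASONER-Dataset/preview"):
    """List the files in a video's preview folder, or [] if the folder is absent."""
    folder = preview_dir(video_id, root)
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.iterdir() if p.is_file())


print(preview_dir(1))
```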

Statistics

1. The basic statistics of REASONER

The table below lists the basic statistics of the REASONER dataset. "u-v" is the number of user-video interactions, "u-t" is the number of tags selected by users, and Q1, Q2 and Q3 refer to the persuasiveness, informativeness and satisfaction questions, respectively.

| #User | #Video | #Tag | #u-v | #u-t (Q1) | #u-t (Q2) | #u-t (Q3) |
| --- | --- | --- | --- | --- | --- | --- |
| 2,997 | 4,672 | 6,115 | 58,497 | 263,885 | 271,456 | 256,079 |
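Counts of this kind can be recomputed roughly as in the sketch below, shown on toy data. The counting rules are assumptions (users and videos counted as distinct IDs in the interaction records, "u-t" counted as the total number of selected tags), so the results on the real files may differ from the published numbers if the authors counted differently.

```python
import pandas as pd

# Toy interactions; the tag column is assumed to be already parsed into lists.
interaction_df = pd.DataFrame(
    {
        "user_id": [0, 0, 1],
        "video_id": [10, 11, 10],
        "persuasiveness_tag": [[1, 2], [3], [1, 4, 5]],
    }
)

stats = {
    "#User": interaction_df["user_id"].nunique(),
    "#Video": interaction_df["video_id"].nunique(),
    "#u-v": len(interaction_df),
    # Assumption: "u-t (Q1)" is the total number of persuasiveness tags selected.
    "#u-t (Q1)": int(interaction_df["persuasiveness_tag"].apply(len).sum()),
}
print(stats)
```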

2. Statistics on the users

3. Statistics on the videos

Code for accessing our data

We provide code to read the data into data frames with pandas.

import pandas as pd

# access interaction.csv
interaction_df = pd.read_csv('interaction.csv', sep='\t', header=0)
# get the first ten lines
print(interaction_df.head(10))
# get each column
# ['user_id', 'video_id', 'like', 'persuasiveness_tag', 'rating', 'review', 'informativeness_tag', 'satisfaction_tag', 'watch_again']
for col in interaction_df.columns:
    print(interaction_df[col][:10])

# access user.csv
user_df = pd.read_csv('user.csv', sep='\t', header=0)
print(user_df.head(10))
# ['user_id', 'age', 'gender', 'education', 'career', 'income', 'address', 'hobby']
for col in user_df.columns:
    print(user_df[col][:10])

# access video.csv
video_df = pd.read_csv('video.csv', sep='\t', header=0)
print(video_df.head(10))
# ['video_id', 'title', 'info', 'tags', 'duration', 'category']
for col in video_df.columns:
    print(video_df[col][:10])

# access bigfive.csv
bigfive_df = pd.read_csv('bigfive.csv', sep='\t', header=0)
print(bigfive_df.head(10))
# ['user_id', 'Q1', ..., 'Q15']
for col in bigfive_df.columns:
    print(bigfive_df[col][:10])

# access tag_map.csv
tag_map_df = pd.read_csv('tag_map.csv', sep='\t', header=0)
print(tag_map_df.head(10))
# ['tag_id', 'tag_content']
for col in tag_map_df.columns:
    print(tag_map_df[col][:10])

# access video_map.csv
video_map_df = pd.read_csv('video_map.csv', sep='\t', header=0)
print(video_map_df.head(10))
# ['video_id', 'folder_name']
for col in video_map_df.columns:
    print(video_map_df[col][:10])