Dataset

Introduction

REASONER is an explainable recommendation dataset. It contains the ground truths for multiple explanation purposes, for example, enhancing the recommendation persuasiveness, informativeness and satisfaction. In this dataset, the ground truth annotators are exactly the people who produce the user-item interactions, and they can make selections from the explanation candidates with multi-modalities. This dataset can be widely used for explainable recommendation, unbiased recommendation, psychology-informed recommendation and so on. Please see our paper for more details.

The dataset contains the following files.

 REASONER-Dataset
  │── dataset
  │   ├── interaction.csv
  │   ├── user.csv
  │   ├── video.csv
  │   ├── bigfive.csv 
  │   ├── tag_map.csv 
  │   ├── video_map.csv 
  │── preview
  │── README.md

How to Obtain the Dataset

You can directly download the REASONER dataset through the following three links:

Data description

1. interaction.csv

This file contains the user's annotation records on the video, including the following fields:

Field Name:	Description	Type	Example
user_id	ID of the user	int64	0
video_id	ID of the viewed video	int64	3650
like	Whether user like the video: 0 means no, 1 means yes	int64	0
persuasiveness_tag	The user selected tags for the question "Which tags are the reasons that you would like to watch this video?" before watching the video	list	[4728,2216,2523]
rating	User rating for the video, the range is 1.0~5.0	float64	3.0
review	User review for the video	str	This animation is very interesting, my friends and I like it very much.
informativeness_tag	The user selected tags for the question "Which features are most informative for this video?" after watching the video	list	[2738,1216,2223]
satisfaction_tag	The user selected tags for the question "Which features are you most satisfied with?" after watching the video.	list	[738,3226,1323]
watch_again	If the system only show the satisfaction_tag to the user, whether the she would like to watch this video? 0 means no, 1 means yes	int64	0

Note that if the user chooses to like the video, the watch_again item has no meaning and is set to 0.

2. user.csv

This file contains user profiles.

Field Name:	Description	Type	Example
user_id	ID of the user	int64	1005
age	User age (indicated by ID)	int64	3
gender	User gender: 0 means female, 1 means male	int64	0
education	User education level (indicated by ID)	int64	3
career	User occupation (indicated by ID)	int64	20
income	User income (indicated by ID)	int64	3
address	User address (indicated by ID)	int64	23
hobby	User hobbies	str	drawing and soccer.

3. video.csv

This file contains information of videos.

Field Name:	Description	Type	Example
video_id	ID of the video	int64	1
title	Title of the video	str	Take it once a day to prevent depression.
info	Introduction of the video	str	Just like it, once a day
tags	ID of the video tags	list	[112,33,1233]
duration	Duration of the video in seconds	int64	120
category	Category of the video (indicated by ID)	int64	3

4. bigfive.csv

We administered the Big Five Personality Test based on CBF-PI-15 to the annotators, and their responses to 15 questions, along with a user_id column, are stored in the bigfive.csv file. The CBF-PI-15 scale utilizes a Likert six-point scoring system with the following score interpretations:

0: Completely Not Applicable
1: Mostly Not Applicable
2: Somewhat Not Applicable
3: Somewhat Applicable
4: Mostly Applicable
5: Completely Applicable

In this scale, questions 2 and 5 are reverse-scored. The dimensions and corresponding items are as follows:

Neuroticism Dimension (Items 7, 11, and 12)
Conscientiousness Dimension (Items 6, 8, and 15)
Agreeableness Dimension (Items 1, 9, and 13)
Openness Dimension (Items 3, 4, and 10)
Extraversion Dimension (Items 2, 5, and 14)

The questions are described as follows:

Question	Description
Q1	I think most people are basically well-intentioned
Q2	I get bored with crowded parties
Q3	I'm a person who takes risks and breaks the rules
Q4	i like adventure
Q5	I try to avoid crowded parties and noisy environments
Q6	I like to plan things out at the beginning
Q7	I worry about things that don't matter
Q8	I work or study hard
Q9	Although there are some liars in the society, I think most people are still credible
Q10	I have a spirit of adventure that no one else has
Q11	I often feel uneasy
Q12	I'm always worried that something bad is going to happen
Q13	Although there are some dark things in human society (such as war, crime, fraud), I still believe that human nature is generally good
Q14	I enjoy going to social and entertainment gatherings
Q15	It is one of my characteristics to pay attention to logic and order in doing things

We refer the users to [1] and [2] for more details about the Big Five Personality Test.

[1] https://www.xinlixue.cn/web/xinliliangbiao/rengeliangbiao/2020-04-01/849.html

[2] Zhang, X., Wang, M-C, Luo, J., He, L. The development and psychometrics evaluation of a very shorten version of the Chinese Big five personality inventory. PLoS ONE.

5. tag_map.csv

Mapping relationship between the tag ID and the tag content. We add 7 additional tags that all videos contain, namely "preview 1, preview 2, preview 3, preview 4, preview 5, title, content".

Field Name:	Description	Type	Example
tag_id	ID of the tag	int64	1409
tag_content	The content corresponding to the tag	str	cute baby

6. video_map.csv

Mapping relationship between the video ID and the folder name in preview.

Field Name:	Description	Type	Example
video_id	ID of the video	int64	1
folder_name	The folder name corresponding to the video	str	83062078

7. preview

Each video contains 5 image previews.

The mapping relationship between the folder name and the video ID is in video_map.csv.

Statistics

1. The basic statistics of REASONER

We have collected the basic information of the REASONER dataset and listed it in the table below. "u-v" represents the number of interactions between users and videos, "u-t" represents the number of tags clicked by users, and "Q1, Q2, Q3" respectively represent the persuasiveness, informativeness, and satisfaction of the tags.

#User	#Video	#Tag	#u-v	#u-t (Q1)	#u-t (Q2)	#u-t (Q3)
2,997	4,672	6,115	58,497	263,885	271,456	256,079

2. Statistics on the users

3. Statistics on the videos

Codes for accessing our data

We provide code to read the data into data frame with pandas.

import pandas as pd

# access interaction.csv
interaction_df = pd.read_csv('interaction.csv', sep='\t', header=0)
# get the first ten lines
print(interaction_df.head(10))
# get each column 
# ['user_id', 'video_id', 'like', 'persuasiveness_tag', 'rating', 'review', 'informativeness_tag', 'satisfaction_tag', 'watch_again', ]
for col in interaction_df.columns:
    print(interaction_df[col][:10])

# access user.csv
user_df = pd.read_csv('user.csv', sep='\t', header=0)
print(user_df.head(10))
# ['user_id', 'age', 'gender', 'education', 'career', 'income', 'address', 'hobby']
for col in user_df.columns:
    print(user_df[col][:10])
  
# access video.csv
video_df = pd.read_csv('video.csv', sep='\t', header=0)
print(video_df.head(10))
# ['video_id', 'title', 'info', 'tags', 'duration', 'category']
for col in video_df.columns:
  print(video_df[col][:10])

# access bigfive.csv
bigfive_df = pd.read_csv('bigfive.csv', sep='\t', header=0)
print(bigfive_df.head(10))
# ['user_id', 'Q1', ..., 'Q15']
for col in bigfive_df.columns:
    print(bigfive_df[col][:10])

# access tag_map.csv
tag_map_df = pd.read_csv('tag_map.csv', sep='\t', header=0)
print(tag_map_df.head(10))
# ['tag_id', 'tag_content']
for col in tag_map_df.columns:
    print(tag_map_df[col][:10])
  
# access video_map.csv
video_map_df = pd.read_csv('video_map.csv', sep='\t', header=0)
print(video_map_df.head(10))
# ['video_id', 'folder_name']
for col in video_map_df.columns:
    print(video_map_df[col][:10])

Introduction​

How to Obtain the Dataset​

Data description​

1. interaction.csv​

2. user.csv​

3. video.csv​

4. bigfive.csv​

5. tag_map.csv​

6. video_map.csv​

7. preview​

Statistics​

1. The basic statistics of REASONER​

2. Statistics on the users​

3. Statistics on the videos​

Codes for accessing our data​