We introduce the first dataset for sequential vision-to-language, and explore how this data may be used for the task of visual storytelling. The dataset includes 81,743 unique photos in 20,211 sequences, aligned to descriptive and story language.
| 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|
|
|
|
|
|
| The dog was ready to go. | He had a great time on the hike. | And was very happy to be in the field. | His mom was so proud of him. | It was a beautiful day for him. |
Photos by kameraschwein / CC BY-NC-ND 2.0
Photos by mharvey75 / CC BY-NC 2.0, janelle / CC BY-NC-ND 2.0, lance_mountain / CC BY-NC-ND 2.0
Photos by rbieber / CC BY-NC-ND 2.0