Abstract: Scene classification in very high-resolution (VHR) remote sensing (RS) images is a challenging task due to the complex and diverse content of the images. Recently, convolutional neural ...
Abstract: Dense video captioning requires localization and description of multiple events in long videos. Prior works detect events in videos relying solely on the visual content and completely ignore ...