February 1, 2020

Paper Group AWR 75

Exploring spectro-temporal features in end-to-end convolutional neural networks

Title Exploring spectro-temporal features in end-to-end convolutional neural networks
Authors Sean Robertson, Gerald Penn, Yingxue Wang
Abstract Triangular, overlapping Mel-scaled filters (“f-banks”) are the current standard input for acoustic models that exploit their input’s time-frequency geometry, because they provide a psycho-acoustically motivated time-frequency representation of the speech signal. F-bank coefficients are provably robust to small deformations in the scale. In this paper, we explore two ways in which filter banks can be adjusted for the purposes of speech recognition. First, triangular filters can be replaced with Gabor filters, which are compactly supported and better localize events in time, or with Gammatone filters, which are psycho-acoustically motivated. Second, by rearranging the order of operations in computing filter bank features, features can be integrated over smaller time scales while simultaneously providing better frequency resolution. We make all feature implementations available online through open-source repositories. Initial experimentation with a modern end-to-end CNN phone recognizer yielded no significant improvements in phone error rate from either modification. The result, and its ramifications with respect to learned filter banks, is discussed.
Tasks Speech Recognition
Published 2019-01-01
URL http://arxiv.org/abs/1901.00072v1
PDF http://arxiv.org/pdf/1901.00072v1.pdf
PWC https://paperswithcode.com/paper/exploring-spectro-temporal-features-in-end-to
Repo https://github.com/sdrobert/more-or-let
Framework tf
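
As a rough illustration of the first modification, the sketch below builds a bank of complex Gabor kernels at Mel-spaced centre frequencies and log-compresses their framed magnitudes. This is a minimal numpy sketch under assumed parameter choices (kernel width, hop size, filter count), not the paper’s implementation; see the linked repo for that.

```python
import numpy as np

def mel_centers(n_filt, fmin, fmax):
    """Mel-spaced centre frequencies in Hz (standard HTK-style Mel formula)."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    return imel(np.linspace(mel(fmin), mel(fmax), n_filt))

def gabor_kernel(fc, sr, width=0.025):
    """Complex Gabor filter: Gaussian envelope times a complex sinusoid."""
    t = np.arange(-width / 2, width / 2, 1.0 / sr)
    sigma = width / 6.0  # illustrative: envelope decays within the support
    return np.exp(-0.5 * (t / sigma) ** 2) * np.exp(2j * np.pi * fc * t)

def gabor_fbank(signal, sr=16000, n_filt=40, hop=0.01):
    """Log-magnitude of each Gabor channel, decimated to the frame rate."""
    step = int(hop * sr)
    feats = [np.log(np.abs(np.convolve(signal, gabor_kernel(fc, sr),
                                       mode="same"))[::step] + 1e-8)
             for fc in mel_centers(n_filt, 20.0, sr / 2)]
    return np.stack(feats, axis=1)  # shape: (n_frames, n_filt)
```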

On Slicing Sorted Integer Sequences

Title On Slicing Sorted Integer Sequences
Authors Giulio Ermanno Pibiri
Abstract Representing sorted integer sequences in small space is a central problem for large-scale retrieval systems such as Web search engines. Efficient query resolution, e.g., intersection or random access, is achieved by carefully partitioning the sequences. In this work we describe and compare two different partitioning paradigms: partitioning by cardinality and partitioning by universe. Although the ideas behind these paradigms have been known in the coding and algorithmic communities for many years, inverted index compression has extensively adopted the former paradigm, whereas the latter has received comparatively little attention. As a result, an experimental comparison between the two is missing for the setting of inverted index compression. We also propose and implement a solution that recursively slices the universe of representation of a sequence to achieve compact storage and attain fast query execution. Albeit larger than some state-of-the-art representations, this slicing approach substantially improves the performance of list intersections and unions while operating in compressed space, thus offering an excellent space/time trade-off for the problem.
Tasks
Published 2019-07-01
URL https://arxiv.org/abs/1907.01032v2
PDF https://arxiv.org/pdf/1907.01032v2.pdf
PWC https://paperswithcode.com/paper/on-slicing-sorted-integer-sequences
Repo https://github.com/jermp/s_indexes
Framework none
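
To make the “partitioning by universe” idea concrete, here is a toy Python sketch (not the paper’s C++ data structure) that cuts a sorted sequence by the high 16 bits of each value, so an intersection only has to visit slices whose sub-universes overlap; the recursive slicing the paper proposes refines this further within each slice.

```python
def slice_by_universe(sorted_ints):
    """Cut the sequence by high bits: each slice spans a 2^16 sub-universe."""
    slices = {}
    for x in sorted_ints:
        slices.setdefault(x >> 16, []).append(x & 0xFFFF)
    return slices

def intersect(a, b):
    """Intersection touches only slices present in both sliced sequences."""
    out = []
    for hi in sorted(a.keys() & b.keys()):
        common = set(a[hi]) & set(b[hi])
        out.extend((hi << 16) | lo for lo in sorted(common))
    return out
```

For example, intersecting two posting lists whose values share no high bits does no per-element work at all, which is one intuition for why universe partitioning speeds up intersections.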

Warp and Learn: Novel Views Generation for Vehicles and Other Objects

Title Warp and Learn: Novel Views Generation for Vehicles and Other Objects
Authors Andrea Palazzi, Luca Bergamini, Simone Calderara, Rita Cucchiara
Abstract In this work we introduce a new self-supervised, semi-parametric approach for synthesizing novel views of a vehicle starting from a single monocular image. Unlike parametric (i.e., entirely learning-based) methods, we show how a-priori geometric knowledge about the object and the 3D world can be successfully integrated into a deep-learning-based image generation framework. As this geometric component is not learnt, we call our approach semi-parametric. In particular, we exploit man-made object symmetry and piece-wise planarity to integrate rich a-priori visual information into the novel viewpoint synthesis process. An Image Completion Network (ICN) is then trained to generate a realistic image starting from this geometric guidance. This careful blend of parametric and non-parametric components allows us to i) operate in a real-world scenario, ii) preserve high-frequency visual information such as textures, iii) handle truly arbitrary 3D roto-translations of the input and iv) perform shape transfer to completely different 3D models. Finally, we show that our approach can be easily complemented with synthetic data and extended to other rigid objects with completely different topology, even in the presence of concave structures and holes (e.g. chairs). A comprehensive experimental analysis against state-of-the-art competitors shows the efficacy of our method both from a quantitative and a perceptive point of view. Supplementary material, animated results, code and data are available at: https://github.com/ndrplz/semiparametric
Tasks 3D Object Detection, Image Generation, Object Detection
Published 2019-07-24
URL https://arxiv.org/abs/1907.10634v2
PDF https://arxiv.org/pdf/1907.10634v2.pdf
PWC https://paperswithcode.com/paper/semi-parametric-object-synthesis
Repo https://github.com/ndrplz/semiparametric
Framework pytorch
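
The non-parametric half of the pipeline amounts to re-projecting (assumed) planar object parts to the target view to obtain geometric guidance for the Image Completion Network. Below is a hedged OpenCV sketch of one such planar warp; the quadrilateral coordinates are illustrative placeholders, and the real system derives them from a 3D model plus symmetry priors.

```python
import cv2
import numpy as np

def warp_planar_part(img, src_quad, dst_quad, out_size):
    """Map one piece-wise planar part from the source to the target view
    via the homography between its four corners in each view."""
    H = cv2.getPerspectiveTransform(np.float32(src_quad),
                                    np.float32(dst_quad))
    return cv2.warpPerspective(img, H, out_size)

# Illustrative usage: warp one car panel into the novel viewpoint, then let
# a completion network fill disoccluded regions of the guidance image.
# guidance = warp_planar_part(image, panel_src_corners, panel_dst_corners,
#                             (512, 512))
```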

FFA-Net: Feature Fusion Attention Network for Single Image Dehazing

Title FFA-Net: Feature Fusion Attention Network for Single Image Dehazing
Authors Xu Qin, Zhilin Wang, Yuanchao Bai, Xiaodong Xie, Huizhu Jia
Abstract In this paper, we propose an end-to-end feature fusion attention network (FFA-Net) to directly restore the haze-free image. The FFA-Net architecture consists of three key components: 1) A novel Feature Attention (FA) module that combines a Channel Attention with a Pixel Attention mechanism, motivated by the observation that different channel-wise features carry very different weighted information and that haze is distributed unevenly across image pixels. FA treats different features and pixels unequally, which provides additional flexibility in dealing with different types of information and expands the representational ability of CNNs. 2) A basic block structure consisting of Local Residual Learning and Feature Attention. Local Residual Learning allows less important information, such as thin haze regions or low-frequency components, to be bypassed through multiple local residual connections, letting the main network focus on more effective information. 3) An attention-based Feature Fusion (FFA) structure that combines features from different levels, where the feature weights are adaptively learned from the Feature Attention (FA) module, giving more weight to important features. This structure also retains the information of shallow layers and passes it into deep layers. The experimental results demonstrate that our proposed FFA-Net surpasses previous state-of-the-art single image dehazing methods by a very large margin both quantitatively and qualitatively, boosting the best published PSNR from 30.23 dB to 36.39 dB on the SOTS indoor test dataset. Code has been made available at GitHub.
Tasks Image Dehazing, Single Image Dehazing
Published 2019-11-18
URL https://arxiv.org/abs/1911.07559v2
PDF https://arxiv.org/pdf/1911.07559v2.pdf
PWC https://paperswithcode.com/paper/ffa-net-feature-fusion-attention-network-for
Repo https://github.com/zhilin007/FFA-Net
Framework pytorch
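
A minimal PyTorch sketch of the Feature Attention idea in component 1), channel attention followed by pixel attention, is given below. The layer sizes and reduction factor are illustrative assumptions, not FFA-Net’s exact configuration; see the linked repo for that.

```python
import torch
import torch.nn as nn

class FeatureAttention(nn.Module):
    def __init__(self, ch, reduction=8):
        super().__init__()
        self.ca = nn.Sequential(                      # channel attention
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid())
        self.pa = nn.Sequential(                      # pixel attention
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, 1, 1), nn.Sigmoid())

    def forward(self, x):
        x = x * self.ca(x)     # reweight channels unequally
        return x * self.pa(x)  # reweight spatial positions unequally
```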

Transductive Zero-Shot Learning with Visual Structure Constraint

Title Transductive Zero-Shot Learning with Visual Structure Constraint
Authors Ziyu Wan, Dongdong Chen, Yan Li, Xingguang Yan, Junge Zhang, Yizhou Yu, Jing Liao
Abstract To recognize objects of unseen classes, most existing Zero-Shot Learning (ZSL) methods first learn a compatible projection function between the common semantic space and the visual space based on the data of source seen classes, then directly apply it to the target unseen classes. However, in real scenarios, the data distribution between the source and target domain might not match well, thus causing the well-known domain shift problem. Based on the observation that visual features of test instances can be separated into different clusters, we propose a new visual structure constraint on class centers for transductive ZSL, to improve the generality of the projection function (i.e., alleviate the above domain shift problem). Specifically, three different strategies (symmetric Chamfer distance, bipartite matching distance, and Wasserstein distance) are adopted to align the projected unseen semantic centers and visual cluster centers of test instances. We also propose a new training strategy to handle the real cases where many unrelated images exist in the test dataset, which is not considered in previous methods. Experiments on many widely used datasets demonstrate that the proposed visual structure constraint can bring substantial performance gains consistently and achieve state-of-the-art results. The source code is available at https://github.com/raywzy/VSC.
Tasks Zero-Shot Learning
Published 2019-01-06
URL https://arxiv.org/abs/1901.01570v2
PDF https://arxiv.org/pdf/1901.01570v2.pdf
PWC https://paperswithcode.com/paper/transductive-zero-shot-learning-with-visual
Repo https://github.com/raywzy/VSC
Framework pytorch
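
Of the three alignment strategies, the symmetric Chamfer distance is the simplest to write down. The sketch below is a hedged PyTorch version, assuming the projected unseen semantic centers and the visual cluster centers are (N, d) and (M, d) tensors:

```python
import torch

def symmetric_chamfer(proj_centers, cluster_centers):
    """Average nearest-neighbour distance in both directions."""
    d = torch.cdist(proj_centers, cluster_centers)   # pairwise L2, (N, M)
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()
```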

Generalisation dynamics of online learning in over-parameterised neural networks

Title Generalisation dynamics of online learning in over-parameterised neural networks
Authors Sebastian Goldt, Madhu S. Advani, Andrew M. Saxe, Florent Krzakala, Lenka Zdeborová
Abstract Deep neural networks achieve stellar generalisation on a variety of problems, despite often being large enough to easily fit all their training data. Here we study the generalisation dynamics of two-layer neural networks in a teacher-student setup, where one network, the student, is trained using stochastic gradient descent (SGD) on data generated by another network, called the teacher. We show how, for this problem, the dynamics of SGD are captured by a set of differential equations. In particular, we demonstrate analytically that the generalisation error of the student increases linearly with the network size, with other relevant parameters held constant. Our results indicate that achieving good generalisation in neural networks depends on the interplay of at least the algorithm, its learning rate, the model architecture, and the data set.
Tasks
Published 2019-01-25
URL http://arxiv.org/abs/1901.09085v1
PDF http://arxiv.org/pdf/1901.09085v1.pdf
PWC https://paperswithcode.com/paper/generalisation-dynamics-of-online-learning-in
Repo https://github.com/sgoldt/pyscm
Framework none
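
A toy numpy version of the teacher-student setup is sketched below: a fixed “teacher” two-layer network labels one fresh Gaussian input per step, and the “student” takes a single SGD step on it (online learning, so no sample is reused). The widths, activation, and learning rate are illustrative, not the paper’s analysed regime.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 500, 8, 4                  # input dim, student width, teacher width
Wt = rng.standard_normal((K, N))     # fixed teacher weights
Ws = rng.standard_normal((M, N)) * 0.01
g = np.tanh                          # activation for both networks
lr = 0.1

def student(x): return g(Ws @ x / np.sqrt(N)).sum()
def teacher(x): return g(Wt @ x / np.sqrt(N)).sum()

for step in range(10_000):           # online: every sample is seen once
    x = rng.standard_normal(N)
    err = student(x) - teacher(x)    # squared loss err**2 / 2
    pre = Ws @ x / np.sqrt(N)
    Ws -= lr / np.sqrt(N) * err * ((1 - g(pre) ** 2)[:, None] * x[None, :])
```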

Multimodal 3D Object Detection from Simulated Pretraining

Title Multimodal 3D Object Detection from Simulated Pretraining
Authors Åsmund Brekke, Fredrik Vatsendvik, Frank Lindseth
Abstract The need for simulated data in autonomous driving applications has become increasingly important, both for validating pretrained models and for training new ones. In order for these models to generalize to real-world applications, it is critical that the underlying dataset contains a variety of driving scenarios and that simulated sensor readings closely mimic real-world sensors. We present the Carla Automated Dataset Extraction Tool (CADET), a novel tool for generating training data from the CARLA simulator for use in autonomous driving research. The tool is able to export high-quality, synchronized LIDAR and camera data with object annotations, and offers configuration to accurately reflect a real-life sensor array. Furthermore, we use this tool to generate a dataset consisting of 10,000 samples and use this dataset to train the 3D object detection network AVOD-FPN, with fine-tuning on the KITTI dataset in order to evaluate the potential for effective pretraining. We also present two novel LIDAR feature map configurations in Bird’s Eye View for use with AVOD-FPN that can be easily modified. These configurations are tested on the KITTI and CADET datasets in order to evaluate their performance as well as the usability of the simulated dataset for pretraining. Although simulated data is insufficient to fully replace real-world data, and systems trained on it generally cannot exceed the performance of systems fully trained on real data, our results indicate that simulated data can considerably reduce the amount of training on real data required to achieve satisfactory levels of accuracy.
Tasks 3D Object Detection, Autonomous Driving, Object Detection
Published 2019-05-19
URL https://arxiv.org/abs/1905.07754v1
PDF https://arxiv.org/pdf/1905.07754v1.pdf
PWC https://paperswithcode.com/paper/multimodal-3d-object-detection-from-simulated
Repo https://github.com/Ozzyz/carla-data-export
Framework none
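
As an illustration of what a Bird’s Eye View feature map configuration looks like, here is a hedged numpy sketch that rasterizes a LIDAR point cloud into a max-height channel and a normalized log-density channel. The grid extents, resolution, and normalization constant are illustrative, not the exact configurations the paper proposes.

```python
import numpy as np

def bev_maps(points, x_range=(0, 70), y_range=(-35, 35), res=0.1):
    """points: (N, 3) array of (x, y, z) LIDAR returns -> (nx, ny, 2) map."""
    nx = int((x_range[1] - x_range[0]) / res)
    ny = int((y_range[1] - y_range[0]) / res)
    height = np.zeros((nx, ny), np.float32)
    density = np.zeros((nx, ny), np.float32)
    xi = ((points[:, 0] - x_range[0]) / res).astype(int)
    yi = ((points[:, 1] - y_range[0]) / res).astype(int)
    keep = (xi >= 0) & (xi < nx) & (yi >= 0) & (yi < ny)
    for x, y, z in zip(xi[keep], yi[keep], points[keep, 2]):
        height[x, y] = max(height[x, y], z)   # max height per grid cell
        density[x, y] += 1.0                  # raw point count per cell
    density = np.log1p(density) / np.log(64)  # normalized log density
    return np.stack([height, density], axis=-1)
```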

Monocular 3D Object Detection Leveraging Accurate Proposals and Shape Reconstruction

Title Monocular 3D Object Detection Leveraging Accurate Proposals and Shape Reconstruction
Authors Jason Ku, Alex D. Pon, Steven L. Waslander
Abstract We present MonoPSR, a monocular 3D object detection method that leverages proposals and shape reconstruction. First, using the fundamental relations of a pinhole camera model, detections from a mature 2D object detector are used to generate a 3D proposal per object in a scene. The 3D location of these proposals proves to be quite accurate, which greatly reduces the difficulty of regressing the final 3D bounding box detection. Simultaneously, a point cloud is predicted in an object-centered coordinate system to learn local scale and shape information. However, the key challenge is how to exploit shape information to guide 3D localization. As such, we devise aggregate losses, including a novel projection alignment loss, to jointly optimize these tasks in the neural network and improve 3D localization accuracy. We validate our method on the KITTI benchmark, where we set new state-of-the-art results among published monocular methods, including on the harder pedestrian and cyclist classes, while maintaining efficient run-time.
Tasks 3D Object Detection, Object Detection
Published 2019-04-02
URL http://arxiv.org/abs/1904.01690v1
PDF http://arxiv.org/pdf/1904.01690v1.pdf
PWC https://paperswithcode.com/paper/monocular-3d-object-detection-leveraging
Repo https://github.com/ZhixinLai/3D-detection-with-monocular-RGB-image
Framework none
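
The proposal stage rests on pinhole geometry: similar triangles relate a class’s physical height to its 2D box height, giving a depth estimate, and the camera intrinsics then lift the box centre to a 3D centroid. A minimal sketch of these relations follows; the class-height prior of 1.5 m is an illustrative assumption, not the paper’s exact prior.

```python
import numpy as np

def depth_from_box(box_h_px, focal_px, class_height_m=1.5):
    """Pinhole relation z = f * H / h: depth from 2D box height."""
    return focal_px * class_height_m / box_h_px

def backproject(u, v, z, fx, fy, cx, cy):
    """Lift the 2D box centre (u, v) at depth z to a 3D proposal centroid."""
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])
```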

Scene Text Visual Question Answering

Title Scene Text Visual Question Answering
Authors Ali Furkan Biten, Ruben Tito, Andres Mafla, Lluis Gomez, Marçal Rusiñol, Ernest Valveny, C. V. Jawahar, Dimosthenis Karatzas
Abstract Current visual question answering datasets do not consider the rich semantic information conveyed by text within an image. In this work, we present a new dataset, ST-VQA, that aims to highlight the importance of exploiting the high-level semantic information present in images as textual cues in the VQA process. We use this dataset to define a series of tasks of increasing difficulty for which reading the scene text in the context provided by the visual information is necessary to reason and generate an appropriate answer. We propose a new evaluation metric for these tasks that accounts for both reasoning errors and shortcomings of the text recognition module. In addition, we put forward a series of baseline methods, which provide further insight into the newly released dataset, and set the scene for further research.
Tasks Question Answering, Visual Question Answering
Published 2019-05-31
URL https://arxiv.org/abs/1905.13648v2
PDF https://arxiv.org/pdf/1905.13648v2.pdf
PWC https://paperswithcode.com/paper/scene-text-visual-question-answering
Repo https://github.com/xinke-wang/Awesome-Text-VQA
Framework none
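
The evaluation metric the abstract mentions softens exact-match scoring with string edit distance, so near-miss OCR readings earn partial credit. Below is a hedged sketch of such a normalized-Levenshtein score; the 0.5 threshold is an illustrative choice, not necessarily the paper’s exact parameterization.

```python
def levenshtein(a, b):
    """Classic edit distance via the rolling-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[-1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def nls_score(pred, answers, tau=0.5):
    """Best normalized similarity to any ground-truth answer, thresholded."""
    best = max(1 - levenshtein(pred, a) / max(len(pred), len(a), 1)
               for a in answers)
    return best if best >= tau else 0.0   # too far off: no partial credit
```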

Glyce: Glyph-vectors for Chinese Character Representations

Title Glyce: Glyph-vectors for Chinese Character Representations
Authors Yuxian Meng, Wei Wu, Fei Wang, Xiaoya Li, Ping Nie, Fan Yin, Muyu Li, Qinghong Han, Xiaofei Sun, Jiwei Li
Abstract It is intuitive that NLP tasks for logographic languages like Chinese should benefit from the use of the glyph information in those languages. However, due to the lack of rich pictographic evidence in glyphs and the weak generalization ability of standard computer vision models on character data, an effective way to utilize the glyph information remains to be found. In this paper, we address this gap by presenting Glyce, glyph-vectors for Chinese character representations. We make three major innovations: (1) We use historical Chinese scripts (e.g., bronzeware script, seal script, traditional Chinese) to enrich the pictographic evidence in characters; (2) We design CNN structures (called tianzege-CNN) tailored to Chinese character image processing; and (3) We use image classification as an auxiliary task in a multi-task learning setup to increase the model’s ability to generalize. We show that glyph-based models are able to consistently outperform word/char ID-based models in a wide range of Chinese NLP tasks. We set new state-of-the-art results for a variety of Chinese NLP tasks, including tagging (NER, CWS, POS), sentence pair classification, single sentence classification, dependency parsing, and semantic role labeling. For example, the proposed model achieves an F1 score of 80.6 on the OntoNotes dataset for NER, +1.5 over BERT; it achieves an almost perfect accuracy of 99.8% on the Fudan corpus for text classification. Code can be found at https://github.com/ShannonAI/glyce.
Tasks Chinese Word Segmentation, Dependency Parsing, Document Classification, Image Classification, Language Modelling, Machine Translation, Multi-Task Learning, Part-Of-Speech Tagging, Semantic Role Labeling, Semantic Textual Similarity, Sentence Classification, Sentiment Analysis, Text Classification
Published 2019-01-29
URL https://arxiv.org/abs/1901.10125v4
PDF https://arxiv.org/pdf/1901.10125v4.pdf
PWC https://paperswithcode.com/paper/glyce-glyph-vectors-for-chinese-character
Repo https://github.com/ShannonAI/glyce
Framework pytorch
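
The first step of any glyph-based model is turning a character into an image that a CNN can consume. A hedged PIL/numpy sketch is shown below; the font path is a placeholder assumption, and Glyce itself renders multiple historical scripts per character rather than a single modern font.

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def render_glyph(ch, size=24, font_path="/path/to/a/cjk/font.ttf"):
    """Render one character to a (size, size) grayscale array in [0, 1].
    The resulting bitmap is what a glyph CNN (e.g. one whose final pooling
    respects the 2x2 'tianzege' layout) would take as input."""
    img = Image.new("L", (size, size), 0)
    font = ImageFont.truetype(font_path, size)
    ImageDraw.Draw(img).text((0, 0), ch, fill=255, font=font)
    return np.asarray(img, dtype=np.float32) / 255.0
```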

Orthogonal Deep Neural Networks

Title Orthogonal Deep Neural Networks
Authors Kui Jia, Shuai Li, Yuxin Wen, Tongliang Liu, Dacheng Tao
Abstract In this paper, we introduce the algorithms of Orthogonal Deep Neural Networks (OrthDNNs) to connect with the recent interest in spectrally regularized deep learning methods. OrthDNNs are theoretically motivated by generalization analysis of modern DNNs, with the aim of finding solution properties of network weights that guarantee better generalization. To this end, we first prove that DNNs are of local isometry on data distributions of practical interest; by using a new covering of the sample space and introducing the local isometry property of DNNs into generalization analysis, we establish a new generalization error bound that is both scale- and range-sensitive to the singular value spectrum of each of the network’s weight matrices. We prove that the optimal bound w.r.t. the degree of isometry is attained when each weight matrix has a spectrum of equal singular values, among which an orthogonal weight matrix, or a non-square one with orthonormal rows or columns, is the most straightforward choice, suggesting the algorithms of OrthDNNs. We present algorithms for both strict and approximate OrthDNNs, and for the latter we propose a simple yet effective algorithm called Singular Value Bounding (SVB), which performs as well as strict OrthDNNs but at a much lower computational cost. We also propose Bounded Batch Normalization (BBN) to make compatible use of batch normalization with OrthDNNs. We conduct extensive comparative studies using modern architectures on benchmark image classification. Experiments show the efficacy of OrthDNNs.
Tasks Image Classification
Published 2019-05-15
URL https://arxiv.org/abs/1905.05929v2
PDF https://arxiv.org/pdf/1905.05929v2.pdf
PWC https://paperswithcode.com/paper/orthogonal-deep-neural-networks
Repo https://github.com/Yuxin-Wen/OrthDNNs
Framework pytorch
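
A compact PyTorch sketch of the Singular Value Bounding idea follows: periodically during training, each weight matrix’s singular values are clipped into a narrow band around 1, approximating orthogonality at low cost. The band width eps and the “every few hundred steps” schedule are illustrative assumptions, not the paper’s tuned values.

```python
import torch

@torch.no_grad()
def singular_value_bound(W, eps=0.05):
    """Clip W's singular values into [1/(1+eps), 1+eps], in place."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    W.copy_(U @ torch.diag(S.clamp(1.0 / (1.0 + eps), 1.0 + eps)) @ Vh)

# Illustrative usage: bound every 2D weight every few hundred SGD steps.
# for p in model.parameters():
#     if p.dim() == 2:
#         singular_value_bound(p)
```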

CopyMTL: Copy Mechanism for Joint Extraction of Entities and Relations with Multi-Task Learning

Title CopyMTL: Copy Mechanism for Joint Extraction of Entities and Relations with Multi-Task Learning
Authors Daojian Zeng, Haoran Zhang, Qianying Liu
Abstract Joint extraction of entities and relations has received significant attention due to its potential of providing higher performance for both tasks. Among existing methods, CopyRE is effective and novel: it uses a sequence-to-sequence framework and a copy mechanism to directly generate the relation triplets. However, it suffers from two fatal problems. The model is extremely weak at distinguishing head and tail entities, resulting in inaccurate entity extraction. It also cannot predict multi-token entities (e.g. Steven Jobs). To address these problems, we give a detailed analysis of the reasons behind the inaccurate entity extraction problem, and then propose a simple but extremely effective model structure to solve it. In addition, we propose a multi-task learning framework equipped with a copy mechanism, called CopyMTL, to allow the model to predict multi-token entities. Experiments reveal the problems of CopyRE and show that our model achieves significant improvements over the current state-of-the-art method: 9% on NYT and 16% on WebNLG (F1 score). Our code is available at https://github.com/WindChimeRan/CopyMTL
Tasks Entity Extraction, Multi-Task Learning, Relation Extraction
Published 2019-11-24
URL https://arxiv.org/abs/1911.10438v1
PDF https://arxiv.org/pdf/1911.10438v1.pdf
PWC https://paperswithcode.com/paper/copymtl-copy-mechanism-for-joint-extraction
Repo https://github.com/WindChimeRan/CopyMTL
Framework pytorch
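
At the heart of any copy mechanism is a distribution over source positions derived from attention scores, from which entity tokens are copied rather than generated. The snippet below is a bare-bones hedged sketch of that single step, not CopyMTL’s full seq2seq model; dimensions are illustrative.

```python
import torch
import torch.nn.functional as F

def copy_distribution(dec_state, enc_states):
    """dec_state: (d,), enc_states: (src_len, d) -> (src_len,) copy probs.
    Dot-product attention scores double as the copy distribution."""
    scores = enc_states @ dec_state
    return F.softmax(scores, dim=0)   # probability of copying source token i
```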

Pykg2vec: A Python Library for Knowledge Graph Embedding

Title Pykg2vec: A Python Library for Knowledge Graph Embedding
Authors Shih Yuan Yu, Sujit Rokka Chhetri, Arquimedes Canedo, Palash Goyal, Mohammad Abdullah Al Faruque
Abstract Pykg2vec is a Python library for knowledge graph embedding and representation learning.
Tasks Graph Embedding, Knowledge Graph Embedding, Representation Learning
Published 2019-06-04
URL https://arxiv.org/abs/1906.04239v1
PDF https://arxiv.org/pdf/1906.04239v1.pdf
PWC https://paperswithcode.com/paper/pykg2vec-a-python-library-for-knowledge-graph
Repo https://github.com/Sujit-O/pykg2vec
Framework tf
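
For a sense of what such a library computes, here is a tiny numpy sketch of the TransE scoring function, one of the classic knowledge-graph-embedding models a toolkit like this typically implements. This is not pykg2vec’s actual API (unverified here), just the underlying idea: for a true triple (h, r, t), the translated head h + r should land near the tail t.

```python
import numpy as np

def transe_score(h, r, t, norm=1):
    """Higher score = more plausible triple under the translation model."""
    return -np.linalg.norm(h + r - t, ord=norm)
```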

Self-training with Noisy Student improves ImageNet classification

Title Self-training with Noisy Student improves ImageNet classification
Authors Qizhe Xie, Minh-Thang Luong, Eduard Hovy, Quoc V. Le
Abstract We present a simple self-training method that achieves 88.4% top-1 accuracy on ImageNet, which is 2.0% better than the state-of-the-art model that requires 3.5B weakly labeled Instagram images. On robustness test sets, it improves ImageNet-A top-1 accuracy from 61.0% to 83.7%, reduces ImageNet-C mean corruption error from 45.7 to 28.3, and reduces ImageNet-P mean flip rate from 27.8 to 12.2. To achieve this result, we first train an EfficientNet model on labeled ImageNet images and use it as a teacher to generate pseudo labels on 300M unlabeled images. We then train a larger EfficientNet as a student model on the combination of labeled and pseudo-labeled images. We iterate this process by putting the student back as the teacher. During the generation of the pseudo labels, the teacher is not noised, so that the pseudo labels are as accurate as possible. However, during the learning of the student, we inject noise such as dropout, stochastic depth, and data augmentation via RandAugment into the student, so that the student generalizes better than the teacher.
Tasks Data Augmentation, Image Classification
Published 2019-11-11
URL https://arxiv.org/abs/1911.04252v2
PDF https://arxiv.org/pdf/1911.04252v2.pdf
PWC https://paperswithcode.com/paper/self-training-with-noisy-student-improves
Repo https://github.com/adventure2165/Summarization_self-training_with_noisy_student_improves_imagenet_classification
Framework none
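
The asymmetry the abstract describes, a clean teacher but a noised student, is easiest to see in the pseudo-labeling step. Below is a hedged PyTorch sketch of that step, assuming `teacher` is any image classifier and `unlabeled_loader` yields image batches; keeping soft labels is one of the variants the paper allows.

```python
import torch

@torch.no_grad()
def pseudo_label(teacher, unlabeled_loader, device="cpu"):
    """Run the UN-noised teacher over unlabeled images to collect
    (batch, soft pseudo-label) pairs for student training."""
    teacher.eval()                    # eval mode: no dropout/stochastic depth
    pairs = []
    for x in unlabeled_loader:
        probs = torch.softmax(teacher(x.to(device)), dim=1)
        pairs.append((x, probs))
    return pairs
```

The student is then trained on labeled plus pseudo-labeled data with dropout, stochastic depth, and RandAugment enabled, and swapped in as the next round’s teacher.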

One-pass Multi-task Networks with Cross-task Guided Attention for Brain Tumor Segmentation

Title One-pass Multi-task Networks with Cross-task Guided Attention for Brain Tumor Segmentation
Authors Chenhong Zhou, Changxing Ding, Xinchao Wang, Zhentai Lu, Dacheng Tao
Abstract Class imbalance has emerged as one of the major challenges for medical image segmentation. The model cascade (MC) strategy significantly alleviates the class imbalance issue by running a set of individual deep models for coarse-to-fine segmentation. Despite its outstanding performance, however, this method leads to undesired system complexity and ignores the correlation among the models. To handle these flaws, we propose a lightweight deep model, the One-pass Multi-task Network (OM-Net), which solves class imbalance better than MC does while requiring only one-pass computation. First, OM-Net integrates the separate segmentation tasks into one deep model, which consists of shared parameters to learn joint features as well as task-specific parameters to learn discriminative features. Second, to optimize OM-Net more effectively, we take advantage of the correlation among tasks to design both an online training data transfer strategy and a curriculum learning-based training strategy. Third, we further propose sharing prediction results between tasks and design a cross-task guided attention (CGA) module which can adaptively recalibrate channel-wise feature responses based on category-specific statistics. Finally, a simple yet effective post-processing method is introduced to refine the segmentation results. Extensive experiments are conducted to demonstrate the effectiveness of the proposed techniques. Most impressively, we achieve state-of-the-art performance on the BraTS 2015 testing set and the BraTS 2017 online validation set. Using the proposed approaches, we also won joint third place in the BraTS 2018 challenge among 64 participating teams. The code is publicly available at https://github.com/chenhong-zhou/OM-Net.
Tasks Brain Tumor Segmentation, Medical Image Segmentation, Semantic Segmentation
Published 2019-06-05
URL https://arxiv.org/abs/1906.01796v2
PDF https://arxiv.org/pdf/1906.01796v2.pdf
PWC https://paperswithcode.com/paper/one-pass-multi-task-networks-with-cross-task
Repo https://github.com/chenhong-zhou/OM-Net
Framework none
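
As a rough sketch of the cross-task guidance idea, the PyTorch module below recalibrates one task’s channel responses using squeeze-and-excitation-style statistics pooled from another task’s prediction. OM-Net’s actual CGA module uses richer category-specific statistics; the tensor shapes and reduction factor here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CrossTaskGuidedAttention(nn.Module):
    def __init__(self, ch, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(ch, ch // reduction), nn.ReLU(inplace=True),
            nn.Linear(ch // reduction, ch), nn.Sigmoid())

    def forward(self, feats, guidance):
        # guidance: the preceding task's prediction, (B, C, D, H, W),
        # reduced to per-channel statistics that gate `feats`.
        stats = guidance.mean(dim=(2, 3, 4))           # (B, C)
        w = self.fc(stats)[:, :, None, None, None]     # channel weights
        return feats * w                               # recalibrated features
```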