February 1, 2020


Paper Group AWR 132


Performance and Comparisons of STDP based and Non-STDP based Memristive Neural Networks on Hardware

Title Performance and Comparisons of STDP based and Non-STDP based Memristive Neural Networks on Hardware
Authors Zhiri Tang
Abstract With the development of research on memristors, memristive neural networks (MNNs) have recently become a hot research topic. Because memristors can mimic spike timing-dependent plasticity (STDP), research on STDP-based MNNs is growing rapidly. However, although state-of-the-art STDP-based MNNs have many applications, such as pattern recognition, the STDP mechanism brings a relatively complex hardware framework and low processing speed, which hinder MNNs’ hardware realization. This paper constructs a non-STDP based unsupervised MNN. Compared with the STDP method on two common structures, feedforward and crossbar, non-STDP based MNNs not only retain the advantages of STDP-based MNNs, including high accuracy and fast convergence in pattern recognition, but also deliver better hardware performance, requiring fewer hardware resources and achieving higher processing speed. By combining memristive characteristics with a simple mechanism, non-STDP based MNNs offer better hardware compatibility, which may provide a new viewpoint for the engineering applications of memristive neural networks.
Tasks
Published 2019-07-22
URL https://arxiv.org/abs/1907.09126v4
PDF https://arxiv.org/pdf/1907.09126v4.pdf
PWC https://paperswithcode.com/paper/non-stdp-based-unsupervised-memristive-neural
Repo https://github.com/GerinTang/InnovateFPGA2018_PR039
Framework none
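
To make the STDP-versus-non-STDP comparison concrete, here is a minimal sketch of the classic pair-based STDP update rule; the constants are illustrative textbook values, not taken from the paper.

```python
import numpy as np

def stdp_dw(dt, a_plus=0.05, a_minus=0.055, tau_plus=20.0, tau_minus=20.0):
    """Weight change for a spike-time difference dt = t_post - t_pre (ms)."""
    if dt > 0:   # pre fires before post: potentiation
        return a_plus * np.exp(-dt / tau_plus)
    return -a_minus * np.exp(dt / tau_minus)  # post before pre: depression

print(stdp_dw(5.0), stdp_dw(-5.0))
```

The per-spike-pair exponential bookkeeping is what makes an STDP circuit comparatively expensive in hardware, which motivates the paper’s non-STDP alternative.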

ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT)

Title ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT)
Authors Chee-Kheng Chng, Yuliang Liu, Yipeng Sun, Chun Chet Ng, Canjie Luo, Zihan Ni, ChuanMing Fang, Shuaitao Zhang, Junyu Han, Errui Ding, Jingtuo Liu, Dimosthenis Karatzas, Chee Seng Chan, Lianwen Jin
Abstract This paper reports on the ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT), which consists of three major challenges: i) scene text detection, ii) scene text recognition, and iii) scene text spotting. A total of 78 submissions from 46 unique teams/individuals were received for this competition. The top performing score of each challenge is as follows: i) T1 - 82.65%, ii) T2.1 - 74.3%, iii) T2.2 - 85.32%, iv) T3.1 - 53.86%, and v) T3.2 - 54.91%. Apart from the results, this paper also details the ArT dataset, task descriptions, evaluation metrics, and the participants’ methods. The dataset, the evaluation kit, and the results are publicly available at https://rrc.cvc.uab.es/?ch=14
Tasks Scene Text Detection, Scene Text Recognition, Text Spotting
Published 2019-09-16
URL https://arxiv.org/abs/1909.07145v1
PDF https://arxiv.org/pdf/1909.07145v1.pdf
PWC https://paperswithcode.com/paper/icdar2019-robust-reading-challenge-on
Repo https://github.com/cs-chan/Total-Text-Dataset
Framework none

A Good Sample is Hard to Find: Noise Injection Sampling and Self-Training for Neural Language Generation Models

Title A Good Sample is Hard to Find: Noise Injection Sampling and Self-Training for Neural Language Generation Models
Authors Chris Kedzie, Kathleen McKeown
Abstract Deep neural networks (DNN) are quickly becoming the de facto standard modeling method for many natural language generation (NLG) tasks. In order for such models to truly be useful, they must be capable of correctly generating utterances for novel meaning representations (MRs) at test time. In practice, even sophisticated DNNs with various forms of semantic control frequently fail to generate utterances faithful to the input MR. In this paper, we propose an architecture-agnostic self-training method to sample novel MR/text utterance pairs to augment the original training data. Remarkably, after training on the augmented data, even simple encoder-decoder models with greedy decoding are capable of generating semantically correct utterances that are as good as state-of-the-art outputs in both automatic and human evaluations of quality.
Tasks Text Generation
Published 2019-11-08
URL https://arxiv.org/abs/1911.03373v1
PDF https://arxiv.org/pdf/1911.03373v1.pdf
PWC https://paperswithcode.com/paper/a-good-sample-is-hard-to-find-noise-injection
Repo https://github.com/kedz/noiseylg
Framework pytorch
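
A toy sketch of the noise-injection sampling idea: perturb a decoder state, decode greedily, and keep only candidates a filter model judges faithful to the input MR. The one-matrix “decoder” and all names are illustrative stand-ins, not the authors’ code.

```python
import torch

torch.manual_seed(0)
vocab_size, hidden = 12, 8
W = torch.randn(vocab_size, hidden)  # toy stand-in for a trained decoder

def sample_with_noise(h, sigma, steps=5):
    """Greedy-decode a short token sequence from a noise-perturbed state."""
    h = h + sigma * torch.randn_like(h)   # noise injection
    tokens = []
    for _ in range(steps):
        tok = int(torch.argmax(W @ h))
        tokens.append(tok)
        h = torch.tanh(h + 0.1 * W[tok])  # toy state update
    return tokens

h = torch.randn(hidden)
candidates = [sample_with_noise(h, sigma=0.5) for _ in range(10)]
# A semantic filter would now discard unfaithful candidates; the survivors
# become extra MR/utterance pairs for self-training.
```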

Kandinsky Patterns

Title Kandinsky Patterns
Authors Heimo Mueller, Andreas Holzinger
Abstract Kandinsky Figures and Kandinsky Patterns are mathematically describable, simple, self-contained, and hence controllable test data sets for the development, validation, and training of explainability in artificial intelligence. Whilst Kandinsky Patterns have these computationally manageable properties, they are at the same time easily distinguishable by human observers. Consequently, controlled patterns can be described by both humans and computers. We define a Kandinsky Pattern as a set of Kandinsky Figures, where for each figure an “infallible authority” defines whether the figure belongs to the Kandinsky Pattern. With this simple principle we build training and validation data sets for automatic interpretability and context learning. In this paper we describe the basic idea and some underlying principles of Kandinsky Patterns, and we provide a GitHub repository to invite the international machine learning research community to experiment with our Kandinsky Patterns, to make progress in the field of explainable AI, and to contribute to the emerging field of explainability and causability.
Tasks
Published 2019-06-03
URL https://arxiv.org/abs/1906.00657v1
PDF https://arxiv.org/pdf/1906.00657v1.pdf
PWC https://paperswithcode.com/paper/190600657
Repo https://github.com/human-centered-ai-lab/dat-kandinsky-patterns
Framework none
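
The “infallible authority” is simply a ground-truth predicate over figures, so a labeled dataset can be generated programmatically. A hedged sketch with a hypothetical membership rule:

```python
import random

SHAPES = ["circle", "square", "triangle"]
COLORS = ["red", "blue", "yellow"]

def random_figure(n_objects=4):
    """A Kandinsky Figure as a list of (shape, color, x, y) objects."""
    return [(random.choice(SHAPES), random.choice(COLORS),
             random.random(), random.random()) for _ in range(n_objects)]

def authority(figure):
    """Hypothetical 'infallible authority': membership rule chosen for
    illustration only -- the figure contains at least one red circle."""
    return any(s == "circle" and c == "red" for s, c, _, _ in figure)

dataset = [(f, authority(f)) for f in (random_figure() for _ in range(1000))]
```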

Reinforcement Learning Upside Down: Don’t Predict Rewards – Just Map Them to Actions

Title Reinforcement Learning Upside Down: Don’t Predict Rewards – Just Map Them to Actions
Authors Juergen Schmidhuber
Abstract We transform reinforcement learning (RL) into a form of supervised learning (SL) by turning traditional RL on its head, calling this Upside Down RL (UDRL). Standard RL predicts rewards, while UDRL instead uses rewards as task-defining inputs, together with representations of time horizons and other computable functions of historic and desired future data. UDRL learns to interpret these input observations as commands, mapping them to actions (or action probabilities) through SL on past (possibly accidental) experience. UDRL generalizes to achieve high rewards or other goals, through input commands such as: get lots of reward within at most so much time! A separate paper [61] on first experiments with UDRL shows that even a pilot version of UDRL can outperform traditional baseline algorithms on certain challenging RL problems. We also introduce a related simple but general approach for teaching a robot to imitate humans. First videotape humans imitating the robot’s current behaviors, then let the robot learn through SL to map the videos (as input commands) to these behaviors, then let it generalize and imitate videos of humans executing previously unknown behavior. This Imitate-Imitator concept may actually explain why biological evolution has resulted in parents who imitate the babbling of their babies.
Tasks
Published 2019-12-05
URL https://arxiv.org/abs/1912.02875v1
PDF https://arxiv.org/pdf/1912.02875v1.pdf
PWC https://paperswithcode.com/paper/reinforcement-learning-upside-down-dont
Repo https://github.com/haron1100/Upside-Down-Reinforcement-Learning
Framework pytorch
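
A minimal sketch of the UDRL behavior function: supervised learning maps an observation plus a command (desired return, remaining horizon) to the action that was actually taken in past experience. Architecture and dimensions are illustrative, not from the paper.

```python
import torch
import torch.nn as nn

class BehaviorFunction(nn.Module):
    """Maps (observation, desired return, horizon) to action logits."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions))

    def forward(self, obs, desired_return, horizon):
        cmd = torch.stack([desired_return, horizon], dim=-1)
        return self.net(torch.cat([obs, cmd], dim=-1))

model = BehaviorFunction(obs_dim=4, n_actions=2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Toy replay batch: for step t of a stored episode, the command is the
# return actually obtained from t onward and the remaining episode length.
obs = torch.randn(32, 4)
ret = torch.randn(32)                        # observed returns-to-go
hor = torch.randint(1, 100, (32,)).float()   # remaining steps
act = torch.randint(0, 2, (32,))             # actions actually taken

loss = loss_fn(model(obs, ret, hor), act)    # plain supervised learning
opt.zero_grad(); loss.backward(); opt.step()
```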

Well-Read Students Learn Better: On the Importance of Pre-training Compact Models

Title Well-Read Students Learn Better: On the Importance of Pre-training Compact Models
Authors Iulia Turc, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
Abstract Recent developments in natural language representations have been accompanied by large and expensive models that leverage vast amounts of general-domain text through self-supervised pre-training. Due to the cost of applying such models to down-stream tasks, several model compression techniques for pre-trained language representations have been proposed (Sun et al., 2019; Sanh, 2019). However, surprisingly, the simple baseline of just pre-training and fine-tuning compact models has been overlooked. In this paper, we first show that pre-training remains important in the context of smaller architectures, and that fine-tuning pre-trained compact models can be competitive with more elaborate methods proposed in concurrent work. Starting with pre-trained compact models, we then explore transferring task knowledge from large fine-tuned models through standard knowledge distillation. The resulting simple yet effective and general algorithm, Pre-trained Distillation, brings further improvements. Through extensive experiments, we more generally explore the interaction between pre-training and distillation under two variables that have been under-studied: model size and properties of unlabeled task data. One surprising observation is that they have a compound effect even when applied sequentially on the same data. To accelerate future research, we make our 24 pre-trained miniature BERT models publicly available.
Tasks Language Modelling, Model Compression, Sentiment Analysis
Published 2019-08-23
URL https://arxiv.org/abs/1908.08962v2
PDF https://arxiv.org/pdf/1908.08962v2.pdf
PWC https://paperswithcode.com/paper/well-read-students-learn-better-the-impact-of
Repo https://github.com/google-research/bert
Framework tf
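
The distillation step of Pre-trained Distillation is standard soft-label knowledge distillation applied on top of a pre-trained compact student. A minimal sketch of that loss; the temperature T is a common knob, not a claim about the paper’s settings.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence between temperature-softened teacher and student
    distributions, scaled by T^2 as is conventional."""
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

loss = distillation_loss(torch.randn(8, 5), torch.randn(8, 5))
```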

Deep Hough Voting for 3D Object Detection in Point Clouds

Title Deep Hough Voting for 3D Object Detection in Point Clouds
Authors Charles R. Qi, Or Litany, Kaiming He, Leonidas J. Guibas
Abstract Current 3D object detection methods are heavily influenced by 2D detectors. In order to leverage architectures from 2D detectors, they often convert 3D point clouds to regular grids (i.e., voxel grids or bird’s-eye-view images), or rely on detection in 2D images to propose 3D boxes. Few works have attempted to directly detect objects in point clouds. In this work, we return to first principles to construct a 3D detection pipeline for point cloud data that is as generic as possible. However, due to the sparse nature of the data – samples from 2D manifolds in 3D space – we face a major challenge when directly predicting bounding box parameters from scene points: a 3D object centroid can be far from any surface point and thus hard to regress accurately in one step. To address this challenge, we propose VoteNet, an end-to-end 3D object detection network based on a synergy of deep point set networks and Hough voting. With a simple design, compact model size, and high efficiency, our model achieves state-of-the-art 3D detection on two large datasets of real 3D scans, ScanNet and SUN RGB-D. Remarkably, VoteNet outperforms previous methods using purely geometric information, without relying on color images.
Tasks 3D Object Detection, Object Detection
Published 2019-04-21
URL https://arxiv.org/abs/1904.09664v2
PDF https://arxiv.org/pdf/1904.09664v2.pdf
PWC https://paperswithcode.com/paper/deep-hough-voting-for-3d-object-detection-in
Repo https://github.com/qq456cvb/VoteNet
Framework tf
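
The core voting step can be sketched in a few lines: each seed point regresses an offset to the centroid of the object it belongs to. The linear layer here is a toy stand-in for VoteNet’s vote layers.

```python
import torch
import torch.nn as nn

C = 16                                 # toy feature width
vote_mlp = nn.Linear(C, 3 + C)         # stand-in for VoteNet's vote MLP

seed_xyz = torch.randn(1024, 3)        # seed points sampled from the scene
seed_feat = torch.randn(1024, C)       # their learned features

out = vote_mlp(seed_feat)
votes_xyz = seed_xyz + out[:, :3]      # each seed votes: centroid = seed + offset
votes_feat = seed_feat + out[:, 3:]
# Votes are then clustered, and each cluster is aggregated into one 3D box proposal.
```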

Object landmark discovery through unsupervised adaptation

Title Object landmark discovery through unsupervised adaptation
Authors Enrique Sanchez, Georgios Tzimiropoulos
Abstract This paper proposes a method to ease the unsupervised learning of object landmark detectors. Similarly to previous methods, our approach is fully unsupervised in the sense that it does not require or make any use of annotated landmarks for the target object category. Contrary to previous works, we do however assume that a landmark detector, which has already learned a structured representation for a given object category in a fully supervised manner, is available. Under this setting, our main idea boils down to adapting the given pre-trained network to the target object categories in a fully unsupervised manner. To this end, our method uses the pre-trained network as a core that remains frozen and is not updated during training, and learns, in an unsupervised manner, only a projection matrix to perform the adaptation to the target categories. By building upon an existing structured representation learned in a supervised manner, the optimization problem solved by our method is much more constrained, with significantly fewer parameters to learn, which seems to be important for unsupervised learning. We show that our method surpasses fully unsupervised techniques trained from scratch, as well as a strong baseline based on fine-tuning, and produces state-of-the-art results on several datasets. Code can be found at https://github.com/ESanchezLozano/SAIC-Unsupervised-landmark-detection-NeurIPS2019 .
Tasks
Published 2019-10-21
URL https://arxiv.org/abs/1910.09469v1
PDF https://arxiv.org/pdf/1910.09469v1.pdf
PWC https://paperswithcode.com/paper/object-landmark-discovery-through
Repo https://github.com/ESanchezLozano/SAIC-Unsupervised-landmark-detection-NeurIPS2019
Framework pytorch
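
A minimal sketch of the adaptation setup: the pre-trained core stays frozen and only a projection is learned. The one-layer “backbone” is a toy stand-in for the supervised landmark detector.

```python
import torch
import torch.nn as nn

feat_dim, n_landmarks = 256, 10
backbone = nn.Conv2d(3, feat_dim, 3, padding=1)   # stand-in for the frozen core
for p in backbone.parameters():
    p.requires_grad = False                        # the core is never updated

projection = nn.Conv2d(feat_dim, n_landmarks, 1)   # the only learned weights

x = torch.randn(2, 3, 64, 64)
heatmaps = projection(backbone(x))  # adapted landmark heatmaps for the new category
```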

Gradio: Hassle-Free Sharing and Testing of ML Models in the Wild

Title Gradio: Hassle-Free Sharing and Testing of ML Models in the Wild
Authors Abubakar Abid, Ali Abdalla, Ali Abid, Dawood Khan, Abdulrahman Alfozan, James Zou
Abstract Accessibility is a major challenge of machine learning (ML). Typical ML models are built by specialists and require specialized hardware/software as well as ML experience to validate. This makes it challenging for non-technical collaborators and end users (e.g., physicians) to easily provide feedback on model development and to gain trust in ML. The accessibility challenge also makes collaboration more difficult and limits the ML researcher’s exposure to realistic data and scenarios that occur in the wild. To improve accessibility and facilitate collaboration, we developed an open-source Python package, Gradio, which allows researchers to rapidly generate a visual interface for their ML models. Gradio makes accessing any ML model as easy as sharing a URL. Our development of Gradio was informed by interviews with a number of machine learning researchers who participate in interdisciplinary collaborations. Their feedback identified that Gradio should support a variety of interfaces and frameworks, allow for easy sharing of the interface, allow for input manipulation and interactive inference by the domain expert, and allow embedding the interface in IPython notebooks. We developed these features and carried out a case study to understand Gradio’s usefulness and usability in the setting of a machine learning collaboration between a researcher and a cardiologist.
Tasks
Published 2019-06-06
URL https://arxiv.org/abs/1906.02569v1
PDF https://arxiv.org/pdf/1906.02569v1.pdf
PWC https://paperswithcode.com/paper/gradio-hassle-free-sharing-and-testing-of-ml
Repo https://github.com/gradio-app/gradio-UI
Framework tf
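
A minimal usage example in the spirit of the paper; the interface strings follow Gradio’s documented API, while the classifier itself is a stand-in.

```python
import gradio as gr

def classify(text):
    # Stand-in for a real model call, e.g. model.predict(text).
    return {"positive": 0.7, "negative": 0.3}

# One call builds and serves a shareable web UI for the model.
gr.Interface(fn=classify, inputs="text", outputs="label").launch()
```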

Looking for the Devil in the Details: Learning Trilinear Attention Sampling Network for Fine-grained Image Recognition

Title Looking for the Devil in the Details: Learning Trilinear Attention Sampling Network for Fine-grained Image Recognition
Authors Heliang Zheng, Jianlong Fu, Zheng-Jun Zha, Jiebo Luo
Abstract Learning subtle yet discriminative features (e.g., beak and eyes for a bird) plays a significant role in fine-grained image recognition. Existing attention-based approaches localize and amplify significant parts to learn fine-grained details, but they often suffer from a limited number of parts and heavy computational cost. In this paper, we propose to learn such fine-grained features from hundreds of part proposals with a Trilinear Attention Sampling Network (TASN) in an efficient teacher-student manner. Specifically, TASN consists of 1) a trilinear attention module, which generates attention maps by modeling the inter-channel relationships, 2) an attention-based sampler, which highlights attended parts at high resolution, and 3) a feature distiller, which distills part features into a global one via weight-sharing and feature-preserving strategies. Extensive experiments verify that TASN yields the best performance under the same settings compared with the most competitive approaches on the iNaturalist-2017, CUB-Bird, and Stanford-Cars datasets.
Tasks Fine-Grained Image Classification, Fine-Grained Image Recognition
Published 2019-03-14
URL https://arxiv.org/abs/1903.06150v2
PDF https://arxiv.org/pdf/1903.06150v2.pdf
PWC https://paperswithcode.com/paper/looking-for-the-devil-in-the-details-learning
Repo https://github.com/researchmm/tasn
Framework mxnet
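
One common reading of the trilinear attention module is softmax(X X^T) X over the reshaped feature map X of shape (C, HW); the paper applies additional normalizations, so treat this as an approximate sketch.

```python
import torch

def trilinear_attention(feat):
    """feat: (C, H, W) -> (C, H, W) attention maps via inter-channel relations."""
    C, H, W = feat.shape
    X = feat.reshape(C, H * W)
    rel = torch.softmax(X @ X.T, dim=-1)   # (C, C) inter-channel relationships
    return (rel @ X).reshape(C, H, W)

maps = trilinear_attention(torch.randn(64, 14, 14))
```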

Low-Rank Tucker Approximation of a Tensor From Streaming Data

Title Low-Rank Tucker Approximation of a Tensor From Streaming Data
Authors Yiming Sun, Yang Guo, Charlene Luo, Joel Tropp, Madeleine Udell
Abstract This paper describes a new algorithm for computing a low-Tucker-rank approximation of a tensor. The method applies a randomized linear map to the tensor to obtain a sketch that captures the important directions within each mode, as well as the interactions among the modes. The sketch can be extracted from streaming or distributed data, or with a single pass over the tensor, and it uses storage proportional to the degrees of freedom in the output Tucker approximation. The algorithm does not require a second pass over the tensor, although it can exploit another view to compute a superior approximation. The paper provides a rigorous theoretical guarantee on the approximation error. Extensive numerical experiments show that the algorithm produces useful results that improve on the state of the art for streaming Tucker decomposition.
Tasks
Published 2019-04-24
URL http://arxiv.org/abs/1904.10951v1
PDF http://arxiv.org/pdf/1904.10951v1.pdf
PWC https://paperswithcode.com/paper/low-rank-tucker-approximation-of-a-tensor
Repo https://github.com/udellgroup/tensorsketch
Framework tf
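
A numpy sketch of the one-pass idea: because the sketch is linear in the tensor, each mode unfolding can be compressed with a random map as the data streams in. The dense Gaussian maps here are for clarity only; the paper uses more memory-efficient structured maps.

```python
import numpy as np

def mode_unfold(T, mode):
    """Unfold tensor T along the given mode into a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def one_pass_sketch(T, ranks, seed=0):
    """Sketch every mode unfolding with a random linear map."""
    rng = np.random.default_rng(seed)
    sketches = []
    for m, r in enumerate(ranks):
        unfold = mode_unfold(T, m)                      # (n_m, prod of others)
        omega = rng.standard_normal((unfold.shape[1], 2 * r))
        sketches.append(unfold @ omega)                 # (n_m, 2r)
    return sketches

T = np.random.rand(20, 30, 40)
sketches = one_pass_sketch(T, ranks=(5, 5, 5))
# Factor estimates: orthonormal bases of each mode sketch, e.g. via QR.
Q = [np.linalg.qr(S)[0] for S in sketches]
```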

Hyperparameter-Free Out-of-Distribution Detection Using Softmax of Scaled Cosine Similarity

Title Hyperparameter-Free Out-of-Distribution Detection Using Softmax of Scaled Cosine Similarity
Authors Engkarat Techapanurak, Masanori Suganuma, Takayuki Okatani
Abstract The ability to detect out-of-distribution (OOD) samples is vital to secure the reliability of deep neural networks in real-world applications. Considering the nature of OOD samples, detection methods should not have hyperparameters that need to be tuned depending on incoming OOD samples. However, most of the recently proposed methods do not meet this requirement, leading to compromised performance in real-world applications. In this paper, we propose a simple, hyperparameter-free method based on softmax of scaled cosine similarity. It resembles the approach employed by modern metric learning methods, but it differs in details; the differences are essential to achieve high detection performance. We show through experiments that our method outperforms the existing methods on the evaluation test recently proposed by Shafaei et al., which takes the above issue of hyperparameter dependency into account. We also show that it achieves at least comparable performance to other methods on the conventional test, where their hyperparameters are chosen using explicit OOD samples. Furthermore, it is computationally more efficient than most of the previous methods, since it needs only a single forward pass.
Tasks Metric Learning, Out-of-Distribution Detection
Published 2019-05-25
URL https://arxiv.org/abs/1905.10628v3
PDF https://arxiv.org/pdf/1905.10628v3.pdf
PWC https://paperswithcode.com/paper/hyperparameter-free-out-of-distribution
Repo https://github.com/engkarat/cosine-ood-detector
Framework pytorch
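
A sketch of the final layer: logits are scaled cosine similarities between the normalized feature and per-class weights. For brevity the scale is a single learned scalar here; the paper’s exact parameterization of the scale differs, but crucially it is learned rather than hand-tuned.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaledCosineSoftmax(nn.Module):
    def __init__(self, feat_dim, n_classes):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_classes, feat_dim))
        self.scale = nn.Parameter(torch.tensor(10.0))  # learned, never hand-tuned

    def forward(self, feat):
        cos = F.linear(F.normalize(feat), F.normalize(self.weight))
        return self.scale * cos  # logits = scale * cosine similarity

layer = ScaledCosineSoftmax(128, 10)
logits = layer(torch.randn(4, 128))
ood_score = -logits.softmax(-1).max(-1).values  # low max confidence -> likely OOD
```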

Deep Metric Learning Beyond Binary Supervision

Title Deep Metric Learning Beyond Binary Supervision
Authors Sungyeon Kim, Minkyo Seo, Ivan Laptev, Minsu Cho, Suha Kwak
Abstract Metric Learning for visual similarity has mostly adopted binary supervision indicating whether a pair of images are of the same class or not. Such a binary indicator covers only a limited subset of image relations, and is not sufficient to represent semantic similarity between images described by continuous and/or structured labels such as object poses, image captions, and scene graphs. Motivated by this, we present a novel method for deep metric learning using continuous labels. First, we propose a new triplet loss that allows distance ratios in the label space to be preserved in the learned metric space. The proposed loss thus enables our model to learn the degree of similarity rather than just the order. Furthermore, we design a triplet mining strategy adapted to metric learning with continuous labels. We address three different image retrieval tasks with continuous labels in terms of human poses, room layouts and image captions, and demonstrate the superior performance of our approach compared to previous methods.
Tasks Image Captioning, Image Retrieval, Metric Learning, Semantic Similarity, Semantic Textual Similarity
Published 2019-04-21
URL http://arxiv.org/abs/1904.09626v1
PDF http://arxiv.org/pdf/1904.09626v1.pdf
PWC https://paperswithcode.com/paper/deep-metric-learning-beyond-binary
Repo https://github.com/tjddus9597/Beyond-Binary-Supervision-CVPR19
Framework pytorch
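
A sketch of a log-ratio triplet loss that matches distance ratios in label space, in the spirit of the paper; the exact form and distance choices are the paper’s, so treat this as an approximation.

```python
import torch

def log_ratio_loss(f_a, f_i, f_j, d_label_i, d_label_j, eps=1e-8):
    """(log d(f_a,f_i)/d(f_a,f_j) - log D(y_a,y_i)/D(y_a,y_j))^2, so that
    distance ratios in the continuous label space are preserved in the
    learned embedding space."""
    d_i = (f_a - f_i).pow(2).sum(-1) + eps
    d_j = (f_a - f_j).pow(2).sum(-1) + eps
    t = torch.log(d_i / d_j) - torch.log((d_label_i + eps) / (d_label_j + eps))
    return t.pow(2).mean()

f = torch.randn(4, 128)
loss = log_ratio_loss(f, torch.randn(4, 128), torch.randn(4, 128),
                      torch.rand(4), torch.rand(4))
```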

Gliding vertex on the horizontal bounding box for multi-oriented object detection

Title Gliding vertex on the horizontal bounding box for multi-oriented object detection
Authors Yongchao Xu, Mingtao Fu, Qimeng Wang, Yukang Wang, Kai Chen, Gui-Song Xia, Xiang Bai
Abstract Object detection has recently experienced substantial progress. Yet, the widely adopted horizontal bounding box representation is not appropriate for ubiquitous oriented objects such as objects in aerial images and scene texts. In this paper, we propose a simple yet effective framework to detect multi-oriented objects. Instead of directly regressing the four vertices, we glide each vertex of the horizontal bounding box along its corresponding side to accurately describe a multi-oriented object. Specifically, we regress four length ratios characterizing the relative gliding offset on each corresponding side. This facilitates the offset learning and avoids the confusion of sequential label points for oriented objects. To further remedy the confusion for nearly horizontal objects, we also introduce an obliquity factor, based on the area ratio between the object and its horizontal bounding box, that guides the selection of horizontal or oriented detection for each object. We add these five extra target variables to the regression head of Fast R-CNN, which requires negligible extra computation time. Extensive experimental results demonstrate that, without bells and whistles, the proposed method achieves superior performance on multiple multi-oriented object detection benchmarks, including object detection in aerial images, scene text detection, and pedestrian detection in fisheye images.
Tasks Object Detection, Object Detection In Aerial Images, Pedestrian Detection, Scene Text Detection
Published 2019-11-21
URL https://arxiv.org/abs/1911.09358v1
PDF https://arxiv.org/pdf/1911.09358v1.pdf
PWC https://paperswithcode.com/paper/gliding-vertex-on-the-horizontal-bounding-box
Repo https://github.com/xuannianz/EfficientDet
Framework tf
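
The box parameterization decodes easily: each vertex of the oriented quadrilateral glides along one side of the horizontal box by a regressed length ratio. A hedged decoding sketch:

```python
import numpy as np

def decode_gliding_vertex(hbox, ratios):
    """hbox: (xmin, ymin, xmax, ymax); ratios: four gliding ratios in [0, 1]."""
    xmin, ymin, xmax, ymax = hbox
    w, h = xmax - xmin, ymax - ymin
    a1, a2, a3, a4 = ratios
    return np.array([
        [xmin + a1 * w, ymin],   # glides right along the top side
        [xmax, ymin + a2 * h],   # glides down along the right side
        [xmax - a3 * w, ymax],   # glides left along the bottom side
        [xmin, ymax - a4 * h],   # glides up along the left side
    ])

quad = decode_gliding_vertex((10, 20, 110, 60), (0.3, 0.4, 0.3, 0.4))
```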

SuSi: Supervised Self-Organizing Maps for Regression and Classification in Python

Title SuSi: Supervised Self-Organizing Maps for Regression and Classification in Python
Authors Felix M. Riese, Sina Keller
Abstract In many research fields, the sizes of the existing datasets vary widely. Hence, there is a need for machine learning techniques that are well-suited to these different datasets. One possible technique is the self-organizing map (SOM), a type of artificial neural network that is, so far, weakly represented in the field of machine learning. The SOM’s unique characteristic is the neighborhood relationship of the output neurons, which improves its ability to generalize on small datasets. SOMs are mostly applied in unsupervised learning, and few studies focus on using SOMs as a supervised learning approach. Furthermore, no appropriate SOM package is available with respect to machine learning standards in the widely used programming language Python. In this paper, we introduce the freely available Supervised Self-organizing maps (SuSi) Python package, which performs supervised regression and classification. The implementation of SuSi is described with respect to the underlying mathematics. We then present first evaluations of the SOM for regression and classification datasets from two different domains of geospatial image analysis. Despite the early stage of its development, the SuSi framework performs well and is characterized by only small performance differences between the training and test datasets. A comparison of the SuSi framework with existing Python and R packages demonstrates its importance. In future work, the SuSi framework will be extended, optimized, and upgraded, e.g., with tools to better understand and visualize the input data and to handle missing and incomplete data.
Tasks
Published 2019-03-26
URL https://arxiv.org/abs/1903.11114v3
PDF https://arxiv.org/pdf/1903.11114v3.pdf
PWC https://paperswithcode.com/paper/susi-supervised-self-organizing-maps-for
Repo https://github.com/JustGlowing/minisom
Framework none
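
A hedged usage sketch assuming the sklearn-style estimators the paper describes; the class name SOMRegressor and its constructor arguments are assumptions based on the package’s documented interface.

```python
import numpy as np
import susi  # pip install susi

X, y = np.random.rand(100, 5), np.random.rand(100)

som = susi.SOMRegressor(n_rows=10, n_columns=10)  # assumed constructor args
som.fit(X, y)                                     # sklearn-style fit/predict
print(som.predict(X[:3]))
```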