October 20, 2019

3316 words 16 mins read

Paper Group AWR 281



Invocation-driven Neural Approximate Computing with a Multiclass-Classifier and Multiple Approximators

Title Invocation-driven Neural Approximate Computing with a Multiclass-Classifier and Multiple Approximators
Authors Haiyue Song, Chengwen Xu, Qiang Xu, Zhuoran Song, Naifeng Jing, Xiaoyao Liang, Li Jiang
Abstract Neural approximate computing achieves enormous energy efficiency at the cost of tolerable quality loss. A neural approximator maps input data to outputs, while a classifier determines whether the input data are safe to approximate with a quality guarantee. However, existing works cannot maximize the invocation of the approximator, resulting in limited speedup and energy savings. By exploring the mapping space of the target functions, we observe in this paper a nonuniform distribution of the approximation error incurred by the same approximator. We thus propose a novel approximate computing architecture with a Multiclass-Classifier and Multiple Approximators (MCMA). These approximators have identical network topologies and can thus share the same hardware resources in a neural processing unit (NPU) chip. At runtime, MCMA can swap in the invoked approximator by merely shipping its synapse weights from on-chip memory to the buffers near the MAC units within a cycle. We also propose efficient co-training methods for the MCMA architecture. Experimental results show a substantially higher invocation rate for MCMA as well as gains in energy efficiency. (A minimal weight-swapping sketch follows this entry.)
Tasks
Published 2018-10-19
URL http://arxiv.org/abs/1810.08379v1
PDF http://arxiv.org/pdf/1810.08379v1.pdf
PWC https://paperswithcode.com/paper/invocation-driven-neural-approximate
Repo https://github.com/shyyhs/MCMA
Framework none
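
A minimal NumPy sketch of the weight-swapping idea, not the authors' implementation: because all approximators share one topology, invoking one amounts to selecting a different set of synapse weights for the same forward pass. The classifier rule, shapes, and names below are purely illustrative.

```python
import numpy as np

def mlp_forward(x, weights):
    """Tiny fixed-topology approximator: two dense layers with ReLU."""
    (W1, b1), (W2, b2) = weights
    h = np.maximum(0, x @ W1 + b1)
    return h @ W2 + b2

# K approximators with identical topology: "swapping one in" is just
# pointing at a different weight set (hypothetical shapes).
rng = np.random.default_rng(0)
D_IN, D_H, D_OUT, K = 4, 8, 1, 3
approximators = [
    [(rng.normal(size=(D_IN, D_H)), np.zeros(D_H)),
     (rng.normal(size=(D_H, D_OUT)), np.zeros(D_OUT))]
    for _ in range(K)
]

def classify(x):
    """Stand-in for the multiclass classifier: picks which approximator
    handles x (an extra class could mean 'not safe to approximate')."""
    return int(abs(np.sum(x)) * 10) % K   # toy rule, purely illustrative

x = rng.normal(size=D_IN)
k = classify(x)
y = mlp_forward(x, approximators[k])      # invoke the selected approximator
```

In hardware the selection would ship weights from on-chip memory to the MAC-adjacent buffers; here it is a list index, which is the point of keeping the topologies identical.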

Stance Prediction for Russian: Data and Analysis

Title Stance Prediction for Russian: Data and Analysis
Authors Nikita Lozhnikov, Leon Derczynski, Manuel Mazzara
Abstract Stance detection is a critical component of rumour and fake news identification. It involves extracting the stance a particular author takes toward a given claim, both expressed in text. This paper investigates stance classification for Russian. It introduces a new dataset, RuStance, of Russian tweets and news comments from multiple sources and covering multiple stories, and applies text classification approaches to stance detection as benchmarks over this data. In addition to presenting this openly available dataset, the first of its kind for Russian, the paper provides a baseline for stance prediction in the language. (A toy baseline sketch follows this entry.)
Tasks Stance Detection
Published 2018-09-05
URL http://arxiv.org/abs/1809.01574v2
PDF http://arxiv.org/pdf/1809.01574v2.pdf
PWC https://paperswithcode.com/paper/stance-prediction-for-russian-data-and
Repo https://github.com/npenzin/rustance
Framework none
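
A toy scikit-learn baseline in the spirit of the paper's benchmarks, with hypothetical miniature data; the texts, model choice, and hyperparameters here are illustrative assumptions, not the paper's setup. Character n-grams are chosen because they cope well with rich Russian morphology.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical miniature of RuStance-style data: short Russian texts,
# each labelled with one of four stance classes.
texts = ["Это правда", "Это фейк", "Откуда информация?", "Интересно"]
labels = ["support", "deny", "query", "comment"]

# Character n-gram TF-IDF + logistic regression: a common strong baseline
# for morphologically rich languages such as Russian.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)
print(clf.predict(["Похоже на фейк"]))
```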

Deep Models of Interactions Across Sets

Title Deep Models of Interactions Across Sets
Authors Jason Hartford, Devon R Graham, Kevin Leyton-Brown, Siamak Ravanbakhsh
Abstract We use deep learning to model interactions across two or more sets of objects, such as user-movie ratings, protein-drug bindings, or ternary user-item-tag interactions. The canonical representation of such interactions is a matrix (or a higher-dimensional tensor) with an exchangeability property: the encoding’s meaning is not changed by permuting rows or columns. We argue that models should hence be Permutation Equivariant (PE): constrained to make the same predictions across such permutations. We present a parameter-sharing scheme and prove that it could not be made any more expressive without violating PE. This scheme yields three benefits. First, we demonstrate state-of-the-art performance on multiple matrix completion benchmarks. Second, our models require a number of parameters independent of the number of objects, and thus scale well to large datasets. Third, models can be queried about new objects that were not available at training time, but for which interactions have since been observed. In experiments, our models achieved surprisingly good generalization performance on this matrix extrapolation task, both within domains (e.g., new users and new movies drawn from the same distribution used for training) and even across domains (e.g., predicting music ratings after training on movies). (A minimal sketch of one such layer follows this entry.)
Tasks Matrix Completion, Recommendation Systems
Published 2018-03-07
URL http://arxiv.org/abs/1803.02879v2
PDF http://arxiv.org/pdf/1803.02879v2.pdf
PWC https://paperswithcode.com/paper/deep-models-of-interactions-across-sets
Repo https://github.com/mravanba/deep_exchangeable_tensors
Framework tf
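
A minimal single-channel NumPy sketch of a permutation-equivariant layer in the spirit of the paper's parameter-sharing scheme: each output entry depends on the entry itself, its row mean, its column mean, and the global mean, so permuting rows or columns of the input permutes the output identically. Names and values are illustrative, not the authors' code.

```python
import numpy as np

def exchangeable_layer(X, w, b):
    """One permutation-equivariant layer for an n x m interaction matrix."""
    row_mean = X.mean(axis=1, keepdims=True)   # per-row summary
    col_mean = X.mean(axis=0, keepdims=True)   # per-column summary
    all_mean = X.mean()                        # global summary
    return w[0] * X + w[1] * row_mean + w[2] * col_mean + w[3] * all_mean + b

w, b = [1.0, 0.5, 0.5, 0.25], 0.1
X = np.random.rand(5, 7)                       # e.g. user-movie ratings
Y = exchangeable_layer(X, w, b)

# Equivariance check: permuting input rows permutes output rows the same way.
perm = np.random.permutation(5)
assert np.allclose(exchangeable_layer(X[perm], w, b), Y[perm])
```

Note that the layer has four weights and one bias regardless of n and m, which is exactly why the parameter count is independent of the number of objects.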

Building Sequential Inference Models for End-to-End Response Selection

Title Building Sequential Inference Models for End-to-End Response Selection
Authors Jia-Chen Gu, Zhen-Hua Ling, Yu-Ping Ruan, Quan Liu
Abstract This paper presents an end-to-end response selection model for Track 1 of the 7th Dialogue System Technology Challenges (DSTC7). This task focuses on selecting the correct next utterance from a set of candidates given a partial conversation. We propose an end-to-end neural network based on the enhanced sequential inference model (ESIM) for this task. Our proposed model differs from the original ESIM model in the following four aspects. First, a new word representation method which combines the general pre-trained word embeddings with those estimated on the task-specific training set is adopted in order to address the challenge of out-of-vocabulary (OOV) words. Second, an attentive hierarchical recurrent encoder (AHRE) is designed which is capable of encoding sentences hierarchically and generating more descriptive representations by aggregation. Third, a new pooling method which combines multi-dimensional pooling and last-state pooling is used instead of the simple combination of max pooling and average pooling in the original ESIM. Finally, a modification layer is added before the softmax layer to emphasize the importance of the last utterance in the context for response selection. In the released evaluation results of DSTC7, our proposed method ranked second on the Ubuntu dataset and third on the Advising dataset in subtask 1 of Track 1. (A toy sketch of the combined word representation follows this entry.)
Tasks Conversational Response Selection, Word Embeddings
Published 2018-12-03
URL http://arxiv.org/abs/1812.00686v1
PDF http://arxiv.org/pdf/1812.00686v1.pdf
PWC https://paperswithcode.com/paper/building-sequential-inference-models-for-end
Repo https://github.com/JasonForJoy/DSTC7-ResponseSelection
Framework tf
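
A toy NumPy sketch of the word-representation idea only: concatenating general pretrained vectors with vectors estimated on the task-specific training set, so that words missing from the pretrained vocabulary still receive a useful task-specific part. The vocabulary, dimensions, and random matrices are placeholders, not the paper's setup.

```python
import numpy as np

VOCAB = {"<unk>": 0, "ubuntu": 1, "install": 2}
D_GEN, D_TASK = 5, 3   # toy dimensions, purely illustrative

general = np.random.rand(len(VOCAB), D_GEN)  # stand-in for pretrained vectors
task = np.random.rand(len(VOCAB), D_TASK)    # estimated on task-specific data

def embed(tokens):
    ids = [VOCAB.get(t, VOCAB["<unk>"]) for t in tokens]
    # Concatenating both views: words that are OOV for the pretrained
    # vectors still get the task-specific half of their representation.
    return np.concatenate([general[ids], task[ids]], axis=-1)

print(embed(["install", "ubuntu"]).shape)    # (2, D_GEN + D_TASK)
```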

End-to-End Learning of Motion Representation for Video Understanding

Title End-to-End Learning of Motion Representation for Video Understanding
Authors Lijie Fan, Wenbing Huang, Chuang Gan, Stefano Ermon, Boqing Gong, Junzhou Huang
Abstract Despite the recent success of end-to-end learned representations, hand-crafted optical flow features are still widely used in video analysis tasks. To fill this gap, we propose TVNet, a novel end-to-end trainable neural network, to learn optical-flow-like features from data. TVNet subsumes a specific optical flow solver, the TV-L1 method, and is initialized by unfolding its optimization iterations as neural layers. TVNet can therefore be used directly without any extra learning. Moreover, it can be naturally concatenated with other task-specific networks to formulate an end-to-end architecture, thus making our method more efficient than current multi-stage approaches by avoiding the need to pre-compute and store features on disk. Finally, the parameters of the TVNet can be further fine-tuned by end-to-end training. This enables TVNet to learn richer and task-specific patterns beyond exact optical flow. Extensive experiments on two action recognition benchmarks verify the effectiveness of the proposed approach. Our TVNet achieves better accuracies than all compared methods, while being competitive with the fastest counterpart in terms of feature extraction time. (A toy unrolling sketch follows this entry.)
Tasks Action Recognition In Videos, Optical Flow Estimation, Video Understanding
Published 2018-04-02
URL http://arxiv.org/abs/1804.00413v1
PDF http://arxiv.org/pdf/1804.00413v1.pdf
PWC https://paperswithcode.com/paper/end-to-end-learning-of-motion-representation
Repo https://github.com/LijieFan/tvnet
Framework tf
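
A full TV-L1 solver is well beyond a short sketch, so the toy below illustrates only the general unrolling idea the paper builds on: a fixed number of solver iterations become "layers", initialized to reproduce the classical algorithm and then fine-tunable end to end. The quadratic objective is an assumed stand-in, not the TV-L1 energy.

```python
import numpy as np

def unrolled_solver(x0, grad, step_sizes):
    """Unroll gradient steps as layers: with the classical step sizes this
    reproduces the solver exactly; treating step_sizes as learnable
    parameters lets end-to-end training refine it."""
    x = x0
    for lr in step_sizes:
        x = x - lr * grad(x)     # one "layer" = one solver iteration
    return x

# Toy objective f(x) = ||x - target||^2 standing in for the real energy.
target = np.array([1.0, -2.0])
grad = lambda x: 2.0 * (x - target)
print(unrolled_solver(np.zeros(2), grad, step_sizes=[0.3] * 10))
```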

Joint Optic Disc and Cup Segmentation Based on Multi-label Deep Network and Polar Transformation

Title Joint Optic Disc and Cup Segmentation Based on Multi-label Deep Network and Polar Transformation
Authors Huazhu Fu, Jun Cheng, Yanwu Xu, Damon Wing Kee Wong, Jiang Liu, Xiaochun Cao
Abstract Glaucoma is a chronic eye disease that leads to irreversible vision loss. The cup to disc ratio (CDR) plays an important role in the screening and diagnosis of glaucoma. Thus, the accurate and automatic segmentation of the optic disc (OD) and optic cup (OC) from fundus images is a fundamental task. Most existing methods segment them separately and rely on hand-crafted visual features from fundus images. In this paper, we propose a deep learning architecture, named M-Net, which solves the OD and OC segmentation jointly in a one-stage multi-label system. The proposed M-Net mainly consists of a multi-scale input layer, a U-shape convolutional network, a side-output layer, and a multi-label loss function. The multi-scale input layer constructs an image pyramid to achieve receptive fields of multiple sizes. The U-shape convolutional network is employed as the main body network structure to learn the rich hierarchical representation, while the side-output layer acts as an early classifier that produces a companion local prediction map for different scale layers. Finally, a multi-label loss function is proposed to generate the final segmentation map. To further improve the segmentation performance, we also introduce the polar transformation, which provides the representation of the original image in the polar coordinate system. The experiments show that our M-Net system achieves state-of-the-art OD and OC segmentation results on the ORIGA dataset. Simultaneously, the proposed method also obtains satisfactory glaucoma screening performance with the calculated CDR value on both the ORIGA and SCES datasets. (A minimal polar-transformation sketch follows this entry.)
Tasks
Published 2018-01-03
URL http://arxiv.org/abs/1801.00926v3
PDF http://arxiv.org/pdf/1801.00926v3.pdf
PWC https://paperswithcode.com/paper/joint-optic-disc-and-cup-segmentation-based
Repo https://github.com/HzFu/DENet_GlaucomaScreen
Framework tf
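
A minimal NumPy sketch of the polar transformation step: resampling an image around a center so that rows index radius and columns index angle. The nearest-neighbour sampling, sizes, and the random "fundus crop" are illustrative choices, not the authors' implementation.

```python
import numpy as np

def polar_transform(img, center, radius, out_h=64, out_w=64):
    """Resample an image into polar coordinates around center (row, col)."""
    rs = np.arange(out_h) / out_h * radius        # radial axis
    ts = np.arange(out_w) / out_w * 2 * np.pi     # angular axis
    r, t = np.meshgrid(rs, ts, indexing="ij")
    src_r = np.clip((center[0] + r * np.sin(t)).astype(int), 0, img.shape[0] - 1)
    src_c = np.clip((center[1] + r * np.cos(t)).astype(int), 0, img.shape[1] - 1)
    return img[src_r, src_c]                      # nearest-neighbour lookup

img = np.random.rand(128, 128)                    # stand-in for a fundus crop
polar = polar_transform(img, center=(64, 64), radius=60)
print(polar.shape)                                # (64, 64)
```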

Rethinking Knowledge Graph Propagation for Zero-Shot Learning

Title Rethinking Knowledge Graph Propagation for Zero-Shot Learning
Authors Michael Kampffmeyer, Yinbo Chen, Xiaodan Liang, Hao Wang, Yujia Zhang, Eric P. Xing
Abstract Graph convolutional neural networks have recently shown great potential for the task of zero-shot learning. These models are highly sample efficient, as related concepts in the graph structure share statistical strength, allowing generalization to new classes when faced with a lack of data. However, multi-layer architectures, which are required to propagate knowledge to distant nodes in the graph, dilute the knowledge by performing extensive Laplacian smoothing at each layer, thereby decreasing performance. In order to still enjoy the benefit of the graph structure while preventing dilution of knowledge from distant nodes, we propose a Dense Graph Propagation (DGP) module with carefully designed direct links among distant nodes. DGP allows us to exploit the hierarchical graph structure of the knowledge graph through additional connections. These connections are added based on a node’s relationship to its ancestors and descendants. A weighting scheme is further used to weight their contributions depending on their distance from the node, improving information propagation in the graph. Combined with finetuning of the representations in a two-stage training approach, our method outperforms state-of-the-art zero-shot learning approaches. (A minimal propagation sketch follows this entry.)
Tasks Zero-Shot Learning
Published 2018-05-29
URL http://arxiv.org/abs/1805.11724v3
PDF http://arxiv.org/pdf/1805.11724v3.pdf
PWC https://paperswithcode.com/paper/rethinking-knowledge-graph-propagation-for
Repo https://github.com/cyvius96/DGP
Framework pytorch
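
A minimal NumPy sketch of distance-weighted dense propagation: features are aggregated separately from nodes at each hop distance, and each distance gets its own weight. The hop matrices, normalization, and weights are illustrative assumptions rather than the paper's exact formulation, which also distinguishes ancestors from descendants.

```python
import numpy as np

def dgp_layer(H, hops, alphas, W):
    """Aggregate features from nodes at each distance d, weighting the
    hop-d neighbourhood by alphas[d]."""
    out = np.zeros((H.shape[0], W.shape[1]))
    for A_d, a_d in zip(hops, alphas):
        deg = A_d.sum(axis=1, keepdims=True).clip(min=1)
        out += a_d * (A_d / deg) @ H @ W          # normalized hop-d term
    return np.maximum(0, out)

# Hypothetical 4-node chain graph: hops[0] = self-loops, hops[1] = 1-hop links.
I = np.eye(4)
A1 = np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)
H = np.random.rand(4, 8)                          # node features
W = np.random.rand(8, 8)
out = dgp_layer(H, hops=[I, A1], alphas=[1.0, 0.5], W=W)
```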

Parallel Grid Pooling for Data Augmentation

Title Parallel Grid Pooling for Data Augmentation
Authors Akito Takeki, Daiki Ikami, Go Irie, Kiyoharu Aizawa
Abstract Convolutional neural network (CNN) architectures utilize downsampling layers, which restrict the subsequent layers to learning spatially invariant features while reducing computational costs. However, such a downsampling operation makes it impossible to use the full spectrum of input features. Motivated by this observation, we propose a novel layer called parallel grid pooling (PGP), which is applicable to various CNN models. PGP performs downsampling without discarding any intermediate features. It works as data augmentation and is complementary to commonly used data augmentation techniques. Furthermore, we demonstrate that a dilated convolution can naturally be represented using PGP operations, which suggests that the dilated convolution can also be regarded as a type of data augmentation technique. Experimental results based on popular image classification benchmarks demonstrate the effectiveness of the proposed method. Code is available at: https://github.com/akitotakeki (A minimal sketch follows this entry.)
Tasks Data Augmentation, Image Augmentation, Image Classification
Published 2018-03-30
URL http://arxiv.org/abs/1803.11370v1
PDF http://arxiv.org/pdf/1803.11370v1.pdf
PWC https://paperswithcode.com/paper/parallel-grid-pooling-for-data-augmentation
Repo https://github.com/akitotakeki/pgp-chainer
Framework none
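
A minimal NumPy sketch of the core idea for a single-channel feature map: stride-s downsampling admits s*s equally valid grids, and PGP keeps all of them as parallel branches instead of discarding all but one. Real PGP operates on 4-D tensors inside a CNN.

```python
import numpy as np

def parallel_grid_pooling(x, s=2):
    """Split an (H, W) map into s*s stride-s branches, one per grid offset."""
    return [x[i::s, j::s] for i in range(s) for j in range(s)]

x = np.arange(16).reshape(4, 4)
branches = parallel_grid_pooling(x, s=2)
# The four 2x2 branches jointly cover every input cell: nothing is discarded.
assert sum(b.size for b in branches) == x.size
```

Downstream layers would then process each branch before the outputs are merged; keeping the shifted branches in parallel is what makes the operation behave like data augmentation.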

Joint Slot Filling and Intent Detection via Capsule Neural Networks

Title Joint Slot Filling and Intent Detection via Capsule Neural Networks
Authors Chenwei Zhang, Yaliang Li, Nan Du, Wei Fan, Philip S. Yu
Abstract Being able to recognize words as slots and detect the intent of an utterance has been a key issue in natural language understanding. The existing works either treat slot filling and intent detection separately in a pipeline manner, or adopt joint models which sequentially label slots while summarizing the utterance-level intent without explicitly preserving the hierarchical relationship among words, slots, and intents. To exploit the semantic hierarchy for effective modeling, we propose a capsule-based neural network model which accomplishes slot filling and intent detection via a dynamic routing-by-agreement schema. A re-routing schema is proposed to further synergize slot filling performance using the inferred intent representation. Experiments on two real-world datasets show the effectiveness of our model when compared with other alternative model architectures, as well as existing natural language understanding services. (A minimal routing sketch follows this entry.)
Tasks Intent Detection, Slot Filling
Published 2018-12-22
URL https://arxiv.org/abs/1812.09471v2
PDF https://arxiv.org/pdf/1812.09471v2.pdf
PWC https://paperswithcode.com/paper/joint-slot-filling-and-intent-detection-via
Repo https://github.com/Fireblossom/DeepDarkHomeword
Framework none
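
A minimal NumPy sketch of dynamic routing-by-agreement, the mechanism the model builds on: coupling coefficients are iteratively sharpened toward output capsules that agree with their input predictions. Capsule counts and dimensions are illustrative, and the paper's re-routing schema is not shown.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Nonlinearity that shrinks short vectors and caps lengths below 1."""
    n2 = (s ** 2).sum(axis=axis, keepdims=True)
    return (n2 / (1 + n2)) * s / np.sqrt(n2 + eps)

def routing_by_agreement(u_hat, iters=3):
    """u_hat[i, j] is input capsule i's prediction (e.g. a slot) for
    output capsule j (e.g. an intent)."""
    n_in, n_out, d = u_hat.shape
    b = np.zeros((n_in, n_out))                                # routing logits
    for _ in range(iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)   # couplings
        v = squash((c[..., None] * u_hat).sum(axis=0))         # output capsules
        b = b + (u_hat * v[None]).sum(axis=-1)                 # agreement update
    return v

v = routing_by_agreement(np.random.rand(6, 3, 4))  # 6 slot capsules -> 3 intents
```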

Towards Gene Expression Convolutions using Gene Interaction Graphs

Title Towards Gene Expression Convolutions using Gene Interaction Graphs
Authors Francis Dutil, Joseph Paul Cohen, Martin Weiss, Georgy Derevyanko, Yoshua Bengio
Abstract We study the challenges of applying deep learning to gene expression data. We find experimentally that non-linear signal exists in the data; however, it is not discovered automatically, given the noise and the low numbers of samples used in most research. We discuss how gene interaction graphs (same pathway, protein-protein, co-expression, or research paper text association) can be used to impose a bias on a deep model similar to the spatial bias imposed by convolutions on an image. We explore the usage of Graph Convolutional Neural Networks coupled with dropout and gene embeddings to utilize the graph information. We find this approach provides an advantage for particular tasks in a low data regime but is very dependent on the quality of the graph used. We conclude that more work should be done in this direction. We also design experiments that show why existing methods fail to capture the signal present in the data when features are added, which clearly isolates the problem that needs to be addressed. (A minimal graph-convolution sketch follows this entry.)
Tasks
Published 2018-06-18
URL http://arxiv.org/abs/1806.06975v1
PDF http://arxiv.org/pdf/1806.06975v1.pdf
PWC https://paperswithcode.com/paper/towards-gene-expression-convolutions-using
Repo https://github.com/mila-iqia/gene-graph-conv
Framework pytorch
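
A minimal NumPy sketch of the graph-convolution building block the paper explores: symmetric normalization of the gene interaction graph biases each gene's representation toward its neighbours, analogous to the spatial bias a convolution imposes on an image. The toy graph and sizes are illustrative; dropout and gene embeddings are omitted.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph convolution with self-loops and symmetric normalization."""
    A_hat = A + np.eye(A.shape[0])                          # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(0, d_inv_sqrt @ A_hat @ d_inv_sqrt @ H @ W)

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)      # toy 3-gene pathway
H = np.random.rand(3, 4)                                    # per-gene features
W = np.random.rand(4, 8)
print(gcn_layer(A, H, W).shape)                             # (3, 8)
```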

Searching for Efficient Multi-Scale Architectures for Dense Image Prediction

Title Searching for Efficient Multi-Scale Architectures for Dense Image Prediction
Authors Liang-Chieh Chen, Maxwell D. Collins, Yukun Zhu, George Papandreou, Barret Zoph, Florian Schroff, Hartwig Adam, Jonathon Shlens
Abstract The design of neural network architectures is an important component for achieving state-of-the-art performance with machine learning systems across a broad array of tasks. Much work has endeavored to design and build architectures automatically through clever construction of a search space paired with simple learning algorithms. Recent progress has demonstrated that such meta-learning methods may exceed scalable human-invented architectures on image classification tasks. An open question is the degree to which such methods may generalize to new domains. In this work we explore the construction of meta-learning techniques for dense image prediction focused on the tasks of scene parsing, person-part segmentation, and semantic image segmentation. Constructing viable search spaces in this domain is challenging because of the multi-scale representation of visual information and the necessity to operate on high resolution imagery. Based on a survey of techniques in dense image prediction, we construct a recursive search space and demonstrate that even with efficient random search, we can identify architectures that outperform human-invented architectures and achieve state-of-the-art performance on three dense prediction tasks, including 82.7% on Cityscapes (street scene parsing), 71.3% on PASCAL-Person-Part (person-part segmentation), and 87.9% on PASCAL VOC 2012 (semantic image segmentation). Additionally, the resulting architecture is more computationally efficient, requiring half the parameters and half the computational cost of previous state-of-the-art systems. (A toy random-search sketch follows this entry.)
Tasks Image Classification, Meta-Learning, Scene Parsing, Semantic Segmentation, Street Scene Parsing
Published 2018-09-11
URL http://arxiv.org/abs/1809.04184v1
PDF http://arxiv.org/pdf/1809.04184v1.pdf
PWC https://paperswithcode.com/paper/searching-for-efficient-multi-scale
Repo https://github.com/tensorflow/models/tree/master/research/deeplab
Framework tf
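
A toy sketch of the "efficient random search" recipe: sample candidate cells from a small space and keep the best-scoring one. The operation names echo the dense-prediction literature, but the space and the `evaluate` placeholder are illustrative assumptions, not the paper's search space.

```python
import random

# Hypothetical cell-based space: each cell picks an operation and a rate.
SPACE = {"op": ["sep_conv_3x3", "sep_conv_5x5", "aspp"], "rate": [1, 3, 6, 12]}

def sample_architecture(n_cells=3):
    return [{"op": random.choice(SPACE["op"]),
             "rate": random.choice(SPACE["rate"])} for _ in range(n_cells)]

def evaluate(arch):
    # Placeholder: the real search trains each candidate and scores it
    # on a validation set for the dense-prediction task.
    return random.random()

best = max((sample_architecture() for _ in range(20)), key=evaluate)
print(best)
```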

Supervising strong learners by amplifying weak experts

Title Supervising strong learners by amplifying weak experts
Authors Paul Christiano, Buck Shlegeris, Dario Amodei
Abstract Many real-world learning tasks involve complex or hard-to-specify objectives, and using an easier-to-specify proxy can lead to poor performance or misaligned behavior. One solution is to have humans provide a training signal by demonstrating or judging performance, but this approach fails if the task is too complicated for a human to directly evaluate. We propose Iterated Amplification, an alternative training strategy which progressively builds up a training signal for difficult problems by combining solutions to easier subproblems. Iterated Amplification is closely related to Expert Iteration (Anthony et al., 2017; Silver et al., 2017), except that it uses no external reward function. We present results in algorithmic environments, showing that Iterated Amplification can efficiently learn complex behaviors. (A toy decomposition sketch follows this entry.)
Tasks
Published 2018-10-19
URL http://arxiv.org/abs/1810.08575v1
PDF http://arxiv.org/pdf/1810.08575v1.pdf
PWC https://paperswithcode.com/paper/supervising-strong-learners-by-amplifying
Repo https://github.com/VenatioStudios/Web
Framework none
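
A toy sketch of the amplification step under strong assumptions: the task (summing a list) is hypothetical, the decomposition is hard-coded, and the distillation step that trains a stronger agent on the amplified answers is omitted.

```python
def decompose(task):
    """Hard-coded decomposition: split the task into two easier subtasks."""
    mid = len(task) // 2
    return task[:mid], task[mid:]

def weak_agent(task):
    """Can only solve trivial subproblems directly."""
    return task[0] if task else 0

def amplify(agent, task):
    """Combine the agent's answers to easier subtasks into an answer to
    the harder task, with no external reward function."""
    if len(task) <= 1:
        return agent(task)
    left, right = decompose(task)
    return amplify(agent, left) + amplify(agent, right)

print(amplify(weak_agent, [1, 2, 3, 4]))  # 10: beyond the weak agent alone
```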

Developing Brain Atlas through Deep Learning

Title Developing Brain Atlas through Deep Learning
Authors Asim Iqbal, Romesa Khan, Theofanis Karayannis
Abstract Neuroscientists have devoted significant effort to the creation of standard brain reference atlases for high-throughput registration of anatomical regions of interest. However, variability in brain size and form across individuals poses a significant challenge for such reference atlases. To overcome these limitations, we introduce a fully automated deep neural network-based method (SeBRe) for registration through Segmenting Brain Regions of interest with minimal human supervision. We demonstrate the validity of our method on brain images from different mouse developmental time points, across a range of neuronal markers and imaging modalities. We further assess the performance of our method on images from MR-scanned human brains. Our registration method can accelerate brain-wide exploration of region-specific changes in brain development and, by simply segmenting brain regions of interest for high-throughput brain-wide analysis, provides an alternative to existing complex brain registration techniques.
Tasks
Published 2018-07-10
URL https://arxiv.org/abs/1807.03440v2
PDF https://arxiv.org/pdf/1807.03440v2.pdf
PWC https://paperswithcode.com/paper/developing-brain-atlas-through-deep-learning
Repo https://github.com/itsasimiqbal/SeBRe
Framework tf

On Learning 3D Face Morphable Model from In-the-wild Images

Title On Learning 3D Face Morphable Model from In-the-wild Images
Authors Luan Tran, Xiaoming Liu
Abstract As a classic statistical model of 3D facial shape and albedo, the 3D Morphable Model (3DMM) is widely used in facial analysis, e.g., model fitting and image synthesis. A conventional 3DMM is learned from a set of 3D face scans with associated well-controlled 2D face images, and is represented by two sets of PCA basis functions. Due to the type and amount of training data, as well as the linear bases, the representation power of 3DMM can be limited. To address these problems, this paper proposes an innovative framework to learn a nonlinear 3DMM model from a large set of in-the-wild face images, without collecting 3D face scans. Specifically, given a face image as input, a network encoder estimates the projection, lighting, shape and albedo parameters. Two decoders serve as the nonlinear 3DMM to map from the shape and albedo parameters to the 3D shape and albedo, respectively. With the projection parameter, lighting, 3D shape, and albedo, a novel analytically-differentiable rendering layer is designed to reconstruct the original input face. The entire network is end-to-end trainable with only weak supervision. We demonstrate the superior representation power of our nonlinear 3DMM over its linear counterpart, and its contribution to face alignment, 3D reconstruction, and face editing. (A minimal wiring sketch follows this entry.)
Tasks 3D Reconstruction, Face Alignment, Image Generation
Published 2018-08-28
URL https://arxiv.org/abs/1808.09560v2
PDF https://arxiv.org/pdf/1808.09560v2.pdf
PWC https://paperswithcode.com/paper/on-learning-3d-face-morphable-model-from-in
Repo https://github.com/tranluan/Nonlinear_Face_3DMM
Framework tf
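
A minimal NumPy sketch of the encoder/decoder wiring only, under toy dimensions: one feature vector is split into projection, lighting, shape, and albedo parameters, and two nonlinear decoders map the latter two to per-vertex quantities. The split, decoders, and sizes are illustrative assumptions; the analytically-differentiable rendering layer is only indicated by a comment.

```python
import numpy as np

rng = np.random.default_rng(0)
D_FEAT, D_SHAPE, D_ALB, N_VERTS = 32, 10, 10, 50   # toy sizes, not the paper's

def encoder(feat):
    """Stand-in encoder output split: projection m, lighting l, and the
    nonlinear shape/albedo codes f_s, f_a."""
    m, l = feat[:4], feat[4:8]
    f_s = feat[8:8 + D_SHAPE]
    f_a = feat[8 + D_SHAPE:8 + D_SHAPE + D_ALB]
    return m, l, f_s, f_a

W_s = rng.normal(size=(D_SHAPE, N_VERTS * 3))      # shape decoder weights
W_a = rng.normal(size=(D_ALB, N_VERTS * 3))        # albedo decoder weights

def decode(f, W):
    return np.tanh(f @ W).reshape(N_VERTS, 3)      # nonlinear decoder

m, l, f_s, f_a = encoder(rng.normal(size=D_FEAT))
shape, albedo = decode(f_s, W_s), decode(f_a, W_a)
# A rendering layer would project `shape`, shade it with `l` and `albedo`,
# and compare the result to the input image for weak supervision.
```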

CocoNet: A deep neural network for mapping pixel coordinates to color values

Title CocoNet: A deep neural network for mapping pixel coordinates to color values
Authors Paul Andrei Bricman, Radu Tudor Ionescu
Abstract In this paper, we propose a deep neural network approach for mapping the 2D pixel coordinates in an image to the corresponding Red-Green-Blue (RGB) color values. The neural network is termed CocoNet, i.e., a coordinates-to-color network. During the training process, the neural network learns to encode the input image within its layers. More specifically, the network learns a continuous function that approximates the discrete RGB values sampled over the discrete 2D pixel locations. At test time, given a 2D pixel coordinate, the neural network will output the approximate RGB values of the corresponding pixel. By considering every 2D pixel location, the network can actually reconstruct the entire learned image. It is important to note that we have to train an individual neural network for each input image, i.e. one network encodes a single image only. To the best of our knowledge, we are the first to propose a neural approach for encoding images individually, by learning a mapping from the 2D pixel coordinate space to the RGB color space. Our neural image encoding approach has various low-level image processing applications ranging from image encoding, image compression and image denoising to image resampling and image completion. We conduct experiments that include both quantitative and qualitative results, demonstrating the utility of our approach and its superiority over standard baselines, e.g. bilateral filtering or bicubic interpolation. Our code is available at https://github.com/paubric/python-fuse-coconet. (A minimal sketch follows this entry.)
Tasks Denoising, Image Compression, Image Denoising
Published 2018-05-29
URL http://arxiv.org/abs/1805.11357v3
PDF http://arxiv.org/pdf/1805.11357v3.pdf
PWC https://paperswithcode.com/paper/coconet-a-deep-neural-network-for-mapping
Repo https://github.com/paubric/python-fuse-coconet
Framework tf
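
A minimal sketch of the coordinates-to-color idea using scikit-learn's MLPRegressor on a synthetic gradient image; the real CocoNet architecture and training setup differ (see the linked repo). One network is fit per image, and reconstruction is just querying every pixel coordinate.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic "image": a smooth RGB gradient standing in for a real photo.
H, W = 32, 32
ys, xs = np.mgrid[0:H, 0:W]
coords = np.stack([ys.ravel() / H, xs.ravel() / W], axis=1)  # inputs (y, x)
rgb = np.stack([coords[:, 0], coords[:, 1],
                (coords[:, 0] + coords[:, 1]) / 2], axis=1)  # targets (r, g, b)

# One network per image: the MLP memorizes a continuous coords -> RGB map.
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
net.fit(coords, rgb)

# Reconstruct the learned image by querying every pixel coordinate.
recon = net.predict(coords).reshape(H, W, 3).clip(0, 1)
print(recon.shape)                                           # (32, 32, 3)
```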