Paper Group AWR 281
Invocation-driven Neural Approximate Computing with a Multiclass-Classifier and Multiple Approximators
Title | Invocation-driven Neural Approximate Computing with a Multiclass-Classifier and Multiple Approximators |
Authors | Haiyue Song, Chengwen Xu, Qiang Xu, Zhuoran Song, Naifeng Jing, Xiaoyao Liang, Li Jiang |
Abstract | Neural approximate computing gains enormous energy efficiency at the cost of tolerable quality loss. A neural approximator maps the input data to outputs, while a classifier determines whether the input data are safe to approximate with a quality guarantee. However, existing works cannot maximize the invocation of the approximator, resulting in limited speedup and energy saving. By exploring the mapping space of the target functions, we observe in this paper a nonuniform distribution of the approximation error incurred by the same approximator. We thus propose a novel approximate computing architecture with a Multiclass-Classifier and Multiple Approximators (MCMA). These approximators have identical network topologies and can thus share the same hardware resources in a neural processing unit (NPU) chip. At runtime, MCMA can swap in the invoked approximator by merely shipping its synapse weights from the on-chip memory to the buffers near the MAC units within a cycle. We also propose efficient co-training methods for the MCMA architecture. Experimental results show a substantially higher invocation rate for MCMA, as well as gains in energy efficiency. |
Tasks | |
Published | 2018-10-19 |
URL | http://arxiv.org/abs/1810.08379v1 |
PDF | http://arxiv.org/pdf/1810.08379v1.pdf |
PWC | https://paperswithcode.com/paper/invocation-driven-neural-approximate |
Repo | https://github.com/shyyhs/MCMA |
Framework | none |
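The mechanism in the abstract above is a multiclass classifier that routes each input either to the exact path or to one of several approximators sharing a single topology, so only weights need to be swapped. A minimal sketch of that routing logic, assuming small fully-connected approximators in NumPy (the paper's NPU implementation and co-training procedure are not reproduced here):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class Approximator:
    """A small MLP; all approximators share one topology, only the weights differ."""
    def __init__(self, rng, n_in, n_hidden, n_out):
        self.w1 = rng.normal(0, 0.1, (n_in, n_hidden))
        self.w2 = rng.normal(0, 0.1, (n_hidden, n_out))

    def __call__(self, x):
        return relu(x @ self.w1) @ self.w2

def classify(x, class_w):
    """Multiclass classifier: class 0 = 'not safe to approximate',
    classes 1..K = 'approximate with approximator k'."""
    return int(np.argmax(x @ class_w))

rng = np.random.default_rng(0)
approximators = [Approximator(rng, 8, 16, 1) for _ in range(3)]  # identical topology
class_w = rng.normal(0, 0.1, (8, 4))                             # 1 reject class + 3 approximators

x = rng.normal(size=8)
k = classify(x, class_w)
if k == 0:
    y = None  # fall back to exact computation on the accurate path
else:
    y = approximators[k - 1](x)  # "swap in" the selected approximator's weights
```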
Stance Prediction for Russian: Data and Analysis
Title | Stance Prediction for Russian: Data and Analysis |
Authors | Nikita Lozhnikov, Leon Derczynski, Manuel Mazzara |
Abstract | Stance detection is a critical component of rumour and fake news identification. It involves extracting the stance that a particular author takes toward a given claim, where both are expressed in text. This paper investigates stance classification for Russian. It introduces a new dataset, RuStance, of Russian tweets and news comments from multiple sources covering multiple stories, as well as text classification approaches to stance detection that serve as benchmarks over this data in this language. In addition to presenting this openly available dataset, the first of its kind for Russian, the paper establishes a baseline for stance prediction in the language. |
Tasks | Stance Detection |
Published | 2018-09-05 |
URL | http://arxiv.org/abs/1809.01574v2 |
PDF | http://arxiv.org/pdf/1809.01574v2.pdf |
PWC | https://paperswithcode.com/paper/stance-prediction-for-russian-data-and |
Repo | https://github.com/npenzin/rustance |
Framework | none |
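As a rough illustration of the kind of text-classification baseline described above, the sketch below trains a TF-IDF plus logistic-regression stance classifier with scikit-learn; the label set (support / deny / query / comment) and the toy English examples are assumptions for illustration, not the RuStance data itself.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy placeholder data; RuStance itself provides Russian tweets and news comments.
texts = ["I agree completely", "That is false", "Is this confirmed?", "Interesting story"]
labels = ["support", "deny", "query", "comment"]

baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),   # word unigrams and bigrams
    LogisticRegression(max_iter=1000),
)
baseline.fit(texts, labels)
print(baseline.predict(["Is there any source for this?"]))
```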
Deep Models of Interactions Across Sets
Title | Deep Models of Interactions Across Sets |
Authors | Jason Hartford, Devon R Graham, Kevin Leyton-Brown, Siamak Ravanbakhsh |
Abstract | We use deep learning to model interactions across two or more sets of objects, such as user-movie ratings, protein-drug bindings, or ternary user-item-tag interactions. The canonical representation of such interactions is a matrix (or a higher-dimensional tensor) with an exchangeability property: the encoding’s meaning is not changed by permuting rows or columns. We argue that models should hence be Permutation Equivariant (PE): constrained to make the same predictions across such permutations. We present a parameter-sharing scheme and prove that it could not be made any more expressive without violating PE. This scheme yields three benefits. First, we demonstrate state-of-the-art performance on multiple matrix completion benchmarks. Second, our models require a number of parameters independent of the numbers of objects, and thus scale well to large datasets. Third, models can be queried about new objects that were not available at training time, but for which interactions have since been observed. In experiments, our models achieved surprisingly good generalization performance on this matrix extrapolation task, both within domains (e.g., new users and new movies drawn from the same distribution used for training) and even across domains (e.g., predicting music ratings after training on movies). |
Tasks | Matrix Completion, Recommendation Systems |
Published | 2018-03-07 |
URL | http://arxiv.org/abs/1803.02879v2 |
PDF | http://arxiv.org/pdf/1803.02879v2.pdf |
PWC | https://paperswithcode.com/paper/deep-models-of-interactions-across-sets |
Repo | https://github.com/mravanba/deep_exchangeable_tensors |
Framework | tf |
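The permutation-equivariant parameter-sharing scheme in the abstract above ties weights so that each output entry depends only on its own value, its row mean, its column mean, and the overall mean. A minimal NumPy sketch of one such layer under that reading (single channel, no bias or nonlinearity):

```python
import numpy as np

def exchangeable_layer(X, w_self, w_row, w_col, w_all):
    """Permutation-equivariant layer for an n x m interaction matrix X:
    permuting rows or columns of X permutes the output the same way."""
    row_mean = X.mean(axis=1, keepdims=True)   # (n, 1), broadcast over columns
    col_mean = X.mean(axis=0, keepdims=True)   # (1, m), broadcast over rows
    all_mean = X.mean()                        # scalar
    return w_self * X + w_row * row_mean + w_col * col_mean + w_all * all_mean

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))                    # e.g. a small user-movie rating block
Y = exchangeable_layer(X, 1.0, 0.5, 0.5, 0.1)

# Equivariance check: permuting rows of the input permutes rows of the output.
perm = rng.permutation(5)
assert np.allclose(exchangeable_layer(X[perm], 1.0, 0.5, 0.5, 0.1), Y[perm])
```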
Building Sequential Inference Models for End-to-End Response Selection
Title | Building Sequential Inference Models for End-to-End Response Selection |
Authors | Jia-Chen Gu, Zhen-Hua Ling, Yu-Ping Ruan, Quan Liu |
Abstract | This paper presents an end-to-end response selection model for Track 1 of the 7th Dialogue System Technology Challenges (DSTC7). This task focuses on selecting the correct next utterance from a set of candidates given a partial conversation. We propose an end-to-end neural network based on the enhanced sequential inference model (ESIM) for this task. Our proposed model differs from the original ESIM model in the following four aspects. First, a new word representation method that combines general pre-trained word embeddings with those estimated on the task-specific training set is adopted to address the challenge of out-of-vocabulary (OOV) words. Second, an attentive hierarchical recurrent encoder (AHRE) is designed that encodes sentences hierarchically and generates more descriptive representations by aggregation. Third, a new pooling method that combines multi-dimensional pooling and last-state pooling is used instead of the simple combination of max pooling and average pooling in the original ESIM. Last, a modification layer is added before the softmax layer to emphasize the importance of the last utterance in the context for response selection. In the released evaluation results of DSTC7, our proposed method ranked second on the Ubuntu dataset and third on the Advising dataset in subtask 1 of Track 1. |
Tasks | Conversational Response Selection, Word Embeddings |
Published | 2018-12-03 |
URL | http://arxiv.org/abs/1812.00686v1 |
PDF | http://arxiv.org/pdf/1812.00686v1.pdf |
PWC | https://paperswithcode.com/paper/building-sequential-inference-models-for-end |
Repo | https://github.com/JasonForJoy/DSTC7-ResponseSelection |
Framework | tf |
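Of the four modifications listed in the abstract above, the first is the easiest to make concrete: each word is represented by concatenating a general pre-trained embedding with one estimated on the task-specific training data, so a word missing from one table can still get a vector from the other. A hedged NumPy sketch of that lookup (vocabularies, dimensions, and the zero-vector fallback are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
general = {"ubuntu": rng.normal(size=300), "kernel": rng.normal(size=300)}   # pre-trained table
task = {"ubuntu": rng.normal(size=100), "grub": rng.normal(size=100)}        # trained on task data

def embed(word, d_general=300, d_task=100):
    """Concatenate general and task-specific vectors; missing entries fall back to zeros."""
    g = general.get(word, np.zeros(d_general))
    t = task.get(word, np.zeros(d_task))
    return np.concatenate([g, t])            # 400-d word representation

print(embed("ubuntu").shape)  # (400,) -- present in both tables
print(embed("grub").shape)    # (400,) -- OOV for the general table, covered by the task table
```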
End-to-End Learning of Motion Representation for Video Understanding
Title | End-to-End Learning of Motion Representation for Video Understanding |
Authors | Lijie Fan, Wenbing Huang, Chuang Gan, Stefano Ermon, Boqing Gong, Junzhou Huang |
Abstract | Despite the recent success of end-to-end learned representations, hand-crafted optical flow features are still widely used in video analysis tasks. To fill this gap, we propose TVNet, a novel end-to-end trainable neural network, to learn optical-flow-like features from data. TVNet subsumes a specific optical flow solver, the TV-L1 method, and is initialized by unfolding its optimization iterations as neural layers. TVNet can therefore be used directly without any extra learning. Moreover, it can be naturally concatenated with other task-specific networks to form an end-to-end architecture, making our method more efficient than current multi-stage approaches by avoiding the need to pre-compute and store features on disk. Finally, the parameters of TVNet can be further fine-tuned by end-to-end training. This enables TVNet to learn richer and task-specific patterns beyond exact optical flow. Extensive experiments on two action recognition benchmarks verify the effectiveness of the proposed approach. Our TVNet achieves better accuracies than all compared methods, while being competitive with the fastest counterpart in terms of feature extraction time. |
Tasks | Action Recognition In Videos, Optical Flow Estimation, Video Understanding |
Published | 2018-04-02 |
URL | http://arxiv.org/abs/1804.00413v1 |
PDF | http://arxiv.org/pdf/1804.00413v1.pdf |
PWC | https://paperswithcode.com/paper/end-to-end-learning-of-motion-representation |
Repo | https://github.com/LijieFan/tvnet |
Framework | tf |
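The construction described above, unfolding a fixed number of solver iterations into network layers that can later be fine-tuned, can be illustrated generically. The sketch below unrolls plain gradient-descent iterations on a least-squares objective into "layers" whose step sizes start at the classical value and could then be treated as learnable parameters; it is an analogy for the unrolling idea, not the TV-L1 equations used by TVNet.

```python
import numpy as np

def unrolled_solver(A, b, x0, step_sizes):
    """Each 'layer' applies one gradient step on ||Ax - b||^2.
    Initializing every step size to the classical value reproduces the solver;
    fine-tuning the step sizes end-to-end would adapt the layers to a task."""
    x = x0
    for lr in step_sizes:
        grad = A.T @ (A @ x - b)
        x = x - lr * grad
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(10, 3))
b = rng.normal(size=10)
x0 = np.zeros(3)

classical_lr = 1.0 / np.linalg.norm(A.T @ A, 2)     # safe step size for gradient descent
layers = [classical_lr] * 100                        # 100 unrolled iterations = 100 layers
x = unrolled_solver(A, b, x0, layers)
print(np.round(x, 3), np.round(np.linalg.lstsq(A, b, rcond=None)[0], 3))  # close after unrolling
```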
Joint Optic Disc and Cup Segmentation Based on Multi-label Deep Network and Polar Transformation
Title | Joint Optic Disc and Cup Segmentation Based on Multi-label Deep Network and Polar Transformation |
Authors | Huazhu Fu, Jun Cheng, Yanwu Xu, Damon Wing Kee Wong, Jiang Liu, Xiaochun Cao |
Abstract | Glaucoma is a chronic eye disease that leads to irreversible vision loss. The cup-to-disc ratio (CDR) plays an important role in the screening and diagnosis of glaucoma. Thus, the accurate and automatic segmentation of the optic disc (OD) and optic cup (OC) from fundus images is a fundamental task. Most existing methods segment them separately and rely on hand-crafted visual features from fundus images. In this paper, we propose a deep learning architecture, named M-Net, which solves OD and OC segmentation jointly in a one-stage multi-label system. The proposed M-Net mainly consists of a multi-scale input layer, a U-shape convolutional network, a side-output layer, and a multi-label loss function. The multi-scale input layer constructs an image pyramid to achieve receptive fields of multiple sizes. The U-shape convolutional network is employed as the main body network structure to learn a rich hierarchical representation, while the side-output layer acts as an early classifier that produces companion local prediction maps for different scale layers. Finally, a multi-label loss function is proposed to generate the final segmentation map. To further improve segmentation performance, we also introduce a polar transformation, which provides a representation of the original image in the polar coordinate system. The experiments show that our M-Net system achieves state-of-the-art OD and OC segmentation results on the ORIGA dataset. At the same time, the proposed method also obtains satisfactory glaucoma screening performance with the calculated CDR values on both the ORIGA and SCES datasets. |
Tasks | |
Published | 2018-01-03 |
URL | http://arxiv.org/abs/1801.00926v3 |
PDF | http://arxiv.org/pdf/1801.00926v3.pdf |
PWC | https://paperswithcode.com/paper/joint-optic-disc-and-cup-segmentation-based |
Repo | https://github.com/HzFu/DENet_GlaucomaScreen |
Framework | tf |
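The polar transformation mentioned in the abstract above re-samples the fundus image around the disc centre so that the roughly circular OD/OC boundaries become roughly horizontal. A minimal NumPy sketch of such a Cartesian-to-polar resampling with nearest-neighbour interpolation (the centre, radius, and output resolution are assumptions, not values from the paper):

```python
import numpy as np

def polar_transform(img, center, radius, n_r=64, n_theta=128):
    """Sample img on a (radius, angle) grid around `center`, nearest neighbour."""
    h, w = img.shape[:2]
    cy, cx = center
    rs = np.linspace(0, radius, n_r)
    thetas = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    r_grid, t_grid = np.meshgrid(rs, thetas, indexing="ij")
    ys = np.clip(np.rint(cy + r_grid * np.sin(t_grid)).astype(int), 0, h - 1)
    xs = np.clip(np.rint(cx + r_grid * np.cos(t_grid)).astype(int), 0, w - 1)
    return img[ys, xs]

img = np.random.default_rng(0).random((400, 400))      # stand-in for a cropped disc region
polar = polar_transform(img, center=(200, 200), radius=180)
print(polar.shape)   # (64, 128): rows index radius, columns index angle
```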
Rethinking Knowledge Graph Propagation for Zero-Shot Learning
Title | Rethinking Knowledge Graph Propagation for Zero-Shot Learning |
Authors | Michael Kampffmeyer, Yinbo Chen, Xiaodan Liang, Hao Wang, Yujia Zhang, Eric P. Xing |
Abstract | Graph convolutional neural networks have recently shown great potential for the task of zero-shot learning. These models are highly sample efficient, as related concepts in the graph structure share statistical strength, allowing generalization to new classes when faced with a lack of data. However, multi-layer architectures, which are required to propagate knowledge to distant nodes in the graph, dilute the knowledge by performing extensive Laplacian smoothing at each layer and thereby decrease performance. In order to still enjoy the benefit brought by the graph structure while preventing dilution of knowledge from distant nodes, we propose a Dense Graph Propagation (DGP) module with carefully designed direct links among distant nodes. DGP allows us to exploit the hierarchical graph structure of the knowledge graph through additional connections. These connections are added based on a node’s relationship to its ancestors and descendants. A weighting scheme is further used to weigh their contribution depending on the distance to the node, to improve information propagation in the graph. Combined with fine-tuning of the representations in a two-stage training approach, our method outperforms state-of-the-art zero-shot learning approaches. |
Tasks | Zero-Shot Learning |
Published | 2018-05-29 |
URL | http://arxiv.org/abs/1805.11724v3 |
PDF | http://arxiv.org/pdf/1805.11724v3.pdf |
PWC | https://paperswithcode.com/paper/rethinking-knowledge-graph-propagation-for |
Repo | https://github.com/cyvius96/DGP |
Framework | pytorch |
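The dense propagation module described above replaces deep multi-layer propagation with direct, distance-weighted links from each node to its ancestors and descendants. The sketch below illustrates one hedged reading of that idea on a tiny hierarchy: a single propagation step using a dense ancestor adjacency whose weights decay with hop distance (the decay scheme and row normalization here are assumptions, not the paper's exact formulation).

```python
import numpy as np

# Tiny hierarchy: 0 -> 1 -> 2 and 0 -> 3 (edges point from parent to child).
parents = {1: 0, 2: 1, 3: 0}
n = 4

def ancestor_hops(node):
    """Return {ancestor: hop distance} for a node in the tree."""
    hops, d = {}, 0
    while node in parents:
        node, d = parents[node], d + 1
        hops[node] = d
    return hops

# Dense adjacency: every node connects directly to all of its ancestors,
# with weight decaying in the hop distance (alpha ** k is an assumed scheme).
alpha = 0.5
A = np.eye(n)
for v in range(n):
    for u, k in ancestor_hops(v).items():
        A[v, u] = alpha ** k

A = A / A.sum(axis=1, keepdims=True)               # row-normalize the dense links
X = np.random.default_rng(0).normal(size=(n, 8))   # per-class word-embedding features
H = A @ X                                          # one dense propagation step, no deep stacking
print(H.shape)
```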
Parallel Grid Pooling for Data Augmentation
Title | Parallel Grid Pooling for Data Augmentation |
Authors | Akito Takeki, Daiki Ikami, Go Irie, Kiyoharu Aizawa |
Abstract | Convolutional neural network (CNN) architectures utilize downsampling layers, which restrict subsequent layers to learning spatially invariant features while reducing computational costs. However, such a downsampling operation makes it impossible to use the full spectrum of input features. Motivated by this observation, we propose a novel layer called parallel grid pooling (PGP) which is applicable to various CNN models. PGP performs downsampling without discarding any intermediate features. It works as data augmentation and is complementary to commonly used data augmentation techniques. Furthermore, we demonstrate that a dilated convolution can naturally be represented using PGP operations, which suggests that the dilated convolution can also be regarded as a type of data augmentation technique. Experimental results based on popular image classification benchmarks demonstrate the effectiveness of the proposed method. Code is available at: https://github.com/akitotakeki |
Tasks | Data Augmentation, Image Augmentation, Image Classification |
Published | 2018-03-30 |
URL | http://arxiv.org/abs/1803.11370v1 |
PDF | http://arxiv.org/pdf/1803.11370v1.pdf |
PWC | https://paperswithcode.com/paper/parallel-grid-pooling-for-data-augmentation |
Repo | https://github.com/akitotakeki/pgp-chainer |
Framework | none |
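Parallel grid pooling, as described above, keeps every input position by splitting a feature map into all s×s shifted sub-grids instead of keeping just one. A minimal NumPy sketch of that rearrangement for stride 2 (the weight-shared downstream branches that process each sub-grid are omitted):

```python
import numpy as np

def parallel_grid_pool(x, s=2):
    """Split an (N, C, H, W) map into s*s shifted sub-grids of size (H/s, W/s).
    Unlike plain strided pooling, no position is discarded: each branch sees
    one of the s*s possible sampling offsets."""
    n, c, h, w = x.shape
    assert h % s == 0 and w % s == 0
    return [x[:, :, i::s, j::s] for i in range(s) for j in range(s)]

x = np.arange(1 * 1 * 4 * 4, dtype=float).reshape(1, 1, 4, 4)
branches = parallel_grid_pool(x, s=2)
print(len(branches), branches[0].shape)   # 4 branches, each of shape (1, 1, 2, 2)
```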
Joint Slot Filling and Intent Detection via Capsule Neural Networks
Title | Joint Slot Filling and Intent Detection via Capsule Neural Networks |
Authors | Chenwei Zhang, Yaliang Li, Nan Du, Wei Fan, Philip S. Yu |
Abstract | Being able to recognize words as slots and detect the intent of an utterance has been a key issue in natural language understanding. Existing works either treat slot filling and intent detection separately in a pipeline manner, or adopt joint models which sequentially label slots while summarizing the utterance-level intent, without explicitly preserving the hierarchical relationship among words, slots, and intents. To exploit the semantic hierarchy for effective modeling, we propose a capsule-based neural network model which accomplishes slot filling and intent detection via a dynamic routing-by-agreement schema. A re-routing schema is proposed to further synergize the slot filling performance using the inferred intent representation. Experiments on two real-world datasets show the effectiveness of our model when compared with other alternative model architectures, as well as existing natural language understanding services. |
Tasks | Intent Detection, Slot Filling |
Published | 2018-12-22 |
URL | https://arxiv.org/abs/1812.09471v2 |
PDF | https://arxiv.org/pdf/1812.09471v2.pdf |
PWC | https://paperswithcode.com/paper/joint-slot-filling-and-intent-detection-via |
Repo | https://github.com/Fireblossom/DeepDarkHomeword |
Framework | none |
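The routing-by-agreement schema referenced above iteratively raises the coupling between a lower-level capsule (e.g. a word/slot capsule) and a higher-level capsule (e.g. an intent capsule) when their predictions agree. A generic NumPy sketch of that inner loop (the squash nonlinearity and three routing iterations are standard capsule-network choices, not details taken from this paper):

```python
import numpy as np

def squash(v, eps=1e-9):
    norm2 = np.sum(v ** 2, axis=-1, keepdims=True)
    return (norm2 / (1.0 + norm2)) * v / np.sqrt(norm2 + eps)

def routing_by_agreement(u_hat, n_iters=3):
    """u_hat: (n_lower, n_upper, d) prediction vectors from lower to upper capsules.
    Returns upper-capsule outputs (n_upper, d) after iterative agreement updates."""
    n_lower, n_upper, _ = u_hat.shape
    b = np.zeros((n_lower, n_upper))                             # routing logits
    for _ in range(n_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)     # couplings per lower capsule
        s = np.einsum("ij,ijd->jd", c, u_hat)                    # weighted sum into upper capsules
        v = squash(s)
        b = b + np.einsum("ijd,jd->ij", u_hat, v)                # agreement update
    return v

u_hat = np.random.default_rng(0).normal(size=(6, 3, 8))  # 6 slot capsules, 3 intent capsules
print(routing_by_agreement(u_hat).shape)                 # (3, 8)
```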
Towards Gene Expression Convolutions using Gene Interaction Graphs
Title | Towards Gene Expression Convolutions using Gene Interaction Graphs |
Authors | Francis Dutil, Joseph Paul Cohen, Martin Weiss, Georgy Derevyanko, Yoshua Bengio |
Abstract | We study the challenges of applying deep learning to gene expression data. We find experimentally that there exists non-linear signal in the data; however, it is not discovered automatically given the noise and the low numbers of samples used in most research. We discuss how gene interaction graphs (same pathway, protein-protein, co-expression, or research paper text association) can be used to impose a bias on a deep model, similar to the spatial bias imposed by convolutions on an image. We explore the use of graph convolutional neural networks coupled with dropout and gene embeddings to utilize the graph information. We find that this approach provides an advantage for particular tasks in a low-data regime, but is very dependent on the quality of the graph used. We conclude that more work should be done in this direction. We design experiments that show why existing methods fail to capture signal that is present in the data when features are added, which clearly isolates the problem that needs to be addressed. |
Tasks | |
Published | 2018-06-18 |
URL | http://arxiv.org/abs/1806.06975v1 |
PDF | http://arxiv.org/pdf/1806.06975v1.pdf |
PWC | https://paperswithcode.com/paper/towards-gene-expression-convolutions-using |
Repo | https://github.com/mila-iqia/gene-graph-conv |
Framework | pytorch |
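The abstract above uses a gene interaction graph to impose a locality bias on gene-expression features, analogous to the spatial bias of image convolutions. A minimal NumPy sketch of a single graph-convolution step over genes, using the symmetric normalization common to GCNs (the paper's specific architecture, dropout, and gene embeddings are not reproduced):

```python
import numpy as np

def gcn_layer(X, A, W):
    """One graph-convolution step: aggregate each gene's features over its
    neighbours in the interaction graph, then apply a shared linear map."""
    A_hat = A + np.eye(A.shape[0])                    # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt          # symmetric normalization
    return np.maximum(A_norm @ X @ W, 0.0)            # ReLU

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0, 1],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 0],
              [0, 0, 1, 0, 0],
              [1, 0, 0, 0, 0]], dtype=float)          # toy protein-protein-style graph
X = rng.normal(size=(5, 1))                           # one expression value per gene
W = rng.normal(size=(1, 4))                           # lift to 4 hidden channels
print(gcn_layer(X, A, W).shape)                       # (5, 4)
```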
Searching for Efficient Multi-Scale Architectures for Dense Image Prediction
Title | Searching for Efficient Multi-Scale Architectures for Dense Image Prediction |
Authors | Liang-Chieh Chen, Maxwell D. Collins, Yukun Zhu, George Papandreou, Barret Zoph, Florian Schroff, Hartwig Adam, Jonathon Shlens |
Abstract | The design of neural network architectures is an important component for achieving state-of-the-art performance with machine learning systems across a broad array of tasks. Much work has endeavored to design and build architectures automatically through clever construction of a search space paired with simple learning algorithms. Recent progress has demonstrated that such meta-learning methods may exceed scalable human-invented architectures on image classification tasks. An open question is the degree to which such methods may generalize to new domains. In this work we explore the construction of meta-learning techniques for dense image prediction, focused on the tasks of scene parsing, person-part segmentation, and semantic image segmentation. Constructing viable search spaces in this domain is challenging because of the multi-scale representation of visual information and the necessity of operating on high-resolution imagery. Based on a survey of techniques in dense image prediction, we construct a recursive search space and demonstrate that even with efficient random search, we can identify architectures that outperform human-invented architectures and achieve state-of-the-art performance on three dense prediction tasks, including 82.7% on Cityscapes (street scene parsing), 71.3% on PASCAL-Person-Part (person-part segmentation), and 87.9% on PASCAL VOC 2012 (semantic image segmentation). Additionally, the resulting architecture is more computationally efficient, requiring half the parameters and half the computational cost of previous state-of-the-art systems. |
Tasks | Image Classification, Meta-Learning, Scene Parsing, Semantic Segmentation, Street Scene Parsing |
Published | 2018-09-11 |
URL | http://arxiv.org/abs/1809.04184v1 |
PDF | http://arxiv.org/pdf/1809.04184v1.pdf |
PWC | https://paperswithcode.com/paper/searching-for-efficient-multi-scale |
Repo | https://github.com/tensorflow/models/tree/master/research/deeplab |
Framework | tf |
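A key point in the abstract above is that even simple random search over a well-constructed search space can find strong dense-prediction architectures. The sketch below shows only the generic pattern (sample a configuration, evaluate a proxy score, keep the best); the option lists and the scoring function are placeholders, not the paper's actual search space or proxy task.

```python
import random

# Placeholder search space, loosely in the spirit of multi-scale context modules.
SEARCH_SPACE = {
    "dilation_rates": [(1, 3, 6), (1, 6, 12), (1, 12, 24)],
    "num_branches": [2, 3, 4],
    "channels": [128, 256, 512],
}

def sample_architecture(rng):
    """Draw one configuration uniformly at random from the search space."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}

def proxy_score(arch, rng):
    """Stand-in for training a small proxy network and reading off validation mIoU."""
    return rng.random()

rng = random.Random(0)
best_arch, best_score = None, float("-inf")
for _ in range(20):                     # plain random search: sample, evaluate, keep the best
    arch = sample_architecture(rng)
    score = proxy_score(arch, rng)
    if score > best_score:
        best_arch, best_score = arch, score
print(best_arch, round(best_score, 3))
```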
Supervising strong learners by amplifying weak experts
Title | Supervising strong learners by amplifying weak experts |
Authors | Paul Christiano, Buck Shlegeris, Dario Amodei |
Abstract | Many real world learning tasks involve complex or hard-to-specify objectives, and using an easier-to-specify proxy can lead to poor performance or misaligned behavior. One solution is to have humans provide a training signal by demonstrating or judging performance, but this approach fails if the task is too complicated for a human to directly evaluate. We propose Iterated Amplification, an alternative training strategy which progressively builds up a training signal for difficult problems by combining solutions to easier subproblems. Iterated Amplification is closely related to Expert Iteration (Anthony et al., 2017; Silver et al., 2017), except that it uses no external reward function. We present results in algorithmic environments, showing that Iterated Amplification can efficiently learn complex behaviors. |
Tasks | |
Published | 2018-10-19 |
URL | http://arxiv.org/abs/1810.08575v1 |
PDF | http://arxiv.org/pdf/1810.08575v1.pdf |
PWC | https://paperswithcode.com/paper/supervising-strong-learners-by-amplifying |
Repo | https://github.com/VenatioStudios/Web |
Framework | none |
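The training strategy described above builds a signal for hard questions by decomposing them, answering the easier subquestions with the current model, and combining those answers into a target the model is then trained toward. A heavily simplified toy sketch of one such round (summing a long list by splitting it in half), where the decomposition, the combination rule, and "training by memorization" are chosen purely for illustration:

```python
import random

def decompose(xs):
    """Split a hard question (here: 'sum this long list') into two easier subquestions."""
    mid = len(xs) // 2
    return xs[:mid], xs[mid:]

def amplified_answer(model, xs):
    """The amplified system: decompose, answer subquestions with the model, combine."""
    if len(xs) <= 2:
        return model(xs)
    left, right = decompose(xs)
    return amplified_answer(model, left) + amplified_answer(model, right)

# A crude 'model' that only handles very short lists reliably; training is
# caricatured as memorizing targets produced by the amplified system.
memory = {}
def model(xs):
    return memory.get(tuple(xs), sum(xs[:2]))

rng = random.Random(0)
for _ in range(5):                                   # one round of amplification
    xs = [rng.randint(0, 9) for _ in range(8)]
    memory[tuple(xs)] = amplified_answer(model, xs)  # training signal, no external reward

print(all(target == sum(k) for k, target in memory.items()))  # True: targets are correct
```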
Developing Brain Atlas through Deep Learning
Title | Developing Brain Atlas through Deep Learning |
Authors | Asim Iqbal, Romesa Khan, Theofanis Karayannis |
Abstract | Neuroscientists have devoted significant effort to the creation of standard brain reference atlases for high-throughput registration of anatomical regions of interest. However, variability in brain size and form across individuals poses a significant challenge for such reference atlases. To overcome these limitations, we introduce a fully automated deep neural network-based method (SeBRe) for registration through Segmenting Brain Regions of interest with minimal human supervision. We demonstrate the validity of our method on brain images from different mouse developmental time points, across a range of neuronal markers and imaging modalities. We further assess the performance of our method on images from MR-scanned human brains. Our registration method can accelerate brain-wide exploration of region-specific changes in brain development and, by simply segmenting brain regions of interest for high-throughput brain-wide analysis, provides an alternative to existing complex brain registration techniques. |
Tasks | |
Published | 2018-07-10 |
URL | https://arxiv.org/abs/1807.03440v2 |
PDF | https://arxiv.org/pdf/1807.03440v2.pdf |
PWC | https://paperswithcode.com/paper/developing-brain-atlas-through-deep-learning |
Repo | https://github.com/itsasimiqbal/SeBRe |
Framework | tf |
On Learning 3D Face Morphable Model from In-the-wild Images
Title | On Learning 3D Face Morphable Model from In-the-wild Images |
Authors | Luan Tran, Xiaoming Liu |
Abstract | As a classic statistical model of 3D facial shape and albedo, the 3D Morphable Model (3DMM) is widely used in facial analysis, e.g., model fitting and image synthesis. A conventional 3DMM is learned from a set of 3D face scans with associated well-controlled 2D face images, and is represented by two sets of PCA basis functions. Due to the type and amount of training data, as well as the linear bases, the representation power of 3DMM can be limited. To address these problems, this paper proposes an innovative framework to learn a nonlinear 3DMM model from a large set of in-the-wild face images, without collecting 3D face scans. Specifically, given a face image as input, a network encoder estimates the projection, lighting, shape and albedo parameters. Two decoders serve as the nonlinear 3DMM to map from the shape and albedo parameters to the 3D shape and albedo, respectively. With the projection parameter, lighting, 3D shape, and albedo, a novel analytically-differentiable rendering layer is designed to reconstruct the original input face. The entire network is end-to-end trainable with only weak supervision. We demonstrate the superior representation power of our nonlinear 3DMM over its linear counterpart, and its contribution to face alignment, 3D reconstruction, and face editing. |
Tasks | 3D Reconstruction, Face Alignment, Image Generation |
Published | 2018-08-28 |
URL | https://arxiv.org/abs/1808.09560v2 |
PDF | https://arxiv.org/pdf/1808.09560v2.pdf |
PWC | https://paperswithcode.com/paper/on-learning-3d-face-morphable-model-from-in |
Repo | https://github.com/tranluan/Nonlinear_Face_3DMM |
Framework | tf |
CocoNet: A deep neural network for mapping pixel coordinates to color values
Title | CocoNet: A deep neural network for mapping pixel coordinates to color values |
Authors | Paul Andrei Bricman, Radu Tudor Ionescu |
Abstract | In this paper, we propose a deep neural network approach for mapping the 2D pixel coordinates in an image to the corresponding Red-Green-Blue (RGB) color values. The neural network is termed CocoNet, i.e. coordinates-to-color network. During the training process, the neural network learns to encode the input image within its layers. More specifically, the network learns a continuous function that approximates the discrete RGB values sampled over the discrete 2D pixel locations. At test time, given a 2D pixel coordinate, the neural network will output the approximate RGB values of the corresponding pixel. By considering every 2D pixel location, the network can actually reconstruct the entire learned image. It is important to note that we have to train an individual neural network for each input image, i.e. one network encodes a single image only. To the best of our knowledge, we are the first to propose a neural approach for encoding images individually, by learning a mapping from the 2D pixel coordinate space to the RGB color space. Our neural image encoding approach has various low-level image processing applications ranging from image encoding, image compression and image denoising to image resampling and image completion. We conduct experiments that include both quantitative and qualitative results, demonstrating the utility of our approach and its superiority over standard baselines, e.g. bilateral filtering or bicubic interpolation. Our code is available at https://github.com/paubric/python-fuse-coconet. |
Tasks | Denoising, Image Compression, Image Denoising |
Published | 2018-05-29 |
URL | http://arxiv.org/abs/1805.11357v3 |
PDF | http://arxiv.org/pdf/1805.11357v3.pdf |
PWC | https://paperswithcode.com/paper/coconet-a-deep-neural-network-for-mapping |
Repo | https://github.com/paubric/python-fuse-coconet |
Framework | tf |
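The core idea in the abstract above is very direct: overfit a small MLP per image so that f(x, y) ≈ RGB(x, y), then reconstruct the image by querying every coordinate. A minimal NumPy sketch of that per-image training loop on a tiny synthetic "image" (the architecture, learning rate, coordinate normalization, and image content here are assumptions, not CocoNet's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 16, 16
yy, xx = np.meshgrid(np.linspace(0, 1, H), np.linspace(0, 1, W), indexing="ij")
image = np.stack([xx, yy, 0.5 * (xx + yy)], axis=-1)   # smooth synthetic stand-in image

coords = np.stack([yy.ravel(), xx.ravel()], axis=1)    # (H*W, 2) normalized pixel coordinates
targets = image.reshape(-1, 3)                         # (H*W, 3) RGB values at those pixels

# One-hidden-layer MLP trained by plain gradient descent to map coordinates to colors.
W1 = rng.normal(0, 0.5, (2, 64)); b1 = np.zeros(64)
W2 = rng.normal(0, 0.5, (64, 3)); b2 = np.zeros(3)
lr = 0.2
for _ in range(3000):
    h = np.tanh(coords @ W1 + b1)
    pred = h @ W2 + b2
    err = pred - targets                               # gradient of 0.5 * squared error
    gW2 = h.T @ err / len(coords); gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)
    gW1 = coords.T @ dh / len(coords); gb1 = dh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

# The network now encodes the image: querying every coordinate reconstructs it.
recon = (np.tanh(coords @ W1 + b1) @ W2 + b2).reshape(H, W, 3)
print(float(np.mean((recon - image) ** 2)))            # MSE shrinks as the network fits the image
```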