July 29, 2019

2972 words 14 mins read

Paper Group AWR 189

Computer vision-based food calorie estimation: dataset, method, and experiment. A Statistical Comparison of Some Theories of NP Word Order. Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks. Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning. Extreme 3D Face Reconstr …

Computer vision-based food calorie estimation: dataset, method, and experiment

Title Computer vision-based food calorie estimation: dataset, method, and experiment
Authors Yanchao Liang, Jianhua Li
Abstract Computer vision has been introduced to estimate calories from food images, but current food image data sets do not contain volume and mass records of foods, which leads to incomplete calorie estimation. In this paper, we present a novel food image data set with volume and mass records of foods, together with a deep learning method for food detection, to enable complete calorie estimation. Our data set includes 2978 images; every image contains annotations, volume and mass records for each food, as well as a calibration reference. To estimate the calories of the food in the proposed data set, a deep learning method based on Faster R-CNN is first used to detect the food. The experimental results show that our method estimates calories effectively and that our data set contains adequate information for calorie estimation. Our data set is the first released food image data set that can be used to evaluate computer vision-based calorie estimation methods.
Tasks Calibration
Published 2017-05-22
URL http://arxiv.org/abs/1705.07632v3
PDF http://arxiv.org/pdf/1705.07632v3.pdf
PWC https://paperswithcode.com/paper/computer-vision-based-food-calorie-estimation
Repo https://github.com/Yiming-Miao/Calorie-Predictor
Framework none
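
To make the estimation pipeline concrete, here is a hedged, stand-alone sketch of the calorie-from-detection step the abstract describes: a detector such as Faster R-CNN yields a food region, a calibration reference of known physical size fixes the pixel-to-centimetre scale, and calories follow from an estimated volume, a density, and an energy density. All names, table values, and the crude area-times-thickness volume model below are illustrative assumptions, not the paper's method.

```python
# Illustrative calorie-from-volume step; values are placeholders, not the paper's numbers.
FOOD_TABLE = {
    # food: (density in g/cm^3, kcal per gram) -- placeholder values
    "apple": (0.78, 0.52),
    "banana": (0.94, 0.89),
}

def pixel_scale(calib_px_width: float, calib_cm_width: float) -> float:
    """Centimetres per pixel, from a calibration reference of known width."""
    return calib_cm_width / calib_px_width

def estimate_calories(food: str, area_px: float, thickness_px: float,
                      cm_per_px: float) -> float:
    """Very rough volume model: detected area times an assumed thickness."""
    density, kcal_per_g = FOOD_TABLE[food]
    area_cm2 = area_px * cm_per_px ** 2
    thickness_cm = thickness_px * cm_per_px
    volume_cm3 = area_cm2 * thickness_cm
    mass_g = volume_cm3 * density
    return mass_g * kcal_per_g

if __name__ == "__main__":
    cm_per_px = pixel_scale(calib_px_width=120, calib_cm_width=2.5)  # e.g. a coin
    print(estimate_calories("apple", area_px=9000, thickness_px=90, cm_per_px=cm_per_px))
```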

A Statistical Comparison of Some Theories of NP Word Order

Title A Statistical Comparison of Some Theories of NP Word Order
Authors Richard Futrell, Roger Levy, Matthew Dryer
Abstract A frequent object of study in linguistic typology is the order of elements {demonstrative, adjective, numeral, noun} in the noun phrase. The goal is to predict the relative frequencies of these orders across languages. Here we use Poisson regression to statistically compare some prominent accounts of this variation. We compare feature systems derived from Cinque (2005) to feature systems given in Cysouw (2010) and Dryer (in prep). In this setting, we do not find clear reasons to prefer the model of Cinque (2005) or Dryer (in prep), but we find both of these models have substantially better fit to the typological data than the model from Cysouw (2010).
Tasks
Published 2017-09-08
URL http://arxiv.org/abs/1709.02783v1
PDF http://arxiv.org/pdf/1709.02783v1.pdf
PWC https://paperswithcode.com/paper/a-statistical-comparison-of-some-theories-of
Repo https://github.com/psejenks/berkeley_u20
Framework none
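
A minimal sketch of the statistical setup described above, assuming the standard statsmodels GLM API: counts of languages per noun-phrase order are regressed on binary, theory-derived features with a Poisson likelihood, and competing feature systems are then compared by fit. The counts and features below are invented for illustration only; the paper uses the full set of attested orders and the actual feature systems.

```python
import numpy as np
import statsmodels.api as sm

# counts of languages per NP order (illustrative, not the typological data)
counts = np.array([120, 50, 3, 40, 8, 1])

# hypothetical feature matrix: one row per order, one column per theory-derived feature
features = np.array([
    [1, 0],
    [1, 1],
    [0, 0],
    [1, 0],
    [0, 1],
    [0, 0],
])

X = sm.add_constant(features)
model = sm.GLM(counts, X, family=sm.families.Poisson()).fit()
print(model.summary())
# Competing feature systems can then be compared by fit, e.g. log-likelihood or AIC.
print("AIC:", model.aic)
```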

Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks

Title Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks
Authors Brenden M. Lake, Marco Baroni
Abstract Humans can understand and produce new utterances effortlessly, thanks to their compositional skills. Once a person learns the meaning of a new verb “dax,” he or she can immediately understand the meaning of “dax twice” or “sing and dax.” In this paper, we introduce the SCAN domain, consisting of a set of simple compositional navigation commands paired with the corresponding action sequences. We then test the zero-shot generalization capabilities of a variety of recurrent neural networks (RNNs) trained on SCAN with sequence-to-sequence methods. We find that RNNs can make successful zero-shot generalizations when the differences between training and test commands are small, so that they can apply “mix-and-match” strategies to solve the task. However, when generalization requires systematic compositional skills (as in the “dax” example above), RNNs fail spectacularly. We conclude with a proof-of-concept experiment in neural machine translation, suggesting that lack of systematicity might be partially responsible for neural networks’ notorious training data thirst.
Tasks Machine Translation
Published 2017-10-31
URL http://arxiv.org/abs/1711.00350v3
PDF http://arxiv.org/pdf/1711.00350v3.pdf
PWC https://paperswithcode.com/paper/generalization-without-systematicity-on-the
Repo https://github.com/brendenlake/SCAN
Framework none
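
For a sense of what the SCAN data looks like, here is a toy interpreter for a small fragment of SCAN-style commands; the real grammar in the linked repo also handles "around", "opposite", "after", directions, and more, so this is only an illustrative sketch of the command-to-action-sequence pairing.

```python
from typing import List

PRIMITIVES = {"walk": ["I_WALK"], "run": ["I_RUN"], "jump": ["I_JUMP"], "look": ["I_LOOK"]}

def interpret(command: str) -> List[str]:
    tokens = command.split()
    if "and" in tokens:                             # "X and Y" -> actions(X) + actions(Y)
        i = tokens.index("and")
        return interpret(" ".join(tokens[:i])) + interpret(" ".join(tokens[i + 1:]))
    if tokens[-1] == "twice":
        return 2 * interpret(" ".join(tokens[:-1]))
    if tokens[-1] == "thrice":
        return 3 * interpret(" ".join(tokens[:-1]))
    return PRIMITIVES[tokens[0]]

print(interpret("jump twice"))            # ['I_JUMP', 'I_JUMP']
print(interpret("walk thrice and jump"))  # ['I_WALK', 'I_WALK', 'I_WALK', 'I_JUMP']
```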

Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning

Title Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning
Authors Abhishek Das, Satwik Kottur, José M. F. Moura, Stefan Lee, Dhruv Batra
Abstract We introduce the first goal-driven training for visual question answering and dialog agents. Specifically, we pose a cooperative ‘image guessing’ game between two agents – Qbot and Abot – who communicate in natural language dialog so that Qbot can select an unseen image from a lineup of images. We use deep reinforcement learning (RL) to learn the policies of these agents end-to-end – from pixels to multi-agent multi-round dialog to game reward. We demonstrate two experimental results. First, as a ‘sanity check’ demonstration of pure RL (from scratch), we show results on a synthetic world, where the agents communicate in ungrounded vocabulary, i.e., symbols with no pre-specified meanings (X, Y, Z). We find that two bots invent their own communication protocol and start using certain symbols to ask/answer about certain visual attributes (shape/color/style). Thus, we demonstrate the emergence of grounded language and communication among ‘visual’ dialog agents with no human supervision. Second, we conduct large-scale real-image experiments on the VisDial dataset, where we pretrain with supervised dialog data and show that the RL ‘fine-tuned’ agents significantly outperform SL agents. Interestingly, the RL Qbot learns to ask questions that Abot is good at, ultimately resulting in more informative dialog and a better team.
Tasks Visual Dialog, Visual Question Answering
Published 2017-03-20
URL http://arxiv.org/abs/1703.06585v2
PDF http://arxiv.org/pdf/1703.06585v2.pdf
PWC https://paperswithcode.com/paper/learning-cooperative-visual-dialog-agents
Repo https://github.com/batra-mlp-lab/visdial-rl
Framework pytorch
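
A schematic sketch of the shared-reward policy-gradient update the abstract describes: both agents receive the same per-round reward (e.g., how much Qbot's image guess improves), and each agent's utterance log-probabilities are reinforced with it (REINFORCE). Function and variable names are hypothetical; the agents' architectures and the reward computation are omitted.

```python
import torch

def reinforce_loss(logprobs_q, logprobs_a, rewards):
    """logprobs_*: per-round sums of token log-probs for Qbot / Abot utterances.
    rewards: per-round improvement of Qbot's image guess (higher is better)."""
    rewards = rewards.detach()                     # the reward itself is not differentiated
    loss_q = -(logprobs_q * rewards).sum()         # reinforce Qbot's questions
    loss_a = -(logprobs_a * rewards).sum()         # reinforce Abot's answers
    return loss_q + loss_a

# toy shapes: 10 dialog rounds
lp_q = torch.randn(10, requires_grad=True)
lp_a = torch.randn(10, requires_grad=True)
r = torch.randn(10)
reinforce_loss(lp_q, lp_a, r).backward()
```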

Extreme 3D Face Reconstruction: Seeing Through Occlusions

Title Extreme 3D Face Reconstruction: Seeing Through Occlusions
Authors Anh Tuan Tran, Tal Hassner, Iacopo Masi, Eran Paz, Yuval Nirkin, Gerard Medioni
Abstract Existing single view, 3D face reconstruction methods can produce beautifully detailed 3D results, but typically only for near frontal, unobstructed viewpoints. We describe a system designed to provide detailed 3D reconstructions of faces viewed under extreme conditions, out of plane rotations, and occlusions. Motivated by the concept of bump mapping, we propose a layered approach which decouples estimation of a global shape from its mid-level details (e.g., wrinkles). We estimate a coarse 3D face shape which acts as a foundation and then separately layer this foundation with details represented by a bump map. We show how a deep convolutional encoder-decoder can be used to estimate such bump maps. We further show how this approach naturally extends to generate plausible details for occluded facial regions. We test our approach and its components extensively, quantitatively demonstrating the invariance of our estimated facial details. We further provide numerous qualitative examples showing that our method produces detailed 3D face shapes in viewing conditions where existing state-of-the-art methods often break down.
Tasks 3D Face Reconstruction, Face Reconstruction
Published 2017-12-14
URL http://arxiv.org/abs/1712.05083v2
PDF http://arxiv.org/pdf/1712.05083v2.pdf
PWC https://paperswithcode.com/paper/extreme-3d-face-reconstruction-seeing-through
Repo https://github.com/anhttran/extreme_3d_faces
Framework pytorch
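
A small sketch of the "foundation plus bump map" idea under simplifying assumptions: the bump map is treated as a per-pixel depth offset added to the coarse face depth, and the result is back-projected with a pinhole camera into a detailed 3D surface. The paper's actual formulation (bump maps defined relative to the rendered coarse shape, encoder-decoder prediction, occlusion in-painting) is not reproduced here.

```python
import numpy as np

def layer_and_unproject(coarse_depth, bump, f=500.0):
    """Add the bump map to the coarse depth, then back-project every pixel to a
    3D point with a simple pinhole camera (focal length f, principal point at
    the image centre). The bump is simplified to an offset along the viewing ray."""
    detailed = coarse_depth + bump
    h, w = detailed.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - w / 2.0) * detailed / f
    y = (v - h / 2.0) * detailed / f
    return np.stack([x, y, detailed], axis=-1)        # (H, W, 3) detailed face surface

coarse = np.full((128, 128), 1000.0)                  # stand-in coarse depth (mm)
bump = 2.0 * np.random.randn(128, 128)                # stand-in wrinkle-scale offsets
points = layer_and_unproject(coarse, bump)
```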

Dataflow Matrix Machines and V-values: a Bridge between Programs and Neural Nets

Title Dataflow Matrix Machines and V-values: a Bridge between Programs and Neural Nets
Authors Michael Bukatin, Jon Anthony
Abstract 1) Dataflow matrix machines (DMMs) generalize neural nets by replacing streams of numbers with linear streams (streams supporting linear combinations), allowing arbitrary input and output arities for activation functions, countable-sized networks with finite dynamically changeable active part capable of unbounded growth, and a very expressive self-referential mechanism. 2) DMMs are suitable for general-purpose programming, while retaining the key property of recurrent neural networks: programs are expressed via matrices of real numbers, and continuous changes to those matrices produce arbitrarily small variations in the associated programs. 3) Spaces of V-values (vector-like elements based on nested maps) are particularly useful, enabling DMMs with variadic activation functions and conveniently representing conventional data structures.
Tasks
Published 2017-12-20
URL http://arxiv.org/abs/1712.07447v2
PDF http://arxiv.org/pdf/1712.07447v2.pdf
PWC https://paperswithcode.com/paper/dataflow-matrix-machines-and-v-values-a
Repo https://github.com/jsa-aerial/DMM
Framework none
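
An illustrative Python sketch of the "linear combination of nested maps" idea behind V-values: values are arbitrarily nested dictionaries with numeric leaves, and addition and scalar multiplication recurse over keys, which is what makes them usable as linear streams. The actual DMM repo is written in Clojure; this fragment is only a conceptual analogue.

```python
def scale(v, a):
    """Multiply every numeric leaf of a nested map by the scalar a."""
    if isinstance(v, dict):
        return {k: scale(x, a) for k, x in v.items()}
    return a * v

def add(u, v):
    """Key-wise recursive addition; missing keys act as zero."""
    if isinstance(u, dict) or isinstance(v, dict):
        u = u if isinstance(u, dict) else {}
        v = v if isinstance(v, dict) else {}
        return {k: add(u.get(k, 0), v.get(k, 0)) for k in set(u) | set(v)}
    return u + v

x = {"args": {"a": 1.0, "b": 2.0}, "bias": 0.5}
y = {"args": {"b": 3.0}, "extra": {"c": 1.0}}
print(add(scale(x, 0.5), y))   # the linear combination 0.5*x + y, itself a nested map
```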

Automated cardiovascular magnetic resonance image analysis with fully convolutional networks

Title Automated cardiovascular magnetic resonance image analysis with fully convolutional networks
Authors Wenjia Bai, Matthew Sinclair, Giacomo Tarroni, Ozan Oktay, Martin Rajchl, Ghislain Vaillant, Aaron M. Lee, Nay Aung, Elena Lukaschuk, Mihir M. Sanghvi, Filip Zemrak, Kenneth Fung, Jose Miguel Paiva, Valentina Carapella, Young Jin Kim, Hideaki Suzuki, Bernhard Kainz, Paul M. Matthews, Steffen E. Petersen, Stefan K. Piechnik, Stefan Neubauer, Ben Glocker, Daniel Rueckert
Abstract Cardiovascular magnetic resonance (CMR) imaging is a standard imaging modality for assessing cardiovascular diseases (CVDs), the leading cause of death globally. CMR enables accurate quantification of the cardiac chamber volume, ejection fraction and myocardial mass, providing information for diagnosis and monitoring of CVDs. However, for years, clinicians have been relying on manual approaches for CMR image analysis, which is time consuming and prone to subjective errors. It is a major clinical challenge to automatically derive quantitative and clinically relevant information from CMR images. Deep neural networks have shown a great potential in image pattern recognition and segmentation for a variety of tasks. Here we demonstrate an automated analysis method for CMR images, which is based on a fully convolutional network (FCN). The network is trained and evaluated on a large-scale dataset from the UK Biobank, consisting of 4,875 subjects with 93,500 pixelwise annotated images. The performance of the method has been evaluated using a number of technical metrics, including the Dice metric, mean contour distance and Hausdorff distance, as well as clinically relevant measures, including left ventricle (LV) end-diastolic volume (LVEDV) and end-systolic volume (LVESV), LV mass (LVM); right ventricle (RV) end-diastolic volume (RVEDV) and end-systolic volume (RVESV). By combining FCN with a large-scale annotated dataset, the proposed automated method achieves a high performance on par with human experts in segmenting the LV and RV on short-axis CMR images and the left atrium (LA) and right atrium (RA) on long-axis CMR images.
Tasks
Published 2017-10-25
URL http://arxiv.org/abs/1710.09289v4
PDF http://arxiv.org/pdf/1710.09289v4.pdf
PWC https://paperswithcode.com/paper/automated-cardiovascular-magnetic-resonance
Repo https://github.com/baiwenjia/ukbb_cardiac
Framework tf
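
The Dice metric cited above as an evaluation measure is easy to state in code; below is a generic implementation for binary segmentation masks (a hedged illustration, not the repo's exact code).

```python
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """Dice overlap of two masks of the same shape: 2|A∩B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    return (2.0 * intersection + eps) / (pred.sum() + gt.sum() + eps)

a = np.zeros((64, 64), dtype=bool); a[10:40, 10:40] = True
b = np.zeros((64, 64), dtype=bool); b[15:45, 15:45] = True
print(f"Dice = {dice(a, b):.3f}")
```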

Audio to Body Dynamics

Title Audio to Body Dynamics
Authors Eli Shlizerman, Lucio M. Dery, Hayden Schoen, Ira Kemelmacher-Shlizerman
Abstract We present a method that takes as input audio of violin or piano playing and outputs a video of skeleton predictions, which are then used to animate an avatar. The key idea is to create, from audio alone, an animation of an avatar that moves its hands the way a pianist or violinist would. Fully detailed, correct arm and finger motion is the ultimate goal, but it is not even clear whether body movement can be predicted from music at all; in this paper we present the first result showing that natural body dynamics can indeed be predicted. We built an LSTM network trained on violin and piano recital videos uploaded to the Internet, and the predicted points are applied to a rigged avatar to create the animation.
Tasks
Published 2017-12-19
URL http://arxiv.org/abs/1712.09382v1
PDF http://arxiv.org/pdf/1712.09382v1.pdf
PWC https://paperswithcode.com/paper/audio-to-body-dynamics
Repo https://github.com/facebookresearch/Audio2BodyDynamics
Framework pytorch
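
A hedged sketch of the kind of model the abstract describes: an LSTM maps per-frame audio features to per-frame keypoint coordinates that can drive a rigged avatar. The feature type, dimensions, and loss here are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class Audio2Body(nn.Module):
    def __init__(self, audio_dim=28, hidden=200, n_keypoints=50):
        super().__init__()
        self.lstm = nn.LSTM(audio_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_keypoints * 2)   # (x, y) per keypoint

    def forward(self, audio_feats):                     # (B, T, audio_dim)
        h, _ = self.lstm(audio_feats)
        return self.out(h)                              # (B, T, 2 * n_keypoints)

model = Audio2Body()
feats = torch.randn(4, 300, 28)                         # 4 clips, 300 audio frames each
pred = model(feats)                                     # predicted keypoints per frame
loss = nn.functional.mse_loss(pred, torch.randn_like(pred))  # toy regression target
loss.backward()
```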

Sharpness-aware Low dose CT denoising using conditional generative adversarial network

Title Sharpness-aware Low dose CT denoising using conditional generative adversarial network
Authors Xin Yi, Paul Babyn
Abstract Low Dose Computed Tomography (LDCT) offers tremendous benefits in radiation-restricted applications, but the quantum noise resulting from the insufficient number of photons can harm diagnostic performance. Current image-based denoising methods tend to blur the final reconstructed results, especially at high noise levels. In this paper, a deep learning based approach is proposed to mitigate this problem: an adversarially trained network and a sharpness detection network guide the training process. Experiments on both simulated and real datasets show that the results of the proposed method have very little resolution loss and achieve better performance than state-of-the-art methods, both quantitatively and visually.
Tasks Denoising
Published 2017-08-22
URL http://arxiv.org/abs/1708.06453v2
PDF http://arxiv.org/pdf/1708.06453v2.pdf
PWC https://paperswithcode.com/paper/sharpness-aware-low-dose-ct-denoising-using
Repo https://github.com/xinario/SAGAN
Framework torch
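
A schematic sketch of the loss composition suggested by the abstract: an adversarial term from a conditional discriminator plus a term that matches a sharpness measure between the denoised output and the high-dose target, to counteract over-smoothing. The sharpness "network" below is replaced by a toy gradient-magnitude proxy, and the weights are placeholders, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def sharpness_map(x):
    """Toy stand-in for a learned sharpness-detection network: mean local
    gradient magnitude of an image batch x of shape (B, 1, H, W)."""
    dx = (x[..., :, 1:] - x[..., :, :-1]).abs().mean()
    dy = (x[..., 1:, :] - x[..., :-1, :]).abs().mean()
    return dx + dy

def generator_loss(denoised, target, disc_fake_logits,
                   lambda_adv=1.0, lambda_sharp=1.0):
    # adversarial term: the conditional discriminator should judge the denoised
    # image to be a real high-dose image
    adv = F.binary_cross_entropy_with_logits(
        disc_fake_logits, torch.ones_like(disc_fake_logits))
    # sharpness term: penalise loss of sharpness relative to the high-dose target
    sharp = F.l1_loss(sharpness_map(denoised), sharpness_map(target))
    return lambda_adv * adv + lambda_sharp * sharp

denoised = torch.rand(2, 1, 64, 64, requires_grad=True)    # generator output (toy)
target = torch.rand(2, 1, 64, 64)                          # high-dose reference (toy)
logits = torch.randn(2, 1)                                 # discriminator output (toy)
generator_loss(denoised, target, logits).backward()
```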

A Brief Introduction to Machine Learning for Engineers

Title A Brief Introduction to Machine Learning for Engineers
Authors Osvaldo Simeone
Abstract This monograph aims at providing an introduction to key concepts, algorithms, and theoretical results in machine learning. The treatment concentrates on probabilistic models for supervised and unsupervised learning problems. It introduces fundamental concepts and algorithms by building on first principles, while also exposing the reader to more advanced topics with extensive pointers to the literature, within a unified notation and mathematical framework. The material is organized according to clearly defined categories, such as discriminative and generative models, frequentist and Bayesian approaches, exact and approximate inference, as well as directed and undirected models. This monograph is meant as an entry point for researchers with a background in probability and linear algebra.
Tasks
Published 2017-09-08
URL http://arxiv.org/abs/1709.02840v3
PDF http://arxiv.org/pdf/1709.02840v3.pdf
PWC https://paperswithcode.com/paper/a-brief-introduction-to-machine-learning-for
Repo https://github.com/leojang/Notes
Framework tf

WERd: Using Social Text Spelling Variants for Evaluating Dialectal Speech Recognition

Title WERd: Using Social Text Spelling Variants for Evaluating Dialectal Speech Recognition
Authors Ahmed Ali, Preslav Nakov, Peter Bell, Steve Renals
Abstract We study the problem of evaluating automatic speech recognition (ASR) systems that target dialectal speech input. A major challenge in this case is that the orthography of dialects is typically not standardized. From an ASR evaluation perspective, this means that there is no clear gold standard for the expected output, and several possible outputs could be considered correct according to different human annotators, which makes standard word error rate (WER) inadequate as an evaluation metric. Such a situation is typical for machine translation (MT), and thus we borrow ideas from an MT evaluation metric, namely TERp, an extension of translation error rate which is closely-related to WER. In particular, in the process of comparing a hypothesis to a reference, we make use of spelling variants for words and phrases, which we mine from Twitter in an unsupervised fashion. Our experiments with evaluating ASR output for Egyptian Arabic, and further manual analysis, show that the resulting WERd (i.e., WER for dialects) metric, a variant of TERp, is more adequate than WER for evaluating dialectal ASR.
Tasks Machine Translation, Speech Recognition
Published 2017-09-21
URL http://arxiv.org/abs/1709.07484v1
PDF http://arxiv.org/pdf/1709.07484v1.pdf
PWC https://paperswithcode.com/paper/werd-using-social-text-spelling-variants-for
Repo https://github.com/qcri/werd
Framework none
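
A hedged sketch of the core idea behind WERd: compute a word-level edit distance, as for ordinary WER, but count a hypothesis word as correct when it is a known (Twitter-mined) spelling variant of the reference word. The variant table is a toy placeholder, and the real metric builds on TERp rather than this simplified dynamic program.

```python
def wer_d(ref, hyp, variants):
    """ref, hyp: lists of words. variants: dict word -> set of acceptable spellings."""
    def match(r, h):
        return h == r or h in variants.get(r, set())
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if match(ref[i - 1], hyp[j - 1]) else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # (variant-aware) substitution
    return d[n][m] / max(n, 1)

variants = {"going": {"goin", "gonna"}}
print(wer_d("i am going home".split(), "i am goin home".split(), variants))  # 0.0
```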

Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search

Title Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search
Authors Chris Hokamp, Qun Liu
Abstract We present Grid Beam Search (GBS), an algorithm which extends beam search to allow the inclusion of pre-specified lexical constraints. The algorithm can be used with any model that generates a sequence $\mathbf{\hat{y}} = \{y_{0} \ldots y_{T}\}$ by maximizing $p(\mathbf{y} \mid \mathbf{x}) = \prod_{t} p(y_{t} \mid \mathbf{x}; \{y_{0} \ldots y_{t-1}\})$. Lexical constraints take the form of phrases or words that must be present in the output sequence. This is a very general way to incorporate additional knowledge into a model’s output without requiring any modification of the model parameters or training data. We demonstrate the feasibility and flexibility of Lexically Constrained Decoding by conducting experiments on Neural Interactive-Predictive Translation, as well as Domain Adaptation for Neural Machine Translation. Experiments show that GBS can provide large improvements in translation quality in interactive scenarios, and that, even without any user input, GBS can be used to achieve significant gains in performance in domain adaptation scenarios.
Tasks Domain Adaptation, Machine Translation
Published 2017-04-24
URL http://arxiv.org/abs/1704.07138v2
PDF http://arxiv.org/pdf/1704.07138v2.pdf
PWC https://paperswithcode.com/paper/lexically-constrained-decoding-for-sequence
Repo https://github.com/chrishokamp/constrained_decoding
Framework none
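
A simplified sketch of the grid organisation behind GBS: beams are indexed by (timestep, number of constraint tokens placed), each hypothesis can either generate freely (staying in its row) or emit an unplaced constraint token (moving up a row), and only hypotheses in the top row may finish, so every constraint ends up in the output. The scorer is a toy stand-in for a real model, and constraints are single tokens for brevity; the paper's algorithm handles multi-token phrase constraints.

```python
import math

def grid_beam_search(score_next, constraints, max_len=10, beam=3, eos="</s>"):
    """score_next(prefix) -> dict token -> log-prob (toy stand-in for the model)."""
    C = len(constraints)
    # grid[c]: hypotheses that have placed c constraint tokens, as (logprob, tokens, used)
    grid = {c: [] for c in range(C + 1)}
    grid[0] = [(0.0, [], frozenset())]
    finished = []
    for _ in range(max_len):
        new_grid = {c: [] for c in range(C + 1)}
        for c in range(C + 1):
            for lp, toks, used in grid[c]:
                if toks and toks[-1] == eos:
                    continue                       # finished (or dead) hypothesis
                scores = score_next(toks)
                for tok, s in scores.items():      # open generation: stay in row c
                    new_grid[c].append((lp + s, toks + [tok], used))
                if c < C:                          # place an unused constraint: row c+1
                    for i, tok in enumerate(constraints):
                        if i not in used:
                            s = scores.get(tok, math.log(1e-6))
                            new_grid[c + 1].append((lp + s, toks + [tok], used | {i}))
        for c in new_grid:
            grid[c] = sorted(new_grid[c], key=lambda h: -h[0])[:beam]
        finished += [h for h in grid[C] if h[1] and h[1][-1] == eos]
    best = max(finished or grid[C], key=lambda h: h[0], default=None)
    return best[1] if best else None

def toy_scorer(prefix):
    probs = {"the": 0.2, "cat": 0.2, "sat": 0.2, "</s>": 0.4}
    return {t: math.log(p) for t, p in probs.items()}

print(grid_beam_search(toy_scorer, constraints=["cat"], max_len=6, beam=3))
# -> ['cat', '</s>'] : the constraint token is guaranteed to appear in the output
```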

Deep Learning with Dynamic Computation Graphs

Title Deep Learning with Dynamic Computation Graphs
Authors Moshe Looks, Marcello Herreshoff, DeLesley Hutchins, Peter Norvig
Abstract Neural networks that compute over graph structures are a natural fit for problems in a variety of domains, including natural language (parse trees) and cheminformatics (molecular graphs). However, since the computation graph has a different shape and size for every input, such networks do not directly support batched training or inference. They are also difficult to implement in popular deep learning libraries, which are based on static data-flow graphs. We introduce a technique called dynamic batching, which not only batches together operations between different input graphs of dissimilar shape, but also between different nodes within a single input graph. The technique allows us to create static graphs, using popular libraries, that emulate dynamic computation graphs of arbitrary shape and size. We further present a high-level library of compositional blocks that simplifies the creation of dynamic graph models. Using the library, we demonstrate concise and batch-wise parallel implementations for a variety of models from the literature.
Tasks
Published 2017-02-07
URL http://arxiv.org/abs/1702.02181v2
PDF http://arxiv.org/pdf/1702.02181v2.pdf
PWC https://paperswithcode.com/paper/deep-learning-with-dynamic-computation-graphs
Repo https://github.com/tensorflow/fold
Framework tf
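
A toy numpy illustration of the dynamic-batching principle: although every input tree has a different shape, all nodes applying the same operation at the same depth can be evaluated in one batched matrix multiply. TensorFlow Fold does this by rewriting the static graph with gather/concat ops; this sketch only shows the scheduling idea, with made-up shapes.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
DIM = 8
W_leaf = rng.standard_normal((4, DIM))          # embeds a 4-dim leaf feature
W_merge = rng.standard_normal((2 * DIM, DIM))   # merges two child embeddings

def leaf():
    return rng.standard_normal(4)

def depth(node):
    return 0 if isinstance(node, np.ndarray) else 1 + max(depth(c) for c in node)

def embed_forest(trees):
    """trees: nested 2-tuples whose leaves are 4-dim numpy vectors."""
    emb = {}                                    # id(node) -> embedding vector
    buckets = defaultdict(list)                 # bucket every node by its depth
    def visit(node):
        buckets[depth(node)].append(node)
        if not isinstance(node, np.ndarray):
            for c in node:
                visit(c)
    for t in trees:
        visit(t)
    # one batched matmul per depth level instead of one op per node
    for d in sorted(buckets):
        nodes = buckets[d]
        if d == 0:                              # all leaves of all trees at once
            batch = np.stack(nodes) @ W_leaf
        else:                                   # all merges at this depth at once
            batch = np.stack([np.concatenate([emb[id(l)], emb[id(r)]])
                              for l, r in nodes]) @ W_merge
        batch = np.tanh(batch)
        for node, vec in zip(nodes, batch):
            emb[id(node)] = vec
    return [emb[id(t)] for t in trees]

trees = [(leaf(), (leaf(), leaf())),
         ((leaf(), leaf()), (leaf(), (leaf(), leaf())))]
print([o.shape for o in embed_forest(trees)])   # one 8-dim embedding per input tree
```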

On-the-fly Operation Batching in Dynamic Computation Graphs

Title On-the-fly Operation Batching in Dynamic Computation Graphs
Authors Graham Neubig, Yoav Goldberg, Chris Dyer
Abstract Dynamic neural network toolkits such as PyTorch, DyNet, and Chainer offer more flexibility for implementing models that cope with data of varying dimensions and structure, relative to toolkits that operate on statically declared computations (e.g., TensorFlow, CNTK, and Theano). However, existing toolkits - both static and dynamic - require that the developer organize the computations into the batches necessary for exploiting high-performance algorithms and hardware. This batching task is generally difficult, but it becomes a major hurdle as architectures become complex. In this paper, we present an algorithm, and its implementation in the DyNet toolkit, for automatically batching operations. Developers simply write minibatch computations as aggregations of single instance computations, and the batching algorithm seamlessly executes them, on the fly, using computationally efficient batched operations. On a variety of tasks, we obtain throughput similar to that obtained with manual batches, as well as comparable speedups over single-instance learning on architectures that are impractical to batch manually.
Tasks
Published 2017-05-22
URL http://arxiv.org/abs/1705.07860v1
PDF http://arxiv.org/pdf/1705.07860v1.pdf
PWC https://paperswithcode.com/paper/on-the-fly-operation-batching-in-dynamic
Repo https://github.com/bplank/bilstm-aux
Framework none
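
A hedged usage sketch of the programming pattern the paper enables, assuming DyNet's Python API (ParameterCollection, esum, pickneglogsoftmax) roughly as in the 2.x documentation: losses are written per instance, aggregated with esum, and the toolkit's autobatching (enabled at runtime, e.g. via the --dynet-autobatch flag; check the DyNet docs) batches the underlying operations on the fly. Shapes and data are toy placeholders.

```python
import random
import dynet as dy

pc = dy.ParameterCollection()
W = pc.add_parameters((32, 16))
b = pc.add_parameters((32,))
V = pc.add_parameters((5, 32))
trainer = dy.SimpleSGDTrainer(pc)

def instance_loss(x, y):
    """Loss for a single training instance -- no manual batching anywhere."""
    h = dy.tanh(dy.parameter(W) * dy.inputVector(x) + dy.parameter(b))
    return dy.pickneglogsoftmax(dy.parameter(V) * h, y)

data = [([random.random() for _ in range(16)], random.randrange(5)) for _ in range(64)]
for _ in range(3):                                    # a few toy epochs
    dy.renew_cg()
    losses = [instance_loss(x, y) for x, y in data]   # single-instance computations
    total = dy.esum(losses)                           # aggregation; autobatching groups the ops
    total.forward()
    total.backward()
    trainer.update()
```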

Training Deep AutoEncoders for Collaborative Filtering

Title Training Deep AutoEncoders for Collaborative Filtering
Authors Oleksii Kuchaiev, Boris Ginsburg
Abstract This paper proposes a novel model for the rating prediction task in recommender systems which significantly outperforms previous state-of-the-art models on a time-split Netflix data set. Our model is based on a deep autoencoder with 6 layers and is trained end-to-end without any layer-wise pre-training. We empirically demonstrate that: a) deep autoencoder models generalize much better than shallow ones, b) non-linear activation functions with negative parts are crucial for training deep models, and c) heavy use of regularization techniques such as dropout is necessary to prevent over-fitting. We also propose a new training algorithm based on iterative output re-feeding to overcome the natural sparseness of collaborative filtering. The new algorithm significantly speeds up training and improves model performance. Our code is available at https://github.com/NVIDIA/DeepRecommender
Tasks Recommendation Systems
Published 2017-08-05
URL http://arxiv.org/abs/1708.01715v3
PDF http://arxiv.org/pdf/1708.01715v3.pdf
PWC https://paperswithcode.com/paper/training-deep-autoencoders-for-collaborative
Repo https://github.com/NVIDIA/DeepRecommender
Framework pytorch
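
A hedged PyTorch sketch of the ingredients listed above: a deep autoencoder over a user's sparse rating vector, an activation with a negative part (SELU), heavy dropout, a loss masked to observed ratings, and one simplified output re-feeding step. Layer sizes, hyperparameters, and the toy data are illustrative, not the paper's configuration; the full implementation is in the linked NVIDIA repo.

```python
import torch
import torch.nn as nn

class RatingAutoEncoder(nn.Module):
    def __init__(self, n_items, hidden=(512, 512, 1024), p_drop=0.8):
        super().__init__()
        dims = (n_items,) + tuple(hidden)
        enc, dec = [], []
        for i in range(len(dims) - 1):
            enc += [nn.Linear(dims[i], dims[i + 1]), nn.SELU()]
        rdims = dims[::-1]
        for i in range(len(rdims) - 1):
            dec += [nn.Linear(rdims[i], rdims[i + 1]), nn.SELU()]
        self.encoder = nn.Sequential(*enc)
        self.dropout = nn.Dropout(p_drop)          # heavy dropout on the code layer
        self.decoder = nn.Sequential(*dec[:-1])    # linear output over item ratings

    def forward(self, x):
        return self.decoder(self.dropout(self.encoder(x)))

def masked_mse(pred, ratings):
    mask = (ratings != 0).float()                  # only observed ratings contribute
    return ((pred - ratings) ** 2 * mask).sum() / mask.sum().clamp(min=1)

model = RatingAutoEncoder(n_items=1000)
opt = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
ratings = torch.randint(0, 6, (8, 1000)).float()   # toy batch; 0 marks "not rated"
pred = model(ratings)
loss = masked_mse(pred, ratings)
# one (simplified) output re-feeding step: the dense reconstruction is treated as a
# new, fully observed training example
refed_target = pred.detach()
loss = loss + ((model(refed_target) - refed_target) ** 2).mean()
loss.backward()
opt.step()
```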