Paper Group AWR 189
Computer vision-based food calorie estimation: dataset, method, and experiment
Title | Computer vision-based food calorie estimation: dataset, method, and experiment |
Authors | Yanchao Liang, Jianhua Li |
Abstract | Computer vision has been introduced to estimate calories from food images. However, current food image data sets do not contain volume and mass records of foods, which leads to incomplete calorie estimation. In this paper, we present a novel food image data set with volume and mass records of foods, and a deep learning method for food detection, to enable complete calorie estimation. Our data set includes 2978 images; every image contains annotations, volume and mass records for each food it depicts, as well as a calibration reference. To estimate the calories of foods in the proposed data set, a deep learning method based on Faster R-CNN is first applied to detect the food. The experimental results show that our method is effective for estimating calories and that our data set contains adequate information for calorie estimation. Our data set is the first released food image data set that can be used to evaluate computer vision-based calorie estimation methods. |
Tasks | Calibration |
Published | 2017-05-22 |
URL | http://arxiv.org/abs/1705.07632v3 |
http://arxiv.org/pdf/1705.07632v3.pdf | |
PWC | https://paperswithcode.com/paper/computer-vision-based-food-calorie-estimation |
Repo | https://github.com/Yiming-Miao/Calorie-Predictor |
Framework | none |
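After detection and volume estimation, the pipeline described in the abstract reduces to a table lookup and two multiplications: calories = volume × density × energy per gram. A minimal sketch of that final conversion step; the density and energy values below are illustrative placeholders, not the paper's measurements.

```python
# Sketch of the volume-to-calorie conversion stage, assuming food detection
# (Faster R-CNN) and calibration-based volume estimation happen upstream.
# Density and energy values are hypothetical placeholders.

FOOD_TABLE = {
    # food: (density in g/cm^3, energy in kcal/g) -- illustrative values
    "apple": (0.78, 0.52),
    "banana": (0.91, 0.89),
}

def estimate_calories(food: str, volume_cm3: float) -> float:
    """kcal = volume * density * energy-per-gram."""
    density, kcal_per_gram = FOOD_TABLE[food]
    return volume_cm3 * density * kcal_per_gram

# Example: a detected apple with an estimated volume of 180 cm^3.
print(f"{estimate_calories('apple', 180.0):.1f} kcal")  # ~73.0 kcal
```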
A Statistical Comparison of Some Theories of NP Word Order
Title | A Statistical Comparison of Some Theories of NP Word Order |
Authors | Richard Futrell, Roger Levy, Matthew Dryer |
Abstract | A frequent object of study in linguistic typology is the order of elements {demonstrative, adjective, numeral, noun} in the noun phrase. The goal is to predict the relative frequencies of these orders across languages. Here we use Poisson regression to statistically compare some prominent accounts of this variation. We compare feature systems derived from Cinque (2005) to feature systems given in Cysouw (2010) and Dryer (in prep). In this setting, we do not find clear reasons to prefer the model of Cinque (2005) or Dryer (in prep), but we find both of these models have substantially better fit to the typological data than the model from Cysouw (2010). |
Tasks | |
Published | 2017-09-08 |
URL | http://arxiv.org/abs/1709.02783v1 |
http://arxiv.org/pdf/1709.02783v1.pdf | |
PWC | https://paperswithcode.com/paper/a-statistical-comparison-of-some-theories-of |
Repo | https://github.com/psejenks/berkeley_u20 |
Framework | none |
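The statistical machinery is Poisson regression over counts of languages attesting each of the 24 possible orders of {demonstrative, adjective, numeral, noun}, with binary predictors derived from each theory's feature system. A toy sketch using statsmodels, with fabricated features and counts standing in for the real feature systems of Cinque (2005), Cysouw (2010), and Dryer (in prep):

```python
# Toy Poisson regression over NP word orders. The design matrix and counts
# are fabricated for illustration; the paper derives its predictors from
# the competing theoretical accounts and compares model fit.
import numpy as np
import statsmodels.api as sm

# Rows = a few of the 24 orders; columns = [intercept, feature_1, feature_2].
X = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
])
counts = np.array([120, 45, 10, 2])  # hypothetical language counts per order

result = sm.GLM(counts, X, family=sm.families.Poisson()).fit()
print(result.params)
# Competing feature systems can then be compared on fit, e.g. via AIC:
print("AIC:", result.aic)
```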
Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks
Title | Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks |
Authors | Brenden M. Lake, Marco Baroni |
Abstract | Humans can understand and produce new utterances effortlessly, thanks to their compositional skills. Once a person learns the meaning of a new verb “dax,” he or she can immediately understand the meaning of “dax twice” or “sing and dax.” In this paper, we introduce the SCAN domain, consisting of a set of simple compositional navigation commands paired with the corresponding action sequences. We then test the zero-shot generalization capabilities of a variety of recurrent neural networks (RNNs) trained on SCAN with sequence-to-sequence methods. We find that RNNs can make successful zero-shot generalizations when the differences between training and test commands are small, so that they can apply “mix-and-match” strategies to solve the task. However, when generalization requires systematic compositional skills (as in the “dax” example above), RNNs fail spectacularly. We conclude with a proof-of-concept experiment in neural machine translation, suggesting that lack of systematicity might be partially responsible for neural networks’ notorious training data thirst. |
Tasks | Machine Translation |
Published | 2017-10-31 |
URL | http://arxiv.org/abs/1711.00350v3 |
http://arxiv.org/pdf/1711.00350v3.pdf | |
PWC | https://paperswithcode.com/paper/generalization-without-systematicity-on-the |
Repo | https://github.com/brendenlake/SCAN |
Framework | none |
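SCAN commands have an exact compositional semantics, which is what makes the train/test splits precise: one can hold out, say, every command that composes "jump" with a modifier and test zero-shot generalization. A sketch of an interpreter for a fragment of the command language (the released grammar also covers "around", "opposite", "after", and turning primitives):

```python
# Interpreter for a small fragment of SCAN-style commands: primitives,
# "twice"/"thrice" repetition, and "and" conjunction.

PRIMITIVES = {"jump": "JUMP", "walk": "WALK", "run": "RUN", "look": "LOOK"}
REPEATS = {"twice": 2, "thrice": 3}

def execute(command: str) -> list[str]:
    # "and" concatenates the action sequences of its two conjuncts.
    if " and " in command:
        left, right = command.split(" and ", 1)
        return execute(left) + execute(right)
    tokens = command.split()
    actions = [PRIMITIVES[tokens[0]]]
    if len(tokens) == 2:
        actions = actions * REPEATS[tokens[1]]
    return actions

assert execute("jump twice") == ["JUMP", "JUMP"]
assert execute("walk and jump thrice") == ["WALK", "JUMP", "JUMP", "JUMP"]
# A systematic split: train without "jump"-plus-modifier commands, then
# test whether a seq2seq model generalizes to "jump twice".
```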
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning
Title | Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning |
Authors | Abhishek Das, Satwik Kottur, José M. F. Moura, Stefan Lee, Dhruv Batra |
Abstract | We introduce the first goal-driven training for visual question answering and dialog agents. Specifically, we pose a cooperative ‘image guessing’ game between two agents – Qbot and Abot – who communicate in natural language dialog so that Qbot can select an unseen image from a lineup of images. We use deep reinforcement learning (RL) to learn the policies of these agents end-to-end – from pixels to multi-agent multi-round dialog to game reward. We demonstrate two experimental results. First, as a ‘sanity check’ demonstration of pure RL (from scratch), we show results on a synthetic world, where the agents communicate in ungrounded vocabulary, i.e., symbols with no pre-specified meanings (X, Y, Z). We find that two bots invent their own communication protocol and start using certain symbols to ask/answer about certain visual attributes (shape/color/style). Thus, we demonstrate the emergence of grounded language and communication among ‘visual’ dialog agents with no human supervision. Second, we conduct large-scale real-image experiments on the VisDial dataset, where we pretrain with supervised dialog data and show that the RL ‘fine-tuned’ agents significantly outperform SL agents. Interestingly, the RL Qbot learns to ask questions that Abot is good at, ultimately resulting in more informative dialog and a better team. |
Tasks | Visual Dialog, Visual Question Answering |
Published | 2017-03-20 |
URL | http://arxiv.org/abs/1703.06585v2 |
http://arxiv.org/pdf/1703.06585v2.pdf | |
PWC | https://paperswithcode.com/paper/learning-cooperative-visual-dialog-agents |
Repo | https://github.com/batra-mlp-lab/visdial-rl |
Framework | pytorch |
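The RL fine-tuning stage is standard policy-gradient training: the per-round reward is the improvement in how close Qbot's predicted image embedding gets to the target, and both agents' sampled utterances are reinforced accordingly. A schematic PyTorch REINFORCE update, with the policies and reward abstracted into hypothetical stand-ins:

```python
# Schematic REINFORCE update for one dialog round. `question_logprob` and
# `answer_logprob` are hypothetical stand-ins for the summed log-probs of
# the sampled utterances; `reward` is the round's change in distance between
# Qbot's image-feature prediction and the target feature.
import torch

def reinforce_round(question_logprob: torch.Tensor,
                    answer_logprob: torch.Tensor,
                    reward: float,
                    optimizer: torch.optim.Optimizer) -> None:
    # Policy gradient: push up log-probs of actions that improved the guess.
    loss = -(question_logprob + answer_logprob) * reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```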
Extreme 3D Face Reconstruction: Seeing Through Occlusions
Title | Extreme 3D Face Reconstruction: Seeing Through Occlusions |
Authors | Anh Tuan Tran, Tal Hassner, Iacopo Masi, Eran Paz, Yuval Nirkin, Gerard Medioni |
Abstract | Existing single view, 3D face reconstruction methods can produce beautifully detailed 3D results, but typically only for near frontal, unobstructed viewpoints. We describe a system designed to provide detailed 3D reconstructions of faces viewed under extreme conditions, out of plane rotations, and occlusions. Motivated by the concept of bump mapping, we propose a layered approach which decouples estimation of a global shape from its mid-level details (e.g., wrinkles). We estimate a coarse 3D face shape which acts as a foundation and then separately layer this foundation with details represented by a bump map. We show how a deep convolutional encoder-decoder can be used to estimate such bump maps. We further show how this approach naturally extends to generate plausible details for occluded facial regions. We test our approach and its components extensively, quantitatively demonstrating the invariance of our estimated facial details. We further provide numerous qualitative examples showing that our method produces detailed 3D face shapes in viewing conditions where existing state-of-the-art methods often break down. |
Tasks | 3D Face Reconstruction, Face Reconstruction |
Published | 2017-12-14 |
URL | http://arxiv.org/abs/1712.05083v2 |
http://arxiv.org/pdf/1712.05083v2.pdf | |
PWC | https://paperswithcode.com/paper/extreme-3d-face-reconstruction-seeing-through |
Repo | https://github.com/anhttran/extreme_3d_faces |
Framework | pytorch |
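The layered representation is additive in depth: a coarse, robustly estimated face shape is the foundation, and the network-predicted bump map perturbs it to add mid-level detail. A numpy sketch of the compositing step (the encoder-decoder that predicts the bump map and the occlusion in-painting are the paper's contributions and are not reproduced here):

```python
# Compositing a coarse depth foundation with a predicted bump map, in the
# spirit of the paper's layered shape representation. Both inputs are assumed
# to be per-pixel-aligned depth maps produced by upstream networks.
import numpy as np

def composite_depth(coarse_depth: np.ndarray,
                    bump: np.ndarray,
                    scale: float = 1.0) -> np.ndarray:
    """Detailed depth = coarse foundation + scaled mid-level detail layer."""
    assert coarse_depth.shape == bump.shape
    return coarse_depth + scale * bump

# Toy example: a flat 4x4 foundation with small wrinkle-like perturbations.
coarse = np.full((4, 4), 10.0)
wrinkles = 0.05 * np.random.randn(4, 4)
print(composite_depth(coarse, wrinkles).round(3))
```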
Dataflow Matrix Machines and V-values: a Bridge between Programs and Neural Nets
Title | Dataflow Matrix Machines and V-values: a Bridge between Programs and Neural Nets |
Authors | Michael Bukatin, Jon Anthony |
Abstract | 1) Dataflow matrix machines (DMMs) generalize neural nets by replacing streams of numbers with linear streams (streams supporting linear combinations), allowing arbitrary input and output arities for activation functions, countable-sized networks with finite dynamically changeable active part capable of unbounded growth, and a very expressive self-referential mechanism. 2) DMMs are suitable for general-purpose programming, while retaining the key property of recurrent neural networks: programs are expressed via matrices of real numbers, and continuous changes to those matrices produce arbitrarily small variations in the associated programs. 3) Spaces of V-values (vector-like elements based on nested maps) are particularly useful, enabling DMMs with variadic activation functions and conveniently representing conventional data structures. |
Tasks | |
Published | 2017-12-20 |
URL | http://arxiv.org/abs/1712.07447v2 |
http://arxiv.org/pdf/1712.07447v2.pdf | |
PWC | https://paperswithcode.com/paper/dataflow-matrix-machines-and-v-values-a |
Repo | https://github.com/jsa-aerial/DMM |
Framework | none |
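Point 3's V-values are nested maps with numeric leaves; what makes them usable as "linear streams" is that they support linear combinations, computed recursively over the nesting. A minimal sketch of that operation in plain Python (it assumes well-typed values, i.e. maps line up with maps and numbers with numbers, with missing keys acting as zero):

```python
# Linear combinations over V-values represented as nested dicts with numeric
# leaves -- the property that qualifies them as "linear streams" for DMMs.

def scale(v, a):
    if isinstance(v, dict):
        return {k: scale(x, a) for k, x in v.items()}
    return a * v

def add(u, v):
    if isinstance(u, dict) or isinstance(v, dict):
        u = u if isinstance(u, dict) else {}
        v = v if isinstance(v, dict) else {}
        return {k: add(u.get(k, 0.0), v.get(k, 0.0)) for k in set(u) | set(v)}
    return u + v  # numeric leaves; a missing key acts as zero

u = {"x": 1.0, "nested": {"y": 2.0}}
v = {"x": 0.5, "nested": {"z": 4.0}}
print(add(scale(u, 2.0), v))
# e.g. {'x': 2.5, 'nested': {'y': 4.0, 'z': 4.0}}
```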
Automated cardiovascular magnetic resonance image analysis with fully convolutional networks
Title | Automated cardiovascular magnetic resonance image analysis with fully convolutional networks |
Authors | Wenjia Bai, Matthew Sinclair, Giacomo Tarroni, Ozan Oktay, Martin Rajchl, Ghislain Vaillant, Aaron M. Lee, Nay Aung, Elena Lukaschuk, Mihir M. Sanghvi, Filip Zemrak, Kenneth Fung, Jose Miguel Paiva, Valentina Carapella, Young Jin Kim, Hideaki Suzuki, Bernhard Kainz, Paul M. Matthews, Steffen E. Petersen, Stefan K. Piechnik, Stefan Neubauer, Ben Glocker, Daniel Rueckert |
Abstract | Cardiovascular magnetic resonance (CMR) imaging is a standard imaging modality for assessing cardiovascular diseases (CVDs), the leading cause of death globally. CMR enables accurate quantification of the cardiac chamber volume, ejection fraction and myocardial mass, providing information for diagnosis and monitoring of CVDs. However, for years, clinicians have been relying on manual approaches for CMR image analysis, which is time consuming and prone to subjective errors. It is a major clinical challenge to automatically derive quantitative and clinically relevant information from CMR images. Deep neural networks have shown a great potential in image pattern recognition and segmentation for a variety of tasks. Here we demonstrate an automated analysis method for CMR images, which is based on a fully convolutional network (FCN). The network is trained and evaluated on a large-scale dataset from the UK Biobank, consisting of 4,875 subjects with 93,500 pixelwise annotated images. The performance of the method has been evaluated using a number of technical metrics, including the Dice metric, mean contour distance and Hausdorff distance, as well as clinically relevant measures, including left ventricle (LV) end-diastolic volume (LVEDV) and end-systolic volume (LVESV), LV mass (LVM); right ventricle (RV) end-diastolic volume (RVEDV) and end-systolic volume (RVESV). By combining FCN with a large-scale annotated dataset, the proposed automated method achieves a high performance on par with human experts in segmenting the LV and RV on short-axis CMR images and the left atrium (LA) and right atrium (RA) on long-axis CMR images. |
Tasks | |
Published | 2017-10-25 |
URL | http://arxiv.org/abs/1710.09289v4 |
http://arxiv.org/pdf/1710.09289v4.pdf | |
PWC | https://paperswithcode.com/paper/automated-cardiovascular-magnetic-resonance |
Repo | https://github.com/baiwenjia/ukbb_cardiac |
Framework | tf |
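Of the technical metrics listed, the Dice metric is the primary overlap measure; it is simple to compute from binary masks. A numpy sketch (the FCN itself is a standard segmentation architecture and is omitted):

```python
# Dice overlap between predicted and ground-truth binary segmentation masks,
# as used in the paper's technical evaluation.
import numpy as np

def dice(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-8) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|); 1.0 means perfect overlap."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return (2.0 * intersection + eps) / (pred.sum() + truth.sum() + eps)

# Toy example: two 3x3 masks agreeing on 2 of 3 foreground pixels each.
a = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 0]])
b = np.array([[1, 1, 0], [0, 0, 0], [0, 0, 1]])
print(f"Dice = {dice(a, b):.3f}")  # 2*2/(3+3) ≈ 0.667
```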
Audio to Body Dynamics
Title | Audio to Body Dynamics |
Authors | Eli Shlizerman, Lucio M. Dery, Hayden Schoen, Ira Kemelmacher-Shlizerman |
Abstract | We present a method that takes as input audio of violin or piano playing and outputs a video of skeleton predictions, which are further used to animate an avatar. The key idea is to create an animation of an avatar that moves its hands the way a pianist or violinist would, just from audio. Fully detailed, correct arm and finger motion is the ultimate goal; however, it is not clear whether body movement can be predicted from music at all. In this paper, we present the first result showing that natural body dynamics can indeed be predicted. We built an LSTM network trained on violin and piano recital videos uploaded to the Internet. The predicted points are applied to a rigged avatar to create the animation. |
Tasks | |
Published | 2017-12-19 |
URL | http://arxiv.org/abs/1712.09382v1 |
http://arxiv.org/pdf/1712.09382v1.pdf | |
PWC | https://paperswithcode.com/paper/audio-to-body-dynamics |
Repo | https://github.com/facebookresearch/Audio2BodyDynamics |
Framework | pytorch |
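At its core the model is sequence regression: per-frame audio features in, per-frame 2D keypoint coordinates out, through an LSTM. A skeletal PyTorch version with illustrative dimensions (the paper's feature extraction and temporal alignment details are not reproduced):

```python
# Skeletal audio-to-keypoints regressor: an LSTM maps per-frame audio
# features to per-frame 2D keypoints. All dimensions are illustrative
# choices, not the paper's exact configuration.
import torch
import torch.nn as nn

class AudioToBody(nn.Module):
    def __init__(self, audio_dim=28, n_keypoints=20, hidden=200):
        super().__init__()
        self.lstm = nn.LSTM(audio_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_keypoints * 2)  # (x, y) per keypoint

    def forward(self, audio):
        # audio: (batch, frames, audio_dim) -> (batch, frames, keypoints, 2)
        out, _ = self.lstm(audio)
        return self.head(out).view(audio.size(0), audio.size(1), -1, 2)

model = AudioToBody()
dummy = torch.randn(2, 100, 28)  # 2 clips, 100 audio frames each
print(model(dummy).shape)        # torch.Size([2, 100, 20, 2])
```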
Sharpness-aware Low dose CT denoising using conditional generative adversarial network
Title | Sharpness-aware Low dose CT denoising using conditional generative adversarial network |
Authors | Xin Yi, Paul Babyn |
Abstract | Low Dose Computed Tomography (LDCT) has offered tremendous benefits in radiation-restricted applications, but the quantum noise resulting from an insufficient number of photons can potentially harm diagnostic performance. Current image-based denoising methods tend to produce a blurring effect on the final reconstructed results, especially at high noise levels. In this paper, a deep learning based approach is proposed to mitigate this problem. An adversarially trained network and a sharpness detection network guide the training process. Experiments on both simulated and real datasets show that the proposed method incurs very little resolution loss and achieves better performance than state-of-the-art methods, both quantitatively and visually. |
Tasks | Denoising |
Published | 2017-08-22 |
URL | http://arxiv.org/abs/1708.06453v2 |
http://arxiv.org/pdf/1708.06453v2.pdf | |
PWC | https://paperswithcode.com/paper/sharpness-aware-low-dose-ct-denoising-using |
Repo | https://github.com/xinario/SAGAN |
Framework | torch |
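The generator's objective combines a conditional adversarial term with guidance from the sharpness detection network. A schematic PyTorch loss composition; note that the paper's learned sharpness network is replaced here by a crude image-gradient proxy, purely as a stand-in:

```python
# Schematic sharpness-aware generator loss. The adversarial term is a
# standard conditional-GAN objective; the learned sharpness-detection
# network from the paper is swapped for a simple gradient-magnitude proxy.
import torch
import torch.nn.functional as F

def gradient_magnitude(img: torch.Tensor) -> torch.Tensor:
    # Mean absolute forward differences as a cheap sharpness proxy (BxCxHxW).
    dx = img[..., :, 1:] - img[..., :, :-1]
    dy = img[..., 1:, :] - img[..., :-1, :]
    return dx.abs().mean() + dy.abs().mean()

def generator_loss(d_fake_logits, denoised, target,
                   w_pix=1.0, w_sharp=0.1):
    adv = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))  # fool the critic
    pix = F.l1_loss(denoised, target)                   # stay close to target
    sharp = (gradient_magnitude(denoised) -
             gradient_magnitude(target)).abs()          # match sharpness level
    return adv + w_pix * pix + w_sharp * sharp
```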
A Brief Introduction to Machine Learning for Engineers
Title | A Brief Introduction to Machine Learning for Engineers |
Authors | Osvaldo Simeone |
Abstract | This monograph aims at providing an introduction to key concepts, algorithms, and theoretical results in machine learning. The treatment concentrates on probabilistic models for supervised and unsupervised learning problems. It introduces fundamental concepts and algorithms by building on first principles, while also exposing the reader to more advanced topics with extensive pointers to the literature, within a unified notation and mathematical framework. The material is organized according to clearly defined categories, such as discriminative and generative models, frequentist and Bayesian approaches, exact and approximate inference, as well as directed and undirected models. This monograph is meant as an entry point for researchers with a background in probability and linear algebra. |
Tasks | |
Published | 2017-09-08 |
URL | http://arxiv.org/abs/1709.02840v3 |
http://arxiv.org/pdf/1709.02840v3.pdf | |
PWC | https://paperswithcode.com/paper/a-brief-introduction-to-machine-learning-for |
Repo | https://github.com/leojang/Notes |
Framework | tf |
WERd: Using Social Text Spelling Variants for Evaluating Dialectal Speech Recognition
Title | WERd: Using Social Text Spelling Variants for Evaluating Dialectal Speech Recognition |
Authors | Ahmed Ali, Preslav Nakov, Peter Bell, Steve Renals |
Abstract | We study the problem of evaluating automatic speech recognition (ASR) systems that target dialectal speech input. A major challenge in this case is that the orthography of dialects is typically not standardized. From an ASR evaluation perspective, this means that there is no clear gold standard for the expected output, and several possible outputs could be considered correct according to different human annotators, which makes standard word error rate (WER) inadequate as an evaluation metric. Such a situation is typical for machine translation (MT), and thus we borrow ideas from an MT evaluation metric, namely TERp, an extension of translation error rate which is closely-related to WER. In particular, in the process of comparing a hypothesis to a reference, we make use of spelling variants for words and phrases, which we mine from Twitter in an unsupervised fashion. Our experiments with evaluating ASR output for Egyptian Arabic, and further manual analysis, show that the resulting WERd (i.e., WER for dialects) metric, a variant of TERp, is more adequate than WER for evaluating dialectal ASR. |
Tasks | Machine Translation, Speech Recognition |
Published | 2017-09-21 |
URL | http://arxiv.org/abs/1709.07484v1 |
http://arxiv.org/pdf/1709.07484v1.pdf | |
PWC | https://paperswithcode.com/paper/werd-using-social-text-spelling-variants-for |
Repo | https://github.com/qcri/werd |
Framework | none |
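The essential change to WER is in the substitution cost: a hypothesis word that is a known spelling variant of the reference word should not be penalized. A compact sketch of variant-aware word error rate (the real metric builds on TERp and also handles phrases; the variant table here is a hypothetical stand-in for the variants mined from Twitter):

```python
# Variant-aware WER: standard Levenshtein alignment over words, except that
# substituting a word for one of its known spelling variants is free.
# The variant pairs below are hypothetical stand-ins.

VARIANTS = {("kitab", "ktab"), ("haga", "7aga")}

def match(ref_word: str, hyp_word: str) -> bool:
    return (ref_word == hyp_word
            or (ref_word, hyp_word) in VARIANTS
            or (hyp_word, ref_word) in VARIANTS)

def werd(ref: list[str], hyp: list[str]) -> float:
    # d[i][j] = edit distance between ref[:i] and hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if match(ref[i - 1], hyp[j - 1]) else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution (free if variant)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(werd(["el", "kitab", "da"], ["el", "ktab", "da"]))  # 0.0, not 1/3
```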
Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search
Title | Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search |
Authors | Chris Hokamp, Qun Liu |
Abstract | We present Grid Beam Search (GBS), an algorithm which extends beam search to allow the inclusion of pre-specified lexical constraints. The algorithm can be used with any model that generates a sequence $\mathbf{\hat{y}} = \{y_{0} \ldots y_{T}\}$ by maximizing $p(\mathbf{y} \mid \mathbf{x}) = \prod\limits_{t} p(y_{t} \mid \mathbf{x}; \{y_{0} \ldots y_{t-1}\})$. Lexical constraints take the form of phrases or words that must be present in the output sequence. This is a very general way to incorporate additional knowledge into a model’s output without requiring any modification of the model parameters or training data. We demonstrate the feasibility and flexibility of Lexically Constrained Decoding by conducting experiments on Neural Interactive-Predictive Translation, as well as Domain Adaptation for Neural Machine Translation. Experiments show that GBS can provide large improvements in translation quality in interactive scenarios, and that, even without any user input, GBS can be used to achieve significant gains in performance in domain adaptation scenarios. |
Tasks | Domain Adaptation, Machine Translation |
Published | 2017-04-24 |
URL | http://arxiv.org/abs/1704.07138v2 |
http://arxiv.org/pdf/1704.07138v2.pdf | |
PWC | https://paperswithcode.com/paper/lexically-constrained-decoding-for-sequence |
Repo | https://github.com/chrishokamp/constrained_decoding |
Framework | none |
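GBS organizes the search as a grid indexed by (time step, number of constraint tokens covered): at each step a hypothesis either generates freely from the model or consumes the next token of an unmet constraint, and only hypotheses that have covered all constraint tokens are valid outputs. A deliberately simplified sketch with a single constraint phrase and a toy uniform model standing in for a real decoder:

```python
# Simplified grid beam search: one constraint phrase, beams indexed by the
# number of constraint tokens consumed. `log_prob` is a toy stand-in for a
# real model's next-token scores; the full algorithm also tracks partially
# started constraints and handles multiple constraint phrases.
import math
from heapq import nlargest

VOCAB = ["the", "cat", "sat", "on", "mat", "</s>"]

def log_prob(prefix, token):
    return -math.log(len(VOCAB))  # toy uniform model

def grid_beam_search(constraint, max_len=6, beam=3):
    # grid[c] = top hypotheses (score, prefix) that have consumed c
    # constraint tokens so far.
    grid = {c: [] for c in range(len(constraint) + 1)}
    grid[0] = [(0.0, ())]
    for _ in range(max_len):
        new_grid = {c: [] for c in grid}
        for c, hyps in grid.items():
            for score, prefix in hyps:
                for tok in VOCAB:  # option 1: generate freely
                    new_grid[c].append(
                        (score + log_prob(prefix, tok), prefix + (tok,)))
                if c < len(constraint):  # option 2: consume a constraint token
                    tok = constraint[c]
                    new_grid[c + 1].append(
                        (score + log_prob(prefix, tok), prefix + (tok,)))
        grid = {c: nlargest(beam, hyps) for c, hyps in new_grid.items()}
    # Only fully constrained hypotheses may be returned.
    return max(grid[len(constraint)])

print(grid_beam_search(["on", "mat"]))
```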
Deep Learning with Dynamic Computation Graphs
Title | Deep Learning with Dynamic Computation Graphs |
Authors | Moshe Looks, Marcello Herreshoff, DeLesley Hutchins, Peter Norvig |
Abstract | Neural networks that compute over graph structures are a natural fit for problems in a variety of domains, including natural language (parse trees) and cheminformatics (molecular graphs). However, since the computation graph has a different shape and size for every input, such networks do not directly support batched training or inference. They are also difficult to implement in popular deep learning libraries, which are based on static data-flow graphs. We introduce a technique called dynamic batching, which not only batches together operations between different input graphs of dissimilar shape, but also between different nodes within a single input graph. The technique allows us to create static graphs, using popular libraries, that emulate dynamic computation graphs of arbitrary shape and size. We further present a high-level library of compositional blocks that simplifies the creation of dynamic graph models. Using the library, we demonstrate concise and batch-wise parallel implementations for a variety of models from the literature. |
Tasks | |
Published | 2017-02-07 |
URL | http://arxiv.org/abs/1702.02181v2 |
http://arxiv.org/pdf/1702.02181v2.pdf | |
PWC | https://paperswithcode.com/paper/deep-learning-with-dynamic-computation-graphs |
Repo | https://github.com/tensorflow/fold |
Framework | tf |
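Dynamic batching schedules node evaluations in waves: all nodes at the same depth with the same operation type, across all input graphs in the batch, are concatenated into one batched op. A toy scheduler showing just that grouping logic (TensorFlow Fold realizes the waves with static tf ops plus gather/concat, which is not shown):

```python
# Toy depth-based dynamic batching: nodes from different input trees are
# grouped by (depth, op) so that each group can run as one batched operation.
from collections import defaultdict

class Node:
    def __init__(self, op, children=()):
        self.op, self.children = op, list(children)
        self.depth = 1 + max((c.depth for c in self.children), default=0)

def schedule(roots):
    """Map depth -> {op: [nodes]}; each group is one batched op."""
    waves = defaultdict(lambda: defaultdict(list))
    stack, seen = list(roots), set()
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        waves[node.depth][node.op].append(node)
        stack.extend(node.children)
    return waves

# Two trees of different shapes: leaves batch together, as do the
# same-depth internal nodes.
t1 = Node("compose", [Node("embed"), Node("embed")])
t2 = Node("compose", [Node("compose", [Node("embed"), Node("embed")]),
                      Node("embed")])
waves = schedule([t1, t2])
for depth in sorted(waves):
    for op, nodes in waves[depth].items():
        print(f"depth {depth}: batch {len(nodes)} x {op}")
```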
On-the-fly Operation Batching in Dynamic Computation Graphs
Title | On-the-fly Operation Batching in Dynamic Computation Graphs |
Authors | Graham Neubig, Yoav Goldberg, Chris Dyer |
Abstract | Dynamic neural network toolkits such as PyTorch, DyNet, and Chainer offer more flexibility for implementing models that cope with data of varying dimensions and structure, relative to toolkits that operate on statically declared computations (e.g., TensorFlow, CNTK, and Theano). However, existing toolkits - both static and dynamic - require that the developer organize the computations into the batches necessary for exploiting high-performance algorithms and hardware. This batching task is generally difficult, but it becomes a major hurdle as architectures become complex. In this paper, we present an algorithm, and its implementation in the DyNet toolkit, for automatically batching operations. Developers simply write minibatch computations as aggregations of single instance computations, and the batching algorithm seamlessly executes them, on the fly, using computationally efficient batched operations. On a variety of tasks, we obtain throughput similar to that obtained with manual batches, as well as comparable speedups over single-instance learning on architectures that are impractical to batch manually. |
Tasks | |
Published | 2017-05-22 |
URL | http://arxiv.org/abs/1705.07860v1 |
http://arxiv.org/pdf/1705.07860v1.pdf | |
PWC | https://paperswithcode.com/paper/on-the-fly-operation-batching-in-dynamic |
Repo | https://github.com/bplank/bilstm-aux |
Framework | none |
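The developer-facing contract is: build the graph one instance at a time, aggregate the losses, and let the toolkit's batcher fuse operations with identical "signatures" (op type plus shape) at execution time. A plain-Python sketch of that agenda-based grouping (the real implementation lives inside DyNet's executor and operates on tensor ops):

```python
# Plain-Python sketch of agenda-based automatic batching: among operations
# whose dependencies are satisfied, the largest same-signature group is
# executed as a single batched op. Assumes `ops` forms a valid DAG.
from collections import defaultdict

def auto_batch(ops):
    """ops: list of {id, signature, deps}. Returns the executed batches."""
    done, batches, pending = set(), [], list(ops)
    while pending:
        ready = [op for op in pending if all(d in done for d in op["deps"])]
        groups = defaultdict(list)
        for op in ready:
            groups[op["signature"]].append(op)
        batch = max(groups.values(), key=len)  # biggest fusable group first
        batches.append([op["id"] for op in batch])
        done.update(op["id"] for op in batch)
        pending = [op for op in pending if op["id"] not in done]
    return batches

# Three independent sentences, each needing an embed then an LSTM step:
ops = [{"id": f"embed{i}", "signature": "embed", "deps": []} for i in range(3)]
ops += [{"id": f"lstm{i}", "signature": "lstm", "deps": [f"embed{i}"]}
        for i in range(3)]
print(auto_batch(ops))
# [['embed0', 'embed1', 'embed2'], ['lstm0', 'lstm1', 'lstm2']]
```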
Training Deep AutoEncoders for Collaborative Filtering
Title | Training Deep AutoEncoders for Collaborative Filtering |
Authors | Oleksii Kuchaiev, Boris Ginsburg |
Abstract | This paper proposes a novel model for the rating prediction task in recommender systems which significantly outperforms previous state-of-the-art models on a time-split Netflix data set. Our model is based on a deep autoencoder with 6 layers and is trained end-to-end without any layer-wise pre-training. We empirically demonstrate that: a) deep autoencoder models generalize much better than shallow ones, b) non-linear activation functions with negative parts are crucial for training deep models, and c) heavy use of regularization techniques such as dropout is necessary to prevent over-fitting. We also propose a new training algorithm based on iterative output re-feeding to overcome the natural sparseness of collaborative filtering data. The new algorithm significantly speeds up training and improves model performance. Our code is available at https://github.com/NVIDIA/DeepRecommender |
Tasks | Recommendation Systems |
Published | 2017-08-05 |
URL | http://arxiv.org/abs/1708.01715v3 |
http://arxiv.org/pdf/1708.01715v3.pdf | |
PWC | https://paperswithcode.com/paper/training-deep-autoencoders-for-collaborative |
Repo | https://github.com/NVIDIA/DeepRecommender |
Framework | pytorch |
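Iterative output re-feeding exploits the fixed-point property an ideal model should satisfy, f(f(x)) = f(x): the dense output f(x) is fed back in as a new, fully observed training example, giving a dense gradient signal despite sparse inputs. A schematic PyTorch training step under that reading; `model` is any hypothetical autoencoder mapping rating vectors to rating vectors:

```python
# Schematic training step with iterative output re-feeding: (1) masked MSE
# on observed ratings, (2) re-feed the dense output as a new input and pull
# the model toward its fixed point. `model` is a hypothetical autoencoder.
import torch

def refeeding_step(model, optimizer, x, mask):
    # x: rating vector with zeros at unobserved entries; mask: 1 if observed.
    out = model(x)
    loss = ((out - x) ** 2 * mask).sum() / mask.sum()    # masked MSE
    refed = model(out.detach())                          # dense second pass
    loss = loss + ((refed - out.detach()) ** 2).mean()   # every entry counts
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```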