January 26, 2020

Paper Group ANR 1404

Adaptive Densely Connected Super-Resolution Reconstruction


Title	Adaptive Densely Connected Super-Resolution Reconstruction
Authors	Tangxin Xie, Xin Yang, Yu Jia, Chen Zhu, Xiaochuan Li
Abstract	To improve performance in single image super-resolution (SISR), we present an image super-resolution algorithm based on adaptive dense connections (ADCSR). The algorithm is divided into two parts: BODY and SKIP. BODY improves the utilization of convolution features through adaptive dense connections. We also develop an adaptive sub-pixel reconstruction layer (AFSL) to reconstruct the features of the BODY output. We pre-train SKIP so that BODY focuses on learning high-frequency features. Comparisons of PSNR, SSIM, and visual quality verify the superiority of our method over state-of-the-art algorithms.
Tasks	Image Super-Resolution, Super-Resolution
Published	2019-12-17
URL	https://arxiv.org/abs/1912.08002v2
PDF	https://arxiv.org/pdf/1912.08002v2.pdf
PWC	https://paperswithcode.com/paper/adaptive-densely-connected-super-resolution
Repo
Framework
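
The abstract reports PSNR as one of its comparison metrics. As a minimal sketch (not the authors' evaluation code), PSNR between a reference image and a reconstruction can be computed as follows. The 8-bit peak value of 255 and the flat pixel lists are assumptions made for illustration.

```python
import math

def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(reconstructed) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

Higher values indicate a reconstruction closer to the reference. Published SISR comparisons often compute PSNR on the luminance channel only, which this sketch does not do.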

To React or not to React: End-to-End Visual Pose Forecasting for Personalized Avatar during Dyadic Conversations


Title	To React or not to React: End-to-End Visual Pose Forecasting for Personalized Avatar during Dyadic Conversations
Authors	Chaitanya Ahuja, Shugao Ma, Louis-Philippe Morency, Yaser Sheikh
Abstract	Nonverbal behaviours such as gestures, facial expressions, body posture, and paralinguistic cues have been shown to complement or clarify verbal messages. Hence, to improve telepresence in the form of an avatar, it is important to model these behaviours, especially in dyadic interactions. Creating such personalized avatars requires modeling not only the intrapersonal dynamics between an avatar’s speech and its body pose, but also the interpersonal dynamics with the interlocutor present in the conversation. In this paper, we introduce a neural architecture named Dyadic Residual-Attention Model (DRAM), which integrates intrapersonal (monadic) and interpersonal (dyadic) dynamics using selective attention to generate sequences of body pose conditioned on the audio and body pose of the interlocutor and the audio of the human operating the avatar. We evaluate our proposed model on dyadic conversational data consisting of pose and audio of both participants, confirming the importance of adaptive attention between monadic and dyadic dynamics when predicting avatar pose. We also conduct a user study to analyze judgments of human observers. Our results confirm that the generated body pose is more natural and models intrapersonal and interpersonal dynamics better than non-adaptive monadic/dyadic models.
Tasks
Published	2019-10-05
URL	https://arxiv.org/abs/1910.02181v1
PDF	https://arxiv.org/pdf/1910.02181v1.pdf
PWC	https://paperswithcode.com/paper/to-react-or-not-to-react-end-to-end-visual
Repo
Framework
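
One way to picture adaptive attention between monadic and dyadic dynamics is as a soft gate that blends two pose predictions. The sketch below is only an illustration of that idea: the scalar gate, the sigmoid, and the names `blend_pose` and `gate_logit` are assumptions, and the DRAM architecture itself is not reproduced here.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def blend_pose(monadic_pose, dyadic_pose, gate_logit):
    """Convex combination of two pose predictions, weighted by a soft gate."""
    if len(monadic_pose) != len(dyadic_pose):
        raise ValueError("pose vectors must have the same length")
    a = sigmoid(gate_logit)  # weight placed on the dyadic prediction
    return [(1.0 - a) * m + a * d for m, d in zip(monadic_pose, dyadic_pose)]
```

A gate near 0 follows the speaker-only (monadic) prediction, and a gate near 1 follows the interlocutor-aware (dyadic) prediction. In a learned model the gate would itself be predicted per time step from the inputs.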

Beyond the Memory Wall: A Case for Memory-centric HPC System for Deep Learning


Title	Beyond the Memory Wall: A Case for Memory-centric HPC System for Deep Learning
Authors	Youngeun Kwon, Minsoo Rhu
Abstract	As the models and the datasets to train deep learning (DL) models scale, system architects are faced with new challenges, one of which is the memory capacity bottleneck, where the limited physical memory inside the accelerator device constrains the algorithm that can be studied. We propose a memory-centric deep learning system that can transparently expand the memory capacity available to the accelerators while also providing fast inter-device communication for parallel training. Our proposal aggregates a pool of memory modules locally within the device-side interconnect, which are decoupled from the host interface and function as a vehicle for transparent memory capacity expansion. Compared to conventional systems, our proposal achieves an average 2.8x speedup on eight DL applications and increases the system-wide memory capacity to tens of TBs.
Tasks
Published	2019-02-18
URL	http://arxiv.org/abs/1902.06468v1
PDF	http://arxiv.org/pdf/1902.06468v1.pdf
PWC	https://paperswithcode.com/paper/beyond-the-memory-wall-a-case-for-memory
Repo
Framework

How to make a pizza: Learning a compositional layer-based GAN model


Title	How to make a pizza: Learning a compositional layer-based GAN model
Authors	Dim P. Papadopoulos, Youssef Tamaazousti, Ferda Ofli, Ingmar Weber, Antonio Torralba
Abstract	A food recipe is an ordered set of instructions for preparing a particular dish. From a visual perspective, every instruction step can be seen as a way to change the visual appearance of the dish by adding extra objects (e.g., adding an ingredient) or changing the appearance of the existing ones (e.g., cooking the dish). In this paper, we aim to teach a machine how to make a pizza by building a generative model that mirrors this step-by-step procedure. To do so, we learn composable module operations which are able to either add or remove a particular ingredient. Each operator is designed as a Generative Adversarial Network (GAN). Given only weak image-level supervision, the operators are trained to generate a visual layer that needs to be added to or removed from the existing image. The proposed model is able to decompose an image into an ordered sequence of layers by sequentially applying the corresponding removal modules in the right order. Experimental results on synthetic and real pizza images demonstrate that our proposed model is able to: (1) segment pizza toppings in a weakly supervised fashion, (2) remove them by revealing what is occluded underneath them (i.e., inpainting), and (3) infer the ordering of the toppings without any depth ordering supervision. Code, data, and models are available online.
Tasks
Published	2019-06-06
URL	https://arxiv.org/abs/1906.02839v1
PDF	https://arxiv.org/pdf/1906.02839v1.pdf
PWC	https://paperswithcode.com/paper/how-to-make-a-pizza-learning-a-compositional-1
Repo
Framework
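
The layer-based view in the abstract can be illustrated with plain alpha compositing: each adding step overlays a layer through a mask, so the order of application decides which topping occludes which. This is a minimal sketch under that assumption. The function names and the 1-D grayscale "images" are hypothetical, and no GAN is involved.

```python
def add_layer(image, layer, mask):
    """Alpha-composite `layer` over `image`; mask values lie in [0, 1]."""
    return [m * l + (1.0 - m) * p for p, l, m in zip(image, layer, mask)]

def compose(base, layers):
    """Apply (layer, mask) pairs in order; later layers occlude earlier ones."""
    image = list(base)
    for layer, mask in layers:
        image = add_layer(image, layer, mask)
    return image

# Two toy toppings on a three-pixel "pizza".
base = [0.0, 0.0, 0.0]
pepperoni = ([1.0, 1.0, 1.0], [1.0, 0.0, 0.0])
olives = ([0.5, 0.5, 0.5], [1.0, 1.0, 0.0])
```

Composing pepperoni then olives gives a different image from olives then pepperoni, which is why recovering the layer order is a meaningful inference problem.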

Learning Algebraic Structures: Preliminary Investigations


Title	Learning Algebraic Structures: Preliminary Investigations
Authors	Yang-Hui He, Minhyong Kim
Abstract	We employ machine-learning techniques, exemplified by support vector machines and neural classifiers, to initiate the study of whether AI can “learn” algebraic structures. Using finite groups and finite rings as a concrete playground, we find that questions such as identifying simple groups by “looking” at the Cayley table, or correctly matching addition and multiplication tables for finite rings, can, at least for structures of small size, be answered by the AI, even after having been trained on only a small number of cases. These results are in tandem with recent investigations on whether AI can solve certain classes of problems in algebraic geometry.
Tasks
Published	2019-05-02
URL	https://arxiv.org/abs/1905.02263v1
PDF	https://arxiv.org/pdf/1905.02263v1.pdf
PWC	https://paperswithcode.com/paper/learning-algebraic-structures-preliminary
Repo
Framework
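
For readers unfamiliar with Cayley tables, the sketch below builds the table of a cyclic group and checks the group axioms by brute force. It illustrates the kind of input such classifiers "look" at. It is not the paper's data pipeline, and the function names are hypothetical.

```python
from itertools import product

def cayley_table_zn(n):
    """Cayley table of the cyclic group Z_n under addition mod n."""
    return [[(a + b) % n for b in range(n)] for a in range(n)]

def is_group(table):
    """Check closure, associativity, identity, and inverses for a finite table."""
    n = len(table)
    elems = range(n)
    # Closure: every entry must be one of the n elements.
    if any(len(row) != n or any(x not in elems for x in row) for row in table):
        return False
    # Associativity: (a*b)*c == a*(b*c) for all triples.
    if any(table[table[a][b]][c] != table[a][table[b][c]]
           for a, b, c in product(elems, repeat=3)):
        return False
    # Two-sided identity element.
    identities = [e for e in elems
                  if all(table[e][a] == a and table[a][e] == a for a in elems)]
    if not identities:
        return False
    e = identities[0]
    # Every element needs a two-sided inverse.
    return all(any(table[a][b] == e and table[b][a] == e for b in elems)
               for a in elems)
```

The associativity check is cubic in the group order, which is acceptable only for the small structures the abstract has in mind.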

Semantic Driven Fielded Entity Retrieval


Title	Semantic Driven Fielded Entity Retrieval
Authors	Shahrzad Naseri, Sheikh Muhammad Sarwar, James Allan
Abstract	A common approach for knowledge-base entity search is to consider an entity as a document with multiple fields. Models that focus on matching query terms in different fields are popular choices for searching such entity representations. An instance of such a model is FSDM (Fielded Sequential Dependence Model). We propose to integrate field-level semantic features into FSDM. We use FSDM to retrieve a pool of documents, and then use semantic field-level features to re-rank those documents. We propose to represent queries as bags of terms as well as bags of entities, and then use their dense vector representations to compute semantic features based on query-document similarity. Our proposed re-ranking approach achieves significant improvement in entity retrieval on the DBpedia-Entity (v2) dataset over the existing FSDM model. Specifically, over all queries we achieve statistically significant improvements of 2.5% and 1.2% in NDCG@10 and NDCG@100, respectively.
Tasks
Published	2019-07-02
URL	https://arxiv.org/abs/1907.01457v1
PDF	https://arxiv.org/pdf/1907.01457v1.pdf
PWC	https://paperswithcode.com/paper/semantic-driven-fielded-entity-retrieval
Repo
Framework
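
The retrieve-then-re-rank pipeline and the NDCG@k metric mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the authors' system: the linear score interpolation, the `weight` parameter, and the linear gain in DCG are choices made here, and the ideal ranking is computed from the retrieved list only.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain with linear gain and log2 rank discount."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """`relevances` holds graded relevance labels in ranked order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

def rerank(candidates, weight=0.5):
    """Re-rank (doc_id, retrieval_score, semantic_score) triples.

    Assumption: the final score interpolates the first-stage retrieval score
    with a semantic similarity score; both are taken to be on comparable scales.
    """
    return sorted(candidates,
                  key=lambda c: (1.0 - weight) * c[1] + weight * c[2],
                  reverse=True)
```

A full evaluation would take the ideal ranking from all judged documents for a query rather than from the retrieved pool.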

Harvesting Information from Captions for Weakly Supervised Semantic Segmentation


Title	Harvesting Information from Captions for Weakly Supervised Semantic Segmentation
Authors	Johann Sawatzky, Debayan Banerjee, Juergen Gall
Abstract	Since acquiring pixel-wise annotations for training convolutional neural networks for semantic image segmentation is time-consuming, weakly supervised approaches that only require class tags have been proposed. In this work, we propose another form of supervision, namely image captions as they can be found on the Internet. These captions have two advantages. They do not require additional curation, as is the case for the clean class tags used by current weakly supervised approaches, and they provide textual context for the classes present in an image. To leverage such textual context, we deploy a multi-modal network that learns a joint embedding of the visual representation of the image and the textual representation of the caption. The network estimates text activation maps (TAMs) for class names as well as compound concepts, i.e. combinations of nouns and their attributes. The TAMs of compound concepts describing classes of interest substantially improve the quality of the estimated class activation maps, which are then used to train a network for semantic segmentation. We evaluate our method on the COCO dataset, where it achieves state-of-the-art results for weakly supervised image segmentation.
Tasks	Image Captioning, Semantic Segmentation, Weakly-Supervised Semantic Segmentation
Published	2019-05-16
URL	https://arxiv.org/abs/1905.06784v1
PDF	https://arxiv.org/pdf/1905.06784v1.pdf
PWC	https://paperswithcode.com/paper/harvesting-information-from-captions-for
Repo
Framework

Interpretable neural networks based on continuous-valued logic and multicriteria decision operators


Title	Interpretable neural networks based on continuous-valued logic and multicriteria decision operators
Authors	Orsolya Csiszár, Gábor Csiszár, József Dombi
Abstract	Combining neural networks with continuous logic and multicriteria decision making tools can reduce the black box nature of neural models. In this study, we show that nilpotent logical systems offer an appropriate mathematical framework for a hybridization of continuous nilpotent logic and neural models, helping to improve the interpretability and safety of machine learning. In our concept, perceptrons model soft inequalities, namely membership functions and continuous logical operators. We design the network architecture before training, using continuous logical operators and multicriteria decision tools with given weights working in the hidden layers. Designing the structure appropriately leads to a drastic reduction in the number of parameters to be learned. The theoretical basis offers a straightforward choice of activation functions (the cutting function or its differentiable approximation, the squashing function), and also suggests an explanation for the great success of the rectified linear unit (ReLU). In this study, we focus on the architecture of a hybrid model and introduce the building blocks for future application in deep neural networks. The concept is illustrated with some toy examples taken from an extended version of the TensorFlow Playground.
Tasks	Decision Making
Published	2019-10-06
URL	https://arxiv.org/abs/1910.02486v2
PDF	https://arxiv.org/pdf/1910.02486v2.pdf
PWC	https://paperswithcode.com/paper/semantic-interpretation-of-deep-neural
Repo
Framework
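
The link between the cutting function and ReLU can be made concrete: clamping to [0, 1] equals the difference of two ReLUs. The sketch below shows this together with the Łukasiewicz operators, used here as a standard textbook example of a nilpotent logical system. Treating them as representative of the paper's operator family is an assumption.

```python
def relu(x):
    return max(0.0, x)

def cut(x):
    """Cutting function: clamp x to the unit interval [0, 1]."""
    return min(1.0, max(0.0, x))

def cut_via_relu(x):
    """Same function written with two ReLUs: relu(x) - relu(x - 1)."""
    return relu(x) - relu(x - 1.0)

# Łukasiewicz operators on truth values in [0, 1] (assumed example).
def conjunction(x, y):
    return cut(x + y - 1.0)

def disjunction(x, y):
    return cut(x + y)

def negation(x):
    return 1.0 - x
```

Each operator is a cut applied to a linear combination of its inputs, which is the shape of a perceptron with fixed weights. The De Morgan identity `negation(conjunction(x, y)) == disjunction(negation(x), negation(y))` holds up to floating-point rounding.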

M2H-GAN: A GAN-based Mapping from Machine to Human Transcripts for Speech Understanding


Title	M2H-GAN: A GAN-based Mapping from Machine to Human Transcripts for Speech Understanding
Authors	Titouan Parcollet, Mohamed Morchid, Xavier Bost, Georges Linarès
Abstract	Deep learning is at the core of recent spoken language understanding (SLU) related tasks. More precisely, deep neural networks (DNNs) have drastically increased the performance of SLU systems, and numerous architectures have been proposed. In the real-life context of theme identification of telephone conversations, it is common to hold both a human, manual (TRS) version and an automatically transcribed (ASR) version of the conversations. Nonetheless, due to production constraints, only the ASR transcripts are considered when building automatic classifiers. TRS transcripts are only used to measure the performance of ASR systems. Moreover, the classification accuracy recently obtained by DNN-related systems is close to that reached by humans, and it becomes difficult to further increase performance by only considering the ASR transcripts. This paper proposes to distill the TRS knowledge available during the training phase into the ASR representation, by using a new generative adversarial network called M2H-GAN to generate a TRS-like version of an ASR document, in order to improve theme identification performance.
Tasks	Spoken Language Understanding
Published	2019-04-13
URL	http://arxiv.org/abs/1905.01957v1
PDF	http://arxiv.org/pdf/1905.01957v1.pdf
PWC	https://paperswithcode.com/paper/190501957
Repo
Framework

A Regulation Enforcement Solution for Multi-agent Reinforcement Learning


Title	A Regulation Enforcement Solution for Multi-agent Reinforcement Learning
Authors	Fan-Yun Sun, Yen-Yu Chang, Yueh-Hua Wu, Shou-De Lin
Abstract	Human behaviors are regularized by a variety of norms or regulations, either to maintain order or to enhance social welfare. If artificially intelligent (AI) agents make decisions on behalf of human beings, we would hope they can also follow established regulations while interacting with humans or other AI agents. However, it is possible that an AI agent can opt to disobey the regulations (being defective) for self-interest. In this paper, we aim to answer the following question: Consider a multi-agent decentralized environment. Agents make decisions in complete isolation from other agents. Each agent knows the state of its own MDP and its own actions, but it does not know the states and the actions taken by other players. There is a set of regulations for all agents to follow. Given that most agents are benign and will comply with regulations, but not all agents are compliant at first, can we develop a framework such that it is in the self-interest of non-compliant agents to comply after all? We first introduce the problem as Regulation Enforcement and formulate it using reinforcement learning and game theory under the scenario where agents make decisions in complete isolation from other agents. We then propose a solution based on the key idea that although we cannot alter how defective agents choose to behave, we can leverage the aggregated power of compliant agents to boycott the defective ones. We conducted simulated experiments on two scenarios, Replenishing Resource Management Dilemma and Diminishing Reward Shaping Enforcement, using deep multi-agent reinforcement learning algorithms. We further use empirical game-theoretic analysis to show that the method alters the resulting empirical payoff matrices in a way that promotes compliance (making mutual compliance a Nash equilibrium).
Tasks	Multi-agent Reinforcement Learning
Published	2019-01-29
URL	https://arxiv.org/abs/1901.10059v5
PDF	https://arxiv.org/pdf/1901.10059v5.pdf
PWC	https://paperswithcode.com/paper/a-regulation-enforcement-solution-for-multi
Repo
Framework
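
The closing claim, that boycotting changes the payoff matrix so that mutual compliance becomes a Nash equilibrium, can be illustrated with a two-player toy game. The payoff numbers below are hypothetical and chosen only to show the effect. They are not taken from the paper's experiments.

```python
def is_pure_nash(payoffs, profile):
    """Check whether `profile` is a pure-strategy Nash equilibrium.

    payoffs[(row_action, col_action)] = (row_payoff, col_payoff).
    """
    actions = {a for pair in payoffs for a in pair}
    row, col = profile
    # Neither player may gain by deviating unilaterally.
    row_ok = all(payoffs[(row, col)][0] >= payoffs[(alt, col)][0] for alt in actions)
    col_ok = all(payoffs[(row, col)][1] >= payoffs[(row, alt)][1] for alt in actions)
    return row_ok and col_ok

# Hypothetical payoffs: C = comply, D = defect.
without_boycott = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
                   ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
with_boycott = {("C", "C"): (3, 3), ("C", "D"): (1, 2),
                ("D", "C"): (2, 1), ("D", "D"): (1, 1)}
```

In the first table a defector gains against a compliant partner, so mutual compliance is unstable. In the second, the defector's gain is reduced below the compliance payoff and unilateral defection no longer pays.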

Prediction of Sewer Pipe Deterioration Using Random Forest Classification


Title	Prediction of Sewer Pipe Deterioration Using Random Forest Classification
Authors	Razieh Tavakoli, Ali Sharifara, Mohammad Najafi
Abstract	Wastewater infrastructure systems deteriorate over time due to a combination of physical and chemical factors. Failure of this significant infrastructure could have important social, environmental, and economic impacts. Furthermore, determining the optimal timeline for inspection of sewer pipelines is a challenging task for utility managers and other authorities. Regular examination of sewer networks is not cost-effective due to limited time, the high cost of assessment technologies, and a large inventory of pipes. To avoid such obstacles, various researchers have endeavored to improve infrastructure condition assessment methodologies to maintain sewer pipe systems at the desired condition. Sewer condition prediction models are developed to provide a framework to forecast the future condition of pipes and to schedule inspection frequencies. The main goal of this study is to develop a predictive model for wastewater pipes using random forest classification. Predictive models can effectively predict sewer pipe condition, increase the certainty level of the predictive results, and decrease uncertainty in the current condition of wastewater pipes. The developed random forest classification model has achieved a stratified test set false negative rate, the false positive rate, and an excellent area under the ROC curve of 0.81 in a case study application for the City of LA, California. An area under the ROC curve > 0.80 indicates the developed model is an “excellent” choice for predicting the condition of individual pipes in a sewer network. The deterioration models can be used in the industry to improve the inspection timeline and maintenance planning.
Tasks
Published	2019-12-09
URL	https://arxiv.org/abs/1912.04194v1
PDF	https://arxiv.org/pdf/1912.04194v1.pdf
PWC	https://paperswithcode.com/paper/prediction-of-sewer-pipe-deterioration-using
Repo
Framework
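
The evaluation metrics named in the abstract (false negative rate, false positive rate, area under the ROC curve) can be computed from labels and classifier scores as sketched below. The toy labels and the convention that 1 marks a pipe in poor condition are assumptions. The random forest itself and the City of LA data are not reproduced.

```python
def confusion_rates(y_true, y_pred):
    """Return (false negative rate, false positive rate) for binary labels."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return fn / pos, fp / neg

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC computation is quadratic in the number of samples. It is used here for clarity, and a rank-based formula would be preferable for large inventories.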

Box-driven Class-wise Region Masking and Filling Rate Guided Loss for Weakly Supervised Semantic Segmentation


Title	Box-driven Class-wise Region Masking and Filling Rate Guided Loss for Weakly Supervised Semantic Segmentation
Authors	Chunfeng Song, Yan Huang, Wanli Ouyang, Liang Wang
Abstract	Semantic segmentation has achieved huge progress by adopting deep Fully Convolutional Networks (FCN). However, the performance of FCN-based models relies heavily on the amount of pixel-level annotations, which are expensive and time-consuming to obtain. To address this problem, it is a good choice to learn to segment with weak supervision from bounding boxes. How to make full use of the class-level and region-level supervision from bounding boxes is the critical challenge for the weakly supervised learning task. In this paper, we first introduce a box-driven class-wise masking model (BCM) to remove irrelevant regions of each class. Moreover, based on the pixel-level segment proposals generated from the bounding box supervision, we calculate the mean filling rate of each class to serve as an important prior cue, and then propose a filling rate guided adaptive loss (FR-Loss) to help the model ignore the wrongly labeled pixels in proposals. Unlike previous methods that directly train models with fixed individual segment proposals, our method can adjust the model learning with global statistical information, and thus helps reduce the negative impact of wrongly labeled proposals. We evaluate the proposed method on the challenging PASCAL VOC 2012 benchmark and compare it with other methods. Extensive experimental results show that the proposed method is effective and achieves state-of-the-art results.
Tasks	Semantic Segmentation, Weakly-Supervised Semantic Segmentation
Published	2019-04-26
URL	http://arxiv.org/abs/1904.11693v1
PDF	http://arxiv.org/pdf/1904.11693v1.pdf
PWC	https://paperswithcode.com/paper/box-driven-class-wise-region-masking-and
Repo
Framework
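
The mean filling rate described above is the average fraction of a bounding box that a segment proposal covers, taken per class. A minimal sketch of that statistic follows. The data layout (binary masks as nested lists, boxes as half-open pixel ranges) is an assumption, and the FR-Loss built on top of it is not shown.

```python
def filling_rate(mask, box):
    """Fraction of pixels inside `box` that the proposal marks as foreground.

    mask: 2-D list of 0/1 values.
    box: (top, left, bottom, right) with bottom and right exclusive.
    """
    top, left, bottom, right = box
    area = (bottom - top) * (right - left)
    if area <= 0:
        raise ValueError("box must have positive area")
    filled = sum(mask[r][c] for r in range(top, bottom) for c in range(left, right))
    return filled / area

def mean_filling_rates(samples):
    """samples: iterable of (class_name, mask, box). Returns class -> mean rate."""
    totals = {}
    for cls, mask, box in samples:
        running_sum, count = totals.get(cls, (0.0, 0))
        totals[cls] = (running_sum + filling_rate(mask, box), count + 1)
    return {cls: s / n for cls, (s, n) in totals.items()}
```

A thin class such as a bicycle would typically show a low mean filling rate and a compact class such as a bus a high one, which is what makes the statistic usable as a class-level prior.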

Memory-free dynamics for the TAP equations of Ising models with arbitrary rotation invariant ensembles of random coupling matrices


Title	Memory-free dynamics for the TAP equations of Ising models with arbitrary rotation invariant ensembles of random coupling matrices
Authors	Burak Çakmak, Manfred Opper
Abstract	We propose an iterative algorithm for solving the Thouless-Anderson-Palmer (TAP) equations of Ising models with arbitrary rotation invariant (random) coupling matrices. In the thermodynamic limit, we prove by means of the dynamical functional method that the proposed algorithm converges when the so-called de Almeida Thouless (AT) criterion is fulfilled. Moreover, we give exact analytical expressions for the rate of the convergence.
Tasks
Published	2019-01-24
URL	http://arxiv.org/abs/1901.08583v2
PDF	http://arxiv.org/pdf/1901.08583v2.pdf
PWC	https://paperswithcode.com/paper/memory-free-dynamics-for-the-tap-equations-of
Repo
Framework
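
For orientation, the best-known special case is the Sherrington-Kirkpatrick model, in which the couplings J_ij are i.i.d. Gaussian with variance 1/N. The TAP equations and the corresponding stability condition in that case are given below in their standard textbook form, with inverse temperature β, external fields h_i, and magnetizations m_i. This is not the general rotation-invariant form treated in the paper, where the reaction term depends on the coupling ensemble.

```latex
m_i = \tanh\!\Big(\beta h_i + \beta \sum_{j \neq i} J_{ij}\, m_j
      - \beta^2 (1 - q)\, m_i\Big),
\qquad q = \frac{1}{N} \sum_{i=1}^{N} m_i^2 ,

\beta^2 \, \frac{1}{N} \sum_{i=1}^{N} \big(1 - m_i^2\big)^2 < 1 .
```

The first line is a fixed-point system for the magnetizations, and the second is the de Almeida-Thouless-type condition under which the fixed point is stable.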

Weakly Supervised Semantic Segmentation of Satellite Images


Title	Weakly Supervised Semantic Segmentation of Satellite Images
Authors	Adrien Nivaggioli, Hicham Randrianarivo
Abstract	When one wants to train a neural network to perform semantic segmentation, creating pixel-level annotations for each of the images in the database is a tedious task. With aerial or satellite images, which are usually very large, it is even worse. With that in mind, we investigate how to use image-level annotations in order to perform semantic segmentation. Image-level annotations are much less expensive to acquire than pixel-level annotations, but we lose a lot of information for the training of the model. From the annotations of the images, the model must find by itself how to classify the different regions of the image. In this work, we use the method proposed by Ahn and Kwak [1] to produce pixel-level annotations from image-level annotations. We compare the overall quality of our generated dataset with the original dataset. In addition, we propose an adaptation of AffinityNet that allows us to directly perform semantic segmentation. Our results show that the generated labels lead to the same performance when training several segmentation networks. Also, the quality of the semantic segmentation performed directly by AffinityNet and the random walk is close to that of the best fully supervised approaches.
Tasks	Semantic Segmentation, Weakly-Supervised Semantic Segmentation
Published	2019-04-08
URL	http://arxiv.org/abs/1904.03983v1
PDF	http://arxiv.org/pdf/1904.03983v1.pdf
PWC	https://paperswithcode.com/paper/weakly-supervised-semantic-segmentation-of
Repo
Framework
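
The random walk mentioned in the abstract spreads class scores between regions that are predicted to be similar. The sketch below shows the basic mechanism on a small affinity matrix. It is a simplified illustration with hypothetical names, and it omits details of the AffinityNet formulation such as how affinities are predicted and sharpened.

```python
def row_normalize(affinity):
    """Turn a non-negative affinity matrix into a row-stochastic transition matrix.

    Assumes every row has a positive sum (e.g. each region is affine to itself).
    """
    return [[v / sum(row) for v in row] for row in affinity]

def random_walk(affinity, scores, steps=1):
    """Propagate per-region scores along the transition matrix for `steps` steps."""
    transition = row_normalize(affinity)
    for _ in range(steps):
        scores = [sum(t * s for t, s in zip(row, scores)) for row in transition]
    return scores
```

With regions 0 and 1 mutually affine and region 2 isolated, a score placed on region 0 is shared with region 1 and never leaks to region 2.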

That and There: Judging the Intent of Pointing Actions with Robotic Arms


Title	That and There: Judging the Intent of Pointing Actions with Robotic Arms
Authors	Malihe Alikhani, Baber Khalid, Rahul Shome, Chaitanya Mitash, Kostas Bekris, Matthew Stone
Abstract	Collaborative robotics requires effective communication between a robot and a human partner. This work proposes a set of interpretive principles for how a robotic arm can use pointing actions to communicate task information to people by extending existing models from the related literature. These principles are evaluated through studies where English-speaking human subjects view animations of simulated robots instructing pick-and-place tasks. The evaluation distinguishes two classes of pointing actions that arise in pick-and-place tasks: referential pointing (identifying objects) and locating pointing (identifying locations). The study indicates that human subjects show greater flexibility in interpreting the intent of referential pointing compared to locating pointing, which needs to be more deliberate. The results also demonstrate the effects of variation in the environment and task context on the interpretation of pointing. Our corpus, experiments and design principles advance models of context, common sense reasoning and communication in embodied communication.
Tasks	Common Sense Reasoning
Published	2019-12-13
URL	https://arxiv.org/abs/1912.06602v1
PDF	https://arxiv.org/pdf/1912.06602v1.pdf
PWC	https://paperswithcode.com/paper/that-and-there-judging-the-intent-of-pointing
Repo
Framework