Table of Contents

    25 September 2020, Volume 2020 Issue 5
    Data System
    High contention transaction processing prototype for e-commerce workloads
    ZHANG Shuyan, WANG Qingshuai, ZHANG Rong
    2020, 2020 (5):  1-9.  doi: 10.3969/j.issn.1000-5641.202091005
    Modern multi-core main-memory databases still cannot achieve ideal performance under high contention. Throughput is considered to be choked by concurrent transactions that try to modify the same data: these transactions contend for the same resources and must be executed serially in a traditional database. Unfortunately, e-commerce workloads often encounter high contention due to promotions. In this paper, we optimize the transaction processing scheme for high-contention e-commerce workloads in two ways. First, prefiltering is designed to filter out invalid modifications to the database, which mitigates contention for locks. Second, if a large number of writes are similar, we propose sharing locks among them. We implement a prototype of the proposed system, Filmer, to demonstrate the idea. Extensive experiments show that filtering and merging improve efficiency in handling high-contention e-commerce workloads.
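The two ideas in the abstract above, prefiltering and lock sharing among similar writes, can be illustrated with a toy sketch. This is not Filmer's implementation: the `Inventory` class, the rule that a decrement is "invalid" when it exceeds the current stock, and the per-item batching are all assumptions made for illustration.

```python
import threading
from collections import defaultdict


class Inventory:
    """Toy stock table (hypothetical; not the Filmer prototype)."""

    def __init__(self, stock):
        self.stock = dict(stock)
        self.locks = defaultdict(threading.Lock)  # one lock per item
        self.lock_acquisitions = 0                # counted for illustration

    def apply_batch(self, requests):
        """Apply (item, quantity) decrement requests; return accepted flags."""
        accepted = [False] * len(requests)

        # Prefiltering (assumed rule): drop requests that cannot succeed
        # against the current stock, so they never compete for a lock.
        by_item = defaultdict(list)
        for idx, (item, qty) in enumerate(requests):
            if qty <= 0 or self.stock.get(item, 0) < qty:
                continue
            by_item[item].append((idx, qty))

        # Merging: similar writes (same item) share one lock acquisition.
        for item, group in by_item.items():
            with self.locks[item]:
                self.lock_acquisitions += 1
                for idx, qty in group:
                    # Re-check under the lock; earlier writes may have
                    # consumed the stock this request relied on.
                    if self.stock[item] >= qty:
                        self.stock[item] -= qty
                        accepted[idx] = True
        return accepted
```

In this sketch a batch of several decrements against one hot item takes the item's lock once rather than once per request, which is the contention saving the abstract attributes to merging.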
    Global transaction log of a multi-master cloud database
    WEI Xiaoxian, LIU Wenxin, CAI Peng
    2020, 2020 (5):  10-20.  doi: 10.3969/j.issn.1000-5641.202091002
    With the prevalence of cloud computing, users’ requirements for cloud databases are becoming increasingly complex. The current write-once-read-many cloud database system, based on shared storage, cannot support the dynamic expansion of write performance. When multiple master nodes provide write services simultaneously, cross-node read and write conflicts can occur, eventually leading to inconsistencies among the caches of the master nodes. To address this problem, optimistic conflict detection based on globally ordered transaction logs can detect cross-node transaction conflicts, roll back conflicting transactions, and maintain the isolation level and consistency of the overall system. Moreover, by broadcasting and replaying the globally ordered transaction log, modifications made on one master node can be synchronized to the remaining nodes, ensuring that each node can serve requests independently. This multi-master cache consistency solution based on global transaction logs is implemented on the open source database MySQL, and its impact on system performance is verified through experiments.
    Research and implementation of a smart automatic contract generation method for Ethereum
    GAO Yichen, ZHAO Bin, ZHANG Zhao
    2020, 2020 (5):  21-32.  doi: 10.3969/j.issn.1000-5641.202091015
    Smart contracts based on Ethereum have been widely used across various fields. The programming of smart contracts, however, requires professional developers with expertise in a special programming language; in other words, the developers must have professional domain knowledge in addition to programming ability. In this paper, a method for the automated generation of smart contracts for specific domains is proposed with the aim of making smart contract programming more accessible. The paper introduces cluster analysis of smart contracts and establishes the basic functional code for transactional smart contracts. MFC is used to link the generated code with UI controls and provide users with a friendly smart contract programming page; hence, the automatic generation of smart contracts is realized, reducing the difficulty and cost of contract programming. Finally, a case study is presented to verify the availability of the generated smart contract.
    Data Governance
    An integrity auditing scheme based on MHT for power equipment images stored in the cloud
    ZHANG Xun, BAI Wanrong, WEI Feng, WANG Rong, TIAN Xiuxia, LIU Tianshun
    2020, 2020 (5):  33-43.  doi: 10.3969/j.issn.1000-5641.202091012
    This paper proposes an integrity auditing scheme suitable for power equipment images stored in the cloud with the aim of addressing the risks of being attacked, tampered with, or lost. First, each image is cut into four image blocks, and then a Scale Invariant Feature Transform (SIFT) algorithm is used to extract features from the image blocks. The four image blocks of each image are subsequently used as leaf nodes to construct a Merkle Hash Tree (MHT). Finally, access level bits and update status bits are added to the nodes of the tree. Theoretical analysis and experimental results show that the proposed image integrity auditing scheme has lower computational overhead and higher audit efficiency compared to existing approaches; hence, the scheme can accurately locate the incomplete area of an image and is suitable for auditing the integrity of power equipment images in a cloud storage environment.
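The tree construction described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's scheme: SHA-256 is an assumed hash function, raw bytes stand in for the SIFT features of each image block, the access level and update status bits are omitted, and the function names are hypothetical.

```python
import hashlib


def node_hash(data: bytes) -> bytes:
    """Hash one node's content (SHA-256 is an assumption)."""
    return hashlib.sha256(data).digest()


def build_mht(blocks):
    """Build a Merkle Hash Tree; return its levels, leaves first, root last."""
    level = [node_hash(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate the last node if odd
            level = level + [level[-1]]
        # Each parent is the hash of its two children concatenated.
        level = [node_hash(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def root_of(levels) -> bytes:
    """Return the root hash, the value an auditor would compare."""
    return levels[-1][0]


def locate_changed_blocks(trusted_levels, current_levels):
    """Return indices of leaves whose hashes differ between two trees."""
    pairs = zip(trusted_levels[0], current_levels[0])
    return [i for i, (a, b) in enumerate(pairs) if a != b]
```

With four blocks per image the tree has three levels (four leaves, two inner nodes, one root). Comparing root hashes detects any modification, and comparing leaf hashes identifies which of the four blocks changed, which corresponds to locating the damaged area of the image.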
    Research on an endogenous data interaction protocol for the dual-middle platform and dual-chain architecture
    LIU Feng, YANG Jie, LI Zhibin, QI Jiayin
    2020, 2020 (5):  44-55.  doi: 10.3969/j.issn.1000-5641.202091014
    Since the dual-middle platform architecture of “data and business” was first proposed, the efficiency of secure data interaction between middle platforms has become particularly important. This paper proposes an efficient interaction protocol that uses a blockchain dual-chain structure to ensure data security and credibility while optimizing the efficiency of endogenous data interaction between middle platforms. In a simulation experiment on the core threshold signature technology extracted from the new protocol, the protocol reduced the time overhead of the off-chain signature and on-chain verification processes by 42.1% compared with the traditional method of single signature and on-chain verification. The new protocol will have a positive impact on the integration of blockchain technology in the middle platform and on accelerating the overall adoption of the middle platform and blockchain.
    Methodology and System of Machine Learning
    Enriching image descriptions by fusing fine-grained semantic features with a transformer
    WANG Junhao, LUO Yifeng
    2020, 2020 (5):  56-67.  doi: 10.3969/j.issn.1000-5641.202091004
    Modern image captioning models following the encoder-decoder architecture of a convolutional neural network (CNN) or recurrent neural network (RNN) face two issues: they discard a large amount of detailed information contained in images, and they are costly to train. In this paper, we propose a novel model, consisting of a compact bilinear encoder and a compact multi-modal decoder, to improve image captioning with fine-grained regional object features. In the encoder, compact bilinear pooling (CBP) is used to encode fine-grained semantic features from an image’s regional features, and transformers are used to encode global semantic features from an image’s global bottom-up features; the two sets of encoded features are subsequently fused using a gate structure to form the overall encoded features of the image. In the decoding process, we extract multi-modal features from fine-grained regional object features and fuse them with the overall encoded features to decode semantic information for description generation. Extensive experiments performed on the public Microsoft COCO dataset show that our model achieves state-of-the-art image captioning performance.
    Methods and progress in deep neural network model compression
    LAI Yejing, HAO Shanfeng, HUANG Dingjiang
    2020, 2020 (5):  68-82.  doi: 10.3969/j.issn.1000-5641.202091001
    Deep neural network (DNN) models achieve strong performance at the cost of substantial memory consumption and high computational demand, which makes them difficult to deploy on hardware platforms with limited resources. To meet these challenges, researchers have made great strides in model compression and have produced a wealth of relevant literature and methods. This paper introduces four representative compression methods for deep neural networks used in recent years: network pruning, quantization, knowledge distillation, and compact network design; in particular, the article focuses on the characteristics of these representative methods. Finally, evaluation criteria and research prospects of model compression are summarized.
    Semantic Extraction from Data
    Approaches on network vertex embedding
    ZHOU Xiaoxu, LIU Yingfeng, FU Yingnan, ZHU Renyu, GAO Ming
    2020, 2020 (5):  83-94.  doi: 10.3969/j.issn.1000-5641.202091007
    Networks are a commonly used data structure, widely applied in social networking, communication, and biology. How to represent network vertices is therefore a difficult problem of wide concern in academia and industry. Network vertex representation aims to learn a mapping from each vertex to a vector in a low-dimensional space while preserving the topological structure between vertices in the network. Based on an analysis of the motivation and challenges of network vertex representation, this paper analyzes and compares the mainstream methods in detail, including approaches based on matrix decomposition, random walks, and deep learning, and finally introduces methods for measuring the performance of network vertex representation.
    Approaches for semantic textual similarity
    HAN Chengcheng, LI Lei, LIU Tingting, GAO Ming
    2020, 2020 (5):  95-112.  doi: 10.3969/j.issn.1000-5641.202091011
    This paper summarizes the latest research progress on semantic textual similarity calculation methods, including string-based, statistics-based, knowledge-based, and deep-learning-based methods. For each method, the paper not only reviews typical models and approaches but also discusses their respective advantages and disadvantages; the paper also explores commonly used public datasets and evaluation metrics. Finally, we put forward several possible directions for future research in the field of semantic textual similarity.
    Relation extraction via distant supervision technology
    WANG Jianing, HE Yi, ZHU Renyu, LIU Tingting, GAO Ming
    2020, 2020 (5):  113-130.  doi: 10.3969/j.issn.1000-5641.202091006
    Relation extraction is one of the classic natural language processing tasks and has been widely used in knowledge graph construction and completion, knowledge base question answering, and text summarization. It aims to extract the semantic relation from a target entity pair. In order to construct a large-scale supervised corpus efficiently, a distant supervision method was proposed to realize automatic annotation by aligning the text with an existing knowledge base. However, its overly strong assumptions give rise to a series of challenges, which have attracted the attention of researchers. Firstly, this paper introduces the theories of distant supervision relation extraction and the corresponding formal descriptions. Secondly, we systematically analyze related methods and their respective pros and cons from three perspectives: noisy data, insufficient information, and data imbalance. Next, we explain and compare some benchmark corpora and evaluation metrics. Lastly, we highlight new challenges for distant supervision relation extraction and discuss trends and directions of future research before concluding.
    Application of Data Platform
    The role of the middle platform in digital government implementation
    CHEN Bing, FANG Haibin, ZHAO Wenwen
    2020, 2020 (5):  131-136.  doi: 10.3969/j.issn.1000-5641.202091008
    This paper introduces the characteristics of digital government by reviewing the development of digital government systems. Drawing on developments in information technology, it demonstrates that the middle platform is an important technical support component for the construction of digital government. Using a case study on the building process of Shanghai’s “Integrated Online Platform”, this paper introduces the construction process of the middle platform on the business, data, and application aspects of government affairs; the article also summarizes future directions of development.
    Methodology for building a business-oriented data asset system: Feature hierarchies
    REN Yinzi
    2020, 2020 (5):  137-145.  doi: 10.3969/j.issn.1000-5641.202091009
    This paper proposes a new method for building a business-oriented data asset system. Data assets are one of the core components of the data middle office concept and require business-oriented asset mapping to realize the transformation from the asset to the broader business value. The proposed methodology uses feature hierarchies and describes how to organize data assets based on a tree structure, with the root as an object, the branch as a category, and a leaf/flower as a tag. There are energetic connections between the various object trees and the growth of these trees is supplied by the business. The instantiation of feature hierarchies can be realized in two modes: overall planning and local interception. The asset results are divided into two major parts, namely asset inventories and asset entities; these can be quickly configured as data service results for business use through service management tools in order to realize the value of the underlying data assets.
    Research on abnormal detection of daily loss rate based on a variational auto-encoder
    ZHANG Guofang, LIU Tongyu, WEN Lili, GUO Guo, ZHOU Zhongxin, YUAN Peisen
    2020, 2020 (5):  146-155.  doi: 10.3969/j.issn.1000-5641.202091013
    This paper adopts an anomaly detection algorithm based on an auto-encoder to detect anomalies in large-scale daily line loss rate data. A variational auto-encoder is a neural network trained with the backpropagation algorithm to make its output approximately equal to its input. The algorithm uses the auto-encoder to encode the original daily line loss rate time series and records the reconstruction possibility at each time point during the reconstruction process. When the reconstruction possibility is greater than a specified threshold, the point is classified as anomalous. In this paper, experiments were conducted on real daily line loss data. The test results show that the proposed auto-encoder-based algorithm for anomaly detection in daily line loss rate data has good detection capability.
    GRS: A generative retrieval dialogue model for intelligent customer service in the field of e-commerce
    GUO Xiaozhe, PENG Dunlu, ZHANG Yatong, PENG Xuegui
    2020, 2020 (5):  156-166.  doi: 10.3969/j.issn.1000-5641.202091010
    There are generally two ways to realize most intelligent chat systems: ① based on retrieval and ② based on generation. With the retrieval approach, the content and type of responses are limited by the corpus chosen. The generative approach can produce responses that are not in the corpus, which makes it more flexible; at the same time, it is prone to producing erroneous or meaningless replies. To solve these problems, a new model, GRS (generative retrieval score), is proposed. This model can train the retrieval model and the generation model simultaneously. A scoring module is used to rank the results of the retrieval model and the generation model, and the responses with high scores are taken as the output of the overall dialogue system. As a result, GRS can combine the advantages of both dialogue systems and output a specific, diverse, and flexible response. An experiment on a real-world JingDong intelligent customer service dialogue dataset shows that the proposed model offers better outputs than existing retrieval and generation models.
    Merchant churn prediction based on transaction data of aggregate payment platform
    XU Yiwen, LI Xiaoyang, DONG Qiwen, QIAN Weining, ZHOU Fang
    2020, 2020 (5):  167-178.  doi: 10.3969/j.issn.1000-5641.202091016
    In the field of aggregate payments, keeping the merchant dropout rate on the platform low is key to reducing the overall platform operating cost and increasing profit. This study focuses on the prediction of merchant churn for aggregate payment platforms and aims to help the platform reactivate merchants at risk of churning. The paper proposes a series of features that are highly relevant to merchant churn and applies a variety of traditional machine learning models for prediction. Given that the data analyzed contains sequential information, the study also applies LSTM-based techniques to address the prediction problem. Experimental results on a real dataset show that the proposed features have some predictive ability and that the results are interpretable. In addition, the LSTM-based approaches are capable of capturing the timing characteristics in the data and further improve the prediction results.
    Discovering traveling companions using autoencoders
    LI Xiaochang, CHEN Bei, DONG Qiwen, LU Xuesong
    2020, 2020 (5):  179-188.  doi: 10.3969/j.issn.1000-5641.202091003
    With the widespread adoption of mobile devices, today’s location tracking systems are producing tremendous amounts of trajectory data on a continuous basis. The ability to discover moving objects that travel together (i.e., traveling companions) from their respective trajectories is desirable for many applications, including intelligent transportation systems and intelligent advertising. Existing algorithms are either based on pattern mining methods that define a particular pattern of traveling companions or based on representation learning methods that learn similar representations for similar trajectories. The former method suffers from the pairwise point-matching problem, and the latter often ignores the temporal proximity between trajectories. In this work, we propose a deep representation learning model using autoencoders, namely Mean-Attn (Mean-Attention), for the discovery of traveling companions. Mean-Attn collectively injects spatial and temporal information into its input embeddings using skip-gram and positional encoding techniques, respectively. In addition, our model encourages trajectories to learn from their neighbors by leveraging the sort-tile-recursive (STR) algorithm as well as the mean operation and global attention mechanisms. After obtaining the representations from the encoder, we run DBSCAN (Density-Based Spatial Clustering of Applications with Noise) to cluster the representations and find traveling companions. Experimental results suggest that Mean-Attn performs better than the state-of-the-art data mining and deep learning algorithms for locating traveling companions.
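The final clustering step named in the abstract above can be illustrated with a small, pure-Python DBSCAN. This sketch covers only that step: the Mean-Attn encoder is not reproduced, the 2-D tuples stand in for learned trajectory representations, and the `eps` and `min_pts` values are illustrative assumptions rather than the paper's settings.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    NOISE = -1
    labels = [None] * len(points)          # None means "not yet visited"

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:           # not a core point
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:         # noise reached from a core point
                labels[j] = cluster        # becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
        cluster += 1
    return labels


# Hypothetical embeddings: two groups of companions and one lone trajectory.
embeddings = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0),
              (5.0, 5.0), (5.0, 5.1), (5.1, 5.0),
              (10.0, 0.0)]
labels = dbscan(embeddings, eps=0.3, min_pts=3)
```

Trajectories that share a cluster label would be reported as traveling companions, while points labeled -1 have no companions at the chosen density.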