Feature management plays an important role in the AI(artificial intelligence) pipeline. Feature stores are designed to offer effective versioning of features during the model training and inference stages. Feature stores must ensure real-time feature updates and version management to collaborate with the upstream data ingestion tasks and power the model serving system. In AI-powered online decision augmentation applications, the model serving system responds to requests in real time to provide better user experience, and feature stores face the challenge of low-latency online feature retrieval. Focusing on this challenge, we developed FeaDB, an in-memory based multi-version online feature store, which adopts a time series model and provides feature versioning semantics to automatically manage features from ingestion to serving. Moreover, an append-write operation was applied to ensure ingestion performance, and version indexing was optimized to improve read operations. A snapshot mechanism is proposed, and it was experimentally proven that snapshot read operations improve performance of lookup and range lookup.
China has the advantages of scale and diversity in data resources, and mobile internet data applications, which generate massive amounts of data in diverse application scenarios, recommendation systems have the capability to extract valuable information from this massive amounts of data, thereby mitigating the problem of information overload. Most existing research on recommendation systems focused on centralized recommender systems, training the data on the cloud centrally. However, with increasingly prominent data security and privacy protection issues, collecting user data has become increasingly difficult, making centralized recommendation methods infeasible. This study focuses on privacy-preserving cloud-end collaborative training in a decentralized manner for personalized recommender systems. To fully utilize the advantages of end devices and cloud servers while considering privacy and security issues, a cloud-end collaborative training method named FedMNN (federated machine learning and mobile neural network) is proposed for recommender systems based on federated machine learning (FedML) and a mobile neural network (MNN). The proposed method was divided into three parts: First, cloud-based models implemented in various deep learning frameworks were converted into general MNN models for end-device training using the ONNX (open neural network exchange) intermediate framework and a MNN model conversion tool. Second, the cloud server sends the model to the end-side devices, which initialized and obtain local data for training and loss calculation, followed by gradient back-propagation. Finally, the end-side models are fed back to the cloud server for model aggregation and updating. Depending on different requirements, the cloud model was deployed on end-side devices as required, achieving end-cloud collaboration. Experiments comparing power consumption of the proposed FedMNN and FLTFlite (flower and TensorFlow lite) frameworks on benchmark tasks identified that FedMNN is 32% to 51% lower than FLTFlite. Using DSSM (deep structured semantic model) and deep and wide recommendation models, the experimental results demonstrated the effectiveness of the proposed cloud-end collaborative training method.
The high parallelism and throughput of graphics processing unit (GPU) can improve the performance of on-line analytical processing (OLAP) queries in databases. However, openGauss currently cannot take advantage of the benefits of heterogeneous computing hardware such as GPU. Therefore, in this study, we explore using GPU to accelerate the OLAP processing in the system and achieve higher performance. The focus is on how to implement and optimize GPU acceleration modules for openGauss. To address the difference in execution granularity between openGauss and PostgreSQL, we propose a CPU (central processing unit)-GPU collaborative parallel solution based on chunked reading and key distribution. This solution can reduce the I/O (input/output) time of the GPU Scan operator to reduce idle waiting time, and run multiple instances of GPU Join to support multi-GPU environments. To address the architectural differences between openGauss and PostgreSQL, a heterogeneous operator acceleration technology compatible with vectorized engines is proposed. A custom operator framework is implemented that can embed a vectorized execution engine, and a vectorized GPU Scan operator capable of processing openGauss columnar data is employed based on this framework. A prototype system is implemented to verify the effectiveness of the proposed approach.
User interfaces (UIs) play a vital role in the interactions between an application and its users. The current popularity of mobile Internet has led to the large-scale migration of web-based applications from desktop to mobile. Web front-end development has become more extensive and in-depth in application development. Traditional web front-end development relies on designers to give initial design drafts and then programmers to write the corresponding UI code. This method has high industry barriers and slow development, which are not conducive to rapid product iteration. The development of deep learning makes it possible to automatically generate web front-end code based on UI images. Existing methods poorly capture the features of UI images, and the accuracy of the generated code is low. To mitigate these problems, we propose an encoder–decoder model, called image2code, based on the Swin Transformer, which is used to generate web front-end code from UI images. Image2code regards the process of generating web front-end code from UI images as an image captioning task and uses Swin Transformer with a sliding window design as the backbone network of the encoder and decoder. The sliding window operation limits the attention calculation to one window, which reduces the amount of calculation by the attention mechanism while simultaneously ensuring that feature connections remain across windows. In addition, image2code generates Emmet code, which is much simpler and can be directly converted to HTML code, improving the efficiency of model training. Experimental results show that image2code performs better than existing representative models, such as pix2code and image2emmet, in the task of web front-end code generation on existing and newly constructed datasets.
In heterogeneous federated learning systems, among a variety of edge devices such as personal computers and embedded devices, resource-constrained devices, i.e. stragglers, reduce the training efficiency of the federated learning system. This paper proposes a heterogeneous coded federated learning (HCFL) system to ① improve the training efficiency of the system and speed up the training of heterogeneous federated learning (FL) for multiple stragglers, ② provide a certain level of data privacy protection. The HCFL scheme designs scheduling strategies from the perspective of client and server to satisfy the accelerated calculation of multiple stragglers model in the general environment. In addition, a linear coded computing (LCC) scheme is designed to provide data protection for task distribution. The experimental results show that HCFL can reduce training time by 89.85% when the performance difference between devices is large.
With the continuous development of network attack methods, it is becoming increasingly difficult to protect the security of power communication networks. Currently, the detection accuracy of abnormal traffic in distribution communication networks is insufficient and the efficiency of abnormal traffic detection is low. To address these issues, a new method for abnormal traffic detection in distribution communication networks is proposed, in which feature extraction and traffic classification are improved. The proposed method utilizes a time-frequency domain feature extraction method, using an adaptive redundancy boosting multiwavelet packet transform to quickly extract frequency-domain features, while time-domain features are extracted using the communication characteristics of the distribution network. To improve traffic classification and detection, a parallel deep forest classification algorithm is proposed based on a distributed computing framework, and the training and classification task scheduling strategies are optimized. The experimental results show that the false alarm rate of the proposed method is only 2.63% and the accuracy rate for the detection of abnormal traffic in distribution networks is 98.29%.