With advancing urbanization and the continuous development of big data technology, smart business circles have become an important part of smart city construction. The popularity of a business circle, the size of its consumer population, and its consumption level are now central concerns in building one. Consumer counts, however, are still gathered by traditional means such as questionnaire surveys and sampling, which are both costly and inefficient. Advances in data mining make it possible to estimate the consumer scale of a business circle by analyzing user trajectories instead. This paper proposes a method for analyzing the consumer scale of a business circle based on trajectory data. The main contributions are as follows. ① Determining the boundary of a business circle in trajectory data is the first problem to solve, because only then can a consumer be judged to be active inside or outside the circle. We delineate the extent of the business circle by applying the k-Nearest Neighbor (kNN) classification algorithm to the location distribution of the base stations within it. ② Because trajectory data are inherently uncertain, determining the relationship between a user and the business circle is also difficult. We compute the weight of each base station from the area of an irregular polygon and combine it with a time threshold to analyze the daily consumer scale in the area. ③ Given the massive volume of trajectory data, we propose a big data computing framework, BPDA (Business-Circle Parallel Distributed Algorithm), and use it to implement a consumer scale analysis system for mobile trajectory data on the Hadoop big data processing platform and the Kafka distributed messaging system. Using base station data from the Zhongshan Park business circle, we demonstrate the feasibility of the proposed method.
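The abstract names the building blocks without giving formulas, so the following Python sketch illustrates them only under stated assumptions and is not the paper's implementation. It shows a majority-vote kNN label for a location, the shoelace formula for the area of an irregular polygon, and a daily consumer count with a time threshold. The weight definition (share of a station's coverage polygon that lies inside the business circle), the record layout `(user, station, minutes)`, the 30-minute threshold, and all function names are hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
from math import hypot


def knn_label(point, labeled_stations, k=3):
    """Return the majority label among the k stations nearest to `point`.

    `labeled_stations` is a list of (x, y, label) tuples (assumed layout).
    """
    nearest = sorted(
        labeled_stations,
        key=lambda s: hypot(s[0] - point[0], s[1] - point[1]),
    )[:k]
    return Counter(label for _, _, label in nearest).most_common(1)[0][0]


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula.

    `vertices` are (x, y) pairs listed in order around the boundary.
    """
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def station_weight(coverage, coverage_inside):
    """Assumed weight: fraction of a station's coverage inside the circle."""
    return polygon_area(coverage_inside) / polygon_area(coverage)


def daily_consumers(records, weights, min_minutes=30.0):
    """Count users whose weighted dwell time reaches the threshold.

    `records` holds (user, station, minutes) tuples for one day; stations
    without a weight are treated as lying outside the business circle.
    """
    dwell = defaultdict(float)
    for user, station, minutes in records:
        dwell[user] += weights.get(station, 0.0) * minutes
    return sum(1 for t in dwell.values() if t >= min_minutes)
```

In this sketch a user who spends 40 minutes at a station with weight 1.0 is counted, while a user who spends 50 minutes at a station with weight 0.5 contributes only 25 weighted minutes and is not. In a deployment like the one the abstract describes, the per-user aggregation would presumably run as a distributed job rather than in a single process.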