Journal of East China Normal University (Natural Science), 2025, Vol. 2025, Issue (1): 46-58. doi: 10.3969/j.issn.1000-5641.2025.01.004
Computer Science
Yuhang CHEN, Shizhou WANG, Zhengting TANG, Liangyu CHEN, Ningkang JIANG*
Received: 2023-10-31
Online: 2025-01-25
Published: 2025-01-20
Contact: Ningkang JIANG (E-mail: nkjiang@sei.ecnu.edu.cn)
Yuhang CHEN, Shizhou WANG, Zhengting TANG, Liangyu CHEN, Ningkang JIANG. Research on software classification based on the fusion of code and descriptive text [J]. Journal of East China Normal University (Natural Science), 2025, 2025(1): 46-58.
Table 3  Results of the experimental group
Software | Category | Predicted categories |
Lambspec | Assertion | Assertion, StringUtilities, redisClient |
OkapiBarcode | Barcode | Barcode, MachineLearning, Encryption |
Javassist | BytecodeLibs | BytecodeLibs, Reflection, ClasspathTools |
Jmemcached | CacheImps | CacheImps, Microbenchmark, ParserGens |
Zclasspath | ClasspathTools | ClasspathTools, VirtualFileSystem, Assertion |
Dasein | cloudComputing | cloudComputing, ORM, httpClients |
Jopt | CmdLineParsers | CmdLineParsers, Microbenchmark, ParserGens |
Snappy | compressLibs | compressLibs, HashingLibs, Encryption |
Jmdns | DNSLibs | SSH Library, cloudComputing, compressLibs |
Unirest | httpClients | httpClients, cloudComputing, CmdLineParsers |
Jersey | JSONLibs | JSONLibs, httpClients, cloudComputing |
Jsontoken | JWTLibs | JWTLibs, cloudComputing, DNSLibs |
OpenIMAJ | MachineLearning | MachineLearning, MathLibs, Barcode |
JTS | MathLibs | Barcode, ParserGens, MathLibs |
KoPeMe | Microbenchmark | Microbenchmark, CacheImps, cloudComputing |
MyBatis | ORM | ORM, cloudComputing, httpClients |
ToucanPdf | PDFLibs | Barcode, PDFLibs, MachineLearning |
Reb4j | RegexLibs | ParserGens, redisClient, StringUtilities |
TrueZip | VirtualFileSystem | VirtualFileSystem, CacheImps, httpClients |
Wasync | websocketClients | httpClients, cloudComputing, websocketClients |
Jsoup | html parser | html parser, ParserGens, httpClients |
JParsec | ParserGens | Assertion, ParserGens, ORM |
Redisson | redisClient | CacheImps, redisClient, Microbenchmark |
WildFly | Security | Security, Assertion, ORM |
SSHJ | SSH Library | SSH Library, httpClients, cloudComputing |
tomgibara | HashingLibs | JWTLibs, HashingLibs, UUIDGens |
Reflection-Util | Reflection | Reflection, BytecodeLibs, ORM |
Vt-Crypt | Encryption | Encryption, SSH Library, JWTLibs |
UUID-Creator | UUIDGens | UUIDGens, cloudComputing, JWTLibs |
Joda-Convert | StringUtilities | Reflection, ORM, JWTLibs |
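The hit rates implied by Table 3 can be tallied directly from its rows. The sketch below assumes that the "Predicted categories" column lists three categories in descending order of model score (the table does not state the ordering explicitly); under that assumption it computes top-k hit rates, i.e. the fraction of projects whose listed category appears among the first k predictions. The names `TABLE3`, `top_k_hit_rate` and `misses` are illustrative, not taken from the paper.

```python
# Table 3 rows: (software, listed category, predicted categories).
# Assumption: predictions are in descending score order.
TABLE3 = [
    ("Lambspec", "Assertion", ["Assertion", "StringUtilities", "redisClient"]),
    ("OkapiBarcode", "Barcode", ["Barcode", "MachineLearning", "Encryption"]),
    ("Javassist", "BytecodeLibs", ["BytecodeLibs", "Reflection", "ClasspathTools"]),
    ("Jmemcached", "CacheImps", ["CacheImps", "Microbenchmark", "ParserGens"]),
    ("Zclasspath", "ClasspathTools", ["ClasspathTools", "VirtualFileSystem", "Assertion"]),
    ("Dasein", "cloudComputing", ["cloudComputing", "ORM", "httpClients"]),
    ("Jopt", "CmdLineParsers", ["CmdLineParsers", "Microbenchmark", "ParserGens"]),
    ("Snappy", "compressLibs", ["compressLibs", "HashingLibs", "Encryption"]),
    ("Jmdns", "DNSLibs", ["SSH Library", "cloudComputing", "compressLibs"]),
    ("Unirest", "httpClients", ["httpClients", "cloudComputing", "CmdLineParsers"]),
    ("Jersey", "JSONLibs", ["JSONLibs", "httpClients", "cloudComputing"]),
    ("Jsontoken", "JWTLibs", ["JWTLibs", "cloudComputing", "DNSLibs"]),
    ("OpenIMAJ", "MachineLearning", ["MachineLearning", "MathLibs", "Barcode"]),
    ("JTS", "MathLibs", ["Barcode", "ParserGens", "MathLibs"]),
    ("KoPeMe", "Microbenchmark", ["Microbenchmark", "CacheImps", "cloudComputing"]),
    ("MyBatis", "ORM", ["ORM", "cloudComputing", "httpClients"]),
    ("ToucanPdf", "PDFLibs", ["Barcode", "PDFLibs", "MachineLearning"]),
    ("Reb4j", "RegexLibs", ["ParserGens", "redisClient", "StringUtilities"]),
    ("TrueZip", "VirtualFileSystem", ["VirtualFileSystem", "CacheImps", "httpClients"]),
    ("Wasync", "websocketClients", ["httpClients", "cloudComputing", "websocketClients"]),
    ("Jsoup", "html parser", ["html parser", "ParserGens", "httpClients"]),
    ("JParsec", "ParserGens", ["Assertion", "ParserGens", "ORM"]),
    ("Redisson", "redisClient", ["CacheImps", "redisClient", "Microbenchmark"]),
    ("WildFly", "Security", ["Security", "Assertion", "ORM"]),
    ("SSHJ", "SSH Library", ["SSH Library", "httpClients", "cloudComputing"]),
    ("tomgibara", "HashingLibs", ["JWTLibs", "HashingLibs", "UUIDGens"]),
    ("Reflection-Util", "Reflection", ["Reflection", "BytecodeLibs", "ORM"]),
    ("Vt-Crypt", "Encryption", ["Encryption", "SSH Library", "JWTLibs"]),
    ("UUID-Creator", "UUIDGens", ["UUIDGens", "cloudComputing", "JWTLibs"]),
    ("Joda-Convert", "StringUtilities", ["Reflection", "ORM", "JWTLibs"]),
]


def top_k_hit_rate(rows, k):
    """Fraction of rows whose listed category is among the first k predictions."""
    hits = sum(1 for _, truth, preds in rows if truth in preds[:k])
    return hits / len(rows)


def misses(rows, k=3):
    """Names of projects whose listed category is absent from the first k predictions."""
    return [name for name, truth, preds in rows if truth not in preds[:k]]


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"top-{k} hit rate: {top_k_hit_rate(TABLE3, k):.3f}")
    print("not in top 3:", misses(TABLE3))
```

Under the ordering assumption this prints top-1, top-2 and top-3 hit rates of 0.700, 0.833 and 0.900 (21, 25 and 27 of the 30 projects). The top-3 figure does not depend on the ordering; Jmdns, Reb4j and Joda-Convert are the three projects whose listed category does not appear among their predictions.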
Table 4  Prediction results of the software when the threshold is 0.3
Software | Category | Predicted category |
gpars | ActorFrameworks | None |
HdrHistogram | ApplicationMetrics | None |
jongo | MongoClient | None |
jdom2 | xmlProcess | html parser |
bobo | SearchEngines | None |
generex | RegularExpressionLibraries | None |
DeephacksCached | OffHeapLibraries | None |
commonmark | Markdown | html parser |
log4j | logging | CmdLineParsers |
jmxutils | JMXLibraries | None |
activeio | IOUtilities | None |
bitsy | GraphDatabases | None |
ftpserver | FTP | None |
fastexcel | ExcelLibraries | None |
activej | DependencyInjection | None |
CheckerQual | defectDetect | None |
dateutils | DateandTimeUtilities | None |
jansi | ConsoleUtilities | None |
awaitility | concurrent | None |
jcommander | CommandLineParsers | None |
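Table 4 reports the outcome of applying a threshold of 0.3. One plausible reading, which is an assumption here rather than the paper's stated rule, is that a project receives its best-scoring category only when that score reaches the threshold and is otherwise left unassigned ("None"). The sketch below implements that rule with a hypothetical `assign_with_threshold` helper (scores are assumed to be per-category confidences, and the `>=` comparison is a guess) and tallies the outcomes actually listed in Table 4. The scores in the usage lines are invented for illustration and do not come from the paper.

```python
from typing import Dict, Optional

THRESHOLD = 0.3  # value taken from the Table 4 caption


def assign_with_threshold(scores: Dict[str, float], threshold: float = THRESHOLD) -> Optional[str]:
    """Return the best-scoring category, or None if its score is below the threshold.

    Assumption: higher score = more likely category; ties resolve to the first key.
    """
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


# Outcomes as listed in Table 4 (None = no category assigned).
TABLE4 = {
    "gpars": None, "HdrHistogram": None, "jongo": None,
    "jdom2": "html parser", "bobo": None, "generex": None,
    "DeephacksCached": None, "commonmark": "html parser",
    "log4j": "CmdLineParsers", "jmxutils": None, "activeio": None,
    "bitsy": None, "ftpserver": None, "fastexcel": None,
    "activej": None, "CheckerQual": None, "dateutils": None,
    "jansi": None, "awaitility": None, "jcommander": None,
}

if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    print(assign_with_threshold({"html parser": 0.42, "ParserGens": 0.21}))
    print(assign_with_threshold({"CacheImps": 0.18, "ORM": 0.12}))
    unassigned = sum(1 for v in TABLE4.values() if v is None)
    print(f"{unassigned}/{len(TABLE4)} projects left unassigned")
```

Counting the table rows, 17 of the 20 projects are left unassigned at this threshold, while jdom2, commonmark and log4j are each mapped to one of the existing categories.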
1 | SHARMA A, THUNG F, KOCHHAR P S, et al. Cataloging GitHub repositories [C]// Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering. ACM, 2017: 314-319. |
2 | WANG T, WANG H M, YIN G, et al. Tag recommendation for open source software [J]. Frontiers of Computer Science, 2014, 8(1): 69-82. |
3 | WANG Y, LIU H X, GAO S Q, et al. Categorizing npm packages by analyzing the text information in software repositories [C]// Proceedings of the 28th Asia-Pacific Software Engineering Conference (APSEC). IEEE, 2021: 53-60. |
4 | AL-KOFAHI J M, TAMRAWI A, NGUYEN T T, et al. Fuzzy set approach for automatic tagging in evolving software [C]// Proceedings of the 2010 IEEE International Conference on Software Maintenance. IEEE, 2010. DOI: 10.1109/ICSM.2010.5609751. |
5 | RADOSAVLJEVIC V, GRBOVIC M, DJURIC N, et al. Smartphone app categorization for interest targeting in advertising marketplace [C]// Proceedings of the 25th International Conference Companion on World Wide Web. Geneva: International World Wide Web Conferences Steering Committee, 2016: 93-94. |
6 | YUSOF Y, ALHERSH T, MAHMUDDIN M, et al. Classification of machine learning engines using latent semantic indexing [C]// Knowledge Management International Conference (KMICe). Kedah Darul Aman, Malaysia: Universiti Utara Malaysia (UUM), 2012: 472-476. |
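7 | ZHENG J, OU Y Y. Malicious code classification method based on convolutional neural network and multi-feature fusion [J]. Application Research of Computers, 2022, 39(1): 240-244. (in Chinese) |
8 | XUAN B N, LI J. Malware classification method based on improved CNN [J]. Acta Electronica Sinica, 2023, 51(5): 1187-1197. (in Chinese) |
9 | GU Y H, WANG Y F, LIU W X, et al. Malware similarity measurement method based on multiple heterogeneous graphs [J]. Journal of Software, 2023, 34(7): 3188-3205. (in Chinese) |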
10 | VARGAS-BALDRICH S, LINARES-VÁSQUEZ M, POSHYVANYK D. Automated tagging of software projects using bytecode and dependencies [C]// Proceedings of the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2015: 289-294. |
11 | YANG L, WANG L, HU Z G, et al. Automatic tagging for open source software by utilizing package dependency information [C]// Proceedings of the 2020 International Symposium on Theoretical Aspects of Software Engineering (TASE). IEEE, 2020: 137-144. |
12 | HAMEDANI M R, KIM G, CHO S. SimAndro: An effective method to compute similarity of Android applications [J]. Soft Computing, 2019, 23: 7569-7590. |
13 | LI M L, LU Q, LONG Y F. Representation learning of multiword expressions with compositionality constraint [C]// Knowledge Science, Engineering and Management, KSEM 2017, Lecture Notes in Computer Science, vol 10412. Cham: Springer, 2017: 507-519. |
14 | ALON U, ZILBERSTEIN M, LEVY O, et al. code2vec: Learning distributed representations of code [J]. Proceedings of the ACM on Programming Languages, 2019, 3(POPL): 40. |
15 | MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space [EB/OL]. (2013-09-07)[2023-09-05]. https://doi.org/10.48550/arXiv.1301.3781. |
16 | COMPTON R, FRANK E, PATROS P, et al. Embedding Java classes with code2vec: Improvements from variable obfuscation [C]// Proceedings of the 2020 IEEE/ACM 17th International Conference on Mining Software Repositories (MSR). IEEE, 2020: 243-253. |
17 | LI H, WANG T, PAN W F, et al. Mining key classes in Java projects by examining a very small number of classes: A complex network-based approach [J]. IEEE Access, 2021, 9: 28076-28088. |
18 | TAO P. Research on key class identification in software projects based on complex networks [D]. Shanghai: East China Normal University, 2022. (in Chinese) |
19 | BRIN S, PAGE L. The anatomy of a large-scale hypertextual Web search engine [J]. Computer Networks and ISDN Systems, 1998, 30(1-7): 107-117. |
20 | DEVLIN J, CHANG M W, LEE K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding [C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics (ACL), 2019: 4171-4186. |
21 | BROWN T, MANN B, RYDER N, et al. Language models are few-shot learners [J]. Advances in Neural Information Processing Systems, 2020, 33(1): 1877-1901. |