Journal of Qingdao Ocean Shipping Mariners College

Text Data Governance Method Based on Hierarchical Features and DPCNN

DING Xing-shuo, JU Tong

Center of Data & Information, Qingdao Ocean Shipping Mariners College, Qingdao, Shandong 266427, China

Keywords: data governance; hierarchical features; BERT; DPCNN

Abstract

Partitioning large-scale text data is a key problem in data governance, yet conventional methods for modeling Chinese documents tend to overlook contextual semantic relationships and the hierarchical structure of documents. To address these problems, this paper proposes a text data governance method based on hierarchical features and a DPCNN (deep pyramid convolutional neural network). The method first uses a BERT model to extract hierarchical feature information from the text, then feeds vectors that incorporate full-text information into the DPCNN model; after these vectors pass through the pyramid-shaped pooling layers, a fully connected layer performs the final data partitioning. The method effectively improves prediction accuracy on text data with sparse features.
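The pipeline summarized in the abstract can be sketched in plain Python. This is a minimal, untrained illustration under stated assumptions, not the authors' implementation: random vectors stand in for BERT token outputs, the "full-text" vector is assumed to be the mean of the token vectors (a [CLS] vector would be a common alternative) and is concatenated onto every token vector, and all helper names (`fuse_with_document_vector`, `TinyDPCNN`, and so on) are hypothetical. The DPCNN part follows the general deep-pyramid layout: a region-embedding convolution, pre-activation convolutions with identity shortcuts, and repeated blocks whose max pooling (window 3, stride 2) halves the sequence length, followed by a fully connected layer that scores each partition.

```python
import math
import random
from typing import List

Vector = List[float]
Sequence = List[Vector]

def fuse_with_document_vector(token_vecs: Sequence, doc_vec: Vector) -> Sequence:
    """ASSUMPTION: fuse full-text information by appending the document vector to each token vector."""
    return [tok + doc_vec for tok in token_vecs]

def relu(seq: Sequence) -> Sequence:
    return [[max(0.0, v) for v in vec] for vec in seq]

def add(a: Sequence, b: Sequence) -> Sequence:
    return [[x + y for x, y in zip(u, v)] for u, v in zip(a, b)]

def init_conv(rng: random.Random, d_in: int, d_out: int):
    """Random kernel-size-3 weights, indexed weights[k][out][in], plus a zero bias."""
    scale = 1.0 / math.sqrt(3 * d_in)
    weights = [[[rng.uniform(-scale, scale) for _ in range(d_in)]
                for _ in range(d_out)] for _ in range(3)]
    return weights, [0.0] * d_out

def conv1d_same(seq: Sequence, weights, bias) -> Sequence:
    """Kernel-size-3 1-D convolution with zero padding, so output length == input length."""
    zero = [0.0] * len(seq[0])
    padded = [zero] + seq + [zero]
    out = []
    for t in range(len(seq)):
        y = list(bias)
        for k in range(3):
            x = padded[t + k]
            for o, w_row in enumerate(weights[k]):
                y[o] += sum(w * xi for w, xi in zip(w_row, x))
        out.append(y)
    return out

def max_pool_half(seq: Sequence) -> Sequence:
    """Max pooling with window 3 and stride 2: length n becomes ceil(n / 2)."""
    return [[max(col) for col in zip(*seq[s:s + 3])] for s in range(0, len(seq), 2)]

def dpcnn_block(seq: Sequence, convs) -> Sequence:
    """Downsample, then two pre-activation convolutions with an identity shortcut."""
    pooled = max_pool_half(seq)
    h = conv1d_same(relu(pooled), *convs[0])
    h = conv1d_same(relu(h), *convs[1])
    return add(pooled, h)

class TinyDPCNN:
    """Hypothetical, untrained DPCNN-style classifier with random weights (illustration only)."""

    def __init__(self, d_in, num_filters, num_classes, max_blocks=8, seed=0):
        rng = random.Random(seed)
        self.region = init_conv(rng, d_in, num_filters)  # region-embedding convolution
        self.stem = [init_conv(rng, num_filters, num_filters) for _ in range(2)]
        self.blocks = [[init_conv(rng, num_filters, num_filters) for _ in range(2)]
                       for _ in range(max_blocks)]
        scale = 1.0 / math.sqrt(num_filters)
        self.fc_w = [[rng.uniform(-scale, scale) for _ in range(num_filters)]
                     for _ in range(num_classes)]
        self.fc_b = [0.0] * num_classes

    def forward(self, seq: Sequence):
        h = conv1d_same(seq, *self.region)
        stem = conv1d_same(relu(h), *self.stem[0])
        h = add(h, conv1d_same(relu(stem), *self.stem[1]))  # shortcut around the stem
        lengths = [len(h)]
        for convs in self.blocks:  # the "pyramid": each block halves the length
            if len(h) == 1:
                break
            h = dpcnn_block(h, convs)
            lengths.append(len(h))
        pooled = [max(col) for col in zip(*h)]  # global max pool over what remains
        logits = [sum(w * x for w, x in zip(row, pooled)) + b
                  for row, b in zip(self.fc_w, self.fc_b)]  # fully connected layer
        return logits, lengths

if __name__ == "__main__":
    rng = random.Random(42)
    # Random stand-ins for per-token BERT vectors (20 tokens, 8 dims each).
    tokens = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(20)]
    # ASSUMPTION: stand-in "full-text" vector is the mean of the token vectors.
    doc = [sum(col) / len(tokens) for col in zip(*tokens)]
    model = TinyDPCNN(d_in=16, num_filters=4, num_classes=3)
    logits, lengths = model.forward(fuse_with_document_vector(tokens, doc))
    print(lengths)  # prints [20, 10, 5, 3, 2, 1]
    print("predicted partition:", max(range(len(logits)), key=logits.__getitem__))
```

Because every downsampling block halves the sequence, a 20-token input shrinks as 20, 10, 5, 3, 2, 1. This halving gives the network its pyramid shape and keeps the total convolution cost within roughly twice that of the first block, since L + L/2 + L/4 + ... ≤ 2L. A real system would replace the random stand-ins with a trained BERT encoder and learned weights in a deep learning framework.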