Research Interests and Projects

 


Main Research Interests

 

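(Web) data mining; Web anomaly pattern discovery; Web fraud detection; big data management and intelligent analysis (NoSQL, NewSQL, data warehouse technology); Web Services technology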

Current Research Work

 

Project Name    Research Description

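Web Spam Detection

      Web spam manipulates search engine algorithms to raise the ranking of certain pages in search results. Worldwide, 10%-15% of web pages have at some point been contaminated by hidden links and spam text. Spam attack techniques include hiding advertisements or malicious information in the links or text of results returned by popular search engines, and using cloaking or automatic redirection to objectionable pages, in order to promote illegally and profit, or even to damage public credibility.

      This project analyzes the goals and techniques of web page fraud to build a feature model of spam pages. Building on that model, it studies high-performance detection algorithms based on data mining to identify and filter spam pages.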

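Web Fraud Mining

      As information on the World Wide Web grows richer and Web applications become more widespread, fraud attacks of all kinds are also becoming more rampant. Many new types of fraud have emerged, such as fraud on social networks, multimedia-based fraud, and click fraud; the main aim remains illegal financial gain.

      The project currently pursues several directions: discovering microblog spam users; detecting image-based fraud; mining fraudulent content in product reviews; and detecting click fraud. The research covers studying fraud phenomena and mechanisms, extracting representative features on that basis, and developing high-performance detection algorithms based on data mining.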

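Internet+ Oriented Digital Product Design and Manufacturing Services

      With the rollout of China's Made in China 2025 plan, Internet+ technology is steadily driving deeper integration of informatization and industrialization across industries. The service modes and methods of industrial R&D and design will undergo structural change, so studying service models, tools, and platforms for product R&D and design under the new informatization model becomes crucial.

      This project carries out the following research:
  • Relying on the mobile Internet and building on smartphones, notebook computers, and personal terminals, study a product design mode based on open innovation, and establish an open crowd-creation design platform for product R&D;
  • Study experiential virtual design technology based on cloud computing, and build an open innovation service platform based on the crowd-creation concept;
  • Select a typical product R&D and design service process, run a demonstration application on the platform, and provide the demonstration cases and process for use by other R&D and design targets or industries.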

Web Resource Quality Evaluation Mechanisms

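          The extreme abundance of Web resources has brought unprecedented convenience to information acquisition and decision making. However, Web resources are open, autonomous, dynamic, diverse in type, and subject to varying quality requirements, which leaves their content quality uneven. The emergence of Wiki-style Web authoring in particular poses great challenges for quality assessment and management. Existing techniques have considerable limitations, and many key scientific problems remain to be studied in depth and solved.

          This project analyzes Web resource quality problems, studies techniques for discovering latent quality defects in Web resources, and improves and extends multi-attribute decision-making algorithms, combining these techniques to achieve a comprehensive evaluation of Web resource quality. By analyzing the robustness of the evaluation algorithms, it aims to maximize the soundness, accuracy, objectivity, and degree of automation of Web resource quality assessment and selection.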

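    Building a Big Data Architecture for High-Speed Trains

          To address the heterogeneity of data in the high-speed train digital simulation platform, the large volume of data exchanged between subsystems, and the requirements on data access efficiency and cross-platform support, this project will develop a data system that is homogenized and shareable, supports discipline-specific simulation and co-simulation, and supports multi-dimensional, multi-level information visualization. It will implement multi-source heterogeneous data conversion and a unified data access interface, build a multi-level, loosely coupled data management mechanism, and provide transparent multi-discipline data access and on-demand, multi-level data fusion.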

     

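    Recently Completed Research Projects

         * Research on monitoring and maintenance technology for high-speed railway signaling and communication equipment (major science and technology project of the China Railway Corporation)

         * Research on monitoring and maintenance technology for high-speed railway signaling (major science and technology project of the China Railway Corporation)

         * Research on key technologies for intelligent fusion of multi-source heterogeneous data in the transportation domain (Sichuan Provincial Department of Transportation project)

         * Research on the architecture and data interface standards of railway intelligent transportation systems (sub-project of a major Ministry of Railways science and technology R&D project)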


    Research Interests

    Web data mining; Web anomaly (spam) detection; network information security; big data management and intelligent analysis; Web services

    Current Projects


    Project Description
    Web spam detection

    Web spam attempts to influence search engine ranking algorithms in order to boost the rankings of specific web pages in search results. Some 10%-15% of Web pages have been deliberately contaminated by hidden links and objectionable content. Spamming tricks include injection of advertisements or objectionable content, hidden-link attacks, cloaking, and redirection. The objectives of Web spamming are to gain illicit benefit or to mount attacks. This project tackles the challenging problems of Web spam by modeling junk Web pages, extracting spam features, analyzing spammed content and URLs, and designing and improving malicious Web page detection approaches.

    Web Fraud Mining

    As Web information grows richer and Web applications become more widespread, fraud attacks are becoming rampant. New fraud and spam types have appeared, such as social networking fraud, multimedia spam, and click fraud/spam, whose main purpose is still illegal profit.

    The research currently addresses several topics: Twitter spammer discovery; multimedia spam detection; comment spam mining; and click fraud detection. Our focus includes investigating spamming tricks and mechanisms, extracting discriminative features, and developing high-performance detection algorithms.

    Web source quality assessment mechanism

    The extremely rich Web resources make information acquisition and decision making much easier. However, Web source quality is problematic due to the peculiar characteristics of the Web, such as the dynamics and autonomy of Web sources, the enormous amount and variety of Web data, and the diverse quality requirements of Web applications. These result in uneven and uncertain information quality and in poorer Web-based planning and strategy making. With the popularization of Wiki sites, assessing Web source quality has become increasingly challenging.

    In this multistage project, we have proposed a Web quality model, WebQM, for capturing Web quality features along three dimensions. The feasibility and effectiveness of WebQM have been verified by SEM with actually observed data. We have developed evaluation approaches under fuzzy environments based on WebQM and implemented a prototype Web quality fuzzy assessment system, on which a sensitivity analysis of the evaluation approach was carried out. Our current work is to model the quality problems of Wiki sources and to adapt WebQM and the evaluation mechanism for assessing the content quality of Wiki sites.
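    As a rough illustration of what evaluation under a fuzzy environment can look like, the sketch below applies a generic weighted-average fuzzy synthesis to three quality dimensions. It is not WebQM itself: the dimension names, weights, membership values, and grade labels are all hypothetical placeholders chosen for the example.

```python
from typing import Dict, List

# Hypothetical grade labels; the actual grading scale is not given in the text.
GRADES = ["poor", "fair", "good"]


def fuzzy_evaluate(weights: Dict[str, float],
                   memberships: Dict[str, List[float]]) -> List[float]:
    """Weighted-average fuzzy synthesis: B[j] = sum_i w_i * R[i][j].

    weights     maps each dimension to its importance (normalized here).
    memberships maps each dimension to its membership degree in each grade.
    """
    total = sum(weights.values())
    result = [0.0] * len(GRADES)
    for dimension, weight in weights.items():
        for j, degree in enumerate(memberships[dimension]):
            result[j] += (weight / total) * degree
    return result


def best_grade(scores: List[float]) -> str:
    """Pick the grade with the highest aggregated membership."""
    return GRADES[max(range(len(scores)), key=scores.__getitem__)]


if __name__ == "__main__":
    # Placeholder dimensions and values, for illustration only.
    weights = {"dimension_1": 0.5, "dimension_2": 0.3, "dimension_3": 0.2}
    memberships = {
        "dimension_1": [0.1, 0.3, 0.6],
        "dimension_2": [0.2, 0.5, 0.3],
        "dimension_3": [0.6, 0.3, 0.1],
    }
    scores = fuzzy_evaluate(weights, memberships)
    print(scores, best_grade(scores))
```

    A sensitivity analysis in this setting could, for instance, perturb the weights slightly and check whether the selected grade changes.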

    Product digitized design and manufacture services based on Internet+

    With the implementation of China's 2025 manufacturing plan, Internet Plus technology has led to the deep integration of industry and information technology. Conventional patterns and methods of product design and manufacture will change. It is therefore important to study and develop new models, tools, and platforms to adapt to this transformation.

    This project conducts the following research:

  • Based on the mobile Internet, a crowd-creation product design mode is investigated, and a crowd-design platform for product development will be built for mobile phones, notebook computers, personal computers, etc.;
  • A crowd-innovation service platform is established by combining virtual design technology with cloud computing;
  • A typical product design and development process will be executed on the platform as an example of the digitized design pattern and service, for demonstration.

    High-speed rail big data management system

    This project is a key part of the digital simulation platform for high-speed rail. Building the platform raises many problems, including heterogeneous and multi-structured high-speed rail data, a huge amount of data exchange among subsystems, and the efficiency and system independence of data access. This project will develop a data management system to address these issues and to support branch simulation and coupled simulation, as well as multi-dimensional and multi-level visualization. Specific approaches are designed for multi-source data ETL, loosely coupled data management, and multi-branch data access and fusion.
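    One common way to give heterogeneous sources a uniform, loosely coupled access path is an adapter layer behind a shared interface. The sketch below illustrates that general idea only; the class names, record fields, and sample data are hypothetical and do not describe the project's actual design.

```python
import csv
import io
import json
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Iterator, List


class DataSource(ABC):
    """Hypothetical unified interface: every source yields flat string records."""

    @abstractmethod
    def records(self) -> Iterator[Dict[str, str]]:
        ...


class CsvSource(DataSource):
    """Adapter for CSV text with a header row."""

    def __init__(self, text: str) -> None:
        self._text = text

    def records(self) -> Iterator[Dict[str, str]]:
        for row in csv.DictReader(io.StringIO(self._text)):
            yield dict(row)


class JsonSource(DataSource):
    """Adapter for a JSON array of flat objects; values are cast to strings."""

    def __init__(self, text: str) -> None:
        self._text = text

    def records(self) -> Iterator[Dict[str, str]]:
        for item in json.loads(self._text):
            yield {key: str(value) for key, value in item.items()}


def load_all(sources: Iterable[DataSource]) -> List[Dict[str, str]]:
    """Callers depend only on DataSource, not on any concrete format."""
    return [record for source in sources for record in source.records()]


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    csv_text = "sensor,value\naxle_temp,71\n"
    json_text = '[{"sensor": "speed", "value": 300}]'
    for record in load_all([CsvSource(csv_text), JsonSource(json_text)]):
        print(record)
```

    Adding a new source format then means writing one more adapter, without touching the code that consumes the records.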

     

    Copyright © Yan Zhu. Last updated: February 2017