Edit Collection Node

URL Index

Web Page Content Extraction Rules

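1. Match rule: an area match rule generally takes the form "unique starting HTML[内容]unique ending HTML", where [内容] is the literal placeholder for the content to capture (plain string matching, not a regular expression).
2. Field value: if no area match rule is specified for a field, this value is used as its default.
3. Filter rules: to apply several rules, write {dede:trim replace=""}rule 1{/dede:trim} {dede:trim replace=""}rule 2{/dede:trim} ... ; to replace the matched text with a specific value, set that value in replace="".
Preview URL: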
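The match and filter rules described above can be sketched roughly as follows. This is a minimal Python illustration, not DedeCMS's actual implementation: the function names `match_area` and `apply_trims` are hypothetical, and the sketch assumes that each filter rule is treated as a regular expression, which the help text does not state.

```python
import re

PLACEHOLDER = "[内容]"  # literal marker that separates the start and end HTML

def match_area(html: str, rule: str) -> str:
    """Plain-string match: return the text between the rule's start and end HTML."""
    start, sep, end = rule.partition(PLACEHOLDER)
    if not sep:
        return ""  # the rule contains no placeholder
    begin = html.find(start)
    if begin == -1:
        return ""
    begin += len(start)
    stop = html.find(end, begin) if end else len(html)
    if stop == -1:
        return ""
    return html[begin:stop]

# One {dede:trim replace="..."}rule{/dede:trim} block per filter rule.
TRIM_RE = re.compile(r'\{dede:trim replace="(.*?)"\}(.*?)\{/dede:trim\}', re.S)

def apply_trims(text: str, trim_rules: str) -> str:
    """Apply every trim block in order (assumption: each rule is a regex)."""
    for replacement, pattern in TRIM_RE.findall(trim_rules):
        # A lambda keeps the replacement literal (no backslash expansion).
        text = re.sub(pattern, lambda m, r=replacement: r, text,
                      flags=re.S | re.I)
    return text

page = '<html><h1 class="t">Hello <b>World</b></h1><p>x</p></html>'
title = match_area(page, '<h1 class="t">[内容]</h1>')
print(title)  # Hello <b>World</b>
rules = '{dede:trim replace=""}<b>{/dede:trim} {dede:trim replace=""}</b>{/dede:trim}'
print(apply_trims(title, rules))  # Hello World
```

Because the match is a plain substring search, the start and end HTML should each occur only once on the page; otherwise the first occurrence wins in this sketch.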
Area match rule for the content pagination navigation:	<?php echo trim($sppage->GetInnerText()); ?>
Pagination type (sptype): full = pagination list with all pages listed; next = previous/next style or an incomplete pagination list
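One plausible reading of the two pagination types is sketched below in Python. This is an assumption about the intended behavior, not code taken from DedeCMS: with a full list, the links found on the first page are taken as the complete set of pages, while with a previous/next or incomplete list, each newly found page is visited in turn to discover further pages. The names `links_in_area`, `collect_pages`, and the `fetch` callback are hypothetical, and relative URLs are not resolved here.

```python
import re

HREF_RE = re.compile(r'href=["\']([^"\']+)["\']', re.I)

def links_in_area(html, start, end):
    """Return the href values found between the start and end HTML markers."""
    begin = html.find(start)
    if begin == -1:
        return []
    begin += len(start)
    stop = html.find(end, begin)
    if stop == -1:
        return []
    return HREF_RE.findall(html[begin:stop])

def collect_pages(first_url, fetch, start, end, sptype):
    """Collect page URLs in discovery order; fetch(url) returns that page's HTML."""
    seen = [first_url]
    queue = [first_url]
    while queue:
        url = queue.pop(0)
        for link in links_in_area(fetch(url), start, end):
            if link not in seen:
                seen.append(link)
                if sptype == "next":
                    # Incomplete list: visit the new page to look for more links.
                    queue.append(link)
    return seen

# Hypothetical three-page article that only exposes previous/next links.
pages = {
    "p1": '<div id="pg"><a href="p2">next</a></div>',
    "p2": '<div id="pg"><a href="p1">prev</a><a href="p3">next</a></div>',
    "p3": '<div id="pg"><a href="p2">prev</a></div>',
}
print(collect_pages("p1", pages.__getitem__, '<div id="pg">', '</div>', "next"))
# ['p1', 'p2', 'p3']
```

With `sptype` set to "full", the same call would stop after the first page and return only the links listed there, which is why that option suits sites that print every page number in the navigation.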
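The following are the fixed collection items: (click an item to expand or hide it; the summary, keywords, and thumbnail are matched automatically by the system using regular expressions)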

Keyword filter content:

Summary filter content:

Article title

Match rule:	<?php echo $notes['title']['match']; ?>
Filter rule:	<?php echo $notes['title']['trim']; ?>

Article author

Match rule:	<?php echo $notes['writer']['match']; ?>
Filter rule:	<?php echo $notes['writer']['trim']; ?>

Article source

Match rule:	<?php echo $notes['source']['match']; ?>
Filter rule:	<?php echo $notes['source']['trim']; ?>

Publication time

Match rule:	<?php echo $notes['pubdate']['match']; ?>
Filter rule:	<?php echo $notes['pubdate']['trim']; ?>

The following collection items are set per content model:

<?php
$row = $dsql->GetOne("Select * From `#@__channeltype` where id='{$channelid}' ");
$dtp = new DedeTagParse();
$dtp->SetNameSpace('field','<','>');
$dtp->LoadString($row['fieldset']);
foreach($dtp->CTags as $ctag)
{
    // Skip fields that are disabled for collection
    $notsend = $ctag->GetAtt('notsend');
    if($notsend==1) continue;
    $fieldtype = $ctag->GetAtt('type');
    $tname = $ctag->GetTagName();
    $iname = $ctag->GetAtt('itemname');
    if(isset($notes[$tname]['item']))
    {
        // Reuse the values already saved for this field in the node
        $tvalue = $notes[$tname]['item']->GetAtt('value');
        $tisunit = $notes[$tname]['item']->GetAtt('isunit');
        $tisdown = $notes[$tname]['item']->GetAtt('isdown');
        $tmatch = $notes[$tname]['match'];
        $ttrim = $notes[$tname]['trim'];
        $tfunction = $notes[$tname]['function'];
    }
    else
    {
        $tvalue = $tisunit = $tisdown = $tmatch = $ttrim = $tfunction = '';
    }
?>

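Field value:

Match rule:	<?php echo $tmatch; ?>	Options: paginated content field (only one field of this type is allowed in the rules); download the multimedia resources in the field
Filter rule:	<?php echo $ttrim; ?>
Custom processing interface:	<?php echo $tfunction; ?>	Variables available to the function or program: @body is the original web page, @litpic is the thumbnail, @me is the current tag value and the final result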