Wednesday, 2015-05-27 16:10, adminshiping1

A simple example showing how to scrape content with Drupal


Saturday, 08/11/2012 - 23:11, 月亮不孤主

Version: Drupal 7

Page to scrape: http://drupalgarden.cn/forum/1202.html

Code:

<?php
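function create_node($title, $uid, $body, $type) {
  // Start from an empty object; assigning properties to an undefined
  // variable is an error on newer PHP versions.
  $node = new stdClass();
  $node->is_new = TRUE;
  $node->type = $type;
  // LANGUAGE_NONE is Drupal 7's constant for 'und'.
  $node->language = LANGUAGE_NONE;
  $node->title = $title;
  $node->uid = $uid;
  $node->body[LANGUAGE_NONE][0]['value'] = $body;
  node_save($node);
  // Print the ID of the node that was just created.
  print $node->nid;
}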
 

function _get_contents($url, $img = FALSE) {
  // Use the directory part of the URL as the Referer header.
  $dir = pathinfo($url);
  $refer = $dir['dirname'] . '/';

  $ch = curl_init();
  $user_agent = "Baiduspider+(+http://www.baidu.com/search/spider.htm)"; // Pose as the Baidu spider.

  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_HEADER, 0);
  // Return the response as a string instead of printing it.
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
  curl_setopt($ch, CURLOPT_REFERER, $refer);
  curl_setopt($ch, CURLOPT_TIMEOUT, 800);
  curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);

  $file_contents = curl_exec($ch);
  if ($img == TRUE) {
    // For images, return the downloaded size in KB instead of the content.
    $content_length = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
    curl_close($ch);
    return round($content_length / 1024, 2);
  }
  // Close the handle on this path too; the original only closed it here.
  curl_close($ch);
  return $file_contents;
}

$url = "http://drupalgarden.cn/forum/1202.html";
$content = _get_contents($url);

// Capture the post body and the page title from the fetched HTML.
preg_match_all('/<div class="forum-post-content">(.*?)<\/div>/is', $content, $a);
preg_match_all('/<h2 class="with-tabs">(.*?)<\/h2>/is', $content, $t);

// Only create a node when both patterns matched.
if (!empty($a[1]) && !empty($t[1])) {
  $body = $a[1][0];
  $title = $t[1][0];
  $type = 'article';
  $uid = 1;

  create_node($title, $uid, $body, $type);
}


?>

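You can take this further by wrapping it in a foreach or for loop over the nid, and you could scrape Longma's entire forum in under 5 minutes. Longma, please don't hit me :)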
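
A rough sketch of such a loop follows. It assumes every forum page follows the same http://drupalgarden.cn/forum/<nid>.html pattern as the example page, which the post does not confirm. The names build_forum_url, extract_post and scrape_range are made up for illustration, and the fetch and save steps are passed in as callables so the loop can be tried outside Drupal.

```php
<?php
// Hypothetical helper: build the URL for one nid, assuming the
// http://drupalgarden.cn/forum/<nid>.html pattern of the example page.
function build_forum_url($nid) {
  return 'http://drupalgarden.cn/forum/' . (int) $nid . '.html';
}

// Hypothetical helper: pull the title and body out of a page with the same
// two patterns used in the script. Returns NULL when the fetch failed or
// either pattern does not match (for example on a missing page).
function extract_post($html) {
  if (!is_string($html)) {
    return NULL;
  }
  if (!preg_match('/<h2 class="with-tabs">(.*?)<\/h2>/is', $html, $t)) {
    return NULL;
  }
  if (!preg_match('/<div class="forum-post-content">(.*?)<\/div>/is', $html, $a)) {
    return NULL;
  }
  return array('title' => trim($t[1]), 'body' => $a[1]);
}

// Hypothetical driver: walk a range of nids, skip pages that do not parse,
// and hand each extracted post to $save. Returns the number of posts saved.
function scrape_range($first, $last, $fetch, $save) {
  $saved = 0;
  for ($nid = $first; $nid <= $last; $nid++) {
    $post = extract_post($fetch(build_forum_url($nid)));
    if ($post === NULL) {
      continue;
    }
    $save($post['title'], $post['body']);
    $saved++;
  }
  return $saved;
}

// Inside Drupal, with the two functions from the post available, a call
// might look like this (the nid range is only an example):
// scrape_range(1200, 1210, '_get_contents', function ($title, $body) {
//   create_node($title, 1, $body, 'article');
// });
```

Skipping pages that fail to parse keeps gaps in the nid sequence from creating empty nodes. A real run would probably also want a pause between requests.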

Source: http://drupalchina.cn/forum/1204.html

Category:

drupal
