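jsoup is an excellent Java library for parsing web pages. It offers DOM-style traversal and CSS-selector-style queries for finding and extracting content from a document.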

Related resources:

Download: [http://jsoup.org/download](http://jsoup.org/download)

Chinese documentation: [http://www.open-open.com/jsoup/](http://www.open-open.com/jsoup/)

A fairly good API reference: [http://www.ostools.net/apidocs/apidoc?api=jsoup-1.6.3](http://www.ostools.net/apidocs/apidoc?api=jsoup-1.6.3)

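Today I worked on a project that uses jsoup to scrape a website. When connecting to one site with `Jsoup.connect(url).get()`, the call occasionally threw `java.net.SocketTimeoutException: Read timed out`.

The cause is that the default timeout is fairly short (about 3 seconds in jsoup releases of that period), while some sites respond slowly, so the read gives up before the response has fully arrived.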
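To see why a short timeout produces this exception, the following sketch reproduces a read timeout with plain `java.net` sockets rather than jsoup. It is only an illustration under stated assumptions: `ReadTimeoutDemo` and `readTimesOut` are hypothetical names, and the 200 ms value is arbitrary. A local server socket lets the connection complete but never sends data, so the client's read gives up once its timeout expires.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class (not part of jsoup): reproduces a read timeout on localhost.
class ReadTimeoutDemo {

    // Returns true if reading from a silent server times out after timeoutMillis.
    static boolean readTimesOut(int timeoutMillis) {
        // Port 0 asks the OS for any free port; the server side never sends a byte.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            client.setSoTimeout(timeoutMillis); // read timeout in milliseconds
            InputStream in = client.getInputStream();
            try {
                in.read(); // blocks until data arrives or the timeout expires
                return false;
            } catch (SocketTimeoutException e) {
                System.out.println("Caught: " + e.getMessage());
                return true;
            }
        } catch (IOException e) {
            // Any other I/O failure is unexpected in this demo.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Timed out: " + readTimesOut(200));
    }
}
```

A slow website behaves like the silent server here: if the response does not arrive within the configured time, the blocking read fails with the same exception type.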







Solution: set the timeout explicitly when connecting.

    doc = Jsoup.connect(url).timeout(5000).get();

The argument is in milliseconds, so 5000 sets the timeout to 5 s.
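A longer timeout makes the failure less likely but cannot rule it out, so a caller may also want to retry. The sketch below is one possible approach, not something the original post does: `TimeoutRetry`, `IoCall`, and `callWithRetry` are hypothetical names, and the stand-in task in `main` only simulates timeouts so that the example runs without jsoup or a network connection.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.SocketTimeoutException;

// Hypothetical helper (not part of jsoup): retries a call that may time out.
class TimeoutRetry {

    // A task that may fail with an IOException, such as an HTTP fetch.
    @FunctionalInterface
    interface IoCall<T> {
        T call() throws IOException;
    }

    // Runs the task up to maxAttempts times, retrying only on SocketTimeoutException.
    static <T> T callWithRetry(IoCall<T> task, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (SocketTimeoutException e) {
                last = e; // remember the timeout and try again
                System.out.println("Attempt " + attempt + " timed out");
            } catch (IOException e) {
                throw new UncheckedIOException(e); // other I/O errors are not retried
            }
        }
        throw new UncheckedIOException(last); // every attempt timed out
    }

    public static void main(String[] args) {
        // Stand-in task: times out twice, then succeeds. With jsoup on the classpath
        // the task could instead be: () -> Jsoup.connect(url).timeout(5000).get()
        int[] calls = {0};
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new SocketTimeoutException("Read timed out");
            }
            return "ok";
        }, 5);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

Retrying only on `SocketTimeoutException` keeps other failures, such as an unknown host, visible immediately instead of hiding them behind repeated attempts.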





  

The test code is as follows.

Case 1: without setting a timeout:

    package jsoupTest;

    import java.io.IOException;

    import org.jsoup.*;
    import org.jsoup.helper.Validate;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class JsoupTest {
        public static void main(String[] args) throws IOException {
            String url = "http://www.weather.com.cn/weather/101010400.shtml";
            long start = System.currentTimeMillis();
            Document doc = null;
            try {
                doc = Jsoup.connect(url).get(); // no timeout() call: the library default applies
            }
            catch (Exception e) {
                e.printStackTrace();
            }
            finally {
                System.out.println("Time is:" + (System.currentTimeMillis() - start) + "ms");
            }
            Elements elem = doc.getElementsByTag("Title"); // doc is still null if the request failed
            System.out.println("Title is:" + elem.text());
        }
    }
      
      

Sometimes the request times out:

    java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(Unknown Source)
        at java.net.SocketInputStream.read(Unknown Source)
        at java.io.BufferedInputStream.fill(Unknown Source)
        at java.io.BufferedInputStream.read1(Unknown Source)
        at java.io.BufferedInputStream.read(Unknown Source)
        at sun.net.www.http.ChunkedInputStream.fastRead(Unknown Source)
        at sun.net.www.http.ChunkedInputStream.read(Unknown Source)
        at java.io.FilterInputStream.read(Unknown Source)
        at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(Unknown Source)
        at java.util.zip.InflaterInputStream.fill(Unknown Source)
        at java.util.zip.InflaterInputStream.read(Unknown Source)
        at java.util.zip.GZIPInputStream.read(Unknown Source)
        at java.io.BufferedInputStream.read1(Unknown Source)
        at java.io.BufferedInputStream.read(Unknown Source)
        at java.io.FilterInputStream.read(Unknown Source)
        at org.jsoup.helper.DataUtil.readToByteBuffer(DataUtil.java:113)
        at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:447)
        at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:393)
        at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:159)
        at org.jsoup.helper.HttpConnection.get(HttpConnection.java:148)
        at jsoupTest.JsoupTest.main(JsoupTest.java:17)
    Time is:3885ms
    Exception in thread "main" java.lang.NullPointerException
        at jsoupTest.JsoupTest.main(JsoupTest.java:25)
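The trace ends with a `NullPointerException` because the test program still dereferences `doc` after the failed request left it `null`. One way to make that case explicit is to return an `Optional` from the fetch step. This is a minimal sketch, assuming a hypothetical `SafeFetch` helper with a stand-in task in place of the real jsoup call, so that it runs without a network connection.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.Optional;

// Hypothetical helper (not part of jsoup): turns a failed fetch into an empty Optional.
class SafeFetch {

    // A task that may fail with an IOException, such as an HTTP fetch.
    @FunctionalInterface
    interface IoCall<T> {
        T call() throws IOException;
    }

    // Returns the task's result, or Optional.empty() if the task threw an IOException.
    static <T> Optional<T> tryFetch(IoCall<T> task) {
        try {
            return Optional.ofNullable(task.call());
        } catch (IOException e) {
            System.err.println("Request failed: " + e);
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a request that times out. With jsoup on the classpath the task
        // could instead be: () -> Jsoup.connect(url).get()
        Optional<String> page = tryFetch(() -> {
            throw new SocketTimeoutException("Read timed out");
        });
        // The parsing step runs only when a result is present.
        System.out.println(page.map(p -> "Title is:" + p).orElse("No document, skipping parse"));
    }
}
```

With this shape, the code that reads the title cannot run against a missing document, so a timeout produces one clear error instead of a second, unrelated exception.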

Case 2: with a timeout set, the request usually does not time out:

    package jsoupTest;

    import java.io.IOException;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.select.Elements;

    public class JsoupTest {
        public static void main(String[] args) {
            String url = "http://www.weather.com.cn/weather/101010400.shtml";
            long start = System.currentTimeMillis();
            Document doc = null;
            try {
                // Allow up to 5000 ms for the request.
                doc = Jsoup.connect(url).timeout(5000).get();
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                System.out.println("Time is:" + (System.currentTimeMillis() - start) + "ms");
            }
            if (doc == null) {
                return; // the request failed, so there is nothing to parse
            }
            Elements elem = doc.getElementsByTag("title");
            System.out.println("Title is:" + elem.text());
        }
    }
            
            

Output:

    Time is:4158ms
    Title is:顺义天气预报-今日_明日_一周天气预报:16日星期五  多云转晴  11/-4℃

Reposted from http://blog.csdn.net/huangxy10/article/details/8188067