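Jsoup is a very good Java library for parsing web pages. It offers DOM-style traversal and CSS-selector-style queries for finding and extracting content from a document.
Related resources:
Download: [http://jsoup.org/download](http://jsoup.org/download)
Chinese documentation: [http://www.open-open.com/jsoup/](http://www.open-open.com/jsoup/)
A fairly good API reference: [http://www.ostools.net/apidocs/apidoc?api=jsoup-1.6.3](http://www.ostools.net/apidocs/apidoc?api=jsoup-1.6.3)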
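Today I built a project that parses a website with Jsoup. When connecting to the site with Jsoup.connect(url).get(), I occasionally got a
java.net.SocketTimeoutException: Read timed out exception.
The cause is that Jsoup's default timeout is short (3 seconds in the 1.6.x releases), while some sites respond slowly,
so the read occasionally times out before the response arrives.
Solution:
Set a timeout explicitly when connecting.

    doc = Jsoup.connect(url).timeout(5000).get();

The argument is in milliseconds, so 5000 sets the timeout to 5 s.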
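To see the mechanism in isolation, here is a minimal sketch that reproduces a read timeout without Jsoup and without an external site. It assumes, going by the stack trace later in this post, that Jsoup's request runs on top of the JDK's `HttpURLConnection`. The class name `ReadTimeoutDemo`, the helpers `startSlowServer` and `fetch`, and the 500 ms delay are all made up for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class ReadTimeoutDemo {

    // Starts a one-shot HTTP server on the loopback interface that waits
    // delayMillis before answering, and returns the port it listens on.
    static int startSlowServer(final long delayMillis) throws IOException {
        final ServerSocket server = new ServerSocket(0, 1, InetAddress.getByName("127.0.0.1"));
        Thread worker = new Thread(() -> {
            try (ServerSocket s = server; Socket client = s.accept()) {
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(client.getInputStream(), StandardCharsets.US_ASCII));
                String line;
                while ((line = in.readLine()) != null && !line.isEmpty()) {
                    // skip the request line and headers
                }
                Thread.sleep(delayMillis); // simulate a slow site
                OutputStream out = client.getOutputStream();
                out.write("HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nok"
                        .getBytes(StandardCharsets.US_ASCII));
                out.flush();
            } catch (Exception ignored) {
                // the client may already have given up; nothing to do
            }
        });
        worker.setDaemon(true);
        worker.start();
        return server.getLocalPort();
    }

    // Fetches http://127.0.0.1:port/ with the given connect and read timeout.
    // Returns the response body, or a string starting with "timeout" or "error".
    static String fetch(int port, int timeoutMillis) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create("http://127.0.0.1:" + port + "/")
                    .toURL().openConnection(Proxy.NO_PROXY);
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream body = new ByteArrayOutputStream();
                byte[] buf = new byte[256];
                int n;
                while ((n = in.read(buf)) != -1) {
                    body.write(buf, 0, n);
                }
                return new String(body.toByteArray(), StandardCharsets.US_ASCII);
            }
        } catch (SocketTimeoutException e) {
            return "timeout: " + e.getMessage();
        } catch (IOException e) {
            return "error: " + e;
        }
    }

    public static void main(String[] args) throws IOException {
        // Server answers after 500 ms; a 100 ms timeout is too short.
        System.out.println(fetch(startSlowServer(500), 100));
        // The same delay fits comfortably inside a 5000 ms timeout.
        System.out.println(fetch(startSlowServer(500), 5000));
    }
}
```

With the 100 ms timeout the first call should report a timeout, while the 5000 ms call has time to read the `ok` body. The server is no faster in the second call; only the client's patience changed, which is all that `timeout(5000)` changes in the Jsoup call.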
Test code:
1. Without a timeout:

    package jsoupTest;

    import java.io.IOException;

    import org.jsoup.*;
    import org.jsoup.helper.Validate;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class JsoupTest {
        public static void main(String[] args) throws IOException {
            String url = "http://www.weather.com.cn/weather/101010400.shtml";
            long start = System.currentTimeMillis();
            Document doc = null;
            try {
                doc = Jsoup.connect(url).get(); // no timeout() call, so Jsoup's default applies
            }
            catch (Exception e) {
                e.printStackTrace();
            }
            finally {
                System.out.println("Time is:" + (System.currentTimeMillis() - start) + "ms");
            }
            Elements elem = doc.getElementsByTag("Title"); // doc is still null if the request failed
            System.out.println("Title is:" + elem.text());
        }
    }

This sometimes times out:

    java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(Unknown Source)
        at java.net.SocketInputStream.read(Unknown Source)
        at java.io.BufferedInputStream.fill(Unknown Source)
        at java.io.BufferedInputStream.read1(Unknown Source)
        at java.io.BufferedInputStream.read(Unknown Source)
        at sun.net.www.http.ChunkedInputStream.fastRead(Unknown Source)
        at sun.net.www.http.ChunkedInputStream.read(Unknown Source)
        at java.io.FilterInputStream.read(Unknown Source)
        at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(Unknown Source)
        at java.util.zip.InflaterInputStream.fill(Unknown Source)
        at java.util.zip.InflaterInputStream.read(Unknown Source)
        at java.util.zip.GZIPInputStream.read(Unknown Source)
        at java.io.BufferedInputStream.read1(Unknown Source)
        at java.io.BufferedInputStream.read(Unknown Source)
        at java.io.FilterInputStream.read(Unknown Source)
        at org.jsoup.helper.DataUtil.readToByteBuffer(DataUtil.java:113)
        at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:447)
        at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:393)
        at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:159)
        at org.jsoup.helper.HttpConnection.get(HttpConnection.java:148)
        at jsoupTest.JsoupTest.main(JsoupTest.java:17)
    Time is:3885ms
    Exception in thread "main" java.lang.NullPointerException
        at jsoupTest.JsoupTest.main(JsoupTest.java:25)

The NullPointerException at the end is a follow-on error: the request failed, so doc is still null when line 25 calls doc.getElementsByTag.

2. With a timeout set, it generally does not time out:

    package jsoupTest;

    import java.io.IOException;

    import org.jsoup.*;
    import org.jsoup.helper.Validate;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class JsoupTest {
        public static void main(String[] args) throws IOException {
            String url = "http://www.weather.com.cn/weather/101010400.shtml";
            long start = System.currentTimeMillis();
            Document doc = null;
            try {
                doc = Jsoup.connect(url).timeout(5000).get(); // wait up to 5000 ms
            }
            catch (Exception e) {
                e.printStackTrace();
            }
            finally {
                System.out.println("Time is:" + (System.currentTimeMillis() - start) + "ms");
            }
            Elements elem = doc.getElementsByTag("Title"); // doc is still null if the request failed
            System.out.println("Title is:" + elem.text());
        }
    }

Output:

    Time is:4158ms
    Title is:顺义天气预报-今日_明日_一周天气预报:16日星期五 多云转晴 11/-4℃

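A longer timeout only makes a timeout less likely, so a slow site can still fail now and then. One common way to handle that is to retry a few times and to let the exception propagate if every attempt fails, rather than continuing with a null document. The sketch below is one possible shape for such a helper, not something from the Jsoup API: `TimeoutRetry`, `withRetry`, and its parameters are hypothetical names, and the lambda in `main` is a stand-in for a real call such as `Jsoup.connect(url).timeout(5000).get()`.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

class TimeoutRetry {

    // Runs task up to maxAttempts times, retrying only on SocketTimeoutException
    // and pausing pauseMillis between attempts. Any other exception propagates
    // immediately; if every attempt times out, the last timeout is rethrown.
    static <T> T withRetry(Callable<T> task, int maxAttempts, long pauseMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SocketTimeoutException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (SocketTimeoutException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Stand-in task: times out twice, then succeeds on the third call.
        String result = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new SocketTimeoutException("Read timed out");
            }
            return "document";
        }, 3, 10L);
        System.out.println(result + " after " + calls[0] + " attempts"); // prints "document after 3 attempts"
    }
}
```

Because `withRetry` either returns a value or throws, the caller never reaches the parsing step with a null document, which is what caused the NullPointerException in the first test run.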
Reposted from http://blog.csdn.net/huangxy10/article/details/8188067