tangxiaoguodong - CSDN Blog

Original [10] Learning Python on my own: setting a proxy IP, global vs. temporary proxies

Code:
    # Temporary proxy
    # -*- coding: utf-8 -*
    import urllib.request,random
    proxy_iplist=['122.114.31.177:808','61.135.217.7:80']
    proxy_ip=random.choice(proxy_iplist)  # pick one proxy at random
    url=('http://www...

2018-06-05 22:27:53 4084
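
The difference between a temporary and a global proxy in `urllib.request` can be sketched roughly as below. This is a minimal sketch, not the post's full code: the helper name `make_proxy_opener` and the `example.com` URLs are assumptions for illustration, and the proxy addresses from the excerpt are most likely no longer reachable, so the network calls are left as comments.

```python
import random
import urllib.request

# Proxy addresses taken from the excerpt above; they are probably stale.
PROXY_IPLIST = ['122.114.31.177:808', '61.135.217.7:80']


def make_proxy_opener(proxy_ip):
    """Build an opener that routes HTTP traffic through proxy_ip (hypothetical helper)."""
    handler = urllib.request.ProxyHandler({'http': proxy_ip})
    return urllib.request.build_opener(handler)


proxy_ip = random.choice(PROXY_IPLIST)  # pick one proxy at random
opener = make_proxy_opener(proxy_ip)

# Temporary proxy: only requests made through this opener use the proxy.
#   response = opener.open('http://example.com/', timeout=5)

# Global proxy: after install_opener, plain urllib.request.urlopen() uses it too.
#   urllib.request.install_opener(opener)
#   response = urllib.request.urlopen('http://example.com/', timeout=5)
```

The temporary form keeps the proxy local to one opener object, which is easier to reason about when only some requests should go through it.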

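Original [9] Learning Python on my own: scraping Douban Movies and handling the dynamic 'Load more' page

Analysis first: 1. The page is https://movie.douban.com/tag/#/. In the browser's developer tools, open Network > XHR, refresh the page and filter the requests to find the first one; then click 'Load more' on the page to find the second one, and so on. Click one of these dynamic requests and look under General > Request URL; that is the actual address of the dynamic data: https://movie.douban.com/j/new_search_subjects?sort=T&rang...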

2018-05-29 22:05:59 1500
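
Once the Request URLs for successive 'Load more' clicks are known, paging through them can be sketched as below. The query string is truncated in the excerpt, so this sketch only reuses the path and the `sort=T` parameter shown there; the offset parameter name `start`, the page size of 20, and the helper `build_page_urls` are hypothetical and should be replaced with whatever the Network panel actually shows.

```python
from urllib.parse import urlencode

# Path of the XHR endpoint from the excerpt (its query string is truncated there).
BASE_URL = 'https://movie.douban.com/j/new_search_subjects'


def build_page_urls(base_url, fixed_params, offset_param, page_size, pages):
    """Return one URL per 'Load more' click by stepping an offset parameter.

    offset_param and page_size are assumptions; read the real values off the
    Request URLs listed in the browser's Network panel.
    """
    urls = []
    for page in range(pages):
        params = dict(fixed_params)
        params[offset_param] = page * page_size
        urls.append(base_url + '?' + urlencode(params))
    return urls


# 'sort=T' appears in the excerpt; 'start' and 20 are hypothetical.
for url in build_page_urls(BASE_URL, {'sort': 'T'}, 'start', 20, 3):
    print(url)
```

Each generated URL can then be fetched like any other page, and the response inspected to see which fields hold the movie data.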

Original [8] Learning Python on my own: scraping all images from the Douban home page

Code:
    # -*- coding: utf-8 -*
    import urllib.request,socket,re,sys,os
    savepath=r'C:\\Users\\Administrator\\PycharmProjects\\untitled\\venv1\\image\\'
    def saveimage(url):
        if not os.path.isdir(savepath...

2018-05-28 22:16:17 386

Original [7] Learning Python on my own: scraping images from Baidu Tieba and saving them to a local directory

Code:
    # -*- coding: utf-8 -*
    import urllib.request,re
    url='http://tieba.baidu.com/p/5665019988/'
    page=urllib.request.urlopen(url,timeout = 2)
    html=page.read()
    html=html.decode('utf-8')
    ...

2018-05-28 21:50:38 521
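
The step that follows decoding the page, pulling image URLs out of the HTML and choosing local file names, might look like the sketch below. The excerpt is cut off before that point, so the regular expression, the helper names, and the sample HTML are all assumptions for illustration rather than the post's actual code.

```python
import os
import re

# Hypothetical pattern: captures src values of <img> tags that end in .jpg.
IMG_PATTERN = re.compile(r'<img[^>]+src="([^"]+\.jpg)"')


def extract_image_urls(html):
    """Return every .jpg URL found in <img ... src="..."> attributes."""
    return IMG_PATTERN.findall(html)


def local_name(index, url, folder):
    """Build a numbered local file path that keeps the URL's file extension."""
    ext = os.path.splitext(url)[1]
    return os.path.join(folder, '{}{}'.format(index, ext))


# Made-up HTML standing in for the decoded page.
sample = ('<div><img class="pic" src="http://example.com/a.jpg">'
          '<img src="http://example.com/b.jpg"></div>')

for i, img_url in enumerate(extract_image_urls(sample)):
    print(img_url, '->', local_name(i, img_url, 'image'))
    # To download for real:
    #   urllib.request.urlretrieve(img_url, local_name(i, img_url, 'image'))
```

A regular expression is enough for a quick exercise like this; an HTML parser is more robust when attribute order or quoting varies.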

Original [6] Learning Python on my own: scraping the CSDN Oracle forum and saving it to a local txt file

Code:
    # -*- coding: utf-8 -*
    import urllib.request,socket,re,sys,os,time,requests
    from lxml import etree
    with open(r'C:\Users\admin\Desktop\practice_csdn.txt','w') as f:
        for p in range(1,3): ...

2018-05-27 23:03:23 234

Original [5] Learning Python on my own: disguising the script as a browser

Code:
    # -*- coding: utf-8 -*
    import urllib.request,requests,io,sys
    def save(data,filename,flag):
        path=r'C:\Users\admin\Desktop\{}.txt'.format(filename)
        if flag=='wb':
            f=open(path,mode='...

2018-05-27 21:26:45 542
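
Disguising a script as a browser usually comes down to sending a browser-like `User-Agent` header. The sketch below shows one way to do that with `urllib.request`; the header value, the helper name `browser_request`, and the `example.com` URL are assumptions, not taken from the post.

```python
import urllib.request

# Example browser User-Agent string (an assumption; any current browser string works).
BROWSER_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36'
}


def browser_request(url):
    """Wrap url in a Request that carries browser-like headers (hypothetical helper)."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


req = browser_request('https://www.example.com/')
# urllib stores header names capitalized, hence 'User-agent' here.
print(req.get_header('User-agent'))

# To send it:
#   html = urllib.request.urlopen(req, timeout=5).read()
```

With `requests`, the same dictionary can be passed as `requests.get(url, headers=BROWSER_HEADERS)`.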

Original [4] Learning Python on my own: Douban book information, handling 'next page'

111111111

2018-05-26 22:45:50 121

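Original [0] Learning Python on my own: preparation

1. Download Python 3.x from the official site: (1) The address is https://www.python.org/downloads/windows/; a 3.x release is recommended. (2) On a 64-bit machine choose "Download Windows x86-64 executable installer"; on a 32-bit machine choose "Download Windows x86 executable installer"; exec...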

2018-05-26 21:09:59 1390
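
After installing, a quick check that the interpreter is a 3.x release and matches the chosen 32-bit or 64-bit installer could look like this. It is a small sketch using only the standard library; the helper name `describe_python` is hypothetical.

```python
import platform
import sys


def describe_python():
    """Return (major, minor, bits) for the running interpreter."""
    major, minor = sys.version_info[:2]
    bits = platform.architecture()[0]  # typically '64bit' or '32bit'
    return major, minor, bits


major, minor, bits = describe_python()
print('Python {}.{} ({})'.format(major, minor, bits))
if major < 3:
    print('This series assumes Python 3.x; please install a 3.x release.')
```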

Original [3] Learning Python on my own: scraping a single Douban book, a closer look at extracting data

Code:
    # -*- coding: utf-8 -*
    import requests,time
    from lxml import etree
    url='https://book.douban.com/top250'
    html=requests.get(url).text
    s=etree.HTML(html)
    title1=s.xpath('//*[@id="content"]/div/div[1...

2018-05-23 18:28:23 163

Original [2] Learning Python on my own: scraping a single Douban movie

Code:
    # -*- coding: utf-8 -*
    import requests,time
    from lxml import etree
    url='https://movie.douban.com/subject/1849031/?from=subject-page'
    html=requests.get(url).text  # fetch the HTML of the page...

2018-05-23 10:11:30 485

Original [1] Learning Python on my own: scraping the Baidu search page

    import requests
    url = 'https://www.baidu.com/'
    data = requests.get(url)
    data.encoding='utf-8'
    print('HTTP status code:',data.status_code)
    print('HTTP response text:',data.text)
    print('HTTP encoding:',data.encoding)
    ...

2018-05-21 08:49:08 350

糖小果冻's blog