Downloading an Entire Folder from a Oneindex Site

I've recently been downloading things from someone's Oneindex, and got tired of clicking files one by one, so I wrote a script to grab everything in one go.

Finished script: github.com/memset0/oneindex-folder-spider

The idea is simple: for the current page, iterate over every link — if it points to a file, download it; if it points to a folder, recurse into it.
The downloaded file tree mirrors the one on Oneindex.

To use it, you need Python 3, the requests library, and wget installed.
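A quick way to verify all three dependencies before running — a minimal sketch (the check names and hints are mine, not part of the script):

```python
import sys, shutil, importlib.util

# Map each dependency to whether it is available in this environment
checks = {
    'python3': sys.version_info[0] >= 3,
    'requests': importlib.util.find_spec('requests') is not None,  # importable library
    'wget': shutil.which('wget') is not None,  # the script shells out to wget
}
for name, ok in checks.items():
    print(name, 'OK' if ok else 'MISSING')
```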

The code:

```python
import requests, re, os, urllib.parse

def mkdir(path):
    # Create the directory if it does not exist yet
    if not os.path.exists(path):
        os.makedirs(path)

def get(url, path):
    print('get', urllib.parse.unquote(url), urllib.parse.unquote(path))
    if url[-1] == '/':
        # A trailing slash means a folder: create it locally, then recurse
        mkdir(urllib.parse.unquote(path))
        # Length of the URL's path part (after the host), used to strip the
        # common prefix from the matched hrefs below
        length = len(re.split(r'://[\S]*?/', url)[-1]) + 1
        content = requests.get(url).text
        # Folder entries in Oneindex's listing markup
        children = re.findall(r'<li class="mdui-list-item mdui-ripple">\n <a href="[\s\S]*?">', content)
        # Skip children[0], presumably the parent-directory link
        folders = [children[i][52 + length:-2] for i in range(1, len(children))]
        # File entries
        children = re.findall(r'<li class="mdui-list-item file mdui-ripple">\n <a href="[\s\S]*?" target="_blank">', content)
        files = [children[i][57 + length:-18] for i in range(0, len(children))]
        for child in files + folders:
            get(url + child, path + child)
    else:
        # A file: hand it to wget for the actual download
        os.system('wget {url} -O {path}'.format(url=url, path=urllib.parse.unquote(path)))

if __name__ == '__main__':
    url = 'https://drive.bakaawt.com/Videos/%E6%B4%9B%E8%B0%B7%E6%98%A5%E5%AD%A3%E5%9B%9E%E6%94%BE/TG/%E5%A4%8D%E8%B5%9B/'
    path = os.getcwd() + '/down/'
    get(url, path)
```
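The hard-coded slice offsets (`52 + length`, `57 + length`) break as soon as the theme's markup changes. A sketch of a more robust variant that captures the href with a regex group instead of slicing — the HTML fragment below is made up to mimic Oneindex's listing markup, not real output:

```python
import re

# Made-up fragment imitating a Oneindex directory listing (an assumption)
html = (
    '<li class="mdui-list-item mdui-ripple">\n'
    ' <a href="/Videos/sub/">\n'
    '<li class="mdui-list-item file mdui-ripple">\n'
    ' <a href="/Videos/a.mp4" target="_blank">\n'
)

# Capture the href directly instead of slicing at fixed offsets
folders = re.findall(r'<li class="mdui-list-item mdui-ripple">\n <a href="([^"]*?)">', html)
files = re.findall(r'<li class="mdui-list-item file mdui-ripple">\n <a href="([^"]*?)" target="_blank">', html)

print(folders)  # ['/Videos/sub/']
print(files)    # ['/Videos/a.mp4']
```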