Detecting puppeteer access and how to bypass the detection

Alan

Maintainer of blog

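I wanted to download a few bilibili videos. I found an online tool, 哔哩哔哩视频解析下载 (a bilibili video parsing and download site), but it does not support downloading several videos at once, so I decided to automate the downloads with puppeteer.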

Problem

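I wrote a quick puppeteer download script, but the download kept failing and the page showed the error 请使用推荐浏览器下载 ("please download using a recommended browser"). After some debugging I found that the site inspects the navigator.webdriver variable to detect access by automation tools. The core detection code is: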

function u(t, e) {
    if (!0 === window.navigator.webdriver || window.document.documentElement.getAttribute('webdriver') || window.callPhantom || window._phantom) return md5(o + t + o);
    var n = e.charAt(t.charCodeAt(0) % e.length),
        r = e.charAt(t.charCodeAt(t.length - 1) % e.length);
    return md5(n + t + r)
}

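The problem is that, by default, navigator.webdriver is true when a page is opened by a UI automation tool such as puppeteer. When the download site sees this value, it computes a special md5 value; the server recognizes that value and tells the front end to show the "please use a recommended browser" message.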
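To make the two branches of the check concrete, here is a minimal sketch that reproduces the logic of u(t, e) outside the browser. It is an illustration under assumptions, not the site's actual code: the names WindowLike, looksAutomated, sign and fakeWindow are hypothetical, md5 is implemented with Node's crypto module, and the closure value o from the original snippet is unknown, so it is passed in as a parameter.

```typescript
import { createHash } from "crypto";

// Minimal shape of the browser globals that the check reads (hypothetical type).
interface WindowLike {
  navigator: { webdriver?: boolean };
  document: { documentElement: { getAttribute(name: string): string | null } };
  callPhantom?: unknown;
  _phantom?: unknown;
}

function md5(text: string): string {
  return createHash("md5").update(text).digest("hex");
}

// True when any of the automation markers tested by the site's check is present.
function looksAutomated(win: WindowLike): boolean {
  return Boolean(
    win.navigator.webdriver === true ||
      win.document.documentElement.getAttribute("webdriver") ||
      win.callPhantom ||
      win._phantom
  );
}

// Mirrors u(t, e). `o` is a closure variable in the original whose value is
// not shown, so it is an explicit (assumed) parameter here.
function sign(win: WindowLike, t: string, e: string, o: string): string {
  if (looksAutomated(win)) return md5(o + t + o);
  const n = e.charAt(t.charCodeAt(0) % e.length);
  const r = e.charAt(t.charCodeAt(t.length - 1) % e.length);
  return md5(n + t + r);
}

// Builds a stand-in window object for experimenting outside a browser.
function fakeWindow(webdriver: boolean): WindowLike {
  return {
    navigator: { webdriver },
    document: { documentElement: { getAttribute: () => null } },
  };
}

// The same input yields different signatures depending on the webdriver flag.
console.log(sign(fakeWindow(false), "https://example.com/v", "abcdef", "x"));
console.log(sign(fakeWindow(true), "https://example.com/v", "abcdef", "x"));
```

Because the server only sees the resulting hash, it can tell automated clients apart without the page ever showing an explicit check to the user.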

Solution

So I searched online for a workaround and found the issue can't set webdriver #6725, in which vabaly suggests the following solution:

You can try to add argument --disable-blink-features=AutomationControlled when launch the browser:

const browser = await launch({
    // ...
    ignoreDefaultArgs: ['--enable-automation'],
    args: ['--disable-blink-features=AutomationControlled'],
    // ...
})
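If several scripts need the same two flags, they can be merged into existing launch options with a small helper. The following is only a sketch: withStealthFlags and LaunchOptionsLike are hypothetical names, and the interface lists just the three puppeteer launch options used in this post.

```typescript
// Subset of puppeteer's launch options that this helper touches (hypothetical type).
interface LaunchOptionsLike {
  headless?: boolean;
  args?: string[];
  ignoreDefaultArgs?: boolean | string[];
}

const STEALTH_ARG = "--disable-blink-features=AutomationControlled";
const AUTOMATION_ARG = "--enable-automation";

// Returns a copy of `base` with the two flags from the issue added,
// without duplicating them if they are already present.
function withStealthFlags(base: LaunchOptionsLike = {}): LaunchOptionsLike {
  const args = [...(base.args ?? [])];
  if (!args.includes(STEALTH_ARG)) args.push(STEALTH_ARG);

  let ignoreDefaultArgs: boolean | string[];
  if (base.ignoreDefaultArgs === true) {
    // `true` already drops every default argument, including --enable-automation.
    ignoreDefaultArgs = true;
  } else {
    const ignored = Array.isArray(base.ignoreDefaultArgs)
      ? [...base.ignoreDefaultArgs]
      : [];
    if (!ignored.includes(AUTOMATION_ARG)) ignored.push(AUTOMATION_ARG);
    ignoreDefaultArgs = ignored;
  }

  return { ...base, args, ignoreDefaultArgs };
}

console.log(withStealthFlags({ headless: false, args: ["--no-sandbox"] }));
```

The result can be passed to puppeteer.launch(...). To confirm the flags took effect, evaluating navigator.webdriver in the page, for example with page.evaluate(() => navigator.webdriver), should no longer return true.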

Example code

puppeteer

Here is the example code:

src/main.ts
import puppeteer from "puppeteer";
import axios, { AxiosResponse } from "axios";
import { Stream } from "stream";
import fs from "fs";

let start: number = 88; // first part (p=) to download
const end: number = 172; // last part to download
const baseLink = "https://www.bilibili.com/video/BV1sQ4y1X7o8?p=";

// Shape of the JSON returned by the parsing site for each submitted link
interface ParseResult {
    data: {
        text: string,
        video: string
    }
}

async function download() {
    const browser = await puppeteer.launch({
        headless: false,
        ignoreDefaultArgs: ['--enable-automation'],
        args: ['--disable-blink-features=AutomationControlled'],
    });
    const page = await browser.newPage();
    // Each parsed link comes back in a POST response; save it, then submit the next part
    page.on("response", async (response) => {
        const req = response.request();
        const url: string = req.url();
        const method = req.method();
        if (method === "POST" && url.includes("/video/web/bilibili")) {
            let body: ParseResult | undefined;
            try {
                // Parse inside the try block so a non-JSON response is logged, not unhandled
                body = await response.json();
                const fileName = `p${start}-${body.data.text}`;
                print(`download url: ${body.data.video} -> ${fileName}`);
                await saveAs(body.data.video, fileName);
                await sleep(10);
                start++;
                await access(page)
            } catch (err) {
                console.log(body)
                console.warn(err)
            }
        }
    })
    await page.goto("https://bilibili.iiilab.com/");
    await access(page);
}

// Reload the page, type the link of the current part and click the parse button
async function access(page: puppeteer.Page) {
    if (start > end) {
        // All parts have been downloaded
        process.exit();
    }
    const url = baseLink + start;
    // Await every page action so the steps run strictly in order
    await page.reload();
    await page.waitForTimeout(5000);
    await page.type("#app input.link-input", url);
    await page.waitForTimeout(5000);
    await page.click("#app button.btn-default");
}

function sleep(seconds: number): Promise<void> {
    return new Promise(resolve => {
        setTimeout(() => {
            resolve();
        }, seconds * 1000);
    });
}

// Stream the video to ./<fileName>.mp4; rejects on request or write errors
async function saveAs(url: string, fileName: string): Promise<void> {
    print(`${fileName} download start`);
    const response: AxiosResponse<Stream> = await axios({
        method: "GET",
        url: url,
        responseType: "stream"
    });
    await new Promise<void>((resolve, reject) => {
        response.data.pipe(fs.createWriteStream(`./${fileName}.mp4`))
            .on("finish", () => resolve())
            .on("error", reject);
    });
    print(`${fileName} download finish`);
}

function print(text: string) {
    console.log(`[${new Date().toLocaleString()}] ${text}`)
}

download().catch(console.error); // start the download
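One thing the script does not handle: it uses the video title returned by the site directly in the file name (p${start}-${body.data.text}), and a title may contain characters such as / or : that are not valid in file names on some systems. A minimal sketch of a sanitizing helper follows; safeFileName is a hypothetical name and the set of replaced characters is an assumption based on common Windows and POSIX restrictions.

```typescript
// Builds "p<part>-<title>" with characters that are unsafe in file names
// replaced by "_". The character set is an assumption, not from the original post.
function safeFileName(part: number, title: string): string {
  const cleaned = title
    .replace(/[\\/:*?"<>|\u0000-\u001f]/g, "_") // reserved and control characters
    .replace(/\s+/g, " ") // collapse runs of whitespace
    .trim();
  return `p${part}-${cleaned || "untitled"}`;
}

console.log(safeFileName(88, "intro/setup: part 1?"));
```

In the script, the fileName passed to saveAs could then be built with safeFileName(start, body.data.text) instead of the raw template string.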
package.json
{
    "dependencies": {
        "@msgpack/msgpack": "^2.4.0",
        "axios": "^0.22.0",
        "puppeteer": "^8.0.0",
        "tslib": "^2.1.0"
    },
    "devDependencies": {
        "@types/axios": "^0.14.0",
        "@types/node": "^16.10.3",
        "ts-node": "^9.1.1",
        "typescript": "^4.2.2"
    }
}
tsconfig.json
{
    "compilerOptions": {
        "module": "commonjs",
        "declaration": false,
        "noImplicitAny": true,
        "skipLibCheck": true,
        "noUnusedLocals": false,
        "importHelpers": true,
        "removeComments": false,
        "emitDecoratorMetadata": true,
        "experimentalDecorators": true,
        "target": "ES2017",
        "sourceMap": true,
        "allowJs": false,
        "esModuleInterop": true,
        "moduleResolution": "Node",
        "baseUrl": "./",
        "resolveJsonModule": true,
        "outDir": "dist/",
        "typeRoots": [
            "./node_modules/*"
        ]
    },
    "include": [
        "src/**/*.ts"
    ],
    "exclude": [
        "node_modules"
    ]
}

Cypress

Using Cypress may be a bit more convenient:

cypress/plugins/index.js
module.exports = (on, config) => {
    // `on` is used to hook into various events Cypress emits
    // `config` is the resolved Cypress config
    on('before:browser:launch', (browser = {}, launchOptions) => {
        // `args` is an array of all the arguments that will
        // be passed to browsers when it launches
        console.log(launchOptions.args) // print all current args
        if (browser.family === 'chromium' && browser.name !== 'electron') {
            // auto open devtools
            launchOptions.args.push('--auto-open-devtools-for-tabs')
            // hide the automation flag so navigator.webdriver is not set to true
            launchOptions.args.push("--disable-blink-features=AutomationControlled")
        }
        // whatever you return here becomes the launchOptions
        return launchOptions
    })
}

After that, just write test cases that download the videos you need; the code is similar to the puppeteer version above.