Cendertron: Slider CAPTCHA Bypass Strategies for Dynamic Crawlers
Earlier articles in the Cendertron security dynamic crawler series covered the design of the security crawler and how to build a crawler cluster. This article discusses strategies for bypassing slider CAPTCHAs.
The strategies and code used in this article come from the article How to bypass "slider CAPTCHA" with JS and Puppeteer.
Bypassing Slider Validation in Crawlers
CAPTCHA validation is one of the most common anti-crawling strategies, and many sites today use slider validation to verify that a visitor is a real person. For example, the following well-known jQuery slider plug-in:
When simulating a login, we often need to get past this kind of slider validation, and a Puppeteer-based dynamic crawler makes that straightforward. The usual steps are: move the mouse to the center of the slider handle, press the mouse button, drag the mouse along the track, and release the mouse button.

    const puppeteer = require('puppeteer');

    async function run() {
      const browser = await puppeteer.launch({
        headless: false,
        defaultViewport: { width: 1366, height: 768 }
      });
      const page = await browser.newPage();

      await page.goto('http://kthornbloom.com/slidetosubmit/');
      await page.type('input[name="name"]', 'Puppeteer Bot');
      await page.type('input[name="email"]', 'js@automation.com');

      // Measure the slider track and its draggable handle
      const sliderElement = await page.$('.slide-submit');
      const slider = await sliderElement.boundingBox();
      const sliderHandle = await page.$('.slide-submit-thumb');
      const handle = await sliderHandle.boundingBox();

      // Move to the center of the handle and press the mouse button
      await page.mouse.move(
        handle.x + handle.width / 2,
        handle.y + handle.height / 2
      );
      await page.mouse.down();

      // Drag one track width to the right in 10 steps, then release
      await page.mouse.move(
        handle.x + slider.width,
        handle.y + handle.height / 2,
        { steps: 10 }
      );
      await page.mouse.up();

      // Wait 3 seconds (page.waitFor is no longer available in recent Puppeteer releases)
      await new Promise(resolve => setTimeout(resolve, 3000));

      // success!
      await browser.close();
    }

    run();

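The coordinate arithmetic in the script above can be isolated into a small pure function, which makes it easier to check without launching a browser. This is only a sketch: `computeDragPoints` is a hypothetical helper name, not part of Puppeteer or Cendertron, and the box values in the usage lines are made up for illustration. The boxes are assumed to have the `{ x, y, width, height }` shape that `boundingBox()` returns.

```javascript
// Compute the start and end points of a slider drag from two bounding boxes.
// Both boxes use the { x, y, width, height } shape returned by boundingBox().
// Hypothetical helper for illustration only.
function computeDragPoints(handleBox, sliderBox) {
  const centerY = handleBox.y + handleBox.height / 2;
  return {
    // Press the mouse at the center of the handle
    start: { x: handleBox.x + handleBox.width / 2, y: centerY },
    // Release one full track width to the right of the handle's left edge,
    // the same end point the script above uses
    end: { x: handleBox.x + sliderBox.width, y: centerY }
  };
}

// Made-up boxes: a 40px-wide handle at the left end of a 300px-wide track
const points = computeDragPoints(
  { x: 100, y: 200, width: 40, height: 40 },
  { x: 100, y: 200, width: 300, height: 40 }
);
console.log(points);
```

The resulting `start` and `end` points would be passed to `page.mouse.move` before and after `page.mouse.down()`, respectively.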
As a real-world case, take the Taobao registration page as an example:

    const puppeteer = require('puppeteer');

    async function run() {
      const browser = await puppeteer.launch({
        headless: false,
        defaultViewport: { width: 1366, height: 768 }
      });
      const page = await browser.newPage();

      // Hide the navigator.webdriver flag before any page script runs
      await page.evaluateOnNewDocument(() => {
        Object.defineProperty(navigator, 'webdriver', {
          get: () => false
        });
      });

      await page.goto('https://world.taobao.com/markets/all/sea/register');

      // The slider lives inside an iframe (here the second frame of the page)
      let frame = page.frames()[1];
      await frame.waitForSelector('.nc_iconfont.btn_slide');

      const sliderElement = await frame.$('.slidetounlock');
      const slider = await sliderElement.boundingBox();
      const sliderHandle = await frame.$('.nc_iconfont.btn_slide');
      const handle = await sliderHandle.boundingBox();

      // Press the handle at its center, drag across the track in 50 steps, release
      await page.mouse.move(
        handle.x + handle.width / 2,
        handle.y + handle.height / 2
      );
      await page.mouse.down();
      await page.mouse.move(
        handle.x + slider.width,
        handle.y + handle.height / 2,
        { steps: 50 }
      );
      await page.mouse.up();

      // Wait 3 seconds (page.waitFor is no longer available in recent Puppeteer releases)
      await new Promise(resolve => setTimeout(resolve, 3000));

      // success!
      await browser.close();
    }

    run();

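The Taobao script picks the slider's iframe by position (`page.frames()[1]`), which may break if the page adds or reorders frames. One possible alternative, sketched below, is to search the frames for the one that actually contains the slider handle. `findFrameWithSelector` is a hypothetical helper, not something the original article or Puppeteer provides; it only assumes that each frame exposes an async `$(selector)` method, as Puppeteer frames do.

```javascript
// Return the first frame in which `selector` matches an element, or null.
// `frames` is an array of objects with an async `$(selector)` method,
// for example the array returned by Puppeteer's page.frames().
// Hypothetical helper for illustration only.
async function findFrameWithSelector(frames, selector) {
  for (const frame of frames) {
    try {
      // frame.$ resolves to null when nothing matches
      const element = await frame.$(selector);
      if (element) return frame;
    } catch (err) {
      // A frame that was detached mid-lookup can throw; skip it
    }
  }
  return null;
}

// Possible usage inside run(), replacing `page.frames()[1]`:
//   const frame = await findFrameWithSelector(page.frames(), '.nc_iconfont.btn_slide');
```

Because the slider may be injected after the initial page load, a caller would probably still need to retry or wait before concluding that no frame matches.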
Another common type of slider is the jigsaw puzzle, like this one:

    const puppeteer = require('puppeteer');
    const Rembrandt = require('rembrandt');

    async function run() {
      const browser = await puppeteer.launch({
        headless: false,
        defaultViewport: { width: 1366, height: 768 }
      });
      const page = await browser.newPage();

      // Capture the puzzle's background image as it is downloaded
      let originalImage = '';
      await page.setRequestInterception(true);
      page.on('request', request => request.continue());
      page.on('response', async response => {
        if (response.request().resourceType() === 'image')
          originalImage = await response.buffer().catch(() => {});
      });

      await page.goto('https://monoplasty.github.io/vue-monoplasty-slide-verify/');

      const sliderElement = await page.$('.slide-verify-slider');
      const slider = await sliderElement.boundingBox();
      const sliderHandle = await page.$('.slide-verify-slider-mask-item');
      const handle = await sliderHandle.boundingBox();

      let currentPosition = 0;
      let bestSlider = { position: 0, difference: 100 };

      await page.mouse.move(
        handle.x + handle.width / 2,
        handle.y + handle.height / 2
      );
      await page.mouse.down();

      // Sweep the handle across the track in 5px increments
      while (currentPosition < slider.width - handle.width / 2) {
        await page.mouse.move(
          handle.x + currentPosition,
          handle.y + handle.height / 2 + Math.random() * 10 - 5 // small vertical jitter
        );

        // Compare a screenshot of the puzzle with the original image
        let sliderContainer = await page.$('.slide-verify');
        let sliderImage = await sliderContainer.screenshot();
        const rembrandt = new Rembrandt({
          imageA: originalImage,
          imageB: sliderImage,
          thresholdType: Rembrandt.THRESHOLD_PERCENT
        });
        let result = await rembrandt.compare();
        let difference = result.percentageDifference * 100;

        // Remember the position where the two images differ the least
        if (difference < bestSlider.difference) {
          bestSlider.difference = difference;
          bestSlider.position = currentPosition;
        }
        currentPosition += 5;
      }

      // Move back to the best position found and release
      await page.mouse.move(
        handle.x + bestSlider.position,
        handle.y + handle.height / 2,
        { steps: 10 }
      );
      await page.mouse.up();

      // Wait 3 seconds (page.waitFor is no longer available in recent Puppeteer releases)
      await new Promise(resolve => setTimeout(resolve, 3000));

      // success!
      await browser.close();
    }

    run();

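Here we use a simple image comparison approach: while dragging the handle across the track, we take a screenshot of the puzzle at each step and compare it with the original image. The position where the difference is smallest is treated as the point where the piece fits, so the handle is moved back to that position and released.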
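The selection step of that sweep can be looked at on its own, separated from the browser and the image comparison. The sketch below assumes the sweep has already produced a list of `{ position, difference }` samples; `pickBestPosition` is a hypothetical name, and the sample values are invented for illustration rather than measured.

```javascript
// Pick the slider position whose screenshot differed least from the original.
// `samples` is an array of { position, difference } objects, where
// `difference` is a percentage between 0 and 100.
// Hypothetical helper for illustration only.
function pickBestPosition(samples) {
  // Same starting point as the script above: position 0, worst possible difference
  let best = { position: 0, difference: 100 };
  for (const sample of samples) {
    if (sample.difference < best.difference) {
      best = { position: sample.position, difference: sample.difference };
    }
  }
  return best;
}

// Invented sample values from a sweep in 5px steps
const best = pickBestPosition([
  { position: 0, difference: 12.5 },
  { position: 5, difference: 9.1 },
  { position: 10, difference: 3.2 },
  { position: 15, difference: 7.8 }
]);
console.log(best);
```

Note that this approach always returns some position, even when no sample is a convincing match, so a caller may want to reject results whose difference is still high.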
Spider Configuration
Cendertron provides a dedicated Slider Captcha Monkey for this purpose. To enable it, add the following parameters to the SpiderOption that is passed in:

    export interface SpiderOption {
      allowRedirect: boolean;
      depth: number;
      // Page Plugin
      monkies?: {
        sliderCaptcha: {
          sliderElementSelector: string;
          sliderHandleSelector: string;
        };
      };
    }

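As an illustration, an option object matching this interface might look like the sketch below. The field names come from the interface above; the `allowRedirect` and `depth` values are arbitrary, the selectors are borrowed from the jQuery plug-in example earlier in this article, and `hasSliderCaptchaConfig` is a hypothetical check added only to show which fields are expected. How the object is handed to Cendertron is not shown in this article, so that part is left out.

```javascript
// A sample option object shaped like the SpiderOption interface.
// The allowRedirect and depth values are arbitrary placeholders.
const spiderOption = {
  allowRedirect: false,
  depth: 2,
  monkies: {
    sliderCaptcha: {
      // Selector for the slider track
      sliderElementSelector: '.slide-submit',
      // Selector for the draggable handle
      sliderHandleSelector: '.slide-submit-thumb'
    }
  }
};

// Hypothetical helper: report whether both slider selectors are configured.
function hasSliderCaptchaConfig(option) {
  const config = option.monkies && option.monkies.sliderCaptcha;
  if (!config) return false;
  return [config.sliderElementSelector, config.sliderHandleSelector].every(
    selector => typeof selector === 'string' && selector.length > 0
  );
}

console.log(hasSliderCaptchaConfig(spiderOption));
```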
Extended reading
You can read the author's other article series in the following way. They cover technical notes and summaries, programming languages and theory, Web and front-end development, server-side development and infrastructure, cloud computing and big data, data science and artificial intelligence, product design, and other fields:
- Browse online in Gitbook; each series has its own Gitbook repository.