Cendertron, Slider CAPTCHA Bypass Strategies for Dynamic Crawlers

Posted by kjharve on Wed, 31 Jul 2019 19:15:39 +0200


In the Cendertron security-oriented dynamic crawler series, we have already covered the design of the crawler and how to build a crawler cluster. In this article, we discuss strategies for bypassing slider CAPTCHAs.

The strategies and code used in this article come from the article "How to bypass 'slider CAPTCHA' with JS and Puppeteer".

Bypassing Slider Verification in Crawlers

Verification challenges are a common anti-crawling measure, and many sites now add slider verification to check that a visitor is a real person. A typical example is a widely used jQuery slider plug-in.

When simulating a login, we often need to get past such a slider, and a Puppeteer-based dynamic crawler makes this straightforward. The usual steps are: move the mouse to the middle of the slider handle, press the button, drag the mouse along the track, and release the button.

const puppeteer = require('puppeteer');

async function run() {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: { width: 1366, height: 768 }
  });
  const page = await browser.newPage();

  await page.goto('http://kthornbloom.com/slidetosubmit/');
  await page.type('input[name="name"]', 'Puppeteer Bot');
  await page.type('input[name="email"]', 'js@automation.com');

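  // Bounding boxes of the slider track and of the draggable thumb.
  const sliderElement = await page.$('.slide-submit');
  const slider = await sliderElement.boundingBox();

  const sliderHandle = await page.$('.slide-submit-thumb');
  const handle = await sliderHandle.boundingBox();

  // Press the mouse in the middle of the thumb...
  await page.mouse.move(
    handle.x + handle.width / 2,
    handle.y + handle.height / 2
  );
  await page.mouse.down();
  // ...drag it to the far end of the track in 10 intermediate moves...
  await page.mouse.move(handle.x + slider.width, handle.y + handle.height / 2, {
    steps: 10
  });
  // ...and release it.
  await page.mouse.up();

  // page.waitFor() is deprecated and removed in recent Puppeteer releases;
  // a plain timer gives the page time to react to the drag.
  await new Promise(resolve => setTimeout(resolve, 3000));

  // success!

  await browser.close();
}

run().catch(console.error);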
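The steps option is what turns the drag into a series of intermediate mousemove events instead of a single jump. As far as we understand Puppeteer's implementation, mouse.move(x, y, { steps }) interpolates linearly from the current pointer position to the target. The sketch below reproduces those points as plain data under that assumption, using the same geometry as the example above; linearDragPath and slideToSubmitPath are hypothetical helpers, not part of Puppeteer.

```javascript
/**
 * Hypothetical helper: the points a linear drag with `steps` intermediate
 * moves passes through, from (fromX, fromY) to (toX, toY).
 * Assumes linear interpolation, which is how we understand Puppeteer's
 * mouse.move(x, y, { steps }) to behave.
 */
function linearDragPath(fromX, fromY, toX, toY, steps = 1) {
  const points = [];
  for (let i = 1; i <= steps; i++) {
    const t = i / steps; // fraction of the drag completed after move i
    points.push({ x: fromX + (toX - fromX) * t, y: fromY + (toY - fromY) * t });
  }
  return points;
}

/**
 * Hypothetical helper mirroring the slide-to-submit example: start at the
 * centre of the thumb, end at handle.x + slider.width on the same row.
 * `handle` and `slider` are boxes shaped like boundingBox() results.
 */
function slideToSubmitPath(handle, slider, steps) {
  const startX = handle.x + handle.width / 2;
  const y = handle.y + handle.height / 2;
  return linearDragPath(startX, y, handle.x + slider.width, y, steps);
}

// A 40px thumb at (100, 200) on a 300px track, dragged in 10 moves:
// 10 points, the last one at (400, 220).
const path = slideToSubmitPath(
  { x: 100, y: 200, width: 40, height: 40 },
  { x: 90, y: 190, width: 300, height: 60 },
  10
);
console.log(path.length, path[path.length - 1]);
```

Because the thumb is pressed at its centre but the target is handle.x + slider.width, the pointer actually travels slider.width - handle.width / 2 pixels, which is enough to push the thumb against the end of the track.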

For a real-world case, take Taobao's registration page as an example. Here the slider sits inside an iframe, and the script also hides the navigator.webdriver flag:

const puppeteer = require('puppeteer');

async function run() {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: { width: 1366, height: 768 }
  });
  const page = await browser.newPage();

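  // Hide the most obvious automation flag before any page script runs.
  await page.evaluateOnNewDocument(() => {
    Object.defineProperty(navigator, 'webdriver', {
      get: () => false
    });
  });

  await page.goto('https://world.taobao.com/markets/all/sea/register');

  // The slider is rendered inside an iframe; this assumes it is the first
  // child frame of the page.
  const frame = page.frames()[1];
  await frame.waitForSelector('.nc_iconfont.btn_slide');

  const sliderElement = await frame.$('.slidetounlock');
  const slider = await sliderElement.boundingBox();

  // boundingBox() is relative to the main frame, so page.mouse can drive
  // the drag even though the elements live in the iframe.
  const sliderHandle = await frame.$('.nc_iconfont.btn_slide');
  const handle = await sliderHandle.boundingBox();
  await page.mouse.move(
    handle.x + handle.width / 2,
    handle.y + handle.height / 2
  );
  await page.mouse.down();
  // More intermediate moves than before: a slower, finer-grained drag.
  await page.mouse.move(handle.x + slider.width, handle.y + handle.height / 2, {
    steps: 50
  });
  await page.mouse.up();

  // Replaces the deprecated page.waitFor(3000).
  await new Promise(resolve => setTimeout(resolve, 3000));

  // success!

  await browser.close();
}

run().catch(console.error);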
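Taobao's version uses 50 steps instead of 10, which gives a slower, finer-grained drag. If a site still rejects the perfectly straight, constant-speed path that steps produces (an assumption: the article does not say what Taobao actually checks), one option is to generate the path yourself with an easing curve and small vertical jitter, similar in spirit to the Math.random() * 10 - 5 jitter in the puzzle example below, and replay it point by point. humanLikePath, its options, and the ease-in-out curve are our own hypothetical choices, not a documented requirement of any CAPTCHA.

```javascript
/**
 * Hypothetical helper: an eased drag path with small vertical jitter.
 * Speed ramps up and then slows down near the target (ease-in-out cubic),
 * an assumed heuristic rather than a known detection rule.
 */
function humanLikePath(fromX, y, toX, { steps = 50, jitter = 2, random = Math.random } = {}) {
  // Ease-in-out cubic: maps t in [0, 1] onto [0, 1], slow at both ends.
  const ease = t => (t < 0.5 ? 4 * t * t * t : 1 - Math.pow(-2 * t + 2, 3) / 2);
  const points = [];
  for (let i = 1; i <= steps; i++) {
    const t = i / steps;
    // Up to +/- `jitter` px off the row, except the last point,
    // which lands exactly on the target.
    const dy = i === steps ? 0 : (random() * 2 - 1) * jitter;
    points.push({ x: fromX + (toX - fromX) * ease(t), y: y + dy });
  }
  return points;
}

// Possible use inside run(), instead of the single mouse.move(..., { steps: 50 }),
// where startX/startY are the centre of the handle:
//   for (const p of humanLikePath(startX, startY, handle.x + slider.width)) {
//     await page.mouse.move(p.x, p.y);
//   }
```

Passing a deterministic random function (for example () => 0.5, which yields zero jitter) makes the path reproducible, which is convenient when debugging a drag that fails only sometimes.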

Another common type is the puzzle slider, where the user drags a piece into a gap in a picture, as on this demo page:

const puppeteer = require('puppeteer');
const Rembrandt = require('rembrandt');

async function run() {
  const browser = await puppeteer.launch({
    headless: false,
    defaultViewport: { width: 1366, height: 768 }
  });
  const page = await browser.newPage();

  // Capture the downloaded image so it can serve as the unmodified reference
  // picture; only the last image response is kept.
  let originalImage = '';

  await page.setRequestInterception(true);
  page.on('request', request => request.continue());
  page.on('response', async response => {
    if (response.request().resourceType() === 'image')
      originalImage = await response.buffer().catch(() => {});
  });

  await page.goto('https://monoplasty.github.io/vue-monoplasty-slide-verify/');

  const sliderElement = await page.$('.slide-verify-slider');
  const slider = await sliderElement.boundingBox();

  const sliderHandle = await page.$('.slide-verify-slider-mask-item');
  const handle = await sliderHandle.boundingBox();

  let currentPosition = 0;
  // Best position found so far; difference is a percentage (100 = worst).
  const bestSlider = {
    position: 0,
    difference: 100
  };

  await page.mouse.move(
    handle.x + handle.width / 2,
    handle.y + handle.height / 2
  );
  await page.mouse.down();

  // Sweep the handle along the track in 5px increments (with a little
  // vertical jitter) and screenshot the puzzle at every position.
  while (currentPosition < slider.width - handle.width / 2) {
    await page.mouse.move(
      handle.x + currentPosition,
      handle.y + handle.height / 2 + Math.random() * 10 - 5
    );

    const sliderContainer = await page.$('.slide-verify');
    const sliderImage = await sliderContainer.screenshot();

    // The closer the screenshot is to the original picture, the better the
    // puzzle piece covers its slot.
    const rembrandt = new Rembrandt({
      imageA: originalImage,
      imageB: sliderImage,
      thresholdType: Rembrandt.THRESHOLD_PERCENT
    });

    const result = await rembrandt.compare();
    const difference = result.percentageDifference * 100;

    if (difference < bestSlider.difference) {
      bestSlider.difference = difference;
      bestSlider.position = currentPosition;
    }

    currentPosition += 5;
  }

  // Move back to the best position found and release the handle there.
  await page.mouse.move(
    handle.x + bestSlider.position,
    handle.y + handle.height / 2,
    { steps: 10 }
  );
  await page.mouse.up();

  // Replaces the deprecated page.waitFor(3000).
  await new Promise(resolve => setTimeout(resolve, 3000));

  // success!

  await browser.close();
}

run().catch(console.error);

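Here we use a simple image comparison: at each position during the drag, the puzzle area is screenshotted and compared with the original picture captured from the network. The position with the smallest difference is taken as the point where the piece fits its slot, and the handle is then moved back to that position and released.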
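Rembrandt does the actual pixel comparison in the example above. To make the idea concrete without a browser, the sketch below uses a simplified stand-in: it compares two same-sized RGBA buffers, counts pixels whose colour differs by more than a tolerance, and then scans candidate offsets for the one with the smallest difference, mirroring the bestSlider loop. percentDifference, findBestOffset and the tolerance of 16 are our own assumptions; this is not Rembrandt's internal algorithm.

```javascript
/**
 * Simplified stand-in for an image diff: the fraction of pixels (0..1) whose
 * red, green or blue channel differs by more than `tolerance` (0..255).
 * `a` and `b` are RGBA byte arrays of the same size (4 bytes per pixel);
 * the alpha channel is ignored.
 */
function percentDifference(a, b, tolerance = 16) {
  if (a.length === 0 || a.length !== b.length || a.length % 4 !== 0) {
    throw new Error('images must be non-empty RGBA buffers of equal size');
  }
  let differing = 0;
  for (let i = 0; i < a.length; i += 4) {
    const delta = Math.max(
      Math.abs(a[i] - b[i]),         // red
      Math.abs(a[i + 1] - b[i + 1]), // green
      Math.abs(a[i + 2] - b[i + 2])  // blue
    );
    if (delta > tolerance) differing++;
  }
  return differing / (a.length / 4);
}

/**
 * Scan offsets 0, step, 2 * step, ... below `maxOffset`, obtain the image for
 * each one via `imageAt(offset)` (a screenshot in the real crawler), and keep
 * the offset whose image is closest to `original`.
 */
function findBestOffset(original, imageAt, maxOffset, step = 5) {
  let best = { position: 0, difference: Infinity };
  for (let offset = 0; offset < maxOffset; offset += step) {
    const difference = percentDifference(original, imageAt(offset));
    if (difference < best.difference) best = { position: offset, difference };
  }
  return best;
}
```

In the crawler, maxOffset corresponds to slider.width - handle.width / 2 and step to the 5px increment. Keeping the minimum over the whole sweep, rather than stopping at the first position under a threshold, means a single noisy screenshot cannot end the search early.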

Spider Configuration

Cendertron provides a dedicated Slider Captcha Monkey (a page plugin), which is enabled by adding the following parameters to the SpiderOption passed to the spider:

export interface SpiderOption {
  allowRedirect: boolean;
  depth: number;
  // Page Plugin
  monkies?: {
    sliderCaptcha: {
      // CSS selector of the slider track
      sliderElementSelector: string;
      // CSS selector of the draggable handle
      sliderHandleSelector: string;
    };
  };
}
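For illustration, an option object enabling the monkey for the Taobao example might look like the following. The selector values are taken from that example; the allowRedirect and depth values and the getSliderCaptchaConfig helper are hypothetical and only show how a plugin could read the setting. Cendertron's own code may consume it differently.

```javascript
// Example SpiderOption written as a plain object. Selectors come from the
// Taobao example; allowRedirect and depth are arbitrary illustrative values.
const spiderOption = {
  allowRedirect: false,
  depth: 1,
  monkies: {
    sliderCaptcha: {
      sliderElementSelector: '.slidetounlock',
      sliderHandleSelector: '.nc_iconfont.btn_slide'
    }
  }
};

// Hypothetical helper: return the slider selectors if the monkey is fully
// configured, or null so the caller can skip the slider step.
function getSliderCaptchaConfig(option) {
  const config = option.monkies && option.monkies.sliderCaptcha;
  if (!config || !config.sliderElementSelector || !config.sliderHandleSelector) {
    return null;
  }
  return {
    slider: config.sliderElementSelector,
    handle: config.sliderHandleSelector
  };
}
```

Because monkies is optional in the interface, a plugin reading it should tolerate its absence, which is why the helper returns null instead of throwing.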
