Welcome to Vigges Developer Community: Open, Learning, Share
Welcome To Ask or Share your Answers For Others


javascript - How can I capture all links in a page with Puppeteer?

I am trying to capture all the <a> elements in a page.

console.log prints undefined, but I can't understand why. Is this line correct: const anchors = Array.from(document.querySelectorAll(sel));

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    headless: false,
    // Backslashes in a Windows path must be escaped inside a JS string literal.
    userDataDir: "C:\\Users\\johndoe\\AppData\\Local\\Google\\Chrome\\User Data\\Default"
  });
  const page = await browser.newPage();
  await page.setViewport({
    width: 1920,
    height: 1080,
    deviceScaleFactor: 1,
  });
  await page.goto('https://www.facebook.com/groups/632312010245152/members');
  
  //https://github.com/puppeteer/puppeteer/blob/main/examples/search.js
  let membri = await page.evaluate((sel) => { 
    const anchors = Array.from(document.querySelectorAll(sel));
    return anchors;
  }, 'a');
  console.log(membri);
})();


1 Answer

// Runs inside the async function, after page.goto() has resolved.
// Return plain strings from the page: DOM nodes cannot be passed back to Node.
const links = await page.evaluate(() =>
  Array.from(document.querySelectorAll("a"))
    .map((anchor) => anchor.href)
    // Skip anchors that have no href.
    .filter((href) => href.length > 0)
);
console.log(links);
await page.close();
return links;

Not sure if this is the most optimized solution, but it works. If you could message me a cleaned version of this code I would highly appreciate that :)
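As for why the code in the question does not print the links: as far as I know, page.evaluate can only hand serializable values back to Node, and DOM elements are not serializable. Depending on the Puppeteer version you get undefined or an array of empty objects. Extracting plain data (such as the href strings) inside the page avoids that. Below is a minimal sketch of that idea; hrefsInPage, collectLinks and main are names I made up for illustration, and the networkidle2 wait is only an assumption about what the target page needs.

```javascript
// Runs in the browser context. It returns plain strings, which survive
// the trip back to Node (the DOM nodes themselves would not).
function hrefsInPage(selector) {
  return Array.from(document.querySelectorAll(selector))
    .map((anchor) => anchor.href)
    .filter((href) => typeof href === 'string' && href.length > 0);
}

// Hypothetical helper: collects the unique link URLs from an open page.
async function collectLinks(page, selector = 'a') {
  const hrefs = await page.evaluate(hrefsInPage, selector);
  return [...new Set(hrefs)];
}

// Usage sketch, assuming the puppeteer package is installed.
async function main(url) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    console.log(await collectLinks(page));
  } finally {
    await browser.close();
  }
}
```

Call main with the URL you want to scan. If the page fills in its links after load (by scrolling, for example), you would probably still need to wait for or trigger that before collecting.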

