Welcome to Vigges Developer Community - Open, Learning, Share
Welcome To Ask or Share your Answers For Others


python 3.x - How do I print the last message from a Reddit message group using Selenium

So I get the messages from this line:

<pre class="_3Gy8WZD53wWAE41lr57by3 ">Sleep</pre>

My code:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time

PATH = r'C:\Users\User\Desktop\chromedriver.exe'  # raw string: '\U' would otherwise be read as a Unicode escape
driver = webdriver.Chrome(PATH)

driver.get('https://www.reddit.com')
time.sleep(80) # time to log in

search = driver.find_element_by_class_name('_3Gy8WZD53wWAE41lr57by3 ')

print(driver.find_element_by_xpath(".//pre").text) # *LET'S CALL THIS 'S'*

And everything works, kind of. When I print 's', it prints the first message from that chat.

Note that whenever someone sends a message, it appears in a <pre> element with the class '_3Gy8WZD53wWAE41lr57by3'.

My goal is to print the last message from that chat.

I had to edit it twice because of some mistakes that I had made



1 Answer


I would suggest two changes to your code that will save you a lot of frustration:

  1. Avoid explicit sleep calls; instead, wait for the presence of elements. Your program then waits only as long as the page actually takes to load, rather than a fixed 80 seconds every time.
  2. Use CSS selectors instead of XPath. They are usually shorter and easier to read, and they make it simple to match on tag and class together, which tends to make the code more robust.

In terms of execution, here's how that looks:

Wait up to 80 seconds for login:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

# Get the page, now the user will need to log in
driver.get('https://www.reddit.com')

# Wait until a message element is present, up to 80 seconds
try:
    element = WebDriverWait(driver, 80).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, "pre._3Gy8WZD53wWAE41lr57by3"))
    )
except TimeoutException:
    print("You didn't log in, shutting down program")
    driver.quit()
    raise SystemExit(1)  # stop here; the driver can no longer be used

# continue as normal here
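
To see why an explicit wait beats a fixed sleep, it can help to picture what a wait does: poll a condition until it returns something truthy or a deadline passes. The sketch below is a simplified, Selenium-free illustration of that idea, not WebDriverWait's actual implementation. The names wait_until, WaitTimeout, and page_ready are hypothetical and exist only for this example.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value as soon as it appears, so the caller waits only
    as long as necessary. Raises WaitTimeout once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Fake condition standing in for "the message element is on the page":
# it reports success on the third check.
checks = {"count": 0}


def page_ready():
    checks["count"] += 1
    return "ready" if checks["count"] >= 3 else None


outcome = wait_until(page_ready, timeout=5, poll_interval=0.01)
```

With time.sleep(80) the script always pauses for the full 80 seconds; with a polling wait it continues as soon as the condition holds and only gives up once the timeout has elapsed.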

Use CSS selectors to find your messages:

from selenium.webdriver.common.by import By

# I personally like to always use the plural form of this function:
# if nothing matches, it returns an empty list, whereas the singular
# form raises NoSuchElementException.
# NOTE: this uses Reddit's class for messages; you may need to change the CSS selector
all_messages = driver.find_elements(By.CSS_SELECTOR, 'pre._3Gy8WZD53wWAE41lr57by3')

# You can now access the first and last messages (None if nothing matched):
first_message = all_messages[0].text if all_messages else None
last_message = all_messages[-1].text if all_messages else None

# Alternatively, if you are concerned about memory usage from potentially large
# lists of messages, use the CSS pseudo-classes ':first-of-type' / ':last-of-type'.
# CAVEAT: these compare <pre> siblings under the same parent, so they only
# isolate the first/last message if all message <pre> elements share one parent.
# NOTE: taking item 0 only when the list is non-empty yields None
# if no results are found
first_message = driver.find_elements(By.CSS_SELECTOR, 'pre._3Gy8WZD53wWAE41lr57by3:first-of-type')
first_message = first_message[0].text if first_message else None
last_message = driver.find_elements(By.CSS_SELECTOR, 'pre._3Gy8WZD53wWAE41lr57by3:last-of-type')
last_message = last_message[0].text if last_message else None
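
The empty-list checks above can be wrapped in a small helper. This is only a minimal sketch: first_and_last_text is a hypothetical name, and it assumes nothing more than that each item has a .text attribute, as Selenium WebElements do. The SimpleNamespace objects stand in for real elements so the example runs without a browser.

```python
from types import SimpleNamespace


def first_and_last_text(elements):
    """Return (first_text, last_text), or (None, None) for an empty list."""
    if not elements:
        return None, None
    return elements[0].text, elements[-1].text


# Stand-ins for the WebElements a find_elements call would return
fake_messages = [SimpleNamespace(text=t) for t in ("Hi", "How are you?", "Sleep")]

first, last = first_and_last_text(fake_messages)
empty_result = first_and_last_text([])
```

In the real script you would pass in the list of message elements found on the page instead of fake_messages.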

I hope this gives you an immediate solution, along with some fundamentals for making your web scraping more reliable going forward.

