240521

1. Selenium

Selenium is a library that lets a program control a web browser.

!pip install selenium

chromedriver_autoinstaller

- A Python package that automatically downloads and installs the chromedriver used in Selenium projects.

- It downloads the driver that matches the installed Chrome browser version, so you do not have to worry about version mismatches.

!pip install chromedriver_autoinstaller
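
Importing the package does not install anything by itself; its install() function does the work. The following is a minimal sketch, assuming both packages are installed and Chrome is present on the machine. make_driver is a hypothetical helper name that does not appear in these notes. Recent Selenium releases may be able to locate a driver on their own, in which case the install() call is simply redundant.

```python
def make_driver():
    """Return a Chrome driver after making sure a matching chromedriver exists.

    make_driver is an illustrative helper name (an assumption, not part of
    either library). The imports are inside the function so that defining it
    does not require the packages to be installed.
    """
    import chromedriver_autoinstaller
    from selenium import webdriver

    # Downloads the chromedriver matching the installed Chrome (if it is not
    # already present) and adds its folder to PATH.
    chromedriver_autoinstaller.install()
    return webdriver.Chrome()


# Usage (opens a real browser window):
# driver = make_driver()
# driver.get('https://www.google.com')
```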

- Launch Chrome automatically, go to the Google page, select the search box by element lookup, and search for '미세먼지' (fine dust).

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome()
driver.get('https://www.google.com')
search = driver.find_element('name', 'q')
search.send_keys('미세먼지')
search.send_keys(Keys.RETURN)

2. Naver Webtoon

- Crawling the best comments of a Naver Webtoon episode

- BeautifulSoup parses the page source; the comments are collected from span tags with class u_cbox_contents and printed in a for loop.

!pip install bs4
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
driver.get('https://comic.naver.com/webtoon/detail?titleId=783053&no=134&week=tue')
# Name the parser explicitly to avoid the "no parser specified" warning
soup = BeautifulSoup(driver.page_source, 'html.parser')
# attrs must be a dict ({'class': ...}); the original passed a set ({'class', ...})
comment_area = soup.find_all('span', {'class': 'u_cbox_contents'})
print('***** Best comments *****')
for span in comment_area:
    print(span.text.strip())
    print('-' * 50)
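
The extraction step can be tried without launching a browser. The sketch below applies the same "span with class u_cbox_contents" filter to a made-up HTML string. It uses the standard library's html.parser in place of BeautifulSoup, so it has no third-party dependency. CommentExtractor, extract_comments, and SAMPLE_HTML are illustrative names, and the sample markup is an assumption, not Naver's actual page.

```python
from html.parser import HTMLParser


class CommentExtractor(HTMLParser):
    """Collect the text of every <span> whose class list contains class_name."""

    def __init__(self, class_name="u_cbox_contents"):
        super().__init__()
        self.class_name = class_name
        self.comments = []
        self._depth = 0   # > 0 while we are inside a matching <span>
        self._buf = []    # text pieces of the comment being read

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self._depth:
            # Nested <span> inside a comment: track it so we close correctly.
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self._depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "span" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.comments.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


def extract_comments(html):
    parser = CommentExtractor()
    parser.feed(html)
    parser.close()
    return parser.comments


# Made-up markup for illustration only.
SAMPLE_HTML = """
<ul>
  <li><span class="u_cbox_contents"> first comment </span></li>
  <li><span class="u_cbox_nick">someone</span></li>
  <li><span class="u_cbox_contents">second <b>comment</b></span></li>
</ul>
"""

for comment in extract_comments(SAMPLE_HTML):
    print(comment)
    print('-' * 50)
```

With a live page, the same function could be fed driver.page_source instead of SAMPLE_HTML.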

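- Crawling all comments of a Naver Webtoon episode

- Open the Naver Webtoon page, click the "all comments" tab located by 'xpath', then collect the comments from span tags with class u_cbox_contents and print them in a for loop as above.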

from bs4 import BeautifulSoup

driver = webdriver.Chrome()
driver.get('https://comic.naver.com/webtoon/detail?titleId=783053&no=134&week=tue')
# Click the "all comments" tab before reading the page source
driver.find_element('xpath', '/html/body/div[1]/div[5]/div/div/div[5]/div[1]/div[3]/div/div/div[4]/div[1]/div/ul/li[2]/a').click()
soup = BeautifulSoup(driver.page_source, 'html.parser')
# attrs must be a dict ({'class': ...}); the original passed a set ({'class', ...})
comment_area = soup.find_all('span', {'class': 'u_cbox_contents'})
print('***** All comments *****')
for span in comment_area:
    print(span.text.strip())
    print('-' * 50)

3. Instagram

- Start Chrome with the webdriver and go to Instagram with get(url). Set the id and pw values, locate the id and pw input boxes by 'xpath' and type the values with send_keys, then locate the login button by 'xpath' and call .click() to log in.

import chromedriver_autoinstaller
from selenium import webdriver

driver = webdriver.Chrome()
url = 'https://www.instagram.com/'
driver.get(url)

id = 'zzc****.com'
pw = 'z********!'

input_id = driver.find_element('xpath', '/html/body/div[2]/div/div/div[2]/div/div/div[1]/section/main/article/div[2]/div[1]/div[2]/form/div/div[1]/div/label/input')
input_pw = driver.find_element('xpath', '/html/body/div[2]/div/div/div[2]/div/div/div[1]/section/main/article/div[2]/div[1]/div[2]/form/div/div[2]/div/label/input')

input_id.send_keys(id)
input_pw.send_keys(pw)


driver.find_element('xpath', '/html/body/div[2]/div/div/div[2]/div/div/div[1]/section/main/article/div[2]/div[1]/div[2]/form/div/div[3]/button').click()
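
The login code above keeps the id and pw as string literals in the script. One possible alternative, sketched below, reads them from environment variables so that they never appear in the source. The variable names INSTA_ID and INSTA_PW and the helper load_credentials are assumptions made for this example.

```python
import os


def load_credentials(env=os.environ):
    """Return (id, pw) read from the environment.

    INSTA_ID and INSTA_PW are hypothetical variable names chosen for this
    sketch; any names work as long as they are set before the script runs.
    """
    try:
        return env["INSTA_ID"], env["INSTA_PW"]
    except KeyError as missing:
        raise RuntimeError(f"environment variable {missing} is not set") from None


# Usage with the login inputs located above:
# user_id, user_pw = load_credentials()
# input_id.send_keys(user_id)
# input_pw.send_keys(user_pw)
```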

4. Instagram (hashtag search, scrolling down, clicking a photo)

- Hashtag search and scrolling down

- Go to the page for the hashtag '맛점' on Instagram and scroll down by the current height of the page.

hashtag = '맛점'
url = f'https://www.instagram.com/explore/tags/{hashtag}/'
driver.get(url)


import time

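for _ in range(1):  # number of times to scroll
    # scrollTo(x, y): x=0 keeps the horizontal position, y=scrollHeight jumps to the bottom of the page
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight)')
    time.sleep(3)  # wait 3 seconds after each scroll so new posts can load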
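
The loop above scrolls a fixed number of times. A common variation, sketched here, keeps scrolling until the page height stops growing. It relies only on driver.execute_script, the same call used above. scroll_to_bottom, pause, and max_rounds are names introduced for this example.

```python
import time


def scroll_to_bottom(driver, pause=3.0, max_rounds=10):
    """Scroll until document.body.scrollHeight stops changing.

    Returns the number of scrolls performed. max_rounds is a safety cap for
    feeds that never stop loading (an assumption of this sketch).
    """
    last_height = driver.execute_script('return document.body.scrollHeight')
    rounds = 0
    for _ in range(max_rounds):
        driver.execute_script('window.scrollTo(0, document.body.scrollHeight)')
        time.sleep(pause)  # give the page time to load more content
        rounds += 1
        new_height = driver.execute_script('return document.body.scrollHeight')
        if new_height == last_height:
            break  # nothing new was loaded, so we reached the bottom
        last_height = new_height
    return rounds


# Usage: scroll_to_bottom(driver, pause=3)
```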

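- Clicking the desired photo

- Locate the desired photo by 'xpath' and click it, then click the like (heart) button.

- Locate the comment box by 'xpath', click it, and type a message automatically.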

xpath = '/html/body/div[2]/div/div/div[2]/div/div/div[1]/div[1]/div[2]/section/main/article/div/div[2]/div/div[2]/div[2]/a'
driver.find_element('xpath', xpath).click()

like_xpath = '/html/body/div[8]/div[1]/div/div[3]/div/div/div/div/div[2]/div/article/div/div[3]/div/div/section[1]/span[1]/div/div'
driver.find_element('xpath', like_xpath).click()
driver.find_element('xpath', like_xpath).click()

reply_xpath = '/html/body/div[8]/div[1]/div/div[3]/div/div/div/div/div[2]/div/article/div/div[2]/div/div/div[2]/section[3]/div/form/div/textarea'
driver.find_element('xpath', reply_xpath).click()
driver.find_element('xpath', reply_xpath).send_keys('좋은 정보 감사합니다')

- Opening the browser, entering the id and password, logging in, searching by hashtag, and posting a comment, refactored into functions and run end to end.

# Log in
def login(id, pw):
    input_id = driver.find_element('xpath', '/html/body/div[2]/div/div/div[2]/div/div/div[1]/section/main/article/div[2]/div[1]/div[2]/form/div/div[1]/div/label/input')
    input_pw = driver.find_element('xpath', '/html/body/div[2]/div/div/div[2]/div/div/div[1]/section/main/article/div[2]/div[1]/div[2]/form/div/div[2]/div/label/input')
    input_id.send_keys(id)
    input_pw.send_keys(pw)
    driver.find_element('xpath', '/html/body/div[2]/div/div/div[2]/div/div/div[1]/section/main/article/div[2]/div[1]/div[2]/form/div/div[3]/button').click()

# Search by hashtag
def search(hashtag):
    url = f'https://www.instagram.com/explore/tags/{hashtag}/'
    driver.get(url)

# Like and comment
def like_and_comment(comment):
    # Click the photo (the like-button click from the earlier step is not repeated here)
    xpath = '/html/body/div[8]/div[1]/div/div[3]/div/div/div/div/div[2]/div/article/div/div[2]/div/div/div/div[1]/img'
    driver.find_element('xpath', xpath).click()

    reply_xpath = '/html/body/div[8]/div[1]/div/div[3]/div/div/div/div/div[2]/div/article/div/div[2]/div/div/div[2]/section[3]/div/form/div/textarea'
    driver.find_element('xpath', reply_xpath).click()
    # Type the comment passed as the argument instead of a hard-coded string
    driver.find_element('xpath', reply_xpath).send_keys(comment)
    
    
# Run
driver = webdriver.Chrome()
url = 'https://www.instagram.com/'
driver.get(url)
driver.implicitly_wait(3)

id = 'zzc****.com'  # masked, as in the login example in section 3
pw = 'z********!'  # masked

login(id, pw)
time.sleep(4)

hashtag = '사과'
search(hashtag)
time.sleep(4)

comment = '안녕하세요! 잘 보고 갑니다'
like_and_comment(comment)
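
The script above pauses with fixed time.sleep(4) calls, which either waste time or turn out too short on a slow connection. Selenium ships WebDriverWait for this purpose. The sketch below shows the same idea as a small library-independent polling helper, so that it can be run and tested without a browser. wait_for and its parameters are names introduced for this example.

```python
import time


def wait_for(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Exceptions raised by condition() (for example an element lookup that
    fails because the page is still loading) are treated as "not ready yet".
    Raises TimeoutError if nothing truthy is returned within timeout seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except Exception:
            pass  # not ready yet, retry until the deadline
        if time.monotonic() >= deadline:
            raise TimeoutError(f'condition not met within {timeout} seconds')
        time.sleep(poll)


# Usage with the comment box from like_and_comment:
# box = wait_for(lambda: driver.find_element('xpath', reply_xpath))
# box.send_keys(comment)
```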

5. Collecting images

- Downloading an image from Pixabay

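Open Chrome with the driver,
go to the Pixabay page with get(url),
locate the image by 'xpath',
get the image URL with get_attribute('src'),
and build an HTTP request for that URL with Request.
The headers={'User-Agent': ...} part sets the 'User-Agent', a string that identifies what kind of client is sending the request when a browser or other client contacts a server.
Here the 'User-Agent' is set so that the request looks like it came from a browser and the server does not reject it.
f = open(...) creates the file. In the mode 'wb', w means write mode and b means binary mode; an image is binary data, so the file is opened in binary mode.
urlopen(image_byte) sends the HTTP request described by the image_byte object and returns the response.
.read() reads the body of that response and returns it as binary data.
f.write(...) writes that binary data to the open file f, which saves the downloaded image as dog.jpg.
f.close() closes the file f. Always close a file once you are done with it: this flushes any buffered data to disk and releases system resources.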

import chromedriver_autoinstaller
import time
from selenium import webdriver
from urllib.request import Request, urlopen

driver = webdriver.Chrome()
url = 'https://pixabay.com/ko/images/search/%ea%b0%95%ec%95%84%ec%a7%80/'
driver.get(url)

image_xpath = '/html/body/div[1]/div[1]/div/div[2]/div[3]/div/div/div/div[1]/div/a/img'
image_url = driver.find_element('xpath', image_xpath).get_attribute('src')

image_byte = Request(image_url, headers={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36'})
f = open('dog.jpg', 'wb')
f.write(urlopen(image_byte).read())
f.close()
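
The open / write / close sequence above can also be written with a with statement, which closes the file and the response even if the download fails partway. This is a sketch under assumptions: download_image, USER_AGENT, and the opener parameter are names introduced here, and the User-Agent value is shortened for readability.

```python
from urllib.request import Request, urlopen

# Shortened placeholder; the full browser string from the code above also works.
USER_AGENT = 'Mozilla/5.0'


def download_image(image_url, path, opener=urlopen):
    """Save the resource at image_url to path and return path.

    opener defaults to urlopen; it is a parameter only so the function can be
    exercised without network access (an assumption of this sketch).
    """
    request = Request(image_url, headers={'User-Agent': USER_AGENT})
    # Both the HTTP response and the file are closed when the block exits.
    with opener(request) as response, open(path, 'wb') as f:
        f.write(response.read())
    return path


# Usage with the URL obtained from get_attribute('src'):
# download_image(image_url, 'dog.jpg')
```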


ADENAI
