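IT 免費/無料/Kostenlos(gratis) 記事本分享 (IT Free Notepad Sharing): Using a Python program to open every web page listed in a text file, find a specific string, and save it to another text file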

Saturday, December 16, 2023

Using a Python program to open every web page listed in a text file, find a specific string, and save it to another text file

Scenario:

How to phrase the prompt (Example 1):

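D:/HyperLinkSource.txt contains link addresses. Please write a Python program that uses the Chrome browser to open every link in that text file, pressing any key to continue to the next page.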

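import webbrowser

# Path to the text file that lists one link per line
file_path = 'D:/HyperLinkSource.txt'

with open(file_path, 'r') as file:
    # Read the links line by line
    for line in file:
        # Strip the trailing newline and any surrounding whitespace
        url = line.strip()
        if not url:
            continue  # skip blank lines

        # Open the link with the webbrowser module; this uses the system
        # default browser, which is not necessarily Chrome
        webbrowser.open(url)

        # Wait for the user to press Enter before opening the next page
        input("Press Enter to continue...")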
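The prompt asked for Chrome and for "any key", while the listing above opens the default browser and waits for Enter. The following is a minimal sketch of one way to get closer to the request, using only the standard library. The helper names (read_links, chrome_controller, wait_for_key, open_all) are hypothetical, and the location of the Chrome executable is something the caller has to supply.

```python
import os
import webbrowser


def read_links(path):
    """Return the non-empty, stripped lines of the link list file."""
    with open(path, 'r', encoding='utf-8') as f:
        return [line.strip() for line in f if line.strip()]


def chrome_controller(chrome_path):
    """Return a webbrowser controller for Chrome, or None if the executable is missing."""
    if not os.path.isfile(chrome_path):
        return None
    # Register the executable under the name 'chrome' and fetch its controller
    webbrowser.register('chrome', None, webbrowser.BackgroundBrowser(chrome_path))
    return webbrowser.get('chrome')


def wait_for_key():
    """Wait for any key on Windows; fall back to Enter on other systems."""
    try:
        import msvcrt  # Windows-only standard library module
    except ImportError:
        input('Press Enter to continue...')
    else:
        print('Press any key to continue...')
        msvcrt.getch()


def open_all(link_file, chrome_path):
    """Open every link in link_file, pausing for a key press after each one."""
    # Fall back to the default browser when Chrome is not found at chrome_path
    browser = chrome_controller(chrome_path) or webbrowser
    for url in read_links(link_file):
        browser.open(url)
        wait_for_key()
```

To use it, call open_all with the link file (for example D:/HyperLinkSource.txt) and the full path to chrome.exe on your machine. The install path varies between systems, so it is left as a parameter rather than hard-coded.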

How to phrase the prompt (Example 2):

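D:\HyperLinkSource.txt contains link addresses. Please write a Python program that uses the requests and BeautifulSoup modules to open, with the default browser, every link provided in that text file. For each link, read the web page content and extract the string that lies between 詳洽本公司： ("for details, contact our company:") and < 回訊息列表 ("< back to message list") into D:\BrowsingString.txt.

If that file already exists, delete it and create it again. The file should contain each link address together with the specific string taken from the page content. If the string cannot be found, write 找不到特定字串 ("specific string not found") instead, then carry on with the next link.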

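import os
import requests
from bs4 import BeautifulSoup

# Read the file that lists the link addresses, skipping blank lines
file_path = r'D:\HyperLinkSource.txt'
with open(file_path, 'r') as file:
    links = [line.strip() for line in file if line.strip()]

# Delete output_file if it already exists
output_file = r'D:\BrowsingString.txt'
if os.path.exists(output_file):
    os.remove(output_file)

# Markers that delimit the text to extract
start_marker = '詳洽本公司：'
end_marker = '< 回訊息列表'

# Fetch and process each link (no browser window is actually opened)
for link in links:
    try:
        # Fetch the page content with requests
        response = requests.get(link, timeout=10)
        response.raise_for_status()  # raise an exception if the request failed
        webpage_content = response.text

        # Parse the page with BeautifulSoup; note that the search below
        # still runs on the raw HTML, so soup is not used any further
        soup = BeautifulSoup(webpage_content, 'html.parser')

        # Locate the markers; look for the end marker only after the start marker
        start_index = webpage_content.find(start_marker)
        end_index = -1
        if start_index != -1:
            end_index = webpage_content.find(end_marker, start_index + len(start_marker))

        # Extract the text between the two markers
        if start_index != -1 and end_index != -1:
            substring = webpage_content[start_index + len(start_marker):end_index].strip()
        else:
            substring = '@@ 找不到特定字串'  # "specific string not found"

        # Append the link and the extracted text to the output file
        with open(output_file, 'a', encoding='utf-8') as file:
            file.write(f'{link}\n')
            file.write(f'{substring}\n\n\n')
    except requests.exceptions.RequestException as e:
        print(f'Could not fetch link: {link}')
        print(f'Error message: {e}')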
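The prompt asks for the output file to be deleted and recreated when it already exists. One possible alternative, shown here only as a sketch and not as what the listing above does, is to collect the results first and open the file once in 'w' mode, which truncates any existing file. The name write_report and its parameters are hypothetical.

```python
def write_report(output_path, results):
    """Write (link, text) pairs to output_path.

    Opening with mode 'w' creates the file or empties an existing one,
    so no separate delete step is needed.
    """
    with open(output_path, 'w', encoding='utf-8') as out:
        for link, text in results:
            # Same layout as the listing above: link, extracted text, two blank lines
            out.write(f'{link}\n{text}\n\n\n')
```

The trade-off is that nothing is written until every link has been processed, whereas appending after each link keeps partial results if the run is interrupted.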

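Afterword: The extracted text still contains a lot of HTML markup, so if the data is going to be reused, this is probably not the best approach. This was only a test of using a Python program to capture the content between two specific keywords in a web page and extract it into another text file, D:\BrowsingString.txt.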
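If the leftover HTML markup is the main obstacle, one option is to strip the tags before searching for the two markers. The sketch below does this with the standard library's html.parser instead of BeautifulSoup, so it runs without extra installs. The names TextOnly and extract_between and the sample fragment are made up for illustration. It assumes the markers appear in the visible text of the page, and it does not filter out script or style contents.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect only the text nodes of an HTML document, dropping all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def extract_between(html, start_marker, end_marker):
    """Return the tag-free text between the two markers, or None if not found."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left behind by the removed tags
    text = ' '.join(''.join(parser.parts).split())
    start = text.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    # Look for the end marker only after the start marker
    end = text.find(end_marker, start)
    if end == -1:
        return None
    return text[start:end].strip()


# Hypothetical page fragment, made up for illustration
sample = '<div>詳洽本公司：<b>TEL</b> <a href="#">1234</a><br>&lt; 回訊息列表</div>'
print(extract_between(sample, '詳洽本公司：', '< 回訊息列表'))  # prints: TEL 1234
```

Because character references such as &lt; are decoded before the search, the end marker can be written with a literal < even when the page source encodes it.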
