把程式到別處別台 NB 執行 , 結果也差不多 , 才知要改寫 QT 語法 ,
經由搜尋後找到本篇文章 , 前後花了二週時間由來#1讀起 , 感謝樓主提供學習平台 ,
目前由內容修改原本使用的網頁 , 網址 https://mops.twse.com.tw/server-java/t164sb01?step=1&CO_ID=1723&SYEAR=2019&SSEASON=1&REPORT_ID=C
也真的如願抓出其中想要的報表 , 測試 table(0) , table(1) , table(2) ,
只有 table(1) 及 table(2) , 可以正確顯示 ,
而 table(0) , 都出現在 TempArray(i, j) = myTable(i).Cells(j).innerText 錯誤 ,
請問 table(0)問題應如何修正 ? 謝謝
程式 :
Sub Test2()
Dim Url As String, HTMLsourcecode As Object
Set HTMLsourcecode = CreateObject("htmlfile")
Application.ScreenUpdating = False
Url = "https://mops.twse.com.tw/server-java/t164sb01?step=1&CO_ID=1723&SYEAR=2019&SSEASON=1&REPORT_ID=C"
With CreateObject("WinHttp.WinHttpRequest.5.1")
.Open "GET", Url, False
.setRequestHeader "Cache-Control", "no-cache"
.setRequestHeader "Pragma", "no-cache"
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.setRequestHeader "User-Agent", "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Mobile Safari/537.36"
.send
.waitForResponse timeout '?????U??
'?yX??~?? , ?????`?h???θ`??????
'HTMLsourcecode.body[removed] = .responseText '???|?X?{????yX
HTMLsourcecode.body[removed] = ConvertRaw(.responseBody) '?|?X?{???yX
'Debug.Print HTMLsourcecode.body[removed]
'?u??table(0 , 1 , 2 ) , ?n?P?B????????
Set myTable = HTMLsourcecode.all.tags("table")(0).Rows
ReDim TempArray(myTable.Length - 1, myTable(0).Cells.Length - 1)
For i = 0 To myTable.Length - 1
For j = 0 To myTable(i).Cells.Length - 1
TempArray(i, j) = myTable(i).Cells(j).innerText
Next j
Next i
Sheets("temp").Activate
If Err Then
Err.Clear
Sheets.Add(after:=ActiveSheet).Name = "temp"
Else
Sheets("temp").Cells.Clear
End If
Sheets("temp").Range(Cells(1, 1), Cells(myTable.Length, myTable(0).Cells.Length)) = TempArray()
End With
Set HTMLsourcecode = Nothing
Set myTable = Nothing
End Sub
oliwa wrote:
而 table(0) , 都出現在 TempArray(i, j) = myTable(i).Cells(j).innerText 錯誤 ,
請問 table(0)問題應如何修正 ? 謝謝 ...(恕刪)
因為網頁表格有合併的關係,加上您用陣列的方式加速處理
請參考101樓的說明、處理方式
也可改用逐格寫入
Cells(i + 1, j + 1) = myTable(i).Cells(j).innerText
或改用剪貼薄方式
相關範例都有,請回頭找找
請試著了解一下,這有什麼不同
debug.print myTable(0).Cells.Length
debug.print myTable(1).Cells.Length
1.網址 : http://money.finance.sina.com.cn/corp/go.php/vFD_ProfitStatement/stockid/601066/ctrl/part/displaytype/4.phtml
2.程式 :
Sub Test()
Dim Url As String, HTMLsourcecode As Object
Set HTMLsourcecode = CreateObject("htmlfile")
Application.ScreenUpdating = False
Sheets("temp").Activate
If Err Then
Err.Clear
Sheets.Add(after:=ActiveSheet).Name = "temp"
Else
Sheets("temp").Cells.Clear
End If
Ur1 = "http://money.finance.sina.com.cn/corp/go.php/vFD_ProfitStatement/stockid/601066/ctrl/part/displaytype/4.phtml"
'Url = "http://money.finance.sina.com.cn/corp/go.php/vFD_BalanceSheet/stockid/601066/ctrl/part/displaytype/4.phtml"
'Url = "http://money.finance.sina.com.cn/corp/go.php/vFD_CashFlow/stockid/601066/ctrl/part/displaytype/4.phtml"
With CreateObject("WinHttp.WinHttpRequest.5.1")
.Open "GET", Url, False
.setRequestHeader "Cache-Control", "no-cache"
.setRequestHeader "Pragma", "no-cache"
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
.setRequestHeader "User-Agent", "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Mobile Safari/537.36"
.send
.waitForResponse timeout '?????U??
'?yX??~?? , ?????`?h???θ`??????
'HTMLsourcecode.body[removed] = .responseText '???|?X?{????yX
HTMLsourcecode.body[removed] = ConvertRaw(.responseBody) '?|?X?{???yX
'Debug.Print HTMLsourcecode.body[removed]
'?u??table(0 , 1 , 2 ) , ?n?P?B????????
On Error Resume Next
Set myTable = HTMLsourcecode.all.tags("table")(0).Rows '?P?B??? table(?)
'Array??g????t , ????????????@?? , ??n?t?~?B?z
ReDim TempArray(myTable.Length - 1, myTable(0).Cells.Length - 1) '?P?B??? table(?)
For i = 0 To myTable.Length - 1
For j = 0 To myTable(i).Cells.Length - 1
TempArray(i, j) = myTable(i).Cells(j).innerText
Next j
Next i
Sheets("temp").Range(Cells(1, 1), Cells(myTable.Length, myTable(0).Cells.Length)) = TempArray() '?P?B??? table(?)
End With
Set HTMLsourcecode = Nothing
Set myTable = Nothing
End Sub
3.google網站解析 , 確實是 GET , 且沒有參數
rainbowsperm wrote:
我將您抓取goodinfo網站資料的用loop變成抓取每檔個股的資料
但一下子就被goodinfo擋IP, 不知您是否有辦法解決?...(恕刪)
613樓??那是設計成單筆查詢用的範例
改寫成用迴圈大量下載,需額外加入延遲
避免擋ip,最簡單的就是,延長每次查詢的時間間隔
範例一、用sleep lib
#If VBA7 Then
Public Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal Milliseconds As LongPtr)
#Else
Public Declare Sub Sleep Lib "kernel32" (ByVal Milliseconds As Long)
#End If
Sub test1()
For i = 1 To 10
Sleep 3000 '等3秒
'可加入亂數,讓時間不固定,模擬人工下載
Debug.Print i
Next
End Sub
範例二、用Application.Wait
Sub test2()
For i = 1 To 10
Debug.Print i
Application.Wait (Now + TimeValue("0:00:02")) '等2秒
'可加入亂數,讓時間不固定,模擬人工下載
Next
End Sub
範例三、348樓,另外寫副程式延遲
範例四、例如175樓、219樓…其它很多樓
程式中限制連續查詢的次數 or 無法查詢時
改用 Application.OnTime Now 排程,延後執行
以上方式各有優缺點,挑一個喜歡的用就行
難一點的,使用大量ip、proxy切換查詢,偽造headers…等等
不過,這種如何跳過擋ip用暴力下載的方式,我不打算寫範例
以前文章中我只有稍微提到(您可以回頭找一下文章)
因為網站不希望有人大量下載,才會擋ip,多少要尊重一下
或是改找一個不怕有人大量查詢、大量下載、不擋ip的網站,像是finance.yahoo.com
不然改用ie object也行,查詢慢(1秒~數10秒~有時會數分鐘),不會有擋iP的問題
(xmlhttp 每筆查詢通常在0.0x~0.x秒)
關閉廣告