Channel: The Official Scripting Guys Forum! forum

Looking to parse a lot of web pages with PowerShell


I'm trying to compile a large number of baseball stats from a website. Unfortunately, those stats aren't displayed in a way that makes them easy to get at. What I need PowerShell to do is this:

  1. Take a URL as input
  2. Read the entire page
  3. Find a very specific instance of a specific statistic
  4. Output that statistic into Excel
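
One rough way to picture steps 3 and 4, as a sketch only: the helper name ConvertTo-StatRow, the sample HTML, the example.invalid URL and the regex are all made up for illustration and are not taken from the real pages. It also writes a CSV file (which Excel opens directly) instead of driving Excel itself, which is a simplification on my part. It assumes PowerShell 3.0 or later for [pscustomobject].

```powershell
# Hypothetical helper: turns already-downloaded HTML into one row of output.
# The caller supplies the pattern because the real page markup is not known here.
function ConvertTo-StatRow {
    param(
        [string]$Url,
        [string]$Html,
        [string]$Pattern   # regex with one capture group that holds the statistic
    )
    $value = $null
    if ($Html -match $Pattern) { $value = $Matches[1] }
    [pscustomobject]@{ Url = $Url; Stat = $value }
}

# Made-up sample standing in for a downloaded page.
$sampleHtml = '<td>Total</td><td>12.5</td>'
$row = ConvertTo-StatRow -Url 'http://example.invalid/page.html' -Html $sampleHtml -Pattern 'Total</td><td>([\d.]+)'

# Excel opens CSV files directly, so Export-Csv avoids COM automation.
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'stats.csv'
@($row) | Export-Csv -Path $csvPath -NoTypeInformation
```

For many pages, the same helper could be called in a loop over the URLs, with all the rows piped to a single Export-Csv call.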

Right now I have the following script lines to grab a web page and put it in a variable:

# Download the raw HTML of one player page into $output
$webClient = New-Object System.Net.WebClient
$webClient.Headers.Add("user-agent", "PowerShell Script")
$output = $webClient.DownloadString("http://ootppatp.com//game/lgreports/players/player_3064.html")

That pulls the web page in question into the variable $output. (The page listed there is one example of the many pages I need to parse.) Given that mash of HTML, I need to output the player's name (in this case "Alex McGrath") and, along with that name, the stat "Career WAR" (in this case the value is about halfway down the page and is "41.2").

The player name seems moderately easy: I just grab everything between <title> and </title> and discard the words "player report for #xx".
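
Something like the following is what I have in mind for the name, as a sketch only. The sample title string and the "#12" in it are made up, and I'm assuming "#xx" is always a run of digits; the real pages may word the title differently.

```powershell
# Made-up title for illustration; the real page's exact wording may differ.
$sampleHtml = '<html><head><title>Player Report for #12 Alex McGrath</title></head></html>'

$playerName = $null
# (?s) lets . span line breaks in case the title is wrapped across lines.
if ($sampleHtml -match '(?s)<title>(.*?)</title>') {
    # Remove the "player report for #NN" text, then trim leftover whitespace.
    $playerName = ($Matches[1] -replace 'player report for #\d+', '').Trim()
}
$playerName
```

Both -match and -replace are case-insensitive by default, so the capitalization of "Player Report" in the title should not matter.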

Any ideas on how to grab the Career WAR stat?

Another example URL is: http://ootppatp.com//game/lgreports/players/player_2421.html


zarberg@gmail.com

