Channel: The Official Scripting Guys Forum! forum

PowerShell script to capture data from a file for use in a rename process


I'm attempting to take a group of specifically named PDF files, capture specific pieces of data within each file (using a control file that lists the phrases I need), and use those pieces of data to build a new file name. I'm having trouble correctly capturing the data elements within the PDF. Below is the code I've gleaned from the forums so far. If you can assist, I would appreciate it.

To identify the files which match the control search criteria (this works well):

Set-StrictMode -Version latest
Set-ExecutionPolicy unrestricted -Scope process

$tdydate    = get-date -format d
$path     = Split-Path -parent $MyInvocation.MyCommand.Definition
#$path     = "\\phpds\phpapda\mvh him roi"
$files    = Get-Childitem $path DB_16877_P_*.PDF -Recurse | Where-Object { !($_.psiscontainer)}
$controls = Get-Content ($path + "\control_file.DB_16877_P")
$output   = $path + "\output_DB_16877_P.log"


Function getStringMatch
{
  # Loop through all DB_16877_P_*.PDF files in the $path directory
  Foreach ($file In $files)
  {
    # Loop through the search strings in the control file
    ForEach ($control In $controls)
    {
      $result = Get-Content $file.FullName | Select-String $control -quiet -casesensitive
      If ($result -eq $True)
      {
        $match = $file.FullName
        $filedt = $file.CreationTime   # FileInfo has no GetCreationDate member; CreationTime holds the creation date
        "Match on string :  $control  in file :  $match   date : $filedt " | Out-File $output -Append
      }
    }
  }
}

getStringMatch
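The same first step can be sketched more compactly by letting Select-String read each file directly. This is a minimal sketch, not the original script: `Get-ControlMatch` is a hypothetical helper name, it assumes PowerShell 3.0 or later (for `-File`), and it returns objects instead of writing the log itself, which makes the matches easier to reuse in the rename step.

```powershell
Set-StrictMode -Version Latest

function Get-ControlMatch {
    # Hypothetical helper: returns one object per (file, control phrase) match.
    param(
        [string]   $Path,
        [string[]] $Controls
    )
    # -File (PowerShell 3.0+) replaces the Where-Object { !($_.psiscontainer) } filter.
    $files = Get-ChildItem -Path $Path -Filter 'DB_16877_P_*.PDF' -Recurse -File
    foreach ($file in $files) {
        # Skip blank lines in the control file; Select-String rejects an empty pattern.
        foreach ($control in ($Controls | Where-Object { $_.Trim() })) {
            # -SimpleMatch treats the phrase literally rather than as a regular expression.
            if (Select-String -Path $file.FullName -Pattern $control -SimpleMatch -CaseSensitive -Quiet) {
                [PSCustomObject]@{
                    Control  = $control
                    FileName = $file.FullName
                    Created  = $file.CreationTime
                }
            }
        }
    }
}
```

To reproduce the original log, pipe the result through `ForEach-Object { "Match on string : $($_.Control)  in file : $($_.FileName)  date : $($_.Created)" } | Out-File $output -Append`.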

I used a separate script (for testing only) for the second step, to extract the data elements and output them for review:

Set-StrictMode -Version latest
Set-ExecutionPolicy unrestricted -Scope process
#get-executionpolicy -list

$tdydate    = get-date -format d
$path     = Split-Path -parent $MyInvocation.MyCommand.Definition
#$path     = "\\phpds\phpapda\mvh him roi"
$files    = Get-Childitem $path DB_16877_P_*.PDF -Recurse | Where-Object { !($_.psiscontainer)}
$controls = Get-Content ($path + "\control_file.DB_16877_P")
$output   = $path + "\output_DB_16877_P.csv"


# Create an array for results
  $results = @()

  # Loop through the project directory
  Foreach ($file In $files) 
  { 
    # load the content once
    $content = Get-Content $file.FullName 

    # Check all keywords
    ForEach ($control In $controls) 
    { 
      # find the line containing the control string
      $result = $content | Select-String $control -casesensitive 
      If ($result) 
      { 
        # tidy up the results and add to the array
        $line = $result.Line -split ":"
        $results += New-Object PSObject -Property @{
            FileName = $file.FullName 
            Control = $line[0].Trim()
            Value = $line[1].Trim()
        }
      } 
    } 
  } 

  
  # return the results
  $results

  #Output Results array to CSV
  $results | Export-Csv $output -NoTypeInformation

The results are below (note the garbage data in the Value column, and please ignore the poor column formatting):

Control                                  FileName               Value
-------                                  --------               -----
BT /F1 220 Tf 0 g 1800 -7218 Td(MRN      X:\DB_16877_P_40.PDF   <692>) Tj ET Q
BT /F1 220 Tf 0 g 1800 -6712 Td(MRN      X:\DB_16877_P_41.PDF   <281>) Tj ET Q
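The Value column looks like garbage because each matched "line" is raw PDF content-stream text: `BT` and `ET` open and close a text object, `Tf` selects the font, `Td` positions the text, and the string in parentheses is drawn by `Tj`. Splitting on ":" therefore keeps the operators on both sides of the colon. A regular expression that captures only the characters after the control phrase avoids that. This is a minimal sketch under assumptions: `Get-ControlValue` is a hypothetical helper, the value is assumed to be numeric, optionally wrapped in `< >` and preceded by a colon, and the pattern is written against a line like `Td(MRN: <692>) Tj ET Q`, reconstructed from the output above. It also only works while the PDFs store their content streams uncompressed; many PDFs compress them (FlateDecode), and then a PDF library is needed to get at the text.

```powershell
Set-StrictMode -Version Latest

function Get-ControlValue {
    # Hypothetical helper: returns the digits that follow $Control on the first
    # matching line, e.g. '692' from  Td(MRN: <692>) Tj ET Q
    # ASSUMPTION: the value is numeric, optionally wrapped in < > and preceded by ':'.
    param(
        [string[]] $Content,
        [string]   $Control
    )
    # [regex]::Escape keeps control phrases containing special characters literal.
    $pattern = [regex]::Escape($Control) + '\s*:?\s*<?(\d+)>?'
    foreach ($line in $Content) {
        # -cmatch is case-sensitive, like -casesensitive in the original script.
        if ($line -cmatch $pattern) {
            return $Matches[1]
        }
    }
    return $null
}
```

In the second script, the `-split` block could become `$value = Get-ControlValue -Content $content -Control $control`, testing `if ($value)` and storing `Value = $value` in the PSObject. Once the `< >` correction to the file creation is made, the same pattern still matches because the brackets are optional.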


All I really want in the Value column is the actual MRN. (The example shows the number wrapped in < >, but a correction to how the files are created will take care of that.) I need to store this value and then use it to rename the file according to the new naming structure.
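Storing the MRN and renaming could look roughly like the following. It is a sketch under stated assumptions: the post does not give the new naming structure, so `'{0}_{1}.PDF'` (MRN, then the original base name) is only a placeholder, `Rename-PdfByMrn` is a hypothetical function name, and the MRN is assumed to be numeric as in the sample output. The MRNs are kept in a hashtable keyed by the original path, and `SupportsShouldProcess` lets you run it with `-WhatIf` first to preview the renames.

```powershell
Set-StrictMode -Version Latest

function Rename-PdfByMrn {
    # Hypothetical helper: finds the MRN in each DB_16877_P_*.PDF under $Path,
    # stores it in a hashtable, and renames the file using $NameFormat.
    # ASSUMPTION: '{0}_{1}.PDF' (MRN, original base name) is a placeholder naming scheme.
    [CmdletBinding(SupportsShouldProcess)]
    param(
        [string] $Path,
        [string] $NameFormat = '{0}_{1}.PDF'
    )
    $mrnByFile = @{}   # original full path -> MRN, kept for logging or later steps
    # The file list is collected before any rename, so renamed files are not revisited.
    $files = Get-ChildItem -Path $Path -Filter 'DB_16877_P_*.PDF' -Recurse -File
    foreach ($file in $files) {
        # -List stops after the first matching line; group 1 holds the digits.
        $hit = Select-String -Path $file.FullName -Pattern 'MRN\s*:?\s*<?(\d+)>?' -CaseSensitive -List
        if (-not $hit) {
            Write-Warning "No MRN found in $($file.FullName)"
            continue
        }
        $mrn = $hit.Matches[0].Groups[1].Value
        $mrnByFile[$file.FullName] = $mrn
        $newName = $NameFormat -f $mrn, $file.BaseName
        if ($PSCmdlet.ShouldProcess($file.FullName, "Rename to $newName")) {
            Rename-Item -LiteralPath $file.FullName -NewName $newName
        }
    }
    return $mrnByFile
}
```

`Rename-PdfByMrn -Path $path -WhatIf` shows what would be renamed without touching anything; dropping `-WhatIf` performs the renames and returns the path-to-MRN table.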

