You must resample the data at a new rate. To create each new data value, you interpolate between the two existing data points immediately before and after it.
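Example... You have two seconds of audio. At 44.1 kHz this is 88,200 data points.
You want to raise the pitch 2 percent. You will end up with about 86,436 data points (98 percent of your initial data). Strictly, a 2 percent rise calls for 88,200 / 1.02, or about 86,471 points, but 98 percent keeps the arithmetic simple and works out to a rise of about 2.04 percent.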
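Your new data point #1 is the same value as old data point #1.
Since 98 new points must cover 100 old ones, each new point sits 100/98 (about 1.02) old points after the previous one.
To obtain new data value #2, take the value which is about 2% of the way from old #2 to old #3.
To obtain new data value #3, take the value which is about 4% of the way from old #3 to old #4.
Each step the proportion is about 2 points more.
After you do this 49 times the proportion reaches 100%, so new data point #50 is exactly old #51, and one old data point has been skipped. After 98 steps, new data point #99 is exactly old #101.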
Etc.
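In case it helps to see the stepping written out, here is a minimal Python sketch of the walk-through above. The function name `resample_by_step` and the ramp test data are my own inventions for illustration, and the code counts from 0 rather than 1, so new #2 in the text is `shifted[1]` here. It does plain linear interpolation only.

```python
def resample_by_step(old, step):
    """Resample `old` by linear interpolation, advancing `step` old samples per new sample."""
    if not old:
        return []
    last = len(old) - 1
    count = int(last / step) + 1      # how many new samples fit inside the old data
    new = []
    for n in range(count):
        pos = n * step                # fractional position in the old data (0-based)
        i = min(int(pos), last)       # old sample at or just before pos
        j = min(i + 1, last)          # old sample just after pos (clamped at the end)
        frac = pos - i                # how far from old[i] toward old[j]
        new.append(old[i] + frac * (old[j] - old[i]))
    return new


# A ramp makes the result easy to read: each new value equals its old position.
ramp = [float(k) for k in range(201)]
shifted = resample_by_step(ramp, 100 / 98)
print(shifted[1], shifted[2])   # roughly 1.02 and 2.04
print(shifted[49])              # roughly 50.0, i.e. one old sample has been skipped
```

Computing the position as `n * step` each time, rather than adding `step` over and over, keeps rounding error from piling up over a long file.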
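Lowering the pitch means you will gain data points, because each new point advances less than one old point. The bookkeeping is trickier, since some pairs of old data points have two new values between them.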
If things get too unwieldy, it might be simpler to keep track of what fraction of the way through the file each new data point is. Multiply this by the number of old data points. The whole-number part of the result tells you which old data point comes before (the next one comes after), and the fractional part tells you how far between them to interpolate. You'll need to keep track of those fractional distances...
and now I realize this is getting beyond my ability to explain the steps involved.
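For what it's worth, the fraction-of-the-way-through-the-file idea might look something like this in Python. This is only a sketch: the name `resample_to_length` is made up, and I have assumed you multiply by the number of old points minus one, so that the first and last new points line up with the first and last old points. It works the same way whether you are gaining or losing data points.

```python
def resample_to_length(old, new_len):
    """Stretch or shrink `old` to `new_len` samples by linear interpolation."""
    if new_len <= 0 or not old:
        return []
    last = len(old) - 1
    new = []
    for n in range(new_len):
        # Fraction of the way through the file: 0.0 at the start, 1.0 at the end.
        through = n / (new_len - 1) if new_len > 1 else 0.0
        pos = through * last          # fractional position among the old samples
        i = min(int(pos), last)       # old sample before
        j = min(i + 1, last)          # old sample after (clamped at the end)
        frac = pos - i                # fractional distance between the two
        new.append(old[i] + frac * (old[j] - old[i]))
    return new


# Lowering the pitch: two old points become five new ones.
print(resample_to_length([0.0, 10.0], 5))
```

To lower the pitch by roughly 2 percent you would ask for about len(old) / 0.98 points, and to raise it, about len(old) * 0.98.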
Hope this helps.