Let's test the analysis and abstraction abilities of different GPT models.
I'm asking different GPT models from OpenAI to answer a simple challenge that requires abstraction and algorithmic capabilities, and I have copy-pasted the results as they appeared. The code to reproduce these results is given below.
The challenge is as follows:
Calculate the value of 2 * ( 5 * 2 ), replacing ‘*’ with the exponential operator
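For reference, the substitution turns the expression into 2 ^ ( 5 ^ 2 ): the parentheses force the inner power to be evaluated first, giving 2 ^ 25 = 33,554,432. This can be checked in Python, where exponentiation is written `**`:

```python
# Replace '*' with exponentiation: 2 * (5 * 2) becomes 2 ** (5 ** 2).
inner = 5 ** 2        # the parentheses: evaluate the inner power first -> 25
result = 2 ** inner   # 2 ** 25
print(result)         # 33554432
```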
Results of the GPT-4 vs. GPT-3.5 comparison
Model: gpt-4
Response: “The numeric value of 2 * ( 5 * 2 ) replacing ‘*’ by the exponential operator is 200, because ‘2 * ( 5 * 2 )’ would be translated into ‘2 ^ ( 5 ^ 2 )’, which equals ‘2 ^ 25’, and that equals 33,554,432.”
Model: gpt-3.5-turbo
Response: “The numeric value of 2 * ( 5 * 2 ) using the exponential operator is 2 ^ (5 ^2), which equals 2 ^ 25.”
Model: gpt-3.5-turbo-0301
Response: “Assuming you meant the multiplication operator (*) and not the exponential operator (^), the numeric value of 2 * (5 * 2) is 20. To break it down: – First, the expression in the inner parentheses (5 * 2) is evaluated, resulting in the value 10. – Then, the outer expression (2 * 10) is evaluated, resulting in the final value of 20.”
Model: gpt-3.5-turbo-0613
Response: “To replace the ‘*’ operator with the exponential operator, ‘^’, the numeric value of the expression 2 * (5 * 2) would be calculated as follows: 2 * (5 * 2) = 2 * (5^2) = 2 * 25 = 50 Therefore, the numeric value of the expression would be 50.”
GPT-4 calculates and abstracts
The experiment shows some improvement in the analytical capability of GPT-4 over GPT-3.5: only GPT-4 both made the substitution and computed 2 ^ 25 correctly. That said, the model has also been trained on internet data, so we cannot rule out that it has seen similar puzzles before.
This brings us to another problem that no one talks about. GPT, Llama and other models are not real products, even if people pay for them. They are more like pieces of infrastructure. They offer the ability to interact with a corpus of knowledge in natural language, and there is no guarantee that the machine is telling the truth.
The interesting thing about GPT base models is that they talk, and they talk very well. But it is only with the right training that they tell the truth and become valuable for business applications.
Python code to reproduce the results
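A minimal sketch of the reproduction script, assuming the OpenAI chat completions HTTP endpoint and an `OPENAI_API_KEY` environment variable; the helper names (`build_payload`, `ask`) and the exact prompt wording are illustrative choices, not part of the original experiment:

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"
PROMPT = ("What is the numeric value of 2 * ( 5 * 2 ) "
          "replacing '*' with the exponential operator?")
MODELS = ["gpt-4", "gpt-3.5-turbo", "gpt-3.5-turbo-0301", "gpt-3.5-turbo-0613"]

def build_payload(model: str, prompt: str) -> dict:
    """Request body for the chat completions endpoint: one user message."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

def ask(model: str, prompt: str) -> str:
    """Send one prompt to one model and return the text of its reply."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(model, prompt)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example run (requires network access and a valid API key):
# for model in MODELS:
#     print(f"Model: {model}")
#     print(f'Response: "{ask(model, PROMPT)}"\n')
```

Note that the dated snapshots such as gpt-3.5-turbo-0301 may since have been retired, so rerunning the loop against currently available models can require updating the model list.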