Image-Based Game Automation in Python🎮
git repo: https://github.com/StanLinCat/auto_dino
1. Project Introduction and Motivation
This project is a hands-on exploration of building a Python game automation tool with basic image recognition techniques. The result is a fun tool in its own right, and the same approach can be applied in many other scenarios.
2. Core Technology Stack
This project relies on the following open-source tools, which form the foundation for image recognition and automated control:
| Technology | Brief Description |
|---|---|
| OpenCV | Open-source computer vision library for image processing and pattern recognition. |
| Python | Easy-to-learn, high-level programming language with rich libraries, ideal for beginners. |
| PyAutoGUI | Library for automating mouse/keyboard control and basic screenshot/image matching. |
| NumPy | Library for efficient numerical computing, used here for image matrix operations. |
💡 Key Concept: The underlying principle of image processing involves mathematical concepts, primarily matrices (from linear algebra). Combined with statistical data analysis, it can solve basic image recognition problems and further apply to machine learning and deep learning.
3. Image Recognition Method: Template Matching
The core image recognition method in this project uses the cv2.matchTemplate function from the OpenCV library.
3.1 Algorithm Principle
The main function of cv2.matchTemplate is to find similar targets in an image.
- Input: The algorithm takes two images: image (the large image, i.e., the screenshot) and template (the small target image to search for).
- Computation: The program slides template across image and calculates a comparison value at each position, representing how similar the two images are in that region.
- Output and Positioning: The final output is a result image storing the comparison values. The minMaxLoc function then finds the maximum or minimum value in result, which gives the position of the target.
❗ Positioning Note: The point found by the function is the top-left corner of the target image. To use it as a click position in game automation, add half the width and height of template to the coordinates so that the click lands inside the target.
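The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names `click_point` and `locate`, the choice of `cv2.TM_CCOEFF_NORMED`, and the 0.8 threshold are all assumptions.

```python
def click_point(top_left, template_shape):
    """Convert a match's top-left corner into the center of the template."""
    x, y = top_left
    h, w = template_shape[:2]  # NumPy image shape is (rows, cols[, channels])
    return x + w // 2, y + h // 2


def locate(image, template, threshold=0.8):
    """Return the click point of the best match, or None if the score is too low.

    `threshold` is a hypothetical cutoff; tune it for the game being automated.
    """
    import cv2  # imported lazily so click_point stays usable without OpenCV

    # result[y, x] holds the similarity score with the template placed at (x, y).
    result = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
    _min_val, max_val, _min_loc, max_loc = cv2.minMaxLoc(result)
    if max_val < threshold:
        return None
    return click_point(max_loc, template.shape)
```

For the squared-difference method the best match is the minimum, so `min_val` and `min_loc` would be used instead of the maximum.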
3.2 Similarity Comparison Functions and Mathematical Formulas
OpenCV provides several methods for calculating the comparison value. This report covers the normalized variants, whose similarity score stays unchanged when pixel brightness is multiplied by a common coefficient.
(1) Normalized Squared Difference (cv2.TM_SQDIFF_NORMED)
This method calculates the squared difference and normalizes it. Smaller values indicate higher similarity.
\[R(x, y) = \frac{\sum_{x', y'} \left(T(x', y') - I(x + x', y + y')\right)^2}{\sqrt{\sum_{x', y'} T(x', y')^2 \cdot \sum_{x', y'} I(x + x', y + y')^2}}\]
(2) Normalized Cross-Correlation (cv2.TM_CCORR_NORMED)
This method calculates the cross-correlation and normalizes it. Larger values indicate higher similarity.
\[R(x, y) = \frac{\sum_{x', y'} \left(T(x', y') \cdot I(x + x', y + y')\right)}{\sqrt{\sum_{x', y'} T(x', y')^2 \cdot \sum_{x', y'} I(x + x', y + y')^2}}\]
(3) Normalized Correlation Coefficient, DC Component Removed (cv2.TM_CCOEFF_NORMED)
This method also calculates a correlation, but subtracts the mean from both the template and the image patch first, which avoids misjudgments caused by uniformly large pixel values. The resulting correlation coefficient is bounded between -1 and 1. Larger values indicate higher similarity.
\[R(x, y) = \frac{\sum_{x', y'} \left(T'(x', y') \cdot I'(x + x', y + y')\right)}{\sqrt{\sum_{x', y'} T'(x', y')^2 \cdot \sum_{x', y'} I'(x + x', y + y')^2}}\]
Here $T'(x', y')$ and $I'(x + x', y + y')$ represent the original matrices minus their means, where $w$ and $h$ are the width and height of the template:
\[T'(x', y') = T(x', y') - \frac{1}{w \cdot h} \sum_{x'', y''} T(x'', y'')\]
\[I'(x + x', y + y') = I(x + x', y + y') - \frac{1}{w \cdot h} \sum_{x'', y''} I(x + x'', y + y'')\]
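To make the three formulas concrete, here is a small pure-Python sketch that evaluates each score at a single position, with the template and the image patch given as flattened pixel lists of equal length. The function names and sample values are illustrative assumptions; OpenCV evaluates the same expressions at every position of the result image.

```python
import math


def sqdiff_normed(t, i):
    """Normalized squared difference: 0 means identical patches."""
    num = sum((a - b) ** 2 for a, b in zip(t, i))
    den = math.sqrt(sum(a * a for a in t) * sum(b * b for b in i))
    return num / den


def ccorr_normed(t, i):
    """Normalized cross-correlation: 1 means a perfect match up to scaling."""
    num = sum(a * b for a, b in zip(t, i))
    den = math.sqrt(sum(a * a for a in t) * sum(b * b for b in i))
    return num / den


def ccoeff_normed(t, i):
    """Same as ccorr_normed, after subtracting each patch's mean (DC component)."""
    mean_t = sum(t) / len(t)
    mean_i = sum(i) / len(i)
    return ccorr_normed([a - mean_t for a in t], [b - mean_i for b in i])


# Note: a completely uniform patch makes a denominator zero; this sketch
# does not guard against that case.
template = [10, 20, 30, 40]
brighter = [v + 100 for v in template]  # same pattern plus a constant offset

print("ccorr :", ccorr_normed(template, brighter))
print("ccoeff:", ccoeff_normed(template, brighter))
```

With these sample values, the constant brightness offset lowers the cross-correlation score, while the mean-subtracted coefficient still reports a perfect match.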
3.3 Limitations of Basic Functionality
Using cv2.matchTemplate alone has certain limitations: it works well for typical 2D games, but recognition becomes very difficult in 3D games. In addition, matching fails if the target appears in the screenshot rotated relative to the template.
4. Game Automation Program Flow
The automation flow of the game bot relies on image recognition results and uses if and else logic to determine the next mouse or keyboard control action. The flow is mainly divided into three steps:
- Prepare Game Environment: Switch the interface to the game screen and add refresh functionality to wait for the game to officially start.
- Start Game: Use OpenCV to find the coordinates of the “Start Button”, control the mouse to click it, and officially enter the game.
- Automatically Play Game: Start timing (a timer is recommended so the program cannot run away), search for target images via image recognition, and execute preset logic based on the recognition results to trigger mouse or keyboard actions (e.g., a left click or a jump).
graph TD
A([Start]) --> B(Prepare Environment);
B --> C{Find Start Button};
C -- No, Continue Waiting --> B;
C -- Yes --> D(Click to Start Game);
D --> E[Start Countdown];
D --> F[Play Game];
E --> F;
F --> G{Recognize Game Image};
G -- Recognition Result 1 --> H[Left Click or Short Jump];
G -- Recognition Result 2 --> I[Right Click or Long Jump];
H --> F;
I --> F;
F --> J([End]);
%% Light red area (Start/End/Countdown) - Acceptable in light/dark modes
style A fill:#ff6b6b,stroke:#333,stroke-width:2px,color:#fff
style J fill:#ff6b6b,stroke:#333,stroke-width:2px,color:#fff
style E fill:#ff8787,stroke:#333,stroke-width:2px,color:#fff
%% Light yellow/green area (Main Flow) - Changed to softer teal series
style B fill:#4ecdc4,stroke:#333,stroke-width:2px,color:#fff
style D fill:#4ecdc4,stroke:#333,stroke-width:2px,color:#fff
style F fill:#45b7aa,stroke:#333,stroke-width:2px,color:#fff
style H fill:#45b7aa,stroke:#333,stroke-width:2px,color:#fff
style I fill:#45b7aa,stroke:#333,stroke-width:2px,color:#fff
%% Orange area (Decision Points) - Changed to softer orange
style C fill:#ffa07a,stroke:#333,stroke-width:2px,color:#fff
style G fill:#ff8c6b,stroke:#333,stroke-width:2px,color:#fff
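
The play loop in the flow above can be sketched as a timed polling loop. This is only an outline under stated assumptions: `run_bot`, its parameters, and the labels passed around are hypothetical names, and the recognition and input steps are injected as plain callables so the loop itself stays independent of any library.

```python
import time


def run_bot(recognize, actions, time_limit_s=60.0, poll_interval_s=0.0,
            clock=time.monotonic, sleep=time.sleep):
    """Poll `recognize` until the time limit expires and dispatch on its result.

    recognize: callable returning a label (e.g. "cactus") or None.
    actions:   dict mapping labels to zero-argument callables.
    Returns the number of actions performed.
    """
    deadline = clock() + time_limit_s  # the timer that prevents a runaway bot
    performed = 0
    while clock() < deadline:
        label = recognize()
        action = actions.get(label)  # the if/else logic, written as a lookup
        if action is not None:
            action()
            performed += 1
        if poll_interval_s:
            sleep(poll_interval_s)
    return performed
```

In a real bot, `recognize` would wrap the screenshot and template-matching step, and the actions would call input functions such as `pyautogui.press("space")` or `pyautogui.click(x, y)`.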
5. Practical Results and Discussion
5.1 Practical Results in Drink or Cake Recognition Game
In a test on a game that involves recognizing drinks or cakes, the program scans screenshots to find the game start button and clicks its coordinates.
- Performance Optimization: Restricting the Region of Interest (ROI) to the table area in the middle of the screen significantly reduces computation time.
- Performance: The program runs very fast, with an overall recognition time of less than 0.5 seconds. Because it reacts faster than a human can, the program can “feed” the rabbit or cat to the maximum within the time limit and still finish with 8 seconds to spare.
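The effect of the ROI can be estimated by counting how many positions the template is evaluated at. The sketch below is an illustration only: the function names, the ROI layout `(left, top, width, height)`, and the screen, ROI, and template sizes are assumptions, not values from the project.

```python
def search_positions(image_size, template_size):
    """Positions evaluated by template matching: (W - w + 1) * (H - h + 1)."""
    (img_w, img_h), (tpl_w, tpl_h) = image_size, template_size
    return (img_w - tpl_w + 1) * (img_h - tpl_h + 1)


def crop_roi(frame, roi):
    """Crop a NumPy-style image (indexed [row, col]) to the ROI."""
    left, top, width, height = roi
    return frame[top:top + height, left:left + width]


def roi_to_screen(point, roi):
    """Map a match found inside the ROI back to full-screen coordinates."""
    x, y = point
    left, top, _width, _height = roi
    return x + left, y + top


# Hypothetical sizes: 1920x1080 screenshot, 600x400 ROI, 64x64 template.
full = search_positions((1920, 1080), (64, 64))
cropped = search_positions((600, 400), (64, 64))
print(f"ROI search evaluates {cropped / full:.1%} of the full-frame positions")
```

Coordinates found inside the cropped frame are relative to the ROI, so they need to pass through `roi_to_screen` before being used as a click position.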
5.2 Challenges and Optimization Suggestions for Chrome Dinosaur Game
The Chrome Dinosaur Game is more difficult because game speed continuously increases over time.
- Challenges: As speed increases, the preset jump trigger position reaches its limit: there is no longer enough reaction time, and the dinosaur crashes into cacti.
- Parameter Complexity: The algorithm therefore has to account for the relationship between ROI and speed, possibly using a global variable that increases the reaction interval over time. It also has to handle pterodactyls, fast drops, long jumps, and short jumps, which makes parameter tuning complex.
Suggested future optimization: adopt “Sequential Control”, that is, set dedicated modes for different game periods (e.g., day/night, different acceleration stages). This reduces the number of images processed per computation and makes ROI adjustments easier and more precise.
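One possible way to express stage-based control is a lookup table keyed by elapsed time, as in the sketch below. The stage boundaries, trigger distances, and function names are made-up placeholders for illustration, not measured values from the game.

```python
import bisect

# Hypothetical stages: (start time in seconds, jump trigger distance in pixels).
STAGES = [(0, 120), (30, 160), (60, 210), (90, 270)]


def trigger_distance(elapsed_s, stages=STAGES):
    """Return the trigger distance of the stage that `elapsed_s` falls into."""
    starts = [start for start, _distance in stages]
    index = bisect.bisect_right(starts, elapsed_s) - 1
    return stages[max(index, 0)][1]


def should_jump(obstacle_x, dino_x, elapsed_s):
    """Jump once the obstacle is ahead and within the current trigger distance."""
    gap = obstacle_x - dino_x
    return 0 <= gap <= trigger_distance(elapsed_s)
```

Because the game speeds up, the same 150-pixel gap that is safe to ignore early on triggers a jump in a later stage.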

5.3 Performance Improvement Solutions
For games requiring extremely high speed, consider the following optimizations:
- Port the Python program's functionality to C++ or another faster language.
- Monitor computation time to find suitable parameters more effectively.
- Upgrade to faster computer hardware, taking GPU and memory constraints into account.
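Monitoring computation time can be as simple as wrapping the recognition step in a timer. The helper below is a minimal sketch; `time_calls` and its return format are assumptions for illustration.

```python
import time
from statistics import mean


def time_calls(func, repeats=20):
    """Call `func` repeatedly and report mean and worst-case duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return {"mean_s": mean(durations), "max_s": max(durations)}


# Stand-in workload; in practice, pass the function that grabs a frame
# and runs template matching.
stats = time_calls(lambda: sum(range(10_000)))
print(f"mean {stats['mean_s'] * 1000:.3f} ms, max {stats['max_s'] * 1000:.3f} ms")
```

The worst-case time matters more than the mean here, since a single slow frame is enough to miss a jump.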
6. Extended Works
Hearthstone Automation Bot
Mobile Game Automation: Shining Nikki Auto Claim Rewards, Cats & Soup Automation