AirfoilOptimizer: Predicting Airfoil Self-Noise with Machine Learning


Earlier this year, I worked with my team at Venture Garden Group on an AI-powered biometric identity project. That experience sparked my interest in diving deeper into machine learning and deep learning. To build on that foundation, I’ve decided to embark on hands-on projects that go beyond reading and taking courses.

This article showcases one of my first projects: predicting sound pressure level (SPL) from the aerodynamic characteristics of aircraft airfoils. I’ll walk you through the implementation step by step.

TL;DR: the code is on GitHub in the AirfoilOptimizer repository.

Introduction

The goal of this project was to predict the sound pressure level generated by airfoil self-noise, a critical consideration in aerospace engineering for designing quieter aircraft components. I used the Airfoil Self-Noise dataset from the UCI Machine Learning Repository, which contains five input features (frequency, angle of attack, chord length, free-stream velocity, and suction-side displacement thickness) along with the measured SPL.

A Brief Primer on Aerodynamics

Aerodynamics is the study of how air interacts with solid surfaces, such as aircraft wings or airfoils. It is fundamental to designing efficient aircraft and predicting flight behavior.

An airfoil is the cross-sectional shape of a wing or blade that generates lift when moving through the air. By carefully shaping the airfoil, engineers reduce drag and optimize lift, enabling efficient flight.

An important aspect of aerodynamics is airfoil self-noise: the noise generated by the interaction between the airflow and the airfoil surface. Reducing this noise is a key goal when designing quieter aircraft and components.

Key Features

Frequency (Hz): The frequency at which the sound pressure level is measured.
Angle of Attack (AoA, degrees): Tilt of a wing or airfoil relative to the direction of the oncoming air.
Chord length (m): The straight-line distance from the leading (front) edge to the trailing (back) edge of an airfoil, defining its size and aerodynamic properties.
Free-stream velocity (m/s): Speed of the undisturbed air approaching the airfoil, crucial for determining lift, drag, and flow behavior.
Suction-side displacement thickness (m): A measure of the thickness of the boundary layer (the slow-moving air next to the surface) on the airfoil’s suction (upper) side, influencing lift and efficiency.
Scaled sound pressure level (dB): Scaled measurement of the aerodynamic noise generated by the airfoil, often used in noise reduction studies. This is the target variable.

This project uses machine learning to predict SPL from the first five features, aiming to help engineers design quieter and more efficient airfoils.

Steps in the Process

1. Data Preprocessing

The raw data is stored in a tab-separated .dat file with no header row. I loaded it with pandas and used train_test_split from scikit-learn to split it into training (80%) and testing (20%) sets. The features were frequency, angle of attack, chord length, free-stream velocity, and suction-side displacement thickness, and SPL was the target.
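A minimal sketch of this loading-and-splitting step, with a few sanity checks added, could look like this. It assumes the file is tab-separated with no header row (matching the sep='\t', header=None call in the training script), and it reads a handful of made-up rows from an in-memory string so it runs without the real file. COLUMN_NAMES and the sample values are illustrative assumptions, not the real data.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split

COLUMN_NAMES = ['Frequency', 'Angle of Attack', 'Chord Length',
                'Free-stream Velocity', 'Suction Side Displacement',
                'Sound Pressure Level']

# Hypothetical stand-in for airfoil_self_noise.dat: tab-separated, no header row.
# The values are made up for illustration; they are NOT rows from the real file.
rows = [[800 * (i + 1), 1.5 * i, 0.3048, 71.3, 0.002 + 0.001 * i, 126.0 - i]
        for i in range(10)]
raw_text = "\n".join("\t".join(str(v) for v in row) for row in rows)

# header=None because the file has no header; names= supplies the column labels
data = pd.read_csv(io.StringIO(raw_text), sep='\t', header=None, names=COLUMN_NAMES)

# Sanity checks before modelling: expected columns and no missing values
assert list(data.columns) == COLUMN_NAMES
assert not data.isna().any().any()

X = data[COLUMN_NAMES[:-1]]        # the five input features
y = data['Sound Pressure Level']   # the target

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(len(X_train), len(X_test))
```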

Additionally, I performed exploratory data analysis (EDA), plotting the following trends:

Distribution of frequency for different angles of attack.
Variation of angle of attack at different chord lengths.

Figure 1: Distribution of frequency per angle of attack

Figure 2: Variation of angle of attack at different chord lengths
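Plots like these can be drawn in several ways. One minimal sketch, assuming box plots grouped by the distinct values of one column (the original figures may use a different chart type), is shown here. It uses a few made-up rows instead of the real dataset, and plot_grouped_distribution is a hypothetical helper name.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd


def plot_grouped_distribution(data, value_col, group_col, ax):
    """Hypothetical helper: one box per distinct value of group_col, showing the spread of value_col."""
    groups = sorted(data[group_col].unique())
    samples = [data.loc[data[group_col] == g, value_col].to_numpy() for g in groups]
    ax.boxplot(samples)
    # Boxes sit at x = 1..n; label each one with its group value
    ax.set_xticks(range(1, len(groups) + 1))
    ax.set_xticklabels([str(g) for g in groups], rotation=90)
    ax.set_xlabel(group_col)
    ax.set_ylabel(value_col)
    return groups


# Illustrative values only, NOT rows from the real dataset
data = pd.DataFrame({
    "Frequency": [800, 1000, 1250, 800, 1600, 2000, 2500, 3150],
    "Angle of Attack": [0.0, 0.0, 0.0, 5.4, 5.4, 5.4, 9.9, 9.9],
    "Chord Length": [0.3048, 0.3048, 0.2286, 0.2286, 0.1524, 0.1524, 0.0254, 0.0254],
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
aoa_groups = plot_grouped_distribution(data, "Frequency", "Angle of Attack", ax1)
ax1.set_title("Distribution of frequency per angle of attack")
chord_groups = plot_grouped_distribution(data, "Angle of Attack", "Chord Length", ax2)
ax2.set_title("Variation of angle of attack at different chord lengths")
fig.tight_layout()
# fig.savefig("eda.png") would write both panels to disk
```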

2. Feature Engineering

I implemented feature importance ranking using a RandomForestRegressor to identify the most impactful features. This step helped optimize the model by focusing on the most relevant data. The top three features were then used for model training.
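A minimal sketch of ranking features with a RandomForestRegressor and keeping the top three is shown here. It relies on scikit-learn's feature_importances_ attribute, which is impurity-based (mean decrease in impurity) and sums to 1. The data is synthetic, not the airfoil dataset: the target is built to depend strongly on Frequency and Suction Side Displacement, weakly on Chord Length, and not at all on the other two columns.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

FEATURES = ['Frequency', 'Angle of Attack', 'Chord Length',
            'Free-stream Velocity', 'Suction Side Displacement']

# Synthetic stand-in data (NOT the airfoil dataset)
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.uniform(0, 1, size=(300, 5)), columns=FEATURES)
y = (5 * X['Frequency'] + 4 * X['Suction Side Displacement']
     + 2 * X['Chord Length'] + rng.normal(0, 0.1, size=300))

# Fit a forest and rank features by impurity-based importance
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, y)
importances = pd.Series(forest.feature_importances_, index=FEATURES).sort_values(ascending=False)
print(importances)

# Keep only the three highest-ranked features for subsequent training
top_features = list(importances.index[:3])
X_top = X[top_features]
```

Impurity-based importances are quick to compute but can favour features with many distinct values; permutation importance (sklearn.inspection.permutation_importance) is a common cross-check.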

3. Model Training

I started with a simple LinearRegression model as a baseline and later trained a RandomForestRegressor, an ensemble of decision trees that is more robust and can capture non-linear relationships. The model was evaluated using mean squared error (MSE) on the test set and saved for future predictions.

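# train_model.py

import pickle

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# The .dat file has no header row, so name the six columns explicitly
column_names = ['Frequency', 'Angle of Attack', 'Chord Length',
                'Free-stream Velocity', 'Suction Side Displacement',
                'Sound Pressure Level']

# Load data and prepare features and target
data = pd.read_csv('../data/airfoil_self_noise.dat', sep='\t', header=None, names=column_names)

X = data[['Frequency', 'Angle of Attack', 'Chord Length', 'Free-stream Velocity', 'Suction Side Displacement']]
y = data['Sound Pressure Level']

# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model and evaluate it on the held-out test set
model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)
mse = mean_squared_error(y_test, model.predict(X_test))
print(f"Test MSE: {mse:.2f}")

# Save the trained model for predict.py
with open('../models/trained_model.pkl', 'wb') as f:
    pickle.dump(model, f)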
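The training script above fits only the Random Forest. A minimal sketch of the baseline comparison, fitting LinearRegression and RandomForestRegressor on the same split and scoring both with MSE, could look like this. It uses synthetic, deliberately non-linear data rather than the airfoil dataset, so its numbers are unrelated to the results reported in this article; only the comparison pattern carries over.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic, deliberately non-linear data (NOT the airfoil dataset)
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(500, 2))
y = np.sin(6 * X[:, 0]) + 4 * (X[:, 1] - 0.5) ** 2 + rng.normal(0, 0.05, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit both models on the same split and score each with test-set MSE
results = {}
for name, model in [("LinearRegression", LinearRegression()),
                    ("RandomForestRegressor", RandomForestRegressor(random_state=42))]:
    model.fit(X_train, y_train)
    results[name] = mean_squared_error(y_test, model.predict(X_test))

for name, mse in results.items():
    print(f"{name}: MSE = {mse:.4f}")
```

Using the same split and random_state for both models keeps the comparison fair: any difference in MSE comes from the model, not from which rows landed in the test set.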

4. Evaluation and Prediction

The model’s performance was benchmarked using MSE on the test set. I also created a predict.py script to load the trained model and predict SPL for new data.

# predict.py

import pickle

import pandas as pd

# Load the trained model saved by train_model.py
with open('../models/trained_model.pkl', 'rb') as f:
    model = pickle.load(f)

# Build a one-row DataFrame with the same column names used during training
new_data_df = pd.DataFrame(
    [[1000, 5.0, 0.3, 71.3, 0.02]],
    columns=['Frequency', 'Angle of Attack', 'Chord Length', 'Free-stream Velocity', 'Suction Side Displacement'],
)

# Make a prediction
predicted_spl = model.predict(new_data_df)
print(f"Predicted Sound Pressure Level: {predicted_spl[0]:.2f} dB")
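Before relying on predict.py, it can be worth confirming that the pickled model reloads cleanly and gives identical predictions. The sketch here trains a tiny model on synthetic data (not the airfoil dataset), saves and reloads it through a temporary file with pickle, and predicts several operating points at once. Each row of the input DataFrame is one prediction, and scikit-learn compares its column names with those seen during fit.

```python
import os
import pickle
import tempfile

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

FEATURES = ['Frequency', 'Angle of Attack', 'Chord Length',
            'Free-stream Velocity', 'Suction Side Displacement']

# Tiny synthetic training set (illustrative values only, NOT the airfoil data)
rng = np.random.default_rng(42)
X = pd.DataFrame(rng.uniform(0, 1, size=(50, 5)), columns=FEATURES)
y = X.sum(axis=1)
model = RandomForestRegressor(n_estimators=10, random_state=42).fit(X, y)

# Save and reload through a temporary file, mirroring train_model.py and predict.py
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "trained_model.pkl")
    with open(path, "wb") as f:
        pickle.dump(model, f)
    with open(path, "rb") as f:
        reloaded = pickle.load(f)

# Several operating points at once: one row per prediction, same columns as training
new_points = pd.DataFrame(rng.uniform(0, 1, size=(3, 5)), columns=FEATURES)
original_preds = model.predict(new_points)
reloaded_preds = reloaded.predict(new_points)
print(reloaded_preds)
```

Pickled scikit-learn models are generally only guaranteed to load with the same library version that saved them, so it helps to record that version alongside the .pkl file.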

Results and Observations

1. Linear Regression Results:

MSE: 26.47
Feature Importance:

                       Feature    Importance
  0                  Frequency    0.416388
  4  Suction Side Displacement    0.406408
  2               Chord Length    0.093257
  1            Angle of Attack    0.042112
  3       Free-stream Velocity    0.041834

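LinearRegression does not compute feature importance scores, so this ranking comes from the RandomForestRegressor used in the feature engineering step. It shows that the two most important features for predicting SPL are Frequency and Suction Side Displacement, with similar importance scores (about 41.6% and 40.6%, respectively). The remaining three features together account for about 18% of the total importance but still contribute to the predictions.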

Performance:

An MSE of approximately 26.47 indicates that, while the model captures the basic trends, its predictions carry substantial error. This is likely because a linear model cannot capture the non-linear relationships in the data.

2. Random Forest Regressor Results:

MSE: 4.33
Feature Importance:

                       Feature    Importance
  0                  Frequency    0.416388
  4  Suction Side Displacement    0.406408
  2               Chord Length    0.093257
  1            Angle of Attack    0.042112
  3       Free-stream Velocity    0.041834

This ranking is identical to the one listed under the linear regression results, because both come from the same Random Forest feature importance ranking. Frequency and Suction Side Displacement are the most influential features, together accounting for about 82% of the total importance.

Performance:

The Random Forest model achieved an MSE of about 4.33, roughly six times lower than the linear model’s 26.47. This suggests that the Random Forest Regressor, with its ability to model complex, non-linear relationships, is better suited for this prediction task.
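Because SPL is measured in decibels, MSE is in squared decibels (dB²). Taking the square root gives the root mean squared error (RMSE), which is in the same units as SPL and easier to interpret. Here y_i is the measured SPL, ŷ_i the predicted SPL, and n the number of test samples; the numbers are the two MSE values reported above.

```latex
\[
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
\]
\[
\text{Linear Regression: } \sqrt{26.47} \approx 5.14\ \text{dB},
\qquad
\text{Random Forest: } \sqrt{4.33} \approx 2.08\ \text{dB},
\qquad
\frac{26.47}{4.33} \approx 6.1
\]
```

In other words, the Random Forest's typical prediction error is around 2 dB, compared with around 5 dB for the linear baseline.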

Conclusion

The Random Forest Regressor outperforms the Linear Regression model in terms of prediction accuracy (as seen from the MSE comparison).
The Random Forest feature importance ranking identifies Frequency and Suction Side Displacement as the dominant features (together about 82% of the total importance), which supports the initial selection based on feature importance.
The lower MSE of the Random Forest model suggests that non-linear models are beneficial for this type of predictive modeling in aerospace engineering.

These results underscore the value of selecting the right model for a problem based on the data’s complexity and the performance metrics.

Next Steps

Moving forward, I plan to explore hyperparameter tuning and add more models for benchmarking, either by extending this project or by starting a new one that covers both.
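One common way to approach hyperparameter tuning is scikit-learn's GridSearchCV, which cross-validates every combination in a parameter grid and keeps the best one. This sketch is illustrative only: the grid values are assumptions, and it runs on small synthetic data rather than the airfoil dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (NOT the airfoil dataset), kept small so the search runs quickly
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(120, 5))
y = 3 * X[:, 0] + np.sin(6 * X[:, 4]) + rng.normal(0, 0.1, size=120)

# Illustrative grid only; real ranges would need to be chosen for the airfoil data
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
}

# 3-fold cross-validation over all 4 combinations, scored by (negated) MSE
search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated MSE:", -search.best_score_)
```

Because scikit-learn treats higher scores as better, MSE is passed as neg_mean_squared_error and negated again when printed.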

Project GitHub repository: https://github.com/entuziaz/AirfoilOptimizer

Acknowledgments

Starting out in machine learning has been a learning curve, and I’m incredibly grateful for the resources and communities that have supported my journey:

Learning basic ML: The Datacamp Machine Learning Engineer Course provided structured guidance for building foundational skills.
Staying Inspired: Engaging in Machine Learning Lagos discussions has kept me inspired and connected with like-minded learners.

If you’re just starting or curious about machine learning, I highly recommend exploring these resources.

Let’s Connect!

Thank you for reading! If you found this article helpful or thought-provoking:

👏 Clap to show your support.
💬 Comment your thoughts, reviews, ideas, or feedback — I’d love to hear from you!
🔄 Share this article with others who might find it useful.
🐦 Reach out to me on Twitter if you’d like to discuss this project or have questions.

Your feedback and ideas mean a lot and can help improve the project further! 🙌